2024-05-06 wifi: iwlwifi: mvm: Activate EMLSR based on traffic volume (Miri Korenblit)
Adjust EMLSR activation to account for traffic levels. By tracking the number of RX/TX MPDUs, EMLSR will be activated only when traffic volume meets the required threshold. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.9480f99ac8fc.If9eb946e929a39e10fe5f4638bc8bc3f8976edf1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
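A minimal sketch of the idea, with hypothetical names and an assumed threshold (the real driver logic differs in detail):

  /* Count RX/TX MPDUs over a polling window; EMLSR is only worth the
   * extra power draw when there is enough traffic to benefit from two
   * links. All names and the threshold value are illustrative. */
  #define ESR_TRAFFIC_THRESH_MPDUS 100

  struct esr_traffic_stats {
          unsigned int rx_mpdus;
          unsigned int tx_mpdus;
  };

  static bool esr_traffic_meets_threshold(const struct esr_traffic_stats *s)
  {
          return s->rx_mpdus + s->tx_mpdus >= ESR_TRAFFIC_THRESH_MPDUS;
  }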
2024-05-06 wifi: iwlwifi: mvm: don't always unblock EMLSR (Miri Korenblit)
When an event occurs to unblock EMLSR, the code attempts to re-enable EMLSR. However, the current implementation always tries to activate EMLSR, regardless of whether the blocker was set before the unblocking event or not. If EMLSR was already unblocked, there is no need to re-activate it. Fixes: 6cf7df9f013f ("wifi: iwlwifi: mvm: Add helper functions to update EMLSR status") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.eb861402dac9.I6a1d9f774f5551cfab60ea37b71a62640496af9b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
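A sketch of the fixed flow, under assumed names: re-activation is attempted only when the unblock event actually clears a bit that was set.

  struct esr_state {
          unsigned long blocked_reasons;      /* bitmask of active blockers */
  };

  static void esr_try_activate(struct esr_state *s);  /* assumed elsewhere */

  static void esr_unblock(struct esr_state *s, unsigned long reason)
  {
          bool was_blocked = s->blocked_reasons & reason;

          s->blocked_reasons &= ~reason;

          /* If the reason was never set, EMLSR was already unblocked and
           * there is nothing to re-activate. */
          if (was_blocked && !s->blocked_reasons)
                  esr_try_activate(s);
  }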
2024-05-06 wifi: iwlwifi: mvm: Always allow entering EMLSR from debugfs (Miri Korenblit)
EMLSR can't be activated from mac80211, except via debugfs, which is intended for testing purposes. Currently we don't allow entering EMLSR from debugfs if EMLSR is blocked, i.e. if mvmvif::esr_disable_reason is not 0. But we need a way to activate EMLSR regardless of the vif being blocked, for testing. Remove the check of esr_disable_reason. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.bc3c24d9e0e6.Iad60e22a0d7e2b2b989051e1140b6dc98bef7bcc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06 wifi: iwlwifi: mvm: add a debugfs for (un)blocking EMLSR (Miri Korenblit)
This is needed for testing purposes. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.eba2b6f0664c.I5f058e02abda11bf2eccfd2bcb59ca26bae87a3a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06 wifi: iwlwifi: mvm: trigger link selection after exiting EMLSR (Miri Korenblit)
If the reason for exiting EMLSR was a blocking reason, wait for the corresponding unblocking event:
- if there is an ongoing scan - do nothing. Link selection will be triggered at the end of it.
- If more than 30 seconds passed since the exit, trigger MLO scan, which will trigger link selection.
- If less than 30 seconds passed since the exit, reuse the latest link selection result.

If the reason for exiting EMLSR was an exit reason (IWL_MVM_EXIT_*), schedule MLO scan in 30 seconds.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Link: https://msgid.link/20240505091420.6a808c4ae8f5.Ia79605838eb6deee9358bec633ef537f2653db92@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
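A sketch of the decision tree above, with hypothetical names; the 30-second window comes from the commit text.

  #define ESR_LINK_SEL_REUSE_WINDOW (30 * HZ)

  struct esr_vif {
          bool scan_in_progress;
          unsigned long esr_exit_ts;          /* jiffies at EMLSR exit */
  };

  static void trigger_mlo_scan(struct esr_vif *v);          /* assumed */
  static void apply_last_link_selection(struct esr_vif *v); /* assumed */

  /* Called when the blocker that forced the EMLSR exit is lifted. */
  static void esr_handle_unblock(struct esr_vif *v)
  {
          if (v->scan_in_progress)
                  return;  /* link selection runs when the scan ends */

          if (time_after(jiffies, v->esr_exit_ts + ESR_LINK_SEL_REUSE_WINDOW))
                  trigger_mlo_scan(v);          /* scan triggers link selection */
          else
                  apply_last_link_selection(v); /* result still fresh, reuse it */
  }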
2024-05-06 wifi: iwlwifi: cleanup EMLSR when BT is active handling (Miri Korenblit)
BT Coex disables EMLSR only for a 2.4 GHz link, but doesn't block the vif from using EMLSR with a different link pair. In addition, storing it in mvmvif::esr_disable_reason requires extracting the BT Coex bit before checking if EMLSR is blocked or not for a specific vif. Therefore, change the BT Coex bit to be an exit reason and not a blocker. On link selection, EMLSR mode will be re-calculated for the 2.4 GHz link instead of checking that bit. While at it, move the relevant function declarations to the EMLSR functions area in mvm.h. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.a2e93b67c895.I183a0039ef076613144648cc46fbe9ab3d47c574@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06 Merge wireless into wireless-next (Johannes Berg)
Given how late we are in the cycle, merge the two fixes from wireless into wireless-next as they don't seem that urgent. This way, the wireless tree won't need rebasing later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06 netfilter: nft_set_pipapo: merge deactivate helper into caller (Florian Westphal)
It's the only remaining call site, so there is no need for this to be separate anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06 netfilter: nft_set_pipapo: prepare walk function for on-demand clone (Florian Westphal)
The existing code uses iter->type to figure out what data is needed, the live copy (READ) or the clone (UPDATE). Without pending updates, priv->clone and priv->match will point to different memory locations, but they have identical content. A future patch will make priv->clone == NULL if there are no pending changes; in this case we must copy the live data for the UPDATE case. Currently this would require a GFP_ATOMIC allocation. Split the walk function in two parts: one that does the walk and one that decides which data is needed. In the UPDATE case, callers hold the transaction mutex so we do not need the rcu read lock. This allows using GFP_KERNEL allocation while cloning. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06 netfilter: nft_set_pipapo: prepare destroy function for on-demand clone (Florian Westphal)
Once priv->clone can be NULL, i.e. when no insertions/removals occurred in the last transaction, we need to drop set elements from priv->match if priv->clone is NULL. While at it, condense this function by reusing the pipapo_free_match helper instead of an open-coded version. The rcu_barrier() is removed; it's not needed: old call_rcu instances for pipapo_reclaim_match do not access struct nft_set. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06 netfilter: nft_set_pipapo: make pipapo_clone helper return NULL (Florian Westphal)
Currently it returns an error pointer, but the only possible failure is ENOMEM. After a followup patch we'd need to discard the errno code anyway, i.e.

  x = pipapo_clone();
  if (IS_ERR(x))
          return NULL;

or make more changes to fix up callers to expect IS_ERR() code from set->ops->deactivate(). So simplify this and make it return ptr-or-null. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06 netfilter: nft_set_pipapo: move prove_locking helper around (Florian Westphal)
Preparation patch: the helper will soon get called from the insert function too. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06 netfilter: conntrack: remove flowtable early-drop test (Florian Westphal)
Not sure why this special case exists. Early drop logic (which kicks in when the conntrack table is full) should be independent of flowtable offload and only consider the assured bit (i.e., two-way traffic was seen). flowtable entries hold a reference to the conntrack entry (struct nf_conn) that has been offloaded. The conntrack use count is not decremented until after the entry is freed. This change therefore will not result in exceeding the conntrack table limit. It does allow early-drop of tcp flows even when they've been offloaded, but only if they were offloaded before the syn-ack was received or after at least one peer has sent a fin. Currently 'fin' packet reception already stops offloading, so this should not impact offloading either. Cc: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06 netfilter: conntrack: documentation: remove reference to non-existent sysctl (Florian Westphal)
The referenced sysctl doesn't exist anymore. Fixes: 4592ee7f525c ("netfilter: conntrack: remove offload_pickup sysctl again") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06 netfilter: use NF_DROP instead of -NF_DROP (Jason Xing)
Back in 2009, patch [1] introduced collecting a drop counter in nf_conntrack_in() by returning -NF_DROP. Later, patch [2] changed the return value of tcp_packet() - since renamed to nf_conntrack_tcp_packet() - from -NF_DROP to NF_DROP. As we can see, the remaining -NF_DROP uses should be corrected. Similarly, there are two other points where -NF_DROP is used. As NF_DROP is equal to 0, inverting NF_DROP makes no sense, as patch [2] noted many years ago. [1] commit 7d1e04598e5e ("netfilter: nf_conntrack: account packets drop by tcp_packet()") [2] commit ec8d540969da ("netfilter: conntrack: fix dropping packet after l4proto->packet()") Signed-off-by: Jason Xing <kernelxing@tencent.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
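The equivalence is easy to check: in the netfilter uapi the verdicts start at NF_DROP = 0, so negating it is a no-op. A standalone demonstration:

  #include <stdio.h>

  /* Verdict values from include/uapi/linux/netfilter.h */
  #define NF_DROP   0
  #define NF_ACCEPT 1

  int main(void)
  {
          /* -NF_DROP == NF_DROP == 0, so writing -NF_DROP only obscures
           * the intent; negating NF_ACCEPT would differ, but NF_DROP
           * cannot. */
          printf("NF_DROP=%d -NF_DROP=%d\n", NF_DROP, -NF_DROP);
          return 0;
  }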
2024-05-06 bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf() (Kent Overstreet)
We're using mutex_lock() inside a wait_event() conditional - prepare_to_wait() has already flipped task state, so potentially blocking ops need annotation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
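The problematic shape, as a sketch with hypothetical names (not the actual bcachefs code): wait_event() evaluates its condition after prepare_to_wait() has already set the task state, so a condition that can itself sleep triggers the "do not call blocking ops when !TASK_RUNNING" splat.

  wait_event(j->wait, ({
          bool ready;

          mutex_lock(&j->lock);           /* may sleep while the task state
                                           * is already TASK_UNINTERRUPTIBLE */
          ready = journal_buf_flushed(j); /* hypothetical predicate */
          mutex_unlock(&j->lock);
          ready;
  }));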
2024-05-06 orangefs: fix out-of-bounds fsid access (Mike Marshall)
Arnd Bergmann sent a patch to fsdevel; he says: "orangefs_statfs() copies two consecutive fields of the superblock into the statfs structure, which triggers a warning from the string fortification helpers" Jan Kara suggested an alternate way to do the patch to make it more readable. I ran both ideas through xfstests and both seem fine. This patch is based on Jan Kara's suggestion. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2024-05-06 LoongArch: KVM: Add mmio trace events support (Bibo Mao)
Add mmio trace events support; currently the generic mmio events KVM_TRACE_MMIO_WRITE/xxx_READ/xx_READ_UNSATISFIED are added here. Also a vcpu id field is added for all kvm trace events, since the perf KVM tool parses vcpu id information for the kvm entry event. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 LoongArch: KVM: Add software breakpoint support (Bibo Mao)
When a VM runs in kvm mode, the system will not exit to host mode when executing a general software breakpoint instruction such as INSN_BREAK; the trap exception happens in guest mode rather than host mode. In order to debug the guest kernel from the host side, some mechanism is needed to let the VM exit to host mode. Here a hypercall instruction with a special code is used for software breakpoints. The VM exits to host mode, and the kvm hypervisor identifies the special hypercall code and sets exit_reason to KVM_EXIT_DEBUG, then lets qemu handle it. The idea comes from ppc kvm: one API, KVM_REG_LOONGARCH_DEBUG_INST, is added to get the hypercall code. The VMM needs to get the sw breakpoint instruction with this API and set the corresponding sw breakpoint for the guest kernel. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
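A sketch of the host-side dispatch described above. KVM_EXIT_DEBUG is the generic KVM exit reason; the other constant and helper names are assumptions for illustration.

  static int kvm_handle_hypercall(struct kvm_vcpu *vcpu, unsigned long code)
  {
          /* A reserved hypercall code stands in for a software breakpoint. */
          if (code == KVM_HCALL_SWDEBUG) {             /* assumed constant */
                  vcpu->run->exit_reason = KVM_EXIT_DEBUG;
                  return RESUME_HOST;                  /* let qemu handle it */
          }

          /* Other codes are regular PV service requests. */
          return kvm_dispatch_pv_service(vcpu, code);  /* assumed helper */
  }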
2024-05-06 LoongArch: KVM: Add PV IPI support on guest side (Bibo Mao)
A PARAVIRT config option and PV IPI support are added for the guest side; the function pv_ipi_init() is used to add the IPI sending and IPI receiving hooks. This function first checks whether the system runs in VM mode; if the kernel runs in VM mode, it calls kvm_para_available() to detect the current hypervisor type (for now only KVM type detection is supported). The paravirt functions can work only if the current hypervisor type is KVM, since KVM is the only hypervisor supported on LoongArch now. PV IPI uses virtual IPI sender and virtual IPI receiver functions. With the virtual IPI sender, the IPI message is stored in memory rather than in emulated HW. IPI multicast is also supported, and 128 vcpus can receive IPIs at the same time, like the X86 KVM method. The hypercall method is used for IPI sending. With the virtual IPI receiver, HW SWI0 is used rather than the real IPI HW. Since each VCPU has a separate HW SWI0, like the HW timer, there is no trap in IPI interrupt acknowledge; and since the IPI message is stored in memory, there is no trap in getting the IPI message. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
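A guest-side sketch of the multicast send path, modeled on the x86 PV IPI ABI the text refers to: two 64-bit masks plus a starting vcpu id cover 128 vcpus per hypercall. The hypercall function number and name are assumptions, and range handling (cpus more than 127 apart) is omitted.

  static void pv_send_ipi_mask(const struct cpumask *mask)
  {
          unsigned int min = cpumask_first(mask), cpu;
          u64 bitmap[2] = { 0, 0 };

          /* Encode each destination cpu relative to the lowest one in
           * the mask, so one call can address 128 consecutive vcpus. */
          for_each_cpu(cpu, mask)
                  bitmap[(cpu - min) / 64] |= 1ULL << ((cpu - min) % 64);

          kvm_hypercall3(KVM_HCALL_FUNC_IPI, bitmap[0], bitmap[1], min);
  }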
2024-05-06 LoongArch: KVM: Add PV IPI support on host side (Bibo Mao)
On LoongArch systems, the IPI hw uses iocsr registers. There is one iocsr register access on IPI sending, and two iocsr accesses on IPI receiving in the IPI interrupt handler. In VM mode every iocsr access causes the VM to trap into the hypervisor, so one IPI hw notification costs three traps. In this patch PV IPI is added for the VM: a hypercall instruction is used for the IPI sender, and the hypervisor injects an SWI into the destination vcpu. In the SWI interrupt handler, only the CSR.ESTAT register is written to clear the irq. CSR.ESTAT register access does not trap into the hypervisor, so with PV IPI supported there is one trap on the IPI sender side, no trap on the IPI receiver side, and thus only one trap per IPI notification. This patch also adds IPI multicast support; the method is similar to x86. With IPI multicast support, an IPI notification can be sent to at most 128 vcpus at one time, which greatly reduces the number of traps into the hypervisor. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 LoongArch: KVM: Add vcpu mapping from physical cpuid (Bibo Mao)
Physical CPUID is used for interrupt routing by irqchips such as the ipi, msgint and eiointc interrupt controllers. The physical CPUID is stored in the CSR register LOONGARCH_CSR_CPUID; it cannot be changed once the vcpu is created, and the physical CPUIDs of two vcpus cannot be the same. Different irqchips declare different physical CPUID sizes: the max CPUID value for CSR LOONGARCH_CSR_CPUID on Loongson-3A5000 is 512, the max CPUID supported by the IPI hardware is 1024, while for the eiointc irqchip it is 256, and for the msgint irqchip it is 65536. The smallest value from all the interrupt controllers is selected now, so the max cpuid size is defined as 256 by KVM, which comes from the eiointc irqchip. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 LoongArch: KVM: Add cpucfg area for kvm hypervisor (Bibo Mao)
The cpucfg instruction can be used to get processor features, and a trap exception occurs when it is executed in VM mode, so it can also be used to provide cpu features to a VM. On real hardware, cpucfg areas 0 - 20 are in use by now. Here one specified area, 0x40000000 -- 0x400000ff, is reserved for the KVM hypervisor to provide PV features, and the area can be extended for other hypervisors in the future. This area will never be used by real HW; it is only used by software. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
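A guest-side detection sketch using the reserved window; kvm_para_available() is the function named in the guest-side commit above, while the offset constant and the signature value are assumptions for illustration.

  #define CPUCFG_KVM_SIG  0x40000000      /* start of the hypervisor area */
  #define KVM_SIGNATURE   0x004d564b      /* "KVM\0" little-endian, assumed */

  static bool kvm_para_available(void)
  {
          /* read_cpucfg() traps to the hypervisor, which fills in the
           * signature; real hardware never implements this area. */
          return read_cpucfg(CPUCFG_KVM_SIG) == KVM_SIGNATURE;
  }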
2024-05-06 LoongArch: KVM: Add hypercall instruction emulation (Bibo Mao)
On LoongArch systems there is a hypercall instruction dedicated to virtualization. When this instruction is executed on the host side, an illegal instruction exception is reported; however, it traps into the host when it is executed in VM mode. When the hypercall is emulated, the A0 register is set to the value KVM_HCALL_INVALID_CODE rather than injecting an EXCCODE_INE invalid instruction exception, so the VM can continue executing the following code. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06 LoongArch/smp: Refine some ipi functions on LoongArch platform (Bibo Mao)
Refine the ipi handling on the LoongArch platform; there are three modifications:

1. Add a generic function get_percpu_irq() (as sketched below), replacing some percpu irq functions such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with get_percpu_irq().

2. Change the definition of the action parameter taken by loongson_send_ipi_single() and loongson_send_ipi_mask(): it is defined in decimal encoding format at the ipi sender side. Normal decimal encoding is used rather than binary bitmap encoding for the ipi action: the ipi hw sender uses the decimal encoding code, while the ipi receiver gets binary bitmap encoding; the ipi hw converts it into a bitmap in the ipi message buffer.

3. Add a structure smp_ops on the LoongArch platform so that pv ipi can be used later.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
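A sketch of the generic helper from item 1, assuming the usual irqdomain plumbing on this platform (cpuintc_handle is the CPU interrupt controller fwnode):

  static int get_percpu_irq(int irq_offset)
  {
          struct irq_domain *d;

          /* Map a per-cpu hwirq (ipi/pmc/timer/...) to its Linux irq
           * through the CPU interrupt controller domain. */
          d = irq_find_matching_fwnode(cpuintc_handle, DOMAIN_BUS_ANY);
          if (d)
                  return irq_create_mapping(d, irq_offset);

          return -EINVAL;
  }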
2024-05-06 null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION() (Zhu Yanjun)
No functional changes intended. Fixes: f2298c0403b0 ("null_blk: multi queue aware block test driver") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240506075538.6064-1-yanjun.zhu@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-06 x86/alternatives: Remove alternative_input_2() (Borislav Petkov (AMD))
It is unused. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240506122848.20326-1-bp@kernel.org
2024-05-06 EDAC/synopsys: Fix ECC status and IRQ control race condition (Serge Semin)
The race condition around the ECCCLR register access happens between the IRQ disable method, called in the device remove() procedure, and the ECC IRQ handler:

1. Enable IRQ:
   a. ECCCLR = EN_CE | EN_UE
2. Disable IRQ:
   a. ECCCLR = 0
3. IRQ handler:
   a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_UE | CLR_UE_CNT
   b. ECCCLR = 0
   c. ECCCLR = EN_CE | EN_UE

So if the IRQ disabling procedure is called concurrently with the IRQ handler method, the IRQ might actually be left enabled due to statement 3c.

The root cause of the problem is that the ECCCLR register (which since v3.10a has been called ECCCTL) intermixes the ECC status data clear flags and the IRQ enable/disable flags. Thus the IRQ disabling (clearing the EN flags) and handling (writing 1 to clear the ECC status data) procedures must be serialised around the ECCCTL register modification to prevent the race.

So fix the problem described above by adding a spin-lock around the ECCCLR modifications and preventing the IRQ handler from modifying the IRQ enable flags (there is no point in disabling the IRQ and then re-enabling it again within a single IRQ handler call; see statements 3a/3b and 3c above).

Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com
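A sketch of the serialisation, using illustrative names (the struct, offset and flag macros are stand-ins for the driver's own): both the disable path and the handler take the same lock around the ECCCTL write, and the handler only writes the status-clear bits.

  struct synps_priv {
          void __iomem *base;
          spinlock_t reglock;
  };

  static void synps_disable_intr(struct synps_priv *priv)
  {
          unsigned long flags;

          spin_lock_irqsave(&priv->reglock, flags);
          writel(0, priv->base + ECC_CTL_OFST);          /* drop EN_CE|EN_UE */
          spin_unlock_irqrestore(&priv->reglock, flags);
  }

  static irqreturn_t synps_intr_handler(int irq, void *dev_id)
  {
          struct synps_priv *priv = dev_id;
          unsigned long flags;

          spin_lock_irqsave(&priv->reglock, flags);
          /* Clear the status counters only; do NOT touch the EN flags,
           * so a concurrent disable cannot be undone (old step 3c). */
          writel(ECC_CTL_CLR_CE | ECC_CTL_CLR_CE_CNT |
                 ECC_CTL_CLR_UE | ECC_CTL_CLR_UE_CNT,
                 priv->base + ECC_CTL_OFST);
          spin_unlock_irqrestore(&priv->reglock, flags);

          return IRQ_HANDLED;
  }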
2024-05-06 powerpc/64: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n (Michael Ellerman)
There is code that builds with calls to IO accessors even when CONFIG_PCI=n, but the actual calls are guarded by runtime checks. Otherwise those calls would fault, because the page at virtual address zero is (usually) not mapped into the kernel. As Arnd pointed out, it is possible a large port value could cause the address to be above mmap_min_addr, which would then access userspace - that would be a bug. To avoid any such issues, set _IO_BASE to POISON_POINTER_DELTA. That is a value chosen to point into unmapped space between the kernel and userspace, so any access will always fault. Note that on 32-bit POISON_POINTER_DELTA is 0, so the patch only has an effect on 64-bit. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240503075619.394467-2-mpe@ellerman.id.au
2024-05-06 powerpc/io: Avoid clang null pointer arithmetic warnings (Michael Ellerman)
With -Wextra clang warns about pointer arithmetic using a null pointer. When building with CONFIG_PCI=n, that triggers a warning in the IO accessors, eg:

  In file included from linux/arch/powerpc/include/asm/io.h:672:
  linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ...
  linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb'
    591 | #define __do_inb(port) readb((PCI_IO_ADDR)_IO_BASE + port);
        |                              ~~~~~~~~~~~~~~~~~~~~~ ^

That is because when CONFIG_PCI=n, _IO_BASE is defined as 0. Although _IO_BASE is defined as plain 0, the cast (PCI_IO_ADDR) converts it to void * before the addition with port happens. Instead the addition can be done first, and then the cast. The resulting value will be the same, but this avoids the warning, and also avoids void pointer arithmetic, which is apparently non-standard.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaaNsA@mail.gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240503075619.394467-1-mpe@ellerman.id.au
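The reordering amounts to moving the cast outside the addition, e.g. for __do_inb (a sketch of the before/after shape; the second macro is renamed here only to keep the snippet self-consistent):

  /* Before: pointer arithmetic on a (possibly null) void * base. */
  #define __do_inb(port)        readb((PCI_IO_ADDR)_IO_BASE + port)

  /* After: integer addition first, then a single cast of the sum. */
  #define __do_inb_fixed(port)  readb((PCI_IO_ADDR)(_IO_BASE + port))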
2024-05-06 powerpc/bpf: enable kfunc call (Hari Bathini)
Currently, bpf jit code on powerpc assumes all the bpf functions and helpers to be part of core kernel text. This is false for the kfunc case, as function addresses may not be part of the core kernel text area. So, add support for addresses that are not within the core kernel text area too, to enable kfunc support. Emit instructions based on whether the function address is within the core kernel text area or not, to retain the optimized instruction sequence where possible. In the case of PCREL, as a bpf function that is not within the core kernel text area is likely to go out of range with relative addressing on the kernel base, use PC relative addressing. If that goes out of range, load the full address with PPC_LI64(). With addresses that are not within the core kernel text area supported, override bpf_jit_supports_kfunc_call() to enable kfunc support. Also, override bpf_jit_supports_far_kfunc_call() to enable 64-bit pointers, as an address offset can be more than 32-bit long on PPC64. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240502173205.142794-2-hbathini@linux.ibm.com
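The shape of the emission logic, as a fragment sketch: core_kernel_text(), PPC_LI64() and the PPC_RAW_*/EMIT macros are real kernel/JIT helpers, while emit_optimized_call() and the surrounding control flow are simplified stand-ins.

  /* Inside the JIT's call emission (sketch): */
  if (core_kernel_text(func_addr)) {
          /* Target is in core kernel text: keep the existing optimized
           * relative-branch sequence. */
          emit_optimized_call(ctx, func_addr);    /* illustrative */
  } else {
          /* Module/kfunc address: relative reach is not guaranteed, so
           * load the full 64-bit address and branch via CTR. */
          PPC_LI64(_R12, func_addr);
          EMIT(PPC_RAW_MTCTR(_R12));
          EMIT(PPC_RAW_BCTRL());
  }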
2024-05-06 powerpc/64/bpf: fix tail calls for PCREL addressing (Hari Bathini)
With PCREL addressing, there is no kernel TOC. So, it is not setup in the prologue when PCREL addressing is used. But the number of instructions to skip on a tail call was not adjusted accordingly. That resulted in not-so-obvious failures while using tailcalls. The 'tailcalls' selftest crashed the system with the below call trace:

  bpf_test_run+0xe8/0x3cc (unreliable)
  bpf_prog_test_run_skb+0x348/0x778
  __sys_bpf+0xb04/0x2b00
  sys_bpf+0x28/0x38
  system_call_exception+0x168/0x340
  system_call_vectored_common+0x15c/0x2ec

Also, as bpf programs are always module addresses and a bpf helper in general is a core kernel text address, using PC relative addressing often fails with an "out of range of pcrel address" error. Switch to using the kernel base for relative addressing to handle this better.

Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL addresing")
Cc: stable@vger.kernel.org # v6.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240502173205.142794-1-hbathini@linux.ibm.com
2024-05-06 Documentation: Fix the address of the linuxppc-dev mailing list (Stephen Rothwell)
This list was moved many years ago. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240503121012.3ba5000b@canb.auug.org.au
2024-05-06 Documentation: Document PowerPC kernel dynamic DEXCR interface (Benjamin Gray)
Documents how to use the PR_PPC_GET_DEXCR and PR_PPC_SET_DEXCR prctl()s for changing a process's DEXCR or its process tree default value. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-10-bgray@linux.ibm.com
2024-05-06 selftests/powerpc/dexcr: Add chdexcr utility (Benjamin Gray)
Adds a utility to exercise the prctl DEXCR inheritance in the shell. Supports setting and clearing each aspect. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Use correct SPDX license, use execvp() for usability, print errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-9-bgray@linux.ibm.com
2024-05-06 selftests/powerpc/dexcr: Add DEXCR config details to lsdexcr (Benjamin Gray)
Now that the DEXCR can be configured with prctl, add a section in lsdexcr that explains why each aspect is set the way it is. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-8-bgray@linux.ibm.com
2024-05-06 selftests/powerpc/dexcr: Attempt to enable NPHIE in hashchk selftest (Benjamin Gray)
Now that a process can control its DEXCR to some extent, make the hashchk tests more reliable by explicitly setting the local and onexec NPHIE aspect. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-7-bgray@linux.ibm.com
2024-05-06 selftests/powerpc/dexcr: Add DEXCR prctl interface test (Benjamin Gray)
Some basic tests of the prctl interface of the DEXCR. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Add missing SPDX tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-6-bgray@linux.ibm.com
2024-05-06 powerpc/dexcr: Add DEXCR prctl interface (Benjamin Gray)
Now that we track a DEXCR on a per-task basis, individual tasks are free to configure it as they like. The interface is a pair of getter/setter prctls that work on a single aspect at a time (multiple aspects at once is more difficult if different rules are applied for each aspect, now or in future). The getter shows the current state of the process config, and the setter allows setting/clearing the aspect. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Account for PR_RISCV_SET_ICACHE_FLUSH_CTX, shrink some long lines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-5-bgray@linux.ibm.com
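A userspace usage sketch. The prctl names are the ones this series adds; the numeric fallbacks below are assumptions supplied only so the snippet compiles against older headers.

  #include <stdio.h>
  #include <sys/prctl.h>

  #ifndef PR_PPC_GET_DEXCR
  #define PR_PPC_GET_DEXCR             72   /* assumed values */
  #define PR_PPC_SET_DEXCR             73
  #define PR_PPC_DEXCR_NPHIE           3
  #define PR_PPC_DEXCR_CTRL_SET        0x2
  #define PR_PPC_DEXCR_CTRL_SET_ONEXEC 0x8
  #endif

  int main(void)
  {
          /* Read the current state of one aspect... */
          int state = prctl(PR_PPC_GET_DEXCR, PR_PPC_DEXCR_NPHIE, 0, 0, 0);

          if (state < 0)
                  perror("PR_PPC_GET_DEXCR");
          else
                  printf("NPHIE state: 0x%x\n", state);

          /* ...then set it for this process and across exec. */
          if (prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_NPHIE,
                    PR_PPC_DEXCR_CTRL_SET | PR_PPC_DEXCR_CTRL_SET_ONEXEC,
                    0, 0) < 0)
                  perror("PR_PPC_SET_DEXCR");

          return 0;
  }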
2024-05-06 net: fix out-of-bounds access in ops_init (Thadeu Lima de Souza Cascardo)
net_alloc_generic is called by net_alloc, which is called without any locking. It reads max_gen_ptrs, which is changed under pernet_ops_rwsem. It is read twice, first to allocate an array, then to set s.len, which is later used to limit the bounds of the array access. It is possible that the array is allocated and another thread is registering a new pernet ops, increments max_gen_ptrs, which is then used to set s.len with a larger than allocated length for the variable array. Fix it by reading max_gen_ptrs only once in net_alloc_generic. If max_gen_ptrs is later incremented, it will be caught in net_assign_generic. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Fixes: 073862ba5d24 ("netns: fix net_alloc_generic()") Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240502132006.3430840-1-cascardo@igalia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
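A sketch close to the shape of the fix: snapshot max_gen_ptrs once, so the allocated length and s.len cannot disagree even if a concurrent registration bumps the variable.

  static struct net_generic *net_alloc_generic(void)
  {
          /* Single read: both the allocation size and s.len derive from
           * the same snapshot of max_gen_ptrs. */
          unsigned int gen_ptrs = READ_ONCE(max_gen_ptrs);
          struct net_generic *ng;

          ng = kzalloc(offsetof(struct net_generic, ptr[gen_ptrs]),
                       GFP_KERNEL);
          if (ng)
                  ng->s.len = gen_ptrs;

          return ng;
  }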
2024-05-06 Merge branch 'thermal-core' (Rafael J. Wysocki)
This includes a major rework of thermal governors and part of the thermal core interacting with them as well as some fixes and cleanups of the thermal debug code:

- Redesign the thermal governor interface to allow the governors to work in a more straightforward way.
- Make thermal governors take the current trip point thresholds into account in their computations which allows trip hysteresis to be observed more accurately.
- Clean up thermal governors.
- Make the thermal core manage passive polling for thermal zones and remove passive polling management from thermal governors.
- Improve the handling of cooling device states and thermal mitigation episodes in progress in the thermal debug code.
- Avoid excessive updates of trip point statistics and clean up the printing of thermal mitigation episode information.

* thermal-core: (27 commits)
  thermal: core: Move passive polling management to the core
  thermal: core: Do not call handle_thermal_trip() if zone temperature is invalid
  thermal: trip: Add missing empty code line
  thermal/debugfs: Avoid printing zero duration for mitigation events in progress
  thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()
  thermal/debugfs: Create records for cdev states as they get used
  thermal: core: Introduce thermal_governor_trip_crossed()
  thermal/debugfs: Make tze_seq_show() skip invalid trips and trips with no stats
  thermal/debugfs: Rename thermal_debug_update_temp() to thermal_debug_update_trip_stats()
  thermal/debugfs: Clean up thermal_debug_update_temp()
  thermal/debugfs: Avoid excessive updates of trip point statistics
  thermal: core: Relocate critical and hot trip handling
  thermal: core: Drop the .throttle() governor callback
  thermal: gov_user_space: Use .trip_crossed() instead of .throttle()
  thermal: gov_fair_share: Eliminate unnecessary integer divisions
  thermal: gov_fair_share: Use trip thresholds instead of trip temperatures
  thermal: gov_fair_share: Use .manage() callback instead of .throttle()
  thermal: gov_step_wise: Clean up thermal_zone_trip_update()
  thermal: gov_step_wise: Use trip thresholds instead of trip temperatures
  thermal: gov_step_wise: Use .manage() callback instead of .throttle()
  ...
2024-05-06 Merge back thermal control material for v6.10. (Rafael J. Wysocki)
2024-05-06 alpha: drop pre-EV56 support (Arnd Bergmann)
All EV4 machines are already gone, and the remaining EV5 based machines all support the slightly more modern EV56 generation as well. Debian only supports EV56 and later. Drop both of these and build kernels optimized for EV56 and higher when the "generic" option is selected, tuning for an out-of-order EV6 pipeline, same as Debian userspace. Since this was the only supported architecture without 8-bit and 16-bit stores, common kernel code no longer has to worry about aligning struct members, and existing workarounds from the block and tty layers can be removed. The alpha memory management code no longer needs an abstraction for the differences between EV4 and EV5+. Link: https://lists.debian.org/debian-alpha/2023/05/msg00009.html Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-05-06 Merge branch 'add-tcp-fraglist-gro-support' (Paolo Abeni)
Felix Fietkau says:
====================
Add TCP fraglist GRO support

When forwarding TCP after GRO, software segmentation is very expensive, especially when the checksum needs to be recalculated. One case where that's currently unavoidable is when routing packets over PPPoE. Performance improves significantly when using fraglist GRO implemented in the same way as for UDP. When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established socket in the same netns as the receiving device. While this may not cover all relevant use cases in multi-netns configurations, it should be good enough for most configurations that need this.

Here's a measurement of running 2 TCP streams through a MediaTek MT7622 device (2-core Cortex-A53), which runs NAT with flow offload enabled from one ethernet port to PPPoE on another ethernet port + cake qdisc set to 1Gbps.

  rx-gro-list off: 630 Mbit/s, CPU 35% idle
  rx-gro-list on:  770 Mbit/s, CPU 40% idle

Changes since v4:
- add likely() to prefer the non-fraglist path in check

Changes since v3:
- optimize __tcpv4_gso_segment_csum
- add unlikely()
- reorder dev_net/skb_gro_network_header calls after NETIF_F_GRO_FRAGLIST check
- add support for ipv6 nat
- drop redundant pskb_may_pull check

Changes since v2:
- create tcp_gro_header_pull helper function to pull tcp header only once
- optimize __tcpv4_gso_segment_list_csum, drop obsolete flags check

Changes since v1:
- revert bogus tcp flags overwrite on segmentation
- fix kbuild issue with !CONFIG_IPV6
- only perform socket lookup for the first skb in the GRO train

Changes since RFC:
- split up patches
- handle TCP flags mutations
====================

Link: https://lore.kernel.org/r/20240502084450.44009-1-nbd@nbd.name
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06 net: add heuristic for enabling TCP fraglist GRO (Felix Fietkau)
When forwarding TCP after GRO, software segmentation is very expensive, especially when the checksum needs to be recalculated. One case where that's currently unavoidable is when routing packets over PPPoE. Performance improves significantly when using fraglist GRO implemented in the same way as for UDP. When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established socket in the same netns as the receiving device. While this may not cover all relevant use cases in multi-netns configurations, it should be good enough for most configurations that need this.

Here's a measurement of running 2 TCP streams through a MediaTek MT7622 device (2-core Cortex-A53), which runs NAT with flow offload enabled from one ethernet port to PPPoE on another ethernet port + cake qdisc set to 1Gbps.

  rx-gro-list off: 630 Mbit/s, CPU 35% idle
  rx-gro-list on:  770 Mbit/s, CPU 40% idle

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
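A simplified sketch of the lookup on the IPv4 side, done for the first skb of a GRO train: if no local established socket matches, the flow is being forwarded and is marked for fraglist GRO. Timewait handling and other details of the real code are omitted.

  /* Inside the tcp4 GRO receive path (sketch). */
  if (!NAPI_GRO_CB(skb)->is_flist) {
          struct net *net = dev_net(skb->dev);
          struct sock *sk;

          sk = __inet_lookup_established(net,
                                         net->ipv4.tcp_death_row.hashinfo,
                                         iph->saddr, th->source,
                                         iph->daddr, ntohs(th->dest),
                                         skb->dev->ifindex, inet_sdif(skb));

          /* No local endpoint: forwarded flow, use fraglist GRO. */
          NAPI_GRO_CB(skb)->is_flist = !sk;
          if (sk)
                  sock_put(sk);
  }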
2024-05-06 net: create tcp_gro_header_pull helper function (Felix Fietkau)
Pull the code out of tcp_gro_receive in order to access the tcp header from tcp4/6_gro_receive. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06 net: create tcp_gro_lookup helper function (Felix Fietkau)
This pulls the flow port matching out of tcp_gro_receive, so that it can be reused for the next change, which adds the TCP fraglist GRO heuristic. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06 net: add code for TCP fraglist GRO (Felix Fietkau)
This implements fraglist GRO similar to how it's handled in UDP, however no functional changes are added yet. The next change adds a heuristic for using fraglist GRO instead of regular GRO. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06 net: add support for segmenting TCP fraglist GSO packets (Felix Fietkau)
Preparation for adding TCP fraglist GRO support. It expects packets to be combined in a similar way as UDP fraglist GSO packets. For IPv4 packets, NAT is handled in the same way as UDP fraglist GSO. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06 net: move skb_gro_receive_list from udp to core (Felix Fietkau)
This helper function will be used for TCP fraglist GRO support. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>