summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2024-06-18rcu: Disable interrupts directly in rcu_gp_init()Paul E. McKenney
Interrupts are enabled in rcu_gp_init(), so this commit switches from local_irq_save() and local_irq_restore() to local_irq_disable() and local_irq_enable(). Link: https://lore.kernel.org/all/febb13ab-a4bb-48b4-8e97-7e9f7749e6da@moroto.mountain/ Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2024-06-18rcu/tree: Reduce wake up for synchronize_rcu() common caseJoel Fernandes (Google)
In the synchronize_rcu() common case, we will have less than SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker is pointless just to free the last injected wait head since at that point, all the users have already been awakened. Introduce a new counter to track this and prevent the wakeup in the common case. [ paulmck: Remove atomic_dec_return_release in cannot-happen state. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2024-06-18bpf: Fix remap of arena.Alexei Starovoitov
The bpf arena logic didn't account for mremap operation. Add a refcnt for multiple mmap events to prevent use-after-free in arena_vm_close. Fixes: 317460317a02 ("bpf: Introduce bpf_arena.") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Barret Rhoden <brho@google.com> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Closes: https://lore.kernel.org/bpf/Zmuw29IhgyPNKnIM@xpf.sh.intel.com Link: https://lore.kernel.org/bpf/20240617171812.76634-1-alexei.starovoitov@gmail.com
2024-06-17Merge tag 'lsm-pr-20240617' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm fix from Paul Moore: "A single LSM/IMA patch to fix a problem caused by sleeping while in a RCU critical section" * tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: ima: Avoid blocking in RCU read-side critical section
2024-06-17Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes" * tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: kcov: don't lose track of remote references during softirqs mm: shmem: fix getting incorrect lruvec when replacing a shmem folio mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick mm: fix possible OOB in numa_rebuild_large_mapping() mm/migrate: fix kernel BUG at mm/compaction.c:2761! selftests: mm: make map_fixed_noreplace test names stable mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default gcov: add support for GCC 14 zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING mm: huge_memory: fix misused mapping_large_folio_support() for anon folios lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get() lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n MAINTAINERS: remove Lorenzo as vmalloc reviewer Revert "mm: init_mlocked_on_free_v3" mm/page_table_check: fix crash on ZONE_DEVICE gcc: disable '-Warray-bounds' for gcc-9 ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger() ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
2024-06-17bpf: Add missed var_off setting in coerce_subreg_to_size_sx()Yonghong Song
In coerce_subreg_to_size_sx(), for the case where upper sign extension bits are the same for smax32 and smin32 values, we missed to setup properly. This is especially problematic if both smax32 and smin32's sign extension bits are 1. The following is a simple example illustrating the inconsistent verifier states due to missed var_off: 0: (85) call bpf_get_prandom_u32#7 ; R0_w=scalar() 1: (bf) r3 = r0 ; R0_w=scalar(id=1) R3_w=scalar(id=1) 2: (57) r3 &= 15 ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) 3: (47) r3 |= 128 ; R3_w=scalar(smin=umin=smin32=umin32=128,smax=umax=smax32=umax32=143,var_off=(0x80; 0xf)) 4: (bc) w7 = (s8)w3 REG INVARIANTS VIOLATION (alu): range bounds violation u64=[0xffffff80, 0x8f] s64=[0xffffff80, 0x8f] u32=[0xffffff80, 0x8f] s32=[0x80, 0xffffff8f] var_off=(0x80, 0xf) The var_off=(0x80, 0xf) is not correct, and the correct one should be var_off=(0xffffff80; 0xf) since from insn 3, we know that at insn 4, the sign extension bits will be 1. This patch fixed this issue by setting var_off properly. Fixes: 8100928c8814 ("bpf: Support new sign-extension mov insns") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240615174632.3995278-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-17bpf: Add missed var_off setting in set_sext32_default_val()Yonghong Song
Zac reported a verification failure and Alexei reproduced the issue with a simple reproducer ([1]). The verification failure is due to missed setting for var_off. The following is the reproducer in [1]: 0: R1=ctx() R10=fp0 0: (71) r3 = *(u8 *)(r10 -387) ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R10=fp0 1: (bc) w7 = (s8)w3 ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=127,var_off=(0x0; 0x7f)) 2: (36) if w7 >= 0x2533823b goto pc-3 mark_precise: frame0: last_idx 2 first_idx 0 subseq_idx -1 mark_precise: frame0: regs=r7 stack= before 1: (bc) w7 = (s8)w3 mark_precise: frame0: regs=r3 stack= before 0: (71) r3 = *(u8 *)(r10 -387) 2: R7_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=127,var_off=(0x0; 0x7f)) 3: (b4) w0 = 0 ; R0_w=0 4: (95) exit Note that after insn 1, the var_off for R7 is (0x0; 0x7f). This is not correct since upper 24 bits of w7 could be 0 or 1. So correct var_off should be (0x0; 0xffffffff). Missing var_off setting in set_sext32_default_val() caused later incorrect analysis in zext_32_to_64(dst_reg) and reg_bounds_sync(dst_reg). To fix the issue, set var_off correctly in set_sext32_default_val(). The correct reg state after insn 1 becomes: 1: (bc) w7 = (s8)w3 ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R7_w=scalar(smin=0,smax=umax=0xffffffff,smin32=-128,smax32=127,var_off=(0x0; 0xffffffff)) and at insn 2, the verifier correctly determines either branch is possible. [1] https://lore.kernel.org/bpf/CAADnVQLPU0Shz7dWV4bn2BgtGdxN3uFHPeobGBA72tpg5Xoykw@mail.gmail.com/ Fixes: 8100928c8814 ("bpf: Support new sign-extension mov insns") Reported-by: Zac Ecob <zacecob@protonmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240615174626.3994813-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-17cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeupKirill A. Shutemov
ACPI MADT doesn't allow to offline a CPU after it has been woken up. Currently, CPU hotplug is prevented based on the confidential computing attribute which is set for Intel TDX. But TDX is not the only possible user of the wake up method. Any platform that uses ACPI MADT wakeup method cannot offline CPU. Disable CPU offlining on ACPI MADT wakeup enumeration. This has no visible effects for users: currently, TDX guest is the only platform that uses the ACPI MADT wakeup method. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Tao Liu <ltao@redhat.com> Link: https://lore.kernel.org/r/20240614095904.1345461-5-kirill.shutemov@linux.intel.com
2024-06-17cpu/hotplug: Add support for declaring CPU offlining not supportedKirill A. Shutemov
The ACPI MADT mailbox wakeup method doesn't allow to offline a CPU after it has been woken up. Currently, offlining is prevented based on the confidential computing attribute which is set for Intel TDX. But TDX is not the only possible user of the wake up method. The MADT wakeup can be implemented outside of a confidential computing environment. Offline support is a property of the wakeup method, not the CoCo implementation. Introduce cpu_hotplug_disable_offlining() that can be called to indicate that CPU offlining should be disabled. This function is going to replace CC_ATTR_HOTPLUG_DISABLED for ACPI MADT wakeup method. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tao Liu <ltao@redhat.com> Link: https://lore.kernel.org/r/20240614095904.1345461-4-kirill.shutemov@linux.intel.com
2024-06-17irqdomain: Remove __irq_domain_add()Herve Codina
__irq_domain_add() has been replaced by irq_domain_instanciate() and so, it is no more used. Simply remove it. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-21-herve.codina@bootlin.com
2024-06-17irqdomain: Convert domain creation functions to irq_domain_instantiate()Herve Codina
Domain creation functions use __irq_domain_add(). With the introduction of irq_domain_instantiate(), __irq_domain_add() becomes obsolete. In order to fully remove __irq_domain_add(), convert domain creation function to irq_domain_instantiate() Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-19-herve.codina@bootlin.com
2024-06-17irqdomain: Add a resource managed version of irq_domain_instantiate()Herve Codina
Add a devres version of irq_domain_instantiate(). Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-17-herve.codina@bootlin.com
2024-06-17irqdomain: Add support for generic irq chips creation before publishing a domainHerve Codina
The current API functions create an irq_domain and also publish this newly created to domain. Once an irq_domain is published, consumers can request IRQ in order to use them. Some interrupt controller drivers have to perform some more operations with the created irq_domain in order to have it ready to be used. For instance: - Allocate generic irq chips with irq_alloc_domain_generic_chips() - Retrieve the generic irq chips with irq_get_domain_generic_chip() - Initialize retrieved chips: set register base address and offsets, set several hooks such as irq_mask, irq_unmask, ... With the newly introduced irq_domain_alloc_generic_chips(), an interrupt controller driver can use the irq_domain_chip_generic_info structure and set the init() hook to perform its generic chips initialization. In order to avoid a window where the domain is published but not yet ready to be used, handle the generic chip creation (i.e the irq_domain_alloc_generic_chips() call) before the domain is published. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-16-herve.codina@bootlin.com
2024-06-17genirq/generic_chip: Introduce init() and exit() hooksHerve Codina
Most of generic chip drivers need to perform some more additional initializations on the generic chips allocated before they can be fully ready. These additional initializations need to be performed before the IRQ domain is published to avoid a race condition between IRQ consumers and suppliers. Introduce the init() hook to perform these initializations at the right place just after the generic chip creation. Also introduce the exit() hook to allow reverting operations done by the init() hook just before the generic chip is destroyed. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-15-herve.codina@bootlin.com
2024-06-17genirq/generic_chip: Introduce irq_domain_{alloc,remove}_generic_chips()Herve Codina
The existing __irq_alloc_domain_generic_chips() uses a bunch of parameters to describe the generic chips that need to be allocated. Adding more parameters and wrappers to hide new parameters in the existing code leads to more and more code without any relevant values and without any flexibility. Introduce irq_domain_alloc_generic_chips() where the generic chips description is done using the irq_domain_chip_generic_info structure instead of the bunch of parameters to allow flexibility and easy evolution. Also introduce irq_domain_remove_generic_chips() to revert the operations done by irq_domain_alloc_generic_chips(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-14-herve.codina@bootlin.com
2024-06-17irqdomain: Introduce init() and exit() hooksHerve Codina
The current API does not allow additional initialization before the domain is published. This can lead to a race condition between consumers and supplier as a domain can be available for consumers before being fully ready. Introduce the init() hook to allow additional initialization before plublishing the domain. Also introduce the exit() hook to revert operations done in init() on domain removal. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-13-herve.codina@bootlin.com
2024-06-17irqdomain: Handle domain bus token in irq_domain_create()Herve Codina
irq_domain_update_bus_token() is the only way to set the domain bus token. This is sub-optimal as irq_domain_update_bus_token() can be called only once the domain is created and needs to revert some operations, change the domain name and redo the operations. In order to avoid this revert/change/redo sequence, take the domain bus into account token during the domain creation. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-12-herve.codina@bootlin.com
2024-06-17irqdomain: Make __irq_domain_create() return an error codeHerve Codina
__irq_domain_create() can fail for several reasons. When it fails it returns a NULL pointer and so filters out the exact failure reason. The only user of __irq_domain_create() is irq_domain_instantiate() which can return a PTR_ERR value. On __irq_domain_create() failure, it uses an arbitrary error code. Rather than using this arbitrary error value, make __irq_domain_create() return is own error code and use that one. [ tglx: Remove the pointless ERR_CAST. domain is a valid return pointer ] Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-11-herve.codina@bootlin.com
2024-06-17irqdomain: Use irq_domain_instantiate() for hierarchy domain creationHerve Codina
irq_domain_instantiate() handles all needs to be used in irq_domain_create_hierarchy() Avoid code duplication and use directly irq_domain_instantiate() for hierarchy domain creation. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-10-herve.codina@bootlin.com
2024-06-17irqdomain: Handle domain hierarchy parent in irq_domain_instantiate()Herve Codina
To use irq_domain_instantiate() from irq_domain_create_hierarchy(), irq_domain_instantiate() needs to handle the domain hierarchy parent. Add the required functionality. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-9-herve.codina@bootlin.com
2024-06-17irqdomain: Handle additional domain flags in irq_domain_instantiate()Herve Codina
In order to use irq_domain_instantiate() from several places such as irq_domain_create_hierarchy(), irq_domain_instantiate() needs to handle additional domain flags. Add the required infrastructure. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-8-herve.codina@bootlin.com
2024-06-17irqdomain: Convert __irq_domain_create() to use struct irq_domain_infoHerve Codina
The existing __irq_domain_create() use a bunch of parameters to create an irq domain. With the introduction of irq_domain_info structure, these parameters are available in the information structure itself. Using directly this information structure allows future flexibility to add other parameters in a simple way without the need to change the __irq_domain_create() prototype. Convert __irq_domain_create() to use the information structure. [ tglx: Fixup struct initializer ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-7-herve.codina@bootlin.com
2024-06-17irqdomain: Use a dedicated function to set the domain nameHerve Codina
The interrupt domain name computation and setting is directly done in __irq_domain_create(). This leads to a quite long __irq_domain_create() function. In order to simplify __irq_domain_create() and isolate the domain name computation and setting, move the related operations to a dedicated function. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-6-herve.codina@bootlin.com
2024-06-17irqdomain: Introduce irq_domain_instantiate()Herve Codina
The existing irq_domain_add_*() functions used to instantiate an IRQ domain are wrappers built on top of __irq_domain_add() and describe the domain properties using a bunch of parameters. Adding more parameters and wrappers to hide new parameters in the existing code lead to more and more code without any relevant value and without any flexibility. Introduce irq_domain_instantiate() where the interrupt domain properties are given using a irq_domain_info structure instead of the bunch of parameters to allow flexibility and easy evolution. irq_domain_instantiate() performs the same operation as the one done by __irq_domain_add(). For compatibility reason with existing code, keep __irq_domain_add() but convert it to irq_domain_instantiate(). [ tglx: Fixed up struct initializer coding style ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-3-herve.codina@bootlin.com
2024-06-17irqdomain: Introduce irq_domain_free()Herve Codina
In preparation of the introduction of the irq domain instantiation, introduce irq_domain_free() to avoid code duplication on later modifications. This new function is an extraction of the current operations performed to free the irq domain. No functional change intended. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-2-herve.codina@bootlin.com
2024-06-17irqdomain: Fixed unbalanced fwnode get and putHerve Codina
fwnode_handle_get(fwnode) is called when a domain is created with fwnode passed as a function parameter. fwnode_handle_put(domain->fwnode) is called when the domain is destroyed but during the creation a path exists that does not set domain->fwnode. If this path is taken, the fwnode get will never be put. To avoid the unbalanced get and put, set domain->fwnode unconditionally. Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only") Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
2024-06-17cpu/hotplug: Reverse order of iteration in freeze_secondary_cpus()Stanislav Spassov
Whenever CPU hotplug state callbacks are registered, the startup callback is invoked on CPUs that have already reached the provided state in order of ascending CPU IDs. In freeze_secondary_cpus() the teardown of CPUs happens in the same are invoked in the same order. This is known to make a difference is the current implementation of these callbacks in arch/x86/events/intel/uncore.c: - uncore_event_cpu_online() designates the first CPU it is invoked for on each package as the uncore event collector for that package - uncore_event_cpu_offline() if the CPU being offlined is the event collector for its package, transfers that responsibility over to the next (by ascending CPU id) one in the same package With the current order of CPU teardowns in freeze_secondary_cpus(), the latter ends up doing the ownership transfer work on every single CPU. That work involves a synchronize_rcu() call, ultimately unnecessarily degrading the performance of CPU offlining. To address this make freeze_secondary_cpus() iterate through the CPUs in reverse order, so that the teardown happens in order of descending CPU IDs. [ tglx: Massage change log ] Signed-off-by: Stanislav Spassov <stanspas@amazon.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240524160449.48594-1-stanspas@amazon.de
2024-06-17smp: Use str_plural() to fix Coccinelle warningsThorsten Blum
Fixes the following two Coccinelle/coccicheck warnings reported by string_choices.cocci: opportunity for str_plural(num_cpus) opportunity for str_plural(num_nodes) Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240508154225.309703-2-thorsten.blum@toblux.com
2024-06-17cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked()Yuntao Wang
Commit 4205e4786d0b ("cpu/hotplug: Provide dynamic range for prepare stage") added a dynamic range for the prepare states, but did not handle the assignment of the dynstate variable in __cpuhp_setup_state_cpuslocked(). This causes the corresponding startup callback not to be invoked when calling __cpuhp_setup_state_cpuslocked() with the CPUHP_BP_PREPARE_DYN parameter, even though it should be. Currently, the users of __cpuhp_setup_state_cpuslocked(), for one reason or another, have not triggered this bug. Fixes: 4205e4786d0b ("cpu/hotplug: Provide dynamic range for prepare stage") Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240515134554.427071-1-ytcoode@gmail.com
2024-06-15kcov: don't lose track of remote references during softirqsAleksandr Nogikh
In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV metadata of the current task into a per-CPU variable. However, the kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote KCOV objects. If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens to get interrupted and kcov_remote_start() is called, it ultimately leads to kcov_remote_stop() NOT restoring the original KCOV reference. So when the task exits, all registered remote KCOV handles remain active forever. The most uncomfortable effect (at least for syzkaller) is that the bug prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If we obtain it in the parent process and then e.g. drop some capabilities and continuously fork to execute individual programs, at some point current->kcov of the forked process is lost, kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls calls from subsequent forks fail. And, yes, the efficiency is also affected if we keep on losing remote kcov objects. a) kcov_remote_map keeps on growing forever. b) (If I'm not mistaken), we're also not freeing the memory referenced by kcov->area. Fix it by introducing a special kcov_mode that is assigned to the task that owns a KCOV remote object. It makes kcov_mode_enabled() return true and yet does not trigger coverage collection in __sanitizer_cov_trace_pc() and write_comp_data(). [nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment] Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts") Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15gcov: add support for GCC 14Peter Oberparleiter
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte long .gcda files with no usable data. To fix this, update GCOV_COUNTERS to match the value defined by GCC 14. Tested with GCC versions 14.1.0 and 13.2.0. Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.com Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reported-by: Allison Henderson <allison.henderson@oracle.com> Reported-by: Chuck Lever III <chuck.lever@oracle.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDINGOleg Nesterov
kernel_wait4() doesn't sleep and returns -EINTR if there is no eligible child and signal_pending() is true. That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending() return false and avoid a busy-wait loop. Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/ Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Tested-by: Wei Fu <fuweid89@gmail.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Allen Pais <apais@linux.microsoft.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joel Granados <j.granados@samsung.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mike Christie <michael.christie@oracle.com> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Zqiang <qiang.zhang1211@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-14kunit: test: Add vm_mmap() allocation resource managerKees Cook
For tests that need to allocate using vm_mmap() (e.g. usercopy and execve), provide the interface to have the allocation tracked by KUnit itself. This requires bringing up a placeholder userspace mm. This combines my earlier attempt at this with Mark Rutland's version[1]. Normally alloc_mm() and arch_pick_mmap_layout() aren't exported for modules, so export these only for KUnit testing. Link: https://lore.kernel.org/lkml/20230321122514.1743889-2-mark.rutland@arm.com/ [1] Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-06-14Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-14 We've added 8 non-merge commits during the last 2 day(s) which contain a total of 9 files changed, 92 insertions(+), 11 deletions(-). The main changes are: 1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason() triggered via __bpf_try_make_writable(), from Florian Westphal. 2) Fix removal of kfuncs during linking phase which then throws a kernel build warning via resolve_btfids about unresolved symbols, from Tony Ambardar. 3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol is not available on User Mode Linux, from Maciej Żenczykowski. 4) Fix a register corruption in reg_set_min_max triggering an invariant violation in BPF verifier, from Daniel Borkmann. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Harden __bpf_kfunc tag against linker kfunc removal compiler_types.h: Define __retain for __attribute__((__retain__)) bpf: Avoid splat in pskb_pull_reason bpf: fix UML x86_64 compile failure selftests/bpf: Add test coverage for reg_set_min_max handling bpf: Reduce stack consumption in check_stack_write_fixed_off bpf: Fix reg_set_min_max corruption of fake_reg MAINTAINERS: mailmap: Update Stanislav's email address ==================== Link: https://lore.kernel.org/r/20240614203223.26500-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14bpf: Track delta between "linked" registers.Alexei Starovoitov
Compilers can generate the code r1 = r2 r1 += 0x1 if r2 < 1000 goto ... use knowledge of r2 range in subsequent r1 operations So remember constant delta between r2 and r1 and update r1 after 'if' condition. Unfortunately LLVM still uses this pattern for loops with 'can_loop' construct: for (i = 0; i < 1000 && can_loop; i++) The "undo" pass was introduced in LLVM https://reviews.llvm.org/D121937 to prevent this optimization, but it cannot cover all cases. Instead of fighting middle end optimizer in BPF backend teach the verifier about this pattern. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240613013815.953-3-alexei.starovoitov@gmail.com
2024-06-14function_graph: Add READ_ONCE() when accessing fgraph_array[]Steven Rostedt (Google)
In function_graph_enter() there's a loop that looks at fgraph_array[] elements which are fgraph_ops. It first tests if it is a fgraph_stub op, and if so skips it, as that's just there as a place holder. Then it checks the fgraph_ops filters to see if the ops wants to trace the current function. But if the compiler reloads the fgraph_array[] after the check against fgraph_stub, it could race with the fgraph_array[] being updated with the fgraph_stub. That would cause the stub to be processed. But the stub has a null "func_hash" field which will cause a NULL pointer dereference. Add a READ_ONCE() so that the gops that is compared against the fgraph_stub is also the gops that is processed later. Link: https://lore.kernel.org/all/CA+G9fYsSVJQZH=nM=1cjTc94PgSnMF9y65BnOv6XSoCG_b6wmw@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240613095223.1f07e3a4@rorschach.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: cc60ee813b503 ("function_graph: Use static_call and branch to optimize entry function") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-06-14tracing: Add last boot delta offset for stack tracesSteven Rostedt (Google)
The addresses of a stack trace event are relative to the kallsyms. As that can change between boots, when printing the stack trace from a buffer that was from the last boot, it needs all the addresses to be added to the "text_delta" that gives the delta between the addresses of the functions for the current boot compared to the address of the last boot. Then it can be passed to kallsyms to find the function name, otherwise it just shows a useless list of addresses. Link: https://lkml.kernel.org/r/20240612232027.145807384@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14tracing: Update function tracing output for previous boot bufferSteven Rostedt (Google)
For a persistent ring buffer that is saved across boots, if function tracing was performed in the previous boot, it only saves the address of the functions and uses "%pS" to print their names. But the current boot, those functions may be in different locations. The persistent meta-data saves the text delta between the two boots and can be used to find the address of the saved function of where it is located in the current boot. Link: https://lkml.kernel.org/r/20240612232026.988226055@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14tracing: Handle old buffer mappings for event strings and functionsSteven Rostedt (Google)
Use the saved text_delta and data_delta of a persistent memory mapped ring buffer that was saved from a previous boot, and use the delta in the trace event print output so that strings and functions show up normally. That is, for an event like trace_kmalloc() that prints the callsite via "%pS", if it used the address saved in the ring buffer it will not match the function that was saved in the previous boot if the kernel remaps itself between boots. For RCU events that point to saved static strings where only the address of the string is saved in the ring buffer, it too will be adjusted to point to where the string is on the current boot. Link: https://lkml.kernel.org/r/20240612232026.821020753@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14tracing/ring-buffer: Add last_boot_info file to boot instanceSteven Rostedt (Google)
If an instance is mapped to memory on boot up, create a new file called "last_boot_info" that will hold information that can be used to properly parse the raw data in the ring buffer. It will export the delta of the addresses for text and data from what it was from the last boot. It does not expose actually addresses (unless you knew what the actual address was from the last boot). The output will look like: # cat last_boot_info text delta: -268435456 data delta: -268435456 The text and data are kept separate in case they are ever made different. Link: https://lkml.kernel.org/r/20240612232026.658680738@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14ring-buffer: Save text and data locations in mapped meta dataSteven Rostedt (Google)
When a ring buffer is mapped to a specific address, save the address of a text function and some data. This will be used to determine the delta between the last boot and the current boot for pointers to functions as well as to data. Link: https://lkml.kernel.org/r/20240612232026.496176678@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14tracing: Add option to use memmapped memory for trace boot instanceSteven Rostedt (Google)
Add an option to the trace_instance kernel command line parameter that allows it to use the reserved memory from memmap boot parameter. memmap=12M$0x284500000 trace_instance=boot_mapped@0x284500000:12M The above will reserves 12 megs at the physical address 0x284500000. The second parameter will create a "boot_mapped" instance and use the memory reserved as the memory for the ring buffer. That will create an instance called "boot_mapped": /sys/kernel/tracing/instances/boot_mapped Note, because the ring buffer is using a defined memory ranged, it will act just like a memory mapped ring buffer. It will not have a snapshot buffer, as it can't swap out the buffer. The snapshot files as well as any tracers that uses a snapshot will not be present in the boot_mapped instance. Link: https://lkml.kernel.org/r/20240612232026.329660169@goodmis.org Cc: linux-mm@kvack.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14ring-buffer: Validate boot range memory eventsSteven Rostedt (Google)
Make sure all the events in each of the sub-buffers that were mapped in a memory region are valid. This moves the code that walks the buffers for time-stamp validation out of the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS ifdef block and is used to validate the content. Only the ring buffer event meta data and time stamps are checked and not the data load. This also has a second purpose. The buffer_page structure that points to the data sub-buffers has accounting that keeps track of the number of events that are on the sub-buffer. This updates that counter as well. That counter is used in reading the buffer and knowing if the ring buffer is empty or not. Link: https://lkml.kernel.org/r/20240612232026.172503570@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14ring-buffer: Add test if range of boot buffer is validSteven Rostedt (Google)
Add a test against the ring buffer memory range to see if it has valid data. The ring_buffer_meta structure is given a new field called "first_buffer" which holds the address of the first sub-buffer. This is used to both determine if the other fields are valid as well as finding the offset between the old addresses of the sub-buffer from the previous boot to the new addresses of the current boot. Since the values for nr_subbufs and subbuf_size is to be the same, check if the values in the meta page match the values calculated. Take the range of the first_buffer and the total size of all the buffers and make sure the saved head_buffer and commit_buffer fall in the range. Iterate through all the sub-buffers to make sure that the values in the sub-buffer "commit" field (the field that holds the amount of data on the sub-buffer) is within the end of the sub-buffer. Also check the index array to make sure that all the indexes are within nr_subbufs. Link: https://lkml.kernel.org/r/20240612232026.013843655@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14ring-buffer: Add output of ring buffer meta pageSteven Rostedt (Google)
Add a buffer_meta per-cpu file for the trace instance that is mapped to boot memory. This shows the current meta-data and can be used by user space tools to record off the current mappings to help reconstruct the ring buffer after a reboot. It does not expose any virtual addresses, just indexes into the sub-buffer pages. Link: https://lkml.kernel.org/r/20240612232025.854471446@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14tracing: Implement creating an instance based on a given memory regionSteven Rostedt (Google)
Allow for creating a new instance by passing in an address and size to map the ring buffer for the instance to. This will allow features like a pstore memory mapped region to be used for an tracing instance ring buffer that can be retrieved from one boot to the next. Link: https://lkml.kernel.org/r/20240612232025.692086240@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14ring-buffer: Add ring_buffer_meta dataSteven Rostedt (Google)
Populate the ring_buffer_meta array. It holds the pointer to the head_buffer (next to read), the commit_buffer (next to write) the size of the sub-buffers, number of sub-buffers and an array that keeps track of the order of the sub-buffers. This information will be stored in the persistent memory to help on reboot to reconstruct the ring buffer. Link: https://lkml.kernel.org/r/20240612232025.530733577@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14ring-buffer: Add ring_buffer_alloc_range()Steven Rostedt (Google)
In preparation to allowing the trace ring buffer to be allocated in a range of memory that is persistent across reboots, add ring_buffer_alloc_range(). It takes a contiguous range of memory and will split it up evenly for the per CPU ring buffers. If there's not enough memory to handle all CPUs with the minimum size, it will fail to allocate the ring buffer. Link: https://lkml.kernel.org/r/20240612232025.363998725@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14ring-buffer: Allow mapped field to be set without mappingSteven Rostedt (Google)
In preparation for having the ring buffer mapped to a dedicated location, which will have the same restrictions as user space memory mapped buffers, allow it to use the "mapped" field of the ring_buffer_per_cpu structure without having the user space meta page mapping. When this starts using the mapped field, it will need to handle adding a user space mapping (and removing it) from a ring buffer that is using a dedicated memory range. Link: https://lkml.kernel.org/r/20240612232025.190908567@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Youssef Esmat <youssefesmat@google.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-13bpf: crypto: make state and IV dynptr nullableVadim Fedorenko
Some ciphers do not require state and IV buffer, but with current implementation 0-sized dynptr is always needed. With adjustment to verifier we can provide NULL instead of 0-sized dynptr. Make crypto kfuncs ready for this. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://lore.kernel.org/r/20240613211817.1551967-3-vadfed@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>