summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-07libbpf, doc: Eliminate warnings in libbpf_naming_conventionRandy Dunlap
Use "code-block: none" instead of "c" for non-C-language code blocks. Removes these warnings: lnx-514-rc4/Documentation/bpf/libbpf/libbpf_naming_convention.rst:111: WARNING: Could not lex literal_block as "c". Highlighting skipped. lnx-514-rc4/Documentation/bpf/libbpf/libbpf_naming_convention.rst:124: WARNING: Could not lex literal_block as "c". Highlighting skipped. Fixes: f42cfb469f9b ("bpf: Add documentation for libbpf including API autogen") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210802015037.787-1-rdunlap@infradead.org
2021-08-07libbpf: Do not close un-owned FD 0 on errorsDaniel Xu
Before this patch, btf_new() was liable to close an arbitrary FD 0 if BTF parsing failed. This was because: * btf->fd was initialized to 0 through the calloc() * btf__free() (in the `done` label) closed any FDs >= 0 * btf->fd is left at 0 if parsing fails This issue was discovered on a system using libbpf v0.3 (without BTF_KIND_FLOAT support) but with a kernel that had BTF_KIND_FLOAT types in BTF. Thus, parsing fails. While this patch technically doesn't fix any issues b/c upstream libbpf has BTF_KIND_FLOAT support, it'll help prevent issues in the future if more BTF types are added. It also allow the fix to be backported to older libbpf's. Fixes: 3289959b97ca ("libbpf: Support BTF loading and raw data output in both endianness") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/5969bb991adedb03c6ae93e051fd2a00d293cf25.1627513670.git.dxu@dxuuu.xyz
2021-08-07libbpf: Fix probe for BPF_PROG_TYPE_CGROUP_SOCKOPTRobin Gögge
This patch fixes the probe for BPF_PROG_TYPE_CGROUP_SOCKOPT, so the probe reports accurate results when used by e.g. bpftool. Fixes: 4cdbfb59c44a ("libbpf: support sockopt hooks") Signed-off-by: Robin Gögge <r.goegge@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20210728225825.2357586-1-r.goegge@gmail.com
2021-08-07powerpc/pseries: Fix update of LPAR security flavor after LPMLaurent Dufour
After LPM, when migrating from a system with security mitigation enabled to a system with mitigation disabled, the security flavor exposed in /proc is not correctly set back to 0. Do not assume the value of the security flavor is set to 0 when entering init_cpu_char_feature_flags(), so when called after a LPM, the value is set correctly even if the mitigation are not turned off. Fixes: 6ce56e1ac380 ("powerpc/pseries: export LPAR security flavor in lparcfg") Cc: stable@vger.kernel.org # v5.13+ Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805152308.33988-1-ldufour@linux.ibm.com
2021-08-07powerpc/smp: Fix OOPS in topology_init()Christophe Leroy
Running an SMP kernel on an UP platform not prepared for it, I encountered the following OOPS: BUG: Kernel NULL pointer dereference on read at 0x00000034 Faulting instruction address: 0xc0a04110 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K SMP NR_CPUS=2 CMPCPRO Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-pmac-00001-g230fedfaad21 #5234 NIP: c0a04110 LR: c0a040d8 CTR: c0a04084 REGS: e100dda0 TRAP: 0300 Not tainted (5.13.0-pmac-00001-g230fedfaad21) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 84000284 XER: 00000000 DAR: 00000034 DSISR: 20000000 GPR00: c0006bd4 e100de60 c1033320 00000000 00000000 c0942274 00000000 00000000 GPR08: 00000000 00000000 00000001 00000063 00000007 00000000 c0006f30 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000005 GPR24: c0c67d74 c0c67f1c c0c60000 c0c67d70 c0c0c558 1efdf000 c0c00020 00000000 NIP [c0a04110] topology_init+0x8c/0x138 LR [c0a040d8] topology_init+0x54/0x138 Call Trace: [e100de60] [80808080] 0x80808080 (unreliable) [e100de90] [c0006bd4] do_one_initcall+0x48/0x1bc [e100def0] [c0a0150c] kernel_init_freeable+0x1c8/0x278 [e100df20] [c0006f44] kernel_init+0x14/0x10c [e100df30] [c00190fc] ret_from_kernel_thread+0x14/0x1c Instruction dump: 7c692e70 7d290194 7c035040 7c7f1b78 5529103a 546706fe 5468103a 39400001 7c641b78 40800054 80c690b4 7fb9402e <81060034> 7fbeea14 2c080000 7fa3eb78 ---[ end trace b246ffbc6bbbb6fb ]--- Fix it by checking smp_ops before using it, as already done in several other places in the arch/powerpc/kernel/smp.c Fixes: 39f87561454d ("powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self()") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/75287841cbb8740edd44880fe60be66d489160d9.1628097995.git.christophe.leroy@csgroup.eu
2021-08-07powerpc/32: Fix critical and debug interrupts on BOOKEChristophe Leroy
32 bits BOOKE have special interrupts for debug and other critical events. When handling those interrupts, dedicated registers are saved in the stack frame in addition to the standard registers, leading to a shift of the pt_regs struct. Since commit db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry"), the pt_regs struct is expected to be at the same place all the time. Instead of handling a special struct in addition to pt_regs, just add those special registers to struct pt_regs. Fixes: db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry") Cc: stable@vger.kernel.org Reported-by: Radu Rendec <radu.rendec@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/028d5483b4851b01ea4334d0751e7f260419092b.1625637264.git.christophe.leroy@csgroup.eu
2021-08-07powerpc/32s: Fix napping restore in data storage interrupt (DSI)Christophe Leroy
When a DSI (Data Storage Interrupt) is taken while in NAP mode, r11 doesn't survive the call to power_save_ppc32_restore(). So use r1 instead of r11 as they both contain the virtual stack pointer at that point. Fixes: 4c0104a83fc3 ("powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE") Cc: stable@vger.kernel.org # v5.13+ Reported-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/731694e0885271f6ee9ffc179eb4bcee78313682.1628003562.git.christophe.leroy@csgroup.eu
2021-08-06kyber: make trace_block_rq call consistent with documentationVincent Fu
The kyber ioscheduler calls trace_block_rq_insert() *after* the request is added to the queue but the documentation for trace_block_rq_insert() says that the call should be made *before* the request is added to the queue. Move the tracepoint for the kyber ioscheduler so that it is consistent with the documentation. Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Link: https://lore.kernel.org/r/20210804194913.10497-1-vincent.fu@samsung.com Reviewed by: Adam Manzanares <a.manzanares@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-06power: supply: sbs-battery: add support for time_to_empty_now attributeMatthias Schiffer
As defined by the Smart Battery Data Specification. An _AVG suffix is added to the enum values REG_TIME_TO_EMPTY and REG_TIME_TO_FULL to make the distinction clear. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2021-08-06power: supply: sbs-battery: relax voltage limitMatthias Schiffer
The Smart Battery Data Specification allows for values 0..65535 mV, there is no reason to limit the value to 20000. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2021-08-06power: supply: qcom_smbb: Remove superfluous error messageTang Bin
In the probe function, when get irq failed, the function platform_get_irq_byname() logs an error message, so remove redundant message here. Co-developed-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2021-08-06dt-bindings: power: supply: axp20x-battery: Add AXP209 compatibleMaxime Ripard
The AXP209 compatible was used in Device Trees and the driver, but it was never documented. Cc: Chen-Yu Tsai <wens@csie.org> Cc: linux-pm@vger.kernel.org Cc: Sebastian Reichel <sre@kernel.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2021-08-06dt-bindings: power: supply: axp20x: Add AXP803 compatibleMaxime Ripard
The AXP803 compatible was introduced recently with a fallback to the AXP813, but it was never documented. Cc: Chen-Yu Tsai <wens@csie.org> Cc: linux-pm@vger.kernel.org Cc: Sebastian Reichel <sre@kernel.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2021-08-06power: supply: max17042_battery: Add support for MAX77849 Fuel-GaugeNikita Travkin
MAX77849 is a combined fuel-gauge, charger and MUIC IC. Notably, fuel-gauge has dedicated i2c lines and seems to be fully compatible with max17047. Add new compatible for it reusing max17047 code paths. Signed-off-by: Nikita Travkin <nikita@trvn.ru> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2021-08-06dt-bindings: power: supply: max17042: Document max77849-batteryNikita Travkin
max77849 is a combined fuel-gauge, charger and MUIC device. Add it to the bindings documentation. Signed-off-by: Nikita Travkin <nikita@trvn.ru> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2021-08-06drm/amdgpu: Add preferred mode in modeset when freesync video mode's enabled.Solomon Chiu
[Why] With kernel module parameter "freesync_video" is enabled, if the mode is changed to preferred mode(the mode with highest rate), then Freesync fails because the preferred mode is treated as one of freesync video mode, and then be configurated as freesync video mode(fixed refresh rate). [How] Skip freesync fixed rate configurating when modeset to preferred mode. Signed-off-by: Solomon Chiu <solomon.chiu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-08-06rcu: Print human-readable message for schedule() in RCU readerPaul E. McKenney
The WARN_ON_ONCE() invocation within the CONFIG_PREEMPT=y version of rcu_note_context_switch() triggers when there is a voluntary context switch in an RCU read-side critical section, but there is quite a gap between the output of that WARN_ON_ONCE() and this RCU-usage error. This commit therefore converts the WARN_ON_ONCE() to a WARN_ONCE() that explicitly describes the problem in its message. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCUFrederic Weisbecker
The cond_resched() function reports an RCU quiescent state only in non-preemptible TREE RCU implementation. This commit therefore adds a comment explaining why cond_resched() does nothing in preemptible kernels. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Use per_cpu_ptr to get the pointer of per_cpu variableLiu Song
There are a few remaining locations in kernel/rcu that still use "&per_cpu()". This commit replaces them with "per_cpu_ptr(&)", and does not introduce any functional change. Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Remove useless "ret" update in rcu_gp_fqs_loop()Liu Song
Within rcu_gp_fqs_loop(), the "ret" local variable is set to the return value from swait_event_idle_timeout_exclusive(), but "ret" is unconditionally overwritten later in the code. This commit therefore removes this useless assignment. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Mark accesses in tree_stall.hPaul E. McKenney
This commit marks the accesses in tree_stall.h so as to both avoid undesirable compiler optimizations and to keep KCSAN focused on the accesses of the core algorithm. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stackPaul E. McKenney
The kbuild test project found an oversized stack frame in rcu_gp_kthread() for some kernel configurations. This oversizing was due to a very large amount of inlining, which is unnecessary due to the fact that this code executes infrequently. This commit therefore marks rcu_gp_init() and rcu_gp_fqs_loop noinline_for_stack to conserve stack space. Reported-by: kernel test robot <lkp@intel.com> Tested-by: Rong Chen <rong.a.chen@intel.com> [ paulmck: noinline_for_stack per Nathan Chancellor. ] Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Mark lockless ->qsmask read in rcu_check_boost_fail()Paul E. McKenney
Accesses to ->qsmask are normally protected by ->lock, but there is an exception in the diagnostic code in rcu_check_boost_fail(). This commit therefore applies data_race() to this access to avoid KCSAN complaining about the C-language writes protected by ->lock. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06srcutiny: Mark read-side data racesPaul E. McKenney
This commit marks some interrupt-induced read-side data races in __srcu_read_lock(), __srcu_read_unlock(), and srcu_torture_stats_print(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Start timing stall repetitions after warning completePaul E. McKenney
Systems with low-bandwidth consoles can have very large printk() latencies, and on such systems it makes no sense to have the next RCU CPU stall warning message start output before the prior message completed. This commit therefore sets the time of the next stall only after the prints have completed. While printing, the time of the next stall message is set to ULONG_MAX/2 jiffies into the future. Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()Sergey Senozhatsky
rcu_cpu_stall_reset() is one of the functions virtual CPUs execute during VM resume in order to handle jiffies skew that can trigger false positive stall warnings. Paul has pointed out that this approach is problematic because rcu_cpu_stall_reset() disables RCU grace period stall-detection virtually forever, while in fact it can just restart the stall-detection timeout. Suggested-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu/tree: Handle VM stoppage in stall detectionSergey Senozhatsky
The soft watchdog timer function checks if a virtual machine was suspended and hence what looks like a lockup in fact is a false positive. This is what kvm_check_and_clear_guest_paused() does: it tests guest PVCLOCK_GUEST_STOPPED (which is set by the host) and if it's set then we need to touch all watchdogs and bail out. Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED check works fine. There is, however, one more watchdog that runs from IRQ, so watchdog timer fn races with it, and that watchdog is not aware of PVCLOCK_GUEST_STOPPED - RCU stall detector. apic_timer_interrupt() smp_apic_timer_interrupt() hrtimer_interrupt() __hrtimer_run_queues() tick_sched_timer() tick_sched_handle() update_process_times() rcu_sched_clock_irq() This triggers RCU stalls on our devices during VM resume. If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU before watchdog_timer_fn()->kvm_check_and_clear_guest_paused() then there is nothing on this VCPU that touches watchdogs and RCU reads stale gp stall timestamp and new jiffies value, which makes it think that RCU has stalled. Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and don't report RCU stalls when we resume the VM. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rculist: Unify documentation about missing list_empty_rcu()Julian Wiedmann
We have two separate sections that talk about why list_empty_rcu() is not needed, so this commit consolidates them. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> [ paulmck: The usual wordsmithing. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Mark accesses to ->rcu_read_lock_nestingPaul E. McKenney
KCSAN flags accesses to ->rcu_read_lock_nesting as data races, but in the past, the overhead of marked accesses was excessive. However, that was long ago, and much has changed since then, both in terms of hardware and of compilers. Here is data taken on an eight-core laptop using Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz with a kernel built using gcc version 9.3.0, with all data in nanoseconds. Unmarked accesses (status quo), measured by three refscale runs: Minimum reader duration: 3.286 2.851 3.395 Median reader duration: 3.698 3.531 3.4695 Maximum reader duration: 4.481 5.215 5.157 Marked accesses, also measured by three refscale runs: Minimum reader duration: 3.501 3.677 3.580 Median reader duration: 4.053 3.723 3.895 Maximum reader duration: 7.307 4.999 5.511 This focused microbenhmark shows only sub-nanosecond differences which are unlikely to be visible at the system level. This commit therefore marks data-racing accesses to ->rcu_read_lock_nesting. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Weaken ->dynticks accesses and updatesPaul E. McKenney
Accesses to the rcu_data structure's ->dynticks field have always been fully ordered because it was not possible to prove that weaker ordering was safe. However, with the removal of the rcu_eqs_special_set() function and the advent of the Linux-kernel memory model, it is now easy to show that two of the four original full memory barriers can be weakened to acquire and release operations. The remaining pair must remain full memory barriers. This change makes the memory ordering requirements more evident, and it might well also speed up the to-idle and from-idle fastpaths on some architectures. The following litmus test, adapted from one supplied off-list by Frederic Weisbecker, models the RCU grace-period kthread detecting an idle CPU that is concurrently transitioning to non-idle: C dynticks-from-idle { DYNTICKS=0; (* Initially idle. *) } P0(int *X, int *DYNTICKS) { int dynticks; int x; // Idle. dynticks = READ_ONCE(*DYNTICKS); smp_store_release(DYNTICKS, dynticks + 1); smp_mb(); // Now non-idle x = READ_ONCE(*X); } P1(int *X, int *DYNTICKS) { int dynticks; WRITE_ONCE(*X, 1); smp_mb(); dynticks = smp_load_acquire(DYNTICKS); } exists (1:dynticks=0 /\ 0:x=1) Running "herd7 -conf linux-kernel.cfg dynticks-from-idle.litmus" verifies this transition, namely, showing that if the RCU grace-period kthread (P1) sees another CPU as idle (P0), then any memory access prior to the start of the grace period (P1's write to X) will be seen by any RCU read-side critical section following the to-non-idle transition (P0's read from X). This is a straightforward use of full memory barriers to force ordering in a store-buffering (SB) litmus test. The following litmus test, also adapted from the one supplied off-list by Frederic Weisbecker, models the RCU grace-period kthread detecting a non-idle CPU that is concurrently transitioning to idle: C dynticks-into-idle { DYNTICKS=1; (* Initially non-idle. *) } P0(int *X, int *DYNTICKS) { int dynticks; // Non-idle. WRITE_ONCE(*X, 1); dynticks = READ_ONCE(*DYNTICKS); smp_store_release(DYNTICKS, dynticks + 1); smp_mb(); // Now idle. } P1(int *X, int *DYNTICKS) { int x; int dynticks; smp_mb(); dynticks = smp_load_acquire(DYNTICKS); x = READ_ONCE(*X); } exists (1:dynticks=2 /\ 1:x=0) Running "herd7 -conf linux-kernel.cfg dynticks-into-idle.litmus" verifies this transition, namely, showing that if the RCU grace-period kthread (P1) sees another CPU as newly idle (P0), then any pre-idle memory access (P0's write to X) will be seen by any code following the grace period (P1's read from X). This is a simple release-acquire pair forcing ordering in a message-passing (MP) litmus test. Of course, if the grace-period kthread detects the CPU as non-idle, it will refrain from reporting a quiescent state on behalf of that CPU, so there are no ordering requirements from the grace-period kthread in that case. However, other subsystems call rcu_is_idle_cpu() to check for CPUs being non-idle from an RCU perspective. That case is also verified by the above litmus tests with the proviso that the sense of the low-order bit of the DYNTICKS counter be inverted. Unfortunately, on x86 smp_mb() is as expensive as a cache-local atomic increment. This commit therefore weakens only the read from ->dynticks. However, the updates are abstracted into a rcu_dynticks_inc() function to ease any future changes that might be needed. [ paulmck: Apply Linus Torvalds feedback. ] Link: https://lore.kernel.org/lkml/20210721202127.2129660-4-paulmck@kernel.org/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Remove special bit at the bottom of the ->dynticks counterJoel Fernandes (Google)
Commit b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter") reserved a bit at the bottom of the ->dynticks counter to defer flushing of TLBs, but this facility never has been used. This commit therefore removes this capability along with the rcu_eqs_special_set() function used to trigger it. Link: https://lore.kernel.org/linux-doc/CALCETrWNPOOdTrFabTDd=H7+wc6xJ9rJceg6OL1S0rTV5pfSsA@mail.gmail.com/ Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> [ paulmck: Forward-port to v5.13-rc1. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lockYanfei Xu
If rcu_print_task_stall() is invoked on an rcu_node structure that does not contain any tasks blocking the current grace period, it takes an early exit that fails to release that rcu_node structure's lock. This results in a self-deadlock, which is detected by lockdep. To reproduce this bug: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0" This will also result in other complaints, including RCU's scheduler hook complaining about blocking rather than preemption and an rcutorture writer stall. Only a partial RCU CPU stall warning message will be printed because of the self-deadlock. This commit therefore releases the lock on the rcu_print_task_stall() function's early exit path. Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled") Tested-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06rcu: Fix to include first blocked task in stall warningYanfei Xu
The for loop in rcu_print_task_stall() always omits ts[0], which points to the first task blocking the stalled grace period. This in turn fails to count this first task, which means that ndetected will be equal to zero when all CPUs have passed through their quiescent states and only one task is blocking the stalled grace period. This zero value for ndetected will in turn result in an incorrect "All QSes seen" message: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: Tasks blocked on level-1 rcu_node (CPUs 12-23): (detected by 15, t=6504 jiffies, g=164777, q=9011209) rcu: All QSes seen, last rcu_preempt kthread activity 1 (4295252379-4295252378), jiffies_till_next_fqs=1, root ->qsmask 0x2 BUG: sleeping function called from invalid context at include/linux/uaccess.h:156 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 70613, name: msgstress04 INFO: lockdep is turned off. Preemption disabled at: [<ffff8000104031a4>] create_object.isra.0+0x204/0x4b0 CPU: 15 PID: 70613 Comm: msgstress04 Kdump: loaded Not tainted 5.12.2-yoctodev-standard #1 Hardware name: Marvell OcteonTX CN96XX board (DT) Call trace: dump_backtrace+0x0/0x2cc show_stack+0x24/0x30 dump_stack+0x110/0x188 ___might_sleep+0x214/0x2d0 __might_sleep+0x7c/0xe0 This commit therefore fixes the loop to include ts[0]. Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled") Tested-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-08-06Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A regression fix, bug fix, and a comment cleanup for ext4" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix potential htree corruption when growing large_dir directories ext4: remove conflicting comment from __ext4_forget ext4: fix potential uninitialized access to retval in kmmpd
2021-08-06mtd: rawnand: Fix probe failure due to of_get_nand_secure_regions()Manivannan Sadhasivam
Due to 14f97f0b8e2b, the rawnand platforms without "secure-regions" property defined in DT fails to probe. The issue is, of_get_nand_secure_regions() errors out if of_property_count_elems_of_size() returns a negative error code. If the "secure-regions" property is not present in DT, then also we'll get -EINVAL from of_property_count_elems_of_size() but it should not be treated as an error for platforms not declaring "secure-regions" in DT. So fix this behaviour by checking for the existence of that property in DT and return 0 if it is not present. Fixes: 14f97f0b8e2b ("mtd: rawnand: Add a check in of_get_nand_secure_regions()") Reported-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Martin Kaiser <martin@kaiser.cx> Tested-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210727062813.32619-1-manivannan.sadhasivam@linaro.org
2021-08-06mtd: fix lock hierarchy in deregister_mtd_blktransDesmond Cheong Zhi Xi
There is a lock hierarchy of major_names_lock --> mtd_table_mutex. One existing chain is as follows: 1. major_names_lock --> loop_ctl_mutex (when blk_request_module calls loop_probe) 2. loop_ctl_mutex --> bdev->bd_mutex (when loop_control_ioctl calls loop_remove, which then calls del_gendisk) 3. bdev->bd_mutex --> mtd_table_mutex (when blkdev_get_by_dev calls __blkdev_get, which then calls blktrans_open) Since unregister_blkdev grabs the major_names_lock, we need to call it outside the critical section for mtd_table_mutex, otherwise we invert the lock hierarchy. Reported-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210717100719.728829-1-desmondcheongzx@gmail.com
2021-08-06mtd: devices: mchp48l640: Fix memory leak on cmdColin Ian King
The allocation for cmd is not being kfree'd on the return leading to a memory leak. Fix this by kfree'ing it. Addresses-Coverity: ("Resource leak") Fixes: 88d125026753 ("mtd: devices: add support for microchip 48l640 EERAM") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Heiko Schocher <hs@denx.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210712145214.101377-1-colin.king@canonical.com
2021-08-06Merge tag 'trace-v5.14-rc4-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Fix tracepoint race between static_call and callback data As callbacks to a tracepoint are paired with the data that is passed in when the callback is registered to the tracepoint, it must have that data passed to the callback when the tracepoint is triggered, else bad things will happen. To keep the two together, they are both assigned to a tracepoint structure and added to an array. The tracepoint call site will dereference the structure (via RCU) and call the callback in that structure along with the data in that structure. This keeps the callback and data tightly coupled. Because of the overhead that retpolines have on tracepoint callbacks, if there's only one callback attached to a tracepoint (a common case), then it is called via a static call (code modified to do a direct call instead of an indirect call). But to implement this, the data had to be decoupled from the callback, as now the callback is implemented via a direct call from the static call and not an indirect call from the dereferenced structure. Note, the static call only calls a callback used when there's a single callback attached to the tracepoint. If more than one callback is attached to the same tracepoint, then the static call will call an iterator function that goes back to dereferencing the structure keeping the callback and its data tightly coupled again. Issues can arise when going from 0 callbacks to one, as the static call is assigned to the callback, and it must take care that the data passed to it is loaded before the static call calls the callback. Going from 1 to 2 callbacks is not an issue, as long as the static call is updated to the iterator before the tracepoint structure array is updated via RCU. Going from 2 to more or back down to 2 is not an issue as the iterator can handle all theses cases. But going from 2 to 1, care must be taken as the static call is now calling a callback and the data that is loaded must be the data for that callback. Care was taken to ensure the callback and data would be in-sync, but after a bug was reported, it became clear that not enough was done to make sure that was the case. These changes address this. The first change is to compare the old and new data instead of the old and new callback, as it's the data that can corrupt the callback, even if the callback is the same (something getting freed). The next change is to convert these transitions into states, to make it easier to know when a synchronization is needed, and to perform those synchronizations. The problem with this patch is that it slows down disabling all events from under a second, to making it take over 10 seconds to do the same work. But that is addressed in the final patch. The final patch uses the RCU state functions to keep track of the RCU state between the transitions, and only needs to perform the synchronization if an RCU synchronization hasn't been done already. This brings the performance of disabling all events back to its original value. That's because no synchronization is required between disabling tracepoints but is required when enabling a tracepoint after its been disabled. If an RCU synchronization happens after the tracepoint is disabled, and before it is re-enabled, there's no need to do the synchronization again. Both the second and third patch have subtle complexities that they are separated into two patches. But because the second patch causes such a regression in performance, the third patch adds a "Fixes" tag to the second patch, such that the two must be backported together and not just the second patch" * tag 'trace-v5.14-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracepoint: Use rcu get state and cond sync for static call updates tracepoint: Fix static call function vs data state mismatch tracepoint: static call: Compare data on transition from 2->1 callees
2021-08-06Merge tag 'pm-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Fix a recent regression in the timer events oriented (TEO) cpuidle governor causing it to misbehave when idle state 0 is disabled and rename two local variables for improved clarity on top of that" * tag 'pm-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: teo: Rename two local variables in teo_select() cpuidle: teo: Fix alternative idle state lookup
2021-08-06Merge tag 'acpi-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Revert a recent ACPICA commit causing boot issues to appear on some systems" * tag 'acpi-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPICA: Fix memory leak caused by _CID repair function"
2021-08-06Merge tag 'soc-fixes-5.14-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "Lots of small fixes for Arm SoCs this time, nothing too worrying: - omap/beaglebone boot regression fix in gpt12 timer - revert for i.mx8 soc driver breaking as a platform_driver - kexec/kdump fixes for op-tee - various fixes for incorrect DT settings on imx, mvebu, omap, stm32, and tegra causing problems. - device tree fixes for static checks in nomadik, versatile, stm32 - code fixes for issues found in build testing and with static checking on tegra, ixp4xx, imx, omap" * tag 'soc-fixes-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (36 commits) soc: ixp4xx/qmgr: fix invalid __iomem access soc: ixp4xx: fix printing resources ARM: ixp4xx: goramo_mlr depends on old PCI driver ARM: ixp4xx: fix compile-testing soc drivers soc/tegra: Make regulator couplers depend on CONFIG_REGULATOR ARM: dts: nomadik: Fix up interrupt controller node names ARM: dts: stm32: Fix touchscreen IRQ line assignment on DHCOM ARM: dts: stm32: Disable LAN8710 EDPD on DHCOM ARM: dts: stm32: Prefer HW RTC on DHCOM SoM omap5-board-common: remove not physically existing vdds_1v8_main fixed-regulator ARM: dts: am437x-l4: fix typo in can@0 node ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218 bus: ti-sysc: AM3: RNG is GP only ARM: omap2+: hwmod: fix potential NULL pointer access arm64: dts: armada-3720-turris-mox: remove mrvl,i2c-fast-mode arm64: dts: armada-3720-turris-mox: fixed indices for the SDHC controllers ARM: dts: imx: Swap M53Menlo pinctrl_power_button/pinctrl_power_out pins ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init ARM: dts: colibri-imx6ull: limit SDIO clock to 25MHz arm64: dts: ls1028: sl28: fix networking for variant 2 ...
2021-08-06Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "It's all pretty minor but the main fix is sorting out how we deal with return values from 32-bit system calls as audit expects error codes to be sign-extended to 64 bits Summary: - Fix extension/truncation of return values from 32-bit system calls - Fix interaction between unwinding and tracing - Fix spurious toolchain warning emitted during make - Fix Kconfig help text for RANDOMIZE_MODULE_REGION_FULL" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: stacktrace: avoid tracing arch_stack_walk() arm64: stacktrace: fix comment arm64: fix the doc of RANDOMIZE_MODULE_REGION_FULL arm64: move warning about toolchains to archprepare arm64: fix compat syscall return truncation
2021-08-06Merge tag 'mips-fixes_5.14_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Fix PMD accounting change" * tag 'mips-fixes_5.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: check return value of pgtable_pmd_page_ctor
2021-08-06Merge tag 'spi-fix-v5.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small collection of fixes for SPI, small mostly driver specific things plus a fix for module autoloading which hadn't been working properly for DT systems" * tag 'spi-fix-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: cadence-quadspi: Fix check condition for DTR ops spi: mediatek: Fix fifo transfer spi: imx: mx51-ecspi: Fix CONFIGREG delay comment spi: imx: mx51-ecspi: Fix low-speed CONFIGREG delay calculation spi: update modalias_show after of_device_uevent_modalias support spi: meson-spicc: fix memory leak in meson_spicc_remove spi: spi-mux: Add module info needed for autoloading
2021-08-06Merge tag 'dmaengine-fix-5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "A bunch of driver fixes, notably: - idxd driver fixes for submission race, driver remove sequence, setup sequence for MSIXPERM, array index and updating descriptor vector - usb-dmac, pm reference leak fix - xilinx_dma, read-after-free fix - uniphier-xdmac fix for using atomic readl_poll_timeout_atomic() - of-dma, router_xlate to return - imx-dma, generic dma fix" * tag 'dmaengine-fix-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: imx-dma: configure the generic DMA type to make it work dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is not yet available dmaengine: stm32-dmamux: Fix PM usage counter unbalance in stm32 dmamux ops dmaengine: stm32-dma: Fix PM usage counter imbalance in stm32 dma ops dmaengine: uniphier-xdmac: Use readl_poll_timeout_atomic() in atomic state dmaengine: idxd: fix submission race window dmaengine: idxd: fix sequence for pci driver remove() and shutdown() dmaengine: idxd: fix desc->vector that isn't being updated dmaengine: idxd: fix setup sequence for MSIXPERM table dmaengine: idxd: fix array index when int_handles are being used dmaengine: usb-dmac: Fix PM reference leak in usb_dmac_probe() dmaengine: xilinx_dma: Fix read-after-free bug when terminating transfers
2021-08-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Several small recent regressions - rather more than usual, but nothing too scary. Good to know people are testing. - Typo causing incorrect operation of the mlx5 mkey cache expiration - Revert a CM patch that is breaking some ULPs - Typo breaking SRQ in rxe - Revert a rxe patch breaking icrc calculation - Static checker warning about unbalanced locking in hns - Subtle cxgb4 regression from a recent atomic to refcount conversion" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/iw_cxgb4: Fix refcount underflow while destroying cqs. RDMA/hns: Fix the double unlock problem of poll_sem RDMA/rxe: Restore setting tot_len in the IPv4 header RDMA/rxe: Use the correct size of wqe when processing SRQ RDMA/cma: Revert INIT-INIT patch RDMA/mlx5: Delay emptying a cache entry when a new MR is added to it recently
2021-08-06Merge tag 'sound-5.14-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes: - A few regression fixes (PCM core fixes, USB-audio fixes) - Follow up fixes for the USB-audio mixer changes in this cycle - A long-standing ALSA sequencer race bug fix - Usual device-specific quirks for HD- and USB-audio" * tag 'sound-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: seq: Fix racy deletion of subscriber ALSA: memalloc: Fix regression with SNDRV_DMA_TYPE_CONTINUOUS ALSA: pcm - fix mmap capability check for the snd-dummy driver ALSA: usb-audio: Avoid unnecessary or invalid connector selection at resume ALSA: hda/realtek: add mic quirk for Acer SF314-42 ALSA: usb-audio: Add registration quirk for JBL Quantum 600 ALSA: hda/realtek: Fix headset mic for Acer SWIFT SF314-56 (ALC256) ALSA: usb-audio: Fix superfluous autosuspend recovery ALSA: usb-audio: fix incorrect clock source setting ALSA: scarlett2: Fix line out/speaker switching notifications ALSA: scarlett2: Correct channel mute status after mute button pressed ALSA: scarlett2: Fix Direct Monitor control name for 2i2 ALSA: scarlett2: Fix Mute/Dim/MSD Mode control names
2021-08-06Merge tag 'drm-fixes-2021-08-06' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular weekly fixes pull, live from a Brisbane lockdown with kids at home. A big bunch of scattered amdgpu fixes, but they are all pretty small, minor i915 fixes, kmb, and one vmwgfx regression fixes, all pretty quiet for this time. amdgpu: - Fix potential out-of-bounds read when updating GPUVM mapping - Renoir powergating fix - Yellow Carp updates - 8K fix for navi1x - Beige Goby updates and new DIDs - Fix DMUB firmware version output - EDP fix - pmops config fix i915: - Call i915_globals_exit if pci_register_device fails - (follow on fix for section mismatch) - Correct SFC_DONE register offset kmb: - DMA fix - driver date/version macros vmwgfx: - Fix I/O memory access on 64-bit systems" * tag 'drm-fixes-2021-08-06' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: add DID for beige goby drm/amdgpu/display: fix DMUB firmware version info drm/amd/display: workaround for hard hang on HPD on native DP drm/amd/display: Fix resetting DCN3.1 HW when resuming from S4 drm/amd/display: Increase stutter watermark for dcn303 drm/amd/display: Fix Dynamic bpp issue with 8K30 with Navi 1X drm/amd/display: Assume LTTPR interop for DCN31+ drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled drm/amd/pm: update yellow carp pmfw interface version drm/i915: fix i915_globals_exit() section mismatch error drm/i915: Call i915_globals_exit() if pci_register_device() fails drm/i915: Correct SFC_DONE register offset drm/vmwgfx: Fix a 64bit regression on svga3 drm/amdgpu: fix the doorbell missing when in CGPG issue for renoir. drm/amdgpu: Fix out-of-bounds read when update mapping drm/kmb: Define driver date and major/minor version drm/kmb: Enable LCD DMA for low TVDDCV
2021-08-06ext4: fix potential htree corruption when growing large_dir directoriesTheodore Ts'o
Commit b5776e7524af ("ext4: fix potential htree index checksum corruption) removed a required restart when multiple levels of index nodes need to be split. Fix this to avoid directory htree corruptions when using the large_dir feature. Cc: stable@kernel.org # v5.11 Cc: Благодаренко Артём <artem.blagodarenko@gmail.com> Fixes: b5776e7524af ("ext4: fix potential htree index checksum corruption) Reported-by: Denis <denis@voxelsoft.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Restrict range element expansion in ipset to avoid soft lockup, from Jozsef Kadlecsik. 2) Memleak in error path for nf_conntrack_bridge for IPv4 packets, from Yajun Deng. 3) Simplify conntrack garbage collection strategy to avoid frequent wake-ups, from Florian Westphal. 4) Fix NFNLA_HOOK_FUNCTION_NAME string, do not include module name. 5) Missing chain family netlink attribute in chain description in nfnetlink_hook. 6) Incorrect sequence number on nfnetlink_hook dumps. 7) Use netlink request family in reply message for consistency. 8) Remove offload_pickup sysctl, use conntrack for established state instead, from Florian Westphal. 9) Translate NFPROTO_INET/ingress to NFPROTO_NETDEV/ingress, since NFPROTO_INET is not exposed through nfnetlink_hook. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nfnetlink_hook: translate inet ingress to netdev netfilter: conntrack: remove offload_pickup sysctl again netfilter: nfnetlink_hook: Use same family as request message netfilter: nfnetlink_hook: use the sequence number of the request message netfilter: nfnetlink_hook: missing chain family netfilter: nfnetlink_hook: strip off module name from hookfn netfilter: conntrack: collect all entries in one cycle netfilter: nf_conntrack_bridge: Fix memory leak when error netfilter: ipset: Limit the maximal range of consecutive elements to add/delete ==================== Link: https://lore.kernel.org/r/20210806151149.6356-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>