path: root/kernel

2019-01-29  bpf: btf: allow typedef func_proto  (Yonghong Song)

    The current implementation does not allow a typedef of a func_proto,
    but such a pattern is actually valid:

      -bash-4.4$ cat t.c
      typedef int (f) (int);
      f *g;
      -bash-4.4$ clang -O2 -g -c -target bpf t.c -Xclang -target-feature -Xclang +dwarfris
      -bash-4.4$ pahole -JV t.o
      File t.o:
      [1] PTR (anon) type_id=2
      [2] TYPEDEF f type_id=3
      [3] FUNC_PROTO (anon) return=4 args=(4 (anon))
      [4] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
      -bash-4.4$

    This patch relaxes the BTF verifier to allow such (typedef
    func_proto) patterns.

    Fixes: 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO")
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2019-01-29  x86/speculation: Remove redundant arch_smt_update() invocation  (Zhenzhong Duan)

    With commit a74cfffb03b7 ("x86/speculation: Rework SMT state change"),
    arch_smt_update() is invoked from each individual CPU hotplug
    function. Therefore the extra arch_smt_update() call in the sysfs SMT
    control is redundant.

    Fixes: a74cfffb03b7 ("x86/speculation: Rework SMT state change")
    Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: <konrad.wilk@oracle.com>
    Cc: <dwmw@amazon.co.uk>
    Cc: <bp@suse.de>
    Cc: <srinivas.eeda@oracle.com>
    Cc: <peterz@infradead.org>
    Cc: <hpa@zytor.com>
    Link: https://lkml.kernel.org/r/e2e064f2-e8ef-42ca-bf4f-76b612964752@default

2019-01-29  Merge branch 'devx-async' into k.o/for-next  (Jason Gunthorpe)

    Yishai Hadas says:

    Enable DEVX asynchronous query commands

    This series enables querying a DEVX object in an asynchronous mode.
    The userspace application won't block when calling the firmware and
    will be able to get the response back once it is ready.

    To enable the above functionality:

    - A DEVX asynchronous command completion FD object was introduced.
    - The applicable file operations were implemented to enable using it
      from a user application.
    - An asynchronous query method was added to the DEVX object; it calls
      the firmware asynchronously and manages the response on the given
      input FD.
    - Hot unplug support was added so the FD works properly upon
      unbind/disassociate.
    - An mlx5 core fence for asynchronous commands was implemented and
      used to prevent races upon unbind/disassociate.

    This branch is based on mlx5-next & v5.0-rc2 due to dependencies,
    from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

    * branch 'devx-async':
      IB/mlx5: Implement DEVX hot unplug for async command FD
      IB/mlx5: Implement the file ops of DEVX async command FD
      IB/mlx5: Introduce async DEVX obj query API
      IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD

    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-01-29  futex: No need to check return value of debugfs_create functions  (Greg Kroah-Hartman)

    When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20190122152151.16139-40-gregkh@linuxfoundation.org

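    A minimal sketch of the pattern this series applies throughout the
    tree; the directory and attribute names are illustrative, not taken
    from the futex patch itself:

      static bool example_enabled;

      /* Before: pointless error handling; a debugfs failure is not
       * supposed to change the caller's behavior. */
      struct dentry *dir = debugfs_create_dir("example", NULL);
      if (!dir)
              return -ENOMEM;
      debugfs_create_bool("enabled", 0644, dir, &example_enabled);

      /* After: just make the calls and carry on either way; debugfs
       * copes internally whether or not they succeed. */
      struct dentry *dir = debugfs_create_dir("example", NULL);
      debugfs_create_bool("enabled", 0644, dir, &example_enabled);
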
2019-01-29  timers: Mark expected switch fall-throughs  (Gustavo A. R. Silva)

    In preparation to enabling -Wimplicit-fallthrough, mark switch cases
    where fall through is indeed expected.

    Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: Stephen Boyd <sboyd@kernel.org>
    Link: https://lkml.kernel.org/r/20190123081413.GA3949@embeddedor

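    A small sketch of the annotation style used at the time (the switch
    and helpers are hypothetical, not from the timers code): the comment
    tells both the reader and -Wimplicit-fallthrough that the missing
    break is deliberate.

      enum tmode { MODE_PERIODIC, MODE_ONESHOT };

      static void setup_period(void) { }
      static void arm_timer(void) { }

      static void program_timer(enum tmode mode)
      {
              switch (mode) {
              case MODE_PERIODIC:
                      setup_period();
                      /* fall through */
              case MODE_ONESHOT:
                      arm_timer();
                      break;
              }
      }
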
2019-01-29  timekeeping/debug: No need to check return value of debugfs_create functions  (Greg Kroah-Hartman)

    When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: Stephen Boyd <sboyd@kernel.org>
    Link: https://lkml.kernel.org/r/20190122152151.16139-43-gregkh@linuxfoundation.org

2019-01-29  genirq/debugfs: No need to check return value of debugfs_create functions  (Greg Kroah-Hartman)

    When calling debugfs functions, there is no need to ever check the
    return value. The function can work or not, but the code logic should
    never do something different based on this.

    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Marc Zyngier <marc.zyngier@arm.com>
    Link: https://lkml.kernel.org/r/20190122152151.16139-50-gregkh@linuxfoundation.org

2019-01-28  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  (David S. Miller)

    Daniel Borkmann says:

    ====================
    pull-request: bpf-next 2019-01-29

    The following pull-request contains BPF updates for your *net-next*
    tree. The main changes are:

    1) Teach the verifier dead code removal; this also allows for
       optimizing / removing conditional branches around dead code and
       shrinking the resulting image. Code-store constrained
       architectures like nfp would have a hard time doing this at the
       JIT level, from Jakub.

    2) Add JMP32 instructions to the BPF ISA in order to allow for
       optimized code generation for 32-bit sub-registers. Evaluation
       shows that this can result in a code reduction of ~5-20% compared
       to 64-bit-only code generation. Also add an implementation for
       most JITs, from Jiong.

    3) Add support for __int128 types in BTF, which is also needed for
       vmlinux's BTF conversion to work, from Yonghong.

    4) Add a new command to bpftool in order to dump a list of
       BPF-related parameters from the system or for a specific network
       device, e.g. in terms of available prog/map types or helper
       functions, from Quentin.

    5) Add an AF_XDP sock_diag interface for querying sockets from user
       space, which provides information about the RX/TX/fill/completion
       rings, umem, memory usage etc., from Björn.

    6) Add skb context access for the skb_shared_info->gso_segs field,
       from Eric.

    7) Add support for testing flow dissector BPF programs by extending
       the existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.

    8) Split BPF kselftest's test_verifier into various subgroups of
       tests in order to better deal with merge conflicts in this area,
       from Jakub.

    9) Add support for queue/stack manipulations in bpftool, from
       Stanislav.

    10) Document BTF, from Yonghong.

    11) Dump supported ELF section names in libbpf on program load
        failure, from Taeung.

    12) Silence a false-positive compiler warning in the verifier's BTF
        handling, from Peter.

    13) Fix a help string in bpftool's feature probing, from Prashant.

    14) Remove duplicate includes in BPF kselftests, from Yue.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>

2019-01-27  Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

    Pull timer fix from Thomas Gleixner:

      "A single regression fix to address the unintended breakage of
       posix cpu timers.

       This is caused by a new sanity check in the common code, which
       fails for posix cpu timers under certain conditions because the
       posix cpu timer code never updates the variable which is checked."

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      posix-cpu-timers: Unbreak timer rearming

2019-01-27  Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

    Pull locking fixes from Thomas Gleixner:

      "A small series of fixes which all address possible missed wakeups:

        - Document and fix the wakeup ordering of wake_q

        - Add the missing barrier in rcuwait_wake_up(), which was
          documented in the comment but missing in the code

        - Fix the possible missed wakeups in the rwsem and futex code"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      locking/rwsem: Fix (possible) missed wakeup
      futex: Fix (possible) missed wakeup
      sched/wake_q: Fix wakeup ordering for wake_q
      sched/wake_q: Document wake_q_add()
      sched/wait: Fix rcuwait_wake_up() ordering

2019-01-27  Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

    Pull irq fixes from Thomas Gleixner:

      "A small set of fixes for the interrupt subsystem:

        - Fix a double increment in the irq descriptor allocator which
          resulted in a sanity check only being done for every second
          affinity mask

        - Add a missing device tree translation in the stm32-exti driver.
          Without that the interrupt association is completely wrong.

        - Initialize the mutex in the GIC-V3 MBI driver

        - Fix the alignment for aliasing devices in the GIC-V3-ITS driver
          so multi-MSI allocations work correctly

        - Ensure that the initial affinity of an interrupt is not empty
          at startup time

        - Drop a bogus include in the madera irq chip driver

        - Fix a kerneldoc regression"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      irqchip/gic-v3-its: Align PCI Multi-MSI allocation on their size
      genirq/irqdesc: Fix double increment in alloc_descs()
      genirq: Fix the kerneldoc comment for struct irq_affinity_desc
      irqchip/madera: Drop GPIO includes
      irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
      irqchip/stm32-exti: Add domain translate function
      genirq: Make sure the initial affinity is not empty

2019-01-27  sched/fair: Fix unnecessary increase of balance interval  (Vincent Guittot)

    In case of active balancing, we increase the balance interval to
    cover pinned-task cases not covered by the all_pinned logic.
    Nevertheless, the active migration triggered by asym packing should
    be treated as the normal unbalanced case, and the interval should be
    reset to its default value; otherwise active migration for
    asym_packing can easily be delayed for hundreds of ms because of this
    pinned-task detection mechanism.

    The same happens for the other conditions tested in
    need_active_balance(), like misfit task and when the capacity of
    src_cpu is reduced compared to dst_cpu (see the comments in
    need_active_balance() for details).

    Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: valentin.schneider@arm.com
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

2019-01-27  sched/fair: Fix rounding bug for asym packing  (Vincent Guittot)

    When check_asym_packing() is triggered, the imbalance is set to:

      busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE

    But busiest_stat.avg_load equals:

      sgs->group_load * SCHED_CAPACITY_SCALE / sgs->group_capacity

    These divisions can introduce rounding that makes the imbalance
    slightly lower than the weighted load of the cfs_rq. That is enough
    to skip the rq in find_busiest_queue() and prevents asym migration
    from happening.

    Directly set the imbalance to busiest's sgs->group_load to remove the
    rounding.

    Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: valentin.schneider@arm.com
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

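    A worked example with made-up numbers (SCHED_CAPACITY_SCALE is 1024;
    the load and capacity values are illustrative only):

      /* Integer division rounds down twice, so the recomputed
       * imbalance undershoots the original group load. */
      unsigned long group_load = 1000, group_capacity = 1023;

      unsigned long avg_load  = group_load * 1024 / group_capacity; /* 1000 (1000.97 rounded down) */
      unsigned long imbalance = avg_load * group_capacity / 1024;   /* 999  (999.02 rounded down)  */

      /* imbalance (999) < group_load (1000), so find_busiest_queue()
       * skips the rq even though it should have been picked. */
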
2019-01-27  sched/fair: Trigger asym_packing during idle load balance  (Vincent Guittot)

    Newly idle load balancing is not always triggered when a CPU becomes
    idle. This prevents the scheduler from getting a chance to migrate
    the task for asym packing.

    Enable active migration during idle load balance too.

    Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: valentin.schneider@arm.com
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

2019-01-27  sched/fair: Robustify CFS-bandwidth timer locking  (Peter Zijlstra)

    Traditionally hrtimer callbacks were run with IRQs disabled, but with
    the introduction of HRTIMER_MODE_SOFT it is possible they run from
    SoftIRQ context, which does _NOT_ have IRQs disabled.

    Allow the CFS bandwidth timers (period_timer and slack_timer) to be
    run from SoftIRQ context; this entails removing the locking's
    assumption that IRQs are already disabled.

    While mainline doesn't strictly need this, -RT forces all timers not
    explicitly marked with MODE_HARD into MODE_SOFT and trips over this.
    And marking these timers as MODE_HARD doesn't make sense as they're
    not required for RT operation and can potentially be quite expensive.

    Reported-by: Tom Putzeys <tom.putzeys@be.atlascopco.com>
    Tested-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lkml.kernel.org/r/20190107125231.GE14122@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

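    A sketch of the locking change implied above (illustrative, not the
    patch's exact hunks): callers stop using the bare lock acquisition
    that assumes IRQs are already off, and save/restore the IRQ state
    themselves.

      /* Before: only safe when IRQs are already disabled. */
      raw_spin_lock(&cfs_b->lock);
      ...
      raw_spin_unlock(&cfs_b->lock);

      /* After: safe from both hard-IRQ and SoftIRQ context. */
      unsigned long flags;

      raw_spin_lock_irqsave(&cfs_b->lock, flags);
      ...
      raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
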
2019-01-27  sched/core: Give DCE a fighting chance  (Peter Zijlstra)

    All that fancy new Energy-Aware scheduling foo is hidden behind a
    static_key, which is awesome if you have the stuff enabled in your
    config.

    However, when you lack all the prerequisites it doesn't make any
    sense to pretend we'll ever actually run this, so provide a little
    more clue to the compiler so it can more aggressively delete the
    code.

       text    data     bss     dec     hex  filename
      50297     976      96   51369    c8a9  defconfig-build/kernel/sched/fair.o
      49227     944      96   50267    c45b  defconfig-build/kernel/sched/fair.o

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

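    A minimal sketch of the idiom (the exact guards and names may differ
    from the patch): when the prerequisites are compiled out, make the
    predicate a compile-time constant false so dead code elimination can
    drop everything it guards.

      DECLARE_STATIC_KEY_FALSE(sched_energy_present);

      #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
      static inline bool sched_energy_enabled(void)
      {
              return static_branch_unlikely(&sched_energy_present);
      }
      #else
      /* Constant false: the compiler can now delete every branch
       * guarded by this predicate, as the size numbers above show. */
      static inline bool sched_energy_enabled(void) { return false; }
      #endif
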
2019-01-27  sched/topology: Introduce a sysctl for Energy Aware Scheduling  (Quentin Perret)

    In its current state, Energy Aware Scheduling (EAS) starts
    automatically on asymmetric platforms having an Energy Model (EM).
    However, there are users who want to have an EM (for thermal
    management for example), but don't want EAS with it.

    In order to let users disable EAS explicitly, introduce a new sysctl
    called 'sched_energy_aware'. It is enabled by default so that EAS can
    start automatically on platforms where it makes sense. Flipping it to
    0 rebuilds the scheduling domains and disables EAS.

    Signed-off-by: Quentin Perret <quentin.perret@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: adharmap@codeaurora.org
    Cc: chris.redpath@arm.com
    Cc: currojerez@riseup.net
    Cc: dietmar.eggemann@arm.com
    Cc: edubezval@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: javi.merino@kernel.org
    Cc: joel@joelfernandes.org
    Cc: juri.lelli@redhat.com
    Cc: morten.rasmussen@arm.com
    Cc: patrick.bellasi@arm.com
    Cc: pkondeti@codeaurora.org
    Cc: rjw@rjwysocki.net
    Cc: skannan@codeaurora.org
    Cc: smuckle@google.com
    Cc: srinivas.pandruvada@linux.intel.com
    Cc: thara.gopinath@linaro.org
    Cc: tkjos@google.com
    Cc: valentin.schneider@arm.com
    Cc: vincent.guittot@linaro.org
    Cc: viresh.kumar@linaro.org
    Link: https://lkml.kernel.org/r/20181203095628.11858-11-quentin.perret@arm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

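    A hedged sketch of the sysctl wiring: the entry layout is the
    standard ctl_table pattern; the handler and data variable names are
    assumptions based on kernel conventions, not confirmed by the commit
    message, and &zero/&one are the usual kernel/sysctl.c bounds statics.

      static struct ctl_table example_sched_table[] = {
              {
                      .procname     = "sched_energy_aware",
                      .data         = &sysctl_sched_energy_aware,
                      .maxlen       = sizeof(unsigned int),
                      .mode         = 0644,
                      .proc_handler = sched_energy_aware_handler,
                      .extra1       = &zero,
                      .extra2       = &one,
              },
              { }
      };

    With that in place, disabling EAS at runtime is just:

      echo 0 > /proc/sys/kernel/sched_energy_aware
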
2019-01-26  bpf: JIT blinds support JMP32  (Jiong Wang)

    This patch adds JIT blinding support for JMP32.

    Like BPF_JMP_REG/IMM, JMP32 versions are needed for building raw bpf
    insns. They are added to both include/linux/filter.h and
    tools/include/linux/filter.h.

    Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2019-01-26  bpf: interpreter support for JMP32  (Jiong Wang)

    This patch implements interpreting the new JMP32 instructions.

    Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

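    A hedged sketch of the JMP32 semantics (not the interpreter's actual
    code): the comparison uses only the low 32 bits of the operands,
    where BPF_JMP compares all 64.

      /* BPF_JMP   | BPF_JEQ: compare the full 64-bit registers. */
      if (regs[insn->dst_reg] == regs[insn->src_reg])
              insn += insn->off;

      /* BPF_JMP32 | BPF_JEQ: compare only the 32-bit sub-registers. */
      if ((u32)regs[insn->dst_reg] == (u32)regs[insn->src_reg])
              insn += insn->off;
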
2019-01-26  bpf: disassembler support JMP32  (Jiong Wang)

    This patch teaches the disassembler about JMP32. There are two places
    to update:

    - Class 0x6 is now used by BPF_JMP32, not "unused".

    - BPF_JMP32 needs to show its comparison operands properly. One
      disassembly format would be to add an extra "(32)" before the
      operands when a sub-register is involved. A better format, for
      both JMP32 and ALU32, is to show the register prefix as "w"
      instead of "r", which is the format used by the LLVM assembler.

    Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2019-01-26  bpf: verifier support JMP32  (Jiong Wang)

    This patch teaches the verifier about the new BPF_JMP32 instruction
    class. The verifier needs to treat it much like the existing BPF_JMP
    class: a BPF_JMP32 insn must go through all the checks that are done
    on BPF_JMP.

    The verifier also performs optimizations based on the extra
    information a conditional jump instruction offers; in particular,
    when the comparison is between a constant and a register, the value
    range of the register can be narrowed based on the comparison
    result. This code is updated accordingly.

    Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2019-01-26  bpf: refactor verifier min/max code for condition jump  (Jiong Wang)

    The current min/max code does both signed and unsigned comparisons
    against the input argument "val", which is "u64", with explicit type
    casting when the comparison is signed. As we will need slightly more
    complex type casting once JMP32 is introduced, it is better to hoist
    the signed type casting. This makes the code cleaner, with negligible
    runtime overhead.

    Also, the code for J*GE/GT/LT/LE and JEQ/JNE is very similar, so this
    patch combines them.

    The main purpose of this refactor is to make sure the min/max code
    remains readable, with minimal code duplication, after JMP32 is
    introduced.

    Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>

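    A hedged illustration of the hoisting (excerpted switch cases, not
    the verifier's exact code): compute the signed view of the immediate
    once, instead of casting inside every signed case.

      /* Before: each signed case casts inline. */
      case BPF_JSGT:
              if (reg->smin_value > (s64)val)
                      ...
      case BPF_JSLT:
              if (reg->smax_value < (s64)val)
                      ...

      /* After: hoist the cast; every signed case uses sval. */
      s64 sval = (s64)val;
      ...
      case BPF_JSGT:
              if (reg->smin_value > sval)
                      ...
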
2019-01-25  rcuperf: Stop abusing IS_ENABLED()  (Paul E. McKenney)

    The ever-evolving IS_ENABLED() macro is intended for CONFIG_* Kconfig
    options, but rcuperf currently uses it for the decidedly non-CONFIG_*
    MODULE macro. In the spirit of not inviting trouble, this commit
    substitutes tried-and-true #ifdef.

    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>

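    A small illustration of the substitution; the guarded statement is
    hypothetical, the point is the test itself.

      /* Questionable: IS_ENABLED() is designed for CONFIG_* symbols,
       * which Kconfig defines to 1 or leaves undefined. */
      if (IS_ENABLED(MODULE))
              shutdown = false;

      /* Tried-and-true: test the bare macro directly. */
      #ifdef MODULE
              shutdown = false;
      #endif
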
2019-01-25  rcutorture: Add grace period after CPU offline  (Paul E. McKenney)

    Beyond a certain point in the CPU-hotplug offline process, timers get
    stranded on the outgoing CPU, and won't fire until that CPU comes
    back online, which might well be never. This commit therefore adds a
    hook in torture_onoff_init() that is invoked from torture_offline(),
    which rcutorture uses to occasionally wait for a grace period. This
    should result in failures for RCU implementations that rely on
    stranded timers eventually firing in the absence of the CPU coming
    back online.

    Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcutorture: Record grace periods in forward-progress histogram  (Paul E. McKenney)

    This commit records grace periods in rcutorture's n_launders_hist[]
    histogram, thus allowing rcu_torture_fwd_cb_hist() to print out the
    elapsed number of grace periods between buckets. This information
    helps to determine whether a lack of forward progress is due to
    stalled grace periods on the one hand or due to sluggish callback
    invocation on the other.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  srcu: Remove srcu_queue_delayed_work_on()  (Sebastian Andrzej Siewior)

    srcu_queue_delayed_work_on() disables preemption (and therefore CPU
    hotplug in RCU's case) and then checks, based on its own accounting,
    whether a CPU is online. If the CPU is online it uses
    queue_delayed_work_on(), otherwise it falls back to
    queue_delayed_work(). The problem here is that queue_work() on -RT
    does not work with disabled preemption.

    queue_work_on() works even on an offlined CPU.
    queue_delayed_work_on() has the problem that it is possible to
    program a timer on an offlined CPU. This timer will fire once the CPU
    is online again, but until then the timer remains programmed and
    nothing will happen.

    Add a local timer which will fire (as requested per delay) on the
    local CPU and then enqueue the work on the specific CPU.

    RCUtorture testing with SRCU-P for 24h showed no problems.

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

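    A hedged sketch of the local-timer-plus-queue_work_on() pattern
    described above (structure and function names are illustrative, not
    the patch's actual code):

      struct example_delayed {
              struct timer_list timer;   /* fires on the local CPU  */
              struct work_struct work;   /* runs on the target CPU  */
              int cpu;
      };

      static void example_timer_fn(struct timer_list *t)
      {
              struct example_delayed *d = from_timer(d, t, timer);

              /* Safe even if d->cpu went offline in the meantime. */
              queue_work_on(d->cpu, system_wq, &d->work);
      }

      static void example_schedule(struct example_delayed *d, unsigned long delay)
      {
              timer_setup(&d->timer, example_timer_fn, 0);
              mod_timer(&d->timer, jiffies + delay);
      }
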
2019-01-25  rcu: Fix obsolete DYNTICK_IRQ_NONIDLE comment  (Paul E. McKenney)

    This commit updates the DYNTICK_IRQ_NONIDLE header comment to remove
    the obsolete commentary about unmatched rcu_irq_{enter,exit}().

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Repair rcu_nmi_exit() docbook header  (Paul E. McKenney)

    This commit removes the "@irq" argument from the rcu_nmi_exit()
    docbook header, given that this function now has no arguments.

    Reported-by: kbuild test robot <lkp@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Remove preemption disabling from expedited CPU selection  (Paul E. McKenney)

    It turns out that it is queue_delayed_work_on() rather than
    queue_work_on() that has difficulties when used concurrently with
    CPU-hotplug removal operations. It is therefore unnecessary to
    protect CPU identification and queue_work_on() with
    preempt_disable().

    This commit therefore removes the preempt_disable() and
    preempt_enable() from sync_rcu_exp_select_cpus(), which has the
    further benefit of reducing the number of changes that must be
    maintained in the -rt patchset.

    Reported-by: Thomas Gleixner <tglx@linutronix.de>
    Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
    Suggested-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Rename rcu_process_callbacks() to rcu_core() for Tree RCU  (Paul E. McKenney)

    Although the name rcu_process_callbacks() still makes sense for Tiny
    RCU, where most of what it does is invoke callbacks, it no longer
    makes much sense for Tree RCU, especially given that the actual
    callback invocation is relegated to rcu_do_batch() or, for no-CBs
    CPUs, to the rcuo kthreads. Especially in the latter case,
    rcu_process_callbacks() has very little to do with actual callbacks.
    A better description of this function is that it performs RCU's core
    processing.

    This commit therefore changes the name of Tree RCU's
    rcu_process_callbacks() function to rcu_core(), which also has the
    virtue of being consistent with the existing invoke_rcu_core()
    function. While in the area, the header comment is reworked.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq()  (Paul E. McKenney)

    The name rcu_check_callbacks() arguably made sense back in the early
    2000s when RCU was quite a bit simpler than it is today, but it has
    become quite misleading, especially with the advent of dyntick-idle
    and NO_HZ_FULL. The rcu_check_callbacks() function is RCU's hook into
    the scheduling-clock interrupt, and is now but one of many ways that
    callbacks get promoted to invocable state.

    This commit therefore changes the name to rcu_sched_clock_irq(),
    which is the same number of characters and clearly indicates this
    function's relation to the rest of the Linux kernel. In addition, for
    the sake of consistency, rcu_flavor_check_callbacks() is also renamed
    to rcu_flavor_sched_clock_irq(). While in the area, the header
    comments for both functions are reworked.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  Merge branches 'consolidate.2019.01.26a' and 'fwd.2019.01.26a' into HEAD  (Paul E. McKenney)

    consolidate.2019.01.26a: RCU flavor consolidation cleanups.
    fwd.2019.01.26a: RCU grace-period forward-progress fixes.

2019-01-25  rcu: Prevent needless ->gp_seq_needed update in __note_gp_changes()  (Zhang, Jun)

    Currently, __note_gp_changes() checks to see if the rcu_node
    structure's ->gp_seq_needed is greater than or equal to that of the
    rcu_data structure, and if so, updates the rcu_data structure's
    ->gp_seq_needed field. This results in a useless store in the case
    where the two fields are equal.

    This commit therefore carries out this store only in the case where
    the rcu_node structure's ->gp_seq_needed is strictly greater than
    that of the rcu_data structure.

    Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    Link: https://lkml.kernel.org/r/88DC34334CA3444C85D647DBFA962C2735AD5F77@SHSMSX104.ccr.corp.intel.com

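    The tightened condition, sketched with RCU's wraparound-safe
    comparison macro (close to, but not necessarily identical to, the
    patch's actual hunk):

      /* Store only when the rcu_node value is strictly newer, avoiding
       * a useless store when the two fields are already equal. */
      if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed))
              rdp->gp_seq_needed = rnp->gp_seq_needed;
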
2019-01-25  rcu: Do RCU GP kthread self-wakeup from softirq and interrupt  (Zhang, Jun)

    The rcu_gp_kthread_wake() function is invoked when it might be
    necessary to wake the RCU grace-period kthread. Because self-wakeups
    are normally a useless waste of CPU cycles, if rcu_gp_kthread_wake()
    is invoked from this kthread, it naturally refuses to do the wakeup.

    Unfortunately, natural though it might be, this heuristic fails when
    rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
    that interrupted the grace-period kthread just after the final check
    of the wait-event condition but just before the schedule() call. In
    this case, a wakeup is required, even though the call to
    rcu_gp_kthread_wake() is within the RCU grace-period kthread's
    context. Failing to provide this wakeup can result in grace periods
    failing to start, which in turn results in out-of-memory conditions.

    This race window is quite narrow, but it actually did happen during
    real testing. It would of course need to be fixed even if it were
    strictly theoretical in nature.

    This patch does not Cc stable because it does not apply cleanly to
    earlier kernel versions.

    Fixes: 48a7639ce80c ("rcu: Make callers awaken grace-period kthread")
    Reported-by: "He, Bo" <bo.he@intel.com>
    Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com>
    Co-developed-by: "He, Bo" <bo.he@intel.com>
    Co-developed-by: "xiao, jin" <jin.xiao@intel.com>
    Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
    Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
    Signed-off-by: "He, Bo" <bo.he@intel.com>
    Signed-off-by: "xiao, jin" <jin.xiao@intel.com>
    Signed-off-by: Bai, Jie A <jie.a.bai@intel.com>
    [ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
      !in_serving_softirq()" to avoid redundant wakeups and to also
      handle the interrupt-handler scenario as well as the
      softirq-handler scenario that actually occurred in testing. ]
    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com

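    A hedged sketch of the corrected heuristic, following the bracketed
    note in the tags (not necessarily the exact upstream code):

      static void rcu_gp_kthread_wake(void)
      {
              /* Suppress the wakeup only for a true self-wakeup, i.e.
               * when running in the GP kthread's own task context.
               * From an interrupt or softirq handler that interrupted
               * the kthread, the wakeup is required. */
              if (current == rcu_state.gp_kthread &&
                  !in_interrupt() && !in_serving_softirq())
                      return;
              swake_up_one(&rcu_state.gp_wq);
      }
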
2019-01-25  rcu: Add sysrq rcu_node-dump capability  (Paul E. McKenney)

    Life is hard if RCU manages to get stuck without triggering RCU CPU
    stall warnings or triggering the rcu_check_gp_start_stall() checks
    for failing to start a grace period. This commit therefore adds a
    boot-time-selectable sysrq key (commandeering "y") that allows
    manually dumping Tree RCU state. The new rcutree.sysrq_rcu kernel
    boot parameter must be set for this sysrq to be available.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Protect rcu_check_gp_kthread_starvation() access to ->gp_flags  (Paul E. McKenney)

    The rcu_check_gp_kthread_starvation() function can be invoked without
    holding locks, so the access to the rcu_state structure's ->gp_flags
    field must be protected with READ_ONCE(). This commit therefore adds
    this protection.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

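    The shape of the fix (illustrative):

      /* Unprotected read racing with updates elsewhere: */
      gpf = rcu_state.gp_flags;

      /* Lockless read made explicit and tear-free: */
      gpf = READ_ONCE(rcu_state.gp_flags);
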
2019-01-25  rcu: Improve diagnostics for failed RCU grace-period start  (Paul E. McKenney)

    If a grace period fails to start (for example, because you commented
    out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
    will invoke rcu_check_gp_start_stall(), which will notice and
    complain. However, this complaint is lacking crucial debugging
    information such as when the last wakeup executed and what the value
    of ->gp_seq was at that time. This commit therefore removes the
    current pr_alert() from rcu_check_gp_start_stall(), instead invoking
    show_rcu_gp_kthreads(), which has been updated to print the needed
    information, which is collected by rcu_gp_kthread_wake().

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Update NOCB comments  (Paul E. McKenney)

    This commit updates a few obsolete comments in the RCU
    callback-offload code.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Remove unused rcu_cpu_kthread_cpu per-CPU variable  (Paul E. McKenney)

    The rcu_cpu_kthread_cpu per-CPU variable used to provide debugfs
    information, but is no longer used. This commit therefore removes it.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Move rcu_cpu_has_work to rcu_data structure  (Paul E. McKenney)

    Given that RCU has a perfectly good per-CPU rcu_data structure, most
    per-CPU quantities should be stored there. This commit therefore
    moves the rcu_cpu_has_work per-CPU variable to the rcu_data
    structure. This also makes this variable unconditionally present,
    which should be acceptable given the memory reduction due to the RCU
    flavor consolidation and also due to the simplifications this will
    enable.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Remove unused rcu_cpu_kthread_loops per-CPU variable  (Paul E. McKenney)

    The rcu_cpu_kthread_loops variable used to provide debugfs
    information, but is no longer used. This commit therefore removes it.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Move rcu_cpu_kthread_status to rcu_data structure  (Paul E. McKenney)

    Given that RCU has a perfectly good per-CPU rcu_data structure, most
    per-CPU quantities should be stored there. This commit therefore
    moves the rcu_cpu_kthread_status per-CPU variable to the rcu_data
    structure. This also makes this variable unconditionally present,
    which should be acceptable given the memory reduction due to the RCU
    flavor consolidation and also due to the simplifications this will
    enable.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Move rcu_cpu_kthread_task to rcu_data structure  (Paul E. McKenney)

    Given that RCU has a perfectly good per-CPU rcu_data structure, most
    per-CPU quantities should be stored there. This commit therefore
    moves the rcu_cpu_kthread_task per-CPU variable to the rcu_data
    structure. This also makes this variable unconditionally present,
    which should be acceptable given the memory reduction due to the RCU
    flavor consolidation and also due to the simplifications this will
    enable.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Accommodate zero jiffies_till_first_fqs and kthread kicking  (Paul E. McKenney)

    It is perfectly fine to set the rcutree.jiffies_till_first_fqs boot
    parameter to zero; in fact, this can be useful on specialty systems
    that usually have at least one idle CPU and that need fast grace
    periods. This is because this setting causes the RCU grace-period
    kthread to scan for idle threads immediately after grace-period
    initialization, as opposed to waiting several jiffies to do so.

    It is also perfectly fine to set the rcutree.rcu_kick_kthreads kernel
    parameter, which gives the RCU grace-period kthread an extra wakeup
    if it doesn't make progress for a period of three times the setting
    of the rcutree.jiffies_till_first_fqs boot parameter. This is of
    course problematic when the value of this parameter is zero, as it
    can result in unnecessary wakeup IPIs along with unnecessary
    WARN_ONCE() invocations.

    This commit therefore defers kthread kicking for at least two
    jiffies, regardless of the setting of rcutree.jiffies_till_first_fqs.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Discard separate per-CPU callback counts  (Paul E. McKenney)

    Back when there were multiple flavors of RCU, it was necessary to
    separately count lazy and non-lazy callbacks for each CPU. These
    counts were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how
    long a newly idle CPU should be allowed to sleep before handling its
    RCU callbacks. But now that there is only one flavor, the callback
    counts for a given CPU's sole rcu_data structure are the counts for
    that CPU.

    This commit therefore removes the rcu_data structure's
    ->nonlazy_posted and ->nonlazy_posted_snap fields, removes the
    rcu_idle_count_callbacks_posted() and rcu_cpu_has_callbacks()
    functions, repurposes the rcu_data structure's ->all_lazy field to
    record the laziness state at the beginning of the latest idle
    sojourn, and modifies the CONFIG_RCU_FAST_NO_HZ RCU CPU stall
    warnings accordingly.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Inline _synchronize_rcu_expedited() into synchronize_rcu_expedited()  (Paul E. McKenney)

    Now that _synchronize_rcu_expedited() has only one caller, and given
    that this is a tail call, this commit inlines
    _synchronize_rcu_expedited() into synchronize_rcu_expedited().

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu()  (Paul E. McKenney)

    Now that rcu_blocking_is_gp() makes the correct immediate-return
    decision for both PREEMPT and !PREEMPT, a single implementation of
    synchronize_rcu() will work correctly under both configurations. This
    commit therefore eliminates a few lines of code by consolidating the
    two implementations of synchronize_rcu().

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu_expedited()  (Paul E. McKenney)

    The CONFIG_PREEMPT=n and CONFIG_PREEMPT=y implementations of
    synchronize_rcu_expedited() are quite similar, and with small
    modifications to rcu_blocking_is_gp() can be made identical. This
    commit therefore makes this change in order to save a few lines of
    code and to reduce the amount of duplicate code.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Determine expedited-GP IPI handler at build time  (Paul E. McKenney)

    Back when there could be multiple RCU flavors running in the same
    kernel at the same time, it was necessary to specify the expedited
    grace-period IPI handler at runtime. Now that there is only one RCU
    flavor, the IPI handler can be determined at build time. There is
    therefore no longer any reason for the RCU-preempt and RCU-sched IPI
    handlers to have different names, nor is there any reason to pass
    these handlers in function arguments and in the data structures
    enclosing workqueues.

    This commit therefore makes all these changes, pushing the
    specification of the expedited grace-period IPI handler down to the
    point of use.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>

2019-01-25  rcu: Inline rcu_kthread_do_work() into its sole remaining caller  (Paul E. McKenney)

    The rcu_kthread_do_work() function has a single-line body and only
    one remaining caller. This commit therefore saves a few lines of code
    by inlining rcu_kthread_do_work() into its sole remaining caller.

    Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>