summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-01-24libbpf: Fix realloc usage in bpf_core_find_candsAndrii Nakryiko
Fix bug requesting invalid size of reallocated array when constructing CO-RE relocation candidate list. This can cause problems if there are many potential candidates and a very fine-grained memory allocator bucket sizes are used. Fixes: ddc7c3042614 ("libbpf: implement BPF CO-RE offset relocation algorithm") Reported-by: William Smith <williampsmith@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200124201847.212528-1-andriin@fb.com
2020-01-24ice: Enable writing hardware filtering tablesTony Nguyen
Enable the driver to write the filtering hardware tables to allow for changing of RSS rules. Upon loading of DDP package, a minimal configuration should be written to hardware. Introduce and initialize structures for storing configuration and make the top level calls to configure the RSS tables to initial values. A packet segment will be created but nothing is written to hardware yet. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-24libbpf: Improve handling of failed CO-RE relocationsAndrii Nakryiko
Previously, if libbpf failed to resolve CO-RE relocation for some instructions, it would either return error immediately, or, if .relaxed_core_relocs option was set, would replace relocatable offset/imm part of an instruction with a bogus value (-1). Neither approach is good, because there are many possible scenarios where relocation is expected to fail (e.g., when some field knowingly can be missing on specific kernel versions). On the other hand, replacing offset with invalid one can hide programmer errors, if this relocation failue wasn't anticipated. This patch deprecates .relaxed_core_relocs option and changes the approach to always replacing instruction, for which relocation failed, with invalid BPF helper call instruction. For cases where this is expected, BPF program should already ensure that that instruction is unreachable, in which case this invalid instruction is going to be silently ignored. But if instruction wasn't guarded, BPF program will be rejected at verification step with verifier log pointing precisely to the place in assembly where the problem is. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200124053837.2434679-1-andriin@fb.com
2020-01-24selftests: bpf: Reset global state between reuseport test runsLorenz Bauer
Currently, there is a lot of false positives if a single reuseport test fails. This is because expected_results and the result map are not cleared. Zero both after individual test runs, which fixes the mentioned false positives. Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-5-lmb@cloudflare.com
2020-01-24selftests: bpf: Make reuseport test output more legibleLorenz Bauer
Include the name of the mismatching result in human readable format when reporting an error. The new output looks like the following: unexpected result result: [1, 0, 0, 0, 0, 0] expected: [0, 0, 0, 0, 0, 0] mismatch on DROP_ERR_INNER_MAP (bpf_prog_linum:153) check_results:FAIL:382 Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-4-lmb@cloudflare.com
2020-01-24selftests: bpf: Ignore FIN packets for reuseport testsLorenz Bauer
The reuseport tests currently suffer from a race condition: FIN packets count towards DROP_ERR_SKB_DATA, since they don't contain a valid struct cmd. Tests will spuriously fail depending on whether check_results is called before or after the FIN is processed. Exit the BPF program early if FIN is set. Fixes: 91134d849a0e ("bpf: Test BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-3-lmb@cloudflare.com
2020-01-24selftests: bpf: Use a temporary file in test_sockmapLorenz Bauer
Use a proper temporary file for sendpage tests. This means that running the tests doesn't clutter the working directory, and allows running the test on read-only filesystems. Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200124112754.19664-2-lmb@cloudflare.com
2020-01-24Merge tag 'omap-for-fixes-whenever-signed' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Few minor fixes for omaps Looks like we have wrong default memory size for beaglebone black, it has at least 512 MB of RAM and not 256 MB. This causes an issue when booted with GRUB2 that does not seem to pass memory info to the kernel. And for am43x-epos-evm the SPI pin directions need to be configured for SPI to work. * tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1 ARM: dts: am335x-boneblack-common: fix memory size Link: https://lore.kernel.org/r/pull-1579895109-287828@atomide.com Signed-off-by: Olof Johansson <olof@lixom.net>
2020-01-24Merge tag 'tee-optee-fix2-for-5.5' of ↵Olof Johansson
https://git.linaro.org:/people/jens.wiklander/linux-tee into arm/fixes Fix OP-TEE compile error with nommu * tag 'tee-optee-fix2-for-5.5' of https://git.linaro.org:/people/jens.wiklander/linux-tee: tee: optee: Fix compilation issue with nommu Link: https://lore.kernel.org/r/20200123101310.GA10320@jax Signed-off-by: Olof Johansson <olof@lixom.net>
2020-01-24net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel pathTariq Toukan
When TCP out-of-order is identified (unexpected tcp seq mismatch), driver analyzes the packet and decides what handling should it get: 1. go to accelerated path (to be encrypted in HW), 2. go to regular xmit path (send w/o encryption), 3. drop. Packets marked with skb->decrypted by the TLS stack in the TX flow skips SW encryption, and rely on the HW offload. Verify that such packets are never sent un-encrypted on the wire. Add a WARN to catch such bugs, and prefer dropping the packet in these cases. Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5e: kTLS, Remove redundant posts in TX resync flowTariq Toukan
The call to tx_post_resync_params() is done earlier in the flow, the post of the control WQEs is unnecessarily repeated. Remove it. Fixes: 700ec4974240 ("net/mlx5e: kTLS, Fix missing SQ edge fill") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5e: kTLS, Fix corner-case checks in TX resync flowTariq Toukan
There are the following cases: 1. Packet ends before start marker: bypass offload. 2. Packet starts before start marker and ends after it: drop, not supported, breaks contract with kernel. 3. packet ends before tls record info starts: drop, this packet was already acknowledged and its record info was released. Add the above as comment in code. Mind possible wraparounds of the TCP seq, replace the simple comparison with a call to the TCP before() method. In addition, remove logic that handles negative sync_len values, as it became impossible. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Fixes: 46a3ea98074e ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5e: Clear VF config when switching modesDmytro Linkin
Currently VF in LEGACY mode are not able to go up. Also in OFFLOADS mode, when switching to it first time, VF can go up independently to his representor, which is not expected. Perform clearing of VF config when switching modes and set link state to AUTO as default value. Also, when switching to OFFLOADS mode set link state to DOWN, which allow VF link state to be controlled by its REP. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Fixes: 556b9d16d3f5 ("net/mlx5: Clear VF's configuration on disabling SRIOV") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: DR, use non preemptible call to get the current cpu numberErez Shitrit
Use raw_smp_processor_id instead of smp_processor_id() otherwise we will get the following trace in debug-kernel: BUG: using smp_processor_id() in preemptible [00000000] code: devlink caller is dr_create_cq.constprop.2+0x31d/0x970 [mlx5_core] Call Trace: dump_stack+0x9a/0xf0 debug_smp_processor_id+0x1f3/0x200 dr_create_cq.constprop.2+0x31d/0x970 genl_family_rcv_msg+0x5fd/0x1170 genl_rcv_msg+0xb8/0x160 netlink_rcv_skb+0x11e/0x340 Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: E-Switch, Prevent ingress rate configuration of uplink repEli Cohen
Since the implementation relies on limiting the VF transmit rate to simulate ingress rate limiting, and since either uplink representor or ecpf are not associated with a VF, we limit the rate limit configuration for those ports. Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: DR, Enable counter on non-fwd-dest objectsErez Shitrit
The current code handles only counters that attached to dest, we still have the cases where we have counter on non-dest, like over drop etc. Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: Update the list of the PCI supported devicesMeir Lichtinger
Add the upcoming ConnectX-7 device ID. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24net/mlx5: Fix lowest FDB pool sizePaul Blakey
The pool sizes represent the pool sizes in the fw. when we request a pool size from fw, it will return the next possible group. We track how many pools the fw has left and start requesting groups from the big to the small. When we start request 4k group, which doesn't exists in fw, fw wants to allocate the next possible size, 64k, but will fail since its exhausted. The correct smallest pool size in fw is 128 and not 4k. Fixes: e52c28024008 ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-24alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=nStephen Boyd
The stubbed version of alarmtimer_get_rtcdev() is not exported. so this won't work if this function is used in a module when CONFIG_RTC_CLASS=n. Move the stub function to the header file and make it inline so that callers don't have to worry about linking against this symbol. rtcdev isn't used outside of this ifdef so it's not required to be redefined to NULL. Drop that while touching this area. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200124055849.154411-4-swboyd@chromium.org
2020-01-24alarmtimer: Use wakeup source from alarmtimer platform deviceStephen Boyd
Use the wakeup source that can be associated with the 'alarmtimer' platform device instead of registering another one by hand. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200124055849.154411-3-swboyd@chromium.org
2020-01-24alarmtimer: Make alarmtimer platform device child of RTC deviceStephen Boyd
The alarmtimer_suspend() function will fail if an RTC device is on a bus such as SPI or i2c and that RTC device registers and probes after alarmtimer_init() registers and probes the 'alarmtimer' platform device. This is because system wide suspend suspends devices in the reverse order of their probe. When alarmtimer_suspend() attempts to program the RTC for a wakeup it will try to program an RTC device on a bus that has already been suspended. Move the alarmtimer device registration to happen when the RTC which is used for wakeup is registered. Register the 'alarmtimer' platform device as a child of the RTC device too, so that it can be guaranteed that the RTC device won't be suspended when alarmtimer_suspend() is called. Reported-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200124055849.154411-2-swboyd@chromium.org
2020-01-24alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect realityStephen Boyd
This function doesn't do anything like this comment says when an RTC device hasn't been chosen. It looks like we used to do something like that before commit 8bc0dafb5cf3 ("alarmtimers: Rework RTC device selection using class interface") but that's long gone now. Remove this sentence to avoid confusing the reader. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200124055849.154411-5-swboyd@chromium.org
2020-01-24net: Fix skb->csum update in inet_proto_csum_replace16().Praveen Chaudhary
skb->csum is updated incorrectly, when manipulation for NF_NAT_MANIP_SRC\DST is done on IPV6 packet. Fix: There is no need to update skb->csum in inet_proto_csum_replace16(), because update in two fields a.) IPv6 src/dst address and b.) L4 header checksum cancels each other for skb->csum calculation. Whereas inet_proto_csum_replace4 function needs to update skb->csum, because update in 3 fields a.) IPv4 src/dst address, b.) IPv4 Header checksum and c.) L4 header checksum results in same diff as L4 Header checksum for skb->csum calculation. [ pablo@netfilter.org: a few comestic documentation edits ] Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Signed-off-by: Andy Stracner <astracner@linkedin.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-24netfilter: nf_tables: autoload modules from the abort pathPablo Neira Ayuso
This patch introduces a list of pending module requests. This new module list is composed of nft_module_request objects that contain the module name and one status field that tells if the module has been already loaded (the 'done' field). In the first pass, from the preparation phase, the netlink command finds that a module is missing on this list. Then, a module request is allocated and added to this list and nft_request_module() returns -EAGAIN. This triggers the abort path with the autoload parameter set on from nfnetlink, request_module() is called and the module request enters the 'done' state. Since the mutex is released when loading modules from the abort phase, the module list is zapped so this is iteration occurs over a local list. Therefore, the request_module() calls happen when object lists are in consistent state (after fulling aborting the transaction) and the commit list is empty. On the second pass, the netlink command will find that it already tried to load the module, so it does not request it again and nft_request_module() returns 0. Then, there is a look up to find the object that the command was missing. If the module was successfully loaded, the command proceeds normally since it finds the missing object in place, otherwise -ENOENT is reported to userspace. This patch also updates nfnetlink to include the reason to enter the abort phase, which is required for this new autoload module rationale. Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module") Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-24netfilter: nf_tables: add __nft_chain_type_get()Pablo Neira Ayuso
This new helper function validates that unknown family and chain type coming from userspace do not trigger an out-of-bound array access. Bail out in case __nft_chain_type_get() returns NULL from nft_chain_parse_hook(). Fixes: 9370761c56b6 ("netfilter: nf_tables: convert built-in tables/chains to chain types") Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-24netfilter: nf_tables_offload: fix check the chain offload flagwenxu
In the nft_indr_block_cb the chain should check the flag with NFT_CHAIN_HW_OFFLOAD. Fixes: 9a32669fecfb ("netfilter: nf_tables_offload: support indr block call") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-24smp: Remove allocation mask from on_each_cpu_cond.*()Sebastian Andrzej Siewior
The allocation mask is no longer used by on_each_cpu_cond() and on_each_cpu_cond_mask() and can be removed. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-4-bigeasy@linutronix.de
2020-01-24smp: Add a smp_cond_func_t argument to smp_call_function_many()Sebastian Andrzej Siewior
on_each_cpu_cond_mask() allocates a new CPU mask. The newly allocated mask is a subset of the provided mask based on the conditional function. This memory allocation can be avoided by extending smp_call_function_many() with the conditional function and performing the remote function call based on the mask and the conditional function. Rename smp_call_function_many() to smp_call_function_many_cond() and add the smp_cond_func_t argument. If smp_cond_func_t is provided then it is used before invoking the function. Provide smp_call_function_many() with cond_func set to NULL. Let on_each_cpu_cond_mask() use smp_call_function_many_cond(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-3-bigeasy@linutronix.de
2020-01-24smp: Use smp_cond_func_t as type for the conditional functionSebastian Andrzej Siewior
Use a typdef for the conditional function instead defining it each time in the function prototype. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-2-bigeasy@linutronix.de
2020-01-24Merge tag 'irqchip-5.6' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - Conversion of the SiFive PLIC to hierarchical domains - New SiFive GPIO irqchip driver - New Aspeed SCI irqchip driver - New NXP INTMUX irqchip driver - Additional support for the Meson A1 GPIO irqchip - First part of the GICv4.1 support - Assorted fixes
2020-01-24Bluetooth: btrtl: Use kvmalloc for FW allocationsMaxim Mikityanskiy
Currently, kmemdup is applied to the firmware data, and it invokes kmalloc under the hood. The firmware size and patch_length are big (more than PAGE_SIZE), and on some low-end systems (like ASUS E202SA) kmalloc may fail to allocate a contiguous chunk under high memory usage and fragmentation: Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000a lmp_ver=06 lmp_subver=8821 Bluetooth: hci0: RTL: rom_version status=0 version=1 Bluetooth: hci0: RTL: loading rtl_bt/rtl8821a_fw.bin kworker/u9:2: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0 <stack trace follows> As firmware load happens on each resume, Bluetooth will stop working after several iterations, when the kernel fails to allocate an order-4 page. This patch replaces kmemdup with kvmalloc+memcpy. It's not required to have a contiguous chunk here, because it's not mapped to the device directly. Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-01-24Merge branches 'doc.2019.12.10a', 'exp.2019.12.09a', 'fixes.2020.01.24a', ↵Paul E. McKenney
'kfree_rcu.2020.01.24a', 'list.2020.01.10a', 'preempt.2020.01.24a' and 'torture.2019.12.09a' into HEAD doc.2019.12.10a: Documentations updates exp.2019.12.09a: Expedited grace-period updates fixes.2020.01.24a: Miscellaneous fixes kfree_rcu.2020.01.24a: Batch kfree_rcu() work list.2020.01.10a: RCU-protected-list updates preempt.2020.01.24a: Preemptible RCU updates torture.2019.12.09a: Torture-test updates
2020-01-24rcu: Remove unused stop-machine #includePaul E. McKenney
Long ago, RCU used the stop-machine mechanism to implement expedited grace periods, but no longer does so. This commit therefore removes the no-longer-needed #includes of linux/stop_machine.h. Link: https://lwn.net/Articles/805317/ Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24powerpc: Remove comment about read_barrier_depends()Will Deacon
'read_barrier_depends()' doesn't exist anymore so stop talking about it. Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24.mailmap: Add entries for old paulmck@kernel.org addressesPaul E. McKenney
[ paulmck: Apply Florian Fainelli feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24srcu: Apply *_ONCE() to ->srcu_last_gp_endPaul E. McKenney
The ->srcu_last_gp_end field is accessed from any CPU at any time by synchronize_srcu(), so non-initialization references need to use READ_ONCE() and WRITE_ONCE(). This commit therefore makes that change. Reported-by: syzbot+08f3e9d26e5541e1ecf2@syzkaller.appspotmail.com Acked-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()Paul E. McKenney
Currently, force_qs_rnp() uses a for_each_leaf_node_possible_cpu() loop containing a check of the current CPU's bit in ->qsmask. This works, but this commit saves three lines by instead using for_each_leaf_node_cpu_mask(), which combines the functionality of for_each_leaf_node_possible_cpu() and leaf_node_cpu_bit(). This commit also replaces the use of the local variable "bit" with rdp->grpmask. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Move rcu_{expedited,normal} definitions into rcupdate.hBen Dooks
This commit moves the rcu_{expedited,normal} definitions from kernel/rcu/update.c to include/linux/rcupdate.h to make sure they are in sync, and also to avoid the following warning from sparse: kernel/ksysfs.c:150:5: warning: symbol 'rcu_expedited' was not declared. Should it be static? kernel/ksysfs.c:167:5: warning: symbol 'rcu_normal' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.hLai Jiangshan
Only tree_stall.h needs to get name from GP state, so this commit moves the gp_state_names[] array and the gp_state_getname() from kernel/rcu/tree.h and kernel/rcu/tree.c, respectively, to kernel/rcu/tree_stall.h. While moving gp_state_names[], this commit uses the GCC syntax to ensure that the right string is associated with the right CPP macro. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Remove the declaration of call_rcu() in tree.hLai Jiangshan
The call_rcu() function is an external RCU API that is declared in include/linux/rcupdate.h. There is thus no point in redeclaring it in kernel/rcu/tree.h, so this commit removes that redundant declaration. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Fix tracepoint tracking RCU CPU kthread utilizationLai Jiangshan
In the call to trace_rcu_utilization() at the start of the loop in rcu_cpu_kthread(), "rcu_wait" is incorrect, plus this trace event needs to be hoisted above the loop to balance with either the "rcu_wait" or "rcu_yield", depending on how the loop exits. This commit therefore makes these changes. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Fix harmless omission of "CONFIG_" from #if conditionLai Jiangshan
The C preprocessor macros SRCU and TINY_RCU should instead be CONFIG_SRCU and CONFIG_TINY_RCU, respectively in the #f in kernel/rcu/rcu.h. But there is no harm when "TINY_RCU" is wrongly used, which are always non-defined, which makes "!defined(TINY_RCU)" always true, which means the code block is always included, and the included code block doesn't cause any compilation error so far in CONFIG_TINY_RCU builds. It is also the reason this change should not be taken in -stable. This commit adds the needed "CONFIG_" prefix to both macros. Not for -stable. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Avoid tick_dep_set_cpu() misorderingPaul E. McKenney
In the current code, rcu_nmi_enter_common() might decide to turn on the tick using tick_dep_set_cpu(), but be delayed just before doing so. Then the grace-period kthread might notice that the CPU in question had in fact gone through a quiescent state, thus turning off the tick using tick_dep_clear_cpu(). The later invocation of tick_dep_set_cpu() would then incorrectly leave the tick on. This commit therefore enlists the aid of the leaf rcu_node structure's ->lock to ensure that decisions to enable or disable the tick are carried out before they can be reversed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Provide wrappers for uses of ->rcu_read_lock_nestingLai Jiangshan
This commit provides wrapper functions for uses of ->rcu_read_lock_nesting to improve readability and to ease future changes to support inlining of __rcu_read_lock() and __rcu_read_unlock(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()Paul E. McKenney
The rcu_node structure's ->expmask field is updated only when holding the ->lock, but is also accessed locklessly. This means that all ->expmask updates must use WRITE_ONCE() and all reads carried out without holding ->lock must use READ_ONCE(). This commit therefore changes the lockless ->expmask read in rcu_read_unlock_special() to use READ_ONCE(). Reported-by: syzbot+99f4ddade3c22ab0cf23@syzkaller.appspotmail.com Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Marco Elver <elver@google.com>
2020-01-24rcu: Clear ->rcu_read_unlock_special only onceLai Jiangshan
In rcu_preempt_deferred_qs_irqrestore(), ->rcu_read_unlock_special is cleared one piece at a time. Given that the "if" statements in this function use the copy in "special", this commit removes the clearing of the individual pieces in favor of clearing ->rcu_read_unlock_special in one go just after it has been determined to be non-zero. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Clear .exp_hint only when deferred quiescent state has been reportedLai Jiangshan
Currently, the .exp_hint flag is cleared in rcu_read_unlock_special(), which works, but which can also prevent subsequent rcu_read_unlock() calls from helping expedite the quiescent state needed by an ongoing expedited RCU grace period. This commit therefore defers clearing of .exp_hint from rcu_read_unlock_special() to rcu_preempt_deferred_qs_irqrestore(), thus ensuring that intervening calls to rcu_read_unlock() have a chance to help end the expedited grace period. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCULai Jiangshan
CONFIG_PREEMPTION and CONFIG_PREEMPT_RCU are always identical, but some code depends on CONFIG_PREEMPTION to access to rcu_preempt functionality. This patch changes CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU in these cases. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Remove kfree_call_rcu_nobatch()Joel Fernandes (Google)
Now that the kfree_rcu() special-casing has been removed from tree RCU, this commit removes kfree_call_rcu_nobatch() since it is no longer needed. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-01-24rcu: Remove kfree_rcu() special casing and lazy-callback handlingJoel Fernandes (Google)
This commit removes kfree_rcu() special-casing and the lazy-callback handling from Tree RCU. It moves some of this special casing to Tiny RCU, the removal of which will be the subject of later commits. This results in a nice negative delta. Suggested-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Add slab.h #include, thanks to kbuild test robot <lkp@intel.com>. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>