summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-15mlxsw: reg: Add a function to fill IPv6 unicast FDB entriesAmit Cohen
Add a function to fill IPv6 unicast FDB entries. Use the common function for common fields. Unlike IPv4 entries, the underlay IP address is not filled in the register payload, but instead a pointer to KVDL is used. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: Split handling of FDB tunnel entries between address familiesAmit Cohen
Currently, the function which adds/removes unicast tunnel FDB entries is shared between IPv4 and IPv6, while for IPv6 it warns because there is no support for it. The code for IPv6 will be more complicated because it needs to allocate/release a KVDL pointer for the underlay IPv6 address. As a preparation for IPv6 underlay support, split the code according to address family. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: spectrum_nve_vxlan: Make VxLAN flags check per address familyAmit Cohen
As part of 'can_offload' checks, there is a check of VxLAN flags. The supported flags for IPv6 VxLAN will be different from the existing flags because of some limitations. As preparation for IPv6 underlay support, make this check per address family. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: spectrum_ipip: Use common hash table for IPv6 address mappingAmit Cohen
Use the common hash table introduced by the previous patch instead of the IP-in-IP specific implementation. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15mlxsw: spectrum: Add hash table for IPv6 address mappingAmit Cohen
The device supports forwarding entries such as routes and FDBs that perform tunnel (e.g., VXLAN, IP-in-IP) encapsulation or decapsulation. When the underlay is IPv6, these entries do not encode the 128 bit IPv6 address used for encapsulation / decapsulation. Instead, these entries encode a 24 bit pointer to an array called KVDL where the IPv6 address is stored. Currently, only IP-in-IP with IPv6 underlay is supported, but subsequent patches will add support for VxLAN with IPv6 underlay. To avoid duplicating the logic required to store and retrieve these IPv6 addresses, introduce a hash table that will store the mapping between IPv6 addresses and their KVDL index. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15Merge tag 'mlx5-updates-2021-12-14' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saed Mahameed says: ==================== mlx5-updates-2021-12-14 Parsing Infrastructure for TC actions: The series introduce a TC action infrastructure to help parsing TC actions in a generic way for both FDB and NIC rules. To help maintain the parsing code of TC actions, we the parsing code to action parser per action TC type in separate files, instead of having one big switch case loop, duplicated between FDB and NIC parsers as before this patchset. Each TC flow_action->id is represented by a dedicated mlx5e_tc_act handler which has callbacks to check if the specific action is offload supported and to parse the specific action. We move each case (TC action) handling into the specific handler, which is responsible for parsing and determining if the action is supported. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15Merge tag 'wireless-drivers-2021-12-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.16 Second set of fixes for v5.16, hopefully also the last one. I changed my email in MAINTAINERS, one crash fix in iwlwifi and some build problems fixed. iwlwifi * fix crash caused by a warning * fix LED linking problem brcmsmac * rework LED dependencies for being consistent with other drivers mt76 * mt7921: fix build regression ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15rsi: fix array out of boundzhangyue
Limit the max of 'ii'. If 'ii' greater than or equal to 'RSI_MAX_VIFS', the array 'adapter->vifs' may be out of bound Signed-off-by: zhangyue <zhangyue1@kylinos.cn> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211208095341.47777-1-zhangyue1@kylinos.cn
2021-12-15x86/boot: Move EFI range reservation after cmdline parsingMike Rapoport
The memory reservation in arch/x86/platform/efi/efi.c depends on at least two command line parameters. Put it back later in the boot process and move efi_memblock_x86_reserve_range() out of early_memory_reserve(). An attempt to fix this was done in 8d48bf8206f7 ("x86/boot: Pull up cmdline preparation and early param parsing") but that caused other troubles so it got reverted. The bug this is addressing is: Dan reports that Anjaneya Chagam can no longer use the efi=nosoftreserve kernel command line parameter to suppress "soft reservation" behavior. This is due to the fact that the following call-chain happens at boot: early_reserve_memory |-> efi_memblock_x86_reserve_range |-> efi_fake_memmap_early which does if (!efi_soft_reserve_enabled()) return; and that would have set EFI_MEM_NO_SOFT_RESERVE after having parsed "nosoftreserve". However, parse_early_param() gets called *after* it, leading to the boot cmdline not being taken into account. See also https://lore.kernel.org/r/e8dd8993c38702ee6dd73b3c11f158617e665607.camel@intel.com [ bp: Turn into a proper patch. ] Signed-off-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211213112757.2612-4-bp@alien8.de
2021-12-15tomoyo: use hwight16() in tomoyo_domain_quota_is_ok()Tetsuo Handa
hwight16() is much faster. While we are at it, no need to include "perm =" part into data_race() macro, for perm is a local variable that cannot be accessed by other threads. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2021-12-15tomoyo: Check exceeded quota early in tomoyo_domain_quota_is_ok().Dmitry Vyukov
If tomoyo is used in a testing/fuzzing environment in learning mode, for lots of domains the quota will be exceeded and stay exceeded for prolonged periods of time. In such cases it's pointless (and slow) to walk the whole acl list again and again just to rediscover that the quota is exceeded. We already have the TOMOYO_DIF_QUOTA_WARNED flag that notes the overflow condition. Check it early to avoid the slowdown. [penguin-kernel] This patch causes a user visible change that the learning mode will not be automatically resumed after the quota is increased. To resume the learning mode, administrator will need to explicitly clear TOMOYO_DIF_QUOTA_WARNED flag after increasing the quota. But I think that this change is generally preferable, for administrator likely wants to optimize the acl list for that domain before increasing the quota, or that domain likely hits the quota again. Therefore, don't try to care to clear TOMOYO_DIF_QUOTA_WARNED flag automatically when the quota for that domain changed. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2021-12-15Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-12-14 This series contains updates to ice driver only. Karol corrects division that was causing incorrect calculations and adds a check to ensure stale timestamps are not being used. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-12-14 This series contains updates to ice driver only. Haiyue adds support to query hardware for supported PTYPEs. Jeff changes PTYPE validation to utilize the capabilities queried from the hardware instead of maintaining a per DDP support list. Brett refactors promiscuous functions to provide common and clear interfaces to call for configuration. Wojciech modifies DDP package load to simplify determining the final state of the load. Tony removes the use of ice_status from the driver. This involves removing string conversion functions, converting variables and values to standard errors, and clean up. He also removes an unused define. Dan Carpenter removes unneeded casts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15Revert "x86/boot: Pull up cmdline preparation and early param parsing"Borislav Petkov
This reverts commit 8d48bf8206f77aa8687f0e241e901e5197e52423. It turned out to be a bad idea as it broke supplying mem= cmdline parameters due to parse_memopt() requiring preparatory work like setting up the e820 table in e820__memory_setup() in order to be able to exclude the range specified by mem=. Pulling that up would've broken Xen PV again, see threads at https://lkml.kernel.org/r/20210920120421.29276-1-jgross@suse.com due to xen_memory_setup() needing the first reservations in early_reserve_memory() - kernel and initrd - to have happened already. This could be fixed again by having Xen do those reservations itself... Long story short, revert this and do a simpler fix in a later patch. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211213112757.2612-3-bp@alien8.de
2021-12-15net: fec: fix system hang during suspend/resumeJoakim Zhang
1. During normal suspend (WoL not enabled) process, system has posibility to hang. The root cause is TXF interrupt coming after clocks disabled, system hang when accessing registers from interrupt handler. To fix this issue, disable all interrupts when system suspend. 2. System also has posibility to hang with WoL enabled during suspend, after entering stop mode, then magic pattern coming after clocks disabled, system will be waked up, and interrupt handler will be called, system hang when access registers. To fix this issue, disable wakeup irq in .suspend(), and enable it in .resume(). Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15net: ocelot: add support to get port mac from device-treeClément Léger
Add support to get mac from device-tree using of_get_ethdev_address. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15sun4i-emac.c: remove unnecessary branchConley Lee
According to the current implementation of emac_rx, every arrived packet will be processed in the while loop. So, there is no remain packet last time. The skb_last field and this branch for dealing with it is unnecessary. Signed-off-by: Conley Lee <conleylee@foxmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15ethtool: use ethnl_parse_header_dev_put()Eric Dumazet
It seems I missed that most ethnl_parse_header_dev_get() callers declare an on-stack struct ethnl_req_info, and that they simply call dev_put(req_info.dev) when about to return. Add ethnl_parse_header_dev_put() helper to properly untrack reference taken by ethnl_parse_header_dev_get(). Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15Revert "x86/boot: Mark prepare_command_line() __init"Borislav Petkov
This reverts commit c0f2077baa4113f38f008b8e912b9fb3ff8d43df. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211213112757.2612-2-bp@alien8.de
2021-12-15ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBookJeremy Szu
There is a HP ProBook which using ALC236 codec and need the ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make mute LED and micmute LED work. Signed-off-by: Jeremy Szu <jeremy.szu@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211214164156.49711-1-jeremy.szu@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-12-14libbpf: Avoid reading past ELF data section end when copying licenseAndrii Nakryiko
Fix possible read beyond ELF "license" data section if the license string is not properly zero-terminated. Use the fact that libbpf_strlcpy never accesses the (N-1)st byte of the source string because it's replaced with '\0' anyways. If this happens, it's a violation of contract between libbpf and a user, but not handling this more robustly upsets CIFuzz, so given the fix is trivial, let's fix the potential issue. Fixes: 9fc205b413b3 ("libbpf: Add sane strncpy alternative and use it internally") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211214232054.3458774-1-andrii@kernel.org
2021-12-14net/mlx5e: Move goto action checks into tc_action goto post parse opRoi Dayan
Move goto action checks from parse nic/fdb funcs into the tc action infra goto post parse op. While moving this part also use NL_SET_ERR_MSG_MOD() instead of NL_SET_ERR_MSG(). Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Move vlan action chunk into tc action vlan post parse opRoi Dayan
Move vlan prio tag rewrite handling into tc action infra vlan post parse op. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add post_parse() op to tc action infrastructureRoi Dayan
The post_parse() op should be called after the parse op was called for all actions. It could be an action state is dependent on other actions. In the new op an action can fail the parse if the state is not valid anymore. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Move sample attr allocation to tc_action sample parse opRoi Dayan
There is no reason to wait with the kmalloc to after parsing all other actions. There could still be a failure later and before offloading the rule. So alloc the mem when parsing. The memory is being released on mlx5e_flow_put() which is called also on error flow. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: TC action parsing loopRoi Dayan
Introduce a common function to implement the generic parsing loop. The same function can be used for parsing NIC and FDB (Switchdev mode) flows. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add redirect ingress to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add sample and ptype to tc_action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add ct to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add mirred/redirect to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add mpls push/pop to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add vlan push/pop/mangle to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add pedit to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add csum to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add tunnel encap/decap to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add goto to tc action infraRoi Dayan
Add parsing support by implementing struct mlx5e_tc_act for this action. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14net/mlx5e: Add tc action infrastructureRoi Dayan
Add an infrastructure to help parsing tc actions in a generic way. Supporting an action parser means implementing struct mlx5e_tc_act for that action. The infrastructure will give the possibility to be generic when parsing tc actions, i.e. parse_tc_nic_actions() and parse_tc_fdb_actions(). To parse tc actions a user needs to allocate a parse_state instance and pass it when iterating over the tc actions parsers. If a parser doesn't exists then a user can treat it as unsupported. To add an action parser a user needs to implement two callbacks. The can_offload() callback to quickly check if an action can be offloaded. The parse_action() callback to do actual parsing and prepare for offload. Add implementation for drop, trap, mark and accept action parsers with this commit to act as examples and implement usage of the new infrastructure for those actions. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14bpf, selftests: Update test case for atomic cmpxchg on r0 with pointerDaniel Borkmann
Fix up unprivileged test case results for 'Dest pointer in r0' verifier tests given they now need to reject R0 containing a pointer value, and add a couple of new related ones with 32bit cmpxchg as well. root@foo:~/bpf/tools/testing/selftests/bpf# ./test_verifier #0/u invalid and of negative number OK #0/p invalid and of negative number OK [...] #1268/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK #1269/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK #1270/p XDP pkt read, pkt_data <= pkt_meta', good access OK #1271/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK #1272/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK Summary: 1900 PASSED, 0 SKIPPED, 0 FAILED Acked-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux regDaniel Borkmann
The implementation of BPF_CMPXCHG on a high level has the following parameters: .-[old-val] .-[new-val] BPF_R0 = cmpxchg{32,64}(DST_REG + insn->off, BPF_R0, SRC_REG) `-[mem-loc] `-[old-val] Given a BPF insn can only have two registers (dst, src), the R0 is fixed and used as an auxilliary register for input (old value) as well as output (returning old value from memory location). While the verifier performs a number of safety checks, it misses to reject unprivileged programs where R0 contains a pointer as old value. Through brute-forcing it takes about ~16sec on my machine to leak a kernel pointer with BPF_CMPXCHG. The PoC is basically probing for kernel addresses by storing the guessed address into the map slot as a scalar, and using the map value pointer as R0 while SRC_REG has a canary value to detect a matching address. Fix it by checking R0 for pointers, and reject if that's the case for unprivileged programs. Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg") Reported-by: Ryota Shiga (Flatt Security) Acked-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14bpf, selftests: Add test case for atomic fetch on spilled pointerDaniel Borkmann
Test whether unprivileged would be able to leak the spilled pointer either by exporting the returned value from the atomic{32,64} operation or by reading and exporting the value from the stack after the atomic operation took place. Note that for unprivileged, the below atomic cmpxchg test case named "Dest pointer in r0 - succeed" is failing. The reason is that in the dst memory location (r10 -8) there is the spilled register r10: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (bf) r0 = r10 1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (7b) *(u64 *)(r10 -8) = r0 2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp 2: (b7) r1 = 0 3: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=fp 3: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r1) 4: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm 4: (79) r1 = *(u64 *)(r0 -8) 5: R0_w=fp0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm 5: (b7) r0 = 0 6: R0_w=invP0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm 6: (95) exit However, allowing this case for unprivileged is a bit useless given an update with a new pointer will fail anyway: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (bf) r0 = r10 1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (7b) *(u64 *)(r10 -8) = r0 2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp 2: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r10) R10 leaks addr into mem Acked-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14bpf: Fix kernel address leakage in atomic fetchDaniel Borkmann
The change in commit 37086bfdc737 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH") around check_mem_access() handling is buggy since this would allow for unprivileged users to leak kernel pointers. For example, an atomic fetch/and with -1 on a stack destination which holds a spilled pointer will migrate the spilled register type into a scalar, which can then be exported out of the program (since scalar != pointer) by dumping it into a map value. The original implementation of XADD was preventing this situation by using a double call to check_mem_access() one with BPF_READ and a subsequent one with BPF_WRITE, in both cases passing -1 as a placeholder value instead of register as per XADD semantics since it didn't contain a value fetch. The BPF_READ also included a check in check_stack_read_fixed_off() which rejects the program if the stack slot is of __is_pointer_value() if dst_regno < 0. The latter is to distinguish whether we're dealing with a regular stack spill/ fill or some arithmetical operation which is disallowed on non-scalars, see also 6e7e63cbb023 ("bpf: Forbid XADD on spilled pointers for unprivileged users") for more context on check_mem_access() and its handling of placeholder value -1. One minimally intrusive option to fix the leak is for the BPF_FETCH case to initially check the BPF_READ case via check_mem_access() with -1 as register, followed by the actual load case with non-negative load_reg to propagate stack bounds to registers. Fixes: 37086bfdc737 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH") Reported-by: <n4ke4mry@gmail.com> Acked-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-14bcache: fix NULL pointer reference in cached_dev_detach_finishLin Feng
Commit 0259d4498ba4 ("bcache: move calc_cached_dev_sectors to proper place on backing device detach") tries to fix calc_cached_dev_sectors when bcache device detaches, but now we have: cached_dev_detach_finish ... bcache_device_detach(&dc->disk); ... closure_put(&d->c->caching); d->c = NULL; [*explicitly set dc->disk.c to NULL*] list_move(&dc->list, &uncached_devices); calc_cached_dev_sectors(dc->disk.c); [*passing a NULL pointer*] ... Upper codeflows shows how bug happens, this patch fix the problem by caching dc->disk.c beforehand, and cache_set won't be freed under us because c->caching closure at least holds a reference count and closure callback __cache_set_unregister only being called by bch_cache_set_stop which using closure_queue(&c->caching), that means c->caching closure callback for destroying cache_set won't be trigger by previous closure_put(&d->c->caching). So at this stage(while cached_dev_detach_finish is calling) it's safe to access cache_set dc->disk.c. Fixes: 0259d4498ba4 ("bcache: move calc_cached_dev_sectors to proper place on backing device detach") Signed-off-by: Lin Feng <linf@wangsu.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211112053629.3437-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-14block: reduce kblockd_mod_delayed_work_on() CPU consumptionJens Axboe
Dexuan reports that he's seeing spikes of very heavy CPU utilization when running 24 disks and using the 'none' scheduler. This happens off the sched restart path, because SCSI requires the queue to be restarted async, and hence we're hammering on mod_delayed_work_on() to ensure that the work item gets run appropriately. Avoid hammering on the timer and just use queue_work_on() if no delay has been specified. Reported-and-tested-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/ Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-14Merge branch 'mptcp-fixes-for-ulp-a-deadlock-and-netlink-docs'Jakub Kicinski
Mat Martineau says: ==================== mptcp: Fixes for ULP, a deadlock, and netlink docs Two of the MPTCP fixes in this set are related to the TCP_ULP socket option with MPTCP sockets operating in "fallback" mode (the connection has reverted to regular TCP). The other issues are an observed deadlock and missing parameter documentation in the MPTCP netlink API. Patch 1 marks TCP_ULP as unsupported earlier in MPTCP setsockopt code, so the fallback code path in the MPTCP layer does not pass the TCP_ULP option down to the subflow TCP socket. Patch 2 makes sure a TCP fallback socket returned to userspace by accept()ing on a MPTCP listening socket does not allow use of the "mptcp" TCP_ULP type. That ULP is intended only for use by in-kernel MPTCP subflows. Patch 3 fixes the possible deadlock when sending data and there are socket option changes to sync to the subflows. Patch 4 makes sure all MPTCP netlink event parameters are documented in the MPTCP uapi header. ==================== Link: https://lore.kernel.org/r/20211214231604.211016-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14mptcp: add missing documented NL paramsMatthieu Baerts
'loc_id' and 'rem_id' are set in all events linked to subflows but those were missing in the events description in the comments. Fixes: b911c97c7dc7 ("mptcp: add netlink event support") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14mptcp: fix deadlock in __mptcp_push_pending()Maxim Galaganov
__mptcp_push_pending() may call mptcp_flush_join_list() with subflow socket lock held. If such call hits mptcp_sockopt_sync_all() then subsequently __mptcp_sockopt_sync() could try to lock the subflow socket for itself, causing a deadlock. sysrq: Show Blocked State task:ss-server state:D stack: 0 pid: 938 ppid: 1 flags:0x00000000 Call Trace: <TASK> __schedule+0x2d6/0x10c0 ? __mod_memcg_state+0x4d/0x70 ? csum_partial+0xd/0x20 ? _raw_spin_lock_irqsave+0x26/0x50 schedule+0x4e/0xc0 __lock_sock+0x69/0x90 ? do_wait_intr_irq+0xa0/0xa0 __lock_sock_fast+0x35/0x50 mptcp_sockopt_sync_all+0x38/0xc0 __mptcp_push_pending+0x105/0x200 mptcp_sendmsg+0x466/0x490 sock_sendmsg+0x57/0x60 __sys_sendto+0xf0/0x160 ? do_wait_intr_irq+0xa0/0xa0 ? fpregs_restore_userregs+0x12/0xd0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f9ba546c2d0 RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0 RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234 RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060 R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8 </TASK> Fix the issue by using __mptcp_flush_join_list() instead of plain mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by Florian. The sockopt sync will be deferred to the workqueue. Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244 Suggested-by: Florian Westphal <fw@strlen.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14mptcp: clear 'kern' flag from fallback socketsFlorian Westphal
The mptcp ULP extension relies on sk->sk_sock_kern being set correctly: It prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from working for plain tcp sockets (any userspace-exposed socket). But in case of fallback, accept() can return a plain tcp sk. In such case, sk is still tagged as 'kernel' and setsockopt will work. This will crash the kernel, The subflow extension has a NULL ctx->conn mptcp socket: BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0 Call Trace: tcp_data_ready+0xf8/0x370 [..] Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14mptcp: remove tcp ulp setsockopt supportFlorian Westphal
TCP_ULP setsockopt cannot be used for mptcp because its already used internally to plumb subflow (tcp) sockets to the mptcp layer. syzbot managed to trigger a crash for mptcp connections that are in fallback mode: KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027] CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0 RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline] [..] __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline] tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160 do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391 mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638 Remove support for TCP_ULP setsockopt. Fixes: d9e4c1291810 ("mptcp: only admit explicitly supported sockopt") Reported-by: syzbot+1fd9b69cde42967d1add@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14Merge branch 'net-dsa-hellcreek-fix-handling-of-mgmt-protocols'Jakub Kicinski
Kurt Kanzenbach says: ==================== net: dsa: hellcreek: Fix handling of MGMT protocols this series fixes some minor issues with regards to management protocols such as PTP and STP in the hellcreek DSA driver. Configure static FDB for these protocols. The end result is: |root@tsn:~# mv88e6xxx_dump --atu |Using device <platform/ff240000.switch> |ATU: |FID MAC 0123 Age OBT Pass Static Reprio Prio | 0 01:1b:19:00:00:00 1100 1 X X 6 | 1 01:00:5e:00:01:81 1100 1 X X 6 | 2 33:33:00:00:01:81 1100 1 X X 6 | 3 01:80:c2:00:00:0e 1100 1 X X X 6 | 4 01:00:5e:00:00:6b 1100 1 X X X 6 | 5 33:33:00:00:00:6b 1100 1 X X X 6 | 6 01:80:c2:00:00:00 1100 1 X X X 6 Previous version: * https://lore.kernel.org/r/20211213101810.121553-1-kurt@linutronix.de/ Changes since v1: * Target net-next, as this never worked correctly and is not critical * Add STP and PTP over UDP rules * Use pass_blocked for PDelay messages only (Richard Cochran) ==================== Link: https://lore.kernel.org/r/20211214134508.57806-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14net: dsa: hellcreek: Add missing PTP via UDP rulesKurt Kanzenbach
The switch supports PTP for UDP transport too. Therefore, add the missing static FDB entries to ensure correct forwarding of these packets. Fixes: ddd56dfe52c9 ("net: dsa: hellcreek: Add PTP clock support") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>