summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-17net: openvswitch: Fix matching zone id for invalid conns arriving from tcPaul Blakey
Zone id is not restored if we passed ct and ct rejected the connection, as there is no ct info on the skb. Save the zone from tc skb cb to tc skb extension and pass it on to ovs, use that info to restore the zone id for invalid connections. Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17net/sched: flow_dissector: Fix matching on zone id for invalid connsPaul Blakey
If ct rejects a flow, it removes the conntrack info from the skb. act_ct sets the post_ct variable so the dissector will see this case as an +tracked +invalid state, but the zone id is lost with the conntrack info. To restore the zone id on such cases, set the last executed zone, via the tc control block, when passing ct, and read it back in the dissector if there is no ct info on the skb (invalid connection). Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17net/sched: Extend qdisc control block with tc control blockPaul Blakey
BPF layer extends the qdisc control block via struct bpf_skb_data_end and because of that there is no more room to add variables to the qdisc layer control block without going over the skb->cb size. Extend the qdisc control block with a tc control block, and move all tc related variables to there as a pre-step for extending the tc control block with additional members. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix UAF in set catch-all element, from Eric Dumazet. 2) Fix MAC mangling for multicast/loopback traffic in nfnetlink_queue and nfnetlink_log, from Ignacy Gawędzki. 3) Remove expired entries from ctnetlink dump path regardless the tuple direction, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-17net: fix typo in a commentXiang wangx
The double 'as' in a comment is repeated, thus it should be removed. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16Merge tag 'audit-pr-20211216' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single patch to fix a problem where the audit queue could grow unbounded when the audit daemon is forcibly stopped" * tag 'audit-pr-20211216' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: improve robustness of the audit queue handling
2021-12-16Merge tag 'net-5.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, wifi, bpf. Relatively large batches of fixes from BPF and the WiFi stack, calm in general networking. Current release - regressions: - dpaa2-eth: fix buffer overrun when reporting ethtool statistics Current release - new code bugs: - bpf: fix incorrect state pruning for <8B spill/fill - iavf: - add missing unlocks in iavf_watchdog_task() - do not override the adapter state in the watchdog task (again) - mlxsw: spectrum_router: consolidate MAC profiles when possible Previous releases - regressions: - mac80211 fixes: - rate control, avoid driver crash for retransmitted frames - regression in SSN handling of addba tx - a memory leak where sta_info is not freed - marking TX-during-stop for TX in in_reconfig, prevent stall - cfg80211: acquire wiphy mutex on regulatory work - wifi drivers: fix build regressions and LED config dependency - virtio_net: fix rx_drops stat for small pkts - dsa: mv88e6xxx: unforce speed & duplex in mac_link_down() Previous releases - always broken: - bpf fixes: - kernel address leakage in atomic fetch - kernel address leakage in atomic cmpxchg's r0 aux reg - signed bounds propagation after mov32 - extable fixup offset - extable address check - mac80211: - fix the size used for building probe request - send ADDBA requests using the tid/queue of the aggregation session - agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid deadlocks - validate extended element ID is present - mptcp: - never allow the PM to close a listener subflow (null-defer) - clear 'kern' flag from fallback sockets, prevent crash - fix deadlock in __mptcp_push_pending() - inet_diag: fix kernel-infoleak for UDP sockets - xsk: do not sleep in poll() when need_wakeup set - smc: avoid very long waits in smc_release() - sch_ets: don't remove idle classes from the round-robin list - netdevsim: - zero-initialize memory for bpf map's value, prevent info leak - don't let user space overwrite read only (max) ethtool parms - ixgbe: set X550 MDIO speed before talking to PHY - stmmac: - fix null-deref in flower deletion w/ VLAN prio Rx steering - dwmac-rk: fix oob read in rk_gmac_setup - ice: time stamping fixes - systemport: add global locking for descriptor life cycle" * tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits) bpf, selftests: Fix racing issue in btf_skc_cls_ingress test selftest/bpf: Add a test that reads various addresses. bpf: Fix extable address check. bpf: Fix extable fixup offset. bpf, selftests: Add test case trying to taint map value pointer bpf: Make 32->64 bounds propagation slightly more robust bpf: Fix signed bounds propagation after mov32 sit: do not call ipip6_dev_free() from sit_init_net() net: systemport: Add global locking for descriptor lifecycle net/smc: Prevent smc_release() from long blocking net: Fix double 0x prefix print in SKB dump virtio_net: fix rx_drops stat for small pkts dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED sfc_ef100: potential dereference of null pointer net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup net: usb: lan78xx: add Allied Telesis AT29M2-AF net/packet: rx_owner_map depends on pg_vec netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc dpaa2-eth: fix ethtool statistics ixgbe: set X550 MDIO speed before talking to PHY ...
2021-12-16Merge tag 'soc-fixes-5.16-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There are a number of DT fixes, mostly for mistakes found through static checking of the dts files again, as well as a couple of minor changes to address incorrect DT settings. For i.MX, there is yet another series of devitree changes to update RGMII delay settings for ethernet, which is an ongoing problem after some driver changes. For SoC specific device drivers, a number of smaller fixes came up: - i.MX SoC identification was incorrectly registered non-i.MX machines when the driver is built-in - One fix on imx8m-blk-ctrl driver to get i.MX8MM MIPI reset work properly - a few compile fixes for warnings that get in the way of -Werror - a string overflow in the scpi firmware driver - a boot failure with FORTIFY_SOURCE on Rockchips machines - broken error handling in the AMD TEE driver - a revert for a tegra reset driver commit that broke HDA" * tag 'soc-fixes-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (25 commits) soc/tegra: fuse: Fix bitwise vs. logical OR warning firmware: arm_scpi: Fix string overflow in SCPI genpd driver soc: imx: Register SoC device only on i.MX boards soc: imx: imx8m-blk-ctrl: Fix imx8mm mipi reset ARM: dts: imx6ull-pinfunc: Fix CSI_DATA07__ESAI_TX0 pad name arm64: dts: imx8mq: remove interconnect property from lcdif ARM: socfpga: dts: fix qspi node compatible arm64: dts: apple: add #interrupt-cells property to pinctrl nodes dt-bindings: i2c: apple,i2c: allow multiple compatibles arm64: meson: remove COMMON_CLK arm64: meson: fix dts for JetHub D1 tee: amdtee: fix an IS_ERR() vs NULL bug arm64: dts: apple: change ethernet0 device type to ethernet arm64: dts: ten64: remove redundant interrupt declaration for gpio-keys arm64: dts: rockchip: fix poweroff on helios64 arm64: dts: rockchip: fix audio-supply for Rock Pi 4 arm64: dts: rockchip: fix rk3399-leez-p710 vcc3v3-lan supply arm64: dts: rockchip: fix rk3308-roc-cc vcc-sd supply arm64: dts: rockchip: remove mmc-hs400-enhanced-strobe from rk3399-khadas-edge ARM: rockchip: Use memcpy_toio instead of memcpy on smp bring-up ...
2021-12-16Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-12-16 We've added 15 non-merge commits during the last 7 day(s) which contain a total of 12 files changed, 434 insertions(+), 30 deletions(-). The main changes are: 1) Fix incorrect verifier state pruning behavior for <8B register spill/fill, from Paul Chaignon. 2) Fix x86-64 JIT's extable handling for fentry/fexit when return pointer is an ERR_PTR(), from Alexei Starovoitov. 3) Fix 3 different possibilities that BPF verifier missed where unprivileged could leak kernel addresses, from Daniel Borkmann. 4) Fix xsk's poll behavior under need_wakeup flag, from Magnus Karlsson. 5) Fix an oob-write in test_verifier due to a missed MAX_NR_MAPS bump, from Kumar Kartikeya Dwivedi. 6) Fix a race in test_btf_skc_cls_ingress selftest, from Martin KaFai Lau. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Fix racing issue in btf_skc_cls_ingress test selftest/bpf: Add a test that reads various addresses. bpf: Fix extable address check. bpf: Fix extable fixup offset. bpf, selftests: Add test case trying to taint map value pointer bpf: Make 32->64 bounds propagation slightly more robust bpf: Fix signed bounds propagation after mov32 bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg bpf, selftests: Add test case for atomic fetch on spilled pointer bpf: Fix kernel address leakage in atomic fetch selftests/bpf: Fix OOB write in test_verifier xsk: Do not sleep in poll() when need_wakeup set selftests/bpf: Tests for state pruning with u32 spill/fill bpf: Fix incorrect state pruning for <8B spill/fill ==================== Link: https://lore.kernel.org/r/20211216210005.13815-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16bpf, selftests: Fix racing issue in btf_skc_cls_ingress testMartin KaFai Lau
The libbpf CI reported occasional failure in btf_skc_cls_ingress: test_syncookie:FAIL:Unexpected syncookie states gen_cookie:80326634 recv_cookie:0 bpf prog error at line 97 "error at line 97" means the bpf prog cannot find the listening socket when the final ack is received. It then skipped processing the syncookie in the final ack which then led to "recv_cookie:0". The problem is the userspace program did not do accept() and went ahead to close(listen_fd) before the kernel (and the bpf prog) had a chance to process the final ack. The fix is to add accept() call so that the userspace will wait for the kernel to finish processing the final ack first before close()-ing everything. Fixes: 9a856cae2217 ("bpf: selftest: Add test_btf_skc_cls_ingress") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211216191630.466151-1-kafai@fb.com
2021-12-16selftest/bpf: Add a test that reads various addresses.Alexei Starovoitov
Add a function to bpf_testmod that returns invalid kernel and user addresses. Then attach an fexit program to that function that tries to read memory through these addresses. This logic checks that bpf_probe_read_kernel and BPF_PROBE_MEM logic is sane. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-12-16bpf: Fix extable address check.Alexei Starovoitov
The verifier checks that PTR_TO_BTF_ID pointer is either valid or NULL, but it cannot distinguish IS_ERR pointer from valid one. When offset is added to IS_ERR pointer it may become small positive value which is a user address that is not handled by extable logic and has to be checked for at the runtime. Tighten BPF_PROBE_MEM pointer check code to prevent this case. Fixes: 4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") Reported-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-12-16bpf: Fix extable fixup offset.Alexei Starovoitov
The prog - start_of_ldx is the offset before the faulting ldx to the location after it, so this will be used to adjust pt_regs->ip for jumping over it and continuing, and with old temp it would have been fixed up to the wrong offset, causing crash. Fixes: 4c5de127598e ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2021-12-16Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "A single fix for the clk framework that needed some more bake time in linux-next. The problem is that two clks being registered at the same time can lead to a busted clk tree if the parent isn't fully registered by the time the child finds the parent. We rejigger the place where we mark the parent as fully registered so that the child can't find the parent until things are proper" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Don't parent clks until the parent is fully registered
2021-12-16bpf, selftests: Add test case trying to taint map value pointerDaniel Borkmann
Add a test case which tries to taint map value pointer arithmetic into a unknown scalar with subsequent export through the map. Before fix: # ./test_verifier 1186 #1186/u map access: trying to leak tained dst reg FAIL Unexpected success to load! verification time 24 usec stack depth 8 processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 #1186/p map access: trying to leak tained dst reg FAIL Unexpected success to load! verification time 8 usec stack depth 8 processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 Summary: 0 PASSED, 0 SKIPPED, 2 FAILED After fix: # ./test_verifier 1186 #1186/u map access: trying to leak tained dst reg OK #1186/p map access: trying to leak tained dst reg OK Summary: 2 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-12-16bpf: Make 32->64 bounds propagation slightly more robustDaniel Borkmann
Make the bounds propagation in __reg_assign_32_into_64() slightly more robust and readable by aligning it similarly as we did back in the __reg_combine_64_into_32() counterpart. Meaning, only propagate or pessimize them as a smin/smax pair. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-12-16bpf: Fix signed bounds propagation after mov32Daniel Borkmann
For the case where both s32_{min,max}_value bounds are positive, the __reg_assign_32_into_64() directly propagates them to their 64 bit counterparts, otherwise it pessimises them into [0,u32_max] universe and tries to refine them later on by learning through the tnum as per comment in mentioned function. However, that does not always happen, for example, in mov32 operation we call zext_32_to_64(dst_reg) which invokes the __reg_assign_32_into_64() as is without subsequent bounds update as elsewhere thus no refinement based on tnum takes place. Thus, not calling into the __update_reg_bounds() / __reg_deduce_bounds() / __reg_bound_offset() triplet as we do, for example, in case of ALU ops via adjust_scalar_min_max_vals(), will lead to more pessimistic bounds when dumping the full register state: Before fix: 0: (b4) w0 = -1 1: R0_w=invP4294967295 (id=0,imm=ffffffff, smin_value=4294967295,smax_value=4294967295, umin_value=4294967295,umax_value=4294967295, var_off=(0xffffffff; 0x0), s32_min_value=-1,s32_max_value=-1, u32_min_value=-1,u32_max_value=-1) 1: (bc) w0 = w0 2: R0_w=invP4294967295 (id=0,imm=ffffffff, smin_value=0,smax_value=4294967295, umin_value=4294967295,umax_value=4294967295, var_off=(0xffffffff; 0x0), s32_min_value=-1,s32_max_value=-1, u32_min_value=-1,u32_max_value=-1) Technically, the smin_value=0 and smax_value=4294967295 bounds are not incorrect, but given the register is still a constant, they break assumptions about const scalars that smin_value == smax_value and umin_value == umax_value. After fix: 0: (b4) w0 = -1 1: R0_w=invP4294967295 (id=0,imm=ffffffff, smin_value=4294967295,smax_value=4294967295, umin_value=4294967295,umax_value=4294967295, var_off=(0xffffffff; 0x0), s32_min_value=-1,s32_max_value=-1, u32_min_value=-1,u32_max_value=-1) 1: (bc) w0 = w0 2: R0_w=invP4294967295 (id=0,imm=ffffffff, smin_value=4294967295,smax_value=4294967295, umin_value=4294967295,umax_value=4294967295, var_off=(0xffffffff; 0x0), s32_min_value=-1,s32_max_value=-1, u32_min_value=-1,u32_max_value=-1) Without the smin_value == smax_value and umin_value == umax_value invariant being intact for const scalars, it is possible to leak out kernel pointers from unprivileged user space if the latter is enabled. For example, when such registers are involved in pointer arithmtics, then adjust_ptr_min_max_vals() will taint the destination register into an unknown scalar, and the latter can be exported and stored e.g. into a BPF map value. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Kuee K1r0a <liulin063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-12-16Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Fix missing error code on kexec failure path" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kexec: Fix missing error code 'ret' warning in load_other_segments()
2021-12-16Merge tag 'for-5.16/dm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix use after free in DM btree remove's rebalance_children() - Fix DM integrity data corruption, introduced during 5.16 merge, due to improper use of bvec_kmap_local() * tag 'for-5.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm integrity: fix data corruption due to improper use of bvec_kmap_local dm btree remove: fix use after free in rebalance_children()
2021-12-16arm64: kexec: Fix missing error code 'ret' warning in load_other_segments()Lakshmi Ramasubramanian
Since commit ac10be5cdbfa ("arm64: Use common of_kexec_alloc_and_setup_fdt()"), smatch reports the following warning: arch/arm64/kernel/machine_kexec_file.c:152 load_other_segments() warn: missing error code 'ret' Return code is not set to an error code in load_other_segments() when of_kexec_alloc_and_setup_fdt() call returns a NULL dtb. This results in status success (return code set to 0) being returned from load_other_segments(). Set return code to -EINVAL if of_kexec_alloc_and_setup_fdt() returns NULL dtb. Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: ac10be5cdbfa ("arm64: Use common of_kexec_alloc_and_setup_fdt()") Link: https://lore.kernel.org/r/20211210010121.101823-1-nramas@linux.microsoft.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-16afs: Fix mmapDavid Howells
Fix afs_add_open_map() to check that the vnode isn't already on the list when it adds it. It's possible that afs_drop_open_mmap() decremented the cb_nr_mmap counter, but hadn't yet got into the locked section to remove it. Also vnode->cb_mmap_link should be initialised, so fix that too. Fixes: 6e0e99d58a65 ("afs: Fix mmap coherency vs 3rd-party changes") Reported-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com Suggested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing+fedora34_64checkkafs-build-300@auristor.com cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/686465.1639435380@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-16sit: do not call ipip6_dev_free() from sit_init_net()Eric Dumazet
ipip6_dev_free is sit dev->priv_destructor, already called by register_netdevice() if something goes wrong. Alternative would be to make ipip6_dev_free() robust against multiple invocations, but other drivers do not implement this strategy. syzbot reported: dst_release underflow WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173 Modules linked in: CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173 Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48 RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246 RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000 RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358 R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000 FS: 00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160 ipip6_dev_free net/ipv6/sit.c:1414 [inline] sit_init_net+0x229/0x550 net/ipv6/sit.c:1936 ops_init+0x313/0x430 net/core/net_namespace.c:140 setup_net+0x35b/0x9d0 net/core/net_namespace.c:326 copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470 create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226 ksys_unshare+0x57d/0xb50 kernel/fork.c:3075 __do_sys_unshare kernel/fork.c:3146 [inline] __se_sys_unshare kernel/fork.c:3144 [inline] __x64_sys_unshare+0x34/0x40 kernel/fork.c:3144 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f66c882ce99 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200 RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000 </TASK> Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16net: systemport: Add global locking for descriptor lifecycleFlorian Fainelli
The descriptor list is a shared resource across all of the transmit queues, and the locking mechanism used today only protects concurrency across a given transmit queue between the transmit and reclaiming. This creates an opportunity for the SYSTEMPORT hardware to work on corrupted descriptors if we have multiple producers at once which is the case when using multiple transmit queues. This was particularly noticeable when using multiple flows/transmit queues and it showed up in interesting ways in that UDP packets would get a correct UDP header checksum being calculated over an incorrect packet length. Similarly TCP packets would get an equally correct checksum computed by the hardware over an incorrect packet length. The SYSTEMPORT hardware maintains an internal descriptor list that it re-arranges when the driver produces a new descriptor anytime it writes to the WRITE_PORT_{HI,LO} registers, there is however some delay in the hardware to re-organize its descriptors and it is possible that concurrent TX queues eventually break this internal allocation scheme to the point where the length/status part of the descriptor gets used for an incorrect data buffer. The fix is to impose a global serialization for all TX queues in the short section where we are writing to the WRITE_PORT_{HI,LO} registers which solves the corruption even with multiple concurrent TX queues being used. Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20211215202450.4086240-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16net/smc: Prevent smc_release() from long blockingD. Wythe
In nginx/wrk benchmark, there's a hung problem with high probability on case likes that: (client will last several minutes to exit) server: smc_run nginx client: smc_run wrk -c 10000 -t 1 http://server Client hangs with the following backtrace: 0 [ffffa7ce8Of3bbf8] __schedule at ffffffff9f9eOd5f 1 [ffffa7ce8Of3bc88] schedule at ffffffff9f9eløe6 2 [ffffa7ce8Of3bcaO] schedule_timeout at ffffffff9f9e3f3c 3 [ffffa7ce8Of3bd2O] wait_for_common at ffffffff9f9el9de 4 [ffffa7ce8Of3bd8O] __flush_work at ffffffff9fOfeOl3 5 [ffffa7ce8øf3bdfO] smc_release at ffffffffcO697d24 [smc] 6 [ffffa7ce8Of3be2O] __sock_release at ffffffff9f8O2e2d 7 [ffffa7ce8Of3be4ø] sock_close at ffffffff9f8ø2ebl 8 [ffffa7ce8øf3be48] __fput at ffffffff9f334f93 9 [ffffa7ce8Of3be78] task_work_run at ffffffff9flOlff5 10 [ffffa7ce8Of3beaO] do_exit at ffffffff9fOe5Ol2 11 [ffffa7ce8Of3bflO] do_group_exit at ffffffff9fOe592a 12 [ffffa7ce8Of3bf38] __x64_sys_exit_group at ffffffff9fOe5994 13 [ffffa7ce8Of3bf4O] do_syscall_64 at ffffffff9f9d4373 14 [ffffa7ce8Of3bfsO] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c This issue dues to flush_work(), which is used to wait for smc_connect_work() to finish in smc_release(). Once lots of smc_connect_work() was pending or all executing work dangling, smc_release() has to block until one worker comes to free, which is equivalent to wait another smc_connnect_work() to finish. In order to fix this, There are two changes: 1. For those idle smc_connect_work(), cancel it from the workqueue; for executing smc_connect_work(), waiting for it to finish. For that purpose, replace flush_work() with cancel_work_sync(). 2. Since smc_connect() hold a reference for passive closing, if smc_connect_work() has been cancelled, release the reference. Fixes: 24ac3a08e658 ("net/smc: rebuild nonblocking connect") Reported-by: Tony Lu <tonylu@linux.alibaba.com> Tested-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16Merge tag 'tegra-for-5.16-soc-fixes' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/fixes soc/tegra: Fixes for v5.16-rc6 This contains a single build fix without which ARM allmodconfig builds are broken if -Werror is enabled. * tag 'tegra-for-5.16-soc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: soc/tegra: fuse: Fix bitwise vs. logical OR warning Link: https://lore.kernel.org/r/20211215162618.3568474-1-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-16netfilter: ctnetlink: remove expired entries firstFlorian Westphal
When dumping conntrack table to userspace via ctnetlink, check if the ct has already expired before doing any of the 'skip' checks. This expires dead entries faster. /proc handler also removes outdated entries first. Reported-by: Vitaly Zuevsky <vzuevsky@ns1.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16net: Fix double 0x prefix print in SKB dumpGal Pressman
When printing netdev features %pNF already takes care of the 0x prefix, remove the explicit one. Fixes: 6413139dfc64 ("skbuff: increase verbosity when dumping skb data") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16virtio_net: fix rx_drops stat for small pktsWenliang Wang
We found the stat of rx drops for small pkts does not increment when build_skb fail, it's not coherent with other mode's rx drops stat. Signed-off-by: Wenliang Wang <wangwenliang.1995@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16dsa: mv88e6xxx: fix debug print for SPEED_UNFORCEDAndrey Eremeev
Debug print uses invalid check to detect if speed is unforced: (speed != SPEED_UNFORCED) should be used instead of (!speed). Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Andrey Eremeev <Axtone4all@yandex.ru> Fixes: 96a2b40c7bd3 ("net: dsa: mv88e6xxx: add port's MAC speed setter") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16sfc_ef100: potential dereference of null pointerJiasheng Jiang
The return value of kmalloc() needs to be checked. To avoid use in efx_nic_update_stats() in case of the failure of alloc. Fixes: b593b6f1b492 ("sfc_ef100: statistics gathering") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16net: stmmac: dwmac-rk: fix oob read in rk_gmac_setupJohn Keeping
KASAN reports an out-of-bounds read in rk_gmac_setup on the line: while (ops->regs[i]) { This happens for most platforms since the regs flexible array member is empty, so the memory after the ops structure is being read here. It seems that mostly this happens to contain zero anyway, so we get lucky and everything still works. To avoid adding redundant data to nearly all the ops structures, add a new flag to indicate whether the regs field is valid and avoid this loop when it is not. Fixes: 3bb3d6b1c195 ("net: stmmac: Add RK3566/RK3568 SoC support") Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-12-15 This series contains updates to igb, igbvf, igc and ixgbe drivers. Karen moves checks for invalid VF MAC filters to occur earlier for igb. Letu Ren fixes a double free issue in igbvf probe. Sasha fixes incorrect min value being used when calculating for max for igc. Robert Schlabbach adds documentation on enabling NBASE-T support for ixgbe. Cyril Novikov adds missing initialization of MDIO bus speed for ixgbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15net: usb: lan78xx: add Allied Telesis AT29M2-AFGreg Jesionowski
This adds the vendor and product IDs for the AT29M2-AF which is a lan7801-based device. Signed-off-by: Greg Jesionowski <jesionowskigreg@gmail.com> Link: https://lore.kernel.org/r/20211214221027.305784-1-jesionowskigreg@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15net/packet: rx_owner_map depends on pg_vecWillem de Bruijn
Packet sockets may switch ring versions. Avoid misinterpreting state between versions, whose fields share a union. rx_owner_map is only allocated with a packet ring (pg_vec) and both are swapped together. If pg_vec is NULL, meaning no packet ring was allocated, then neither was rx_owner_map. And the field may be old state from a tpacket_v3. Fixes: 61fad6816fc1 ("net/packet: tpacket_rcv: avoid a producer race condition") Reported-by: Syzbot <syzbot+1ac0994a0a0c55151121@syzkaller.appspotmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211215143937.106178-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15netdevsim: Zero-initialize memory for new map's value in function ↵Haimin Zhang
nsim_bpf_map_alloc Zero-initialize memory for new map's value in function nsim_bpf_map_alloc since it may cause a potential kernel information leak issue, as follows: 1. nsim_bpf_map_alloc calls nsim_map_alloc_elem to allocate elements for a new map. 2. nsim_map_alloc_elem uses kmalloc to allocate map's value, but doesn't zero it. 3. A user application can use IOCTL BPF_MAP_LOOKUP_ELEM to get specific element's information in the map. 4. The kernel function map_lookup_elem will call bpf_map_copy_value to get the information allocated at step-2, then use copy_to_user to copy to the user buffer. This can only leak information for an array map. Fixes: 395cacb5f1a0 ("netdevsim: bpf: support fake map offload") Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com> Link: https://lore.kernel.org/r/20211215111530.72103-1-tcs.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15dpaa2-eth: fix ethtool statisticsIoana Ciornei
Unfortunately, with the blamed commit I also added a side effect in the ethtool stats shown. Because I added two more fields in the per channel structure without verifying if its size is used in any way, part of the ethtool statistics were off by 2. Fix this by not looking up the size of the structure but instead on a fixed value kept in a macro. Fixes: fc398bec0387 ("net: dpaa2: add adaptive interrupt coalescing") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20211215105831.290070-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16netfilter: fix regression in looped (broad|multi)cast's MAC handlingIgnacy Gawędzki
In commit 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared"), the test for non-empty MAC header introduced in commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has been replaced with a test for a set MAC header. This breaks the case when the MAC header has been reset (using skb_reset_mac_header), as is the case with looped-back multicast packets. As a result, the packets ending up in NFQUEUE get a bogus hwaddr interpreted from the first bytes of the IP header. This patch adds a test for a non-empty MAC header in addition to the test for a set MAC header. The same two tests are also implemented in nfnetlink_log.c, where the initial code of commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling") has not been touched, but where supposedly the same situation may happen. Fixes: 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac header was cleared") Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-15netfilter: nf_tables: fix use-after-free in nft_set_catchall_destroy()Eric Dumazet
We need to use list_for_each_entry_safe() iterator because we can not access @catchall after kfree_rcu() call. syzbot reported: BUG: KASAN: use-after-free in nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline] BUG: KASAN: use-after-free in nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] BUG: KASAN: use-after-free in nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493 Read of size 8 at addr ffff8880716e5b80 by task syz-executor.3/8871 CPU: 1 PID: 8871 Comm: syz-executor.3 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x2ed mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:450 nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4486 [inline] nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] nft_set_destroy+0x3fd/0x4f0 net/netfilter/nf_tables_api.c:4493 __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626 nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 blocking_notifier_call_chain kernel/notifier.c:318 [inline] blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306 netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788 __sock_release+0xcd/0x280 net/socket.c:649 sock_close+0x18/0x20 net/socket.c:1314 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f75fbf28adb Code: 0f 05 48 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8 63 fc ff ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 fc ff ff 8b 44 RSP: 002b:00007ffd8da7ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f75fbf28adb RDX: 00007f75fc08e828 RSI: ffffffffffffffff RDI: 0000000000000003 RBP: 00007f75fc08a960 R08: 0000000000000000 R09: 00007f75fc08e830 R10: 00007ffd8da7ed10 R11: 0000000000000293 R12: 00000000002067c3 R13: 00007ffd8da7ed10 R14: 00007f75fc088f60 R15: 0000000000000032 </TASK> Allocated by task 8886: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:434 [inline] ____kasan_kmalloc mm/kasan/common.c:513 [inline] ____kasan_kmalloc mm/kasan/common.c:472 [inline] __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:522 kasan_kmalloc include/linux/kasan.h:269 [inline] kmem_cache_alloc_trace+0x1ea/0x4a0 mm/slab.c:3575 kmalloc include/linux/slab.h:590 [inline] nft_setelem_catchall_insert net/netfilter/nf_tables_api.c:5544 [inline] nft_setelem_insert net/netfilter/nf_tables_api.c:5562 [inline] nft_add_set_elem+0x232e/0x2f40 net/netfilter/nf_tables_api.c:5936 nf_tables_newsetelem+0x6ff/0xbb0 net/netfilter/nf_tables_api.c:6032 nfnetlink_rcv_batch+0x1710/0x25f0 net/netfilter/nfnetlink.c:513 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:724 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 15335: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0xd1/0x110 mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:235 [inline] __cache_free mm/slab.c:3445 [inline] kmem_cache_free_bulk+0x67/0x1e0 mm/slab.c:3766 kfree_bulk include/linux/slab.h:446 [inline] kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3273 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Last potentially related work creation: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 __kasan_record_aux_stack+0xb5/0xe0 mm/kasan/generic.c:348 kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3550 nft_set_catchall_destroy net/netfilter/nf_tables_api.c:4489 [inline] nft_set_destroy net/netfilter/nf_tables_api.c:4504 [inline] nft_set_destroy+0x34a/0x4f0 net/netfilter/nf_tables_api.c:4493 __nft_release_table+0x79f/0xcd0 net/netfilter/nf_tables_api.c:9626 nft_rcv_nl_event+0x4f8/0x670 net/netfilter/nf_tables_api.c:9688 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 blocking_notifier_call_chain kernel/notifier.c:318 [inline] blocking_notifier_call_chain+0x67/0x90 kernel/notifier.c:306 netlink_release+0xcb6/0x1dd0 net/netlink/af_netlink.c:788 __sock_release+0xcd/0x280 net/socket.c:649 sock_close+0x18/0x20 net/socket.c:1314 __fput+0x286/0x9f0 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff8880716e5b80 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff8880716e5b80, ffff8880716e5bc0) The buggy address belongs to the page: page:ffffea0001c5b940 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880716e5c00 pfn:0x716e5 flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000000200 ffffea0000911848 ffffea00007c4d48 ffff888010c40200 raw: ffff8880716e5c00 ffff8880716e5000 000000010000001e 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x242040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_THISNODE), pid 3638, ts 211086074437, free_ts 211031029429 prep_new_page mm/page_alloc.c:2418 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369 __alloc_pages_node include/linux/gfp.h:570 [inline] kmem_getpages mm/slab.c:1377 [inline] cache_grow_begin+0x75/0x470 mm/slab.c:2593 cache_alloc_refill+0x27f/0x380 mm/slab.c:2965 ____cache_alloc mm/slab.c:3048 [inline] ____cache_alloc mm/slab.c:3031 [inline] __do_cache_alloc mm/slab.c:3275 [inline] slab_alloc mm/slab.c:3316 [inline] __do_kmalloc mm/slab.c:3700 [inline] __kmalloc+0x3b3/0x4d0 mm/slab.c:3711 kmalloc include/linux/slab.h:595 [inline] kzalloc include/linux/slab.h:724 [inline] tomoyo_get_name+0x234/0x480 security/tomoyo/memory.c:173 tomoyo_parse_name_union+0xbc/0x160 security/tomoyo/util.c:260 tomoyo_update_path_number_acl security/tomoyo/file.c:687 [inline] tomoyo_write_file+0x629/0x7f0 security/tomoyo/file.c:1034 tomoyo_write_domain2+0x116/0x1d0 security/tomoyo/common.c:1152 tomoyo_add_entry security/tomoyo/common.c:2042 [inline] tomoyo_supervisor+0xbc7/0xf00 security/tomoyo/common.c:2103 tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline] tomoyo_path_number_perm+0x419/0x590 security/tomoyo/file.c:734 security_file_ioctl+0x50/0xb0 security/security.c:1541 __do_sys_ioctl fs/ioctl.c:868 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0xb3/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1338 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389 free_unref_page_prepare mm/page_alloc.c:3309 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3388 slab_destroy mm/slab.c:1627 [inline] slabs_destroy+0x89/0xc0 mm/slab.c:1647 cache_flusharray mm/slab.c:3418 [inline] ___cache_free+0x4cc/0x610 mm/slab.c:3480 qlink_free mm/kasan/quarantine.c:146 [inline] qlist_free_all+0x4e/0x110 mm/kasan/quarantine.c:165 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272 __kasan_slab_alloc+0x97/0xb0 mm/kasan/common.c:444 kasan_slab_alloc include/linux/kasan.h:259 [inline] slab_post_alloc_hook mm/slab.h:519 [inline] slab_alloc_node mm/slab.c:3261 [inline] kmem_cache_alloc_node+0x2ea/0x590 mm/slab.c:3599 __alloc_skb+0x215/0x340 net/core/skbuff.c:414 alloc_skb include/linux/skbuff.h:1126 [inline] nlmsg_new include/net/netlink.h:953 [inline] rtmsg_ifinfo_build_skb+0x72/0x1a0 net/core/rtnetlink.c:3808 rtmsg_ifinfo_event net/core/rtnetlink.c:3844 [inline] rtmsg_ifinfo_event net/core/rtnetlink.c:3835 [inline] rtmsg_ifinfo+0x83/0x120 net/core/rtnetlink.c:3853 netdev_state_change net/core/dev.c:1395 [inline] netdev_state_change+0x114/0x130 net/core/dev.c:1386 linkwatch_do_dev+0x10e/0x150 net/core/link_watch.c:167 __linkwatch_run_queue+0x233/0x6a0 net/core/link_watch.c:213 linkwatch_event+0x4a/0x60 net/core/link_watch.c:252 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 Memory state around the buggy address: ffff8880716e5a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880716e5b00: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc >ffff8880716e5b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff8880716e5c00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880716e5c80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-15ixgbe: set X550 MDIO speed before talking to PHYCyril Novikov
The MDIO bus speed must be initialized before talking to the PHY the first time in order to avoid talking to it using a speed that the PHY doesn't support. This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb network plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined with the 10Gb network results in a 24MHz MDIO speed, which is apparently too fast for the connected PHY. PHY register reads over MDIO bus return garbage, leading to initialization failure. Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using the following setup: * Use an Atom C3000 family system with at least one X552 LAN on the SoC * Disable PXE or other BIOS network initialization if possible (the interface must not be initialized before Linux boots) * Connect a live 10Gb Ethernet cable to an X550 port * Power cycle (not reset, doesn't always work) the system and boot Linux * Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with error -17 Fixes: e84db7272798 ("ixgbe: Introduce function to control MDIO speed") Signed-off-by: Cyril Novikov <cnovikov@lynx.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15dm integrity: fix data corruption due to improper use of bvec_kmap_localMike Snitzer
Commit 25058d1c725c ("dm integrity: use bvec_kmap_local in __journal_read_write") didn't account for __journal_read_write() later adding the biovec's bv_offset. As such using bvec_kmap_local() caused the start of the biovec to be skipped. Trivial test that illustrates data corruption: # integritysetup format /dev/pmem0 # integritysetup open /dev/pmem0 integrityroot # mkfs.xfs /dev/mapper/integrityroot ... bad magic number bad magic number Metadata corruption detected at xfs_sb block 0x0/0x1000 libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000 releasing dirty buffer (bulk) to free list! Fix this by using kmap_local_page() instead of bvec_kmap_local() in __journal_read_write(). Fixes: 25058d1c725c ("dm integrity: use bvec_kmap_local in __journal_read_write") Reported-by: Tony Asleson <tasleson@redhat.com> Reviewed-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-12-15ixgbe: Document how to enable NBASE-T supportRobert Schlabbach
Commit a296d665eae1 ("ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support") introduced suppression of the advertisement of NBASE-T speeds by default, according to Todd Fujinaka to accommodate customers with network switches which could not cope with advertised NBASE-T speeds, as posted in the E1000-devel mailing list: https://sourceforge.net/p/e1000/mailman/message/37106269/ However, the suppression was not documented at all, nor was how to enable NBASE-T support. Properly document the NBASE-T suppression and how to enable NBASE-T support. Fixes: a296d665eae1 ("ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support") Reported-by: Robert Schlabbach <robert_s@gmx.net> Signed-off-by: Robert Schlabbach <robert_s@gmx.net> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15igc: Fix typo in i225 LTR functionsSasha Neftin
The LTR maximum value was incorrectly written using the scale from the LTR minimum value. This would cause incorrect values to be sent, in cases where the initial calculation lead to different min/max scales. Fixes: 707abf069548 ("igc: Add initial LTR support") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15igbvf: fix double free in `igbvf_probe`Letu Ren
In `igbvf_probe`, if register_netdev() fails, the program will go to label err_hw_init, and then to label err_ioremap. In free_netdev() which is just below label err_ioremap, there is `list_for_each_entry_safe` and `netif_napi_del` which aims to delete all entries in `dev->napi_list`. The program has added an entry `adapter->rx_ring->napi` which is added by `netif_napi_add` in igbvf_alloc_queues(). However, adapter->rx_ring has been freed below label err_hw_init. So this a UAF. In terms of how to patch the problem, we can refer to igbvf_remove() and delete the entry before `adapter->rx_ring`. The KASAN logs are as follows: [ 35.126075] BUG: KASAN: use-after-free in free_netdev+0x1fd/0x450 [ 35.127170] Read of size 8 at addr ffff88810126d990 by task modprobe/366 [ 35.128360] [ 35.128643] CPU: 1 PID: 366 Comm: modprobe Not tainted 5.15.0-rc2+ #14 [ 35.129789] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 35.131749] Call Trace: [ 35.132199] dump_stack_lvl+0x59/0x7b [ 35.132865] print_address_description+0x7c/0x3b0 [ 35.133707] ? free_netdev+0x1fd/0x450 [ 35.134378] __kasan_report+0x160/0x1c0 [ 35.135063] ? free_netdev+0x1fd/0x450 [ 35.135738] kasan_report+0x4b/0x70 [ 35.136367] free_netdev+0x1fd/0x450 [ 35.137006] igbvf_probe+0x121d/0x1a10 [igbvf] [ 35.137808] ? igbvf_vlan_rx_add_vid+0x100/0x100 [igbvf] [ 35.138751] local_pci_probe+0x13c/0x1f0 [ 35.139461] pci_device_probe+0x37e/0x6c0 [ 35.165526] [ 35.165806] Allocated by task 366: [ 35.166414] ____kasan_kmalloc+0xc4/0xf0 [ 35.167117] foo_kmem_cache_alloc_trace+0x3c/0x50 [igbvf] [ 35.168078] igbvf_probe+0x9c5/0x1a10 [igbvf] [ 35.168866] local_pci_probe+0x13c/0x1f0 [ 35.169565] pci_device_probe+0x37e/0x6c0 [ 35.179713] [ 35.179993] Freed by task 366: [ 35.180539] kasan_set_track+0x4c/0x80 [ 35.181211] kasan_set_free_info+0x1f/0x40 [ 35.181942] ____kasan_slab_free+0x103/0x140 [ 35.182703] kfree+0xe3/0x250 [ 35.183239] igbvf_probe+0x1173/0x1a10 [igbvf] [ 35.184040] local_pci_probe+0x13c/0x1f0 Fixes: d4e0fe01a38a0 (igbvf: add new driver to support 82576 virtual functions) Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Letu Ren <fantasquex@gmail.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15igb: Fix removal of unicast MAC filters of VFsKaren Sornek
Move checking condition of VF MAC filter before clearing or adding MAC filter to VF to prevent potential blackout caused by removal of necessary and working VF's MAC filter. Fixes: 1b8b062a99dc ("igb: add VF trust infrastructure") Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15Merge tag 'ceph-for-5.16-rc6' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "An SGID directory handling fix (marked for stable), a metrics accounting fix and two fixups to appease static checkers" * tag 'ceph-for-5.16-rc6' of git://github.com/ceph/ceph-client: ceph: fix up non-directory creation in SGID directories ceph: initialize pathlen variable in reconnect_caps_cb ceph: initialize i_size variable in ceph_sync_read ceph: fix duplicate increment of opened_inodes metric
2021-12-15Merge tag 's390-5.16-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - Add missing handling of R_390_PLT32DBL relocation type in arch_kexec_apply_relocations_add(). Clang and the upcoming gcc 11.3 generate such relocation entries, which our relocation code silently ignores, and which finally will result in an endless loop within the purgatory code in case of kexec. - Add proper handling of errors and print error messages when applying relocations - Fix duplicate tracking of irq nesting level in entry code - Let recordmcount.pl also look for jgnop mnemonic. Starting with binutils 2.37 objdump emits a jgnop mnemonic instead of brcl, which breaks mcount location detection. This is only a problem if used with compilers older than gcc 9, since with gcc 9 and newer compilers recordmcount.pl is not used anymore. - Remove preempt_disable()/preempt_enable() pair in kprobe_ftrace_handler() which was done for all architectures except for s390. - Update defconfig * tag 's390-5.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: recordmcount.pl: look for jgnop instruction as well as bcrl on s390 s390/entry: fix duplicate tracking of irq nesting level s390: enable switchdev support in defconfig s390/kexec: handle R_390_PLT32DBL rela in arch_kexec_apply_relocations_add() s390/ftrace: remove preempt_disable()/preempt_enable() pair s390/kexec_file: fix error handling when applying relocations s390/kexec_file: print some more error messages
2021-12-15Merge tag 'hyperv-fixes-signed-20211214' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fix from Wei Liu: "Build fix from Randy Dunlap" * tag 'hyperv-fixes-signed-20211214' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hv: utils: add PTP_1588_CLOCK to Kconfig to fix build
2021-12-15audit: improve robustness of the audit queue handlingPaul Moore
If the audit daemon were ever to get stuck in a stopped state the kernel's kauditd_thread() could get blocked attempting to send audit records to the userspace audit daemon. With the kernel thread blocked it is possible that the audit queue could grow unbounded as certain audit record generating events must be exempt from the queue limits else the system enter a deadlock state. This patch resolves this problem by lowering the kernel thread's socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks the kauditd_send_queue() function to better manage the various audit queues when connection problems occur between the kernel and the audit daemon. With this patch, the backlog may temporarily grow beyond the defined limits when the audit daemon is stopped and the system is under heavy audit pressure, but kauditd_thread() will continue to make progress and drain the queues as it would for other connection problems. For example, with the audit daemon put into a stopped state and the system configured to audit every syscall it was still possible to shutdown the system without a kernel panic, deadlock, etc.; granted, the system was slow to shutdown but that is to be expected given the extreme pressure of recording every syscall. The timeout value of HZ/10 was chosen primarily through experimentation and this developer's "gut feeling". There is likely no one perfect value, but as this scenario is limited in scope (root privileges would be needed to send SIGSTOP to the audit daemon), it is likely not worth exposing this as a tunable at present. This can always be done at a later date if it proves necessary. Cc: stable@vger.kernel.org Fixes: 5b52330bbfe63 ("audit: fix auditd/kernel connection state tracking") Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com> Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-15soc/tegra: fuse: Fix bitwise vs. logical OR warningNathan Chancellor
A new warning in clang points out two instances where boolean expressions are being used with a bitwise OR instead of logical OR: drivers/soc/tegra/fuse/speedo-tegra20.c:72:9: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical] reg = tegra_fuse_read_spare(i) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ || drivers/soc/tegra/fuse/speedo-tegra20.c:72:9: note: cast one or both operands to int to silence this warning drivers/soc/tegra/fuse/speedo-tegra20.c:87:9: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical] reg = tegra_fuse_read_spare(i) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ || drivers/soc/tegra/fuse/speedo-tegra20.c:87:9: note: cast one or both operands to int to silence this warning 2 warnings generated. The motivation for the warning is that logical operations short circuit while bitwise operations do not. In this instance, tegra_fuse_read_spare() is not semantically returning a boolean, it is returning a bit value. Use u32 for its return type so that it can be used with either bitwise or boolean operators without any warnings. Fixes: 25cd5a391478 ("ARM: tegra: Add speedo-based process identification") Link: https://github.com/ClangBuiltLinux/linux/issues/1488 Suggested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-12-15Merge tag 'wireless-drivers-2021-12-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.16 Second set of fixes for v5.16, hopefully also the last one. I changed my email in MAINTAINERS, one crash fix in iwlwifi and some build problems fixed. iwlwifi * fix crash caused by a warning * fix LED linking problem brcmsmac * rework LED dependencies for being consistent with other drivers mt76 * mt7921: fix build regression ==================== Signed-off-by: David S. Miller <davem@davemloft.net>