summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-06-25netlink: specs: nfsd: replace underscores with dashes in namesJakub Kicinski
We're trying to add a strict regexp for the name format in the spec. Underscores will not be allowed, dashes should be used instead. This makes no difference to C (codegen, if used, replaces special chars in names) but it gives more uniform naming in Python. Fixes: 13727f85b49b ("NFSD: introduce netlink stubs") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20250624211002.3475021-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-25net: enetc: Correct endianness handling in _enetc_rd_reg64Simon Horman
enetc_hw.h provides two versions of _enetc_rd_reg64. One which simply calls ioread64() when available. And another that composes the 64-bit result from ioread32() calls. In the second case the code appears to assume that each ioread32() call returns a little-endian value. However both the shift and logical or used to compose the return value would not work correctly on big endian systems if this were the case. Moreover, this is inconsistent with the first case where the return value of ioread64() is assumed to be in host byte order. It appears that the correct approach is for both versions to treat the return value of ioread*() functions as being in host byte order. And this patch corrects the ioread32()-based version to do so. This is a bug but would only manifest on big endian systems that make use of the ioread32-based implementation of _enetc_rd_reg64. While all in-tree users of this driver are little endian and make use of the ioread64-based implementation of _enetc_rd_reg64. Thus, no in-tree user of this driver is affected by this bug. Flagged by Sparse. Compile tested only. Fixes: 16eb4c85c964 ("enetc: Add ethtool statistics") Closes: https://lore.kernel.org/all/AM9PR04MB850500D3FC24FE23DEFCEA158879A@AM9PR04MB8505.eurprd04.prod.outlook.com/ Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250624-etnetc-le-v1-1-a73a95d96e4e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-25atm: idt77252: Add missing `dma_map_error()`Thomas Fourier
The DMA map functions can fail and should be tested for errors. Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250624064148.12815-3-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-25selftests/bpf: adapt one more case in test_lru_map to the new target_freeWillem de Bruijn
The below commit that updated BPF_MAP_TYPE_LRU_HASH free target, also updated tools/testing/selftests/bpf/test_lru_map to match. But that missed one case that passes with 4 cores, but fails at higher cpu counts. Update test_lru_sanity3 to also adjust its expectation of target_free. This time tested with 1, 4, 16, 64 and 384 cpu count. Fixes: d4adf1c9ee77 ("bpf: Adjust free target to avoid global starvation of LRU map") Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20250625210412.2732970-1-willemdebruijn.kernel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25bpf: add btf_type_is_i{32,64} helpersAnton Protopopov
There are places in BPF code which check if a BTF type is an integer of particular size. This code can be made simpler by using helpers. Add new btf_type_is_i{32,64} helpers, and simplify code in a few files. (Suggested by Eduard for a patch which copy-pasted such a check [1].) v1 -> v2: * export less generic helpers (Eduard) * make subject less generic than in [v1] (Eduard) [1] https://lore.kernel.org/bpf/7edb47e73baa46705119a23c6bf4af26517a640f.camel@gmail.com/ [v1] https://lore.kernel.org/bpf/20250624193655.733050-1-a.s.protopopov@gmail.com/ Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250625151621.1000584-1-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25Merge branch 'bpf-allow-void-cast-using-bpf_rdonly_cast'Alexei Starovoitov
Eduard Zingerman says: ==================== bpf: allow void* cast using bpf_rdonly_cast() Currently, pointers returned by `bpf_rdonly_cast()` have a type of "pointer to btf id", and only casts to structure types are allowed. Access to memory pointed to by these pointers is done through `BPF_PROBE_{MEM,MEMSX}` instructions and does not produce errors on invalid memory access. This patch set extends `bpf_rdonly_cast()` to allow casts to an equivalent of 'void *', effectively replacing `bpf_probe_read_kernel()` calls in situations where access to individual bytes or integers is necessary. The mechanism was suggested and explored by Andrii Nakryiko in [1]. To help with detecting support for this feature, an `enum bpf_features` is added with intended usage as follows: if (bpf_core_enum_value_exists(enum bpf_features, BPF_FEAT_RDONLY_CAST_TO_VOID)) ... [1] https://github.com/anakryiko/linux/tree/bpf-mem-cast Changelog: v2: https://lore.kernel.org/bpf/20250625000520.2700423-1-eddyz87@gmail.com/ v2 -> v3: - dropped direct numbering for __MAX_BPF_FEAT. v1: https://lore.kernel.org/bpf/20250624191009.902874-1-eddyz87@gmail.com/ v1 -> v2: - renamed BPF_FEAT_TOTAL to __MAX_BPF_FEAT and moved patch introducing bpf_features enum to the start of the series (Alexei); - dropped patch #3 allowing optout from CAP_SYS_ADMIN drop in prog_tests/verifier.c, use a separate runner in prog_tests/* instead. ==================== Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://patch.msgid.link/20250625182414.30659-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25selftests/bpf: check operations on untrusted ro pointers to memEduard Zingerman
The following cases are tested: - it is ok to load memory at any offset from rdonly_untrusted_mem; - rdonly_untrusted_mem offset/bounds are not tracked; - writes into rdonly_untrusted_mem are forbidden; - atomic operations on rdonly_untrusted_mem are forbidden; - rdonly_untrusted_mem can't be passed as a memory argument of a helper of kfunc; - it is ok to use PTR_TO_MEM and PTR_TO_BTF_ID in a same load instruction. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250625182414.30659-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25bpf: allow void* cast using bpf_rdonly_cast()Eduard Zingerman
Introduce support for `bpf_rdonly_cast(v, 0)`, which casts the value `v` to an untyped, untrusted pointer, logically similar to a `void *`. The memory pointed to by such a pointer is treated as read-only. As with other untrusted pointers, memory access violations on loads return zero instead of causing a fault. Technically: - The resulting pointer is represented as a register of type `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED` with size zero. - Offsets within such pointers are not tracked. - Same load instructions are allowed to have both `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED` and `PTR_TO_BTF_ID` as the base pointer types. In such cases, `bpf_insn_aux_data->ptr_type` is considered the weaker of the two: `PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED`. The following constraints apply to the new pointer type: - can be used as a base for LDX instructions; - can't be used as a base for ST/STX or atomic instructions; - can't be used as parameter for kfuncs or helpers. These constraints are enforced by existing handling of `MEM_RDONLY` flag and `PTR_TO_MEM` of size zero. Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250625182414.30659-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25bpf: add bpf_features enumEduard Zingerman
This commit adds a kernel side enum for use in conjucntion with BTF CO-RE bpf_core_enum_value_exists. The goal of the enum is to assist with available BPF features detection. Intended usage looks as follows: if (bpf_core_enum_value_exists(enum bpf_features, BPF_FEAT_<f>)) ... use feature f ... Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250625182414.30659-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25Merge branch 'range-tracking-for-bpf_neg'Alexei Starovoitov
Song Liu says: ==================== Add range tracking for BPF_NEG. Please see commit log of 1/2 for more details. --- Changes v3 => v4: 1. Fix selftest verifier_value_ptr_arith.c. (Eduard) v3: https://lore.kernel.org/bpf/20250624233328.313573-1-song@kernel.org/ Changes v2 => v3: 1. Minor changes in the selftests. (Eduard) v2: https://lore.kernel.org/bpf/20250624220038.656646-1-song@kernel.org/ Changes v1 => v2: 1. Split new selftests to a separate patch. (Eduard) 2. Reset reg id on BPF_NEG. (Eduard) 3. Use env->fake_reg instead of a bpf_reg_state on the stack. (Eduard) 4. Add __msg for passing selftests. v1: https://lore.kernel.org/bpf/20250624172320.2923031-1-song@kernel.org/ ==================== Link: https://patch.msgid.link/20250625164025.3310203-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25selftests/bpf: Add tests for BPF_NEG range tracking logicSong Liu
BPF_REG now has range tracking logic. Add selftests for BPF_NEG. Specifically, return value of LSM hook lsm.s/socket_connect is used to show that the verifer tracks BPF_NEG(1) falls in the [-4095, 0] range; while BPF_NEG(100000) does not fall in that range. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250625164025.3310203-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25bpf: Add range tracking for BPF_NEGSong Liu
Add range tracking for instruction BPF_NEG. Without this logic, a trivial program like the following will fail volatile bool found_value_b; SEC("lsm.s/socket_connect") int BPF_PROG(test_socket_connect) { if (!found_value_b) return -1; return 0; } with verifier log: "At program exit the register R0 has smin=0 smax=4294967295 should have been in [-4095, 0]". This is because range information is lost in BPF_NEG: 0: R1=ctx() R10=fp0 ; if (!found_value_b) @ xxxx.c:24 0: (18) r1 = 0xffa00000011e7048 ; R1_w=map_value(...) 2: (71) r0 = *(u8 *)(r1 +0) ; R0_w=scalar(smin32=0,smax=255) 3: (a4) w0 ^= 1 ; R0_w=scalar(smin32=0,smax=255) 4: (84) w0 = -w0 ; R0_w=scalar(range info lost) Note that, the log above is manually modified to highlight relevant bits. Fix this by maintaining proper range information with BPF_NEG, so that the verifier will know: 4: (84) w0 = -w0 ; R0_w=scalar(smin32=-255,smax=0) Also updated selftests based on the expected behavior. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250625164025.3310203-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-25selftests/bpf: Fix usdt multispec failure with arm64/clang20 selftest buildYonghong Song
When building the selftest with arm64/clang20, the following test failed: ... ubtest_multispec_usdt:PASS:usdt_100_called 0 nsec subtest_multispec_usdt:PASS:usdt_100_sum 0 nsec subtest_multispec_usdt:FAIL:usdt_300_bad_attach unexpected pointer: 0xaaaad82a2a80 #471/2 usdt/multispec:FAIL #471 usdt:FAIL But arm64/gcc11 built kernel selftests succeeded. Further debug found arm64/clang generated code has much less argument pattern after dedup, but gcc generated code has a lot more. Check usdt probes with usdt.test.o on arm64 platform: with gcc11 build binary: stapsdt 0x0000002e NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x00000000000054f8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[sp] stapsdt 0x00000031 NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x0000000000005510, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[sp, 4] ... stapsdt 0x00000032 NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x0000000000005660, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[sp, 60] ... stapsdt 0x00000034 NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x00000000000070e8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[sp, 1192] stapsdt 0x00000034 NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x0000000000007100, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[sp, 1196] ... stapsdt 0x00000032 NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x0000000000009ec4, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[sp, 60] with clang20 build binary: stapsdt 0x0000002e NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x00000000000009a0, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[x9] stapsdt 0x0000002e NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x00000000000009b8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[x9] ... stapsdt 0x0000002e NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x0000000000002590, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[x9] stapsdt 0x0000002e NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x00000000000025a8, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[x8] ... stapsdt 0x0000002f NT_STAPSDT (SystemTap probe descriptors) Provider: test Name: usdt_300 Location: 0x0000000000007fdc, Base: 0x0000000000000000, Semaphore: 0x0000000000000008 Arguments: -4@[x10] There are total 300 locations for usdt_300. For gcc11 built binary, there are 300 spec's. But for clang20 built binary, there are 3 spec's. The default BPF_USDT_MAX_SPEC_CNT is 256, so bpf_program__attach_usdt() will fail for gcc but it will succeed with clang. To fix the problem, do not do bpf_program__attach_usdt() for usdt_300 with arm64/clang setup. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250624211802.2198821-1-yonghong.song@linux.dev
2025-06-25libbpf: Fix possible use-after-free for externsAdin Scannell
The `name` field in `obj->externs` points into the BTF data at initial open time. However, some functions may invalidate this after opening and before loading (e.g. `bpf_map__set_value_size`), which results in pointers into freed memory and undefined behavior. The simplest solution is to simply `strdup` these strings, similar to the `essent_name`, and free them at the same time. In order to test this path, the `global_map_resize` BPF selftest is modified slightly to ensure the presence of an extern, which causes this test to fail prior to the fix. Given there isn't an obvious API or error to test against, I opted to add this to the existing test as an aspect of the resizing feature rather than duplicate the test. Fixes: 9d0a23313b1a ("libbpf: Add capability for resizing datasec maps") Signed-off-by: Adin Scannell <amscanne@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250625050215.2777374-1-amscanne@meta.com
2025-06-25Merge tag 'spi-fix-v6.16-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "One fix for a runtime PM underflow when removing the Cadence QuadSPI driver" * tag 'spi-fix-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-cadence-quadspi: Fix pm runtime unbalance
2025-06-25Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Fixes all in drivers. ufs and megaraid_sas are small and obvious. The large diffstat in fnic comes from two pieces: the addition of quite a bit of logging (no change to function) and the reworking of the timeout allocation path for the two conditions that can occur simultaneously to prevent reusing the same abort frame and then both trying to free it" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: fnic: Fix missing DMA mapping error in fnic_send_frame() scsi: fnic: Set appropriate logging level for log message scsi: fnic: Add and improve logs in FDMI and FDMI ABTS paths scsi: fnic: Turn off FDMI ACTIVE flags on link down scsi: fnic: Fix crash in fnic_wq_cmpl_handler when FDMI times out scsi: ufs: core: Fix clk scaling to be conditional in reset and restore scsi: megaraid_sas: Fix invalid node index
2025-06-25Merge tag 'uml-for-6.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML fixes from Johannes Berg: - fix FP registers in seccomp mode - prevent duplicate devices in VFIO support - don't ignore errors in UBD thread start - reduce stack use with clang 19 * tag 'uml-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: vector: Reduce stack usage in vector_eth_configure() um: Use correct data source in fpregs_legacy_set() um: vfio: Prevent duplicate device assignments um: ubd: Add missing error check in start_io_thread()
2025-06-25Merge tag 'wireless-2025-06-25' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few fixes: - iwlegacy: work around large stack with clang/kasan - mac80211: fix integer overflow - mac80211: fix link struct init vs. RCU publish - iwlwifi: fix warning on IFF_UP * tag 'wireless-2025-06-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: finish link init before RCU publish wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd version wifi: mac80211: fix beacon interval calculation overflow wifi: iwlegacy: work around excessive stack usage on clang/kasan ==================== Link: https://patch.msgid.link/20250625115433.41381-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-25Merge tag 'iwlwifi-fixes-2025-06-25' of ↵Johannes Berg
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Miri Korenblit says: ==================== iwlwifi-fixes: fix failure in interface up ==================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25um: vector: Reduce stack usage in vector_eth_configure()Tiwei Bie
When compiling with clang (19.1.7), initializing *vp using a compound literal may result in excessive stack usage. Fix it by initializing the required fields of *vp individually. Without this patch: $ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0 ... 0x0000000000000540 vector_eth_configure [vector_kern.o]:1472 ... With this patch: $ objdump -d arch/um/drivers/vector_kern.o | ./scripts/checkstack.pl x86_64 0 ... 0x0000000000000540 vector_eth_configure [vector_kern.o]:208 ... Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506221017.WtB7Usua-lkp@intel.com/ Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250623110829.314864-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25um: Use correct data source in fpregs_legacy_set()Tiwei Bie
Read from the buffer pointed to by 'from' instead of '&buf', as 'buf' contains no valid data when 'ubuf' is NULL. Fixes: b1e1bd2e6943 ("um: Add helper functions to get/set state for SECCOMP") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250606124428.148164-5-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25um: vfio: Prevent duplicate device assignmentsTiwei Bie
Ensure devices are assigned only once. Reject subsequent requests for duplicate assignments. Fixes: a0e2cb6a9063 ("um: Add VFIO-based virtual PCI driver") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250606124428.148164-4-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-25um: ubd: Add missing error check in start_io_thread()Tiwei Bie
The subsequent call to os_set_fd_block() overwrites the previous return value. OR the two return values together to fix it. Fixes: f88f0bdfc32f ("um: UBD Improvements") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250606124428.148164-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-24Merge branch 'bpf-verifier-improve-precision-of-bpf_add-and-bpf_sub'Alexei Starovoitov
Harishankar Vishwanathan says: ==================== bpf, verifier: Improve precision of BPF_ADD and BPF_SUB This patchset improves the precision of BPF_ADD and BPF_SUB range tracking. It also adds selftests that exercise the cases where precision improvement occurs, and selftests for the cases where precise bounds cannot be computed and the output register state values are set to unbounded. Changelog: v3: * Improve readability in selftests and commit message by using more readable constants (suggested by Eduard Zingerman). * Add four new selftests for the cases where precise output register state bounds cannot be computed in scalar(32)_min_max_add/sub, so the output register state must be set to unbounded, i.e., [0, U64_MAX] or [0, U32_MAX]. * Add suggested-by Eduard tag to commit message for changes to verifier_bounds.c v2: * Add clearer example of precision improvement in the commit message for verifier.c changes. * Add selftests that exercise the precision improvement to verifier_bounds.c (suggested by Eduard Zingerman). v1: https://lore.kernel.org/bpf/20250610221356.2663491-1-harishankar.vishwanathan@gmail.com/ ==================== Link: https://patch.msgid.link/20250623040359.343235-1-harishankar.vishwanathan@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-24selftests/bpf: Add testcases for BPF_ADD and BPF_SUBHarishankar Vishwanathan
The previous commit improves the precision in scalar(32)_min_max_add, and scalar(32)_min_max_sub. The improvement in precision occurs in cases when all outcomes overflow or underflow, respectively. This commit adds selftests that exercise those cases. This commit also adds selftests for cases where the output register state bounds for u(32)_min/u(32)_max are conservatively set to unbounded (when there is partial overflow or underflow). Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu> Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu> Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250623040359.343235-3-harishankar.vishwanathan@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-24bpf, verifier: Improve precision for BPF_ADD and BPF_SUBHarishankar Vishwanathan
This patch improves the precison of the scalar(32)_min_max_add and scalar(32)_min_max_sub functions, which update the u(32)min/u(32)_max ranges for the BPF_ADD and BPF_SUB instructions. We discovered this more precise operator using a technique we are developing for automatically synthesizing functions for updating tnums and ranges. According to the BPF ISA [1], "Underflow and overflow are allowed during arithmetic operations, meaning the 64-bit or 32-bit value will wrap". Our patch leverages the wrap-around semantics of unsigned overflow and underflow to improve precision. Below is an example of our patch for scalar_min_max_add; the idea is analogous for all four functions. There are three cases to consider when adding two u64 ranges [dst_umin, dst_umax] and [src_umin, src_umax]. Consider a value x in the range [dst_umin, dst_umax] and another value y in the range [src_umin, src_umax]. (a) No overflow: No addition x + y overflows. This occurs when even the largest possible sum, i.e., dst_umax + src_umax <= U64_MAX. (b) Partial overflow: Some additions x + y overflow. This occurs when the largest possible sum overflows (dst_umax + src_umax > U64_MAX), but the smallest possible sum does not overflow (dst_umin + src_umin <= U64_MAX). (c) Full overflow: All additions x + y overflow. This occurs when both the smallest possible sum and the largest possible sum overflow, i.e., both (dst_umin + src_umin) and (dst_umax + src_umax) are > U64_MAX. The current implementation conservatively sets the output bounds to unbounded, i.e, [umin=0, umax=U64_MAX], whenever there is *any* possibility of overflow, i.e, in cases (b) and (c). Otherwise it computes tight bounds as [dst_umin + src_umin, dst_umax + src_umax]: if (check_add_overflow(*dst_umin, src_reg->umin_value, dst_umin) || check_add_overflow(*dst_umax, src_reg->umax_value, dst_umax)) { *dst_umin = 0; *dst_umax = U64_MAX; } Our synthesis-based technique discovered a more precise operator. Particularly, in case (c), all possible additions x + y overflow and wrap around according to eBPF semantics, and the computation of the output range as [dst_umin + src_umin, dst_umax + src_umax] continues to work. Only in case (b), do we need to set the output bounds to unbounded, i.e., [0, U64_MAX]. Case (b) can be checked by seeing if the minimum possible sum does *not* overflow and the maximum possible sum *does* overflow, and when that happens, we set the output to unbounded: min_overflow = check_add_overflow(*dst_umin, src_reg->umin_value, dst_umin); max_overflow = check_add_overflow(*dst_umax, src_reg->umax_value, dst_umax); if (!min_overflow && max_overflow) { *dst_umin = 0; *dst_umax = U64_MAX; } Below is an example eBPF program and the corresponding log from the verifier. The current implementation of scalar_min_max_add() sets r3's bounds to [0, U64_MAX] at instruction 5: (0f) r3 += r3, due to conservative overflow handling. 0: R1=ctx() R10=fp0 0: (b7) r4 = 0 ; R4_w=0 1: (87) r4 = -r4 ; R4_w=scalar() 2: (18) r3 = 0xa000000000000000 ; R3_w=0xa000000000000000 4: (4f) r3 |= r4 ; R3_w=scalar(smin=0xa000000000000000,smax=-1,umin=0xa000000000000000,var_off=(0xa000000000000000; 0x5fffffffffffffff)) R4_w=scalar() 5: (0f) r3 += r3 ; R3_w=scalar() 6: (b7) r0 = 1 ; R0_w=1 7: (95) exit With our patch, r3's bounds after instruction 5 are set to a much more precise [0x4000000000000000,0xfffffffffffffffe]. ... 5: (0f) r3 += r3 ; R3_w=scalar(umin=0x4000000000000000,umax=0xfffffffffffffffe) 6: (b7) r0 = 1 ; R0_w=1 7: (95) exit The logic for scalar32_min_max_add is analogous. For the scalar(32)_min_max_sub functions, the reasoning is similar but applied to detecting underflow instead of overflow. We verified the correctness of the new implementations using Agni [3,4]. We since also discovered that a similar technique has been used to calculate output ranges for unsigned interval addition and subtraction in Hacker's Delight [2]. [1] https://docs.kernel.org/bpf/standardization/instruction-set.html [2] Hacker's Delight Ch.4-2, Propagating Bounds through Add’s and Subtract’s [3] https://github.com/bpfverif/agni [4] https://people.cs.rutgers.edu/~sn349/papers/sas24-preprint.pdf Co-developed-by: Matan Shachnai <m.shachnai@rutgers.edu> Signed-off-by: Matan Shachnai <m.shachnai@rutgers.edu> Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250623040359.343235-2-harishankar.vishwanathan@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-24bnxt: properly flush XDP redirect listsYan Zhai
We encountered following crash when testing a XDP_REDIRECT feature in production: [56251.579676] list_add corruption. next->prev should be prev (ffff93120dd40f30), but was ffffb301ef3a6740. (next=ffff93120dd 40f30). [56251.601413] ------------[ cut here ]------------ [56251.611357] kernel BUG at lib/list_debug.c:29! [56251.621082] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [56251.632073] CPU: 111 UID: 0 PID: 0 Comm: swapper/111 Kdump: loaded Tainted: P O 6.12.33-cloudflare-2025.6. 3 #1 [56251.653155] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE [56251.663877] Hardware name: MiTAC GC68B-B8032-G11P6-GPU/S8032GM-HE-CFR, BIOS V7.020.B10-sig 01/22/2025 [56251.682626] RIP: 0010:__list_add_valid_or_report+0x4b/0xa0 [56251.693203] Code: 0e 48 c7 c7 68 e7 d9 97 e8 42 16 fe ff 0f 0b 48 8b 52 08 48 39 c2 74 14 48 89 f1 48 c7 c7 90 e7 d9 97 48 89 c6 e8 25 16 fe ff <0f> 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 e8 e7 d9 97 4c 89 [56251.725811] RSP: 0018:ffff93120dd40b80 EFLAGS: 00010246 [56251.736094] RAX: 0000000000000075 RBX: ffffb301e6bba9d8 RCX: 0000000000000000 [56251.748260] RDX: 0000000000000000 RSI: ffff9149afda0b80 RDI: ffff9149afda0b80 [56251.760349] RBP: ffff9131e49c8000 R08: 0000000000000000 R09: ffff93120dd40a18 [56251.772382] R10: ffff9159cf2ce1a8 R11: 0000000000000003 R12: ffff911a80850000 [56251.784364] R13: ffff93120fbc7000 R14: 0000000000000010 R15: ffff9139e7510e40 [56251.796278] FS: 0000000000000000(0000) GS:ffff9149afd80000(0000) knlGS:0000000000000000 [56251.809133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [56251.819561] CR2: 00007f5e85e6f300 CR3: 00000038b85e2006 CR4: 0000000000770ef0 [56251.831365] PKRU: 55555554 [56251.838653] Call Trace: [56251.845560] <IRQ> [56251.851943] cpu_map_enqueue.cold+0x5/0xa [56251.860243] xdp_do_redirect+0x2d9/0x480 [56251.868388] bnxt_rx_xdp+0x1d8/0x4c0 [bnxt_en] [56251.877028] bnxt_rx_pkt+0x5f7/0x19b0 [bnxt_en] [56251.885665] ? cpu_max_write+0x1e/0x100 [56251.893510] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.902276] __bnxt_poll_work+0x190/0x340 [bnxt_en] [56251.911058] bnxt_poll+0xab/0x1b0 [bnxt_en] [56251.919041] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.927568] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.935958] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.944250] __napi_poll+0x2b/0x160 [56251.951155] bpf_trampoline_6442548651+0x79/0x123 [56251.959262] __napi_poll+0x5/0x160 [56251.966037] net_rx_action+0x3d2/0x880 [56251.973133] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.981265] ? srso_alias_return_thunk+0x5/0xfbef5 [56251.989262] ? __hrtimer_run_queues+0x162/0x2a0 [56251.996967] ? srso_alias_return_thunk+0x5/0xfbef5 [56252.004875] ? srso_alias_return_thunk+0x5/0xfbef5 [56252.012673] ? bnxt_msix+0x62/0x70 [bnxt_en] [56252.019903] handle_softirqs+0xcf/0x270 [56252.026650] irq_exit_rcu+0x67/0x90 [56252.032933] common_interrupt+0x85/0xa0 [56252.039498] </IRQ> [56252.044246] <TASK> [56252.048935] asm_common_interrupt+0x26/0x40 [56252.055727] RIP: 0010:cpuidle_enter_state+0xb8/0x420 [56252.063305] Code: dc 01 00 00 e8 f9 79 3b ff e8 64 f7 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 a5 32 3a ff 45 84 ff 0f 85 ae 01 00 00 fb 45 85 f6 <0f> 88 88 01 00 00 48 8b 04 24 49 63 ce 4c 89 ea 48 6b f1 68 48 29 [56252.088911] RSP: 0018:ffff93120c97fe98 EFLAGS: 00000202 [56252.096912] RAX: ffff9149afd80000 RBX: ffff9141d3a72800 RCX: 0000000000000000 [56252.106844] RDX: 00003329176c6b98 RSI: ffffffe36db3fdc7 RDI: 0000000000000000 [56252.116733] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000000000004e [56252.126652] R10: ffff9149afdb30c4 R11: 071c71c71c71c71c R12: ffffffff985ff860 [56252.136637] R13: 00003329176c6b98 R14: 0000000000000002 R15: 0000000000000000 [56252.146667] ? cpuidle_enter_state+0xab/0x420 [56252.153909] cpuidle_enter+0x2d/0x40 [56252.160360] do_idle+0x176/0x1c0 [56252.166456] cpu_startup_entry+0x29/0x30 [56252.173248] start_secondary+0xf7/0x100 [56252.179941] common_startup_64+0x13e/0x141 [56252.186886] </TASK> From the crash dump, we found that the cpu_map_flush_list inside redirect info is partially corrupted: its list_head->next points to itself, but list_head->prev points to a valid list of unflushed bq entries. This turned out to be a result of missed XDP flush on redirect lists. By digging in the actual source code, we found that commit 7f0a168b0441 ("bnxt_en: Add completion ring pointer in TX and RX ring structures") incorrectly overwrites the event mask for XDP_REDIRECT in bnxt_rx_xdp. We can stably reproduce this crash by returning XDP_TX and XDP_REDIRECT randomly for incoming packets in a naive XDP program. Properly propagate the XDP_REDIRECT events back fixes the crash. Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers") Tested-by: Andrew Rzeznik <arzeznik@cloudflare.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Link: https://patch.msgid.link/aFl7jpCNzscumuN2@debian.debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-24Merge tag 'selinux-pr-20250624' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "Another small SELinux patch to fix a problem seen by the dracut-ng folks during early boot when SELinux is enabled, but the policy has yet to be loaded" * tag 'selinux-pr-20250624' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: change security_compute_sid to return the ssid or tsid on match
2025-06-24vsock/uapi: fix linux/vm_sockets.h userspace compilation errorsStefano Garzarella
If a userspace application just include <linux/vm_sockets.h> will fail to build with the following errors: /usr/include/linux/vm_sockets.h:182:39: error: invalid application of ‘sizeof’ to incomplete type ‘struct sockaddr’ 182 | unsigned char svm_zero[sizeof(struct sockaddr) - | ^~~~~~ /usr/include/linux/vm_sockets.h:183:39: error: ‘sa_family_t’ undeclared here (not in a function) 183 | sizeof(sa_family_t) - | Include <sys/socket.h> for userspace (guarded by ifndef __KERNEL__) where `struct sockaddr` and `sa_family_t` are defined. We already do something similar in <linux/mptcp.h> and <linux/if.h>. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250623100053.40979-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-24spi: spi-cadence-quadspi: Fix pm runtime unbalanceKhairul Anuar Romli
Having PM put sync in remove function is causing PM underflow during remove operation. This is caused by the function, runtime_pm_get_sync, not being called anywhere during the op. Ensure that calls to pm_runtime_enable()/pm_runtime_disable() and pm_runtime_get_sync()/pm_runtime_put_sync() match. echo 108d2000.spi > /sys/bus/platform/drivers/cadence-qspi/unbind [ 49.644256] Deleting MTD partitions on "108d2000.spi.0": [ 49.649575] Deleting u-boot MTD partition [ 49.684087] Deleting root MTD partition [ 49.724188] cadence-qspi 108d2000.spi: Runtime PM usage count underflow! Continuous bind/unbind will result in an "Unbalanced pm_runtime_enable" error. Subsequent unbind attempts will return a "No such device" error, while bind attempts will return a "Resource temporarily unavailable" error. [ 47.592434] cadence-qspi 108d2000.spi: Runtime PM usage count underflow! [ 49.592233] cadence-qspi 108d2000.spi: detected FIFO depth (1024) different from config (128) [ 53.232309] cadence-qspi 108d2000.spi: Runtime PM usage count underflow! [ 55.828550] cadence-qspi 108d2000.spi: detected FIFO depth (1024) different from config (128) [ 57.940627] cadence-qspi 108d2000.spi: Runtime PM usage count underflow! [ 59.912490] cadence-qspi 108d2000.spi: detected FIFO depth (1024) different from config (128) [ 61.876243] cadence-qspi 108d2000.spi: Runtime PM usage count underflow! [ 61.883000] platform 108d2000.spi: Unbalanced pm_runtime_enable! [ 532.012270] cadence-qspi 108d2000.spi: probe with driver cadence-qspi failed1 Also, change clk_disable_unprepare() to clk_disable() since continuous bind and unbind operations will trigger a warning indicating that the clock is already unprepared. Fixes: 4892b374c9b7 ("mtd: spi-nor: cadence-quadspi: Add runtime PM support") cc: stable@vger.kernel.org # 6.6+ Signed-off-by: Khairul Anuar Romli <khairul.anuar.romli@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Link: https://patch.msgid.link/4e7a4b8aba300e629b45a04f90bddf665fbdb335.1749601877.git.khairul.anuar.romli@altera.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-06-24userns and mnt_idmap leak in open_tree_attr(2)Al Viro
Once want_mount_setattr() has returned a positive, it does require finish_mount_kattr() to release ->mnt_userns. Failing do_mount_setattr() does not change that. As the result, we can end up leaking userns and possibly mnt_idmap as well. Fixes: c4a16820d901 ("fs: add open_tree_attr()") Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-24wifi: mac80211: finish link init before RCU publishJohannes Berg
Since the link/conf pointers can be accessed without any protection other than RCU, make sure the data is actually set up before publishing the structures. Fixes: b2e8434f1829 ("wifi: mac80211: set up/tear down client vif links properly") Link: https://patch.msgid.link/20250624130749.9a308b713c74.I4a80f5eead112a38730939ea591d2e275c721256@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-24Merge tag 'for-net-2025-06-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - L2CAP: Fix L2CAP MTU negotiation - hci_core: Fix use-after-free in vhci_flush() - btintel_pcie: Fix potential race condition in firmware download - hci_qca: fix unable to load the BT driver * tag 'for-net-2025-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_core: Fix use-after-free in vhci_flush() driver: bluetooth: hci_qca:fix unable to load the BT driver Bluetooth: L2CAP: Fix L2CAP MTU negotiation Bluetooth: btintel_pcie: Fix potential race condition in firmware download ==================== Link: https://patch.msgid.link/20250623165405.227619-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-24wifi: iwlwifi: mvm: assume '1' as the default mac_config_cmd versionMiri Korenblit
Unfortunately, FWs of some devices don't have the version of the iwl_mac_config_cmd defined in the TLVs. We send 0 as the 'def argument to iwl_fw_lookup_cmd_ver, so for such FWs, the return value will be 0, leading to a warning, and to not sending the command. Fix this by assuming that the default version is 1. Fixes: 83f3ac2848b4 ("wifi: iwlwifi: Fix incorrect logic on cmd_ver range checking") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250624071427.2662621-1-miriam.rachel.korenblit@intel.com
2025-06-24Merge branch 'af_unix-fix-two-oob-issues'Paolo Abeni
Kuniyuki Iwashima says: ==================== af_unix: Fix two OOB issues. From: Kuniyuki Iwashima <kuniyu@google.com> Recently, two issues are reported regarding MSG_OOB. Patch 1 fixes issues that happen when multiple consumed OOB skbs are placed consecutively in the recv queue. Patch 2 fixes an inconsistent behaviour that close()ing a socket with a consumed OOB skb at the head of the recv queue triggers -ECONNRESET on the peer's recv(). v1: https://lore.kernel.org/netdev/20250618043453.281247-1-kuni1840@gmail.com/ ==================== Link: https://patch.msgid.link/20250619041457.1132791-1-kuni1840@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-24selftest: af_unix: Add tests for -ECONNRESET.Kuniyuki Iwashima
A new function resetpair() calls close() for the receiver and checks the return value from recv() on the initial sender side. Now resetpair() is added to each test case and some additional test cases. Note that TCP sets -ECONNRESET to the consumed OOB, but we have decided not to touch TCP MSG_OOB code in the past. Before: # RUN msg_oob.no_peek.ex_oob_ex_oob ... # msg_oob.c:236:ex_oob_ex_oob:AF_UNIX :Connection reset by peer # msg_oob.c:237:ex_oob_ex_oob:Expected: # msg_oob.c:239:ex_oob_ex_oob:Expected ret[0] (-1) == expected_len (0) # ex_oob_ex_oob: Test terminated by assertion # FAIL msg_oob.no_peek.ex_oob_ex_oob not ok 14 msg_oob.no_peek.ex_oob_ex_oob ... # FAILED: 36 / 48 tests passed. # Totals: pass:36 fail:12 xfail:0 xpass:0 skip:0 error:0 After: # RUN msg_oob.no_peek.ex_oob_ex_oob ... # msg_oob.c:244:ex_oob_ex_oob:AF_UNIX : # msg_oob.c:245:ex_oob_ex_oob:TCP :Connection reset by peer # OK msg_oob.no_peek.ex_oob_ex_oob ok 14 msg_oob.no_peek.ex_oob_ex_oob ... # PASSED: 48 / 48 tests passed. # Totals: pass:48 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250619041457.1132791-5-kuni1840@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-24af_unix: Don't set -ECONNRESET for consumed OOB skb.Kuniyuki Iwashima
Christian Brauner reported that even after MSG_OOB data is consumed, calling close() on the receiver socket causes the peer's recv() to return -ECONNRESET: 1. send() and recv() an OOB data. >>> from socket import * >>> s1, s2 = socketpair(AF_UNIX, SOCK_STREAM) >>> s1.send(b'x', MSG_OOB) 1 >>> s2.recv(1, MSG_OOB) b'x' 2. close() for s2 sets ECONNRESET to s1->sk_err even though s2 consumed the OOB data >>> s2.close() >>> s1.recv(10, MSG_DONTWAIT) ... ConnectionResetError: [Errno 104] Connection reset by peer Even after being consumed, the skb holding the OOB 1-byte data stays in the recv queue to mark the OOB boundary and break recv() at that point. This must be considered while close()ing a socket. Let's skip the leading consumed OOB skb while checking the -ECONNRESET condition in unix_release_sock(). Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: Christian Brauner <brauner@kernel.org> Closes: https://lore.kernel.org/netdev/20250529-sinkt-abfeuern-e7b08200c6b0@brauner/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://patch.msgid.link/20250619041457.1132791-4-kuni1840@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-24af_unix: Add test for consecutive consumed OOB.Kuniyuki Iwashima
Let's add a test case where consecutive concumed OOB skbs stay at the head of the queue. Without the previous patch, ioctl(SIOCATMARK) assertion fails. Before: # RUN msg_oob.no_peek.ex_oob_ex_oob_oob ... # msg_oob.c:305:ex_oob_ex_oob_oob:Expected answ[0] (0) == oob_head (1) # ex_oob_ex_oob_oob: Test terminated by assertion # FAIL msg_oob.no_peek.ex_oob_ex_oob_oob not ok 12 msg_oob.no_peek.ex_oob_ex_oob_oob After: # RUN msg_oob.no_peek.ex_oob_ex_oob_oob ... # OK msg_oob.no_peek.ex_oob_ex_oob_oob ok 12 msg_oob.no_peek.ex_oob_ex_oob_oob Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250619041457.1132791-3-kuni1840@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-24af_unix: Don't leave consecutive consumed OOB skbs.Kuniyuki Iwashima
Jann Horn reported a use-after-free in unix_stream_read_generic(). The following sequences reproduce the issue: $ python3 from socket import * s1, s2 = socketpair(AF_UNIX, SOCK_STREAM) s1.send(b'x', MSG_OOB) s2.recv(1, MSG_OOB) # leave a consumed OOB skb s1.send(b'y', MSG_OOB) s2.recv(1, MSG_OOB) # leave a consumed OOB skb s1.send(b'z', MSG_OOB) s2.recv(1) # recv 'z' illegally s2.recv(1, MSG_OOB) # access 'z' skb (use-after-free) Even though a user reads OOB data, the skb holding the data stays on the recv queue to mark the OOB boundary and break the next recv(). After the last send() in the scenario above, the sk2's recv queue has 2 leading consumed OOB skbs and 1 real OOB skb. Then, the following happens during the next recv() without MSG_OOB 1. unix_stream_read_generic() peeks the first consumed OOB skb 2. manage_oob() returns the next consumed OOB skb 3. unix_stream_read_generic() fetches the next not-yet-consumed OOB skb 4. unix_stream_read_generic() reads and frees the OOB skb , and the last recv(MSG_OOB) triggers KASAN splat. The 3. above occurs because of the SO_PEEK_OFF code, which does not expect unix_skb_len(skb) to be 0, but this is true for such consumed OOB skbs. while (skip >= unix_skb_len(skb)) { skip -= unix_skb_len(skb); skb = skb_peek_next(skb, &sk->sk_receive_queue); ... } In addition to this use-after-free, there is another issue that ioctl(SIOCATMARK) does not function properly with consecutive consumed OOB skbs. So, nothing good comes out of such a situation. Instead of complicating manage_oob(), ioctl() handling, and the next ECONNRESET fix by introducing a loop for consecutive consumed OOB skbs, let's not leave such consecutive OOB unnecessarily. Now, while receiving an OOB skb in unix_stream_recv_urg(), if its previous skb is a consumed OOB skb, it is freed. [0]: BUG: KASAN: slab-use-after-free in unix_stream_read_actor (net/unix/af_unix.c:3027) Read of size 4 at addr ffff888106ef2904 by task python3/315 CPU: 2 UID: 0 PID: 315 Comm: python3 Not tainted 6.16.0-rc1-00407-gec315832f6f9 #8 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:122) print_report (mm/kasan/report.c:409 mm/kasan/report.c:521) kasan_report (mm/kasan/report.c:636) unix_stream_read_actor (net/unix/af_unix.c:3027) unix_stream_read_generic (net/unix/af_unix.c:2708 net/unix/af_unix.c:2847) unix_stream_recvmsg (net/unix/af_unix.c:3048) sock_recvmsg (net/socket.c:1063 (discriminator 20) net/socket.c:1085 (discriminator 20)) __sys_recvfrom (net/socket.c:2278) __x64_sys_recvfrom (net/socket.c:2291 (discriminator 1) net/socket.c:2287 (discriminator 1) net/socket.c:2287 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f8911fcea06 Code: 5d e8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 19 83 e2 39 83 fa 08 75 11 e8 26 ff ff ff 66 0f 1f 44 00 00 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 RSP: 002b:00007fffdb0dccb0 EFLAGS: 00000202 ORIG_RAX: 000000000000002d RAX: ffffffffffffffda RBX: 00007fffdb0dcdc8 RCX: 00007f8911fcea06 RDX: 0000000000000001 RSI: 00007f8911a5e060 RDI: 0000000000000006 RBP: 00007fffdb0dccd0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000202 R12: 00007f89119a7d20 R13: ffffffffc4653600 R14: 0000000000000000 R15: 0000000000000000 </TASK> Allocated by task 315: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1)) __kasan_slab_alloc (mm/kasan/common.c:348) kmem_cache_alloc_node_noprof (./include/linux/kasan.h:250 mm/slub.c:4148 mm/slub.c:4197 mm/slub.c:4249) __alloc_skb (net/core/skbuff.c:660 (discriminator 4)) alloc_skb_with_frags (./include/linux/skbuff.h:1336 net/core/skbuff.c:6668) sock_alloc_send_pskb (net/core/sock.c:2993) unix_stream_sendmsg (./include/net/sock.h:1847 net/unix/af_unix.c:2256 net/unix/af_unix.c:2418) __sys_sendto (net/socket.c:712 (discriminator 20) net/socket.c:727 (discriminator 20) net/socket.c:2226 (discriminator 20)) __x64_sys_sendto (net/socket.c:2233 (discriminator 1) net/socket.c:2229 (discriminator 1) net/socket.c:2229 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Freed by task 315: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:60 (discriminator 1) mm/kasan/common.c:69 (discriminator 1)) kasan_save_free_info (mm/kasan/generic.c:579 (discriminator 1)) __kasan_slab_free (mm/kasan/common.c:271) kmem_cache_free (mm/slub.c:4643 (discriminator 3) mm/slub.c:4745 (discriminator 3)) unix_stream_read_generic (net/unix/af_unix.c:3010) unix_stream_recvmsg (net/unix/af_unix.c:3048) sock_recvmsg (net/socket.c:1063 (discriminator 20) net/socket.c:1085 (discriminator 20)) __sys_recvfrom (net/socket.c:2278) __x64_sys_recvfrom (net/socket.c:2291 (discriminator 1) net/socket.c:2287 (discriminator 1) net/socket.c:2287 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) The buggy address belongs to the object at ffff888106ef28c0 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 68 bytes inside of freed 224-byte region [ffff888106ef28c0, ffff888106ef29a0) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888106ef3cc0 pfn:0x106ef2 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x200000000000040(head|node=0|zone=2) page_type: f5(slab) raw: 0200000000000040 ffff8881001d28c0 ffffea000422fe00 0000000000000004 raw: ffff888106ef3cc0 0000000080190010 00000000f5000000 0000000000000000 head: 0200000000000040 ffff8881001d28c0 ffffea000422fe00 0000000000000004 head: ffff888106ef3cc0 0000000080190010 00000000f5000000 0000000000000000 head: 0200000000000001 ffffea00041bbc81 00000000ffffffff 00000000ffffffff head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888106ef2800: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc ffff888106ef2880: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb >ffff888106ef2900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888106ef2980: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888106ef2a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20250619041457.1132791-2-kuni1840@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-24wifi: mac80211: fix beacon interval calculation overflowLachlan Hodges
As we are converting from TU to usecs, a beacon interval of 100*1024 usecs will lead to integer wrapping. To fix change to use a u32. Fixes: 057d5f4ba1e4 ("mac80211: sync dtim_count to TSF") Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20250621123209.511796-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-24wifi: iwlegacy: work around excessive stack usage on clang/kasanArnd Bergmann
In some rare randconfig builds, I seem to trigger a bug in clang where it unrolls a loop but then runs out of registers, which then get spilled to the stack: net/wireless/intel/iwlegacy/4965-rs.c:2262:1: error: stack frame size (1696) exceeds limit (1280) in 'il4965_rs_rate_init' [-Werror,-Wframe-larger-than] This seems to be the same one I saw in the omapdrm driver, and there is an easy workaround by not inlining the il4965_rs_rate_scale_clear_win function. Link: https://github.com/llvm/llvm-project/issues/143908 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Link: https://patch.msgid.link/20250620113946.3987160-1-arnd@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-23Merge branch 'bpf-specify-access-type-of-bpf_sysctl_get_name-args'Alexei Starovoitov
Jerome Marchand says: ==================== bpf: Specify access type of bpf_sysctl_get_name args The second argument of bpf_sysctl_get_name() helper is a pointer to a buffer that is being written to. However that isn't specify in the prototype. Until commit 37cce22dbd51a ("bpf: verifier: Refactor helper access type tracking") that mistake was hidden by the way the verifier treated helper accesses. Since then, the verifier, working on wrong infromation from the prototype, can make faulty optimization that would had been caught by the test_sysctl selftests if it was run by the CI. The first patch fixes bpf_sysctl_get_name prototype. The second patch converts the test_sysctl to prog_tests so that it will be run by the CI and catch similar issues in the future. Changes in v3: - Use ASSERT* macro instead of CHECK_FAIL. - Remove useless code. Changes in v2: - Replace ARG_PTR_TO_UNINIT_MEM by ARG_PTR_TO_MEM | MEM_WRITE. - Converts test_sysctl to prog_tests. ==================== Link: https://patch.msgid.link/20250619140603.148942-1-jmarchan@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23selftests/bpf: Convert test_sysctl to prog_testsJerome Marchand
Convert test_sysctl test to prog_tests with minimal change to the tests themselves. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250619140603.148942-3-jmarchan@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23bpf: Specify access type of bpf_sysctl_get_name argsJerome Marchand
The second argument of bpf_sysctl_get_name() helper is a pointer to a buffer that is being written to. However that isn't specify in the prototype. Until commit 37cce22dbd51a ("bpf: verifier: Refactor helper access type tracking"), all helper accesses were considered as a possible write access by the verifier, so no big harm was done. However, since then, the verifier might make wrong asssumption about the content of that address which might lead it to make faulty optimizations (such as removing code that was wrongly labeled dead). This is what happens in test_sysctl selftest to the tests related to sysctl_get_name. Add MEM_WRITE flag the second argument of bpf_sysctl_get_name(). Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250619140603.148942-2-jmarchan@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-23bridge: mcast: Fix use-after-free during router port configurationIdo Schimmel
The bridge maintains a global list of ports behind which a multicast router resides. The list is consulted during forwarding to ensure multicast packets are forwarded to these ports even if the ports are not member in the matching MDB entry. When per-VLAN multicast snooping is enabled, the per-port multicast context is disabled on each port and the port is removed from the global router port list: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 # ip link add name dummy1 up master br1 type dummy # ip link set dev dummy1 type bridge_slave mcast_router 2 $ bridge -d mdb show | grep router router ports on br1: dummy1 # ip link set dev br1 type bridge mcast_vlan_snooping 1 $ bridge -d mdb show | grep router However, the port can be re-added to the global list even when per-VLAN multicast snooping is enabled: # ip link set dev dummy1 type bridge_slave mcast_router 0 # ip link set dev dummy1 type bridge_slave mcast_router 2 $ bridge -d mdb show | grep router router ports on br1: dummy1 Since commit 4b30ae9adb04 ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions"), when per-VLAN multicast snooping is enabled, multicast disablement on a port will disable the per-{port, VLAN} multicast contexts and not the per-port one. As a result, a port will remain in the global router port list even after it is deleted. This will lead to a use-after-free [1] when the list is traversed (when adding a new port to the list, for example): # ip link del dev dummy1 # ip link add name dummy2 up master br1 type dummy # ip link set dev dummy2 type bridge_slave mcast_router 2 Similarly, stale entries can also be found in the per-VLAN router port list. When per-VLAN multicast snooping is disabled, the per-{port, VLAN} contexts are disabled on each port and the port is removed from the per-VLAN router port list: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 # ip link add name dummy1 up master br1 type dummy # bridge vlan add vid 2 dev dummy1 # bridge vlan global set vid 2 dev br1 mcast_snooping 1 # bridge vlan set vid 2 dev dummy1 mcast_router 2 $ bridge vlan global show dev br1 vid 2 | grep router router ports: dummy1 # ip link set dev br1 type bridge mcast_vlan_snooping 0 $ bridge vlan global show dev br1 vid 2 | grep router However, the port can be re-added to the per-VLAN list even when per-VLAN multicast snooping is disabled: # bridge vlan set vid 2 dev dummy1 mcast_router 0 # bridge vlan set vid 2 dev dummy1 mcast_router 2 $ bridge vlan global show dev br1 vid 2 | grep router router ports: dummy1 When the VLAN is deleted from the port, the per-{port, VLAN} multicast context will not be disabled since multicast snooping is not enabled on the VLAN. As a result, the port will remain in the per-VLAN router port list even after it is no longer member in the VLAN. This will lead to a use-after-free [2] when the list is traversed (when adding a new port to the list, for example): # ip link add name dummy2 up master br1 type dummy # bridge vlan add vid 2 dev dummy2 # bridge vlan del vid 2 dev dummy1 # bridge vlan set vid 2 dev dummy2 mcast_router 2 Fix these issues by removing the port from the relevant (global or per-VLAN) router port list in br_multicast_port_ctx_deinit(). The function is invoked during port deletion with the per-port multicast context and during VLAN deletion with the per-{port, VLAN} multicast context. Note that deleting the multicast router timer is not enough as it only takes care of the temporary multicast router states (1 or 3) and not the permanent one (2). [1] BUG: KASAN: slab-out-of-bounds in br_multicast_add_router.part.0+0x3f1/0x560 Write of size 8 at addr ffff888004a67328 by task ip/384 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 br_multicast_add_router.part.0+0x3f1/0x560 br_multicast_set_port_router+0x74e/0xac0 br_setport+0xa55/0x1870 br_port_slave_changelink+0x95/0x120 __rtnl_newlink+0x5e8/0xa40 rtnl_newlink+0x627/0xb00 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x124/0x1c0 do_syscall_64+0xbb/0x360 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] BUG: KASAN: slab-use-after-free in br_multicast_add_router.part.0+0x378/0x560 Read of size 8 at addr ffff888009f00840 by task bridge/391 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 br_multicast_add_router.part.0+0x378/0x560 br_multicast_set_port_router+0x6f9/0xac0 br_vlan_process_options+0x8b6/0x1430 br_vlan_rtm_process_one+0x605/0xa30 br_vlan_rtm_process+0x396/0x4c0 rtnetlink_rcv_msg+0x2f7/0xb70 netlink_rcv_skb+0x11f/0x350 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x124/0x1c0 do_syscall_64+0xbb/0x360 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: 2796d846d74a ("net: bridge: vlan: convert mcast router global option to per-vlan entry") Fixes: 4b30ae9adb04 ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions") Reported-by: syzbot+7bfa4b72c6a5da128d32@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/684c18bd.a00a0220.279073.000b.GAE@google.com/T/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250619182228.1656906-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-23ethernet: ionic: Fix DMA mapping testsThomas Fourier
Change error values of `ionic_tx_map_single()` and `ionic_tx_map_frag()` from 0 to `DMA_MAPPING_ERROR` to prevent collision with 0 as a valid address. This also fixes the use of `dma_mapping_error()` to test against 0 in `ionic_xdp_post_frame()` Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Fixes: 56e41ee12d2d ("ionic: better dma-map error handling") Fixes: ac8813c0ab7d ("ionic: convert Rx queue buffers to use page_pool") Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20250619094538.283723-2-fourier.thomas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-23Merge tag 'for-6.16/dm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mikulas Patocka: - dm-crypt: fix a crash on 32-bit machines - dm-raid: replace "rdev" with correct loop variable name "r" * tag 'for-6.16/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-raid: fix variable in journal device check dm-crypt: Extend state buffer size in crypt_iv_lmk_one
2025-06-23Merge tag 'f2fs-for-6.16-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: - fix double-unlock introduced by the recent folio conversion - fix stale page content beyond EOF complained by xfstests/generic/363 * tag 'f2fs-for-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: fix to zero post-eof page f2fs: Fix __write_node_folio() conversion
2025-06-23net: netpoll: Initialize UDP checksum field before checksummingBreno Leitao
commit f1fce08e63fe ("netpoll: Eliminate redundant assignment") removed the initialization of the UDP checksum, which was wrong and broke netpoll IPv6 transmission due to bad checksumming. udph->check needs to be set before calling csum_ipv6_magic(). Fixes: f1fce08e63fe ("netpoll: Eliminate redundant assignment") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250620-netpoll_fix-v1-1-f9f0b82bc059@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-23Merge tag 'for-6.16-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Fixes: - fix invalid inode pointer dereferences during log replay - fix a race between renames and directory logging - fix shutting down delayed iput worker - fix device byte accounting when dropping chunk - in zoned mode, fix offset calculations for DUP profile when conventional and sequential zones are used together Regression fixes: - fix possible double unlock of extent buffer tree (xarray conversion) - in zoned mode, fix extent buffer refcount when writing out extents (xarray conversion) Error handling fixes and updates: - handle unexpected extent type when replaying log - check and warn if there are remaining delayed inodes when putting a root - fix assertion when building free space tree - handle csum tree error with mount option 'rescue=ibadroot' Other: - error message updates: add prefix to all scrub related messages, include other information in messages" * tag 'for-6.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix alloc_offset calculation for partly conventional block groups btrfs: handle csum tree error with rescue=ibadroots correctly btrfs: fix race between async reclaim worker and close_ctree() btrfs: fix assertion when building free space tree btrfs: don't silently ignore unexpected extent type when replaying log btrfs: fix invalid inode pointer dereferences during log replay btrfs: fix double unlock of buffer_tree xarray when releasing subpage eb btrfs: update superblock's device bytes_used when dropping chunk btrfs: fix a race between renames and directory logging btrfs: scrub: add prefix for the error messages btrfs: warn if leaking delayed_nodes in btrfs_put_root() btrfs: fix delayed ref refcount leak in debug assertion btrfs: include root in error message when unlinking inode btrfs: don't drop a reference if btrfs_check_write_meta_pointer() fails