Age | Commit message (Collapse) | Author |
|
Along with kexec, KHO also has parts dealing with memory management, like
page/folio initialization, memblock, and preserving/unpreserving memory
for next kernel. Copy linux-mm@ to KHO patches so the right set of eyes
can look at changes to those parts.
Link: https://lkml.kernel.org/r/20250613131917.4488-1-pratyush@kernel.org
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This commit fixes two kinds of races, they may have different results:
Barry reported a BUG_ON in commit c50f8e6053b0, we may see the same
BUG_ON if the filemap lookup returned NULL and folio is added to swap
cache after that.
If another kind of race is triggered (folio changed after lookup) we
may see RSS counter is corrupted:
[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
type:MM_ANONPAGES val:-1
[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
type:MM_SHMEMPAGES val:1
Because the folio is being accounted to the wrong VMA.
I'm not sure if there will be any data corruption though, seems no.
The issues above are critical already.
On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache
lookup, and tries to move the found folio to the faulting vma. Currently,
it relies on checking the PTE value to ensure that the moved folio still
belongs to the src swap entry and that no new folio has been added to the
swap cache, which turns out to be unreliable.
While working and reviewing the swap table series with Barry, following
existing races are observed and reproduced [1]:
In the example below, move_pages_pte is moving src_pte to dst_pte, where
src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the
swap cache:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1
... < interrupted> ...
<swapin src_pte, alloc and use folio A>
// folio A is a new allocated folio
// and get installed into src_pte
<frees swap entry S1>
// src_pte now points to folio A, S1
// has swap count == 0, it can be freed
// by folio_swap_swap or swap
// allocator's reclaim.
<try to swap out another folio B>
// folio B is a folio in another VMA.
<put folio B to swap cache using S1 >
// S1 is freed, folio B can use it
// for swap out with no problem.
...
folio = filemap_get_folio(S1)
// Got folio B here !!!
... < interrupted again> ...
<swapin folio B and free S1>
// Now S1 is free to be used again.
<swapout src_pte & folio A using S1>
// Now src_pte is a swap entry PTE
// holding S1 again.
folio_trylock(folio)
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// Moved invalid folio B here !!!
The race window is very short and requires multiple collisions of multiple
rare events, so it's very unlikely to happen, but with a deliberately
constructed reproducer and increased time window, it can be reproduced
easily.
This can be fixed by checking if the folio returned by filemap is the
valid swap cache folio after acquiring the folio lock.
Another similar race is possible: filemap_get_folio may return NULL, but
folio (A) could be swapped in and then swapped out again using the same
swap entry after the lookup. In such a case, folio (A) may remain in the
swap cache, so it must be moved too:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1, and S1 is not in swap cache
folio = filemap_get_folio(S1)
// Got NULL
... < interrupted again> ...
<swapin folio A and free S1>
<swapout folio A re-using S1>
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// folio A is ignored !!!
Fix this by checking the swap cache again after acquiring the src_pte
lock. And to avoid the filemap overhead, we check swap_map directly [2].
The SWP_SYNCHRONOUS_IO path does make the problem more complex, but so far
we don't need to worry about that, since folios can only be exposed to the
swap cache in the swap out path, and this is covered in this patch by
checking the swap cache again after acquiring the src_pte lock.
Testing with a simple C program that allocates and moves several GB of
memory did not show any observable performance change.
Link: https://lkml.kernel.org/r/20250604151038.21968-1-ryncsn@gmail.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Kairui Song <kasong@tencent.com>
Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoSt=9+JwP7Q@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAGsJ_4yJhJBo16XhiC-nUzSheyX-V3-nFE+tAi=8Y560K8eT=A@mail.gmail.com/ [2]
Reviewed-by: Lokesh Gidra <lokeshgidra@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Chris Li <chrisl@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After commit 1aaf8c122918 ("mm: gup: fix infinite loop within
__get_longterm_locked") we are able to longterm pin folios that are not
supposed to get longterm pinned, simply because they temporarily have the
LRU flag cleared (esp. temporarily isolated).
For example, two __get_longterm_locked() callers can race, or
__get_longterm_locked() can race with anything else that temporarily
isolates folios.
The introducing commit mentions the use case of a driver that uses
vm_ops->fault to insert pages allocated through cma_alloc() into the page
tables, assuming they can later get longterm pinned. These pages/ folios
would never have the LRU flag set and consequently cannot get isolated.
There is no known in-tree user making use of that so far, fortunately.
To handle that in the future -- and avoid retrying forever to
isolate/migrate them -- we will need a different mechanism for the CMA
area *owner* to indicate that it actually already allocated the page and
is fine with longterm pinning it. The LRU flag is not suitable for that.
Probably we can lookup the relevant CMA area and query the bitmap; we only
have have to care about some races, probably. If already allocated, we
could just allow longterm pinning)
Anyhow, let's fix the "must not be longterm pinned" problem first by
reverting the original commit.
Link: https://lkml.kernel.org/r/20250611131314.594529-1-david@redhat.com
Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
Signed-off-by: David Hildenbrand <david@redhat.com>
Closes: https://lore.kernel.org/all/20250522092755.GA3277597@tiffany/
Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Peter Xu <peterx@redhat.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Cc: Aijun Sun <aijun.sun@unisoc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The mm selftests are timing out with the current 180-second limit.
Testing shows that run_vmtests.sh takes approximately 11 minutes
(664 seconds) to complete.
Increase the timeout to 900 seconds (15 minutes) to provide sufficient
buffer for the tests to complete successfully.
Link: https://lkml.kernel.org/r/20250609120606.73145-2-shivankg@amd.com
Signed-off-by: Shivank Garg <shivankg@amd.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Following softlockup can be easily reproduced on my test machine with:
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
swapon /dev/zram0 # zram0 is a 48G swap device
mkdir -p /sys/fs/cgroup/memory/test
echo 1G > /sys/fs/cgroup/test/memory.max
echo $BASHPID > /sys/fs/cgroup/test/cgroup.procs
while true; do
dd if=/dev/zero of=/tmp/test.img bs=1M count=5120
cat /tmp/test.img > /dev/null
rm /tmp/test.img
done
Then after a while:
watchdog: BUG: soft lockup - CPU#0 stuck for 763s! [cat:5787]
Modules linked in: zram virtiofs
CPU: 0 UID: 0 PID: 5787 Comm: cat Kdump: loaded Tainted: G L 6.15.0.orig-gf3021d9246bc-dirty #118 PREEMPT(voluntary)·
Tainted: [L]=SOFTLOCKUP
Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
RIP: 0010:mpol_shared_policy_lookup+0xd/0x70
Code: e9 b8 b4 ff ff 31 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 54 55 53 <48> 8b 1f 48 85 db 74 41 4c 8d 67 08 48 89 fb 48 89 f5 4c 89 e7 e8
RSP: 0018:ffffc90002b1fc28 EFLAGS: 00000202
RAX: 00000000001c20ca RBX: 0000000000724e1e RCX: 0000000000000001
RDX: ffff888118e214c8 RSI: 0000000000057d42 RDI: ffff888118e21518
RBP: 000000000002bec8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000bf4 R11: 0000000000000000 R12: 0000000000000001
R13: 00000000001c20ca R14: 00000000001c20ca R15: 0000000000000000
FS: 00007f03f995c740(0000) GS:ffff88a07ad9a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f03f98f1000 CR3: 0000000144626004 CR4: 0000000000770eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
shmem_alloc_folio+0x31/0xc0
shmem_swapin_folio+0x309/0xcf0
? filemap_get_entry+0x117/0x1e0
? xas_load+0xd/0xb0
? filemap_get_entry+0x101/0x1e0
shmem_get_folio_gfp+0x2ed/0x5b0
shmem_file_read_iter+0x7f/0x2e0
vfs_read+0x252/0x330
ksys_read+0x68/0xf0
do_syscall_64+0x4c/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f03f9a46991
Code: 00 48 8b 15 81 14 10 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 20 ad 01 00 f3 0f 1e fa 80 3d 35 97 10 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
RSP: 002b:00007fff3c52bd28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007f03f9a46991
RDX: 0000000000040000 RSI: 00007f03f98ba000 RDI: 0000000000000003
RBP: 00007fff3c52bd50 R08: 0000000000000000 R09: 00007f03f9b9a380
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
R13: 00007f03f98ba000 R14: 0000000000000003 R15: 0000000000000000
</TASK>
The reason is simple, readahead brought some order 0 folio in swap cache,
and the swapin mTHP folio being allocated is in conflict with it, so
swapcache_prepare fails and causes shmem_swap_alloc_folio to return
-EEXIST, and shmem simply retries again and again causing this loop.
Fix it by applying a similar fix for anon mTHP swapin.
The performance change is very slight, time of swapin 10g zero folios
with shmem (test for 12 times):
Before: 2.47s
After: 2.48s
[kasong@tencent.com: add comment]
Link: https://lkml.kernel.org/r/20250610181645.45922-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250610181645.45922-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250609171751.36305-1-ryncsn@gmail.com
Fixes: 1dd44c0af4fa ("mm: shmem: skip swapcache for swapin of synchronous swap device")
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
dma_map_XXX() can fail and should be tested for errors with
dma_mapping_error().
Fixes: a63e78eb2b0f ("scsi: fnic: Add support for fabric based solicited requests and responses")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250618065715.14740-2-fourier.thomas@gmail.com
Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com>
Reviewed-by: John Menghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replace KERN_INFO with KERN_DEBUG for a log message.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Reviewed-by: Arun Easi <aeasi@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/stable/20250612002212.4144-1-kartilak%40cisco.com
Link: https://lore.kernel.org/r/20250618003431.6314-4-kartilak@cisco.com
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add logs in FDMI and FDMI ABTS paths.
Modify log text in these paths.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Reviewed-by: Arun Easi <aeasi@cisco.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20250618003431.6314-3-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When the link goes down and comes up, FDMI requests are not sent out
anymore.
Fix bug by turning off FNIC_FDMI_ACTIVE when the link goes down.
Fixes: 09c1e6ab4ab2 ("scsi: fnic: Add and integrate support for FDMI")
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Reviewed-by: Arun Easi <aeasi@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Cc: stable@vger.kernel.org
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20250618003431.6314-2-kartilak@cisco.com
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When both the RHBA and RPA FDMI requests time out, fnic reuses a frame to
send ABTS for each of them. On send completion, this causes an attempt to
free the same frame twice that leads to a crash.
Fix crash by allocating separate frames for RHBA and RPA, and modify ABTS
logic accordingly.
Tested by checking MDS for FDMI information.
Tested by using instrumented driver to:
- Drop PLOGI response
- Drop RHBA response
- Drop RPA response
- Drop RHBA and RPA response
- Drop PLOGI response + ABTS response
- Drop RHBA response + ABTS response
- Drop RPA response + ABTS response
- Drop RHBA and RPA response + ABTS response for both of them
Fixes: 09c1e6ab4ab2 ("scsi: fnic: Add and integrate support for FDMI")
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Tested-by: Arun Easi <aeasi@cisco.com>
Co-developed-by: Arun Easi <aeasi@cisco.com>
Signed-off-by: Arun Easi <aeasi@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Cc: stable@vger.kernel.org
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20250618003431.6314-1-kartilak@cisco.com
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In ufshcd_host_reset_and_restore(), scale up clocks only when clock
scaling is supported. Without this change CPU latency is voted for 0
(ufshcd_pm_qos_update) during resume unconditionally.
Signed-off-by: anvithdosapati <anvithdosapati@google.com>
Link: https://lore.kernel.org/r/20250616085734.2133581-1-anvithdosapati@google.com
Fixes: a3cd5ec55f6c ("scsi: ufs: add load based scaling of UFS gear")
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
On a system with DRAM interleave enabled, out-of-bound access is
detected:
megaraid_sas 0000:3f:00.0: requested/available msix 128/128 poll_queue 0
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
index -1 is out of range for type 'cpumask *[1024]'
dump_stack_lvl+0x5d/0x80
ubsan_epilogue+0x5/0x2b
__ubsan_handle_out_of_bounds.cold+0x46/0x4b
megasas_alloc_irq_vectors+0x149/0x190 [megaraid_sas]
megasas_probe_one.cold+0xa4d/0x189c [megaraid_sas]
local_pci_probe+0x42/0x90
pci_device_probe+0xdc/0x290
really_probe+0xdb/0x340
__driver_probe_device+0x78/0x110
driver_probe_device+0x1f/0xa0
__driver_attach+0xba/0x1c0
bus_for_each_dev+0x8b/0xe0
bus_add_driver+0x142/0x220
driver_register+0x72/0xd0
megasas_init+0xdf/0xff0 [megaraid_sas]
do_one_initcall+0x57/0x310
do_init_module+0x90/0x250
init_module_from_file+0x85/0xc0
idempotent_init_module+0x114/0x310
__x64_sys_finit_module+0x65/0xc0
do_syscall_64+0x82/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix it accordingly.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/r/20250604042556.3731059-1-yu.c.chen@intel.com
Fixes: 8049da6f3943 ("scsi: megaraid_sas: Use irq_set_affinity_and_hint()")
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One fix here from Thierry, fixing crashes caused by attempting to do
cache sync operations on uncached memory on Tegra platforms"
* tag 'spi-fix-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: tegra210-qspi: Remove cache operations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One patch here from Heiko which fixes stability issues on some
Rockchip platforms by implementing soft start support and providing
startup time information for their regulators"
* tag 'regulator-fix-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fan53555: add enable_time support and soft-start times
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- A workaround update (Vinay)
- Fix memset on iomem (Lucas)
- Fix early wedge on GuC Load failure (Daniele)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aFQ03kNzhbiNK7gW@fedora
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc3:
- vivante scheduler fix.
- v3d null pointer crash fix.
- fix backlight, booting GSP-RM, and potential integer shift overflow in nouveau.
- fix compiler warnings about unused linux/export.h
- fix malidp unknown modifier spam.
- fix for ssd130x.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/d44bab7b-01f8-45a8-a7f4-5d3d563d2f9d@linux.intel.com
|
|
MANA supports RDMA in PF mode. The driver should record the doorbell
physical address when in PF mode.
The doorbell physical address is used by the RDMA driver to map
doorbell pages of the device to user-mode applications through RDMA
verbs interface. In the past, they have been mapped to user-mode while
the device is in VF mode. With the support for PF mode implemented,
also expose those pages in PF mode.
Support for PF mode is implemented in
290e5d3c49f6 ("net: mana: Add support for Multi Vports on Bare metal")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1750210606-12167-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the end result of a security_compute_sid() computation matches the
ssid or tsid, return that SID rather than looking it up again. This
avoids the problem of multiple initial SIDs that map to the same
context.
Cc: stable@vger.kernel.org
Reported-by: Guido Trentalancia <guido@trentalancia.com>
Fixes: ae254858ce07 ("selinux: introduce an initial SID for early boot processes")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Guido Trentalancia <guido@trentalancia.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless.
The ath12k fix to avoid FW crashes requires adding support for a
number of new FW commands so it's quite large in terms of LoC. The
rest is relatively small.
Current release - fix to a fix:
- ptp: fix breakage after ptp_vclock_in_use() rework
Current release - regressions:
- openvswitch: allocate struct ovs_pcpu_storage dynamically, static
allocation may exhaust module loader limit on smaller systems
Previous releases - regressions:
- tcp: fix tcp_packet_delayed() for peers with no selective ACK
support
Previous releases - always broken:
- wifi: ath12k: don't activate more links than firmware supports
- tcp: make sure sockets open via passive TFO have valid NAPI ID
- eth: bnxt_en: update MRU and RSS table of RSS contexts on queue
reset, prevent Rx queues from silently hanging after queue reset
- NFC: uart: set tty->disc_data only in success path"
* tag 'net-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: airoha: Differentiate hwfd buffer size for QDMA0 and QDMA1
net: airoha: Compute number of descriptors according to reserved memory size
tools: ynl: fix mixing ops and notifications on one socket
net: atm: fix /proc/net/atm/lec handling
net: atm: add lec_mutex
mlxbf_gige: return EPROBE_DEFER if PHY IRQ is not available
net: airoha: Always check return value from airoha_ppe_foe_get_entry()
NFC: nci: uart: Set tty->disc_data only in success path
calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().
MAINTAINERS: Remove Shannon Nelson from MAINTAINERS file
net: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()
eth: fbnic: avoid double free when failing to DMA-map FW msg
tcp: fix passive TFO socket having invalid NAPI ID
selftests: net: add test for passive TFO socket NAPI ID
selftests: net: add passive TFO test binary
selftests: netdevsim: improve lib.sh include in peer.sh
tipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer
Octeontx2-pf: Fix Backpresure configuration
net: ftgmac100: select FIXED_PHY
net: ethtool: remove duplicate defines for family info
...
|
|
Memory allocated for the ECC engine conf is not released during spinand
cleanup. Below kmemleak trace is seen for this memory leak:
unreferenced object 0xffffff80064f00e0 (size 8):
comm "swapper/0", pid 1, jiffies 4294937458
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace (crc 0):
kmemleak_alloc+0x30/0x40
__kmalloc_cache_noprof+0x208/0x3c0
spinand_ondie_ecc_init_ctx+0x114/0x200
nand_ecc_init_ctx+0x70/0xa8
nanddev_ecc_engine_init+0xec/0x27c
spinand_probe+0xa2c/0x1620
spi_mem_probe+0x130/0x21c
spi_probe+0xf0/0x170
really_probe+0x17c/0x6e8
__driver_probe_device+0x17c/0x21c
driver_probe_device+0x58/0x180
__device_attach_driver+0x15c/0x1f8
bus_for_each_drv+0xec/0x150
__device_attach+0x188/0x24c
device_initial_probe+0x10/0x20
bus_probe_device+0x11c/0x160
Fix the leak by calling nanddev_ecc_engine_cleanup() inside
spinand_cleanup().
Signed-off-by: Pablo Martin-Gomez <pmartin-gomez@freebox.fr>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
Without this, the compiler (clang21) might emit a warning under W=1
because the variable ori31_emitted is set but never used if
CONFIG_PPC_BOOK3S_64=n.
Without this patch:
$ make -j $(nproc) W=1 ARCH=powerpc SHELL=/bin/bash arch/powerpc/net
[...]
CC arch/powerpc/net/bpf_jit_comp.o
CC arch/powerpc/net/bpf_jit_comp64.o
../arch/powerpc/net/bpf_jit_comp64.c: In function 'bpf_jit_build_body':
../arch/powerpc/net/bpf_jit_comp64.c:417:28: warning: variable 'ori31_emitted' set but not used [-Wunused-but-set-variable]
417 | bool sync_emitted, ori31_emitted;
| ^~~~~~~~~~~~~
AR arch/powerpc/net/built-in.a
With this patch:
[...]
CC arch/powerpc/net/bpf_jit_comp.o
CC arch/powerpc/net/bpf_jit_comp64.o
AR arch/powerpc/net/built-in.a
Fixes: dff883d9e93a ("bpf, arm64, powerpc: Change nospec to include v1 barrier")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506180402.uUXwVoSH-lkp@intel.com/
Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20250619142647.2157017-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ltc4282: Avoid repeated register write operation
- occ: Fix unaligned accesses, and rework attribute registration to
reduce stack usage
- ftsteutates: Fix TOCTOU race
* tag 'hwmon-for-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ltc4282) avoid repeated register write
hwmon: (occ) fix unaligned accesses
hwmon: (occ) Rework attribute registration for stack usage
hwmon: (ftsteutates) Fix TOCTOU race in fts_read()
|
|
Lorenzo Bianconi says:
====================
net: airoha: Improve hwfd buffer/descriptor queues setup
Compute the number of hwfd buffers/descriptors according to the reserved
memory size if provided via DTS.
Reduce the required hwfd buffers queue size for QDMA1.
v3: https://lore.kernel.org/20250618-airoha-hw-num-desc-v3-0-18a6487cd75e@kernel.org
v1: https://lore.kernel.org/20250615-airoha-hw-num-desc-v1-0-8f88daa4abd7@kernel.org
====================
Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-0-49600a9b319a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
EN7581 SoC allows configuring the size and the number of buffers in
hwfd payload queue for both QDMA0 and QDMA1.
In order to reduce the required DRAM used for hwfd buffers queues and
decrease the memory footprint, differentiate hwfd buffer size for QDMA0
and QDMA1 and reduce hwfd buffer size to 1KB for QDMA1 (WAN) while
maintaining 2KB for QDMA0 (LAN).
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-2-49600a9b319a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to not exceed the reserved memory size for hwfd buffers,
compute the number of hwfd buffers/descriptors according to the
reserved memory size and the size of each hwfd buffer (2KB).
Fixes: 3a1ce9e3d01b ("net: airoha: Add the capability to allocate hwfd buffers via reserved-memory")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-1-49600a9b319a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
More fixes:
- ath12k
- avoid busy-waiting
- activate correct number of links
- iwlwifi
- iwldvm regression (lots of warnings)
- iwlmld merge damage regression (crash)
- fix build with some old gcc versions
- carl9170: don't talk to device w/o FW [syzbot]
- ath6kl: remove bad FW WARN [syzbot]
- ieee80211: use variable-length arrays [syzbot]
- mac80211
- remove WARN on delayed beacon update [syzbot]
- drop OCB frames with invalid source [syzbot]
* tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: Fix incorrect logic on cmd_ver range checking
wifi: iwlwifi: dvm: restore n_no_reclaim_cmds setting
wifi: iwlwifi: cfg: Limit cb_size to valid range
wifi: iwlwifi: restore missing initialization of async_handlers_list (again)
wifi: ath6kl: remove WARN on bad firmware input
wifi: carl9170: do not ping device which has failed to load firmware
wifi: ath12k: don't wait when there is no vdev started
wifi: ath12k: don't use static variables in ath12k_wmi_fw_stats_process()
wifi: ath12k: avoid burning CPU while waiting for firmware stats
wifi: ath12k: fix documentation on firmware stats
wifi: ath12k: don't activate more links than firmware supports
wifi: ath12k: update link active in case two links fall on the same MAC
wifi: ath12k: support WMI_MLO_LINK_SET_ACTIVE_CMDID command
wifi: ath12k: update freq range for each hardware mode
wifi: ath12k: parse and save sbs_lower_band_end_freq from WMI_SERVICE_READY_EXT2_EVENTID event
wifi: ath12k: parse and save hardware mode info from WMI_SERVICE_READY_EXT_EVENTID event for later use
wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STAT
wifi: mac80211: don't WARN for late channel/color switch
wifi: mac80211: drop invalid source address OCB frames
wifi: remove zero-length arrays
====================
Link: https://patch.msgid.link/20250618210642.35805-6-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The multi message support loosened the connection between the request
and response handling, as we can now submit multiple requests before
we start processing responses. Passing the attr set to NlMsgs decoding
no longer makes sense (if it ever did), attr set may differ message
by messsage. Isolate the part of decoding responsible for attr-set
specific interpretation and call it once we identified the correct op.
Without this fix performing SET operation on an ethtool socket, while
being subscribed to notifications causes:
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1096, in _op
# Exception| return self._ops(ops)[0]
# Exception| ~~~~~~~~~^^^^^
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1040, in _ops
# Exception| nms = NlMsgs(reply, attr_space=op.attr_set)
# Exception| ^^^^^^^^^^^
The value of op we use on line 1040 is stale, it comes form the previous
loop. If a notification comes before a response we will update op to None
and the next iteration thru the loop will break with the trace above.
Fixes: 6fda63c45fe8 ("tools/net/ynl: fix cli.py --subscribe feature")
Fixes: ba8be00f68f5 ("tools/net/ynl: Add multi message support to ynl")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250618171746.1201403-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net: atm: protect dev_lec[] with a mutex
Based on an initial syzbot report.
First patch is adding lec_mutex to address the report.
Second patch protects /proc/net/atm/lec operations.
We probably should delete this driver, it seems quite broken.
====================
Link: https://patch.msgid.link/20250618140844.1686882-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
/proc/net/atm/lec must ensure safety against dev_lec[] changes.
It appears it had dev_put() calls without prior dev_hold(),
leading to imbalance and UAF.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com> # Minor atm contributor
Link: https://patch.msgid.link/20250618140844.1686882-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found its way in net/atm/lec.c, and found an error path
in lecd_attach() could leave a dangling pointer in dev_lec[].
Add a mutex to protect dev_lecp[] uses from lecd_attach(),
lec_vcc_attach() and lec_mcast_attach().
Following patch will use this mutex for /proc/net/atm/lec.
BUG: KASAN: slab-use-after-free in lecd_attach net/atm/lec.c:751 [inline]
BUG: KASAN: slab-use-after-free in lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008
Read of size 8 at addr ffff88807c7b8e68 by task syz.1.17/6142
CPU: 1 UID: 0 PID: 6142 Comm: syz.1.17 Not tainted 6.16.0-rc1-syzkaller-00239-g08215f5486ec #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0xcd/0x680 mm/kasan/report.c:521
kasan_report+0xe0/0x110 mm/kasan/report.c:634
lecd_attach net/atm/lec.c:751 [inline]
lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008
do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159
sock_do_ioctl+0x118/0x280 net/socket.c:1190
sock_ioctl+0x227/0x6b0 net/socket.c:1311
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl fs/ioctl.c:893 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Allocated by task 6132:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4328 [inline]
__kvmalloc_node_noprof+0x27b/0x620 mm/slub.c:5015
alloc_netdev_mqs+0xd2/0x1570 net/core/dev.c:11711
lecd_attach net/atm/lec.c:737 [inline]
lane_ioctl+0x17db/0x23e0 net/atm/lec.c:1008
do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159
sock_do_ioctl+0x118/0x280 net/socket.c:1190
sock_ioctl+0x227/0x6b0 net/socket.c:1311
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl fs/ioctl.c:893 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 6132:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:576
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2381 [inline]
slab_free mm/slub.c:4643 [inline]
kfree+0x2b4/0x4d0 mm/slub.c:4842
free_netdev+0x6c5/0x910 net/core/dev.c:11892
lecd_attach net/atm/lec.c:744 [inline]
lane_ioctl+0x1ce8/0x23e0 net/atm/lec.c:1008
do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159
sock_do_ioctl+0x118/0x280 net/socket.c:1190
sock_ioctl+0x227/0x6b0 net/socket.c:1311
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl fs/ioctl.c:893 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+8b64dec3affaed7b3af5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6852c6f6.050a0220.216029.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250618140844.1686882-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The message "Error getting PHY irq. Use polling instead"
is emitted when the mlxbf_gige driver is loaded by the
kernel before the associated gpio-mlxbf driver, and thus
the call to get the PHY IRQ fails since it is not yet
available. The driver probe() must return -EPROBE_DEFER
if acpi_dev_gpio_irq_get_by() returns the same.
Fixes: 6c2a6ddca763 ("net: mellanox: mlxbf_gige: Replace non-standard interrupt handling")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250618135902.346-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
airoha_ppe_foe_get_entry routine can return NULL, so check the returned
pointer is not NULL in airoha_ppe_foe_flow_l2_entry_update()
Fixes: b81e0f2b58be3 ("net: airoha: Add FLOW_CLS_STATS callback support")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250618-check-ret-from-airoha_ppe_foe_get_entry-v2-1-068dcea3cc66@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Setting tty->disc_data before opening the NCI device means we need to
clean it up on error paths. This also opens some short window if device
starts sending data, even before NCIUARTSETDRIVER IOCTL succeeded
(broken hardware?). Close the window by exposing tty->disc_data only on
the success path, when opening of the NCI device and try_module_get()
succeeds.
The code differs in error path in one aspect: tty->disc_data won't be
ever assigned thus NULL-ified. This however should not be relevant
difference, because of "tty->disc_data=NULL" in nci_uart_tty_open().
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 9961127d4bce ("NFC: nci: add generic uart support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20250618073649.25049-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller reported a null-ptr-deref in sock_omalloc() while allocating
a CALIPSO option. [0]
The NULL is of struct sock, which was fetched by sk_to_full_sk() in
calipso_req_setattr().
Since commit a1a5344ddbe8 ("tcp: avoid two atomic ops for syncookies"),
reqsk->rsk_listener could be NULL when SYN Cookie is returned to its
client, as hinted by the leading SYN Cookie log.
Here are 3 options to fix the bug:
1) Return 0 in calipso_req_setattr()
2) Return an error in calipso_req_setattr()
3) Alaways set rsk_listener
1) is no go as it bypasses LSM, but 2) effectively disables SYN Cookie
for CALIPSO. 3) is also no go as there have been many efforts to reduce
atomic ops and make TCP robust against DDoS. See also commit 3b24d854cb35
("tcp/dccp: do not touch listener sk_refcnt under synflood").
As of the blamed commit, SYN Cookie already did not need refcounting,
and no one has stumbled on the bug for 9 years, so no CALIPSO user will
care about SYN Cookie.
Let's return an error in calipso_req_setattr() and calipso_req_delattr()
in the SYN Cookie case.
This can be reproduced by [1] on Fedora and now connect() of nc times out.
[0]:
TCP: request_sock_TCPv6: Possible SYN flooding on port [::]:20002. Sending cookies.
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 3 UID: 0 PID: 12262 Comm: syz.1.2611 Not tainted 6.14.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_pnet include/net/net_namespace.h:406 [inline]
RIP: 0010:sock_net include/net/sock.h:655 [inline]
RIP: 0010:sock_kmalloc+0x35/0x170 net/core/sock.c:2806
Code: 89 d5 41 54 55 89 f5 53 48 89 fb e8 25 e3 c6 fd e8 f0 91 e3 00 48 8d 7b 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b
RSP: 0018:ffff88811af89038 EFLAGS: 00010216
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff888105266400
RDX: 0000000000000006 RSI: ffff88800c890000 RDI: 0000000000000030
RBP: 0000000000000050 R08: 0000000000000000 R09: ffff88810526640e
R10: ffffed1020a4cc81 R11: ffff88810526640f R12: 0000000000000000
R13: 0000000000000820 R14: ffff888105266400 R15: 0000000000000050
FS: 00007f0653a07640(0000) GS:ffff88811af80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f863ba096f4 CR3: 00000000163c0005 CR4: 0000000000770ef0
PKRU: 80000000
Call Trace:
<IRQ>
ipv6_renew_options+0x279/0x950 net/ipv6/exthdrs.c:1288
calipso_req_setattr+0x181/0x340 net/ipv6/calipso.c:1204
calipso_req_setattr+0x56/0x80 net/netlabel/netlabel_calipso.c:597
netlbl_req_setattr+0x18a/0x440 net/netlabel/netlabel_kapi.c:1249
selinux_netlbl_inet_conn_request+0x1fb/0x320 security/selinux/netlabel.c:342
selinux_inet_conn_request+0x1eb/0x2c0 security/selinux/hooks.c:5551
security_inet_conn_request+0x50/0xa0 security/security.c:4945
tcp_v6_route_req+0x22c/0x550 net/ipv6/tcp_ipv6.c:825
tcp_conn_request+0xec8/0x2b70 net/ipv4/tcp_input.c:7275
tcp_v6_conn_request+0x1e3/0x440 net/ipv6/tcp_ipv6.c:1328
tcp_rcv_state_process+0xafa/0x52b0 net/ipv4/tcp_input.c:6781
tcp_v6_do_rcv+0x8a6/0x1a40 net/ipv6/tcp_ipv6.c:1667
tcp_v6_rcv+0x505e/0x5b50 net/ipv6/tcp_ipv6.c:1904
ip6_protocol_deliver_rcu+0x17c/0x1da0 net/ipv6/ip6_input.c:436
ip6_input_finish+0x103/0x180 net/ipv6/ip6_input.c:480
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip6_input+0x13c/0x6b0 net/ipv6/ip6_input.c:491
dst_input include/net/dst.h:469 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
ip6_rcv_finish+0xb6/0x490 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ipv6_rcv+0xf9/0x490 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x12e/0x1f0 net/core/dev.c:5896
__netif_receive_skb+0x1d/0x170 net/core/dev.c:6009
process_backlog+0x41e/0x13b0 net/core/dev.c:6357
__napi_poll+0xbd/0x710 net/core/dev.c:7191
napi_poll net/core/dev.c:7260 [inline]
net_rx_action+0x9de/0xde0 net/core/dev.c:7382
handle_softirqs+0x19a/0x770 kernel/softirq.c:561
do_softirq.part.0+0x36/0x70 kernel/softirq.c:462
</IRQ>
<TASK>
do_softirq arch/x86/include/asm/preempt.h:26 [inline]
__local_bh_enable_ip+0xf1/0x110 kernel/softirq.c:389
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
__dev_queue_xmit+0xc2a/0x3c40 net/core/dev.c:4679
dev_queue_xmit include/linux/netdevice.h:3313 [inline]
neigh_hh_output include/net/neighbour.h:523 [inline]
neigh_output include/net/neighbour.h:537 [inline]
ip6_finish_output2+0xd69/0x1f80 net/ipv6/ip6_output.c:141
__ip6_finish_output net/ipv6/ip6_output.c:215 [inline]
ip6_finish_output+0x5dc/0xd60 net/ipv6/ip6_output.c:226
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip6_output+0x24b/0x8d0 net/ipv6/ip6_output.c:247
dst_output include/net/dst.h:459 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip6_xmit+0xbbc/0x20d0 net/ipv6/ip6_output.c:366
inet6_csk_xmit+0x39a/0x720 net/ipv6/inet6_connection_sock.c:135
__tcp_transmit_skb+0x1a7b/0x3b40 net/ipv4/tcp_output.c:1471
tcp_transmit_skb net/ipv4/tcp_output.c:1489 [inline]
tcp_send_syn_data net/ipv4/tcp_output.c:4059 [inline]
tcp_connect+0x1c0c/0x4510 net/ipv4/tcp_output.c:4148
tcp_v6_connect+0x156c/0x2080 net/ipv6/tcp_ipv6.c:333
__inet_stream_connect+0x3a7/0xed0 net/ipv4/af_inet.c:677
tcp_sendmsg_fastopen+0x3e2/0x710 net/ipv4/tcp.c:1039
tcp_sendmsg_locked+0x1e82/0x3570 net/ipv4/tcp.c:1091
tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1358
inet6_sendmsg+0xb9/0x150 net/ipv6/af_inet6.c:659
sock_sendmsg_nosec net/socket.c:718 [inline]
__sock_sendmsg+0xf4/0x2a0 net/socket.c:733
__sys_sendto+0x29a/0x390 net/socket.c:2187
__do_sys_sendto net/socket.c:2194 [inline]
__se_sys_sendto net/socket.c:2190 [inline]
__x64_sys_sendto+0xe1/0x1c0 net/socket.c:2190
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc3/0x1d0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f06553c47ed
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0653a06fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f0655605fa0 RCX: 00007f06553c47ed
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b
RBP: 00007f065545db38 R08: 0000200000000140 R09: 000000000000001c
R10: f7384d4ea84b01bd R11: 0000000000000246 R12: 0000000000000000
R13: 00007f0655605fac R14: 00007f0655606038 R15: 00007f06539e7000
</TASK>
Modules linked in:
[1]:
dnf install -y selinux-policy-targeted policycoreutils netlabel_tools procps-ng nmap-ncat
mount -t selinuxfs none /sys/fs/selinux
load_policy
netlabelctl calipso add pass doi:1
netlabelctl map del default
netlabelctl map add default address:::1 protocol:calipso,1
sysctl net.ipv4.tcp_syncookies=2
nc -l ::1 80 &
nc ::1 80
Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: John Cheung <john.cs.hey@gmail.com>
Closes: https://lore.kernel.org/netdev/CAP=Rh=MvfhrGADy+-WJiftV2_WzMH4VEhEFmeT28qY+4yxNu4w@mail.gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20250617224125.17299-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Brett Creeley is taking ownership of AMD/Pensando drivers while I wander
off into the sunset with my retirement this month. I'll still keep an
eye out on a few topics for awhile, and maybe do some free-lance work in
the future.
Meanwhile, thank you all for the fun and support and the many learning
opportunities :-).
Special thanks go to DaveM for merging my first patch long ago, the big
ionic patchset a few years ago, and my last patchset last week.
Redirect things to a non-corporate account.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20250616224437.56581-1-shannon.nelson@amd.com
[Jakub: squash in the .mailmap update]
Signed-off-by: Shannon Nelson <sln@onemain.com>
Link: https://patch.msgid.link/20250619010603.1173141-1-sln@onemain.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the GuC fails to load we declare the device wedged. However, the
very first GuC load attempt on GT0 (from xe_gt_init_hwconfig) is done
before the GT1 GuC objects are initialized, so things go bad when the
wedge code attempts to cleanup GT1. To fix this, check the initialization
status in the functions called during wedge.
Fixes: 7dbe8af13c18 ("drm/xe: Wedge the entire device")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: stable@vger.kernel.org # v6.12+: 1e1981b16bb1: drm/xe: Fix taking invalid lock on wedge
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250611214453.1159846-2-daniele.ceraolospurio@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 0b93b7dcd9eb888a6ac7546560877705d4ad61bf)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
It is better to print out the non supported num_dmics than printing that
it is not matching with 2 or 4.
Fixes: 2fbeff33381c ("ASoC: Intel: add sof_sdw_get_tplg_files ops")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev>
Link: https://patch.msgid.link/20250619104705.26057-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
It should rather use xe_map_memset() as the BO is created with
XE_BO_FLAG_VRAM_IF_DGFX in xe_guc_pc_init().
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250612-vmap-vaddr-v1-1-26238ed443eb@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 21cf47d89fba353b2d5915ba4718040c4cb955d3)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
This allows for additional L2 caching modes.
Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340")
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250612-wa-14022085890-v4-2-94ba5dcc1e30@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 6ab42fa03d4c88a0ddf5e56e62794853b198e7bf)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Sanity check the values for queue depth and number of queues
we get from userspace when adding a device.
Signed-off-by: Ronnie Sahlberg <rsahlberg@whamcloud.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Fixes: 62fe99cef94a ("ublk: add read()/write() support for ublk char device")
Link: https://lore.kernel.org/r/20250619021031.181340-1-ronniesahlberg@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
tianshuo han reported a remotely-triggerable crash if the client sends a
kernel RPC server a specially crafted packet. If decoding the RPC reply
fails in such a way that SVC_GARBAGE is returned without setting the
rq_accept_statp pointer, then that pointer can be dereferenced and a
value stored there.
If it's the first time the thread has processed an RPC, then that
pointer will be set to NULL and the kernel will crash. In other cases,
it could create a memory scribble.
The server sunrpc code treats a SVC_GARBAGE return from svc_authenticate
or pg_authenticate as if it should send a GARBAGE_ARGS reply. RFC 5531
says that if authentication fails that the RPC should be rejected
instead with a status of AUTH_ERR.
Handle a SVC_GARBAGE return as an AUTH_ERROR, with a reason of
AUTH_BADCRED instead of returning GARBAGE_ARGS in that case. This
sidesteps the whole problem of touching the rpc_accept_statp pointer in
this situation and avoids the crash.
Cc: stable@kernel.org
Fixes: 29cd2927fb91 ("SUNRPC: Fix encoding of accepted but unsuccessful RPC replies")
Reported-by: tianshuo han <hantianshuo233@gmail.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The old nfsdfs interface for starting a server with multiple pools
handles the special case of a single entry array passed down from
userland by distributing the threads over every NUMA node.
The netlink control interface however constructs an array of length
nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
defeats the special casing that the old interface relies on.
Change nfsd_nl_threads_set_doit() to pass down the array from userland
as-is.
Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
Cc: stable@vger.kernel.org
Reported-by: Mike Snitzer <snitzer@kernel.org>
Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@kernel.org/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
lan743x_ptp_io_event_clock_get()
Before calling lan743x_ptp_io_event_clock_get(), the 'channel' value
is checked against the maximum value of PCI11X1X_PTP_IO_MAX_CHANNELS(8).
This seems correct and aligns with the PTP interrupt status register
(PTP_INT_STS) specifications.
However, lan743x_ptp_io_event_clock_get() writes to ptp->extts[] with
only LAN743X_PTP_N_EXTTS(4) elements, using channel as an index:
lan743x_ptp_io_event_clock_get(..., u8 channel,...)
{
...
/* Update Local timestamp */
extts = &ptp->extts[channel];
extts->ts.tv_sec = sec;
...
}
To avoid an out-of-bounds write and utilize all the supported GPIO
inputs, set LAN743X_PTP_N_EXTTS to 8.
Detected using the static analysis tool - Svace.
Fixes: 60942c397af6 ("net: lan743x: Add support for PTP-IO Event Input External Timestamp (extts)")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Rengarajan S <rengarajan.s@microchip.com>
Link: https://patch.msgid.link/20250616113743.36284-1-aleksei.kodanev@bell-sw.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When one of two zones composing a DUP block group is a conventional zone,
we have the zone_info[i]->alloc_offset = WP_CONVENTIONAL. That will, of
course, not match the write pointer of the other zone, and fails that
block group.
This commit solves that issue by properly recovering the emulated write
pointer from the last allocated extent. The offset for the SINGLE, DUP,
and RAID1 are straight-forward: it is same as the end of last allocated
extent. The RAID0 and RAID10 are a bit tricky that we need to do the math
of striping.
This is the kernel equivalent of Naohiro's user-space commit:
"btrfs-progs: zoned: fix alloc_offset calculation for partly
conventional block groups".
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
There is syzbot based reproducer that can crash the kernel, with the
following call trace: (With some debug output added)
DEBUG: rescue=ibadroots parsed
BTRFS: device fsid 14d642db-7b15-43e4-81e6-4b8fac6a25f8 devid 1 transid 8 /dev/loop0 (7:0) scanned by repro (1010)
BTRFS info (device loop0): first mount of filesystem 14d642db-7b15-43e4-81e6-4b8fac6a25f8
BTRFS info (device loop0): using blake2b (blake2b-256-generic) checksum algorithm
BTRFS info (device loop0): using free-space-tree
BTRFS warning (device loop0): checksum verify failed on logical 5312512 mirror 1 wanted 0xb043382657aede36608fd3386d6b001692ff406164733d94e2d9a180412c6003 found 0x810ceb2bacb7f0f9eb2bf3b2b15c02af867cb35ad450898169f3b1f0bd818651 level 0
DEBUG: read tree root path failed for tree csum, ret=-5
BTRFS warning (device loop0): checksum verify failed on logical 5328896 mirror 1 wanted 0x51be4e8b303da58e6340226815b70e3a93592dac3f30dd510c7517454de8567a found 0x51be4e8b303da58e634022a315b70e3a93592dac3f30dd510c7517454de8567a level 0
BTRFS warning (device loop0): checksum verify failed on logical 5292032 mirror 1 wanted 0x1924ccd683be9efc2fa98582ef58760e3848e9043db8649ee382681e220cdee4 found 0x0cb6184f6e8799d9f8cb335dccd1d1832da1071d12290dab3b85b587ecacca6e level 0
process 'repro' launched './file2' with NULL argv: empty string added
DEBUG: no csum root, idatacsums=0 ibadroots=134217728
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f]
CPU: 5 UID: 0 PID: 1010 Comm: repro Tainted: G OE 6.15.0-custom+ #249 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
RIP: 0010:btrfs_lookup_csum+0x93/0x3d0 [btrfs]
Call Trace:
<TASK>
btrfs_lookup_bio_sums+0x47a/0xdf0 [btrfs]
btrfs_submit_bbio+0x43e/0x1a80 [btrfs]
submit_one_bio+0xde/0x160 [btrfs]
btrfs_readahead+0x498/0x6a0 [btrfs]
read_pages+0x1c3/0xb20
page_cache_ra_order+0x4b5/0xc20
filemap_get_pages+0x2d3/0x19e0
filemap_read+0x314/0xde0
__kernel_read+0x35b/0x900
bprm_execve+0x62e/0x1140
do_execveat_common.isra.0+0x3fc/0x520
__x64_sys_execveat+0xdc/0x130
do_syscall_64+0x54/0x1d0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
---[ end trace 0000000000000000 ]---
[CAUSE]
Firstly the fs has a corrupted csum tree root, thus to mount the fs we
have to go "ro,rescue=ibadroots" mount option.
Normally with that mount option, a bad csum tree root should set
BTRFS_FS_STATE_NO_DATA_CSUMS flag, so that any future data read will
ignore csum search.
But in this particular case, we have the following call trace that
caused NULL csum root, but not setting BTRFS_FS_STATE_NO_DATA_CSUMS:
load_global_roots_objectid():
ret = btrfs_search_slot();
/* Succeeded */
btrfs_item_key_to_cpu()
found = true;
/* We found the root item for csum tree. */
root = read_tree_root_path();
if (IS_ERR(root)) {
if (!btrfs_test_opt(fs_info, IGNOREBADROOTS))
/*
* Since we have rescue=ibadroots mount option,
* @ret is still 0.
*/
break;
if (!found || ret) {
/* @found is true, @ret is 0, error handling for csum
* tree is skipped.
*/
}
This means we completely skipped to set BTRFS_FS_STATE_NO_DATA_CSUMS if
the csum tree is corrupted, which results unexpected later csum lookup.
[FIX]
If read_tree_root_path() failed, always populate @ret to the error
number.
As at the end of the function, we need @ret to determine if we need to
do the extra error handling for csum tree.
Fixes: abed4aaae4f7 ("btrfs: track the csum, extent, and free space trees in a rb tree")
Reported-by: Zhiyu Zhang <zhiyuzhang999@gmail.com>
Reported-by: Longxing Li <coregee2000@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Syzbot reported an assertion failure due to an attempt to add a delayed
iput after we have set BTRFS_FS_STATE_NO_DELAYED_IPUT in the fs_info
state:
WARNING: CPU: 0 PID: 65 at fs/btrfs/inode.c:3420 btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Modules linked in:
CPU: 0 UID: 0 PID: 65 Comm: kworker/u8:4 Not tainted 6.15.0-next-20250530-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Workqueue: btrfs-endio-write btrfs_work_helper
RIP: 0010:btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Code: 4e ad 5d (...)
RSP: 0018:ffffc9000213f780 EFLAGS: 00010293
RAX: ffffffff83c635b7 RBX: ffff888058920000 RCX: ffff88801c769e00
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888058921b67 R09: 1ffff1100b12436c
R10: dffffc0000000000 R11: ffffed100b12436d R12: 0000000000000001
R13: dffffc0000000000 R14: ffff88807d748000 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffff888125c53000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000000bd038 CR3: 000000006a142000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_put_ordered_extent+0x19f/0x470 fs/btrfs/ordered-data.c:635
btrfs_finish_one_ordered+0x11d8/0x1b10 fs/btrfs/inode.c:3312
btrfs_work_helper+0x399/0xc20 fs/btrfs/async-thread.c:312
process_one_work kernel/workqueue.c:3238 [inline]
process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3321
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402
kthread+0x70e/0x8a0 kernel/kthread.c:464
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
This can happen due to a race with the async reclaim worker like this:
1) The async metadata reclaim worker enters shrink_delalloc(), which calls
btrfs_start_delalloc_roots() with an nr_pages argument that has a value
less than LONG_MAX, and that in turn enters start_delalloc_inodes(),
which sets the local variable 'full_flush' to false because
wbc->nr_to_write is less than LONG_MAX;
2) There it finds inode X in a root's delalloc list, grabs a reference for
inode X (with igrab()), and triggers writeback for it with
filemap_fdatawrite_wbc(), which creates an ordered extent for inode X;
3) The unmount sequence starts from another task, we enter close_ctree()
and we flush the workqueue fs_info->endio_write_workers, which waits
for the ordered extent for inode X to complete and when dropping the
last reference of the ordered extent, with btrfs_put_ordered_extent(),
when we call btrfs_add_delayed_iput() we don't add the inode to the
list of delayed iputs because it has a refcount of 2, so we decrement
it to 1 and return;
4) Shortly after at close_ctree() we call btrfs_run_delayed_iputs() which
runs all delayed iputs, and then we set BTRFS_FS_STATE_NO_DELAYED_IPUT
in the fs_info state;
5) The async reclaim worker, after calling filemap_fdatawrite_wbc(), now
calls btrfs_add_delayed_iput() for inode X and there we trigger an
assertion failure since the fs_info state has the flag
BTRFS_FS_STATE_NO_DELAYED_IPUT set.
Fix this by setting BTRFS_FS_STATE_NO_DELAYED_IPUT only after we wait for
the async reclaim workers to finish, after we call cancel_work_sync() for
them at close_ctree(), and by running delayed iputs after wait for the
reclaim workers to finish and before setting the bit.
This race was recently introduced by commit 19e60b2a95f5 ("btrfs: add
extra warning if delayed iput is added when it's not allowed"). Without
the new validation at btrfs_add_delayed_iput(), this described scenario
was safe because close_ctree() later calls btrfs_commit_super(). That
will run any final delayed iputs added by reclaim workers in the window
between the btrfs_run_delayed_iputs() and the the reclaim workers being
shut down.
Reported-by: syzbot+0ed30ad435bf6f5b7a42@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/6840481c.a00a0220.d4325.000c.GAE@google.com/T/#u
Fixes: 19e60b2a95f5 ("btrfs: add extra warning if delayed iput is added when it's not allowed")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When building the free space tree with the block group tree feature
enabled, we can hit an assertion failure like this:
BTRFS info (device loop0 state M): rebuilding free space tree
assertion failed: ret == 0, in fs/btrfs/free-space-tree.c:1102
------------[ cut here ]------------
kernel BUG at fs/btrfs/free-space-tree.c:1102!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Modules linked in:
CPU: 1 UID: 0 PID: 6592 Comm: syz-executor322 Not tainted 6.15.0-rc7-syzkaller-gd7fa1af5b33e #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : populate_free_space_tree+0x514/0x518 fs/btrfs/free-space-tree.c:1102
lr : populate_free_space_tree+0x514/0x518 fs/btrfs/free-space-tree.c:1102
sp : ffff8000a4ce7600
x29: ffff8000a4ce76e0 x28: ffff0000c9bc6000 x27: ffff0000ddfff3d8
x26: ffff0000ddfff378 x25: dfff800000000000 x24: 0000000000000001
x23: ffff8000a4ce7660 x22: ffff70001499cecc x21: ffff0000e1d8c160
x20: ffff0000e1cb7800 x19: ffff0000e1d8c0b0 x18: 00000000ffffffff
x17: ffff800092f39000 x16: ffff80008ad27e48 x15: ffff700011e740c0
x14: 1ffff00011e740c0 x13: 0000000000000004 x12: ffffffffffffffff
x11: ffff700011e740c0 x10: 0000000000ff0100 x9 : 94ef24f55d2dbc00
x8 : 94ef24f55d2dbc00 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff8000a4ce6f98 x4 : ffff80008f415ba0 x3 : ffff800080548ef0
x2 : 0000000000000000 x1 : 0000000100000000 x0 : 000000000000003e
Call trace:
populate_free_space_tree+0x514/0x518 fs/btrfs/free-space-tree.c:1102 (P)
btrfs_rebuild_free_space_tree+0x14c/0x54c fs/btrfs/free-space-tree.c:1337
btrfs_start_pre_rw_mount+0xa78/0xe10 fs/btrfs/disk-io.c:3074
btrfs_remount_rw fs/btrfs/super.c:1319 [inline]
btrfs_reconfigure+0x828/0x2418 fs/btrfs/super.c:1543
reconfigure_super+0x1d4/0x6f0 fs/super.c:1083
do_remount fs/namespace.c:3365 [inline]
path_mount+0xb34/0xde0 fs/namespace.c:4200
do_mount fs/namespace.c:4221 [inline]
__do_sys_mount fs/namespace.c:4432 [inline]
__se_sys_mount fs/namespace.c:4409 [inline]
__arm64_sys_mount+0x3e8/0x468 fs/namespace.c:4409
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Code: f0047182 91178042 528089c3 9771d47b (d4210000)
---[ end trace 0000000000000000 ]---
This happens because we are processing an empty block group, which has
no extents allocated from it, there are no items for this block group,
including the block group item since block group items are stored in a
dedicated tree when using the block group tree feature. It also means
this is the block group with the highest start offset, so there are no
higher keys in the extent root, hence btrfs_search_slot_for_read()
returns 1 (no higher key found).
Fix this by asserting 'ret' is 0 only if the block group tree feature
is not enabled, in which case we should find a block group item for
the block group since it's stored in the extent root and block group
item keys are greater than extent item keys (the value for
BTRFS_BLOCK_GROUP_ITEM_KEY is 192 and for BTRFS_EXTENT_ITEM_KEY and
BTRFS_METADATA_ITEM_KEY the values are 168 and 169 respectively).
In case 'ret' is 1, we just need to add a record to the free space
tree which spans the whole block group, and we can achieve this by
making 'ret == 0' as the while loop's condition.
Reported-by: syzbot+36fae25c35159a763a2a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/6841dca8.a00a0220.d4325.0020.GAE@google.com/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If there's an unexpected (invalid) extent type, we just silently ignore
it. This means a corruption or some bug somewhere, so instead return
-EUCLEAN to the caller, making log replay fail, and print an error message
with relevant information.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In a few places where we call read_one_inode(), if we get a NULL pointer
we end up jumping into an error path, or fallthrough in case of
__add_inode_ref(), where we then do something like this:
iput(&inode->vfs_inode);
which results in an invalid inode pointer that triggers an invalid memory
access, resulting in a crash.
Fix this by making sure we don't do such dereferences.
Fixes: b4c50cbb01a1 ("btrfs: return a btrfs_inode from read_one_inode()")
CC: stable@vger.kernel.org # 6.15+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we break out of the loop because an extent buffer doesn't have the bit
EXTENT_BUFFER_TREE_REF set, we end up unlocking the xarray twice, once
before we tested for the bit and break out of the loop, and once again
after the loop.
Fix this by testing the bit and exiting before unlocking the xarray.
The time spent testing the bit is negligible and it's not worth trying
to do that outside the critical section delimited by the xarray lock due
to the code complexity required to avoid it (like using a local boolean
variable to track whether the xarray is locked or not). The xarray unlock
only needs to be done before calling release_extent_buffer(), as that
needs to lock the xarray (through xa_cmpxchg_irq()) and does a more
significant amount of work.
Fixes: 19d7f65f032f ("btrfs: convert the buffer_radix to an xarray")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/linux-btrfs/aDRNDU0GM1_D4Xnw@stanley.mountain/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|