summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-07Merge tag 'vfs-6.14-rc2.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix fsnotify FMODE_NONOTIFY* handling. This also disables fsnotify on all pseudo files by default apart from very select exceptions. This carries a regression risk so we need to watch out and adapt accordingly. However, it is overall a significant improvement over the current status quo where every rando file can get fsnotify enabled. - Cleanup and simplify lockref_init() after recent lockref changes. - Fix vboxfs build with gcc-15. - Add an assert into inode_set_cached_link() to catch corrupt links. - Allow users to also use an empty string check to detect whether a given mount option string was empty or not. - Fix how security options were appended to statmount()'s ->mnt_opt field. - Fix statmount() selftests to always check the returned mask. - Fix uninitialized value in vfs_statx_path(). - Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading and preserve extensibility. * tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: sanity check the length passed to inode_set_cached_link() pidfs: improve ioctl handling fsnotify: disable pre-content and permission events by default selftests: always check mask returned by statmount(2) fsnotify: disable notification by default for all pseudo files fs: fix adding security options to statmount.mnt_opt fsnotify: use accessor to set FMODE_NONOTIFY_* lockref: remove count argument of lockref_init gfs2: switch to lockref_init(..., 1) gfs2: use lockref_init for gl_lockref statmount: let unset strings be empty vboxsf: fix building with GCC 15 fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
2025-02-07Merge tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Nothing major, things continue to be fairly quiet over here. - add a SubmittingPatches to clarify that patches submitted for bcachefs do, in fact, need to be tested - discard path now correctly issues journal flushes when needed, this fixes performance issues when the filesystem is nearly full and we're bottlenecked on copygc - fix a bug that could cause the pending rebalance work accounting to be off when devices are being onlined/offlined; users should report if they are still seeing this - and a few more trivial ones" * tag 'bcachefs-2025-02-06.2' of git://evilpiepirate.org/bcachefs: bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on bch_extent_rebalance bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit() bcachefs: Fix discard path journal flushing bcachefs: fix deadlock in journal_entry_open() bcachefs: fix incorrect pointer check in __bch2_subvolume_delete() bcachefs docs: SubmittingPatches.rst
2025-02-07MAINTAINERS: Remove myselfHector Martin
I no longer have any faith left in the kernel development process or community management approach. Apple/ARM platform development will continue downstream. If I feel like sending some patches upstream in the future myself for whatever subtree I may, or I may not. Anyone who feels like fighting the upstreaming fight themselves is welcome to do so. Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-07MAINTAINERS: Move Pavel to kernel.org addressPavel Machek
I need to filter my emails better, switch to pavel@kernel.org address to help with that. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-07vfs: sanity check the length passed to inode_set_cached_link()Mateusz Guzik
This costs a strlen() call when instatianating a symlink. Preferably it would be hidden behind VFS_WARN_ON (or compatible), but there is no such facility at the moment. With the facility in place the call can be patched out in production kernels. In the meantime, since the cost is being paid unconditionally, use the result to a fixup the bad caller. This is not expected to persist in the long run (tm). Sample splat: bad length passed for symlink [/tmp/syz-imagegen43743633/file0/file0] (got 131109, expected 37) [rest of WARN blurp goes here] Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250204213207.337980-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07pidfs: improve ioctl handlingChristian Brauner
Pidfs supports extensible and non-extensible ioctls. The extensible ioctls need to check for the ioctl number itself not just the ioctl command otherwise both backward- and forward compatibility are broken. The pidfs ioctl handler also needs to look at the type of the ioctl command to guard against cases where "[...] a daemon receives some random file descriptor from a (potentially less privileged) client and expects the FD to be of some specific type, it might call ioctl() on this FD with some type-specific command and expect the call to fail if the FD is of the wrong type; but due to the missing type check, the kernel instead performs some action that userspace didn't expect." (cf. [1]] Link: https://lore.kernel.org/r/20250204-work-pidfs-ioctl-v1-1-04987d239575@kernel.org Link: https://lore.kernel.org/r/CAG48ez2K9A5GwtgqO31u9ZL292we8ZwAA=TJwwEv7wRuJ3j4Lw@mail.gmail.com [1] Fixes: 8ce352818820 ("pidfs: check for valid ioctl commands") Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org # v6.13; please backport with 8ce352818820 ("pidfs: check for valid ioctl commands") Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07Merge patch series "Fix for huge faults regression"Christian Brauner
Amir Goldstein <amir73il@gmail.com> says: The two Fix patches have been tested by Alex together and each one independently. I also verified that they pass the LTP inoityf/fanotify tests. * patches from https://lore.kernel.org/r/20250203223205.861346-1-amir73il@gmail.com: fsnotify: disable pre-content and permission events by default fsnotify: disable notification by default for all pseudo files fsnotify: use accessor to set FMODE_NONOTIFY_* Link: https://lore.kernel.org/r/20250203223205.861346-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fsnotify: disable pre-content and permission events by defaultAmir Goldstein
After introducing pre-content events, we had a regression related to disabling huge faults on files that should never have pre-content events enabled. This happened because the default f_mode of allocated files (0) does not disable pre-content events. Pre-content events are disabled in file_set_fsnotify_mode_by_watchers() but internal files may not get to call this helper. Initialize f_mode to disable permission and pre-content events for all files and if needed they will be enabled for the callers of file_set_fsnotify_mode_by_watchers(). Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/ Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-4-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07selftests: always check mask returned by statmount(2)Miklos Szeredi
STATMOUNT_MNT_OPTS can actually be missing if there are no options. This is a change of behavior since 75ead69a7173 ("fs: don't let statmount return empty strings"). The other checks shouldn't actually trigger, but add them for correctness and for easier debugging if the test fails. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129160641.35485-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fsnotify: disable notification by default for all pseudo filesAmir Goldstein
Most pseudo files are not applicable for fsnotify events at all, let alone to the new pre-content events. Disable notifications to all files allocated with alloc_file_pseudo() and enable legacy inotify events for the specific cases of pipe and socket, which have known users of inotify events. Pre-content events are also kept disabled for sockets and pipes. Fixes: 20bf82a898b6 ("mm: don't allow huge faults for files with pre content watches") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/linux-fsdevel/20250131121703.1e4d00a7.alex.williamson@redhat.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wi2pThSVY=zhO=ZKxViBj5QCRX-=AS2+rVknQgJnHXDFg@mail.gmail.com/ Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-3-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fs: fix adding security options to statmount.mnt_optMiklos Szeredi
Prepending security options was made conditional on sb->s_op->show_options, but security options are independent of sb options. Fixes: 056d33137bf9 ("fs: prepend statmount.mnt_opts string with security_sb_mnt_opts()") Fixes: f9af549d1fd3 ("fs: export mount options via statmount()") Cc: stable@vger.kernel.org # v6.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129151253.33241-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fsnotify: use accessor to set FMODE_NONOTIFY_*Amir Goldstein
The FMODE_NONOTIFY_* bits are a 2-bits mode. Open coding manipulation of those bits is risky. Use an accessor file_set_fsnotify_mode() to set the mode. Rename file_set_fsnotify_mode() => file_set_fsnotify_mode_from_watchers() to make way for the simple accessor name. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20250203223205.861346-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07Merge patch series "further lockref cleanups"Christian Brauner
Andreas Gruenbacher <agruenba@redhat.com> says: Here's an updated version with an additional comment saying that lockref_init() initializes count to 1. * patches from https://lore.kernel.org/r/20250130135624.1899988-1-agruenba@redhat.com: lockref: remove count argument of lockref_init gfs2: switch to lockref_init(..., 1) gfs2: use lockref_init for gl_lockref Link: https://lore.kernel.org/r/20250130135624.1899988-1-agruenba@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07lockref: remove count argument of lockref_initAndreas Gruenbacher
All users of lockref_init() now initialize the count to 1, so hardcode that and remove the count argument. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lore.kernel.org/r/20250130135624.1899988-4-agruenba@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07gfs2: switch to lockref_init(..., 1)Andreas Gruenbacher
In qd_alloc(), initialize the lockref count to 1 to cover the common case. Compensate for that in gfs2_quota_init() by adjusting the count back down to 0; this only occurs when mounting the filesystem rw. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lore.kernel.org/r/20250130135624.1899988-3-agruenba@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07gfs2: use lockref_init for gl_lockrefAndreas Gruenbacher
Move the initialization of gl_lockref from gfs2_init_glock_once() to gfs2_glock_get(). This allows to use lockref_init() there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Link: https://lore.kernel.org/r/20250130135624.1899988-2-agruenba@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07statmount: let unset strings be emptyMiklos Szeredi
Just like it's normal for unset values to be zero, unset strings should be empty instead of containing random values. It seems to be a typical mistake that the mask returned by statmount is not checked, which can result in various bugs. With this fix, these bugs are prevented, since it is highly likely that userspace would just want to turn the missing mask case into an empty string anyway (most of the recently found cases are of this type). Link: https://lore.kernel.org/all/CAJfpegsVCPfCn2DpM8iiYSS5DpMsLB8QBUCHecoj6s0Vxf4jzg@mail.gmail.com/ Fixes: 68385d77c05b ("statmount: simplify string option retrieval") Fixes: 46eae99ef733 ("add statmount(2) syscall") Cc: stable@vger.kernel.org # v6.8 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250130121500.113446-1-mszeredi@redhat.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07vboxsf: fix building with GCC 15Brahmajit Das
Building with GCC 15 results in build error fs/vboxsf/super.c:24:54: error: initializer-string for array of ‘unsigned char’ is too long [-Werror=unterminated-string-initialization] 24 | static const unsigned char VBSF_MOUNT_SIGNATURE[4] = "\000\377\376\375"; | ^~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Due to GCC having enabled -Werror=unterminated-string-initialization[0] by default. Separately initializing each array element of VBSF_MOUNT_SIGNATURE to ensure NUL termination, thus satisfying GCC 15 and fixing the build error. [0]: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wno-unterminated-string-initialization Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com> Link: https://lore.kernel.org/r/20250121162648.1408743-1-brahmajit.xyz@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()Su Hui
Clang static checker(scan-build) warning: fs/stat.c:287:21: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage. 287 | stat->result_mask |= STATX_MNT_ID_UNIQUE; | ~~~~~~~~~~~~~~~~~ ^ fs/stat.c:290:21: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage. 290 | stat->result_mask |= STATX_MNT_ID; When vfs_getattr() failed because of security_inode_getattr(), 'stat' is uninitialized. In this case, there is a harmless garbage problem in vfs_statx_path(). It's better to return error directly when vfs_getattr() failed, avoiding garbage value and more clearly. Signed-off-by: Su Hui <suhui@nfschina.com> Link: https://lore.kernel.org/r/20250119025946.1168957-1-suhui@nfschina.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06bcachefs: bch2_bkey_sectors_need_rebalance() now only depends on ↵Kent Overstreet
bch_extent_rebalance Previously, bch2_bkey_sectors_need_rebalance() called bch2_target_accepts_data(), checking whether the target is writable. However, this means that adding or removing devices from a target would change the value of bch2_bkey_sectors_need_rebalance() for an existing extent; this needs to be invariant so that the extent trigger can correctly maintain rebalance_work accounting. Instead, check target_accepts_data() in io_opts_to_rebalance_opts(), before creating the bch_extent_rebalance entry. This fixes (one?) cause of rebalance_work accounting being off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()Kent Overstreet
Spotted by sparse. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: Fix discard path journal flushingKent Overstreet
The discard path is supposed to issue journal flushes when there's too many buckets empty buckets that need a journal commit before they can be written to again, but at some point this code seems to have been lost. Bring it back with a new optimization to make sure we don't issue too many journal flushes: the journal now tracks the sequence number of the most recent flush in progress, which the discard path uses when deciding which buckets need a journal flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: fix deadlock in journal_entry_open()Jeongjun Park
In the previous commit b3d82c2f2761, code was added to prevent journal sequence overflow. Among them, the code added to journal_entry_open() uses the bch2_fs_fatal_err_on() function to handle errors. However, __journal_res_get() , which calls journal_entry_open() , calls journal_entry_open() while holding journal->lock , but bch2_fs_fatal_err_on() internally tries to acquire journal->lock , which results in a deadlock. So we need to add a locked helper to handle fatal errors even when the journal->lock is held. Fixes: b3d82c2f2761 ("bcachefs: Guard against journal seq overflow") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()Jeongjun Park
For some unknown reason, checks on struct bkey_s_c_snapshot and struct bkey_s_c_snapshot_tree pointers are missing. Therefore, I think it would be appropriate to fix the incorrect pointer checking through this patch. Fixes: 4bd06f07bcb5 ("bcachefs: Fixes for snapshot_tree.master_subvol") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs docs: SubmittingPatches.rstKent Overstreet
Add an (initial?) patch submission checklist, focusing mainly on testing. Yes, all patches must be tested, and that starts (but does not end) with the patch author. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06Merge tag 'pci-v6.14-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - When saving a device's state, always save the upstream bridge's PM L1 Substates configuration as well because the bridge never saves its own state, and restoring a device needs the state for both ends; this was a regression that caused link and power management errors after suspend/resume (Ilpo Järvinen) - Correct TPH Control Register write, where we wrote the ST Mode where the THP Requester Enable value was intended (Robin Murphy) * tag 'pci-v6.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/TPH: Restore TPH Requester Enable correctly PCI/ASPM: Fix L1SS saving
2025-02-06Merge tag 'for-linus-6.14-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Three fixes for xen_hypercall_hvm() that was introduced in the 6.13 cycle" * tag 'for-linus-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: remove unneeded dummy push from xen_hypercall_hvm() x86/xen: add FRAME_END to xen_hypercall_hvm() x86/xen: fix xen_hypercall_hvm() to not clobber %rbx
2025-02-06Merge tag 'net-6.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Interestingly the recent kmemleak improvements allowed our CI to catch a couple of percpu leaks addressed here. We (mostly Jakub, to be accurate) are working to increase review coverage over the net code-base tweaking the MAINTAINER entries. Current release - regressions: - core: harmonize tstats and dstats - ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels - eth: tun: revert fix group permission check - eth: stmmac: revert "specify hardware capability value when FIFO size isn't specified" Previous releases - regressions: - udp: gso: do not drop small packets when PMTU reduces - rxrpc: fix race in call state changing vs recvmsg() - eth: ice: fix Rx data path for heavy 9k MTU traffic - eth: vmxnet3: fix tx queue race condition with XDP Previous releases - always broken: - sched: pfifo_tail_enqueue: drop new packet when sch->limit == 0 - ethtool: ntuple: fix rss + ring_cookie check - rxrpc: fix the rxrpc_connection attend queue handling Misc: - recognize Kuniyuki Iwashima as a maintainer" * tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) Revert "net: stmmac: Specify hardware capability value when FIFO size isn't specified" MAINTAINERS: add a sample ethtool section entry MAINTAINERS: add entry for ethtool rxrpc: Fix race in call state changing vs recvmsg() rxrpc: Fix call state set to not include the SERVER_SECURING state net: sched: Fix truncation of offloaded action statistics tun: revert fix group permission check selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog() netem: Update sch->q.qlen before qdisc_tree_reduce_backlog() selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0 pfifo_tail_enqueue: Drop new packet when sch->limit == 0 selftests: mptcp: connect: -f: no reconnect net: rose: lock the socket in rose_bind() net: atlantic: fix warning during hot unplug rxrpc: Fix the rxrpc_connection attend queue handling net: harmonize tstats and dstats selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not supported selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigure ethtool: ntuple: fix rss + ring_cookie check ethtool: rss: fix hiding unsupported fields in dumps ...
2025-02-06PCI/TPH: Restore TPH Requester Enable correctlyRobin Murphy
When we reenable TPH after changing a Steering Tag value, we need the actual TPH Requester Enable value, not the ST Mode (which only happens to work out by chance for non-extended TPH in interrupt vector mode). Link: https://lore.kernel.org/r/13118098116d7bce07aa20b8c52e28c7d1847246.1738759933.git.robin.murphy@arm.com Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Wei Huang <wei.huang2@amd.com>
2025-02-06PCI/ASPM: Fix L1SS savingIlpo Järvinen
Commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both the Upstream Port and its upstream bridge when handling an Upstream Port, which matches what the L1SS restore side does. However, parent->state_saved can be set true at an earlier time when the upstream bridge saved other parts of its state. Then later when attempting to save the L1SS config while handling the Upstream Port, parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in early return and skipping saving bridge's L1SS config because it is assumed to be already saved. Later on restore, junk is written into L1SS config which causes issues with some devices. Remove parent->state_saved check and unconditionally save L1SS config also for the upstream bridge from an Upstream Port which ought to be harmless from correctness point of view. With the Upstream Port check now present, saving the L1SS config more than once for the bridge is no longer a problem (unlike when the parent->state_saved check got introduced into the fix during its development). Link: https://lore.kernel.org/r/20250131152913.2507-1-ilpo.jarvinen@linux.intel.com Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731 Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> Reported by: Rafael J. Wysocki <rafael@kernel.org> Closes: https://lore.kernel.org/r/CAJZ5v0iKmynOQ5vKSQbg1J_FmavwZE-nRONovOZ0mpMVauheWg@mail.gmail.com Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Closes: https://lore.kernel.org/r/d7246feb-4f3f-4d0c-bb64-89566b170671@molgen.mpg.de Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360
2025-02-06Revert "net: stmmac: Specify hardware capability value when FIFO size isn't ↵Russell King (Oracle)
specified" This reverts commit 8865d22656b4, which caused breakage for platforms which are not using xgmac2 or gmac4. Only these two cores have the capability of providing the FIFO sizes from hardware capability fields (which are provided in priv->dma_cap.[tr]x_fifo_size.) All other cores can not, which results in these two fields containing zero. We also have platforms that do not provide a value in priv->plat->[tr]x_fifo_size, resulting in these also being zero. This causes the new tests introduced by the reverted commit to fail, and produce e.g.: stmmaceth f0804000.eth: Can't specify Rx FIFO size An example of such a platform which fails is QEMU's npcm750-evb. This uses dwmac1000 which, as noted above, does not have the capability to provide the FIFO sizes from hardware. Therefore, revert the commit to maintain compatibility with the way the driver used to work. Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/4e98f967-f636-46fb-9eca-d383b9495b86@roeck-us.net Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Steven Price <steven.price@arm.com> Fixes: 8865d22656b4 ("net: stmmac: Specify hardware capability value when FIFO size isn't specified") Link: https://patch.msgid.link/E1tfeyR-003YGJ-Gb@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-06MAINTAINERS: add a sample ethtool section entryJakub Kicinski
I feel like we don't do a good enough keeping authors of driver APIs around. The ethtool code base was very nicely compartmentalized by Michal. Establish a precedent of creating MAINTAINERS entries for "sections" of the ethtool API. Use Andrew and cable test as a sample entry. The entry should ideally cover 3 elements: a core file, test(s), and keywords. The last one is important because we intend the entries to cover core code *and* reviews of drivers implementing given API! Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250204215750.169249-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-06MAINTAINERS: add entry for ethtoolJakub Kicinski
Michal did an amazing job converting ethtool to Netlink, but never added an entry to MAINTAINERS for himself. Create a formal entry so that we can delegate (portions) of this code to folks. Over the last 3 years majority of the reviews have been done by Andrew and I. I suppose Michal didn't want to be on the receiving end of the flood of patches. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250204215729.168992-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-05Merge branch 'rxrpc-call-state-fixes'Jakub Kicinski
David Howells says: ==================== rxrpc: Call state fixes Here some call state fixes for AF_RXRPC. (1) Fix the state of a call to not treat the challenge-response cycle as part of an incoming call's state set. The problem is that it makes handling received of the final packet in the receive phase difficult as that wants to change the call state - but security negotiations may not yet be complete. (2) Fix a race between the changing of the call state at the end of the request reception phase of a service call, recvmsg() collecting the last data and sendmsg() trying to send the reply before the I/O thread has advanced the call state. Link: https://lore.kernel.org/20250203110307.7265-2-dhowells@redhat.com ==================== Link: https://patch.msgid.link/20250204230558.712536-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05rxrpc: Fix race in call state changing vs recvmsg()David Howells
There's a race in between the rxrpc I/O thread recording the end of the receive phase of a call and recvmsg() examining the state of the call to determine whether it has completed. The problem is that call->_state records the I/O thread's view of the call, not the application's view (which may lag), so that alone is not sufficient. To this end, the application also checks whether there is anything left in call->recvmsg_queue for it to pick up. The call must be in state RXRPC_CALL_COMPLETE and the recvmsg_queue empty for the call to be considered fully complete. In rxrpc_input_queue_data(), the latest skbuff is added to the queue and then, if it was marked as LAST_PACKET, the state is advanced... But this is two separate operations with no locking around them. As a consequence, the lack of locking means that sendmsg() can jump into the gap on a service call and attempt to send the reply - but then get rejected because the I/O thread hasn't advanced the state yet. Simply flipping the order in which things are done isn't an option as that impacts the client side, causing the checks in rxrpc_kernel_check_life() as to whether the call is still alive to race instead. Fix this by moving the update of call->_state inside the skb queue spinlocked section where the packet is queued on the I/O thread side. rxrpc's recvmsg() will then automatically sync against this because it has to take the call->recvmsg_queue spinlock in order to dequeue the last packet. rxrpc's sendmsg() doesn't need amending as the app shouldn't be calling it to send a reply until recvmsg() indicates it has returned all of the request. Fixes: 93368b6bd58a ("rxrpc: Move call state changes from recvmsg to I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250204230558.712536-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05rxrpc: Fix call state set to not include the SERVER_SECURING stateDavid Howells
The RXRPC_CALL_SERVER_SECURING state doesn't really belong with the other states in the call's state set as the other states govern the call's Rx/Tx phase transition and govern when packets can and can't be received or transmitted. The "Securing" state doesn't actually govern the reception of packets and would need to be split depending on whether or not we've received the last packet yet (to mirror RECV_REQUEST/ACK_REQUEST). The "Securing" state is more about whether or not we can start forwarding packets to the application as recvmsg will need to decode them and the decoding can't take place until the challenge/response exchange has completed. Fix this by removing the RXRPC_CALL_SERVER_SECURING state from the state set and, instead, using a flag, RXRPC_CALL_CONN_CHALLENGING, to track whether or not we can queue the call for reception by recvmsg() or notify the kernel app that data is ready. In the event that we've already received all the packets, the connection event handler will poke the app layer in the appropriate manner. Also there's a race whereby the app layer sees the last packet before rxrpc has managed to end the rx phase and change the state to one amenable to allowing a reply. Fix this by queuing the packet after calling rxrpc_end_rx_phase(). Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250204230558.712536-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net: sched: Fix truncation of offloaded action statisticsIdo Schimmel
In case of tc offload, when user space queries the kernel for tc action statistics, tc will query the offloaded statistics from device drivers. Among other statistics, drivers are expected to pass the number of packets that hit the action since the last query as a 64-bit number. Unfortunately, tc treats the number of packets as a 32-bit number, leading to truncation and incorrect statistics when the number of packets since the last query exceeds 0xffffffff: $ tc -s filter show dev swp2 ingress filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 skip_sw in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device swp1) stolen index 1 ref 1 bind 1 installed 58 sec used 0 sec Action statistics: Sent 1133877034176 bytes 536959475 pkt (dropped 0, overlimits 0 requeues 0) [...] According to the above, 2111-byte packets were redirected which is impossible as only 64-byte packets were transmitted and the MTU was 1500. Fix by treating packets as a 64-bit number: $ tc -s filter show dev swp2 ingress filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 skip_sw in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device swp1) stolen index 1 ref 1 bind 1 installed 61 sec used 0 sec Action statistics: Sent 1370624380864 bytes 21416005951 pkt (dropped 0, overlimits 0 requeues 0) [...] Which shows that only 64-byte packets were redirected (1370624380864 / 21416005951 = 64). Fixes: 380407023526 ("net/sched: Enable netdev drivers to update statistics of offloaded actions") Reported-by: Joe Botha <joe@atomic.ac> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250204123839.1151804-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05tun: revert fix group permission checkWillem de Bruijn
This reverts commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3. The blamed commit caused a regression when neither tun->owner nor tun->group is set. This is intended to be allowed, but now requires CAP_NET_ADMIN. Discussion in the referenced thread pointed out that the original issue that prompted this patch can be resolved in userspace. The relaxed access control may also make a device accessible when it previously wasn't, while existing users may depend on it to not be. This is a clean pure git revert, except for fixing the indentation on the gid_valid line that checkpatch correctly flagged. Fixes: 3ca459eaba1b ("tun: fix group permission check") Link: https://lore.kernel.org/netdev/CAFqZXNtkCBT4f+PwyVRmQGoT3p1eVa01fCG_aNtpt6dakXncUg@mail.gmail.com/ Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Stas Sergeev <stsp2@yandex.ru> Link: https://patch.msgid.link/20250204161015.739430-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge branch 'net_sched-two-security-bug-fixes-and-test-cases'Jakub Kicinski
Cong Wang says: ==================== net_sched: two security bug fixes and test cases This patchset contains two bug fixes reported in security mailing list, and test cases for both of them. ==================== Link: https://patch.msgid.link/20250204005841.223511-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog()Cong Wang
Integrate the test case provided by Mingi Cho into TDC. All test results: 1..4 ok 1 ca5e - Check class delete notification for ffff: ok 2 e4b7 - Check class delete notification for root ffff: ok 3 33a9 - Check ingress is not searchable on backlog update ok 4 a4b9 - Test class qlen notification Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-5-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05netem: Update sch->q.qlen before qdisc_tree_reduce_backlog()Cong Wang
qdisc_tree_reduce_backlog() notifies parent qdisc only if child qdisc becomes empty, therefore we need to reduce the backlog of the child qdisc before calling it. Otherwise it would miss the opportunity to call cops->qlen_notify(), in the case of DRR, it resulted in UAF since DRR uses ->qlen_notify() to maintain its active list. Fixes: f8d4bc455047 ("net/sched: netem: account for backlog updates from child qdisc") Cc: Martin Ottens <martin.ottens@fau.de> Reported-by: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-4-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0Quang Le
When limit == 0, pfifo_tail_enqueue() must drop new packet and increase dropped packets count of the qdisc. All test results: 1..16 ok 1 a519 - Add bfifo qdisc with system default parameters on egress ok 2 585c - Add pfifo qdisc with system default parameters on egress ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument ok 8 7787 - Add pfifo qdisc on egress with unsupported argument ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size ok 10 3df6 - Replace pfifo qdisc on egress with new queue size ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format ok 12 1298 - Add duplicate bfifo qdisc on egress ok 13 45a0 - Delete nonexistent bfifo qdisc ok 14 972b - Add prio qdisc on egress with invalid format for handles ok 15 4d39 - Delete bfifo qdisc twice ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0 Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05pfifo_tail_enqueue: Drop new packet when sch->limit == 0Quang Le
Expected behaviour: In case we reach scheduler's limit, pfifo_tail_enqueue() will drop a packet in scheduler's queue and decrease scheduler's qlen by one. Then, pfifo_tail_enqueue() enqueue new packet and increase scheduler's qlen by one. Finally, pfifo_tail_enqueue() return `NET_XMIT_CN` status code. Weird behaviour: In case we set `sch->limit == 0` and trigger pfifo_tail_enqueue() on a scheduler that has no packet, the 'drop a packet' step will do nothing. This means the scheduler's qlen still has value equal 0. Then, we continue to enqueue new packet and increase scheduler's qlen by one. In summary, we can leverage pfifo_tail_enqueue() to increase qlen by one and return `NET_XMIT_CN` status code. The problem is: Let's say we have two qdiscs: Qdisc_A and Qdisc_B. - Qdisc_A's type must have '->graft()' function to create parent/child relationship. Let's say Qdisc_A's type is `hfsc`. Enqueue packet to this qdisc will trigger `hfsc_enqueue`. - Qdisc_B's type is pfifo_head_drop. Enqueue packet to this qdisc will trigger `pfifo_tail_enqueue`. - Qdisc_B is configured to have `sch->limit == 0`. - Qdisc_A is configured to route the enqueued's packet to Qdisc_B. Enqueue packet through Qdisc_A will lead to: - hfsc_enqueue(Qdisc_A) -> pfifo_tail_enqueue(Qdisc_B) - Qdisc_B->q.qlen += 1 - pfifo_tail_enqueue() return `NET_XMIT_CN` - hfsc_enqueue() check for `NET_XMIT_SUCCESS` and see `NET_XMIT_CN` => hfsc_enqueue() don't increase qlen of Qdisc_A. The whole process lead to a situation where Qdisc_A->q.qlen == 0 and Qdisc_B->q.qlen == 1. Replace 'hfsc' with other type (for example: 'drr') still lead to the same problem. This violate the design where parent's qlen should equal to the sum of its childrens'qlen. Bug impact: This issue can be used for user->kernel privilege escalation when it is reachable. Fixes: 57dbb2d83d10 ("sched: add head drop fifo queue") Reported-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests: mptcp: connect: -f: no reconnectMatthieu Baerts (NGI0)
The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE by closing the connection with unread bytes in the receive queue. The xdisconnect() helper was used to stop the connection, but it does more than that: it will shut it down, then wait before reconnecting to the same address. This causes the mptcp_join's "fastclose test" to fail all the time. This failure is due to a recent change, with commit 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect"), but that went unnoticed because the test is currently ignored. The recent modification only shown an existing issue: xdisconnect() doesn't need to be used here, only the shutdown() part is needed. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250204-net-mptcp-sft-conn-f-v1-1-6b470c72fffa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05x86/xen: remove unneeded dummy push from xen_hypercall_hvm()Juergen Gross
Stack alignment of the kernel in 64-bit mode is 8, not 16, so the dummy push in xen_hypercall_hvm() for aligning the stack to 16 bytes can be removed. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05x86/xen: add FRAME_END to xen_hypercall_hvm()Juergen Gross
xen_hypercall_hvm() is missing a FRAME_END at the end, add it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502030848.HTNTTuo9-lkp@intel.com/ Fixes: b4845bb63838 ("x86/xen: add central hypercall functions") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05x86/xen: fix xen_hypercall_hvm() to not clobber %rbxJuergen Gross
xen_hypercall_hvm(), which is used when running as a Xen PVH guest at most only once during early boot, is clobbering %rbx. Depending on whether the caller relies on %rbx to be preserved across the call or not, this clobbering might result in an early crash of the system. This can be avoided by using an already saved register instead of %rbx. Fixes: b4845bb63838 ("x86/xen: add central hypercall functions") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05Merge tag 'for-6.14-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - add lockdep annotation for relocation root to fix a splat warning while merging roots - fix assertion failure when splitting ordered extent after transaction abort - don't print 'qgroup inconsistent' message when rescan process updates qgroup data sooner than the subvolume deletion process - fix use-after-free (accessing the error number) when attempting to join an aborted transaction - avoid starting new transaction if not necessary when cleaning qgroup during subvolume drop * tag 'for-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: avoid starting new transaction when cleaning qgroup during subvolume drop btrfs: fix use-after-free when attempting to join an aborted transaction btrfs: do not output error message if a qgroup has been already cleaned up btrfs: fix assertion failure when splitting ordered extent after transaction abort btrfs: fix lockdep splat while merging a relocation root
2025-02-04net: rose: lock the socket in rose_bind()Eric Dumazet
syzbot reported a soft lockup in rose_loopback_timer(), with a repro calling bind() from multiple threads. rose_bind() must lock the socket to avoid this issue. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+7ff41b5215f0c534534e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67a0f78d.050a0220.d7c5a.00a0.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/20250203170838.3521361-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: atlantic: fix warning during hot unplugJacob Moroni
Firmware deinitialization performs MMIO accesses which are not necessary if the device has already been removed. In some cases, these accesses happen via readx_poll_timeout_atomic which ends up timing out, resulting in a warning at hw_atl2_utils_fw.c:112: [ 104.595913] Call Trace: [ 104.595915] <TASK> [ 104.595918] ? show_regs+0x6c/0x80 [ 104.595923] ? __warn+0x8d/0x150 [ 104.595925] ? aq_a2_fw_deinit+0xcf/0xe0 [atlantic] [ 104.595934] ? report_bug+0x182/0x1b0 [ 104.595938] ? handle_bug+0x6e/0xb0 [ 104.595940] ? exc_invalid_op+0x18/0x80 [ 104.595942] ? asm_exc_invalid_op+0x1b/0x20 [ 104.595944] ? aq_a2_fw_deinit+0xcf/0xe0 [atlantic] [ 104.595952] ? aq_a2_fw_deinit+0xcf/0xe0 [atlantic] [ 104.595959] aq_nic_deinit.part.0+0xbd/0xf0 [atlantic] [ 104.595964] aq_nic_deinit+0x17/0x30 [atlantic] [ 104.595970] aq_ndev_close+0x2b/0x40 [atlantic] [ 104.595975] __dev_close_many+0xad/0x160 [ 104.595978] dev_close_many+0x99/0x170 [ 104.595979] unregister_netdevice_many_notify+0x18b/0xb20 [ 104.595981] ? __call_rcu_common+0xcd/0x700 [ 104.595984] unregister_netdevice_queue+0xc6/0x110 [ 104.595986] unregister_netdev+0x1c/0x30 [ 104.595988] aq_pci_remove+0xb1/0xc0 [atlantic] Fix this by skipping firmware deinitialization altogether if the PCI device is no longer present. Tested with an AQC113 attached via Thunderbolt by performing repeated unplug cycles while traffic was running via iperf. Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code") Signed-off-by: Jacob Moroni <mail@jakemoroni.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203143604.24930-3-mail@jakemoroni.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>