summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-16Merge tag 'mm-hotfixes-stable-2025-04-16-19-59' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "31 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 22 patches are for MM, 9 are otherwise" * tag 'mm-hotfixes-stable-2025-04-16-19-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits) MAINTAINERS: update HUGETLB reviewers mm: fix apply_to_existing_page_range() selftests/mm: fix compiler -Wmaybe-uninitialized warning alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate mailmap: add entry for Jean-Michel Hautbois mm: (un)track_pfn_copy() fix + doc improvements mm: fix filemap_get_folios_contig returning batches of identical folios mm/hugetlb: add a line break at the end of the format string selftests: mincore: fix tmpfs mincore test failure mm/hugetlb: fix set_max_huge_pages() when there are surplus pages mm/cma: report base address of single range correctly mm: page_alloc: speed up fallbacks in rmqueue_bulk() kunit: slub: add module description mm/kasan: add module decription ucs2_string: add module description zlib: add module description fpga: tests: add module descriptions samples/livepatch: add module descriptions ASN.1: add module description mm/vma: add give_up_on_oom option on modify/merge, use in uffd release ...
2025-04-16selftests: ublk: add generic_06 for covering fault injectUday Shankar
Add one simple fault inject target, and verify if an application using ublk device sees an I/O error quickly after the ublk server dies. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-9-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16ublk: simplify aborting ublk requestMing Lei
Now ublk_abort_queue() is moved to ublk char device release handler, meantime our request queue is "quiesced" because either ->canceling was set from uring_cmd cancel function or all IOs are inflight and can't be completed by ublk server, things becomes easy much: - all uring_cmd are done, so we needn't to mark io as UBLK_IO_FLAG_ABORTED for handling completion from uring_cmd - ublk char device is closed, no one can hold IO request reference any more, so we can simply complete this request or requeue it for ublk_nosrv_should_reissue_outstanding. Reviewed-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16ublk: remove __ublk_quiesce_dev()Ming Lei
Remove __ublk_quiesce_dev() and open code for updating device state as QUIESCED. We needn't to drain inflight requests in __ublk_quiesce_dev() any more, because all inflight requests are aborted in ublk char device release handler. Also we needn't to set ->canceling in __ublk_quiesce_dev() any more because it is done unconditionally now in ublk_ch_release(). Reviewed-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16ublk: improve detection and handling of ublk server exitUday Shankar
There are currently two ways in which ublk server exit is detected by ublk_drv: 1. uring_cmd cancellation. If there are any outstanding uring_cmds which have not been completed to the ublk server when it exits, io_uring calls the uring_cmd callback with a special cancellation flag as the issuing task is exiting. 2. I/O timeout. This is needed in addition to the above to handle the "saturated queue" case, when all I/Os for a given queue are in the ublk server, and therefore there are no outstanding uring_cmds to cancel when the ublk server exits. There are a couple of issues with this approach: - It is complex and inelegant to have two methods to detect the same condition - The second method detects ublk server exit only after a long delay (~30s, the default timeout assigned by the block layer). This delays the nosrv behavior from kicking in and potential subsequent recovery of the device. The second issue is brought to light with the new test_generic_06 which will be added in following patch. It fails before this fix: selftests: ublk: test_generic_06.sh dev id is 0 dd: error writing '/dev/ublkb0': Input/output error 1+0 records in 0+0 records out 0 bytes copied, 30.0611 s, 0.0 kB/s DEAD dd took 31 seconds to exit (>= 5s tolerance)! generic_06 : [FAIL] Fix this by instead detecting and handling ublk server exit in the character file release callback. This has several advantages: - This one place can handle both saturated and unsaturated queues. Thus, it replaces both preexisting methods of detecting ublk server exit. - It runs quickly on ublk server exit - there is no 30s delay. - It starts the process of removing task references in ublk_drv. This is needed if we want to relax restrictions in the driver like letting only one thread serve each queue There is also the disadvantage that the character file release callback can also be triggered by intentional close of the file, which is a significant behavior change. Preexisting ublk servers (libublksrv) are dependent on the ability to open/close the file multiple times. To address this, only transition to a nosrv state if the file is released while the ublk device is live. This allows for programs to open/close the file multiple times during setup. It is still a behavior change if a ublk server decides to close/reopen the file while the device is LIVE (i.e. while it is responsible for serving I/O), but that would be highly unusual. This behavior is in line with what is done by FUSE, which is very similar to ublk in that a userspace daemon is providing services traditionally provided by the kernel. With this change in, the new test (and all other selftests, and all ublksrv tests) pass: selftests: ublk: test_generic_06.sh dev id is 0 dd: error writing '/dev/ublkb0': Input/output error 1+0 records in 0+0 records out 0 bytes copied, 0.0376731 s, 0.0 kB/s DEAD generic_04 : [PASS] Signed-off-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16ublk: move device reset into ublk_ch_release()Ming Lei
ublk_ch_release() is called after ublk char device is closed, when all uring_cmd are done, so it is perfect fine to move ublk device reset to ublk_ch_release() from ublk_ctrl_start_recovery(). This way can avoid to grab the exiting daemon task_struct too long. However, reset of the following ublk IO flags has to be moved until ublk io_uring queues are ready: - ubq->canceling For requeuing IO in case of ublk_nosrv_dev_should_queue_io() before device is recovered - ubq->fail_io For failing IO in case of UBLK_F_USER_RECOVERY_FAIL_IO before device is recovered - ublk_io->flags For preventing using io->cmd With this way, recovery is simplified a lot. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_ioMing Lei
Now ublk deals with ublk_nosrv_dev_should_queue_io() by keeping request queue as quiesced. This way is fragile because queue quiesce crosses syscalls or process contexts. Switch to rely on ubq->canceling for dealing with ublk_nosrv_dev_should_queue_io(), because it has been used for this purpose during io_uring context exiting, and it can be reused before recovering too. In ublk_queue_rq(), the request will be added to requeue list without kicking off requeue in case of ubq->canceling, and finally requests added in requeue list will be dispatched from either ublk_stop_dev() or ublk_ctrl_end_recovery(). Meantime we have to move reset of ubq->canceling from ublk_ctrl_start_recovery() to ublk_ctrl_end_recovery(), when IO handling can be recovered completely. Then blk_mq_quiesce_queue() and blk_mq_unquiesce_queue() are always used in same context. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Uday Shankar <ushankar@purestorage.com> Link: https://lore.kernel.org/r/20250416035444.99569-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16ublk: add ublk_force_abort_dev()Ming Lei
Add ublk_force_abort_dev() for handling ublk_nosrv_dev_should_queue_io() in ublk_stop_dev(). Then queue quiesce and unquiesce can be paired in single function. Meantime not change device state to QUIESCED any more, since the disk is going to be removed soon. Reviewed-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16ublk: properly serialize all FETCH_REQsUday Shankar
Most uring_cmds issued against ublk character devices are serialized because each command affects only one queue, and there is an early check which only allows a single task (the queue's ubq_daemon) to issue uring_cmds against that queue. However, this mechanism does not work for FETCH_REQs, since they are expected before ubq_daemon is set. Since FETCH_REQs are only used at initialization and not in the fast path, serialize them using the per-ublk-device mutex. This fixes a number of data races that were previously possible if a badly behaved ublk server decided to issue multiple FETCH_REQs against the same qid/tag concurrently. Reported-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: move creating UBLK_TMP into _prep_test()Ming Lei
test may exit early because of missing program or not having required feature before calling _prep_test(), then $UBLK_TMP isn't cleaned. Fix it by moving creating $UBLK_TMP into _prep_test(), any resources created since _prep_test() will be cleaned by _cleanup_test(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-14-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: add test_stress_05.shMing Lei
Add test_stress_05.sh for covering removing device with recovery enabled. io-hang has been observed with the following patch: https://lore.kernel.org/linux-block/20250403-ublk_timeout-v3-1-aa09f76c7451@purestorage.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-13-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: support user recoveryMing Lei
Add user recovery feature. Meantime add user recovery test: generic_04 and generic_05(zero copy) Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-12-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: support target specific command lineMing Lei
Support target specific command line for making related command line code handling more readable & clean. Also helps for adding new features. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-11-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: increase max nr_queues and queue depthMing Lei
Increase max nr_queues to 32, and queue depth to 1024. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-10-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: set queue pthread's cpu affinityMing Lei
In NUMA machine, ublk IO performance is very sensitive with queue pthread's affinity setting. Retrieve queue's affinity and select the 1st cpu as queue thread's sched affinity, and it is observed that single cpu task affinity can get stable & good performance if client application is put on proper cpu. Dump this info when adding one ublk device. Use shmem to communicate queue's tid between parent and daemon. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-9-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: setup ring with ↵Ming Lei
IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN It is observed that this way is more efficient for fast nvme backing file. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: add two stress tests for zero copy featureMing Lei
Add stress_03 & stress_04 for covering zero copy feature. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: run stress tests in parallelMing Lei
Run stress tests in parallel, meantime add shell local function to simplify the two stress tests. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: make sure _add_ublk_dev can return in sub-shellMing Lei
Detach ublk daemon from the starting process completely by double-fork and clearing its process group, so that `_add_ublk_dev` can return from sub-shell. Then it is more friendly for writing shell test script for adding/recovering ublk device. Prepare for running ublk test in parallel. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: cleanup backfile automaticallyMing Lei
Use global array of $UBLK_BACKFILES for storing all backfile name, then clean them automatically. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: add io_uring uapi headerMing Lei
Add io_uring UAPI header so that ublk can work with latest uapi definition. Fix the following build failure: stripe.c: In function ‘stripe_to_uring_op’: stripe.c:120:29: error: ‘IORING_OP_READV_FIXED’ undeclared (first use in this function); did you mean ‘IORING_OP_READ_FIXED’? 120 | return zc ? IORING_OP_READV_FIXED : IORING_OP_READV; | ^~~~~~~~~~~~~~~~~~~~~ | IORING_OP_READ_FIXED Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Fixes: 57ed58c13256 ("selftests: ublk: enable zero copy for stripe target") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16selftests: ublk: fix ublk_find_tgt()Ming Lei
Bounds check for iterator variable `i` is missed, so add it and fix ublk_find_tgt(). Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16net: don't try to ops lock uninitialized devsJakub Kicinski
We need to be careful when operating on dev while in rtnl_create_link(). Some devices (vxlan) initialize netdev_ops in ->newlink, so later on. Avoid using netdev_lock_ops(), the device isn't registered so we cannot legally call its ops or generate any notifications for it. netdev_ops_assert_locked_or_invisible() is safe to use, it checks registration status first. Reported-by: syzbot+de1c7d68a10e3f123bdd@syzkaller.appspotmail.com Fixes: 04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250415151552.768373-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16ptp: ocp: fix start time alignment in ptp_ocp_signal_setSagi Maimon
In ptp_ocp_signal_set, the start time for periodic signals is not aligned to the next period boundary. The current code rounds up the start time and divides by the period but fails to multiply back by the period, causing misaligned signal starts. Fix this by multiplying the rounded-up value by the period to ensure the start time is the closest next period. Fixes: 4bd46bb037f8e ("ptp: ocp: Use DIV64_U64_ROUND_UP for rounding.") Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250415053131.129413-1-maimon.sagi@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16Merge branch 'collection-of-dsa-bug-fixes'Jakub Kicinski
Vladimir Oltean says: ==================== Collection of DSA bug fixes Prompted by Russell King's 3 DSA bug reports from Friday (linked in their respective patches: 1, 2 and 3), I am providing fixes to those, as well as flushing the queue with 2 other bug fixes I had. 1: fix NULL pointer dereference during mv88e6xxx driver unbind, on old switch models which lack PVT and/or STU. Seen on the ZII dev board rev B. 2: fix failure to delete bridge port VLANs on old mv88e6xxx chips which lack STU. Seen on the same board. 3: fix WARN_ON() and resource leak in DSA core on driver unbind. Seen on the same board but is a much more widespread issue. 4: fix use-after-free during probing of DSA trees with >= 3 switches, if -EPROBE_DEFER exists. In principle issue also exists for the ZII board, I reproduced on Turris MOX. 5: fix incorrect use of refcount API in DSA core for those switches which use tag_8021q (felix, sja1105, vsc73xx). Returning an error when attempting to delete a tag_8021q VLAN prints a WARN_ON(), which is harmless but might be problematic with CONFIG_PANIC_ON_OOPS. ==================== Link: https://patch.msgid.link/20250414212708.2948164-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() failsVladimir Oltean
This is very similar to the problem and solution from commit 232deb3f9567 ("net: dsa: avoid refcount warnings when ->port_{fdb,mdb}_del returns error"), except for the dsa_port_do_tag_8021q_vlan_del() operation. Fixes: c64b9c05045a ("net: dsa: tag_8021q: add proper cross-chip notifier support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414213020.2959021-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: free routing table on probe failureVladimir Oltean
If complete = true in dsa_tree_setup(), it means that we are the last switch of the tree which is successfully probing, and we should be setting up all switches from our probe path. After "complete" becomes true, dsa_tree_setup_cpu_ports() or any subsequent function may fail. If that happens, the entire tree setup is in limbo: the first N-1 switches have successfully finished probing (doing nothing but having allocated persistent memory in the tree's dst->ports, and maybe dst->rtable), and switch N failed to probe, ending the tree setup process before anything is tangible from the user's PoV. If switch N fails to probe, its memory (ports) will be freed and removed from dst->ports. However, the dst->rtable elements pointing to its ports, as created by dsa_link_touch(), will remain there, and will lead to use-after-free if dereferenced. If dsa_tree_setup_switches() returns -EPROBE_DEFER, which is entirely possible because that is where ds->ops->setup() is, we get a kasan report like this: ================================================================== BUG: KASAN: slab-use-after-free in mv88e6xxx_setup_upstream_port+0x240/0x568 Read of size 8 at addr ffff000004f56020 by task kworker/u8:3/42 Call trace: __asan_report_load8_noabort+0x20/0x30 mv88e6xxx_setup_upstream_port+0x240/0x568 mv88e6xxx_setup+0xebc/0x1eb0 dsa_register_switch+0x1af4/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 Allocated by task 42: __kasan_kmalloc+0x84/0xa0 __kmalloc_cache_noprof+0x298/0x490 dsa_switch_touch_ports+0x174/0x3d8 dsa_register_switch+0x800/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 Freed by task 42: __kasan_slab_free+0x48/0x68 kfree+0x138/0x418 dsa_register_switch+0x2694/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 The simplest way to fix the bug is to delete the routing table in its entirety. dsa_tree_setup_routing_table() has no problem in regenerating it even if we deleted links between ports other than those of switch N, because dsa_link_touch() first checks whether the port pair already exists in dst->rtable, allocating if not. The deletion of the routing table in its entirety already exists in dsa_tree_teardown(), so refactor that into a function that can also be called from the tree setup error path. In my analysis of the commit to blame, it is the one which added dsa_link elements to dst->rtable. Prior to that, each switch had its own ds->rtable which is freed when the switch fails to probe. But the tree is potentially persistent memory. Fixes: c5f51765a1f6 ("net: dsa: list DSA links in the fabric") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414213001.2957964-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: clean up FDB, MDB, VLAN entries on unbindVladimir Oltean
As explained in many places such as commit b117e1e8a86d ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del"), DSA is written given the assumption that higher layers have balanced additions/deletions. As such, it only makes sense to be extremely vocal when those assumptions are violated and the driver unbinds with entries still present. But Ido Schimmel points out a very simple situation where that is wrong: https://lore.kernel.org/netdev/ZDazSM5UsPPjQuKr@shredder/ (also briefly discussed by me in the aforementioned commit). Basically, while the bridge bypass operations are not something that DSA explicitly documents, and for the majority of DSA drivers this API simply causes them to go to promiscuous mode, that isn't the case for all drivers. Some have the necessary requirements for bridge bypass operations to do something useful - see dsa_switch_supports_uc_filtering(). Although in tools/testing/selftests/net/forwarding/local_termination.sh, we made an effort to popularize better mechanisms to manage address filters on DSA interfaces from user space - namely macvlan for unicast, and setsockopt(IP_ADD_MEMBERSHIP) - through mtools - for multicast, the fact is that 'bridge fdb add ... self static local' also exists as kernel UAPI, and might be useful to someone, even if only for a quick hack. It seems counter-productive to block that path by implementing shim .ndo_fdb_add and .ndo_fdb_del operations which just return -EOPNOTSUPP in order to prevent the ndo_dflt_fdb_add() and ndo_dflt_fdb_del() from running, although we could do that. Accepting that cleanup is necessary seems to be the only option. Especially since we appear to be coming back at this from a different angle as well. Russell King is noticing that the WARN_ON() triggers even for VLANs: https://lore.kernel.org/netdev/Z_li8Bj8bD4-BYKQ@shell.armlinux.org.uk/ What happens in the bug report above is that dsa_port_do_vlan_del() fails, then the VLAN entry lingers on, and then we warn on unbind and leak it. This is not a straight revert of the blamed commit, but we now add an informational print to the kernel log (to still have a way to see that bugs exist), and some extra comments gathered from past years' experience, to justify the logic. Fixes: 0832cd9f1f02 ("net: dsa: warn if port lists aren't empty in dsa_port_teardown") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414212930.2956310-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: mv88e6xxx: fix -ENOENT when deleting VLANs and MST is unsupportedVladimir Oltean
Russell King reports that on the ZII dev rev B, deleting a bridge VLAN from a user port fails with -ENOENT: https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/ This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(), which tries to find an MST entry in &chip->msts associated with the SID, but fails and returns -ENOENT as such. But we know that this chip does not support MST at all, so that is not surprising. The question is why does the guard in mv88e6xxx_mst_put() not exit early: if (!sid) return 0; And the answer seems to be simple: the sid comes from vlan.sid which supposedly was previously populated by mv88e6xxx_vtu_get(). But some chip->info->ops->vtu_getnext() implementations do not populate vlan.sid, for example see mv88e6185_g1_vtu_getnext(). In that case, later in mv88e6xxx_port_vlan_leave() we are using a garbage sid which is just residual stack memory. Testing for sid == 0 covers all cases of a non-bridge VLAN or a bridge VLAN mapped to the default MSTI. For some chips, SID 0 is valid and installed by mv88e6xxx_stu_setup(). A chip which does not support the STU would implicitly only support mapping all VLANs to the default MSTI, so although SID 0 is not valid, it would be sufficient, if we were to zero-initialize the vlan structure, to fix the bug, due to the coincidence that a test for vlan.sid == 0 already exists and leads to the same (correct) behavior. Another option which would be sufficient would be to add a test for mv88e6xxx_has_stu() inside mv88e6xxx_mst_put(), symmetric to the one which already exists in mv88e6xxx_mst_get(). But that placement means the caller will have to dereference vlan.sid, which means it will access uninitialized memory, which is not nice even if it ignores it later. So we end up making both modifications, in order to not rely just on the sid == 0 coincidence, but also to avoid having uninitialized structure fields which might get temporarily accessed. Fixes: acaf4d2e36b3 ("net: dsa: mv88e6xxx: MST Offloading") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414212913.2955253-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: mv88e6xxx: avoid unregistering devlink regions which were never ↵Vladimir Oltean
registered Russell King reports that a system with mv88e6xxx dereferences a NULL pointer when unbinding this driver: https://lore.kernel.org/netdev/Z_lRkMlTJ1KQ0kVX@shell.armlinux.org.uk/ The crash seems to be in devlink_region_destroy(), which is not NULL tolerant but is given a NULL devlink global region pointer. At least on some chips, some devlink regions are conditionally registered since the blamed commit, see mv88e6xxx_setup_devlink_regions_global(): if (cond && !cond(chip)) continue; These are MV88E6XXX_REGION_STU and MV88E6XXX_REGION_PVT. If the chip does not have an STU or PVT, it should crash like this. To fix the issue, avoid unregistering those regions which are NULL, i.e. were skipped at mv88e6xxx_setup_devlink_regions_global() time. Fixes: 836021a2d0e0 ("net: dsa: mv88e6xxx: Export cross-chip PVT as devlink region") Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414212850.2953957-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: txgbe: fix memory leak in txgbe_probe() error pathAbdun Nihaal
When txgbe_sw_init() is called, memory is allocated for wx->rss_key in wx_init_rss_key(). However, in txgbe_probe() function, the subsequent error paths after txgbe_sw_init() don't free the rss_key. Fix that by freeing it in error path along with wx->mac_table. Also change the label to which execution jumps when txgbe_sw_init() fails, because otherwise, it could lead to a double free for rss_key, when the mac_table allocation fails in wx_sw_init(). Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number") Reported-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415032910.13139-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: bridge: switchdev: do not notify new brentries as changedJonas Gorski
When adding a bridge vlan that is pvid or untagged after the vlan has already been added to any other switchdev backed port, the vlan change will be propagated as changed, since the flags change. This causes the vlan to not be added to the hardware for DSA switches, since the DSA handler ignores any vlans for the CPU or DSA ports that are changed. E.g. the following order of operations would work: $ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0 $ ip link set lan1 master swbridge $ bridge vlan add dev swbridge vid 1 pvid untagged self $ bridge vlan add dev lan1 vid 1 pvid untagged but this order would break: $ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0 $ ip link set lan1 master swbridge $ bridge vlan add dev lan1 vid 1 pvid untagged $ bridge vlan add dev swbridge vid 1 pvid untagged self Additionally, the vlan on the bridge itself would become undeletable: $ bridge vlan port vlan-id lan1 1 PVID Egress Untagged swbridge 1 PVID Egress Untagged $ bridge vlan del dev swbridge vid 1 self $ bridge vlan port vlan-id lan1 1 PVID Egress Untagged swbridge 1 Egress Untagged since the vlan was never added to DSA's vlan list, so deleting it will cause an error, causing the bridge code to not remove it. Fix this by checking if flags changed only for vlans that are already brentry and pass changed as false for those that become brentries, as these are a new vlan (member) from the switchdev point of view. Since *changed is set to true for becomes_brentry = true regardless of would_change's value, this will not change any rtnetlink notification delivery, just the value passed on to switchdev in vlan->changed. Fixes: 8d23a54f5bee ("net: bridge: switchdev: differentiate new VLANs from changed ones") Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250414200020.192715-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: b53: enable BPDU reception for management portJonas Gorski
For STP to work, receiving BPDUs is essential, but the appropriate bit was never set. Without GC_RX_BPDU_EN, the switch chip will filter all BPDUs, even if an appropriate PVID VLAN was setup. Fixes: ff39c2d68679 ("net: dsa: b53: Add bridge support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Link: https://patch.msgid.link/20250414200434.194422-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16Merge branch 'ynl-avoid-leaks-in-attr-override-and-spec-fixes-for-c'Jakub Kicinski
Jakub Kicinski says: ==================== ynl: avoid leaks in attr override and spec fixes for C The C rt-link work revealed more problems in existing codegen and classic netlink specs. Patches 1 - 4 fix issues with the codegen. Patches 1 and 2 are pre-requisites for patch 3. Patch 3 fixes leaking memory if user tries to override already set attr. Patch 4 validates attrs in case kernel sends something we don't expect. Remaining patches fix and align the specs. Patch 5 changes nesting, the rest are naming adjustments. ==================== Link: https://patch.msgid.link/20250414211851.602096-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rt-neigh: prefix struct nfmsg members with ndmJakub Kicinski
Attach ndm- to all members of struct nfmsg. We could possibly use name-prefix just for C, but I don't think we have any precedent for using name-prefix on structs, and other rtnetlink sub-specs give full names for fixed header struct members. Fixes: bc515ed06652 ("netlink: specs: Add a spec for neighbor tables in rtnetlink") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rt-link: adjust mctp attribute namingJakub Kicinski
MCTP attribute naming is inconsistent. In C we have: IFLA_MCTP_NET, IFLA_MCTP_PHYS_BINDING, ^^^^ but in YAML: - mctp-net - phys-binding ^ no "mctp" It's unclear whether the "mctp" part of the name is supposed to be a prefix or part of attribute name. Make it a prefix, seems cleaner, even tho technically phys-binding was added later. Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rtnetlink: attribute naming correctionsJakub Kicinski
Some attribute names diverge in very minor ways from the C names. These are most likely typos, and they prevent the C codegen from working. Fixes: bc515ed06652 ("netlink: specs: Add a spec for neighbor tables in rtnetlink") Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rt-link: add an attr layer around alt-ifnameJakub Kicinski
alt-ifname attr is directly placed in requests (as an alternative to ifname) but in responses its wrapped up in IFLA_PROP_LIST and only there is may be multi-attr. See rtnl_fill_prop_list(). Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: make sure we validate subtype of array-nestJakub Kicinski
ArrayNest AKA indexed-array support currently skips inner type validation. We count the attributes and then we parse them, make sure we call validate, too. Otherwise buggy / unexpected kernel response may lead to crashes. Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: individually free previous values on double setJakub Kicinski
When user calls request_attrA_set() multiple times (for the same attribute), and attrA is of type which allocates memory - we try to free the previously associated values. For array types (including multi-attr) we have only freed the array, but the array may have contained pointers. Refactor the code generation for free attr and reuse the generated lines in setters to flush out the previous state. Since setters are static inlines in the header we need to add forward declarations for the free helpers of pure nested structs. Track which types get used by arrays and include the right forwad declarations. At least ethtool string set and bit set would not be freed without this. Tho, admittedly, overriding already set attribute twice is likely a very very rare thing to do. Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: move local vars after the opening bracketJakub Kicinski
The "function writing helper" tries to put local variables between prototype and the opening bracket. Clearly wrong, but up until now nothing actually uses it to write local vars so it wasn't noticed. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: don't declare loop iterator in placeJakub Kicinski
The codegen tries to follow the "old" C style and declare loop iterators at the start of the block / function. Only nested request handling breaks this style, so adjust it. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16cxgb4: fix memory leak in cxgb4_init_ethtool_filters() error pathAbdun Nihaal
In the for loop used to allocate the loc_array and bmap for each port, a memory leak is possible when the allocation for loc_array succeeds, but the allocation for bmap fails. This is because when the control flow goes to the label free_eth_finfo, only the allocations starting from (i-1)th iteration are freed. Fix that by freeing the loc_array in the bmap allocation error path. Fixes: d915c299f1da ("cxgb4: add skeleton for ethtool n-tuple filters") Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414170649.89156-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17Merge tag 'amd-drm-fixes-6.15-2025-04-16' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.15-2025-04-16: amdgpu: - Cleaner shader sysfs fix - Suspend fix - Fix doorbell free ordering - Video caps fix - DML2 memory allocation optimization - HDP fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250416223137.1146653-1-alexander.deucher@amd.com
2025-04-16bcachefs: Add missing READ_ONCE() for metadata replicasKent Overstreet
If we race with the user changing the metadata_replicas setting, this could cause us to get an incorrectly sized disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-16Bluetooth: vhci: Avoid needless snprintf() callsKees Cook
Avoid double-copying of string literals. Use a "const char *" for each string instead of copying from .rodata into stack and then into the skb. We can go directly from .rodata to the skb. This also works around a Clang bug (that has since been fixed[1]). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401250927.1poZERd6-lkp@intel.com/ Fixes: ab4e4380d4e1 ("Bluetooth: Add vhci devcoredump support") Link: https://github.com/llvm/llvm-project/commit/ea2e66aa8b6e363b89df66dc44275a0d7ecd70ce [1] Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-04-16Bluetooth: l2cap: Process valid commands in too long frameFrédéric Danis
This is required for passing PTS test cases: - L2CAP/COS/CED/BI-14-C Multiple Signaling Command in one PDU, Data Truncated, BR/EDR, Connection Request - L2CAP/COS/CED/BI-15-C Multiple Signaling Command in one PDU, Data Truncated, BR/EDR, Disconnection Request The test procedure defined in L2CAP.TS.p39 for both tests is: 1. The Lower Tester sends a C-frame to the IUT with PDU Length set to 8 and Channel ID set to the correct signaling channel for the logical link. The Information payload contains one L2CAP_ECHO_REQ packet with Data Length set to 0 with 0 octets of echo data and one command packet and Data Length set as specified in Table 4.6 and the correct command data. 2. The IUT sends an L2CAP_ECHO_RSP PDU to the Lower Tester. 3. Perform alternative 3A, 3B, 3C, or 3D depending on the IUT’s response. Alternative 3A (IUT terminates the link): 3A.1 The IUT terminates the link. 3A.2 The test ends with a Pass verdict. Alternative 3B (IUT discards the frame): 3B.1 The IUT does not send a reply to the Lower Tester. Alternative 3C (IUT rejects PDU): 3C.1 The IUT sends an L2CAP_COMMAND_REJECT_RSP PDU to the Lower Tester. Alternative 3D (Any other IUT response): 3D.1 The Upper Tester issues a warning and the test ends. 4. The Lower Tester sends a C-frame to the IUT with PDU Length set to 4 and Channel ID set to the correct signaling channel for the logical link. The Information payload contains Data Length set to 0 with an L2CAP_ECHO_REQ packet with 0 octets of echo data. 5. The IUT sends an L2CAP_ECHO_RSP PDU to the Lower Tester. With expected outcome: In Steps 2 and 5, the IUT responds with an L2CAP_ECHO_RSP. In Step 3A.1, the IUT terminates the link. In Step 3B.1, the IUT does not send a reply to the Lower Tester. In Step 3C.1, the IUT rejects the PDU. In Step 3D.1, the IUT sends any valid response. Currently PTS fails with the following logs: Failed to receive ECHO RESPONSE. And HCI logs: > ACL Data RX: Handle 11 flags 0x02 dlen 20 L2CAP: Information Response (0x0b) ident 2 len 12 Type: Fixed channels supported (0x0003) Result: Success (0x0000) Channels: 0x000000000000002e L2CAP Signaling (BR/EDR) Connectionless reception AMP Manager Protocol L2CAP Signaling (LE) > ACL Data RX: Handle 11 flags 0x02 dlen 13 frame too long 08 01 00 00 08 02 01 00 aa ......... Cc: stable@vger.kernel.org Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-04-16Merge tag 'devicetree-fixes-for-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - A couple of maintainers updates - Remove obsolete Renesas TPU timer binding - Add i.MX94 support to nxp,sysctr-timer and fsl,irqsteer - Add support for 'data-lanes' property in fsl,imx8mq-nwl-dsi binding * tag 'devicetree-fixes-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: soc: fsl: fsl,ls1028a-reset: Fix maintainer entry dt-bindings: timer: renesas,tpu: remove obsolete binding dt-bindings: timer: nxp,sysctr-timer: Add i.MX94 support dt-bindings: interrupt-controller: fsl,irqsteer: Add i.MX94 support dt-bindings: display: nwl-dsi: Allow 'data-lanes' property for port@1 dt-bindings: xilinx: Remove myself from maintainership
2025-04-16block: integrity: Do not call set_page_dirty_lock()Martin K. Petersen
Placing multiple protection information buffers inside the same page can lead to oopses because set_page_dirty_lock() can't be called from interrupt context. Since a protection information buffer is not backed by a file there is no point in setting its page dirty, there is nothing to synchronize. Drop the call to set_page_dirty_lock() and remove the last argument to bio_integrity_unpin_bvec(). Cc: stable@vger.kernel.org Fixes: 492c5d455969 ("block: bio-integrity: directly map user buffers") Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/yq1v7r3ev9g.fsf@ca-mkp.ca.oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-16drm/amdgpu: Use the right function for hdp flushLijo Lazar
There are a few prechecks made before HDP flush like a flush is not required on APU bare metal. Using hdp callback directly bypasses those checks. Use amdgpu_device_flush_hdp which takes care of prechecks. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1d9bff4cf8c53d33ee2ff1b11574e5da739ce61c)