summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-03-07capability: Remove unused has_capabilityDr. David Alan Gilbert
The vanilla has_capability() function has been unused since 2018's commit dcb569cf6ac9 ("Smack: ptrace capability use fixes") Remove it. Fixup a comment in security/commoncap.c that referenced it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Serge Hallyn <sergeh@kernel.org>
2025-03-07ubsan/overflow: Rework integer overflow sanitizer option to turn on everythingKees Cook
Since we're going to approach integer overflow mitigation a type at a time, we need to enable all of the associated sanitizers, and then opt into types one at a time. Rename the existing "signed wrap" sanitizer to just the entire topic area: "integer wrap". Enable the implicit integer truncation sanitizers, with required callbacks and tests. Notably, this requires features (currently) only available in Clang, so we can depend on the cc-option tests to determine availability instead of doing version tests. Link: https://lore.kernel.org/r/20250307041914.937329-1-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-07netpoll: Optimize skb refilling on critical pathBreno Leitao
netpoll tries to refill the skb queue on every packet send, independently if packets are being consumed from the pool or not. This was particularly problematic while being called from printk(), where the operation would be done while holding the console lock. Introduce a more intelligent approach to skb queue management. Instead of constantly attempting to refill the queue, the system now defers refilling to a work queue and only triggers the workqueue when a buffer is actually dequeued. This change significantly reduces operations with the lock held. Add a work_struct to the netpoll structure for asynchronous refilling, updating find_skb() to schedule refill work only when necessary (skb is dequeued). These changes have demonstrated a 15% reduction in time spent during netpoll_send_msg operations, especially when no SKBs are not consumed from consumed from pool. When SKBs are being dequeued, the improvement is even better, around 70%, mainly because refilling the SKB pool is now happening outside of the critical patch (with console_owner lock held). Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-netpoll_refill_v2-v1-1-06e2916a4642@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-07net: phylink: Remove unused phylink_init_eeeDr. David Alan Gilbert
phylink_init_eee() is currently unused. It was last added in 2019 by commit 86e58135bc4a ("net: phylink: add phylink_init_eee() helper") but it didn't actually wire a use up. It had previous been removed in 2017 by commit 939eae25d9a5 ("phylink: remove phylink_init_eee()"). Remove it again. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250306184534.246152-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-08power: supply: core: get rid of of_nodeSebastian Reichel
This removes .of_node from 'struct power_supply', since there is already a copy in .dev.of_node and there is no need to have two copies. Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20250225-psy-core-convert-to-fwnode-v1-1-d5e4369936bb@collabora.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-03-08power: supply: Remove unused set_charged methodDr. David Alan Gilbert
The previous patches in this series removed the only caller and only setter of this method. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250307230225.128775-4-linux@treblig.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-03-08power: supply: core: Remove unused power_supply_set_battery_chargedDr. David Alan Gilbert
power_supply_set_battery_charged() has been unused since 2019's commit 0f884f8a090e ("ARM: pxa: remove raumfeld board files and defconfig") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250307230225.128775-2-linux@treblig.org Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-03-07Merge tag 'acpi-6.14-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Restore the previous behavior of the ACPI platform_profile sysfs interface that has been changed recently in a way incompatible with the existing user space (Mario Limonciello)" * tag 'acpi-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: platform/x86/amd: pmf: Add balanced-performance to hidden choices platform/x86/amd: pmf: Add 'quiet' to hidden choices ACPI: platform_profile: Add support for hidden choices
2025-03-07Merge tag 'block-6.14-20250306' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - TCP use after free fix on polling (Sagi) - Controller memory buffer cleanup fixes (Icenowy) - Free leaking requests on bad user passthrough commands (Keith) - TCP error message fix (Maurizio) - TCP corruption fix on partial PDU (Maurizio) - TCP memory ordering fix for weakly ordered archs (Meir) - Type coercion fix on message error for TCP (Dan) - Name the RQF flags enum, fixing issues with anon enums and BPF import of it - ublk parameter setting fix - GPT partition 7-bit conversion fix * tag 'block-6.14-20250306' of git://git.kernel.dk/linux: block: Name the RQF flags enum nvme-tcp: fix signedness bug in nvme_tcp_init_connection() block: fix conversion of GPT partition name to 7-bit ublk: set_params: properly check if parameters can be applied nvmet-tcp: Fix a possible sporadic response drops in weakly ordered arch nvme-tcp: fix potential memory corruption in nvme_tcp_recv_pdu() nvme-tcp: Fix a C2HTermReq error message nvmet: remove old function prototype nvme-ioctl: fix leaked requests on mapping error nvme-pci: skip CMB blocks incompatible with PCI P2P DMA nvme-pci: clean up CMBMSC when registering CMB fails nvme-tcp: fix possible UAF in nvme_tcp_poll
2025-03-07io_uring/rw: defer reg buf vec importPavel Begunkov
Import registered buffers for vectored reads and writes later at issue time as we now do for other fixed ops. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e8491c976e4ab83a4e3dc428e9fe7555e59583b8.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07io_uring: add infra for importing vectored reg buffersPavel Begunkov
Add io_import_reg_vec(), which will be responsible for importing vectored registered buffers. The function might reallocate the vector, but it'd try to do the conversion in place first, which is why it's required of the user to pad the iovec to the right border of the cache. Overlapping also depends on struct iovec being larger than bvec, which is not the case on e.g. 32 bit architectures. Don't try to complicate this case and make sure vectors never overlap, it'll be improved later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60bd246b1249476a6996407c1dbc38ef6febad14.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07io_uring: introduce struct iou_vecPavel Begunkov
I need a convenient way to pass around and work with iovec+size pair, put them into a structure and makes use of it in rw.c Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d39fadafc9e9047b0a292e5be6db3cf2f48bb1f7.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07Merge branch 'for-6.15/io_uring-epoll-wait' into for-6.15/io_uring-reg-vecJens Axboe
* for-6.15/io_uring-epoll-wait: io_uring/epoll: add support for IORING_OP_EPOLL_WAIT io_uring/epoll: remove CONFIG_EPOLL guards eventpoll: add epoll_sendevents() helper eventpoll: abstract out ep_try_send_events() helper eventpoll: abstract out parameter sanity checking
2025-03-07Merge branch 'for-6.15/io_uring-rx-zc' into for-6.15/io_uring-reg-vecJens Axboe
* for-6.15/io_uring-rx-zc: (80 commits) io_uring/zcrx: add selftest case for recvzc with read limit io_uring/zcrx: add a read limit to recvzc requests io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap io_uring: Rename KConfig to Kconfig io_uring/zcrx: fix leaks on failed registration io_uring/zcrx: recheck ifq on shutdown io_uring/zcrx: add selftest net: add documentation for io_uring zcrx io_uring/zcrx: add copy fallback io_uring/zcrx: throttle receive requests io_uring/zcrx: set pp memory provider for an rx queue io_uring/zcrx: add io_recvzc request io_uring/zcrx: dma-map area for the device io_uring/zcrx: implement zerocopy receive pp memory provider io_uring/zcrx: grab a net device io_uring/zcrx: add io_zcrx_area io_uring/zcrx: add interface queue and refill queue net: add helpers for setting a memory provider on an rx queue net: page_pool: add memory provider helpers net: prepare for non devmem TCP memory providers ...
2025-03-07Merge branch 'for-6.15/io_uring' into for-6.15/io_uring-reg-vecJens Axboe
* for-6.15/io_uring: (80 commits) io_uring: introduce io_cache_free() helper io_uring/rsrc: skip NULL file/buffer checks in io_free_rsrc_node() io_uring/rsrc: avoid NULL node check on io_sqe_buffer_register() failure io_uring/rsrc: call io_free_node() on io_sqe_buffer_register() failure io_uring/rsrc: free io_rsrc_node using kfree() io_uring/rsrc: split out io_free_node() helper io_uring/rsrc: include io_uring_types.h in rsrc.h ublk: don't cast registered buffer index to int io_uring/nop: use io_find_buf_node() io_uring/rsrc: declare io_find_buf_node() in header file io_uring/ublk: report error when unregister operation fails io_uring: convert cmd_to_io_kiocb() macro to function io_uring/uring_cmd: specify io_uring_cmd_import_fixed() pointer type io_uring/rsrc: use rq_data_dir() to compute bvec dir selftests: ublk: add ublk zero copy test selftests: ublk: add file backed ublk selftests: ublk: add kernel selftests for ublk io_uring: cache nodes and mapped buffers ublk: zc register/unregister bvec io_uring: add support for kernel registered bvecs ...
2025-03-07PM: EM: Address RCU-related sparse warningsRafael J. Wysocki
The usage of __rcu in the Energy Model code is quite inconsistent which causes the following sparse warnings to trigger: kernel/power/energy_model.c:169:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:169:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:169:15: got struct em_perf_table * kernel/power/energy_model.c:171:9: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:171:9: expected struct callback_head *head kernel/power/energy_model.c:171:9: got struct callback_head [noderef] __rcu * kernel/power/energy_model.c:171:9: warning: cast removes address space '__rcu' of expression kernel/power/energy_model.c:182:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:182:19: expected struct kref *kref kernel/power/energy_model.c:182:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:200:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:200:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:200:15: got void *[assigned] _res kernel/power/energy_model.c:204:20: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:204:20: expected struct kref *kref kernel/power/energy_model.c:204:20: got struct kref [noderef] __rcu * kernel/power/energy_model.c:320:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:320:19: expected struct kref *kref kernel/power/energy_model.c:320:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:325:45: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:325:45: expected struct em_perf_state *table kernel/power/energy_model.c:325:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:425:45: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:425:45: expected struct em_perf_state *table kernel/power/energy_model.c:425:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:442:15: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:442:15: expected void const *objp kernel/power/energy_model.c:442:15: got struct em_perf_table [noderef] __rcu *[assigned] em_table kernel/power/energy_model.c:626:55: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:626:55: expected struct em_perf_state *table kernel/power/energy_model.c:626:55: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:681:16: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:681:16: expected struct em_perf_state *new_ps kernel/power/energy_model.c:681:16: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:699:37: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:699:37: expected struct em_perf_state *table kernel/power/energy_model.c:699:37: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:733:38: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:733:38: expected struct em_perf_state *table kernel/power/energy_model.c:733:38: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:855:53: warning: dereference of noderef expression kernel/power/energy_model.c:864:32: warning: dereference of noderef expression This is because the __rcu annotation for sparse is only applicable to pointers that need rcu_dereference() or equivalent for protection, which basically means pointers assigned with rcu_assign_pointer(). Make all of the above sparse warnings go away by cleaning up the usage of __rcu and using rcu_dereference_protected() where applicable. Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/5885405.DvuYhMxLoT@rjwysocki.net
2025-03-07bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()Luis Chamberlain
The commit titled "block/bdev: lift block size restrictions to 64k" lifted the block layer's max supported block size to 64k inside the helper blk_validate_block_size() now that we support large folios. However in lifting the block size we also removed the silly use cases many filesystems have to use sb_set_blocksize() to *verify* that the block size <= PAGE_SIZE. The call to sb_set_blocksize() was used to check the block size <= PAGE_SIZE since historically we've always supported userspace to create for example 64k block size filesystems even on 4k page size systems, but what we didn't allow was mounting them. Older filesystems have been using the check with sb_set_blocksize() for years. While, we could argue that such checks should be filesystem specific, there are much more users of sb_set_blocksize() than LBS enabled filesystem on upstream, so just do the easier thing and bring back the PAGE_SIZE check for sb_set_blocksize() users and only skip it for LBS enabled filesystems. This will ensure that tests such as generic/466 when run in a loop against say, ext4, won't try to try to actually mount a filesystem with a block size larger than your filesystem supports given your PAGE_SIZE and in the worst case crash. Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20250307020403.3068567-1-mcgrof@kernel.org Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06fs/pipe: add simpler helpers for common casesLinus Torvalds
The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the involved types. It turns out that we don't have _that_ many places that access these fields directly and were affected, but we have more than we strictly should have, because our low-level helper functions have been designed to have intimate knowledge of how the pipes work. And as a result, that random noise of direct 'pipe->head' and 'pipe->tail' accesses makes it harder to pinpoint any actual potential problem spots remaining. For example, we didn't have a "is the pipe full" helper function, but instead had a "given these pipe buffer indexes and this pipe size, is the pipe full". That's because some low-level pipe code does actually want that much more complicated interface. But most other places literally just want a "is the pipe full" helper, and not having it meant that those places ended up being unnecessarily much too aware of this all. It would have been much better if only the very core pipe code that cared had been the one aware of this all. So let's fix it - better late than never. This just introduces the trivial wrappers for "is this pipe full or empty" and to get how many pipe buffers are used, so that instead of writing if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) the places that literally just want to know if a pipe is full can just say if (pipe_is_full(pipe)) instead. The existing trivial cases were converted with a 'sed' script. This cuts down on the places that access pipe->head and pipe->tail directly outside of the pipe code (and core splice code) quite a lot. The splice code in particular still revels in doing the direct low-level accesses, and the fuse fuse_dev_splice_write() code also seems a bit unnecessarily eager to go very low-level, but it's at least a bit better than it used to be. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06net/mlx5: Relocate function declarations from port.h to mlx5_core.hShahar Shitrit
The port header is a general file under include, yet it contains declarations for functions that are either not exported or exported but not used outside the mlx5_core driver. To enhance code organization, we move these declarations to mlx5_core.h, where they are more appropriately scoped. This refactor removes unnecessary exported symbols and prevents unexported functions from being inadvertently referenced outside of the mlx5_core driver. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06block: Name the RQF flags enumBreno Leitao
Commit 5f89154e8e9e3445f9b59 ("block: Use enum to define RQF_x bit indexes") converted the RQF flags to an anonymous enum, which was a beneficial change. This patch goes one step further by naming the enum as "rqf_flags". This naming enables exporting these flags to BPF clients, eliminating the need to duplicate these flags in BPF code. Instead, BPF clients can now access the same kernel-side values through CO:RE (Compile Once, Run Everywhere), as shown in this example: rqf_stats = bpf_core_enum_value(enum rqf_flags, __RQF_STATS) Suggested-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20250306-rqf_flags-v1-1-bbd64918b406@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-06Merge branch 'sched/urgent' into sched/core, to pick up dependent commitsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: net/ethtool/cabletest.c 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock") 637399bf7e77 ("net: ethtool: netlink: Allow NULL nlattrs when getting a phy_device") No Adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06eth: bnxt: remove most dependencies on RTNLStanislav Fomichev
Only devlink and sriov paths are grabbing rtnl explicitly. The rest is covered by netdev instance lock which the core now grabs, so there is no need to manage rtnl in most places anymore. On the core side we can now try to drop rtnl in some places (do_setlink for example) for the drivers that signal non-rtnl mode (TBD). Boot-tested and with `ethtool -L eth1 combined 24` to trigger reset. Cc: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-15-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06docs: net: document new locking realityStanislav Fomichev
Also clarify ndo_get_stats (that read and write paths can run concurrently) and mention only RCU. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-14-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: add option to request netdev instance lockStanislav Fomichev
Currently only the drivers that implement shaper or queue APIs are grabbing instance lock. Add an explicit opt-in for the drivers that want to grab the lock without implementing the above APIs. There is a 3-byte hole after @up, use it: /* --- cacheline 47 boundary (3008 bytes) --- */ u32 napi_defer_hard_irqs; /* 3008 4 */ bool up; /* 3012 1 */ /* XXX 3 bytes hole, try to pack */ struct mutex lock; /* 3016 144 */ /* XXX last struct has 1 hole */ Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-13-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: replace dev_addr_sem with netdev instance lockStanislav Fomichev
Lockdep reports possible circular dependency in [0]. Instead of fixing the ordering, replace global dev_addr_sem with netdev instance lock. Most of the paths that set/get mac are RTNL protected. Two places where it's not, convert to explicit locking: - sysfs address_show - dev_get_mac_address via dev_ioctl 0: https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/993321/24-router-bridge-1d-lag-sh/stderr Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-12-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ndo_bpfStanislav Fomichev
Cover the paths that come via bpf system call and XSK bind. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-10-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during sysfs operationsStanislav Fomichev
Most of them are already covered by the converted dev_xxx APIs. Add the locking wrappers for the remaining ones. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-9-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ioctl operationsStanislav Fomichev
Convert all ndo_eth_ioctl invocations to dev_eth_ioctl which does the locking. Reflow some of the dev_siocxxx to drop else clause. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during rtnetlink operationsStanislav Fomichev
To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock are software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries: - (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c) - new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference) This avoids touching most of the existing ioctl/sysfs/drivers paths. Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. At the same time, they use dev_xxx APIs, so the lower device has to be unlocked. Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during queue operationsStanislav Fomichev
For the drivers that use queue management API, switch to the mode where core stack holds the netdev instance lock. This affects the following drivers: - bnxt - gve - netdevsim Originally I locked only start/stop, but switched to holding the lock over all iterations to make them look atomic to the device (feels like it should be easier to reason about). Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during nft ndo_setup_tcStanislav Fomichev
Introduce new dev_setup_tc for nft ndo_setup_tc paths. Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ndo_open/ndo_stopStanislav Fomichev
For the drivers that use shaper API, switch to the mode where core stack holds the netdev lock. This affects two drivers: * iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly remove these * netdevsim - switch to _locked APIs to avoid deadlock iavf_close diff is a bit confusing, the existing call looks like this: iavf_close() { netdev_lock() .. netdev_unlock() wait_event_timeout(down_waitqueue) } I change it to the following: netdev_lock() iavf_close() { .. netdev_unlock() wait_event_timeout(down_waitqueue) netdev_lock() // reusing this lock call } netdev_unlock() Since I'm reusing existing netdev_lock call, so it looks like I only add netdev_unlock. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06Merge tag 'amd-pstate-v6.15-2025-03-06' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge amd-pstate updates for 6.15 (3/6/25) from Mario Limonciello: "A lot of code optimization to avoid cases where call paths will end up calling the same writes multiple times and needlessly caching variables. To accomplish this some of the writes are now made into an atomically written "perf" variable. Locking has been overhauled to ensure it only applies to the necessary functions. Tracing has been adjusted to ensure trace events only are used right before writing out to the hardware." NOTE: This is a redo of amd-pstate-v6.15-2025-03-03 with a fixed Fixes tag. * tag 'amd-pstate-v6.15-2025-03-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: (29 commits) cpufreq/amd-pstate: Drop actions in amd_pstate_epp_cpu_offline() cpufreq/amd-pstate: Stop caching EPP cpufreq/amd-pstate: Rework CPPC enabling cpufreq/amd-pstate: Drop debug statements for policy setting cpufreq/amd-pstate: Update cppc_req_cached for shared mem EPP writes cpufreq/amd-pstate: Move all EPP tracing into *_update_perf and *_set_epp functions cpufreq/amd-pstate: Cache CPPC request in shared mem case too cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks cpufreq/amd-pstate-ut: Adjust variable scope cpufreq/amd-pstate-ut: Run on all of the correct CPUs cpufreq/amd-pstate-ut: Drop SUCCESS and FAIL enums cpufreq/amd-pstate-ut: Allow lowest nonlinear and lowest to be the same cpufreq/amd-pstate-ut: Use _free macro to free put policy cpufreq/amd-pstate: Drop `cppc_cap1_cached` cpufreq/amd-pstate: Overhaul locking cpufreq/amd-pstate: Move perf values into a union cpufreq/amd-pstate: Drop min and max cached frequencies cpufreq/amd-pstate: Show a warning when a CPU fails to setup cpufreq/amd-pstate: Invalidate cppc_req_cached during suspend cpufreq/amd-pstate: Fix the clamping of perf values ...
2025-03-06PM: EM: Consify two parameters of em_dev_register_perf_domain()Rafael J. Wysocki
Notice that em_dev_register_perf_domain() and the functions called by it do not update objects pointed to by its cb and cpus parameters, so the const modifier can be added to them. This allows the return value of cpumask_of() or a pointer to a struct em_data_callback declared as const to be passed to em_dev_register_perf_domain() directly without explicit type casting which is rather handy. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/4648962.LvFx2qVVIh@rjwysocki.net
2025-03-06mm/migrate: Add migrate_device_pfnsMatthew Brost
Add migrate_device_pfns which prepares an array of pre-populated device pages for migration. This is needed for eviction of known set of non-contiguous devices pages to cpu pages which is a common case for SVM in DRM drivers using TTM. v2: - s/migrate_device_vma_range/migrate_device_prepopulated_range - Drop extra mmu invalidation (Vetter) v3: - s/migrate_device_prepopulated_range/migrate_device_pfns (Alistar) - Use helper to lock device pages (Alistar) - Update commit message with why this is required (Alistar) Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-3-matthew.brost@intel.com
2025-03-06fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmwareJason Gunthorpe
Add the FWCTL_RPC ioctl which allows a request/response RPC call to device firmware. Drivers implementing this call must follow the security guidelines under Documentation/userspace-api/fwctl.rst The core code provides some memory management helpers to get the messages copied from and back to userspace. The driver is responsible for allocating the output message memory and delivering the message to the device. Link: https://patch.msgid.link/r/5-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06taint: Add TAINT_FWCTLJason Gunthorpe
Requesting a fwctl scope of access that includes mutating device debug data will cause the kernel to be tainted. Changing the device operation through things in the debug scope may cause the device to malfunction in undefined ways. This should be reflected in the TAINT flags to help any debuggers understand that something has been done. Link: https://patch.msgid.link/r/4-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl: FWCTL_INFO to return basic information about the deviceJason Gunthorpe
Userspace will need to know some details about the fwctl interface being used to locate the correct userspace code to communicate with the kernel. Provide a simple device_type enum indicating what the kernel driver is. Allow the device to provide a device specific info struct that contains any additional information that the driver may need to provide to userspace. Link: https://patch.msgid.link/r/3-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl: Basic ioctl dispatch for the character deviceJason Gunthorpe
Each file descriptor gets a chunk of per-FD driver specific context that allows the driver to attach a device specific struct to. The core code takes care of the memory lifetime for this structure. The ioctl dispatch and design is based on what was built for iommufd. The ioctls have a struct which has a combined in/out behavior with a typical 'zero pad' scheme for future extension and backwards compatibility. Like iommufd some shared logic does most of the ioctl marshaling and compatibility work and table dispatches to some function pointers for each unique ioctl. This approach has proven to work quite well in the iommufd and rdma subsystems. Allocate an ioctl number space for the subsystem. Link: https://patch.msgid.link/r/2-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl: Add basic structure for a class subsystem with a cdevJason Gunthorpe
Create the class, character device and functions for a fwctl driver to un/register to the subsystem. A typical fwctl driver has a sysfs presence like: $ ls -l /dev/fwctl/fwctl0 crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0 $ ls /sys/class/fwctl/fwctl0 dev device power subsystem uevent $ ls /sys/class/fwctl/fwctl0/device/infiniband/ ibp0s10f0 $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/ fwctl0/ $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0 dev device power subsystem uevent Which allows userspace to link all the multi-subsystem driver components together and learn the subsystem specific names for the device's components. Link: https://patch.msgid.link/r/1-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06tracing: Remove orphaned event_trace_printkHengqi Chen
The event_trace_printk macro has no callers since commit b8e65554d80b ("tracing: remove deprecated TRACE_FORMAT"). So drop it. Link: https://lore.kernel.org/20250213113951.813258-1-hengqi.chen@gmail.com Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-06Merge tag 'vfs-6.14-rc6.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix spelling mistakes in idmappings.rst - Fix RCU warnings in override_creds()/revert_creds() - Create new pid namespaces with default limit now that pid_max is namespaced * tag 'vfs-6.14-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: pid: Do not set pid_max in new pid namespaces doc: correcting two prefix errors in idmappings.rst cred: Fix RCU warnings in override/revert_creds
2025-03-06fs/pipe: express 'pipe_empty()' in terms of 'pipe_occupancy()'Linus Torvalds
That's what 'pipe_full()' does, so it's more consistent. But more importantly it gets the type limits right when the pipe head and tail are no longer necessarily 'unsigned int'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06Merge tag 'ffa-updates-6.15' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm FF-A updates for v6.15 This update primarily focuses on FF-A framework notification support along with other improvements, including UUID handling enhancements and various fixes. 1. FF-A framework notification upport - Adds support for multiple UUIDs per partition to register individual SRI callbacks. - Handles Rx buffer full framework notifications and provides a general interface for future extensions. 2. Improved multiple UUID/services per-partition handling - Adds support for UUID passing in FFA_MSG_SEND2, improving multiple UUID/service support in the driver. - Introduces a helper function to check whether a partition can receive REQUEST2 messages. 3. Partition handling generic improvements - Implements device unregistration for better partition cleanup. - Improves handling of the host partition presence in partition info. 4. FF-A version updates - Upgrades the driver version to FF-A v1.2. - Rejects major versions higher than the driver version as incompatible. 5. Big-Endian support fixes - Fixes big-endian issues in: __ffa_partition_info_regs_get() __ffa_partition_info_get() - Big-endian support is still incomplete, and only these changes can be verified without additional application/testing updates at the moment. We can discover all the partitions correctly with big-endian kernel now. 6. Miscellaneous fixes - Fixes function prototype misalignments in: sync_send_receive{,2} - Adds explicit type casting for return values from: FFA_VERSION and NOTIFICATION_INFO_GET - Corrects vCPU list parsing in ffa_notification_info_get(). 7. UUID management in the driver and DMA mask updates - Replaces UUID buffer with the standard UUID format in ffa_partition_info structure. - Fixes a typo in some FF-A bus macros. - Sets dma_mask for FF-A devices. In short, this update enhances notification handling, UUID support, and overall robustness of the FF-A driver while addressing multiple fixes and cleanups. * tag 'ffa-updates-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (23 commits) firmware: arm_ffa: Set dma_mask for ffa devices firmware: arm_ffa: Skip the first/partition ID when parsing vCPU list firmware: arm_ffa: Explicitly cast return value from NOTIFICATION_INFO_GET firmware: arm_ffa: Explicitly cast return value from FFA_VERSION before comparison firmware: arm_ffa: Handle ffa_notification_get correctly at virtual FF-A instance firmware: arm_ffa: Allow multiple UUIDs per partition to register SRI callback firmware: arm_ffa: Add support for handling framework notifications firmware: arm_ffa: Add support for {un,}registration of framework notifications firmware: arm_ffa: Stash ffa_device instead of notify_type in notifier_cb_info firmware: arm_ffa: Refactoring to prepare for framework notification support firmware: arm_ffa: Remove unnecessary declaration of ffa_partitions_cleanup() firmware: arm_ffa: Reject higher major version as incompatible firmware: arm_ffa: Upgrade FF-A version to v1.2 in the driver firmware: arm_ffa: Add support for passing UUID in FFA_MSG_SEND2 firmware: arm_ffa: Helper to check if a partition can receive REQUEST2 messages firmware: arm_ffa: Unregister the FF-A devices when cleaning up the partitions firmware: arm_ffa: Handle the presence of host partition in the partition info firmware: arm_ffa: Refactor addition of partition information into XArray firmware: arm_ffa: Fix big-endian support in __ffa_partition_info_regs_get() firmware: arm_ffa: Fix big-endian support in __ffa_partition_info_get() ... Link: https://lore.kernel.org/r/20250304105928.432997-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-06Merge tag 'smccc-update-6.15' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm SMCCC update for v6.15 Just a single update introducing the support for the optional SOC_ID name string from the Arm SMCCC v1.6 specification. If the SOC_ID name string is implemented, the machine field of the SoC Device Attributes will reflect it. The original intent of SOC_ID was to provide a JEP-106 code for the SiP and the SoC revision to uniquely identify the SoC. However, there has been a request to add this optional name so that SoC firmware can directly provide the SoC name to the OS. This change avoids the need for frequent updates to various tools that would otherwise require maintaining hardcoded model/machine name tables for new SoCs. * tag 'smccc-update-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: smccc: Support optional Arm SMCCC SOC_ID name Link: https://lore.kernel.org/r/20250304105845.432813-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-06Merge tag 'asahi-soc-rtkit-6.15' of https://github.com/AsahiLinux/linux into ↵Arnd Bergmann
soc/drivers Apple SoC RTKit IPC library updates for 6.15: - Additional logging for errors - A few minor improvements and bugfixes required for drivers that are yet to be upstreamed * tag 'asahi-soc-rtkit-6.15' of https://github.com/AsahiLinux/linux: soc: apple: rtkit: Cut syslog messages after the first '\0' soc: apple: rtkit: Use high prio work queue soc: apple: rtkit: Implement OSLog buffers properly soc: apple: rtkit: Add and use PWR_STATE_INIT instead of _ON soc: apple: rtkit: Fix use-after-free in apple_rtkit_crashlog_rx() soc: apple: rtkit: Pass the crashlog to the crashed() callback soc: apple: rtkit: Check & log more failures Link: https://lore.kernel.org/r/20250302113842.58092-1-sven@svenpeter.dev Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-06badblocks: use sector_t instead of int to avoid truncation of badblocks lengthZheng Qixing
There is a truncation of badblocks length issue when set badblocks as follow: echo "2055 4294967299" > bad_blocks cat bad_blocks 2055 3 Change 'sectors' argument type from 'int' to 'sector_t'. This change avoids truncation of badblocks length for large sectors by replacing 'int' with 'sector_t' (u64), enabling proper handling of larger disk sizes and ensuring compatibility with 64-bit sector addressing. Fixes: 9e0e252a048b ("badblocks: Add core badblock management code") Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Link: https://lore.kernel.org/r/20250227075507.151331-13-zhengqixing@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-06badblocks: return boolean from badblocks_set() and badblocks_clear()Zheng Qixing
Change the return type of badblocks_set() and badblocks_clear() from int to bool, indicating success or failure. Specifically: - _badblocks_set() and _badblocks_clear() functions now return true for success and false for failure. - All calls to these functions are updated to handle the new boolean return type. - This change improves code clarity and ensures a more consistent handling of success and failure states. Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Acked-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20250227075507.151331-11-zhengqixing@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-06uprobes/x86: Harden uretprobe syscall trampoline checkJiri Olsa
Jann reported a possible issue when trampoline_check_ip returns address near the bottom of the address space that is allowed to call into the syscall if uretprobes are not set up: https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341b8fbc11737dacbcd29f0054413cbbf Though the mmap minimum address restrictions will typically prevent creating mappings there, let's make sure uretprobe syscall checks for that. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250212220433.3624297-1-jolsa@kernel.org