summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-10-23PCI: Protect Link Control 2 Register with RMW lockingIlpo Järvinen
The PCIe Bandwidth Controller performs RMW accesses the Link Control 2 Register which can occur concurrently to other sources of Link Control 2 Register writes. Therefore, add Link Control 2 Register among the PCI Express Capability Registers that need RMW locking. Link: https://lore.kernel.org/r/20241018144755.7875-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-10-23cgroup/freezer: Reduce redundant traversal for cgroup_freezeChen Ridong
Whether a cgroup is frozen is determined solely by whether it is set to to be frozen and whether its parent is frozen. Currently, when is cgroup is frozen or unfrozen, it iterates through the entire subtree to freeze or unfreeze its descentdants. However, this is unesessary for a cgroup that does not change its effective frozen status. This path aims to skip the subtree if its parent does not have a change in effective freeze. For an example, subtree like, a-b-c-d-e-f-g, when a is frozen, the entire tree is frozen. If we freeze b and c again, it is unesessary to iterate d, e, f and g. So does that If we unfreeze b/c. Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-23uprobe: Add support for session consumerJiri Olsa
This change allows the uprobe consumer to behave as session which means that 'handler' and 'ret_handler' callbacks are connected in a way that allows to: - control execution of 'ret_handler' from 'handler' callback - share data between 'handler' and 'ret_handler' callbacks The session concept fits to our common use case where we do filtering on entry uprobe and based on the result we decide to run the return uprobe (or not). It's also convenient to share the data between session callbacks. To achive this we are adding new return value the uprobe consumer can return from 'handler' callback: UPROBE_HANDLER_IGNORE - Ignore 'ret_handler' callback for this consumer. And store cookie and pass it to 'ret_handler' when consumer has both 'handler' and 'ret_handler' callbacks defined. We store shared data in the return_consumer object array as part of the return_instance object. This way the handle_uretprobe_chain can find related return_consumer and its shared data. We also store entry handler return value, for cases when there are multiple consumers on single uprobe and some of them are ignored and some of them not, in which case the return probe gets installed and we need to have a way to find out which consumer needs to be ignored. The tricky part is when consumer is registered 'after' the uprobe entry handler is hit. In such case this consumer's 'ret_handler' gets executed as well, but it won't have the proper data pointer set, so we can filter it out. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241018202252.693462-3-jolsa@kernel.org
2024-10-23uprobe: Add data pointer to consumer handlersJiri Olsa
Adding data pointer to both entry and exit consumer handlers and all its users. The functionality itself is coming in following change. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241018202252.693462-2-jolsa@kernel.org
2024-10-23rcu: Delete unused rcu_gp_might_be_stalled() functionPaul E. McKenney
The rcu_gp_might_be_stalled() function is no longer used, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-23kasan: Fix Software Tag-Based KASAN with GCCMarco Elver
Per [1], -fsanitize=kernel-hwaddress with GCC currently does not disable instrumentation in functions with __attribute__((no_sanitize_address)). However, __attribute__((no_sanitize("hwaddress"))) does correctly disable instrumentation. Use it instead. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117196 [1] Link: https://lore.kernel.org/r/000000000000f362e80620e27859@google.com Link: https://lore.kernel.org/r/ZvFGwKfoC4yVjN_X@J2N7QTR9R3 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218854 Reported-by: syzbot+908886656a02769af987@syzkaller.appspotmail.com Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrew Pinski <pinskia@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Fixes: 7b861a53e46b ("kasan: Bump required compiler version") Link: https://lore.kernel.org/r/20241021120013.3209481-1-elver@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-10-23debugfs: add small file operations for most filesJohannes Berg
As struct file_operations is really big, but (most) debugfs files only use simple_open, read, write and perhaps seek, and don't need anything else, this wastes a lot of space for NULL pointers. Add a struct debugfs_short_fops and some bookkeeping code in debugfs so that users can use that with debugfs_create_file() using _Generic to figure out which function to use. Converting mac80211 to use it where possible saves quite a bit of space: 1010127 205064 1220 1216411 128f9b net/mac80211/mac80211.ko (before) 981199 205064 1220 1187483 121e9b net/mac80211/mac80211.ko (after) ------- -28928 = ~28KiB With a marginal space cost in debugfs: 8701 550 16 9267 2433 fs/debugfs/inode.o (before) 25233 325 32 25590 63f6 fs/debugfs/file.o (before) 8914 558 16 9488 2510 fs/debugfs/inode.o (after) 25380 325 32 25737 6489 fs/debugfs/file.o (after) --------------- +360 +8 (All on x86-64) A simple spatch suggests there are more than 300 instances, not even counting the ones hidden in macros like in mac80211, that could be trivially converted, for additional savings of about 240 bytes for each. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20241022151838.26f9925fb959.Ia80b55e934bbfc45ce0df42a3233d34b35508046@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23sysctl: update comments to new registration APIsThomas Weißschuh
The sysctl registration APIs do not need a terminating table entry anymore and with commit acc154691fc7 ("sysctl: Warn on an empty procname element") even emit warnings if such a sentinel entry is supplied. While at it also remove the mention of "table->de" which was removed in commit 3fbfa98112fc ("[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables") back in 2007. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2024-10-23regulator: doc: remove documentation comment for regulator_initJerome Brunet
Remove documentation comment related to regulator_init callback. This solves the following warning when building the kernel documentation: ./include/linux/regulator/machine.h:290: warning: Excess struct member 'regulator_init' description in 'regulator_init_data' Fixes: 602ff58ae4fe ("regulator: core: remove machine init callback from config") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20241023155257.0fa7211d@canb.auug.org.au/ Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://patch.msgid.link/20241023-regulator-doc-fixup-v1-2-ec018742ad73@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-23regulator: doc: add missing documentation for init_cbJerome Brunet
Add comment documenting introduced init_cb. This solves the following warning when building the kernel documentation: ./include/linux/regulator/driver.h:435: warning: Function parameter or struct member 'init_cb' not described in 'regulator_desc' Fixes: cfcdf395c21e ("regulator: core: add callback to perform runtime init") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20241023155120.6c4fea20@canb.auug.org.au/ Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://patch.msgid.link/20241023-regulator-doc-fixup-v1-1-ec018742ad73@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-23net: netdev_tx_sent_queue() small optimizationEric Dumazet
Change smp_mb() imediately following a set_bit() with smp_mb__after_atomic(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241018052310.2612084-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23netpoll: remove ndo_netpoll_setup() second argumentEric Dumazet
npinfo is not used in any of the ndo_netpoll_setup() methods. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241018052108.2610827-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23kfifo: don't include dma-mapping.h in kfifo.hChristoph Hellwig
Nothing in kfifo.h directly needs dma-mapping.h, only two macros use DMA_MAPPING_ERROR when actually instantiated. Drop the dma-mapping.h include to reduce include bloat. Add an explicity <linux/io.h> include to drivers/mailbox/omap-mailbox.c as that file uses __raw_readl and __raw_writel through a complicated include chain involving <linux/dma-mapping.h> Fixes: d52b761e4b1a ("kfifo: add kfifo_dma_out_prepare_mapped()") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241023055317.313234-1-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-22bpf: Add MEM_WRITE attributeDaniel Borkmann
Add a MEM_WRITE attribute for BPF helper functions which can be used in bpf_func_proto to annotate an argument type in order to let the verifier know that the helper writes into the memory passed as an argument. In the past MEM_UNINIT has been (ab)used for this function, but the latter merely tells the verifier that the passed memory can be uninitialized. There have been bugs with overloading the latter but aside from that there are also cases where the passed memory is read + written which currently cannot be expressed, see also 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22regulator: init_data handling updateMark Brown
Merge series from Jerome Brunet <jbrunet@baylibre.com>: This patchset groups the regulator patches around the init_data topic discussed on pmbus write protect patchset [1] [1]: https://lore.kernel.org/r/20240920-pmbus-wp-v1-0-d679ef31c483@baylibre.com
2024-10-22clk: Provide devm_clk_bulk_get_all_enabled() helperCristian Ciocaltea
Commit 265b07df758a ("clk: Provide managed helper to get and enable bulk clocks") added devm_clk_bulk_get_all_enable() function, but missed to return the number of clocks stored in the clk_bulk_data table referenced by the clks argument. Without knowing the number, it's not possible to iterate these clocks when needed, hence the argument is useless and could have been simply removed. Introduce devm_clk_bulk_get_all_enabled() variant, which is consistent with devm_clk_bulk_get_all() in terms of the returned value: > 0 if one or more clocks have been stored = 0 if there are no clocks < 0 if an error occurred Moreover, the naming is consistent with devm_clk_get_enabled(), i.e. use the past form of 'enable'. To reduce code duplication and improve patch readability, make devm_clk_bulk_get_all_enable() use the new helper, as suggested by Stephen Boyd. Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20241019-clk_bulk_ena_fix-v4-1-57f108f64e70@collabora.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-10-22regulator: core: remove machine init callback from configJerome Brunet
The machine specific regulator_init() appears to be unused. It does not allow a lot of interaction with the regulator framework, since nothing from the framework is passed along (desc, config, etc ...) Machine specific init may also be done with the added init_cb() in the regulator description, so remove regulator_init(). Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://patch.msgid.link/20241008-regulator-ignored-data-v2-3-d1251e0ee507@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-22regulator: core: add callback to perform runtime initJerome Brunet
Provide an initialisation callback to handle runtime parameters. The idea is similar to the regulator_init() callback, but it provides regulator specific structures, instead of just the driver specific data. As an example, this allows the driver to amend the regulator constraints based on runtime parameters if necessary. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://patch.msgid.link/20241008-regulator-ignored-data-v2-2-d1251e0ee507@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-22seccomp: Stub for !HAVE_ARCH_SECCOMP_FILTERLinus Walleij
If we have CONFIG_SECCOMP but not CONFIG_HAVE_ARCH_SECCOMP_FILTER we get a compilation error: ../kernel/entry/common.c: In function 'syscall_trace_enter': ../kernel/entry/common.c:55:23: error: implicit declaration of function '__secure_computing' [-Werror=implicit-function-declaration] 55 | ret = __secure_computing(NULL); | ^~~~~~~~~~~~~~~~~~ This is because generic entry calls __secure_computing() unconditionally. Provide the needed stub similar to how current ARM does this by calling secure_computing_strict() in the absence of secure_computing(). This is similar to what is done for ARM in arch/arm/kernel/ptrace.c. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20241022-seccomp-compile-error-v2-1-c9f08a4f8ebb@linaro.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-10-22Merge branch 'for-6.13/block-atomic' into for-6.13/blockJens Axboe
Merge in block/fs prep patches for the atomic write support. * for-6.13/block-atomic: block: Add bdev atomic write limits helpers fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid() block/fs: Pass an iocb to generic_atomic_write_valid()
2024-10-22block: sed-opal: add ioctl IOC_OPAL_SET_SID_PWGreg Joyce
After a SED drive is provisioned, there is no way to change the SID password via the ioctl() interface. A new ioctl IOC_OPAL_SET_SID_PW will allow the password to be changed. The valid current password is required. Signed-off-by: Greg Joyce <gjoyce@linux.ibm.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20240829175639.6478-2-gjoyce@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22block: enable passthrough command statisticsKeith Busch
Applications using the passthrough interfaces for IO want to continue seeing the disk stats. These requests had been fenced off from this block layer feature. While the block layer doesn't necessarily know what a passthrough command does, we do know the data size and direction, which is enough to account for the command's stats. Since tracking these has the potential to produce unexpected results, the passthrough stats are locked behind a new queue flag that needs to be enabled with the /sys/block/<dev>/queue/iostats_passthrough attribute. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241007153236.2818562-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22block: introduce add_disk_fwnode()Christian Marangi
Introduce add_disk_fwnode() as a replacement of device_add_disk() that permits to pass and attach a fwnode to disk dev. This variant can be useful for eMMC that might have the partition table for the disk defined in DT. A parser can later make use of the attached fwnode to parse the related table and init the hardcoded partition for the disk. device_add_disk() is converted to a simple wrapper of add_disk_fwnode() with the fwnode entry set as NULL. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241002221306.4403-4-ansuelsmth@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22block: remove redundant passthrough check in blk_mq_need_time_stamp()Jens Axboe
Simply checking the rq_flags is enough to determine if accounting is being done for this request. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22virtchnl: fix m68k build.Paolo Abeni
The kernel test robot reported a build failure on m68k in the intel driver due to the recent shapers-related changes. The mentioned arch has funny alignment properties, let's be explicit about the binary layout expectation introducing a padding field. Fixes: 608a5c05c39b ("virtchnl: support queue rate limit and quanta size configuration") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410131710.71Wt6LKO-lkp@intel.com/ Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/e45d1c9f17356d431b03b419f60b8b763d2ff768.1729000481.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22pmdomain: mediatek: Add support for MT6735Yassine Oudjana
Add support for SCPSYS power domains of MT6735. All non-CPU power domains are added except for MD2 (C2K modem), which is left out due to issues with powering it on. Signed-off-by: Yassine Oudjana <y.oudjana@protonmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20241017085136.68053-3-y.oudjana@protonmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-10-22acl: Annotate struct posix_acl with __counted_by()Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member a_entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE. Use struct_size() to calculate the number of bytes to allocate for new and cloned acls and remove the local size variables. Change the posix_acl_alloc() function parameter count from int to unsigned int to match posix_acl's a_count data type. Add identifier names to the function definition to silence two checkpatch warnings. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241018121426.155247-2-thorsten.blum@linux.dev Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-22acl: Realign struct posix_acl to save 8 bytesThorsten Blum
Reduce posix_acl's struct size by 8 bytes by realigning its members. Cc: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241015202158.2376-1-thorsten.blum@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-22epoll: Add synchronous wakeup support for ep_poll_callbackXuewen Yan
Now, the epoll only use wake_up() interface to wake up task. However, sometimes, there are epoll users which want to use the synchronous wakeup flag to hint the scheduler, such as Android binder driver. So add a wake_up_sync() define, and use the wake_up_sync() when the sync is true in ep_poll_callback(). Co-developed-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Link: https://lore.kernel.org/r/20240426080548.8203-1-xuewen.yan@unisoc.com Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Brian Geffon <bgeffon@google.com> Reported-by: Benoit Lize <lizeb@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-22epoll: annotate racy checkChristian Brauner
Epoll relies on a racy fastpath check during __fput() in eventpoll_release() to avoid the hit of pointlessly acquiring a semaphore. Annotate that race by using WRITE_ONCE() and READ_ONCE(). Link: https://lore.kernel.org/r/66edfb3c.050a0220.3195df.001a.GAE@google.com Link: https://lore.kernel.org/r/20240925-fungieren-anbauen-79b334b00542@brauner Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+3b6b32dc50537a49bb4a@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix the guest view of the ID registers, making the relevant fields writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1) - Correcly expose S1PIE to guests, fixing a regression introduced in 6.12-rc1 with the S1POE support - Fix the recycling of stage-2 shadow MMUs by tracking the context (are we allowed to block or not) as well as the recycling state - Address a couple of issues with the vgic when userspace misconfigures the emulation, resulting in various splats. Headaches courtesy of our Syzkaller friends - Stop wasting space in the HYP idmap, as we are dangerously close to the 4kB limit, and this has already exploded in -next - Fix another race in vgic_init() - Fix a UBSAN error when faking the cache topology with MTE enabled RISCV: - RISCV: KVM: use raw_spinlock for critical section in imsic x86: - A bandaid for lack of XCR0 setup in selftests, which causes trouble if the compiler is configured to have x86-64-v3 (with AVX) as the default ISA. Proper XCR0 setup will come in the next merge window. - Fix an issue where KVM would not ignore low bits of the nested CR3 and potentially leak up to 31 bytes out of the guest memory's bounds - Fix case in which an out-of-date cached value for the segments could by returned by KVM_GET_SREGS. - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL - Override MTRR state for KVM confidential guests, making it WB by default as is already the case for Hyper-V guests. Generic: - Remove a couple of unused functions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits) RISCV: KVM: use raw_spinlock for critical section in imsic KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups KVM: selftests: x86: Avoid using SSE/AVX instructions KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range() KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot x86/kvm: Override default caching mode for SEV-SNP and TDX KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic KVM: Remove unused kvm_vcpu_gfn_to_pfn KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS KVM: arm64: Fix shift-out-of-bounds bug KVM: arm64: Shave a few bytes from the EL2 idmap code KVM: arm64: Don't eagerly teardown the vgic on init error KVM: arm64: Expose S1PIE to guests KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule KVM: arm64: nv: Punt stage-2 recycling to a vCPU request KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed ...
2024-10-21dmaengine: idxd: Move DSA/IAA device IDs to IDXD driverFenghua Yu
Since the DSA/IAA device IDs are only used by the IDXD driver, there is no need to define them as public IDs. Move their definitions to the IDXD driver to limit their scope. This change helps reduce unnecessary exposure of the device IDs in the global space, making the codebase cleaner and better encapsulated. There is no functional change. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20241018213725.4167413-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-10-21iomap: turn iomap_want_unshare_iter into an inline functionChristoph Hellwig
iomap_want_unshare_iter currently sits in fs/iomap/buffered-io.c, which depends on CONFIG_BLOCK. It is also in used in fs/dax.c whіch has no such dependency. Given that it is a trivial check turn it into an inline in include/linux/iomap.h to fix the DAX && !BLOCK build. Fixes: 6ef6a0e821d3 ("iomap: share iomap_unshare_iter predicate code with fsdax") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241015041350.118403-1-hch@lst.de Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21uaccess: add copy_struct_to_user helperAleksa Sarai
This is based on copy_struct_from_user(), but there is one additional case to consider when creating a syscall that returns an extensible-struct to userspace -- how should data in the struct that cannot fit into the userspace struct be handled (ksize > usize)? There are three possibilies: 1. The interface is like sched_getattr(2), where new information will be silently not provided to userspace. This is probably what most interfaces will want to do, as it provides the most possible backwards-compatibility. 2. The interface is like lsm_list_modules(2), where you want to return an error like -EMSGSIZE if not providing information could result in the userspace program making a serious mistake (such as one that could lead to a security problem) or if you want to provide some flag to userspace so they know that they are missing some information. 3. The interface is like statx(2), where there some kind of a request mask that indicates what data userspace would like. One could imagine that statx2(2) (using extensible structs) would want to return -EMSGSIZE if the user explicitly requested a field that their structure is too small to fit, but not return an error if the field was not explicitly requested. This is kind of a mix between (1) and (2) based on the requested mask. The copy_struct_to_user() helper includes a an extra argument that is used to return a boolean flag indicating whether there was a non-zero byte in the trailing bytes that were not copied to userspace. This can be used in the following ways to handle all three cases, respectively: 1. Just pass NULL, as you don't care about this case. 2. Return an error (say -EMSGSIZE) if the argument was set to true by copy_struct_to_user(). 3. If the argument was set to true by copy_struct_to_user(), check if there is a flag that implies a field larger than usize. This is the only case where callers of copy_struct_to_user() should check usize themselves. This will probably require scanning an array that specifies what flags were added for each version of the flags struct and returning an error if the request mask matches any of the flags that were added in versions of the struct that are larger than usize. At the moment we don't have any users of (3), so this patch doesn't include any helpers to make the necessary scanning easier, but it should be fairly easy to add some if necessary. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Link: https://lore.kernel.org/r/20241010-extensible-structs-check_fields-v3-1-d2833dfe6edd@cyphar.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel spaceBibo Mao
There are two pages in one TLB entry on LoongArch system. For kernel space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit set, otherwise HW treats it as non-global tlb, there will be potential problems if tlb entry for kernel space is not global. Such as fail to flush kernel tlb with the function local_flush_tlb_kernel_range() which supposed only flush tlb with global bit. Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and kasan areas. For these areas both two consecutive page table entries should be enabled with PAGE_GLOBAL bit. So with function set_pte() and pte_clear(), pte buddy entry is checked and set besides its own pte entry. However it is not atomic operation to set both two pte entries, there is problem with test_vmalloc test case. So function kernel_pte_init() is added to init a pte table when it is created for kernel address space, and the default initial pte value is PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry need update with function set_pte() and pte_clear(), nothing to do with the pte buddy entry. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-10-21soc: mediatek: Add MediaTek DVFS Resource Collector (DVFSRC) driverAngeloGioacchino Del Regno
The Dynamic Voltage and Frequency Scaling Resource Collector (DVFSRC) is a Hardware module used to collect all the requests from both software and the various remote processors embedded into the SoC and decide about a minimum operating voltage and a minimum DRAM frequency to fulfill those requests in an effort to provide the best achievable performance per watt. This hardware IP is capable of transparently performing direct register R/W on all of the DVFSRC-controlled regulators and SoC bandwidth knobs. This driver includes support for MT8183, MT8192 and MT8195. Co-Developed-by: Dawei Chien <dawei.chien@mediatek.com> [Angelo: Partial refactoring and cleanups] Reviewed-by: Georgi Djakov <djakov@kerenl.org> Link: https://lore.kernel.org/r/20240610085735.147134-5-angelogioacchino.delregno@collabora.com Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-10-21fs/super.c: introduce get_tree_bdev_flags()Gao Xiang
As Allison reported [1], currently get_tree_bdev() will store "Can't lookup blockdev" error message. Although it makes sense for pure bdev-based fses, this message may mislead users who try to use EROFS file-backed mounts since get_tree_nodev() is used as a fallback then. Add get_tree_bdev_flags() to specify extensible flags [2] and GET_TREE_BDEV_QUIET_LOOKUP to silence "Can't lookup blockdev" message since it's misleading to EROFS file-backed mounts now. [1] https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com [2] https://lore.kernel.org/r/ZwUkJEtwIpUA4qMz@infradead.org Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241009033151.2334888-1-hsiangkao@linux.alibaba.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-21net: fix races in netdev_tx_sent_queue()/dev_watchdog()Eric Dumazet
Some workloads hit the infamous dev_watchdog() message: "NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out" It seems possible to hit this even for perfectly normal BQL enabled drivers: 1) Assume a TX queue was idle for more than dev->watchdog_timeo (5 seconds unless changed by the driver) 2) Assume a big packet is sent, exceeding current BQL limit. 3) Driver ndo_start_xmit() puts the packet in TX ring, and netdev_tx_sent_queue() is called. 4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue() before txq->trans_start has been written. 5) txq->trans_start is written later, from netdev_start_xmit() if (rc == NETDEV_TX_OK) txq_trans_update(txq) dev_watchdog() running on another cpu could read the old txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5) did not happen yet. To solve the issue, write txq->trans_start right before one XOFF bit is set : - _QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue() - __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue() From dev_watchdog(), we have to read txq->state before txq->trans_start. Add memory barriers to enforce correct ordering. In the future, we could avoid writing over txq->trans_start for normal operations, and rename this field to txq->xoff_start_time. Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21mtd: spinand: Constify struct nand_ecc_engine_opsChristophe JAILLET
'struct nand_ecc_engine_ops' are not modified in these drivers. Constifying this structure moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. Update the prototype of mxic_ecc_get_pipelined_ops() accordingly. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 16709 1374 16 18099 46b3 drivers/mtd/nand/ecc-mxic.o After: ===== text data bss dec hex filename 16789 1294 16 18099 46b3 drivers/mtd/nand/ecc-mxic.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/72597e9de2320a4109be2112e696399592edacd4.1729271136.git.christophe.jaillet@wanadoo.fr
2024-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR (net-6.12-rc4). Conflicts: 107a034d5c1e ("net/mlx5: qos: Store rate groups in a qos domain") 1da9cfd6c41c ("net/mlx5: Unregister notifier on eswitch init failure") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-21Merge 6.12-rc4 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-21Merge 6.12-rc4 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-21Merge 6.12-rc4 into char-misc-nextGreg Kroah-Hartman
We need the iio fixes from 6.12-rc4 in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-21Merge tag 'v6.12-rc4' into sched/core, to resolve conflictIngo Molnar
Overlapping fixes solving the same bug slightly differently: 7266f0a6d3bb fs/bcachefs: Fix __wait_on_freeing_inode() definition of waitqueue entry 3b80552e7057 bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry Use the upstream version. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-10-20Merge tag 'tty-6.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for 6.12-rc4: - qcom-geni serial driver fixes, wow what a mess of a UART chip that thing is... - vt infoleak fix for odd font sizes - imx serial driver bugfix - yet-another n_gsm ldisc bugfix, slowly chipping down the issues in that piece of code All of these have been in linux-next for over a week with no reported issues" * tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: qcom-geni: rename suspend functions serial: qcom-geni: drop unused receive parameter serial: qcom-geni: drop flip buffer WARN() serial: qcom-geni: fix rx cancel dma status bit serial: qcom-geni: fix receiver enable serial: qcom-geni: fix dma rx cancellation serial: qcom-geni: fix shutdown race serial: qcom-geni: revert broken hibernation support serial: qcom-geni: fix polled console initialisation serial: imx: Update mctrl old_status on RTSD interrupt tty: n_gsm: Fix use-after-free in gsm_cleanup_mux vt: prevent kernel-infoleak in con_font_get()
2024-10-20Merge tag 'irq_urgent_for_v6.12_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Fix a case for sifive-plic where an interrupt gets disabled *and* masked and remains masked when it gets reenabled later - Plug a small race in GIC-v4 where userspace can force an affinity change of a virtual CPU (vPE) in its unmapping path - Do not mix the two sets of ocelot irqchip's registers in the mask calculation of the main interrupt sticky register - Other smaller fixlets and cleanups * tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/renesas-rzg2l: Fix missing put_device irqchip/riscv-intc: Fix SMP=n boot with ACPI irqchip/sifive-plic: Unmask interrupt in plic_irq_enable() irqchip/gic-v4: Don't allow a VMOVP on a dying VPE irqchip/sifive-plic: Return error code on failure irqchip/riscv-imsic: Fix output text of base address irqchip/ocelot: Comment sticky register clearing code irqchip/ocelot: Fix trigger register address irqchip: Remove obsolete config ARM_GIC_V3_ITS_PCI
2024-10-20Merge tag 'sched_urgent_for_v6.12_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduling fixes from Borislav Petkov: - Add PREEMPT_RT maintainers - Fix another aspect of delayed dequeued tasks wrt determining their state, i.e., whether they're runnable or blocked - Handle delayed dequeued tasks and their migration wrt PSI properly - Fix the situation where a delayed dequeue task gets enqueued into a new class, which should not happen - Fix a case where memory allocation would happen while the runqueue lock is held, which is a no-no - Do not over-schedule when tasks with shorter slices preempt the currently running task - Make sure delayed to deque entities are properly handled before unthrottling - Other smaller cleanups and improvements * tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Add an entry for PREEMPT_RT. sched/fair: Fix external p->on_rq users sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug sched/core: Dequeue PSI signals for blocked tasks that are delayed sched: Fix delayed_dequeue vs switched_from_fair() sched/core: Disable page allocation in task_tick_mm_cid() sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl() sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running sched: Fix sched_delayed vs cfs_bandwidth
2024-10-20KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomicDr. David Alan Gilbert
The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit 1bbc60d0c7e5 ("KVM: x86/mmu: Remove MMU auditing") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Message-ID: <20241001141354.18009-3-linux@treblig.org> [Adjust Documentation/virt/kvm/locking.rst. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20KVM: Remove unused kvm_vcpu_gfn_to_pfnDr. David Alan Gilbert
The last use of kvm_vcpu_gfn_to_pfn was removed by commit b1624f99aa8f ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Message-ID: <20241001141354.18009-2-linux@treblig.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-19getname_maybe_null() - the third variant of pathname copy-inAl Viro
Semantics used by statx(2) (and later *xattrat(2)): without AT_EMPTY_PATH it's standard getname() (i.e. ERR_PTR(-ENOENT) on empty string, ERR_PTR(-EFAULT) on NULL), with AT_EMPTY_PATH both empty string and NULL are accepted. Calling conventions: getname_maybe_null(user_pointer, flags) returns * pointer to struct filename when non-empty string had been successfully read * ERR_PTR(...) on error * NULL if an empty string or NULL pointer had been given with AT_EMPTY_PATH in the flags argument. It tries to avoid allocation in the last case; it's not always able to do so, in which case the temporary struct filename instance is freed and NULL returned anyway. Fast path is inlined. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>