path: root/include/linux
Age         Commit message                                                  Author
2025-02-25  Merge tag 'vfs-6.14-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs  [Linus Torvalds]
Pull vfs fixes from Christian Brauner:

 - Use __readahead_folio() in fuse again to fix a UAF issue when using splice

 - Remove d_op->d_delete method from pidfs

 - Remove d_op->d_delete method from nsfs

 - Simplify iomap_dio_bio_iter()

 - Fix a UAF in ovl_dentry_update_reval

 - Fix a miscalculated file range for filemap_fdatawrite_range_kick()

 - Don't skip dirty pages in folio_unmap_invalidate()

* tag 'vfs-6.14-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: Minor code simplification in iomap_dio_bio_iter()
  nsfs: remove d_op->d_delete
  pidfs: remove d_op->d_delete
  mm/truncate: don't skip dirty page in folio_unmap_invalidate()
  mm/filemap: fix miscalculated file range for filemap_fdatawrite_range_kick()
  fuse: don't truncate cached, mutated symlink
  ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_up
  fuse: revert back to __readahead_folio() for readahead
2025-02-25  block: make segment size limit workable for > 4K PAGE_SIZE  [Ming Lei]
Using PAGE_SIZE as the minimum expected DMA segment size means that devices
with a max DMA segment size of < 64k cannot probe when used on 64k PAGE_SIZE
systems, such as eMMC and the Exynos UFS controller [0] [1]; you end up with
a probe failure as follows:

WARNING: CPU: 2 PID: 397 at block/blk-settings.c:339 blk_validate_limits+0x364/0x3c0

Ensure we use min(max_seg_size, seg_boundary_mask + 1) as the new minimum
segment size when the max segment size is < PAGE_SIZE on 16k and 64k base
page size systems.

If anyone needs to backport this patch, it depends on the following commits:

commit 6aeb4f836480 ("block: remove bio_add_pc_page")
commit 02ee5d69e3ba ("block: remove blk_rq_bio_prep")
commit b7175e24d6ac ("block: add a dma mapping iterator")

Link: https://lore.kernel.org/linux-block/20230612203314.17820-1-bvanassche@acm.org/ # [0]
Link: https://lore.kernel.org/linux-block/1d55e942-5150-de4c-3a02-c3d066f87028@acm.org/ # [1]
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Keith Busch <kbusch@kernel.org>
Tested-by: Paul Bunyan <pbunyan@redhat.com>
Reviewed-by: Daniel Gomez <da.gomez@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250225022141.2154581-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
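  A minimal sketch of the rule described above (the local variable and the
  surrounding code are assumptions, not the exact blk_validate_limits() logic;
  boundary-mask overflow handling is omitted):

    unsigned int min_seg_size;

    if (lim->max_segment_size < PAGE_SIZE)
            min_seg_size = min(lim->max_segment_size,
                               (unsigned int)(lim->seg_boundary_mask + 1));
    else
            min_seg_size = PAGE_SIZE;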
2025-02-25  EDAC: Add an Error Check Scrub control feature  [Shiju Jose]
Add an Error Check Scrub (ECS) control to manage a memory device's ECS
feature. ECS is a feature defined in the JEDEC DDR5 SDRAM Specification
(JESD79-5) that allows the DRAM to internally read, correct single-bit
errors, and write back corrected data bits to the DRAM array while providing
transparency to error counts.

A DDR5 device contains a number of memory media Field Replaceable Units
(FRU); the DDR5 ECS feature, and thus the ECS control driver, supports
configuring the ECS parameters per FRU.

Memory devices that support the ECS feature register with the EDAC device
driver, which retrieves the ECS descriptor from the EDAC ECS driver. This
driver exposes sysfs ECS control attributes to userspace via
/sys/bus/edac/devices/<dev-name>/ecs_fruX/.

The common sysfs ECS control interface abstracts the control of arbitrary
ECS functionality into a common set of functions. Support for the ECS
feature is added separately because the control attributes of the DDR5 ECS
feature differ from those of the scrub feature.

The sysfs ECS attribute nodes are only present if the client driver has
implemented the corresponding attribute callback function and passed the
necessary operations to the EDAC RAS feature driver during registration.

[ bp: Massage, fixup edac_dev_register() retvals. ]

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20250212143654.1893-4-shiju.jose@huawei.com
2025-02-25  EDAC: Add scrub control feature  [Shiju Jose]
Add a scrub control to manage memory scrubbers in the system. Devices with a
scrub feature register with the EDAC device driver, which retrieves the scrub
descriptor from the scrub driver and exposes the control attributes for an
instance to userspace at /sys/bus/edac/devices/<dev-name>/scrubX/.

The common sysfs scrub control interface abstracts the control of arbitrary
scrubbing functionality into a common set of functions. The attribute nodes
are only present if the client driver has implemented the corresponding
attribute callback function and passed the operations to the device driver
during registration.

[ bp: Massage commit message, docs and code, simplify text a bit.
  Integrate fixup for:
  https://lore.kernel.org/r/202502251009.0sGkolEJ-lkp@intel.com
  Reported-by: kernel test robot <lkp@intel.com>
  Reported-by: Dan Carpenter <dan.carpenter@linaro.org> ]

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20250212143654.1893-3-shiju.jose@huawei.com
2025-02-25  EDAC: Add support for EDAC device features control  [Shiju Jose]
Add generic EDAC device feature controls supporting the registration of RAS features available in the system. The driver exposes control attributes for these features to userspace in /sys/bus/edac/devices/<dev-name>/<ras-feature> [ bp: Touch-up documentation, simplify, make edac_dev_type static, fixup edac_dev_register() retvals. ] Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fan Ni <fan.ni@samsung.com> Tested-by: Daniel Ferguson <danielf@os.amperecomputing.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20250212143654.1893-2-shiju.jose@huawei.com
2025-02-25  sctp: Remove unused payload from sctp_idatahdr  [Thorsten Blum]
Remove the unused payload array from the struct sctp_idatahdr. Cc: Kees Cook <kees@kernel.org> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20250223204505.2499-3-thorsten.blum@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-25  Merge tag 'ib-devres-iio-input-pinctrl-v6.15' into intel/pinctrl  [Andy Shevchenko]
There are a few Intel pin control drivers that are affected by the
devm_kmemdup_array() conversion; merge the ib-devres-iio-input-pinctrl
branch to keep development going smoothly.

 * Split devres APIs to a separate header (linux/device/devres.h)
 * Move IOMEM_ERR_PTR() to err.h to avoid unneeded loops
 * Introduce devm_kmemdup_array()
 * Use devm_kmemdup_array() in input, IIO, and pinctrl subsystems

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2025-02-24  net: phy: add phylib-internal.h  [Heiner Kallweit]
This patch is a starting point for moving phylib-internal declarations to a private header file. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/082eacd2-a888-4716-8797-b3491ce02820@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24  net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode  [Jianbo Liu]
In commit dddb49b63d86 ("net/mlx5e: Add IPsec and ASO syndromes check in HW"),
IPsec and ASO syndrome checks after decryption were added for the specified
ASO object. But they are correct only for the eswitch in legacy mode. For
switchdev mode, metadata register c1 is used to save the mapped id (not the
ASO object id), so the match for the check rules in the status table needs to
be changed accordingly.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250220213959.504304-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-25  Merge tag 'ib-devres-iio-input-pinctrl-v6.15' into psy-next  [Sebastian Reichel]
Merge immutable branch introducing devm_kmemdup_array(), so that it can be used in the sc27xx fuel gauge. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2025-02-24  net: remove '__' from __skb_flow_get_ports()  [Nicolas Dichtel]
Only one version of skb_flow_get_ports() exists after the previous commit, so let's remove the useless '__'. Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://patch.msgid.link/20250221110941.2041629-3-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24  skbuff: kill skb_flow_get_ports()  [Nicolas Dichtel]
Since commit a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff"), this function is not used anymore. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250221110941.2041629-2-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-24  cpumask: drop cpumask_next_wrap_old()  [Yury Norov]
Now that we have cpumask_next_wrap() wired to generic find_next_bit_wrap(), the old implementation is not needed. Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-02-24  cpumask: re-introduce cpumask_next{,_and}_wrap()  [Yury Norov]
cpumask_next_wrap_old() has two additional parameters compared to its generic
counterpart find_next_bit_wrap(). The reason is historical: before
4fe49b3b97c262 ("lib/bitmap: introduce for_each_set_bit_wrap() macro"),
cpumask_next_wrap() was used to implement the for_each_cpu_wrap() iterator.

Now that the iterator is an alias to the generic for_each_set_bit_wrap(), the
additional parameters aren't used and may confuse readers. All existing users
call cpumask_next_wrap() in a way that makes it possible to turn it into a
straight and simple alias of find_next_bit_wrap(). In a couple of places,
kernel users open-code the missing cpumask_next_and_wrap(); add it as well.

CC: Alexander Gordeev <agordeev@linux.ibm.com>
CC: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
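  A hedged sketch of where the series is heading: cpumask_next_wrap() as a
  thin wrapper around find_next_bit_wrap(). The exact in-tree signature and
  the bitmap size argument (nr_cpu_ids below) are assumptions.

    static inline unsigned int cpumask_next_wrap(int n, const struct cpumask *src)
    {
            return find_next_bit_wrap(cpumask_bits(src), nr_cpu_ids, n + 1);
    }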
2025-02-24  cpumask: deprecate cpumask_next_wrap()  [Yury Norov]
The next patch aligns the implementation of cpumask_next_wrap() with
find_next_bit_wrap(), which changes the function signature. To make the
transition smooth, this patch deprecates the current implementation by adding
an _old suffix. The following patches switch current users to the new
implementation one by one.

No functional changes were intended.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-02-24  gpio: mmio: Add flag for calling pinctrl back-end  [Linus Walleij]
It turns out that with this flag we can switch over an entire driver to use gpio-mmio instead of a bunch of custom code, also providing get/set_multiple() to it in the process, so it seems like a reasonable feature to add. The generic pin control backend requires us to call the gpiochip_generic_request(), gpiochip_generic_free(), pinctrl_gpio_direction_output() and pinctrl_gpio_direction_input() callbacks, so if the new flag for a pin control back-end is set, we make sure these functions get called as expected. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250219-vf610-mmio-v3-1-588b64f0b689@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
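  Illustrative wiring only; the flag name below is an assumption, while the
  callbacks are the ones named in the log.

    /* In the gpio-mmio setup path: route request/free through the
     * pinctrl-aware generic helpers when the back-end flag is set. */
    if (flags & BGPIOF_PINCTRL_BACKEND) {
            gc->request = gpiochip_generic_request;
            gc->free = gpiochip_generic_free;
    }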
2025-02-24  binfmt: Remove loader from linux_binprm struct  [Yonatan Goldschmidt]
Commit 987f20a9dcce ("a.out: Remove the a.out implementation") removed the last in-tree user of the loader field, and as far as I can tell, it was the only one historically. Signed-off-by: Yonatan Goldschmidt <yon.goldschmidt@gmail.com> Link: https://lore.kernel.org/r/20250223223234.13764-1-yon.goldschmidt@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-24  regulator: Add (devm_)of_regulator_get()  [Sebastian Reichel]
The Rockchip power-domain controller also plans to make use of per-domain regulators similar to the MediaTek power-domain controller. Since existing DTs are missing the regulator information, the kernel should fallback to the automatically created dummy regulator if necessary. Thus the version without the _optional suffix is needed. The Rockchip driver plans to use the managed version, but to be consistent with existing code the unmanaged version is added at the same time. Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://patch.msgid.link/20250220-rk3588-gpu-pwr-domain-regulator-v6-1-a4f9c24e5b81@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-24  block/bdev: lift block size restrictions to 64k  [Luis Chamberlain]
We can now support block sizes larger than PAGE_SIZE, so in theory we should
be able to lift the restriction up to the max supported page cache order.
However, we bound ourselves to what we can currently validate and test:
through blktests and fstests we can validate up to 64k today.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20250221223823.1680616-8-mcgrof@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-24  fs: Turn page_offset() into a wrapper around folio_pos()  [Matthew Wilcox (Oracle)]
This is far less efficient for the lagging filesystems which still use page_offset(), but it removes an access to page->index. It also fixes a bug -- if any filesystem passed a tail page to page_offset(), it would return garbage which might result in the filesystem choosing to not writeback a dirty page. There probably aren't any examples of this, but I can't be certain. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Link: https://lore.kernel.org/r/20250221203932.3588740-1-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
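  The wrapper most likely ends up along these lines (a sketch, not the
  verbatim patch):

    /* Express page_offset() via the folio so page->index is no longer read. */
    static inline loff_t page_offset(struct page *page)
    {
            return folio_pos(page_folio(page));
    }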
2025-02-24  devres: Introduce devm_kmemdup_array()  [Raag Jadav]
Introduce an '_array' variant of devm_kmemdup() which is more robust and
consistent with the alloc family of helpers.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
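  A minimal sketch of the new helper, assuming it mirrors kmemdup_array() by
  using overflow-checked multiplication:

    static inline void *devm_kmemdup_array(struct device *dev, const void *src,
                                           size_t count, size_t size, gfp_t flags)
    {
            /* size_mul() saturates on overflow instead of silently wrapping. */
            return devm_kmemdup(dev, src, size_mul(size, count), flags);
    }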
2025-02-24  driver core: Split devres APIs to device/devres.h  [Andy Shevchenko]
device.h is a huge header which is hard to follow, and it's easy to miss
something in it. Improve that by splitting the devres APIs out to
device/devres.h. In particular this helps to speed up the build of code that
includes device.h solely for the devres APIs.

While at it, cast the error pointers to __iomem using IOMEM_ERR_PTR() and fix
sparse warnings.

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2025-02-24  err.h: move IOMEM_ERR_PTR() to err.h  [Raag Jadav]
Since the IOMEM_ERR_PTR() macro deals with an error pointer, a better place
for it is err.h. This helps avoid a dependency on io.h for users that don't
need it.

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
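  For reference, the macro being relocated is commonly defined as below; per
  this patch it is now expected to live in linux/err.h:

    #define IOMEM_ERR_PTR(err) (__force void __iomem *)ERR_PTR(err)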
2025-02-24  gpiolib: sanitize the return value of gpio_chip::set_config()  [Bartosz Golaszewski]
The return value of the set_config() callback may be propagated to user-space. If a bad driver returns a positive number, it may confuse user programs. Tighten the API contract and check for positive numbers returned by GPIO controllers. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250210-gpio-sanitize-retvals-v1-3-12ea88506cb2@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-24  gpiolib: sanitize the return value of gpio_chip::request()  [Bartosz Golaszewski]
The return value of the request() callback may be propagated to user-space. If a bad driver returns a positive number, it may confuse user programs. Tighten the API contract and check for positive numbers returned by GPIO controllers. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250210-gpio-sanitize-retvals-v1-2-12ea88506cb2@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
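  The pattern for this and the preceding set_config() patch is roughly the
  following; the error code substituted for a bogus positive return is an
  assumption here.

    /* Never let a driver's positive return value reach user-space. */
    ret = gc->request(gc, offset);          /* same idea for gc->set_config() */
    if (ret > 0)
            ret = -EBADE;                   /* assumed errno; may differ in-tree */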
2025-02-24  Merge tag 'v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into HEAD  [Bartosz Golaszewski]
Linux 6.14-rc4
2025-02-23  cpufreq/amd-pstate: Use scope based cleanup for cpufreq_policy refs  [Dhananjay Ugwekar]
There have been instances in the past where a refcount decrement was missed
when exiting a function. Use automatic scope-based cleanup to avoid such
errors.

Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20250205112523.201101-12-dhananjay.ugwekar@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
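  An illustrative scope-based cleanup using linux/cleanup.h; the DEFINE_FREE()
  name and the function below are hypothetical, not the driver's actual code.

    #include <linux/cleanup.h>
    #include <linux/cpufreq.h>

    DEFINE_FREE(put_policy, struct cpufreq_policy *, if (_T) cpufreq_cpu_put(_T))

    static int example_epp_show(unsigned int cpu)
    {
            /* The reference is dropped automatically on every return path. */
            struct cpufreq_policy *policy __free(put_policy) = cpufreq_cpu_get(cpu);

            if (!policy)
                    return -ENODEV;

            return 0;
    }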
2025-02-23  net/mlx5: Change POOL_NEXT_SIZE define value and make it global  [Patrisious Haddad]
Change the POOL_NEXT_SIZE define value from 0 to BIT(30). This define is used
to request the maximum available flow table size, and zero doesn't make sense
for that, whereas some places in the driver explicitly pass zero expecting
the smallest possible table size, but because of this define they unknowingly
end up allocating the biggest table size instead.

In addition, move the definition to "include/linux/mlx5/fs.h" to expose the
define to the IB driver as well, renaming it appropriately.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-23  net/mlx5: Add new health syndrome error and crr bit offset  [Shahar Shitrit]
Add a new error value for trust lockdown in the health syndrome enum. Also
include the offset of the crr bit in the health buffer layout.

These changes prepare for downstream patches that update health event
handling.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-2-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-22  Merge tag 'sched-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
Pull rseq fixes from Ingo Molnar:

 - Fix overly spread-out RSEQ concurrency ID allocation pattern that
   regressed certain workloads

 - Fix RSEQ registration syscall behavior on -EFAULT errors when
   CONFIG_DEBUG_RSEQ=y (This debug option is disabled on most
   distributions)

* tag 'sched-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq: Fix rseq registration with CONFIG_DEBUG_RSEQ
  sched: Compact RSEQ concurrency IDs with reduced threads and affinity
2025-02-22  iio: imu: adis: Add DIAG_STAT register  [Robert Budai]
Some devices may have more than 16 bits of status. This patch allows the user to specify the size of the DIAG_STAT register. It defaults to 2 if not specified. This is mainly for backward compatibility. Co-developed-by: Ramona Gradinariu <ramona.gradinariu@analog.com> Signed-off-by: Ramona Gradinariu <ramona.gradinariu@analog.com> Co-developed-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Co-developed-by: Nuno Sá <nuno.sa@analog.com> Signed-off-by: Nuno Sá <nuno.sa@analog.com> Signed-off-by: Robert Budai <robert.budai@analog.com> Link: https://patch.msgid.link/20250217105753.605465-4-robert.budai@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-02-22  iio: imu: adis: Add reset to custom ops  [Robert Budai]
This patch allows the custom definition of reset functionality for adis object. It is useful in cases where the driver does not need to sleep after the reset since it is handled by the library. Co-developed-by: Ramona Gradinariu <ramona.gradinariu@analog.com> Signed-off-by: Ramona Gradinariu <ramona.gradinariu@analog.com> Co-developed-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Co-developed-by: Nuno Sá <nuno.sa@analog.com> Signed-off-by: Nuno Sá <nuno.sa@analog.com> Signed-off-by: Robert Budai <robert.budai@analog.com> Link: https://patch.msgid.link/20250217105753.605465-3-robert.budai@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-02-22  iio: imu: adis: Add custom ops struct  [Robert Budai]
This patch introduces a custom ops struct letting users define custom read and write functions. Some adis devices might define a completely different spi protocol from the one used in the default implementation. Co-developed-by: Ramona Gradinariu <ramona.gradinariu@analog.com> Signed-off-by: Ramona Gradinariu <ramona.gradinariu@analog.com> Co-developed-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Co-developed-by: Nuno Sá <nuno.sa@analog.com> Signed-off-by: Nuno Sá <nuno.sa@analog.com> Signed-off-by: Robert Budai <robert.budai@analog.com> Link: https://patch.msgid.link/20250217105753.605465-2-robert.budai@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-02-22  crypto: ahash - Add virtual address support  [Herbert Xu]
This patch adds virtual address support to ahash. Virtual addresses were previously only supported through shash. The user may choose to use virtual addresses with ahash by calling ahash_request_set_virt instead of ahash_request_set_crypt. The API will take care of translating this to an SG list if necessary, unless the algorithm declares that it supports chaining. Therefore in order for an ahash algorithm to support chaining, it must also support virtual addresses directly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
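  A usage sketch based on the description above; the parameter order of
  ahash_request_set_virt() is an assumption.

    /* Hand the hash core a kernel virtual buffer instead of an SG list. */
    ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
    ahash_request_set_virt(req, buf, digest, len);
    err = crypto_ahash_digest(req);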
2025-02-22  crypto: hash - Add request chaining API  [Herbert Xu]
This adds request chaining to the ahash interface. Request chaining allows multiple requests to be submitted in one shot. An algorithm can elect to receive chained requests by setting the flag CRYPTO_ALG_REQ_CHAIN. If this bit is not set, the API will break up chained requests and submit them one-by-one. A new err field is added to struct crypto_async_request to record the return value for each individual request. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-21  usb: Add base USB MCTP definitions  [Jeremy Kerr]
Upcoming changes will add a USB host (and later gadget) driver for the MCTP-over-USB protocol. Add a header that provides common definitions for protocol support: the packet header format and a few framing definitions. Add a define for the MCTP class code, as per https://usb.org/defined-class-codes. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21  Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  [Jakub Kicinski]
Martin KaFai Lau says:

====================
pull-request: bpf-next 2025-02-20

We've added 19 non-merge commits during the last 8 day(s) which contain
a total of 35 files changed, 1126 insertions(+), 53 deletions(-).

The main changes are:

1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing

2) Add network TX timestamping support to BPF sock_ops, from Jason Xing

3) Add TX metadata Launch Time support, from Song Yoong Siang

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  igc: Add launch time support to XDP ZC
  igc: Refactor empty frame insertion for launch time support
  net: stmmac: Add launch time support to XDP ZC
  selftests/bpf: Add launch time request to xdp_hw_metadata
  xsk: Add launch time hardware offload support to XDP Tx metadata
  selftests/bpf: Add simple bpf tests in the tx path for timestamping feature
  bpf: Support selective sampling for bpf timestamping
  bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback
  bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback
  net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING
  bpf: Disable unsafe helpers in TX timestamping callbacks
  bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
  bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping
  bpf: Add networking timestamping support to bpf_get/setsockopt()
  selftests/bpf: Add rto max for bpf_setsockopt test
  bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt
====================

Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21  PCI/ERR: Handle TLP Log in Flit mode  [Ilpo Järvinen]
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is presented through AER and DPC Capability registers. The TLP Prefix Log Register is not present with Flit mode, and the register becomes an extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13). Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the extended TLP Header Log when the Link is in Flit mode. As the Prefix Log and Extended TLP Header are not present at the same time, a C union can be used. Determining whether the error occurred while the Link was in Flit mode is a bit complicated. In case of AER, the Advanced Error Capabilities and Control Register directly tells whether the error was logged in Flit mode or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14), unfortunately, does not contain the same information. Unlike AER, the DPC Capability does not provide a way to discern whether the error was logged in Flit mode (this is confirmed by PCI WG to be an oversight in the spec). DPC will bring the Link down immediately following an error, which makes it impossible to acquire the Flit Mode Status directly from the Link Status 2 register because Flit Mode Status is only set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use the flit_mode value stored into the struct pci_bus. Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
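  A hedged sketch of the struct layout change described above; field names and
  array sizes are assumptions, not the exact in-tree definition.

    struct pcie_tlp_log {
            u32 dw[4];               /* TLP Header Log */
            union {                  /* never both present at the same time */
                    u32 prefix[4];   /* non-Flit mode: TLP Prefix Log */
                    u32 dw_extra[4]; /* Flit mode: extended TLP Header Log */
            };
            unsigned int flit:1;     /* logged while the Link was in Flit mode */
    };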
2025-02-21  PCI: Track Flit Mode Status & print it with link status  [Ilpo Järvinen]
PCIe r6.0 added Flit mode, which mainly alters HW behavior, but there are some OS visible changes. The OS visible changes include differences in the layout of some capabilities and interpretation of the TLP headers (in diagnostics situations). To be able to determine which mode the PCIe Link is using, store the Flit Mode Status (PCIe r6.1 sec 7.5.3.20) information in addition to the Link speed into struct pci_bus in pcie_update_link_speed(). Link: https://lore.kernel.org/r/20250207161836.2755-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: use unsigned int:1 instead of bool, update flit_mode setting] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21  rtnetlink: Pack newlink() params into struct  [Xiao Liang]
There are 4 net namespaces involved when creating links:

 - source netns - where the netlink socket resides,
 - target netns - where to put the device being created,
 - link netns   - netns associated with the device (backend),
 - peer netns   - netns of the peer device.

Currently, two nets are passed to the newlink() callback - the "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as follows,
depending on netlink attributes in the request.

+------------+-------------------+---------+---------+
| peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
+------------+-------------------+---------+---------+
|            | absent            | source  | target  |
| absent     +-------------------+---------+---------+
|            | present           | link    | link    |
+------------+-------------------+---------+---------+
|            | absent            | peer    | target  |
| present    +-------------------+---------+---------+
|            | present           | peer    | link    |
+------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in the link netns
first and then moved to the target netns. This has some side effects,
including extra ifindex allocation, ifname validation and link events. These
could be avoided if we created it in the target netns from the beginning.

On the other hand, the meaning of the src_net parameter is ambiguous. It
varies depending on how parameters are passed. It is the effective link (or
peer netns) by design, but some drivers ignore it and use dev_net instead.

To provide more netns context for drivers, this patch packs the existing
newlink() parameters, along with the source netns, link netns and peer netns,
into a struct. The old "src_net" is renamed to "net" to avoid confusion with
the real source netns, and will be deprecated later. Uses of src_net are
converted to params->net trivially.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
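  A sketch of the packed parameter struct using only the members the log names
  (plus the usual attribute tables); the real layout may differ.

    struct rtnl_newlink_params {
            struct net *net;        /* old "src_net", to be deprecated */
            struct net *src_net;    /* netns of the netlink socket */
            struct net *link_net;   /* from IFLA_LINK_NETNSID, may be NULL */
            struct net *peer_net;   /* netns of the peer device, may be NULL */
            struct nlattr **tb;     /* assumption: attribute tables ride along */
            struct nlattr **data;
    };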
2025-02-21  Merge tag 'block-6.14-20250221' of git://git.kernel.dk/linux  [Linus Torvalds]
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - FC controller state check fixes (Daniel)
     - PCI Endpoint fixes (Damien)
     - TCP connection failure fix (Caleb)
     - TCP handling of C2HTermReq PDU (Maurizio)
     - RDMA queue state check (Ruozhu)
     - Apple controller fixes (Hector)
     - Target crash on disabled namespace (Hannes)

 - MD pull request via Yu:
     - Fix queue limits error handling for raid0, raid1 and raid10

 - Fix for a NULL pointer deref in request data mapping

 - Code cleanup for request merging

* tag 'block-6.14-20250221' of git://git.kernel.dk/linux:
  nvme: only allow entering LIVE from CONNECTING state
  nvme-fc: rely on state transitions to handle connectivity loss
  apple-nvme: Support coprocessors left idle
  apple-nvme: Release power domains when probe fails
  nvmet: Use enum definitions instead of hardcoded values
  nvme: Cleanup the definition of the controller config register fields
  nvme/ioctl: add missing space in err message
  nvme-tcp: fix connect failure on receiving partial ICResp PDU
  nvme: tcp: Fix compilation warning with W=1
  nvmet: pci-epf: Avoid RCU stalls under heavy workload
  nvmet: pci-epf: Do not uselessly write the CSTS register
  nvmet: pci-epf: Correctly initialize CSTS when enabling the controller
  nvmet-rdma: recheck queue state is LIVE in state lock in recv done
  nvmet: Fix crash when a namespace is disabled
  nvme-tcp: add basic support for the C2HTermReq PDU
  nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
  block: fix NULL pointer dereferenced within __blk_rq_map_sg
  block/merge: remove unnecessary min() with UINT_MAX
  md/raid*: Fix the set_queue_limits implementations
2025-02-21  coresight: core: Add provision for panic callbacks  [Linu Cherian]
Panic callback handlers allow coresight device drivers to sync relevant trace
data and trace metadata to reserved memory regions so that they can be
retrieved later in the subsequent boot or in the crashdump kernel.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20250212114918.548431-4-lcherian@marvell.com
2025-02-21  iommu: Turn fault_data to iommufd private pointer  [Nicolin Chen]
A "fault_data" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases, along with the attach_handle. Now, the iommufd version of the sw_msi function will reuse the attach_handle and fault_data for a non-fault case. Rename "fault_data" to "iommufd_hwpt" so as not to confine it to a "fault" case. Move it into a union to be the iommufd private pointer. A following patch will move the iova_cookie to the union for dma-iommu too after the iommufd_sw_msi implementation is added. Since we have two unions now, add some simple comments for readability. Link: https://patch.msgid.link/r/ee5039503f28a16590916e9eef28b917e2d1607a.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21  iommu: Make iommu_dma_prepare_msi() into a generic operation  [Jason Gunthorpe]
SW_MSI allows the IOMMU to translate an MSI message before the MSI message is
delivered to the interrupt controller. On such systems, an iommu_domain must
have a translation for the MSI message for interrupts to work.

The IRQ subsystem will call into the IOMMU to request that a physical page be
set up to receive MSI messages, and the IOMMU then sets up an IOVA that maps
to that physical page. Ultimately the IOVA is programmed into the device via
the msi_msg.

Generalize this by allowing iommu_domain owners to provide implementations of
this mapping. Add a function pointer in struct iommu_domain to allow a domain
owner to provide its own implementation.

Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during
the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by
VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the
iommu_get_msi_cookie() path.

Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain
doesn't change or become freed while running. Races with IRQ operations from
VFIO and domain changes from iommufd are possible here.

Replace the msi_prepare_lock with a lockdep assertion for the group mutex as
documentation. For dma-iommu.c each iommu_domain is unique to a group.

Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21  genirq/msi: Refactor iommu_dma_compose_msi_msg()  [Jason Gunthorpe]
The two-step process to translate the MSI address involves two functions, iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as it had to dereference the opaque cookie pointer. Now, the previous patch changed the cookie pointer into an integer so there is no longer any need for the iommu layer to be involved. Further, the call sites of iommu_dma_compose_msi_msg() all follow the same pattern of setting an MSI message address_hi/lo to non-translated and then immediately calling iommu_dma_compose_msi_msg(). Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly accepts the u64 version of the address and simplifies all the callers. Move the new helper to linux/msi.h since it has nothing to do with iommu. Aside from refactoring, this logically prepares for the next patch, which allows multiple implementation options for iommu_dma_prepare_msi(). So, it does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c as it no longer provides the only iommu_dma_prepare_msi() implementation. Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
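  A simplified, hedged sketch of the new helper; the in-tree version
  presumably also folds in the IOVA stored by iommu_dma_prepare_msi() (see the
  following entry), which is omitted here.

    static inline void msi_msg_set_addr(struct msi_desc *desc, struct msi_msg *msg,
                                        phys_addr_t msi_addr)
    {
            msg->address_hi = upper_32_bits(msi_addr);
            msg->address_lo = lower_32_bits(msi_addr);
    }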
2025-02-21  genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookie  [Jason Gunthorpe]
The IOMMU translation for MSI message addresses has been a 2-step process,
separated in time:

 1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address is
    stored in the MSI descriptor when an MSI interrupt is allocated.

 2) iommu_dma_compose_msi_msg(): This cookie pointer is used to compute a
    translated message address.

This has an inherent lifetime problem for the pointer stored in the cookie
that must remain valid between the two steps. However, there is no locking at
the irq layer that helps protect the lifetime. Today, this works under the
assumption that the iommu domain is not changed while MSI interrupts are
being programmed. This is true for normal DMA API users within the kernel, as
the iommu domain is attached before the driver is probed and cannot be
changed while a driver is attached.

Classic VFIO type1 also prevented changing the iommu domain while VFIO was
running as it does not support changing the "container" after starting up.

However, iommufd has improved this so that the iommu domain can be changed
during VFIO operation. This potentially allows userspace to directly race
VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and
VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()). This
potentially causes both the cookie pointer and the unlocked call to
iommu_get_domain_for_dev() on the MSI translation path to become UAFs.

Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
address is already known during iommu_dma_prepare_msi() and cannot change.
Thus, it can simply be stored as an integer in the MSI descriptor.

The other UAF related to iommu_get_domain_for_dev() will be addressed in
patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by using
the IOMMU group mutex.

Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21  Merge tag 'v6.14-rc3' into x86/mm, to pick up fixes before merging new changes  [Ingo Molnar]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21  perf/core: Move perf_event sysctls into kernel/events  [Joel Granados]
Move the ctl tables to two files:

 - perf_event_{paranoid,mlock_kb,max_sample_rate} and
   perf_cpu_time_max_percent into kernel/events/core.c
 - perf_event_max_{stack,context_per_stack} into kernel/events/callchain.c

Make the variables and functions that are fully contained in core.c and
callchain.c static, and remove them from include/linux/perf_event.h.
Additionally, six_hundred_forty_kb is moved to callchain.c.

Two new sysctl tables are added ({callchain,events_core}_sysctl_table) with
their respective sysctl registration functions.

This is part of a greater effort to move ctl tables into their respective
subsystems, which will reduce the merge conflicts in kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250218-jag-mv_ctltables-v1-5-cd3698ab8d29@kernel.org
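  The general shape of a per-subsystem sysctl registration; the entry shown
  and the initcall are illustrative, not the exact tables added by the patch.

    static struct ctl_table events_core_sysctl_table[] = {
            {
                    .procname       = "perf_event_paranoid",
                    .data           = &sysctl_perf_event_paranoid,
                    .maxlen         = sizeof(sysctl_perf_event_paranoid),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec,
            },
    };

    static int __init init_events_core_sysctls(void)
    {
            register_sysctl_init("kernel", events_core_sysctl_table);
            return 0;
    }
    core_initcall(init_events_core_sysctls);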
2025-02-21  Merge branch 'perf/urgent' into perf/core, to pick up fixes before merging new patches  [Ingo Molnar]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21  mm/filemap: fix miscalculated file range for filemap_fdatawrite_range_kick()  [Jingbo Xu]
iocb->ki_pos has been updated with the number of written bytes since
generic_perform_write(). Besides, __filemap_fdatawrite_range() accepts the
inclusive end of the data range.

Fixes: 1d4457576570 ("mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250218120209.88093-2-jefflexu@linux.alibaba.com
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
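  A hedged sketch of the corrected call implied by the description: ki_pos has
  already been advanced by the written byte count, and the end argument is
  inclusive.

    /* 'count' is the number of bytes just written (context-dependent). */
    filemap_fdatawrite_range_kick(iocb->ki_filp->f_mapping,
                                  iocb->ki_pos - count,
                                  iocb->ki_pos - 1);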