summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-02-13jbd2: Avoid long replay times due to high number or revoke blocksJan Kara
Some users are reporting journal replay takes a long time when there is excessive number of revoke blocks in the journal. Reported times are like: 1048576 records - 95 seconds 2097152 records - 580 seconds The problem is that hash chains in the revoke table gets excessively long in these cases. Fix the problem by sizing the revoke table appropriately before the revoke pass. Thanks to Alexey Zhuravlev <azhuravlev@ddn.com> for benchmarking the patch with large numbers of revoke blocks [1]. [1] https://lore.kernel.org/all/20250113183107.7bfef7b6@x390.bzzz77.ru Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250121140925.17231-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-12net: phylink: provide phylink_mac_implements_lpi()Russell King (Oracle)
Provide a helper to determine whether the MAC operations structure implements the LPI operations, which will be used by both phylink and DSA. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1thR9g-003vX6-4s@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: phy: Add support for driver-specific next update timeOleksij Rempel
Introduce the `phy_get_next_update_time` function to allow PHY drivers to dynamically determine the time (in jiffies) until the next state update event. This enables more flexible and adaptive polling intervals based on the link state or other conditions. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250210082358.200751-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12spi: fix missing offload_flags docDavid Lechner
Add offload_flags to the documentation comment for struct spi_transfer. This was missed when adding the field. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20250212154356.784944ea@canb.auug.org.au/ Fixes: 700a281905f2 ("spi: add offload TX/RX streaming APIs") Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250212-spi-offload-fixes-v1-1-e192c69e3bb3@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-12gpiolib: Simplify implementation of for_each_hwgpio_in_range()Andy Shevchenko
The whole purpose of the custom CLASS() is to have possibility to initialise the counter variable _i to 0. This can't be done with simple __free() macro as it will be not allowed by C language. OTOH, the CLASS() operates with the pointers and explicit usage of the scoped variable _data is not needed, since the pointers are kept the same over the iterations. Simplify the implementation of for_each_hwgpio_in_range(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250207151149.2119765-3-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-12gpiolib: Deduplicate some code in for_each_requested_gpio_in_range()Andy Shevchenko
Refactor for_each_requested_gpio_in_range() to deduplicate some code which is basically repeats the for_each_hwgpio(). In order to achieve this, split the latter to two, for_each_hwgpio_in_range() and for_each_hwgpio(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250207151149.2119765-2-andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-12Merge patch series "fs: allow changing idmappings"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Currently, it isn't possible to change the idmapping of an idmapped mount. This is becoming an obstacle for various use-cases. /* idmapped home directories with systemd-homed */ On newer systems /home is can be an idmapped mount such that each file on disk is owned by 65536 and a subfolder exists for foreign id ranges such as containers. For example, a home directory might look like this (using an arbitrary folder as an example): user1@localhost:~/data/mount-idmapped$ ls -al /data/ total 16 drwxrwxrwx 1 65536 65536 36 Jan 27 12:15 . drwxrwxr-x 1 root root 184 Jan 27 12:06 .. -rw-r--r-- 1 65536 65536 0 Jan 27 12:07 aaa -rw-r--r-- 1 65536 65536 0 Jan 27 12:07 bbb -rw-r--r-- 1 65536 65536 0 Jan 27 12:07 cc drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers When logging in home is mounted as an idmapped mount with the following idmappings: 65536:$(id -u):1 // uid mapping 65536:$(id -g):1 // gid mapping 2147352576:2147352576:65536 // uid mapping 2147352576:2147352576:65536 // gid mapping So for a user with uid/gid 1000 an idmapped /home would like like this: user1@localhost:~/data/mount-idmapped$ ls -aln /mnt/ total 16 drwxrwxrwx 1 1000 1000 36 Jan 27 12:15 . drwxrwxr-x 1 0 0 184 Jan 27 12:06 .. -rw-r--r-- 1 1000 1000 0 Jan 27 12:07 aaa -rw-r--r-- 1 1000 1000 0 Jan 27 12:07 bbb -rw-r--r-- 1 1000 1000 0 Jan 27 12:07 cc drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers In other words, 65536 is mapped to the user's uid/gid and the range 2147352576 up to 2147352576 + 65536 is an identity mapping for containers. When a container is started a transient uid/gid range is allocated outside of both mappings of the idmapped mount. For example, the container might get the idmapping: $ cat /proc/1742611/uid_map 0 537985024 65536 This container will be allowed to write to disk within the allocated foreign id range 2147352576 to 2147352576 + 65536. To do this an idmapped mount must be created from an already idmapped mount such that: - The mappings for the user's uid/gid must be dropped, i.e., the following mappings are removed: 65536:$(id -u):1 // uid mapping 65536:$(id -g):1 // gid mapping - A mapping for the transient uid/gid range to the foreign uid/gid range is added: 2147352576:537985024:65536 In combination this will mean that the container will write to disk within the foreign id range 2147352576 to 2147352576 + 65536. /* nested containers */ When the outer container makes use of idmapped mounts it isn't posssible to create an idmapped mount for the inner container with a differen idmapping from the outer container's idmapped mount. There are other usecases and the two above just serve as an illustration of the problem. This patchset makes it possible to create a new idmapped mount from an already idmapped mount. It aims to adhere to current performance constraints and requirements: - Idmapped mounts aim to have near zero performance implications for path lookup. That is why no refernce counting, locking or any other mechanism can be required that would impact performance. This works be ensuring that a regular mount transitions to an idmapped mount once going from a static nop_mnt_idmap mapping to a non-static idmapping. - The idmapping of a mount change anymore for the lifetime of the mount afterwards. This not just avoids UAF issues it also avoids pitfalls such as generating non-matching uid/gid values. Changing idmappings could be solved by: - Idmappings could simply be reference counted (above the simple reference count when sharing them across multiple mounts). This would require pairing mnt_idmap_get() with mnt_idmap_put() which would end up being sprinkled everywhere into the VFS and some filesystems that access idmappings directly. It wouldn't just be quite ugly and introduce new complexity it would have a noticeable performance impact. - Idmappings could gain RCU protection. This would help the LOOKUP_RCU case and avoids taking reference counts under RCU. When not under LOOKUP_RCU reference counts need to be acquired on each idmapping. This would require pairing mnt_idmap_get() with mnt_idmap_put() which would end up being sprinkled everywhere into the VFS and some filesystems that access idmappings directly. This would have the same downsides as mentioned earlier. - The earlier solutions work by updating the mnt->mnt_idmap pointer with the new idmapping. Instead of this it would be possible to change the idmapping itself to avoid UAF issues. To do this a sequence counter would have to be added to struct mount. When retrieving the idmapping to generate uid/gid values the sequence counter would need to be sampled and the generation of the uid/gid would spin until the update of the idmap is finished. This has problems as well but the biggest issue will be that this can lead to inconsistent permission checking and inconsistent uid/gid pairs even more than this is already possible today. Specifically, during creation it could happen that: idmap = mnt_idmap(mnt); inode_permission(idmap, ...); may_create(idmap); // create file with uid/gid based on @idmap in between the permission checking and the generation of the uid/gid value the idmapping could change leading to the permission checking and uid/gid value that is actually used to create a file on disk being out of sync. Similarly if two values are generated like: idmap = mnt_idmap(mnt) vfsgid = make_vfsgid(idmap); // idmapping gets update concurrently vfsuid = make_vfsuid(idmap); @vfsgid and @vfsuid could be out of sync if the idmapping was changed in between. The generation of vfsgid/vfsuid could span a lot of codelines so to guard against this a sequence count would have to be passed around. The performance impact of this solutio are less clear but very likely not zero. - Using SRCU similar to fanotify that can sleep. I find that not just ugly but it would have memory consumption implications and is overall pretty ugly. /* solution */ So, to avoid all of these pitfalls creating an idmapped mount from an already idmapped mount will be done atomically, i.e., a new detached mount is created and a new set of mount properties applied to it without it ever having been exposed to userspace at all. This can be done in two ways. A new flag to open_tree() is added OPEN_TREE_CLEAR_IDMAP that clears the old idmapping and returns a mount that isn't idmapped. And then it is possible to set mount attributes on it again including creation of an idmapped mount. This has the consequence that a file descriptor must exist in userspace that doesn't have any idmapping applied and it will thus never work in unpriviledged scenarios. As a container would be able to remove the idmapping of the mount it has been given. That should be avoided. Instead, we add open_tree_attr() which works just like open_tree() but takes an optional struct mount_attr parameter. This is useful beyond idmappings as it fills a gap where a mount never exists in userspace without the necessary mount properties applied. This is particularly useful for mount options such as MOUNT_ATTR_{RDONLY,NOSUID,NODEV,NOEXEC}. To create a new idmapped mount the following works: // Create a first idmapped mount struct mount_attr attr = { .attr_set = MOUNT_ATTR_IDMAP .userns_fd = fd_userns }; fd_tree = open_tree(-EBADF, "/", OPEN_TREE_CLONE, &attr, sizeof(attr)); move_mount(fd_tree, "", -EBADF, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); // Create a second idmapped mount from the first idmapped mount attr.attr_set = MOUNT_ATTR_IDMAP; attr.userns_fd = fd_userns2; fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr)); // Create a second non-idmapped mount from the first idmapped mount: memset(&attr, 0, sizeof(attr)); attr.attr_clr = MOUNT_ATTR_IDMAP; fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr)); * patches from https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org: fs: allow changing idmappings fs: add kflags member to struct mount_kattr fs: add open_tree_attr() fs: add copy_mount_setattr() helper fs: add vfs_open_tree() helper Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12fs: add open_tree_attr()Christian Brauner
Add open_tree_attr() which allow to atomically create a detached mount tree and set mount options on it. If OPEN_TREE_CLONE is used this will allow the creation of a detached mount with a new set of mount options without it ever being exposed to userspace without that set of mount options applied. Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-3-c25feb0d2eb3@kernel.org Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12statmount: allow to retrieve idmappingsChristian Brauner
This adds the STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP options. It allows the retrieval of idmappings via statmount(). Currently it isn't possible to figure out what idmappings are applied to an idmapped mount. This information is often crucial. Before statmount() the only realistic options for an interface like this would have been to add it to /proc/<pid>/fdinfo/<nr> or to expose it in /proc/<pid>/mountinfo. Both solution would have been pretty ugly and would've shown information that is of strong interest to some application but not all. statmount() is perfect for this. The idmappings applied to an idmapped mount are shown relative to the caller's user namespace. This is the most useful solution that doesn't risk leaking information or confuse the caller. For example, an idmapped mount might have been created with the following idmappings: mount --bind -o X-mount.idmap="0:10000:1000 2000:2000:1 3000:3000:1" /srv /opt Listing the idmappings through statmount() in the same context shows: mnt_id: 2147485088 mnt_parent_id: 2147484816 fs_type: btrfs mnt_root: /srv mnt_point: /opt mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/ mnt_uidmap[0]: 0 10000 1000 mnt_uidmap[1]: 2000 2000 1 mnt_uidmap[2]: 3000 3000 1 mnt_gidmap[0]: 0 10000 1000 mnt_gidmap[1]: 2000 2000 1 mnt_gidmap[2]: 3000 3000 1 But the idmappings might not always be resolvable in the caller's user namespace. For example: unshare --user --map-root In this case statmount() will skip any mappings that fil to resolve in the caller's idmapping: mnt_id: 2147485087 mnt_parent_id: 2147484016 fs_type: btrfs mnt_root: /srv mnt_point: /opt mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/ The caller can differentiate between a mount not being idmapped and a mount that is idmapped but where all mappings fail to resolve in the caller's idmapping by check for the STATMOUNT_MNT_{G,U}IDMAP flag being raised but the number of mappings in ->mnt_{g,u}idmap_num being zero. Note that statmount() requires that the whole range must be resolvable in the caller's user namespace. If a subrange fails to map it will still list the map as not resolvable. This is a practical compromise to avoid having to find which subranges are resovable and wich aren't. Idmappings are listed as a string array with each mapping separated by zero bytes. This allows to retrieve the idmappings and immediately use them for writing to e.g., /proc/<pid>/{g,u}id_map and it also allow for simple iteration like: if (stmnt->mask & STATMOUNT_MNT_UIDMAP) { const char *idmap = stmnt->str + stmnt->mnt_uidmap; for (size_t idx = 0; idx < stmnt->mnt_uidmap_nr; idx++) { printf("mnt_uidmap[%lu]: %s\n", idx, idmap); idmap += strlen(idmap) + 1; } } Link: https://lore.kernel.org/r/20250204-work-mnt_idmap-statmount-v2-2-007720f39f2e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12uidgid: add map_id_range_up()Christian Brauner
Add map_id_range_up() to verify that the full kernel id range can be mapped up in a given idmapping. This will be used in follow-up patches. Link: https://lore.kernel.org/r/20250204-work-mnt_idmap-statmount-v2-1-007720f39f2e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12gpiolib: add gpiod_multi_set_value_cansleep()David Lechner
Add a new gpiod_multi_set_value_cansleep() helper function with fewer parameters than gpiod_set_array_value_cansleep(). Calling gpiod_set_array_value_cansleep() can get quite verbose. In many cases, the first arguments all come from the same struct gpio_descs, so having a separate function where we can just pass that cuts down on the boilerplate. Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20250210-gpio-set-array-helper-v3-1-d6a673674da8@baylibre.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-11net: phy: remove unused PHY_INIT_TIMEOUT and PHY_FORCE_TIMEOUTHeiner Kallweit
Both definitions are unused. Last users have been removed with: f3ba9d490d6e ("net: s6gmac: remove driver") 2bd229df5e2e ("net: phy: remove state PHY_FORCING") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/f8e7b8ed-a665-41ad-b0ce-cbfdb65262ef@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: phy: rename phy_set_eee_broken to phy_disable_eee_modeHeiner Kallweit
Consider that an EEE mode may not be broken but simply not supported by the MAC, and rename function phy_set_eee_broken(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/30deb630-3f6b-4ffb-a1e6-a9736021f43a@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11net: phy: rename eee_broken_modes to eee_disabled_modesHeiner Kallweit
This bitmap is used also if the MAC doesn't support an EEE mode. So the mode isn't necessarily broken in the PHY. Therefore rename the bitmap. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/6cd11422-dd67-4c87-a642-308de694af92@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11block: introduce init_wait_func()Muchun Song
There is already a macro DEFINE_WAIT_FUNC() to declare a wait_queue_entry with a specified waking function. But there is not a counterpart for initializing one wait_queue_entry with a specified waking function. So introducing init_wait_func() for this, which also could be used in iocost and rq-qos. Using default_wake_function() in rq_qos_wait() to wake up waiters, which could remove ->task field from rq_qos_wait_data. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20250208090416.38642-1-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-11pwm: lpss: Clarify the bypass member semantics in struct pwm_lpss_boardinfoAndy Shevchenko
Instead of an odd comment, cite the documentation, which says more clearly what's going on with the programming flow on some of the Intel SoCs. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2025-02-11spi: offload: types: include linux/bits.hDavid Lechner
Add #include <linux/bits.h> to linux/spi/offload/types.h since this file uses the BIT macro. Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20250210-spi-offload-extra-headers-v1-1-0f3356362254@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-02-11mtd: rawnand: qcom: finish converting register to FIELD_PREPChristian Marangi
With some research in some obscure old QSDK, it was possible to find the MASK of the last register there were still set with raw shift and convert them to FIELD_PREP API. This is only a cleanup and modernize the code a bit and doesn't make any behaviour change. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-02-11wifi: ieee80211: Add missing EHT MAC capabilitiesIlan Peer
Add missing EHT MAC capabilities definitions. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250205110958.6c1643c345a1.I7405b9c35cb39ae97a52c3fbcc36b0bd81e495dc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-11wifi: mac80211: Add support for EPCS configurationIlan Peer
Add support for configuring EPCS state: - When EPCS is enabled, send an EPCS enable request action frame to the AP. When the AP replies with EPCS enable response, enable EPCS by applying the QoS parameters provided by the AP. Do so for all the valid MLD links. Once EPCS is enabled, support processing of unsolicited EPCS enable response frames. - When EPCS is disabled, send an EPCS teardown request to the AP and apply the QoS parameters as obtained from the last received beacons. Do so for all the valid links. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250205110958.7a90afd7e140.I3f602d65f5c1fd849d6c70b12307dda33aa91ccb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-10r8152: add vendor/device ID pair for Dell Alienware AW1022zAleksander Jan Bajkowski
The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10unroll: add generic loop unroll helpersAlexander Lobakin
There are cases when we need to explicitly unroll loops. For example, cache operations, filling DMA descriptors on very high speeds etc. Add compiler-specific attribute macros to give the compiler a hint that we'd like to unroll a loop. Example usage: #define UNROLL_BATCH 8 unrolled_count(UNROLL_BATCH) for (u32 i = 0; i < UNROLL_BATCH; i++) op(priv, i); Note that sometimes the compilers won't unroll loops if they think this would have worse optimization and perf than without unrolling, and that unroll attributes are available only starting GCC 8. For older compiler versions, no hints/attributes will be applied. For better unrolling/parallelization, don't have any variables that interfere between iterations except for the iterator itself. Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250206182630.3914318-2-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-10crash: Remove KEXEC_CORE_NOTE_NAMEAkihiko Odaki
KEXEC_CORE_NOTE_NAME is no longer used. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20250115-elf-v5-6-0f9e55bbb2fc@daynix.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-10crash: Use note name macrosAkihiko Odaki
Use note name macros to match with the userspace's expectation. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20250115-elf-v5-4-0f9e55bbb2fc@daynix.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-10hrtimers: Make hrtimer_update_function() less expensiveThomas Gleixner
The sanity checks in hrtimer_update_function() are expensive for high frequency usage like in the io/uring code due to locking. Hide the sanity checks behind CONFIG_PROVE_LOCKING, which has a decent chance to be enabled on a regular basis for testing. Fixes: 8f02e3563bb5 ("hrtimers: Introduce hrtimer_update_function()") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/87ikpllali.ffs@tglx
2025-02-10iio: buffer-dmaengine: add devm_iio_dmaengine_buffer_setup_with_handle()David Lechner
Add a new devm_iio_dmaengine_buffer_setup_with_handle() function to handle cases where the DMA channel is managed by the caller rather than being requested and released by the iio_dmaengine module. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250207-dlech-mainline-spi-engine-offload-2-v8-9-e48a489be48c@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-02-10iio: buffer-dmaengine: split requesting DMA channel from allocating bufferDavid Lechner
Refactor the IIO dmaengine buffer code to split requesting the DMA channel from allocating the buffer. We want to be able to add a new function where the IIO device driver manages the DMA channel, so these two actions need to be separate. To do this, calling dma_request_chan() is moved from iio_dmaengine_buffer_alloc() to iio_dmaengine_buffer_setup_ext(). A new __iio_dmaengine_buffer_setup_ext() helper function is added to simplify error unwinding and will also be used by a new function in a later patch. iio_dmaengine_buffer_free() now only frees the buffer and does not release the DMA channel. A new iio_dmaengine_buffer_teardown() function is added to unwind everything done in iio_dmaengine_buffer_setup_ext(). This keeps things more symmetrical with obvious pairs alloc/free and setup/teardown. Calling dma_get_slave_caps() in iio_dmaengine_buffer_alloc() is moved so that we can avoid any gotos for error unwinding. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250207-dlech-mainline-spi-engine-offload-2-v8-8-e48a489be48c@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-02-10Merge tag 'spi-offload' into togregJonathan Cameron
spi: Add offload APIs This series adds support for offloading complete SPI transactions, including the initiation, to the hardware.
2025-02-10seccomp: remove the 'sd' argument from __secure_computing()Oleg Nesterov
After the previous changes 'sd' is always NULL. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250128150313.GA15336@redhat.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-10seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTEROleg Nesterov
Depending on CONFIG_HAVE_ARCH_SECCOMP_FILTER, __secure_computing(NULL) will crash or not. This is not consistent/safe, especially considering that after the previous change __secure_computing(sd) is always called with sd == NULL. Fortunately, if CONFIG_HAVE_ARCH_SECCOMP_FILTER=n, __secure_computing() has no callers, these architectures use secure_computing_strict(). Yet it make sense make __secure_computing(NULL) safe in this case. Note also that with this change we can unexport secure_computing_strict() and change the current callers to use __secure_computing(NULL). Fixes: 8cf8dfceebda ("seccomp: Stub for !HAVE_ARCH_SECCOMP_FILTER") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250128150307.GA15325@redhat.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-10blk-crypto: add ioctls to create and prepare hardware-wrapped keysEric Biggers
Until this point, the kernel can use hardware-wrapped keys to do encryption if userspace provides one -- specifically a key in ephemerally-wrapped form. However, no generic way has been provided for userspace to get such a key in the first place. Getting such a key is a two-step process. First, the key needs to be imported from a raw key or generated by the hardware, producing a key in long-term wrapped form. This happens once in the whole lifetime of the key. Second, the long-term wrapped key needs to be converted into ephemerally-wrapped form. This happens each time the key is "unlocked". In Android, these operations are supported in a generic way through KeyMint, a userspace abstraction layer. However, that method is Android-specific and can't be used on other Linux systems, may rely on proprietary libraries, and also misleads people into supporting KeyMint features like rollback resistance that make sense for other KeyMint keys but don't make sense for hardware-wrapped inline encryption keys. Therefore, this patch provides a generic kernel interface for these operations by introducing new block device ioctls: - BLKCRYPTOIMPORTKEY: convert a raw key to long-term wrapped form. - BLKCRYPTOGENERATEKEY: have the hardware generate a new key, then return it in long-term wrapped form. - BLKCRYPTOPREPAREKEY: convert a key from long-term wrapped form to ephemerally-wrapped form. These ioctls are implemented using new operations in blk_crypto_ll_ops. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650 Link: https://lore.kernel.org/r/20250204060041.409950-4-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-10blk-crypto: add basic hardware-wrapped key supportEric Biggers
To prevent keys from being compromised if an attacker acquires read access to kernel memory, some inline encryption hardware can accept keys which are wrapped by a per-boot hardware-internal key. This avoids needing to keep the raw keys in kernel memory, without limiting the number of keys that can be used. Such hardware also supports deriving a "software secret" for cryptographic tasks that can't be handled by inline encryption; this is needed for fscrypt to work properly. To support this hardware, allow struct blk_crypto_key to represent a hardware-wrapped key as an alternative to a raw key, and make drivers set flags in struct blk_crypto_profile to indicate which types of keys they support. Also add the ->derive_sw_secret() low-level operation, which drivers supporting wrapped keys must implement. For more information, see the detailed documentation which this patch adds to Documentation/block/inline-encryption.rst. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650 Link: https://lore.kernel.org/r/20250204060041.409950-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-10mtd: spinand: otp: add helpers functionsMartin Kurbanov
The global functions spinand_otp_read() and spinand_otp_write() have been introduced. Since most SPI-NAND flashes read/write OTP in the same way, let's define global functions to avoid code duplication. Signed-off-by: Martin Kurbanov <mmkurbanov@salutedevices.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-02-10mtd: spinand: make spinand_{wait,otp_page_size} globalMartin Kurbanov
Change the functions spinand_wait() and spinand_otp_page_size() from static to global so that SPI NAND flash drivers don't duplicate it. Signed-off-by: Martin Kurbanov <mmkurbanov@salutedevices.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-02-10mtd: spinand: add OTP supportMartin Kurbanov
The MTD subsystem already supports accessing two OTP areas: user and factory. User areas can be written by the user. This patch provides the SPINAND_FACT_OTP_INFO and SPINAND_USER_OTP_INFO macros to add parameters to spinand_info. To implement OTP operations, the client (flash driver) is provided with callbacks for user area: .read(), .write(), .info(), .lock(), .erase(); and for factory area: .read(), .info(); Signed-off-by: Martin Kurbanov <mmkurbanov@salutedevices.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-02-10mtd: spinand: make spinand_{read,write}_page globalMartin Kurbanov
Change these functions from static to global so that to use them later in OTP operations. Since reading OTP pages is no different from reading pages from the main area. Signed-off-by: Martin Kurbanov <mmkurbanov@salutedevices.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-02-10lsm: fix a missing security_uring_allowed() prototypePaul Moore
The !CONFIG_SECURITY dummy function was declared as an "extern int" instead of "static inline" (likely a copy-n-paste error), which was was causing the compiler to complain about a missing prototype. This patch converts the dummy definition over to a "static inline" to resolve the compiler problems. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/oe-kbuild-all/202502081912.TeokpAAe-lkp@intel.com/ Fixes: c6ad9fdbd44b ("io_uring,lsm,selinux: add LSM hooks for io_uring_setup()") Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-10jbd2: remove unused transaction->t_private_listKemeng Shi
After we remove ext4 journal callback, transaction->t_private_list is not used anymore. Just remove unused transaction->t_private_list. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241218145414.1422946-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-02-10iomap: support incremental iomap_iter advancesBrian Foster
The current iomap_iter iteration model reads the mapping from the filesystem, processes the subrange of the operation associated with the current mapping, and returns the number of bytes processed back to the iteration code. The latter advances the position and remaining length of the iter in preparation for the next iteration. At the _iter() handler level, this tends to produce a processing loop where the local code pulls the current position and remaining length out of the iter, iterates it locally based on file offset, and then breaks out when the associated range has been fully processed. This works well enough for current handlers, but upcoming enhancements require a bit more flexibility in certain situations. Enhancements for zero range will lead to a situation where the processing loop is no longer a pure ascending offset walk, but rather dictated by pagecache state and folio lookup. Since folio lookup and write preparation occur at different levels, it is more difficult to manage position and length outside of the iter. To provide more flexibility to certain iomap operations, introduce support for incremental iomap_iter advances from within the operation itself. This allows more granular advances for operations that might not use the typical file offset based walk. Note that the semantics for operations that use incremental advances is slightly different than traditional operations. Operations that advance the iter directly are expected to return success or failure (i.e. 0 or negative error code) in iter.processed rather than the number of bytes processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-8-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10iomap: export iomap_iter_advance() and return remaining lengthBrian Foster
As a final step for generic iter advance, export the helper and update it to return the remaining length of the current iteration after the advance. This will usually be 0 in the iomap_iter() case, but will be useful for the various operations that iterate on their own and will be updated to advance as they progress. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-7-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10iomap: factor out iomap length helperBrian Foster
In preparation to support more granular iomap iter advancing, factor the pos/len values as parameters to length calculation. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-2-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10dmaengine: Use dma_request_channel() instead of __dma_request_channel()Andy Shevchenko
Reduce use of internal __dma_request_channel() function in public dmaengine.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250205145757.889247-3-andriy.shevchenko@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-02-10dmaengine: Replace dma_request_slave_channel() by dma_request_chan()Andy Shevchenko
Replace dma_request_slave_channel() by dma_request_chan() as suggested since the former is deprecated. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250205145757.889247-2-andriy.shevchenko@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-02-10VFS: repack LOOKUP_ bit flags.NeilBrown
The LOOKUP_ bits are not in order, which can make it awkward when adding new bits. Two bits have recently been added to the end which makes them look like "scoping flags", but in fact they aren't. Also LOOKUP_PARENT is described as "internal use only" but is used in fs/nfs/ This patch: - Moves these three flags into the "pathwalk mode" section - changes all bits to use the BIT(n) macro - Allocates bits in order leaving gaps between the sections, and documents those gaps. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250206054504.2950516-8-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10VFS: repack DENTRY_ flags.NeilBrown
Bits 13, 23, 24, and 27 are not used. Move all those holes to the end. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250206054504.2950516-7-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10PCI: pci_ids: add INTEL_HDA_PTL_HPierre-Louis Bossart
Add Intel PTL-H audio Device ID. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250210081730.22916-2-peter.ujfalusi@linux.intel.com
2025-02-09Merge tag 'kbuild-fixes-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Suppress false-positive -Wformat-{overflow,truncation}-non-kprintf warnings regardless of the W= option - Avoid CONFIG_TRIM_UNUSED_KSYMS dropping symbols passed to symbol_get() - Fix a build regression of the Debian linux-headers package * tag 'kbuild-fixes-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: install-extmod-build: add missing quotation marks for CC variable kbuild: fix misspelling in scripts/Makefile.lib kbuild: keep symbols for symbol_get() even with CONFIG_TRIM_UNUSED_KSYMS scripts/Makefile.extrawarn: Do not show clang's non-kprintf warnings at W=1
2025-02-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Correctly clean the BSS to the PoC before allowing EL2 to access it on nVHE/hVHE/protected configurations - Propagate ownership of debug registers in protected mode after the rework that landed in 6.14-rc1 - Stop pretending that we can run the protected mode without a GICv3 being present on the host - Fix a use-after-free situation that can occur if a vcpu fails to initialise the NV shadow S2 MMU contexts - Always evaluate the need to arm a background timer for fully emulated guest timers - Fix the emulation of EL1 timers in the absence of FEAT_ECV - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0 s390: - move some of the guest page table (gmap) logic into KVM itself, inching towards the final goal of completely removing gmap from the non-kvm memory management code. As an initial set of cleanups, move some code from mm/gmap into kvm and start using __kvm_faultin_pfn() to fault-in pages as needed; but especially stop abusing page->index and page->lru to aid in the pgdesc conversion. x86: - Add missing check in the fix to defer starting the huge page recovery vhost_task - SRSO_USER_KERNEL_NO does not need SYNTHESIZED_F" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits) KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking KVM: remove kvm_arch_post_init_vm KVM: selftests: Fix spelling mistake "initally" -> "initially" kvm: x86: SRSO_USER_KERNEL_NO is not synthesized KVM: arm64: timer: Don't adjust the EL2 virtual timer offset KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV KVM: arm64: timer: Always evaluate the need for a soft timer KVM: arm64: Fix nested S2 MMU structures reallocation KVM: arm64: Fail protected mode init if no vgic hardware is present KVM: arm64: Flush/sync debug state in protected mode KVM: s390: selftests: Streamline uc_skey test to issue iske after sske KVM: s390: remove the last user of page->index KVM: s390: move PGSTE softbits KVM: s390: remove useless page->index usage KVM: s390: move gmap_shadow_pgt_lookup() into kvm KVM: s390: stop using lists to keep track of used dat tables KVM: s390: stop using page->index for non-shadow gmaps KVM: s390: move some gmap shadowing functions away from mm/gmap.c KVM: s390: get rid of gmap_translate() KVM: s390: get rid of gmap_fault() ...
2025-02-09lib/crc-t10dif: remove crc_t10dif_is_optimized()Eric Biggers
With the "crct10dif" algorithm having been removed from the crypto API, crc_t10dif_is_optimized() is no longer used. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208175647.12333-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-09crypto: ecdsa - Harden against integer overflows in DIV_ROUND_UP()Lukas Wunner
Herbert notes that DIV_ROUND_UP() may overflow unnecessarily if an ecdsa implementation's ->key_size() callback returns an unusually large value. Herbert instead suggests (for a division by 8): X / 8 + !!(X & 7) Based on this formula, introduce a generic DIV_ROUND_UP_POW2() macro and use it in lieu of DIV_ROUND_UP() for ->key_size() return values. Additionally, use the macro in ecc_digits_from_bytes(), whose "nbytes" parameter is a ->key_size() return value in some instances, or a user-specified ASN.1 length in the case of ecdsa_get_signature_rs(). Link: https://lore.kernel.org/r/Z3iElsILmoSu6FuC@gondor.apana.org.au/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>