linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-08-12	netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags	David Howells
	The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used correctly. The problem is that we try to set them up in the request initialisation, but we the cache may be in the process of setting up still, and so the state may not be correct. Further, we secondarily sample the cache state and make contradictory decisions later. The issue arises because we set up the cache resources, which allows the cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which triggers cache writing even if we didn't set the flags when allocating. Fix this in the following way: (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in ->init_request() rather than trying to juggle that in netfs_alloc_request(). (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is to be done, then PG_private_2 is to be used rather than only setting it if we decide to cache and then having netfs_rreq_unlock_folios() set the non-PG_private_2 writeback-to-cache if it wasn't set. (3) Split netfs_rreq_unlock_folios() into two functions, one of which contains the deprecated code for using PG_private_2 to avoid accidentally doing the writeback path - and always use it if USE_PGPRIV2 is set. (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always wait for PG_private_2. This function is deprecated and only used by ceph anyway, and so label it so. (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use fscache_operation_valid() on the cache_resources instead. This has the advantage of picking up the result of netfs_begin_cache_read() and fscache_begin_write_operation() - which are called after the object is initialised and will wait for the cache to come to a usable state. Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be applied on top of that. Without this as well, things like: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { and: WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386 may happen, along with some UAFs due to PG_private_2 not getting used to wait on writeback completion. Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty") Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Hristo Venev <hristo@venev.name> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12	file: fix typo in take_fd() comment	Mathias Krause
	The explanatory comment above take_fd() contains a typo, fix that to not confuse readers. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240809135035.748109-1-minipli@grsecurity.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12	lsm: add the inode_free_security_rcu() LSM implementation hook	Paul Moore
	The LSM framework has an existing inode_free_security() hook which is used by LSMs that manage state associated with an inode, but due to the use of RCU to protect the inode, special care must be taken to ensure that the LSMs do not fully release the inode state until it is safe from a RCU perspective. This patch implements a new inode_free_security_rcu() implementation hook which is called when it is safe to free the LSM's internal inode state. Unfortunately, this new hook does not have access to the inode itself as it may already be released, so the existing inode_free_security() hook is retained for those LSMs which require access to the inode. Cc: stable@vger.kernel.org Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-12	lsm: cleanup lsm_hooks.h	Paul Moore
	Some cleanup and style corrections for lsm_hooks.h. * Drop the lsm_inode_alloc() extern declaration, it is not needed. * Relocate lsm_get_xattr_slot() and extern variables in the file to improve grouping of related objects. * Don't use tabs to needlessly align structure fields. Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-12	srcu: faster gp seq wrap-around	JP Kobryn
	Using a higher value for the initial gp sequence counters allows for wrapping to occur faster. It can help with surfacing any issues that may be happening as a result of the wrap around. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-12	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull fd bitmap fix from Al Viro: "Fix bitmap corruption on close_range() by cleaning up copy_fd_bitmaps()" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
2024-08-12	platform/x86/intel/vsec: Add PMT read callbacks	David E. Box
	Some PMT providers require device specific actions before their telemetry can be read. Provide assignable PMT read callbacks to allow providers to perform those actions. Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20240725122346.4063913-3-michael.j.ruhl@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-08-12	platform/x86/intel/vsec.h: Move to include/linux	David E. Box
	Some drivers outside of PDX86 need access to the vsec header. Move it to include/linux to make it easier to include. Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20240725122346.4063913-2-michael.j.ruhl@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-08-12	ethtool: rss: don't report key if device doesn't support it	Jakub Kicinski
	marvell/otx2 and mvpp2 do not support setting different keys for different RSS contexts. Contexts have separate indirection tables but key is shared with all other contexts. This is likely fine, indirection table is the most important piece. Don't report the key-related parameters from such drivers. This prevents driver-errors, e.g. otx2 always writes the main key, even when user asks to change per-context key. The second reason is that without this change tracking the keys by the core gets complicated. Even if the driver correctly reject setting key with rss_context != 0, change of the main key would have to be reflected in the XArray for all additional contexts. Since the additional contexts don't have their own keys not including the attributes (in Netlink speak) seems intuitive. ethtool CLI seems to deal with it just fine. Having to set the flag in majority of the drivers is a bit tedious but not reporting the key is a safer default. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12	ethtool: make ethtool_ops::cap_rss_ctx_supported optional	Jakub Kicinski
	cap_rss_ctx_supported was created because the API for creating and configuring additional contexts is mux'ed with the normal RSS API. Presence of ops does not imply driver can actually support rss_context != 0 (in fact drivers mostly ignore that field). cap_rss_ctx_supported lets core check that the driver is context-aware before calling it. Now that we have .create_rxfh_context, there is no such ambiguity. We can depend on presence of the op. Make setting the bit optional. Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12	memory: ti-aemif: remove platform data support	Bartosz Golaszewski
	There are no longer any users of the ti-aemif driver that set up platform data from board files. We can shrink the driver by removing support for it. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20240809-ti-aemif-v1-1-27b1e5001390@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2024-08-12	Merge tag 'spi-acpi-lookup-dummy' of ↵	Takashi Iwai
	https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into topic/cirrus-hp-g12 spi: Add empty versions of ACPI lookup functions A patch from Richard Fitzgerald adding dummy versions of the ACPI lookup functions for SPI: Provide empty versions of acpi_spi_count_resources(), acpi_spi_device_alloc() and acpi_spi_find_controller_by_adev() if the real functions are not being built. This commit fixes two problems with the original definitions: 1) There wasn't an empty version of these functions 2) The #if only depended on CONFIG_ACPI. But the functions are implemented in the core spi.c so CONFIG_SPI_MASTER must also be enabled for the real functions to exist.
2024-08-11	net: mii: constify advertising mask	Russell King (Oracle)
	Constify the advertising mask to linkmode functions that only read from the advertising mask. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11	context_tracking, rcu: Rename DYNTICK_IRQ_NONIDLE into CT_NESTING_IRQ_NONIDLE	Valentin Schneider
	The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning. Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-11	context_tracking, rcu: Rename ct_dynticks_nmi_nesting_cpu() into ↵	Valentin Schneider
	ct_nmi_nesting_cpu() The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-11	context_tracking, rcu: Rename ct_dynticks_nmi_nesting() into ct_nmi_nesting()	Valentin Schneider
	The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-11	context_tracking, rcu: Rename struct context_tracking .dynticks_nmi_nesting ↵	Valentin Schneider
	into .nmi_nesting The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning. [ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ] Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-11	context_tracking, rcu: Rename ct_dynticks_nesting_cpu() into ct_nesting_cpu()	Valentin Schneider
	The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the 'dynticks' prefix can be dropped without losing any meaning. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-11	context_tracking, rcu: Rename ct_dynticks_nesting() into ct_nesting()	Valentin Schneider
	The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, reflect that change in the related helpers. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-11	context_tracking, rcu: Rename struct context_tracking .dynticks_nesting into ↵	Valentin Schneider
	.nesting The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, reflect that change in the related helpers. [ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ] Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-10	Merge tag 'i2c-for-6.11-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - Two fixes for SMBusAlert handling in the I2C core: one to avoid an endless loop when scanning for handlers and one to make sure handlers are always called even if HW has broken behaviour - I2C header build fix for when ACPI is enabled but I2C isn't - The testunit gets a rename in the code to match the documentation - Two fixes for the Qualcomm GENI I2C controller are cleaning up the error exit patch in the runtime_resume() function. The first is disabling the clock, the second disables the icc on the way out * tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: testunit: match HostNotify test name with docs i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume i2c: qcom-geni: Add missing clk_disable_unprepare in geni_i2c_runtime_resume i2c: Fix conditional for substituting empty ACPI functions i2c: smbus: Send alert notifications to all devices if source not found i2c: smbus: Improve handling of stuck alerts
2024-08-10	Revert "lib/mpi: Introduce ec implementation to MPI library"	Herbert Xu
	This reverts commit d58bb7e55a8a65894cc02f27c3e2bf9403e7c40f. It's no longer needed since sm2 has been removed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-08-09	Merge tag 'irq-domain-24-08-09' into irq/core	Thomas Gleixner
	Merge the irqdomain changes which are required for regmap to apply depending patches and therefore tagged.
2024-08-09	irqdomain: Allow giving name suffix for domain	Matti Vaittinen
	Devices can provide multiple interrupt lines. One reason for this is that a device has multiple subfunctions, each providing its own interrupt line. Another reason is that a device can be designed to be used (also) on a system where some of the interrupts can be routed to another processor. A line often further acts as a demultiplex for specific interrupts and has it's respective set of interrupt (status, mask, ack, ...) registers. Regmap supports the handling of these registers and demultiplexing interrupts, but the interrupt domain code ends up assigning the same name for the per interrupt line domains. This causes a naming collision in the debugFS code and leads to confusion, as /proc/interrupts shows two separate interrupts with the same domain name and hardware interrupt number. Instead of adding a workaround in regmap or driver code, allow giving a name suffix for the domain name when the domain is created. Add a name_suffix field in the irq_domain_info structure and make irq_domain_instantiate() use this suffix if it is given when a domain is created. [ tglx: Adopt it to the cleanup patch and fixup the invalid NULL return ] Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/871q2yvk5x.ffs@tglx
2024-08-09	irqdomain: Simplify simple and legacy domain creation	Matti Vaittinen
	irq_domain_create_simple() and irq_domain_create_legacy() use __irq_domain_instantiate(), but have extra handling of allocating interrupt descriptors and associating interrupts in them. Some of that is duplicated. There are also call sites which have conditonals to invoke different interrupt domain creator functions, where one of them is usually irq_domain_create_legacy(). Alternatively they associate the interrupts for the legacy case after creating the domain. Moving the extra logic of irq_domain_create_simple()/legacy() into __irq_domain_instantiate() allows to consolidate that. Introduce hwirq_base and virq_base members in the irq_domain_info structure, which allows to transport the required information and add the conditional interrupt descriptor allocation and interrupt association into __irq_domain_instantiate(). This reduces irq_domain_create_legacy() and irq_domain_create_simple() to trivial wrappers which fill in the info structure and allows call sites which must support the legacy case along with more modern mechanism to select the domain type via the parameters of the info struct. [ tglx: Massaged change log ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/32d07bd79eb2b5416e24da9e9e8fe5955423dcf9.1723120028.git.mazziesaccount@gmail.com
2024-08-09	Merge tag 'bitmap-6.11-rc' of https://github.com/norov/linux	Linus Torvalds
	Pull cpumask fix from Yury Norov: "Fix for cpumask merge" [ Mea culpa, this was my mismerge due to too much cut-and-paste - Linus ] * tag 'bitmap-6.11-rc' of https://github.com/norov/linux: cpumask: Fix crash on updating CPU enabled mask
2024-08-09	Merge tag 'probes-fixes-v6.11-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull kprobe fixes from Masami Hiramatsu: - Fix misusing str_has_prefix() parameter order to check symbol prefix correctly - bpf: remove unused declaring of bpf_kprobe_override * tag 'probes-fixes-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: kprobes: Fix to check symbol prefixes correctly bpf: kprobe: remove unused declaring of bpf_kprobe_override
2024-08-09	Merge tag 'i2c-host-fixes-6.11-rc3' of ↵	Wolfram Sang
	git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current Two fixes on the Qualcomm GENI I2C controller are cleaning up the error exit patch in the runtime_resume() function. The first is disabling the clock, the second disables the icc on the way out.
2024-08-09	Merge patch series "nsfs: iterate through mount namespaces"	Christian Brauner
	Christian Brauner <brauner@kernel.org> says: Recently, we added the ability to list mounts in other mount namespaces and the ability to retrieve namespace file descriptors without having to go through procfs by deriving them from pidfds. This extends nsfs in two ways: (1) Add the ability to retrieve information about a mount namespace via NS_MNT_GET_INFO. This will return the mount namespace id and the number of mounts currently in the mount namespace. The number of mounts can be used to size the buffer that needs to be used for listmount() and is in general useful without having to actually iterate through all the mounts. The structure is extensible. (2) Add the ability to iterate through all mount namespaces over which the caller holds privilege returning the file descriptor for the next or previous mount namespace. To retrieve a mount namespace the caller must be privileged wrt to it's owning user namespace. This means that PID 1 on the host can list all mounts in all mount namespaces or that a container can list all mounts of its nested containers. Optionally pass a structure for NS_MNT_GET_INFO with NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount namespace in one go. (1) and (2) can be implemented for other namespace types easily. Together with recent api additions this means one can iterate through all mounts in all mount namespaces without ever touching procfs. Here's a sample program list_all_mounts_everywhere.c: // SPDX-License-Identifier: GPL-2.0-or-later #define _GNU_SOURCE #include <asm/unistd.h> #include <assert.h> #include <errno.h> #include <fcntl.h> #include <getopt.h> #include <linux/stat.h> #include <sched.h> #include <stddef.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <sys/param.h> #include <sys/pidfd.h> #include <sys/stat.h> #include <sys/statfs.h> #define die_errno(format, ...) \ do { \ fprintf(stderr, "%m \| %s: %d: %s: " format "\n", __FILE__, \ __LINE__, __func__, ##__VA_ARGS__); \ exit(EXIT_FAILURE); \ } while (0) /* Get the id for a mount namespace / #define NS_GET_MNTNS_ID _IO(0xb7, 0x5) / Get next mount namespace. / struct mnt_ns_info { __u32 size; __u32 nr_mounts; __u64 mnt_ns_id; }; #define MNT_NS_INFO_SIZE_VER0 16 / size of first published struct / / Get information about namespace. / #define NS_MNT_GET_INFO _IOR(0xb7, 10, struct mnt_ns_info) / Get next namespace. / #define NS_MNT_GET_NEXT _IOR(0xb7, 11, struct mnt_ns_info) / Get previous namespace. / #define NS_MNT_GET_PREV _IOR(0xb7, 12, struct mnt_ns_info) #define PIDFD_GET_MNT_NAMESPACE _IO(0xFF, 3) #define STATX_MNT_ID_UNIQUE 0x00004000U / Want/got extended stx_mount_id / #define __NR_listmount 458 #define __NR_statmount 457 / * @mask bits for statmount(2) / #define STATMOUNT_SB_BASIC 0x00000001U / Want/got sb_... / #define STATMOUNT_MNT_BASIC 0x00000002U / Want/got mnt_... / #define STATMOUNT_PROPAGATE_FROM 0x00000004U / Want/got propagate_from / #define STATMOUNT_MNT_ROOT 0x00000008U / Want/got mnt_root / #define STATMOUNT_MNT_POINT 0x00000010U / Want/got mnt_point / #define STATMOUNT_FS_TYPE 0x00000020U / Want/got fs_type / #define STATMOUNT_MNT_NS_ID 0x00000040U / Want/got mnt_ns_id / #define STATMOUNT_MNT_OPTS 0x00000080U / Want/got mnt_opts / struct statmount { __u32 size; / Total size, including strings / __u32 mnt_opts; __u64 mask; / What results were written / __u32 sb_dev_major; / Device ID / __u32 sb_dev_minor; __u64 sb_magic; / ..._SUPER_MAGIC / __u32 sb_flags; / SB_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} / __u32 fs_type; / [str] Filesystem type / __u64 mnt_id; / Unique ID of mount / __u64 mnt_parent_id; / Unique ID of parent (for root == mnt_id) / __u32 mnt_id_old; / Reused IDs used in proc/.../mountinfo / __u32 mnt_parent_id_old; __u64 mnt_attr; / MOUNT_ATTR_... / __u64 mnt_propagation; / MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} / __u64 mnt_peer_group; / ID of shared peer group / __u64 mnt_master; / Mount receives propagation from this ID / __u64 propagate_from; / Propagation from in current namespace / __u32 mnt_root; / [str] Root of mount relative to root of fs / __u32 mnt_point; / [str] Mountpoint relative to current root / __u64 mnt_ns_id; __u64 __spare2[49]; char str[]; / Variable size part containing strings / }; struct mnt_id_req { __u32 size; __u32 spare; __u64 mnt_id; __u64 param; __u64 mnt_ns_id; }; #define MNT_ID_REQ_SIZE_VER1 32 / sizeof second published struct / #define LSMT_ROOT 0xffffffffffffffff / root mount / static int __statmount(__u64 mnt_id, __u64 mnt_ns_id, __u64 mask, struct statmount stmnt, size_t bufsize, unsigned int flags) { struct mnt_id_req req = { .size = MNT_ID_REQ_SIZE_VER1, .mnt_id = mnt_id, .param = mask, .mnt_ns_id = mnt_ns_id, }; return syscall(__NR_statmount, &req, stmnt, bufsize, flags); } static struct statmount sys_statmount(__u64 mnt_id, __u64 mnt_ns_id, __u64 mask, unsigned int flags) { size_t bufsize = 1 << 15; struct statmount stmnt = NULL, tmp = NULL; int ret; for (;;) { tmp = realloc(stmnt, bufsize); if (!tmp) goto out; stmnt = tmp; ret = __statmount(mnt_id, mnt_ns_id, mask, stmnt, bufsize, flags); if (!ret) return stmnt; if (errno != EOVERFLOW) goto out; bufsize <<= 1; if (bufsize >= UINT_MAX / 2) goto out; } out: free(stmnt); printf("statmount failed"); return NULL; } static ssize_t sys_listmount(__u64 mnt_id, __u64 last_mnt_id, __u64 mnt_ns_id, __u64 list[], size_t num, unsigned int flags) { struct mnt_id_req req = { .size = MNT_ID_REQ_SIZE_VER1, .mnt_id = mnt_id, .param = last_mnt_id, .mnt_ns_id = mnt_ns_id, }; return syscall(__NR_listmount, &req, list, num, flags); } int main(int argc, char argv[]) { #define LISTMNT_BUFFER 10 __u64 list[LISTMNT_BUFFER], last_mnt_id = 0; int ret, pidfd, fd_mntns; struct mnt_ns_info info = {}; pidfd = pidfd_open(getpid(), 0); if (pidfd < 0) die_errno("pidfd_open failed"); fd_mntns = ioctl(pidfd, PIDFD_GET_MNT_NAMESPACE, 0); if (fd_mntns < 0) die_errno("ioctl(PIDFD_GET_MNT_NAMESPACE) failed"); ret = ioctl(fd_mntns, NS_MNT_GET_INFO, &info); if (ret < 0) die_errno("ioctl(NS_GET_MNTNS_ID) failed"); printf("Listing %u mounts for mount namespace %d:%llu\n", info.nr_mounts, fd_mntns, info.mnt_ns_id); for (;;) { ssize_t nr_mounts; next: nr_mounts = sys_listmount(LSMT_ROOT, last_mnt_id, info.mnt_ns_id, list, LISTMNT_BUFFER, 0); if (nr_mounts <= 0) { printf("Finished listing mounts for mount namespace %d:%llu\n\n", fd_mntns, info.mnt_ns_id); ret = ioctl(fd_mntns, NS_MNT_GET_NEXT, 0); if (ret < 0) die_errno("ioctl(NS_MNT_GET_NEXT) failed"); close(ret); ret = ioctl(fd_mntns, NS_MNT_GET_NEXT, &info); if (ret < 0) { if (errno == ENOENT) { printf("Finished listing all mount namespaces\n"); exit(0); } die_errno("ioctl(NS_MNT_GET_NEXT) failed"); } close(fd_mntns); fd_mntns = ret; last_mnt_id = 0; printf("Listing %u mounts for mount namespace %d:%llu\n", info.nr_mounts, fd_mntns, info.mnt_ns_id); goto next; } for (size_t cur = 0; cur < nr_mounts; cur++) { struct statmount stmnt; last_mnt_id = list[cur]; stmnt = sys_statmount(last_mnt_id, info.mnt_ns_id, STATMOUNT_SB_BASIC \| STATMOUNT_MNT_BASIC \| STATMOUNT_MNT_ROOT \| STATMOUNT_MNT_POINT \| STATMOUNT_MNT_NS_ID \| STATMOUNT_MNT_OPTS \| STATMOUNT_FS_TYPE, 0); if (!stmnt) { printf("Failed to statmount(%llu) in mount namespace(%llu)\n", last_mnt_id, info.mnt_ns_id); continue; } printf("mnt_id(%u/%llu) \| mnt_parent_id(%u/%llu): %s @ %s ==> %s with options: %s\n", stmnt->mnt_id_old, stmnt->mnt_id, stmnt->mnt_parent_id_old, stmnt->mnt_parent_id, stmnt->str + stmnt->fs_type, stmnt->str + stmnt->mnt_root, stmnt->str + stmnt->mnt_point, stmnt->str + stmnt->mnt_opts); free(stmnt); } } exit(0); } patches from https://lore.kernel.org/r/20240719-work-mount-namespace-v1-0-834113cab0d2@kernel.org: nsfs: iterate through mount namespaces file: add fput() cleanup helper fs: add put_mnt_ns() cleanup helper fs: allow mount namespace fd Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-09	file: add fput() cleanup helper	Christian Brauner
	Add a simple helper to put a file reference. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-4-834113cab0d2@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-09	fs: add put_mnt_ns() cleanup helper	Christian Brauner
	Add a simple helper to put a mount namespace reference. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-3-834113cab0d2@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-08	net: sungem_phy: Constify struct mii_phy_def	Christophe JAILLET
	'struct mii_phy_def' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increase overall security. While at it fix the checkpatch warning related to this patch (some missing newlines and spaces around *) On a x86_64, with allmodconfig: Before: ====== 27709 928 0 28637 6fdd drivers/net/sungem_phy.o After: ===== text data bss dec hex filename 28157 476 0 28633 6fd9 drivers/net/sungem_phy.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/54c3b30930f80f4895e6fa2f4234714fdea4ef4e.1723033266.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Link: https://patch.msgid.link/20240808170148.3629934-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-08	Merge tag 'net-6.11-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth. Current release - regressions: - eth: bnxt_en: fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() on older chips Current release - new code bugs: - ethtool: fix off-by-one error / kdoc contradicting the code for max RSS context IDs - Bluetooth: hci_qca: - QCA6390: fix support on non-DT platforms - QCA6390: don't call pwrseq_power_off() twice - fix a NULL-pointer derefence at shutdown - eth: ice: fix incorrect assigns of FEC counters Previous releases - regressions: - mptcp: fix handling endpoints with both 'signal' and 'subflow' flags set - virtio-net: fix changing ring count when vq IRQ coalescing not supported - eth: gve: fix use of netif_carrier_ok() during reconfig / reset Previous releases - always broken: - eth: idpf: fix bugs in queue re-allocation on reconfig / reset - ethtool: fix context creation with no parameters Misc: - linkwatch: use system_unbound_wq to ease RTNL contention" * tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (41 commits) net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897. ethtool: Fix context creation with no parameters net: ethtool: fix off-by-one error in max RSS context IDs net: pse-pd: tps23881: include missing bitfield.h header net: fec: Stop PPS on driver remove net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities l2tp: fix lockdep splat net: stmmac: dwmac4: fix PCS duplex mode decode idpf: fix UAFs when destroying the queues idpf: fix memleak in vport interrupt configuration idpf: fix memory leaks and crashes while performing a soft reset bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register() net/smc: add the max value of fallback reason count Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor Bluetooth: l2cap: always unlock channel in l2cap_conless_channel() Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390 ice: Fix incorrect assigns of FEC counts ...
2024-08-08	cpumask: Fix crash on updating CPU enabled mask	Gavin Shan
	The CPU enabled mask instead of the CPU possible mask should be used by set_cpu_enabled(). Otherwise, we run into crash due to write to the read-only CPU possible mask when vCPU is hot added on ARM64. (qemu) device_add host-arm-cpu,id=cpu1,socket-id=1 Unable to handle kernel write to read-only memory at virtual address ffff800080fa7190 : Call trace: register_cpu+0x1a4/0x2e8 arch_register_cpu+0x84/0xd8 acpi_processor_add+0x480/0x5b0 acpi_bus_attach+0x1c4/0x300 acpi_dev_for_one_check+0x3c/0x50 device_for_each_child+0x68/0xc8 acpi_dev_for_each_child+0x48/0x80 acpi_bus_attach+0x84/0x300 acpi_bus_scan+0x74/0x220 acpi_scan_rescan_bus+0x54/0x88 acpi_device_hotplug+0x208/0x478 acpi_hotplug_work_fn+0x2c/0x50 process_one_work+0x15c/0x3c0 worker_thread+0x2ec/0x400 kthread+0x120/0x130 ret_from_fork+0x10/0x20 Fix it by passing the CPU enabled mask instead of the CPU possible mask to set_cpu_enabled(). Fixes: 51c4767503d5 ("Merge tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux") Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2024-08-08	Merge tag 'drm-misc-next-2024-08-01' of ↵	Daniel Vetter
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-08-08	net: ethtool: fix off-by-one error in max RSS context IDs	Edward Cree
	Both ethtool_ops.rxfh_max_context_id and the default value used when it's not specified are supposed to be exclusive maxima (the former is documented as such; the latter, U32_MAX, cannot be used as an ID since it equals ETH_RXFH_CONTEXT_ALLOC), but xa_alloc() expects an inclusive maximum. Subtract one from 'limit' to produce an inclusive maximum, and pass that to xa_alloc(). Increase bnxt's max by one to prevent a (very minor) regression, as BNXT_MAX_ETH_RSS_CTX is an inclusive max. This is safe since bnxt is not actually hard-limited; BNXT_MAX_ETH_RSS_CTX is just a leftover from old driver code that managed context IDs itself. Rename rxfh_max_context_id to rxfh_max_num_contexts to make its semantics (hopefully) more obvious. Fixes: 847a8ab18676 ("net: ethtool: let the core choose RSS context IDs") Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/5a2d11a599aa5b0cc6141072c01accfb7758650c.1723045898.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-08	genirq: Remove irq_chip_regs:: Polarity	Jiri Slaby (SUSE)
	The polarity member of struct irq_chip_regs is unused. Remove it along with its kernel-doc. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240808104118.430670-3-jirislaby@kernel.org
2024-08-08	genirq: Remove unused irq_chip_generic:: {type,polarity}_cache	Jiri Slaby (SUSE)
	The type_cache and polarity_cache members of struct irq_chip_generic are unused. Remove them both along with their kernel-doc. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240808104118.430670-2-jirislaby@kernel.org
2024-08-07	ring-buffer: Remove unused function ring_buffer_nr_pages()	Jianhui Zhou
	Because ring_buffer_nr_pages() is not an inline function and user accesses buffer->buffers[cpu]->nr_pages directly, the function ring_buffer_nr_pages is removed. Signed-off-by: Jianhui Zhou <912460177@qq.com> Link: https://lore.kernel.org/tencent_F4A7E9AB337F44E0F4B858D07D19EF460708@qq.com Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-07	tracing: Use refcount for trace_event_file reference counter	Steven Rostedt
	Instead of using an atomic counter for the trace_event_file reference counter, use the refcount interface. It has various checks to make sure the reference counting is correct, and will warn if it detects an error (like refcount_inc() on '0'). Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20240726144208.687cce24@rorschach.local.home Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-07	buffer: Convert __block_write_begin() to take a folio	Matthew Wilcox (Oracle)
	Almost all callers have a folio now, so change __block_write_begin() to take a folio and remove a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	fs: Convert aops->write_begin to take a folio	Matthew Wilcox (Oracle)
	Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	fs: Convert aops->write_end to take a folio	Matthew Wilcox (Oracle)
	Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07	buffer: Convert block_write_end() to take a folio	Matthew Wilcox (Oracle)
	All callers now have a folio, so pass it in instead of converting from a folio to a page and back to a folio again. Saves a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-06	x86/traps: Enable UBSAN traps on x86	Gatlin Newhouse
	Currently ARM64 extracts which specific sanitizer has caused a trap via encoded data in the trap instruction. Clang on x86 currently encodes the same data in the UD1 instruction but x86 handle_bug() and is_valid_bugaddr() currently only look at UD2. Bring x86 to parity with ARM64, similar to commit 25b84002afb9 ("arm64: Support Clang UBSAN trap codes for better reporting"). See the llvm links for information about the code generation. Enable the reporting of UBSAN sanitizer details on x86 compiled with clang when CONFIG_UBSAN_TRAP=y by analysing UD1 and retrieving the type immediate which is encoded by the compiler after the UD1. [ tglx: Simplified it by moving the printk() into handle_bug() ] Signed-off-by: Gatlin Newhouse <gatlin.newhouse@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20240724000206.451425-1-gatlin.newhouse@gmail.com Link: https://github.com/llvm/llvm-project/commit/c5978f42ec8e9#diff-bb68d7cd885f41cfc35843998b0f9f534adb60b415f647109e597ce448e92d9f Link: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86InstrSystem.td#L27
2024-08-05	binfmt_elf, coredump: Log the reason of the failed core dumps	Roman Kisel
	Missing, failed, or corrupted core dumps might impede crash investigations. To improve reliability of that process and consequently the programs themselves, one needs to trace the path from producing a core dumpfile to analyzing it. That path starts from the core dump file written to the disk by the kernel or to the standard input of a user mode helper program to which the kernel streams the coredump contents. There are cases where the kernel will interrupt writing the core out or produce a truncated/not-well-formed core dump without leaving a note. Add logging for the core dump collection failure paths to be able to reason what has gone wrong when the core dump is malformed or missing. Report the size of the data written to aid in diagnosing the user mode helper. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Link: https://lore.kernel.org/r/20240718182743.1959160-3-romank@linux.microsoft.com Signed-off-by: Kees Cook <kees@kernel.org>
2024-08-05	coredump: Standartize and fix logging	Roman Kisel
	The coredump code does not log the process ID and the comm consistently, logs unescaped comm when it does log it, and does not always use the ratelimited logging. That makes it harder to analyze logs and puts the system at the risk of spamming the system log incase something crashes many times over and over again. Fix that by logging TGID and comm (escaped) consistently and using the ratelimited logging always. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Tested-by: Allen Pais <apais@linux.microsoft.com> Link: https://lore.kernel.org/r/20240718182743.1959160-2-romank@linux.microsoft.com Signed-off-by: Kees Cook <kees@kernel.org>
2024-08-05	net/mlx5: Add support for MTPTM and MTCTR registers	Rahul Rameshbabu
	Make Management Precision Time Measurement (MTPTM) register and Management Cross Timestamp (MTCTR) register usable in mlx5 driver. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20240730134055.1835261-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-05	inet6: constify 'struct net' parameter of various lookup helpers	Eric Dumazet
	Following helpers do not touch their struct net argument: - bpf_sk_lookup_run_v6() - __inet6_lookup_established() - inet6_lookup_reuseport() - inet6_lookup_listener() - inet6_lookup_run_sk_lookup() - __inet6_lookup() - inet6_lookup() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240802134029.3748005-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>