linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-03-02	nvme-rdma: move NVME_RDMA_IP_PORT from common file	Max Gurtovoy
	The correct place for this definition is the nvme rdma header file and not the common nvme header file. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Israel Rukshin <israelr@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-02	USB: typec: no opencoding FIELD_GET	Oliver Neukum
	We have a macro. It should be used. Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20240229132401.3270-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-02	Merge tag 'thunderbolt-for-v6.9-rc1' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next Mika writes: thunderbolt: Changes for v6.9 merge window This includes following USB4/Thunderbolt changes for the v6.9 merge window: - Reset the topology also for USB4 v1 routers on driver load - DisplayPort tunneling and bandwidth allocation mode improvements - Tracepoint support for the control channel - Couple of minor fixes and cleanups. All these have been in linux-next with no reported issues. * tag 'thunderbolt-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (23 commits) thunderbolt: Constify the struct device_type usage thunderbolt: Add trace events support for the control channel thunderbolt: Keep the domain powered when USB4 port is in redrive mode thunderbolt: Improve DisplayPort tunnel setup process to be more robust thunderbolt: Calculate DisplayPort tunnel bandwidth after DPRX capabilities read thunderbolt: Reserve released DisplayPort bandwidth for a group for 10 seconds thunderbolt: Introduce tb_tunnel_direction_downstream() thunderbolt: Re-order bandwidth group functions thunderbolt: Fail the failed bandwidth request properly thunderbolt: Log an error if DPTX request is not cleared thunderbolt: Handle bandwidth allocation mode disable request thunderbolt: Re-calculate estimated bandwidth when allocation mode is enabled thunderbolt: Use DP_LOCAL_CAP for maximum bandwidth calculation thunderbolt: Correct typo in host_reset parameter thunderbolt: Skip discovery also in USB4 v2 host thunderbolt: Reset only non-USB4 host routers in resume thunderbolt: Remove usage of the deprecated ida_simple_xx() API thunderbolt: Fix rollback in tb_port_lane_bonding_enable() for lane 1 thunderbolt: Fix XDomain rx_lanes_show and tx_lanes_show thunderbolt: Reset topology created by the boot firmware ...
2024-03-02	Merge tag 'coresight-next-v6.9' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: hwtracing subsystem updates for v6.9 Changes targeting Linux v6.9 include: - CoreSight: Enable W=1 warnings as default - CoreSight: Clean up sysfs/perf mode handling for tracing - Support for Qualcomm TPDM CMB Dataset - Miscellaneous fixes to the CoreSight subsystem - Fix for hisi_ptt PMU to reject events targeting other PMUs Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (32 commits) coresight-tpda: Change qcom,dsb-element-size to qcom,dsb-elem-bits dt-bindings: arm: qcom,coresight-tpdm: Rename qcom,dsb-element-size hwtracing: hisi_ptt: Move type check to the beginning of hisi_ptt_pmu_event_init() coresight: tpdm: Fix build break due to uninitialised field coresight: etm4x: Set skip_power_up in etm4_init_arch_data function coresight-tpdm: Add msr register support for CMB dt-bindings: arm: qcom,coresight-tpdm: Add support for TPDM CMB MSR register coresight-tpdm: Add timestamp control register support for the CMB coresight-tpdm: Add pattern registers support for CMB coresight-tpdm: Add support to configure CMB coresight-tpda: Add support to configure CMB element coresight-tpdm: Add CMB dataset support dt-bindings: arm: qcom,coresight-tpdm: Add support for CMB element size coresight-tpdm: Optimize the useage of tpdm_has_dsb_dataset coresight-tpdm: Optimize the store function of tpdm simple dataset coresight: Add helper for setting csdev->mode coresight: Add a helper for getting csdev->mode coresight: Add helper for atomically taking the device coresight: Add explicit member initializers to coresight_dev_type coresight: Remove unused stubs ...
2024-03-02	Merge tag 'mhi-for-v6.9' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host ======== - Added new MHI_PM_SYS_ERR_FAIL state to the MHI state machine to properly cleanup the channel state if the device fails to respond to the MHI reset during SYS_ERR handling. This issue was discovered with the Qualcomm AIC100 AI accelerator device. - Modified the code that reads and exposes the OEM_PK_HASH registers through sysfs to read them on-demand instead of reading once during boot. Qualcomm AIC100 devices support provisioning the keys dynamically, so this allows the users to know the upto date information. - Added tracepoint support to expose the debug information over tracefs. - Reverted the commit that reads the MHI device revision from the device during boot. This is done because the read info was not used anywhere (dead code) and also it is not possible to read the revision info from all the devices. - Constified the modem config for Telit FN980 modem as required by the MHI core. MHI Endpoint ============ - Replaced kzalloc() with kcalloc() in an effort to avoid integer overflows during multiplication. Even though there is no potential overflow in the endpoint code, this is done for the sake of uniformity and best practice. - Fixed the kmem_cache_create() failure check to use the correct variable. * tag 'mhi-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: host: pci_generic: constify modem_telit_fn980_hw_v1_config bus: mhi: host: Change the trace string for the userspace tools mapping bus: mhi: ep: check the correct variable in mhi_ep_register_controller() Revert "bus: mhi: core: Add support for reading MHI info from device" bus: mhi: host: Add tracing support bus: mhi: ep: Use kcalloc() instead of kzalloc() bus: mhi: host: Read PK HASH dynamically bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state
2024-03-02	block: define bvec_iter as __packed __aligned(4)	Ming Lei
	In commit 19416123ab3e ("block: define 'struct bvec_iter' as packed"), what we need is to save the 4byte padding, and avoid `bio` to spread on one extra cache line. It is enough to define it as '__packed __aligned(4)', as '__packed' alone means byte aligned, and can cause compiler to generate horrible code on architectures that don't support unaligned access in case that bvec_iter is embedded in other structures. Cc: Mikulas Patocka <mpatocka@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 19416123ab3e ("block: define 'struct bvec_iter' as packed") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-01	net/mlx5: Check capability for fw_reset	Moshe Shemesh
	Functions which can't access MFRL (Management Firmware Reset Level) register, have no use of fw_reset structures or events. Remove fw_reset structures allocation and registration for fw reset events notifications for these functions. Having the devlink param enable_remote_dev_reset on functions that don't have this capability is misleading as these functions are not allowed to influence the reset flow. Hence, this patch removes this parameter for such functions. In addition, return not supported on devlink reload action fw_activate for these functions. Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01	compiler.h: Explain how __is_constexpr() works	Kees Cook
	The __is_constexpr() macro is dark magic. Shed some light on it with a comment to explain how and why it works. Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20240301044428.work.411-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01	overflow: Allow non-type arg to type_max() and type_min()	Kees Cook
	A common use of type_max() is to find the max for the type of a variable. Using the pattern type_max(typeof(var)) is needlessly verbose. Instead, since typeof(type) == type we can just explicitly call typeof() on the argument to type_max() and type_min(). Add wrappers for readability. We can do some replacements right away: $ git grep '\btype_$min\\|max$(typeof' \| wc -l 11 Link: https://lore.kernel.org/r/20240301062221.work.840-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-01	power: supply: core: make power_supply_class constant	Ricardo B. Marliere
	Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the power_supply_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240301-class_cleanup-power-v1-1-97e0b7bf9c94@marliere.net Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2024-03-01	sched/idle: Conditionally handle tick broadcast in default_idle_call()	Thomas Gleixner
	The x86 architecture has an idle routine for AMD CPUs which are affected by erratum 400. On the affected CPUs the local APIC timer stops in the C1E halt state. It therefore requires tick broadcasting. The invocation of tick_broadcast_enter()/exit() from this function violates the RCU constraints because it can end up in lockdep or tracing, which rightfully triggers a warning. tick_broadcast_enter()/exit() must be invoked before ct_cpuidle_enter() and after ct_cpuidle_exit() in default_idle_call(). Add a static branch conditional invocation of tick_broadcast_enter()/exit() into this function to allow X86 to replace the AMD specific idle code. It's guarded by a config switch which will be selected by x86. Otherwise it's a NOOP. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240229142248.266708822@linutronix.de
2024-03-01	block: add a queue_limits_stack_bdev helper	Christoph Hellwig
	Add a small wrapper around blk_stack_limits that allows passing a bdev for the bottom device and prints an error in case of misaligned device. The name fits into the new queue limits API and the intent is to eventually replace disk_stack_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240228225653.947152-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01	block: add a queue_limits_set helper	Christoph Hellwig
	Add a small wrapper around queue_limits_commit_update for stacking drivers that don't want to update existing limits, but set an entirely new set. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240228225653.947152-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01	svcrdma: Add Write chunk WRs to the RPC's Send WR chain	Chuck Lever
	Chain RDMA Writes that convey Write chunks onto the local Send chain. This means all WRs for an RPC Reply are now posted with a single ib_post_send() call, and there is a single Send completion when all of these are done. That reduces both the per-transport doorbell rate and completion rate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01	svcrdma: Post WRs for Write chunks in svc_rdma_sendto()	Chuck Lever
	Refactor to eventually enable svcrdma to post the Write WRs for each RPC response using the same ib_post_send() as the Send WR (ie, as a single WR chain). svc_rdma_result_payload (originally svc_rdma_read_payload) was added so that the upper layer XDR encoder could identify a range of bytes to be possibly conveyed by RDMA (if a Write chunk was provided by the client). The purpose of commit f6ad77590a5d ("svcrdma: Post RDMA Writes while XDR encoding replies") was to post as much of the result payload outside of svc_rdma_sendto() as possible because svc_rdma_sendto() used to be called with the xpt_mutex held. However, since commit ca4faf543a33 ("SUNRPC: Move xpt_mutex into socket xpo_sendto methods"), the xpt_mutex is no longer held when calling svc_rdma_sendto(). Thus, that benefit is no longer an issue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01	svcrdma: Post the Reply chunk and Send WR together	Chuck Lever
	Reduce the doorbell and Send completion rates when sending RPC/RDMA replies that have Reply chunks. NFS READDIR procedures typically return their result in a Reply chunk, for example. Instead of calling ib_post_send() to post the Write WRs for the Reply chunk, and then calling it again to post the Send WR that conveys the transport header, chain the Write WRs to the Send WR and call ib_post_send() only once. Thanks to the Send Queue completion ordering rules, when the Send WR completes, that guarantees that Write WRs posted before it have also completed successfully. Thus all Write WRs for the Reply chunk can remain unsignaled. Instead of handling a Write completion and then a Send completion, only the Send completion is seen, and it handles clean up for both the Writes and the Send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01	svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt	Chuck Lever
	Since the RPC transaction's svc_rdma_send_ctxt will stay around for the duration of the RDMA Write operation, the write_info structure for the Reply chunk can reside in the request's svc_rdma_send_ctxt instead of being allocated separately. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01	svcrdma: Post Send WR chain	Chuck Lever
	Eventually I'd like the server to post the reply's Send WR along with any Write WRs using only a single call to ib_post_send(), in order to reduce the NIC's doorbell rate. To do this, add an anchor for a WR chain to svc_rdma_send_ctxt, and refactor svc_rdma_send() to post this WR chain to the Send Queue. For the moment, the posted chain will continue to contain a single Send WR. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01	sunrpc: remove ->pg_stats from svc_program	Josef Bacik
	Now that this isn't used anywhere, remove it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01	sunrpc: pass in the sv_stats struct through svc_create_pooled	Josef Bacik
	Since only one service actually reports the rpc stats there's not much of a reason to have a pointer to it in the svc_program struct. Adjust the svc_create_pooled function to take the sv_stats as an argument and pass the struct through there as desired instead of getting it from the svc_program->pg_stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01	iommu: constify fwnode in iommu_ops_from_fwnode()	Krzysztof Kozlowski
	Make pointer to fwnode_handle a pointer to const for code safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240216144027.185959-3-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01	iommu: constify of_phandle_args in xlate	Krzysztof Kozlowski
	The xlate callbacks are supposed to translate of_phandle_args to proper provider without modifying the of_phandle_args. Make the argument pointer to const for code safety and readability. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240216144027.185959-2-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01	pidfs: convert to path_from_stashed() helper	Christian Brauner
	Moving pidfds from the anonymous inode infrastructure to a separate tiny in-kernel filesystem similar to sockfs, pipefs, and anon_inodefs causes selinux denials and thus various userspace components that make heavy use of pidfds to fail as pidfds used anon_inode_getfile() which aren't subject to any LSM hooks. But dentry_open() is and that would cause regressions. The failures that are seen are selinux denials. But the core failure is dbus-broker. That cascades into other services failing that depend on dbus-broker. For example, when dbus-broker fails to start polkit and all the others won't be able to work because they depend on dbus-broker. The reason for dbus-broker failing is because it doesn't handle failures for SO_PEERPIDFD correctly. Last kernel release we introduced SO_PEERPIDFD (and SCM_PIDFD). SO_PEERPIDFD allows dbus-broker and polkit and others to receive a pidfd for the peer of an AF_UNIX socket. This is the first time in the history of Linux that we can safely authenticate clients in a race-free manner. dbus-broker immediately made use of this but messed up the error checking. It only allowed EINVAL as a valid failure for SO_PEERPIDFD. That's obviously problematic not just because of LSM denials but because of seccomp denials that would prevent SO_PEERPIDFD from working; or any other new error code from there. So this is catching a flawed implementation in dbus-broker as well. It has to fallback to the old pid-based authentication when SO_PEERPIDFD doesn't work no matter the reasons otherwise it'll always risk such failures. So overall that LSM denial should not have caused dbus-broker to fail. It can never assume that a feature released one kernel ago like SO_PEERPIDFD can be assumed to be available. So, the next fix separate from the selinux policy update is to try and fix dbus-broker at [3]. That should make it into Fedora as well. In addition the selinux reference policy should also be updated. See [4] for that. If Selinux is in enforcing mode in userspace and it encounters anything that it doesn't know about it will deny it by default. And the policy is entirely in userspace including declaring new types for stuff like nsfs or pidfs to allow it. For now we continue to raise S_PRIVATE on the inode if it's a pidfs inode which means things behave exactly like before. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2265630 Link: https://github.com/fedora-selinux/selinux-policy/pull/2050 Link: https://github.com/bus1/dbus-broker/pull/343 [3] Link: https://github.com/SELinuxProject/refpolicy/pull/762 [4] Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240222190334.GA412503@dev-arch.thelio-3990X Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01	nsfs: convert to path_from_stashed() helper	Christian Brauner
	Use the newly added path_from_stashed() helper for nsfs. Link: https://lore.kernel.org/r/20240218-neufahrzeuge-brauhaus-fb0eb6459771@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01	pidfd: add pidfs	Christian Brauner
	This moves pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This has been on my todo for quite a while as it will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows: * statx() on pidfds becomes useful for the first time. * pidfds can be compared simply via statx() and then comparing inode numbers. * pidfds have unique inode numbers for the system lifetime. * struct pid is now stashed in inode->i_private instead of file->private_data. This means it is now possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. * file->private_data is freed up for per-file options for pidfds. * Each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. Even if we were to move to anon_inode_create_getfile() which creates new inodes we'd still be associating the same struct pid with multiple different inodes. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode for each struct pid and we reuse that inode for all pidfds. We use iget_locked() to find that inode again based on the inode number which isn't recycled. We allocate a new dentry for each pidfd that uses the same inode. That is similar to anonymous inodes which reuse the same inode for thousands of dentries. For pidfds we're talking way less than that. There usually won't be a lot of concurrent openers of the same struct pid. They can probably often be counted on two hands. I know that systemd does use separate pidfd for the same struct pid for various complex process tracking issues. So I think with that things actually become way simpler. Especially because we don't have to care about lookup. Dentries and inodes continue to be always deleted. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs which uses a similar stashing mechanism just for namespaces. Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01	net: bql: fix building with BQL disabled	Arnd Bergmann
	It is now possible to disable BQL, but that causes the cpsw driver to break: drivers/net/ethernet/ti/am65-cpsw-nuss.c:297:28: error: no member named 'dql' in 'struct netdev_queue' 297 \| dql_avail(&netif_txq->dql), There is already a helper function in net/sch_generic.h that could be used to help here. Move its implementation into the common linux/netdevice.h along with the other bql interfaces and change both users over to the new interface. Fixes: ea7f3cfaa588 ("net: bql: allow the config to be disabled") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01	Simplify net_dbg_ratelimited() dummy	Geert Uytterhoeven
	There is no need to wrap calls to the no_printk() helper inside an always-false check, as no_printk() already does that internally. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01	ipv6: add ipv6_devconf_read_txrx cacheline_group	Eric Dumazet
	IPv6 TX and RX fast path use the following fields: - disable_ipv6 - hop_limit - mtu6 - forwarding - disable_policy - proxy_ndp Place them in a group to increase data locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01	Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory	Michael Kelley
	The VMBUS_RING_SIZE macro adds space for a ring buffer header to the requested ring buffer size. The header size is always 1 page, and so its size varies based on the PAGE_SIZE for which the kernel is built. If the requested ring buffer size is a large power-of-2 size and the header size is small, the resulting size is inefficient in its use of memory. For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in a 516 Kbyte allocation, which is rounded to up 1 Mbyte by the memory allocator, and wastes 508 Kbytes of memory. In such situations, the exact size of the ring buffer isn't that important, and it's OK to allocate the 4 Kbyte header at the beginning of the 512 Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted. Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer sizes. "Large" is somewhat arbitrarily defined as 8 times the size of the ring buffer header (which is of size PAGE_SIZE). For example, for 4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first 4 Kbytes as the ring buffer header. For 64 Kbyte PAGE_SIZE, ring buffers of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer header. In both cases, smaller sizes add space for the header so the ring size isn't reduced too much by using part of the space for the header. For example, with a 64 Kbyte page size, we don't want a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half of the space for the header. In such a case, the memory allocation is less efficient, but it's the best that can be done. While the new algorithm slightly changes the amount of space allocated for ring buffers by drivers that use VMBUS_RING_SIZE, the devices aren't known to be sensitive to small changes in ring buffer size, so there shouldn't be any effect. Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL") Fixes: 6941f67ad37d ("hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218502 Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Tested-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Link: https://lore.kernel.org/r/20240229004533.313662-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20240229004533.313662-1-mhklinux@outlook.com>
2024-02-29	lib/string_helpers: Add flags param to string_get_size()	Andy Shevchenko
	The new flags parameter allows controlling - Whether or not the units suffix is separated by a space, for compatibility with sort -h - Whether or not to append a B suffix - we're not always printing bytes. Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240229205345.93902-1-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29	workqueue: Drain BH work items on hot-unplugged CPUs	Tejun Heo
	Boqun pointed out that workqueues aren't handling BH work items on offlined CPUs. Unlike tasklet which transfers out the pending tasks from CPUHP_SOFTIRQ_DEAD, BH workqueue would just leave them pending which is problematic. Note that this behavior is specific to BH workqueues as the non-BH per-CPU workers just become unbound when the CPU goes offline. This patch fixes the issue by draining the pending BH work items from an offlined CPU from CPUHP_SOFTIRQ_DEAD. Because work items carry more context, it's not as easy to transfer the pending work items from one pool to another. Instead, run BH work items which execute the offlined pools on an online CPU. Note that this assumes that no further BH work items will be queued on the offlined CPUs. This assumption is shared with tasklet and should be fine for conversions. However, this issue also exists for per-CPU workqueues which will just keep executing work items queued after CPU offline on unbound workers and workqueue should reject per-CPU and BH work items queued on offline CPUs. This will be addressed separately later. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-reviewed-by: Boqun Feng <boqun.feng@gmail.com> Link: http://lkml.kernel.org/r/Zdvw0HdSXcU3JZ4g@boqun-archlinux
2024-02-29	overflow: Use POD in check_shl_overflow()	Andy Shevchenko
	The check_shl_overflow() uses u64 type that is defined in types.h. Instead of including that header, just switch to use POD type directly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240228204919.3680786-2-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	kernel.h: Move lib/cmdline.c prototypes to string.h	Andy Shevchenko
	The lib/cmdline.c is basically a set of some small string parsers which are wide used in the kernel. Their prototypes belong to the string.h rather then kernel.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231003130142.2936503-1-andriy.shevchenko@linux.intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	fortify: Improve buffer overflow reporting	Kees Cook
	Improve the reporting of buffer overflows under CONFIG_FORTIFY_SOURCE to help accelerate debugging efforts. The calculations are all just sitting in registers anyway, so pass them along to the function to be reported. For example, before: detected buffer overflow in memcpy and after: memcpy: detected buffer overflow: 4096 byte read of buffer size 1 Link: https://lore.kernel.org/r/20230407192717.636137-10-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	fortify: Provide KUnit counters for failure testing	Kees Cook
	The standard C string APIs were not designed to have a failure mode; they were expected to always succeed without memory safety issues. Normally, CONFIG_FORTIFY_SOURCE will use fortify_panic() to stop processing, as truncating a read or write may provide an even worse system state. However, this creates a problem for testing under things like KUnit, which needs a way to survive failures. When building with CONFIG_KUNIT, provide a failure path for all users of fortify_panic, and track whether the failure was a read overflow or a write overflow, for KUnit tests to examine. Inspired by similar logic in the slab tests. Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	fortify: Split reporting and avoid passing string pointer	Kees Cook
	In preparation for KUnit testing and further improvements in fortify failure reporting, split out the report and encode the function and access failure (read or write overflow) into a single u8 argument. This mainly ends up saving a tiny bit of space in the data segment. For a defconfig with FORTIFY_SOURCE enabled: $ size gcc/vmlinux.before gcc/vmlinux.after text data bss dec hex filename 26132309 9760658 2195460 38088427 2452eeb gcc/vmlinux.before 26132386 9748382 2195460 38076228 244ff44 gcc/vmlinux.after Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	refcount: Annotated intentional signed integer wrap-around	Kees Cook
	Mark the various refcount_t functions with __signed_wrap, as we depend on the wrapping behavior to detect the overflow and perform saturation. Silences warnings seen with the LKDTM REFCOUNT_* tests: UBSAN: signed-integer-overflow in ../include/linux/refcount.h:189:11 2147483647 + 1 cannot be represented in type 'int' Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/r/20240221051634.work.287-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	lib/string_choices: Add str_plural() helper	Michal Wajdeczko
	Add str_plural() helper to replace existing open implementations used by many drivers and help improve future user facing messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20240214165015.1656-1-michal.wajdeczko@intel.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	overflow: Introduce wrapping_assign_add() and wrapping_assign_sub()	Kees Cook
	This allows replacements of the idioms "var += offset" and "var -= offset" with the wrapping_assign_add() and wrapping_assign_sub() helpers respectively. They will avoid wrap-around sanitizer instrumentation. Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Marco Elver <elver@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	overflow: Introduce wrapping_add(), wrapping_sub(), and wrapping_mul()	Kees Cook
	Provide helpers that will perform wrapping addition, subtraction, or multiplication without tripping the arithmetic wrap-around sanitizers. The first argument is the type under which the wrap-around should happen with. In other words, these two calls will get very different results: wrapping_mul(int, 50, 50) == 2500 wrapping_mul(u8, 50, 50) == 196 Add to the selftests to validate behavior and lack of side-effects. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	overflow: Adjust check_*_overflow() kern-doc to reflect results	Kees Cook
	The check_*_overflow() helpers will return results with potentially wrapped-around values. These values have always been checked by the selftests, so avoid the confusing language in the kern-doc. The idea of "safe for use" was relative to the expectation of whether or not the caller wants a wrapped value -- the calculation itself will always follow arithmetic wrapping rules. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	kernel.h: Move upper__bits() and lower__bits() to wordpart.h	Andy Shevchenko
	The wordpart.h header is collecting APIs related to the handling parts of the word (usually in byte granularity). The upper__bits() and lower__bits() are good candidates to be moved to there. This helps to clean up header dependency hell with regard to kernel.h as the latter gathers completely unrelated stuff together and slows down compilation (especially when it's included into other header). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240214172752.3605073-1-andriy.shevchenko@linux.intel.com Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-29	Merge tag 'net-6.8-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29	cgroup/cpuset: Remove cpuset_do_slab_mem_spread()	Xiongwei Song
	The SLAB allocator has been removed sine 6.8-rc1 [1], so there is no user with SLAB_MEM_SPREAD and cpuset_do_slab_mem_spread(). Then SLAB_MEM_SPREAD is marked as unused by [2]. Here we can remove cpuset_do_slab_mem_spread(). For more details, please check [3]. [1] https://lore.kernel.org/linux-mm/20231120-slab-remove-slab-v2-0-9c9c70177183@suse.cz/ [2] https://lore.kernel.org/linux-kernel/20240223-slab-cleanup-flags-v2-0-02f1753e8303@suse.cz/T/ [3] https://lore.kernel.org/lkml/32bc1403-49da-445a-8c00-9686a3b0d6a3@redhat.com/T/#mf14b838c5e0e77f4756d436bac3d8c0447ea4350 Signed-off-by: Xiongwei Song <xiongwei.song@windriver.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-02-29	dpll: fix build failure due to rcu_dereference_check() on unknown type	Eric Dumazet
	Tasmiya reports that their compiler complains that we deref a pointer to unknown type with rcu_dereference_rtnl(): include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’ Unclear what compiler it is, at the moment, and we can't report but since DPLL can't be a module - move the code from the header into the source file. Fixes: 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com> Link: https://lore.kernel.org/all/3fcf3a2c-1c1b-42c1-bacb-78fdcd700389@linux.vnet.ibm.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240229190515.2740221-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29	cpufreq: Limit resolving a frequency to policy min/max	Shivnandan Kumar
	Resolving a frequency to an efficient one should not transgress policy->max (which can be set for thermal reason) and policy->min. Currently, there is possibility where scaling_cur_freq can exceed scaling_max_freq when scaling_max_freq is an inefficient frequency. Add a check to ensure that resolving a frequency will respect policy->min/max. Cc: All applicable <stable@vger.kernel.org> Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E") Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com> [ rjw: Whitespace adjustment, changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-29	f2fs: stop checkpoint when get a out-of-bounds segment	Zhiguo Niu
	There is low probability that an out-of-bounds segment will be got on a small-capacity device. In order to prevent subsequent write requests allocating block address from this invalid segment, which may cause unexpected issue, stop checkpoint should be performed. Also introduce a new stop cp reason: STOP_CP_REASON_NO_SEGMENT. Note, f2fs_stop_checkpoint(, false) is complex and it may sleep, so we should move it outside segmap_lock spinlock coverage in get_new_segment(). Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-02-29	Merge tag 'nf-24-02-29' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin. Patch #2 fixes an issue with bridge netfilter and broadcast/multicast packets. There is a day 0 bug in br_netfilter when used with connection tracking. Conntrack assumes that an nf_conn structure that is not yet added to hash table ("unconfirmed"), is only visible by the current cpu that is processing the sk_buff. For bridge this isn't true, sk_buff can get cloned in between, and clones can be processed in parallel on different cpu. This patch disables NAT and conntrack helpers for multicast packets. Patch #3 adds a selftest to cover for the br_netfilter bug. netfilter pull request 24-02-29 * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() ==================== Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29	gpio: nomadik: support mobileye,eyeq5-gpio	Théo Lebrun
	We create a custom compatible for the STA2X11 IP block as integrated into the Mobileye EyeQ5 platform. Its wake and alternate functions have been disabled, we want to avoid touching those registers. We both do: (1) early return in functions that do not support the platform, but with warnings, and (2) avoid calling those functions in the first place. We ensure that pinctrl-nomadik is not used with this STA2X11 variant. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Link: https://lore.kernel.org/r/20240228-mbly-gpio-v2-24-3ba757474006@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>