linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-03-08	io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE	Jens Axboe
	We only use the flag for this purpose, so rename it accordingly. This further prevents various other use cases of it, keeping it clean and consistent. Then we can also check it in one spot, when it's being attempted recycled, and remove some dead code in io_kbuf_recycle_ring(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-08	rtc: class: make rtc_class constant	Ricardo B. Marliere
	Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the rtc_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240305-class_cleanup-abelloni-v1-1-944c026137c8@marliere.net Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-03-08	net: dqs: add NIC stall detector based on BQL	Jakub Kicinski
	softnet_data->time_squeeze is sometimes used as a proxy for host overload or indication of scheduling problems. In practice this statistic is very noisy and has hard to grasp units - e.g. is 10 squeezes a second to be expected, or high? Delaying network (NAPI) processing leads to drops on NIC queues but also RTT bloat, impacting pacing and CA decisions. Stalls are a little hard to detect on the Rx side, because there may simply have not been any packets received in given period of time. Packet timestamps help a little bit, but again we don't know if packets are stale because we're not keeping up or because someone (cough cgroups) disabled IRQs for a long time. We can, however, use Tx as a proxy for Rx stalls. Most drivers use combined Rx+Tx NAPIs so if Tx gets starved so will Rx. On the Tx side we know exactly when packets get queued, and completed, so there is no uncertainty. This patch adds stall checks to BQL. Why BQL? Because it's a convenient place to add such checks, already called by most drivers, and it has copious free space in its structures (this patch adds no extra cache references or dirtying to the fast path). The algorithm takes one parameter - max delay AKA stall threshold and increments a counter whenever NAPI got delayed for at least that amount of time. It also records the length of the longest stall. To be precise every time NAPI has not polled for at least stall thrs we check if there were any Tx packets queued between last NAPI run and now - stall_thrs/2. Unlike the classic Tx watchdog this mechanism does not ignore stalls caused by Tx being disabled, or loss of link. I don't think the check is worth the complexity, and stall is a stall, whether due to host overload, flow control, link down... doesn't matter much to the application. We have been running this detector in production at Meta for 2 years, with the threshold of 8ms. It's the lowest value where false positives become rare. There's still a constant stream of reported stalls (especially without the ksoftirqd deferral patches reverted), those who like their stall metrics to be 0 may prefer higher value. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08	Merge branches 'arm/mediatek', 'arm/renesas', 'arm/smmu', 'x86/vt-d', ↵	Joerg Roedel
	'x86/amd' and 'core' into next
2024-03-07	netdev: add per-queue statistics	Jakub Kicinski
	The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: introduce include/net/rps.h	Eric Dumazet
	Move RPS related structures and helpers from include/linux/netdevice.h and include/net/sock.h to a new include file. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move skbuff_cache(s) to net_hotdata	Eric Dumazet
	skbuff_cache, skbuff_fclone_cache and skb_small_head_cache are used in rx/tx fast paths. Move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move dev_rx_weight to net_hotdata	Eric Dumazet
	dev_rx_weight is read from process_backlog(). Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move dev_tx_weight to net_hotdata	Eric Dumazet
	dev_tx_weight is used in tx fast path. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move netdev_max_backlog to net_hotdata	Eric Dumazet
	netdev_max_backlog is used in rx fat path. Move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: move ptype_all into net_hotdata	Eric Dumazet
	ptype_all is used in rx/tx fast paths. Move it to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	net: introduce struct net_hotdata	Eric Dumazet
	Instead of spreading networking critical fields all over the places, add a custom net_hotdata structure so that we can precisely control its layout. In this first patch, move : - gro_normal_batch used in rx (GRO stack) - offload_base used in rx and tx (GRO and TSO stacks) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	Merge tag 'mm-hotfixes-stable-2024-03-07-16-17' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "6 hotfixes. 4 are cc:stable and the remainder pertain to post-6.7 issues or aren't considered to be needed in earlier kernel versions" * tag 'mm-hotfixes-stable-2024-03-07-16-17' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: scripts/gdb/symbols: fix invalid escape sequence warning mailmap: fix Kishon's email init/Kconfig: lower GCC version check for -Warray-bounds mm, mmap: fix vma_merge() case 7 with vma_ops->close mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails mm, vmscan: prevent infinite loop for costly GFP_NOIO \| __GFP_RETRY_MAYFAIL allocations
2024-03-07	bpf: Plumb get_unmapped_area() callback into bpf_map_ops	Alexei Starovoitov
	Subsequent patches introduce bpf_arena that imposes special alignment requirements on address selection. Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20240307031228.42896-4-alexei.starovoitov@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-07	driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link	Saravana Kannan
	A fwnode link between specific supplier-consumer fwnodes can be added multiple times for multiple reasons. If that dependency doesn't exist, deleting the fwnode link once doesn't guarantee that it won't get created again. So, add FWLINK_FLAG_IGNORE flag to mark a fwnode link as one that needs to be completely ignored. Since a fwnode link's flags is an OR of all the flags passed to all the fwnode_link_add() calls to create that specific fwnode link, the FWLINK_FLAG_IGNORE flag is preserved and can be used to mark a fwnode link as on that need to be completely ignored until it is deleted. Signed-off-by: Saravana Kannan <saravanak@google.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240305050458.1400667-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	driver core: Adds flags param to fwnode_link_add()	Saravana Kannan
	Allow the callers to set fwnode link flags when adding fwnode links. Signed-off-by: Saravana Kannan <saravanak@google.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240305050458.1400667-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	device property: Don't use "proxy" headers	Andy Shevchenko
	Update header inclusions to follow IWYU (Include What You Use) principle. Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240301180138.271590-5-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	device property: Move enum dev_dma_attr to fwnode.h	Andy Shevchenko
	The struct fwnode_operations defines one of the callback to return enum dev_dma_attr. But this currently is defined in property.h. Move it to the correct location. Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240301180138.271590-4-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	driver core: Move fw_devlink stuff to where it belongs	Andy Shevchenko
	A few APIs, i.e. fwnode_is_ancestor_of(), fwnode_get_next_parent_dev(), and get_dev_from_fwnode(), that belong specifically to the fw_devlink APIs, may be static, but they are not. Resolve this mess by moving them to the driver/base/core where the all users are being resided and make static. No functional changes intended. Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240301180138.271590-3-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	driver core: Drop unneeded 'extern' keyword in fwnode.h	Andy Shevchenko
	We do not use 'extern' keyword with functions. Remove the last one mistakenly added to fwnode.h. Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: Saravana Kannan <saravanak@google.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240301180138.271590-2-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	firmware_loader: introduce __free() cleanup hanler	Dmitry Torokhov
	Define cleanup handler using facilities from linux/cleanup.h to simplify error handling in code using firmware loader. This will allow writing code like this: int driver_update_firmware(...) { const struct firmware *fw_entry __free(firmware) = NULL; int error; ... error = request_firmware(&fw_entry, fw_name, dev); if (error) { dev_err(dev, "failed to request firmware %s: %d", fw_name, error); return error; } error = check_firmware_valid(fw_entry); if (error) return error; guard(mutex)(&instance->lock); error = use_firmware(instance, fw); if (error) return error; return 0; } Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Luis Chamberalin <mcgrof@kernel.org> Link: https://lore.kernel.org/r/ZaeQw7VXhnirX4pQ@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	uio: introduce UIO_MEM_DMA_COHERENT type	Chris Leech
	Add a UIO memtype specifically for sharing dma_alloc_coherent memory with userspace, backed by dma_mmap_coherent. This is mainly for the bnx2/bnx2x/bnx2i "cnic" interface, although there are a few other uio drivers which map dma_alloc_coherent memory and will be converted to use dma_mmap_coherent as well. Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Chris Leech <cleech@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240205200137.138302-1-cleech@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	cdx: add MSI support for CDX bus	Nipun Gupta
	Add CDX-MSI domain per CDX controller with gic-its domain as a parent, to support MSI for CDX devices. CDX devices allocate MSIs from the CDX domain. Also, introduce APIs to alloc and free IRQs for CDX domain. In CDX subsystem firmware is a controller for all devices and their configuration. CDX bus controller sends all the write_msi_msg commands to firmware running on RPU and the firmware interfaces with actual devices to pass this information to devices Since, CDX controller is the only way to communicate with the Firmware for MSI write info, CDX domain per controller required in contrast to having a CDX domain per device. Co-developed-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Signed-off-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Co-developed-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Link: https://lore.kernel.org/r/20240226082816.100872-1-nipun.gupta@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	greybus: move is_gb_* functions out of greybus.h	Ricardo B. Marliere
	The functions below are only used within the context of drivers/greybus/core.c, so move them all into core and drop their 'inline' specifiers: is_gb_host_device(), is_gb_module(), is_gb_interface(), is_gb_control(), is_gb_bundle() and is_gb_svc(). Suggested-by: Alex Elder <elder@ieee.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20240226-device_cleanup-greybus2-v1-1-5f7d1161e684@marliere.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	phy: tegra: xusb: Add API to retrieve the port number of phy	Wayne Chang
	This patch introduces a new API, tegra_xusb_padctl_get_port_number, to the Tegra XUSB Pad Controller driver. This API is used to identify the USB port that is associated with a given PHY. The function takes a PHY pointer for either a USB2 PHY or USB3 PHY as input and returns the corresponding port number. If the PHY pointer is invalid, it returns -ENODEV. Cc: stable@vger.kernel.org Signed-off-by: Wayne Chang <waynec@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/20240307030328.1487748-2-waynec@nvidia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	dio: make dio_bus_type const	Ricardo B. Marliere
	Now that the driver core can properly handle constant struct bus_type, move the dio_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240212-bus_cleanup-dio-v2-1-3b1ba4c0547d@marliere.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	firmware: xilinx: Add ZynqMP efuse access API	Praveen Teja Kundanala
	Add zynqmp_pm_efuse_access API in the ZynqMP firmware for read/write access of efuse memory. Signed-off-by: Praveen Teja Kundanala <praveen.teja.kundanala@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20240224114516.86365-6-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	slimbus: core: make slimbus_bus const	Ricardo B. Marliere
	Since commit d492cc2573a0 ("driver core: device.h: make struct bus_type a const *"), the driver core can properly handle constant struct bus_type, move the slimbus_bus variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20240224114403.86230-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07	Merge tag 'net-6.8-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, ipsec and netfilter. No solution yet for the stmmac issue mentioned in the last PR, but it proved to be a lockdep false positive, not a blocker. Current release - regressions: - dpll: move all dpll<>netdev helpers to dpll code, fix build regression with old compilers Current release - new code bugs: - page_pool: fix netlink dump stop/resume Previous releases - regressions: - bpf: fix verifier to check bpf_func_state->callback_depth when pruning states as otherwise unsafe programs could get accepted - ipv6: avoid possible UAF in ip6_route_mpath_notify() - ice: reconfig host after changing MSI-X on VF - mlx5: - e-switch, change flow rule destination checking - add a memory barrier to prevent a possible null-ptr-deref - switch to using _bh variant of of spinlock where needed Previous releases - always broken: - netfilter: nf_conntrack_h323: add protection for bmp length out of range - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP program in CPU map which led to random xdp_md fields - xfrm: fix UDP encapsulation in TX packet offload - netrom: fix data-races around sysctls - ice: - fix potential NULL pointer dereference in ice_bridge_setlink() - fix uninitialized dplls mutex usage - igc: avoid returning frame twice in XDP_REDIRECT - i40e: disable NAPI right after disabling irqs when handling xsk_pool - geneve: make sure to pull inner header in geneve_rx() - sparx5: fix use after free inside sparx5_del_mact_entry - dsa: microchip: fix register write order in ksz8_ind_write8() Misc: - selftests: mptcp: fixes for diag.sh" * tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) net: pds_core: Fix possible double free in error handling path netrom: Fix data-races around sysctl_net_busy_read netrom: Fix a data-race around sysctl_netrom_link_fails_count netrom: Fix a data-race around sysctl_netrom_routing_control netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size netrom: Fix a data-race around sysctl_netrom_transport_busy_delay netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries netrom: Fix a data-race around sysctl_netrom_transport_timeout netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser netrom: Fix a data-race around sysctl_netrom_default_path_quality netfilter: nf_conntrack_h323: Add protection for bmp length out of range netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout netfilter: nft_ct: fix l3num expectations with inet pseudo family netfilter: nf_tables: reject constant set with timeout netfilter: nf_tables: disallow anonymous set with timeout flag net/rds: fix WARNING in rds_conn_connect_if_down net: dsa: microchip: fix register write order in ksz8_ind_write8() ...
2024-03-07	Merge tag 'nvme-6.9-2024-03-07' of git://git.infradead.org/nvme into ↵	Jens Axboe
	for-6.9/block Pull NVMe updates from Keith: "nvme updates for Linux 6.9 - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith)" * tag 'nvme-6.9-2024-03-07' of git://git.infradead.org/nvme: (31 commits) nvme: clear caller pointer on identify failure nvme: host: fix double-free of struct nvme_id_ns in ns_update_nuse() nvme: fcloop: make fcloop_class constant nvme: fabrics: make nvmf_class constant nvme: core: constify struct class usage nvme-fabrics: typo in nvmf_parse_key() nvme-multipath: use atomic queue limits API for stacking limits nvme-multipath: pass queue_limits to blk_alloc_disk nvme: use the atomic queue limits update API nvme: cleanup nvme_configure_metadata nvme: don't query identify data in configure_metadata nvme: split out a nvme_identify_ns_nvm helper nvme: move common logic into nvme_update_ns_info nvme: move setting the write cache flags out of nvme_set_queue_limits nvme: move a few things out of nvme_update_disk_info nvme: don't use nvme_update_disk_info for the multipath disk nvme: move blk_integrity_unregister into nvme_init_integrity nvme: cleanup the nvme_init_integrity calling conventions nvme: move max_integrity_segments handling out of nvme_init_integrity nvme: remove nvme_revalidate_zones ...
2024-03-07	spi: Fix types of the last chip select storage variables	Andy Shevchenko
	First of all, last_cs_index_mask should be aligned with the original cs_index_mask, which is 16-bit (for now) wide. Use the same pattern for the last_cs_index_mask. Second, last_cs can be negative and since 'char' is equal to 'unsigned char' in the kernel, it's incorrect, strictly speaking, to assign signed number to it. Use s8 type as it's done for *_native_cs ones. With this change, regroup a bit the ordering to avoid too much memory space to be wasted due to paddings. Shuffle kernel documentation accordignly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://msgid.link/r/20240307150256.3789138-3-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-03-07	net: phylink: clean the pcs_get_state documentation	Maxime Chevallier
	commit 4d72c3bb60dd ("net: phylink: strip out pre-March 2020 legacy code") dropped the mac_pcs_get_state ops in phylink_mac_ops in favor of dedicated PCS operation pcs_get_state. However, the documentation for the pcs_get_state ops was incorrectly converted and now self-references. Drop the extra comment. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07	firmware: cirrus: cs_dsp: Remove non-existent member from kerneldoc	Richard Fitzgerald
	The kerneldoc for struct cs_dsp refers to a fw_file_name member but there's no such member. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://msgid.link/r/20240307105516.40250-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-03-07	Merge tag 'for-next-6.9' of ↵	Christian Brauner
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into vfs.misc Merge case-insensitive updates from Gabriel Krisman Bertazi: - Patch case-insensitive lookup by trying the case-exact comparison first, before falling back to costly utf8 casefolded comparison. - Fix to forbid using a case-insensitive directory as part of an overlayfs mount. - Patchset to ensure d_op are set at d_alloc time for fscrypt and casefold volumes, ensuring filesystem dentries will all have the correct ops, whether they come from a lookup or not. * tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-07	leds: Fix ifdef check for gpio_led_register_device()	Arnd Bergmann
	gpio_led_register_device() is built whenever CONFIG_LEDS_GPIO_REGISTER is enabled, and this may be used even when CONFIG_NEW_LEDS is turned off. However, the stub declaration in the header is provided for all configs without CONFIG_NEW_LEDS, resulting in a build failure: drivers/leds/leds-gpio-register.c:24:1: error: redefinition of 'gpio_led_register_device' 24 \| gpio_led_register_device(int id, const struct gpio_led_platform_data *pdata) \| ^ include/linux/leds.h:646:39: note: previous definition is here Change the #ifdef check to match the definition. Note: this apparently took years of randconfig builds to hit, since a number of other drivers just 'select NEW_LEDS' anyway. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240228093834.2230004-1-arnd@kernel.org Signed-off-by: Lee Jones <lee@kernel.org>
2024-03-07	leds: Make flash and multicolor dependencies unconditional	Arnd Bergmann
	Along the same lines as making devm_led_classdev_register() declared extern unconditional, do the same thing for the two sub-classes that have similar stubs. The users of these interfaces go to great lengths to allow building with both the generic leds API and the extended version, but realistically there is not much use in this, so just simplify it to always rely on it and remove the confusing fallback logic. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240109090715.982332-2-arnd@kernel.org Signed-off-by: Lee Jones <lee@kernel.org>
2024-03-07	leds: Remove led_init_default_state_get() and ↵	Arnd Bergmann
	devm_led_classdev_register_ext() stubs These two functions have stub implementations that are called when NEW_LEDS and/or LEDS_CLASS are disabled, theorerically allowing drivers to optionally use the LED subsystem. However, this has never really worked because a built-in driver is unable to link against these functions if the LED class is in a loadable module. Heiner ran into this problem with a driver that newly gained a LEDS_CLASS dependency and suggested using an IS_REACHABLE() check. This is the reverse approach, removing the stub entirely to acknowledge that it is pointless in its current form, and that not having it avoids misleading developers into thinking that they can rely on it. This survived around 1000 randconfig builds to validate that any callers of the interface already have the correct Kconfig dependency already, with the exception of the one that Heiner just added. Cc: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/linux-leds/0f6f432b-c650-4bb8-a1b5-fe3372804d52@gmail.com/T/#u Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240109090715.982332-1-arnd@kernel.org Signed-off-by: Lee Jones <lee@kernel.org>
2024-03-07	Merge branches 'ib-qcom-leds-6.9' and 'ib-leds-backlight-6.9' into ↵	Lee Jones
	ibs-for-leds-merged
2024-03-07	leds: expresswire: Don't use "proxy" headers	Andy Shevchenko
	Update header inclusions to follow IWYU (Include What You Use) principle. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Duje Mihanović <duje.mihanovic@skole.hr> Link: https://lore.kernel.org/r/20240223203010.881065-1-andriy.shevchenko@linux.intel.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-03-07	leds: Introduce ExpressWire library	Duje Mihanović
	The ExpressWire protocol is shared between at least KTD2692 and KTD2801 with slight differences such as timings and the former not having a defined set of pulses for enabling the protocol (possibly because it does not support PWM unlike KTD2801). Despite these differences the ExpressWire handling code can be shared between the two, so in preparation for adding KTD2801 support introduce a library implementing this protocol. Suggested-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Duje Mihanović <duje.mihanovic@skole.hr> Link: https://lore.kernel.org/r/20240125-ktd2801-v5-1-e22da232a825@skole.hr Signed-off-by: Lee Jones <lee@kernel.org>
2024-03-07	net/mlx5: Enable SD feature	Tariq Toukan
	Have an actual mlx5_sd instance in the core device, and fix the getter accordingly. This allows SD stuff to flow, the feature becomes supported only here. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07	net/mlx5: Add MPIR bit in mcam_access_reg	Tariq Toukan
	Add a cap bit in mcam_access_reg to check for MPIR support. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-07	i2c: remove redundant condition	Hsin-Yu.Chen
	I2C_M_RD is defined as and guaranteed to be 1 and 'flag & I2C_M_RD' is one or zero. No need for an additional condition to obtain the value. Signed-off-by: Hsin-Yu.Chen <harry021633@gmail.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> [wsa: slightly updated commit message] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-03-07	Merge branch kvm-arm64/vfio-normal-nc into kvmarm/next	Oliver Upton
	* kvm-arm64/vfio-normal-nc: : Normal-NC support for vfio-pci @ stage-2, courtesy of Ankit Agrawal : : KVM's policy to date has been that any and all MMIO mapping at stage-2 : is treated as Device-nGnRE. This is primarily done due to concerns of : the guest triggering uncontainable failures in the system if they manage : to tickle the device / memory system the wrong way, though this is : unnecessarily restrictive for devices that can be reasoned as 'safe'. : : Unsurprisingly, the Device-* mapping can really hurt the performance of : assigned devices that can handle Gathering, and can be an outright : correctness issue if the guest driver does unaligned accesses. : : Rather than opening the floodgates to the full ecosystem of devices that : can be exposed to VMs, take the conservative approach and allow PCI : devices to be mapped as Normal-NC since it has been determined to be : 'safe'. vfio: Convey kvm that the vfio-pci device is wc safe KVM: arm64: Set io memory s2 pte as normalnc for vfio pci device mm: Introduce new flag to indicate wc safe KVM: arm64: Introduce new flag for non-cacheable IO memory Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-03-06	bpf: Introduce may_goto instruction	Alexei Starovoitov
	Introduce may_goto instruction that from the verifier pov is similar to open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it doesn't iterate any objects. In assembly 'may_goto' is a nop most of the time until bpf runtime has to terminate the program for whatever reason. In the current implementation may_goto has a hidden counter, but other mechanisms can be used. For programs written in C the later patch introduces 'cond_break' macro that combines 'may_goto' with 'break' statement and has similar semantics: cond_break is a nop until bpf runtime has to break out of this loop. It can be used in any normal "for" or "while" loop, like for (i = zero; i < cnt; cond_break, i++) { The verifier recognizes that may_goto is used in the program, reserves additional 8 bytes of stack, initializes them in subprog prologue, and replaces may_goto instruction with: aux_reg = (u64 )(fp - 40) if aux_reg == 0 goto pc+off aux_reg -= 1 (u64 )(fp - 40) = aux_reg may_goto instruction can be used by LLVM to implement __builtin_memcpy, __builtin_strcmp. may_goto is not a full substitute for bpf_for() macro. bpf_for() doesn't have induction variable that verifiers sees, so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded. But when the code is written as: for (i = 0; i < 100; cond_break, i++) the verifier see 'i' as precise constant zero, hence cond_break (aka may_goto) doesn't help to converge the loop. A static or global variable can be used as a workaround: static int zero = 0; for (i = zero; i < 100; cond_break, i++) // works! may_goto works well with arena pointers that don't need to be bounds checked on access. Load/store from arena returns imprecise unbounded scalar and loops with may_goto pass the verifier. Reserve new opcode BPF_JMP \| BPF_JCOND for may_goto insn. JCOND stands for conditional pseudo jump. Since goto_or_nop insn was proposed, it may use the same opcode. may_goto vs goto_or_nop can be distinguished by src_reg: code = BPF_JMP \| BPF_JCOND src_reg = 0 - may_goto src_reg = 1 - goto_or_nop Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
2024-03-06	mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().	Alexei Starovoitov
	vmap/vmalloc APIs are used to map a set of pages into contiguous kernel virtual space. get_vm_area() with appropriate flag is used to request an area of kernel address range. It's used for vmalloc, vmap, ioremap, xen use cases. - vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag. - the areas created by vmap() function should be tagged with VM_MAP. - ioremap areas are tagged with VM_IOREMAP. BPF would like to extend the vmap API to implement a lazily-populated sparse, yet contiguous kernel virtual space. Introduce VM_SPARSE flag and vm_area_map_pages(area, start_addr, count, pages) API to map a set of pages within a given area. It has the same sanity checks as vmap() does. It also checks that get_vm_area() was created with VM_SPARSE flag which identifies such areas in /proc/vmallocinfo and returns zero pages on read through /proc/kcore. The next commits will introduce bpf_arena which is a sparsely populated shared memory region between bpf program and user space process. It will map privately-managed pages into a sparse vm area with the following steps: // request virtual memory region during bpf prog verification area = get_vm_area(area_size, VM_SPARSE); // on demand vm_area_map_pages(area, kaddr, kend, pages); vm_area_unmap_pages(area, kaddr, kend); // after bpf program is detached and unloaded free_vm_area(area); Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/bpf/20240305030516.41519-3-alexei.starovoitov@gmail.com
2024-03-06	mm/treewide: align up pXd_leaf() retval across archs	Peter Xu
	Even if pXd_leaf() API is defined globally, it's not clear on the retval, and there are three types used (bool, int, unsigned log). Always return a boolean for pXd_leaf() APIs. Link: https://lkml.kernel.org/r/20240305043750.93762-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06	mm: pgtable: add missing pt_index to struct ptdesc	Qi Zheng
	In s390, the page->index field is used for gmap (see gmap_shadow_pgt()), so add the corresponding pt_index to struct ptdesc and add a comment to clarify this. Link: https://lkml.kernel.org/r/283624c2af45fb2090b41a6b1b5481bb0a45bad7.1709541697.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06	mm: pgtable: correct the wrong comment about ptdesc->__page_flags	Qi Zheng
	Patch series "minor fixes and supplement for ptdesc". In this series, the [PATCH 1/3] and [PATCH 2/3] are fixes for some issues discovered during code inspection. The [PATCH 3/3] is a supplement to ptdesc conversion in s390, I don't know why this is not done in the commit 6326c26c1514 ("s390: convert various pgalloc functions to use ptdescs"), maybe I missed something. And since I don't have an s390 environment, I hope kernel test robot can help compile and test, and this is why I did not fold [PATCH 2/3] and [PATCH 3/3] into one patch. This patch (of 3): The commit 32cc0b7c9d50 ("powerpc: add pte_free_defer() for pgtables sharing page") introduced the use of PageActive flag to page table fragments tracking, so the ptdesc->__page_flags is not unused, so correct the wrong comment. Link: https://lkml.kernel.org/r/cover.1709541697.git.zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/cc42d5915fd98fd802f920de243f535efcfe01db.1709541697.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>