summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-11-04Merge tag 'qcom-drivers-fixes-for-6.12' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm driver fixes for v6.12 The Qualcomm EDAC driver's configuration of interrupts is made optional, to avoid violating security constriants on X Elite platform . The SCM drivers' detection mechanism for the presence of SHM bridge in QTEE, is corrected to handle the case where firmware successfully returns that the interface isn't supported. The GLINK driver and the PMIC GLINK interface is updated to handle buffer allocation issues during initialization of the communication channel. Allocation error handling in the socinfo dirver is corrected, and then the fix is corrected. * tag 'qcom-drivers-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: soc: qcom: pmic_glink: Handle GLINK intent allocation rejections rpmsg: glink: Handle rejected intent request better soc: qcom: socinfo: fix revision check in qcom_socinfo_probe() firmware: qcom: scm: Return -EOPNOTSUPP for unsupported SHM bridge enabling EDAC/qcom: Make irq configuration optional firmware: qcom: scm: fix a NULL-pointer dereference firmware: qcom: scm: suppress download mode error soc: qcom: Add check devm_kasprintf() returned value MAINTAINERS: Qualcomm SoC: Match reserved-memory bindings Link: https://lore.kernel.org/r/20241101161455.746290-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-04RDMA/nldev: Add IB device and net device rename eventsChiara Meiohas
Implement event sending for IB device rename and IB device port associated netdevice rename. In iproute2, rdma monitor displays the IB device name, port and the netdevice name when displaying event info. Since users can modiy these names, we track and notify on renaming events. Note: In order to receive netdevice rename events, drivers must use the ib_device_set_netdev() API when attaching net devices to IB devices. $ rdma monitor $ rmmod mlx5_ib [UNREGISTER] dev 1 rocep8s0f1 [UNREGISTER] dev 0 rocep8s0f0 $ modprobe mlx5_ib [REGISTER] dev 2 mlx5_0 [NETDEV_ATTACH] dev 2 mlx5_0 port 1 netdev 4 eth2 [REGISTER] dev 3 mlx5_1 [NETDEV_ATTACH] dev 3 mlx5_1 port 1 netdev 5 eth3 [RENAME] dev 2 rocep8s0f0 [RENAME] dev 3 rocep8s0f1 $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev [UNREGISTER] dev 2 rocep8s0f0 [REGISTER] dev 4 mlx5_0 [NETDEV_ATTACH] dev 4 mlx5_0 port 30 netdev 4 eth2 [RENAME] dev 4 rdmap8s0f0 $ echo 4 > /sys/class/net/eth2/device/sriov_numvfs [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 2 netdev 7 eth4 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 3 netdev 8 eth5 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 4 netdev 9 eth6 [NETDEV_ATTACH] dev 4 rdmap8s0f0 port 5 netdev 10 eth7 [REGISTER] dev 5 mlx5_0 [NETDEV_ATTACH] dev 5 mlx5_0 port 1 netdev 11 eth8 [REGISTER] dev 6 mlx5_1 [NETDEV_ATTACH] dev 6 mlx5_1 port 1 netdev 12 eth9 [RENAME] dev 5 rocep8s0f0v0 [RENAME] dev 6 rocep8s0f0v1 [REGISTER] dev 7 mlx5_0 [NETDEV_ATTACH] dev 7 mlx5_0 port 1 netdev 13 eth10 [RENAME] dev 7 rocep8s0f0v2 [REGISTER] dev 8 mlx5_0 [NETDEV_ATTACH] dev 8 mlx5_0 port 1 netdev 14 eth11 [RENAME] dev 8 rocep8s0f0v3 $ ip link set eth2 name myeth2 [NETDEV_RENAME] netdev 4 myeth2 $ ip link set eth1 name myeth1 ** no events received, because eth1 is not attached to an IB device ** Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/093c978ef2766fd3ab4ff8798eeb68f2f11582f6.1730367038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Move ib_uverbs_file struct to uverbs_types.hPatrisious Haddad
In light of the previous commit, make the ib_uverbs_file accessible to drivers by moving its definition to uverbs_types.h, to allow drivers to freely access the struct argument and create a personalized cleanup flow. For the same reason expose uverbs_try_lock_object function to allow driver to safely access the uverbs objects. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/29b718e0dca35daa5f496320a39284fc1f5a1722.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Add device ufile cleanup operationPatrisious Haddad
Add a driver operation to allow preemptive cleanup of ufile HW resources before the standard ufile cleanup flow begins. Thus, expediting the final cleanup phase which leads to fast teardown overall. This allows the use of driver specific clean up procedures to make the cleanup process more efficient. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://patch.msgid.link/cabe00d75132b5732cb515944e3c500a01fb0b4a.1730373303.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/core: Implement RoCE GID port rescan and export delete functionChiara Meiohas
rdma_roce_rescan_port() scans all network devices in the system and adds the gids if relevant to the RoCE device port. When not in bonding mode it adds the GIDs of the netdevice in this port. When in bonding mode it adds the GIDs of both the port's netdevice and the bond master netdevice. Export roce_del_all_netdev_gids(), which removes all GIDs associated with a specific netdevice for a given port. Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Link: https://patch.msgid.link/674d498da4637a1503ff1367e28bd09ff942fd5e.1730381292.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04RDMA/mlx5: Support OOO RX WQE consumptionEdward Srouji
Support QP with out-of-order (OOO) capabilities enabled. This allows WRs on the receiver side of the QP to be consumed OOO, permitting the sender side to transmit messages without guaranteeing arrival order on the receiver side. When enabled, the completion ordering of WRs remains in-order, regardless of the Receive WRs consumption order. RDMA Read and RDMA Atomic operations on the responder side continue to be executed in-order, while the ordering of data placement for RDMA Write and Send operations is not guaranteed. Atomic operations larger than 8 bytes are currently not supported. Therefore, when this feature is enabled, the created QP restricts its atomic support to 8 bytes at most. In addition, when querying the device, a new flag is returned in response to indicate that the Kernel supports OOO QP. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/06ac609a5f358c8fb0a090d22c61a2f9329d82e6.1725362773.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04Introduce mlx5 data direct placement (DDP)Leon Romanovsky
This feature allows WRs on the receiver side of the QP to be consumed out of order, permitting the sender side to transmit messages without guaranteeing arrival order on the receiver side. When enabled, the completion ordering of WRs remains in-order, regardless of the Receive WRs consumption order. RDMA Read and RDMA Atomic operations on the responder side continue to be executed in-order, while the ordering of data placement for RDMA Write and Send operations is not guaranteed. Signed-off-by: Leon Romanovsky <leon@kernel.org> * mlx5-next: net/mlx5: Introduce data placement ordering bits
2024-11-04net: enetc: add initial netc-blk-ctrl driver supportWei Fang
The netc-blk-ctrl driver is used to configure Integrated Endpoint Register Block (IERB) and Privileged Register Block (PRB) of NETC. For i.MX platforms, it is also used to configure the NETCMIX block. The IERB contains registers that are used for pre-boot initialization, debug, and non-customer configuration. The PRB controls global reset and global error handling for NETC. The NETCMIX block is mainly used to set MII protocol and PCS protocol of the links, it also contains settings for some other functions. Note the IERB configuration registers can only be written after being unlocked by PRB, otherwise, all write operations are inhibited. A warm reset is performed when the IERB is unlocked, and it results in an FLR to all NETC devices. Therefore, all NETC device drivers must be probed or initialized after the warm reset is finished. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-04net/mlx5: Introduce data placement ordering bitsEdward Srouji
Introduce out-of-order (OOO) data placement (DP) IFC related bits to support OOO DP QP. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/f30e5cbb5459fd02f27f35909bb545cab346b58b.1725362773.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-11-04Merge tag 'v6.12-rc6' of ↵Bartosz Golaszewski
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into gpio/for-next Linux 6.12-rc6
2024-11-04Backmerge v6.12-rc6 of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Backmerge Linus tree for some drm-fixes needed for msm and xe merges. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-11-03soc: qcom: llcc: add support for SAR2130P and SAR1130PDmitry Baryshkov
Implement necessary support for the LLCC control on the SAR1130P and SAR2130P platforms. These two platforms use different ATTR1_MAX_CAP shift and also require manual override for num_banks. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20241026-sar2130p-llcc-v3-3-2a58fa1b4d12@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-11-03Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-10-31 We've added 13 non-merge commits during the last 16 day(s) which contain a total of 16 files changed, 710 insertions(+), 668 deletions(-). The main changes are: 1) Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it, from Puranjay Mohan. 2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest into test_progs so that it can be run in BPF CI, from Alexis Lothoré. 3) Two BPF sockmap selftest fixes, from Zijian Zhang. 4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check, from Vincent Li. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Add a selftest for bpf_csum_diff() selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier bpf: bpf_csum_diff: Optimize and homogenize for all archs net: checksum: Move from32to16() to generic header selftests/bpf: remove xdp_synproxy IP_DF check selftests/bpf: remove test_tcp_check_syncookie selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress selftests/bpf: get rid of global vars in btf_skc_cls_ingress selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress selftests/bpf: factorize conn and syncookies tests in a single runner selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap selftests/bpf: Fix msg_verify_data in test_sockmap ==================== Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03dim: pass dim_sample to net_dim() by referenceCaleb Sander Mateos
net_dim() is currently passed a struct dim_sample argument by value. struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64 passes it on the stack. All callers have already initialized dim_sample on the stack, so passing it by value requires pushing a duplicated copy to the stack. Either witing to the stack and immediately reading it, or perhaps dereferencing addresses relative to the stack pointer in a chain of push instructions, seems to perform quite poorly. In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time, 94% of which is attributed to the first push instruction to copy dim_sample on the stack for the call to net_dim(): // Call ktime_get() 0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47> // Pass the address of struct dim in %rdi |4ead7: lea 0x3d0(%rbx),%rdi // Set dim_sample.pkt_ctr |4eade: mov %r13d,0x8(%rsp) // Set dim_sample.byte_ctr |4eae3: mov %r12d,0xc(%rsp) // Set dim_sample.event_ctr 0.15 |4eae8: mov %bp,0x10(%rsp) // Duplicate dim_sample on the stack 94.16 |4eaed: push 0x10(%rsp) 2.79 |4eaf1: push 0x10(%rsp) 0.07 |4eaf5: push %rax // Call net_dim() 0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b> To allow the caller to reuse the struct dim_sample already on the stack, pass the struct dim_sample by reference to net_dim(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03dim: make dim_calc_stats() inputs const pointersCaleb Sander Mateos
Make the start and end arguments to dim_calc_stats() const pointers to clarify that the function does not modify their values. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com> Link: https://patch.msgid.link/20241031002326.3426181-1-csander@purestorage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03iio: events: make IIO_EVENT_CODE macro privateDavid Lechner
Make IIO_EVENT_CODE "private" by adding a leading underscore. There are no more users of this macro in the kernel so we can make it "private" and encourage developers to use the specialized versions of the macro instead. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20241101-iio-fix-event-macro-use-v1-3-0000c5d09f6d@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: events.h: add event identifier macros for differential channelJulien Stephan
Currently, there are 3 helper macros in iio/events.h to create event identifiers: - IIO_EVENT_CODE : create generic event identifier for differential and non differential channels - IIO_MOD_EVENT_CODE : create event identifier for modified (non differential) channels - IIO_UNMOD_EVENT_CODE : create event identifier for unmodified (non differential) channels For differential channels, drivers are expected to use IIO_EVENT_CODE. However, only one driver in drivers/iio currently uses it correctly, leading to inconsistent event identifiers for differential channels that don’t match the intended attributes (such as max1363.c that supports differential channels, but only uses IIO_UNMOD_EVENT_CODE). To prevent such issues in future drivers, a new helper macro, IIO_DIFF_EVENT_CODE, is introduced to specifically create event identifiers for differential channels. Only one helper is needed for differential channels since they cannot have modifiers. Additionally, the descriptions for IIO_MOD_EVENT_CODE and IIO_UNMOD_EVENT_CODE have been updated to clarify that they are intended for non-differential channels, Signed-off-by: Julien Stephan <jstephan@baylibre.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20241028-iio-add-macro-for-even-identifier-for-differential-channels-v1-1-b452c90f7ea6@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: fix write_event_config signatureJulien Stephan
write_event_config callback use an int for state, but it is actually a boolean. iio_ev_state_store is actually using kstrtobool to check user input, then gives the converted boolean value to write_event_config. Fix signature and update all iio drivers to use the new signature. This patch has been partially written using coccinelle with the following script: $ cat iio-bool.cocci // Options: --all-includes virtual patch @c1@ identifier iioinfo; identifier wecfunc; @@ static const struct iio_info iioinfo = { ..., .write_event_config = ( wecfunc | &wecfunc ), ..., }; @@ identifier c1.wecfunc; identifier indio_dev, chan, type, dir, state; @@ int wecfunc(struct iio_dev *indio_dev, const struct iio_chan_spec *chan, enum iio_event_type type, enum iio_event_direction dir, -int +bool state) { ... } make coccicheck MODE=patch COCCI=iio-bool.cocci M=drivers/iio Unfortunately, this script didn't match all files: * all write_event_config callbacks using iio_device_claim_direct_scoped were not detected and not patched. * all files that do not assign and declare the write_event_config callback in the same file. iio.h was also manually updated. The patch was build tested using allmodconfig config. cc: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Julien Stephan <jstephan@baylibre.com> Link: https://patch.msgid.link/20241031-iio-fix-write-event-config-signature-v2-7-2bcacbb517a2@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: Add channel type for attentionRicardo Ribalda
Add a new channel type representing if the user's attention state to the the system. This usually means if the user is looking at the screen or not. Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://patch.msgid.link/20241101-hpd-v3-3-e9c80b7c7164@chromium.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: hid-sensors: Add proximity and attention IDsRicardo Ribalda
The HID Usage Table at https://usb.org/sites/default/files/hut1_5.pdf reserves: - 0x4b2 for Human Proximity Range Distance between a human and the computer. Default unit of measure is meters; https://www.usb.org/sites/default/files/hutrr39b_0.pdf - 0x4bd for Human Attention Detected Human-Presence sensors detect the presence of humans in the sensor’s field-of-view using diverse and evolving technologies. Some presence sensors are implemented with low resolution video cameras, which can additionally track a subject’s attention (i.e. if the user is ‘looking’ at the system with the integrated sensor). A Human-Presence sensor, providing a Host with the user’s attention state, allows the Host to optimize its behavior. For example, to brighten/dim the system display, based on the user’s attention to the system (potentially prolonging battery life). Default unit is true/false; https://www.usb.org/sites/default/files/hutrr107-humanpresenceattention_1.pdf Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://patch.msgid.link/20241101-hpd-v3-1-e9c80b7c7164@chromium.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03iio: Mark iio_dev::priv member with __privateAndy Shevchenko
The member is not supposed to be accessed directly, mark it with __private to catch the misuses up. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20241101105342.3645018-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-03Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM. The usual collection of singletons - please see the changelogs" * tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes mm: shrinker: avoid memleak in alloc_shrinker_info .mailmap: update e-mail address for Eugen Hristev vmscan,migrate: fix page count imbalance on node stats when demoting pages mailmap: update Jarkko's email addresses mm: allow set/clear page_type again nilfs2: fix potential deadlock with newly created symlinks Squashfs: fix variable overflow in squashfs_readpage_block kasan: remove vmalloc_percpu test tools/mm: -Werror fixes in page-types/slabinfo mm, swap: avoid over reclaim of full clusters mm: fix PSWPIN counter for large folios swap-in mm: avoid VM_BUG_ON when try to map an anon large folio to zero page. mm/codetag: fix null pointer check logic for ref and tag mm/gup: stop leaking pinned pages in low memory conditions
2024-11-03pwm: core: export pwm_get_state_hw()David Lechner
Export the pwm_get_state_hw() function. This is useful in cases where we want to know what the hardware is actually doing, rather than what what we requested it should do. Locking had to be rearranged to ensure that the chip is still operational before trying to access ops now that this can be called from outside the pwm core. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20241029-pwm-export-pwm_get_state_hw-v2-1-03ba063a3230@baylibre.com [ukleinek: Add dummy for !CONFIG_PWM] Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2024-11-03net/tcp: Add missing lockdep annotations for TCP-AO hlist traversalsDmitry Safonov
Under CONFIG_PROVE_RCU_LIST + CONFIG_RCU_EXPERT hlist_for_each_entry_rcu() provides very helpful splats, which help to find possible issues. I missed CONFIG_RCU_EXPERT=y in my testing config the same as described in a3e4bf7f9675 ("configs/debug: make sure PROVE_RCU_LIST=y takes effect"). The fix itself is trivial: add the very same lockdep annotations as were used to dereference ao_info from the socket. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20241028152645.35a8be66@kernel.org/ Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20241030-tcp-ao-hlist-lockdep-annotate-v1-1-bf641a64d7c6@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ethtool: Avoid thousands of -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Change the type of the middle struct member currently causing trouble from `struct ethtool_link_settings` to `struct ethtool_link_settings_hdr`. Additionally, update the type of some variables in various functions that don't access the flexible-array member, changing them to the newly created `struct ethtool_link_settings_hdr`. These changes are needed because the type of the conflicting middle members changed. So, those instances that expect the type to be `struct ethtool_link_settings` should be adjusted to the newly created type `struct ethtool_link_settings_hdr`. Also, adjust variable declarations to follow the reverse xmas tree convention. Fix 3338 of the following -Wflex-array-member-not-at-end warnings: include/linux/ethtool.h:214:38: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/0bc2809fe2a6c11dd4c8a9a10d9bd65cccdb559b.1730238285.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03UAPI: ethtool: Use __struct_group() in struct ethtool_link_settingsGustavo A. R. Silva
Use the `__struct_group()` helper to create a new tagged `struct ethtool_link_settings_hdr`. This structure groups together all the members of the flexible `struct ethtool_link_settings` except the flexible array. As a result, the array is effectively separated from the rest of the members without modifying the memory layout of the flexible structure. This new tagged struct will be used to fix problematic declarations of middle-flex-arrays in composite structs[1]. [1] https://git.kernel.org/linus/d88cabfd9abc Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/9e9fb0bd72e5ba1e916acbb4995b1e358b86a689.1730238285.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()Yu Zhao
When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton@google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: James Houghton <jthoughton@google.com> Reported-by: David Stevens <stevensd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-03mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL statsYu Zhao
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google.com/ This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: James Houghton <jthoughton@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: David Stevens <stevensd@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-03Merge tag 'input-for-v6.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - a fix for regression in input core introduced in 6.11 preventing re-registering input handlers - a fix for adp5588-keys driver tyring to disable interrupt 0 at suspend when devices is used without interrupt - a fix for edt-ft5x06 to stop leaking regmap structure when probing fails and to make sure it is not released too early on removal. * tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: fix regression when re-registering input handlers Input: adp5588-keys - do not try to disable interrupt 0 Input: edt-ft5x06 - fix regmap leak when probe fails
2024-11-03Merge tag 'timers-urgent-2024-11-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for posix CPU timers. When a thread is cloned, the posix CPU timers are not inherited. If the parent has a CPU timer armed the corresponding tick dependency in the tasks tick_dep_mask is set and copied to the new thread, which means the new thread and all decendants will prevent the system to go into full NOHZ operation. Clear the tick dependency mask in copy_process() to fix this" * tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
2024-11-03compiler_types: Add noinline_for_tracing annotationYafang Shao
Kernel functions that are not inlined can be easily hooked with BPF for tracing. However, functions intended for tracing may still be inlined unexpectedly. For example, in our case, after upgrading the compiler from GCC 9 to GCC 11, the tcp_drop_reason() function was inlined, which broke our monitoring tools. To prevent this, we need to ensure that the function remains non-inlined. The noinline_for_tracing annotation is introduced as a general solution for preventing inlining of kernel functions that need to be traced. This approach avoids the need for adding individual noinline comments to each function and provides a more consistent way to maintain traceability. Link: https://lore.kernel.org/netdev/CANn89iKvr44ipuRYFaPTpzwz=B_+pgA94jsggQ946mjwreV6Aw@mail.gmail.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://patch.msgid.link/20241024093742.87681-2-laoar.shao@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03dpll: add clock quality level attribute and opJiri Pirko
In order to allow driver expose quality level of the clock it is running, introduce a new netlink attr with enum to carry it to the userspace. Also, introduce an op the dpll netlink code calls into the driver to obtain the value. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20241030081157.966604-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03dt-bindings: clock: renesas,r9a08g045-vbattb: Document VBATTBClaudiu Beznea
The VBATTB IP of the Renesas RZ/G3S SoC controls the clock for RTC, the tamper detector and a small general usage memory of 128B. The VBATTB controller controls the clock for the RTC on the Renesas RZ/G3S. The HW block diagram for the clock logic is as follows: +----------+ XC `\ RTXIN --->| |----->| \ +----+ VBATTCLK | 32K clock| | |----->|gate|-----------> | osc | XBYP | | +----+ RTXOUT --->| |----->| / +----------+ ,/ One could connect as input to this HW block either a crystal or an external clock device. This is board specific. After discussions w/ Stephen Boyd the clock tree associated with this hardware block was exported in Linux as: input-xtal xbyp xc mux vbattclk where: - input-xtal is the input clock (connected to RTXIN, RTXOUT pins) - xc, xbyp are mux inputs - mux is the internal mux - vbattclk is the gate clock that feeds in the end the RTC to allow selecting the input of the MUX though assigned-clock DT properties, using the already existing clock drivers and avoid adding other DT properties. This allows select the input of the mux based on the type of the connected input clock: - if the 32768 crystal is connected as input for the VBATTB, the input of the mux should be xc - if an external clock device is connected as input for the VBATTB the input of the mux should be xbyp Add bindings for the VBATTB controller. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://lore.kernel.org/20241101095720.2247815-2-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2024-11-03deal with the last remaing boolean uses of fd_file()Al Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03introduce "fd_pos" class, convert fdget_pos() users to it.Al Viro
fdget_pos() for constructor, fdput_pos() for cleanup, all users of fd..._pos() converted trivially. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03switch netlink_getsockbyfilp() to taking descriptorAl Viro
the only call site (in do_mq_notify()) obtains the argument from an immediately preceding fdget() and it is immediately followed by fdput(); might as well just replace it with a variant that would take a descriptor instead of struct file * and have file lookups handled inside that function. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03net/socket.c: switch to CLASS(fd)Al Viro
The important part in sockfd_lookup_light() is avoiding needless file refcount operations, not the marginal reduction of the register pressure from not keeping a struct file pointer in the caller. Switch to use fdget()/fdpu(); with sane use of CLASS(fd) we can get a better code generation... Would be nice if somebody tested it on networking test suites (including benchmarks)... sockfd_lookup_light() does fdget(), uses sock_from_file() to get the associated socket and returns the struct socket reference to the caller, along with "do we need to fput()" flag. No matching fdput(), the caller does its equivalent manually, using the fact that sock->file points to the struct file the socket has come from. Get rid of that - have the callers do fdget()/fdput() and use sock_from_file() directly. That kills sockfd_lookup_light() and fput_light() (no users left). What's more, we can get rid of explicit fdget()/fdput() by switching to CLASS(fd, ...) - code generation does not suffer, since now fdput() inserted on "descriptor is not opened" failure exit is recognized to be a no-op by compiler. [folded a fix for braino in do_recvmmsg() caught by Simon Horman] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-02Input: fix regression when re-registering input handlersDmitry Torokhov
Commit d469647bafd9 ("Input: simplify event handling logic") introduced code that would set handler->events() method to either input_handler_events_filter() or input_handler_events_default() or input_handler_events_null(), depending on the kind of input handler (a filter or a regular one) we are dealing with. Unfortunately this breaks cases when we try to re-register the same filter (as is the case with sysrq handler): after initial registration the handler will have 2 event handling methods defined, and will run afoul of the check in input_handler_check_methods(): input: input_handler_check_methods: only one event processing method can be defined (sysrq) sysrq: Failed to register input handler, error -22 Fix this by adding handle_events() method to input_handle structure and setting it up when registering a new input handle according to event handling methods defined in associated input_handler structure, thus avoiding modifying the input_handler structure. Reported-by: "Ned T. Crigler" <crigler@gmail.com> Reported-by: Christian Heusel <christian@heusel.eu> Tested-by: "Ned T. Crigler" <crigler@gmail.com> Tested-by: Peter Seiderer <ps.report@gmx.net> Fixes: d469647bafd9 ("Input: simplify event handling logic") Link: https://lore.kernel.org/r/Zx2iQp6csn42PJA7@xavtug Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-11-02io_uring: add support for hybrid IOPOLLhexue
A new hybrid poll is implemented on the io_uring layer. Once an IO is issued, it will not poll immediately, but rather block first and re-run before IO complete, then poll to reap IO. While this poll method could be a suboptimal solution when running on a single thread, it offers performance lower than regular polling but higher than IRQ, and CPU utilization is also lower than polling. To use hybrid polling, the ring must be setup with both the IORING_SETUP_IOPOLL and IORING_SETUP_HYBRID)IOPOLL flags set. Hybrid polling has the same restrictions as IOPOLL, in that commands must explicitly support it. Signed-off-by: hexue <xue01.he@samsung.com> Link: https://lore.kernel.org/r/20241101091957.564220-2-xue01.he@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02io_uring/rsrc: allow cloning with node replacementsJens Axboe
Currently cloning a buffer table will fail if the destination already has a table. But it should be possible to use it to replace existing elements. Add a IORING_REGISTER_DST_REPLACE cloning flag, which if set, will allow the destination to already having a buffer table. If that is the case, then entries designated by offset + nr buffers will be replaced if they already exist. Note that it's allowed to use IORING_REGISTER_DST_REPLACE and not have an existing table, in which case it'll work just like not having the flag set and an empty table - it'll just assign the newly created table for that case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02io_uring/rsrc: allow cloning at an offsetJens Axboe
Right now buffer cloning is an all-or-nothing kind of thing - either the whole table is cloned from a source to a destination ring, or nothing at all. However, it's not always desired to clone the whole thing. Allow for the application to specify a source and destination offset, and a number of buffers to clone. If the destination offset is non-zero, then allocate sparse nodes upfront. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02io_uring/rsrc: unify file and buffer resource tablesJens Axboe
For files, there's nr_user_files/file_table/file_data, and buffers have nr_user_bufs/user_bufs/buf_data. There's no reason why file_table and file_data can't be the same thing, and ditto for the buffer side. That gets rid of more io_ring_ctx state that's in two spots rather than just being in one spot, as it should be. Put all the registered file data in one locations, and ditto on the buffer front. This also avoids having both io_rsrc_data->nodes being an allocated array, and ->user_bufs[] or ->file_table.nodes. There's no reason to have this information duplicated. Keep it in one spot, io_rsrc_data, along with how many resources are available. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02io_uring/rsrc: get rid of io_rsrc_node allocation cacheJens Axboe
It's not going to be needed in the fast path going forward, so kill it off. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02io_uring/rsrc: get rid of per-ring io_rsrc_node listJens Axboe
Work in progress, but get rid of the per-ring serialization of resource nodes, like registered buffers and files. Main issue here is that one node can otherwise hold up a bunch of other nodes from getting freed, which is especially a problem for file resource nodes and networked workloads where some descriptors may not see activity in a long time. As an example, instantiate an io_uring ring fd and create a sparse registered file table. Even 2 will do. Then create a socket and register it as fixed file 0, F0. The number of open files in the app is now 5, with 0/1/2 being the usual stdin/out/err, 3 being the ring fd, and 4 being the socket. Register this socket (eg "the listener") in slot 0 of the registered file table. Now add an operation on the socket that uses slot 0. Finally, loop N times, where each loop creates a new socket, registers said socket as a file, then unregisters the socket, and finally closes the socket. This is roughly similar to what a basic accept loop would look like. At the end of this loop, it's not unreasonable to expect that there would still be 5 open files. Each socket created and registered in the loop is also unregistered and closed. But since the listener socket registered first still has references to its resource node due to still being active, each subsequent socket unregistration is stuck behind it for reclaim. Hence 5 + N files are still open at that point, where N is awaiting the final put held up by the listener socket. Rewrite the io_rsrc_node handling to NOT rely on serialization. Struct io_kiocb now gets explicit resource nodes assigned, with each holding a reference to the parent node. A parent node is either of type FILE or BUFFER, which are the two types of nodes that exist. A request can have two nodes assigned, if it's using both registered files and buffers. Since request issue and task_work completion is both under the ring private lock, no atomics are needed to handle these references. It's a simple unlocked inc/dec. As before, the registered buffer or file table each hold a reference as well to the registered nodes. Final put of the node will remove the node and free the underlying resource, eg unmap the buffer or put the file. Outside of removing the stall in resource reclaim described above, it has the following advantages: 1) It's a lot simpler than the previous scheme, and easier to follow. No need to specific quiesce handling anymore. 2) There are no resource node allocations in the fast path, all of that happens at resource registration time. 3) The structs related to resource handling can all get simplified quite a bit, like io_rsrc_node and io_rsrc_data. io_rsrc_put can go away completely. 4) Handling of resource tags is much simpler, and doesn't require persistent storage as it can simply get assigned up front at registration time. Just copy them in one-by-one at registration time and assign to the resource node. The only real downside is that a request is now explicitly limited to pinning 2 resources, one file and one buffer, where before just assigning a resource node to a request would pin all of them. The upside is that it's easier to follow now, as an individual resource is explicitly referenced and assigned to the request. With this in place, the above mentioned example will be using exactly 5 files at the end of the loop, not N. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02i3c: Document I3C_ADDR_SLOT_EXT_STATUS_MASKAlexandre Belloni
As the mask is part of the enum, document it. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20241102132841.2446176-1-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-11-02vdso: Rename struct arch_vdso_data to arch_vdso_time_dataNam Cao
The struct arch_vdso_data is only about vdso time data. So rename it to arch_vdso_time_data to make it obvious. Non time-related data will be migrated out of these structs soon. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-28-b64f0842d512@linutronix.de
2024-11-02crypto: hisilicon/qm - disable same error report before resettingWeili Qian
If an error indicating that the device needs to be reset is reported, disable the error reporting before device reset is complete, enable the error reporting after the reset is complete to prevent the same error from being reported repeatedly. Fixes: eaebf4c3b103 ("crypto: hisilicon - Unify hardware error init/uninit into QM") Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-02crypto: hisilicon - support querying the capability registerQi Tao
Query the capability register status of accelerator devices (SEC, HPRE and ZIP) through the debugfs interface, for example: cat cap_regs. The purpose is to improve the robustness and locability of hardware devices and drivers. Signed-off-by: Qi Tao <taoqi10@huawei.com> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-02crypto: asymmetric_keys - Remove unused functionsDr. David Alan Gilbert
encrypt_blob(), decrypt_blob() and create_signature() were some of the functions added in 2018 by commit 5a30771832aa ("KEYS: Provide missing asymmetric key subops for new key type ops [ver #2]") however, they've not been used. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-02timekeeping: Remove CONFIG_DEBUG_TIMEKEEPINGThomas Gleixner
Since 135225a363ae timekeeping_cycles_to_ns() handles large offsets which would lead to 64bit multiplication overflows correctly. It's also protected against negative motion of the clocksource unconditionally, which was exclusive to x86 before. timekeeping_advance() handles large offsets already correctly. That means the value of CONFIG_DEBUG_TIMEKEEPING which analyzed these cases is very close to zero. Remove all of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241031120328.536010148@linutronix.de