linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-03-07	thunderbolt: Prevent use-after-free in resume from hibernate	Mika Westerberg
	Kenneth noticed that his laptop crashes randomly when resuming from hibernate if there is device connected and display tunneled. I was able to reproduce this as well with the following steps: 1. Boot the system up, nothing connected. 2. Connect Thunderbolt 4 dock to the host. 3. Connect monitor to the Thunderbolt 4 dock. 4. Verify that there is picture on the screen. 5. Enter hibernate. 6. Exit hibernate. 7. Wait for the system to resume. Expectation: System resumes just fine, the connected monitor still shows screen. Actual result: There is crash during resume, screen is blank. What happens is that during resume from hibernate we tear down any existing tunnels created by the boot kernel and this ends up calling tb_dp_dprx_stop() which calls tb_tunnel_put() dropping the reference count to zero even though we never called tb_dp_dprx_start() for it (we never do that for discovery). This makes the discovered DP tunnel memory to be released and any access after that causes use-after-free and possible crash. Fix this so that we only stop DPRX flow if it has been started in the first place. Reported-by: Kenneth Crudup <kenny@panix.com> Closes: https://lore.kernel.org/linux-usb/8e175721-806f-45d6-892a-bd3356af80c9@panix.com/ Cc: stable@vger.kernel.org Fixes: d6d458d42e1e ("thunderbolt: Handle DisplayPort tunnel activation asynchronously") Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2025-03-07	wifi: cfg80211: cancel wiphy_work before freeing wiphy	Miri Korenblit
	A wiphy_work can be queued from the moment the wiphy is allocated and initialized (i.e. wiphy_new_nm). When a wiphy_work is queued, the rdev::wiphy_work is getting queued. If wiphy_free is called before the rdev::wiphy_work had a chance to run, the wiphy memory will be freed, and then when it eventally gets to run it'll use invalid memory. Fix this by canceling the work before freeing the wiphy. Fixes: a3ee4dc84c4e ("wifi: cfg80211: add a work abstraction with special semantics") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250306123626.efd1d19f6e07.I48229f96f4067ef73f5b87302335e2fd750136c9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: mac80211: fix SA Query processing in MLO	Johannes Berg
	When MLO is used and SA Query processing isn't done by userspace (e.g. wpa_supplicant w/o CONFIG_OCV), then the mac80211 code kicks in but uses the wrong addresses. Fix them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250306123626.bab48bb49061.I9391b22f1360d20ac8c4e92604de23f27696ba8f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: nl80211: fix assoc link handling	Johannes Berg
	The refactoring of the assoc link handling in order to support multi-link reconfiguration broke the setting of the assoc link ID, and thus resulted in the wrong BSS "use_for" value being selected. Fix that for both association and ML reconfiguration. Fixes: 720fa448f5a7 ("wifi: nl80211: Split the links handling of an association request") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250306123626.7b233d769c32.I62fd04a8667dd55cedb9a1c0414cc92dd098da75@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: mac80211: don't queue sdata::work for a non-running sdata	Miri Korenblit
	The worker really shouldn't be queued for a non-running interface. Also, if ieee80211_setup_sdata is called between queueing and executing the wk, it will be initialized, which will corrupt wiphy_work_list. Fixes: f8891461a277 ("mac80211: do not start any work during reconfigure flow") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250306123626.1e02caf82640.I4949e71ed56e7186ed4968fa9ddff477473fa2f4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: mac80211: flush the station before moving it to UN-AUTHORIZED state	Emmanuel Grumbach
	We first want to flush the station to make sure we no longer have any frames being Tx by the station before the station is moved to un-authorized state. Failing to do that will lead to races: a frame may be sent after the station's state has been changed. Since the API clearly states that the driver can't fail the sta_state() transition down the list of state, we can easily flush the station first, and only then call the driver's sta_state(). Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250306123626.450bc40e8b04.I636ba96843c77f13309c15c9fd6eb0c5a52a7976@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: iwlwifi: trans: cancel restart work on op mode leave	Miri Korenblit
	If the restart work happens to run after the opmode left (i.e. called iwl_trans_op_mode_leave), then the opmode memory (including its mutex) is likely to be freed already, and trans->opmode is NULL. Although the hw is stopped in that stage, which means that this restart got aborted (i.e. STATUS_RESET_PENDING will be cleared), it still can access trans->opmode (NULL pointer dereference) or the opmodes memory (which is freed). Fix this by canceling the restart wk in iwl_trans_op_mode_leave. Also make sure that the restart wk is really aborted. Fixes: 7391b2a4f7db ("wifi: iwlwifi: rework firmware error handling") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250306122425.801301ba1b8b.I6f6143f550b6335b699920c5d4b2b78449607a96@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: iwlwifi: mvm: fix PNVM timeout for non-MSI-X platforms	Emmanuel Grumbach
	When MSI-X is not enabled, we mask all the interrupts in the interrupt handler and re-enable them when the interrupt thread runs. If STATUS_INT_ENABLED is not set, we won't re-enable in the thread. In order to get the ALIVE interrupt, we allow the ALIVE interrupt itself, and RX as well in order to receive the ALIVE notification (which is received as an RX from the firmware. The problem is that STATUS_INT_ENABLED is clear until the op_mode calls trans_fw_alive which means that until trans_fw_alive is called, any notification from the firmware will not be received. This became a problem when we inserted the pnvm_load exactly between the ALIVE and trans_fw_alive. Fix that by calling trans_fw_alive before loading the PNVM. This will allow to get the notification from the firmware about PNVM load being complete and continue the flow normally. This didn't happen on MSI-X because we don't disable the interrupts in the ISR when MSI-X is available. The error in the log looks like this: iwlwifi 0000:00:03.0: Timeout waiting for PNVM load! iwlwifi 0000:00:03.0: Failed to start RT ucode: -110 iwlwifi 0000:00:03.0: WRT: Collecting data: ini trigger 13 fired (delay=0ms). Fixes: 70d3ca86b025 ("iwlwifi: mvm: ring the doorbell and wait for PNVM load completion") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250306122425.0f2cf207aae1.I025d8f724b44f52eadf6c19069352eb9275613a8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: iwlwifi: pcie: Fix TSO preparation	Ilan Peer
	The allocation of the scatter gather data structure should be done based on the number of memory chunks that need to be mapped, and it is not dependent on the overall payload length. Fix it. In addition, as the skb_to_sgvec() function returns an 'int' do not assign it to an 'unsigned int' as otherwise the error check would be useless. Fixes: 7f5e3038f029 ("wifi: iwlwifi: map entire SKB when sending AMSDUs") Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250306122425.8c0e23a3d583.I3cb4d6768c9d28ce3da6cd0a6c65466176cfc1ee@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-07	wifi: rework MAINTAINERS entries a bit	Johannes Berg
	Since I really don't want to be CC'ed on every patch add X: entries for all the drivers that are otherwise covered. In some cases, add a bit more to drivers that have other entries, mostly for the vendor directories, but for libertas also add libertas_tf. While at it, also add all nl80211-related (vendor) UAPI header files to the nl80211 entry. Link: https://patch.msgid.link/20250306092831.f7fdfe7df7b2.I7c86da443038af32e9bcbaa5f53b1e4128a0d1f9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-06	fs/pipe: add simpler helpers for common cases	Linus Torvalds
	The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the involved types. It turns out that we don't have _that_ many places that access these fields directly and were affected, but we have more than we strictly should have, because our low-level helper functions have been designed to have intimate knowledge of how the pipes work. And as a result, that random noise of direct 'pipe->head' and 'pipe->tail' accesses makes it harder to pinpoint any actual potential problem spots remaining. For example, we didn't have a "is the pipe full" helper function, but instead had a "given these pipe buffer indexes and this pipe size, is the pipe full". That's because some low-level pipe code does actually want that much more complicated interface. But most other places literally just want a "is the pipe full" helper, and not having it meant that those places ended up being unnecessarily much too aware of this all. It would have been much better if only the very core pipe code that cared had been the one aware of this all. So let's fix it - better late than never. This just introduces the trivial wrappers for "is this pipe full or empty" and to get how many pipe buffers are used, so that instead of writing if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) the places that literally just want to know if a pipe is full can just say if (pipe_is_full(pipe)) instead. The existing trivial cases were converted with a 'sed' script. This cuts down on the places that access pipe->head and pipe->tail directly outside of the pipe code (and core splice code) quite a lot. The splice code in particular still revels in doing the direct low-level accesses, and the fuse fuse_dev_splice_write() code also seems a bit unnecessarily eager to go very low-level, but it's at least a bit better than it used to be. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06	Merge tag 'drm-fixes-2025-03-07' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Fixes across the board, mostly xe and imagination with some amd and misc others. The xe fixes are mostly hmm related, though there are some others in there as well, nothing really stands out otherwise. The nouveau Kconfig to select FW_CACHE is in this, which we discussed a while back. nouveau: - rely on fw caching Kconfig fix imagination: - avoid deadlock on fence release - fix fence initialisation - fix timestamps firmware traces scheduler: - fix include guard bochs: - dpms fix i915: - bump max stream count to match pipes xe: - Remove double page flip on initial plane - Properly setup userptr pfn_flags_mask - Fix GT "for each engine" workarounds - Fix userptr races and missed validations - Userptr invalid page access fixes - Cleanup some style nits amdgpu: - Fix NULL check in DC code - SMU 14 fix amdkfd: - Fix NULL check in queue validation radeon: - RS400 HyperZ fix" * tag 'drm-fixes-2025-03-07' of https://gitlab.freedesktop.org/drm/kernel: (22 commits) drm/bochs: Fix DPMS regression drm/xe/userptr: Unmap userptrs in the mmu notifier drm/xe/hmm: Don't dereference struct page pointers without notifier lock drm/xe/hmm: Style- and include fixes drm/xe: Add staging tree for VM binds drm/xe: Fix fault mode invalidation with unbind drm/xe/vm: Fix a misplaced #endif drm/xe/vm: Validate userptr during gpu vma prefetching drm/amd/pm: always allow ih interrupt from fw drm/radeon: Fix rs400_gpu_init for ATI mobility radeon Xpress 200M drm/amdkfd: Fix NULL Pointer Dereference in KFD queue drm/amd/display: Fix null check for pipe_ctx->plane_state in resource_build_scaling_params drm/xe: Fix GT "for each engine" workarounds drm/xe/userptr: properly setup pfn_flags_mask drm/i915/mst: update max stream count to match number of pipes drm/xe: Remove double pageflip drm/sched: Fix preprocessor guard drm/imagination: Fix timestamps in firmware traces drm/imagination: only init job done fences once drm/imagination: Hold drm_gem_gpuva lock for unmap ...
2025-03-06	Merge tag 'nf-25-03-06' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix racy non-atomic read-then-increment operation with PREEMPT_RT in nft_ct, from Sebastian Andrzej Siewior. 2) GC is not skipped when jiffies wrap around in nf_conncount, from Nicklas Bo Jensen. 3) flush_work() on nf_tables_destroy_work waits for the last queued instance, this could be an instance that is different from the one that we must wait for, then make destruction work queue. * tag 'nf-25-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: make destruction work queue pernet netfilter: nf_conncount: garbage collection is not skipped when jiffies wrap around netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template. ==================== Link: https://patch.msgid.link/20250306153446.46712-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	Merge branch 'mlx5-misc-enhancements-2025-03-04'	Jakub Kicinski
	Tariq Toukan says: ==================== mlx5 misc enhancements 2025-03-04 This series introduces enhancements to the mlx5 core and Eth drivers. Patches 1-3 by Shahar introduce support for configuring lanes alongside speed when autonegotiation is disabled. The combination of speed and the number of lanes corresponds to a specific link mode (in the extended mask typically used in newer hardware), allowing the user to select a precise link mode when autonegotiation is off, instead of just choosing the speed. Patch 4 by Amir extends the multi-port LAG support. Patches 5-6 by Leon enhance IPsec matching logic. v1: https://lore.kernel.org/20250226114752.104838-1-tariqt@nvidia.com ==================== Link: https://patch.msgid.link/20250304160620.417580-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5e: Properly match IPsec subnet addresses	Leon Romanovsky
	Existing match criteria didn't allow to match whole subnet and only by specific addresses only. This caused to tunnel mode do not forward such traffic through relevant SA. In tunnel mode, policies look like this: src 192.169.0.0/16 dst 192.169.0.0/16 dir out priority 383615 ptype main tmpl src 192.169.101.2 dst 192.169.101.1 proto esp spi 0xc5141c18 reqid 1 mode tunnel crypto offload parameters: dev eth2 mode packet In this case, the XFRM core code handled all subnet calculations and forwarded network address to the drivers e.g. 192.169.0.0. For mlx5 devices, there is a need to set relevant prefix e.g. 0xFFFF00 to perform flow steering match operation. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250304160620.417580-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5e: Separate address related variables to be in struct	Leon Romanovsky
	Prepare the code to addition of prefix handling logic which is needed to support matching logic based on source and/or destination network prefixes. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250304160620.417580-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5: Lag, Enable Multiport E-Switch offloads on 8 ports LAG	Amir Tzin
	Patch [1] added mlx5 driver support for 8 ports HCAs which are available since ConnectX-8. Now that Multiport E-Switch is tested, we can enable it by removing flag MLX5_LAG_MPESW_OFFLOADS_SUPPORTED_PORTS. [1] commit e0e6adfe8c20 ("net/mlx5: Enable 8 ports LAG") Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250304160620.417580-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5e: Enable lanes configuration when auto-negotiation is off	Shahar Shitrit
	Currently, when auto-negotiation is disabled, the driver retrieves the speed and converts it into all link modes that correspond to that speed. With this patch, we add the ability to set the number of lanes, so that the combination of speed and lanes corresponds to exactly one specific link mode for the extended bit map. For the legacy bit map the driver sets all link modes correspond to speed and lanes. This change provides users with the option to set a specific link mode, rather than enabling all link modes associated with a given speed when auto-negotiation is off. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5: Refactor link speed handling with mlx5_link_info struct	Shahar Shitrit
	Introduce struct mlx5_link_info with a speed field and change the types of mlx5e_link_speed and mlx5e_ext_link_speed from arrays of u32 to arrays of struct mlx5_link_info. These arrays are renamed to mlx5e_link_info and mlx5e_ext_link_info, respectively. This change prepares for a future patch that will introduce a lanes field in struct mlx5_link_info and add lanes mapping alongside the speed for each link mode in the two arrays. Additionally, rename function mlx5_port_speed2linkmodes() to mlx5_port_info2linkmodes() and function mlx5_port_ptys2speed() to mlx5_port_ptys2info() and update the speed parameter/return type to struct mlx5_link_info, in preparation for the upcoming patch where these functions will also utilize the lanes field. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net/mlx5: Relocate function declarations from port.h to mlx5_core.h	Shahar Shitrit
	The port header is a general file under include, yet it contains declarations for functions that are either not exported or exported but not used outside the mlx5_core driver. To enhance code organization, we move these declarations to mlx5_core.h, where they are more appropriately scoped. This refactor removes unnecessary exported symbols and prevents unexported functions from being inadvertently referenced outside of the mlx5_core driver. Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250304160620.417580-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	Merge branch 'add-perout-configuration-support-in-iep-driver'	Jakub Kicinski
	Meghana Malladi says: ==================== Add perout configuration support in IEP driver IEP driver supported both perout and pps signal generation but perout feature is faulty with half-cooked support due to some missing configuration. Hence perout feature is removed as a bug fix. This patch series adds back this feature which configures perout signal based on the arguments passed by the perout request. This patch series is continuation to the bug fix: https://lore.kernel.org/20250227092441.1848419-1-m-malladi@ti.com as suggested by Jakub Kicinski and Jacob Keller: https://lore.kernel.org/20250220172410.025b96d6@kernel.org v3: https://lore.kernel.org/20250303135124.632845-1-m-malladi@ti.com ==================== Link: https://patch.msgid.link/20250304105753.1552159-1-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: ti: icss-iep: Add phase offset configuration for perout signal	Meghana Malladi
	icss_iep_perout_enable_hw() is a common function for generating both pps and perout signals. When enabling pps, the application needs to only pass enable/disable argument, whereas for perout it supports different flags to configure the signal. In case the app passes a valid phase offset value, the signal should start toggling after that phase offset, else start immediately or as soon as possible. ICSS_IEP_SYNC_START_REG register take number of clock cycles to wait before starting the signal after activation time. Set appropriate value to this register to support phase offset. Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250304105753.1552159-3-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: ti: icss-iep: Add pwidth configuration for perout signal	Meghana Malladi
	icss_iep_perout_enable_hw() is a common function for generating both pps and perout signals. When enabling pps, the application needs to only pass enable/disable argument, whereas for perout it supports different flags to configure the signal. But icss_iep_perout_enable_hw() function is missing to hook the configuration params passed by the app, causing perout to behave same a pps (except being able to configure the period). As duty cycle is also one feature which can configured for perout, incorporate this in the function to get the expected signal. Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250304105753.1552159-2-m-malladi@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	block: Name the RQF flags enum	Breno Leitao
	Commit 5f89154e8e9e3445f9b59 ("block: Use enum to define RQF_x bit indexes") converted the RQF flags to an anonymous enum, which was a beneficial change. This patch goes one step further by naming the enum as "rqf_flags". This naming enables exporting these flags to BPF clients, eliminating the need to duplicate these flags in BPF code. Instead, BPF clients can now access the same kernel-side values through CO:RE (Compile Once, Run Everywhere), as shown in this example: rqf_stats = bpf_core_enum_value(enum rqf_flags, __RQF_STATS) Suggested-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20250306-rqf_flags-v1-1-bbd64918b406@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-06	selftests: openvswitch: don't hardcode the drop reason subsys	Jakub Kicinski
	WiFi removed one of their subsys entries from drop reasons, in commit 286e69677065 ("wifi: mac80211: Drop cooked monitor support") SKB_DROP_REASON_SUBSYS_OPENVSWITCH is now 2 not 3. The drop reasons are not uAPI, read the correct value from debug info. We need to enable vmlinux BTF, otherwise pahole needs a few GB of memory to decode the enum name. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20250304180615.945945-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: airoha: Enable TSO/Scatter Gather for LAN port	Lorenzo Bianconi
	Set net_device vlan_features in order to enable TSO and Scatter Gather for DSA user ports. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-lan-enable-tso-v1-1-b398eb9976ba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: airoha: Fix lan4 support in airoha_qdma_get_gdm_port()	Lorenzo Bianconi
	EN7581 SoC supports lan{1,4} ports on MT7530 DSA switch. Fix lan4 reported value in airoha_qdma_get_gdm_port routine. Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-airoha-eth-fix-lan4-v1-1-832417da4bb5@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	Merge branch 'increase-maximum-mtu-to-9k-for-airoha-en7581-soc'	Jakub Kicinski
	Lorenzo Bianconi says: ==================== Increase maximum MTU to 9k for Airoha EN7581 SoC EN7581 SoC supports 9k maximum MTU. Enable the reception of Scatter-Gather (SG) frames for Airoha EN7581. Introduce airoha_dev_change_mtu callback. ==================== Link: https://patch.msgid.link/20250304-airoha-eth-rx-sg-v1-0-283ebc61120e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: airoha: Increase max mtu to 9k	Lorenzo Bianconi
	EN7581 SoC supports 9k maximum MTU. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-airoha-eth-rx-sg-v1-4-283ebc61120e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: airoha: Introduce airoha_dev_change_mtu callback	Lorenzo Bianconi
	Add airoha_dev_change_mtu callback to update the MTU of a running device. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-airoha-eth-rx-sg-v1-3-283ebc61120e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: airoha: Enable Rx Scatter-Gather	Lorenzo Bianconi
	EN7581 SoC can receive 9k frames. Enable the reception of Scatter-Gather (SG) frames. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-airoha-eth-rx-sg-v1-2-283ebc61120e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: airoha: Move min/max packet len configuration in airoha_dev_open()	Lorenzo Bianconi
	In order to align max allowed packet size to the configured mtu, move REG_GDM_LEN_CFG configuration in airoha_dev_open routine. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250304-airoha-eth-rx-sg-v1-1-283ebc61120e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: stmmac: simplify phylink_suspend() and phylink_resume() calls	Russell King (Oracle)
	Currently, the calls to phylink's suspend and resume functions are inside overly complex tests, and boil down to: if (device_may_wakeup(priv->device) && priv->plat->pmt) { call phylink } else { call phylink and if (device_may_wakeup(priv->device)) do something else } This results in phylink always being called, possibly with differing arguments for phylink_suspend(). Simplify this code, noting that each site is slightly different due to the order in which phylink is called and the "something else". Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/E1tpQL1-005St4-Hn@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	sched: address a potential NULL pointer dereference in the GRED scheduler.	Jun Yang
	If kzalloc in gred_init returns a NULL pointer, the code follows the error handling path, invoking gred_destroy. This, in turn, calls gred_offload, where memset could receive a NULL pointer as input, potentially leading to a kernel crash. When table->opt is NULL in gred_init(), gred_change_table_def() is not called yet, so it is not necessary to call ->ndo_setup_tc() in gred_offload(). Signed-off-by: Jun Yang <juny24602@gmail.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Fixes: f25c0515c521 ("net: sched: gred: dynamically allocate tc_gred_qopt_offload") Link: https://patch.msgid.link/20250305154410.3505642-1-juny24602@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: stmmac: avoid shadowing global buf_sz	Russell King (Oracle)
	stmmac_rx() declares a local variable named "buf_sz" but there is also a global variable for a module parameter which is called the same. To avoid confusion, rename the local variable. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/E1tpswi-005U6C-Py@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	Merge branch '100GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-03-05 (ice) This series contains updates to ice driver. Larysa removes modification of destination override that caused LLDP packets to be blocked. Grzegorz fixes a memory leak in aRFS. Marcin resolves an issue with operation of switchdev and LAG. Przemek adjusts order of calls for registering devlink in relation to health reporters. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: register devlink prior to creating health reporters ice: Fix switchdev slow-path in LAG ice: fix memory leak in aRFS after reset ice: do not configure destination override for switchdev ==================== Link: https://patch.msgid.link/20250305213549.1514274-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-07	Merge tag 'amd-drm-fixes-6.14-2025-03-06' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.14-2025-03-06: amdgpu: - Fix NULL check in DC code - SMU 14 fix amdkfd: - Fix NULL check in queue validation radeon: - RS400 HyperZ fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306193424.27413-1-alexander.deucher@amd.com
2025-03-06	Merge tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefs	Linus Torvalds
	Pull bcachefs fixes from Kent Overstreet: - Fix a compatibility issue: we shouldn't be setting incompat feature bits unless explicitly requested - Fix another bug where the journal alloc/resize path could spuriously fail with -BCH_ERR_open_buckets_empty - Copygc shouldn't run on read-only devices: fragmentation isn't an issue if we're not currently writing to a given device, and it may not have anywhere to move the data to * tag 'bcachefs-2025-03-06' of git://evilpiepirate.org/bcachefs: bcachefs: copygc now skips non-rw devices bcachefs: Fix bch2_dev_journal_alloc() spuriously failing bcachefs: Don't set BCH_FEATURE_incompat_version_field unless requested
2025-03-06	selftests: net: bpf_offload: add 'libbpf_global' to ignored maps	Jakub Kicinski
	After installing pahole on the CI image we have a new map created by libbpf. Ignore it otherwise we see: Exception: Time out waiting for map counts to stabilize want 2, have 3 Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250304233204.1139251-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	selftests: net: fix error message in bpf_offload	Jakub Kicinski
	We hit a following exception on timeout, nmaps is never set: Test bpftool bound info reporting (own ns)... Traceback (most recent call last): File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 1128, in <module> check_dev_info(False, "") File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 583, in check_dev_info maps = bpftool_map_list_wait(expected=2, ns=ns) File "/home/virtme/testing-1/tools/testing/selftests/net/./bpf_offload.py", line 215, in bpftool_map_list_wait raise Exception("Time out waiting for map counts to stabilize want %d, have %d" % (expected, nmaps)) NameError: name 'nmaps' is not defined Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250304233204.1139251-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	tcp: clamp window like before the cleanup	Matthieu Baerts (NGI0)
	A recent cleanup changed the behaviour of tcp_set_window_clamp(). This looks unintentional, and affects MPTCP selftests, e.g. some tests re-establishing a connection after a disconnect are now unstable. Before the cleanup, this operation was done: new_rcv_ssthresh = min(tp->rcv_wnd, new_window_clamp); tp->rcv_ssthresh = max(new_rcv_ssthresh, tp->rcv_ssthresh); The cleanup used the 'clamp' macro which takes 3 arguments -- value, lowest, and highest -- and returns a value between the lowest and the highest allowable values. This then assumes ... lowest (rcv_ssthresh) <= highest (rcv_wnd) ... which doesn't seem to be always the case here according to the MPTCP selftests, even when running them without MPTCP, but only TCP. For example, when we have ... rcv_wnd < rcv_ssthresh < new_rcv_ssthresh ... before the cleanup, the rcv_ssthresh was not changed, while after the cleanup, it is lowered down to rcv_wnd (highest). During a simple test with TCP, here are the values I observed: new_window_clamp (val) rcv_ssthresh (lo) rcv_wnd (hi) 117760 (out) 65495 < 65536 128512 (out) 109595 > 80256 => lo > hi 1184975 (out) 328987 < 329088 113664 (out) 65483 < 65536 117760 (out) 110968 < 110976 129024 (out) 116527 > 109696 => lo > hi Here, we can see that it is not that rare to have rcv_ssthresh (lo) higher than rcv_wnd (hi), so having a different behaviour when the clamp() macro is used, even without MPTCP. Note: new_window_clamp is always out of range (rcv_ssthresh < rcv_wnd) here, which seems to be generally the case in my tests with small connections. I then suggests reverting this part, not to change the behaviour. Fixes: 863a952eb79a ("tcp: tcp_set_window_clamp() cleanup") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/551 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250305-net-next-fix-tcp-win-clamp-v1-1-12afb705d34e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: stmmac: mostly remove "buf_sz"	Russell King (Oracle)
	The "buf_sz" parameter is not used in the stmmac driver - there is one place where the value of buf_sz is validated, and two places where it is written. It is otherwise unused. Remove these accesses. However, leave the module parameter in place as removing it could cause module load to fail, breaking user setups. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/E1tpswn-005U6I-TU@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	ptp: ocp: Remove redundant check in _signal_summary_show	Ivan Abramov
	In the function _signal_summary_show(), there is a NULL-check for &bp->signal[nr], which cannot actually be NULL. Therefore, this redundant check can be removed. Signed-off-by: Ivan Abramov <i.abramov@mt-integration.ru> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250305092520.25817-1-i.abramov@mt-integration.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	Merge branch 'net-stmmac-dwc-qos-add-fsd-eqos-support'	Jakub Kicinski
	Swathi says: ==================== net: stmmac: dwc-qos: Add FSD EQoS support FSD platform has two instances of EQoS IP, one is in FSYS0 block and another one is in PERIC block. This patch series add required DT binding and platform driver specific changes for the same. ==================== Link: https://patch.msgid.link/20250305091246.106626-1-swathi.ks@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	net: stmmac: dwc-qos: Add FSD EQoS support	Swathi K S
	The FSD SoC contains two instance of the Synopsys DWC ethernet QOS IP core. The binding that it uses is slightly different from existing ones because of the integration (clocks, resets). Signed-off-by: Swathi K S <swathi.ks@samsung.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250305091246.106626-3-swathi.ks@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	dt-bindings: net: Add FSD EQoS device tree bindings	Swathi K S
	Add FSD Ethernet compatible in Synopsys dt-bindings document. Add FSD Ethernet YAML schema to enable the DT validation. Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Signed-off-by: Ravi Patel <ravi.patel@samsung.com> Signed-off-by: Swathi K S <swathi.ks@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250305091246.106626-2-swathi.ks@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	Merge branch 'tcp-even-faster-connect-under-stress'	Jakub Kicinski
	Eric Dumazet says: ==================== tcp: even faster connect() under stress This is a followup on the prior series, "tcp: scale connect() under pressure" Now spinlocks are no longer in the picture, we see a very high cost of the inet6_ehashfn() function. In this series (of 2), I change how lport contributes to inet6_ehashfn() to ensure better cache locality and call inet6_ehashfn() only once per connect() system call. This brings an additional 229 % increase of performance for "neper/tcp_crr -6 -T 200 -F 30000" stress test, while greatly improving latency metrics. Before: latency_min=0.014131929 latency_max=17.895073144 latency_mean=0.505675853 latency_stddev=2.125164772 num_samples=307884 throughput=139866.80 After: latency_min=0.003041375 latency_max=7.056589232 latency_mean=0.141075048 latency_stddev=0.526900516 num_samples=312996 throughput=320677.21 ==================== Link: https://patch.msgid.link/20250305034550.879255-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	inet: call inet6_ehashfn() once from inet6_hash_connect()	Eric Dumazet
	inet6_ehashfn() being called from __inet6_check_established() has a big impact on performance, as shown in the Tested section. After prior patch, we can compute the hash for port 0 from inet6_hash_connect(), and derive each hash in __inet_hash_connect() from this initial hash: hash(saddr, lport, daddr, dport) == hash(saddr, 0, daddr, dport) + lport Apply the same principle for __inet_check_established(), although inet_ehashfn() has a smaller cost. Tested: Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server Before this patch: utime_start=0.286131 utime_end=4.378886 stime_start=11.952556 stime_end=1991.655533 num_transactions=1446830 latency_min=0.001061085 latency_max=12.075275028 latency_mean=0.376375302 latency_stddev=1.361969596 num_samples=306383 throughput=151866.56 perf top: 50.01% [kernel] [k] __inet6_check_established 20.65% [kernel] [k] __inet_hash_connect 15.81% [kernel] [k] inet6_ehashfn 2.92% [kernel] [k] rcu_all_qs 2.34% [kernel] [k] __cond_resched 0.50% [kernel] [k] _raw_spin_lock 0.34% [kernel] [k] sched_balance_trigger 0.24% [kernel] [k] queued_spin_lock_slowpath After this patch: utime_start=0.315047 utime_end=9.257617 stime_start=7.041489 stime_end=1923.688387 num_transactions=3057968 latency_min=0.003041375 latency_max=7.056589232 latency_mean=0.141075048 # Better latency metrics latency_stddev=0.526900516 num_samples=312996 throughput=320677.21 # 111 % increase, and 229 % for the series perf top: inet6_ehashfn is no longer seen. 39.67% [kernel] [k] __inet_hash_connect 37.06% [kernel] [k] __inet6_check_established 4.79% [kernel] [k] rcu_all_qs 3.82% [kernel] [k] __cond_resched 1.76% [kernel] [k] sched_balance_domains 0.82% [kernel] [k] _raw_spin_lock 0.81% [kernel] [k] sched_balance_rq 0.81% [kernel] [k] sched_balance_trigger 0.76% [kernel] [k] queued_spin_lock_slowpath Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250305034550.879255-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()	Eric Dumazet
	In order to speedup __inet_hash_connect(), we want to ensure hash values for <source address, port X, destination address, destination port> are not randomly spread, but monotonically increasing. Goal is to allow __inet_hash_connect() to derive the hash value of a candidate 4-tuple with a single addition in the following patch in the series. Given : hash_0 = inet_ehashfn(saddr, 0, daddr, dport) hash_sport = inet_ehashfn(saddr, sport, daddr, dport) Then (hash_sport == hash_0 + sport) for all sport values. As far as I know, there is no security implication with this change. After this patch, when __inet_hash_connect() has to try XXXX candidates, the hash table buckets are contiguous and packed, allowing a better use of cpu caches and hardware prefetchers. Tested: Server: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog Client: ulimit -n 40000; neper/tcp_crr -T 200 -F 30000 -6 --nolog -c -H server Before this patch: utime_start=0.271607 utime_end=3.847111 stime_start=18.407684 stime_end=1997.485557 num_transactions=1350742 latency_min=0.014131929 latency_max=17.895073144 latency_mean=0.505675853 latency_stddev=2.125164772 num_samples=307884 throughput=139866.80 perf top on client: 56.86% [kernel] [k] __inet6_check_established 17.96% [kernel] [k] __inet_hash_connect 13.88% [kernel] [k] inet6_ehashfn 2.52% [kernel] [k] rcu_all_qs 2.01% [kernel] [k] __cond_resched 0.41% [kernel] [k] _raw_spin_lock After this patch: utime_start=0.286131 utime_end=4.378886 stime_start=11.952556 stime_end=1991.655533 num_transactions=1446830 latency_min=0.001061085 latency_max=12.075275028 latency_mean=0.376375302 latency_stddev=1.361969596 num_samples=306383 throughput=151866.56 perf top: 50.01% [kernel] [k] __inet6_check_established 20.65% [kernel] [k] __inet_hash_connect 15.81% [kernel] [k] inet6_ehashfn 2.92% [kernel] [k] rcu_all_qs 2.34% [kernel] [k] __cond_resched 0.50% [kernel] [k] _raw_spin_lock 0.34% [kernel] [k] sched_balance_trigger 0.24% [kernel] [k] queued_spin_lock_slowpath There is indeed an increase of throughput and reduction of latency. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250305034550.879255-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06	tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()	Eric Dumazet
	We have platforms with 6 NUMA nodes and 480 cpus. inet_ehash_locks_alloc() currently allocates a single 64KB page to hold all ehash spinlocks. This adds more pressure on a single node. Change inet_ehash_locks_alloc() to use vmalloc() to spread the spinlocks on all online nodes, driven by NUMA policies. At boot time, NUMA policy is interleave=all, meaning that tcp_hashinfo.ehash_locks gets hash dispersion on all nodes. Tested: lack5:~# grep inet_ehash_locks_alloc /proc/vmallocinfo 0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2 lack5:~# echo 8192 >/proc/sys/net/ipv4/tcp_child_ehash_entries lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo" 0x000000004e99d30c-0x00000000763f3279 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=1 N1=2 N2=2 N3=1 N4=1 N5=1 0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2 lack5:~# numactl --interleave=0,5 unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo" 0x00000000fd73a33e-0x0000000004b9a177 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=4 N5=4 0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2 lack5:~# echo 1024 >/proc/sys/net/ipv4/tcp_child_ehash_entries lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo" 0x00000000db07d7a2-0x00000000ad697d29 8192 inet_ehash_locks_alloc+0x90/0x100 pages=1 vmalloc N2=1 0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2 Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250305130550.1865988-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>