summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-05-01octeon_ep: Fix host hang issue during device rebootSathesh B Edara
When the host loses heartbeat messages from the device, the driver calls the device-specific ndo_stop function, which frees the resources. If the driver is unloaded in this scenario, it calls ndo_stop again, attempting to free resources that have already been freed, leading to a host hang issue. To resolve this, dev_close should be called instead of the device-specific stop function.dev_close internally calls ndo_stop to stop the network interface and performs additional cleanup tasks. During the driver unload process, if the device is already down, ndo_stop is not called. Fixes: 5cb96c29aa0e ("octeon_ep: add heartbeat monitor") Signed-off-by: Sathesh B Edara <sedara@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250429114624.19104-1-sedara@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01net: fec: ERR007885 Workaround for conventional TXMattias Barthel
Activate TX hang workaround also in fec_enet_txq_submit_skb() when TSO is not enabled. Errata: ERR007885 Symptoms: NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out commit 37d6017b84f7 ("net: fec: Workaround for imx6sx enet tx hang when enable three queues") There is a TDAR race condition for mutliQ when the software sets TDAR and the UDMA clears TDAR simultaneously or in a small window (2-4 cycles). This will cause the udma_tx and udma_tx_arbiter state machines to hang. So, the Workaround is checking TDAR status four time, if TDAR cleared by hardware and then write TDAR, otherwise don't set TDAR. Fixes: 53bb20d1faba ("net: fec: add variable reg_desc_active to speed things up") Signed-off-by: Mattias Barthel <mattias.barthel@atlascopco.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250429090826.3101258-1-mattiasbarthel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01net: lan743x: Fix memleak issue when GSO enabledThangaraj Samynathan
Always map the `skb` to the LS descriptor. Previously skb was mapped to EXT descriptor when the number of fragments is zero with GSO enabled. Mapping the skb to EXT descriptor prevents it from being freed, leading to a memory leak Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250429052527.10031-1-thangaraj.s@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operationsSagi Maimon
On Adva boards, SMA sysfs store/get operations can call __handle_signal_outputs() or __handle_signal_inputs() while the `irig` and `dcf` pointers are uninitialized, leading to a NULL pointer dereference in __handle_signal() and causing a kernel crash. Adva boards don't use `irig` or `dcf` functionality, so add Adva-specific callbacks `ptp_ocp_sma_adva_set_outputs()` and `ptp_ocp_sma_adva_set_inputs()` that avoid invoking `irig` or `dcf` input/output routines. Fixes: ef61f5528fca ("ptp: ocp: add Adva timecard support") Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250429073320.33277-1-maimon.sagi@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01bnxt_en: fix module unload sequenceVadim Fedorenko
Recent updates to the PTP part of bnxt changed the way PTP FIFO is cleared, skbs waiting for TX timestamps are now cleared during ndo_close() call. To do clearing procedure, the ptp structure must exist and point to a valid address. Module destroy sequence had ptp clear code running before netdev close causing invalid memory access and kernel crash. Change the sequence to destroy ptp structure after device close. Fixes: 8f7ae5a85137 ("bnxt_en: improve TX timestamping FIFO configuration") Reported-by: Taehee Yoo <ap420073@gmail.com> Closes: https://lore.kernel.org/netdev/CAMArcTWDe2cd41=ub=zzvYifaYcYv-N-csxfqxUvejy_L0D6UQ@mail.gmail.com/ Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250430170343.759126-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-01clk: sunxi-ng: fix order of arguments in clock macroAndre Przywara
When introducing the SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT macro, the order of the last two arguments was different between the users and the definition: features became flags and flags became features. This just didn't end up in a disaster yet because most users ended up passing 0 for both arguments, and other clocks (for the new A523 SoC) are not yet used. Swap the order of the arguments in the definition, so that users stay untouched. Fixes: cdbb9d0d09db ("clk: sunxi-ng: mp: provide wrappers for setting feature flags") Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://patch.msgid.link/20250430095325.477311-1-andre.przywara@arm.com [wens@csie.org: fix typo in commit message] Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-05-01ASoC: stm32: sai: fix kernel rate configurationMark Brown
Merge series from Olivier Moysan <olivier.moysan@foss.st.com>: This patchset adds some checks on kernel minimum rate requirements. This avoids potential clock rate misconfiguration, when setting the kernel frequency on STM32MP2 SoCs.
2025-05-01Merge tag 'drm-misc-fixes-2025-04-30' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes A spurious WARN fix for nouveau, an init and interrupt handling fixes for ivpu, a warning fix for ttm, a hotplug fix for fdinfo, vblank fixes for adp, a memory leak fix for the shmem kunit tests, and a timing fix for mipi-dbi. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://lore.kernel.org/r/20250430-dark-eggplant-trout-c4ea6c@houat
2025-04-30dm: add missing unlock on in dm_keyslot_evict()Dan Carpenter
We need to call dm_put_live_table() even if dm_get_live_table() returns NULL. Fixes: 9355a9eb21a5 ("dm: support key eviction from keyslot managers of underlying devices") Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-04-30Merge tag 'modules-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules fixes from Petr Pavlu: "A single series to properly handle the module_kobject creation. This fixes a problem with missing /sys/module/<module>/drivers for built-in modules" * tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: drivers: base: handle module_kobject creation kernel: globalize lookup_or_create_module_kobject() kernel: refactor lookup_or_create_module_kobject() kernel: param: rename locate_module_kobject
2025-04-30cpufreq: intel_pstate: Unchecked MSR aceess in legacy modeSrinivas Pandruvada
When turbo mode is unavailable on a Skylake-X system, executing the command: # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo results in an unchecked MSR access error: WRMSR to 0x199 (attempted to write 0x0000000100001300). This issue was reproduced on an OEM (Original Equipment Manufacturer) system and is not a common problem across all Skylake-X systems. This error occurs because the MSR 0x199 Turbo Engage Bit (bit 32) is set when turbo mode is disabled. The issue arises when intel_pstate fails to detect that turbo mode is disabled. Here intel_pstate relies on MSR_IA32_MISC_ENABLE bit 38 to determine the status of turbo mode. However, on this system, bit 38 is not set even when turbo mode is disabled. According to the Intel Software Developer's Manual (SDM), the BIOS sets this bit during platform initialization to enable or disable opportunistic processor performance operations. Logically, this bit should be set in such cases. However, the SDM also specifies that "OS and applications must use CPUID leaf 06H to detect processors with opportunistic processor performance operations enabled." Therefore, in addition to checking MSR_IA32_MISC_ENABLE bit 38, verify that CPUID.06H:EAX[1] is 0 to accurately determine if turbo mode is disabled. Fixes: 4521e1a0ce17 ("cpufreq: intel_pstate: Reflect current no_turbo state correctly") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-04-30soundwire: intel_auxdevice: Fix system suspend/resume handlingRafael J. Wysocki
Before commit bca84a7b93fd ("PM: sleep: Use DPM_FLAG_SMART_SUSPEND conditionally") the runtime PM status of the device in intel_resume() had always been RPM_ACTIVE because setting DPM_FLAG_SMART_SUSPEND had caused the core to call pm_runtime_set_active() for that device during the "noirq" resume phase. For this reason, the pm_runtime_suspended() check in intel_resume() had never triggered and the code depending on it had never run. That had not caused any observable functional issues to appear, so effectively the code in question had never been needed. After commit bca84a7b93fd the core does not call pm_runtime_set_active() for all devices with DPM_FLAG_SMART_SUSPEND set any more and the code depending on the pm_runtime_suspended() check in intel_resume() runs if the device is runtime-suspended prior to a system-wide suspend transition. Unfortunately, when it runs, it breaks things due to the attempt to runtime-resume bus->dev which most likely is not ready for a runtime resume at that point. It also does other more-or-less questionable things. Namely, it calls pm_runtime_idle() for a device with a nonzero runtime PM usage counter which has no effect (all devices have nonzero runtime PM usage counters during system-wide suspend and resume). It also calls pm_runtime_mark_last_busy() for the device even though devices cannot runtime-suspend during system-wide suspend and resume (because their runtime PM usage counters are nonzero) and an analogous call is made in the same function later. Moreover, it sets the runtime PM status of the device to RPM_ACTIVE before activating it. For the reasons listed above, remove that code altogether. On top of that, add a pm_runtime_disable() call to intel_suspend() to prevent the device from being runtime-resumed at any point after intel_suspend() has started to manipulate it because the changes made by that function would be undone by a runtime-suspend of the device. Next, once runtime PM has been disabled, the runtime PM status of the device cannot change, so pm_runtime_status_suspended() can be used instead of pm_runtime_suspended() in intel_suspend(). Finally, make intel_resume() call pm_runtime_set_active() at the end to set the runtime PM status of the device to "active" because it has just been activated and re-enable runtime PM for it after that. Additionally, drop the setting of DPM_FLAG_SMART_SUSPEND from the driver because it has no effect on devices handled by it. Fixes: bca84a7b93fd ("PM: sleep: Use DPM_FLAG_SMART_SUSPEND conditionally") Reported-by: Bard Liao <yung-chuan.liao@linux.intel.com> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/12680420.O9o76ZdvQC@rjwysocki.net
2025-04-30nvmet-auth: always free derived key dataHannes Reinecke
After calling nvme_auth_derive_tls_psk() we need to free the resulting psk data, as either TLS is disable (and we don't need the data anyway) or the psk data is copied into the resulting key (and can be free, too). Fixes: fa2e0f8bbc68 ("nvmet-tcp: support secure channel concatenation") Reported-by: Yi Zhang <yi.zhang@redhat.com> Suggested-by: Maurizio Lombardi <mlombard@bsdbackstore.eu> Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvmet-tcp: don't restore null sk_state_changeAlistair Francis
queue->state_change is set as part of nvmet_tcp_set_queue_sock(), but if the TCP connection isn't established when nvmet_tcp_set_queue_sock() is called then queue->state_change isn't set and sock->sk->sk_state_change isn't replaced. As such we don't need to restore sock->sk->sk_state_change if queue->state_change is NULL. This avoids NULL pointer dereferences such as this: [ 286.462026][ C0] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 286.462814][ C0] #PF: supervisor instruction fetch in kernel mode [ 286.463796][ C0] #PF: error_code(0x0010) - not-present page [ 286.464392][ C0] PGD 8000000140620067 P4D 8000000140620067 PUD 114201067 PMD 0 [ 286.465086][ C0] Oops: Oops: 0010 [#1] SMP KASAN PTI [ 286.465559][ C0] CPU: 0 UID: 0 PID: 1628 Comm: nvme Not tainted 6.15.0-rc2+ #11 PREEMPT(voluntary) [ 286.466393][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 286.467147][ C0] RIP: 0010:0x0 [ 286.467420][ C0] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 286.467977][ C0] RSP: 0018:ffff8883ae008580 EFLAGS: 00010246 [ 286.468425][ C0] RAX: 0000000000000000 RBX: ffff88813fd34100 RCX: ffffffffa386cc43 [ 286.469019][ C0] RDX: 1ffff11027fa68b6 RSI: 0000000000000008 RDI: ffff88813fd34100 [ 286.469545][ C0] RBP: ffff88813fd34160 R08: 0000000000000000 R09: ffffed1027fa682c [ 286.470072][ C0] R10: ffff88813fd34167 R11: 0000000000000000 R12: ffff88813fd344c3 [ 286.470585][ C0] R13: ffff88813fd34112 R14: ffff88813fd34aec R15: ffff888132cdd268 [ 286.471070][ C0] FS: 00007fe3c04c7d80(0000) GS:ffff88840743f000(0000) knlGS:0000000000000000 [ 286.471644][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 286.472543][ C0] CR2: ffffffffffffffd6 CR3: 000000012daca000 CR4: 00000000000006f0 [ 286.473500][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 286.474467][ C0] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 286.475453][ C0] Call Trace: [ 286.476102][ C0] <IRQ> [ 286.476719][ C0] tcp_fin+0x2bb/0x440 [ 286.477429][ C0] tcp_data_queue+0x190f/0x4e60 [ 286.478174][ C0] ? __build_skb_around+0x234/0x330 [ 286.478940][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.479659][ C0] ? __pfx_tcp_data_queue+0x10/0x10 [ 286.480431][ C0] ? tcp_try_undo_loss+0x640/0x6c0 [ 286.481196][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.482046][ C0] ? kvm_clock_get_cycles+0x14/0x30 [ 286.482769][ C0] ? ktime_get+0x66/0x150 [ 286.483433][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.484146][ C0] tcp_rcv_established+0x6e4/0x2050 [ 286.484857][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.485523][ C0] ? ipv4_dst_check+0x160/0x2b0 [ 286.486203][ C0] ? __pfx_tcp_rcv_established+0x10/0x10 [ 286.486917][ C0] ? lock_release+0x217/0x2c0 [ 286.487595][ C0] tcp_v4_do_rcv+0x4d6/0x9b0 [ 286.488279][ C0] tcp_v4_rcv+0x2af8/0x3e30 [ 286.488904][ C0] ? raw_local_deliver+0x51b/0xad0 [ 286.489551][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.490198][ C0] ? __pfx_tcp_v4_rcv+0x10/0x10 [ 286.490813][ C0] ? __pfx_raw_local_deliver+0x10/0x10 [ 286.491487][ C0] ? __pfx_nf_confirm+0x10/0x10 [nf_conntrack] [ 286.492275][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.492900][ C0] ip_protocol_deliver_rcu+0x8f/0x370 [ 286.493579][ C0] ip_local_deliver_finish+0x297/0x420 [ 286.494268][ C0] ip_local_deliver+0x168/0x430 [ 286.494867][ C0] ? __pfx_ip_local_deliver+0x10/0x10 [ 286.495498][ C0] ? __pfx_ip_local_deliver_finish+0x10/0x10 [ 286.496204][ C0] ? ip_rcv_finish_core+0x19a/0x1f20 [ 286.496806][ C0] ? lock_release+0x217/0x2c0 [ 286.497414][ C0] ip_rcv+0x455/0x6e0 [ 286.497945][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.498550][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.499137][ C0] ? __pfx_ip_rcv_finish+0x10/0x10 [ 286.499763][ C0] ? lock_release+0x217/0x2c0 [ 286.500327][ C0] ? dl_scaled_delta_exec+0xd1/0x2c0 [ 286.500922][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.501480][ C0] __netif_receive_skb_one_core+0x166/0x1b0 [ 286.502173][ C0] ? __pfx___netif_receive_skb_one_core+0x10/0x10 [ 286.502903][ C0] ? lock_acquire+0x2b2/0x310 [ 286.503487][ C0] ? process_backlog+0x372/0x1350 [ 286.504087][ C0] ? lock_release+0x217/0x2c0 [ 286.504642][ C0] process_backlog+0x3b9/0x1350 [ 286.505214][ C0] ? process_backlog+0x372/0x1350 [ 286.505779][ C0] __napi_poll.constprop.0+0xa6/0x490 [ 286.506363][ C0] net_rx_action+0x92e/0xe10 [ 286.506889][ C0] ? __pfx_net_rx_action+0x10/0x10 [ 286.507437][ C0] ? timerqueue_add+0x1f0/0x320 [ 286.507977][ C0] ? sched_clock_cpu+0x68/0x540 [ 286.508492][ C0] ? lock_acquire+0x2b2/0x310 [ 286.509043][ C0] ? kvm_sched_clock_read+0xd/0x20 [ 286.509607][ C0] ? handle_softirqs+0x1aa/0x7d0 [ 286.510187][ C0] handle_softirqs+0x1f2/0x7d0 [ 286.510754][ C0] ? __pfx_handle_softirqs+0x10/0x10 [ 286.511348][ C0] ? irqtime_account_irq+0x181/0x290 [ 286.511937][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.512510][ C0] do_softirq.part.0+0x89/0xc0 [ 286.513100][ C0] </IRQ> [ 286.513548][ C0] <TASK> [ 286.513953][ C0] __local_bh_enable_ip+0x112/0x140 [ 286.514522][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.515072][ C0] __dev_queue_xmit+0x872/0x3450 [ 286.515619][ C0] ? nft_do_chain+0xe16/0x15b0 [nf_tables] [ 286.516252][ C0] ? __pfx___dev_queue_xmit+0x10/0x10 [ 286.516817][ C0] ? selinux_ip_postroute+0x43c/0xc50 [ 286.517433][ C0] ? __pfx_selinux_ip_postroute+0x10/0x10 [ 286.518061][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.518606][ C0] ? ip_output+0x164/0x4a0 [ 286.519149][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.519671][ C0] ? ip_finish_output2+0x17d5/0x1fb0 [ 286.520258][ C0] ip_finish_output2+0xb4b/0x1fb0 [ 286.520787][ C0] ? __pfx_ip_finish_output2+0x10/0x10 [ 286.521355][ C0] ? __ip_finish_output+0x15d/0x750 [ 286.521890][ C0] ip_output+0x164/0x4a0 [ 286.522372][ C0] ? __pfx_ip_output+0x10/0x10 [ 286.522872][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.523402][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.524031][ C0] ? __pfx_ip_finish_output+0x10/0x10 [ 286.524605][ C0] ? __ip_queue_xmit+0x999/0x2260 [ 286.525200][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.525744][ C0] ? ipv4_dst_check+0x16a/0x2b0 [ 286.526279][ C0] ? lock_release+0x217/0x2c0 [ 286.526793][ C0] __ip_queue_xmit+0x1883/0x2260 [ 286.527324][ C0] ? __skb_clone+0x54c/0x730 [ 286.527827][ C0] __tcp_transmit_skb+0x209b/0x37a0 [ 286.528374][ C0] ? __pfx___tcp_transmit_skb+0x10/0x10 [ 286.528952][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.529472][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.530152][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.530691][ C0] tcp_write_xmit+0xb81/0x88b0 [ 286.531224][ C0] ? mod_memcg_state+0x4d/0x60 [ 286.531736][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.532253][ C0] __tcp_push_pending_frames+0x90/0x320 [ 286.532826][ C0] tcp_send_fin+0x141/0xb50 [ 286.533352][ C0] ? __pfx_tcp_send_fin+0x10/0x10 [ 286.533908][ C0] ? __local_bh_enable_ip+0xab/0x140 [ 286.534495][ C0] inet_shutdown+0x243/0x320 [ 286.535077][ C0] nvme_tcp_alloc_queue+0xb3b/0x2590 [nvme_tcp] [ 286.535709][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.536314][ C0] ? __pfx_nvme_tcp_alloc_queue+0x10/0x10 [nvme_tcp] [ 286.536996][ C0] ? do_raw_spin_unlock+0x54/0x1e0 [ 286.537550][ C0] ? _raw_spin_unlock+0x29/0x50 [ 286.538127][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.538664][ C0] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 286.539249][ C0] ? nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.539892][ C0] ? __wake_up+0x40/0x60 [ 286.540392][ C0] nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.541047][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.541589][ C0] nvme_tcp_setup_ctrl+0x8b/0x7a0 [nvme_tcp] [ 286.542254][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.542887][ C0] ? __pfx_nvme_tcp_setup_ctrl+0x10/0x10 [nvme_tcp] [ 286.543568][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.544166][ C0] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 286.544792][ C0] ? nvme_change_ctrl_state+0x196/0x2e0 [nvme_core] [ 286.545477][ C0] nvme_tcp_create_ctrl+0x839/0xb90 [nvme_tcp] [ 286.546126][ C0] nvmf_dev_write+0x3db/0x7e0 [nvme_fabrics] [ 286.546775][ C0] ? rw_verify_area+0x69/0x520 [ 286.547334][ C0] vfs_write+0x218/0xe90 [ 286.547854][ C0] ? do_syscall_64+0x9f/0x190 [ 286.548408][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.549037][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.549659][ C0] ? __pfx_vfs_write+0x10/0x10 [ 286.550259][ C0] ? do_syscall_64+0x9f/0x190 [ 286.550840][ C0] ? syscall_exit_to_user_mode+0x8e/0x280 [ 286.551516][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.552180][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.552834][ C0] ? ksys_read+0xf5/0x1c0 [ 286.553386][ C0] ? __pfx_ksys_read+0x10/0x10 [ 286.553964][ C0] ksys_write+0xf5/0x1c0 [ 286.554499][ C0] ? __pfx_ksys_write+0x10/0x10 [ 286.555072][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.555698][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.556319][ C0] ? do_syscall_64+0x54/0x190 [ 286.556866][ C0] do_syscall_64+0x93/0x190 [ 286.557420][ C0] ? rcu_read_unlock+0x17/0x60 [ 286.557986][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.558526][ C0] ? lock_release+0x217/0x2c0 [ 286.559087][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.559659][ C0] ? count_memcg_events.constprop.0+0x4a/0x60 [ 286.560476][ C0] ? exc_page_fault+0x7a/0x110 [ 286.561064][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.561647][ C0] ? lock_release+0x217/0x2c0 [ 286.562257][ C0] ? do_user_addr_fault+0x171/0xa00 [ 286.562839][ C0] ? do_user_addr_fault+0x4a2/0xa00 [ 286.563453][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.564112][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.564677][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.565317][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.565922][ C0] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 286.566542][ C0] RIP: 0033:0x7fe3c05e6504 [ 286.567102][ C0] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 286.568931][ C0] RSP: 002b:00007fff76444f58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 286.569807][ C0] RAX: ffffffffffffffda RBX: 000000003b40d930 RCX: 00007fe3c05e6504 [ 286.570621][ C0] RDX: 00000000000000cf RSI: 000000003b40d930 RDI: 0000000000000003 [ 286.571443][ C0] RBP: 0000000000000003 R08: 00000000000000cf R09: 000000003b40d930 [ 286.572246][ C0] R10: 0000000000000000 R11: 0000000000000202 R12: 000000003b40cd60 [ 286.573069][ C0] R13: 00000000000000cf R14: 00007fe3c07417f8 R15: 00007fe3c073502e [ 286.573886][ C0] </TASK> Closes: https://lore.kernel.org/linux-nvme/5hdonndzoqa265oq3bj6iarwtfk5dewxxjtbjvn5uqnwclpwt6@a2n6w3taxxex/ Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLSAlistair Francis
Ensure that TLS support is enabled in the kernel when CONFIG_NVME_TARGET_TCP_TLS is enabled. Without this the code compiles, but does not actually work unless something else enables CONFIG_TLS. Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall") Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvme-tcp: select CONFIG_TLS from CONFIG_NVME_TCP_TLSAlistair Francis
Ensure that TLS support is enabled in the kernel when CONFIG_NVME_TCP_TLS is enabled. Without this the code compiles, but does not actually work unless something else enables CONFIG_TLS. Fixes: be8e82caa68 ("nvme-tcp: enable TLS handshake upcall") Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvme-tcp: fix premature queue removal and I/O failoverMichael Liang
This patch addresses a data corruption issue observed in nvme-tcp during testing. In an NVMe native multipath setup, when an I/O timeout occurs, all inflight I/Os are canceled almost immediately after the kernel socket is shut down. These canceled I/Os are reported as host path errors, triggering a failover that succeeds on a different path. However, at this point, the original I/O may still be outstanding in the host's network transmission path (e.g., the NIC’s TX queue). From the user-space app's perspective, the buffer associated with the I/O is considered completed since they're acked on the different path and may be reused for new I/O requests. Because nvme-tcp enables zero-copy by default in the transmission path, this can lead to corrupted data being sent to the original target, ultimately causing data corruption. We can reproduce this data corruption by injecting delay on one path and triggering i/o timeout. To prevent this issue, this change ensures that all inflight transmissions are fully completed from host's perspective before returning from queue stop. To handle concurrent I/O timeout from multiple namespaces under the same controller, always wait in queue stop regardless of queue's state. This aligns with the behavior of queue stopping in other NVMe fabric transports. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Michael Liang <mliang@purestorage.com> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> Reviewed-by: Randy Jennings <randyj@purestorage.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30bnxt_en: Fix ethtool -d byte order for 32-bit valuesMichael Chan
For version 1 register dump that includes the PCIe stats, the existing code incorrectly assumes that all PCIe stats are 64-bit values. Fix it by using an array containing the starting and ending index of the 32-bit values. The loop in bnxt_get_regs() will use the array to do proper endian swap for the 32-bit values. Fixes: b5d600b027eb ("bnxt_en: Add support for 'ethtool -d'") Reviewed-by: Shruti Parab <shruti.parab@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix out-of-bound memcpy() during ethtool -wShruti Parab
When retrieving the FW coredump using ethtool, it can sometimes cause memory corruption: BUG: KFENCE: memory corruption in __bnxt_get_coredump+0x3ef/0x670 [bnxt_en] Corrupted memory at 0x000000008f0f30e8 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#45): __bnxt_get_coredump+0x3ef/0x670 [bnxt_en] ethtool_get_dump_data+0xdc/0x1a0 __dev_ethtool+0xa1e/0x1af0 dev_ethtool+0xa8/0x170 dev_ioctl+0x1b5/0x580 sock_do_ioctl+0xab/0xf0 sock_ioctl+0x1ce/0x2e0 __x64_sys_ioctl+0x87/0xc0 do_syscall_64+0x5c/0xf0 entry_SYSCALL_64_after_hwframe+0x78/0x80 ... This happens when copying the coredump segment list in bnxt_hwrm_dbg_dma_data() with the HWRM_DBG_COREDUMP_LIST FW command. The info->dest_buf buffer is allocated based on the number of coredump segments returned by the FW. The segment list is then DMA'ed by the FW and the length of the DMA is returned by FW. The driver then copies this DMA'ed segment list to info->dest_buf. In some cases, this DMA length may exceed the info->dest_buf length and cause the above BUG condition. Fix it by capping the copy length to not exceed the length of info->dest_buf. The extra DMA data contains no useful information. This code path is shared for the HWRM_DBG_COREDUMP_LIST and the HWRM_DBG_COREDUMP_RETRIEVE FW commands. The buffering is different for these 2 FW commands. To simplify the logic, we need to move the line to adjust the buffer length for HWRM_DBG_COREDUMP_RETRIEVE up, so that the new check to cap the copy length will work for both commands. Fixes: c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix coredump logic to free allocated bufferShruti Parab
When handling HWRM_DBG_COREDUMP_LIST FW command in bnxt_hwrm_dbg_dma_data(), the allocated buffer info->dest_buf is not freed in the error path. In the normal path, info->dest_buf is assigned to coredump->data and it will eventually be freed after the coredump is collected. Free info->dest_buf immediately inside bnxt_hwrm_dbg_dma_data() in the error path. Fixes: c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length") Reported-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: delay pci_alloc_irq_vectors() in the AER pathKashyap Desai
This patch is similar to the last patch to delay the pci_alloc_irq_vectors() call in the AER path until after calling bnxt_reserve_rings(). bnxt_reserve_rings() needs to properly map the MSIX table first before we call pci_alloc_irq_vectors() which may immediately write to the MSIX table in some architectures. Move the bnxt_init_int_mode() call from bnxt_io_slot_reset() to bnxt_io_resume() after calling bnxt_reserve_rings(). With this change, the AER path may call bnxt_open() -> bnxt_hwrm_if_change() with bp->irq_tbl set to NULL. bp->irq_tbl is cleared when we call bnxt_clear_int_mode() in bnxt_io_slot_reset(). So we cannot use !bp->irq_tbl to detect aborted FW reset. Add a new BNXT_FW_RESET_STATE_ABORT to detect aborted FW reset in bnxt_hwrm_if_change(). Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()Kashyap Desai
On some architectures (e.g. ARM), calling pci_alloc_irq_vectors() will immediately cause the MSIX table to be written. This will not work if we haven't called bnxt_reserve_rings() to properly map the MSIX table to the MSIX vectors reserved by FW. Fix the FW error recovery path to delay the bnxt_init_int_mode() -> pci_alloc_irq_vectors() call by removing it from bnxt_hwrm_if_change(). bnxt_request_irq() later in the code path will call it and by then the MSIX table is properly mapped. Fixes: 4343838ca5eb ("bnxt_en: Replace deprecated PCI MSIX APIs") Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()Somnath Kotur
If bnxt_rx_vlan() fails because the VLAN protocol ID is invalid, the SKB is freed but we're missing the call to recycle it. This may cause the warning: "page_pool_release_retry() stalled pool shutdown" Add the missing skb_mark_for_recycle() in bnxt_rx_vlan(). Fixes: 86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix ethtool selftest output in one of the failure casesKalesh AP
When RDMA driver is loaded, running offline self test is not supported and driver returns failure early. But it is not clearing the input buffer and hence the application prints some junk characters for individual test results. Fix it by clearing the buffer before returning. Fixes: 895621f1c816 ("bnxt_en: Don't support offline self test when RoCE driver is loaded") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix error handling path in bnxt_init_chip()Shravya KN
WARN_ON() is triggered in __flush_work() if bnxt_init_chip() fails because we call cancel_work_sync() on dim work that has not been initialized. WARNING: CPU: 37 PID: 5223 at kernel/workqueue.c:4201 __flush_work.isra.0+0x212/0x230 The driver relies on the BNXT_STATE_NAPI_DISABLED bit to check if dim work has already been cancelled. But in the bnxt_open() path, BNXT_STATE_NAPI_DISABLED is not set and this causes the error path to think that it needs to cancel the uninitalized dim work. Fix it by setting BNXT_STATE_NAPI_DISABLED during initialization. The bit will be cleared when we enable NAPI and initialize dim work. Fixes: 40452969a506 ("bnxt_en: Fix DIM shutdown") Suggested-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30s390/con3270: Use strscpy() instead of strcpy()Heiko Carstens
Use strscpy() instead of strcpy() so that bounds checking is performed on the destination buffer. This requires to keep track of the size of the dynamically allocated prompt memory area, which is done with a new prompt_sz within struct tty3270. Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390: Simple strcpy() to strscpy() conversionsHeiko Carstens
Convert all strcpy() usages to strscpy() where the conversion means just replacing strcpy() with strscpy(). strcpy() is deprecated since it performs no bounds checking on the destination buffer. Reviewed-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey/crypto: Introduce xflags param for pkey in-kernel APIHarald Freudenberger
Add a new parameter xflags to the in-kernel API function pkey_key2protkey(). Currently there is only one flag supported: * PKEY_XFLAG_NOMEMALLOC: If this flag is given in the xflags parameter, the pkey implementation is not allowed to allocate memory but instead should fall back to use preallocated memory or simple fail with -ENOMEM. This flag is for protected key derive within a cipher or similar which must not allocate memory which would cause io operations - see also the CRYPTO_ALG_ALLOCATES_MEMORY flag in crypto.h. The one and only user of this in-kernel API - the skcipher implementations PAES in paes_s390.c set this flag upon request to derive a protected key from the given raw key material. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-26-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Provide and pass xflags within pkey and zcrypt layersHarald Freudenberger
Provide and pass the xflag parameter from pkey ioctls through the pkey handler and further down to the implementations (CCA, EP11, PCKMO and UV). So all the code is now prepared and ready to support xflags ("execution flag"). The pkey layer supports the xflag PKEY_XFLAG_NOMEMALLOC: If this flag is given in the xflags parameter, the pkey implementation is not allowed to allocate memory but instead should fall back to use preallocated memory or simple fail with -ENOMEM. This flag is for protected key derive within a cipher or similar which must not allocate memory which would cause io operations - see also the CRYPTO_ALG_ALLOCATES_MEMORY flag in crypto.h. Within the pkey handlers this flag is then to be translated to appropriate zcrypt xflags before any zcrypt related functions are called. So the PKEY_XFLAG_NOMEMALLOC translates to ZCRYPT_XFLAG_NOMEMALLOC - If this flag is set, no memory allocations which may trigger any IO operations are done. The pkey in-kernel pkey API still does not provide this xflag param. That's intended to come with a separate patch which enables this functionality. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-25-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Use preallocated memory for retrieve of UV secret metadataHarald Freudenberger
The pkey uv functions may be called in a situation where memory allocations which trigger IO operations are not allowed. An example: decryption of the swap partition with protected key (PAES). The pkey uv code takes care of this by holding one preallocated struct uv_secret_list to be used with the new UV function uv_find_secret(). The older function uv_get_secret_metadata() used before always allocates/frees an ephemeral memory buffer. The preallocated struct is concurrency protected by a mutex. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-23-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Rework EP11 pkey handler to use stack for small memory allocsHarald Freudenberger
There have been some places in the EP11 handler code where relatively small amounts of memory have been allocated an freed at the end of the function. This code has been reworked to use the stack instead. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-21-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/pkey: Rework CCA pkey handler to use stack for small memory allocsHarald Freudenberger
There have been some places in the CCA handler code where relatively small amounts of memory have been allocated an freed at the end of the function. This code has been reworked to use the stack instead. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-20-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework ep11 misc functions to use cprb mempoolHarald Freudenberger
There are two places in the ep11 misc code where a short term memory buffer is needed. Rework this code to use the cprb mempool to satisfy this ephemeral memory requirements. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-19-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Locate ep11_domain_query_info onto the stack instead of kmallocHarald Freudenberger
Locate the relative small struct ep11_domain_query_info variable onto the stack instead of kmalloc()/kfree(). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-18-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Propagate xflags argument with cca_get_info()Harald Freudenberger
Propagate the xflags argument from the cca_get_info() caller down to the lower level functions for proper memory allocation hints. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-17-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework cca misc functions kmallocs to use the cprb mempoolHarald Freudenberger
Rework two places in the zcrypt cca misc code using kmalloc() for ephemeral memory allocation. As there is anyway now a cprb mempool let's use this pool instead to satisfy these short term memory allocations. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-16-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework ep11 findcard() implementation and callersHarald Freudenberger
Rework the memory usage of the ep11 findcard() implementation: - findcard does not allocate memory for the list of apqns any more. - the callers are now responsible to provide an array of apqns to store the matching apqns into. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-15-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework cca findcard() implementation and callersHarald Freudenberger
Rework the memory usage of the cca findcard() implementation: - findcard does not allocate memory for the list of apqns any more. - the callers are now responsible to provide an array of apqns to store the matching apqns into. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-14-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Remove CCA and EP11 card and domain info cachesHarald Freudenberger
Remove the caching of the CCA and EP11 card and domain info. In nearly all places where the card or domain info is fetched the verify param was enabled and thus the cache was bypassed. The only real place where info from the cache was used was in the sysfs pseudo files in cases where the card/queue was switched to "offline". All other callers insisted on getting fresh info and thus a communication to the card was enforced. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-13-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Remove unused functions from cca miscHarald Freudenberger
The static function findcard() and the zcrypt cca_findcard() function are both not used any more. Remove this outdated code and an internal function only called by these. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-12-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce pre-allocated device status array for ep11 miscHarald Freudenberger
Introduce a pre-allocated device status array memory together with a mutex controlling the occupation to be used by the findcard() function. Limit the device status array to max 128 cards and max 128 domains to reduce the size of this pre-allocated memory to 64 KB. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-11-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce pre-allocated device status array for cca miscHarald Freudenberger
Introduce a pre-allocated device status array memory together with a mutex controlling the occupation to be used by the findcard2() function. Limit the device status array to max 128 cards and max 128 domains to reduce the size of this pre-allocated memory to 64 KB. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-10-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Rework zcrypt function zcrypt_device_status_mask_extHarald Freudenberger
Rework the existing function zcrypt_device_status_mask_ext(): Add two new parameters to provide upper limits for cards and queues. The existing implementation needed an array of 256 * 256 * 4 = 256 KB which is really huge. The reworked function is more flexible in the sense that the caller can decide the upper limit for cards and domains to be stored into the status array. So for example a caller may decide to only query for cards 0...127 and queues 0...127 and thus only an array of size 128 * 128 * 4 = 64 KB is needed. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-9-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce cprb mempool for ep11 misc functionsHarald Freudenberger
Introduce a cprb mempool for the zcrypt ep11 misc functions (zcrypt_ep11misc.*) do some preparation rework to support a do-not-allocate path through some zcrypt ep11 misc functions. The mempool is controlled by the zcrypt module parameter "mempool_threshold" which shall control the minimal amount of memory items for CCA and EP11. The mempool shall support "mempool_threshold" requests/replies in parallel which means for EP11 to hold a send and receive buffer memory per request. Each of this cprb space items is limited to 8 KB. So by default the mempool consumes 5 * 2 * 8KB = 80KB If the mempool is depleted upon one ep11 misc functions is called with the ZCRYPT_XFLAG_NOMEMALLOC xflag set, the function will fail with -ENOMEM and the caller is responsible for taking further actions. This is only part of an rework to support a new xflag ZCRYPT_XFLAG_NOMEMALLOC but not yet complete. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-8-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Introduce cprb mempool for cca misc functionsHarald Freudenberger
Introduce a new module parameter "zcrypt_mempool_threshold" for the zcrypt module. This parameter controls the minimal amount of mempool items which are pre-allocated for urgent requests/replies and will be used with the support for the new xflag ZCRYPT_XFLAG_NOMEMALLOC. The default value of 5 shall provide enough memory items to support up to 5 requests (and their associated reply) in parallel. The minimum value is 1 and is checked in zcrypt module init(). If the mempool is depleted upon one cca misc functions is called with the named xflag set, the function will fail with -ENOMEM and the caller is responsible for taking further actions. For CCA each mempool item is 16KB, as a CCA CPRB needs to hold the request and the reply. The pool items only support requests/replies with a limit of about 8KB. So by default the CCA mempool consumes 5 * 16KB = 80KB This is only part of an rework to support a new xflag ZCRYPT_XFLAG_NOMEMALLOC but not yet complete. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-7-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap/zcrypt: New xflag parameterHarald Freudenberger
Introduce a new flag parameter for the both cprb send functions zcrypt_send_cprb() and zcrypt_send_ep11_cprb(). This new xflags parameter ("execution flags") shall be used to provide execution hints and flags for this crypto request. There are two flags implemented to be used with these functions: * ZCRYPT_XFLAG_USERSPACE - indicates to the lower layers that all the ptrs address userspace. So when construction the ap msg copy_from_user() is to be used. If this flag is NOT set, the ptrs address kernel memory and thus memcpy() is to be used. * ZCRYPT_XFLAG_NOMEMALLOC - indicates that this task must not allocate memory which may be allocated with io operations. For the AP bus and zcrypt message layer this means: * The ZCRYPT_XFLAG_USERSPACE is mapped to the already existing bool variable "userspace" which is propagated to the zcrypt proto implementations. * The ZCRYPT_XFLAG_NOMEMALLOC results in setting the AP flag AP_MSG_FLAG_MEMPOOL when the AP msg buffer is initialized. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-6-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/zcrypt: Avoid alloc and copy of ep11 targets if kernelspace cprbHarald Freudenberger
If there is a target list of APQNs given when an CPRB is to be send via zcrypt_send_ep11_cprb() there is always a kmalloc() done and the targets are copied via z_copy_from_user. As there are callers from kernel space (zcrypt_ep11misc.c) which signal this via the userspace parameter improve this code to directly use the given target list in case of kernelspace thus removing the unnecessary memory alloc and mem copy. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-5-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap: Introduce ap message buffer poolHarald Freudenberger
There is a need for a do-not-allocate-memory path through the AP bus layer. The pkey layer may be triggered via the in-kernel interface from a protected key crypto algorithm (namely PAES) to convert a secure key into a protected key. This happens in a workqueue context, so sleeping is allowed but memory allocations causing IO operations are not permitted. To accomplish this, an AP message memory pool with pre-allocated space is established. When ap_init_apmsg() with use_mempool set to true is called, instead of kmalloc() the ap message buffer is allocated from the ap_msg_pool. This pool only holds a limited amount of buffers: ap_msg_pool_min_items with the item size AP_DEFAULT_MAX_MSG_SIZE and exactly one of these items (if available) is returned if ap_init_apmsg() with the use_mempool arg set to true is called. When this pool is exhausted and use_mempool is set true, ap_init_apmsg() returns -ENOMEM without any attempt to allocate memory and the caller has to deal with that. Default values for this mempool of ap messages is: * Each buffer is 12KB (that is the default AP bus size and all the urgent messages should fit into this space). * Minimum items held in the pool is 8. This value is adjustable via module parameter ap.msgpool_min_items. The zcrypt layer may use this flag to indicate to the ap bus that the processing path for this message should not allocate memory but should use pre-allocated memory buffer instead. This is to prevent deadlocks with crypto and io for example with encrypted swap volumes. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-4-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap/zcrypt: Rework AP message buffer allocationHarald Freudenberger
Slight rework on the way how AP message buffers are allocated. Instead of having multiple places with kmalloc() calls all the AP message buffers are now allocated and freed on exactly one place: ap_init_apmsg() allocates the current AP bus max limit of ap_max_msg_size (defaults to 12KB). The AP message buffer is then freed in ap_release_apmsg(). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-3-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-04-30s390/ap: Move response_type struct into ap_msg structHarald Freudenberger
Move the very small response_type struct into struct ap_msg. So there is no need to kmalloc this tiny struct with each ap message preparation. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20250424133619.16495-2-freude@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>