summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-23Merge branch 'enable-rx-hw-timestamp-for-ptp-packets-using-cpts-fifo'Paolo Abeni
Chintan Vankar says: ==================== Enable RX HW timestamp for PTP packets using CPTS FIFO The CPSW offers two mechanisms for communicating packet ingress timestamp information to the host. The first mechanism is via the CPTS Event FIFO which records timestamp when triggered by certain events. One such event is the reception of an Ethernet packet with a specified EtherType field. This is used to capture ingress timestamps for PTP packets. With this mechanism the host must read the timestamp (from the CPTS FIFO) separately from the packet payload which is delivered via DMA. In the second mechanism of timestamping, CPSW driver enables hardware timestamping for all received packets by setting the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS module to timestamp all received packets, followed by passing timestamp via DMA descriptors. This mechanism is responsible for triggering errata i2401: "CPSW: Host Timestamps Cause CPSW Port to Lock up." The errata affects all K3 SoCs. Link to errata for AM64x: https://www.ti.com/lit/er/sprz457h/sprz457h.pdf As a workaround we can use first mechanism to timestamp received packets. ==================== Link: https://lore.kernel.org/r/20240419082626.57225-1-c-vankar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ethernet: ti: am65-cpsw/ethtool: Enable RX HW timestamp only for PTP ↵Chintan Vankar
packets In the current mechanism of timestamping, am65-cpsw-nuss driver enables hardware timestamping for all received packets by setting the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS module to timestamp all received packets, followed by passing timestamp via DMA descriptors. This mechanism causes CPSW Port to Lock up. To prevent port lock up, don't enable rx packet timestamping by setting TSTAMP_EN bit in CPTS_CONTROL register. The workaround for timestamping received packets is to utilize the CPTS Event FIFO that records timestamps corresponding to certain events. The CPTS module is configured to generate timestamps for Multicast Ethernet, UDP/IPv4 and UDP/IPv6 PTP packets. Update supported hwtstamp_rx_filters values for CPSW's timestamping capability. Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support") Signed-off-by: Chintan Vankar <c-vankar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using ↵Chintan Vankar
CPTS FIFO Add a new function "am65_cpts_rx_timestamp()" which checks for PTP packets from header and timestamps them. Add another function "am65_cpts_find_rx_ts()" which finds CPTS FIFO Event to get the timestamp of received PTP packet. Signed-off-by: Chintan Vankar <c-vankar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23ax25: Fix netdev refcount issueDuoming Zhou
The dev_tracker is added to ax25_cb in ax25_bind(). When the ax25 device is detaching, the dev_tracker of ax25_cb should be deallocated in ax25_kill_by_device() instead of the dev_tracker of ax25_dev. The log reported by ref_tracker is shown below: [ 80.884935] ref_tracker: reference already released. [ 80.885150] ref_tracker: allocated in: [ 80.885349] ax25_dev_device_up+0x105/0x540 [ 80.885730] ax25_device_event+0xa4/0x420 [ 80.885730] notifier_call_chain+0xc9/0x1e0 [ 80.885730] __dev_notify_flags+0x138/0x280 [ 80.885730] dev_change_flags+0xd7/0x180 [ 80.885730] dev_ifsioc+0x6a9/0xa30 [ 80.885730] dev_ioctl+0x4d8/0xd90 [ 80.885730] sock_do_ioctl+0x1c2/0x2d0 [ 80.885730] sock_ioctl+0x38b/0x4f0 [ 80.885730] __se_sys_ioctl+0xad/0xf0 [ 80.885730] do_syscall_64+0xc4/0x1b0 [ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 80.885730] ref_tracker: freed in: [ 80.885730] ax25_device_event+0x272/0x420 [ 80.885730] notifier_call_chain+0xc9/0x1e0 [ 80.885730] dev_close_many+0x272/0x370 [ 80.885730] unregister_netdevice_many_notify+0x3b5/0x1180 [ 80.885730] unregister_netdev+0xcf/0x120 [ 80.885730] sixpack_close+0x11f/0x1b0 [ 80.885730] tty_ldisc_kill+0xcb/0x190 [ 80.885730] tty_ldisc_hangup+0x338/0x3d0 [ 80.885730] __tty_hangup+0x504/0x740 [ 80.885730] tty_release+0x46e/0xd80 [ 80.885730] __fput+0x37f/0x770 [ 80.885730] __x64_sys_close+0x7b/0xb0 [ 80.885730] do_syscall_64+0xc4/0x1b0 [ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 80.893739] ------------[ cut here ]------------ [ 80.894030] WARNING: CPU: 2 PID: 140 at lib/ref_tracker.c:255 ref_tracker_free+0x47b/0x6b0 [ 80.894297] Modules linked in: [ 80.894929] CPU: 2 PID: 140 Comm: ax25_conn_rel_6 Not tainted 6.9.0-rc4-g8cd26fd90c1a #11 [ 80.895190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qem4 [ 80.895514] RIP: 0010:ref_tracker_free+0x47b/0x6b0 [ 80.895808] Code: 83 c5 18 4c 89 eb 48 c1 eb 03 8a 04 13 84 c0 0f 85 df 01 00 00 41 83 7d 00 00 75 4b 4c 89 ff 9 [ 80.896171] RSP: 0018:ffff888009edf8c0 EFLAGS: 00000286 [ 80.896339] RAX: 1ffff1100141ac00 RBX: 1ffff1100149463b RCX: dffffc0000000000 [ 80.896502] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff88800a0d6518 [ 80.896925] RBP: ffff888009edf9b0 R08: ffff88806d3288d3 R09: 1ffff1100da6511a [ 80.897212] R10: dffffc0000000000 R11: ffffed100da6511b R12: ffff88800a4a31d4 [ 80.897859] R13: ffff88800a4a31d8 R14: dffffc0000000000 R15: ffff88800a0d6518 [ 80.898279] FS: 00007fd88b7fe700(0000) GS:ffff88806d300000(0000) knlGS:0000000000000000 [ 80.899436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 80.900181] CR2: 00007fd88c001d48 CR3: 000000000993e000 CR4: 00000000000006f0 ... [ 80.935774] ref_tracker: sp%d@000000000bb9df3d has 1/1 users at [ 80.935774] ax25_bind+0x424/0x4e0 [ 80.935774] __sys_bind+0x1d9/0x270 [ 80.935774] __x64_sys_bind+0x75/0x80 [ 80.935774] do_syscall_64+0xc4/0x1b0 [ 80.935774] entry_SYSCALL_64_after_hwframe+0x67/0x6f Change ax25_dev->dev_tracker to the dev_tracker of ax25_cb in order to mitigate the bug. Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20240419020456.29826-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23wifi: ath12k: ACPI band edge channel power supportLingbo Kong
Currently, ath12k does not have the ability to set band edge channel power for WCN7850. In order to support this, ath12k gets band edge channel power table in ath12k_acpi_dsm_get_data() function and sets pdev_id and param_type_id, then finally sends these data and WMI_PDEV_SET_BIOS_INTERFACE_CMDID to firmware to set band edge channel power. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240422033054.979-5-quic_lingbok@quicinc.com
2024-04-23wifi: ath12k: ACPI CCA threshold supportLingbo Kong
Currently, ath12k does not have the ability to adjust Clear Channel Assessment (CCA) threshold values to meet the regulatory requirements. Get the values from ACPI and send them to the firmware using ath12k_wmi_set_bios_cmd() function. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240422033054.979-4-quic_lingbok@quicinc.com
2024-04-23wifi: ath12k: ACPI SAR supportLingbo Kong
In order to enable ACPI SAR (Specific Absorption Rate), ath12k gets SAR and GEO offset tables from ACPI and sends the data to firmware using WMI_PDEV_SET_BIOS_SAR_TABLE_CMDID and WMI_PDEV_SET_BIOS_GEO_TABLE_CMDID commands. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240422033054.979-3-quic_lingbok@quicinc.com
2024-04-23wifi: ath12k: ACPI TAS supportLingbo Kong
Currently, ath12k does not support Time-Average-SAR (TAS). In order to enable TAS read the tables from ACPI and send them to the firmware using WMI_PDEV_SET_BIOS_INTERFACE_CMDID command. Besides, ath12k registers an ACPI event callback so that ACPI can notify ath12k to get the updated SAR power table and sends it to the firmware when the device state is changed. ACPI is only enabled for WCN7850 using struct ath12k_hw_params::acpi_guid field. Most likely QCN9274 will never support ACPI as the chip is not used in laptops. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240422033054.979-2-quic_lingbok@quicinc.com
2024-04-23wifi: ath12k: change supports_suspend to true for WCN7850Baochen Qiang
Now that all things are ready, enable supports_suspend to make suspend/resume work for WCN7850. Don't touch other chips because they don't support suspend. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-11-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: support suspend/resumeBaochen Qiang
Now that all infrastructure is in place and ath12k is fixed to handle all the corner cases, power down the ath12k firmware during suspend and power it back up during resume. For suspend, two conditions needs to be satisfied: 1. since MHI channel unprepare would be done in late suspend stage, ath12k needs to get all QMI-dependent things done before that stage. 2. and because unprepare MHI channels requires a working MHI stack, ath12k is not allowed to call mhi_power_down() until that finishes. So the original suspend callback is separated into two parts: the first part handles all QMI-dependent things in suspend callback; while the second part powers down MHI in suspend_late callback. This is valid because kernel calls ath12k's suspend callback before calling all suspend_late callbacks, making the first condition satisfied. And because MHI devices are children of ath12k device (ab->dev), kernel guarantees that ath12k's suspend_late callback is called after QRTR's suspend_late callback, this satisfies the second condition. Above analysis also applies to resume process. so the original resume callback is separated into two parts: the first part powers up MHI stack in resume_early callback, this guarantees MHI stack is working when QRTR tries to prepare MHI channels (kernel calls QRTR's resume_early callback after ath12k's resume_early callback, due to the child-father relationship); the second part waits for the completion of restart, which would succeed since MHI channels are ready for use by QMI. Another notable change is in power down path, we tell mhi_power_down() to not to destroy MHI devices, making it possible for QRTR to help unprepare/prepare MHI channels, and finally get us rid of the potential probe-defer issue when resume. Also change related code due to interface changes. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-10-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: avoid stopping mac80211 queues in ath12k_core_restart()Baochen Qiang
Currently when resume ath12k_core_restart() calls ath12k_core_pre_reconfigure_recovery() where mac80211 queues are stopped by calling ieee80211_stop_queues(). Then in ath12k_mac_op_reconfig_complete() those queues are not started because ieee80211_wake_queues() is skipped due to the check on reconfig_type. The result is that mac80211 could not deliver any frame to ath12k to send out, finally making connection fail. [84473.104249] PM: suspend exit [84479.372397] wlan0: no VHT 160 MHz capability on 5 GHz, limiting to 80 MHz [84479.372401] wlan0: determined local STA to be EHT, BW limited to 80 MHz [84479.372416] wlan0: determined AP 00:03:7f:12:b7:b7 to be HE [84479.372420] wlan0: connecting with HE mode, max bandwidth 80 MHz [84479.580348] wlan0: authenticate with 00:03:7f:12:b7:b7 (local address=00:03:7f:37:11:53) [84479.580351] wlan0: send auth to 00:03:7f:12:b7:b7 (try 1/3) [84480.698993] wlan0: send auth to 00:03:7f:12:b7:b7 (try 2/3) [84481.816505] wlan0: send auth to 00:03:7f:12:b7:b7 (try 3/3) [84482.810966] wlan0: authentication with 00:03:7f:12:b7:b7 timed out Actually we don't need to stop/start queues during suspend/resume, so remove ath12k_core_pre_reconfigure_recovery() from ath12k_core_restart(). This won't cause any regression because currently the only chance ath12k_core_restart() gets called is in reset case, where ab->is_reset is set so that function will never be executed. Also remove ath12k_core_post_reconfigure_recovery() because it is not needed in suspend/resume case. This is also valid due to above analysis. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-9-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: no need to handle pktlog during suspend/resumeBaochen Qiang
Currently pktlog is stopped in suspend callback and started in resume callback, and in either scenarios it's basically to delete/modify ab->mon_reap_timer and to purge related rings. For WCN7850 it's pointless because pktlog is not enabled: both ab->mon_reap_timer and those rings are not initialized. So remove pktlog handling in suspend/resume callbacks. And further, remove these two functions and related callee because no one is calling them. Other chips are not affected because now only WCN7850 supports suspend/resume. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-8-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: flush all packets before suspendBaochen Qiang
In order to send out all packets before going to suspend, current code adds a 500ms delay as a workaround. It is a rough estimate and may not work. Fix this by checking packet counters, if counters become zero, then all packets are sent out or dropped. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-7-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: decrease MHI channel buffer length to 8KBBaochen Qiang
Currently buf_len field of ath12k_mhi_config_wcn7850 is assigned with 0, making MHI use a default size, 64KB, to allocate channel buffers. This is likely to fail in some scenarios where system memory is highly fragmented and memory compaction or reclaim is not allowed. For now we haven't get any failure report on this in ath12k, but there indeed is one such case in ath11k [1]. Actually those buffers are used only by QMI target -> host communication. And for WCN7850, the largest packet size for that is less than 6KB. So change buf_len field to 8KB, which results in order 1 allocation if page size is 4KB. In this way, we can at least save some memory, and as well as decrease the possibility of allocation failure in those scenarios. [1] https://lore.kernel.org/ath11k/96481a45-3547-4d23-ad34-3a8f1d90c1cd@suse.cz/ Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-6-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: fix warning on DMA ring capabilities eventBaochen Qiang
We are seeing below warning in both reset and suspend/resume scenarios: [ 4153.776040] ath12k_pci 0000:04:00.0: Already processed, so ignoring dma ring caps This is because ab->num_db_cap is not cleared in ath12k_wmi_free_dbring_caps(), so clear it to avoid such warnings. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-5-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: do not dump SRNG statistics during resumeBaochen Qiang
Both the firmware reset feature and the power management suspend/resume feature share common power-down and power-up functionality. One aspect of the power-up functionality is the handling of the ATH12K_QMI_EVENT_FW_INIT_DONE event. When this event is received, a call is made to ath12k_hal_dump_srng_stats(), with the purpose to collect information that may be useful in debugging the cause of a firmware reset. Unfortunately, since this functionality is shared between both the firmware reset path and the power management resume path, the kernel log is flooded with messages during resume. Since these messages are not useful during resume, and in fact can be confusing and can increase the time it takes to resume, update the logic to only call ath12k_hal_dump_srng_stats() during firmware reset. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-4-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: remove MHI LOOPBACK channelsBaochen Qiang
There is no driver to match these two channels, so remove them. This fixes warnings from MHI subsystem during suspend: mhi mhi0_LOOPBACK: 1: Failed to reset channel, still resetting mhi mhi0_LOOPBACK: 0: Failed to reset channel, still resetting Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-3-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: rearrange IRQ enable/disable in reset pathBaochen Qiang
For non-WoW suspend/resume, ath12k host powers down whole hardware when suspend and powers up it when resume, the code path it goes through is very like the ath12k reset logic. In order to reuse that logic, rearrange IRQ handling in the reset path. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240412060620.27519-2-quic_bqiang@quicinc.com
2024-04-23wifi: ath12k: fix kernel crash during resumeBaochen Qiang
Currently during resume, QMI target memory is not properly handled, resulting in kernel crash in case DMA remap is not supported: BUG: Bad page state in process kworker/u16:54 pfn:36e80 page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x36e80 page dumped because: nonzero _refcount Call Trace: bad_page free_page_is_bad_report __free_pages_ok __free_pages dma_direct_free dma_free_attrs ath12k_qmi_free_target_mem_chunk ath12k_qmi_msg_mem_request_cb The reason is: Once ath12k module is loaded, firmware sends memory request to host. In case DMA remap not supported, ath12k refuses the first request due to failure in allocating with large segment size: ath12k_pci 0000:04:00.0: qmi firmware request memory request ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 7077888 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 8454144 ath12k_pci 0000:04:00.0: qmi dma allocation failed (7077888 B type 1), will try later with small size ath12k_pci 0000:04:00.0: qmi delays mem_request 2 ath12k_pci 0000:04:00.0: qmi firmware request memory request Later firmware comes back with more but small segments and allocation succeeds: ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 262144 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 524288 ath12k_pci 0000:04:00.0: qmi mem seg type 4 size 65536 ath12k_pci 0000:04:00.0: qmi mem seg type 1 size 524288 Now ath12k is working. If suspend is triggered, firmware will be reloaded during resume. As same as before, firmware requests two large segments at first. In ath12k_qmi_msg_mem_request_cb() segment count and size are assigned: ab->qmi.mem_seg_count == 2 ab->qmi.target_mem[0].size == 7077888 ab->qmi.target_mem[1].size == 8454144 Then allocation failed like before and ath12k_qmi_free_target_mem_chunk() is called to free all allocated segments. Note the first segment is skipped because its v.addr is cleared due to allocation failure: chunk->v.addr = dma_alloc_coherent() Also note that this leaks that segment because it has not been freed. While freeing the second segment, a size of 8454144 is passed to dma_free_coherent(). However remember that this segment is allocated at the first time firmware is loaded, before suspend. So its real size is 524288, much smaller than 8454144. As a result kernel found we are freeing some memory which is in use and thus crashed. So one possible fix would be to free those segments during suspend. This works because with them freed, ath12k_qmi_free_target_mem_chunk() does nothing: all segment addresses are NULL so dma_free_coherent() is not called. But note that ath11k has similar logic but never hits this issue. Reviewing code there shows the luck comes from QMI memory reuse logic. So the decision is to port it to ath12k. Like in ath11k, the crash is avoided by adding prev_size to target_mem_chunk structure and caching real segment size in it, then prev_size instead of current size is passed to dma_free_coherent(), no unexpected memory is freed now. Also reuse m3 buffer. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240419034034.2842-1-quic_bqiang@quicinc.com
2024-04-23ASoC: rt715-sdca: volume step modificationJack Yu
Volume step (dB/step) modification to fix format error which shown in amixer control. Signed-off-by: Jack Yu <jack.yu@realtek.com> Link: https://lore.kernel.org/r/b1f546ad16dc4c7abb7daa7396e8345c@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-23Merge branch ↵Paolo Abeni
'read-phy-address-of-switch-from-device-tree-on-mt7530-dsa-subdriver' Arınç ÜNAL says: ==================== Read PHY address of switch from device tree on MT7530 DSA subdriver This patch series makes the driver read the PHY address the switch listens on from the device tree which, in result, brings support for MT7530 switches listening on a different PHY address than 31. And the patch series simplifies the core operations. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> ==================== Link: https://lore.kernel.org/r/20240418-b4-for-netnext-mt7530-phy-addr-from-dt-and-simplify-core-ops-v3-0-3b5fb249b004@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: dsa: mt7530: simplify core operationsArınç ÜNAL
The core_rmw() function calls core_read_mmd_indirect() to read the requested register, and then calls core_write_mmd_indirect() to write the requested value to the register. Because Clause 22 is used to access Clause 45 registers, some operations on core_write_mmd_indirect() are unnecessarily run. Get rid of core_read_mmd_indirect() and core_write_mmd_indirect(), and run only the necessary operations on core_write() and core_rmw(). Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: dsa: mt7530-mdio: read PHY address of switch from device treeArınç ÜNAL
Read the PHY address the switch listens on from the reg property of the switch node on the device tree. This change brings support for MT7530 switches on boards with such bootstrapping configuration where the switch listens on a different PHY address than the hardcoded PHY address on the driver, 31. As described on the "MT7621 Programming Guide v0.4" document, the MT7530 switch and its PHYs can be configured to listen on the range of 7-12, 15-20, 23-28, and 31 and 0-4 PHY addresses. There are operations where the switch PHY registers are used. For the PHY address of the control PHY, transform the MT753X_CTRL_PHY_ADDR constant into a macro and use it. The PHY address for the control PHY is 0 when the switch listens on 31. In any other case, it is one greater than the PHY address the switch listens on. Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23eeprom: at24: fix memory corruption race conditionDaniel Okazaki
If the eeprom is not accessible, an nvmem device will be registered, the read will fail, and the device will be torn down. If another driver accesses the nvmem device after the teardown, it will reference invalid memory. Move the failure point before registering the nvmem device. Signed-off-by: Daniel Okazaki <dtokazaki@google.com> Fixes: b20eb4c1f026 ("eeprom: at24: drop unnecessary label") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240422174337.2487142-1-dtokazaki@google.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-04-23netfs: Fix writethrough-mode error handlingDavid Howells
Fix the error return in netfs_perform_write() acting in writethrough-mode to return any cached error in the case that netfs_end_writethrough() returns 0. This can affect the use of O_SYNC/O_DSYNC/RWF_SYNC/RWF_DSYNC in 9p and afs. Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/6736.1713343639@warthog.procyon.org.uk Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23ntfs3: add legacy ntfs file operationsChristian Brauner
To ensure that ioctl()s can't be used to circumvent write restrictions. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23ntfs3: enforce read-only when used as legacy ntfs driverChristian Brauner
Ensure that ntfs3 is mounted read-only when it is used to provide the legacy ntfs driver. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-23powerpc/pseries/iommu: LPAR panics during boot up with a frozen PEGaurav Batra
At the time of LPAR boot up, partition firmware provides Open Firmware property ibm,dma-window for the PE. This property is provided on the PCI bus the PE is attached to. There are execptions where the partition firmware might not provide this property for the PE at the time of LPAR boot up. One of the scenario is where the firmware has frozen the PE due to some error condition. This PE is frozen for 24 hours or unless the whole system is reinitialized. Within this time frame, if the LPAR is booted, the frozen PE will be presented to the LPAR but ibm,dma-window property could be missing. Today, under these circumstances, the LPAR oopses with NULL pointer dereference, when configuring the PCI bus the PE is attached to. BUG: Kernel NULL pointer dereference on read at 0x000000c8 Faulting instruction address: 0xc0000000001024c0 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: Supported: Yes CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1 Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries NIP: c0000000001024c0 LR: c0000000001024b0 CTR: c000000000102450 REGS: c0000000037db5c0 TRAP: 0300 Not tainted (6.4.0-150600.9-default) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000822 XER: 00000000 CFAR: c00000000010254c DAR: 00000000000000c8 DSISR: 00080000 IRQMASK: 0 ... NIP [c0000000001024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0 LR [c0000000001024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 Call Trace: pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable) pcibios_setup_bus_self+0x1c0/0x370 __of_scan_bus+0x2f8/0x330 pcibios_scan_phb+0x280/0x3d0 pcibios_init+0x88/0x12c do_one_initcall+0x60/0x320 kernel_init_freeable+0x344/0x3e4 kernel_init+0x34/0x1d0 ret_from_kernel_user_thread+0x14/0x1c Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window") Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240422205141.10662-1-gbatra@linux.ibm.com
2024-04-23regulator: change stubbed devm_regulator_get_enable to return OkMatti Vaittinen
The devm_regulator_get_enable() should be a 'call and forget' API, meaning, when it is used to enable the regulators, the API does not provide a handle to do any further control of the regulators. It gives no real benefit to return an error from the stub if CONFIG_REGULATOR is not set. On the contrary, returning and error is causing problems to drivers when hardware is such it works out just fine with no regulator control. Returning an error forces drivers to specifically handle the case where CONFIG_REGULATOR is not set, making the mere existence of the stub questionalble. Furthermore, the stub of the regulator_enable() seems to be returning Ok. Change the stub implementation for the devm_regulator_get_enable() to return Ok so drivers do not separately handle the case where the CONFIG_REGULATOR is not set. Signed-off-by: Matti Vaittinen <mazziesaccount@gmail.com> Reported-by: Aleksander Mazur <deweloper@wp.pl> Suggested-by: Guenter Roeck <linux@roeck-us.net> Fixes: da279e6965b3 ("regulator: Add devm helpers for get and enable") Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/ZiYF6d1V1vSPcsJS@drtxq0yyyyyyyyyyyyyby-3.rev.dnainternet.fi Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-23ASoC: cs35l56: Avoid static analysis warning of uninitialised variableSimon Trimmer
Static checkers complain that the silicon_uid variable passed by pointer to cs35l56_read_silicon_uid() could later be used uninitialised when calling cs_amp_get_efi_calibration_data(). cs35l56_read_silicon_uid() must have succeeded to call cs_amp_get_efi_calibration_data() and that would have populated the variable. However, initialise the value so we are not haunted by it forevermore. Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com> Fixes: e1830f66f6c6 ("ASoC: cs35l56: Add helper functions for amp calibration") Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://lore.kernel.org/r/20240422103211.236063-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-22net: ethernet: mtk_eth_soc: flower: validate control flagsAsbjørn Sloth Tønnesen
This driver currently doesn't support any control flags. Use flow_rule_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418161821.189263-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22dpaa2-switch: flower: validate control flagsAsbjørn Sloth Tønnesen
This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20240418161802.189247-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22cxgb4: flower: validate control flagsAsbjørn Sloth Tønnesen
This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Only compile tested, no hardware available. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418161751.189226-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: openvswitch: Check vport netdev nameJun Gu
Ensure that the provided netdev name is not one of its aliases to prevent unnecessary creation and destruction of the vport by ovs-vswitchd. Signed-off-by: Jun Gu <jun.gu@easystack.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20240419061425.132723-1-jun.gu@easystack.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge branch 'netlink-add-nftables-spec-w-multi-messages'Jakub Kicinski
Donald Hunter says: ==================== netlink: Add nftables spec w/ multi messages This series adds a ynl spec for nftables and extends ynl with a --multi command line option that makes it possible to send transactional batches for nftables. This series includes a patch for nfnetlink which adds ACK processing for batch begin/end messages. If you'd prefer that to be sent separately to nf-next then I can do so, but I included it here so that it gets seen in context. An example of usage is: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi batch-begin '{"res-id": 10}' \ --multi newtable '{"name": "test", "nfgen-family": 1}' \ --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --multi batch-end '{"res-id": 10}' [None, None, None, None] It can also be used for bundling get requests: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi gettable '{"name": "test", "nfgen-family": 1}' \ --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --output-json [{"name": "test", "use": 1, "handle": 1, "flags": [], "nfgen-family": 1, "version": 0, "res-id": 2}, {"table": "test", "name": "chain", "handle": 1, "use": 0, "nfgen-family": 1, "version": 0, "res-id": 2}] There are 2 issues that may be worth resolving: - ynl reports errors by raising an NlError exception so only the first error gets reported. This could be changed to add errors to the list of responses so that multiple errors could be reported. - If any message does not get a response (e.g. batch-begin w/o patch 2) then ynl waits indefinitely. A recv timeout could be added which would allow ynl to terminate. ==================== Link: https://lore.kernel.org/r/20240418104737.77914-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22netfilter: nfnetlink: Handle ACK flags for batch messagesDonald Hunter
The NLM_F_ACK flag is ignored for nfnetlink batch begin and end messages. This is a problem for ynl which wants to receive an ack for every message it sends, not just the commands in between the begin/end messages. Add processing for ACKs for begin/end messages and provide responses when requested. I have checked that iproute2, pyroute2 and systemd are unaffected by this change since none of them use NLM_F_ACK for batch begin/end. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tools/net/ynl: Add multi message support to ynlDonald Hunter
Add a "--multi <do-op> <json>" command line to ynl that makes it possible to add several operations to a single netlink request payload. The --multi command line option is repeated for each operation. This is used by the nftables family for transaction batches. For example: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi batch-begin '{"res-id": 10}' \ --multi newtable '{"name": "test", "nfgen-family": 1}' \ --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --multi batch-end '{"res-id": 10}' [None, None, None, None] It can also be used for bundling get requests: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi gettable '{"name": "test", "nfgen-family": 1}' \ --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --output-json [{"name": "test", "use": 1, "handle": 1, "flags": [], "nfgen-family": 1, "version": 0, "res-id": 2}, {"table": "test", "name": "chain", "handle": 1, "use": 0, "nfgen-family": 1, "version": 0, "res-id": 2}] Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tools/net/ynl: Fix extack decoding for directional opsDonald Hunter
NetlinkProtocol.decode() was looking up ops by response value which breaks when it is used for extack decoding of directional ops. Instead, pass the op to decode(). Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22doc/netlink/specs: Add draft nftables specDonald Hunter
Add a spec for nftables that has nearly complete coverage of the ops, but limited coverage of rule types and subexpressions. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge branch 'for-uring-ubufops' into HEADJakub Kicinski
Pavel Begunkov says: ==================== implement io_uring notification (ubuf_info) stacking (net part) To have per request buffer notifications each zerocopy io_uring send request allocates a new ubuf_info. However, as an skb can carry only one uarg, it may force the stack to create many small skbs hurting performance in many ways. The patchset implements notification, i.e. an io_uring's ubuf_info extension, stacking. It attempts to link ubuf_info's into a list, allowing to have multiple of them per skb. liburing/examples/send-zerocopy shows up 6 times performance improvement for TCP with 4KB bytes per send, and levels it with MSG_ZEROCOPY. Without the patchset it requires much larger sends to utilise all potential. bytes | before | after (Kqps) 1200 | 195 | 1023 4000 | 193 | 1386 8000 | 154 | 1058 ==================== Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Five ksmbd server fixes, most also for stable: - rename fix - two fixes for potential out of bounds - fix for connections from MacOS (padding in close response) - fix for when to enable persistent handles" * tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: add continuous availability share parameter ksmbd: common: use struct_group_attr instead of struct_group for network_open_info ksmbd: clear RENAME_NOREPLACE before calling vfs_rename ksmbd: validate request buffer size in smb2_allocate_rsp_buf() ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
2024-04-22net: add callback for setting a ubuf_info to skbPavel Begunkov
At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io_uring. Add a callback for assigning ubuf_info to skb, this way we will implement smarter assignment later like linking ubuf_info together. Note, it's an optional callback, which should be compatible with skb_zcopy_set(), that's because the net stack might potentially decide to clone an skb and take another reference to ubuf_info whenever it wishes. Also, a correct implementation should always be able to bind to an skb without prior ubuf_info, otherwise we could end up in a situation when the send would not be able to progress. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: extend ubuf_info callback to ops structurePavel Begunkov
We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring notification management introduced in next patches, it can be used to generalise msg_zerocopy_put_abort() and also store ->sg_from_iter, which is currently passed in struct msghdr. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-23kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICEMiguel Ojeda
When KUnit tests are enabled, under very big kernel configurations (e.g. `allyesconfig`), we can trigger a `rustdoc` ICE [1]: RUSTDOC TK rust/kernel/lib.rs error: the compiler unexpectedly panicked. this is a bug. The reason is that this build step has a duplicated `@rustc_cfg` argument, which contains the kernel configuration, and thus a lot of arguments. The factor 2 happens to be enough to reach the ICE. Thus remove the unneeded `@rustc_cfg`. By doing so, we clean up the command and workaround the ICE. The ICE has been fixed in the upcoming Rust 1.79 [2]. Cc: stable@vger.kernel.org Fixes: a66d733da801 ("rust: support running Rust documentation tests as KUnit ones") Link: https://github.com/rust-lang/rust/issues/122722 [1] Link: https://github.com/rust-lang/rust/pull/122840 [2] Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240422091215.526688-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-04-23rust: kernel: require `Send` for `Module` implementationsWedson Almeida Filho
The thread that calls the module initialisation code when a module is loaded is not guaranteed [in fact, it is unlikely] to be the same one that calls the module cleanup code on module unload, therefore, `Module` implementations must be `Send` to account for them moving from one thread to another implicitly. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Cc: stable@vger.kernel.org # 6.8.x: df70d04d5697: rust: phy: implement `Send` for `Registration` Cc: stable@vger.kernel.org Fixes: 247b365dc8dc ("rust: add `kernel` crate") Link: https://lore.kernel.org/r/20240328195457.225001-3-wedsonaf@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-04-23rust: phy: implement `Send` for `Registration`Wedson Almeida Filho
In preparation for requiring `Send` for `Module` implementations in the next patch. Cc: FUJITA Tomonori <fujita.tomonori@gmail.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: netdev@vger.kernel.org Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240328195457.225001-2-wedsonaf@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-04-22Merge branch 'tcp-avoid-sending-too-small-packets'Jakub Kicinski
Eric Dumazet says: ==================== tcp: avoid sending too small packets tcp_sendmsg() cooks 'large' skbs, that are later split if needed from tcp_write_xmit(). After a split, the leftover skb size is smaller than the optimal size, and this causes a performance drop. In this series, tcp_grow_skb() helper is added to shift payload from the second skb in the write queue to the first skb to always send optimal sized skbs. This increases TSO efficiency, and decreases number of ACK packets. ==================== Link: https://lore.kernel.org/r/20240418214600.1291486-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tcp: try to send bigger TSO packetsEric Dumazet
While investigating TCP performance, I found that TCP would sometimes send big skbs followed by a single MSS skb, in a 'locked' pattern. For instance, BIG TCP is enabled, MSS is set to have 4096 bytes of payload per segment. gso_max_size is set to 181000. This means that an optimal TCP packet size should contain 44 * 4096 = 180224 bytes of payload, However, I was seeing packets sizes interleaved in this pattern: 172032, 8192, 172032, 8192, 172032, 8192, <repeat> tcp_tso_should_defer() heuristic is defeated, because after a split of a packet in write queue for whatever reason (this might be a too small CWND or a small enough pacing_rate), the leftover packet in the queue is smaller than the optimal size. It is time to try to make 'leftover packets' bigger so that tcp_tso_should_defer() can give its full potential. After this patch, we can see the following output: 14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980 14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980 14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980 14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980 14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980 14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980 14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980 14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980 14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980 14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980 14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980 With 200 concurrent flows on a 100Gbit NIC, we can see a reduction of TSO packets (and ACK packets) of about 30 %. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tcp: call tcp_set_skb_tso_segs() from tcp_write_xmit()Eric Dumazet
tcp_write_xmit() calls tcp_init_tso_segs() to set gso_size and gso_segs on the packet. tcp_init_tso_segs() requires the stack to maintain an up to date tcp_skb_pcount(), and this makes sense for packets in rtx queue. Not so much for packets still in the write queue. In the following patch, we don't want to deal with tcp_skb_pcount() when moving payload from 2nd skb to 1st skb in the write queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tcp: remove dubious FIN exception from tcp_cwnd_test()Eric Dumazet
tcp_cwnd_test() has a special handing for the last packet in the write queue if it is smaller than one MSS and has the FIN flag. This is in violation of TCP RFC, and seems quite dubious. This packet can be sent only if the current CWND is bigger than the number of packets in flight. Making tcp_cwnd_test() result independent of the first skb in the write queue is needed for the last patch of the series. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>