summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-26i3c: mipi-i3c-hci: Error out instead on BUG_ON() in IBI DMA setupJarkko Nikula
Definitely condition dma_get_cache_alignment * defined value > 256 during driver initialization is not reason to BUG_ON(). Turn that to graceful error out with -EINVAL. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20240628131559.502822-3-jarkko.nikula@linux.intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26i3c: mipi-i3c-hci: Set IBI Status and Data Ring base addressesJarkko Nikula
IBI Status and Data Ring base address registers are not set so HW obviously cannot update those rings after In-Band Interrupt. Set them to already allocated and mapped ring addresses. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20240628131559.502822-2-jarkko.nikula@linux.intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26i3c: mipi-i3c-hci: Switch to lower_32_bits()/upper_32_bits() helpersJarkko Nikula
Rather than having own lo32()/hi32() helpers for dealing with 32-bit and 64-bit build targets switch to generic lower_32_bits()/upper_32_bits() helpers. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Link: https://lore.kernel.org/r/20240628131559.502822-1-jarkko.nikula@linux.intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26i3c: dw: Remove ibi_capable propertyAniket
Since DW I3C IP master role always supports IBI, we don't need to keep two variants of master ops and select one using this property. Hence remove the code. Signed-off-by: Aniket <aniketmaurya@google.com> Link: https://lore.kernel.org/r/20240627034119.3938050-1-aniketmaurya@google.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26i3c: dw: Fix IBI intr programmingAniket
IBI_SIR_REQ_REJECT register is not present if the IP has IC_HAS_IBI_DATA = 1 set. So don't rely on doing read- modify-write op on this register. Instead maintain a variable to store the sir reject mask and use it to set IBI_SIR_REQ_REJECT. Signed-off-by: Aniket <aniketmaurya@google.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26i3c: dw: Fix clearing queue thldAniket
QUEUE_THLD_CTRL_IBI_STAT_MASK is repeated twice. Replace with QUEUE_THLD_CTRL_IBI_DATA_MASK. Signed-off-by: Aniket <aniketmaurya@google.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26i3c: mipi-i3c-hci: Fix number of DAT/DCT entries for HCI versions < 1.1Jarkko Nikula
I was wrong about the TABLE_SIZE field description in the commit 0676bfebf576 ("i3c: mipi-i3c-hci: Fix DAT/DCT entry sizes"). For the MIPI I3C HCI versions 1.0 and earlier the TABLE_SIZE field in the registers DAT_SECTION_OFFSET and DCT_SECTION_OFFSET is indeed defined in DWORDs and not number of entries like it is defined in later versions. Where above fix allowed driver initialization to continue the wrongly interpreted TABLE_SIZE field leads variables DAT_entries being twice and DCT_entries four times as big as they really are. That in turn leads clearing the DAT table over the boundary in the dat_v1.c: hci_dat_v1_init(). So interprete the TABLE_SIZE field in DWORDs for HCI versions < 1.1 and fix number of DAT/DCT entries accordingly. Fixes: 0676bfebf576 ("i3c: mipi-i3c-hci: Fix DAT/DCT entry sizes") Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26i3c: master: svc: resend target address when get NACKFrank Li
According to I3C Spec 1.1.1, 11-Jun-2021, section: 5.1.2.2.3: If the Controller chooses to start an I3C Message with an I3C Dynamic Address, then special provisions shall be made because that same I3C Target may be initiating an IBI or a Controller Role Request. So, one of three things may happen: (skip 1, 2) 3. The Addresses match and the RnW bits also match, and so neither Controller nor Target will ACK since both are expecting the other side to provide ACK. As a result, each side might think it had "won" arbitration, but neither side would continue, as each would subsequently see that the other did not provide ACK. ... For either value of RnW: Due to the NACK, the Controller shall defer the Private Write or Private Read, and should typically transmit the Target ^^^^^^^^^^^^^^^^^^^ Address again after a Repeated START (i.e., the next one or any one prior ^^^^^^^^^^^^^ to a STOP in the Frame). Since the Address Header following a Repeated START is not arbitrated, the Controller will always win (see Section 5.1.2.2.4). Resend target address again if address is not 7E and controller get NACK. Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-07-26erofs: convert comma to semicolonChen Ni
Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240724020721.2389738-1-nichen@iscas.ac.cn Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-07-26erofs: support multi-page folios for erofs_bread()Gao Xiang
If the requested page is part of the previous multi-page folio, there is no need to call read_mapping_folio() again. Also, get rid of the remaining one of page->index [1] in our codebase. [1] https://lore.kernel.org/r/Zp8fgUSIBGQ1TN0D@casper.infradead.org Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240723073024.875290-1-hsiangkao@linux.alibaba.com
2024-07-26erofs: add support for FS_IOC_GETFSSYSFSPATHHuang Xiaojia
FS_IOC_GETFSSYSFSPATH ioctl exposes /sys/fs path of a given filesystem, potentially standarizing sysfs reporting. This patch add support for FS_IOC_GETFSSYSFSPATH for erofs, "erofs/<dev>" will be outputted for bdev cases, "erofs/[domain_id,]<fs_id>" will be outputted for fscache cases. Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Link: https://lore.kernel.org/r/20240720082335.441563-1-huangxiaojia2@huawei.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-07-26erofs: fix race in z_erofs_get_gbuf()Gao Xiang
In z_erofs_get_gbuf(), the current task may be migrated to another CPU between `z_erofs_gbuf_id()` and `spin_lock(&gbuf->lock)`. Therefore, z_erofs_put_gbuf() will trigger the following issue which was found by stress test: <2>[772156.434168] kernel BUG at fs/erofs/zutil.c:58! .. <4>[772156.435007] <4>[772156.439237] CPU: 0 PID: 3078 Comm: stress Kdump: loaded Tainted: G E 6.10.0-rc7+ #2 <4>[772156.439239] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017 <4>[772156.439241] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) <4>[772156.439243] pc : z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.439252] lr : z_erofs_lz4_decompress+0x600/0x6a0 [erofs] .. <6>[772156.445958] stress (3127): drop_caches: 1 <4>[772156.446120] Call trace: <4>[772156.446121] z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.446761] z_erofs_lz4_decompress+0x600/0x6a0 [erofs] <4>[772156.446897] z_erofs_decompress_queue+0x740/0xa10 [erofs] <4>[772156.447036] z_erofs_runqueue+0x428/0x8c0 [erofs] <4>[772156.447160] z_erofs_readahead+0x224/0x390 [erofs] .. Fixes: f36f3010f676 ("erofs: rename per-CPU buffers to global buffer pool and make it configurable") Cc: <stable@vger.kernel.org> # 6.10+ Reviewed-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240722035110.3456740-1-hsiangkao@linux.alibaba.com
2024-07-26erofs: support STATX_DIOALIGNHongbo Li
Add support for STATX_DIOALIGN to EROFS, so that direct I/O alignment restrictions are exposed to userspace in a generic way. [Before] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:0 dio offset align:0 ``` [After] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:512 dio offset align:512 ``` Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240718083243.2485437-1-hsiangkao@linux.alibaba.com
2024-07-26Merge tag 'amd-drm-fixes-6.11-2024-07-25' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-fixes-6.11-2024-07-25: amdgpu: - SDMA 5.2 workaround - GFX12 fixes - Uninitialized variable fix - VCN/JPEG 4.0.3 fixes - Misc display fixes - RAS fixes - VCN4/5 harvest fix - GPU reset fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240725202900.2155572-1-alexander.deucher@amd.com
2024-07-26Merge tag 'drm-misc-next-fixes-2024-07-25' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next A single fix for a panel compatible Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240725-frisky-wren-of-tact-f5f504@houat
2024-07-26Merge tag 'drm-intel-next-fixes-2024-07-25' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next - Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote) - Allow NULL memory region (Jonathan Cavitt) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZqICQzyzm/6hDWy4@linux
2024-07-25Merge tag 'net-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. A lot of networking people were at a conference last week, busy catching COVID, so relatively short PR. Current release - regressions: - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP Current release - new code bugs: - l2tp: protect session IDR and tunnel session list with one lock, make sure the state is coherent to avoid a warning - eth: bnxt_en: update xdp_rxq_info in queue restart logic - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field Previous releases - regressions: - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len, the field reuses previously un-validated pad Previous releases - always broken: - tap/tun: drop short frames to prevent crashes later in the stack - eth: ice: add a per-VF limit on number of FDIR filters - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash" * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) tun: add missing verification for short frame tap: add missing verification for short frame mISDN: Fix a use after free in hfcmulti_tx() gve: Fix an edge case for TSO skb validity check bnxt_en: update xdp_rxq_info in queue restart logic tcp: process the 3rd ACK with sk_socket for TFO/MPTCP selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling MAINTAINERS: make Breno the netconsole maintainer MAINTAINERS: Update bonding entry net: nexthop: Initialize all fields in dumped nexthops net: stmmac: Correct byte order of perfect_match selftests: forwarding: skip if kernel not support setting bridge fdb learning limit tipc: Return non-zero value from tipc_udp_addr2str() on error netfilter: nft_set_pipapo_avx2: disable softinterrupts ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters net: bonding: correctly annotate RCU in bond_should_notify_peers() ...
2024-07-25Merge tag 'printk-for-6.11-trivial' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - trivial printk changes The bigger "real" printk work is still being discussed. * tag 'printk-for-6.11-trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: vsprintf: add missing MODULE_DESCRIPTION() macro printk: Rename console_replay_all() and update context
2024-07-25Merge tag 'constfy-sysctl-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl constification from Joel Granados: "Treewide constification of the ctl_table argument of proc_handlers using a coccinelle script and some manual code formatting fixups. This is a prerequisite to moving the static ctl_table structs into read-only data section which will ensure that proc_handler function pointers cannot be modified" * tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: treewide: constify the ctl_table argument of proc_handlers
2024-07-25Merge tag 'efi-fixes-for-v6.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Wipe screen_info after allocating it from the heap - used by arm32 and EFI zboot, other EFI architectures allocate it statically - Revert to allocating boot_params from the heap on x86 when entering via the native PE entrypoint, to work around a regression on older Dell hardware * tag 'efi-fixes-for-v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Revert to heap allocated boot_params for PE entrypoint efi/libstub: Zero initialize heap allocated struct screen_info
2024-07-25Merge tag 'kgdb-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Three small changes this cycle: - Clean up an architecture abstraction that is no longer needed because all the architectures have converged. - Actually use the prompt argument to kdb_position_cursor() instead of ignoring it (functionally this fix is a nop but that was due to luck rather than good judgement) - Fix a -Wformat-security warning" * tag 'kgdb-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Get rid of redundant kdb_curr_task() kdb: Use the passed prompt in kdb_position_cursor() kdb: address -Wformat-security warnings
2024-07-25Merge tag 'mips_6.11_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS updates from Thomas Bogendoerfer: - Use improved timer sync for Loongson64 - Fix address of GCR_ACCESS register - Add missing MODULE_DESCRIPTION * tag 'mips_6.11_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: mips: sibyte: add missing MODULE_DESCRIPTION() macro MIPS: SMP-CPS: Fix address for GCR_ACCESS register for CM3 and later MIPS: Loongson64: Switch to SYNC_R4K
2024-07-25Merge tag 'parisc-for-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "The gettimeofday() and clock_gettime() syscalls are now available as vDSO functions, and Dave added a patch which allows to use NVMe cards in the PCI slots as fast and easy alternative to SCSI discs. Summary: - add gettimeofday() and clock_gettime() vDSO functions - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor with PCIe NVME card to function in parisc machines - allow users to reduce kernel unaligned runtime warnings - minor code cleanups" * tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN parisc: Use max() to calculate parisc_tlb_flush_threshold parisc: Fix warning at drivers/pci/msi/msi.h:121 parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions parisc: Clean up unistd.h file
2024-07-25Merge tag 'uml-for-linus-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits) um: vector: always reset vp->opened um: vector: remove vp->lock um: register power-off handler um: line: always fill *error_out in setup_one_line() um: remove pcap driver from documentation um: Enable preemption in UML um: refactor TLB update handling um: simplify and consolidate TLB updates um: remove force_flush_all from fork_handler um: Do not flush MM in flush_thread um: Delay flushing syscalls until the thread is restarted um: remove copy_context_skas0 um: remove LDT support um: compress memory related stub syscalls while adding them um: Rework syscall handling um: Add generic stub_syscall6 function um: Create signal stack memory assignment in stub_data um: Remove stub-data.h include from common-offsets.h um: time-travel: fix signal blocking race/hang um: time-travel: remove time_exit() ...
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-25Merge tag 'linux-watchdog-6.11-rc1' of ↵Linus Torvalds
git://www.linux-watchdog.org/linux-watchdog Pull watchdog updates from Wim Van Sebroeck: - make watchdog_class const - rework of the rzg2l_wdt driver - other small fixes and improvements * tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/linux-watchdog: dt-bindings: watchdog: dlg,da9062-watchdog: Drop blank space watchdog: rzn1: Convert comma to semicolon watchdog: lenovo_se10_wdt: Convert comma to semicolon dt-bindings: watchdog: renesas,wdt: Document RZ/G3S support watchdog: rzg2l_wdt: Add suspend/resume support watchdog: rzg2l_wdt: Rely on the reset driver for doing proper reset watchdog: rzg2l_wdt: Remove comparison with zero watchdog: rzg2l_wdt: Remove reset de-assert from probe watchdog: rzg2l_wdt: Check return status of pm_runtime_put() watchdog: rzg2l_wdt: Use pm_runtime_resume_and_get() watchdog: rzg2l_wdt: Make the driver depend on PM watchdog: rzg2l_wdt: Restrict the driver to ARCH_RZG2L and ARCH_R9A09G011 watchdog: imx7ulp_wdt: keep already running watchdog enabled watchdog: starfive: Add missing clk_disable_unprepare() watchdog: Make watchdog_class const
2024-07-25Merge tag 'dma-mapping-6.11-2024-07-24' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - fix the order of actions in dmam_free_coherent (Lance Richardson) * tag 'dma-mapping-6.11-2024-07-24' of git://git.infradead.org/users/hch/dma-mapping: dma: fix call order in dmam_free_coherent
2024-07-25Merge tag 'asoc-fix-v6.11-merge-window' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.11 A selection of routine fixes and quirks that came in since the merge window. The fsl-asoc-card change is a fix for systems with multiple cards where updating templates in place leaks data from one card to another.
2024-07-25Merge branch 'tap-tun-harden-by-dropping-short-frame'Jakub Kicinski
Dongli Zhang says: ==================== tap/tun: harden by dropping short frame This is to harden all of tap/tun to avoid any short frame smaller than the Ethernet header (ETH_HLEN). While the xen-netback already rejects short frame smaller than ETH_HLEN ... 914 static void xenvif_tx_build_gops(struct xenvif_queue *queue, 915 int budget, 916 unsigned *copy_ops, 917 unsigned *map_ops) 918 { ... ... 1007 if (unlikely(txreq.size < ETH_HLEN)) { 1008 netdev_dbg(queue->vif->dev, 1009 "Bad packet size: %d\n", txreq.size); 1010 xenvif_tx_err(queue, &txreq, extra_count, idx); 1011 break; 1012 } ... the short frame may not be dropped by vhost-net/tap/tun. This fixes CVE-2024-41090 and CVE-2024-41091. ==================== Link: https://patch.msgid.link/20240724170452.16837-1-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25tun: add missing verification for short frameDongli Zhang
The cited commit missed to check against the validity of the frame length in the tun_xdp_one() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tun_xdp_one-->eth_type_trans() may access the Ethernet header although it can be less than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tun_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted for IFF_TAP. This is to drop any frame shorter than the Ethernet header size just like how tun_get_user() does. CVE: CVE-2024-41091 Inspired-by: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240724170452.16837-3-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25tap: add missing verification for short frameSi-Wei Liu
The cited commit missed to check against the validity of the frame length in the tap_get_user_xdp() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tap_get_user_xdp()-->skb_set_network_header() may assume the size is more than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tap_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted. This is to drop any frame shorter than the Ethernet header size just like how tap_get_user() does. CVE: CVE-2024-41090 Link: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 0efac27791ee ("tap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240724170452.16837-2-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25mISDN: Fix a use after free in hfcmulti_tx()Dan Carpenter
Don't dereference *sp after calling dev_kfree_skb(*sp). Fixes: af69fb3a8ffa ("Add mISDN HFC multiport driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/8be65f5a-c2dd-4ba0-8a10-bfe5980b8cfb@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25gve: Fix an edge case for TSO skb validity checkBailey Forrest
The NIC requires each TSO segment to not span more than 10 descriptors. NIC further requires each descriptor to not exceed 16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO). The descriptors for an skb are generated by gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format. gve_tx_add_skb_no_copy_dqo() loops through each skb frag and generates a descriptor for the entire frag if the frag size is not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s) of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO). gve_can_send_tso() checks if the descriptors thus generated for an skb would meet the requirement that each TSO-segment not span more than 10 descriptors. However, the current code misses an edge case when a TSO segment spans multiple descriptors within a large frag. This change fixes the edge case. gve_can_send_tso() relies on the assumption that max gso size (9728) is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb fragment a TSO segment can never span more than 2 descriptors. Fixes: a57e5de476be ("gve: DQO: Add TX path") Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25bnxt_en: update xdp_rxq_info in queue restart logicTaehee Yoo
When the netdev_rx_queue_restart() restarts queues, the bnxt_en driver updates(creates and deletes) a page_pool. But it doesn't update xdp_rxq_info, so the xdp_rxq_info is still connected to an old page_pool. So, bnxt_rx_ring_info->page_pool indicates a new page_pool, but bnxt_rx_ring_info->xdp_rxq is still connected to an old page_pool. An old page_pool is no longer used so it is supposed to be deleted by page_pool_destroy() but it isn't. Because the xdp_rxq_info is holding the reference count for it and the xdp_rxq_info is not updated, an old page_pool will not be deleted in the queue restart logic. Before restarting 1 queue: ./tools/net/ynl/samples/page-pool enp10s0f1np1[6] page pools: 4 (zombies: 0) refs: 8192 bytes: 33554432 (refs: 0 bytes: 0) recycling: 0.0% (alloc: 128:8048 recycle: 0:0) After restarting 1 queue: ./tools/net/ynl/samples/page-pool enp10s0f1np1[6] page pools: 5 (zombies: 0) refs: 10240 bytes: 41943040 (refs: 0 bytes: 0) recycling: 20.0% (alloc: 160:10080 recycle: 1920:128) Before restarting queues, an interface has 4 page_pools. After restarting one queue, an interface has 5 page_pools, but it should be 4, not 5. The reason is that queue restarting logic creates a new page_pool and an old page_pool is not deleted due to the absence of an update of xdp_rxq_info logic. Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: David Wei <dw@davidwei.uk> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20240721053554.1233549-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25io_uring/msg_ring: fix uninitialized use of target_req->flagsJens Axboe
syzbot reports that KMSAN complains that 'nr_tw' is an uninit-value with the following report: BUG: KMSAN: uninit-value in io_req_local_work_add io_uring/io_uring.c:1192 [inline] BUG: KMSAN: uninit-value in io_req_task_work_add_remote+0x588/0x5d0 io_uring/io_uring.c:1240 io_req_local_work_add io_uring/io_uring.c:1192 [inline] io_req_task_work_add_remote+0x588/0x5d0 io_uring/io_uring.c:1240 io_msg_remote_post io_uring/msg_ring.c:102 [inline] io_msg_data_remote io_uring/msg_ring.c:133 [inline] io_msg_ring_data io_uring/msg_ring.c:152 [inline] io_msg_ring+0x1c38/0x1ef0 io_uring/msg_ring.c:305 io_issue_sqe+0x383/0x22c0 io_uring/io_uring.c:1710 io_queue_sqe io_uring/io_uring.c:1924 [inline] io_submit_sqe io_uring/io_uring.c:2180 [inline] io_submit_sqes+0x1259/0x2f20 io_uring/io_uring.c:2295 __do_sys_io_uring_enter io_uring/io_uring.c:3205 [inline] __se_sys_io_uring_enter+0x40c/0x3ca0 io_uring/io_uring.c:3142 __x64_sys_io_uring_enter+0x11f/0x1a0 io_uring/io_uring.c:3142 x64_sys_call+0x2d82/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:427 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f which is the following check: if (nr_tw < nr_wait) return; in io_req_local_work_add(). While nr_tw itself cannot be uninitialized, it does depend on req->flags, which off the msg ring issue path can indeed be uninitialized. Fix this by always clearing the allocated 'req' fully if we can't grab one from the cache itself. Fixes: 50cf5f3842af ("io_uring/msg_ring: add an alloc cache for io_kiocb entries") Reported-by: syzbot+82609b8937a4458106ca@syzkaller.appspotmail.com Link: https://lore.kernel.org/io-uring/000000000000fd3d8d061dfc0e4a@google.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-25Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-25 We've added 14 non-merge commits during the last 8 day(s) which contain a total of 19 files changed, 177 insertions(+), 70 deletions(-). The main changes are: 1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and BPF sockhash. Also add test coverage for this case, from Michal Luczaj. 2) Fix a segmentation issue when downgrading gso_size in the BPF helper bpf_skb_adjust_room(), from Fred Li. 3) Fix a compiler warning in resolve_btfids due to a missing type cast, from Liwei Song. 4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte boundary in the fexit_sleep BPF selftest, from Puranjay Mohan. 5) Fix a xsk regression to require a flag when actuating tx_metadata_len, from Stanislav Fomichev. 6) Fix function prototype BTF dumping in libbpf for prototypes that have no input arguments, from Andrii Nakryiko. 7) Fix stacktrace symbol resolution in perf script for BPF programs containing subprograms, from Hou Tao. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids bpf, events: Use prog to emit ksymbol event for main program selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash bpftool: Fix typo in usage help libbpf: Fix no-args func prototype BTF dumping syntax MAINTAINERS: Update powerpc BPF JIT maintainers MAINTAINERS: Update email address of Naveen selftests/bpf: fexit_sleep: Fix stack allocation for arm64 ==================== Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25nvme-pci: add missing condition check for existence of mapped dataLeon Romanovsky
nvme_map_data() is called when request has physical segments, hence the nvme_unmap_data() should have same condition to avoid dereference. Fixes: 4aedb705437f ("nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-25ASoC: fsl-asoc-card: Dynamically allocate memory for snd_soc_dai_link_componentsShengjiu Wang
The static snd_soc_dai_link_components cause conflict for multiple instances of this generic driver. For example, when there is wm8962 and SPDIF case enabled together, the contaminated snd_soc_dai_link_components will cause another device probe fail. Fixes: 6d174cc4f224 ("ASoC: fsl-asoc-card: merge spdif support from imx-spdif.c") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Link: https://patch.msgid.link/1721877773-5229-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-25arm64: mm: Fix lockless walks with static and dynamic page-table foldingWill Deacon
Lina reports random oopsen originating from the fast GUP code when 16K pages are used with 4-level page-tables, the fourth level being folded at runtime due to lack of LPA2. In this configuration, the generic implementation of p4d_offset_lockless() will return a 'p4d_t *' corresponding to the 'pgd_t' allocated on the stack of the caller, gup_fast_pgd_range(). This is normally fine, but when the fourth level of page-table is folded at runtime, pud_offset_lockless() will offset from the address of the 'p4d_t' to calculate the address of the PUD in the same page-table page. This results in a stray stack read when the 'p4d_t' has been allocated on the stack and can send the walker into the weeds. Fix the problem by providing our own definition of p4d_offset_lockless() when CONFIG_PGTABLE_LEVELS <= 4 which returns the real page-table pointer rather than the address of the local stack variable. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/50360968-13fb-4e6f-8f52-1725b3177215@asahilina.net Fixes: 0dd4f60a2c76 ("arm64: mm: Add support for folding PUDs at runtime") Reported-by: Asahi Lina <lina@asahilina.net> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240725090345.28461-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-25iommu: arm-smmu: Fix Tegra workaround for PAGE_SIZE mappingsAshish Mhetre
PAGE_SIZE can be 16KB for Tegra which is not supported by MMU-500 on both Tegra194 and Tegra234. Retain only valid granularities from pgsize_bitmap which would either be 4KB or 64KB. Signed-off-by: Ashish Mhetre <amhetre@nvidia.com> Link: https://lore.kernel.org/r/20240724173132.219978-1-amhetre@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-25of: remove internal arguments from of_property_for_each_u32()Luca Ceresoli
The of_property_for_each_u32() macro needs five parameters, two of which are primarily meant as internal variables for the macro itself (in the for() clause). Yet these two parameters are used by a few drivers, and this can be considered misuse or at least bad practice. Now that the kernel uses C11 to build, these two parameters can be avoided by declaring them internally, thus changing this pattern: struct property *prop; const __be32 *p; u32 val; of_property_for_each_u32(np, "xyz", prop, p, val) { ... } to this: u32 val; of_property_for_each_u32(np, "xyz", val) { ... } However two variables cannot be declared in the for clause even with C11, so declare one struct that contain the two variables we actually need. As the variables inside this struct are not meant to be used by users of this macro, give the struct instance the noticeable name "_it" so it is visible during code reviews, helping to avoid new code to use it directly. Most usages are trivially converted as they do not use those two parameters, as expected. The non-trivial cases are: - drivers/clk/clk.c, of_clk_get_parent_name(): easily doable anyway - drivers/clk/clk-si5351.c, si5351_dt_parse(): this is more complex as the checks had to be replicated in a different way, making code more verbose and somewhat uglier, but I refrained from a full rework to keep as much of the original code untouched having no hardware to test my changes All the changes have been build tested. The few for which I have the hardware have been runtime-tested too. Reviewed-by: Andre Przywara <andre.przywara@arm.com> # drivers/clk/sunxi/clk-simple-gates.c, drivers/clk/sunxi/clk-sun8i-bus-gates.c Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # drivers/gpio/gpio-brcmstb.c Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> # drivers/irqchip/irq-atmel-aic-common.c Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # drivers/iio/adc/ti_am335x_adc.c Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> # drivers/pwm/pwm-samsung.c Acked-by: Richard Leitner <richard.leitner@linux.dev> # drivers/usb/misc/usb251xb.c Acked-by: Mark Brown <broonie@kernel.org> # sound/soc/codecs/arizona.c Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com> # sound/soc/codecs/arizona.c Acked-by: Michael Ellerman <mpe@ellerman.id.au> # arch/powerpc/sysdev/xive/spapr.c Acked-by: Stephen Boyd <sboyd@kernel.org> # clk Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Acked-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20240724-of_property_for_each_u32-v3-1-bea82ce429e2@bootlin.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-07-25dt-bindings: watchdog: add support for Amlogic A4 SoCsHuqiang Qin
Update dt-binding document for watchdog of Amlogic A4 SoCs. Signed-off-by: Huqiang Qin <huqiang.qin@amlogic.com> Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240709-a4-a5_watchdog-v1-1-2ae852e05ec2@amlogic.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-07-25ASoC: amd: yc: Support mic on Lenovo Thinkpad E16 Gen 2Takashi Iwai
Lenovo Thinkpad E16 Gen 2 AMD model (model 21M5) needs a corresponding quirk entry for making the internal mic working. Link: https://bugzilla.suse.com/show_bug.cgi?id=1228269 Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20240725065442.9293-1-tiwai@suse.de Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-25x86/xen: fix memblock_reserve() usage on PVHRoger Pau Monne
The current usage of memblock_reserve() in init_pvh_bootparams() is done before the .bss is zeroed, and that used to be fine when memblock_reserved_init_regions implicitly ended up in the .meminit.data section. However after commit 73db3abdca58c memblock_reserved_init_regions ends up in the .bss section, thus breaking it's usage before the .bss is cleared. Move and rename the call to xen_reserve_extra_memory() so it's done in the x86_init.oem.arch_setup hook, which gets executed after the .bss has been zeroed, but before calling e820__memory_setup(). Fixes: 73db3abdca58c ("init/modpost: conditionally check section mismatch to __meminit*") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240725073116.14626-3-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-25x86/xen: move xen_reserve_extra_memory()Roger Pau Monne
In preparation for making the function static. No functional change. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240725073116.14626-2-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-25tcp: process the 3rd ACK with sk_socket for TFO/MPTCPMatthieu Baerts (NGI0)
The 'Fixes' commit recently changed the behaviour of TCP by skipping the processing of the 3rd ACK when a sk->sk_socket is set. The goal was to skip tcp_ack_snd_check() in tcp_rcv_state_process() not to send an unnecessary ACK in case of simultaneous connect(). Unfortunately, that had an impact on TFO and MPTCP. I started to look at the impact on MPTCP, because the MPTCP CI found some issues with the MPTCP Packetdrill tests [1]. Then Paolo Abeni suggested me to look at the impact on TFO with "plain" TCP. For MPTCP, when receiving the 3rd ACK of a request adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the MPTCP sock that has been created when the MPTCP connection got established before with the first path. The newly added 'goto' will then skip the processing of the segment text (step 7) and not go through tcp_data_queue() where the MPTCP options are validated, and some actions are triggered, e.g. sending the MPJ 4th ACK [2] as demonstrated by the new errors when running a packetdrill test [3] establishing a second subflow. This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be delayed. Still, we don't want to have this behaviour as it delays the switch to the fully established mode, and invalid MPTCP options in this 3rd ACK will not be caught any more. This modification also affects the MPTCP + TFO feature as well, and being the reason why the selftests started to be unstable the last few days [4]. For TFO, the existing 'basic-cookie-not-reqd' test [5] was no longer passing: if the 3rd ACK contains data, and the connection is accept()ed before receiving them, these data would no longer be processed, and thus not ACKed. One last thing about MPTCP, in case of simultaneous connect(), a fallback to TCP will be done, which seems fine: `../common/defaults.sh` 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460, sackOK, TS val 100 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S 0:0(0) win 1000 <mss 1460, sackOK, TS val 407 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 330 ecr 0, nop, wscale 8, mpcapable v1 flags[flag_h] nokey> +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]> +0 > . 1:1(0) ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1> Simultaneous SYN-data crossing is also not supported by TFO, see [6]. Kuniyuki Iwashima suggested to restrict the processing to SYN+ACK only: that's a more generic solution than the one initially proposed, and also enough to fix the issues described above. Later on, Eric Dumazet mentioned that an ACK should still be sent in reaction to the second SYN+ACK that is received: not sending a DUPACK here seems wrong and could hurt: 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460, sackOK, TS val 1000 ecr 0,nop,wscale 8> +0 < S 0:0(0) win 1000 <mss 1000, sackOK, nop, nop> +0 > S. 0:0(0) ack 1 <mss 1460, sackOK, TS val 3308134035 ecr 0,nop,wscale 8> +0 < S. 0:0(0) ack 1 win 1000 <mss 1000, sackOK, nop, nop> +0 > . 1:1(0) ack 1 <nop, nop, sack 0:1> // <== Here So in this version, the 'goto consume' is dropped, to always send an ACK when switching from TCP_SYN_RECV to TCP_ESTABLISHED. This ACK will be seen as a DUPACK -- with DSACK if SACK has been negotiated -- in case of simultaneous SYN crossing: that's what is expected here. Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9936227696 [1] Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [2] Link: https://github.com/multipath-tcp/packetdrill/blob/mptcp-net-next/gtests/net/mptcp/syscalls/accept.pkt#L28 [3] Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [4] Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/server/basic-cookie-not-reqd.pkt#L21 [5] Link: https://github.com/google/packetdrill/blob/master/gtests/net/tcp/fastopen/client/simultaneous-fast-open.pkt [6] Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") Suggested-by: Paolo Abeni <pabeni@redhat.com> Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240724-upstream-net-next-20240716-tcp-3rd-ack-consume-sk_socket-v3-1-d48339764ce9@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-25rbd: don't assume rbd_is_lock_owner() for exclusive mappingsIlya Dryomov
Expanding on the previous commit, assuming that rbd_is_lock_owner() always returns true (i.e. that we are either in RBD_LOCK_STATE_LOCKED or RBD_LOCK_STATE_QUIESCING) if the mapping is exclusive is wrong too. In case ceph_cls_set_cookie() fails, the lock would be temporarily released even if the mapping is exclusive, meaning that we can end up even in RBD_LOCK_STATE_UNLOCKED. IOW, exclusive mappings are really "just" about disabling automatic lock transitions (as documented in the man page), not about grabbing the lock and holding on to it whatever it takes. Cc: stable@vger.kernel.org Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2024-07-25rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappingsIlya Dryomov
Every time a watch is reestablished after getting lost, we need to update the cookie which involves quiescing exclusive lock. For this, we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING roughly for the duration of rbd_reacquire_lock() call. If the mapping is exclusive and I/O happens to arrive in this time window, it's failed with EROFS (later translated to EIO) based on the wrong assumption in rbd_img_exclusive_lock() -- "lock got released?" check there stopped making sense with commit a2b1da09793d ("rbd: lock should be quiesced on reacquire"). To make it worse, any such I/O is added to the acquiring list before EROFS is returned and this sets up for violating rbd_lock_del_request() precondition that the request is either on the running list or not on any list at all -- see commit ded080c86b3f ("rbd: don't move requests to the running list on errors"). rbd_lock_del_request() ends up processing these requests as if they were on the running list which screws up quiescing_wait completion counter and ultimately leads to rbd_assert(!completion_done(&rbd_dev->quiescing_wait)); being triggered on the next watch error. Cc: stable@vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait Cc: stable@vger.kernel.org Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2024-07-25rbd: rename RBD_LOCK_STATE_RELEASING and releasing_waitIlya Dryomov
... to RBD_LOCK_STATE_QUIESCING and quiescing_wait to recognize that this state and the associated completion are backing rbd_quiesce_lock(), which isn't specific to releasing the lock. While exclusive lock does get quiesced before it's released, it also gets quiesced before an attempt to update the cookie is made and there the lock is not released as long as ceph_cls_set_cookie() succeeds. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2024-07-25selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata testStanislav Fomichev
This flag is now required to use tx_metadata_len. Fixes: 40808a237d9c ("selftests/bpf: Add TX side to xdp_metadata") Reported-by: Julian Schindel <mail@arctic-alpaca.de> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20240713015253.121248-3-sdf@fomichev.me