linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-02-22	btrfs: fix data races when accessing the reserved amount of block reserves	Filipe Manana
	At space_info.c we have several places where we access the ->reserved field of a block reserve without taking the block reserve's spinlock first, which makes KCSAN warn about a data race since that field is always updated while holding the spinlock. The reports from KCSAN are like the following: [117.193526] BUG: KCSAN: data-race in btrfs_block_rsv_release [btrfs] / need_preemptive_reclaim [btrfs] [117.195148] read to 0x000000017f587190 of 8 bytes by task 6303 on cpu 3: [117.195172] need_preemptive_reclaim+0x222/0x2f0 [btrfs] [117.195992] __reserve_bytes+0xbb0/0xdc8 [btrfs] [117.196807] btrfs_reserve_metadata_bytes+0x4c/0x120 [btrfs] [117.197620] btrfs_block_rsv_add+0x78/0xa8 [btrfs] [117.198434] btrfs_delayed_update_inode+0x154/0x368 [btrfs] [117.199300] btrfs_update_inode+0x108/0x1c8 [btrfs] [117.200122] btrfs_dirty_inode+0xb4/0x140 [btrfs] [117.200937] btrfs_update_time+0x8c/0xb0 [btrfs] [117.201754] touch_atime+0x16c/0x1e0 [117.201789] filemap_read+0x674/0x728 [117.201823] btrfs_file_read_iter+0xf8/0x410 [btrfs] [117.202653] vfs_read+0x2b6/0x498 [117.203454] ksys_read+0xa2/0x150 [117.203473] __s390x_sys_read+0x68/0x88 [117.203495] do_syscall+0x1c6/0x210 [117.203517] __do_syscall+0xc8/0xf0 [117.203539] system_call+0x70/0x98 [117.203579] write to 0x000000017f587190 of 8 bytes by task 11 on cpu 0: [117.203604] btrfs_block_rsv_release+0x2e8/0x578 [btrfs] [117.204432] btrfs_delayed_inode_release_metadata+0x7c/0x1d0 [btrfs] [117.205259] __btrfs_update_delayed_inode+0x37c/0x5e0 [btrfs] [117.206093] btrfs_async_run_delayed_root+0x356/0x498 [btrfs] [117.206917] btrfs_work_helper+0x160/0x7a0 [btrfs] [117.207738] process_one_work+0x3b6/0x838 [117.207768] worker_thread+0x75e/0xb10 [117.207797] kthread+0x21a/0x230 [117.207830] __ret_from_fork+0x6c/0xb8 [117.207861] ret_from_fork+0xa/0x30 So add a helper to get the reserved amount of a block reserve while holding the lock. The value may be not be up to date anymore when used by need_preemptive_reclaim() and btrfs_preempt_reclaim_metadata_space(), but that's ok since the worst it can do is cause more reclaim work do be done sooner rather than later. Reading the field while holding the lock instead of using the data_race() annotation is used in order to prevent load tearing. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-22	btrfs: send: don't issue unnecessary zero writes for trailing hole	Filipe Manana
	If we have a sparse file with a trailing hole (from the last extent's end to i_size) and then create an extent in the file that ends before the file's i_size, then when doing an incremental send we will issue a write full of zeroes for the range that starts immediately after the new extent ends up to i_size. While this isn't incorrect because the file ends up with exactly the same data, it unnecessarily results in using extra space at the destination with one or more extents full of zeroes instead of having a hole. In same cases this results in using megabytes or even gigabytes of unnecessary space. Example, reproducer: $ cat test.sh #!/bin/bash DEV=/dev/sdh MNT=/mnt/sdh mkfs.btrfs -f $DEV mount $DEV $MNT # Create 1G sparse file. xfs_io -f -c "truncate 1G" $MNT/foobar # Create base snapshot. btrfs subvolume snapshot -r $MNT $MNT/mysnap1 # Create send stream (full send) for the base snapshot. btrfs send -f /tmp/1.snap $MNT/mysnap1 # Now write one extent at the beginning of the file and one somewhere # in the middle, leaving a gap between the end of this second extent # and the file's size. xfs_io -c "pwrite -S 0xab 0 128K" \ -c "pwrite -S 0xcd 512M 128K" \ $MNT/foobar # Now create a second snapshot which is going to be used for an # incremental send operation. btrfs subvolume snapshot -r $MNT $MNT/mysnap2 # Create send stream (incremental send) for the second snapshot. btrfs send -p $MNT/mysnap1 -f /tmp/2.snap $MNT/mysnap2 # Now recreate the filesystem by receiving both send streams and # verify we get the same content that the original filesystem had # and file foobar has only two extents with a size of 128K each. umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT btrfs receive -f /tmp/1.snap $MNT btrfs receive -f /tmp/2.snap $MNT echo -e "\nFile fiemap in the second snapshot:" # Should have: # # 128K extent at file range [0, 128K[ # hole at file range [128K, 512M[ # 128K extent file range [512M, 512M + 128K[ # hole at file range [512M + 128K, 1G[ xfs_io -r -c "fiemap -v" $MNT/mysnap2/foobar # File should be using 256K of data (two 128K extents). echo -e "\nSpace used by the file: $(du -h $MNT/mysnap2/foobar \| cut -f 1)" umount $MNT Running the test, we can see with fiemap that we get an extent for the range [512M, 1G[, while in the source filesystem we have an extent for the range [512M, 512M + 128K[ and a hole for the rest of the file (the range [512M + 128K, 1G[): $ ./test.sh (...) File fiemap in the second snapshot: /mnt/sdh/mysnap2/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..255]: 26624..26879 256 0x0 1: [256..1048575]: hole 1048320 2: [1048576..2097151]: 2156544..3205119 1048576 0x1 Space used by the file: 513M This happens because once we finish processing an inode, at finish_inode_if_needed(), we always issue a hole (write operations full of zeros) if there's a gap between the end of the last processed extent and the file's size, even if that range is already a hole in the parent snapshot. Fix this by issuing the hole only if the range is not already a hole. After this change, running the test above, we get the expected layout: $ ./test.sh (...) File fiemap in the second snapshot: /mnt/sdh/mysnap2/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..255]: 26624..26879 256 0x0 1: [256..1048575]: hole 1048320 2: [1048576..1048831]: 26880..27135 256 0x1 3: [1048832..2097151]: hole 1048320 Space used by the file: 256K A test case for fstests will follow soon. CC: stable@vger.kernel.org # 6.1+ Reported-by: Dorai Ashok S A <dash.btrfs@inix.me> Link: https://lore.kernel.org/linux-btrfs/c0bf7818-9c45-46a8-b3d3-513230d0c86e@inix.me/ Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-22	btrfs: dev-replace: properly validate device names	David Sterba
	There's a syzbot report that device name buffers passed to device replace are not properly checked for string termination which could lead to a read out of bounds in getname_kernel(). Add a helper that validates both source and target device name buffers. For devid as the source initialize the buffer to empty string in case something tries to read it later. This was originally analyzed and fixed in a different way by Edward Adam Davis (see links). Link: https://lore.kernel.org/linux-btrfs/000000000000d1a1d1060cc9c5e7@google.com/ Link: https://lore.kernel.org/linux-btrfs/tencent_44CA0665C9836EF9EEC80CB9E7E206DF5206@qq.com/ CC: stable@vger.kernel.org # 4.19+ CC: Edward Adam Davis <eadavis@qq.com> Reported-and-tested-by: syzbot+33f23b49ac24f986c9e8@syzkaller.appspotmail.com Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-22	btrfs: zoned: don't skip block group profile checks on conventional zones	Johannes Thumshirn
	On a zoned filesystem with conventional zones, we're skipping the block group profile checks for the conventional zones. This allows converting a zoned filesystem's data block groups to RAID when all of the zones backing the chunk are on conventional zones. But this will lead to problems, once we're trying to allocate chunks backed by sequential zones. So also check for conventional zones when loading a block group's profile on them. Reported-by: HAN Yuwei <hrx@bupt.moe> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/#t Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-22	Merge tag 'ath-next-20240222' of ↵	Kalle Valo
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath ath.git patches for v6.9 We have support for QCA2066 now and also several new features in ath12k. Major changes: ath12k * firmware-2.bin support * support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) * QCN9274: support split-PHY devices * WCN7850: enable Power Save Mode in station mode * WCN7850: P2P support ath11k: * QCA6390 & WCN6855: support 2 concurrent station interfaces * QCA2066 support
2024-02-22	drm/ttm: Fix an invalid freeing on already freed page in error path	Thomas Hellström
	If caching mode change fails due to, for example, OOM we free the allocated pages in a two-step process. First the pages for which the caching change has already succeeded. Secondly the pages for which a caching change did not succeed. However the second step was incorrectly freeing the pages already freed in the first step. Fix. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path") Cc: Christian König <christian.koenig@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Huang Rui <ray.huang@amd.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.4+ Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240221073324.3303-1-thomas.hellstrom@linux.intel.com
2024-02-22	ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodes	Geert Uytterhoeven
	make dtbs_check W=2: arch/arm/boot/dts/renesas/r8a7790-lager.dts:444.11-458.5: Warning (interrupt_provider): /i2c-mux4/pmic@58: Missing '#interrupt-cells' in interrupt provider ... Fix this by adding the missing #interrupt-cells properties. Reported-by: Rob Herring <robh@kernel.org> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/a351e503ea97fb1af68395843f513925ff1bdf26.1707922460.git.geert+renesas@glider.be
2024-02-22	l2tp: pass correct message length to ip6_append_data	Tom Parkin
	l2tp_ip6_sendmsg needs to avoid accounting for the transport header twice when splicing more data into an already partially-occupied skbuff. To manage this, we check whether the skbuff contains data using skb_queue_empty when deciding how much data to append using ip6_append_data. However, the code which performed the calculation was incorrect: ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0; ...due to C operator precedence, this ends up setting ulen to transhdrlen for messages with a non-zero length, which results in corrupted packets on the wire. Add parentheses to correct the calculation in line with the original intent. Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()") Cc: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	Revert "ACPI: EC: Use a spin lock without disabing interrupts"	Rafael J. Wysocki
	Commit eb9299beadbd ("ACPI: EC: Use a spin lock without disabing interrupts") introduced an unexpected user-visible change in behavior, which is a significant CPU load increase when the EC is in use. This most likely happens due to increased spinlock contention and so reducing this effect would require a major rework of the EC driver locking. There is no time for this in the current cycle, so revert commit eb9299beadbd. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218511 Reported-by: Dieter Mummenschanz <dmummenschanz@web.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-22	Merge tag 'nf-24-02-22' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) If user requests to wake up a table and hook fails, restore the dormant flag from the error path, from Florian Westphal. 2) Reset dst after transferring it to the flow object, otherwise dst gets released twice from the error path. 3) Release dst in case the flowtable selects a direct xmit path, eg. transmission to bridge port. Otherwise, dst is memleaked. 4) Register basechain and flowtable hooks at the end of the command. Error path releases these datastructure without waiting for the rcu grace period. 5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report on access to hook type, also from Florian Westphal. netfilter pull request 24-02-22 * tag 'nf-24-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: use kzalloc for hook allocation netfilter: nf_tables: register hooks last when adding new chain/flowtable netfilter: nft_flow_offload: release dst in case direct xmit path is used netfilter: nft_flow_offload: reset dst in route object after setting up flow netfilter: nf_tables: set dormant flag on hook register failure ==================== Link: https://lore.kernel.org/r/20240222000843.146665-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	Merge tag 'for-netdev' of ↵	Paolo Abeni
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-02-22 The following pull-request contains BPF updates for your net tree. We've added 11 non-merge commits during the last 24 day(s) which contain a total of 15 files changed, 217 insertions(+), 17 deletions(-). The main changes are: 1) Fix a syzkaller-triggered oops when attempting to read the vsyscall page through bpf_probe_read_kernel and friends, from Hou Tao. 2) Fix a kernel panic due to uninitialized iter position pointer in bpf_iter_task, from Yafang Shao. 3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel, from Martin KaFai Lau. 4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET) due to incorrect truesize accounting, from Sebastian Andrzej Siewior. 5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready, from Shigeru Yoshida. 6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be resolved, from Hari Bathini. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready() selftests/bpf: Add negtive test cases for task iter bpf: Fix an issue due to uninitialized bpf_iter_task selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel selftest/bpf: Test the read of vsyscall page under x86-64 x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault() x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h bpf, scripts: Correct GPL license name xsk: Add truesize to skb_add_rx_frag(). bpf: Fix warning for bpf_cpumask in verifier ==================== Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY	Siddharth Vadapalli
	Commit bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG") extended support of the driver from the existing support for RTL8211F(D)(I)-CG PHY to the newer RTL8211F(D)(I)-VD-CG PHY. While that commit indicated that the RTL8211F_PHYCR2 register is not supported by the "VD-CG" PHY model and therefore updated the corresponding section in rtl8211f_config_init() to be invoked conditionally, the call to "genphy_soft_reset()" was left as-is, when it should have also been invoked conditionally. This is because the call to "genphy_soft_reset()" was first introduced by the commit 0a4355c2b7f8 ("net: phy: realtek: add dt property to disable CLKOUT clock") since the RTL8211F guide indicates that a PHY reset should be issued after setting bits in the PHYCR2 register. As the PHYCR2 register is not applicable to the "VD-CG" PHY model, fix the rtl8211f_config_init() function by invoking "genphy_soft_reset()" conditionally based on the presence of the "PHYCR2" register. Fixes: bb726b753f75 ("net: phy: realtek: add support for RTL8211F(D)(I)-VD-CG") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220070007.968762-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	Merge branch 'ioam6-fix-write-to-cloned-skb-s'	Paolo Abeni
	Justin Iurman says: ==================== ioam6: fix write to cloned skb's Make sure the IOAM data insertion is not applied on cloned skb's. As a consequence, ioam selftests needed a refactoring. ==================== Link: https://lore.kernel.org/r/20240219135255.15429-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	selftests: ioam: refactoring to align with the fix	Justin Iurman
	ioam6_parser uses a packet socket. After the fix to prevent writing to cloned skb's, the receiver does not see its IOAM data anymore, which makes input/forward ioam-selftests to fail. As a workaround, ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to get hop-by-hop options. As a consequence, the hook is "after" the IOAM data insertion by the receiver and all tests are working again. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	Fix write to cloned skb in ipv6_hop_ioam()	Justin Iurman
	ioam6_fill_trace_data() writes inside the skb payload without ensuring it's writeable (e.g., not cloned). This function is called both from the input and output path. The output path (ioam6_iptunnel) already does the check. This commit provides a fix for the input path, inside ipv6_hop_ioam(). It also updates ip6_parse_tlv() to refresh the network header pointer ("nh") when returning from ipv6_hop_ioam(). Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	phonet/pep: fix racy skb_queue_empty() use	Rémi Denis-Courmont
	The receive queues are protected by their respective spin-lock, not the socket lock. This could lead to skb_peek() unexpectedly returning NULL or a pointer to an already dequeued socket buffer. Fixes: 9641458d3ec4 ("Phonet: Pipe End Point for Phonet Pipes protocol") Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com> Link: https://lore.kernel.org/r/20240218081214.4806-2-remi@remlab.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	phonet: take correct lock to peek at the RX queue	Rémi Denis-Courmont
	The receive queue is protected by its embedded spin-lock, not the socket lock, so we need the former lock here (and only that one). Fixes: 107d0d9b8d9a ("Phonet: Phonet datagram transport protocol") Reported-by: Luosili <rootlab@huawei.com> Signed-off-by: Rémi Denis-Courmont <courmisch@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240218081214.4806-1-remi@remlab.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	erofs: fix refcount on the metabuf used for inode lookup	Sandeep Dhavale
	In erofs_find_target_block() when erofs_dirnamecmp() returns 0, we do not assign the target metabuf. This causes the caller erofs_namei()'s erofs_put_metabuf() at the end to be not effective leaving the refcount on the page. As the page from metabuf (buf->page) is never put, such page cannot be migrated or reclaimed. Fix it now by putting the metabuf from previous loop and assigning the current metabuf to target before returning so caller erofs_namei() can do the final put as it was intended. Fixes: 500edd095648 ("erofs: use meta buffers for inode lookup") Cc: <stable@vger.kernel.org> # 5.18+ Signed-off-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240221210348.3667795-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-02-21	PPPoL2TP: Add more code snippets	Samuel Thibault
	The existing documentation was not telling that one has to create a PPP channel and a PPP interface to get PPPoL2TP data offloading working. Also, tunnel switching was not mentioned, so that people were thinking it was not supported, while it actually is. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Acked-by: Tom Parkin <tparkin@katalix.com> Link: https://lore.kernel.org/r/20240217211425.qj576u3jmaa6yidf@begin Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: sparx5: Add spinlock for frame transmission from CPU	Horatiu Vultur
	Both registers used when doing manual injection or fdma injection are shared between all the net devices of the switch. It was noticed that when having two process which each of them trying to inject frames on different ethernet ports, that the HW started to behave strange, by sending out more frames then expected. When doing fdma injection it is required to set the frame in the DCB and then make sure that the next pointer of the last DCB is invalid. But because there is no locks for this, then easily this pointer between the DCB can be broken and then it would create a loop of DCBs. And that means that the HW will continuously transmit these frames in a loop. Until the SW will break this loop. Therefore to fix this issue, add a spin lock for when accessing the registers for manual or fdma injection. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Fixes: f3cad2611a77 ("net: sparx5: add hostmode with phylink support") Link: https://lore.kernel.org/r/20240219080043.1561014-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net/sched: flower: Add lock protection when remove filter handle	Jianbo Liu
	As IDR can't protect itself from the concurrent modification, place idr_remove() under the protection of tp->lock. Fixes: 08a0063df3ae ("net/sched: flower: Move filter handle initialization earlier") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240220085928.9161-1-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	devlink: fix port dump cmd type	Jiri Pirko
	Unlike other commands, due to a c&p error, port dump fills-up cmd with wrong value, different from port-get request cmd, port-get doit reply and port notification. Fix it by filling cmd with value DEVLINK_CMD_PORT_NEW. Skimmed through devlink userspace implementations, none of them cares about this cmd value. Only ynl, for which, this is actually a fix, as it expects doit and dumpit ops rsp_value to be the same. Omit the fixes tag, even thought this is fix, better to target this for next release. Fixes: bfcd3a466172 ("Introduce devlink infrastructure") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240220075245.75416-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: stmmac: Fix EST offset for dwmac 5.10	Kurt Kanzenbach
	Fix EST offset for dwmac 5.10. Currently configuring Qbv doesn't work as expected. The schedule is configured, but never confirmed: \|[ 128.250219] imx-dwmac 428a0000.ethernet eth1: configured EST The reason seems to be the refactoring of the EST code which set the wrong EST offset for the dwmac 5.10. After fixing this it works as before: \|[ 106.359577] imx-dwmac 428a0000.ethernet eth1: configured EST \|[ 128.430715] imx-dwmac 428a0000.ethernet eth1: EST: SWOL has been switched Tested on imx93. Fixes: c3f3b97238f6 ("net: stmmac: Refactor EST implementation") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20240220-stmmac_est-v1-1-c41f9ae2e7b7@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	udp: add local "peek offset enabled" flag	Paolo Abeni
	We want to re-organize the struct sock layout. The sk_peek_off field location is problematic, as most protocols want it in the RX read area, while UDP wants it on a cacheline different from sk_receive_queue. Create a local (inside udp_sock) copy of the 'peek offset is enabled' flag and place it inside the same cacheline of reader_queue. Check such flag before reading sk_peek_off. This will save potential false sharing and cache misses in the fast-path. Tested under UDP flood with small packets. The struct sock layout update causes a 4% performance drop, and this patch restores completely the original tput. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/67ab679c15fbf49fa05b3ffe05d91c47ab84f147.1708426665.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	Merge branch 'tools-ynl-fix-impossible-errors'	Jakub Kicinski
	Jakub Kicinski says: ==================== tools: ynl: fix impossible errors Fix bugs discovered while I was hacking in low level stuff in YNL and kept breaking the socket, exercising the "impossible" error paths. v1: https://lore.kernel.org/all/20240217001742.2466993-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20240220161112.2735195-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	tools: ynl: don't leak mcast_groups on init error	Jakub Kicinski
	Make sure to free the already-parsed mcast_groups if we don't get an ack from the kernel when reading family info. This is part of the ynl_sock_create() error path, so we won't get a call to ynl_sock_destroy() to free them later. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	tools: ynl: make sure we always pass yarg to mnl_cb_run	Jakub Kicinski
	There is one common error handler in ynl - ynl_cb_error(). It expects priv to be a pointer to struct ynl_parse_arg AKA yarg. To avoid potential crashes if we encounter a stray NLMSG_ERROR always pass yarg as priv (or a struct which has it as the first member). ynl_cb_null() has a similar problem directly - it expects yarg but priv passed by the caller is ys. Found by code inspection. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: mctp: put sock on tag allocation failure	Jeremy Kerr
	We may hold an extra reference on a socket if a tag allocation fails: we optimistically allocate the sk_key, and take a ref there, but do not drop if we end up not using the allocated key. Ensure we're dropping the sock on this failure by doing a proper unref rather than directly kfree()ing. Fixes: de8a6b15d965 ("net: mctp: add an explicit reference from a mctp_sk_key to sock") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/ce9b61e44d1cdae7797be0c5e3141baf582d23a0.1707983487.git.jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22	netfilter: nf_tables: use kzalloc for hook allocation	Florian Westphal
	KMSAN reports unitialized variable when registering the hook, reg->hook_ops_type == NF_HOOK_OP_BPF) ~~~~~~~~~~~ undefined This is a small structure, just use kzalloc to make sure this won't happen again when new fields get added to nf_hook_ops. Fixes: 7b4b2fa37587 ("netfilter: annotate nf_tables base hook ops") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22	netfilter: nf_tables: register hooks last when adding new chain/flowtable	Pablo Neira Ayuso
	Register hooks last when adding chain/flowtable to ensure that packets do not walk over datastructure that is being released in the error path without waiting for the rcu grace period. Fixes: 91c7b38dc9f0 ("netfilter: nf_tables: use new transaction infrastructure to handle chain") Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22	netfilter: nft_flow_offload: release dst in case direct xmit path is used	Pablo Neira Ayuso
	Direct xmit does not use it since it calls dev_queue_xmit() to send packets, hence it calls dst_release(). kmemleak reports: unreferenced object 0xffff88814f440900 (size 184): comm "softirq", pid 0, jiffies 4294951896 hex dump (first 32 bytes): 00 60 5b 04 81 88 ff ff 00 e6 e8 82 ff ff ff ff .`[............. 21 0b 50 82 ff ff ff ff 00 00 00 00 00 00 00 00 !.P............. backtrace (crc cb2bf5d6): [<000000003ee17107>] kmem_cache_alloc+0x286/0x340 [<0000000021a5de2c>] dst_alloc+0x43/0xb0 [<00000000f0671159>] rt_dst_alloc+0x2e/0x190 [<00000000fe5092c9>] __mkroute_output+0x244/0x980 [<000000005fb96fb0>] ip_route_output_flow+0xc0/0x160 [<0000000045367433>] nf_ip_route+0xf/0x30 [<0000000085da1d8e>] nf_route+0x2d/0x60 [<00000000d1ecd1cb>] nft_flow_route+0x171/0x6a0 [nft_flow_offload] [<00000000d9b2fb60>] nft_flow_offload_eval+0x4e8/0x700 [nft_flow_offload] [<000000009f447dbb>] expr_call_ops_eval+0x53/0x330 [nf_tables] [<00000000072e1be6>] nft_do_chain+0x17c/0x840 [nf_tables] [<00000000d0551029>] nft_do_chain_inet+0xa1/0x210 [nf_tables] [<0000000097c9d5c6>] nf_hook_slow+0x5b/0x160 [<0000000005eccab1>] ip_forward+0x8b6/0x9b0 [<00000000553a269b>] ip_rcv+0x221/0x230 [<00000000412872e5>] __netif_receive_skb_one_core+0xfe/0x110 Fixes: fa502c865666 ("netfilter: flowtable: simplify route logic") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22	netfilter: nft_flow_offload: reset dst in route object after setting up flow	Pablo Neira Ayuso
	dst is transferred to the flow object, route object does not own it anymore. Reset dst in route object, otherwise if flow_offload_add() fails, error path releases dst twice, leading to a refcount underflow. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-22	netfilter: nf_tables: set dormant flag on hook register failure	Florian Westphal
	We need to set the dormant flag again if we fail to register the hooks. During memory pressure hook registration can fail and we end up with a table marked as active but no registered hooks. On table/base chain deletion, nf_tables will attempt to unregister the hook again which yields a warn splat from the nftables core. Reported-and-tested-by: syzbot+de4025c006ec68ac56fc@syzkaller.appspotmail.com Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-21	Merge branch 'net-phy-marvell-88q2xxx-add-driver-for-the-marvell-88q2220-phy'	Jakub Kicinski
	Dimitri Fedrau says: ==================== net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2220 PHY ==================== Link: https://lore.kernel.org/r/20240218075753.18067-1-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: move interrupt configuration	Dimitri Fedrau
	Move interrupt configuration from mv88q222x_revb0_config_init to mv88q2xxx_config_init. Same register and bits are used for the 88q2xxx devices. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Stefan Eichenberger <eichest@gmail.com> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-15-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: remove duplicated assignment of pma_extable	Dimitri Fedrau
	Remove assignment of phydev->pma_extable in mv88q222x_revb0_config_init. It is already done in mv88q2xxx_config_init, just call mv88q2xxx_config_init. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-14-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: cleanup mv88q2xxx_config_init	Dimitri Fedrau
	mv88q2xxx_config_init calls genphy_c45_read_pma which is done by mv88q2xxx_read_status, it calls also mv88q2xxx_config_aneg which is also called by the PHY state machine. Let the PHY state machine handle the phydriver ops in their intendend way. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Stefan Eichenberger <eichest@gmail.com> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-13-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: switch to mv88q2xxx_config_aneg	Dimitri Fedrau
	Switch to mv88q2xxx_config_aneg for Marvell 88Q2220 devices and remove the mv88q222x_config_aneg function which is basically a copy of the mv88q2xxx_config_aneg function. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-12-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: make mv88q2xxx_config_aneg generic	Dimitri Fedrau
	Marvell 88Q2xxx devices follow the same scheme, after configuration they need a soft reset. Soft resets differ between devices, so we use the .soft_reset callback instead of creating .config_aneg callbacks for each device. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-11-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: add cable test support	Dimitri Fedrau
	Add cable test support for Marvell 88Q222x devices. Reported distance granularity is 1m. 1m cable, open: Cable test started for device eth0. Cable test completed for device eth0. Pair A code Open Circuit Pair A, fault length: 1.00m 1m cable, shorted: Cable test started for device eth0. Cable test completed for device eth0. Pair A code Short within Pair Pair A, fault length: 1.00m 6m cable, open: Cable test started for device eth0. Cable test completed for device eth0. Pair A code Open Circuit Pair A, fault length: 6.00m 6m cable, shorted: Cable test started for device eth0. Cable test completed for device eth0. Pair A code Short within Pair Pair A, fault length: 6.00m Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240218075753.18067-10-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: add support for temperature sensor	Dimitri Fedrau
	Marvell 88q2xxx devices have an inbuilt temperature sensor. Add hwmon support for this sensor. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-9-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: add suspend / resume ops	Dimitri Fedrau
	Add suspend/resume ops for Marvell 88Q2xxx devices. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-8-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: add interrupt support for link detection	Dimitri Fedrau
	Added .config_intr and .handle_interrupt callbacks. Whenever the link goes up or down an interrupt will be triggered. Interrupts are configured separately for 100/1000BASET1. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-7-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: add driver for the Marvell 88Q2220 PHY	Dimitri Fedrau
	Add a driver for the Marvell 88Q2220. This driver allows to detect the link, switch between 100BASE-T1 and 1000BASE-T1 and switch between master and slave mode. Autonegotiation is supported. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Gregor Herburger <gregor.herburger@ew.tq-group.com> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-6-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: marvell-88q2xxx: fix typos	Dimitri Fedrau
	Rename mv88q2xxxx_get_sqi to mv88q2xxx_get_sqi and mv88q2xxxx_get_sqi_max to mv88q2xxx_get_sqi_max. Fix linebreaks and use everywhere hexadecimal numbers written with lowercase letters instead of mixing it up. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-5-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: c45: detect 100/1000BASE-T1 linkmode advertisements	Dimitri Fedrau
	Set 100BT1 and 1000BT1 linkmode advertisement bits to adv_l_mask to enable detection. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-4-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: Support 100/1000BT1 linkmode advertisements	Dimitri Fedrau
	Extend helper functions mii_t1_adv_m_mod_linkmode_t and linkmode_adv_to_mii_t1_adv_m_t to support 100BT1 and 1000BT1 linkmode advertisements. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-3-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: Add BaseT1 auto-negotiation constants	Dimitri Fedrau
	Added constants for advertising 100BT1 and 1000BT1 in register BASE-T1 auto-negotiation advertisement register [31:16] (Register 7.515) Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com> Link: https://lore.kernel.org/r/20240218075753.18067-2-dima.fedrau@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	net: phy: mxl-gpy: fill in possible_interfaces for GPY21x chipset	Raju Lakkaraju
	Fill in the possible_interfaces member. GPY21x phys support the SGMII and 2500base-X interfaces Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Link: https://lore.kernel.org/r/20240216054435.22380-1-Raju.Lakkaraju@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	Merge branch 'tls-fixes-for-record-type-handling-with-peek'	Jakub Kicinski
	Sabrina Dubroca says: ==================== tls: fixes for record type handling with PEEK There are multiple bugs in tls_sw_recvmsg's handling of record types when MSG_PEEK flag is used, which can lead to incorrectly merging two records: - consecutive non-DATA records shouldn't be merged, even if they're the same type (partly handled by the test at the end of the main loop) - records of the same type (even DATA) shouldn't be merged if one record of a different type comes in between ==================== Link: https://lore.kernel.org/r/cover.1708007371.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>