Age  Commit message  Author
2024-05-08  selftests: drv-net: add checksum tests  Willem de Bruijn
Run tools/testing/selftests/net/csum.c as part of drv-net. This binary covers multiple scenarios, based on arguments given, for both IPv4 and IPv6:

- Accept UDP correct checksum
- Detect UDP invalid checksum
- Accept TCP correct checksum
- Detect TCP invalid checksum
- Transmit UDP: basic checksum offload
- Transmit UDP: zero checksum conversion

The test direction is reversed between receive and transmit tests, so that the NIC under test is always the local machine.

In total this adds up to 12 testcases, with more to follow. For conciseness, I replaced individual functions with a function factory.

Also detect hardware offload feature availability using Ethtool netlink and skip tests when either feature is off. This need may be common for offload feature tests and eventually deserving of a thin wrapper in lib.py.

Missing are the PF_PACKET based send tests ('-P'). These use virtio_net_hdr to program hardware checksum offload, which requires looking up the local MAC address and (harder) the MAC of the next hop. I'll have to give it some thought how to do that robustly and where that code would belong.

Tested:

    make -C tools/testing/selftests/ \
        TARGETS="drivers/net drivers/net/hw" \
        install INSTALL_PATH=/tmp/ksft
    cd /tmp/ksft

    sudo NETIF=ens4 REMOTE_TYPE=ssh \
        REMOTE_ARGS="root@10.40.0.2" \
        LOCAL_V4="10.40.0.1" \
        REMOTE_V4="10.40.0.2" \
        ./run_kselftest.sh -t drivers/net/hw:csum.py

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240507154216.501111-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
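The checksum cases listed in this commit all rest on the 16-bit one's-complement Internet checksum (RFC 1071). The following is a minimal Python sketch of that arithmetic, not code taken from csum.c or csum.py; all function names are hypothetical. It assumes that "zero checksum conversion" refers to the UDP rule that a computed checksum of 0 is sent as 0xFFFF, because 0 on the wire means "no checksum".

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_wire_checksum(computed: int) -> int:
    """Hypothetical helper: a computed 0 is transmitted as 0xFFFF for UDP."""
    return 0xFFFF if computed == 0 else computed


def checksum_is_valid(data_with_checksum: bytes) -> bool:
    """A buffer that already contains a correct checksum field sums to zero."""
    return internet_checksum(data_with_checksum) == 0
```

A receiver-side "detect invalid checksum" case then amounts to `checksum_is_valid()` returning False after any 16-bit word of the buffer has been altered.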
2024-05-08  ipv6: prevent NULL dereference in ip6_output()  Eric Dumazet
According to syzbot, there is a chance that ip6_dst_idev() returns NULL in ip6_output(). Most places in IPv6 stack deal with a NULL idev just fine, but not here. syzbot reported: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7] CPU: 0 PID: 9775 Comm: syz-executor.4 Not tainted 6.9.0-rc5-syzkaller-00157-g6a30653b604a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:ip6_output+0x231/0x3f0 net/ipv6/ip6_output.c:237 Code: 3c 1e 00 49 89 df 74 08 4c 89 ef e8 19 58 db f7 48 8b 44 24 20 49 89 45 00 49 89 c5 48 8d 9d e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 4c 8b 74 24 28 0f 85 61 01 00 00 8b 1b 31 ff RSP: 0018:ffffc9000927f0d8 EFLAGS: 00010202 RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000040000 RDX: ffffc900131f9000 RSI: 0000000000004f47 RDI: 0000000000004f48 RBP: 0000000000000000 R08: ffffffff8a1f0b9a R09: 1ffffffff1f51fad R10: dffffc0000000000 R11: fffffbfff1f51fae R12: ffff8880293ec8c0 R13: ffff88805d7fc000 R14: 1ffff1100527d91a R15: dffffc0000000000 FS: 00007f135c6856c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000080 CR3: 0000000064096000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> NF_HOOK include/linux/netfilter.h:314 [inline] ip6_xmit+0xefe/0x17f0 net/ipv6/ip6_output.c:358 sctp_v6_xmit+0x9f2/0x13f0 net/sctp/ipv6.c:248 sctp_packet_transmit+0x26ad/0x2ca0 net/sctp/output.c:653 sctp_packet_singleton+0x22c/0x320 net/sctp/outqueue.c:783 sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline] sctp_outq_flush+0x6d5/0x3e20 net/sctp/outqueue.c:1212 sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline] sctp_do_sm+0x59cc/0x60c0 net/sctp/sm_sideeffect.c:1169 
sctp_primitive_ASSOCIATE+0x95/0xc0 net/sctp/primitive.c:73 __sctp_connect+0x9cd/0xe30 net/sctp/socket.c:1234 sctp_connect net/sctp/socket.c:4819 [inline] sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834 __sys_connect_file net/socket.c:2048 [inline] __sys_connect+0x2df/0x310 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x7a/0x90 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 778d80be5269 ("ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20240507161842.773961-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  hsr: Simplify code for announcing HSR nodes timer setup  Lukasz Majewski
Up till now, the code that starts the HSR announce timer, which triggers sending supervisory frames, assumed that hsr_netdev_notify() would be called at least twice for the hsrX interface. This was required to have different values for the old and current values of the network device's operstate.

This is problematic when the hsrX interface is already in the operational state when hsr_netdev_notify() is called: the timer is not configured to trigger and, as a result, hsrX does not send supervisory frames to the HSR ring.

This error was discovered when the hsr_ping.sh script was run. To be more specific: for hsr1 and hsr2, hsr_netdev_notify() was called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP} states assigned in hsr_check_carrier_and_operstate(hsr). As a result there was no issue with sending supervisory frames. However, with hsr3, the notify function was called only once with operstate set to IF_OPER_UP, and the timer responsible for triggering supervisory frames was not fired.

The solution is to use the netif_oper_up() and netif_running() helper functions to assess whether the hsrX network device is up. Only then, when the timer is not already pending, is it started. Otherwise it is deactivated.

Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240507111214.3519800-1-lukma@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
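The decision described in this commit can be modelled as a pure function of the current device state, with no dependence on a previous operstate. This is only an illustrative Python sketch; the function name and return strings are hypothetical, and the kernel code itself is C.

```python
def announce_timer_action(oper_up: bool, running: bool, timer_pending: bool) -> str:
    """Decide what to do with the announce timer on one notifier call.

    oper_up and running stand in for the results of netif_oper_up() and
    netif_running(); timer_pending stands in for a pending-timer check.
    """
    if oper_up and running:
        # Device is up: ensure the timer runs, but do not re-arm a pending one.
        return "none" if timer_pending else "start"
    # Device is not up: supervisory frames must not be sent.
    return "stop"
```

With this shape, the hsr3 case from the message (a single notification with the device already up and no timer pending) yields "start".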
2024-05-08  phonet: no longer hold RTNL in route_dumpit()  Eric Dumazet
route_dumpit() already relies on RCU, RTNL is not needed. Also change return value at the end of a dump. This allows NLMSG_DONE to be appended to the current skb at the end of a dump, saving a couple of recvmsg() system calls. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Remi Denis-Courmont <courmisch@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240507121748.416287-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  net: annotate data-races around dev->if_port  Eric Dumazet
Various ndo_set_config() methods can change dev->if_port.

dev->if_port is going to be read locklessly from rtnl_fill_link_ifmap().

Add corresponding WRITE_ONCE() on writer sides.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240507184144.1230469-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()  Eric Dumazet
syzbot is able to trigger the following crash [1], caused by unsafe ip6_dst_idev() use. Indeed ip6_dst_idev() can return NULL, and must always be checked. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 31648 Comm: syz-executor.0 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:__fib6_rule_action net/ipv6/fib6_rules.c:237 [inline] RIP: 0010:fib6_rule_action+0x241/0x7b0 net/ipv6/fib6_rules.c:267 Code: 02 00 00 49 8d 9f d8 00 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 f9 32 bf f7 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 20 00 74 08 48 89 df e8 e0 32 bf f7 4c 8b 03 48 89 ef 4c RSP: 0018:ffffc9000fc1f2f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1a772f98c8186700 RDX: 0000000000000003 RSI: ffffffff8bcac4e0 RDI: ffffffff8c1f9760 RBP: ffff8880673fb980 R08: ffffffff8fac15ef R09: 1ffffffff1f582bd R10: dffffc0000000000 R11: fffffbfff1f582be R12: dffffc0000000000 R13: 0000000000000080 R14: ffff888076509000 R15: ffff88807a029a00 FS: 00007f55e82ca6c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31d23000 CR3: 0000000022b66000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> fib_rules_lookup+0x62c/0xdb0 net/core/fib_rules.c:317 fib6_rule_lookup+0x1fd/0x790 net/ipv6/fib6_rules.c:108 ip6_route_output_flags_noref net/ipv6/route.c:2637 [inline] ip6_route_output_flags+0x38e/0x610 net/ipv6/route.c:2649 ip6_route_output include/net/ip6_route.h:93 [inline] ip6_dst_lookup_tail+0x189/0x11a0 net/ipv6/ip6_output.c:1120 ip6_dst_lookup_flow+0xb9/0x180 net/ipv6/ip6_output.c:1250 sctp_v6_get_dst+0x792/0x1e20 
net/sctp/ipv6.c:326 sctp_transport_route+0x12c/0x2e0 net/sctp/transport.c:455 sctp_assoc_add_peer+0x614/0x15c0 net/sctp/associola.c:662 sctp_connect_new_asoc+0x31d/0x6c0 net/sctp/socket.c:1099 __sctp_connect+0x66d/0xe30 net/sctp/socket.c:1197 sctp_connect net/sctp/socket.c:4819 [inline] sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834 __sys_connect_file net/socket.c:2048 [inline] __sys_connect+0x2df/0x310 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x7a/0x90 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 5e5f3f0f8013 ("[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240507163145.835254-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  net: dst_cache: minor optimization in dst_cache_set_ip6()  Eric Dumazet
There is no need to use this_cpu_ptr(dst_cache->cache) twice. Compiler is unable to optimize the second call, because of per-cpu constraints. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240507132717.627518-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  net: dst_cache: annotate data-races around dst_cache->reset_ts  Eric Dumazet
dst_cache->reset_ts is read or written locklessly, add READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240507132000.614591-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  dt-bindings: net: mediatek: remove wrongly added clocks and SerDes  Daniel Golle
Several clocks as well as both sgmiisys phandles were added by mistake to the Ethernet bindings for MT7988. Also, the total number of clocks didn't match the actual number of items listed.

This happened because the vendor driver which served as a reference uses a high number of syscon phandles to access various parts of the SoC, which wasn't acceptable upstream. Hence several parts which have never previously been supported (such as the SerDes PHY and USXGMII PCS) are going to be implemented by separate drivers. As a result the device tree will look much more sane.

Quickly align the bindings with the upcoming reality of the drivers actually adding support for the remaining Ethernet-related features of the MT7988 SoC.

Fixes: c94a9aabec36 ("dt-bindings: net: mediatek,net: add mt7988-eth binding")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/1569290b21cc787a424469ed74456a7e976b102d.1715084326.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  netlink/specs: Add VF attributes to rt_link spec  Donald Hunter
Add support for retrieving VFs as part of link info. For example: ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \ --do getlink --json '{"ifi-index": 38, "ext-mask": ["vf", "skip-stats"]}' {'address': 'b6:75:91:f2:64:65', [snip] 'vfinfo-list': {'info': [{'broadcast': b'\xff\xff\xff\xff\xff\xff\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00', 'link-state': {'link-state': 'auto', 'vf': 0}, 'mac': {'mac': b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00', 'vf': 0}, 'rate': {'max-tx-rate': 0, 'min-tx-rate': 0, 'vf': 0}, 'rss-query-en': {'setting': 0, 'vf': 0}, 'spoofchk': {'setting': 0, 'vf': 0}, 'trust': {'setting': 0, 'vf': 0}, 'tx-rate': {'rate': 0, 'vf': 0}, 'vlan': {'qos': 0, 'vf': 0, 'vlan': 0}, 'vlan-list': {'info': [{'qos': 0, 'vf': 0, 'vlan': 0, 'vlan-proto': 0}]}}, {'broadcast': b'\xff\xff\xff\xff\xff\xff\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00', 'link-state': {'link-state': 'auto', 'vf': 1}, 'mac': {'mac': b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00', 'vf': 1}, 'rate': {'max-tx-rate': 0, 'min-tx-rate': 0, 'vf': 1}, 'rss-query-en': {'setting': 0, 'vf': 1}, 'spoofchk': {'setting': 0, 'vf': 1}, 'trust': {'setting': 0, 'vf': 1}, 'tx-rate': {'rate': 0, 'vf': 1}, 'vlan': {'qos': 0, 'vf': 1, 'vlan': 0}, 'vlan-list': {'info': [{'qos': 0, 'vf': 1, 'vlan': 0, 'vlan-proto': 0}]}}]}, 'xdp': {'attached': 0}} Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240507103603.23017-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
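In the sample reply from this commit, 'mac' and 'broadcast' arrive as 32-byte buffers. A small Python sketch of how a caller might summarize such a reply follows. It assumes an Ethernet device, where only the first 6 bytes are the hardware address (that assumption is not stated in the commit), and the helper names are hypothetical rather than part of the ynl tooling.

```python
ETH_ALEN = 6  # assumption: Ethernet hardware address length in bytes


def format_hwaddr(raw: bytes, length: int = ETH_ALEN) -> str:
    """Render the first `length` bytes of a padded buffer as aa:bb:cc:..."""
    return ":".join(f"{b:02x}" for b in raw[:length])


def summarize_vfs(link: dict) -> list:
    """Return (vf index, MAC string, link-state) tuples from a getlink reply.

    `link` is expected to have the shape shown in the commit message:
    {'vfinfo-list': {'info': [{'mac': {...}, 'link-state': {...}, ...}]}}
    """
    rows = []
    for info in link.get("vfinfo-list", {}).get("info", []):
        mac = info["mac"]
        state = info["link-state"]["link-state"]
        rows.append((mac["vf"], format_hwaddr(mac["mac"]), state))
    return rows
```

A reply without a 'vfinfo-list' key simply produces an empty list, which matches a link that has no VFs or a request that omitted the "vf" ext-mask bit.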
2024-05-08  dt-bindings: net: ipq4019-mdio: add IPQ9574 compatible  Alexandru Gagniuc
Add a compatible property specific to IPQ9574. This should be used along with the IPQ4019 compatible. This second compatible serves the same purpose as the ipq{5,6,8} compatibles. This is to indicate that the clocks properties are required. Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240507024758.2810514-1-mr.nuke.me@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  Merge tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd  Linus Torvalds
Pull smb server fixes from Steve French:
 "Five ksmbd server fixes, all also for stable

   - Three fixes related to SMB3 leases (fixes two xfstests, and a locking issue)
   - Uninitialized variable fix
   - Socket creation fix when bindv6only is set"

* tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: do not grant v2 lease if parent lease key and epoch are not set
  ksmbd: use rwsem instead of rwlock for lease break
  ksmbd: avoid to send duplicate lease break notifications
  ksmbd: off ipv6only for both ipv4/ipv6 binding
  ksmbd: fix uninitialized symbol 'share' in smb2_tree_connect()
2024-05-08  Merge tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse  Linus Torvalds
Pull fuse fixes from Miklos Szeredi:
 "Two one-liner fixes for issues introduced in -rc1"

* tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  virtiofs: include a newline in sysfs tag
  fuse: verify zero padding in fuse_backing_map
2024-05-08  Merge tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat  Linus Torvalds
Pull exfat fixes from Namjae Jeon:

 - Fix xfstests generic/013 test failure with dirsync mount option
 - Initialize the reserved fields of deleted file and stream extension dentries to zero

* tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: zero the reserved fields of file and stream extension dentries
  exfat: fix timing of synchronizing bitmap and inode
2024-05-08  Merge tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs  Linus Torvalds
Pull bcachefs fixes from Kent Overstreet: - Various syzbot fixes; mainly small gaps in validation - Fix an integer overflow in fiemap() which was preventing filefrag from returning the full list of extents - Fix a refcounting bug on the device refcount, turned up by new assertions in the development branch - Fix a device removal/readd bug; write_super() was repeatedly dropping and retaking bch_dev->io_ref references * tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs: bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async() bcachefs: Fix race in bch2_write_super() bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX bcachefs: Add missing skcipher_request_set_callback() call bcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode() bcachefs: Fix shift-by-64 in bformat_needs_redo() bcachefs: Guard against unknown k.k->type in __bkey_invalid() bcachefs: Add missing validation for superblock section clean bcachefs: Fix assert in bch2_alloc_v4_invalid() bcachefs: fix overflow in fiemap bcachefs: Add a better limit for maximum number of buckets bcachefs: Fix lifetime issue in device iterator helpers bcachefs: Fix bch2_dev_lookup() refcounting bcachefs: Initialize bch_write_op->failed in inline data path bcachefs: Fix refcount put in sb_field_resize error path bcachefs: Inodes need extra padding for varint_decode_fast() bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit() bcachefs: bucket_pos_to_bp_noerror() bcachefs: don't free error pointers bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()
2024-05-08  Merge tag 'soc-fixes-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc  Linus Torvalds
Pull ARM SoC fixes from Arnd Bergmann:
 "These are a couple of last minute fixes that came in over the previous week, addressing:

   - A pin configuration bug on a qualcomm board that caused issues with ethernet and mmc
   - Two minor code fixes for misleading console output in the microchip firmware driver
   - A build warning in the sifive cache driver"

* tag 'soc-fixes-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  firmware: microchip: clarify that sizes and addresses are in hex
  firmware: microchip: don't unconditionally print validation success
  arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration
  cache: sifive_ccache: Silence unused variable warning
2024-05-08  Merge tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci  Linus Torvalds
Pull pci fixes from Bjorn Helgaas:

 - Update kernel-parameters doc to describe "pcie_aspm=off" more accurately (Bjorn Helgaas)
 - Restore the parent's (not the child's) ASPM state to the parent during resume, which fixes a reboot during resume (Kai-Heng Feng)

* tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/ASPM: Restore parent state to parent, child state to child
  PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched
2024-05-08  Merge branch 'rxrpc-miscellaneous-fixes'  Jakub Kicinski
David Howells says:

====================
rxrpc: Miscellaneous fixes (part)

Here are some miscellaneous fixes for AF_RXRPC:

 (1) Fix the congestion control algorithm to start cwnd at 4 and to not cut ssthresh when the peer cuts its rwind size.

 (2) Only transmit a single ACK for all the DATA packets glued together into a jumbo packet to reduce the number of ACKs being generated.
====================

Link: https://lore.kernel.org/r/20240503150749.1001323-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  rxrpc: Only transmit one ACK per jumbo packet received  David Howells
Only generate one ACK packet for all the subpackets in a jumbo packet. If we would like to generate more than one ACK, we prioritise them based on their reason code, in the following order, highest first:

    OutOfSeq > NoSpace > ExceedsWin > Duplicate > Requested > Delay > Idle

For the first four, we reference the lowest offending subpacket; for the last three, the highest.

This reduces the number of ACKs we end up transmitting to one per UDP packet transmitted to reduce network loading and packet parsing.

Fixes: 5d7edbc9231e ("rxrpc: Get rid of the Rx ring")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Link: https://lore.kernel.org/r/20240503150749.1001323-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
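The ACK selection rule in this commit can be sketched in a few lines of Python. This is one reading of the message, not the kernel implementation: it assumes "lowest/highest offending subpacket" means the lowest or highest sequence number among the subpackets that requested the winning reason, it ignores sequence-number wraparound, and `pick_ack` is a hypothetical name.

```python
# Reason codes in priority order, highest first, as listed in the commit.
PRIORITY = ["OutOfSeq", "NoSpace", "ExceedsWin", "Duplicate",
            "Requested", "Delay", "Idle"]
# For the first four reasons the ACK references the lowest subpacket.
LOWEST_SEQ_REASONS = frozenset(PRIORITY[:4])


def pick_ack(candidates):
    """Collapse per-subpacket ACK requests into a single ACK.

    candidates: list of (reason, seq) pairs, one per subpacket that would
    have generated an ACK. Returns one (reason, seq) pair, or None.
    """
    if not candidates:
        return None
    # Winning reason: the one that appears earliest in PRIORITY.
    best = min((reason for reason, _ in candidates), key=PRIORITY.index)
    seqs = [seq for reason, seq in candidates if reason == best]
    seq = min(seqs) if best in LOWEST_SEQ_REASONS else max(seqs)
    return (best, seq)
```

For a jumbo packet whose subpackets request Delay, Duplicate, Duplicate and Idle, only the Duplicate ACK survives, and it references the lower of the two duplicate sequence numbers.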
2024-05-08  rxrpc: Fix congestion control algorithm  David Howells
Make the following fixes to the congestion control algorithm:

 (1) Don't vary the cwnd starting value by the size of RXRPC_TX_SMSS since that's currently held constant - set to the size of a jumbo subpacket payload so that we can create jumbo packets on the fly. The current code invariably picks 3 as the starting value. Further, the starting cwnd needs to be an even number because we ack every other packet, so set it to 4.

 (2) Don't cut ssthresh when we see an ACK come from the peer with a receive window (rwind) less than ssthresh. ssthresh keeps track of characteristics of the connection whereas rwind may be reduced by the peer for any reason - and may be reduced to 0.

Fixes: 1fc4fa2ac93d ("rxrpc: Fix congestion management")
Fixes: 0851115090a3 ("rxrpc: Reduce ssthresh to peer's receive window")
Signed-off-by: David Howells <dhowells@redhat.com>
Suggested-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Link: https://lore.kernel.org/r/20240503150749.1001323-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  selftests: test_bridge_neigh_suppress.sh: Fix failures due to duplicate MAC  Ido Schimmel
When creating the topology for the test, three veth pairs are created in the initial network namespace before being moved to one of the network namespaces created by the test. On systems where systemd-udev uses MACAddressPolicy=persistent (default since systemd version 242), this will result in some net devices having the same MAC address since they were created with the same name in the initial network namespace. In turn, this leads to arping / ndisc6 failing since packets are dropped by the bridge's loopback filter. Fix by creating each net device in the correct network namespace instead of moving it there from the initial network namespace. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240426074015.251854d4@kernel.org/ Fixes: 7648ac72dcd7 ("selftests: net: Add bridge neighbor suppression test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240507113033.1732534-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08  test: hsr: Call cleanup_all_ns when hsr_redbox.sh script exits  Lukasz Majewski
Without this change the created netns instances are not cleared after this script execution. To fix this problem the cleanup_all_ns function from ../lib.sh is called. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  ax25: Remove superfuous "return" from ax25_ds_set_timer  Joel Granados
Remove the explicit call to "return" in the void ax25_ds_set_timer function that was introduced in 78a7b5dbc060 ("ax.25: x.25: Remove the now superfluous sentinel elements from ctl_table array"). Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  ipvs: allow some sysctls in non-init user namespaces  Alexander Mikhalitsyn
Let's make all IPVS sysctls writable even when the network namespace is owned by a non-initial user namespace.

Let's make a few sysctls read-only for non-privileged users:
- sync_qlen_max
- sync_sock_size
- run_estimation
- est_cpulist
- est_nice

I'm trying to be conservative with this to prevent introducing any security issues in there. Maybe we can allow more sysctls to be writable, but let's do this on demand and when we see a real use case.

This patch is motivated by a user request in the LXC project [1]. Having this can help with running some Kubernetes [2] or Docker Swarm [3] workloads inside system containers.

Link: https://github.com/lxc/lxc/issues/4278 [1]
Link: https://github.com/kubernetes/kubernetes/blob/b722d017a34b300a2284b890448e5a605f21d01e/pkg/proxy/ipvs/proxier.go#L103 [2]
Link: https://github.com/moby/libnetwork/blob/3797618f9a38372e8107d8c06f6ae199e1133ae8/osl/namespace_linux.go#L682 [3]
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Simon Horman <horms@verge.net.au>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  ipvs: add READ_ONCE barrier for ipvs->sysctl_amemthresh  Alexander Mikhalitsyn
Cc: Julian Anastasov <ja@ssi.bg> Cc: Simon Horman <horms@verge.net.au> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  ipv6: Fix potential uninit-value access in __ip6_make_skb()  Shigeru Yoshida
As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in __ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags instead of testing HDRINCL on the socket to avoid a race condition which causes uninit-value access. Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  net: stmmac: dwmac-ipq806x: account for rgmii-txid/rxid/id phy-mode  Christian Marangi
Currently the ipq806x dwmac driver is almost always used attached to the CPU port of a switch, and phy-mode was always set to "rgmii" or "sgmii".

Some devices came up with a special configuration where the PHY is directly attached to the GMAC port, and in those cases phy-mode needs to be set to "rgmii-id" to make the PHY work correctly and receive packets.

Since the driver supports only the "rgmii" and "sgmii" modes, when "rgmii-id" (or a variant) is set, the mode is rejected and probe fails.

Add support for these phy-modes as well, to correctly set up PHYs that require a delay applied to tx/rx.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  net: bridge: switchdev: Improve error message for port_obj_add/del functions  Oleksij Rempel
Enhance the error reporting mechanism in the switchdev framework to provide more informative and user-friendly error messages. Following feedback from users struggling to understand the implications of error messages like "failed (err=-28) to add object (id=2)", this update aims to clarify what operation failed and how this might impact the system or network. With this change, error messages now include a description of the failed operation, the specific object involved, and a brief explanation of the potential impact on the system. This approach helps administrators and developers better understand the context and severity of errors, facilitating quicker and more effective troubleshooting. Example of the improved logging: [ 70.516446] ksz-switch spi0.0 uplink: Failed to add Port Multicast Database entry (object id=2) with error: -ENOSPC (-28). [ 70.516446] Failure in updating the port's Multicast Database could lead to multicast forwarding issues. [ 70.516446] Current HW/SW setup lacks sufficient resources. This comprehensive update includes handling for a range of switchdev object IDs, ensuring that most operations within the switchdev framework benefit from clearer error reporting. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  net: phy: marvell-88q2xxx: add support for Rev B1 and B2  Gregor Herburger
Different revisions of the Marvell 88q2xxx PHY need different init sequences. Add init sequences for Rev B1 and Rev B2. The Rev B2 init sequence skips one register write.

Tested-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08  appletalk: Improve handling of broadcast packets  Vincent Duvert
When a broadcast AppleTalk packet is received, prefer queuing it on the socket whose address matches the address of the interface that received the packet (and is listening on the correct port). Userspace applications that handle such packets will usually send a response on the same socket that received the packet; this fix allows the response to be sent on the correct interface. If a socket matching the interface's address is not found, an arbitrary socket listening on the correct port will be used, if any. This matches the implementation's previous behavior. Fixes atalkd's responses to network information requests when multiple network interfaces are configured to use AppleTalk. Link: https://lore.kernel.org/netdev/20200722113752.1218-2-vincent.ldev@duvert.net/ Link: https://gist.github.com/VinDuv/4db433b6dce39d51a5b7847ee749b2a4 Signed-off-by: Vincent Duvert <vincent.ldev@duvert.net> Signed-off-by: Doug Brown <doug@schmorgal.com> Signed-off-by: David S. Miller <davem@davemloft.net>
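The socket preference described in this commit can be sketched as a single pass over the candidate sockets. This Python model is illustrative only: the `Sock` type, its fields and the function name are hypothetical, and "arbitrary" is implemented here as "first one found on the port".

```python
from typing import NamedTuple, Optional, Sequence


class Sock(NamedTuple):
    """Hypothetical stand-in for a bound AppleTalk socket."""
    net: int
    node: int
    port: int


def pick_broadcast_socket(socks: Sequence[Sock], iface_net: int,
                          iface_node: int, port: int) -> Optional[Sock]:
    """Choose the socket that should receive a broadcast packet."""
    fallback = None
    for sock in socks:
        if sock.port != port:
            continue  # not listening on the destination port
        if (sock.net, sock.node) == (iface_net, iface_node):
            return sock  # bound to the receiving interface's address: preferred
        if fallback is None:
            fallback = sock  # previous behaviour: any socket on the port
    return fallback
```

Because replies usually go out on the socket that received the request, returning the interface-matched socket is what lets a responder such as atalkd answer on the correct interface.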
2024-05-08  net/ipv4: add tracepoint for icmp_send  Peilin He
Introduce a tracepoint for icmp_send, which can help users conveniently get more detailed information when abnormal ICMP events happen.

1. Use case example:
=============================
When an application experiences packet loss due to an unreachable UDP destination port, the kernel will send an exception message through the icmp_send function. By adding a tracepoint for icmp_send, developers or system administrators can obtain detailed information about the UDP packet loss, including the type, code, source address, destination address, source port, and destination port. This facilitates troubleshooting of UDP packet loss issues, especially for network-service applications.

2. Operation Instructions:
==========================
Switch to the tracing directory.
        cd /sys/kernel/tracing
Filter for destination port unreachable.
        echo "type==3 && code==3" > events/icmp/icmp_send/filter
Enable trace event.
        echo 1 > events/icmp/icmp_send/enable

3. Result View:
================
 udp_client_erro-11370 [002] ...s.12 124.728002: icmp_send: icmp_send: type=3, code=3. From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23 skbaddr=00000000589b167a

Signed-off-by: Peilin He <he.peilin@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Liu Chun <liu.chun2@zte.com.cn>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
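The result line shown in this commit is regular enough to post-process with a script. Below is a hedged Python sketch that parses that one line format; the regular expression is derived only from the sample above (so it may not match other kernel versions), and the function names are hypothetical.

```python
import re

# Pattern built from the single sample line in the commit message.
TRACE_RE = re.compile(
    r"icmp_send: type=(?P<type>\d+), code=(?P<code>\d+)\. "
    r"From (?P<saddr>[\d.]+):(?P<sport>\d+) "
    r"to (?P<daddr>[\d.]+):(?P<dport>\d+) "
    r"ulen=(?P<ulen>\d+) skbaddr=(?P<skbaddr>[0-9a-f]+)"
)


def parse_icmp_send(line: str):
    """Return the event fields as a dict, or None if the line does not match."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    for key in ("type", "code", "sport", "dport", "ulen"):
        event[key] = int(event[key])
    return event


def is_port_unreachable(event: dict) -> bool:
    """ICMP type 3, code 3: destination unreachable, port unreachable."""
    return event["type"] == 3 and event["code"] == 3
```

The same type/code test is what the filter expression `type==3 && code==3` applies inside the kernel, so a user-space check like this is only needed when the filter is left unset.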
2024-05-08net: bridge: fix corrupted ethernet header on multicast-to-unicastFelix Fietkau
The change from skb_copy to pskb_copy unfortunately changed the data copying to omit the ethernet header, since it was pulled before reaching this point. Fix this by calling __skb_push/pull around pskb_copy. Fixes: 59c878cbcdd8 ("net: bridge: fix multicast-to-unicast with fraglist GSO") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
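The push/copy/pull pattern can be illustrated with a toy buffer in userspace C. Everything here is hypothetical: `struct toy_buf` and the `toy_*` helpers only mimic the idea of a data pointer that has been advanced past a header, and do not reproduce the skb API or the exact sequence used in the bridge code. Restoring the same view on the copy is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy buffer, loosely modelled on an skb's data/len pair. */
struct toy_buf {
    unsigned char storage[64];
    size_t off;   /* offset of the current "data" pointer into storage */
    size_t len;   /* bytes visible from the data pointer onwards */
};

/* Hide `n` leading bytes, as when a header has been pulled. */
static void toy_pull(struct toy_buf *b, size_t n)
{
    b->off += n;
    b->len -= n;
}

/* Re-expose `n` bytes in front of the data pointer. */
static void toy_push(struct toy_buf *b, size_t n)
{
    b->off -= n;
    b->len += n;
}

/* Copies only what is visible from the data pointer onwards. */
static void toy_copy(struct toy_buf *dst, const struct toy_buf *src)
{
    memset(dst, 0, sizeof(*dst));
    memcpy(dst->storage, src->storage + src->off, src->len);
    dst->len = src->len;
}

/* Copy a buffer including a header of `hdr_len` bytes that was already pulled. */
void toy_copy_with_header(struct toy_buf *dst, struct toy_buf *src, size_t hdr_len)
{
    toy_push(src, hdr_len);   /* make the header visible again */
    toy_copy(dst, src);       /* the copy now starts at the header */
    toy_pull(src, hdr_len);   /* restore the original's view */
    toy_pull(dst, hdr_len);   /* give the copy the same view (assumption) */
}
```

Calling `toy_copy()` directly on a pulled buffer would drop the header bytes, which is the kind of omission the message describes.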
2024-05-08Merge branch 'ksz-dcb-dscp'David S. Miller
Oleksij Rempel says:

====================
add DCB and DSCP support for KSZ switches

This patch series is aimed at improving support for DCB (Data Center Bridging) and DSCP (Differentiated Services Code Point) on KSZ switches.

The main goal is to introduce global DSCP and PCP (Priority Code Point) mapping support, addressing the limitation of KSZ switches not having per-port DSCP priority mapping. This involves extending the DSA framework with new callbacks for managing trust settings for global DSCP and PCP maps. Additionally, we introduce IEEE 802.1q helpers for default configurations, benefiting other drivers too.

Change logs are in separate patches.

Compared to v6 this series includes some new patches for DSCP global mapping support and QoS selftest script for KSZ9477 switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08selftests: microchip: add test for QoS support on KSZ9477 switch familyOleksij Rempel
Add tests covering the following functionality on the KSZ9477 switch family:
- default port priority
- global DSCP to Internal Priority Mapping
- apptrust configuration

This script was tested on KSZ9893R.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: microchip: add support DSCP priority mappingOleksij Rempel
Microchip KSZ and LAN variants do not have per-port DSCP priority configuration. Instead there is a global DSCP mapping table. This patch provides write access to this global DSCP map. If an entry is "deleted", we map the corresponding DSCP entry to a best-effort priority, which is expected to be the default priority for all untagged traffic. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: add support switches global DSCP priority mappingOleksij Rempel
Some switches, like the Microchip KSZ variants, do not support per-port DSCP priority configuration. Instead there is a global DSCP mapping table. To handle this, we accept a set/del request on any of the user ports, apply it as the global configuration, and update the DCB app entries for all other ports. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
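A rough userspace model of this behavior is sketched below. It is an illustration only: `struct toy_switch`, the `toy_dscp_*` functions, the port count, the `PRIO_UNSET` marker and the value chosen for `BEST_EFFORT_PRIO` are all hypothetical and are not taken from the DSA or KSZ code. The delete path follows the KSZ driver commit in this series, which maps a "deleted" entry to a best-effort priority.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PORTS        3     /* hypothetical number of user ports */
#define NUM_DSCP         64    /* DSCP is a 6-bit field */
#define PRIO_UNSET       0xff  /* hypothetical marker: no app entry present */
#define BEST_EFFORT_PRIO 0     /* assumed value, for illustration only */

struct toy_switch {
    uint8_t hw_dscp_prio[NUM_DSCP];          /* the single global table */
    uint8_t app_entry[NUM_PORTS][NUM_DSCP];  /* per-port software view */
};

void toy_switch_init(struct toy_switch *sw)
{
    memset(sw->hw_dscp_prio, BEST_EFFORT_PRIO, sizeof(sw->hw_dscp_prio));
    memset(sw->app_entry, PRIO_UNSET, sizeof(sw->app_entry));
}

/* A set request on any one port updates the global table and all ports. */
int toy_dscp_set(struct toy_switch *sw, int port, uint8_t dscp, uint8_t prio)
{
    if (port < 0 || port >= NUM_PORTS || dscp >= NUM_DSCP)
        return -1;
    sw->hw_dscp_prio[dscp] = prio;
    for (int p = 0; p < NUM_PORTS; p++)
        sw->app_entry[p][dscp] = prio;
    return 0;
}

/* A del request falls back to best effort and clears every port's entry. */
int toy_dscp_del(struct toy_switch *sw, int port, uint8_t dscp)
{
    if (port < 0 || port >= NUM_PORTS || dscp >= NUM_DSCP)
        return -1;
    sw->hw_dscp_prio[dscp] = BEST_EFFORT_PRIO;
    for (int p = 0; p < NUM_PORTS; p++)
        sw->app_entry[p][dscp] = PRIO_UNSET;
    return 0;
}
```

The point of mirroring the entry to every port is that a later query on any port reports what the hardware actually does, even though the request was issued on a different port.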
2024-05-08net: dsa: microchip: let DCB code do PCP and DSCP policy configurationOleksij Rempel
802.1P (PCP) and DiffServ (DSCP) are handled now by DCB code. Let it do all needed initial configuration. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: microchip: init predictable IPV to queue mapping for all non KSZ8xxx variantsOleksij Rempel
Init the priority to queue mapping as shown in the IEEE 802.1Q mapping example. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: microchip: enable ETS support for KSZ989X variantsOleksij Rempel
I tested ETS support on KSZ9893, so it should work on other KSZ989X variants too, which were not yet listed as supported. With this change, the only family of chips we officially do not support is ksz8. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: microchip: dcb: add special handling for KSZ88X3 familyOleksij Rempel
KSZ88X3 switches behave differently on different ports:
- It seems to be impossible to disable VLAN PCP classification on Port 2. This means that as soon as multiqueue support is enabled, frames with a VLAN tag will get PCP priorities. This behavior does not affect Port 1, where it is possible to disable PCP priorities.
- DSCP classification does not work on Port 2.

Since there are still usable configuration combinations, I added some quirks to make sure the user gets an appropriate error message if an unsupported configuration is chosen.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: microchip: add support for different DCB app configurationsOleksij Rempel
Add DCB support to configure app trust sources and default port priority.

The following commands can be used for testing:
    dcb apptrust set dev lan1 order pcp dscp
    dcb app replace dev lan1 default-prio 3

Since it is not possible to configure the DSCP-Prio mapping per port, this patch provides only the ability to read the switch's global DSCP-Prio mapping and a way to enable/disable app trust for DSCP.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: microchip: add multi queue support for KSZ88X3 variantsOleksij Rempel
KSZ88X3 switches support up to 4 queues. Rework ksz8795_set_prio_queue() to support the KSZ8795 and KSZ88X3 families of switches. By default, configure KSZ88X3 to use one queue, since it needs special handling due to priority-related errata. Errata handling is implemented in a separate patch. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: add IEEE 802.1q specific helpersOleksij Rempel
The IEEE 802.1Q specification provides recommendations and examples which can be used as good default values for different drivers. This patch implements the mapping examples documented in IEEE 802.1Q-2022, Annex I, "I.3 Traffic type to traffic class mapping", as well as IETF DSCP naming and a DSCP to Traffic Type mapping inspired by RFC8325. These helpers will be used in follow-up patches for the dsa/microchip DCB implementation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: microchip: add IPV information supportOleksij Rempel
Most Microchip KSZ switches use an Internal Priority Value (IPV) associated with every frame. For example, it is possible to map any VLAN PCP or DSCP value to an IPV and, at the end, map the IPV to a queue. Since the number of IPVs is not equal to the number of queues, add this information and make use of it in some functions. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: dsa: add support for DCB get/set apptrust configurationOleksij Rempel
Add DCB support to get/set the trust configuration for different packet priority information sources. Some switches allow choosing between different sources of packet priority classification. For example, on KSZ switches it is possible to configure VLAN PCP and/or DSCP sources. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08virtiofs: include a newline in sysfs tagBrian Foster
The internal tag string doesn't contain a newline. Append one when emitting the tag via sysfs. [Stefan] Orthogonal to the newline issue, sysfs_emit(buf, "%s", fs->tag) is needed to prevent format string injection. Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: a8f62f50b4e4 ("virtiofs: export filesystem tags through sysfs") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
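The two points in this message (append a newline, and pass the tag as an argument to "%s" and not as the format string itself) can be shown with a userspace analogue. This sketch uses standard `snprintf()` in place of the kernel-only `sysfs_emit()`, and `emit_tag()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace analogue: write `tag` plus a trailing newline into `buf`.
 * The tag is passed as an argument to "%s", never as the format string,
 * so any '%' characters inside it are copied literally.
 */
int emit_tag(char *buf, size_t size, const char *tag)
{
    return snprintf(buf, size, "%s\n", tag);
}
```

With a tag such as `my%stag`, the buffer ends up holding the tag unchanged followed by a newline. Passing the tag directly as the format would instead make the formatter interpret `%s` and read an argument that was never supplied.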
2024-05-07Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-05-06 (ice)

This series contains updates to ice driver only.

Paul adds support for additional E830 devices and adjusts naming for existing E830 devices.

Marcin commonizes a couple of TC setup calls to reduce duplicated code.

Mateusz adds ice_vsi_cfg_params into ice_vsi to consolidate info.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: refactor struct ice_vsi_cfg_params to be inside of struct ice_vsi
  ice: Deduplicate tc action setup
  ice: update E830 device ids and comments
  ice: add additional E830 device ids
====================

Link: https://lore.kernel.org/r/20240506170827.948682-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07net: usb: sr9700: stop lying about skb->truesizeEric Dumazet
Some usb drivers set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize overrides. I also replaced one skb_clone() with an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240506143939.3673865-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07net: usb: smsc75xx: stop lying about skb->truesizeEric Dumazet
Some usb drivers try to set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize overrides. I also replaced one skb_clone() with an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steve Glendinning <steve.glendinning@shawell.net> Link: https://lore.kernel.org/r/20240506142358.3657918-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07usb: aqc111: stop lying about skb->truesizeEric Dumazet
Some usb drivers try to set small skb->truesize and break core networking stacks. I replaced one skb_clone() with an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Fixes: 361459cd9642 ("net: usb: aqc111: Implement RX data path") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240506135546.3641185-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>