summaryrefslogtreecommitdiff
path: root/tools/testing/selftests/drivers/net
AgeCommit message (Collapse)Author
2024-02-19selftests: bonding: set active slave to primary eth1 specificallyHangbin Liu
In bond priority testing, we set the primary interface to eth1 and add eth0,1,2 to bond in serial. This is OK in normal times. But when in debug kernel, the bridge port that eth0,1,2 connected would start slowly (enter blocking, forwarding state), which caused the primary interface down for a while after enslaving and active slave changed. Here is a test log from Jakub's debug test[1]. [ 400.399070][ T50] br0: port 1(s0) entered disabled state [ 400.400168][ T50] br0: port 4(s2) entered disabled state [ 400.941504][ T2791] bond0: (slave eth0): making interface the new active one [ 400.942603][ T2791] bond0: (slave eth0): Enslaving as an active interface with an up link [ 400.943633][ T2766] br0: port 1(s0) entered blocking state [ 400.944119][ T2766] br0: port 1(s0) entered forwarding state [ 401.128792][ T2792] bond0: (slave eth1): making interface the new active one [ 401.130771][ T2792] bond0: (slave eth1): Enslaving as an active interface with an up link [ 401.131643][ T69] br0: port 2(s1) entered blocking state [ 401.132067][ T69] br0: port 2(s1) entered forwarding state [ 401.346201][ T2793] bond0: (slave eth2): Enslaving as a backup interface with an up link [ 401.348414][ T50] br0: port 4(s2) entered blocking state [ 401.348857][ T50] br0: port 4(s2) entered forwarding state [ 401.519669][ T250] bond0: (slave eth0): link status definitely down, disabling slave [ 401.526522][ T250] bond0: (slave eth1): link status definitely down, disabling slave [ 401.526986][ T250] bond0: (slave eth2): making interface the new active one [ 401.629470][ T250] bond0: (slave eth0): link status definitely up [ 401.630089][ T250] bond0: (slave eth1): link status definitely up [...] # TEST: prio (active-backup ns_ip6_target primary_reselect 1) [FAIL] # Current active slave is eth2 but not eth1 Fix it by setting active slave to primary slave specifically before testing. [1] https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/464301/1-bond-options-sh/stdout Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-01selftests: bonding: Check initial stateBenjamin Poirier
The purpose of the test_LAG_cleanup() function is to check that some hardware addresses are removed from underlying devices after they have been unenslaved. The test function simply checks that those addresses are not present at the end. However, if the addresses were never added to begin with due to some error in device setup, the test function currently passes. This is a false positive since in that situation the test did not actually exercise the intended functionality. Add a check that the expected addresses are indeed present after device setup. This makes the test function more robust. I noticed this problem when running the team/dev_addr_lists.sh test on a system without support for dummy and ipv6: tools/testing/selftests/drivers/net/team# ./dev_addr_lists.sh Error: Unknown device type. Error: Unknown device type. This program is not intended to be run as root. RTNETLINK answers: Operation not supported TEST: team cleanup mode lacp [ OK ] Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20240131140848.360618-3-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-01selftests: team: Add missing config optionsBenjamin Poirier
Similar to commit dd2d40acdbb2 ("selftests: bonding: Add more missing config options"), add more networking-specific config options which are needed for team device tests. For testing, I used the minimal config generated by virtme-ng and I added the options in the config file. Afterwards, the team device test passed. Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20240131140848.360618-2-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-25selftests: bonding: do not test arp/ns target with mode balance-alb/tlbHangbin Liu
The prio_arp/ns tests hard code the mode to active-backup. At the same time, The balance-alb/tlb modes do not support arp/ns target. So remove the prio_arp/ns tests from the loop and only test active-backup mode. Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests") Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com> Closes: https://lore.kernel.org/netdev/17415.1705965957@famine/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240123075917.1576360-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-24selftests: netdevsim: fix the udp_tunnel_nic testJakub Kicinski
This test is missing a whole bunch of checks for interface renaming and one ifup. Presumably it was only used on a system with renaming disabled and NetworkManager running. Fixes: 91f430b2c49d ("selftests: net: add a test for UDP tunnel info infra") Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240123060529.1033912-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-19selftests: bonding: Increase timeout to 1200sBenjamin Poirier
When tests are run by runner.sh, bond_options.sh gets killed before it can complete: make -C tools/testing/selftests run_tests TARGETS="drivers/net/bonding" [...] # timeout set to 120 # selftests: drivers/net/bonding: bond_options.sh # TEST: prio (active-backup miimon primary_reselect 0) [ OK ] # TEST: prio (active-backup miimon primary_reselect 1) [ OK ] # TEST: prio (active-backup miimon primary_reselect 2) [ OK ] # TEST: prio (active-backup arp_ip_target primary_reselect 0) [ OK ] # TEST: prio (active-backup arp_ip_target primary_reselect 1) [ OK ] # TEST: prio (active-backup arp_ip_target primary_reselect 2) [ OK ] # not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds This test includes many sleep statements, at least some of which are related to timers in the operation of the bonding driver itself. Increase the test timeout to allow the test to complete. I ran the test in slightly different VMs (including one without HW virtualization support) and got runtimes of 13m39.760s, 13m31.238s, and 13m2.956s. Use a ~1.5x "safety factor" and set the timeout to 1200s. Fixes: 42a8d4aaea84 ("selftests: bonding: add bonding prio option test") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240116104402.1203850a@kernel.org/#t Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240118001233.304759-1-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanesAmit Cohen
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic using a shaper somewhere in the flow of the packets. In this area, the buffer is smaller than the buffer at the beginning of the flow, so it fills up until there is no more space left. The test configures there PFC which is supposed to notice that the headroom is filling up and send PFC Xoff to indicate the transmitter to stop sending traffic for the priorities sharing this PG. The Xon/Xoff threshold is auto-configured and always equal to 2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet, traffic will keep arriving until the transmitter receives and processes the PFC packet. This amount of traffic is known as the PFC delay allowance. Currently the buffer for the delay traffic is configured as 100KB. The MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB. This allows 80KB extra to be stored in this buffer. 8-lane ports use two buffers among which the configured buffer is split, the Xoff threshold then applies to each buffer in parallel. The test does not take into account the behavior of 8-lane ports, when the ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes, packets are dropped and the test fails. Check if the relevant ports use 8 lanes, in such case double the size of the buffer, as the headroom is split half-half. Cc: Shuah Khan <shuah@kernel.org> Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/23ff11b7dff031eb04a41c0f5254a2b636cd8ebb.1705502064.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18selftests: mlxsw: qos_pfc: Remove wrong descriptionAmit Cohen
In the diagram of the topology, $swp3 and $swp4 are described as 1Gbps ports. This is wrong information, the test does not configure such speed. Cc: Shuah Khan <shuah@kernel.org> Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/0087e2d416aff7e444d15f7c2958fc1d438dc27e.1705502064.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18mlxsw: spectrum_acl_tcam: Fix stack corruptionIdo Schimmel
When tc filters are first added to a net device, the corresponding local port gets bound to an ACL group in the device. The group contains a list of ACLs. In turn, each ACL points to a different TCAM region where the filters are stored. During forwarding, the ACLs are sequentially evaluated until a match is found. One reason to place filters in different regions is when they are added with decreasing priorities and in an alternating order so that two consecutive filters can never fit in the same region because of their key usage. In Spectrum-2 and newer ASICs the firmware started to report that the maximum number of ACLs in a group is more than 16, but the layout of the register that configures ACL groups (PAGT) was not updated to account for that. It is therefore possible to hit stack corruption [1] in the rare case where more than 16 ACLs in a group are required. Fix by limiting the maximum ACL group size to the minimum between what the firmware reports and the maximum ACLs that fit in the PAGT register. Add a test case to make sure the machine does not crash when this condition is hit. [1] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120 [...] dump_stack_lvl+0x36/0x50 panic+0x305/0x330 __stack_chk_fail+0x15/0x20 mlxsw_sp_acl_tcam_group_update+0x116/0x120 mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110 mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20 mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0 mlxsw_sp_acl_rule_add+0x47/0x240 mlxsw_sp_flower_replace+0x1a9/0x1d0 tc_setup_cb_add+0xdc/0x1c0 fl_hw_replace_filter+0x146/0x1f0 fl_change+0xc17/0x1360 tc_new_tfilter+0x472/0xb90 rtnetlink_rcv_msg+0x313/0x3b0 netlink_rcv_skb+0x58/0x100 netlink_unicast+0x244/0x390 netlink_sendmsg+0x1e4/0x440 ____sys_sendmsg+0x164/0x260 ___sys_sendmsg+0x9a/0xe0 __sys_sendmsg+0x7a/0xc0 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x63/0x6b Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC") Reported-by: Orel Hagag <orelh@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/2d91c89afba59c22587b444994ae419dbea8d876.1705502064.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failureAmit Cohen
Lately, a bug was found when many TC filters are added - at some point, several bugs are printed to dmesg [1] and the switch is crashed with segmentation fault. The issue starts when gen_pool_free() fails because of unexpected behavior - a try to free memory which is already freed, this leads to BUG() call which crashes the switch and makes many other bugs. Trying to track down the unexpected behavior led to a bug in eRP code. The function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated index, sets the value and returns an error code. When gen_pool_alloc() fails it returns address 0, we track it and return -ENOBUFS outside, BUT the call for gen_pool_alloc() already override the index in erp_table structure. This is a problem when such allocation is done as part of table expansion. This is not a new table, which will not be used in case of allocation failure. We try to expand eRP table and override the current index (non-zero) with zero. Then, it leads to an unexpected behavior when address 0 is freed twice. Note that address 0 is valid in erp_table->base_index and indeed other tables use it. gen_pool_alloc() fails in case that there is no space left in the pre-allocated pool, in our case, the pool is limited to ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than max erp entries are required, we exceed the limit and return an error, this error leads to "Failed to migrate vregion" print. Fix this by changing erp_table->base_index only in case of a successful allocation. Add a test case for such a scenario. Without this fix it causes segmentation fault: $ TESTS="max_erp_entries_test" ./tc_flower.sh ./tc_flower.sh: line 988: 1560 Segmentation fault tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null [1]: kernel BUG at lib/genalloc.c:508! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:gen_pool_free_owner+0xc9/0xe0 ... Call Trace: <TASK> __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum] mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum] objagg_obj_root_destroy+0x18/0x80 [objagg] objagg_obj_destroy+0x12c/0x130 [objagg] mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum] mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum] mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum] mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum] mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum] mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum] tc_setup_cb_destroy+0xc1/0x180 fl_hw_destroy_filter+0x94/0xc0 [cls_flower] __fl_delete+0x1ac/0x1c0 [cls_flower] fl_destroy+0xc2/0x150 [cls_flower] tcf_proto_destroy+0x1a/0xa0 ... mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion Fixes: f465261aa105 ("mlxsw: spectrum_acl: Implement common eRP core") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18selftests: bonding: Add more missing config optionsBenjamin Poirier
As a followup to commit 03fb8565c880 ("selftests: bonding: add missing build configs"), add more networking-specific config options which are needed for bonding tests. For testing, I used the minimal config generated by virtme-ng and I added the options in the config file. All bonding tests passed. Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6 Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20240116154926.202164-1-bpoirier@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-18selftests: netdevsim: add a config fileJakub Kicinski
netdevsim tests aren't very well integrated with kselftest, which has its advantages and disadvantages. But regardless of the intended integration - a config file to know what kernel to build is very useful, add one. Fixes: fc4c93f145d7 ("selftests: add basic netdevsim devlink flash testing") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240116154311.1945801-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16selftests: bonding: add missing build configsJakub Kicinski
bonding tests also try to create bridge, veth and dummy interfaces. These are not currently listed in config. Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Fixes: c078290a2b76 ("selftests: include bonding tests into the kselftest infra") Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240116020201.1883023-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-16selftests: netdevsim: correct expected FEC stringsJakub Kicinski
ethtool CLI has changed its output. Make the test compatible. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240114224748.1210578-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16selftests: netdevsim: sprinkle more udevadm settleJakub Kicinski
Number of tests are failing when netdev renaming is active on the system. Add udevadm settle in logic determining the names. Fixes: 242aaf03dc9b ("selftests: add a test for ethtool pause stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240114224726.1210532-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16selftests: bonding: Change script interpreterBenjamin Poirier
The tests changed by this patch, as well as the scripts they source, use features which are not part of POSIX sh (ex. 'source' and 'local'). As a result, these tests fail when /bin/sh is dash such as on Debian. Change the interpreter to bash so that these tests can run successfully. Fixes: d43eff0b85ae ("selftests: bonding: up/down delay w/ slave link flapping") Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()") 0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL") https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-02selftests: bonding: do not set port down when adding to bondHangbin Liu
Similar to commit be809424659c ("selftests: bonding: do not set port down before adding to bond"). The bond-arp-interval-causes-panic test failed after commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") as the kernel will set the port down _after_ adding to bond if setting port down specifically. Fix it by removing the link down operation when adding to bond. Fixes: 2ffd57327ff1 ("selftests: bonding: cause oops in bond_rr_gen_slave_id") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Benjamin Poirier <benjamin.poirier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-18selftests: mlxsw: Add PCI reset testIdo Schimmel
Test that PCI reset works correctly by verifying that only the expected reset methods are supported and that after issuing the reset the ifindex of the port changes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13selftests: netdevsim: use suitable existing dummy file for flash testJiri Pirko
The file name used in flash test was "dummy" because at the time test was written, drivers were responsible for file request and as netdevsim didn't do that, name was unused. However, the file load request is now done in devlink code and therefore the file has to exist. Use first random file from /lib/firmware for this purpose. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-28selftests: bonding: create directly devices in the target namespacesZhengchao Shao
If failed to set link1_1 to netns client, we should delete link1_1 in the cleanup path. But if set link1_1 to netns client successfully, delete link1_1 will report warning. So it will be safer creating directly the devices in the target namespaces. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Closes: https://lore.kernel.org/all/ZNyJx1HtXaUzOkNA@Laptop-X1/ Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/net/inet_sock.h f866fbc842de ("ipv4: fix data-races around inet->inet_id") c274af224269 ("inet: introduce inet->inet_flags") https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/ Adjacent changes: drivers/net/bonding/bond_alb.c e74216b8def3 ("bonding: fix macvlan over alb bond support") f11e5bd159b0 ("bonding: support balance-alb with openvswitch") drivers/net/ethernet/broadcom/bgmac.c d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()") 23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()") drivers/net/ethernet/broadcom/genet/bcmmii.c 32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()") acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()") net/sctp/socket.c f866fbc842de ("ipv4: fix data-races around inet->inet_id") b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24selftests: bonding: add macvlan over bond testingHangbin Liu
Add a macvlan over bonding test with mode active-backup, balance-tlb and balance-alb. ]# ./bond_macvlan.sh TEST: active-backup: IPv4: client->server [ OK ] TEST: active-backup: IPv6: client->server [ OK ] TEST: active-backup: IPv4: client->macvlan_1 [ OK ] TEST: active-backup: IPv6: client->macvlan_1 [ OK ] TEST: active-backup: IPv4: client->macvlan_2 [ OK ] TEST: active-backup: IPv6: client->macvlan_2 [ OK ] TEST: active-backup: IPv4: macvlan_1->macvlan_2 [ OK ] TEST: active-backup: IPv6: macvlan_1->macvlan_2 [ OK ] TEST: active-backup: IPv4: server->client [ OK ] TEST: active-backup: IPv6: server->client [ OK ] TEST: active-backup: IPv4: macvlan_1->client [ OK ] TEST: active-backup: IPv6: macvlan_1->client [ OK ] TEST: active-backup: IPv4: macvlan_2->client [ OK ] TEST: active-backup: IPv6: macvlan_2->client [ OK ] TEST: active-backup: IPv4: macvlan_2->macvlan_2 [ OK ] TEST: active-backup: IPv6: macvlan_2->macvlan_2 [ OK ] [...] TEST: balance-alb: IPv4: client->server [ OK ] TEST: balance-alb: IPv6: client->server [ OK ] TEST: balance-alb: IPv4: client->macvlan_1 [ OK ] TEST: balance-alb: IPv6: client->macvlan_1 [ OK ] TEST: balance-alb: IPv4: client->macvlan_2 [ OK ] TEST: balance-alb: IPv6: client->macvlan_2 [ OK ] TEST: balance-alb: IPv4: macvlan_1->macvlan_2 [ OK ] TEST: balance-alb: IPv6: macvlan_1->macvlan_2 [ OK ] TEST: balance-alb: IPv4: server->client [ OK ] TEST: balance-alb: IPv6: server->client [ OK ] TEST: balance-alb: IPv4: macvlan_1->client [ OK ] TEST: balance-alb: IPv6: macvlan_1->client [ OK ] TEST: balance-alb: IPv4: macvlan_2->client [ OK ] TEST: balance-alb: IPv6: macvlan_2->client [ OK ] TEST: balance-alb: IPv4: macvlan_2->macvlan_2 [ OK ] TEST: balance-alb: IPv6: macvlan_2->macvlan_2 [ OK ] Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-24selftest: bond: add new topo bond_topo_2d1c.shHangbin Liu
Add a new testing topo bond_topo_2d1c.sh which is used more commonly. Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the extra link. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-21selftests: bonding: do not set port down before adding to bondHangbin Liu
Before adding a port to bond, it need to be set down first. In the lacpdu test the author set the port down specifically. But commit a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") changed the operation order, the kernel will set the port down _after_ adding to bond. So all the ports will be down at last and the test failed. In fact, the veth interfaces are already inactive when added. This means there's no need to set them down again before adding to the bond. Let's just remove the link down operation. Fixes: a4abfa627c38 ("net: rtnetlink: Enslave device before bringing it up") Reported-by: Zhengchao Shao <shaozhengchao@huawei.com> Closes: https://lore.kernel.org/netdev/a0ef07c7-91b0-94bd-240d-944a330fcabd@huawei.com/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20230817082459.1685972-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18selftests: mlxsw: Fix test failure on Spectrum-4Ido Schimmel
Remove assumptions about shared buffer cell size and instead query the cell size from devlink. Adjust the test to send small packets that fit inside a single cell. Tested on Spectrum-{1,2,3,4}. Fixes: 4735402173e6 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.1692268427.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16selftests: bonding: remove redundant delete action of device link1_1Zhengchao Shao
When run command "ip netns delete client", device link1_1 has been deleted. So, it is no need to delete link1_1 again. Remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-09selftests: mlxsw: router_bridge_lag: Add a new selftestPetr Machata
Add a selftest to verify enslavement to a LAG with upper after fresh devlink reload. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/373a7754daa4dac32759a45095f47b08a2a869c8.1691498735.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02selftests: mlxsw: rif_bridge: Add a new selftestPetr Machata
This test verifies driver behavior with regards to creation of RIFs for a bridge as LAGs are added or removed to/from it, and ports added or removed to/from the LAG. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02selftests: mlxsw: rif_lag_vlan: Add a new selftestPetr Machata
This test verifies driver behavior with regards to creation of RIFs for LAG VLAN uppers as ports are added or removed to/from the LAG. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-02selftests: mlxsw: rif_lag: Add a new selftestPetr Machata
This test verifies driver behavior with regards to creation of RIFs for a LAG as ports are added or removed to/from it. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21selftests: mlxsw: rtnetlink: Drop obsolete testsPetr Machata
Support for enslaving ports to LAGs with uppers will be added in the following patches. Selftests to make sure it actually does the right thing are ready and will be sent as a follow-up. Similarly, ordering of MACVLAN creation and RIF creation will be relaxed and it will be permitted to create a MACVLAN first. Thus these two tests are obsolete. Drop them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-12selftests: mlxsw: Test port range registers' occupancyIdo Schimmel
Test that filters that match on the same port range, but with different combination of IPv4/IPv6 and TCP/UDP all use the same port range register by observing port range registers' occupancy via devlink-resource. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/0a2eb63b234fb062ff011e80231868cc80000c81.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12selftests: mlxsw: Add scale test for port rangesIdo Schimmel
Query the maximum number of supported port range registers using devlink-resource and test that this number can be reached by configuring tc filters with different port ranges. Test that an error is returned in case the maximum number is exceeded. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/48eee181270d9f291e09d1858c7b26a3f7fcc164.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: one_armed_router: Use port MAC for bridge addressPetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The bridge eventually inherits MAC address from its first member, after the enslavement is acked. A number of (mainly VXLAN) selftests already work around the problem by setting the MAC address to whatever it will eventually be anyway. Do the same for this selftest. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: vxlan: Disable IPv6 autogen on bridgesPetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for all bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks various aspects of VXLAN offloading and the bridges do not need to participate in routing traffic. The IP addresses or the RIFs are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: spectrum: q_in_vni_veto: Disable IPv6 autogen on a bridgePetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks vetoing of a different aspect of the configuration and the bridge does not need to participate in routing traffic. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: qos_mc_aware: Disable IPv6 autogen on bridgesPetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for both bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks traffic prioritization and scheduling, and the bridges serve for their L2 forwarding capabilities, and do not need to participate in routing traffic. The IP addresses or the RIFs are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: qos_ets_strict: Disable IPv6 autogen on bridgesPetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for both bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks traffic prioritization and scheduling, and the bridges serve for their L2 forwarding capabilities, and do not need to participate in routing traffic. The IP addresses or the RIFs are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: qos_dscp_bridge: Disable IPv6 autogen on a bridgePetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks DCB DSCP-based prioritization, and the bridge serves for its L2 forwarding capabilities, and does not need to participate in routing traffic. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: mirror_gre_scale: Disable IPv6 autogen on a bridgePetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge, the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks how many mirroring sessions a machine is capable of offloading. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridge in this selftest, thus exempting it from mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: extack: Disable IPv6 autogen on bridgesPetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. At the time that the front panel port is enslaved to the bridge (this holds for all bridges used here), the bridge MAC address does not have the same prefix as other interfaces in the system. On Nvidia Spectrum-1 machines all the RIFs have to have the same 38-bit MAC address prefix. Since the bridge does not obey this limitation, the RIF cannot be created, and the enslavement attempt is vetoed on the grounds of the configuration not being offloadable. The selftest itself however checks whether a different vetoed aspect of the configuration provides an extack. The IP address or the RIF are irrelevant. Fix by disabling automatic IPv6 address generation for the HW-offloaded bridges in this selftest, thus exempting them from mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21selftests: mlxsw: q_in_q_veto: Disable IPv6 autogen on bridgesPetr Machata
In a future patch, mlxsw will start adding RIFs to uppers of front panel port netdevices, if they have an IP address. The swp enslavement to the 802.1ad bridge is not allowed, because RIFs are not allowed to be created for 802.1ad bridges, but the address indicates one needs to be created. Thus the veto selftests fail already during the port enslavement. Then the attempt to create a VLAN on top of the same bridge is not vetoed, because the bridge is not related to mlxsw, and the selftest fails. Fix by disabling automatic IPv6 address generation for the bridges in this selftest, thus exempting them from the mlxsw router attention. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05selftests: mlxsw: egress_vid_classification: Fix the diagramPetr Machata
The topology diagram implies that $swp1 and $swp2 are members of the bridge br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the diagram. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-05selftests: mlxsw: ingress_rif_conf_1d: Fix the diagramPetr Machata
The topology diagram implies that $swp1 and $swp2 are members of the bridge br0, when in fact only their uppers, $swp1.10 and $swp2.10 are. Adjust the diagram. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes. No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11Merge tag 'net-6.4-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - mtk_eth_soc: fix NULL pointer dereference Previous releases - regressions: - core: - skb_partial_csum_set() fix against transport header magic value - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs(). - annotate sk->sk_err write from do_recvmmsg() - add vlan_get_protocol_and_depth() helper - netlink: annotate accesses to nlk->cb_running - netfilter: always release netdev hooks from notifier Previous releases - always broken: - core: deal with most data-races in sk_wait_event() - netfilter: fix possible bug_on with enable_hooks=1 - eth: bonding: fix send_peer_notif overflow - eth: xpcs: fix incorrect number of interfaces - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register" * tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits) af_unix: Fix data races around sk->sk_shutdown. af_unix: Fix a data race of sk->sk_receive_queue->qlen. net: datagram: fix data-races in datagram_poll() net: mscc: ocelot: fix stat counter register values ipvlan:Fix out-of-bounds caused by unclear skb->cb docs: networking: fix x25-iface.rst heading & index order gve: Remove the code of clearing PBA bit tcp: add annotations around sk->sk_shutdown accesses net: add vlan_get_protocol_and_depth() helper net: pcs: xpcs: fix incorrect number of interfaces net: deal with most data-races in sk_wait_event() net: annotate sk->sk_err write from do_recvmmsg() netlink: annotate accesses to nlk->cb_running kselftest: bonding: add num_grat_arp test selftests: forwarding: lib: add netns support for tc rule handle stats get Documentation: bonding: fix the doc of peer_notif_delay bonding: fix send_peer_notif overflow net: ethernet: mtk_eth_soc: fix NULL pointer dereference selftests: nft_flowtable.sh: check ingress/egress chain too selftests: nft_flowtable.sh: monitor result file sizes ...
2023-05-10selftests: bonding: delete unnecessary lineLiang Li
"ip link set dev "$devbond1" nomaster" This line code in bond-eth-type-change.sh is unnecessary. Because $devbond1 was not added to any master device. Signed-off-by: Liang Li <liali@redhat.com> Acked-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-10kselftest: bonding: add num_grat_arp testHangbin Liu
TEST: num_grat_arp (active-backup miimon num_grat_arp 10) [ OK ] TEST: num_grat_arp (active-backup miimon num_grat_arp 20) [ OK ] TEST: num_grat_arp (active-backup miimon num_grat_arp 30) [ OK ] TEST: num_grat_arp (active-backup miimon num_grat_arp 50) [ OK ] Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-07Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "Third version of perf tool updates, with the build problems with with using a 'vmlinux.h' generated from the main build fixed, and the bpf skeleton build disabled by default. Build: - Require libtraceevent to build, one can disable it using NO_LIBTRACEEVENT=1. It is required for tools like 'perf sched', 'perf kvm', 'perf trace', etc. libtraceevent is available in most distros so installing 'libtraceevent-devel' should be a one-time event to continue building perf as usual. Using NO_LIBTRACEEVENT=1 produces tooling that is functional and sufficient for lots of users not interested in those libtraceevent dependent features. - Allow Python support in 'perf script' when libtraceevent isn't linked, as not all features requires it, for instance Intel PT does not use tracepoints. - Error if the python interpreter needed for jevents to work isn't available and NO_JEVENTS=1 isn't set, preventing a build without support for JSON vendor events, which is a rare but possible condition. The two check error messages: $(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.) $(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.) - Make libbpf 1.0 the minimum required when building with out of tree, distro provided libbpf. - Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++ demangler, add 'perf test' entry for it. - Make binutils libraries opt in, as distros disable building with it due to licensing, they were used for C++ demangling, for instance. - Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or equivalent) isn't installed, we'll just have a build warning: Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev - Add a feature test for scandirat(), that is not implemented so far in musl and uclibc, disabling features that need it, such as scanning for tracepoints in /sys/kernel/tracing/events. perf BPF filters: - New feature where BPF can be used to filter samples, for instance: $ sudo ./perf record -e cycles --filter 'period > 1000' true $ sudo ./perf script perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms]) perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms]) perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms]) perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms]) perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms]) true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms]) true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) - In addition to 'period' (PERF_SAMPLE_PERIOD), the other PERF_SAMPLE_ can be used for filtering, and also some other sample accessible values, from tools/perf/Documentation/perf-record.txt: Essentially the BPF filter expression is: <term> <operator> <value> (("," | "||") <term> <operator> <value>)* The <term> can be one of: ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr, code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat, p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock, mem_dtlb, mem_blk, mem_hops The <operator> can be one of: ==, !=, >, >=, <, <=, & The <value> can be one of: <number> (for any term) na, load, store, pfetch, exec (for mem_op) l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl) na, none, hit, miss, hitm, fwd, peer (for mem_snoop) remote (for mem_remote) na, locked (for mem_locked) na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb) na, by_data, by_addr (for mem_blk) hops0, hops1, hops2, hops3 (for mem_hops) perf lock contention: - Show lock type with address. - Track and show mmap_lock, siglock and per-cpu rq_lock with address. This is done for mmap_lock by following the current->mm pointer: $ sudo ./perf lock con -abl -- sleep 10 contended total wait max wait avg wait address symbol ... 16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640 17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0 3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock 3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58 1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70 9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock 14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0 3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock 16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560 11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock 1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8 1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock 581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058 5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070 112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120 381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock 255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80 - Update default map size to 16384. - Allocate single letter option -M for --map-nr-entries, as it is proving being frequently used. - Fix struct rq lock access for older kernels with BPF's CO-RE (Compile once, run everywhere). - Fix problems found with MSAn. perf report/top: - Add inline information when using --call-graph=fp or lbr, as was already done to the --call-graph=dwarf callchain mode. - Improve the 'srcfile' sort key performance by really using an optimization introduced in 6.2 for the 'srcline' sort key that avoids calling addr2line for comparision with each sample. perf sched: - Make 'perf sched latency/map/replay' to use "sched:sched_waking" instead of "sched:sched_waking", consistent with 'perf record' since d566a9c2d482 ("perf sched: Prefer sched_waking event when it exists"). perf ftrace: - Make system wide the default target for latency subcommand, run the following command then generate some network traffic and press control+C: # perf ftrace latency -T __kfree_skb ^C DURATION | COUNT | GRAPH | 0 - 1 us | 27 | ############# | 1 - 2 us | 22 | ########### | 2 - 4 us | 8 | #### | 4 - 8 us | 5 | ## | 8 - 16 us | 24 | ############ | 16 - 32 us | 2 | # | 32 - 64 us | 1 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | # perf top: - Add --branch-history (LBR: Last Branch Record) option, just like already available for 'perf record'. - Fix segfault in thread__comm_len() where thread->comm was being used outside thread->comm_lock. perf annotate: - Allow configuring objdump and addr2line in ~/.perfconfig., so that you can use alternative binaries, such as llvm's. perf kvm: - Add TUI mode for 'perf kvm stat report'. Reference counting: - Add reference count checking infrastructure to check for use after free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs, more to come. To build with it use -DREFCNT_CHECKING=1 in the make command line to build tools/perf. Documented at: https://perf.wiki.kernel.org/index.php/Reference_Count_Checking - The above caught, for instance, fix, present in this series: - Fix maps use after put in 'perf test "Share thread maps"': 'maps' is copied from leader, but the leader is put on line 79 and then 'maps' is used to read the reference count below - so a use after put, with the put of maps happening within thread__put. Fixed by reversing the order of puts so that the leader is put last. - Also several fixes were made to places where reference counts were not being held. - Make this one of the tests in 'make -C tools/perf build-test' to regularly build test it and to make sure no direct access to the reference counted structs are made, doing that via accessors to check the validity of the struct pointer. ARM64: - Fix 'perf report' segfault when filtering coresight traces by sparse lists of CPUs. - Add support for 'simd' as a sort field for 'perf report', to show ARM's NEON SIMD's predicate flags: "partial" and "empty". arm64 vendor events: - Add N1 metrics. Intel vendor events: - Add graniterapids, grandridge and sierraforrest events. - Refresh events for: alderlake, aldernaken, broadwell, broadwellde, broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex, jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids, silvermont, skylake, tigerlake and westmereep-dp - Refresh metrics for alderlake-n, broadwell, broadwellde, broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and skylakex. perf stat: - Implement --topdown using JSON metrics. - Add TopdownL1 JSON metric as a default if present, but disable it for now for some Intel hybrid architectures, a series of patches addressing this is being reviewed and will be submitted for v6.5. - Use metrics for --smi-cost. - Update topdown documentation. Vendor events (JSON) infrastructure: - Add support for computing and printing metric threshold values. For instance, here is one found in thesapphirerapids json file: { "BriefDescription": "Percentage of cycles spent in System Management Interrupts.", "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)", "MetricGroup": "smi", "MetricName": "smi_cycles", "MetricThreshold": "smi_cycles > 0.1", "ScaleUnit": "100%" }, - Test parsing metric thresholds with the fake PMU in 'perf test pmu-events'. - Support for printing metric thresholds in 'perf list'. - Add --metric-no-threshold option to 'perf stat'. - Add rand (reverse and) and has_pmem (optane memory) support to metrics. - Sort list of input files to avoid depending on the order from readdir() helping in obtaining reproducible builds. S/390: - Add common metrics: - CPI (cycles per instruction), prbstate (ratio of instructions executed in problem state compared to total number of instructions), l1mp (Level one instruction and data cache misses per 100 instructions). - Add cache metrics for z13, z14, z15 and z16. - Add metric for TLB and cache. ARM: - Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE (Memory Tagging Extension) and MOPS (Memory Operations) load/store. Intel PT hardware tracing: - Add event type names UINTR (User interrupt delivered) and UIRET (Exiting from user interrupt routine), documented in table 32-50 "CFE Packet Type and Vector Fields Details" in the Intel Processor Trace chapter of The Intel SDM Volume 3 version 078. - Add support for new branch instructions ERETS and ERETU. - Fix CYC timestamps after standalone CBR ARM CoreSight hardware tracing: - Allow user to override timestamp and contextid settings. - Fix segfault in dso lookup. - Fix timeless decode mode detection. - Add separate decode paths for timeless and per-thread modes. auxtrace: - Fix address filter entire kernel size. Miscellaneous: - Fix use-after-free and unaligned bugs in the PLT handling routines. - Use zfree() to reduce chances of use after free. - Add missing 0x prefix for addresses printed in hexadecimal in 'perf probe'. - Suppress massive unsupported target platform errors in the unwind code. - Fix return incorrect build_id size in elf_read_build_id(). - Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 . - Add missing new parameter in kfree_skb tracepoint to the python scripts using it. - Add 'perf bench syscall fork' benchmark. - Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in 'perf mem'. - Fix wrong size expectation for perf test 'Setup struct perf_event_attr' caused by the patch adding perf_event_attr::config3. - Fix some spelling mistakes" * tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits) Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL" Revert "perf build: Warn for BPF skeletons if endian mismatches" perf metrics: Fix SEGV with --for-each-cgroup perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE perf stat: Separate bperf from bpf_profiler perf test record+probe_libc_inet_pton: Fix call chain match on x86_64 perf test record+probe_libc_inet_pton: Fix call chain match on s390 perf tracepoint: Fix memory leak in is_valid_tracepoint() perf cs-etm: Add fix for coresight trace for any range of CPUs perf build: Fix unescaped # in perf build-test perf unwind: Suppress massive unsupported target platform errors perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it perf script: Print raw ip instead of binary offset for callchain perf symbols: Fix return incorrect build_id size in elf_read_build_id() perf list: Modify the warning message about scandirat(3) perf list: Fix memory leaks in print_tracepoint_events() perf lock contention: Rework offset calculation with BPF CO-RE perf lock contention: Fix struct rq lock access perf stat: Disable TopdownL1 on hybrid perf stat: Avoid SEGV on counter->name ...