linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-02-09	perf sched: Fix memory leak in perf_sched__map()	Yang Jihong
	perf_sched__map() needs to free memory of map_cpus, color_pids and color_cpus in normal path and rollback allocated memory in error path. Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-3-yangjihong1@huawei.com
2024-02-09	perf sched: Move start_work_mutex and work_done_wait_mutex initialization to ↵	Yang Jihong
	perf_sched__replay() The start_work_mutex and work_done_wait_mutex are used only for the 'perf sched replay'. Put their initialization in perf_sched__replay () to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.197 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ] # perf sched replay run measurement overhead: 108 nsecs sleep measurement overhead: 65658 nsecs the run test took 999991 nsecs the sleep test took 1079324 nsecs nr_run_events: 42378 nr_sleep_events: 43102 nr_wakeup_events: 31852 target-less wakeups: 17 multi-target wakeups: 712 task 0 ( swapper: 0), nr_events: 10451 task 1 ( swapper: 1), nr_events: 3 task 2 ( swapper: 2), nr_events: 1 <SNIP> task 717 ( sched-messaging: 74483), nr_events: 152 task 718 ( sched-messaging: 74484), nr_events: 1944 task 719 ( sched-messaging: 74485), nr_events: 73 task 720 ( sched-messaging: 74486), nr_events: 163 task 721 ( sched-messaging: 74487), nr_events: 942 task 722 ( sched-messaging: 74488), nr_events: 78 task 723 ( sched-messaging: 74489), nr_events: 1090 ------------------------------------------------------------ #1 : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70 #2 : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82 #3 : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96 #4 : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64 #5 : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11 #6 : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88 #7 : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01 #8 : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40 #9 : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15 #10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69 # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-2-yangjihong1@huawei.com
2024-02-09	selftests: udpgso: Pull up network setup into shell script	Jakub Sitnicki
	udpgso regression test configures routing and device MTU directly through uAPI (Netlink, ioctl) to do its job. While there is nothing wrong with it, it takes more effort than doing it from shell. Looking forward, we would like to extend the udpgso regression tests to cover the EIO corner case [1], once it gets addressed. That will require a dummy device and device feature manipulation to set it up. Which means more Netlink code. So, in preparation, pull out network configuration into the shell script part of the test, so it is easily extendable in the future. Also, because it now easy to setup routing, add a second local IPv6 address. Because the second address is not managed by the kernel, we can "replace" the corresponding local route with a reduced-MTU one. This unblocks the disabled "ipv6 connected" test case. Add a similar setup for IPv4 for symmetry. [1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/ Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20240207-jakub-krn-635-v3-1-3dfa3da8a7d3@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: openvswitch: Add validation for the recursion test	Aaron Conole
	Add a test case into the netlink checks that will show the number of nested action recursions won't exceed 16. Going to 17 on a small clone call isn't enough to exhaust the stack on (most) systems, so it should be safe to run even on systems that don't have the fix applied. Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240207132416.1488485-3-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: net: include forwarding lib	Paolo Abeni
	The altnames test uses the forwarding/lib.sh and that dependency currently causes failures when running the test after install: make -C tools/testing/selftests/ TARGETS=net install ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:altnames.sh # ... # ./altnames.sh: line 8: ./forwarding/lib.sh: No such file or directory # RTNETLINK answers: Operation not permitted # ./altnames.sh: line 73: tests_run: command not found # ./altnames.sh: line 65: pre_cleanup: command not found Address the issue leveraging the TEST_INCLUDES infrastructure provided by commit 2a0683be5b4c ("selftests: Introduce Makefile variable to list shared bash scripts") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/f7b1e9d468224cbc136d304362315499fe39848f.1707298927.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: tc-testing: add mirred to block tdc tests	Victor Nogueira
	Add 8 new mirred tdc tests that target mirred to block: - Add mirred mirror to egress block action - Add mirred mirror to ingress block action - Add mirred redirect to egress block action - Add mirred redirect to ingress block action - Try to add mirred action with both dev and block - Try to add mirred action without specifying neither dev nor block - Replace mirred redirect to dev action with redirect to block - Replace mirred redirect to block action with mirror to dev Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240202020726.529170-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Fix bridge locked port test flakiness	Ido Schimmel
	The redirection test case fails in the netdev CI on debug kernels because an FDB entry is learned despite the presence of a tc filter that redirects incoming traffic [1]. I am unable to reproduce the failure locally, but I can see how it can happen given that learning is first enabled and only then the ingress tc filter is configured. On debug kernels the time window between these two operations is longer compared to regular kernels, allowing random packets to be transmitted and trigger learning. Fix by reversing the order and configure the ingress tc filter before enabling learning. [1] [...] # TEST: Locked port MAB redirect [FAIL] # Locked entry created for redirected traffic Fixes: 38c43a1ce758 ("selftests: forwarding: Add test case for traffic redirection from a locked port") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Suppress grep warnings	Ido Schimmel
	Suppress the following grep warnings: [...] INFO: # Port group entries configuration tests - (, G) TEST: Common port group entries configuration tests (IPv4 (, G)) [ OK ] TEST: Common port group entries configuration tests (IPv6 (, G)) [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv4 (, G) port group entries configuration tests [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv6 (*, G) port group entries configuration tests [ OK ] [...] They do not fail the test, but do clutter the output. Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Fix bridge MDB test flakiness	Ido Schimmel
	After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: IPv4 host entries forwarding tests [FAIL] # Packet locally received after flood Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Fix layer 2 miss test flakiness	Ido Schimmel
	After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: L2 miss - Multicast (IPv4) [FAIL] # Unregistered multicast filter was hit after adding MDB entry Fixes: 8c33266ae26a ("selftests: forwarding: Add layer 2 miss test cases") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: net: Fix bridge backup port test flakiness	Ido Schimmel
	The test toggles the carrier of a bridge port in order to test the bridge backup port feature. Due to the linkwatch delayed work the carrier change is not always reflected fast enough to the bridge driver and packets are not forwarded as the test expects, resulting in failures [1]. Fix by busy waiting on the bridge port state until it changes to the desired state following the carrier change. [1] # Backup port # ----------- [...] # TEST: swp1 carrier off [ OK ] # TEST: No forwarding out of swp1 [FAIL] [ 641.995910] br0: port 1(swp1) entered disabled state # TEST: No forwarding out of vx0 [ OK ] Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208123110.1063930-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Add missing multicast routing config entries	Ido Schimmel
	The two tests that make use of multicast routig (router.sh and router_multicast.sh) are currently failing in the netdev CI because the kernel is missing multicast routing support. Fix by adding the required config entries. Fixes: 6d4efada3b82 ("selftests: forwarding: Add multicast routing test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240208165538.1303021-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08	selftests: net: add more missing kernel config	Paolo Abeni
	The reuseport_addr_any.sh is currently skipping DCCP tests and pmtu.sh is skipping all the FOU/GUE related cases: add the missing options. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/38d3ca7f909736c1aef56e6244d67c82a9bba6ff.1707326987.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08	perf test: Skip metric w/o event name on arm64 in stat STD output linter	Yicong Yang
	stat+std_output.sh test fails on my arm64 machine: [root@localhost shell]# ./stat+std_output.sh Checking STD output: no args Unknown event name in TopDownL1 # 0.18 retiring [root@localhost shell]# ./stat+std_output.sh Checking STD output: no args [Success] Checking STD output: system wide [Success] Checking STD output: interval [Success] Checking STD output: per thread Unknown event name in tmux: server-1114960 # 0.41 frontend_bound When no args specified `perf stat` will add TopdownL1 metric group and the output will be like: [root@localhost shell]# perf stat -- stress-ng --vm 1 --timeout 1 stress-ng: info: [3351733] setting to a 1 second run per stressor stress-ng: info: [3351733] dispatching hogs: 1 vm stress-ng: info: [3351733] successful run completed in 1.02s Performance counter stats for 'stress-ng --vm 1 --timeout 1': 1,037.71 msec task-clock # 1.000 CPUs utilized 13 context-switches # 12.528 /sec 1 cpu-migrations # 0.964 /sec 67,544 page-faults # 65.090 K/sec 2,691,932,561 cycles # 2.594 GHz (74.56%) 6,571,333,653 instructions # 2.44 insn per cycle (74.92%) 521,863,142 branches # 502.901 M/sec (75.21%) 425,879 branch-misses # 0.08% of all branches (87.57%) TopDownL1 # 0.61 retiring (87.67%) # 0.03 frontend_bound (87.67%) # 0.02 bad_speculation (87.67%) # 0.34 backend_bound (74.61%) 1.038138390 seconds time elapsed 0.844849000 seconds user 0.189053000 seconds sys Metrics in group TopDownL1 don't have event name on arm64 but are not listed in the $skip_metric list which they should be listed. Add them to the skip list as what does for x86 platforms in [1]. [1] commit 4d60e83dfcee ("perf test: Skip metrics w/o event name in stat STD output linter") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: linuxarm@huawei.com Cc: kan.liang@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240207091222.54096-1-yangyicong@huawei.com
2024-02-08	perf symbols: Slightly improve module file executable section mappings	Adrian Hunter
	Currently perf does not record module section addresses except for the .text section. In general that means perf cannot get module section mappings correct (except for .text) when loading symbols from a kernel module file. (Note using --kcore does not have this issue) Improve that situation slightly by identifying executable sections that use the same mapping as the .text section. That happens when an executable section comes directly after the .text section, both in memory and on file, something that can be determined by following the same layout rules used by the kernel, refer kernel layout_sections(). Note whether that happens is somewhat arbitrary, so this is not a final solution. Example from tracing a virtual machine process: Before: $ perf script \| grep unknown CPU 0/KVM 1718 203.511270: 318341 cpu-cycles:P: ffffffffc13e8a70 [unknown] (/lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko) $ perf script -vvv 2>&1 >/dev/null \| grep kvm.intel \| grep 'noinstr.text\\|ffff' Map: 0-7e0 41430 [kvm_intel].noinstr.text Map: ffffffffc13a7000-ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko After: $ perf script \| grep 203.511270 CPU 0/KVM 1718 203.511270: 318341 cpu-cycles:P: ffffffffc13e8a70 vmx_vmexit+0x0 (/lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko) $ perf script -vvv 2>&1 >/dev/null \| grep kvm.intel \| grep 'noinstr.text\\|ffff' Map: ffffffffc13a7000-ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko Reported-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240208085326.13432-3-adrian.hunter@intel.com
2024-02-08	perf tools: Make it possible to see perf's kernel and module memory mappings	Adrian Hunter
	Dump kmaps if using 'perf --debug kmaps' or verbose > 2 (e.g. -vvv) for tools 'perf script' and 'perf report' if there is no browser. Example: $ perf --debug kmaps script 2>&1 >/dev/null \| grep kvm.intel build id event received for /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko: 0691d75e10e72ebbbd45a44c59f6d00a5604badf [20] Map: 0-3a3 4f5d8 [kvm_intel].modinfo Map: 0-5240 5f280 [kvm_intel]__versions Map: 0-30 64 [kvm_intel].note.Linux Map: 0-14 644c0 [kvm_intel].orc_header Map: 0-5297 43680 [kvm_intel].rodata Map: 0-5bee 3b837 [kvm_intel].text.unlikely Map: 0-7e0 41430 [kvm_intel].noinstr.text Map: 0-2080 713c0 [kvm_intel].bss Map: 0-26 705c8 [kvm_intel].data..read_mostly Map: 0-5888 6a4c0 [kvm_intel].data Map: 0-22 70220 [kvm_intel].data.once Map: 0-40 705f0 [kvm_intel].data..percpu Map: 0-1685 41d20 [kvm_intel].init.text Map: 0-4b8 6fd60 [kvm_intel].init.data Map: 0-380 70248 [kvm_intel]__dyndbg Map: 0-8 70218 [kvm_intel].exit.data Map: 0-438 4f980 [kvm_intel]__param Map: 0-5f5 4ca0f [kvm_intel].rodata.str1.1 Map: 0-3657 493b8 [kvm_intel].rodata.str1.8 Map: 0-e0 70640 [kvm_intel].data..ro_after_init Map: 0-500 70ec0 [kvm_intel].gnu.linkonce.this_module Map: ffffffffc13a7000-ffffffffc1421000 a0 /lib/modules/6.7.2-local/kernel/arch/x86/kvm/kvm-intel.ko The example above shows how the module section mappings are all wrong except for the main .text mapping at 0xffffffffc13a7000. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240208085326.13432-2-adrian.hunter@intel.com
2024-02-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/stmicro/stmmac/common.h 38cc3c6dcc09 ("net: stmmac: protect updates of 64-bit statistics counters") fd5a6a71313e ("net: stmmac: est: Per Tx-queue error count for HLBF") c5c3e1bfc9e0 ("net: stmmac: Offload queueMaxSDU from tc-taprio") drivers/net/wireless/microchip/wilc1000/netdev.c c9013880284d ("wifi: fill in MODULE_DESCRIPTION()s for wilc1000") 328efda22af8 ("wifi: wilc1000: do not realloc workqueue everytime an interface is added") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08	Merge tag 'net-6.8-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and netfilter. Current release - regressions: - nic: intel: fix old compiler regressions - netfilter: ipset: missing gc cancellations fixed Current release - new code bugs: - netfilter: ctnetlink: fix filtering for zone 0 Previous releases - regressions: - core: fix from address in memcpy_to_iter_csum() - netfilter: nfnetlink_queue: un-break NF_REPEAT - af_unix: fix memory leak for dead unix_(sk)->oob_skb in GC. - devlink: avoid potential loop in devlink_rel_nested_in_notify_work() - iwlwifi: - mvm: fix a battery life regression - fix double-free bug - mac80211: fix waiting for beacons logic - nic: nfp: flower: prevent re-adding mac index for bonded port Previous releases - always broken: - rxrpc: fix generation of serial numbers to skip zero - tipc: check the bearer type before calling tipc_udp_nl_bearer_add() - tunnels: fix out of bounds access when building IPv6 PMTU error - nic: hv_netvsc: register VF in netvsc_probe if NET_DEVICE_REGISTER missed - nic: atlantic: fix DMA mapping for PTP hwts ring Misc: - selftests: more fixes to deal with very slow hosts" * tag 'net-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits) netfilter: nft_set_pipapo: remove scratch_aligned pointer netfilter: nft_set_pipapo: add helper to release pcpu scratch area netfilter: nft_set_pipapo: store index in scratch maps netfilter: nft_set_rbtree: skip end interval element from gc netfilter: nfnetlink_queue: un-break NF_REPEAT netfilter: nf_tables: use timestamp to check for set element timeout netfilter: nft_ct: reject direction for ct id netfilter: ctnetlink: fix filtering for zone 0 s390/qeth: Fix potential loss of L3-IP@ in case of network issues netfilter: ipset: Missing gc cancellations fixed octeontx2-af: Initialize maps. net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio netfilter: nft_set_pipapo: remove static in nft_pipapo_get() netfilter: nft_compat: restrict match/target protocol to u16 netfilter: nft_compat: reject unused compat flag netfilter: nft_compat: narrow down revision to unsigned 8-bits net: intel: fix old compiler regressions MAINTAINERS: Maintainer change for rds selftests: cmsg_ipv6: repeat the exact packet ...
2024-02-08	x86: replace CONFIG_HAVE_KVM with IS_ENABLED(CONFIG_KVM)	Paolo Bonzini
	It is more accurate to check if KVM is enabled, instead of having the architecture say so. Architectures always "have" KVM, so for example checking CONFIG_HAVE_KVM in x86 code is pointless, but if KVM is disabled in a specific build, there is no need for support code. Alternatively, many of the #ifdefs could simply be deleted. However, this would add completely dead code. For example, when KVM is disabled, there should not be any posted interrupts, i.e. NOT wiring up the "dummy" handlers and treating IRQs on those vectors as spurious is the right thing to do. Cc: x86@kernel.org Cc: kbingham@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08	Merge tag 'nf-24-02-08' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Narrow down target/match revision to u8 in nft_compat. 2) Bail out with unused flags in nft_compat. 3) Restrict layer 4 protocol to u16 in nft_compat. 4) Remove static in pipapo get command that slipped through when reducing set memory footprint. 5) Follow up incremental fix for the ipset performance regression, this includes the missing gc cancellation, from Jozsef Kadlecsik. 6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0 as no filtering, from Felix Huettner. 7) Reject direction for NFT_CT_ID. 8) Use timestamp to check for set element expiration while transaction is handled to prevent garbage collection from removing set elements that were just added by this transaction. Packet path and netlink dump/get path still use current time to check for expiration. 9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal. 10) map_index needs to be percpu and per-set, not just percpu. At this time its possible for a pipapo set to fill the all-zero part with ones and take the 'might have bits set' as 'start-from-zero' area. From Florian Westphal. This includes three patches: - Change scratchpad area to a structure that provides space for a per-set-and-cpu toggle and uses it of the percpu one. - Add a new free helper to prepare for the next patch. - Remove the scratch_aligned pointer and makes AVX2 implementation use the exact same memory addresses for read/store of the matching state. netfilter pull request 24-02-08 * tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_set_pipapo: remove scratch_aligned pointer netfilter: nft_set_pipapo: add helper to release pcpu scratch area netfilter: nft_set_pipapo: store index in scratch maps netfilter: nft_set_rbtree: skip end interval element from gc netfilter: nfnetlink_queue: un-break NF_REPEAT netfilter: nf_tables: use timestamp to check for set element timeout netfilter: nft_ct: reject direction for ct id netfilter: ctnetlink: fix filtering for zone 0 netfilter: ipset: Missing gc cancellations fixed netfilter: nft_set_pipapo: remove static in nft_pipapo_get() netfilter: nft_compat: restrict match/target protocol to u16 netfilter: nft_compat: reject unused compat flag netfilter: nft_compat: narrow down revision to unsigned 8-bits ==================== Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08	netfilter: ctnetlink: fix filtering for zone 0	Felix Huettner
	previously filtering for the default zone would actually skip the zone filter and flush all zones. Fixes: eff3c558bb7e ("netfilter: ctnetlink: support filtering by zone") Reported-by: Ilya Maximets <i.maximets@ovn.org> Closes: https://lore.kernel.org/netdev/2032238f-31ac-4106-8f22-522e76df5a12@ovn.org/ Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-07	selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros	Muhammad Usama Anjum
	Correct header file is needed for getting CLOSE_RANGE_* macros. Previously it was tested with newer glibc which didn't show the need to include the header which was a mistake. Link: https://lkml.kernel.org/r/20231024155137.219700-1-usama.anjum@collabora.com Fixes: ec54424923cf ("selftests: core: remove duplicate defines") Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com> Link: https://lore.kernel.org/all/7161219e-0223-d699-d6f3-81abd9abf13b@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07	selftests: bonding: use slowwait instead of hard code sleep	Hangbin Liu
	Use slowwait instead of hard code sleep for bonding tests. In function setup_prepare(), the client_create() will be called after server_create(). So I think there is no need to sleep in server_create() and remove it. For lab_lib.sh, remove bonding module may affect other running bonding tests. And some test env may buildin bond which can't be removed. The bonding link should be removed by lag_reset_network() or netns delete. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240205130048.282087-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07	selftests: bonding: reduce garp_test/arp_validate test time	Hangbin Liu
	The purpose of grat_arp is testing commit 9949e2efb54e ("bonding: fix send_peer_notif overflow"). As the send_peer_notif was defined to u8, to overflow it, we need to send_peer_notif = num_peer_notif * peer_notif_delay = num_grat_arp * peer_notify_delay / miimon > 255 (kernel) (kernel parameter) (user parameter) e.g. 30 (num_grat_arp) * 1000 (peer_notify_delay) / 100 (miimon) > 255. Which need 30s to complete sending garp messages. To save the testing time, the only way is reduce the miimon number. Something like 30 (num_grat_arp) * 100 (peer_notify_delay) / 10 (miimon) > 255. To save more time, the 50 num_grat_arp testing could be removed. The arp_validate_test also need to check the mii_status, which sleep too long. Use slowwait to save some time. For other connection checkings, make sure active slave changed first. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240205130048.282087-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07	selftests: bonding: use tc filter to check if LACP was sent	Hangbin Liu
	Use tc filter to check if LACP was sent, which is accurate and save more time. No need to remove bonding module as some test env may buildin bonding. And the bond link has been deleted. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240205130048.282087-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07	selftests/net/forwarding: add slowwait functions	Hangbin Liu
	Add slowwait functions to wait for some operations that may need a long time to finish. The busywait executes the cmd too fast, which is kind of wasting cpu in this scenario. At the same time, if shell debugging is enabled with `set -x`. the busywait will output too much logs. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240205130048.282087-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07	selftests/bpf: Mark cpumask kfunc declarations as __weak	Yafang Shao
	After the series "Annotate kfuncs in .BTF_ids section"[0], kfuncs can be generated from bpftool. Let's mark the existing cpumask kfunc declarations __weak so they don't conflict with definitions that will eventually come from vmlinux.h. [0]. https://lore.kernel.org/all/cover.1706491398.git.dxu@dxuuu.xyz Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/20240206081416.26242-5-laoar.shao@gmail.com
2024-02-07	selftests/bpf: Fix error checking for cpumask_success__load()	Yafang Shao
	We should verify the return value of cpumask_success__load(). Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240206081416.26242-4-laoar.shao@gmail.com
2024-02-07	tools/resolve_btfids: Fix cross-compilation to non-host endianness	Viktor Malik
	The .BTF_ids section is pre-filled with zeroed BTF ID entries during the build and afterwards patched by resolve_btfids with correct values. Since resolve_btfids always writes in host-native endianness, it relies on libelf to do the translation when the target ELF is cross-compiled to a different endianness (this was introduced in commit 61e8aeda9398 ("bpf: Fix libelf endian handling in resolv_btfids")). Unfortunately, the translation will corrupt the flags fields of SET8 entries because these were written during vmlinux compilation and are in the correct endianness already. This will lead to numerous selftests failures such as: $ sudo ./test_verifier 502 502 #502/p sleepable fentry accept FAIL Failed to load prog 'Invalid argument'! bpf_fentry_test1 is not sleepable verification time 34 usec stack depth 0 processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Summary: 0 PASSED, 0 SKIPPED, 1 FAILED Since it's not possible to instruct libelf to translate just certain values, let's manually bswap the flags (both global and entry flags) in resolve_btfids when needed, so that libelf then translates everything correctly. Fixes: ef2c6f370a63 ("tools/resolve_btfids: Add support for 8-byte BTF sets") Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com
2024-02-07	tools/resolve_btfids: Refactor set sorting with types from btf_ids.h	Viktor Malik
	Instead of using magic offsets to access BTF ID set data, leverage types from btf_ids.h (btf_id_set and btf_id_set8) which define the actual layout of the data. Thanks to this change, set sorting should also continue working if the layout changes. This requires to sync the definition of 'struct btf_id_set8' from include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync the rest of the file at the moment, b/c that would require to also sync multiple dependent headers and we don't need any other defs from btf_ids.h. Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
2024-02-07	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "x86 guest: - Avoid false positive for check that only matters on AMD processors x86: - Give a hint when Win2016 might fail to boot due to XSAVES && !XSAVEC configuration - Do not allow creating an in-kernel PIT unless an IOAPIC already exists RISC-V: - Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc, scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa) S390: - fix CC for successful PQAP instruction - fix a race when creating a shadow page" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM x86/kvm: Fix SEV check in sev_map_percpu_data() KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum KVM: x86: Check irqchip mode before create PIT KVM: riscv: selftests: Add Zfa extension to get-reg-list test RISC-V: KVM: Allow Zfa extension for Guest/VM KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test RISC-V: KVM: Allow Zihintntl extension for Guest/VM KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test RISC-V: KVM: Allow vector crypto extensions for Guest/VM KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test RISC-V: KVM: Allow scalar crypto extensions for Guest/VM KVM: riscv: selftests: Add Zbc extension to get-reg-list test RISC-V: KVM: Allow Zbc extension for Guest/VM KVM: s390: fix cc for successful PQAP KVM: s390: vsie: fix race during shadow creation
2024-02-07	perf record: Display data size on pipe mode	Namhyung Kim
	Currently pipe mode doesn't set the file size and it results in a misleading message of 0 data size at the end. Although it might miss some accounting for pipe header or more, just displaying the data size would reduce the possible confusion. Before: $ perf record -o- perf test -w noploop \| perf report -i- -q --percent-limit=1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] <====== (here) 99.58% perf perf [.] noploop After: $ perf record -o- perf test -w noploop \| perf report -i- -q --percent-limit=1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.229 MB - ] 99.46% perf perf [.] noploop Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240112231340.779469-1-namhyung@kernel.org
2024-02-07	perf script: Print source line for each jump in brstackinsn	Kan Liang
	With the srcline option, the perf script only prints a source line at the beginning of a sample with call/ret from functions, but not for each jump in brstackinsn. It's useful to print a source line for each jump in brstackinsn when the end user analyze the full assembler sequences of branch sequences for the sample. The srccode option can also be used to locate the source code line. However, it's printed almost for every line and makes the output less readable. $perf script -F +brstackinsn,+srcline --xed Before the patch, tchain_edit_deb 1463275 15228549.107820: 282495 instructions:u: 401133 f3+0xd (/home/kan/os.li> tchain_edit.c:22 f3+40: tchain_edit.c:20 000000000040114e jle 0x401133 # PRED 6 cycles [6] 0000000000401133 movl -0x4(%rbp), %eax 0000000000401136 and $0x1, %eax 0000000000401139 test %eax, %eax 000000000040113b jz 0x401143 000000000040113d addl $0x1, -0x4(%rbp) 0000000000401141 jmp 0x401147 # PRED 3 cycles [9] 2.00 IPC 0000000000401147 cmpl $0x3e7, -0x4(%rbp) 000000000040114e jle 0x401133 # PRED 6 cycles [15] 0.33 IPC After the patch, tchain_edit_deb 1463275 15228549.107820: 282495 instructions:u: 401133 f3+0xd (/home/kan/os.li> tchain_edit.c:22 f3+40: tchain_edit.c:20 000000000040114e jle 0x401133 srcline: tchain_edit.c:20 # PRED 6 cycles [6] 0000000000401133 movl -0x4(%rbp), %eax 0000000000401136 and $0x1, %eax 0000000000401139 test %eax, %eax 000000000040113b jz 0x401143 000000000040113d addl $0x1, -0x4(%rbp) 0000000000401141 jmp 0x401147 srcline: tchain_edit.c:23 # PRED 3 cycles [9] 2.00 IPC 0000000000401147 cmpl $0x3e7, -0x4(%rbp) 000000000040114e jle 0x401133 srcline: tchain_edit.c:20 # PRED 6 cycles [15] 0.33 IPC Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: ahmad.yasin@intel.com Cc: amiri.khalil@intel.com Cc: ak@linux.intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240205145819.1943114-1-kan.liang@linux.intel.com
2024-02-07	perf kvm powerpc: Fix build	Ian Rogers
	Updates to struct parse_events_error needed to be carried through to PowerPC specific event parsing. Fixes: fd7b8e8fb20f ("perf parse-events: Print all errors") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung <namhyung@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206235902.2917395-1-irogers@google.com
2024-02-07	selftests: add ESRCH tests for pidfd_getfd()	Tycho Andersen
	Ensure that pidfd_getfd() reports -ESRCH if the task is already exiting. Signed-off-by: Tycho Andersen <tandersen@netflix.com> Link: https://lore.kernel.org/r/20240206192357.81942-1-tycho@tycho.pizza Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-07	selftests: cmsg_ipv6: repeat the exact packet	Jakub Kicinski
	cmsg_ipv6 test requests tcpdump to capture 4 packets, and sends until tcpdump quits. Only the first packet is "real", however, and the rest are basic UDP packets. So if tcpdump doesn't start in time it will miss the real packet and only capture the UDP ones. This makes the test fail on slow machine (no KVM or with debug enabled) 100% of the time, while it passes in fast environments. Repeat the "real" / expected packet. Fixes: 9657ad09e1fa ("selftests: net: test IPV6_TCLASS") Fixes: 05ae83d5a4a2 ("selftests: net: test IPV6_HOPLIMIT") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-07	selftests/seccomp: Pin benchmark to single CPU	Kees Cook
	The seccomp benchmark test (for validating the benefit of bitmaps) can be sensitive to scheduling speed, so pin the process to a single CPU, which appears to significantly improve reliability, and loosen the "close enough" checking to allow up to 10% variance instead of 1%. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202402061002.3a8722fd-oliver.sang@intel.com Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-06	tools: ynl: add support for encoding multi-attr	Alessandro Marcolini
	Multi-attr elements could not be encoded because of missing logic in the ynl code. Enable encoding of these attributes by checking if the attribute is a multi-attr and if the value to be processed is a list. This has been tested both with the taprio and ets qdisc which contain this kind of attributes. Signed-off-by: Alessandro Marcolini <alessandromarcolini99@gmail.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/c5bc9f5797168dbf7a4379c42f38d5de8ac7f38a.1706962013.git.alessandromarcolini99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06	tools: ynl: correct typo and docstring	Alessandro Marcolini
	Correct typo in SpecAttr docstring. Changed SpecSubMessageFormat docstring. Signed-off-by: Alessandro Marcolini <alessandromarcolini99@gmail.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/6ab1dea7fb1f635c0d8b237f03a49eaa448c2bf4.1706962013.git.alessandromarcolini99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-06	KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test	Sean Christopherson
	Drop dirty_log_page_splitting_test's assertion that the number of 4KiB pages remains the same across dirty logging being enabled and disabled, as the test doesn't guarantee that mappings outside of the memslots being dirty logged are stable, e.g. KVM's mappings for code and pages in memslot0 can be zapped by things like NUMA balancing. To preserve the spirit of the check, assert that (a) the number of 4KiB pages after splitting is _at least_ the number of 4KiB pages across all memslots under test, and (b) the number of hugepages before splitting adds up to the number of pages across all memslots under test. (b) is a little tenuous as it relies on memslot0 being incompatible with transparent hugepages, but that holds true for now as selftests explicitly madvise() MADV_NOHUGEPAGE for memslot0 (__vm_create() unconditionally specifies the backing type as VM_MEM_SRC_ANONYMOUS). Reported-by: Yi Lai <yi1.lai@intel.com> Reported-by: Tao Su <tao1.su@linux.intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20240131222728.4100079-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-06	KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test	Sean Christopherson
	When finishing the final iteration of dirty_log_test testcase, set host_quit _before_ the final "continue" so that the vCPU worker doesn't run an extra iteration, and delete the hack-a-fix of an extra "continue" from the dirty ring testcase. This fixes a bug where the extra post to sem_vcpu_cont may not be consumed, which results in failures in subsequent runs of the testcases. The bug likely was missed during development as x86 supports only a single "guest mode", i.e. there aren't any subsequent testcases after the dirty ring test, because for_each_guest_mode() only runs a single iteration. For the regular dirty log testcases, letting the vCPU run one extra iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and only if the worker is explicitly told to stop (vcpu_sync_stop_requested). But for the dirty ring test, which needs to periodically stop the vCPU to reap the dirty ring, letting the vCPU resume the guest _after_ the last iteration means the vCPU will get stuck without an extra "continue". However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to be consumed, e.g. if the vCPU worker sees host_quit==true before resuming the guest. This results in a dangling sem_vcpu_cont, which leads to subsequent iterations getting out of sync, as the vCPU worker will continue on before the main task is ready for it to resume the guest, leading to a variety of asserts, e.g. ==== Test Assertion Failure ==== dirty_log_test.c:384: dirty_ring_vcpu_ring_full pid=14854 tid=14854 errno=22 - Invalid argument 1 0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384 2 0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505 3 (inlined by) run_test at dirty_log_test.c:802 4 0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100 5 0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3) 6 0x0000ffff9be173c7: ?? ??:0 7 0x0000ffff9be1749f: ?? ??:0 8 0x000000000040206f: _start at ??:? Didn't continue vcpu even without ring full Alternatively, the test could simply reset the semaphores before each testcase, but papering over hacks with more hacks usually ends in tears. Reported-by: Shaoqin Huang <shahuang@redhat.com> Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test") Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240202231831.354848-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-06	libbpf: Use OPTS_SET() macro in bpf_xdp_query()	Toke Høiland-Jørgensen
	When the feature_flags and xdp_zc_max_segs fields were added to the libbpf bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro. This causes libbpf to write to those fields unconditionally, which means that programs compiled against an older version of libbpf (with a smaller size of the bpf_xdp_query_opts struct) will have its stack corrupted by libbpf writing out of bounds. The patch adding the feature_flags field has an early bail out if the feature_flags field is not part of the opts struct (via the OPTS_HAS) macro, but the patch adding xdp_zc_max_segs does not. For consistency, this fix just changes the assignments to both fields to use the OPTS_SET() macro. Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com
2024-02-06	bpf: Use -Wno-address-of-packed-member in some selftests	Jose E. Marchesi
	[Differences from V2: - Remove conditionals in the source files pragmas, as the pragma is supported by both GCC and clang.] Both GCC and clang implement the -Wno-address-of-packed-member warning, which is enabled by -Wall, that warns about taking the address of a packed struct field when it can lead to an "unaligned" address. This triggers the following errors (-Werror) when building three particular BPF selftests with GCC: progs/test_cls_redirect.c 986 \| if (ipv4_is_fragment((void )&encap->ip)) { progs/test_cls_redirect_dynptr.c 410 \| pkt_ipv4_checksum((void )&encap_gre->ip); progs/test_cls_redirect.c 521 \| pkt_ipv4_checksum((void )&encap_gre->ip); progs/test_tc_tunnel.c 232 \| set_ipv4_csum((void )&h_outer.ip); These warnings do not signal any real problem in the tests as far as I can see. This patch adds pragmas to these test files that inhibit the -Waddress-of-packed-member warning. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240206102330.7113-1-jose.marchesi@oracle.com
2024-02-06	iommufd/selftest: Add mock IO hugepages tests	Joao Martins
	Leverage previously added MOCK_FLAGS_DEVICE_HUGE_IOVA flag to create an IOMMU domain with more than MOCK_IO_PAGE_SIZE supported. Plumb the hugetlb backing memory for buffer allocation and change the expected page size to MOCK_HUGE_PAGE_SIZE (1M) when hugepage variant test cases are used. These so far are limited to 128M and 256M IOVA range tests cases which is when 1M hugepages can be used. Link: https://lore.kernel.org/r/20240202133415.23819-9-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-06	iommufd/selftest: Refactor dirty bitmap tests	Joao Martins
	Rework the functions that test and set the bitmaps to receive a new parameter (the pte_page_size) that reflects the expected PTE size in the page tables. The same scheme is still used i.e. even bits are dirty and odd page indexes aren't dirty. Here it just refactors to consider the size of the PTE rather than hardcoded to IOMMU mock base page assumptions. While at it, refactor dirty bitmap tests to use the idev_id created by the fixture instead of creating a new one. This is in preparation for doing tests with IOMMU hugepages where multiple bits set as part of recording a whole hugepage as dirty and thus the pte_page_size will vary depending on io hugepages or io base pages. Link: https://lore.kernel.org/r/20240202133415.23819-6-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-06	iommufd/selftest: Test u64 unaligned bitmaps	Joao Martins
	Exercise the dirty tracking bitmaps with byte unaligned addresses in addition to the PAGE_SIZE unaligned bitmaps, using a address towards the end of the page boundary. In doing so, increase the tailroom we allocate for the bitmap from MOCK_PAGE_SIZE(2K) into PAGE_SIZE(4K), such that we can test end of bitmap boundary. Link: https://lore.kernel.org/r/20240202133415.23819-4-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-06	bonding: Add independent control state machine	Aahil Awatramani
	Add support for the independent control state machine per IEEE 802.1AX-2008 5.4.15 in addition to the existing implementation of the coupled control state machine. Introduces two new states, AD_MUX_COLLECTING and AD_MUX_DISTRIBUTING in the LACP MUX state machine for separated handling of an initial Collecting state before the Collecting and Distributing state. This enables a port to be in a state where it can receive incoming packets while not still distributing. This is useful for reducing packet loss when a port begins distributing before its partner is able to collect. Added new functions such as bond_set_slave_tx_disabled_flags and bond_set_slave_rx_enabled_flags to precisely manage the port's collecting and distributing states. Previously, there was no dedicated method to disable TX while keeping RX enabled, which this patch addresses. Note that the regular flow process in the kernel's bonding driver remains unaffected by this patch. The extension requires explicit opt-in by the user (in order to ensure no disruptions for existing setups) via netlink support using the new bonding parameter coupled_control. The default value for coupled_control is set to 1 so as to preserve existing behaviour. Signed-off-by: Aahil Awatramani <aahila@google.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240202175858.1573852-1-aahila@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06	selftests/net: Amend per-netns counter checks	Dmitry Safonov
	Selftests here check not only that connect()/accept() for TCP-AO/TCP-MD5/non-signed-TCP combinations do/don't establish connections, but also counters: those are per-AO-key, per-socket and per-netns. The counters are checked on the server's side, as the server listener has TCP-AO/TCP-MD5/no keys for different peers. All tests run in the same namespaces with the same veth pair, created in test_init(). After close() in both client and server, the sides go through the regular FIN/ACK + FIN/ACK sequence, which goes in the background. If the selftest has already started a new testing scenario, read per-netns counters - it may fail in the end iff it doesn't expect the TCPAOGood per-netns counters go up during the test. Let's just kill both TCP-AO sides - that will avoid any asynchronous background TCP-AO segments going to either sides. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/all/20240201132153.4d68f45e@kernel.org/T/#u Fixes: 6f0c472a6815 ("selftests/net: Add TCP-AO + TCP-MD5 + no sign listen socket tests") Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-06	selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW	Willem de Bruijn
	This test is time sensitive. It may fail on virtual machines and for debug builds. Continue to run in these environments to get code coverage. But optionally suppress failure for timing errors (only). This is controlled with environment variable KSFT_MACHINE_SLOW. The test continues to return 0 (KSFT_PASS), rather than KSFT_XFAIL as previously discussed. Because making so_txtime.c return that and then making so_txtime.sh capture runs that pass that vs KSFT_FAIL and pass it on added a bunch of (fragile bash) boilerplate, while the result is interpreted the same as KSFT_PASS anyway. Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240201162130.2278240-1-willemdebruijn.kernel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-05	perf/pmu-events/powerpc: Update json mapfile with Power11 PVR	Madhavan Srinivasan
	Update the Power11 PVR to json mapfile to enable json events. Power11 is PowerISA v3.1 compliant and support Power10 events. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Cc: atrajeev@linux.vnet.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240129120855.551529-1-maddy@linux.ibm.com