linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-02-13	selftests/resctrl: Improve perf init	Ilpo Järvinen
	struct perf_event_attr initialization is spread into perf_event_initialize() and perf_event_attr_initialize() and setting ->config is hardcoded by the deepest level. perf_event_attr init belongs to perf_event_attr_initialize() so move it entirely there. Rename the other function perf_event_initialized_read_format(). Call each init function directly from the test as they will take different parameters (especially true after the perf related global variables are moved to local variables). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Consolidate naming of perf event related things	Ilpo Järvinen
	Naming for perf event related functions, types, and variables is inconsistent. Make struct read_format and all functions related to perf events start with "perf_". Adjust variable names towards the same direction but use shorter names for variables where appropriate (pe prefix). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Remove nested calls in perf event handling	Ilpo Järvinen
	Perf event handling has functions that are the sole caller of another perf event handling related function: - reset_enable_llc_perf() calls perf_event_open_llc_miss() - perf_event_measure() calls get_llc_perf() Remove the extra layer of calls to make the code easier to follow by moving the code into the calling function. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Remove unnecessary __u64 -> unsigned long conversion	Ilpo Järvinen
	Perf counters are __u64 but the code converts them to unsigned long before printing them out. Remove unnecessary type conversion and retain the perf originating value as __u64. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Split show_cache_info() to test specific and generic parts	Ilpo Järvinen
	show_cache_info() calculates results and provides generic cache information. This makes it hard to alter pass/fail conditions. Separate the test specific checks into CAT and CMT test files and leave only the generic information part into show_cache_info(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Split measure_cache_vals()	Ilpo Järvinen
	measure_cache_vals() does a different thing depending on the test case that called it: - For CAT, it measures LLC misses through perf. - For CMT, it measures LLC occupancy through resctrl. Split these two functionalities into own functions the CAT and CMT tests can call directly. Replace passing the struct resctrl_val_param parameter with the filename because it's more generic and all those functions need out of resctrl_val. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Exclude shareable bits from schemata in CAT test	Ilpo Järvinen
	CAT test doesn't take shareable bits into account, i.e., the test might be sharing cache with some devices (e.g., graphics). Introduce get_mask_no_shareable() and use it to provision an environment for CAT test where the allocated LLC is isolated better. Excluding shareable_bits may create hole(s) into the cbm_mask, thus add a new helper count_contiguous_bits() to find the longest contiguous set of CBM bits. create_bit_mask() is needed by an upcoming CAT test rewrite so make it available in resctrl.h right away. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Create cache_portion_size() helper	Ilpo Järvinen
	CAT and CMT tests calculate size of the cache portion for the n-bits cache allocation on their own. Add cache_portion_size() helper that calculates size of the cache portion for the given number of bits and use it to replace the existing span calculations. This also prepares for the new CAT test that will need to determine the size of the cache portion also during results processing. Rename also 'cache_size' local variables to 'cache_total_size' to prevent misinterpretations. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Mark get_cache_size() cache_type const	Ilpo Järvinen
	get_cache_size() does not modify cache_type so it could be const. Mark cache_type const so that const char * can be passed to it. This prevents warnings once many of the test parameters are marked const. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Refactor get_cbm_mask() and rename to get_full_cbm()	Ilpo Järvinen
	Callers of get_cbm_mask() are required to pass a string into which the capacity bitmask (CBM) is read. Neither CAT nor CMT tests need the bitmask as string but just convert it into an unsigned long value. Another limitation is that the bit mask reader can only read .../cbm_mask files. Generalize the bit mask reading function into get_bit_mask() such that it can be used to handle other files besides the .../cbm_mask and handles the unsigned long conversion within get_bit_mask() using fscanf(). Change get_cbm_mask() to use get_bit_mask() and rename it to get_full_cbm() to better indicate what the function does. Return error from get_full_cbm() if the bitmask is zero for some reason because it makes the code more robust as the selftests naturally assume the bitmask has some bits. Also mark cache_type const while at it and remove useless comments that are related to processing of CBM bits. Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Refactor fill_buf functions	Ilpo Järvinen
	There are unnecessary nested calls in fill_buf.c: - run_fill_buf() calls fill_cache() - alloc_buffer() calls malloc_and_init_memory() Simplify the code flow and remove those unnecessary call levels by moving the called code inside the calling function and remove the duplicated error print. Resolve the difference in run_fill_buf() and fill_cache() parameter name into 'buf_size' which is more descriptive than 'span'. Also, while moving the allocation related code, rename 'p' into 'buf' to be consistent in naming the variables. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Split fill_buf to allow tests finer-grained control	Ilpo Järvinen
	MBM, MBA and CMT test cases call run_fill_buf() that in turn calls fill_cache() to alloc and loop indefinitely around the buffer. This binds buffer allocation and running the benchmark into a single bundle so that a selftest cannot allocate a buffer once and reuse it. CAT test doesn't want to loop around the buffer continuously and after rewrite it needs the ability to allocate the buffer separately. Split buffer allocation out of fill_cache() into alloc_buffer(). This change is part of preparation for the new CAT test that allocates a buffer and does multiple passes over the same buffer (but not in an infinite loop). Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Change function comments to say < 0 on error	Ilpo Järvinen
	A number function comments state the function return non-zero on failure but in reality they can only return 0 on success and < 0 on error. Update the comments to say < 0 on error to match the behavior. While at it, improve cat_val() comment to state that 0 means the test was run (either pass or fail). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Don't use ctrlc_handler() outside signal handling	Ilpo Järvinen
	perf_event_open_llc_miss() calls ctrlc_handler() to cleanup if perf_event_open() returns an error. Those cleanups, however, are not the responsibility of perf_event_open_llc_miss() and it thus interferes unnecessarily with the usual cleanup pattern. Worse yet, ctrlc_handler() calls exit() in the end preventing the ordinary cleanup done in the calling function from executing. ctrlc_handler() should only be used as a signal handler, not during normal error handling. Remove call to ctrlc_handler() from perf_event_open_llc_miss(). As unmounting resctrlfs and test cleanup are already handled properly by error rollbacks in the calling functions, no other changes are necessary. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Return -1 instead of errno on error	Ilpo Järvinen
	A number of functions in the resctrl selftests return errno. It is problematic because errno is positive which is often counterintuitive. Also, every site returning errno prints the error message already with ksft_perror() so there is not much added value in returning the precise error code. Simply convert all places returning errno to return -1 that is typical userspace error code in case of failures. While at it, improve resctrl_val() comment to state that 0 means the test was run (either pass or fail). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests/resctrl: Convert perror() to ksft_perror() or ksft_print_msg()	Ilpo Järvinen
	The resctrl selftest code contains a number of perror() calls. Some of them come with hash character and some don't. The kselftest framework provides ksft_perror() that is compatible with test output formatting so it should be used instead of adding custom hash signs. Some perror() calls are too far away from anything that sets error. For those call sites, ksft_print_msg() must be used instead. Convert perror() to ksft_perror() or ksft_print_msg(). Other related changes: - Remove hash signs - Remove trailing stops & newlines from ksft_perror() - Add terminating newlines for converted ksft_print_msg() - Use consistent capitalization - Small fixes/tweaks to typos & grammar of the messages - Extract error printing out of PARENT_EXIT() to be able to differentiate Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-13	selftests: net: more pmtu.sh fixes	Paolo Abeni
	The netdev CI is reporting failures for the pmtu test: [ 115.929264] br0: port 2(vxlan_a) entered forwarding state # 2024/02/08 17:33:22 socat[7871] E bind(7, {AF=10 [0000:0000:0000:0000:0000:0000:0000:0000]:50000}, 28): Address already in use # 2024/02/08 17:33:22 socat[7877] E write(7, 0x5598fb6ff000, 8192): Connection refused # TEST: IPv6, bridged vxlan4: PMTU exceptions [FAIL] # File size 0 mismatches exepcted value in locally bridged vxlan test The root cause is apparently a socket created by a previous iteration of the relevant loop still lasting in LAST_ACK state. Note that even the file size check is racy, the receiver process dumping the file could still be running in background Allow the listener to bound on the same local port via SO_REUSEADDR and collect file output file size only after the listener completion. Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/4f51c11a1ce7ca7a4dabd926cffff63dadac9ba1.1707731086.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13	selftests: net: more strict check in net_helper	Paolo Abeni
	The helper waiting for a listener port can match any socket whose hexadecimal representation of source or destination addresses matches that of the given port. Additionally, any socket state is accepted. All the above can let the helper return successfully before the relevant listener is actually ready, with unexpected results. So far I could not find any related failure in the netdev CI, but the next patch is going to make the critical event more easily reproducible. Address the issue matching the port hex only vs the relevant socket field and additionally checking the socket state for TCP sockets. Fixes: 3bdd9fd29cb0 ("selftests/net: synchronize udpgro tests' tx and rx connection") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/192b3dbc443d953be32991d1b0ca432bd4c65008.1707731086.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13	selftests: net: cope with slow env in so_txtime.sh test	Paolo Abeni
	The mentioned test is failing in slow environments: # SO_TXTIME ipv4 clock monotonic # ./so_txtime: recv: timeout: Resource temporarily unavailable not ok 1 selftests: net: so_txtime.sh # exit=1 Tuning the tolerance in the test binary is error-prone and doomed to failures is slow-enough environment. Just resort to suppress any error in such cases. Note to suppress them we need first to refactor a bit the code moving it to explicit error handling. Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/2142d9ed4b5c5aa07dd1b455779625d91b175373.1707730902.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-13	selftests: net: cope with slow env in gro.sh test	Paolo Abeni
	The gro self-tests sends the packets to be aggregated with multiple write operations. When running is slow environment, it's hard to guarantee that the GRO engine will wait for the last packet in an intended train. The above causes almost deterministic failures in our CI for the 'large' test-case. Address the issue explicitly ignoring failures for such case in slow environments (KSFT_MACHINE_SLOW==true). Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/97d3ba83f5a2bfeb36f6bc0fb76724eb3dafb608.1707729403.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-12	selftests: net: ip_local_port_range: define IPPROTO_MPTCP	Maxim Galaganov
	Older glibc's netinet/in.h may leave IPPROTO_MPTCP undefined when building ip_local_port_range.c, that leads to "error: use of undeclared identifier 'IPPROTO_MPTCP'". Define IPPROTO_MPTCP in such cases, just like in other MPTCP selftests. Fixes: 122db5e3634b ("selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/netdev/CA+G9fYvGO5q4o_Td_kyQgYieXWKw6ktMa-Q0sBu6S-0y3w2aEQ@mail.gmail.com/ Signed-off-by: Maxim Galaganov <max@internet.ru> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/r/20240209132512.254520-1-max@internet.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-10	Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7 issues or aren't considered to be needed in earlier kernel versions" * tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) nilfs2: fix potential bug in end_buffer_async_write mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup nilfs2: fix hang in nilfs_lookup_dirty_data_buffers() MAINTAINERS: Leo Yan has moved mm/zswap: don't return LRU_SKIP if we have dropped lru lock fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super mailmap: switch email address for John Moon mm: zswap: fix objcg use-after-free in entry destruction mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range() arch/arm/mm: fix major fault accounting when retrying under per-VMA lock selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page mm: memcg: optimize parent iteration in memcg_rstat_updated() nilfs2: fix data corruption in dsync block recovery for small block sizes mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get() exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock) fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand() getrusage: use sig->stats_lock rather than lock_task_sighand() getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() ...
2024-02-10	selftests: tls: use exact comparison in recv_partial	Jakub Kicinski
	This exact case was fail for async crypto and we weren't catching it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-09	selftests: openvswitch: Add validation for the recursion test	Aaron Conole
	Add a test case into the netlink checks that will show the number of nested action recursions won't exceed 16. Going to 17 on a small clone call isn't enough to exhaust the stack on (most) systems, so it should be safe to run even on systems that don't have the fix applied. Signed-off-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240207132416.1488485-3-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Fix bridge locked port test flakiness	Ido Schimmel
	The redirection test case fails in the netdev CI on debug kernels because an FDB entry is learned despite the presence of a tc filter that redirects incoming traffic [1]. I am unable to reproduce the failure locally, but I can see how it can happen given that learning is first enabled and only then the ingress tc filter is configured. On debug kernels the time window between these two operations is longer compared to regular kernels, allowing random packets to be transmitted and trigger learning. Fix by reversing the order and configure the ingress tc filter before enabling learning. [1] [...] # TEST: Locked port MAB redirect [FAIL] # Locked entry created for redirected traffic Fixes: 38c43a1ce758 ("selftests: forwarding: Add test case for traffic redirection from a locked port") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Suppress grep warnings	Ido Schimmel
	Suppress the following grep warnings: [...] INFO: # Port group entries configuration tests - (, G) TEST: Common port group entries configuration tests (IPv4 (, G)) [ OK ] TEST: Common port group entries configuration tests (IPv6 (, G)) [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv4 (, G) port group entries configuration tests [ OK ] grep: warning: stray \ before / grep: warning: stray \ before / grep: warning: stray \ before / TEST: IPv6 (*, G) port group entries configuration tests [ OK ] [...] They do not fail the test, but do clutter the output. Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Fix bridge MDB test flakiness	Ido Schimmel
	After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: IPv4 host entries forwarding tests [FAIL] # Packet locally received after flood Fixes: b6d00da08610 ("selftests: forwarding: Add bridge MDB test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: forwarding: Fix layer 2 miss test flakiness	Ido Schimmel
	After enabling a multicast querier on the bridge (like the test is doing), the bridge will wait for the Max Response Delay before starting to forward according to its MDB in order to let Membership Reports enough time to be received and processed. Currently, the test is waiting for exactly the default Max Response Delay (10 seconds) which is racy and leads to failures [1]. Fix by reducing the Max Response Delay to 1 second. [1] [...] # TEST: L2 miss - Multicast (IPv4) [FAIL] # Unregistered multicast filter was hit after adding MDB entry Fixes: 8c33266ae26a ("selftests: forwarding: Add layer 2 miss test cases") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208155529.1199729-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09	selftests: net: Fix bridge backup port test flakiness	Ido Schimmel
	The test toggles the carrier of a bridge port in order to test the bridge backup port feature. Due to the linkwatch delayed work the carrier change is not always reflected fast enough to the bridge driver and packets are not forwarded as the test expects, resulting in failures [1]. Fix by busy waiting on the bridge port state until it changes to the desired state following the carrier change. [1] # Backup port # ----------- [...] # TEST: swp1 carrier off [ OK ] # TEST: No forwarding out of swp1 [FAIL] [ 641.995910] br0: port 1(swp1) entered disabled state # TEST: No forwarding out of vx0 [ OK ] Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240208123110.1063930-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08	selftests: net: add more missing kernel config	Paolo Abeni
	The reuseport_addr_any.sh is currently skipping DCCP tests and pmtu.sh is skipping all the FOU/GUE related cases: add the missing options. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/38d3ca7f909736c1aef56e6244d67c82a9bba6ff.1707326987.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-08	Merge tag 'net-6.8-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and netfilter. Current release - regressions: - nic: intel: fix old compiler regressions - netfilter: ipset: missing gc cancellations fixed Current release - new code bugs: - netfilter: ctnetlink: fix filtering for zone 0 Previous releases - regressions: - core: fix from address in memcpy_to_iter_csum() - netfilter: nfnetlink_queue: un-break NF_REPEAT - af_unix: fix memory leak for dead unix_(sk)->oob_skb in GC. - devlink: avoid potential loop in devlink_rel_nested_in_notify_work() - iwlwifi: - mvm: fix a battery life regression - fix double-free bug - mac80211: fix waiting for beacons logic - nic: nfp: flower: prevent re-adding mac index for bonded port Previous releases - always broken: - rxrpc: fix generation of serial numbers to skip zero - tipc: check the bearer type before calling tipc_udp_nl_bearer_add() - tunnels: fix out of bounds access when building IPv6 PMTU error - nic: hv_netvsc: register VF in netvsc_probe if NET_DEVICE_REGISTER missed - nic: atlantic: fix DMA mapping for PTP hwts ring Misc: - selftests: more fixes to deal with very slow hosts" * tag 'net-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits) netfilter: nft_set_pipapo: remove scratch_aligned pointer netfilter: nft_set_pipapo: add helper to release pcpu scratch area netfilter: nft_set_pipapo: store index in scratch maps netfilter: nft_set_rbtree: skip end interval element from gc netfilter: nfnetlink_queue: un-break NF_REPEAT netfilter: nf_tables: use timestamp to check for set element timeout netfilter: nft_ct: reject direction for ct id netfilter: ctnetlink: fix filtering for zone 0 s390/qeth: Fix potential loss of L3-IP@ in case of network issues netfilter: ipset: Missing gc cancellations fixed octeontx2-af: Initialize maps. net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio netfilter: nft_set_pipapo: remove static in nft_pipapo_get() netfilter: nft_compat: restrict match/target protocol to u16 netfilter: nft_compat: reject unused compat flag netfilter: nft_compat: narrow down revision to unsigned 8-bits net: intel: fix old compiler regressions MAINTAINERS: Maintainer change for rds selftests: cmsg_ipv6: repeat the exact packet ...
2024-02-08	Merge tag 'nf-24-02-08' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Narrow down target/match revision to u8 in nft_compat. 2) Bail out with unused flags in nft_compat. 3) Restrict layer 4 protocol to u16 in nft_compat. 4) Remove static in pipapo get command that slipped through when reducing set memory footprint. 5) Follow up incremental fix for the ipset performance regression, this includes the missing gc cancellation, from Jozsef Kadlecsik. 6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0 as no filtering, from Felix Huettner. 7) Reject direction for NFT_CT_ID. 8) Use timestamp to check for set element expiration while transaction is handled to prevent garbage collection from removing set elements that were just added by this transaction. Packet path and netlink dump/get path still use current time to check for expiration. 9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal. 10) map_index needs to be percpu and per-set, not just percpu. At this time its possible for a pipapo set to fill the all-zero part with ones and take the 'might have bits set' as 'start-from-zero' area. From Florian Westphal. This includes three patches: - Change scratchpad area to a structure that provides space for a per-set-and-cpu toggle and uses it of the percpu one. - Add a new free helper to prepare for the next patch. - Remove the scratch_aligned pointer and makes AVX2 implementation use the exact same memory addresses for read/store of the matching state. netfilter pull request 24-02-08 * tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_set_pipapo: remove scratch_aligned pointer netfilter: nft_set_pipapo: add helper to release pcpu scratch area netfilter: nft_set_pipapo: store index in scratch maps netfilter: nft_set_rbtree: skip end interval element from gc netfilter: nfnetlink_queue: un-break NF_REPEAT netfilter: nf_tables: use timestamp to check for set element timeout netfilter: nft_ct: reject direction for ct id netfilter: ctnetlink: fix filtering for zone 0 netfilter: ipset: Missing gc cancellations fixed netfilter: nft_set_pipapo: remove static in nft_pipapo_get() netfilter: nft_compat: restrict match/target protocol to u16 netfilter: nft_compat: reject unused compat flag netfilter: nft_compat: narrow down revision to unsigned 8-bits ==================== Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08	netfilter: ctnetlink: fix filtering for zone 0	Felix Huettner
	previously filtering for the default zone would actually skip the zone filter and flush all zones. Fixes: eff3c558bb7e ("netfilter: ctnetlink: support filtering by zone") Reported-by: Ilya Maximets <i.maximets@ovn.org> Closes: https://lore.kernel.org/netdev/2032238f-31ac-4106-8f22-522e76df5a12@ovn.org/ Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-07	selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros	Muhammad Usama Anjum
	Correct header file is needed for getting CLOSE_RANGE_* macros. Previously it was tested with newer glibc which didn't show the need to include the header which was a mistake. Link: https://lkml.kernel.org/r/20231024155137.219700-1-usama.anjum@collabora.com Fixes: ec54424923cf ("selftests: core: remove duplicate defines") Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com> Link: https://lore.kernel.org/all/7161219e-0223-d699-d6f3-81abd9abf13b@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "x86 guest: - Avoid false positive for check that only matters on AMD processors x86: - Give a hint when Win2016 might fail to boot due to XSAVES && !XSAVEC configuration - Do not allow creating an in-kernel PIT unless an IOAPIC already exists RISC-V: - Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc, scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa) S390: - fix CC for successful PQAP instruction - fix a race when creating a shadow page" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM x86/kvm: Fix SEV check in sev_map_percpu_data() KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum KVM: x86: Check irqchip mode before create PIT KVM: riscv: selftests: Add Zfa extension to get-reg-list test RISC-V: KVM: Allow Zfa extension for Guest/VM KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test RISC-V: KVM: Allow Zihintntl extension for Guest/VM KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test RISC-V: KVM: Allow vector crypto extensions for Guest/VM KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test RISC-V: KVM: Allow scalar crypto extensions for Guest/VM KVM: riscv: selftests: Add Zbc extension to get-reg-list test RISC-V: KVM: Allow Zbc extension for Guest/VM KVM: s390: fix cc for successful PQAP KVM: s390: vsie: fix race during shadow creation
2024-02-07	selftests: add ESRCH tests for pidfd_getfd()	Tycho Andersen
	Ensure that pidfd_getfd() reports -ESRCH if the task is already exiting. Signed-off-by: Tycho Andersen <tandersen@netflix.com> Link: https://lore.kernel.org/r/20240206192357.81942-1-tycho@tycho.pizza Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-07	selftests: cmsg_ipv6: repeat the exact packet	Jakub Kicinski
	cmsg_ipv6 test requests tcpdump to capture 4 packets, and sends until tcpdump quits. Only the first packet is "real", however, and the rest are basic UDP packets. So if tcpdump doesn't start in time it will miss the real packet and only capture the UDP ones. This makes the test fail on slow machine (no KVM or with debug enabled) 100% of the time, while it passes in fast environments. Repeat the "real" / expected packet. Fixes: 9657ad09e1fa ("selftests: net: test IPV6_TCLASS") Fixes: 05ae83d5a4a2 ("selftests: net: test IPV6_HOPLIMIT") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-07	selftests/seccomp: Pin benchmark to single CPU	Kees Cook
	The seccomp benchmark test (for validating the benefit of bitmaps) can be sensitive to scheduling speed, so pin the process to a single CPU, which appears to significantly improve reliability, and loosen the "close enough" checking to allow up to 10% variance instead of 1%. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202402061002.3a8722fd-oliver.sang@intel.com Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-06	KVM: selftests: Don't assert on exact number of 4KiB in dirty log split test	Sean Christopherson
	Drop dirty_log_page_splitting_test's assertion that the number of 4KiB pages remains the same across dirty logging being enabled and disabled, as the test doesn't guarantee that mappings outside of the memslots being dirty logged are stable, e.g. KVM's mappings for code and pages in memslot0 can be zapped by things like NUMA balancing. To preserve the spirit of the check, assert that (a) the number of 4KiB pages after splitting is _at least_ the number of 4KiB pages across all memslots under test, and (b) the number of hugepages before splitting adds up to the number of pages across all memslots under test. (b) is a little tenuous as it relies on memslot0 being incompatible with transparent hugepages, but that holds true for now as selftests explicitly madvise() MADV_NOHUGEPAGE for memslot0 (__vm_create() unconditionally specifies the backing type as VM_MEM_SRC_ANONYMOUS). Reported-by: Yi Lai <yi1.lai@intel.com> Reported-by: Tao Su <tao1.su@linux.intel.com> Reviewed-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20240131222728.4100079-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-06	KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test	Sean Christopherson
	When finishing the final iteration of dirty_log_test testcase, set host_quit _before_ the final "continue" so that the vCPU worker doesn't run an extra iteration, and delete the hack-a-fix of an extra "continue" from the dirty ring testcase. This fixes a bug where the extra post to sem_vcpu_cont may not be consumed, which results in failures in subsequent runs of the testcases. The bug likely was missed during development as x86 supports only a single "guest mode", i.e. there aren't any subsequent testcases after the dirty ring test, because for_each_guest_mode() only runs a single iteration. For the regular dirty log testcases, letting the vCPU run one extra iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and only if the worker is explicitly told to stop (vcpu_sync_stop_requested). But for the dirty ring test, which needs to periodically stop the vCPU to reap the dirty ring, letting the vCPU resume the guest _after_ the last iteration means the vCPU will get stuck without an extra "continue". However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to be consumed, e.g. if the vCPU worker sees host_quit==true before resuming the guest. This results in a dangling sem_vcpu_cont, which leads to subsequent iterations getting out of sync, as the vCPU worker will continue on before the main task is ready for it to resume the guest, leading to a variety of asserts, e.g. ==== Test Assertion Failure ==== dirty_log_test.c:384: dirty_ring_vcpu_ring_full pid=14854 tid=14854 errno=22 - Invalid argument 1 0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384 2 0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505 3 (inlined by) run_test at dirty_log_test.c:802 4 0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100 5 0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3) 6 0x0000ffff9be173c7: ?? ??:0 7 0x0000ffff9be1749f: ?? ??:0 8 0x000000000040206f: _start at ??:? Didn't continue vcpu even without ring full Alternatively, the test could simply reset the semaphores before each testcase, but papering over hacks with more hacks usually ends in tears. Reported-by: Shaoqin Huang <shahuang@redhat.com> Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test") Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240202231831.354848-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-06	iommufd/selftest: Add mock IO hugepages tests	Joao Martins
	Leverage previously added MOCK_FLAGS_DEVICE_HUGE_IOVA flag to create an IOMMU domain with more than MOCK_IO_PAGE_SIZE supported. Plumb the hugetlb backing memory for buffer allocation and change the expected page size to MOCK_HUGE_PAGE_SIZE (1M) when hugepage variant test cases are used. These so far are limited to 128M and 256M IOVA range tests cases which is when 1M hugepages can be used. Link: https://lore.kernel.org/r/20240202133415.23819-9-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-06	iommufd/selftest: Refactor dirty bitmap tests	Joao Martins
	Rework the functions that test and set the bitmaps to receive a new parameter (the pte_page_size) that reflects the expected PTE size in the page tables. The same scheme is still used i.e. even bits are dirty and odd page indexes aren't dirty. Here it just refactors to consider the size of the PTE rather than hardcoded to IOMMU mock base page assumptions. While at it, refactor dirty bitmap tests to use the idev_id created by the fixture instead of creating a new one. This is in preparation for doing tests with IOMMU hugepages where multiple bits set as part of recording a whole hugepage as dirty and thus the pte_page_size will vary depending on io hugepages or io base pages. Link: https://lore.kernel.org/r/20240202133415.23819-6-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-06	iommufd/selftest: Test u64 unaligned bitmaps	Joao Martins
	Exercise the dirty tracking bitmaps with byte unaligned addresses in addition to the PAGE_SIZE unaligned bitmaps, using a address towards the end of the page boundary. In doing so, increase the tailroom we allocate for the bitmap from MOCK_PAGE_SIZE(2K) into PAGE_SIZE(4K), such that we can test end of bitmap boundary. Link: https://lore.kernel.org/r/20240202133415.23819-4-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-06	selftests/net: Amend per-netns counter checks	Dmitry Safonov
	Selftests here check not only that connect()/accept() for TCP-AO/TCP-MD5/non-signed-TCP combinations do/don't establish connections, but also counters: those are per-AO-key, per-socket and per-netns. The counters are checked on the server's side, as the server listener has TCP-AO/TCP-MD5/no keys for different peers. All tests run in the same namespaces with the same veth pair, created in test_init(). After close() in both client and server, the sides go through the regular FIN/ACK + FIN/ACK sequence, which goes in the background. If the selftest has already started a new testing scenario, read per-netns counters - it may fail in the end iff it doesn't expect the TCPAOGood per-netns counters go up during the test. Let's just kill both TCP-AO sides - that will avoid any asynchronous background TCP-AO segments going to either sides. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/all/20240201132153.4d68f45e@kernel.org/T/#u Fixes: 6f0c472a6815 ("selftests/net: Add TCP-AO + TCP-MD5 + no sign listen socket tests") Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://lore.kernel.org/r/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-05	selftests: net: let big_tcp test cope with slow env	Paolo Abeni
	In very slow environments, most big TCP cases including segmentation and reassembly of big TCP packets have a good chance to fail: by default the TCP client uses write size well below 64K. If the host is low enough autocorking is unable to build real big TCP packets. Address the issue using much larger write operations. Note that is hard to observe the issue without an extremely slow and/or overloaded environment; reduce the TCP transfer time to allow for much easier/faster reproducibility. Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-05	selftests/powerpc/papr_vpd: Check devfd before get_system_loc_code()	R Nageswara Sastry
	Calling get_system_loc_code before checking devfd and errno fails the test when the device is not available, the expected behaviour is a SKIP. Change the order of 'SKIP_IF_MSG' to correctly SKIP when the /dev/ papr-vpd device is not available. Test output before: Test FAILED on line 271 Test output after: [SKIP] Test skipped on line 266: /dev/papr-vpd not present Signed-off-by: R Nageswara Sastry <rnsastry@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240131130859.14968-1-rnsastry@linux.ibm.com
2024-02-02	selftests: net: avoid just another constant wait	Paolo Abeni
	Using hard-coded constant timeout to wait for some expected event is deemed to fail sooner or later, especially in slow env. Our CI has spotted another of such race: # TEST: ipv6: cleanup of cached exceptions - nexthop objects [FAIL] # can't delete veth device in a timely manner, PMTU dst likely leaked Replace the crude sleep with a loop looking for the expected condition at low interval for a much longer range. Fixes: b3cc4f8a8a41 ("selftests: pmtu: add explicit tests for PMTU exceptions cleanup") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/fd5c745e9bb665b724473af6a9373a8c2a62b247.1706812005.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02	selftests: net: fix tcp listener handling in pmtu.sh	Paolo Abeni
	The pmtu.sh test uses a few TCP listener in a problematic way: It hard-codes a constant timeout to wait for the listener starting-up in background. That introduces unneeded latency and on very slow and busy host it can fail. Additionally the test starts again the same listener in the same namespace on the same port, just after the previous connection completed. Fast host can attempt starting the new server before the old one really closed the socket. Address the issues using the wait_local_port_listen helper and explicitly waiting for the background listener process exit. Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/f8e8f6d44427d8c45e9f6a71ee1a321047452087.1706812005.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02	selftests: net: fix setup_ns usage in rtnetlink.sh	Paolo Abeni
	The setup_ns helper marks the testns global variable as readonly. Later attempts to set such variable are unsuccessful, causing a couple test failures. Avoid completely the variable re-initialization and let the function access the global value. Fixes: e9ce7ededf14 ("selftests: rtnetlink: use setup_ns in bonding test") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/6e7c937c8ff73ca52a21a4a536a13a76ec0173a8.1706812005.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-02	selftests: net: cut more slack for gro fwd tests.	Paolo Abeni
	The udpgro_fwd.sh self-tests are somewhat unstable. There are a few timing constraints the we struggle to meet on very slow environments. Instead of skipping the whole tests in such envs, increase the test resilience WRT very slow hosts: increase the inter-packets timeouts, avoid resetting the counters every second and finally disable reduce the background traffic noise. Tested with: for I in $(seq 1 100); do ./tools/testing/selftests/kselftest_install/run_kselftest.sh \ -t net:udpgro_fwd.sh \|\| exit -1 done in a slow environment. Fixes: a062260a9d5f ("selftests: net: add UDP GRO forwarding self-tests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/f4b6b11064a0d39182a9ae6a853abae3e9b4426a.1706812005.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>