linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-02-08	objtool: Ignore dangling jump table entries	Josh Poimboeuf
	Clang sometimes leaves dangling unused jump table entries which point to the end of the function. Ignore them. Closes: https://lore.kernel.org/20250113235835.vqgvb7cdspksy5dn@jpoimboe Reported-by: Klaus Kusche <klaus.kusche@computerix.info> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/ee25c0b7e80113e950bd1d4c208b671d35774ff4.1736891751.git.jpoimboe@kernel.org
2025-02-07	selftests/bpf: Remove with_addr.sh and with_tunnels.sh	Bastien Curutchet (eBPF Foundation)
	Those two scripts were used by test_flow_dissector.sh to setup/cleanup the network topology before/after the tests. test_flow_dissector.sh have been deleted by commit 63b37657c5fd ("selftests/bpf: remove test_flow_dissector.sh") so they aren't used anywhere now. Remove the two unused scripts and their Makefile entries. Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250204-with-v1-1-387a42118cd4@bootlin.com
2025-02-07	netdevsim: allow normal queue reset while down	Jakub Kicinski
	Resetting queues while the device is down should be legal. Allow it, test it. Ideally we'd test this with a real device supporting devmem but I don't have access to such devices. Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250206225638.1387810-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-07	bpf: selftests: Test constant key extraction on irrelevant maps	Daniel Xu
	Test that very high constant map keys are not interpreted as an error value by the verifier. This would previously fail. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/c0590b62eb9303f389b2f52c0c7e9cf22a358a30.1738689872.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-07	sched_ext: Print an event, SCX_EV_ENQ_SLICE_DFL, in scx_qmap/central	Changwoo Min
	Modify the scx_qmap and scx_celtral schedulers to print the SCX_EV_ENQ_SLICE_DFL event every second. Signed-off-by: Changwoo Min <changwoo@igalia.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-07	tools/power turbostat: Clustered Uncore MHz counters should honor show/hide ↵	Len Brown
	options The clustered uncore frequency counters, UMHz. should honor the --show and --hide options. All non-specified counters should be implicityly hidden. But when --show was used, UMHz. showed up anyway: $ sudo turbostat -q -S --show Busy% Busy% UMHz0.0 UMHz1.0 UMHz2.0 UMHz3.0 UMHz4.0 Indeed, there was no string that can be used to explicitly show or hide clustered uncore counters. Even through they are dynamically probed and added, group the clustered UMHz. counters with the legacy built-in-counter "UncMHz" for show/hide. turbostat --show Busy% does not show UMHz.. turbostat --show UncMHz shows either UncMHz or UMHz., if present turbostat --hide UncMHz hides either UncMHz or UMHz., if present Reported-by: Artem Bityutskiy <artem.bityutskiy@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
2025-02-07	Merge tag 'vfs-6.14-rc2.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix fsnotify FMODE_NONOTIFY* handling. This also disables fsnotify on all pseudo files by default apart from very select exceptions. This carries a regression risk so we need to watch out and adapt accordingly. However, it is overall a significant improvement over the current status quo where every rando file can get fsnotify enabled. - Cleanup and simplify lockref_init() after recent lockref changes. - Fix vboxfs build with gcc-15. - Add an assert into inode_set_cached_link() to catch corrupt links. - Allow users to also use an empty string check to detect whether a given mount option string was empty or not. - Fix how security options were appended to statmount()'s ->mnt_opt field. - Fix statmount() selftests to always check the returned mask. - Fix uninitialized value in vfs_statx_path(). - Fix pidfs_ioctl() sanity checks to guard against ioctl() overloading and preserve extensibility. * tag 'vfs-6.14-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: sanity check the length passed to inode_set_cached_link() pidfs: improve ioctl handling fsnotify: disable pre-content and permission events by default selftests: always check mask returned by statmount(2) fsnotify: disable notification by default for all pseudo files fs: fix adding security options to statmount.mnt_opt fsnotify: use accessor to set FMODE_NONOTIFY_* lockref: remove count argument of lockref_init gfs2: switch to lockref_init(..., 1) gfs2: use lockref_init for gl_lockref statmount: let unset strings be empty vboxsf: fix building with GCC 15 fs/stat.c: avoid harmless garbage value problem in vfs_statx_path()
2025-02-07	selftests: always check mask returned by statmount(2)	Miklos Szeredi
	STATMOUNT_MNT_OPTS can actually be missing if there are no options. This is a change of behavior since 75ead69a7173 ("fs: don't let statmount return empty strings"). The other checks shouldn't actually trigger, but add them for correctness and for easier debugging if the test fails. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129160641.35485-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	selftests/bpf: Correct the check of join cgroup	Jason Xing
	Use ASSERT_OK_FD to check the return value of join cgroup, or else this test will pass even if the fd < 0. ASSERT_OK_FD can print the error message to the console. Suggested-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/all/6d62bd77-6733-40c7-b240-a1aeff55566c@linux.dev/ Link: https://patch.msgid.link/20250204051154.57655-1-kerneljasonxing@gmail.com
2025-02-06	Merge branch 'io_uring-zero-copy-rx'	Jakub Kicinski
	David Wei says: ==================== io_uring zero copy rx This patchset contains net/ patches needed by a new io_uring request implementing zero copy rx into userspace pages, eliminating a kernel to user copy. We configure a page pool that a driver uses to fill a hw rx queue to hand out user pages instead of kernel pages. Any data that ends up hitting this hw rx queue will thus be dma'd into userspace memory directly, without needing to be bounced through kernel memory. 'Reading' data out of a socket instead becomes a _notification_ mechanism, where the kernel tells userspace where the data is. The overall approach is similar to the devmem TCP proposal. This relies on hw header/data split, flow steering and RSS to ensure packet headers remain in kernel memory and only desired flows hit a hw rx queue configured for zero copy. Configuring this is outside of the scope of this patchset. We share netdev core infra with devmem TCP. The main difference is that io_uring is used for the uAPI and the lifetime of all objects are bound to an io_uring instance. Data is 'read' using a new io_uring request type. When done, data is returned via a new shared refill queue. A zero copy page pool refills a hw rx queue from this refill queue directly. Of course, the lifetime of these data buffers are managed by io_uring rather than the networking stack, with different refcounting rules. This patchset is the first step adding basic zero copy support. We will extend this iteratively with new features e.g. dynamically allocated zero copy areas, THP support, dmabuf support, improved copy fallback, general optimisations and more. In terms of netdev support, we're first targeting Broadcom bnxt. Patches aren't included since Taehee Yoo has already sent a more comprehensive patchset adding support in [1]. Google gve should already support this, and Mellanox mlx5 support is WIP pending driver changes. =========== Performance =========== Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet. Test setup: * AMD EPYC 9454 * Broadcom BCM957508 200G * Kernel v6.11 base [2] * liburing fork [3] * kperf fork [4] * 4K MTU * Single TCP flow With application thread + net rx softirq pinned to _different_ cores: +-------------------------------+ \| epoll \| io_uring \| \|-----------\|-------------------\| \| 82.2 Gbps \| 116.2 Gbps (+41%) \| +-------------------------------+ Pinned to _same_ core: +-------------------------------+ \| epoll \| io_uring \| \|-----------\|-------------------\| \| 62.6 Gbps \| 80.9 Gbps (+29%) \| +-------------------------------+ ===== Links ===== Broadcom bnxt support: [1]: https://lore.kernel.org/20241003160620.1521626-8-ap420073@gmail.com Linux kernel branch including io_uring bits: [2]: https://github.com/isilence/linux.git zcrx/v13 liburing for testing: [3]: https://github.com/isilence/liburing.git zcrx/next kperf for testing: [4]: https://git.kernel.dk/kperf.git ==================== Link: https://patch.msgid.link/20250204215622.695511-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06	netdev: add io_uring memory provider info	David Wei
	Add a nested attribute for io_uring memory provider info. For now it is empty and its presence indicates that a particular page pool or queue has an io_uring memory provider attached. $ ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get [{'id': 80, 'ifindex': 2, 'inflight': 64, 'inflight-mem': 262144, 'napi-id': 525}, {'id': 79, 'ifindex': 2, 'inflight': 320, 'inflight-mem': 1310720, 'io_uring': {}, 'napi-id': 525}, ... $ ./cli.py --spec netlink/specs/netdev.yaml --dump queue-get [{'id': 0, 'ifindex': 1, 'type': 'rx'}, {'id': 0, 'ifindex': 1, 'type': 'tx'}, {'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 514, 'type': 'rx'}, ... {'id': 12, 'ifindex': 2, 'io_uring': {}, 'napi-id': 525, 'type': 'rx'}, ... Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-6-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.14-rc2). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06	tools: ynl: add all headers to makefile deps	Jakub Kicinski
	The Makefile.deps lists uAPI headers to make the build work when system headers are older than in-tree headers. The problem doesn't occur for new headers, because system headers are not there at all. But out-of-tree YNL clone on GH also uses this header to identify header dependencies, and one day the system headers will exist, and will get out of date. So let's add the headers we missed. I don't think this is a fix, but FWIW the commits which added the missing headers are: commit 04e65df94b31 ("netlink: spec: add shaper YAML spec") commit 49922401c219 ("ethtool: separate definitions that are gonna be generated") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250205173352.446704-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06	selftests/seccomp: validate uretprobe syscall passes through seccomp	Eyal Birger
	The uretprobe syscall is implemented as a performance enhancement on x86_64 by having the kernel inject a call to it on function exit; User programs cannot call this system call explicitly. As such, this syscall is considered a kernel implementation detail and should not be filtered by seccomp. Enhance the seccomp bpf test suite to check that uretprobes can be attached to processes without the killing the process regardless of seccomp policy. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20250202162921.335813-3-eyal.birger@gmail.com [kees: Skip archs without __NR_uretprobe] Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-06	Merge tag 'net-6.14-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Interestingly the recent kmemleak improvements allowed our CI to catch a couple of percpu leaks addressed here. We (mostly Jakub, to be accurate) are working to increase review coverage over the net code-base tweaking the MAINTAINER entries. Current release - regressions: - core: harmonize tstats and dstats - ipv6: fix dst refleaks in rpl, seg6 and ioam6 lwtunnels - eth: tun: revert fix group permission check - eth: stmmac: revert "specify hardware capability value when FIFO size isn't specified" Previous releases - regressions: - udp: gso: do not drop small packets when PMTU reduces - rxrpc: fix race in call state changing vs recvmsg() - eth: ice: fix Rx data path for heavy 9k MTU traffic - eth: vmxnet3: fix tx queue race condition with XDP Previous releases - always broken: - sched: pfifo_tail_enqueue: drop new packet when sch->limit == 0 - ethtool: ntuple: fix rss + ring_cookie check - rxrpc: fix the rxrpc_connection attend queue handling Misc: - recognize Kuniyuki Iwashima as a maintainer" * tag 'net-6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) Revert "net: stmmac: Specify hardware capability value when FIFO size isn't specified" MAINTAINERS: add a sample ethtool section entry MAINTAINERS: add entry for ethtool rxrpc: Fix race in call state changing vs recvmsg() rxrpc: Fix call state set to not include the SERVER_SECURING state net: sched: Fix truncation of offloaded action statistics tun: revert fix group permission check selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog() netem: Update sch->q.qlen before qdisc_tree_reduce_backlog() selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0 pfifo_tail_enqueue: Drop new packet when sch->limit == 0 selftests: mptcp: connect: -f: no reconnect net: rose: lock the socket in rose_bind() net: atlantic: fix warning during hot unplug rxrpc: Fix the rxrpc_connection attend queue handling net: harmonize tstats and dstats selftests: drv-net: rss_ctx: don't fail reconfigure test if queue offset not supported selftests: drv-net: rss_ctx: add missing cleanup in queue reconfigure ethtool: ntuple: fix rss + ring_cookie check ethtool: rss: fix hiding unsupported fields in dumps ...
2025-02-06	tools: ynl-gen: support limits using definitions	Jakub Kicinski
	Support using defines / constants in integer checks. Carolina will need this for rate API extensions. Reported-by: Carolina Jubran <cjubran@nvidia.com> Link: https://lore.kernel.org/1e886aaf-e1eb-4f1a-b7ef-f63b350a3320@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250203215510.1288728-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-06	tools: ynl-gen: don't output external constants	Jakub Kicinski
	A definition with a "header" property is an "external" definition for C code, as in it is defined already in another C header file. Other languages will need the exact value but C codegen should not recreate it. So don't output those definitions in the uAPI header. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250203215510.1288728-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-05	selftests: forwarding: vxlan_bridge_1d: Check aging while forwarding	Ido Schimmel
	Extend the VXLAN FDB aging test case to verify that FDB entries are aged out when they only forward traffic and not refreshed by received traffic. The test fails before "vxlan: Age out FDB entries based on 'updated' time": # ./vxlan_bridge_1d.sh [...] TEST: VXLAN: Ageing of learned FDB entry [FAIL] [...] # echo $? 1 And passes after it: # ./vxlan_bridge_1d.sh [...] TEST: VXLAN: Ageing of learned FDB entry [ OK ] [...] # echo $? 0 Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-9-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05	selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog()	Cong Wang
	Integrate the test case provided by Mingi Cho into TDC. All test results: 1..4 ok 1 ca5e - Check class delete notification for ffff: ok 2 e4b7 - Check class delete notification for root ffff: ok 3 33a9 - Check ingress is not searchable on backlog update ok 4 a4b9 - Test class qlen notification Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-5-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05	selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0	Quang Le
	When limit == 0, pfifo_tail_enqueue() must drop new packet and increase dropped packets count of the qdisc. All test results: 1..16 ok 1 a519 - Add bfifo qdisc with system default parameters on egress ok 2 585c - Add pfifo qdisc with system default parameters on egress ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument ok 8 7787 - Add pfifo qdisc on egress with unsupported argument ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size ok 10 3df6 - Replace pfifo qdisc on egress with new queue size ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format ok 12 1298 - Add duplicate bfifo qdisc on egress ok 13 45a0 - Delete nonexistent bfifo qdisc ok 14 972b - Add prio qdisc on egress with invalid format for handles ok 15 4d39 - Delete bfifo qdisc twice ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0 Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05	selftests: mptcp: connect: -f: no reconnect	Matthieu Baerts (NGI0)
	The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE by closing the connection with unread bytes in the receive queue. The xdisconnect() helper was used to stop the connection, but it does more than that: it will shut it down, then wait before reconnecting to the same address. This causes the mptcp_join's "fastclose test" to fail all the time. This failure is due to a recent change, with commit 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect"), but that went unnoticed because the test is currently ignored. The recent modification only shown an existing issue: xdisconnect() doesn't need to be used here, only the shutdown() part is needed. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250204-net-mptcp-sft-conn-f-v1-1-6b470c72fffa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05	bridge: mdb: Allow replace of a host-joined group	Petr Machata
	Attempts to replace an MDB group membership of the host itself are currently bounced: # ip link add name br up type bridge vlan_filtering 1 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2 Error: bridge: Group is already joined by host. A similar operation done on a member port would succeed. Ignore the check for replacement of host group memberships as well. The bit of code that this enables is br_multicast_host_join(), which, for already-joined groups only refreshes the MC group expiration timer, which is desirable; and a userspace notification, also desirable. Change a selftest that exercises this code path from expecting a rejection to expecting a pass. The rest of MDB selftests pass without modification. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/e5c5188b9787ae806609e7ca3aa2a0a501b9b5c4.1738685648.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05	selftests: net: suppress ReST file generation when building selftests	Jakub Kicinski
	Some selftests need libynl.a. When building it try to skip generating the ReST documentation, libynl.a does not depend on them. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203214850.1282291-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05	selftests/bpf: Support dynamically linking LLVM if static is not available	Daniel Xu
	Since 67ab80a01886 ("selftests/bpf: Prefer static linking for LLVM libraries"), only statically linking test_progs is supported. However, some distros only provide a dynamically linkable LLVM. This commit adds a fallback for dynamically linking LLVM if static linking is not available. If both options are available, static linking is chosen. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/872b64e93de9a6cd6a7a10e6a5c5e7893704f743.1738276344.git.dxu@dxuuu.xyz
2025-02-05	selftests/bpf: Add a BTF verification test for kflagged type_tag	Ihor Solodrai
	Add a BTF verification test case for a type_tag with a kflag set. Type tags with a kflag are now valid. Add BTF_DECL_ATTR_ENC and BTF_TYPE_ATTR_ENC test helper macros, corresponding to *_TAG_ENC. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-7-ihor.solodrai@linux.dev
2025-02-05	bpf: Allow kind_flag for BTF type and decl tags	Ihor Solodrai
	BTF type tags and decl tags now may have info->kflag set to 1, changing the semantics of the tag. Change BTF verification to permit BTF that makes use of this feature: * remove kflag check in btf_decl_tag_check_meta(), as both values are valid * allow kflag to be set for BTF_KIND_TYPE_TAG type in btf_ref_type_check_meta() Make sure kind_flag is NOT set when checking for specific BTF tags, such as "kptr", "user" etc. Modify a selftest checking for kflag in decl_tag accordingly. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250130201239.1429648-6-ihor.solodrai@linux.dev
2025-02-05	selftests/bpf: Add a btf_dump test for type_tags	Ihor Solodrai
	Factor out common routines handling custom BTF from test_btf_dump_incremental. Then use them in the test_btf_dump_type_tags. test_btf_dump_type_tags verifies that a type tag is dumped correctly with respect to its kflag. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20250130201239.1429648-5-ihor.solodrai@linux.dev
2025-02-05	libbpf: Check the kflag of type tags in btf_dump	Ihor Solodrai
	If the kflag is set for a BTF type tag, then the tag represents an arbitrary __attribute__. Change btf_dump accordingly. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-4-ihor.solodrai@linux.dev
2025-02-05	docs/bpf: Document the semantics of BTF tags with kind_flag	Ihor Solodrai
	Explain the meaning of kind_flag in BTF type_tags and decl_tags. Update uapi btf.h kind_flag comment to reflect the changes. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-3-ihor.solodrai@linux.dev
2025-02-05	libbpf: Introduce kflag for type_tags and decl_tags in BTF	Ihor Solodrai
	Add the following functions to libbpf API: * btf__add_type_attr() * btf__add_decl_attr() These functions allow to add to BTF the type tags and decl tags with info->kflag set to 1. The kflag indicates that the tag directly encodes an __attribute__ and not a normal tag. See Documentation/bpf/btf.rst changes in the subsequent patch for details on the semantics. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-2-ihor.solodrai@linux.dev
2025-02-05	Merge tag 'v6.14-rc1' into perf-tools-next	Namhyung Kim
	To get the various fixes in the current master. Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-05	torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()	Paul E. McKenney
	Recent experience shows that the srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe() functions are not sufficiently tested. This commit therefore causes the torture.sh script's SRCU lockdep testing to use these two functions. This will cause these two functions to be regularly tested by several developers (myself included) who use torture.sh as an RCU acceptance test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05	rcutorture: Make scenario SRCU-P use srcu_read_lock_fast()	Paul E. McKenney
	This commit causes the rcutorture SRCU-P scenario use the srcu_read_lock_fast() and srcu_read_unlock_fast() functions. This will cause these two functions to be regularly tested by several developers (myself included), for example, those who use torture.sh as an RCU acceptance test. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
2025-02-05	selftests/mm: use PIDFD_SELF in guard pages test	Lorenzo Stoakes
	Now we have PIDFD_SELF available for process_madvise(), make use of it in the guard pages test. This is both more convenient and asserts that PIDFD_SELF works as expected. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/69fbbe088d3424de9983e145228459cb05a8f13d.1738268370.git.lorenzo.stoakes@oracle.com Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-05	selftests/pidfd: add tests for PIDFD_SELF_*	Lorenzo Stoakes
	Add tests to assert that PIDFD_SELF* correctly refers to the current thread and process. We explicitly test pidfd_send_signal(), however We defer testing of mm-specific functionality which uses pidfd, namely process_madvise() and process_mrelease() to mm testing (though note the latter can not be sensibly tested as it would require the testing process to be dying). We also correct the pidfd_open_test.c fields which refer to .request_mask whereas the UAPI header refers to .mask, which otherwise break the import of the UAPI header. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/7ab0e48b26ba53abf7b703df2dd11a2e99b8efb2.1738268370.git.lorenzo.stoakes@oracle.com Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-05	selftests/pidfd: add new PIDFD_SELF* defines	Christian Brauner
	They will be needed in selftests in follow-up patches. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-04	perf stat: Changes to event name uniquification	Ian Rogers
	The existing logic would disable uniquification on an evlist or enable it per evsel, this is unfortunate as uniquification is most needed when events have the same name and so the whole evlist must be considered. Change the initial disable uniquify on an evlist processing to also set a needs_uniquify flag, for cases like the matching event names. This must be done as an initial pass as uniquification of an event name will change the behavior of the check. Keep the per counter uniquification but now only uniquify event names when the needs_uniquify flag is set. Before this change a hwmon like temp1 wouldn't be uniquified and afterwards it will (ie the PMU is added to the temp1 event's name). Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-04	perf stat: Don't merge counters purely on name	Ian Rogers
	Counter merging was added in commit 942c5593393d ("perf stat: Add perf_stat_merge_counters()"), however, it merges events with the same name on different PMUs regardless of whether the different PMUs are actually of the same type (ie they differ only in the suffix on the PMU). For hwmon events there may be a temp1 event on every PMU, but the PMU names are all unique and don't differ just by a suffix. The merging is over eager and will merge all the hwmon counters together meaning an aggregated and very large temp1 value is shown. The same would be true for say cache events and memory controller events where the PMUs differ but the event names are the same. Fix the problem by correctly saying two PMUs alias when they differ only by suffix. Note, there is an overlap with evsel's merged_stat with aggregation and the evsel's metric_leader where aggregation happens for metrics. Fixes: 942c5593393d ("perf stat: Add perf_stat_merge_counters()") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-04	perf pmu: Rename name matching for no suffix or wildcard variants	Ian Rogers
	Wildcard PMU naming will match a name like pmu_1 to a PMU name like pmu_10 but not to a PMU name like pmu_2 as the suffix forms part of the match. No suffix matching will match pmu_10 to either pmu_1 or pmu_2. Add or rename matching functions on PMU to make it clearer what kind of matching is being performed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-04	perf pmus: Restructure pmu_read_sysfs to scan fewer PMUs	Ian Rogers
	Rather than scanning core or all PMUs, allow pmu_read_sysfs to read some combination of core, other, hwmon and tool PMUs. The PMUs that should be read and are already read are held as bitmaps. It is known that a "hwmon_" prefix is necessary for a hwmon PMU's name, similarly with "tool", so only scan those PMUs in situations the PMU name or the PMU's type number make sense to. The number of openat system calls reduces from 276 to 98 for a hwmon event. The number of openats for regular perf events isn't changed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-04	perf evsel: Reduce scanning core PMUs in is_hybrid	Ian Rogers
	evsel__is_hybrid returns true if there are multiple core PMUs and the evsel is for a core PMU. Determining the number of core PMUs can require loading/scanning PMUs. There's no point doing the scanning if evsel for the is_hybrid test isn't core so reorder the tests to reduce PMU scanning. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20250201074320.746259-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-05	tools/counter: add direction change event to watcher	David Lechner
	Add support for the new COUNTER_EVENT_DIRECTION_CHANGE to the counter_watch_events tool. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20250110-counter-ti-eqep-add-direction-support-v2-3-c6b6f96d2db9@baylibre.com Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2025-02-05	tools/counter: gitignore counter_watch_events	David Lechner
	Ignore the executable counter_watch_events when building in-tree. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20250110-counter-ti-eqep-add-direction-support-v2-1-c6b6f96d2db9@baylibre.com Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2025-02-04	netconsole: selftest: Add test for fragmented messages	Breno Leitao
	Add a new selftest to verify netconsole's handling of messages that exceed the packet size limit and require fragmentation. The test sends messages with varying sizes and userdata, validating that: 1. Large messages are correctly fragmented and reassembled 2. Userdata fields are properly preserved across fragments 3. Messages work correctly with and without kernel release version appending The test creates a networking environment using netdevsim, sends messages through /dev/kmsg, and verifies the received fragments maintain message integrity. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04	perf test: Fix Hwmon PMU test endianess issue	Thomas Richter
	perf test 11 hwmon fails on s390 with this error # ./perf test -Fv 11 --- start --- ---- end ---- 11.1: Basic parsing test : Ok --- start --- Testing 'temp_test_hwmon_event1' Using CPUID IBM,3931,704,A01,3.7,002f temp_test_hwmon_event1 -> hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/ FAILED tests/hwmon_pmu.c:189 Unexpected config for 'temp_test_hwmon_event1', 292470092988416 != 655361 ---- end ---- 11.2: Parsing without PMU name : FAILED! --- start --- Testing 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/' FAILED tests/hwmon_pmu.c:189 Unexpected config for 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/', 292470092988416 != 655361 ---- end ---- 11.3: Parsing with PMU name : FAILED! # The root cause is in member test_event::config which is initialized to 0xA0001 or 655361. During event parsing a long list event parsing functions are called and end up with this gdb call stack: #0 hwmon_pmu__config_term (hwm=0x168dfd0, attr=0x3ffffff5ee8, term=0x168db60, err=0x3ffffff81c8) at util/hwmon_pmu.c:623 #1 hwmon_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, err=0x3ffffff81c8) at util/hwmon_pmu.c:662 #2 0x00000000012f870c in perf_pmu__config_terms (pmu=0x168dfd0, attr=0x3ffffff5ee8, terms=0x3ffffff5ea8, zero=false, apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1519 #3 0x00000000012f88a4 in perf_pmu__config (pmu=0x168dfd0, attr=0x3ffffff5ee8, head_terms=0x3ffffff5ea8, apply_hardcoded=false, err=0x3ffffff81c8) at util/pmu.c:1545 #4 0x00000000012680c4 in parse_events_add_pmu (parse_state=0x3ffffff7fb8, list=0x168dc00, pmu=0x168dfd0, const_parsed_terms=0x3ffffff6090, auto_merge_stats=true, alternate_hw_config=10) at util/parse-events.c:1508 #5 0x00000000012684c6 in parse_events_multi_pmu_add (parse_state=0x3ffffff7fb8, event_name=0x168ec10 "temp_test_hwmon_event1", hw_config=10, const_parsed_terms=0x0, listp=0x3ffffff6230, loc_=0x3ffffff70e0) at util/parse-events.c:1592 #6 0x00000000012f0e4e in parse_events_parse (_parse_state=0x3ffffff7fb8, scanner=0x16878c0) at util/parse-events.y:293 #7 0x00000000012695a0 in parse_events__scanner (str=0x3ffffff81d8 "temp_test_hwmon_event1", input=0x0, parse_state=0x3ffffff7fb8) at util/parse-events.c:1867 #8 0x000000000126a1e8 in __parse_events (evlist=0x168b580, str=0x3ffffff81d8 "temp_test_hwmon_event1", pmu_filter=0x0, err=0x3ffffff81c8, fake_pmu=false, warn_if_reordered=true, fake_tp=false) at util/parse-events.c:2136 #9 0x00000000011e36aa in parse_events (evlist=0x168b580, str=0x3ffffff81d8 "temp_test_hwmon_event1", err=0x3ffffff81c8) at /root/linux/tools/perf/util/parse-events.h:41 #10 0x00000000011e3e64 in do_test (i=0, with_pmu=false, with_alias=false) at tests/hwmon_pmu.c:164 #11 0x00000000011e422c in test__hwmon_pmu (with_pmu=false) at tests/hwmon_pmu.c:219 #12 0x00000000011e431c in test__hwmon_pmu_without_pmu (test=0x1610368 <suite.hwmon_pmu>, subtest=1) at tests/hwmon_pmu.c:23 where the attr::config is set to value 292470092988416 or 0x10a0000000000 in line 625 of file ./util/hwmon_pmu.c: attr->config = key.type_and_num; However member key::type_and_num is defined as union and bit field: union hwmon_pmu_event_key { long type_and_num; struct { int num :16; enum hwmon_type type :8; }; }; s390 is big endian and Intel is little endian architecture. The events for the hwmon dummy pmu have num = 1 or num = 2 and type is set to HWMON_TYPE_TEMP (which is 10). On s390 this assignes member key::type_and_num the value of 0x10a0000000000 (which is 292470092988416) as shown in above trace output. Fix this and export the structure/union hwmon_pmu_event_key so the test shares the same implementation as the event parsing functions for union and bit fields. This should avoid endianess issues on all platforms. Output after: # ./perf test -F 11 11.1: Basic parsing test : Ok 11.2: Parsing without PMU name : Ok 11.3: Parsing with PMU name : Ok # Fixes: 531ee0fd4836 ("perf test: Add hwmon "PMU" test") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250131112400.568975-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-04	cxl: Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'	Dan Williams
	The pending efforts to add CXL Accelerator (type-2) device [1], and Dynamic Capacity (DCD) support [2], tripped on the no-longer-fit-for-purpose design in the CXL subsystem for tracking device-physical-address (DPA) metadata. Trip hazards include: - CXL Memory Devices need to consider a PMEM partition, but Accelerator devices with CXL.mem likely do not in the common case. - CXL Memory Devices enumerate DPA through Memory Device mailbox commands like Partition Info, Accelerators devices do not. - CXL Memory Devices that support DCD support more than 2 partitions. Some of the driver algorithms are awkward to expand to > 2 partition cases. - DPA performance data is a general capability that can be shared with accelerators, so tracking it in 'struct cxl_memdev_state' is no longer suitable. - Hardcoded assumptions around the PMEM partition always being index-1 if RAM is zero-sized or PMEM is zero sized. - 'enum cxl_decoder_mode' is sometimes a partition id and sometimes a memory property, it should be phased in favor of a partition id and the memory property comes from the partition info. Towards cleaning up those issues and allowing a smoother landing for the aforementioned pending efforts, introduce a 'struct cxl_dpa_partition' array to 'struct cxl_dev_state', and 'struct cxl_range_info' as a shared way for Memory Devices and Accelerators to initialize the DPA information in 'struct cxl_dev_state'. For now, split a new cxl_dpa_setup() from cxl_mem_create_range_info() to get the new data structure initialized, and cleanup some qos_class init. Follow on patches will go further to use the new data structure to cleanup algorithms that are better suited to loop over all possible partitions. cxl_dpa_setup() follows the locking expectations of mutating the device DPA map, and is suitable for Accelerator drivers to use. Accelerators likely only have one hardcoded 'ram' partition to convey to the cxl_core. Link: http://lore.kernel.org/20241230214445.27602-1-alejandro.lucero-palau@amd.com [1] Link: http://lore.kernel.org/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com [2] Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alejandro Lucero <alucerop@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Alejandro Lucero <alucerop@amd.com> Link: https://patch.msgid.link/173864305827.668823.13978794102080021276.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-04	cxl: Introduce to_{ram,pmem}_{res,perf}() helpers	Dan Williams
	In preparation for consolidating all DPA partition information into an array of DPA metadata, introduce helpers that hide the layout of the current data. I.e. make the eventual replacement of ->ram_res, ->pmem_res, ->ram_perf, and ->pmem_perf with a new DPA metadata array a no-op for code paths that consume that information, and reduce the noise of follow-on patches. The end goal is to consolidate all DPA information in 'struct cxl_dev_state', but for now the helpers just make it appear that all DPA metadata is relative to @cxlds. As the conversion to generic partition metadata walking is completed, these helpers will naturally be eliminated, or reduced in scope. Cc: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Tested-by: Alejandro Lucero <alucerop@amd.com> Link: https://patch.msgid.link/173864305238.668823.16553986866633608541.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-04	sched_ext: Print core event count in scx_qmap scheduler	Changwoo Min
	Modify the scx_qmap scheduler to print the core event counter every second. Signed-off-by: Changwoo Min <changwoo@igalia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-04	sched_ext: Print core event count in scx_central scheduler	Changwoo Min
	Modify the scx_central scheduler to print the core event counter every second. Signed-off-by: Changwoo Min <changwoo@igalia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-02-04	sched_ext: Add scx_bpf_events() and scx_read_event() for BPF schedulers	Changwoo Min
	scx_bpf_events() is added to the header files so the BPF scheduler can use it. Also, scx_read_event() is added to read an event type in a compatible way. Signed-off-by: Changwoo Min <changwoo@igalia.com> Signed-off-by: Tejun Heo <tj@kernel.org>