summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-09-11net: hsr: Remove interlink_sequence_nr.Eric Dumazet
Remove interlink_sequence_nr which is unused. [ bigeasy: split out from Eric's patch ]. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240906132816.657485-3-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: hsr: Use the seqnr lock for frames received via interlink port.Sebastian Andrzej Siewior
syzbot reported that the seqnr_lock is not acquire for frames received over the interlink port. In the interlink case a new seqnr is generated and assigned to the frame. Frames, which are received over the slave port have already a sequence number assigned so the lock is not required. Acquire the hsr_priv::seqnr_lock during in the invocation of hsr_forward_skb() if a packet has been received from the interlink port. Reported-by: syzbot+3d602af7549af539274e@syzkaller.appspotmail.com Closes: https://groups.google.com/g/syzkaller-bugs/c/KppVvGviGg4/m/EItSdCZdBAAJ Fixes: 5055cccfc2d1c ("net: hsr: Provide RedBox support (HSR-SAN)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukasz Majewski <lukma@denx.de> Tested-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240906132816.657485-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge branch 'selftests-mptcp-misc-small-fixes'Jakub Kicinski
Matthieu Baerts says: ==================== selftests: mptcp: misc. small fixes Here are some various fixes for the MPTCP selftests. Patch 1 fixes a recently modified test to continue to work as expected on older kernels. This is a fix for a recent fix that can be backported up to v5.15. Patch 2 and 3 include dependences when exporting or installing the tests. Two fixes for v6.11-rc1. ==================== Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-0-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11selftests: mptcp: include net_helper.sh fileMatthieu Baerts (NGI0)
Similar to the previous commit, the net_helper.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: 1af3bc912eac ("selftests: mptcp: lib: use wait_local_port_listen helper") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-3-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11selftests: mptcp: include lib.sh fileMatthieu Baerts (NGI0)
The lib.sh file from the parent directory is used by the MPTCP selftests and it needs to be present when running the tests. This file then needs to be listed in the Makefile to be included when exporting or installing the tests, e.g. with: make -C tools/testing/selftests \ TARGETS=net/mptcp \ install INSTALL_PATH=$KSFT_INSTALL_PATH cd $KSFT_INSTALL_PATH ./run_kselftest.sh -c net/mptcp Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-2-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11selftests: mptcp: join: restrict fullmesh endp on 1st sfMatthieu Baerts (NGI0)
A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15. Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so. Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12netfilter: nft_socket: make cgroupsv2 matching work with namespacesFlorian Westphal
When running in container environmment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup. Example: In container: % stat /sys//fs/cgroup/ Device: 0,21 Inode: 2214 .. % stat /sys/fs/cgroup/foo Device: 0,21 Inode: 2264 .. The expectation would be for: nft add rule .. socket cgroupv2 level 1 "foo" counter to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs". However, 'level 3' is needed to make this work. Seen from initial namespace, the complete hierarchy is: % stat /sys/fs/cgroup/system.slice/docker-.../foo Device: 0,21 Inode: 2264 .. i.e. hierarchy is 0 1 2 3 / -> system.slice -> docker-1... -> foo ... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so compare with 2264 ("foo" cgroup id) will not match. Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container. In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3). v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot) Cc: cgroups@vger.kernel.org Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-12netfilter: nft_socket: fix sk refcount leaksFlorian Westphal
We must put 'sk' reference before returning. Fixes: 039b1f4f24ec ("netfilter: nft_socket: fix erroneous socket assignment") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-11blk_iocost: make read-only static array vrate_adj_pct constColin Ian King
The static array vrate_adj_pct is read-only, so make it const as well. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20240911214124.197403-1-colin.i.king@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11i2c: isch: Add missed 'else'Andy Shevchenko
In accordance with the existing comment and code analysis it is quite likely that there is a missed 'else' when adapter times out. Add it. Fixes: 5bc1200852c3 ("i2c: Add Intel SCH SMBus support") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.27+ Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-11spi: nxp-fspi: fix the KASAN report out-of-bounds bugHan Xu
Change the memcpy length to fix the out-of-bounds issue when writing the data that is not 4 byte aligned to TX FIFO. To reproduce the issue, write 3 bytes data to NOR chip. dd if=3b of=/dev/mtd0 [ 36.926103] ================================================================== [ 36.933409] BUG: KASAN: slab-out-of-bounds in nxp_fspi_exec_op+0x26ec/0x2838 [ 36.940514] Read of size 4 at addr ffff00081037c2a0 by task dd/455 [ 36.946721] [ 36.948235] CPU: 3 UID: 0 PID: 455 Comm: dd Not tainted 6.11.0-rc5-gc7b0e37c8434 #1070 [ 36.956185] Hardware name: Freescale i.MX8QM MEK (DT) [ 36.961260] Call trace: [ 36.963723] dump_backtrace+0x90/0xe8 [ 36.967414] show_stack+0x18/0x24 [ 36.970749] dump_stack_lvl+0x78/0x90 [ 36.974451] print_report+0x114/0x5cc [ 36.978151] kasan_report+0xa4/0xf0 [ 36.981670] __asan_report_load_n_noabort+0x1c/0x28 [ 36.986587] nxp_fspi_exec_op+0x26ec/0x2838 [ 36.990800] spi_mem_exec_op+0x8ec/0xd30 [ 36.994762] spi_mem_no_dirmap_read+0x190/0x1e0 [ 36.999323] spi_mem_dirmap_write+0x238/0x32c [ 37.003710] spi_nor_write_data+0x220/0x374 [ 37.007932] spi_nor_write+0x110/0x2e8 [ 37.011711] mtd_write_oob_std+0x154/0x1f0 [ 37.015838] mtd_write_oob+0x104/0x1d0 [ 37.019617] mtd_write+0xb8/0x12c [ 37.022953] mtdchar_write+0x224/0x47c [ 37.026732] vfs_write+0x1e4/0x8c8 [ 37.030163] ksys_write+0xec/0x1d0 [ 37.033586] __arm64_sys_write+0x6c/0x9c [ 37.037539] invoke_syscall+0x6c/0x258 [ 37.041327] el0_svc_common.constprop.0+0x160/0x22c [ 37.046244] do_el0_svc+0x44/0x5c [ 37.049589] el0_svc+0x38/0x78 [ 37.052681] el0t_64_sync_handler+0x13c/0x158 [ 37.057077] el0t_64_sync+0x190/0x194 [ 37.060775] [ 37.062274] Allocated by task 455: [ 37.065701] kasan_save_stack+0x2c/0x54 [ 37.069570] kasan_save_track+0x20/0x3c [ 37.073438] kasan_save_alloc_info+0x40/0x54 [ 37.077736] __kasan_kmalloc+0xa0/0xb8 [ 37.081515] __kmalloc_noprof+0x158/0x2f8 [ 37.085563] mtd_kmalloc_up_to+0x120/0x154 [ 37.089690] mtdchar_write+0x130/0x47c [ 37.093469] vfs_write+0x1e4/0x8c8 [ 37.096901] ksys_write+0xec/0x1d0 [ 37.100332] __arm64_sys_write+0x6c/0x9c [ 37.104287] invoke_syscall+0x6c/0x258 [ 37.108064] el0_svc_common.constprop.0+0x160/0x22c [ 37.112972] do_el0_svc+0x44/0x5c [ 37.116319] el0_svc+0x38/0x78 [ 37.119401] el0t_64_sync_handler+0x13c/0x158 [ 37.123788] el0t_64_sync+0x190/0x194 [ 37.127474] [ 37.128977] The buggy address belongs to the object at ffff00081037c2a0 [ 37.128977] which belongs to the cache kmalloc-8 of size 8 [ 37.141177] The buggy address is located 0 bytes inside of [ 37.141177] allocated 3-byte region [ffff00081037c2a0, ffff00081037c2a3) [ 37.153465] [ 37.154971] The buggy address belongs to the physical page: [ 37.160559] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x89037c [ 37.168596] flags: 0xbfffe0000000000(node=0|zone=2|lastcpupid=0x1ffff) [ 37.175149] page_type: 0xfdffffff(slab) [ 37.179021] raw: 0bfffe0000000000 ffff000800002500 dead000000000122 0000000000000000 [ 37.186788] raw: 0000000000000000 0000000080800080 00000001fdffffff 0000000000000000 [ 37.194553] page dumped because: kasan: bad access detected [ 37.200144] [ 37.201647] Memory state around the buggy address: [ 37.206460] ffff00081037c180: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc [ 37.213701] ffff00081037c200: fa fc fc fc 05 fc fc fc 03 fc fc fc 02 fc fc fc [ 37.220946] >ffff00081037c280: 06 fc fc fc 03 fc fc fc fc fc fc fc fc fc fc fc [ 37.228186] ^ [ 37.232473] ffff00081037c300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 37.239718] ffff00081037c380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 37.246962] ================================================================== [ 37.254394] Disabling lock debugging due to kernel taint 0+1 records in 0+1 records out 3 bytes copied, 0.335911 s, 0.0 kB/s Fixes: a5356aef6a90 ("spi: spi-mem: Add driver for NXP FlexSPI controller") Cc: stable@kernel.org Signed-off-by: Han Xu <han.xu@nxp.com> Link: https://patch.msgid.link/20240911211146.3337068-1-han.xu@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-11docs/bpf: Add constant values for linkagesWill Hawkins
Make the values of the symbolic constants that define the valid linkages for functions and variables explicit. Signed-off-by: Will Hawkins <hawkinsw@obs.cr> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240911055033.2084881-1-hawkinsw@obs.cr
2024-09-11Merge tag 'wireless-next-2024-09-11' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.12 The last -next "new features" pull request for v6.12. The stack now supports DFS on MLO but otherwise nothing really standing out. Major changes: cfg80211/mac80211 * EHT rate support in AQL airtime * DFS support for MLO rtw89 * complete BT-coexistence code for RTL8852BT * RTL8922A WoWLAN net-detect support * tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits) wifi: brcmfmac: cfg80211: Convert comma to semicolon wifi: rsi: Remove an unused field in struct rsi_debugfs wifi: libertas: Cleanup unused declarations wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_bus_probe() wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_sdio_probe() wifi: wilc1000: fix potential RCU dereference issue in wilc_parse_join_bss_param wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext() wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop() wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors wifi: cfg80211: fix kernel-doc for per-link data wifi: mt76: mt7925: replace chan config with extend txpower config for clc wifi: mt76: mt7925: fix a potential array-index-out-of-bounds issue for clc wifi: mt76: mt7615: check devm_kasprintf() returned value wifi: mt76: mt7925: convert comma to semicolon wifi: mt76: mt7925: fix a potential association failure upon resuming wifi: mt76: Avoid multiple -Wflex-array-member-not-at-end warnings wifi: mt76: mt7921: Check devm_kasprintf() returned value wifi: mt76: mt7915: check devm_kasprintf() returned value wifi: mt76: mt7915: avoid long MCU command timeouts during SER wifi: mt76: mt7996: fix uninitialized TLV data ... ==================== Link: https://patch.msgid.link/20240911084147.A205DC4AF0F@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11bpf: Use fake pt_regs when doing bpf syscall tracepoint tracingYonghong Song
Salvatore Benedetto reported an issue that when doing syscall tracepoint tracing the kernel stack is empty. For example, using the following command line bpftrace -e 'tracepoint:syscalls:sys_enter_read { print("Kernel Stack\n"); print(kstack()); }' bpftrace -e 'tracepoint:syscalls:sys_exit_read { print("Kernel Stack\n"); print(kstack()); }' the output for both commands is === Kernel Stack === Further analysis shows that pt_regs used for bpf syscall tracepoint tracing is from the one constructed during user->kernel transition. The call stack looks like perf_syscall_enter+0x88/0x7c0 trace_sys_enter+0x41/0x80 syscall_trace_enter+0x100/0x160 do_syscall_64+0x38/0xf0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The ip address stored in pt_regs is from user space hence no kernel stack is printed. To fix the issue, kernel address from pt_regs is required. In kernel repo, there are already a few cases like this. For example, in kernel/trace/bpf_trace.c, several perf_fetch_caller_regs(fake_regs_ptr) instances are used to supply ip address or use ip address to construct call stack. Instead of allocate fake_regs in the stack which may consume a lot of bytes, the function perf_trace_buf_alloc() in perf_syscall_{enter, exit}() is leveraged to create fake_regs, which will be passed to perf_call_bpf_{enter,exit}(). For the above bpftrace script, I got the following output with this patch: for tracepoint:syscalls:sys_enter_read === Kernel Stack syscall_trace_enter+407 syscall_trace_enter+407 do_syscall_64+74 entry_SYSCALL_64_after_hwframe+75 === and for tracepoint:syscalls:sys_exit_read === Kernel Stack syscall_exit_work+185 syscall_exit_work+185 syscall_exit_to_user_mode+305 do_syscall_64+118 entry_SYSCALL_64_after_hwframe+75 === Reported-by: Salvatore Benedetto <salvabenedetto@meta.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240910214037.3663272-1-yonghong.song@linux.dev
2024-09-11perf trace: Mark the 'head' arg in the set_robust_list syscall as coming ↵Arnaldo Carvalho de Melo
from user space With that it uses the generic BTF based pretty printer: This one we need to think about, not being acquainted with this syscall, should we _traverse_ that list somehow? Would that be useful? root@number:~# perf trace -e set_robust_list sleep 1 0.000 ( 0.004 ms): sleep/1206493 set_robust_list(head: (struct robust_list_head){.list = (struct robust_list){.next = (struct robust_list *)0x7f48a9a02a20,},.futex_offset = (long int)-32,}, len: 24) = root@number:~# strace prints the default integer args: root@number:~# strace -e set_robust_list sleep 1 set_robust_list(0x7efd99559a20, 24) = 0 +++ exited with 0 +++ root@number:~# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org Link: https://lore.kernel.org/lkml/ZuH6MquMraBvODRp@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11Merge branch 'bpf-add-percpu-map-value-size-check'Andrii Nakryiko
Tao Chen says: ==================== bpf: Add percpu map value size check Check percpu map value size first and add the test case in selftest. Change list: - v2 -> v3: - use bpf_map_create API and mv test case in map_percpu_stats.c - v1 -> v2: - round up map value size with 8 bytes in patch 1 - add selftest case in patch 2 ==================== Link: https://lore.kernel.org/r/20240910144111.1464912-1-chen.dylane@gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-09-11bpf/selftests: Check errno when percpu map value size exceedsTao Chen
This test case checks the errno message when percpu map value size exceeds PCPU_MIN_UNIT_SIZE. root@debian:~# ./test_maps ... test_map_percpu_stats_hash_of_maps:PASS test_map_percpu_stats_map_value_size:PASS test_sk_storage_map:PASS Signed-off-by: Jinke Han <jinkehan@didiglobal.com> Signed-off-by: Tao Chen <chen.dylane@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240910144111.1464912-3-chen.dylane@gmail.com
2024-09-11bpf: Check percpu map value size firstTao Chen
Percpu map is often used, but the map value size limit often ignored, like issue: https://github.com/iovisor/bcc/issues/2519. Actually, percpu map value size is bound by PCPU_MIN_UNIT_SIZE, so we can check the value size whether it exceeds PCPU_MIN_UNIT_SIZE first, like percpu map of local_storage. Maybe the error message seems clearer compared with "cannot allocate memory". Signed-off-by: Jinke Han <jinkehan@didiglobal.com> Signed-off-by: Tao Chen <chen.dylane@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
2024-09-11i2c: xiic: Try re-initialization on bus busy timeoutRobert Hancock
In the event that the I2C bus was powered down when the I2C controller driver loads, or some spurious pulses occur on the I2C bus, it's possible that the controller detects a spurious I2C "start" condition. In this situation it may continue to report the bus is busy indefinitely and block the controller from working. The "single-master" DT flag can be specified to disable bus busy checks entirely, but this may not be safe to use in situations where other I2C masters may potentially exist. In the event that the controller reports "bus busy" for too long when starting a transaction, we can try reinitializing the controller to see if the busy condition clears. This allows recovering from this scenario. Fixes: e1d5b6598cdc ("i2c: Add support for Xilinx XPS IIC Bus Interface") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Cc: <stable@vger.kernel.org> # v2.6.34+ Reviewed-by: Manikanta Guntupalli <manikanta.guntupalli@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-11i2c: xiic: Wait for TX empty to avoid missed TX NAKsRobert Hancock
Frequently an I2C write will be followed by a read, such as a register address write followed by a read of the register value. In this driver, when the TX FIFO half empty interrupt was raised and it was determined that there was enough space in the TX FIFO to send the following read command, it would do so without waiting for the TX FIFO to actually empty. Unfortunately it appears that in some cases this can result in a NAK that was raised by the target device on the write, such as due to an unsupported register address, being ignored and the subsequent read being done anyway. This can potentially put the I2C bus into an invalid state and/or result in invalid read data being processed. To avoid this, once a message has been fully written to the TX FIFO, wait for the TX FIFO empty interrupt before moving on to the next message, to ensure NAKs are handled properly. Fixes: e1d5b6598cdc ("i2c: Add support for Xilinx XPS IIC Bus Interface") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Cc: <stable@vger.kernel.org> # v2.6.34+ Reviewed-by: Manikanta Guntupalli <manikanta.guntupalli@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-11sock_map: Add a cond_resched() in sock_hash_free()Eric Dumazet
Several syzbot soft lockup reports all have in common sock_hash_free() If a map with a large number of buckets is destroyed, we need to yield the cpu when needed. Fixes: 75e68e5bf2c7 ("bpf, sockhash: Synchronize delete from bucket list on map free") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240906154449.3742932-1-edumazet@google.com
2024-09-11perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user spaceArnaldo Carvalho de Melo
With that it uses the generic BTF based pretty printer: root@number:~# grep -w rseq /sys/kernel/tracing/events/syscalls/sys_enter_rseq/format field:struct rseq * rseq; offset:16; size:8; signed:0; print fmt: "rseq: 0x%08lx, rseq_len: 0x%08lx, flags: 0x%08lx, sig: 0x%08lx", ((unsigned long)(REC->rseq)), ((unsigned long)(REC->rseq_len)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->sig)) root@number:~# Before: root@number:~# perf trace -e rseq 0.000 ( 0.017 ms): Isolated Web C/1195452 rseq(rseq: 0x7ff0ecfe6fe0, rseq_len: 32, sig: 1392848979) = 0 74.018 ( 0.006 ms): :1195453/1195453 rseq(rseq: 0x7f2af20fffe0, rseq_len: 32, sig: 1392848979) = 0 1817.220 ( 0.009 ms): Isolated Web C/1195454 rseq(rseq: 0x7f5c9ec7dfe0, rseq_len: 32, sig: 1392848979) = 0 2515.526 ( 0.034 ms): :1195455/1195455 rseq(rseq: 0x7f61503fffe0, rseq_len: 32, sig: 1392848979) = 0 ^Croot@number:~# After: root@number:~# perf trace -e rseq 0.000 ( 0.019 ms): Isolated Web C/1197258 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)4,.cpu_id = (__u32)4,.mm_cid = (__u32)5,}, rseq_len: 32, sig: 1392848979) = 0 1663.835 ( 0.019 ms): Isolated Web C/1197259 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)24,.cpu_id = (__u32)24,.mm_cid = (__u32)2,}, rseq_len: 32, sig: 1392848979) = 0 4750.444 ( 0.018 ms): Isolated Web C/1197260 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)8,.cpu_id = (__u32)8,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0 4994.132 ( 0.018 ms): Isolated Web C/1197261 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)10,.cpu_id = (__u32)10,.mm_cid = (__u32)1,}, rseq_len: 32, sig: 1392848979) = 0 4997.578 ( 0.011 ms): Isolated Web C/1197263 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)16,.cpu_id = (__u32)16,.mm_cid = (__u32)4,}, rseq_len: 32, sig: 1392848979) = 0 4997.462 ( 0.014 ms): Isolated Web C/1197262 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)17,.cpu_id = (__u32)17,.mm_cid = (__u32)3,}, rseq_len: 32, sig: 1392848979) = 0 ^Croot@number:~# We'll probably need to come up with some way for using the BTF info to synthesize a test that then gets used and captures the output of the 'perf trace' output to check if the arguments are the ones synthesized, randomically, for now, lets make do manually: root@number:~# cat ~acme/c/rseq.c #include <sys/syscall.h> /* Definition of SYS_* constants */ #include <linux/rseq.h> #include <errno.h> #include <string.h> #include <unistd.h> #include <stdint.h> #include <stdio.h> /* Provide own rseq stub because glibc doesn't */ __attribute__((weak)) int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig) { return syscall(SYS_rseq, rseq, rseq_len, flags, sig); } int main(int argc, char *argv[]) { struct rseq rseq = { .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }; int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf); printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno)); return err; } root@number:~# perf trace -e rseq ~acme/c/rseq sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) 0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument) root@number:~#root@number:~# cat ~acme/c/rseq.c #include <sys/syscall.h> /* Definition of SYS_* constants */ #include <linux/rseq.h> #include <errno.h> #include <string.h> #include <unistd.h> #include <stdint.h> #include <stdio.h> /* Provide own rseq stub because glibc doesn't */ __attribute__((weak)) int sys_rseq(struct rseq *rseq, __u32 rseq_len, int flags, __u32 sig) { return syscall(SYS_rseq, rseq, rseq_len, flags, sig); } int main(int argc, char *argv[]) { struct rseq rseq = { .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }; int err = sys_rseq(&rseq, sizeof(rseq), 98765, 0xdeadbeaf); printf("sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, %d, 0) = %d (%s)\n", sizeof(rseq), err, strerror(errno)); return err; } root@number:~# perf trace -e rseq ~acme/c/rseq sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) 0.000 ( 0.003 ms): rseq/1200640 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.064 ( 0.001 ms): rseq/1200640 rseq(rseq: (struct rseq){.cpu_id_start = (__u32)12,.cpu_id = (__u32)34,.rseq_cs = (__u64)56,.flags = (__u32)78,.node_id = (__u32)90,.mm_cid = (__u32)12,}, rseq_len: 32, flags: 98765, sig: 3735928495) = -1 EINVAL (Invalid argument) root@number:~# Interesting, glibc seems to be using rseq here, as in addition to the totally fake one this test case uses, we have this one, around these other syscalls: 0.175 ( 0.001 ms): rseq/1201095 set_tid_address(tidptr: 0x7f6def759a10) = 1201095 (rseq) 0.177 ( 0.001 ms): rseq/1201095 set_robust_list(head: 0x7f6def759a20, len: 24) = 0 0.178 ( 0.001 ms): rseq/1201095 rseq(rseq: (struct rseq){}, rseq_len: 32, sig: 1392848979) = 0.231 ( 0.005 ms): rseq/1201095 mprotect(start: 0x7f6def93f000, len: 16384, prot: READ) = 0 0.238 ( 0.003 ms): rseq/1201095 mprotect(start: 0x403000, len: 4096, prot: READ) = 0 0.244 ( 0.004 ms): rseq/1201095 mprotect(start: 0x7f6def99c000, len: 8192, prot: READ) Matches strace (well, not really as the strace in fedora:40 doesn't know about rseq, printing just integer values in hex): set_robust_list(0x7fbc6acc7a20, 24) = 0 rseq(0x7fbc6acc8060, 0x20, 0, 0x53053053) = 0 mprotect(0x7fbc6aead000, 16384, PROT_READ) = 0 mprotect(0x403000, 4096, PROT_READ) = 0 mprotect(0x7fbc6af0a000, 8192, PROT_READ) = 0 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 munmap(0x7fbc6aebd000, 81563) = 0 rseq(0x7fff15bb9920, 0x20, 0x181cd, 0xdeadbeaf) = -1 EINVAL (Invalid argument) fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x9), ...}) = 0 getrandom("\xd0\x34\x97\x17\x61\xc2\x2b\x10", 8, GRND_NONBLOCK) = 8 brk(NULL) = 0x18ff4000 brk(0x19015000) = 0x19015000 write(1, "sys_rseq({ .cpu_id_start = 12, ."..., 136sys_rseq({ .cpu_id_start = 12, .cpu_id = 34, .rseq_cs = 56, .flags = 78, .node_id = 90, .mm_cid = 12, }, 32, 0) = -1 (Invalid argument) ) = 136 exit_group(-1) = ? +++ exited with 255 +++ root@number:~# And also the focus for the v6.13 should be to have a better, strace like BTF pretty printer as one of the outputs we can get from the libbpf BTF dumper. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZuH2K1LLt1pIDkbd@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-11Merge branches 'acpi-video', 'acpi-resource', 'acpi-pad' and 'acpi-misc'Rafael J. Wysocki
Merge ACPI backlight (video) driver update, ACPI resource management updates, an ACPI processor aggregator device (PAD) driver fix, and miscellaneous ACPI updates for 6.12-rc1: - Add force_vendor quirk for Panasonic Toughbook CF-18 in the ACPI backlight driver (Hans de Goede). - Make the DMI checks related to backlight handling on Lenovo Yoga Tab 3 X90F less strict (Hans de Goede). - Enforce native backlight handling on Apple MacbookPro9,2 (Esther Shimanovich). - Add IRQ override quirks for Asus Vivobook Go E1404GAB and MECHREV GM7XG0M, and refine the TongFang GMxXGxx quirk (Li Chen, Tamim Khan, Werner Sembach). - Fix crash in exit_round_robin() in the ACPI processor aggregator device (PAD) driver (Seiji Nishikawa). - Define and use symbols for device and class name lengths in the ACPI bus type code and make the code use strscpy() instead of strcpy() in several places (Muhammad Qasim Abdul Majeed). * acpi-video: ACPI: video: Add force_vendor quirk for Panasonic Toughbook CF-18 ACPI: x86: Make Lenovo Yoga Tab 3 X90F DMI match less strict ACPI: video: Make Lenovo Yoga Tab 3 X90F DMI match less strict ACPI: video: force native for Apple MacbookPro9,2 * acpi-resource: ACPI: resource: Add another DMI match for the TongFang GMxXGxx ACPI: resource: Skip IRQ override on Asus Vivobook Go E1404GAB ACPI: resource: Do IRQ override on MECHREV GM7XG0M * acpi-pad: ACPI: PAD: fix crash in exit_round_robin() * acpi-misc: ACPI: button: Use strscpy() instead of strcpy() ACPI: bus: Define and use symbols for device and class name lengths ACPI: battery : Use strscpy() instead of strcpy() ACPI: acpi_processor: Use strscpy instead() of strcpy() ACPI: PAD: Use strscpy() instead of strcpy() ACPI: AC: Use strscpy() instead of strcpy()
2024-09-11io_uring/rsrc: add reference count to struct io_mapped_ubufJens Axboe
Currently there's a single ring owner of a mapped buffer, and hence the reference count will always be 1 when it's torn down and freed. However, in preparation for being able to link io_mapped_ubuf to different spots, add a reference count to manage the lifetime of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11io_uring/rsrc: clear 'slot' entry upfrontJens Axboe
No functional changes in this patch, but clearing the slot pointer earlier will be required by a later change. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11Merge branches 'acpi-battery', 'acpi-pmic', 'acpi-cppc' and 'acpi-processor'Rafael J. Wysocki
Merge ACPI battery driver, ACPI PMIC driver, ACPI processor driver and ACPI CPPC library updates for 6.12-rc1: - Use the driver core for the async probing management in the ACPI battery driver (Thomas Weißschuh). - Remove redundant initalizations of a local variable to NULL from the ACPI battery driver (Ilpo Järvinen). - Use strscpy() instead of strcpy() in the ACPI battery driver (Muhammad Qasim Abdul Majeed). - Remove unneeded check in tps68470_pmic_opregion_probe() (Aleksandr Mishin). - Add support for setting the EPP register through the ACPI CPPC sysfs interface if it is in FFH (Mario Limonciello). - Fix MASK_VAL() usage in the ACPI CPPC library (Clément Léger). - Reduce the log level of a per-CPU message about idle states in the ACPI processor driver (Li RongQing). * acpi-battery: ACPI: battery: use driver core managed async probing ACPI: battery: Remove redundant NULL initalizations ACPI: battery: Use strscpy() instead of strcpy() * acpi-pmic: ACPI: PMIC: Remove unneeded check in tps68470_pmic_opregion_probe() * acpi-cppc: ACPI: CPPC: Add support for setting EPP register in FFH ACPI: CPPC: Fix MASK_VAL() usage * acpi-processor: ACPI: processor: Reduce the log level of a per-CPU message about idle states
2024-09-11Merge branches 'acpi-ec', 'acpi-sysfs', 'acpi-utils' and 'acpi-soc'Rafael J. Wysocki
Merge an ACPI EC driver update, ACPI sysfs interface updates, an ACPI library function update, and an ACPI APD driver update for 6.12-rc1: - Do not release locks during operation region accesses in the ACPI EC driver (Rafael Wysocki). - Fix up the _STR handling in the ACPI device object sysfs interface, make it represent the device object attributes as an attribute group and make it rely on driver core functionality for sysfs attrubute management (Thomas Weißschuh). - Extend error messages printed to the kernel log when acpi_evaluate_dsm() fails to include revision and function number (David Wang). - Add a new AMDI0015 platform device ID to the ACPi APD driver for AMD SoCs (Shyam Sundar S K). * acpi-ec: ACPI: EC: Do not release locks during operation region accesses * acpi-sysfs: ACPI: sysfs: remove return value of acpi_device_setup_files() ACPI: sysfs: manage sysfs attributes through device core ACPI: sysfs: manage attributes as attribute_group ACPI: sysfs: evaluate _STR on each sysfs access ACPI: sysfs: validate return type of _STR method * acpi-utils: ACPI: utils: Add rev/func to message when acpi_evaluate_dsm() fails * acpi-soc: ACPI: APD: Add AMDI0015 as platform device
2024-09-11Merge branch 'acpi-riscv'Rafael J. Wysocki
Merge ACPI and irqchip updates related to external interrupt controller support on RISC-V: - Add ACPI device enumeration support for interrupt controller probing including taking dependencies into account (Sunil V L). - Implement ACPI-based interrupt controller probing on RISC-V (Sunil V L). - Add ACPI support for AIA in riscv-intc and add ACPI support to riscv-imsic, riscv-aplic, and sifive-plic (Sunil V L). * acpi-riscv: irqchip/sifive-plic: Add ACPI support irqchip/riscv-aplic: Add ACPI support irqchip/riscv-imsic: Add ACPI support irqchip/riscv-imsic-state: Create separate function for DT irqchip/riscv-intc: Add ACPI support for AIA ACPI: RISC-V: Implement function to add implicit dependencies ACPI: RISC-V: Initialize GSI mapping structures ACPI: RISC-V: Implement function to reorder irqchip probe entries ACPI: RISC-V: Implement PCI related functionality ACPI: pci_link: Clear the dependencies after probe ACPI: bus: Add RINTC IRQ model for RISC-V ACPI: scan: Define weak function to populate dependencies ACPI: scan: Add RISC-V interrupt controllers to honor list ACPI: scan: Refactor dependency creation ACPI: bus: Add acpi_riscv_init() function ACPI: scan: Add a weak arch_sort_irqchip_probe() to order the IRQCHIP probe arm64: PCI: Migrate ACPI related functions to pci-acpi.c
2024-09-11Merge branch 'acpica'Rafael J. Wysocki
Merge ACPICA updates for 6.12-rc1: - Check return value in acpi_db_convert_to_package() (Pei Xiao). - Detect FACS and allow setting the waking vector on reduced-hardware ACPI platforms (Jiaqing Zhao). - Allow ACPICA to represent semaphores as integers (Adrien Destugues). - Complete CXL 3.0 CXIMS structures support in ACPICA (Zhang Rui). - Make ACPICA support SPCR version 4 and add RISC-V SBI Subtype to DBG2 (Sia Jee Heng). - Implement the Dword_PCC Resource Descriptor Macro in ACPICA (Jose Marinho). - Correct the typo in struct acpi_mpam_msc_node member (Punit Agrawal). - Implement ACPI_WARNING_ONCE() and ACPI_ERROR_ONCE() and use them to prevent a Stall() violation warning from being printed every time this takes place (Vasily Khoruzhick). - Allow PCC Data Type in MCTP resource (Adam Young). - Fix memory leaks on acpi_ps_get_next_namepath() and acpi_ps_get_next_field() failures (Armin Wolf). - Add support for supressing leading zeros in hex strings when converting them to integers and update integer-to-hex-string conversions in ACPICA (Armin Wolf). - Add support for Windows 11 22H2 _OSI string (Armin Wolf). - Avoid warning for Dump Functions in ACPICA (Adam Lackorzynski). - Add extended linear address mode to HMAT MSCIS in ACPICA (Dave Jiang). - Handle empty connection_node in iasl (Aleksandrs Vinarskis). - Allow for more flexibility in _DSM args (Saket Dumbre). - Setup for ACPICA release 20240827 (Saket Dumbre). * acpica: (23 commits) ACPICA: Setup for ACPICA release 20240827 ACPICA: Allow for more flexibility in _DSM args ACPICA: iasl: handle empty connection_node ACPICA: HMAT: Add extended linear address mode to MSCIS ACPICA: Avoid warning for Dump Functions ACPICA: Add support for Windows 11 22H2 _OSI string ACPICA: Update integer-to-hex-string conversions ACPICA: Add support for supressing leading zeros in hex strings ACPICA: Allow for supressing leading zeros when using acpi_ex_convert_to_ascii() ACPICA: Fix memory leak if acpi_ps_get_next_field() fails ACPICA: Fix memory leak if acpi_ps_get_next_namepath() fails ACPICA: Allow PCC Data Type in MCTP resource. ACPICA: executer/exsystem: Don't nag user about every Stall() violating the spec ACPICA: Implement ACPI_WARNING_ONCE and ACPI_ERROR_ONCE ACPICA: MPAM: Correct the typo in struct acpi_mpam_msc_node member ACPICA: Implement the Dword_PCC Resource Descriptor Macro ACPICA: Headers: Add RISC-V SBI Subtype to DBG2 ACPICA: SPCR: Update the SPCR table to version 4 ACPICA: Complete CXL 3.0 CXIMS structures ACPICA: haiku: Fix invalid value used for semaphores ...
2024-09-11fbdev: omapfb: Fix typo in commentAndrew Kreimer
Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2024-09-11KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifierMarc Zyngier
Now that REG_HIDDEN_USER has no direct user anymore, remove it entirely and update all users of sysreg_hidden_user() to call sysreg_hidden() instead. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240904082419.1982402-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-11KVM: arm64: Simplify visibility handling of AArch32 SPSR_*Marc Zyngier
Since SPSR_* are not associated with any register in the sysreg array, nor do they have .get_user()/.set_user() helpers, they are invisible to userspace with that encoding. Therefore hidden_user_visibility() serves no purpose here, and can be safely removed. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240904082419.1982402-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-11KVM: arm64: Simplify handling of CNTKCTL_EL12Marc Zyngier
We go trough a great deal of effort to map CNTKCTL_EL12 to CNTKCTL_EL1 while hidding this mapping from userspace via a special visibility helper. However, it would be far simpler to just provide an accessor doing the mapping job, removing the need for a visibility helper. With that done, we can also remove the EL12_REG() macro which serves no purpose. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240904082419.1982402-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-11Merge branch 'tip/sched/core' into sched_ext/for-6.12Tejun Heo
Pull in tip/sched/core to resolve two merge conflicts: - 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") 5d871a63997f ("sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c") A simple context conflict. The former added __update_blocked_others() in the same #ifdef CONFIG_SMP block that effective_cpu_util() and sched_cpu_util() are in and the latter moved those functions to fair.c. This makes __update_blocked_others() more out of place. Will follow up with a patch to relocate. - 96fd6c65efc6 ("sched: Factor out update_other_load_avgs() from __update_blocked_others()") 84d265281d6c ("sched/pelt: Use rq_clock_task() for hw_pressure") The former factored out the body of __update_blocked_others() into update_other_load_avgs(). The latter changed how update_hw_load_avg() is called in the body. Resolved by applying the change to update_other_load_avgs() instead. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-11Merge tag 'arm-fixes-6.11-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "The bulk of the changes this time are for device tree files in the rockchips platform, addressing correctness issues on individual boards, plus one change in the rk356x SoC file to make it match the binding. The only other changes that came in are - a CPU frequencey scaling fix for JH7110 (RISC-V) - a build fix for the cznic hwrandom driver - a fix for a deadlock in qualcomm uefi secure application firmware driver" * tag 'arm-fixes-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: platform: cznic: turris-omnia-mcu: fix HW_RANDOM dependency riscv: dts: starfive: jh7110-common: Fix lower rate of CPUfreq by setting PLL0 rate to 1.5GHz firmware: qcom: uefisecapp: Fix deadlock in qcuefi_acquire() arm64: dts: rockchip: Fix compatibles for RK3588 VO{0,1}_GRF dt-bindings: soc: rockchip: Fix compatibles for RK3588 VO{0,1}_GRF arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma arm64: dts: rockchip: fix eMMC/SPI corruption when audio has been used on RK3399 Puma arm64: dts: rockchip: fix PMIC interrupt pin in pinctrl for ROCK Pi E arm64: dts: rockchip: Remove broken tsadc pinctrl binding for rk356x
2024-09-11Merge tag 'for-6.11/dm-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mikulas Patocka: - fix a race condition in dm-integrity * tag 'for-6.11/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-integrity: fix a race condition when accessing recalc_sector
2024-09-11Merge tag 'printk-for-6.11-fixup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Fix build of serial_core as a module * tag 'printk-for-6.11-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: Export match_devname_and_update_preferred_console()
2024-09-11kernel/workqueue.c: fix DEFINE_PER_CPU_SHARED_ALIGNED expansionBaoquan He
Make tags always produces below annoying warnings: ctags: Warning: kernel/workqueue.c:470: null expansion of name pattern "\1" ctags: Warning: kernel/workqueue.c:474: null expansion of name pattern "\1" ctags: Warning: kernel/workqueue.c:478: null expansion of name pattern "\1" In commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU expansions"), codes in places have been adjusted including cpu_worker_pools definition. I noticed in commit 4cb1ef64609f ("workqueue: Implement BH workqueues to eventually replace tasklets"), cpu_worker_pools definition was unfolded back. Not sure if it was intentionally done or ignored carelessly. Makes change to mute them specifically. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-11minmax: reduce min/max macro expansion in atomisp driverLorenzo Stoakes
Avoid unnecessary nested min()/max() which results in egregious macro expansion. Use clamp_t() as this introduces the least possible expansion, and turn the {s,u}DIGIT_FITTING() macros into inline functions to avoid the nested expansion. This resolves an issue with slackware 15.0 32-bit compilation as reported by Richard Narron. Presumably the min/max fixups would be difficult to backport, this patch should be easier and fix's Richard's problem in 5.15. Reported-by: Richard Narron <richard@aaazen.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Closes: https://lore.kernel.org/all/4a5321bd-b1f-1832-f0c-cea8694dc5aa@aaazen.com/ Fixes: 867046cc7027 ("minmax: relax check to allow comparison between unsigned arguments and signed constants") Cc: stable@vger.kernel.org Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-11fbdev: pxafb: Fix possible use after free in pxafb_task()Kaixin Wang
In the pxafb_probe function, it calls the pxafb_init_fbinfo function, after which &fbi->task is associated with pxafb_task. Moreover, within this pxafb_init_fbinfo function, the pxafb_blank function within the &pxafb_ops struct is capable of scheduling work. If we remove the module which will call pxafb_remove to make cleanup, it will call unregister_framebuffer function which can call do_unregister_framebuffer to free fbi->fb through put_fb_info(fb_info), while the work mentioned above will be used. The sequence of operations that may lead to a UAF bug is as follows: CPU0 CPU1 | pxafb_task pxafb_remove | unregister_framebuffer(info) | do_unregister_framebuffer(fb_info) | put_fb_info(fb_info) | // free fbi->fb | set_ctrlr_state(fbi, state) | __pxafb_lcd_power(fbi, 0) | fbi->lcd_power(on, &fbi->fb.var) | //use fbi->fb Fix it by ensuring that the work is canceled before proceeding with the cleanup in pxafb_remove. Note that only root user can remove the driver at runtime. Signed-off-by: Kaixin Wang <kxwang23@m.fudan.edu.cn> Signed-off-by: Helge Deller <deller@gmx.de>
2024-09-11bpf: lsm: Set bpf_lsm_blob_sizes.lbs_task to 0Song Liu
bpf task local storage is now using task_struct->bpf_storage, so bpf_lsm_blob_sizes.lbs_task is no longer needed. Remove it to save some memory. Fixes: a10787e6d58c ("bpf: Enable task local storage for tracing programs") Cc: stable@vger.kernel.org Cc: KP Singh <kpsingh@kernel.org> Cc: Matt Bobrowski <mattbobrowski@google.com> Signed-off-by: Song Liu <song@kernel.org> Acked-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20240911055508.9588-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11selftests/bpf: Fix arena_atomics failure due to llvm changeYonghong Song
llvm change [1] made a change such that __sync_fetch_and_{and,or,xor}() will generate atomic_fetch_*() insns even if the return value is not used. This is a deliberate choice to make sure barrier semantics are preserved from source code to asm insn. But the change in [1] caused arena_atomics selftest failure. test_arena_atomics:PASS:arena atomics skeleton open 0 nsec libbpf: prog 'and': BPF program load failed: Permission denied libbpf: prog 'and': -- BEGIN PROG LOAD LOG -- arg#0 reference type('UNKNOWN ') size cannot be determined: -22 0: R1=ctx() R10=fp0 ; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87 0: (18) r1 = 0xffffc90000064000 ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) 2: (61) r6 = *(u32 *)(r1 +0) ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,v ar_off=(0x0; 0xffffffff)) 3: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar() 4: (77) r0 >>= 32 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) 5: (5d) if r0 != r6 goto pc+11 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x) ; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91 6: (18) r1 = 0x100000000060 ; R1_w=scalar() 8: (bf) r1 = addr_space_cast(r1, 0, 1) ; R1_w=arena 9: (18) r2 = 0x1100000000 ; R2_w=0x1100000000 11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2) BPF_ATOMIC stores into R1 arena is not allowed processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'and': failed to load: -13 libbpf: failed to load object 'arena_atomics' libbpf: failed to load BPF skeleton 'arena_atomics': -13 test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13) #3 arena_atomics:FAIL The reason of the failure is due to [2] where atomic{64,}_fetch_{and,or,xor}() are not allowed by arena addresses. Version 2 of the patch fixed the issue by using inline asm ([3]). But further discussion suggested to find a way from source to generate locked insn which is more user friendly. So in not-merged llvm patch ([4]), if relax memory ordering is used and the return value is not used, locked insn could be generated. So with llvm patch [4] to compile the bpf selftest, the following code __c11_atomic_fetch_and(&and64_value, 0x011ull << 32, memory_order_relaxed); is able to generate locked insn, hence fixing the selftest failure. [1] https://github.com/llvm/llvm-project/pull/106494 [2] d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") [3] https://lore.kernel.org/bpf/20240803025928.4184433-1-yonghong.song@linux.dev/ [4] https://github.com/llvm/llvm-project/pull/107343 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240909223431.1666305-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11Merge branches 'pm-sleep', 'pm-opp' and 'pm-tools'Rafael J. Wysocki
Merge updates related to system sleep, operating performance points (OPP) updates, and PM tooling updates for 6.12-rc1: - Remove unused stub for saveable_highmem_page() and remove deprecated macros from power management documentation (Andy Shevchenko). - Use ysfs_emit() and sysfs_emit_at() in "show" functions in the PM sysfs interface (Xueqin Luo). - Update the maintainers information for the operating-points-v2-ti-cpu DT binding (Dhruva Gole). - Drop unnecessary of_match_ptr() from ti-opp-supply (Rob Herring). - Update directory handling and installation process in the pm-graph Makefile and add .gitignore to ignore sleepgraph.py artifacts to pm-graph (Amit Vadhavana, Yo-Jung Lin). - Make cpupower display residency value in idle-info (Aboorva Devarajan). - Add missing powercap_set_enabled() stub function to cpupower (John B. Wyatt IV). - Add SWIG support to cpupower (John B. Wyatt IV). * pm-sleep: PM: hibernate: Remove unused stub for saveable_highmem_page() Documentation: PM: Discourage use of deprecated macros PM: sleep: Use sysfs_emit() and sysfs_emit_at() in "show" functions PM: hibernate: Use sysfs_emit() and sysfs_emit_at() in "show" functions * pm-opp: dt-bindings: opp: operating-points-v2-ti-cpu: Update maintainers opp: ti: Drop unnecessary of_match_ptr() * pm-tools: pm:cpupower: Add error warning when SWIG is not installed MAINTAINERS: Add Maintainers for SWIG Python bindings pm:cpupower: Include test_raw_pylibcpupower.py pm:cpupower: Add SWIG bindings files for libcpupower pm:cpupower: Add missing powercap_set_enabled() stub function pm-graph: Update directory handling and installation process in Makefile pm-graph: Make git ignore sleepgraph.py artifacts tools/cpupower: display residency value in idle-info
2024-09-11Merge branch 'harden-and-extend-elf-build-id-parsing-logic'Alexei Starovoitov
Andrii Nakryiko says: ==================== Harden and extend ELF build ID parsing logic The goal of this patch set is to extend existing ELF build ID parsing logic, currently mostly used by BPF subsystem, with support for working in sleepable mode in which memory faults are allowed and can be relied upon to fetch relevant parts of ELF file to find and fetch .note.gnu.build-id information. This is useful and important for BPF subsystem itself, but also for PROCMAP_QUERY ioctl(), built atop of /proc/<pid>/maps functionality (see [0]), which makes use of the same build_id_parse() functionality. PROCMAP_QUERY is always called from sleepable user process context, so it doesn't have to suffer from current restrictions of build_id_parse() which are due to the NMI context assumption. Along the way, we harden the logic to avoid TOCTOU, overflow, out-of-bounds access problems. This is the very first patch, which can be backported to older releases, if necessary. We also lift existing limitations of only working as long as ELF program headers and build ID note section is contained strictly within the very first page of ELF file. We achieve all of the above without duplication of logic between sleepable and non-sleepable modes through freader abstraction that manages underlying folio from page cache (on demand) and gives a simple to use direct memory access interface. With that, single page restrictions and adding sleepable mode support is rather straightforward. We also extend existing set of BPF selftests with a few tests targeting build ID logic across sleepable and non-sleepabe contexts (we utilize sleepable and non-sleepable uprobes for that). [0] https://lore.kernel.org/linux-mm/20240627170900.1672542-4-andrii@kernel.org/ v6->v7: - added filemap_invalidate_{lock,unlock}_shared() around read_cache_folio and kept Eduard's Reviewed-by (Eduard); v5->v6: - use local phnum variable in get_build_id_32() (Jann); - switch memcmp() instead of strcmp() in parse_build_id() (Jann); v4->v5: - pass proper file reference to read_cache_folio() (Shakeel); - fix another potential overflow due to two u32 additions (Andi); - add PageUptodate() check to patch #1 (Jann); v3->v4: - fix few more potential overflow and out-of-bounds access issues (Andi); - use purely folio-based implementation for freader (Matthew); v2->v3: - remove unneeded READ_ONCE()s and force phoff to u64 for 32-bit mode (Andi); - moved hardening fixes to the front for easier backporting (Jann); - call freader_cleanup() from build_id_parse_buf() for consistency (Jiri); v1->v2: - ensure MADV_PAGEOUT works reliably by paging data in first (Shakeel); - to fix BPF CI build optionally define MADV_POPULATE_READ in selftest. ==================== Link: https://lore.kernel.org/r/20240829174232.3133883-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11selftests/bpf: add build ID testsAndrii Nakryiko
Add a new set of tests validating behavior of capturing stack traces with build ID. We extend uprobe_multi target binary with ability to trigger uprobe (so that we can capture stack traces from it), but also we allow to force build ID data to be either resident or non-resident in memory (see also a comment about quirks of MADV_PAGEOUT). That way we can validate that in non-sleepable context we won't get build ID (as expected), but with sleepable uprobes we will get that build ID regardless of it being physically present in memory. Also, we add a small add-on linker script which reorders .note.gnu.build-id section and puts it after (big) .text section, putting build ID data outside of the very first page of ELF file. This will test all the relaxations we did in build ID parsing logic in kernel thanks to freader abstraction. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240829174232.3133883-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpersAndrii Nakryiko
Add sleepable implementations of bpf_get_stack() and bpf_get_task_stack() helpers and allow them to be used from sleepable BPF program (e.g., sleepable uprobes). Note, the stack trace IPs capturing itself is not sleepable (that would need to be a separate project), only build ID fetching is sleepable and thus more reliable, as it will wait for data to be paged in, if necessary. For that we make use of sleepable build_id_parse() implementation. Now that build ID related internals in kernel/bpf/stackmap.c can be used both in sleepable and non-sleepable contexts, we need to add additional rcu_read_lock()/rcu_read_unlock() protection around fetching perf_callchain_entry, but with the refactoring in previous commit it's now pretty straightforward. We make sure to do rcu_read_unlock (in sleepable mode only) right before stack_map_get_build_id_offset() call which can sleep. By that time we don't have any more use of perf_callchain_entry. Note, bpf_get_task_stack() will fail for user mode if task != current. And for kernel mode build ID are irrelevant. So in that sense adding sleepable bpf_get_task_stack() implementation is a no-op. It feel right to wire this up for symmetry and completeness, but I'm open to just dropping it until we support `user && crosstask` condition. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240829174232.3133883-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entryAndrii Nakryiko
Change stack_map_get_build_id_offset() which is used to convert stack trace IP addresses into build ID+offset pairs. Right now this function accepts an array of u64s as an input, and uses array of struct bpf_stack_build_id as an output. This is problematic because u64 array is coming from perf_callchain_entry, which is (non-sleepable) RCU protected, so once we allows sleepable build ID fetching, this all breaks down. But its actually pretty easy to make stack_map_get_build_id_offset() works with array of struct bpf_stack_build_id as both input and output. Which is what this patch is doing, eliminating the dependency on perf_callchain_entry. We require caller to fill out bpf_stack_build_id.ip fields (all other can be left uninitialized), and update in place as we do build ID resolution. We make sure to READ_ONCE() and cache locally current IP value as we used it in a few places to find matching VMA and so on. Given this data is directly accessible and modifiable by user's BPF code, we should make sure to have a consistent view of it. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240829174232.3133883-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11lib/buildid: don't limit .note.gnu.build-id to the first page in ELFAndrii Nakryiko
With freader we don't need to restrict ourselves to a single page, so let's allow ELF notes to be at any valid position with the file. We also merge parse_build_id() and parse_build_id_buf() as now the only difference between them is note offset overflow, which makes sense to check in all situations. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240829174232.3133883-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11lib/buildid: implement sleepable build_id_parse() APIAndrii Nakryiko
Extend freader with a flag specifying whether it's OK to cause page fault to fetch file data that is not already physically present in memory. With this, it's now easy to wait for data if the caller is running in sleepable (faultable) context. We utilize read_cache_folio() to bring the desired folio into page cache, after which the rest of the logic works just the same at folio level. Suggested-by: Omar Sandoval <osandov@fb.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240829174232.3133883-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-11lib/buildid: rename build_id_parse() into build_id_parse_nofault()Andrii Nakryiko
Make it clear that build_id_parse() assumes that it can take no page fault by renaming it and current few users to build_id_parse_nofault(). Also add build_id_parse() stub which for now falls back to non-sleepable implementation, but will be changed in subsequent patches to take advantage of sleepable context. PROCMAP_QUERY ioctl() on /proc/<pid>/maps file is using build_id_parse() and will automatically take advantage of more reliable sleepable context implementation. Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240829174232.3133883-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>