summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-05-08erofs: add a reserved buffer pool for lz4 decompressionChunhai Guo
This adds a special global buffer pool (in the end) for reserved pages. Using a reserved pool for LZ4 decompression significantly reduces the time spent on extra temporary page allocation for the extreme cases in low memory scenarios. The table below shows the reduction in time spent on page allocation for LZ4 decompression when using a reserved pool. The results were obtained from multi-app launch benchmarks on ARM64 Android devices running the 5.15 kernel with an 8-core CPU and 8GB of memory. In the benchmark, we launched 16 frequently-used apps, and the camera app was the last one in each round. The data in the table is the average time of camera app for each round. After using the reserved pool, there was an average improvement of 150ms in the overall launch time of our camera app, which was obtained from the systrace log. +--------------+---------------+--------------+---------+ | | w/o page pool | w/ page pool | diff | +--------------+---------------+--------------+---------+ | Average (ms) | 3434 | 21 | -99.38% | +--------------+---------------+--------------+---------+ Based on the benchmark logs, 64 pages are sufficient for 95% of scenarios. This value can be adjusted with a module parameter `reserved_pages`. The default value is 0. This pool is currently only used for the LZ4 decompressor, but it can be applied to more decompressors if needed. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240402131523.2703948-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-08erofs: do not use pagepool in z_erofs_gbuf_growsize()Chunhai Guo
Let's use alloc_pages_bulk_array() for simplicity and get rid of unnecessary pagepool. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240402092757.2635257-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-08erofs: rename per-CPU buffers to global buffer pool and make it configurableChunhai Guo
It will cost more time if compressed buffers are allocated on demand for low-latency algorithms (like lz4) so EROFS uses per-CPU buffers to keep compressed data if in-place decompression is unfulfilled. While it is kind of wasteful of memory for a device with hundreds of CPUs, and only a small number of CPUs concurrently decompress most of the time. This patch renames it as 'global buffer pool' and makes it configurable. This allows two or more CPUs to share a common buffer to reduce memory occupation. Suggested-by: Gao Xiang <xiang@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Link: https://lore.kernel.org/r/20240402100036.2673604-1-guochunhai@vivo.com Signed-off-by: Sandeep Dhavale <dhavale@google.com> Link: https://lore.kernel.org/r/20240408215231.3376659-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-08erofs: rename utils.c to zutil.cChunhai Guo
Currently, utils.c is only useful if CONFIG_EROFS_FS_ZIP is on. So let's rename it to zutil.c as well as avoid its inclusion if CONFIG_EROFS_FS_ZIP is explicitly disabled. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240401135550.2550043-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-05-08uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}Erick Archer
This commit can be considered an addition to commit ca7e324e8ad3 ("compiler_types: add Endianness-dependent __counted_by_{le,be}") [1]. In the commit referenced above the __counted_by_{le,be}() attributes were defined based on platform's endianness with the goal to that the structures contain flexible arrays at the end, and the counter for, can be annotated with these attributes. So, this commit only provide UAPI macros for UAPI structs that will gain annotations for __counted_by_{le, be} attributes. And it is the previous step to be able to use these attributes in UAPI. Link: https://lore.kernel.org/r/20240327142241.1745989-2-aleksander.lobakin@intel.com Suggested-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Erick Archer <erick.archer@outlook.com> Link: https://lore.kernel.org/r/AS8PR02MB72372E45071E8821C07236F78BE42@AS8PR02MB7237.eurprd02.prod.outlook.com Fixes: ca7e324e8ad3 ("compiler_types: add Endianness-dependent __counted_by_{le,be}") Signed-off-by: Kees Cook <keescook@chromium.org>
2024-05-08virtiofs: include a newline in sysfs tagBrian Foster
The internal tag string doesn't contain a newline. Append one when emitting the tag via sysfs. [Stefan] Orthogonal to the newline issue, sysfs_emit(buf, "%s", fs->tag) is needed to prevent format string injection. Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: a8f62f50b4e4 ("virtiofs: export filesystem tags through sysfs") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-05-08x86/pci/ce4100: Remove unused 'struct sim_reg_op'Dr. David Alan Gilbert
'struct sim_reg_op' wasn't ever used since it was introduced 14 years ago via: 91d8037f563e ("ce4100: Add PCI register emulation for CE4100") Remove it. [ mingo: Improved the changelog. ] Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240507232348.46677-1-linux@treblig.org
2024-05-08Merge tag 'qcom-arm64-for-6.10-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/dt A few more Qualcomm Arm64 DeviceTree updates for v6.10 This corrects the obviously broken compatible of the USB VBUS regulator in PM6150. It clears the odd-looking default address on QCS404 EVB, with the expectation that a proper address is provides by other means. The newly added SM8650 GPU node is corrected with a missing memory region. The third DWC3 instance on SC8280XP is added, and enabled on Lenovo Thinkpad X13s to give working fingerprint sensor. * tag 'qcom-arm64-for-6.10-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: pm6150: correct USB VBUS regulator compatible arm64: dts: qcom: qcs404: fix bluetooth device address arm64: dts: qcom: sc8280xp-x13s: enable USB MP and fingerprint reader arm64: dts: qcom: sc8280xp: Add USB DWC3 Multiport controller arm64: dts: qcom: sm8650: Fix GPU cx_mem size Link: https://lore.kernel.org/r/20240508021820.206441-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-05-08Merge tag 'qcom-arm64-defconfig-for-6.10-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/defconfig One more Qualcomm Arm64 defconfig update for v6.10 This enables the SM6115 interconnect provider, to make it possible to boot boards on this SoC. * tag 'qcom-arm64-defconfig-for-6.10-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: defconfig: select INTERCONNECT_QCOM_SM6115 as built-in Link: https://lore.kernel.org/r/20240508021312.206121-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-05-08Merge tag 'qcom-drivers-for-6.10-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers A few more Qualcomm driver updates for v6.10 This fixes a sleep-while-atomic issue in pmic_glink, stemming from the fact that the GLINK callback comes from interrupt context. It fixes the Bluetooth address in the example of qcom,wcnss, and it enables UEFI variables on SC8180X devices (Primus and Flex 5G). * tag 'qcom-drivers-for-6.10-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: firmware: qcom: uefisecapp: Allow on sc8180x Primus and Flex 5G soc: qcom: pmic_glink: Make client-lock non-sleeping dt-bindings: soc: qcom,wcnss: fix bluetooth address example Link: https://lore.kernel.org/r/20240508020900.204413-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-05-08spi: Remove unneded check for orig_nentsAndy Shevchenko
Both dma_unmap_sgtable() and sg_free_table() in spi_unmap_buf_attrs() have checks for orig_nents against 0. No need to duplicate this. All the same applies to other DMA mapping API calls. Also note, there is no other user in the kernel that does this kind of checks. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240507201028.564630-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-07Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-05-06 (ice) This series contains updates to ice driver only. Paul adds support for additional E830 devices and adjusts naming for existing E830 devices. Marcin commonizes a couple of TC setup calls to reduce duplicated code. Mateusz adds ice_vsi_cfg_params into ice_vsi to consolidate info. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: refactor struct ice_vsi_cfg_params to be inside of struct ice_vsi ice: Deduplicate tc action setup ice: update E830 device ids and comments ice: add additional E830 device ids ==================== Link: https://lore.kernel.org/r/20240506170827.948682-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07net: usb: sr9700: stop lying about skb->truesizeEric Dumazet
Some usb drivers set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize override. I also replaced one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240506143939.3673865-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07net: usb: smsc75xx: stop lying about skb->truesizeEric Dumazet
Some usb drivers try to set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize override. I also replaced one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steve Glendinning <steve.glendinning@shawell.net> Link: https://lore.kernel.org/r/20240506142358.3657918-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07usb: aqc111: stop lying about skb->truesizeEric Dumazet
Some usb drivers try to set small skb->truesize and break core networking stacks. I replace one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Fixes: 361459cd9642 ("net: usb: aqc111: Implement RX data path") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240506135546.3641185-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07mptcp: only allow set existing scheduler for net.mptcp.schedulerGregory Detal
The current behavior is to accept any strings as inputs, this results in an inconsistent result where an unexisting scheduler can be set: # sysctl -w net.mptcp.scheduler=notdefault net.mptcp.scheduler = notdefault This patch changes this behavior by checking for existing scheduler before accepting the input. Fixes: e3b2870b6d22 ("mptcp: add a new sysctl scheduler") Cc: stable@vger.kernel.org Signed-off-by: Gregory Detal <gregory.detal@gmail.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240506-upstream-net-20240506-mptcp-sched-exist-v1-1-2ed1529e521e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07selftests/net: fix uninitialized variablesJohn Hubbard
When building with clang, via: make LLVM=1 -C tools/testing/selftest ...clang warns about three variables that are not initialized in all cases: 1) The opt_ipproto_off variable is used uninitialized if "testname" is not "ip". Willem de Bruijn pointed out that this is an actual bug, and suggested the fix that I'm using here (thanks!). 2) The addr_len is used uninitialized, but only in the assert case, which bails out, so this is harmless. 3) The family variable in add_listener() is only used uninitialized in the error case (neither IPv4 nor IPv6 is specified), so it's also harmless. Fix by initializing each variable. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20240506190204.28497-1-jhubbard@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07lib: Allow for the DIM library to be modularFlorian Fainelli
Allow the Dynamic Interrupt Moderation (DIM) library to be built as a module. This is particularly useful in an Android GKI (Google Kernel Image) configuration where everything is built as a module, including Ethernet controller drivers. Having to build DIMLIB into the kernel image with potentially no user is wasteful. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20240506175040.410446-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07nfc: nci: Fix kcov check in nci_rx_work()Tetsuo Handa
Commit 7e8cdc97148c ("nfc: Add KCOV annotations") added kcov_remote_start_common()/kcov_remote_stop() pair into nci_rx_work(), with an assumption that kcov_remote_stop() is called upon continue of the for loop. But commit d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") forgot to call kcov_remote_stop() before break of the for loop. Reported-by: syzbot <syzbot+0438378d6f157baae1a2@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2 Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Suggested-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/6d10f829-5a0c-405a-b39a-d7266f3a1a0b@I-love.SAKURA.ne.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07mptcp: fix possible NULL dereferencesEric Dumazet
subflow_add_reset_reason(skb, ...) can fail. We can not assume mptcp_get_ext(skb) always return a non NULL pointer. syzbot reported: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 0 PID: 5098 Comm: syz-executor132 Not tainted 6.9.0-rc6-syzkaller-01478-gcdc74c9d06e7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:subflow_v6_route_req+0x2c7/0x490 net/mptcp/subflow.c:388 Code: 8d 7b 07 48 89 f8 48 c1 e8 03 42 0f b6 04 20 84 c0 0f 85 c0 01 00 00 0f b6 43 07 48 8d 1c c3 48 83 c3 18 48 89 d8 48 c1 e8 03 <42> 0f b6 04 20 84 c0 0f 85 84 01 00 00 0f b6 5b 01 83 e3 0f 48 89 RSP: 0018:ffffc9000362eb68 EFLAGS: 00010206 RAX: 0000000000000003 RBX: 0000000000000018 RCX: ffff888022039e00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88807d961140 R08: ffffffff8b6cb76b R09: 1ffff1100fb2c230 R10: dffffc0000000000 R11: ffffed100fb2c231 R12: dffffc0000000000 R13: ffff888022bfe273 R14: ffff88802cf9cc80 R15: ffff88802ad5a700 FS: 0000555587ad2380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f420c3f9720 CR3: 0000000022bfc000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_conn_request+0xf07/0x32c0 net/ipv4/tcp_input.c:7180 tcp_rcv_state_process+0x183c/0x4500 net/ipv4/tcp_input.c:6663 tcp_v6_do_rcv+0x8b2/0x1310 net/ipv6/tcp_ipv6.c:1673 tcp_v6_rcv+0x22b4/0x30b0 net/ipv6/tcp_ipv6.c:1910 ip6_protocol_deliver_rcu+0xc76/0x1570 net/ipv6/ip6_input.c:438 ip6_input_finish+0x186/0x2d0 net/ipv6/ip6_input.c:483 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5625 [inline] __netif_receive_skb+0x1ea/0x650 net/core/dev.c:5739 netif_receive_skb_internal net/core/dev.c:5825 [inline] netif_receive_skb+0x1e8/0x890 net/core/dev.c:5885 tun_rx_batched+0x1b7/0x8f0 drivers/net/tun.c:1549 tun_get_user+0x2f35/0x4560 drivers/net/tun.c:2002 tun_chr_write_iter+0x113/0x1f0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2110 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xa84/0xcb0 fs/read_write.c:590 ksys_write+0x1a0/0x2c0 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 3e140491dd80 ("mptcp: support rstreason for passive reset") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240506123032.3351895-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07selftests: netfilter: conntrack_tcp_unreplied.sh: wait for initial ↵Florian Westphal
connection attempt Netdev CI reports occasional failures with this test ("ERROR: ns2-dX6bUE did not pick up tcp connection from peer"). Add explicit busywait call until the initial connection attempt shows up in conntrack rather than a one-shot 'must exist' check. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240506114320.12178-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07Merge branch 'libbpf: further struct_ops fixes and improvements'Martin KaFai Lau
Andrii Nakryiko says: ==================== Fix yet another case of mishandling SEC("struct_ops") programs that were nulled out programmatically through BPF skeleton by the user. While at it, add some improvements around detecting and reporting errors, specifically a common case of declaring SEC("struct_ops") program, but forgetting to actually make use of it by setting it as a callback implementation in SEC(".struct_ops") variable (i.e., map) declaration. A bunch of new selftests are added as well. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07selftests/bpf: shorten subtest names for struct_ops_module testAndrii Nakryiko
Drive-by clean up, we shouldn't use meaningless "test_" prefix for subtest names. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240507001335.1445325-8-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07selftests/bpf: validate struct_ops early failure detection logicAndrii Nakryiko
Add a simple test that validates that libbpf will reject isolated struct_ops program early with helpful warning message. Also validate that explicit use of such BPF program through BPF skeleton after BPF object is open won't trigger any warnings. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240507001335.1445325-7-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07libbpf: improve early detection of doomed-to-fail BPF program loadingAndrii Nakryiko
Extend libbpf's pre-load checks for BPF programs, detecting more typical conditions that are destinated to cause BPF program failure. This is an opportunity to provide more helpful and actionable error message to users, instead of potentially very confusing BPF verifier log and/or error. In this case, we detect struct_ops BPF program that was not referenced anywhere, but still attempted to be loaded (according to libbpf logic). Suggest that the program might need to be used in some struct_ops variable. User will get a message of the following kind: libbpf: prog 'test_1_forgotten': SEC("struct_ops") program isn't referenced anywhere, did you forget to use it? Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240507001335.1445325-6-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07libbpf: fix libbpf_strerror_r() handling unknown errorsAndrii Nakryiko
strerror_r(), used from libbpf-specific libbpf_strerror_r() wrapper is documented to return error in two different ways, depending on glibc version. Take that into account when handling strerror_r()'s own errors, which happens when we pass some non-standard (internal) kernel error to it. Before this patch we'd have "ERROR: strerror_r(524)=22", which is quite confusing. Now for the same situation we'll see a bit less visually scary "unknown error (-524)". At least we won't confuse user with irrelevant EINVAL (22). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240507001335.1445325-5-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07selftests/bpf: add another struct_ops callback use case testAndrii Nakryiko
Add a test which tests the case that was just fixed. Kernel has full type information about callback, but user explicitly nulls out the reference to declaratively set BPF program reference. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240507001335.1445325-4-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07libbpf: handle yet another corner case of nulling out struct_ops programAndrii Nakryiko
There is yet another corner case where user can set STRUCT_OPS program reference in STRUCT_OPS map to NULL, but libbpf will fail to disable autoload for such BPF program. This time it's the case of "new" kernel which has type information about callback field, but user explicitly nulled-out program reference from user-space after opening BPF object. Fix, hopefully, the last remaining unhandled case. Fixes: 0737df6de946 ("libbpf: better fix for handling nulled-out struct_ops program") Fixes: f973fccd43d3 ("libbpf: handle nulled-out program in struct_ops correctly") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240507001335.1445325-3-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07libbpf: remove unnecessary struct_ops prog validity checkAndrii Nakryiko
libbpf ensures that BPF program references set in map->st_ops->progs[i] during open phase are always valid STRUCT_OPS programs. This is done in bpf_object__collect_st_ops_relos(). So there is no need to double-check that in bpf_map__init_kern_struct_ops(). Simplify the code by removing unnecessary check. Also, we avoid using local prog variable to keep code similar to the upcoming fix, which adds similar logic in another part of bpf_map__init_kern_struct_ops(). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240507001335.1445325-2-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-07net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07net: dccp: Fix ccid2_rtt_estimator() kernel-docJeff Johnson
make C=1 reports: warning: Function parameter or struct member 'mrtt' not described in 'ccid2_rtt_estimator' So document the 'mrtt' parameter. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240505-ccid2_rtt_estimator-kdoc-v1-1-09231fcb9145@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08rust: alloc: fix dangling pointer in VecExt<T>::reserve()Danilo Krummrich
Currently, a Vec<T>'s ptr value, after calling Vec<T>::new(), is initialized to Unique::dangling(). Hence, in VecExt<T>::reserve(), we're passing a dangling pointer (instead of NULL) to krealloc() whenever a new Vec<T>'s backing storage is allocated through VecExt<T> extension functions. This only works as long as align_of::<T>(), used by Unique::dangling() to derive the dangling pointer, resolves to a value between 0x0 and ZERO_SIZE_PTR (0x10) and krealloc() hence treats it the same as a NULL pointer however. This isn't a case we should rely on, since there may be types whose alignment may exceed the range still covered by krealloc(), plus other kernel allocators are not as tolerant either. Instead, pass a real NULL pointer to krealloc_aligned() if Vec<T>'s capacity is zero. Fixes: 5ab560ce12ed ("rust: alloc: update `VecExt` to take allocation flags") Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Benno Lossin <benno.lossin@proton.me> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Wedson Almeida Filho <walmeida@microsoft.com> Link: https://lore.kernel.org/r/20240501134834.22323-1-dakr@redhat.com [ Solved `use` conflict and applied the `if`-instead-of-`match` change discussed in the list. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-05-07net: phy: marvell: add support for MV88E6250 family internal PHYsMatthias Schiffer
The embedded PHYs of the 88E6250 family switches are very basic - they do not even have an Extended Address / Page register. This adds support for the PHYs to the driver to set up PHY interrupts and retrieve error stats. To deal with PHYs without a page register, "simple" variants of all stat handling functions are introduced. The code should work with all 88E6250 family switches (6250/6220/6071/ 6070/6020). The PHY ID 0x01410db0 was read from a 88E6020, under the assumption that all switches of this family use the same ID. The spec only lists the prefix 0x01410c00 and leaves the last 10 bits as reserved, but that seems too unspecific to be useful, as it would cover several existing PHY IDs already supported by the driver; therefore, the ID read from the actual hardware is used. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/0695f699cd942e6e06da9d30daeedfd47785bc01.1714643285.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07net: phy: marvell: constify marvell_hw_statsMatthias Schiffer
The list of stat registers is read-only, so we can declare it as const. Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/24d7a2f39e0c4c94466e8ad43228fdd798053f3a.1714643285.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07Merge branch 'fix-number-of-arguments-in-test'Andrii Nakryiko
Cupertino Miranda says: ==================== Fix number of arguments in test Hi everyone, This is a new version based on comments. Regards, Cupertino Changes from v1: - Comment with gcc-bpf replaced by bpf_gcc. - Used pragma GCC optimize to disable GCC optimization in test. Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: David Faust <david.faust@oracle.com> Cc: Jose Marchesi <jose.marchesi@oracle.com> Cc: Elena Zannoni <elena.zannoni@oracle.com> ==================== Link: https://lore.kernel.org/r/20240507122220.207820-1-cupertino.miranda@oracle.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-05-07selftests/bpf: Change functions definitions to support GCCCupertino Miranda
The test_xdp_noinline.c contains 2 functions that use more then 5 arguments. This patch collapses the 2 last arguments in an array. Also in GCC and ipa_sra optimization increases the number of arguments used in function encap_v4. This pass disables the optimization for that particular file. Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240507122220.207820-3-cupertino.miranda@oracle.com
2024-05-07selftests/bpf: Add CFLAGS per source file and runnerCupertino Miranda
This patch adds support to specify CFLAGS per source file and per test runner. Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240507122220.207820-2-cupertino.miranda@oracle.com
2024-05-07bpf: Temporarily define BPF_NO_PRESEVE_ACCESS_INDEX for GCCJose E. Marchesi
The vmlinux.h file generated by bpftool makes use of compiler pragmas in order to install the CO-RE preserve_access_index in all the struct types derived from the BTF info: #ifndef __VMLINUX_H__ #define __VMLINUX_H__ #ifndef BPF_NO_PRESERVE_ACCESS_INDEX #pragma clang attribute push (__attribute__((preserve_access_index)), apply_t = record #endif [... type definitions generated from kernel BTF ... ] #ifndef BPF_NO_PRESERVE_ACCESS_INDEX #pragma clang attribute pop #endif The `clang attribute push/pop' pragmas are specific to clang/llvm and are not supported by GCC. At the moment the BTF dumping services in libbpf do not support dicriminating between types dumped because they are directly referred and types dumped because they are dependencies. A suitable API is being worked now. See [1] and [2]. In the interim, this patch changes the selftests/bpf Makefile so it passes -DBPF_NO_PRESERVE_ACCESS_INDEX to GCC when it builds the selftests. This workaround is temporary, and may have an impact on the results of the GCC-built tests. [1] https://lore.kernel.org/bpf/20240503111836.25275-1-jose.marchesi@oracle.com/T/#u [2] https://lore.kernel.org/bpf/20240504205510.24785-1-jose.marchesi@oracle.com/T/#u Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240507095011.15867-1-jose.marchesi@oracle.com
2024-05-07Merge branch 'bpf-avoid-attribute-ignored-warnings-in-gcc'Andrii Nakryiko
Jose E. Marchesi says: ==================== bpf: avoid `attribute ignored' warnings in GCC These two patches avoid warnings (turned into errors) when building the BPF selftests with GCC. [Changes from V1: - As requested by reviewer, an additional patch has been added in order to remove __hidden from the `private' macro in cpumask_common.h. - Typo bening -> benign fixed in the commit message of the second patch.] ==================== Link: https://lore.kernel.org/r/20240507074227.4523-1-jose.marchesi@oracle.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-05-07bpf: Disable some `attribute ignored' warnings in GCCJose E. Marchesi
This patch modifies selftests/bpf/Makefile to pass -Wno-attributes to GCC. This is because of the following attributes which are ignored: - btf_decl_tag - btf_type_tag There are many of these. At the moment none of these are recognized/handled by gcc-bpf. We are aware that btf_decl_tag is necessary for some of the selftest harness to communicate test failure/success. Support for it is in progress in GCC upstream: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650482.html However, the GCC master branch is not yet open, so the series above (currently under review upstream) wont be able to make it there until 14.1 gets released, probably mid next week. As for btf_type_tag, more extensive work will be needed in GCC upstream to support it in both BTF and DWARF. We have a WIP big patch for that, but that is not needed to compile/build the selftests. - used There are SEC macros defined in the selftests as: #define SEC(N) __attribute__((section(N),used)) The SEC macro is used for both functions and global variables. According to the GCC documentation `used' attribute is really only meaningful for functions, and it warns when the attribute is used for other global objects, like for example ctl_array in test_xdp_noinline.c. Ignoring this is benign. - align_value In progs/test_cls_redirect.c:127 there is: typedef uint8_t *net_ptr __attribute__((align_value(8))); GCC warns that it is ignoring this attribute, because it is not implemented by GCC. I think ignoring this attribute in GCC is benign, because according to the clang documentation [1] its purpose seems to be merely declarative and doesn't seem to translate into extra checks at run-time, only to perhaps better optimized code ("runtime behavior is undefined if the pointed memory object is not aligned to the specified alignment"). [1] https://clang.llvm.org/docs/AttributeReference.html#align-value Tested in bpf-next master. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240507074227.4523-3-jose.marchesi@oracle.com
2024-05-07bpf: Avoid __hidden__ attribute in static objectJose E. Marchesi
An object defined as `static' defaults to hidden visibility. If additionally the visibility(__weak__) compiler attribute is applied to the declaration of the object, GCC warns that the attribute gets ignored. This patch removes the only instance of this problem among the BPF selftests. Tested in bpf-next master. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240507074227.4523-2-jose.marchesi@oracle.com
2024-05-07bpf: Remove redundant page mask of vmf->addressHaiyue Wang
As the comment described in "struct vm_fault": ".address" : 'Faulting virtual address - masked' ".real_address" : 'Faulting virtual address - unmasked' The link [1] said: "Whatever the routes, all architectures end up to the invocation of handle_mm_fault() which, in turn, (likely) ends up calling __handle_mm_fault() to carry out the actual work of allocating the page tables." __handle_mm_fault() does address assignment: .address = address & PAGE_MASK, .real_address = address, This is debug dump by running `./test_progs -a "*arena*"`: [ 69.767494] arena fault: vmf->address = 10000001d000, vmf->real_address = 10000001d008 [ 69.767496] arena fault: vmf->address = 10000001c000, vmf->real_address = 10000001c008 [ 69.767499] arena fault: vmf->address = 10000001b000, vmf->real_address = 10000001b008 [ 69.767501] arena fault: vmf->address = 10000001a000, vmf->real_address = 10000001a008 [ 69.767504] arena fault: vmf->address = 100000019000, vmf->real_address = 100000019008 [ 69.769388] arena fault: vmf->address = 10000001e000, vmf->real_address = 10000001e1e8 So we can use the value of 'vmf->address' to do BPF arena kernel address space cast directly. [1] https://docs.kernel.org/mm/page_tables.html Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Link: https://lore.kernel.org/r/20240507063358.8048-1-haiyue.wang@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-07btrfs: qgroup: fix initialization of auto inherit arrayDan Carpenter
The "i++" was accidentally left out so it just sets qgids[0] over and over. This can lead to unexpected problems, as the groups[1:] would be all 0, leading to later find_qgroup_rb() unable to find a qgroup and cause snapshot creation failure. Fixes: 5343cd9364ea ("btrfs: qgroup: simple quota auto hierarchy for nested subvolumes") CC: stable@vger.kernel.org # 6.7+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: count super block write errors in device instead of tracking folio ↵Matthew Wilcox (Oracle)
error state Currently the error status of super block write is tracked in page/folio status bit Error. For that we need to keep the reference for the whole duration of write and wait. Count the number of superblock writeback errors in the btrfs_device. That means we don't need the folio to stay around until it's waited for, and can avoid the extra call to folio_get/put. Also remove a mention of PageError in a comment as it's the last mention of the page Error state. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: use the folio iterator in btrfs_end_super_write()Matthew Wilcox (Oracle)
Iterate over folios instead of bvecs. Switch the order of unlock and put to be the usual order; we know this folio can't be put until it's been waited for, but that's fragile. Remove the calls to ClearPageUptodate / SetPageUptodate -- if PAGE_SIZE is larger than BTRFS_SUPER_INFO_SIZE, we'd be marking the entire folio uptodate without having actually initialised all the bytes in the page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: convert super block writes to folio in write_dev_supers()Matthew Wilcox (Oracle)
This is a direct conversion from pages to folios, assuming single page folio. Also removes some calls to obsolete APIs and some hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: convert super block writes to folio in wait_dev_supers()Matthew Wilcox (Oracle)
This is a direct conversion from pages to folios, assuming single page folio. Also removes a few calls to compound_head() and calls to obsolete APIs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07bio: Export bio_add_folio_nofail to modulesMatthew Wilcox (Oracle)
Several modules use __bio_add_page() today and may need to be converted to bio_add_folio_nofail(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: remove duplicate included header from fs.hThorsten Blum
Remove duplicate included header file linux/blkdev.h . Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-07btrfs: add a cached state to extent_clear_unlock_delallocJosef Bacik
Now that we have the lock_extent tightly coupled with extent_clear_unlock_delalloc we can add a cached state to extent_clear_unlock_delalloc and benefit from skipping the extra lookup when we're doing cow. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>