summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-07-27Merge branch 'bpf-support-new-insns-from-cpu-v4'Alexei Starovoitov
Yonghong Song says: ==================== bpf: Support new insns from cpu v4 In previous discussion ([1]), it is agreed that we should introduce cpu version 4 (llvm flag -mcpu=v4) which contains some instructions which can simplify code, make code easier to understand, fix the existing problem, or simply for feature completeness. More specifically, the following new insns are proposed: . sign extended load . sign extended mov . bswap . signed div/mod . ja with 32-bit offset This patch set added kernel support for insns proposed in [1] except BPF_ST which already has full kernel support. Beside the above proposed insns, LLVM will generate BPF_ST insn as well under -mcpu=v4. The llvm patch ([2]) has been merged into llvm-project 'main' branch. The patchset implements interpreter, jit and verifier support for these new insns. For this patch set, I tested cpu v2/v3/v4 and the selftests are all passed. I also tested selftests introduced in this patch set with additional changes beside normal jit testing (bpf_jit_enable = 1 and bpf_jit_harden = 0) - bpf_jit_enable = 0 - bpf_jit_enable = 1 and bpf_jit_harden = 1 and both testing passed. [1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/ [2] https://reviews.llvm.org/D144829 Changelogs: v4 -> v5: . for v4, patch 8/17 missed in mailing list and patchwork, so resend. . rebase on top of master v3 -> v4: . some minor asm syntax adjustment based on llvm change. . add clang version and target arch guard for new tests so they can still compile with old llvm compilers. . some changes to the bpf doc. v2 -> v3: . add missed disasm change from v2. . handle signed load of ctx fields properly. . fix some interpreter sdiv/smod error when bpf_jit_enable = 0. . fix some verifier range bounding errors. . add more C tests. RFCv1 -> v2: . add more verifier supports for signed extend load and mov insns. . rename some insn names to be more consistent with intel practice. . add cpuv4 test runner for test progs. . add more unit and C tests. . add documentation. ==================== Link: https://lore.kernel.org/r/20230728011143.3710005-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27docs/bpf: Add documentation for new instructionsYonghong Song
Add documentation in instruction-set.rst for new instruction encoding and their corresponding operations. Also removed the question related to 'no BPF_SDIV' in bpf_design_QA.rst since we have BPF_SDIV insn now. Cc: bpf@ietf.org Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011342.3724411-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Test ldsx with more complex casesYonghong Song
The following ldsx cases are tested: - signed readonly map value - read/write map value - probed memory - not-narrowed ctx field access - narrowed ctx field access. Without previous proper verifier/git handling, the test will fail. If cpuv4 is not supported either by compiler or by jit, the test will be skipped. # ./test_progs -t ldsx_insn #113/1 ldsx_insn/map_val and probed_memory:SKIP #113/2 ldsx_insn/ctx_member_sign_ext:SKIP #113/3 ldsx_insn/ctx_member_narrow_sign_ext:SKIP #113 ldsx_insn:SKIP Summary: 1/0 PASSED, 3 SKIPPED, 0 FAILED Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011336.3723434-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Add unit tests for new gotol insnYonghong Song
Add unit tests for gotol insn. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011329.3721881-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Add unit tests for new sdiv/smod insnsYonghong Song
Add unit tests for sdiv/smod insns. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011321.3720500-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Add unit tests for new bswap insnsYonghong Song
Add unit tests for bswap insns. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011314.3720109-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Add unit tests for new sign-extension mov insnsYonghong Song
Add unit tests for movsx insns. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011309.3719295-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Add unit tests for new sign-extension load insnsYonghong Song
Add unit tests for new ldsx insns. The test includes sign-extension with a single value or with a value range. If cpuv4 is not supported due to (1) older compiler, e.g., less than clang version 18, or (2) test runner test_progs and test_progs-no_alu32 which tests cpu v2 and v3, or (3) non-x86_64 arch not supporting new insns in jit yet, a dummy program is added with below output: #318/1 verifier_ldsx/cpuv4 is not supported by compiler or jit, use a dummy test:OK #318 verifier_ldsx:OK to indicate the test passed with a dummy test instead of actually testing cpuv4. I am using a dummy prog to avoid changing the verifier testing infrastructure. Once clang 18 is widely available and other architectures support cpuv4, at least for CI run, the dummy program can be removed. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011304.3719139-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Add a cpuv4 test runner for cpu=v4 testingYonghong Song
Similar to no-alu32 runner, if clang compiler supports -mcpu=v4, a cpuv4 runner is created to test bpf programs compiled with -mcpu=v4. The following are some num-of-insn statistics for each newer instructions based on existing selftests, excluding subsequent cpuv4 insn specific tests. insn pattern # of instructions reg = (s8)reg 4 reg = (s16)reg 4 reg = (s32)reg 144 reg = *(s8 *)(reg + off) 13 reg = *(s16 *)(reg + off) 14 reg = *(s32 *)(reg + off) 15215 reg = bswap16 reg 142 reg = bswap32 reg 38 reg = bswap64 reg 14 reg s/= reg 0 reg s%= reg 0 gotol <offset> 58 Note that in llvm -mcpu=v4 implementation, the compiler is a little bit conservative about generating 'gotol' insn (32-bit branch offset) as it didn't precise count the number of insns (e.g., some insns are debug insns, etc.). Compared to old 'goto' insn, newer 'gotol' insn should have comparable verification states to 'goto' insn. With current patch set, all selftests passed with -mcpu=v4 when running test_progs-cpuv4 binary. The -mcpu=v3 and -mcpu=v2 run are also successful. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011250.3718252-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27selftests/bpf: Fix a test_verifier failureYonghong Song
The following test_verifier subtest failed due to new encoding for BSWAP. $ ./test_verifier ... #99/u invalid 64-bit BPF_END FAIL Unexpected success to load! verification time 215 usec stack depth 0 processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 #99/p invalid 64-bit BPF_END FAIL Unexpected success to load! verification time 198 usec stack depth 0 processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Tighten the test so it still reports a failure. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011244.3717464-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Add kernel/bpftool asm support for new instructionsYonghong Song
Add asm support for new instructions so kernel verifier and bpftool xlated insn dumps can have proper asm syntax for new instructions. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28Merge tag 'drm-intel-fixes-2023-07-27' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Use shmem for dpt objects [dpt] (Radhakrishna Sripada) - Fix an error handling path in igt_write_huge() (Christophe JAILLET) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZMI4Mtom7pDhLB7M@tursulin-desk
2023-07-27bpf: Support new 32bit offset jmp instructionYonghong Song
Add interpreter/jit/verifier support for 32bit offset jmp instruction. If a conditional jmp instruction needs more than 16bit offset, it can be simulated with a conditional jmp + a 32bit jmp insn. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011231.3716103-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Fix jit blinding with new sdiv/smov insnsYonghong Song
Handle new insns properly in bpf_jit_blind_insn() function. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Support new signed div/mod instructions.Yonghong Song
Add interpreter/jit support for new signed div/mod insns. The new signed div/mod instructions are encoded with unsigned div/mod instructions plus insn->off == 1. Also add basic verifier support to ensure new insns get accepted. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011219.3714605-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Support new unconditional bswap instructionYonghong Song
The existing 'be' and 'le' insns will do conditional bswap depends on host endianness. This patch implements unconditional bswap insns. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011213.3712808-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Handle sign-extenstin ctx member accessesYonghong Song
Currently, if user accesses a ctx member with signed types, the compiler will generate an unsigned load followed by necessary left and right shifts. With the introduction of sign-extension load, compiler may just emit a ldsx insn instead. Let us do a final movsx sign extension to the final unsigned ctx load result to satisfy original sign extension requirement. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011207.3712528-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Support new sign-extension mov insnsYonghong Song
Add interpreter/jit support for new sign-extension mov insns. The original 'MOV' insn is extended to support reg-to-reg signed version for both ALU and ALU64 operations. For ALU mode, the insn->off value of 8 or 16 indicates sign-extension from 8- or 16-bit value to 32-bit value. For ALU64 mode, the insn->off value of 8/16/32 indicates sign-extension from 8-, 16- or 32-bit value to 64-bit value. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011202.3712300-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Support new sign-extension load insnsYonghong Song
Add interpreter/jit support for new sign-extension load insns which adds a new mode (BPF_MEMSX). Also add verifier support to recognize these insns and to do proper verification with new insns. In verifier, besides to deduce proper bounds for the dst_reg, probed memory access is also properly handled. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-28Merge tag 'drm-misc-fixes-2023-07-27' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A single patch to remove an unused function. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/dqvxednqyab5t7gvwvcq72x6yu7ug5gusmhpgs3kq6z7pf3co6@ofr6s7547gbe
2023-07-27bpf, docs: fix BPF_NEG entry in instruction-set.rstJose E. Marchesi
This patch fixes the documentation of the BPF_NEG instruction to denote that it does not use the source register operand. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Acked-by: Dave Thaler <dthaler@microsoft.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230726092543.6362-1-jose.marchesi@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27MAINTAINERS: stmmac: retire Giuseppe CavallaroJakub Kicinski
I tried to get stmmac maintainers to be more active by agreeing with them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks during his "shift month", no reviews are flowing. All the contributions are much appreciated! But stmmac is quite active, we need participating maintainers :( Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230726151120.1649474-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27net: dsa: fix older DSA drivers using phylinkRussell King (Oracle)
Older DSA drivers that do not provide an dsa_ops adjust_link method end up using phylink. Unfortunately, a recent phylink change that requires its supported_interfaces bitmap to be filled breaks these drivers because the bitmap remains empty. Rather than fixing each driver individually, fix it in the core code so we have a sensible set of defaults. Reported-by: Sergei Antonov <saproj@gmail.com> Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27net: datalink: Remove unused declarationsYueHaibing
These declarations is not used after ipx protocol removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230726144054.28780-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27net: Remove unused declaration dev_restart()YueHaibing
This is not used, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230726143715.24700-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27dccp: Remove unused declaration dccp_feat_initialise_sysctls()YueHaibing
This is never used, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230726143239.9904-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE lengthLin Ma
There are totally 9 ndo_bridge_setlink handlers in the current kernel, which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3) i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5) ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7) nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink. By investigating the code, we find that 1-7 parse and use nlattr IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 2 byte integer. To avoid such issues, also for other ndo_bridge_setlink handlers in the future. This patch adds the nla_len check in rtnl_bridge_setlink and does an early error return if length mismatches. To make it works, the break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure this nla_for_each_nested iterates every attribute. Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27bridge: Remove unused declaration br_multicast_set_hash_max()YueHaibing
Since commit 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable") this is not used, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230726143141.11704-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27net: remove comment in ndisc_router_discoveryPatrick Rohr
Removes superfluous (and misplaced) comment from ndisc_router_discovery. Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230726184742.342825-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28ata: pata_ns87415: mark ns87560_tf_read staticArnd Bergmann
The global function triggers a warning because of the missing prototype drivers/ata/pata_ns87415.c:263:6: warning: no previous prototype for 'ns87560_tf_read' [-Wmissing-prototypes] 263 | void ns87560_tf_read(struct ata_port *ap, struct ata_taskfile *tf) There are no other references to this, so just make it static. Fixes: c4b5b7b6c4423 ("pata_ns87415: Initial cut at 87415/87560 IDE support") Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-07-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27mm/memory-failure: fix hardware poison check in unpoison_memory()Sidhartha Kumar
It was pointed out[1] that using folio_test_hwpoison() is wrong as we need to check the indiviual page that has poison. folio_test_hwpoison() only checks the head page so go back to using PageHWPoison(). User-visible effects include existing hwpoison-inject tests possibly failing as unpoisoning a single subpage could lead to unpoisoning an entire folio. Memory unpoisoning could also not work as expected as the function will break early due to only checking the head page and not the actually poisoned subpage. [1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/ Link: https://lkml.kernel.org/r/20230717181812.167757-1-sidhartha.kumar@oracle.com Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27proc/vmcore: fix signedness bug in read_from_oldmem()Dan Carpenter
The bug is the error handling: if (tmp < nr_bytes) { "tmp" can hold negative error codes but because "nr_bytes" is type size_t the negative error codes are treated as very high positive values (success). Fix this by changing "nr_bytes" to type ssize_t. The "nr_bytes" variable is used to store values between 1 and PAGE_SIZE and they can fit in ssize_t without any issue. Link: https://lkml.kernel.org/r/b55f7eed-1c65-4adc-95d1-6c7c65a54a6e@moroto.mountain Fixes: 5d8de293c224 ("vmcore: convert copy_oldmem_page() to take an iov_iter") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27mailmap: update remaining active codeaurora.org email addressesBjorn Andersson
The lack of mailmap updates for @codeaurora.org addresses reduces the usefulness of tools such as get_maintainer.pl. Some recent (and welcome!) additions has been made to improve the situation, this concludes the effort. Link: https://lkml.kernel.org/r/20230720210256.1296567-1-quic_bjorande@quicinc.com Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27mm: lock VMA in dup_anon_vma() before setting ->anon_vmaJann Horn
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the VMA that is being expanded to cover the area previously occupied by another VMA. This currently happens while `dst` is not write-locked. This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent page faults can happen on `dst` under the per-VMA lock. This is already icky in itself, since such page faults can now install pages into `dst` that are attached to an `anon_vma` that is not yet tied back to the `anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due to an out-of-memory error, things get much worse: `anon_vma_clone()` then reverts `dst->anon_vma` back to NULL, and `dst` remains completely unconnected to the `anon_vma`, even though we can have pages in the area covered by `dst` that point to the `anon_vma`. This means the `anon_vma` of such pages can be freed while the pages are still mapped into userspace, which leads to UAF when a helper like folio_lock_anon_vma_read() tries to look up the anon_vma of such a page. This theoretically is a security bug, but I believe it is really hard to actually trigger as an unprivileged user because it requires that you can make an order-0 GFP_KERNEL allocation fail, and the page allocator tries pretty hard to prevent that. I think doing the vma_start_write() call inside dup_anon_vma() is the most straightforward fix for now. For a kernel-assisted reproducer, see the notes section of the patch mail. Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27mm: fix memory ordering for mm_lock_seq and vm_lock_seqJann Horn
mm->mm_lock_seq effectively functions as a read/write lock; therefore it must be used with acquire/release semantics. A specific example is the interaction between userfaultfd_register() and lock_vma_under_rcu(). userfaultfd_register() does the following from the point where it changes a VMA's flags to the point where concurrent readers are permitted again (in a simple scenario where only a single private VMA is accessed and no merging/splitting is involved): userfaultfd_register userfaultfd_set_vm_flags vm_flags_reset vma_start_write down_write(&vma->vm_lock->lock) vma->vm_lock_seq = mm_lock_seq [marks VMA as busy] up_write(&vma->vm_lock->lock) vm_flags_init [sets VM_UFFD_* in __vm_flags] vma->vm_userfaultfd_ctx.ctx = ctx mmap_write_unlock vma_end_write_all WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA] There are no memory barriers in between the __vm_flags update and the mm->mm_lock_seq update that unlocks the VMA, so the unlock can be reordered to above the `vm_flags_init()` call, which means from the perspective of a concurrent reader, a VMA can be marked as a userfaultfd VMA while it is not VMA-locked. That's bad, we definitely need a store-release for the unlock operation. The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly fine because all accesses to vma->vm_lock_seq that matter are always protected by the VMA lock. There is a racy read in vma_start_read() though that can tolerate false-positives, so we should be using WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN). On the other side, lock_vma_under_rcu() works as follows in the relevant region for locking and userfaultfd check: lock_vma_under_rcu vma_start_read vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout] down_read_trylock(&vma->vm_lock->lock) vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check] userfaultfd_armed checks vma->vm_flags & __VM_UFFD_FLAGS Here, the interesting aspect is how far down the mm->mm_lock_seq read can be reordered - if this read is reordered down below the vma->vm_flags access, this could cause lock_vma_under_rcu() to partly operate on information that was read while the VMA was supposed to be locked. To prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we need to read it with a load-acquire. Some of the comment wording is based on suggestions by Suren. BACKPORT WARNING: One of the functions changed by this patch (which I've written against Linus' tree) is vma_try_start_write(), but this function no longer exists in mm/mm-everything. I don't know whether the merged version of this patch will be ordered before or after the patch that removes vma_try_start_write(). If you're backporting this patch to a tree with vma_try_start_write(), make sure this patch changes that function. Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it") Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27scripts/spelling.txt: remove 'thead' as a typoDrew Fustini
T-Head is a vendor of processor core IP, and they have recently introduced the RISC-V TH1520 SoC. Remove 'thead' as a typo of 'thread' to avoid checkpatch incorrectly warning that 'thead' is typo in patches that add support for T-Head designs in the kernel. Link: https://lkml.kernel.org/r/20230723010329.674186-1-dfustini@baylibre.com Link: https://www.t-head.cn/ Signed-off-by: Drew Fustini <dfustini@baylibre.com> Acked-by: Guo Ren <guoren@kernel.org> Cc: Conor Dooley <conor@kernel.org> Cc: Jisheng Zhang <jszhang@kernel.org> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Diederik de Haas <didi.debian@cknow.org> Cc: Ian Rogers <irogers@google.com> Cc: Luca Ceresoli <luca.ceresoli@bootlin.com> # versaclock5 Cc: Randy Dunlap <rdunlap@infradead.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27mm/pagewalk: fix EFI_PGT_DUMP of espfix areaHugh Dickins
Booting x86_64 with CONFIG_EFI_PGT_DUMP=y shows messages of the form "mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)". EFI_PGT_DUMP dumps all of efi_mm, including the espfix area, which is set up with pmd entries which fit the pmd_bad() check: so 0d940a9b270b warns and clears those entries, which would ruin running Win16 binaries. The failing pte_offset_map() stopped such a kernel from even booting, until a few commits later be872f83bf57 changed the pagewalk to tolerate that: but it needs to be even more careful, to not spoil those entries. I might have preferred to change init_espfix_ap() not to use "bad" pmd entries; or to leave them out of the efi_mm dump. But there is great value in staying away from there, and a pagewalk check of address against TASK_SIZE may protect from other such aberrations too. Link: https://lkml.kernel.org/r/22bca736-4cab-9ee5-6a52-73a3b2bbe865@google.com Closes: https://lore.kernel.org/linux-mm/CABXGCsN3JqXckWO=V7p=FhPU1tK03RE1w9UE6xL5Y86SMk209w@mail.gmail.com/ Fixes: 0d940a9b270b ("mm/pgtable: allow pte_offset_map[_lock]() to fail") Fixes: be872f83bf57 ("mm/pagewalk: walk_pte_range() allow for pte_offset_map()") Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27shmem: minor fixes to splice-read implementationHugh Dickins
HWPoison: my reading of folio_test_hwpoison() is that it only tests the head page of a large folio, whereas splice_folio_into_pipe() will splice as much of the folio as it can: so for safety we should also check the has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned. (Perhaps that ugliness can be improved at the mm end later.) The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask it for "part" not "len". Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com Fixes: bd194b187115 ("shmem: Implement splice-read") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27tmpfs: fix Documentation of noswap and huge mount optionsHugh Dickins
The noswap mount option is surely not one of the three options for sizing: move its description down. The huge= mount option does not accept numeric values: those are just in an internal enum. Delete those numbers, and follow the manpage text more closely (but there's not yet any fadvise() or fcntl() which applies here). /sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and barely relevant to mounting a tmpfs: just refer to transhuge.rst (while still using the words deny and force, to help as informal reminders). [rdunlap@infradead.org: fixup Docs table for huge mount options] Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Fixes: d0f5a85442d1 ("shmem: update documentation") Fixes: 2c6efe9cf2d7 ("shmem: add support to ignore swap") Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27Revert "um: Use swap() to make code cleaner"Andy Shevchenko
This reverts commit 9b0da3f22307af693be80f5d3a89dc4c7f360a85. The sigio.c is clearly user space code which is handled by arch/um/scripts/Makefile.rules (see USER_OBJS rule). The above mentioned commit simply broke this agreement, we may not use Linux kernel internal headers in them without thorough thinking. Hence, revert the wrong commit. Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/ Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Herve Codina <herve.codina@bootlin.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Richard Weinberger <richard@nod.at> Cc: Yang Guang <yang.guang5@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27mm/damon/core-test: initialise context before test in damon_test_set_attrs()Feng Tang
Running kunit test for 6.5-rc1 hits one bug: ok 10 damon_test_update_monitoring_result general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G N 6.5.0-rc2 #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:damon_set_attrs+0xb9/0x120 Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d 8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00 49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89 RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246 RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70 RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78 R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> damon_test_set_attrs+0x63/0x1f0 kunit_generic_run_threadfn_adapter+0x17/0x30 kthread+0xfd/0x130 The problem seems to be related with the damon_ctx was used without being initialized. Fix it by adding the initialization. Link: https://lkml.kernel.org/r/20230718052811.1065173-1-feng.tang@intel.com Fixes: aa13779be6b7 ("mm/damon/core-test: add a test for damon_set_attrs()") Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-27Merge tag 'net-6.5-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, netfilter. Current release - regressions: - core: fix splice_to_socket() for O_NONBLOCK socket - af_unix: fix fortify_panic() in unix_bind_bsd(). - can: raw: fix lockdep issue in raw_release() Previous releases - regressions: - tcp: reduce chance of collisions in inet6_hashfn(). - netfilter: skip immediate deactivate in _PREPARE_ERROR - tipc: stop tipc crypto on failure in tipc_node_create - eth: igc: fix kernel panic during ndo_tx_timeout callback - eth: iavf: fix potential deadlock on allocation failure Previous releases - always broken: - ipv6: fix bug where deleting a mngtmpaddr can create a new temporary address - eth: ice: fix memory management in ice_ethtool_fdir.c - eth: hns3: fix the imp capability bit cannot exceed 32 bits issue - eth: vxlan: calculate correct header length for GPE - eth: stmmac: apply redundant write work around on 4.xx too" * tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits) tipc: stop tipc crypto on failure in tipc_node_create af_unix: Terminate sun_path when bind()ing pathname socket. tipc: check return value of pskb_trim() benet: fix return value check in be_lancer_xmit_workarounds() virtio-net: fix race between set queues and probe net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64 splice, net: Fix splice_to_socket() for O_NONBLOCK socket net: fec: tx processing does not call XDP APIs if budget is 0 mptcp: more accurate NL event generation selftests: mptcp: join: only check for ip6tables if needed tools: ynl-gen: fix parse multi-attr enum attribute tools: ynl-gen: fix enum index in _decode_enum(..) netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR netfilter: nft_set_rbtree: fix overlap expiration walk igc: Fix Kernel Panic during ndo_tx_timeout callback net: dsa: qca8k: fix mdb add/del case with 0 VID net: dsa: qca8k: fix broken search_and_del net: dsa: qca8k: fix search_and_insert wrong handling of new rule net: dsa: qca8k: enable use_single_write for qca8xxx ...
2023-07-27Merge tag 'soundwire-6.5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fixes from Vinod Koul: - Core fix for enumeration completion - Qualcomm driver fix to update status - AMD driver fix for probe error check * tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: soundwire: amd: Fix a check for errors in probe() soundwire: qcom: update status correctly with mask soundwire: fix enumeration completion
2023-07-27Merge tag 'phy-fixes-6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy fixes from Vinod Koul: - Out of bound fix for hisilicon phy - Qualcomm synopsis femto phy for keeping clock enabled during suspend and enabling ref clocks - Mediatek driver fixes for upper limit test and error code * tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe() phy: qcom-snps-femto-v2: use qcom_snps_hsphy_suspend/resume error code phy: qcom-snps-femto-v2: properly enable ref clock phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend phy: mediatek: hdmi: mt8195: fix prediv bad upper limit test phy: phy-mtk-dp: Fix an error code in probe()
2023-07-27Merge tag 'for-6.5-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix accounting of global block reserve size when block group tree is enabled - the async discard has been enabled in 6.2 unconditionally, but for zoned mode it does not make that much sense to do it asynchronously as the zones are reset as needed - error handling and proper error value propagation fixes * tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: check for commit error at btrfs_attach_transaction_barrier() btrfs: check if the transaction was aborted at btrfs_wait_for_commit() btrfs: remove BUG_ON()'s in add_new_free_space() btrfs: account block group tree when calculating global reserve size btrfs: zoned: do not enable async discard
2023-07-27Merge tag 'fixes-2023-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fix from Mike Rapoport: "A call to memblock_free() or memblock_phys_free() issued after memblock data is discarded will result in use after free in memblock_isolate_range(). Avoid those issues by making sure that memblock_discard points memblock.reserved.regions back at the static buffer" * tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: mm,memblock: reset memblock.reserved to system init state to prevent UAF
2023-07-27net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefixJiri Pirko
As esw_offloads_load/unload_rep() are used outside eswitch.c it is nicer for them to have "mlx5_" prefix. Add it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27net/mlx5: Make mlx5_eswitch_load/unload_vport() staticJiri Pirko
mlx5_eswitch_load/unload_vport()() functions are not used outside of eswitch.c. Make them static. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-27net/mlx5: Make mlx5_esw_offloads_rep_load/unload() staticJiri Pirko
mlx5_esw_offloads_rep_load/unload() functions are not used outside of eswitch_offloads.c. Make them static. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>