Age | Commit message (Collapse) | Author |
|
Yonghong Song says:
====================
bpf: Support new insns from cpu v4
In previous discussion ([1]), it is agreed that we should introduce
cpu version 4 (llvm flag -mcpu=v4) which contains some instructions
which can simplify code, make code easier to understand, fix the
existing problem, or simply for feature completeness. More specifically,
the following new insns are proposed:
. sign extended load
. sign extended mov
. bswap
. signed div/mod
. ja with 32-bit offset
This patch set added kernel support for insns proposed in [1] except
BPF_ST which already has full kernel support. Beside the above proposed
insns, LLVM will generate BPF_ST insn as well under -mcpu=v4.
The llvm patch ([2]) has been merged into llvm-project 'main' branch.
The patchset implements interpreter, jit and verifier support for these new
insns.
For this patch set, I tested cpu v2/v3/v4 and the selftests are all passed.
I also tested selftests introduced in this patch set with additional changes
beside normal jit testing (bpf_jit_enable = 1 and bpf_jit_harden = 0)
- bpf_jit_enable = 0
- bpf_jit_enable = 1 and bpf_jit_harden = 1
and both testing passed.
[1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
[2] https://reviews.llvm.org/D144829
Changelogs:
v4 -> v5:
. for v4, patch 8/17 missed in mailing list and patchwork, so resend.
. rebase on top of master
v3 -> v4:
. some minor asm syntax adjustment based on llvm change.
. add clang version and target arch guard for new tests
so they can still compile with old llvm compilers.
. some changes to the bpf doc.
v2 -> v3:
. add missed disasm change from v2.
. handle signed load of ctx fields properly.
. fix some interpreter sdiv/smod error when bpf_jit_enable = 0.
. fix some verifier range bounding errors.
. add more C tests.
RFCv1 -> v2:
. add more verifier supports for signed extend load and mov insns.
. rename some insn names to be more consistent with intel practice.
. add cpuv4 test runner for test progs.
. add more unit and C tests.
. add documentation.
====================
Link: https://lore.kernel.org/r/20230728011143.3710005-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add documentation in instruction-set.rst for new instruction encoding
and their corresponding operations. Also removed the question
related to 'no BPF_SDIV' in bpf_design_QA.rst since we have
BPF_SDIV insn now.
Cc: bpf@ietf.org
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011342.3724411-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The following ldsx cases are tested:
- signed readonly map value
- read/write map value
- probed memory
- not-narrowed ctx field access
- narrowed ctx field access.
Without previous proper verifier/git handling, the test will fail.
If cpuv4 is not supported either by compiler or by jit,
the test will be skipped.
# ./test_progs -t ldsx_insn
#113/1 ldsx_insn/map_val and probed_memory:SKIP
#113/2 ldsx_insn/ctx_member_sign_ext:SKIP
#113/3 ldsx_insn/ctx_member_narrow_sign_ext:SKIP
#113 ldsx_insn:SKIP
Summary: 1/0 PASSED, 3 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011336.3723434-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add unit tests for gotol insn.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011329.3721881-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add unit tests for sdiv/smod insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011321.3720500-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add unit tests for bswap insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011314.3720109-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add unit tests for movsx insns.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011309.3719295-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add unit tests for new ldsx insns. The test includes sign-extension
with a single value or with a value range.
If cpuv4 is not supported due to
(1) older compiler, e.g., less than clang version 18, or
(2) test runner test_progs and test_progs-no_alu32 which tests
cpu v2 and v3, or
(3) non-x86_64 arch not supporting new insns in jit yet,
a dummy program is added with below output:
#318/1 verifier_ldsx/cpuv4 is not supported by compiler or jit, use a dummy test:OK
#318 verifier_ldsx:OK
to indicate the test passed with a dummy test instead of actually
testing cpuv4. I am using a dummy prog to avoid changing the
verifier testing infrastructure. Once clang 18 is widely available
and other architectures support cpuv4, at least for CI run,
the dummy program can be removed.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011304.3719139-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Similar to no-alu32 runner, if clang compiler supports -mcpu=v4,
a cpuv4 runner is created to test bpf programs compiled with
-mcpu=v4.
The following are some num-of-insn statistics for each newer
instructions based on existing selftests, excluding subsequent
cpuv4 insn specific tests.
insn pattern # of instructions
reg = (s8)reg 4
reg = (s16)reg 4
reg = (s32)reg 144
reg = *(s8 *)(reg + off) 13
reg = *(s16 *)(reg + off) 14
reg = *(s32 *)(reg + off) 15215
reg = bswap16 reg 142
reg = bswap32 reg 38
reg = bswap64 reg 14
reg s/= reg 0
reg s%= reg 0
gotol <offset> 58
Note that in llvm -mcpu=v4 implementation, the compiler is a little
bit conservative about generating 'gotol' insn (32-bit branch offset)
as it didn't precise count the number of insns (e.g., some insns are
debug insns, etc.). Compared to old 'goto' insn, newer 'gotol' insn
should have comparable verification states to 'goto' insn.
With current patch set, all selftests passed with -mcpu=v4
when running test_progs-cpuv4 binary. The -mcpu=v3 and -mcpu=v2 run
are also successful.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011250.3718252-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The following test_verifier subtest failed due to
new encoding for BSWAP.
$ ./test_verifier
...
#99/u invalid 64-bit BPF_END FAIL
Unexpected success to load!
verification time 215 usec
stack depth 0
processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
#99/p invalid 64-bit BPF_END FAIL
Unexpected success to load!
verification time 198 usec
stack depth 0
processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Tighten the test so it still reports a failure.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011244.3717464-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add asm support for new instructions so kernel verifier and bpftool
xlated insn dumps can have proper asm syntax for new instructions.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Use shmem for dpt objects [dpt] (Radhakrishna Sripada)
- Fix an error handling path in igt_write_huge() (Christophe JAILLET)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZMI4Mtom7pDhLB7M@tursulin-desk
|
|
Add interpreter/jit/verifier support for 32bit offset jmp instruction.
If a conditional jmp instruction needs more than 16bit offset,
it can be simulated with a conditional jmp + a 32bit jmp insn.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011231.3716103-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Handle new insns properly in bpf_jit_blind_insn() function.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011225.3715812-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add interpreter/jit support for new signed div/mod insns.
The new signed div/mod instructions are encoded with
unsigned div/mod instructions plus insn->off == 1.
Also add basic verifier support to ensure new insns get
accepted.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011219.3714605-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The existing 'be' and 'le' insns will do conditional bswap
depends on host endianness. This patch implements
unconditional bswap insns.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011213.3712808-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, if user accesses a ctx member with signed types,
the compiler will generate an unsigned load followed by
necessary left and right shifts.
With the introduction of sign-extension load, compiler may
just emit a ldsx insn instead. Let us do a final movsx sign
extension to the final unsigned ctx load result to
satisfy original sign extension requirement.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011207.3712528-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add interpreter/jit support for new sign-extension mov insns.
The original 'MOV' insn is extended to support reg-to-reg
signed version for both ALU and ALU64 operations. For ALU mode,
the insn->off value of 8 or 16 indicates sign-extension
from 8- or 16-bit value to 32-bit value. For ALU64 mode,
the insn->off value of 8/16/32 indicates sign-extension
from 8-, 16- or 32-bit value to 64-bit value.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011202.3712300-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add interpreter/jit support for new sign-extension load insns
which adds a new mode (BPF_MEMSX).
Also add verifier support to recognize these insns and to
do proper verification with new insns. In verifier, besides
to deduce proper bounds for the dst_reg, probed memory access
is also properly handled.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A single patch to remove an unused function.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/dqvxednqyab5t7gvwvcq72x6yu7ug5gusmhpgs3kq6z7pf3co6@ofr6s7547gbe
|
|
This patch fixes the documentation of the BPF_NEG instruction to
denote that it does not use the source register operand.
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Acked-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230726092543.6362-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
I tried to get stmmac maintainers to be more active by agreeing with
them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks
during his "shift month", no reviews are flowing.
All the contributions are much appreciated! But stmmac is quite
active, we need participating maintainers :(
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726151120.1649474-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Older DSA drivers that do not provide an dsa_ops adjust_link method end
up using phylink. Unfortunately, a recent phylink change that requires
its supported_interfaces bitmap to be filled breaks these drivers
because the bitmap remains empty.
Rather than fixing each driver individually, fix it in the core code so
we have a sensible set of defaults.
Reported-by: Sergei Antonov <saproj@gmail.com>
Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com> # dsa_loop
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These declarations is not used after ipx protocol removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726144054.28780-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is not used, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726143715.24700-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is never used, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230726143239.9904-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are totally 9 ndo_bridge_setlink handlers in the current kernel,
which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.
By investigating the code, we find that 1-7 parse and use nlattr
IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
length 0) to be viewed as a 2 byte integer.
To avoid such issues, also for other ndo_bridge_setlink handlers in the
future. This patch adds the nla_len check in rtnl_bridge_setlink and
does an early error return if length mismatches. To make it works, the
break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
this nla_for_each_nested iterates every attribute.
Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
this is not used, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20230726143141.11704-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Removes superfluous (and misplaced) comment from ndisc_router_discovery.
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230726184742.342825-1-prohr@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The global function triggers a warning because of the missing prototype
drivers/ata/pata_ns87415.c:263:6: warning: no previous prototype for 'ns87560_tf_read' [-Wmissing-prototypes]
263 | void ns87560_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
There are no other references to this, so just make it static.
Fixes: c4b5b7b6c4423 ("pata_ns87415: Initial cut at 87415/87560 IDE support")
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It was pointed out[1] that using folio_test_hwpoison() is wrong as we need
to check the indiviual page that has poison. folio_test_hwpoison() only
checks the head page so go back to using PageHWPoison().
User-visible effects include existing hwpoison-inject tests possibly
failing as unpoisoning a single subpage could lead to unpoisoning an
entire folio. Memory unpoisoning could also not work as expected as
the function will break early due to only checking the head page and
not the actually poisoned subpage.
[1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
Link: https://lkml.kernel.org/r/20230717181812.167757-1-sidhartha.kumar@oracle.com
Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The bug is the error handling:
if (tmp < nr_bytes) {
"tmp" can hold negative error codes but because "nr_bytes" is type size_t
the negative error codes are treated as very high positive values
(success). Fix this by changing "nr_bytes" to type ssize_t. The
"nr_bytes" variable is used to store values between 1 and PAGE_SIZE and
they can fit in ssize_t without any issue.
Link: https://lkml.kernel.org/r/b55f7eed-1c65-4adc-95d1-6c7c65a54a6e@moroto.mountain
Fixes: 5d8de293c224 ("vmcore: convert copy_oldmem_page() to take an iov_iter")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The lack of mailmap updates for @codeaurora.org addresses reduces the
usefulness of tools such as get_maintainer.pl. Some recent (and welcome!)
additions has been made to improve the situation, this concludes the
effort.
Link: https://lkml.kernel.org/r/20230720210256.1296567-1-quic_bjorande@quicinc.com
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
T-Head is a vendor of processor core IP, and they have recently introduced
the RISC-V TH1520 SoC. Remove 'thead' as a typo of 'thread' to avoid
checkpatch incorrectly warning that 'thead' is typo in patches that add
support for T-Head designs in the kernel.
Link: https://lkml.kernel.org/r/20230723010329.674186-1-dfustini@baylibre.com
Link: https://www.t-head.cn/
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Diederik de Haas <didi.debian@cknow.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Luca Ceresoli <luca.ceresoli@bootlin.com> # versaclock5
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Booting x86_64 with CONFIG_EFI_PGT_DUMP=y shows messages of the form
"mm/pgtable-generic.c:53: bad pmd (____ptrval____)(8000000100077061)".
EFI_PGT_DUMP dumps all of efi_mm, including the espfix area, which is set
up with pmd entries which fit the pmd_bad() check: so 0d940a9b270b warns
and clears those entries, which would ruin running Win16 binaries.
The failing pte_offset_map() stopped such a kernel from even booting,
until a few commits later be872f83bf57 changed the pagewalk to tolerate
that: but it needs to be even more careful, to not spoil those entries.
I might have preferred to change init_espfix_ap() not to use "bad" pmd
entries; or to leave them out of the efi_mm dump. But there is great
value in staying away from there, and a pagewalk check of address against
TASK_SIZE may protect from other such aberrations too.
Link: https://lkml.kernel.org/r/22bca736-4cab-9ee5-6a52-73a3b2bbe865@google.com
Closes: https://lore.kernel.org/linux-mm/CABXGCsN3JqXckWO=V7p=FhPU1tK03RE1w9UE6xL5Y86SMk209w@mail.gmail.com/
Fixes: 0d940a9b270b ("mm/pgtable: allow pte_offset_map[_lock]() to fail")
Fixes: be872f83bf57 ("mm/pagewalk: walk_pte_range() allow for pte_offset_map()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
HWPoison: my reading of folio_test_hwpoison() is that it only tests the
head page of a large folio, whereas splice_folio_into_pipe() will splice
as much of the folio as it can: so for safety we should also check the
has_hwpoisoned flag, set if any of the folio's pages are hwpoisoned.
(Perhaps that ugliness can be improved at the mm end later.)
The call to splice_zeropage_into_pipe() risked overrunning past EOF: ask
it for "part" not "len".
Link: https://lkml.kernel.org/r/32c72c9c-72a8-115f-407d-f0148f368@google.com
Fixes: bd194b187115 ("shmem: Implement splice-read")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The noswap mount option is surely not one of the three options for sizing:
move its description down.
The huge= mount option does not accept numeric values: those are just in
an internal enum. Delete those numbers, and follow the manpage text more
closely (but there's not yet any fadvise() or fcntl() which applies here).
/sys/kernel/mm/transparent_hugepage/shmem_enabled is hard to describe, and
barely relevant to mounting a tmpfs: just refer to transhuge.rst (while
still using the words deny and force, to help as informal reminders).
[rdunlap@infradead.org: fixup Docs table for huge mount options]
Link: https://lkml.kernel.org/r/20230725052333.26857-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/986cb0bf-9780-354-9bb-4bf57aadbab@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Fixes: d0f5a85442d1 ("shmem: update documentation")
Fixes: 2c6efe9cf2d7 ("shmem: add support to ignore swap")
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit 9b0da3f22307af693be80f5d3a89dc4c7f360a85.
The sigio.c is clearly user space code which is handled by
arch/um/scripts/Makefile.rules (see USER_OBJS rule).
The above mentioned commit simply broke this agreement,
we may not use Linux kernel internal headers in them without
thorough thinking.
Hence, revert the wrong commit.
Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Herve Codina <herve.codina@bootlin.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Richard Weinberger <richard@nod.at>
Cc: Yang Guang <yang.guang5@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Running kunit test for 6.5-rc1 hits one bug:
ok 10 damon_test_update_monitoring_result
general protection fault, probably for non-canonical address 0x1bffa5c419cfb81: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 110 Comm: kunit_try_catch Tainted: G N 6.5.0-rc2 #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:damon_set_attrs+0xb9/0x120
Code: f8 00 00 00 4c 8d 58 e0 48 39 c3 74 ba 41 ba 59 17 b7 d1 49 8b 43 10 4d
8d 4b 10 48 8d 70 e0 49 39 c1 74 50 49 8b 40 08 31 d2 <69> 4e 18 10 27 00 00
49 f7 30 31 d2 48 89 c5 89 c8 f7 f5 31 d2 89
RSP: 0000:ffffc900005bfd40 EFLAGS: 00010246
RAX: ffffffff81159fc0 RBX: ffffc900005bfeb8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 01bffa5c419cfb69 RDI: ffffc900005bfd70
RBP: ffffc90000013c10 R08: ffffc900005bfdc0 R09: ffffffff81ff10ed
R10: 00000000d1b71759 R11: ffffffff81ff10dd R12: ffffc90000013a78
R13: ffff88810eb78180 R14: ffffffff818297c0 R15: ffffc90000013c28
FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000002a1c001 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
damon_test_set_attrs+0x63/0x1f0
kunit_generic_run_threadfn_adapter+0x17/0x30
kthread+0xfd/0x130
The problem seems to be related with the damon_ctx was used without
being initialized. Fix it by adding the initialization.
Link: https://lkml.kernel.org/r/20230718052811.1065173-1-feng.tang@intel.com
Fixes: aa13779be6b7 ("mm/damon/core-test: add a test for damon_set_attrs()")
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, netfilter.
Current release - regressions:
- core: fix splice_to_socket() for O_NONBLOCK socket
- af_unix: fix fortify_panic() in unix_bind_bsd().
- can: raw: fix lockdep issue in raw_release()
Previous releases - regressions:
- tcp: reduce chance of collisions in inet6_hashfn().
- netfilter: skip immediate deactivate in _PREPARE_ERROR
- tipc: stop tipc crypto on failure in tipc_node_create
- eth: igc: fix kernel panic during ndo_tx_timeout callback
- eth: iavf: fix potential deadlock on allocation failure
Previous releases - always broken:
- ipv6: fix bug where deleting a mngtmpaddr can create a new
temporary address
- eth: ice: fix memory management in ice_ethtool_fdir.c
- eth: hns3: fix the imp capability bit cannot exceed 32 bits issue
- eth: vxlan: calculate correct header length for GPE
- eth: stmmac: apply redundant write work around on 4.xx too"
* tag 'net-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
tipc: stop tipc crypto on failure in tipc_node_create
af_unix: Terminate sun_path when bind()ing pathname socket.
tipc: check return value of pskb_trim()
benet: fix return value check in be_lancer_xmit_workarounds()
virtio-net: fix race between set queues and probe
net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
splice, net: Fix splice_to_socket() for O_NONBLOCK socket
net: fec: tx processing does not call XDP APIs if budget is 0
mptcp: more accurate NL event generation
selftests: mptcp: join: only check for ip6tables if needed
tools: ynl-gen: fix parse multi-attr enum attribute
tools: ynl-gen: fix enum index in _decode_enum(..)
netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
netfilter: nft_set_rbtree: fix overlap expiration walk
igc: Fix Kernel Panic during ndo_tx_timeout callback
net: dsa: qca8k: fix mdb add/del case with 0 VID
net: dsa: qca8k: fix broken search_and_del
net: dsa: qca8k: fix search_and_insert wrong handling of new rule
net: dsa: qca8k: enable use_single_write for qca8xxx
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fixes from Vinod Koul:
- Core fix for enumeration completion
- Qualcomm driver fix to update status
- AMD driver fix for probe error check
* tag 'soundwire-6.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: amd: Fix a check for errors in probe()
soundwire: qcom: update status correctly with mask
soundwire: fix enumeration completion
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:
- Out of bound fix for hisilicon phy
- Qualcomm synopsis femto phy for keeping clock enabled during suspend
and enabling ref clocks
- Mediatek driver fixes for upper limit test and error code
* tag 'phy-fixes-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
phy: qcom-snps-femto-v2: use qcom_snps_hsphy_suspend/resume error code
phy: qcom-snps-femto-v2: properly enable ref clock
phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend
phy: mediatek: hdmi: mt8195: fix prediv bad upper limit test
phy: phy-mtk-dp: Fix an error code in probe()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix accounting of global block reserve size when block group tree is
enabled
- the async discard has been enabled in 6.2 unconditionally, but for
zoned mode it does not make that much sense to do it asynchronously
as the zones are reset as needed
- error handling and proper error value propagation fixes
* tag 'for-6.5-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: check for commit error at btrfs_attach_transaction_barrier()
btrfs: check if the transaction was aborted at btrfs_wait_for_commit()
btrfs: remove BUG_ON()'s in add_new_free_space()
btrfs: account block group tree when calculating global reserve size
btrfs: zoned: do not enable async discard
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock fix from Mike Rapoport:
"A call to memblock_free() or memblock_phys_free() issued after
memblock data is discarded will result in use after free in
memblock_isolate_range().
Avoid those issues by making sure that memblock_discard points
memblock.reserved.regions back at the static buffer"
* tag 'fixes-2023-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
mm,memblock: reset memblock.reserved to system init state to prevent UAF
|
|
As esw_offloads_load/unload_rep() are used outside eswitch.c it is nicer
for them to have "mlx5_" prefix. Add it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_eswitch_load/unload_vport()() functions are not used
outside of eswitch.c. Make them static.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5_esw_offloads_rep_load/unload() functions are not used
outside of eswitch_offloads.c. Make them static.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|