Age | Commit message (Collapse) | Author |
|
There is a dedicated tool for configuration of DCB in iproute2 now. Use it
in the selftest instead of mlnx_qos.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a dedicated tool for configuration of DCB in iproute2 now. Use it
in the selftest instead of mlnx_qos.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The test sometimes fails with an error message such as:
TEST: tc sample (w/ flower) rate (egress) [FAIL]
Expected 100 packets, got 70 packets, which is -30% off. Required accuracy is +-25%
Make the test more robust by generating more packets, therefore
increasing the number of expected samples. Decrease the transmission
delay in order not to needlessly prolong the test.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the array of the ports that were split in the port_scale test
is local, so the port_cleanup() unsplits an empty array.
Make the array global so the cleanup will be preformed properly.
Suggested-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expect the lowest IPv4 address in a subnet to be assignable
and addressable as a unicast (non-broadcast) address on a
local network segment.
Signed-off-by: Seth David Schoen <schoen@loyalty.org>
Suggested-by: John Gilmore <gnu@toad.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds some basic tests for the low level bpf_tc_* API.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-4-memxor@gmail.com
|
|
This adds functions that wrap the netlink API used for adding, manipulating,
and removing traffic control filters.
The API summary:
A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
This means that creating a hook leads to creation of the backing qdisc,
while destruction either removes all filters attached to a hook, or destroys
qdisc if requested explicitly (as discussed below).
The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
query, and detach tc filters. All functions return 0 on success, and a
negative error code on failure.
bpf_tc_hook_create - Create a hook
Parameters:
@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
proper enum constant. Note that parent must be unset when
attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
valid value for attach_point.
Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
bpf_tc_hook_destroy - Destroy a hook
Parameters:
@hook - Cannot be NULL. The behaviour depends on value of
attach_point. If BPF_TC_INGRESS, all filters attached to
the ingress hook will be detached. If BPF_TC_EGRESS, all
filters attached to the egress hook will be detached. If
BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
deleted, also detaching all filters. As before, parent must
be unset for these attach_points, and set for BPF_TC_CUSTOM.
It is advised that if the qdisc is operated on by many programs,
then the program at least check that there are no other existing
filters before deleting the clsact qdisc. An example is shown
below:
DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
.attach_point = BPF_TC_INGRESS);
/* set opts as NULL, as we're not really interested in
* getting any info for a particular filter, but just
* detecting its presence.
*/
r = bpf_tc_query(&hook, NULL);
if (r == -ENOENT) {
/* no filters */
hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
return bpf_tc_hook_destroy(&hook);
} else {
/* failed or r == 0, the latter means filters do exist */
return r;
}
Note that there is a small race between checking for no
filters and deleting the qdisc. This is currently unavoidable.
Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
bpf_tc_attach - Attach a filter to a hook
Parameters:
@hook - Cannot be NULL. Represents the hook the filter will be
attached to. Requirements for ifindex and attach_point are
same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
is also supported. In that case, parent must be set to the
handle where the filter will be attached (using BPF_TC_PARENT).
E.g. to set parent to 1:16 like in tc command line, the
equivalent would be BPF_TC_PARENT(1, 16).
@opts - Cannot be NULL. The following opts are optional:
* handle - The handle of the filter
* priority - The priority of the filter
Must be >= 0 and <= UINT16_MAX
Note that when left unset, they will be auto-allocated by
the kernel. The following opts must be set:
* prog_fd - The fd of the loaded SCHED_CLS prog
The following opts must be unset:
* prog_id - The ID of the BPF prog
The following opts are optional:
* flags - Currently only BPF_TC_F_REPLACE is allowed. It
allows replacing an existing filter instead of
failing with -EEXIST.
The following opts will be filled by bpf_tc_attach on a
successful attach operation if they are unset:
* handle - The handle of the attached filter
* priority - The priority of the attached filter
* prog_id - The ID of the attached SCHED_CLS prog
This way, the user can know what the auto allocated values
for optional opts like handle and priority are for the newly
attached filter, if they were unset.
Note that some other attributes are set to fixed default
values listed below (this holds for all bpf_tc_* APIs):
protocol as ETH_P_ALL, direct action mode, chain index of 0,
and class ID of 0 (this can be set by writing to the
skb->tc_classid field from the BPF program).
bpf_tc_detach
Parameters:
@hook - Cannot be NULL. Represents the hook the filter will be
detached from. Requirements are same as described above
in bpf_tc_attach.
@opts - Cannot be NULL. The following opts must be set:
* handle, priority
The following opts must be unset:
* prog_fd, prog_id, flags
bpf_tc_query
Parameters:
@hook - Cannot be NULL. Represents the hook where the filter lookup will
be performed. Requirements are same as described above in
bpf_tc_attach().
@opts - Cannot be NULL. The following opts must be set:
* handle, priority
The following opts must be unset:
* prog_fd, prog_id, flags
The following fields will be filled by bpf_tc_query upon a
successful lookup:
* prog_id
Some usage examples (using BPF skeleton infrastructure):
BPF program (test_tc_bpf.c):
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("classifier")
int cls(struct __sk_buff *skb)
{
return 0;
}
Userspace loader:
struct test_tc_bpf *skel = NULL;
int fd, r;
skel = test_tc_bpf__open_and_load();
if (!skel)
return -ENOMEM;
fd = bpf_program__fd(skel->progs.cls);
DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
if_nametoindex("lo"), .attach_point =
BPF_TC_INGRESS);
/* Create clsact qdisc */
r = bpf_tc_hook_create(&hook);
if (r < 0)
goto end;
DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
r = bpf_tc_attach(&hook, &opts);
if (r < 0)
goto end;
/* Print the auto allocated handle and priority */
printf("Handle=%u", opts.handle);
printf("Priority=%u", opts.priority);
opts.prog_fd = opts.prog_id = 0;
bpf_tc_detach(&hook, &opts);
end:
test_tc_bpf__destroy(skel);
This is equivalent to doing the following using tc command line:
# tc qdisc add dev lo clsact
# tc filter add dev lo ingress bpf obj foo.o sec classifier da
# tc filter del dev lo ingress handle <h> prio <p> bpf
... where the handle and priority can be found using:
# tc filter show dev lo ingress
Another example replacing a filter (extending prior example):
/* We can also choose both (or one), let's try replacing an
* existing filter.
*/
DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
opts.handle, .priority = opts.priority,
.prog_fd = fd);
r = bpf_tc_attach(&hook, &replace_opts);
if (r == -EEXIST) {
/* Expected, now use BPF_TC_F_REPLACE to replace it */
replace_opts.flags = BPF_TC_F_REPLACE;
return bpf_tc_attach(&hook, &replace_opts);
} else if (r < 0) {
return r;
}
/* There must be no existing filter with these
* attributes, so cleanup and return an error.
*/
replace_opts.prog_fd = replace_opts.prog_id = 0;
bpf_tc_detach(&hook, &replace_opts);
return -1;
To obtain info of a particular filter:
/* Find info for filter with handle 1 and priority 50 */
DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
.priority = 50);
r = bpf_tc_query(&hook, &info_opts);
if (r == -ENOENT)
printf("Filter not found");
else if (r < 0)
return r;
printf("Prog ID: %u", info_opts.prog_id);
return 0;
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
[ Daniel: also did major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
|
|
This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c. It also adds a libbpf_netlink_send_recv() that
is useful to wrap send + recv handling in a generic way. Subsequent patch
will also use this function for sending and receiving a netlink response.
The libbpf_nl_get_link() helper has been removed instead, moving socket
creation into the newly named libbpf_netlink_send_recv().
Every nested attribute's closure must happen using the helper
nlattr_end_nested(), which sets its length properly. NLA_F_NESTED is
enforced using nlattr_begin_nested() helper. Other simple attributes
can be added directly.
The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:
struct {
struct nlmsghdr nh;
struct tcmsg t;
char buf[4096];
} req;
Then, maxsz should be sizeof(req).
This change also converts the open coded attribute preparation with these
helpers. Note that the only failure the internal call to nlattr_add()
could result in the nested helper would be -EMSGSIZE, hence that is what
we return to our caller.
The libbpf_netlink_send_recv() call takes care of opening the socket,
sending the netlink message, receiving the response, potentially invoking
callbacks, and return errors if any, and then finally close the socket.
This allows users to avoid identical socket setup code in different places.
The only user of libbpf_nl_get_link() has been converted to make use of it.
__bpf_set_link_xdp_fd_replace() has also been refactored to use it.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
[ Daniel: major patch cleanup ]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210512103451.989420-2-memxor@gmail.com
|
|
Detect use of static entry-point BPF programs (those with SEC() markings) and
emit error message. This is similar to
c1cccec9c636 ("libbpf: Reject static maps") but for BPF programs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210514195534.1440970-1-andrii@kernel.org
|
|
Static maps never really worked with libbpf, because all such maps were always
silently resolved to the very first map. Detect static maps (both legacy and
BTF-defined) and report user-friendly error.
Tested locally by switching few maps (legacy and BTF-defined) in selftests to
static ones and verifying that now libbpf rejects them loudly.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210513233643.194711-2-andrii@kernel.org
|
|
Adjust static_linked selftests to test a mix of global and static variables
and their handling of bpftool's skeleton generation code.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210513233643.194711-1-andrii@kernel.org
|
|
Do the same global -> static BTF update for global functions with STV_INTERNAL
visibility to turn on static BPF verification mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-7-andrii@kernel.org
|
|
Fix silly bug in updating ELF symbol's visibility.
Fixes: a46349227cd8 ("libbpf: Add linker extern resolution support for functions and global variables")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-6-andrii@kernel.org
|
|
As discussed in [0], stop emitting static variables in BPF skeletons to avoid
issues with name-conflicting static variables across multiple
statically-linked BPF object files.
Users using static variables to pass data between BPF programs and user-space
should do a trivial one-time switch according to the following simple rules:
- read-only `static volatile const` variables should be converted to
`volatile const`;
- read/write `static volatile` variables should just drop `static volatile`
modifiers to become global variables/symbols. To better handle older Clang
versions, such newly converted global variables should be explicitly
initialized with a specific value or `= 0`/`= {}`, whichever is
appropriate.
[0] https://lore.kernel.org/bpf/CAEf4BzZo7_r-hsNvJt3w3kyrmmBJj7ghGY8+k4nvKF0KLjma=w@mail.gmail.com/T/#m664d4b0d6b31ac8b2669360e0fc2d6962e9f5ec1
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-5-andrii@kernel.org
|
|
In preparation of skipping emitting static variables in BPF skeletons, switch
all current selftests uses of static variables to pass data between BPF and
user-space to use global variables.
All non-read-only `static volatile` variables become just plain global
variables by dropping `static volatile` part.
Read-only `static volatile const` variables, though, still require `volatile`
modifier, otherwise compiler will ignore whatever values are set from
user-space.
Few static linker tests are using name-conflicting static variables to
validate that static linker still properly handles static variables and
doesn't trip up on name conflicts.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-4-andrii@kernel.org
|
|
For better future extensibility add per-file linker options. Currently
the set of available options is empty. This changes bpf_linker__add_file()
API, but it's not a breaking change as bpf_linker APIs hasn't been released
yet.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210507054119.270888-3-andrii@kernel.org
|
|
Similarly to .rodata, strip any const/volatile/restrict modifiers when
generating BPF skeleton. They are not helpful and actually just get in the way.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210507054119.270888-2-andrii@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- Convert sh and sparc to use generic shell scripts to generate the
syscall headers
- refactor .gitignore files
- Update kernel/config_data.gz only when the content of the .config
is really changed, which avoids the unneeded re-link of vmlinux
- move "remove stale files" workarounds to scripts/remove-stale-files
- suppress unused-but-set-variable warnings by default for Clang
as well
- fix locale setting LANG=C to LC_ALL=C
- improve 'make distclean'
- always keep intermediate objects from scripts/link-vmlinux.sh
- move IF_ENABLED out of <linux/kconfig.h> to make it self-contained
- misc cleanups
* tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits)
linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h>
kbuild: Don't remove link-vmlinux temporary files on exit/signal
kbuild: remove the unneeded comments for external module builds
kbuild: make distclean remove tag files in sub-directories
kbuild: make distclean work against $(objtree) instead of $(srctree)
kbuild: refactor modname-multi by using suffix-search
kbuild: refactor fdtoverlay rule
kbuild: parameterize the .o part of suffix-search
arch: use cross_compiling to check whether it is a cross build or not
kbuild: remove ARCH=sh64 support from top Makefile
.gitignore: prefix local generated files with a slash
kbuild: replace LANG=C with LC_ALL=C
Makefile: Move -Wno-unused-but-set-variable out of GCC only block
kbuild: add a script to remove stale generated files
kbuild: update config_data.gz only when the content of .config is changed
.gitignore: ignore only top-level modules.builtin
.gitignore: move tags and TAGS close to other tag files
kernel/.gitgnore: remove stale timeconst.h and hz.bc
usr/include: refactor .gitignore
genksyms: fix stale comment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.13-rc1, including fixes from bpf, can and
netfilter trees. Self-contained fixes, nothing risky.
Current release - new code bugs:
- dsa: ksz: fix a few bugs found by static-checker in the new driver
- stmmac: fix frame preemption handshake not triggering after
interface restart
Previous releases - regressions:
- make nla_strcmp handle more then one trailing null character
- fix stack OOB reads while fragmenting IPv4 packets in openvswitch
and net/sched
- sctp: do asoc update earlier in sctp_sf_do_dupcook_a
- sctp: delay auto_asconf init until binding the first addr
- stmmac: clear receive all(RA) bit when promiscuous mode is off
- can: mcp251x: fix resume from sleep before interface was brought up
Previous releases - always broken:
- bpf: fix leakage of uninitialized bpf stack under speculation
- bpf: fix masking negation logic upon negative dst register
- netfilter: don't assume that skb_header_pointer() will never fail
- only allow init netns to set default tcp cong to a restricted algo
- xsk: fix xp_aligned_validate_desc() when len == chunk_size to avoid
false positive errors
- ethtool: fix missing NLM_F_MULTI flag when dumping
- can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
- sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
- bridge: fix NULL-deref caused by a races between assigning
rx_handler_data and setting the IFF_BRIDGE_PORT bit
Latecomer:
- seg6: add counters support for SRv6 Behaviors"
* tag 'net-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits)
atm: firestream: Use fallthrough pseudo-keyword
net: stmmac: Do not enable RX FIFO overflow interrupts
mptcp: fix splat when closing unaccepted socket
i40e: Remove LLDP frame filters
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
i40e: fix the restart auto-negotiation after FEC modified
i40e: Fix use-after-free in i40e_client_subtask()
i40e: fix broken XDP support
netfilter: nftables: avoid potential overflows on 32bit arches
netfilter: nftables: avoid overflows in nft_hash_buckets()
tcp: Specify cmsgbuf is user pointer for receive zerocopy.
mlxsw: spectrum_mr: Update egress RIF list before route's action
net: ipa: fix inter-EE IRQ register definitions
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
can: mcp251x: fix resume from sleep before interface was brought up
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
netfilter: nftables: Fix a memleak from userdata error path in new objects
netfilter: remove BUG_ON() after skb_header_pointer()
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
...
|
|
Merge yet more updates from Andrew Morton:
"This is everything else from -mm for this merge window.
90 patches.
Subsystems affected by this patch series: mm (cleanups and slub),
alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat,
checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov,
panic, delayacct, gdb, resource, selftests, async, initramfs, ipc,
drivers/char, and spelling"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits)
mm: fix typos in comments
mm: fix typos in comments
treewide: remove editor modelines and cruft
ipc/sem.c: spelling fix
fs: fat: fix spelling typo of values
kernel/sys.c: fix typo
kernel/up.c: fix typo
kernel/user_namespace.c: fix typos
kernel/umh.c: fix some spelling mistakes
include/linux/pgtable.h: few spelling fixes
mm/slab.c: fix spelling mistake "disired" -> "desired"
scripts/spelling.txt: add "overflw"
scripts/spelling.txt: Add "diabled" typo
scripts/spelling.txt: add "overlfow"
arm: print alloc free paths for address in registers
mm/vmalloc: remove vwrite()
mm: remove xlate_dev_kmem_ptr()
drivers/char: remove /dev/kmem for good
mm: fix some typos and code style problems
ipc/sem.c: mundane typo fixes
...
|
|
The section "19) Editor modelines and other cruft" in
Documentation/process/coding-style.rst clearly says, "Do not include any
of these in source files."
I recently receive a patch to explicitly add a new one.
Let's do treewide cleanups, otherwise some people follow the existing code
and attempt to upstream their favoriate editor setups.
It is even nicer if scripts/checkpatch.pl can check it.
If we like to impose coding style in an editor-independent manner, I think
editorconfig (patch [1]) is a saner solution.
[1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/
Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
'assert.h' included in 'sparsebit.c' is duplicated.
It is also included in the 161th line.
'string.h' included in 'mincore_selftest.c' is duplicated.
It is also included in the 15th line.
'sched.h' included in 'tlbie_test.c' is duplicated.
It is also included in the 33th line.
Link: https://lkml.kernel.org/r/20210316073336.426255-1-zhang.yunkai@zte.com.cn
Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "fs/epoll: restore user-visible behavior upon event ready".
This series tries to address a change in user visible behavior, reported
in https://bugzilla.kernel.org/show_bug.cgi?id=208943.
Epoll does not report an event to all the threads running epoll_wait()
on the same epoll descriptor. Unsurprisingly, this was bisected back to
339ddb53d373 (fs/epoll: remove unnecessary wakeups of nested epoll), which
has had various problems in the past, beyond only nested epoll usage.
This patch (of 2):
This incorporates the testcase originally reported in:
https://bugzilla.kernel.org/show_bug.cgi?id=208943
Which ensures an event is reported to all threads blocked on the same
epoll descriptor, otherwise only a single thread will receive the wakeup
once the event become ready.
Link: https://lkml.kernel.org/r/20210405231025.33829-1-dave@stgolabs.net
Link: https://lkml.kernel.org/r/20210405231025.33829-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add fast paths to find_*_bit() functions as per kernel implementation.
Link: https://lkml.kernel.org/r/20210401003153.97325-12-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Sync the implementation with recent kernel changes.
Link: https://lkml.kernel.org/r/20210401003153.97325-9-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Sync implementation with the kernel and move the macro from
tools/include/linux/bitmap.h to tools/include/asm-generic/bitsperlong.h
Link: https://lkml.kernel.org/r/20210401003153.97325-7-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Kernel version generates better code.
Link: https://lkml.kernel.org/r/20210401003153.97325-4-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Some functions in tools/include/linux/bitmap.h declare nbits as int. In
the kernel nbits is declared as unsigned int.
Link: https://lkml.kernel.org/r/20210401003153.97325-3-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "lib/find_bit: fast path for small bitmaps", v6.
Bitmap operations are much simpler and faster in case of small bitmaps
which fit into a single word. In linux/bitmap.c we have a machinery that
allows compiler to replace actual function call with a few instructions if
bitmaps passed into the function are small and their size is known at
compile time.
find_*_bit() API lacks this functionality; but users will benefit from it
a lot. One important example is cpumask subsystem when NR_CPUS <=
BITS_PER_LONG.
This patch (of 12):
GENMASK(h, l) may be passed with unsigned types. In such case,
type-limits warning is generated for example in case of GENMASK(h, 0).
Link: https://lkml.kernel.org/r/20210401003153.97325-1-yury.norov@gmail.com
Link: https://lkml.kernel.org/r/20210401003153.97325-2-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexey Klimov <aklimov@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jianpeng Ma <jianpeng.ma@intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Test that /proc instance mounted with
mount -t proc -o subset=pid
contains only ".", "..", "self", "thread-self" and pid directories.
Note:
Currently "subset=pid" doesn't return "." and ".." via readdir.
This must be a bug.
Link: https://lkml.kernel.org/r/YFYZZ7WGaZlsnChS@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now that proc_ops are separate from file_operations and other operations
it easy to check all instances to have ->proc_lseek hook and remove check
in main code.
Note:
nonseekable_open() files naturally don't require ->proc_lseek.
Garbage collect pde_lseek() function.
[adobriyan@gmail.com: smoke test lseek()]
Link: https://lkml.kernel.org/r/YG4OIhChOrVTPgdN@localhost.localdomain
Link: https://lkml.kernel.org/r/YFYX0Bzwxlc7aBa/@localhost.localdomain
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Merge more updates from Andrew Morton:
"The remainder of the main mm/ queue.
143 patches.
Subsystems affected by this patch series (all mm): pagecache, hugetlb,
userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
kfence"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits)
kfence: use power-efficient work queue to run delayed work
kfence: maximize allocation wait timeout duration
kfence: await for allocation using wait_event
kfence: zero guard page after out-of-bounds access
mm/process_vm_access.c: remove duplicate include
mm/mempool: minor coding style tweaks
mm/highmem.c: fix coding style issue
btrfs: use memzero_page() instead of open coded kmap pattern
iov_iter: lift memzero_page() to highmem.h
mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
mm/zswap.c: switch from strlcpy to strscpy
arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
mm,memory_hotplug: allocate memmap from the added memory range
mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
mm,memory_hotplug: relax fully spanned sections check
drivers/base/memory: introduce memory_block_{online,offline}
mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
- new driver for the Realtek Otto GPIO controller
- ACPI support for gpio-mpc8xxx
- edge event support for gpio-sch (+ Kconfig fixes)
- Kconfig improvements in gpio-ich
- fixes to older issues in gpio-mockup
- ACPI quirk for ignoring EC wakeups on Dell Venue 10 Pro 5055
- improve the GPIO aggregator code by using more generic interfaces
instead of reimplementing them in the driver
- convert the DT bindings for gpio-74x164 to yaml
- documentation improvements
- a slew of other minor fixes and improvements to GPIO drivers
* tag 'gpio-updates-for-v5.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (34 commits)
dt-bindings: gpio: add YAML description for rockchip,gpio-bank
gpio: mxs: remove useless function
dt-bindings: gpio: fairchild,74hc595: Convert to json-schema
gpio: it87: remove unused code
gpio: 104-dio-48e: Fix coding style issues
gpio: mpc8xxx: Add ACPI support
gpio: ich: Switch to be dependent on LPC_ICH
gpio: sch: Drop MFD_CORE selection
gpio: sch: depends on LPC_SCH
gpiolib: acpi: Add quirk to ignore EC wakeups on Dell Venue 10 Pro 5055
gpio: sch: Hook into ACPI GPE handler to catch GPIO edge events
gpio: sch: Add edge event support
gpio: aggregator: Replace custom get_arg() with a generic next_arg()
lib/cmdline: Export next_arg() for being used in modules
gpio: omap: Use device_get_match_data() helper
gpio: Add Realtek Otto GPIO support
dt-bindings: gpio: Binding for Realtek Otto GPIO
docs: kernel-parameters: Add gpio_mockup_named_lines
docs: kernel-parameters: Move gpio-mockup for alphabetic order
lib: bitmap: provide devm_bitmap_alloc() and devm_bitmap_zalloc()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
"Bug fixes and a smattering of features"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (21 commits)
tools/power turbostat: version 2021.05.04
tools/power turbostat: Support "turbostat --hide idle"
tools/power turbostat: elevate priority of interval mode
tools/power turbostat: formatting
tools/power turbostat: rename tcc variables
tools/power turbostat: add TCC Offset support
tools/power turbostat: save original CPU model
tools/power turbostat: Fix Core C6 residency on Atom CPUs
tools/power turbostat: Print the C-state Pre-wake settings
tools/power turbostat: Enable tsc_tweak for Elkhart Lake and Jasper Lake
tools/power turbostat: unmark non-kernel-doc comment
tools/power/turbostat: Remove Package C6 Retention on Ice Lake Server
tools/power turbostat: Fix offset overflow issue in index converting
tools/power/turbostat: Fix turbostat for AMD Zen CPUs
tools/power turbostat: update version number
tools/power turbostat: Fix DRAM Energy Unit on SKX
Revert "tools/power turbostat: adjust for temperature offset"
tools/power turbostat: Support Ice Lake D
tools/power turbostat: Support Alder Lake Mobile
tools/power turbostat: print microcode patch level
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
- Added a KTEST section in the MAINTAINERS file
- Included John Hawley as a co-maintainer
- Add an example config that would work with VMware workstation guests
- Cleanups to the code
* tag 'ktest-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Add KTEST section to MAINTAINERS file
ktest: Re-arrange the code blocks for better discoverability
ktest: Further consistency cleanups
ktest: Fixing indentation to match expected pattern
ktest: Adding editor hints to improve consistency
ktest: Add example config for using VMware VMs
ktest: Minor cleanup with uninitialized variable $build_options
|
|
When pages are pinned they can be faulted in userland and migrated, and
they can be faulted right in kernel without migration.
In either case, the pinned pages must end-up being pinnable (not
movable).
Add a new test to gup_test, to help verify that the gup/pup
(get_user_pages() / pin_user_pages()) behavior with respect to pinnable
and movable pages is reasonable and correct. Specifically, provide a
way to:
1) Verify that only "pinnable" pages are pinned. This is checked
automatically for you.
2) Verify that gup/pup performance is reasonable. This requires
comparing benchmarks between doing gup/pup on pages that have been
pre-faulted in from user space, vs. doing gup/pup on pages that are
not faulted in until gup/pup time (via FOLL_TOUCH). This decision is
controlled with the new -z command line option.
Link: https://lkml.kernel.org/r/20210215161349.246722-15-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In gup_test both gup_flags and test_flags use the same flags field.
This is broken.
Farther, in the actual gup_test.c all the passed gup_flags are erased
and unconditionally replaced with FOLL_WRITE.
Which means that test_flags are ignored, and code like this always
performs pin dump test:
155 if (gup->flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN)
156 nr = pin_user_pages(addr, nr, gup->flags,
157 pages + i, NULL);
158 else
159 nr = get_user_pages(addr, nr, gup->flags,
160 pages + i, NULL);
161 break;
Add a new test_flags field, to allow raw gup_flags to work. Add a new
subcommand for DUMP_USER_PAGES_TEST to specify that pin test should be
performed.
Remove unconditional overwriting of gup_flags via FOLL_WRITE. But,
preserve the previous behaviour where FOLL_WRITE was the default flag,
and add a new option "-W" to unset FOLL_WRITE.
Rename flags with gup_flags.
With the fix, dump works like this:
root@virtme:/# gup_test -c
---- page #0, starting from user virt addr: 0x7f8acb9e4000
page:00000000d3d2ee27 refcount:2 mapcount:1 mapping:0000000000000000
index:0x0 pfn:0x100bcf
anon flags: 0x300000000080016(referenced|uptodate|lru|swapbacked)
raw: 0300000000080016 ffffd0e204021608 ffffd0e208df2e88 ffff8ea04243ec61
raw: 0000000000000000 0000000000000000 0000000200000000 0000000000000000
page dumped because: gup_test: dump_pages() test
DUMP_USER_PAGES_TEST: done
root@virtme:/# gup_test -c -p
---- page #0, starting from user virt addr: 0x7fd19701b000
page:00000000baed3c7d refcount:1025 mapcount:1 mapping:0000000000000000
index:0x0 pfn:0x108008
anon flags: 0x300000000080014(uptodate|lru|swapbacked)
raw: 0300000000080014 ffffd0e204200188 ffffd0e205e09088 ffff8ea04243ee71
raw: 0000000000000000 0000000000000000 0000040100000000 0000000000000000
page dumped because: gup_test: dump_pages() test
DUMP_USER_PAGES_TEST: done
Refcount shows the difference between pin vs no-pin case.
Also change type of nr from int to long, as it counts number of pages.
Link: https://lkml.kernel.org/r/20210215161349.246722-14-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix a dormant bug in userfaultfd_events_test(), where we did `return
faulting_process(0)` instead of `exit(faulting_process(0))`. This
caused the forked process to keep running, trying to execute any further
test cases after the events test in parallel with the "real" process.
Add a simple test case which exercises minor faults. In short, it does
the following:
1. "Sets up" an area (area_dst) and a second shared mapping to the same
underlying pages (area_dst_alias).
2. Register one of these areas with userfaultfd, in minor fault mode.
3. Start a second thread to handle any minor faults.
4. Populate the underlying pages with the non-UFFD-registered side of
the mapping. Basically, memset() each page with some arbitrary
contents.
5. Then, using the UFFD-registered mapping, read all of the page
contents, asserting that the contents match expectations (we expect
the minor fault handling thread can modify the page contents before
resolving the fault).
The minor fault handling thread, upon receiving an event, flips all the
bits (~) in that page, just to prove that it can modify it in some
arbitrary way. Then it issues a UFFDIO_CONTINUE ioctl, to setup the
mapping and resolve the fault. The reading thread should wake up and
see this modification.
Currently the minor fault test is only enabled in hugetlb_shared mode,
as this is the only configuration the kernel feature supports.
Link: https://lkml.kernel.org/r/20210301222728.176417-7-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Further extend <debugfs>/split_huge_pages to accept
"<path>,<pgoff_start>,<pgoff_end>" for file-backed THP split tests since
tmpfs may have file backed by THP that mapped nowhere.
Update selftest program to test file-backed THP split too.
Link: https://lkml.kernel.org/r/20210331235309.332292-2-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mika Penttila <mika.penttila@nextfour.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We did not have a direct user interface of splitting the compound page
backing a THP and there is no need unless we want to expose the THP
implementation details to users. Make <debugfs>/split_huge_pages accept a
new command to do that.
By writing "<pid>,<vaddr_start>,<vaddr_end>" to
<debugfs>/split_huge_pages, THPs within the given virtual address range
from the process with the given pid are split. It is used to test
split_huge_page function. In addition, a selftest program is added to
tools/testing/selftests/vm to utilize the interface by splitting
PMD THPs and PTE-mapped THPs.
This does not change the old behavior, i.e., writing 1 to the interface
to split all THPs in the system.
Link: https://lkml.kernel.org/r/20210331235309.332292-1-zi.yan@sent.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mika Penttila <mika.penttila@nextfour.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
As idle, in particular, can have many columns on some machines...
Make it easy to ignore them all at once.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
This makes interval mode less likely to see delayed
results on a heavily loaded system.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Spring is here...
run a long overdue Lendent on turbostat.c
no functional change
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
There are two TCC activation temeprature.
One is the default TCC activation temperature, also known as TJ_MAX.
Another one is the effective TCC activation temperature, which is the
subtraction of default TCC activation temperature and TCC offset.
The name of variable tcc_activation_temp might be misleading here.
Thus rename tcc_activation_temp to tj_max, and use tcc_default and
tcc_offset to calculate the effective TCC activation temperature.
No functional change in this patch.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
The length of TCC Offset bits varies on different platforms.
Decode TCC Offset bits only for the platforms that we have verified.
For the others, only show default TCC activation temperature.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
CPU model may get changed in intel_model_duplicates() for code reuse.
But there are still some cases we need the original CPU model to handle
minor differences between generations.
Thus save the original CPU model.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
For Atom CPUs that have core cstate deeper than C6,
MSR_CORE_C6_RESIDENCY actually returns the residency for both CC6 and
deeper Core cstates.
Thus, the real Core C6 residency should be the subtraction of
MSR_CORE_C6_RESIDENCY return value and MSR_CORE_C6_RESIDENCY return value.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
C-state pre-wake setting[1] is an optimization for some Intel CPUs to
be woken up from deep C-states in order to reduce latency. According to
the spec, the BIT30 is the C-state Pre-wake Disable. Expose this setting
accordingly.
Sample output from turbostat:
...
cpu51: MSR_IA32_POWER_CTL: 0x1a00a40059 (C1E auto-promotion: DISabled)
C-state Pre-wake: ENabled
cpu51: MSR_TURBO_RATIO_LIMIT: 0x2021212121212224
...
[1] https://intel.github.io/wult/#c-state-pre-wake
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
It was found that on Elkhart Lake the TSC frequency is driven by
a separate crystal-clock domain, which is different from the
BCLK domain which includes mperf. This has result in small different
speed thus inconsistence between TSC and the mperf, which caused the
Busy% to be higher than 100%. On this platform it seems that the mperf
runs faster than tsc when the CPU is 100% utilized:
delta tsc(18815473183) < delta mperf(18958403680) for 10 seconds.
To align TSC with mperf, leverage the tsc_tweak mechanism introduced for
cores newer than Skylake, so that TSC and mperf would be calculated in
the same domain.
Reported-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|