Age | Commit message (Collapse) | Author |
|
Update sk_storage_map_test to make sure kernel does not copy user
non-zero value spin lock to kernel, and does not copy kernel spin
lock value to user.
If user spin lock value is copied to kernel, this test case will
make kernel spin on the copied lock, resulting in rcu stall and
softlockup.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20221114134720.1057939-3-xukuohai@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_selem_alloc function is used by inode_storage, sk_storage and
task_storage maps to set map value, for these map types, there may
be a spin lock in the map value, so if we use memcpy to copy the whole
map value from user, the spin lock field may be initialized incorrectly.
Since the spin lock field is zeroed by kzalloc, call copy_map_value
instead of memcpy to skip copying the spin lock field to fix it.
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20221114134720.1057939-2-xukuohai@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test closes both iterator link fd and cgroup fd, and removes the
cgroup file to make a dead cgroup before reading from cgroup iterator.
It also uses kern_sync_rcu() and usleep() to wait for the release of
start cgroup. If the start cgroup is not pinned by cgroup iterator,
reading from iterator fd will trigger use-after-free.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20221121073440.1828292-4-houtao@huaweicloud.com
|
|
Add remove_cgroup() to remove a cgroup which doesn't have any children
or live processes. It will be used by the following patch to test cgroup
iterator on a dead cgroup.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221121073440.1828292-3-houtao@huaweicloud.com
|
|
bpf_iter_attach_cgroup() has already acquired an extra reference for the
start cgroup, but the reference may be released if the iterator link fd
is closed after the creation of iterator fd, and it may lead to
user-after-free problem when reading the iterator fd.
An alternative fix is pinning iterator link when opening iterator,
but it will make iterator link being still visible after the close of
iterator link fd and the behavior is different with other link types, so
just fixing it by acquiring another reference for the start cgroup.
Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221121073440.1828292-2-houtao@huaweicloud.com
|
|
xfstests generic/013 and generic/476 reported WARNING as follows:
WARNING: lock held when returning to user space!
6.1.0-rc5+ #4 Not tainted
------------------------------------------------
fsstress/504233 is leaving the kernel with locks still held!
2 locks held by fsstress/504233:
#0: ffff888054c38850 (&sb->s_type->i_mutex_key#21){+.+.}-{3:3}, at:
lock_two_nondirectories+0xcf/0xf0
#1: ffff8880b8fec750 (&sb->s_type->i_mutex_key#21/4){+.+.}-{3:3}, at:
lock_two_nondirectories+0xb7/0xf0
This will lead to deadlock and hungtask.
Fix this by releasing locks when failed to write out on a file range in
cifs_file_copychunk_range().
Fixes: 3e3761f1ec7d ("smb3: use filemap_write_and_wait_range instead of filemap_write_and_wait")
Cc: stable@vger.kernel.org # 6.0
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The TVAL register is 32 bit signed. Thus only the lower 31 bits are
available to specify when an interrupt is to occur at some time in the
near future. Attempting to specify a larger interval with TVAL results
in a negative time delta which means the timer fires immediately upon
being programmed, rather than firing at that expected future time.
The solution is for Linux to declare that TVAL is a 31 bit register rather
than give its true size of 32 bits. This prevents Linux from programming
TVAL with a too-large value. Note that, prior to 5.16, this little trick
was the standard way to handle TVAL in Linux, so there is nothing new
happening here on that front.
The softlockup detector hides the issue, because it keeps generating
short timer deadlines that are within the scope of the broken timer.
Disabling it, it starts using NO_HZ with much longer timer deadlines, which
turns into an interrupt flood:
11: 1124855130 949168462 758009394 76417474 104782230 30210281
310890 1734323687 GICv2 29 Level arch_timer
And "much longer" isn't that long: it takes less than 43s to underflow
TVAL at 50MHz (the frequency of the counter on XGene-1).
Some comments on the v1 version of this patch by Marc Zyngier:
XGene implements CVAL (a 64bit comparator) in terms of TVAL (a countdown
register) instead of the other way around. TVAL being a 32bit register,
the width of the counter should equally be 32. However, TVAL is a
*signed* value, and keeps counting down in the negative range once the
timer fires.
It means that any TVAL value with bit 31 set will fire immediately,
as it cannot be distinguished from an already expired timer. Reducing
the timer range back to a paltry 31 bits papers over the issue.
Another problem cannot be fixed though, which is that the timer interrupt
*must* be handled within the negative countdown period, or the interrupt
will be lost (TVAL will rollover to a positive value, indicative of a
new timer deadline).
Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations")
Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221024165422.GA51107@zipoli.concurrent-rt.com
Link: https://lore.kernel.org/r/20221121145343.896018-1-maz@kernel.org
[maz: revamped the commit message]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Regulator changes for am335x-pcm-953
This is for deferred probe issue on am335x-pcm-953 sdhci-omap regulator.
* tag 'am335x-pcm-953-regulators' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: am335x-pcm-953: Define fixed regulators in root node
Link: https://lore.kernel.org/r/pull-1669036672-530717@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Most allocation sites in the kernel want an explicitly sized allocation
(and not "more"), and that dynamic runtime analysis tools (e.g. KASAN,
UBSAN_BOUNDS, FORTIFY_SOURCE, etc) are looking for precise bounds checking
(i.e. not something that is rounded up). A tiny handful of allocations
were doing an implicit alloc/realloc loop that actually depended on
ksize(), and didn't actually always call realloc. This has created a
long series of bugs and problems over many years related to the runtime
bounds checking, so these callers are finally being adjusted to _not_
depend on the ksize() side-effect, by doing one of several things:
- tracking the allocation size precisely and just never calling ksize()
at all [1].
- always calling realloc and not using ksize() at all. (This solution
ends up actually be a subset of the next solution.)
- using kmalloc_size_roundup() to explicitly round up the desired
allocation size immediately [2].
The bpf/verifier case is this another of this latter case, and is the
last outstanding case to be fixed in the kernel.
Because some of the dynamic bounds checking depends on the size being an
_argument_ to an allocator function (i.e. see the __alloc_size attribute),
the ksize() users are rare, and it could waste local variables, it
was been deemed better to explicitly separate the rounding up from the
allocation itself [3].
Round up allocations with kmalloc_size_roundup() so that the verifier's
use of ksize() is always accurate.
[1] e.g.:
https://git.kernel.org/linus/712f210a457d
https://git.kernel.org/linus/72c08d9f4c72
[2] e.g.:
https://git.kernel.org/netdev/net-next/c/12d6c1d3a2ad
https://git.kernel.org/netdev/net-next/c/ab3f7828c979
https://git.kernel.org/netdev/net-next/c/d6dd508080a3
[3] https://lore.kernel.org/lkml/0ea1fc165a6c6117f982f4f135093e69cb884930.camel@redhat.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221118183409.give.387-kees@kernel.org
|
|
This patch introduced a regression: commit 48596a8ddc46 ("netfilter:
ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
The variable e.ip is passed to adtfn() function which finally adds the
ip address to the set. The patch above refactored the for loop and moved
e.ip = htonl(ip) to the end of the for loop.
What this means is that if the value of "ip" changes between the first
assignement of e.ip and the forloop, then e.ip is pointing to a
different ip address than "ip".
Test case:
$ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000
$ ipset add jdtest_tmp 10.0.1.1/31
ipset v6.21.1: Element cannot be added to the set: it's already added
The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])"
block but e.ip is still pointing to the old value.
Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
Reviewed-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Syzkaller reported BUG as follows:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:274
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
__might_resched.cold+0x222/0x26b
kmem_cache_alloc+0x2e7/0x3c0
update_qgroup_limit_item+0xe1/0x390
btrfs_qgroup_inherit+0x147b/0x1ee0
create_subvol+0x4eb/0x1710
btrfs_mksubvol+0xfe5/0x13f0
__btrfs_ioctl_snap_create+0x2b0/0x430
btrfs_ioctl_snap_create_v2+0x25a/0x520
btrfs_ioctl+0x2a1c/0x5ce0
__x64_sys_ioctl+0x193/0x200
do_syscall_64+0x35/0x80
Fix this by calling qgroup_dirty() on @dstqgroup, and update limit item in
btrfs_run_qgroups() later outside of the spinlock context.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When trying to see if we can clone a file range, there are cases where we
end up sending two write operations in case the inode from the source root
has an i_size that is not sector size aligned and the length from the
current offset to its i_size is less than the remaining length we are
trying to clone.
Issuing two write operations when we could instead issue a single write
operation is not incorrect. However it is not optimal, specially if the
extents are compressed and the flag BTRFS_SEND_FLAG_COMPRESSED was passed
to the send ioctl. In that case we can end up sending an encoded write
with an offset that is not sector size aligned, which makes the receiver
fallback to decompressing the data and writing it using regular buffered
IO (so re-compressing the data in case the fs is mounted with compression
enabled), because encoded writes fail with -EINVAL when an offset is not
sector size aligned.
The following example, which triggered a bug in the receiver code for the
fallback logic of decompressing + regular buffer IO and is fixed by the
patchset referred in a Link at the bottom of this changelog, is an example
where we have the non-optimal behaviour due to an unaligned encoded write:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdj
MNT=/mnt/sdj
mkfs.btrfs -f $DEV > /dev/null
mount -o compress $DEV $MNT
# File foo has a size of 33K, not aligned to the sector size.
xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
# Now clone the first 32K of file bar into foo at offset 0.
xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
# Snapshot the default subvolume and create a full send stream (v2).
btrfs subvolume snapshot -r $MNT $MNT/snap
btrfs send --compressed-data -f /tmp/test.send $MNT/snap
echo -e "\nFile bar in the original filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
mkfs.btrfs -f $DEV > /dev/null
mount $DEV $MNT
echo -e "\nReceiving stream in a new filesystem..."
btrfs receive -f /tmp/test.send $MNT
echo -e "\nFile bar in the new filesystem:"
od -A d -t x1 $MNT/snap/bar
umount $MNT
Before this patch, the send stream included one regular write and one
encoded write for file 'bar', with the later being not sector size aligned
and causing the receiver to fallback to decompression + buffered writes.
The output of the btrfs receive command in verbose mode (-vvv):
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
utimes
clone bar - source=foo source offset=0 offset=0 length=32768
write bar - offset=32768 length=1024
encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
encoded_write bar - falling back to decompress and write due to errno 22 ("Invalid argument")
(...)
This patch avoids the regular write followed by an unaligned encoded write
so that we end up sending a single encoded write that is aligned. So after
this patch the stream content is (output of btrfs receive -vvv):
(...)
mkfile o258-7-0
rename o258-7-0 -> bar
utimes
clone bar - source=foo source offset=0 offset=0 length=32768
encoded_write bar - offset=32768, len=4096, unencoded_offset=32768, unencoded_file_len=32768, unencoded_len=65536, compression=1, encryption=0
(...)
So we get more optimal behaviour and avoid the silent data loss bug in
versions of btrfs-progs affected by the bug referred by the Link tag
below (btrfs-progs v5.19, v5.19.1, v6.0 and v6.0.1).
Link: https://lore.kernel.org/linux-btrfs/cover.1668529099.git.fdmanana@suse.com/
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
generation is an on-disk __le64 value, so use btrfs_super_generation to
convert it to host endian before comparing it.
Fixes: 12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
to be saved and restored at hibernate and resume. However, it has
zero awareness of CPU support for these MSRs. It mostly works by
unconditionally attempting to manipulate these MSRs and relying on
rdmsrl_safe() being able to handle a #GP on CPUs where the support is
unavailable.
However, it's possible for reads (RDMSR) to be supported for a given MSR
while writes (WRMSR) are not. In this case, msr_build_context() sees
a successful read (RDMSR) and marks the MSR as valid. Then, later, a
write (WRMSR) fails, producing a nasty (but harmless) error message.
This causes restore_processor_state() to try and restore it, but writing
this MSR is not allowed on the Intel Atom N2600 leading to:
unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
Call Trace:
<TASK>
restore_processor_state
x86_acpi_suspend_lowlevel
acpi_suspend_enter
suspend_devices_and_enter
pm_suspend.cold
state_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
? do_syscall_64
? up_read
? lock_is_held_type
? asm_exc_page_fault
? lockdep_hardirqs_on
entry_SYSCALL_64_after_hwframe
To fix this, add the corresponding X86_FEATURE bit for each MSR. Avoid
trying to manipulate the MSR when the feature bit is clear. This
required adding a X86_FEATURE bit for MSRs that do not have one already,
but it's a small price to pay.
[ bp: Move struct msr_enumeration inside the only function that uses it. ]
Fixes: 73924ec4d560 ("x86/pm: Save the MSR validity status at context setup")
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.1668539735.git.pawan.kumar.gupta@linux.intel.com
|
|
Mat Martineau says:
====================
mptcp: More specific netlink command errors
This series makes the error reporting for the MPTCP_PM_CMD_ADD_ADDR netlink
command more specific, since there are multiple reasons the command could
fail.
Note that patch 2 adds a GENL_SET_ERR_MSG_FMT() macro to genetlink.h,
which is outside the MPTCP subsystem.
Patch 1 refactors in-kernel listening socket and endpoint creation to
simplify the second patch.
Patch 2 updates the error values returned by the in-kernel path manager
when it fails to create a local endpoint.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Endpoint creation can fail for a number of reasons; in case of failure
append the error number to the extended ack message, using a newly
introduced generic helper.
Additionally let mptcp_pm_nl_append_new_local_addr() report different
error reasons.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When endpoint creation fails, we need to free the newly allocated
entry and eventually destroy the paired mptcp listener socket.
Consolidate such action in a single point let all the errors path
reach it.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES.
This is different from how other CPU features are enumerated i.e. via
CPUID. Currently, a call to tsx_ctrl_is_supported() is required for
enumerating the feature. In the absence of a feature bit for TSX control,
any code that relies on checking feature bits directly will not work.
In preparation for adding a feature bit check in MSR save/restore
during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when
MSR_IA32_TSX_CTRL is present. Also make tsx_ctrl_is_supported() use the
new feature bit to avoid any overhead of reading the MSR.
[ bp: Remove tsx_ctrl_is_supported(), add room for two more feature
bits in word 11 which are coming up in the next merge window. ]
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.1668539735.git.pawan.kumar.gupta@linux.intel.com
|
|
We assume the correct errno is -EADDRINUSE when sk->sk_prot->get_port()
fails, so some ->get_port() functions return just 1 on failure and the
callers return -EADDRINUSE instead.
However, mptcp_get_port() can return -EINVAL. Let's not ignore the error.
Note the only exception is inet_autobind(), all of whose callers return
-EAGAIN instead.
Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This code accidentally uses the RX macro twice instead of the RX and TX.
Fixes: 6c635f78c474 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Zero-length arrays are deprecated[1] and are being replaced with
flexible array members in support of the ongoing efforts to tighten the
FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing
with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3.
Replace zero-length array with flexible-array member in struct key_vector.
This results in no differences in binary output.
[1] https://github.com/KSPP/linux/issues/78
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The `nettest` binary, built from `selftests/net/nettest.c`,
was expected to be found in the path during test execution of
`fcnal-test.sh` and `pmtu.sh`, leading to tests getting
skipped when the binary is not installed in the system, as can
be seen in these logs found in the wild [1]:
# TEST: vti4: PMTU exceptions [SKIP]
[ 350.600250] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 350.607421] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm6udp not supported
# TEST: vti6: PMTU exceptions (ESP-in-UDP) [SKIP]
[ 351.605102] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 351.612243] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm4udp not supported
The `unicast_extensions.sh` tests also rely on `nettest`, but
it runs fine there because it looks for the binary in the
current working directory [2]:
The same mechanism that works for the Unicast extensions tests
is here copied over to the PMTU and functional tests.
[1] https://lkft.validation.linaro.org/scheduler/job/5839508#L6221
[2] https://lkft.validation.linaro.org/scheduler/job/5839508#L7958
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Smatch detected the following warning.
drivers/net/ethernet/renesas/rswitch.c:1717 rswitch_init() warn:
'%pM' cannot be followed by 'n'
The 'n' should be '\n'.
Reported-by: Dan Carpenter <error27@gmail.com>
Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netdev.vger.kernel.org archive mirror
Steen Hegelund says:
====================
net: Add support for VCAP debugFS in Sparx5
This provides support for getting VCAP instance, VCAP rule and VCAP port
keyset configuration information via the debug file system.
It builds on top of the initial IS2 VCAP support found in these series:
https://lore.kernel.org/all/20221020130904.1215072-1-steen.hegelund@microchip.com/
https://lore.kernel.org/all/20221109114116.3612477-1-steen.hegelund@microchip.com/
https://lore.kernel.org/all/20221111130519.1459549-1-steen.hegelund@microchip.com/
Functionality:
==============
The VCAP API exposes a /sys/kernel/debug/sparx5/vcaps folder containing
the following entries:
- raw_<vcap>_<instance>
This is a raw dump of the VCAP instance with a line for each available
VCAP rule. This information is limited to the VCAP rule address, the
rule size and the rule keyset name as this requires very little
information from the VCAP cache.
This can be used to detect if a valid rule is stored at the correct
address.
- <vcap>_<instance>
This dumps the VCAP instance configuration: address ranges, chain id
ranges, word size of keys and actions etc, and for each VCAP rule the
details of keys (values and masks) and actions are shown.
This is useful when discovering if the expected rule is present and in
which order it will be matched.
- <interface>
This shows the keyset configuration per lookup and traffic type and the
set of sticky bits (common for all interfaces). This is cleared when
shown, so it is possible to sample over a period of time.
It also shows if this port/lookup is enabled for matching in the VCAP.
This can be used to find out which keyset the traffic being sent to a
port, will be matched against, and if such traffic has been seen by one
of the ports.
Delivery:
=========
This is current plan for delivering the full VCAP feature set of Sparx5:
- TC protocol all support for IS2 VCAP
- Sparx5 IS0 VCAP support
- TC policer and drop action support (depends on the Sparx5 QoS support
upstreamed separately)
- Sparx5 ES0 VCAP support
- TC flower template support
- TC matchall filter support for mirroring and policing ports
- TC flower filter mirror action support
- Sparx5 ES2 VCAP support
Version History:
================
v2 Removed a 'support' folder (used for integration testing) that had
been added in patch 6/8 by a mistake.
Wrapped long lines.
v1 Initial version
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This tests the functionality of the debugFS support:
- finding valid keyset on an address
- raw VCAP output
- full rule VCAP output
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This ensures that the VCAP cache and the lists maintained in the VCAP
instance is protected when accessed by different clients.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This add support for displaying the keys and actions in a rule.
The keys and action display format will be determined by the size and the
type of the key or action. The longer keys will typically be displayed as a
hexadecimal byte array.
The actionset is not decoded in full as the Sparx5 IS2 only has one
supported action, so this will be added later with other VCAP types.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This add support to show all rules in a VCAP instance. The information
shown is:
- rule id
- address range
- size
- chain id
- keyset name, subword size, register span
- actionset name, subword size, register span
- counter value
- sticky bit (one bit width counter)
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds support for decoding VCAP rules with a minimum number of
attributes: address, rule size and keyset.
This allows for a quick inspection of a VCAP instance to determine if the
rule are present and in the correct order.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a debugFS root folder for Sparx5 and add a vcap folder underneath with
the VCAP instances and the ports
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This ensures that the last_used_addr in a VCAP instance is returned to the
default value when all rules have been deleted.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This ensures that the l3_proto always have a valid value and that any
dissector parsing errors causes the flower rule to be discarded.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
The following patchset contains late Netfilter fixes for net:
1) Use READ_ONCE()/WRITE_ONCE() to update ct->mark, from Daniel Xu.
Not reported by syzbot, but I presume KASAN would trigger post
a splat on this. This is a rather old issue, predating git history.
2) Do not set up extensions for set element with end interval flag
set on. This leads to bogusly skipping this elements as expired
when listing the set/map to userspace as well as increasing
memory consumpton when stateful expressions are used. This issue
has been present since 4.18, when timeout support for rbtree set
was added.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 6.1, part 3:
- Fix a small memory leak in mach-mxs code.
- Correct PCIe pad configuration for imx8mp-evk board.
- Fix ref/tcxo clock frequency property for imx6q-prti6q board.
* tag 'imx-fixes-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6q-prti6q: Fix ref/tcxo-clock-frequency properties
arm64: dts: imx8mp-evk: correct pcie pad settings
ARM: mxs: fix memory leak in mxs_machine_init()
Link: https://lore.kernel.org/r/20221119073812.GQ16229@T480
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes
- RSB bus communication fixes
- missing IOMMU reference property to H6 Hantro G2
* tag 'sunxi-fixes-for-6.1-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: h6: Add IOMMU reference to Hantro G2
media: dt-bindings: allwinner: h6-vpu-g2: Add IOMMU reference property
bus: sunxi-rsb: Support atomic transfers
bus: sunxi-rsb: Remove the shutdown callback
Link: https://lore.kernel.org/r/Y3ftpBFk5+fndA4B@jernej-laptop
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes
AT91 fixes for 6.1 #2
It contains:
- fix UDC on at91sam9g20ek boards by adding vbus pin
* tag 'at91-fixes-6.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: dts: at91: sam9g20ek: enable udc vbus gpio pinctrl
Link: https://lore.kernel.org/r/20221118131205.301662-1-claudiu.beznea@microchip.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes
Fix possible memory leak in optee_register_device()
* tag 'optee-fix-for-6.1' of https://git.linaro.org/people/jens.wiklander/linux-tee:
tee: optee: fix possible memory leak in optee_register_device()
Link: https://lore.kernel.org/r/Y3d2OuJ60U30OkZe@jade
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Eliminate the following coccicheck warning:
./arch/loongarch/kernel/unwind_prologue.c:84:5-13: WARNING: Unsigned
expression compared with zero: frame_ra < 0
Signed-off-by: KaiLong Wang <wangkailong@jari.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Set _PAGE_DIRTY only if _PAGE_MODIFIED is set in {pmd,pte}_mkwrite().
Otherwise, _PAGE_DIRTY silences the TLB modify exception and make us
have no chance to mark a pmd/pte dirty (_PAGE_MODIFIED) for software.
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Now {pmd,pte}_mkdirty() set _PAGE_DIRTY bit unconditionally, this causes
random segmentation fault after commit 0ccf7f168e17bb7e ("mm/thp: carry
over dirty bit when thp splits on pmd").
The reason is: when fork(), parent process use pmd_wrprotect() to clear
huge page's _PAGE_WRITE and _PAGE_DIRTY (for COW); then pte_mkdirty() set
_PAGE_DIRTY as well as _PAGE_MODIFIED while splitting dirty huge pages;
once _PAGE_DIRTY is set, there will be no tlb modify exception so the COW
machanism fails; and at last memory corruption occurred between parent
and child processes.
So, we should set _PAGE_DIRTY only when _PAGE_WRITE is set in {pmd,pte}_
mkdirty().
Cc: stable@vger.kernel.org
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
If a kernel thread is created by a user thread, it may carry FPU/SIMD
thread info flags (TIF_USEDFPU, TIF_USEDSIMD, etc.). Then it will be
considered as a fpu owner and kernel try to save its FPU/SIMD context
and cause such errors:
[ 41.518931] do_fpu invoked from kernel context![#1]:
[ 41.523933] CPU: 1 PID: 395 Comm: iou-wrk-394 Not tainted 6.1.0-rc5+ #217
[ 41.530757] Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.pre-beta8 08/18/2022
[ 41.544064] $ 0 : 0000000000000000 90000000011e9468 9000000106c7c000 9000000106c7fcf0
[ 41.552101] $ 4 : 9000000106305d40 9000000106689800 9000000106c7fd08 0000000003995818
[ 41.560138] $ 8 : 0000000000000001 90000000009a72e4 0000000000000020 fffffffffffffffc
[ 41.568174] $12 : 0000000000000000 0000000000000000 0000000000000020 00000009aab7e130
[ 41.576211] $16 : 00000000000001ff 0000000000000407 0000000000000001 0000000000000000
[ 41.584247] $20 : 0000000000000000 0000000000000001 9000000106c7fd70 90000001002f0400
[ 41.592284] $24 : 0000000000000000 900000000178f740 90000000011e9834 90000001063057c0
[ 41.600320] $28 : 0000000000000000 0000000000000001 9000000006826b40 9000000106305140
[ 41.608356] era : 9000000000228848 _save_fp+0x0/0xd8
[ 41.613542] ra : 90000000011e9468 __schedule+0x568/0x8d0
[ 41.619160] CSR crmd: 000000b0
[ 41.619163] CSR prmd: 00000000
[ 41.622359] CSR euen: 00000000
[ 41.625558] CSR ecfg: 00071c1c
[ 41.628756] CSR estat: 000f0000
[ 41.635239] ExcCode : f (SubCode 0)
[ 41.638783] PrId : 0014c010 (Loongson-64bit)
[ 41.643191] Modules linked in: acpi_ipmi vfat fat ipmi_si ipmi_devintf cfg80211 ipmi_msghandler rfkill fuse efivarfs
[ 41.653734] Process iou-wrk-394 (pid: 395, threadinfo=0000000004ebe913, task=00000000636fa1be)
[ 41.662375] Stack : 00000000ffff0875 9000000006800ec0 9000000006800ec0 90000000002d57e0
[ 41.670412] 0000000000000001 0000000000000000 9000000106535880 0000000000000001
[ 41.678450] 9000000105291800 0000000000000000 9000000105291838 900000000178e000
[ 41.686487] 9000000106c7fd90 9000000106305140 0000000000000001 90000000011e9834
[ 41.694523] 00000000ffff0875 90000000011f034c 9000000105291838 9000000105291830
[ 41.702561] 0000000000000000 9000000006801440 00000000ffff0875 90000000002d48c0
[ 41.710597] 9000000128800001 9000000106305140 9000000105291838 9000000105291838
[ 41.718634] 9000000105291830 9000000107811740 9000000105291848 90000000009bf1e0
[ 41.726672] 9000000105291830 9000000107811748 2d6b72772d756f69 0000000000343933
[ 41.734708] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 41.742745] ...
[ 41.745252] Call Trace:
[ 42.197868] [<9000000000228848>] _save_fp+0x0/0xd8
[ 42.205214] [<90000000011ed468>] __schedule+0x568/0x8d0
[ 42.210485] [<90000000011ed834>] schedule+0x64/0xd4
[ 42.215411] [<90000000011f434c>] schedule_timeout+0x88/0x188
[ 42.221115] [<90000000009c36d0>] io_wqe_worker+0x184/0x350
[ 42.226645] [<9000000000221cf0>] ret_from_kernel_thread+0xc/0x9c
This can be easily triggered by ltp testcase syscalls/io_uring02 and it
can also be easily fixed by clearing the FPU/SIMD thread info flags for
kernel threads in copy_thread().
Cc: stable@vger.kernel.org
Reported-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
SMP operations can be shared by Loongson-2 series and Loongson-3 series,
so we change the prefix from loongson3 to loongson for all functions and
data structures.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Combine acpi_boot_table_init() and acpi_boot_init() since they are very
simple, and we don't need to check the return value of acpi_boot_init().
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
Fix this up by changing the LoongArch Makefile to use "grep -E" instead.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Function sparx5_tc_setup_qdisc_ets() always returns negative value
because it return -EOPNOTSUPP in the end. This patch returns the
rersult of sparx5_tc_ets_add() and sparx5_tc_ets_del() directly.
Fixes: 211225428d65 ("net: microchip: sparx5: add support for offloading ets qdisc")
Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If device_register() returns error in vmbus_device_register(),
the name allocated by dev_set_name() must be freed. As comment
of device_register() says, it should use put_device() to give
up the reference in the error path. So fix this by calling
put_device(), then the name can be freed in kobject_cleanup().
Fixes: 09d50ff8a233 ("Staging: hv: make the Hyper-V virtual bus code build")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20221119081135.1564691-3-yangyingliang@huawei.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
vmbus_add_channel_work()
In the error path of vmbus_device_register(), device_unregister()
is called, which calls vmbus_device_release(). The latter frees
the struct hv_device that was passed in to vmbus_device_register().
So remove the kfree() in vmbus_add_channel_work() to avoid a double
free.
Fixes: c2e5df616e1a ("vmbus: add per-channel sysfs info")
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20221119081135.1564691-2-yangyingliang@huawei.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Jeroen de Borst says:
====================
gve: Handle alternate miss-completions
Some versions of the virtual NIC present miss-completions in
an alternative way. Let the diver handle these alternate completions
and announce this capability to the device.
The capability is announced uing a new AdminQ command that sends
driver information to the device. The device can refuse a driver
if it is lacking support for a capability, or it can adopt it's
behavior to work around OS specific issues.
Changed in v5:
- Removed comments in fucntion calls
- Switched ENOTSUPP back to EOPNOTSUPP and made sure it gets passed
Changed in v4:
- Clarified new AdminQ command in cover letter
- Changed EOPNOTSUPP to ENOTSUPP to match device's response
Changed in v3:
- Rewording cover letter
- Added 'Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>'
Changes in v2:
- Changed the subject to include 'gve:'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The virtual NIC has 2 ways of indicating a miss-path
completion. This handles the alternate.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check whether the driver is compatible with the device
presented.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|