Age | Commit message (Collapse) | Author |
|
Cache move-in for virtual accesses is controlled by the TLB. Thus,
we must generally purge TLB entries before flushing. The flush routines
must use TLB entries that inhibit cache move-in.
V2: Load physical address prior to flushing TLB. In flush_cache_page,
flush TLB when flushing and purging.
V3: Don't flush when start equals end.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Standalone ports use vid 0. Let the bridge use vid 1 when
"vlan_default_pvid 0" is set to avoid collisions. Since no
VLAN is created when default pvid is 0 this is set
at "PORT_ATTR_SET" and handled in the Switchdev fdb handler.
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
allocated_mem is allocated by kcalloc(). The memory is set to zero.
It is unnecessary to call memset again.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In calipso_map_cat_ntoh(), in the for loop, if the return value of
netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when
netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses
of bitmap[byte_offset] occurs.
The bug was found during fuzzing. The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0
Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252
CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x21c/0x230
show_stack+0x1c/0x60
dump_stack_lvl+0x64/0x7c
print_address_description.constprop.0+0x70/0x2d0
__kasan_report+0x158/0x16c
kasan_report+0x74/0x120
__asan_load1+0x80/0xa0
netlbl_bitmap_walk+0x3c/0xd0
calipso_opt_getattr+0x1a8/0x230
calipso_sock_getattr+0x218/0x340
calipso_sock_getattr+0x44/0x60
netlbl_sock_getattr+0x44/0x80
selinux_netlbl_socket_setsockopt+0x138/0x170
selinux_socket_setsockopt+0x4c/0x60
security_socket_setsockopt+0x4c/0x90
__sys_setsockopt+0xbc/0x2b0
__arm64_sys_setsockopt+0x6c/0x84
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x88/0x200
do_el0_svc+0x88/0xa0
el0_svc+0x128/0x1b0
el0t_64_sync_handler+0x9c/0x120
el0t_64_sync+0x16c/0x170
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Duoming Zhou says:
====================
Fix refcount leak and NPD bugs in ax25
The first patch fixes refcount leak in ax25 that could cause
ax25-ex-connected-session-now-listening-state-bug.
The second patch fixes NPD bugs in ax25 timers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous commit 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
move ax25_disconnect into lock_sock() in order to prevent NPD bugs. But
there are race conditions that may lead to null pointer dereferences in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we use
ax25_kill_by_device() to detach the ax25 device.
One of the race conditions that cause null pointer dereferences can be
shown as below:
(Thread 1) | (Thread 2)
ax25_connect() |
ax25_std_establish_data_link() |
ax25_start_t1timer() |
mod_timer(&ax25->t1timer,..) |
| ax25_kill_by_device()
(wait a time) | ...
| s->ax25_dev = NULL; //(1)
ax25_t1timer_expiry() |
ax25->ax25_dev->values[..] //(2)| ...
... |
We set null to ax25_cb->ax25_dev in position (1) and dereference
the null pointer in position (2).
The corresponding fail log is shown below:
===============================================================
BUG: kernel NULL pointer dereference, address: 0000000000000050
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc6-00794-g45690b7d0
RIP: 0010:ax25_t1timer_expiry+0x12/0x40
...
Call Trace:
call_timer_fn+0x21/0x120
__run_timers.part.0+0x1ca/0x250
run_timer_softirq+0x2c/0x60
__do_softirq+0xef/0x2f3
irq_exit_rcu+0xb6/0x100
sysvec_apic_timer_interrupt+0xa2/0xd0
...
This patch moves ax25_disconnect() before s->ax25_dev = NULL
and uses del_timer_sync() to delete timers in ax25_disconnect().
If ax25_disconnect() is called by ax25_kill_by_device() or
ax25->ax25_dev is NULL, the reason in ax25_disconnect() will be
equal to ENETUNREACH, it will wait all timers to stop before we
set null to s->ax25_dev in ax25_kill_by_device().
Fixes: 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to
avoid UAF bugs") and commit feef318c855a ("ax25: fix UAF bugs of
net_device caused by rebinding operation") increase the refcounts of
ax25_dev and net_device in ax25_bind() and decrease the matching refcounts
in ax25_kill_by_device() in order to prevent UAF bugs, but there are
reference count leaks.
The root cause of refcount leaks is shown below:
(Thread 1) | (Thread 2)
ax25_bind() |
... |
ax25_addr_ax25dev() |
ax25_dev_hold() //(1) |
... |
dev_hold_track() //(2) |
... | ax25_destroy_socket()
| ax25_cb_del()
| ...
| hlist_del_init() //(3)
|
|
(Thread 3) |
ax25_kill_by_device() |
... |
ax25_for_each(s, &ax25_list) { |
if (s->ax25_dev == ax25_dev) //(4) |
... |
Firstly, we use ax25_bind() to increase the refcount of ax25_dev in
position (1) and increase the refcount of net_device in position (2).
Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete
ax25_cb in hlist in position (3) before calling ax25_kill_by_device().
Finally, the decrements of refcounts in ax25_kill_by_device() will not
be executed, because no s->ax25_dev equals to ax25_dev in position (4).
This patch adds decrements of refcounts in ax25_release() and use
lock_sock() to do synchronization. If refcounts decrease in ax25_release(),
the decrements of refcounts in ax25_kill_by_device() will not be
executed and vice versa.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Fixes: 87563a043cef ("ax25: fix reference count leaks of ax25_dev")
Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the <linux/cgroup-defs.h> dependency to <linux/psi.h>, because
cgroup_move_task() will dereference 'struct css_set'.
( Only older toolchains are affected, due to variations in
the implementation of rcu_assign_pointer() et al. )
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This reverts commit cf3e26427c08ad9015956293ab389004ac6a338e.
Multi-vCPU Hyper-V guests started crashing randomly on boot with the
latest kvm/queue and the problem can be bisected the problem to this
particular patch. Basically, I'm not able to boot e.g. 16-vCPU guest
successfully anymore. Both Intel and AMD seem to be affected. Reverting
the commit saves the day.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Since "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
is going to be reverted, it's not going to be true anymore that
the zap-page flow does not free any 'struct kvm_mmu_page'. Introduce
an early flush before tdp_mmu_zap_leafs() returns, to preserve
bisectability.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When CONFIG_DEBUG_INFO_BTF is disabled, bpf_get_btf_vmlinux can return a
NULL pointer. Check for it in btf_get_module_btf to prevent a NULL pointer
dereference.
While kernel test robot only complained about this specific case, let's
also check for NULL in other call sites of bpf_get_btf_vmlinux.
Fixes: 9492450fd287 ("bpf: Always raise reference in btf_get_module_btf")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220320143003.589540-1-memxor@gmail.com
|
|
Add a test case for stacktrace with skip > 0 using a small sized
buffer. It didn't support skipping entries greater than or equal to
the size of buffer and filled the skipped part with 0.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220314182042.71025-2-namhyung@kernel.org
|
|
Let's say that the caller has storage for num_elem stack frames. Then,
the BPF stack helper functions walk the stack for only num_elem frames.
This means that if skip > 0, one keeps only 'num_elem - skip' frames.
This is because it sets init_nr in the perf_callchain_entry to the end
of the buffer to save num_elem entries only. I believe it was because
the perf callchain code unwound the stack frames until it reached the
global max size (sysctl_perf_event_max_stack).
However it now has perf_callchain_entry_ctx.max_stack to limit the
iteration locally. This simplifies the code to handle init_nr in the
BPF callstack entries and removes the confusion with the perf_event's
__PERF_SAMPLE_CALLCHAIN_EARLY which sets init_nr to 0.
Also change the comment on bpf_get_stack() in the header file to be
more explicit what the return value means.
Fixes: c195651e565a ("bpf: add bpf_get_stack helper")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com
Link: https://lore.kernel.org/bpf/20220314182042.71025-1-namhyung@kernel.org
Based-on-patch-by: Eugene Loh <eugene.loh@oracle.com>
|
|
Using HPAGE_PMD_SIZE as the size for bpf_prog_pack is not ideal in some
cases. Specifically, for NUMA systems, __vmalloc_node_range requires
PMD_SIZE * num_online_nodes() to allocate huge pages. Also, if the system
does not support huge pages (i.e., with cmdline option nohugevmalloc), it
is better to use PAGE_SIZE packs.
Add logic to select proper size for bpf_prog_pack. This solution is not
ideal, as it makes assumption about the behavior of module_alloc and
__vmalloc_node_range. However, it appears to be the easiest solution as
it doesn't require changes in module_alloc and vmalloc code.
Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220311201135.3573610-1-song@kernel.org
|
|
Jakub Sitnicki says:
====================
This patch set is a result of a discussion we had around the RFC patchset from
Ilya [1]. The fix for the narrow loads from the RFC series is still relevant,
but this series does not depend on it. Nor is it required to unbreak sk_lookup
tests on BE, if this series gets applied.
To summarize the takeaways from [1]:
1) we want to make 2-byte load from ctx->remote_port portable across LE and BE,
2) we keep the 4-byte load from ctx->remote_port as it is today - result varies
on endianess of the platform.
[1] https://lore.kernel.org/bpf/20220222182559.2865596-2-iii@linux.ibm.com/
v1 -> v2:
- Remove needless check that 4-byte load is from &ctx->remote_port offset
(Martin)
[v1]: https://lore.kernel.org/bpf/20220317165826.1099418-1-jakub@cloudflare.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The context access converter rewrites the 4-byte load from
bpf_sk_lookup->remote_port to a 2-byte load from bpf_sk_lookup_kern
structure.
It means that we cannot treat the destination register contents as a 32-bit
value, or the code will not be portable across big- and little-endian
architectures.
This is exactly the same case as with 4-byte loads from bpf_sock->dst_port
so follow the approach outlined in [1] and treat the register contents as a
16-bit value in the test.
[1]: https://lore.kernel.org/bpf/20220317113920.1068535-5-jakub@cloudflare.com/
Fixes: 2ed0dc5937d3 ("selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220319183356.233666-4-jakub@cloudflare.com
|
|
In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct
bpf_sk_lookup 16-bit wide") ->remote_port field changed from __u32 to
__be16.
However, narrow load tests which exercise 1-byte sized loads from
offsetof(struct bpf_sk_lookup, remote_port) were not adopted to reflect the
change.
As a result, on little-endian we continue testing loads from addresses:
- (__u8 *)&ctx->remote_port + 3
- (__u8 *)&ctx->remote_port + 4
which map to the zero padding following the remote_port field, and don't
break the tests because there is no observable change.
While on big-endian, we observe breakage because tests expect to see zeros
for values loaded from:
- (__u8 *)&ctx->remote_port - 1
- (__u8 *)&ctx->remote_port - 2
Above addresses map to ->remote_ip6 field, which precedes ->remote_port,
and are populated during the bpf_sk_lookup IPv6 tests.
Unsurprisingly, on s390x we observe:
#136/38 sk_lookup/narrow access to ctx v4:OK
#136/39 sk_lookup/narrow access to ctx v6:FAIL
Fix it by removing the checks for 1-byte loads from offsets outside of the
->remote_port field.
Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220319183356.233666-3-jakub@cloudflare.com
|
|
In commit 9a69e2b385f4 ("bpf: Make remote_port field in struct
bpf_sk_lookup 16-bit wide") the remote_port field has been split up and
re-declared from u32 to be16.
However, the accompanying changes to the context access converter have not
been well thought through when it comes big-endian platforms.
Today 2-byte wide loads from offsetof(struct bpf_sk_lookup, remote_port)
are handled as narrow loads from a 4-byte wide field.
This by itself is not enough to create a problem, but when we combine
1. 32-bit wide access to ->remote_port backed by a 16-wide wide load, with
2. inherent difference between litte- and big-endian in how narrow loads
need have to be handled (see bpf_ctx_narrow_access_offset),
we get inconsistent results for a 2-byte loads from &ctx->remote_port on LE
and BE architectures. This in turn makes BPF C code for the common case of
2-byte load from ctx->remote_port not portable.
To rectify it, inform the context access converter that remote_port is
2-byte wide field, and only 1-byte loads need to be treated as narrow
loads.
At the same time, we special-case the 4-byte load from &ctx->remote_port to
continue handling it the same way as do today, in order to keep the
existing BPF programs working.
Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220319183356.233666-2-jakub@cloudflare.com
|
|
Joanne Koong says:
====================
From: Joanne Koong <joannelkoong@gmail.com>
Currently, local storage memory can only be allocated atomically
(GFP_ATOMIC). This restriction is too strict for sleepable bpf
programs.
In this patchset, sleepable programs can allocate memory in local
storage using GFP_KERNEL, while non-sleepable programs always default to
GFP_ATOMIC.
v3 <- v2:
* Add extra case to local_storage.c selftest to test associating multiple
elements with the local storage, which triggers a GFP_KERNEL allocation in
local_storage_update().
* Cast gfp_t to __s32 in verifier to fix the sparse warnings
v2 <- v1:
* Allocate the memory before/after the raw_spin_lock_irqsave, depending
on the gfp flags
* Rename mem_flags to gfp_flags
* Reword the comment "*mem_flags* is set by the bpf verifier" to
"*gfp_flags* is a hidden argument provided by the verifier"
* Add a sentence to the commit message about existing local storage
selftests covering both the GFP_ATOMIC and GFP_KERNEL paths in
bpf_local_storage_update.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a few calls to the existing local storage selftest to
test that we can associate multiple elements with the local storage.
The sleepable program's call to bpf_sk_storage_get with sk_storage_map2
will lead to an allocation of a new selem under the GFP_KERNEL flag.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220318045553.3091807-3-joannekoong@fb.com
|
|
Currently, local storage memory can only be allocated atomically
(GFP_ATOMIC). This restriction is too strict for sleepable bpf
programs.
In this patch, the verifier detects whether the program is sleepable,
and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a
5th argument to bpf_task/sk/inode_storage_get. This flag will propagate
down to the local storage functions that allocate memory.
Please note that bpf_task/sk/inode_storage_update_elem functions are
invoked by userspace applications through syscalls. Preemption is
disabled before bpf_task/sk/inode_storage_update_elem is called, which
means they will always have to allocate memory atomically.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220318045553.3091807-2-joannekoong@fb.com
|
|
If BPF object doesn't have an BTF info, don't attempt to search for BTF
types describing BPF map key or value layout.
Fixes: 262cfb74ffda ("libbpf: Init btf_{key,value}_type_id on internal map open")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220320001911.3640917-1-andrii@kernel.org
|
|
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
While the original code is valid, it is not the obvious choice for the
sizeof() call and in preparation to limit the scope of the list iterator
variable the sizeof should be changed to the size of the destination.
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull kvm fix from Paolo Bonzini:
"Fix for the SLS mitigation, which makes a 'SETcc/RET' pair grow
to 'SETcc/RET/INT3'.
This doesn't fit in 4 bytes any more, so the alignment has to
change to 8 for this case"
* tag 'for-linus-5.17' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm/emulate: Fix SETcc emulation function offsets with SLS
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Two driver fixes:
- a fix for zinitix touchscreen to properly report contacts
- a fix for aiptek tablet driver to be more resilient to devices with
incorrect descriptors"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: aiptek - properly check endpoint type
Input: zinitix - do not report shadow fingers
|
|
When an enum is used in the visible parts of a trace event that is
exported to user space, the user space applications like perf and
trace-cmd do not have a way to know what the value of the enum is. To
solve this, at boot up (or module load) the printk formats are modified to
replace the enum with their numeric value in the string output.
Array fields of the event are defined by [<nr-elements>] in the type
portion of the format file so that the user space parsers can correctly
parse the array into the appropriate size chunks. But in some trace
events, an enum is used in defining the size of the array, which once
again breaks the parsing of user space tooling.
This was solved the same way as the print formats were, but it modified
the type strings of the trace event. This caused crashes in some
architectures because, as supposed to the print string, is a const string
value. This was not detected on x86, as it appears that const strings are
still writable (at least in boot up), but other architectures this is not
the case, and writing to a const string will cause a kernel fault.
To fix this, use kstrdup() to copy the type before modifying it. If the
trace event is for the core kernel there's no need to free it because the
string will be in use for the life of the machine being on line. For
modules, create a link list to store all the strings being allocated for
modules and when the module is removed, free them.
Link: https://lore.kernel.org/all/yt9dr1706b4i.fsf@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220318153432.3984b871@gandalf.local.home
Tested-by: Marc Zyngier <maz@kernel.org>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Fixes: b3bc8547d3be ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The commit in Fixes started adding INT3 after RETs as a mitigation
against straight-line speculation.
The fastop SETcc implementation in kvm's insn emulator uses macro magic
to generate all possible SETcc functions and to jump to them when
emulating the respective instruction.
However, it hardcodes the size and alignment of those functions to 4: a
three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an
INT3 that gets slapped after the RET, which brings the whole scheme out
of alignment:
15: 0f 90 c0 seto %al
18: c3 ret
19: cc int3
1a: 0f 1f 00 nopl (%rax)
1d: 0f 91 c0 setno %al
20: c3 ret
21: cc int3
22: 0f 1f 00 nopl (%rax)
25: 0f 92 c0 setb %al
28: c3 ret
29: cc int3
and this explodes like this:
int3: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1
Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012
RIP: 0010:setc+0x5/0x8 [kvm]
Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \
1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \
0f 94 c0 c3 cc 0f 1f 00 0f 95 c0
Call Trace:
<TASK>
? x86_emulate_insn [kvm]
? x86_emulate_instruction [kvm]
? vmx_handle_exit [kvm_intel]
? kvm_arch_vcpu_ioctl_run [kvm]
? kvm_vcpu_ioctl [kvm]
? __x64_sys_ioctl
? do_syscall_64
? entry_SYSCALL_64_after_hwframe
</TASK>
Raise the alignment value when SLS is enabled and use a macro for that
instead of hard-coding naked numbers.
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net
[Add a comment and a bit of safety checking, since this is going to be changed
again for IBT support. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fix from Arnd Bergmann:
"Here is one last regression fix for 5.17, reverting a patch that went
into 5.16 as a cleanup that ended up breaking external interrupts on
Layerscape chips.
The revert makes it work again, but also reintroduces a build time
warning about the nonstandard DT binding that will have to be dealt
with in the future"
* tag 'soc-fixes-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
Revert "arm64: dts: freescale: Fix 'interrupt-map' parent address cells"
|
|
The flowtable object is already passed as argument to
nf_flow_table_iterate(), do use not data pointer to pass flowtable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Already available through the flowtable object, remove it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.
Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.
Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init
Example 2: (initcall_debug log)
calling init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs
Fixes: f587de0e2feb ("[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if we can elide the load. Cancel if the new candidate
isn't identical to previous store.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The fib expression stores to a register, so we can't add empty stub.
Check that the register that is being written is in fact redundant.
In most cases, this is expected to cancel tracking as re-use is
unlikely.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
tunnel expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information. This patch does not perform bitwise tracking.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
xfrm expression performs. This allows to skip this redundant operation.
If the destination contains a different selector, update the register
tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
socket expression performs. This allows to skip this redundant
operation. If the destination contains a different selector, update the
register tracking information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The immediate expression might clobber existing data on the registers,
cancel register tracking for the destination register.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this
osf expression performs. Always cancel register tracking for jhash since
this requires tracking multiple source registers in case of
concatenations. Perform register tracking (without bitwise) for symhash
since input does not come from source register.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Allow to recycle the previous output of the OS fingerprint expression
if flags and ttl are the same.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Random and increment are stateful, each invocation results in fresh output.
Cancel register tracking for these two expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
its enough to export the meta get reduce helper and then call it
from nft_meta_bridge too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In most cases, nft_lookup will be read-only, i.e. won't clobber
registers. In case of map, we need to cancel the registers that will
see stores.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Check if the destination register already contains the data that this ct
expression performs. This allows to skip this redundant operation. If
the destination contains a different selector, update the register
tracking information.
Export nft_expr_reduce_bitwise as a symbol since nft_ct might be
compiled as a module.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Output of expressions might be larger than one single register, this might
clobber existing data. Reset tracking for all destination registers that
required to store the expression output.
This patch adds three new helper functions:
- nft_reg_track_update: cancel previous register tracking and update it.
- nft_reg_track_cancel: cancel any previous register tracking info.
- __nft_reg_track_cancel: cancel only one single register tracking info.
Partial register clobbering detection is also supported by checking the
.num_reg field which describes the number of register that are used.
This patch updates the following expressions:
- meta_bridge
- bitwise
- byteorder
- meta
- payload
to use these helper functions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Skip register tracking for expressions that perform read-only operations
on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to
avoid defining stubs for these expressions.
This patch re-enables register tracking which was disabled in ed5f85d42290
("netfilter: nf_tables: disable register tracking"). Follow up patches
add remaining register tracking for existing expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The function sets the pernet boolean to avoid the spurious warning from
nf_ct_lookup_helper() when assigning conntrack helpers via nftables.
Fixes: 1a64edf54f55 ("netfilter: nft_ct: add helper set support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small(ish) fixes, both in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: fnic: Finish scsi_cmnd before dropping the spinlock
scsi: mpt3sas: Page fault in reply q processing
|
|
as of commit 4608fdfc07e1
("netfilter: conntrack: collect all entries in one cycle")
conntrack gc was changed to run every 2 minutes.
On systems where conntrack hash table is set to large value, most evictions
happen from gc worker rather than the packet path due to hash table
distribution.
This causes netlink event overflows when events are collected.
This change collects average expiry of scanned entries and
reschedules to the average remaining value, within 1 to 60 second interval.
To avoid event overflows, reschedule after each bucket and add a
limit for both run time and number of evictions per run.
If more entries have to be evicted, reschedule and restart 1 jiffy
into the future.
Reported-by: Karel Rericha <karel@maxtel.cz>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|