Age | Commit message (Collapse) | Author |
|
Instead of using array_size, use a function that takes care of the
multiplication. While at it, switch to kvcalloc since this allocation
should not be very large.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
advertise the set of quirks which may be disabled to userspace, so it is
impossible to predict the behavior of KVM. Worse yet,
KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
it fails to reject attempts to set invalid quirk bits.
The only valid workaround for the quirky quirks API is to add a new CAP.
Actually advertise the set of quirks that can be disabled to userspace
so it can predict KVM's behavior. Reject values for cap->args[0] that
contain invalid bits.
Finally, add documentation for the new capability and describe the
existing quirks.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220301060351.442881-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Message-Id: <Yh5eJSG19S2sjZfy@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Guests X86_BUG_NULL_SEG if and only if the host has them. Use the info
from static_cpu_has_bug to form the 0x80000021 CPUID leaf that was
defined for Zen3. Userspace can then set the bit even on older CPUs
that do not have the bug, such as Zen2.
Do the same for X86_FEATURE_LFENCE_RDTSC as well, since various processors
have had very different ways of detecting it and not all of them are
available to userspace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
CPUID leaf 0x80000021 defines some features (or lack of bugs) of AMD
processors. Expose the ones that make sense via KVM_GET_SUPPORTED_CPUID.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM_X86_OP_OPTIONAL_RET0 can only be used with 32-bit return values on 32-bit
systems, because unsigned long is only 32-bits wide there and 64-bit values
are returned in edx:eax.
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Tong Zhang says:
====================
fix typos: "to short" -> "too short"
doing some code review and I found out there are a couple of places
where "too short" is misspelled as "to short".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"frame to short" -> "frame too short"
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"Frame to short" -> "Frame too short"
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"packet length to short" -> "packet length too short"
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"RX USB to short" -> "RX USB too short"
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Casper Andersson says:
====================
net: sparx5: Add multicast support
Add multicast support to Sparx5.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds mdb handlers. Uses the PGID arbiter to
find a free entry in the PGID table for the
multicast group port mask.
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PGID (Port Group ID) table holds port masks
for different purposes. The first 72 are reserved
for port destination masks, flood masks, and CPU
forwarding. The rest are shared between multicast,
link aggregation, and virtualization profiles. The
GLAG area is reserved to not be used by anything
else, since it is a subset of the MCAST area.
The arbiter keeps track of which entries are in
use. You can ask for a free ID or give back one
you are done using.
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simon Horman says:
====================
nfp: support for NFP-3800
Yinjun Zhan says:
This is the second of a two part series to support the NFP-3800 device.
To utilize the new hardware features of the NFP-3800, driver adds support
of a new data path NFDK. This series mainly does some refactor work to the
data path related implementations. The data path specific implementations
are now separated into nfd3 and nfdk directories respectively, and the
common part is also moved into a new file.
* The series starts with a small refinement in Patch 1/10. Patches 2/10 and
3/10 are the main refactoring of data path implementation, which prepares
for the adding the NFDK data path.
* Before the introduction of NFDK, there's some more preparation work
for NFP-3800 features, such as multi-descriptor per-packet and write-back
mechanism of TX pointer, which is done in patches 4/10, 5/10, 6/10, 7/10.
* Patch 8/10 allows the driver to select data path according
to firmware version. Finally, patches 9/10 and 10/10 introduce the new
NFDK data path.
Changes between v1 and v2
* Correct kdoc for nfp_nfdk_tx()
* Correct build warnings on 32-bit
Thanks to everyone who contributed to this work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Due to the different definition of txbuf in NFDK comparing to NFD3,
there're no pre-allocated txbufs for xdp use in NFDK's implementation,
we just use the existed rxbuf and recycle it when xdp tx is completed.
For each packet to transmit in xdp path, we cannot use more than
`NFDK_TX_DESC_PER_SIMPLE_PKT` txbufs, one is to stash virtual address,
and another is for dma address, so currently the amount of transmitted
bytes is not accumulated. Also we borrow the last bit of virtual addr
to indicate a new transmitted packet due to address's alignment
attribution.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new data path. The TX is completely different, each packet
has multiple descriptor entries (between 2 and 32). TX ring is
divided into blocks 32 descriptor, and descritors of one packet
can't cross block bounds. The RX side is the same for now.
ABI version 5 or later is required. There is no support for
VLAN insertion on TX. XDP_TX action and AF_XDP zero-copy is not
implemented in NFDK path.
Changes to Jakub's work:
* Move statistics of hw_csum_tx after jumbo packet's segmentation.
* Set L3_CSUM flag to enable recaculating of L3 header checksum
in ipv4 case.
* Mark the case of TSO a packet with metadata prepended as
unsupported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Xingfeng Hu <xingfeng.hu@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Dianchao Wang <dianchao.wang@corigine.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prepare for choosing data path based on the firmware version field.
Exploit one bit from the reserved byte in the firmware version field
as the data path type. We need the firmware version right after
vNIC is allocated, so it has to be read inside nfp_net_alloc(),
callers don't have to set it afterwards.
Following patches will bring the implementation of the second data
path.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure that features supported only by some of the data paths
are not enabled for all. Add a mask of supported features into
the data path op structure.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Newer versions of the PCIe microcode support writing back the
position of the TX pointer back into host memory. This speeds
up TX completions, because we avoid a read from device memory
(replacing PCIe read with DMA coherent read).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
QCidx is not used on fast path, move it to the lower cacheline.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
New datapaths may use multiple descriptor units to describe
a single packet. Prepare for that by adding a descriptors
per simple frame constant into ring size calculations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To reduce the coupling of slow path ring implementations and their
callers, use callbacks instead.
Changes to Jakub's work:
* Also use callbacks for xmit functions
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for support for a new datapath format move all
ring and fast path logic into separate files. It is basically
a verbatim move with some wrapping functions, no new structures
and functions added.
The current data path is called NFD3 from the initial version
of the driver ABI it used. The non-fast path, but ring related
functions are moved to nfp_net_dp.c file.
Changes to Jakub's work:
* Rebase on xsk related code.
* Split the patch, move the callback changes to next commit.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ring enable masks are 64bit long. Replace mask calculation from:
block_cnt == 64 ? 0xffffffffffffffffULL : (1 << block_cnt) - 1
with:
(U64_MAX >> (64 - block_cnt))
to simplify the code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
free_watch() does everything barring actually freeing the watch object. Fix
this by adding the missing kfree.
kmemleak produces a report something like the following. Note that as an
address can be seen in the first word, the watch would appear to have gone
through call_rcu().
BUG: memory leak
unreferenced object 0xffff88810ce4a200 (size 96):
comm "syz-executor352", pid 3605, jiffies 4294947473 (age 13.720s)
hex dump (first 32 bytes):
e0 82 48 0d 81 88 ff ff 00 00 00 00 00 00 00 00 ..H.............
80 a2 e4 0c 81 88 ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8214e6cc>] kmalloc include/linux/slab.h:581 [inline]
[<ffffffff8214e6cc>] kzalloc include/linux/slab.h:714 [inline]
[<ffffffff8214e6cc>] keyctl_watch_key+0xec/0x2e0 security/keys/keyctl.c:1800
[<ffffffff8214ec84>] __do_sys_keyctl+0x3c4/0x490 security/keys/keyctl.c:2016
[<ffffffff84493a25>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84493a25>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-and-tested-by: syzbot+6e2de48f06cdb2884bfc@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
In watch_queue_set_size(), the error cleanup code doesn't take account of
the fact that __free_page() can't handle a NULL pointer when trying to free
up buffer pages that did get allocated.
Fix this by only calling __free_page() on the pages actually allocated.
Without the fix, this can lead to something like the following:
BUG: KASAN: null-ptr-deref in __free_pages+0x1f/0x1b0 mm/page_alloc.c:5473
Read of size 4 at addr 0000000000000034 by task syz-executor168/3599
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__kasan_report mm/kasan/report.c:446 [inline]
kasan_report.cold+0x66/0xdf mm/kasan/report.c:459
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
instrument_atomic_read include/linux/instrumented.h:71 [inline]
atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
page_ref_count include/linux/page_ref.h:67 [inline]
put_page_testzero include/linux/mm.h:717 [inline]
__free_pages+0x1f/0x1b0 mm/page_alloc.c:5473
watch_queue_set_size+0x499/0x630 kernel/watch_queue.c:275
pipe_ioctl+0xac/0x2b0 fs/pipe.c:632
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-and-tested-by: syzbot+d55757faa9b80590767b@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next.
This patchset contains updates for the nf_tables register tracking
infrastructure, disable bogus warning when attaching ct helpers,
one namespace pollution fix and few cleanups for the flowtable.
1) Revisit conntrack gc routine to reduce chances of overruning
the netlink buffer from the event path. From Florian Westphal.
2) Disable warning on explicit ct helper assignment, from Phil Sutter.
3) Read-only expressions do not update registers, mark them as
NFT_REDUCE_READONLY. Add helper functions to update the register
tracking information. This patch re-enables the register tracking
infrastructure.
4) Cancel register tracking in case an expression fully/partially
clobbers existing data.
5) Add register tracking support for remaining expressions: ct,
lookup, meta, numgen, osf, hash, immediate, socket, xfrm, tunnel,
fib, exthdr.
6) Rename init and exit functions for the conntrack h323 helper,
from Randy Dunlap.
7) Remove redundant field in struct flow_offload_work.
8) Update nf_flow_table_iterate() to pass flowtable to callback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reset the last_readdir at the same time, and add a comment explaining
why we don't free last_readdir when dir_emit returns false.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
If read_mapping_folio() fails then "inline_version" is printed without
being initialized.
[ jlayton: use CEPH_INLINE_NONE instead of "-1" ]
Fixes: 083db6fd3e73 ("ceph: uninline the data on a file opened for writing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
stdev is computed in `cephfs-top` tool - clients forward
square of sums and IO count required to calculate stdev.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Make the math a bit simpler to understand (should not
affect execution speeds).
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Latencies are of type ktime_t, coverting from jiffies is incorrect.
Also, switch to "struct ceph_timespec" for r/w/m latencies.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The ceph_find_inode() may will fail and return NULL.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The ceph_get_inode() will search for or insert a new inode into the
hash for the given vino, and return a reference to it. If new is
non-NULL, its reference is consumed.
We should release the reference when in error handing cases.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Cache move-in for virtual accesses is controlled by the TLB. Thus,
we must generally purge TLB entries before flushing. The flush routines
must use TLB entries that inhibit cache move-in.
V2: Load physical address prior to flushing TLB. In flush_cache_page,
flush TLB when flushing and purging.
V3: Don't flush when start equals end.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Standalone ports use vid 0. Let the bridge use vid 1 when
"vlan_default_pvid 0" is set to avoid collisions. Since no
VLAN is created when default pvid is 0 this is set
at "PORT_ATTR_SET" and handled in the Switchdev fdb handler.
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
allocated_mem is allocated by kcalloc(). The memory is set to zero.
It is unnecessary to call memset again.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In calipso_map_cat_ntoh(), in the for loop, if the return value of
netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when
netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses
of bitmap[byte_offset] occurs.
The bug was found during fuzzing. The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0
Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252
CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x21c/0x230
show_stack+0x1c/0x60
dump_stack_lvl+0x64/0x7c
print_address_description.constprop.0+0x70/0x2d0
__kasan_report+0x158/0x16c
kasan_report+0x74/0x120
__asan_load1+0x80/0xa0
netlbl_bitmap_walk+0x3c/0xd0
calipso_opt_getattr+0x1a8/0x230
calipso_sock_getattr+0x218/0x340
calipso_sock_getattr+0x44/0x60
netlbl_sock_getattr+0x44/0x80
selinux_netlbl_socket_setsockopt+0x138/0x170
selinux_socket_setsockopt+0x4c/0x60
security_socket_setsockopt+0x4c/0x90
__sys_setsockopt+0xbc/0x2b0
__arm64_sys_setsockopt+0x6c/0x84
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x88/0x200
do_el0_svc+0x88/0xa0
el0_svc+0x128/0x1b0
el0t_64_sync_handler+0x9c/0x120
el0t_64_sync+0x16c/0x170
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Duoming Zhou says:
====================
Fix refcount leak and NPD bugs in ax25
The first patch fixes refcount leak in ax25 that could cause
ax25-ex-connected-session-now-listening-state-bug.
The second patch fixes NPD bugs in ax25 timers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous commit 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
move ax25_disconnect into lock_sock() in order to prevent NPD bugs. But
there are race conditions that may lead to null pointer dereferences in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we use
ax25_kill_by_device() to detach the ax25 device.
One of the race conditions that cause null pointer dereferences can be
shown as below:
(Thread 1) | (Thread 2)
ax25_connect() |
ax25_std_establish_data_link() |
ax25_start_t1timer() |
mod_timer(&ax25->t1timer,..) |
| ax25_kill_by_device()
(wait a time) | ...
| s->ax25_dev = NULL; //(1)
ax25_t1timer_expiry() |
ax25->ax25_dev->values[..] //(2)| ...
... |
We set null to ax25_cb->ax25_dev in position (1) and dereference
the null pointer in position (2).
The corresponding fail log is shown below:
===============================================================
BUG: kernel NULL pointer dereference, address: 0000000000000050
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc6-00794-g45690b7d0
RIP: 0010:ax25_t1timer_expiry+0x12/0x40
...
Call Trace:
call_timer_fn+0x21/0x120
__run_timers.part.0+0x1ca/0x250
run_timer_softirq+0x2c/0x60
__do_softirq+0xef/0x2f3
irq_exit_rcu+0xb6/0x100
sysvec_apic_timer_interrupt+0xa2/0xd0
...
This patch moves ax25_disconnect() before s->ax25_dev = NULL
and uses del_timer_sync() to delete timers in ax25_disconnect().
If ax25_disconnect() is called by ax25_kill_by_device() or
ax25->ax25_dev is NULL, the reason in ax25_disconnect() will be
equal to ENETUNREACH, it will wait all timers to stop before we
set null to s->ax25_dev in ax25_kill_by_device().
Fixes: 7ec02f5ac8a5 ("ax25: fix NPD bug in ax25_disconnect")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev to
avoid UAF bugs") and commit feef318c855a ("ax25: fix UAF bugs of
net_device caused by rebinding operation") increase the refcounts of
ax25_dev and net_device in ax25_bind() and decrease the matching refcounts
in ax25_kill_by_device() in order to prevent UAF bugs, but there are
reference count leaks.
The root cause of refcount leaks is shown below:
(Thread 1) | (Thread 2)
ax25_bind() |
... |
ax25_addr_ax25dev() |
ax25_dev_hold() //(1) |
... |
dev_hold_track() //(2) |
... | ax25_destroy_socket()
| ax25_cb_del()
| ...
| hlist_del_init() //(3)
|
|
(Thread 3) |
ax25_kill_by_device() |
... |
ax25_for_each(s, &ax25_list) { |
if (s->ax25_dev == ax25_dev) //(4) |
... |
Firstly, we use ax25_bind() to increase the refcount of ax25_dev in
position (1) and increase the refcount of net_device in position (2).
Then, we use ax25_cb_del() invoked by ax25_destroy_socket() to delete
ax25_cb in hlist in position (3) before calling ax25_kill_by_device().
Finally, the decrements of refcounts in ax25_kill_by_device() will not
be executed, because no s->ax25_dev equals to ax25_dev in position (4).
This patch adds decrements of refcounts in ax25_release() and use
lock_sock() to do synchronization. If refcounts decrease in ax25_release(),
the decrements of refcounts in ax25_kill_by_device() will not be
executed and vice versa.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Fixes: 87563a043cef ("ax25: fix reference count leaks of ax25_dev")
Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the <linux/cgroup-defs.h> dependency to <linux/psi.h>, because
cgroup_move_task() will dereference 'struct css_set'.
( Only older toolchains are affected, due to variations in
the implementation of rcu_assign_pointer() et al. )
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This reverts commit cf3e26427c08ad9015956293ab389004ac6a338e.
Multi-vCPU Hyper-V guests started crashing randomly on boot with the
latest kvm/queue and the problem can be bisected the problem to this
particular patch. Basically, I'm not able to boot e.g. 16-vCPU guest
successfully anymore. Both Intel and AMD seem to be affected. Reverting
the commit saves the day.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Since "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
is going to be reverted, it's not going to be true anymore that
the zap-page flow does not free any 'struct kvm_mmu_page'. Introduce
an early flush before tdp_mmu_zap_leafs() returns, to preserve
bisectability.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.18-2022-03-18:
amdgpu:
- Aldebaran fixes
- SMU 13.0.5 fixes
- DCN 3.1.5 fixes
- DCN 3.1.6 fixes
- Pipe split fixes
- More display FP cleanup
- DP 2.0 UHBR fix
- DC GPU reset fix
- DC deep color ratio fix
- SMU robustness fixes
- Runtime PM fix for APUs
- IGT reload fixes
- SR-IOV fix
- Misc fixes and cleanups
amdkfd:
- CRIU fixes
- SVM fixes
UAPI:
- Properly handle SDMA transfers with CRIU
Proposed user mode change: https://github.com/checkpoint-restore/criu/pull/1709
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220318203717.5833-1-alexander.deucher@amd.com
|
|
When CONFIG_DEBUG_INFO_BTF is disabled, bpf_get_btf_vmlinux can return a
NULL pointer. Check for it in btf_get_module_btf to prevent a NULL pointer
dereference.
While kernel test robot only complained about this specific case, let's
also check for NULL in other call sites of bpf_get_btf_vmlinux.
Fixes: 9492450fd287 ("bpf: Always raise reference in btf_get_module_btf")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220320143003.589540-1-memxor@gmail.com
|
|
In remove_phb_dynamic() we use &phb->io_resource, after we've called
device_unregister(&host_bridge->dev). But the unregister may have freed
phb, because pcibios_free_controller_deferred() is the release function
for the host_bridge.
If there are no outstanding references when we call device_unregister()
then phb will be freed out from under us.
This has gone mainly unnoticed, but with slub_debug and page_poison
enabled it can lead to a crash:
PID: 7574 TASK: c0000000d492cb80 CPU: 13 COMMAND: "drmgr"
#0 [c0000000e4f075a0] crash_kexec at c00000000027d7dc
#1 [c0000000e4f075d0] oops_end at c000000000029608
#2 [c0000000e4f07650] __bad_page_fault at c0000000000904b4
#3 [c0000000e4f076c0] do_bad_slb_fault at c00000000009a5a8
#4 [c0000000e4f076f0] data_access_slb_common_virt at c000000000008b30
Data SLB Access [380] exception frame:
R0: c000000000167250 R1: c0000000e4f07a00 R2: c000000002a46100
R3: c000000002b39ce8 R4: 00000000000000c0 R5: 00000000000000a9
R6: 3894674d000000c0 R7: 0000000000000000 R8: 00000000000000ff
R9: 0000000000000100 R10: 6b6b6b6b6b6b6b6b R11: 0000000000008000
R12: c00000000023da80 R13: c0000009ffd38b00 R14: 0000000000000000
R15: 000000011c87f0f0 R16: 0000000000000006 R17: 0000000000000003
R18: 0000000000000002 R19: 0000000000000004 R20: 0000000000000005
R21: 000000011c87ede8 R22: 000000011c87c5a8 R23: 000000011c87d3a0
R24: 0000000000000000 R25: 0000000000000001 R26: c0000000e4f07cc8
R27: c00000004d1cc400 R28: c0080000031d00e8 R29: c00000004d23d800
R30: c00000004d1d2400 R31: c00000004d1d2540
NIP: c000000000167258 MSR: 8000000000009033 OR3: c000000000e9f474
CTR: 0000000000000000 LR: c000000000167250 XER: 0000000020040003
CCR: 0000000024088420 MQ: 0000000000000000 DAR: 6b6b6b6b6b6b6ba3
DSISR: c0000000e4f07920 Syscall Result: fffffffffffffff2
[NIP : release_resource+56]
[LR : release_resource+48]
#5 [c0000000e4f07a00] release_resource at c000000000167258 (unreliable)
#6 [c0000000e4f07a30] remove_phb_dynamic at c000000000105648
#7 [c0000000e4f07ab0] dlpar_remove_slot at c0080000031a09e8 [rpadlpar_io]
#8 [c0000000e4f07b50] remove_slot_store at c0080000031a0b9c [rpadlpar_io]
#9 [c0000000e4f07be0] kobj_attr_store at c000000000817d8c
#10 [c0000000e4f07c00] sysfs_kf_write at c00000000063e504
#11 [c0000000e4f07c20] kernfs_fop_write_iter at c00000000063d868
#12 [c0000000e4f07c70] new_sync_write at c00000000054339c
#13 [c0000000e4f07d10] vfs_write at c000000000546624
#14 [c0000000e4f07d60] ksys_write at c0000000005469f4
#15 [c0000000e4f07db0] system_call_exception at c000000000030840
#16 [c0000000e4f07e10] system_call_vectored_common at c00000000000c168
To avoid it, we can take a reference to the host_bridge->dev until we're
done using phb. Then when we drop the reference the phb will be freed.
Fixes: 2dd9c11b9d4d ("powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)")
Reported-by: David Dai <zdai@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Link: https://lore.kernel.org/r/20220318034219.1188008-1-mpe@ellerman.id.au
|