Age | Commit message (Collapse) | Author |
|
Today the last process to update a semaphore is remembered and
reported in the pid namespace of that process. If there are processes
in any other pid namespace querying that process id with GETPID the
result will be unusable nonsense as it does not make any
sense in your own pid namespace.
Due to ipc_update_pid I don't think you will be able to get System V
ipc semaphores into a troublesome cache line ping-pong. Using struct
pids from separate process are not a problem because they do not share
a cache line. Using struct pid from different threads of the same
process are unlikely to be a problem as the reference count update
can be avoided.
Further linux futexes are a much better tool for the job of mutual
exclusion between processes than System V semaphores. So I expect
programs that are performance limited by their interprocess mutual
exclusion primitive will be using futexes.
So while it is possible that enhancing the storage of the last
rocess of a System V semaphore from an integer to a struct pid
will cause a performance regression because of the effect
of frequently updating the pid reference count. I don't expect
that to happen in practice.
This change updates semctl(..., GETPID, ...) to return the
process id of the last process to update a semphore inthe
pid namespace of the calling process.
Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Today msg_lspid and msg_lrpid are remembered in the pid namespace of
the creator and the processes that last send or received a sysvipc
message. If you have processes in multiple pid namespaces that is
just wrong. The process ids reported will not make the least bit of
sense.
This fix is slightly more susceptible to a performance problem than
the related fix for System V shared memory. By definition the pids
are updated by msgsnd and msgrcv, the fast path of System V message
queues. The only concern over the previous implementation is the
incrementing and decrementing of the pid reference count. As that is
the only difference and multiple updates by of the task_tgid by
threads in the same process have been shown in af_unix sockets to
create a cache line ping-pong between cpus of the same processor.
In this case I don't expect cache lines holding pid reference counts
to ping pong between cpus. As senders and receivers update different
pids there is a natural separation there. Further if multiple threads
of the same process either send or receive messages the pid will be
updated to the same value and ipc_update_pid will avoid the reference
count update.
Which means in the common case I expect msg_lspid and msg_lrpid to
remain constant, and reference counts not to be updated when messages
are sent.
In rare cases it may be possible to trigger the issue which was
observed for af_unix sockets, but it will require multiple processes
with multiple threads to be either sending or receiving messages. It
just does not feel likely that anyone would do that in practice.
This change updates msgctl(..., IPC_STAT, ...) to return msg_lspid and
msg_lrpid in the pid namespace of the process calling stat.
This change also updates cat /proc/sysvipc/msg to return print msg_lspid
and msg_lrpid in the pid namespace of the process that opened the proc
file.
Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user")
Reviewed-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Today shm_cpid and shm_lpid are remembered in the pid namespace of the
creator and the processes that last touched a sysvipc shared memory
segment. If you have processes in multiple pid namespaces that
is just wrong, and I don't know how this has been over-looked for
so long.
As only creation and shared memory attach and shared memory detach
update the pids I do not expect there to be a repeat of the issues
when struct pid was attached to each af_unix skb, which in some
notable cases cut the performance in half. The problem was threads of
the same process updating same struct pid from different cpus causing
the cache line to be highly contended and bounce between cpus.
As creation, attach, and detach are expected to be rare operations for
sysvipc shared memory segments I do not expect that kind of cache line
ping pong to cause probems. In addition because the pid is at a fixed
location in the structure instead of being dynamic on a skb, the
reference count of the pid does not need to be updated on each
operation if the pid is the same. This ability to simply skip the pid
reference count changes if the pid is unchanging further reduces the
likelihood of the a cache line holding a pid reference count
ping-ponging between cpus.
Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user")
Reviewed-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
Access to the socket API and the root network namespace is only available
when networking is enabled:
ERROR: "kernel_sendmsg" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "sock_release" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "sock_create_kern" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "kernel_getsockname" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "init_net" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "kernel_recvmsg" [drivers/soc/qcom/qmi_helpers.ko] undefined!
Adding a dependency on CONFIG_NET lets us build it in all randconfig
builds.
Fixes: 9b8a11e82615 ("soc: qcom: Introduce QMI encoder/decoder")
Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
In some firmware configuration, UMR usage from Virtual Functions is restricted.
This information is published to the driver using new capability bits.
Avoid using UMRs in these cases and use the Firmware slow-path flow to create
mkeys and populate them with Virtual to Physical address translation.
Older drivers that do not have this patch, will end up using memory keys that
aren't populated with Virtual to Physical address translation that is done
part of the UMR work.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
When working with RC QPs, the FW sets the ECN capable bits for all
the RoCE v2 packets. On the other hand, for UD QPs, the driver needs
to set the the ECN capable bits in the Address Handler since the HW
generates each packet according to the Address Handler and not
the QP context.
If ECN is not enabled in NIC or switch, these bits are ignored.
Fixes: 2811ba51b049 ("IB/mlx5: Add RoCE fields to Address Vector")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The ioctl() UAPIs are meant to be used by both user-space
and kernel ioctl() handlers.
Mostly, these UAPI structs tend to consist of simple types, but
sometimes user-space pointers may be passed between user-space and
kernel. We would like to avoid dereferencing a user-space pointer in
the kernel, thus - we always define RDMA_UAPI_PTR as a __aligned_u64
type.
Fixes: 1f7ff9d5d36a ('IB/uverbs: Move to new headers and make naming consistent')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The design of the uAPI had intended all structs to share the same layout on 32
and 64 bit compiles. Unfortunately over the years some errors have crept in.
This series fixes all the incompatabilities. It goes along with a userspace
rdma-core series that causes the providers to use these structs directly and
then does various self-checks on the command formation.
Those checks were combined with output from pahole on 32 and 64 bit compiles
to confirm that the structure layouts are the same.
This series does not make implicit padding explicit, as long as the implicit
padding is the same on 32 and 64 bit compiles.
Finally, the issue is put to rest by using __aligned_u64 in the uapi headers,
if new code copies that type, and is checked in userspace, it is unlikely we
will see problems in future.
There are two patches that break the ABI for a 32 bit kernel, one for rxe and
one for mlx4. Both patches have notes, but the overall feeling from Doug and I
is that providing compat is just too difficult and not necessary since there
is no real user of a 32 bit userspace and 32 bit kernel for various good
reasons.
The 32 bit userspace / 64 bit kernel case however does seem to have some real
users and does need to work as designed.
* 32compat:
RDMA: Change all uapi headers to use __aligned_u64 instead of __u64
RDMA/rxe: Fix uABI structure layouts for 32/64 compat
RDMA/mlx4: Fix uABI structure layouts for 32/64 compat
RDMA/qedr: Fix uABI structure layouts for 32/64 compat
RDMA/ucma: Fix uABI structure layouts for 32/64 compat
RDMA: Remove minor pahole differences between 32/64
|
|
The new auditing standard for the subsystem will be to only use
__aligned_64 in uapi headers to try and prevent 32/64 compat bugs
from existing in the future.
Changing all existing usage will help ensure new developers copy the
right idea.
The before/after of this patch was tested using pahole on 32 and 64
bit compiles to confirm it has no change in the structure layout, so
this patch is a NOP.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
With 32 bit compilation several of the fields become misaligned here.
Fixing this is an ABI break for 32 bit rxe and it is in well used
portions of the rxe ABI.
To handle this we bump the ABI version, as expected. However the user
space driver doesn't handle it properly today, so all existing user
space continues to work.
Updated userspace will start to require the necessary kernel version.
We don't expect there to be any 32 bit users of rxe. Most likely cases,
such as ARM 32 already generally don't work because rxe does not handle
the CPU cache properly on its shared with userspace pages.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
rss_caps in struct mlx4_uverbs_ex_query_device_resp is misaligned on
32 bit compared to 64 bit, add explicit padding.
The rss caps were introduced recently and are very rarely used in user
space, mainly for DPDK.
We don't expect there to be a real 32 bit user, so this change is done
without compat considerations.
Fixes: 09d208b258a2 ("IB/mlx4: Add report for RSS capabilities by vendor channel")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
struct qedr_alloc_ucontext_resp is a different length in 32 and 64
bit compiles due to implicit compiler padding.
The structs alloc_pd_uresp, create_cq_uresp and create_qp_uresp are
not padded by the compiler, but in user space the compiler pads them
due to the way the core and driver structs are concatenated. Make
this padding explicit and consistent for future sanity.
The kernel driver can already handle the user buffer being smaller
than required and copies correctly, so no compat or ABI break happens
from introducing the explicit padding.
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles.
The kernel requires it to be the expected length or longer so 32 bit
builds running on a 64 bit kernel will not work.
Retain full compat by having all kernels accept a struct with or without
the trailing reserved field.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
To help automatic detection we want pahole to report the same struct
layouts for 32 and 64 bit compiles. These cases are all implicit
padding added at the end of embedded structs as part of a union.
The added reserved fields have no impact on the ABI.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Generally we do a preload before doing idr allocation. This also help
improve the allocation success rate in memory pressure.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Even after the previous patch to drop lo_ctl_mutex while calling
vfs_getattr(), there are other cases where we can end up sleeping for a
long time while holding lo_ctl_mutex. Let's avoid the uninterruptible
sleep from the ioctls.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We hit an issue where a loop device on NFS was stuck in
loop_get_status() doing vfs_getattr() after the NFS server died, which
caused a pile-up of uninterruptible processes waiting on lo_ctl_mutex.
There's no reason to hold this lock while we wait on the filesystem;
let's drop it so that other processes can do their thing. We need to
grab a reference on lo_backing_file while we use it, and we can get rid
of the check on lo_device, which has been unnecessary since commit
a34c0ae9ebd6 ("[PATCH] loop: remove the bio remapping capability") in
the linux-history tree.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Ensure that device exists prior to accessing its properties.
Reported-by: <syzbot+71655d44855ac3e76366@syzkaller.appspotmail.com>
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Add missing check that device is connected prior to access it.
[ 55.358652] BUG: KASAN: null-ptr-deref in rdma_init_qp_attr+0x4a/0x2c0
[ 55.359389] Read of size 8 at addr 00000000000000b0 by task qp/618
[ 55.360255]
[ 55.360432] CPU: 1 PID: 618 Comm: qp Not tainted 4.16.0-rc1-00071-gcaf61b1b8b88 #91
[ 55.361693] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[ 55.363264] Call Trace:
[ 55.363833] dump_stack+0x5c/0x77
[ 55.364215] kasan_report+0x163/0x380
[ 55.364610] ? rdma_init_qp_attr+0x4a/0x2c0
[ 55.365238] rdma_init_qp_attr+0x4a/0x2c0
[ 55.366410] ucma_init_qp_attr+0x111/0x200
[ 55.366846] ? ucma_notify+0xf0/0xf0
[ 55.367405] ? _get_random_bytes+0xea/0x1b0
[ 55.367846] ? urandom_read+0x2f0/0x2f0
[ 55.368436] ? kmem_cache_alloc_trace+0xd2/0x1e0
[ 55.369104] ? refcount_inc_not_zero+0x9/0x60
[ 55.369583] ? refcount_inc+0x5/0x30
[ 55.370155] ? rdma_create_id+0x215/0x240
[ 55.370937] ? _copy_to_user+0x4f/0x60
[ 55.371620] ? mem_cgroup_commit_charge+0x1f5/0x290
[ 55.372127] ? _copy_from_user+0x5e/0x90
[ 55.372720] ucma_write+0x174/0x1f0
[ 55.373090] ? ucma_close_id+0x40/0x40
[ 55.373805] ? __lru_cache_add+0xa8/0xd0
[ 55.374403] __vfs_write+0xc4/0x350
[ 55.374774] ? kernel_read+0xa0/0xa0
[ 55.375173] ? fsnotify+0x899/0x8f0
[ 55.375544] ? fsnotify_unmount_inodes+0x170/0x170
[ 55.376689] ? __fsnotify_update_child_dentry_flags+0x30/0x30
[ 55.377522] ? handle_mm_fault+0x174/0x320
[ 55.378169] vfs_write+0xf7/0x280
[ 55.378864] SyS_write+0xa1/0x120
[ 55.379270] ? SyS_read+0x120/0x120
[ 55.379643] ? mm_fault_error+0x180/0x180
[ 55.380071] ? task_work_run+0x7d/0xd0
[ 55.380910] ? __task_pid_nr_ns+0x120/0x140
[ 55.381366] ? SyS_read+0x120/0x120
[ 55.381739] do_syscall_64+0xeb/0x250
[ 55.382143] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 55.382841] RIP: 0033:0x7fc2ef803e99
[ 55.383227] RSP: 002b:00007fffcc5f3be8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
[ 55.384173] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc2ef803e99
[ 55.386145] RDX: 0000000000000057 RSI: 0000000020000080 RDI: 0000000000000003
[ 55.388418] RBP: 00007fffcc5f3c00 R08: 0000000000000000 R09: 0000000000000000
[ 55.390542] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400480
[ 55.392916] R13: 00007fffcc5f3cf0 R14: 0000000000000000 R15: 0000000000000000
[ 55.521088] Code: e5 4d 1e ff 48 89 df 44 0f b6 b3 b8 01 00 00 e8 65 50 1e ff 4c 8b 2b 49
8d bd b0 00 00 00 e8 56 50 1e ff 41 0f b6 c6 48 c1 e0 04 <49> 03 85 b0 00 00 00 48 8d 78 08
48 89 04 24 e8 3a 4f 1e ff 48
[ 55.525980] RIP: rdma_init_qp_attr+0x52/0x2c0 RSP: ffff8801e2c2f9d8
[ 55.532648] CR2: 00000000000000b0
[ 55.534396] ---[ end trace 70cee64090251c0b ]---
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Fixes: d541e45500bd ("IB/core: Convert ah_attr from OPA to IB when copying to user")
Reported-by: <syzbot+7b62c837c2516f8f38c8@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
In an effort to pick up channels that are in a funky state we
optimistically tried to open all channels that we found, with the
addition that we failed if the other side did not handshake the opening.
But as we're starting the modem a second time all channels are found -
in a "funky" state - and we try to open them. But the modem firmware
requires the IPCRTR to be up in order to initialize. So any channels we
try to open before that will fail and will not be opened again.
This takes care of the regression, at the cost of reintroducing the
previous behavior of handling of channels with "funky" states.
Reverts commit c12fc4519f60 ("rpmsg: smd: Create device for all channels")
Reported-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
process_one_req() can race with rdma_addr_cancel():
CPU0 CPU1
==== ====
process_one_work()
debug_work_deactivate(work);
process_one_req()
rdma_addr_cancel()
mutex_lock(&lock);
set_timeout(&req->work,..);
__queue_work()
debug_work_activate(work);
mutex_unlock(&lock);
mutex_lock(&lock);
[..]
list_del(&req->list);
mutex_unlock(&lock);
[..]
// ODEBUG explodes since the work is still queued.
kfree(req);
Causing ODEBUG to detect the use after free:
ODEBUG: free active (active state 0) object type: work_struct hint: process_one_req+0x0/0x6c0 include/net/dst.h:165
WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291 debug_print_object+0x166/0x220 lib/debugobjects.c:288
kvm: emulating exchange as write
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: ib_addr process_one_req
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
panic+0x1e4/0x41c kernel/panic.c:183
__warn+0x1dc/0x200 kernel/panic.c:547
report_bug+0x1f4/0x2b0 lib/bug.c:186
fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
RSP: 0000:ffff8801d966f210 EFLAGS: 00010086
RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e
RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000
RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8
R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001
R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0
__debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
kfree+0xc7/0x260 mm/slab.c:3799
process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592
process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
worker_thread+0x223/0x1990 kernel/workqueue.c:2247
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC")
Reported-by: <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Edward Cree says:
====================
sfc: rework locking around filter management
The use of a spinlock to protect filter state combined with the need for a
sleeping operation (MCDI) to apply that state to the NIC (on EF10) led to
unfixable race conditions, around the handling of filter restoration after
an MC reboot.
So, this patch series removes the requirement to be able to modify the SW
filter table from atomic context, by using a workqueue to request
asynchronous filter operations (which are needed for ARFS). Then, the
filter table locks are changed to mutexes, replacing the dance of spinlocks
and 'busy' flags. Also, a mutex is added to protect the RSS context state,
since otherwise a similar race is possible around restoring that after an
MC reboot. While we're at it, fix a couple of other related bugs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The FLOW_RSS flag was causing us to insert UDP filters when TCP was wanted.
Fixes: 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Otherwise races are possible between ethtool ops and
efx_ef10_rx_restore_rss_contexts().
Also, don't try to perform the restore on every reset, only after an MC
reboot, otherwise we'll leak RSS contexts on the NIC.
Fixes: 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If some other operation gets the MCDI lock ahead of us and performs an MC
reboot, then our attempt to insert the filter will fail with EINVAL,
because the destination VI (spec->dmaq_id, MC_CMD_FILTER_OP_IN_RX_QUEUE) does
not exist. But the caller's request (which might e.g. be an ethtool ntuple
request from userland) isn't invalid, it just got unlucky; so return EAGAIN.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With this change, the spinlock efx->filter_lock is no longer used and is
thus removed.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
efx->filter_lock remains in place for use on farch, but EF10 now ignores it.
EFX_EF10_FILTER_FLAG_BUSY is no longer needed, hence it is removed.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of having an efx->type->filter_rfs_insert() method, just use
workitems with a worker function that calls efx->type->filter_insert().
The only user of this is efx_filter_rfs(), which now queues a call to
efx_filter_rfs_work().
Similarly, efx_filter_rfs_expire() is now a worker function called on a
new channel->filter_work work_struct, so the method
efx->type->filter_rfs_expire_one() is no longer called in atomic context.
We also add a new mutex efx->rps_mutex to protect the RPS state (efx->
rps_expire_channel, efx->rps_expire_index, and channel->rps_flow_id) so
that the taking of efx->filter_lock can be moved to
efx->type->filter_rfs_expire_one().
Thus, all filter table functions are now called in a sleepable context,
allowing them to use sleeping locks in a future patch.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kirill Tkhai says:
====================
Make pernet_operations always read locked
All the pernet_operations are converted, and the last one
is in this patchset (nfsd_net_ops acked by J. Bruce Fields).
So, it's the time to kill pernet_operations::async field,
and make setup_net() and cleanup_net() always require
the rwsem only read locked.
All further pernet_operations have to be developed to fit
this rule. Some of previous patches added a comment to
struct pernet_operations about that.
Also, this patchset renames net_sem to pernet_ops_rwsem
to make the target area of the rwsem is more clear visible,
and adds more comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds comments to different places to improve
readability.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net_sem is some undefined area name, so it will be better
to make the area more defined.
Rename it to pernet_ops_rwsem for better readability and
better intelligibility.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All pernet_operations are reviewed and converted, hooray!
Reflect this in core code: setup_net() and cleanup_net()
will take down_read() always.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These pernet_operations look similar to rpcsec_gss_net_ops,
they just create and destroy another caches. So, they also
can be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use relaxed I/O on the hot path. This achieves significant performance
improvements. On a 10G link, this makes a basic iperf TCP test go from
an average of 4.5 Gbits/sec to about 9.40 Gbits/sec.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Maxime: Commit message, cosmetic changes]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit c5ad119fb6c09b0297446be05bd66602fa564758
("net: sched: pfifo_fast use skb_array") driver is exposed
to an issue where it is hitting NULL skbs while handling TX
completions. Driver uses mmiowb() to flush the writes to the
doorbell bar which is a write-combined bar, however on x86
mmiowb() does not flush the write combined buffer.
This patch fixes this problem by replacing mmiowb() with wmb()
after the write combined doorbell write so that writes are
flushed and synchronized from more than one processor.
V1->V2:
-------
This patch was marked as "superseded" in patchwork.
(Not really sure for what reason).Resending it as v2.
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We tried to remove vq poll from wait queue, but do not check whether
or not it was in a list before. This will lead double free. Fixing
this by switching to use vhost_poll_stop() which zeros poll->wqh after
removing poll from waitqueue to make sure it won't be freed twice.
Cc: Darren Kenny <darren.kenny@oracle.com>
Reported-by: syzbot+c0272972b01b872e604a@syzkaller.appspotmail.com
Fixes: 2b8b328b61c79 ("vhost_net: handle polling errors when setting backend")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As found by the ubsan checker, the value of the 'index' variable can be
out of range for the bc[] array:
UBSAN: Undefined behaviour in arch/parisc/kernel/drivers.c:655:21
index 6 is out of range for type 'char [6]'
Backtrace:
[<104fa850>] __ubsan_handle_out_of_bounds+0x68/0x80
[<1019d83c>] check_parent+0xc0/0x170
[<1019d91c>] descend_children+0x30/0x6c
[<1059e164>] device_for_each_child+0x60/0x98
[<1019cd54>] parse_tree_node+0x40/0x54
[<1019d86c>] check_parent+0xf0/0x170
[<1019d91c>] descend_children+0x30/0x6c
[<1059e164>] device_for_each_child+0x60/0x98
[<1019d938>] descend_children+0x4c/0x6c
[<1059e164>] device_for_each_child+0x60/0x98
[<1019cd54>] parse_tree_node+0x40/0x54
[<1019cffc>] hwpath_to_device+0xa4/0xc4
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
|
|
Qemu now supports emulating PA-RISC machines. For that a forked version
of SeaBIOS available at https://github.com/hdeller/seabios-hppa is used
which requires some information about the emulated machine.
This patch adds code to generate a header file with the necessary
information for SeaBIOS. The information is extracted from the firmware
the current kernel is running on.
Tested on a B160L workstation.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Carlo Pisani noticed that his C3600 workstation behaved unstable during heavy
I/O on the PCI bus with a VIA VT6421 IDE/SATA PCI card.
To avoid such instability, this patch switches the LBA PCI bus from Hard Fail
mode into Soft Fail mode. In this mode the bus will return -1UL for timed out
MMIO transactions, which is exactly how the x86 (and most other architectures)
PCI busses behave.
This patch is based on a proposal by Grant Grundler and Kyle McMartin 10
years ago:
https://www.spinics.net/lists/linux-parisc/msg01027.html
Cc: Carlo Pisani <carlojpisani@gmail.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Reviewed-by: Grant Grundler <grantgrundler@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Make sure that the HPMC (High Priority Machine Check) handler is 16-byte
aligned and that it's length in the IVT is a multiple of 16 bytes.
Otherwise PDC may decide not to call the HPMC crash handler.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Matt Turner <mattst88@gmail.com>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Matt Turner <mattst88@gmail.com>
|
|
I'm not aware of any machines which won't be able to run our SMP kernel.
Refine the Kconfig help text.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Smatch warns that is_tiny can be used uninitialized:
arch/parisc/math-emu/fcnvff.c:297 dbl_to_sgl_fcnvff()
error: uninitialized symbol 'is_tiny'.
This code is very old so that suggests the bug doesn't have a huge
affect in real life. But I've read the code and it seems like a
reasonable warning. Either way it should be harmless to initialize it
to false and silence the static checker warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
On parisc we want to be as much as possible compatible to the major
architectures like x86. Those architectures have MAP_TYPE defined as 0x0f which
covers MAP_SHARED and MAP_PRIVATE and leaves two more bits unused.
In contrast, on parisc we have MAP_TYPE defined to 0x03 which covers MAP_SHARED
and MAP_PRIVATE only. But we don't have the 2 bits free as x86.
Usually that's not a problem, but during the discussions for pmem+dax support
the idea came up to use the two remaining bits of MAP_TYPE (on x86 and others)
for the new MAP_DIRECT and MAP_SYNC flags. One requirement is, that an old
kernel should correctly handle MAP_DIRECT and MAP_SYNC and fail on those if
set. This only works if MAP_TYPE has 4 bits.
Even though the pmem+dax people now choosed another solution via
MAP_SHARED_VALIDATE, let's still proceed to be more compatible to x86 by adding
two more bits for future usage.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Annotate user buffer and use NULL to avoid sparse warnings.
Signed-off-by: Helge Deller <deller@gmx.de>
|