Age | Commit message (Collapse) | Author |
|
IB_QPT_RAW_PACKET allows applications to build a complete packet,
including L2 headers, when sending; on the receive side, the HW will
not strip any headers.
This QP type is designed for userspace direct access to Ethernet; for
example by applications that do TCP/IP themselves. Only processes
with the NET_RAW capability are allowed to create raw packet QPs (the
name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
SOL_RAW sockets).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
When IPV6 is not enabled:
ERROR: "register_inet6addr_notifier" [drivers/infiniband/hw/ocrdma/ocrdma.ko] undefined!
ERROR: "unregister_inet6addr_notifier" [drivers/infiniband/hw/ocrdma/ocrdma.ko] undefined!
Fix this by wrapping the inet6 calls in #ifdef IPV6. Also make the
ocrdma module depend on (IPV6 || IPV6=n) to forbid the case of modular
ipv6 but built-in ocrdma (which can't work, because ocrdma calls ipv6
functions).
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
We only need to disable the IRQs one time.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[ Rename "wq_flags" to more conventional "flags." - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The ocrdma_alloc_lkey() function never returns NULL pointers -- it
returns ERR_PTRs.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Events sent to ocrdma_inet6addr_event() are sent from an atomic context,
therefore we can't try to lock a mutex within the notifier callback.
We could just switch the mutex to a spinlock since all it does it
protect a list, but I've gone ahead and switched the list to use RCU
instead. I couldn't fully test it since I don't have IB hardware, so
if it doesn't fully work for some reason let me know and I'll switch
it back to using a spinlock.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
[ Fixed locking in ocrdma_add(). - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
We need to set ib_evt.device, or else ib_dispatch_event() will crash
when we call it for unaffiliated events (and consumers may get
confused in their QP/CQ/SRQ event handler for affiliated events).
Also fix sparse warning:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:678:36: warning: Using plain integer as NULL pointer
There's no need to clear ib_evt, since every member is initialized.
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
First, fix
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c: In function 'ocrdma_alloc_pd':
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:371:17: warning: 'dpp_page_addr' may be used uninitialized in this function [-Wuninitialized]
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:337:6: note: 'dpp_page_addr' was declared here
which seems that it may border on a bug (the call to ocrdma_del_mmap()
might conceivably do bad things if pd->dpp_enabled is not set and
dpp_page_addr ends up with just the wrong value).
Also take care of:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c: In function 'ocrdma_init_hw':
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:2587:5: warning: 'status' may be used uninitialized in this function [-Wuninitialized]
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:2549:17: note: 'status' was declared here
which is only real if num_eq == 0, which should be impossible.
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
- Increase MSI-X vectors by 5 for RoCE traffic.
- Add macro to check roce support on a device.
- Add device-specific doorbell and MSI-X vector fields shared with NIC
functionality.
- Provide RoCE driver registration and deregistration functions.
- Add support functions which will be invoked on adapter add/remove
and port up/down events.
- Traverse through the list of adapters to invoke callback functions.
Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
- Add generic function to issue mailbox cmd on MQ as export function.
- RoCE driver will use this before it setups its own MQ.
Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
The following lockdep problem was reported by Or Gerlitz <ogerlitz@mellanox.com>:
[ INFO: possible recursive locking detected ]
3.3.0-32035-g1b2649e-dirty #4 Not tainted
---------------------------------------------
kworker/5:1/418 is trying to acquire lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0138a41>] rdma_destroy_i d+0x33/0x1f0 [rdma_cm]
but task is already holding lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_ca llback+0x24/0x45 [rdma_cm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/5:1/418:
#0: (ib_cm){.+.+.+}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a 6
#1: ((&(&work->work)->work)){+.+.+.}, at: [<ffffffff81042ac1>] process_on e_work+0x210/0x4a6
#2: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disab le_callback+0x24/0x45 [rdma_cm]
stack backtrace:
Pid: 418, comm: kworker/5:1 Not tainted 3.3.0-32035-g1b2649e-dirty #4
Call Trace:
[<ffffffff8102b0fb>] ? console_unlock+0x1f4/0x204
[<ffffffff81068771>] __lock_acquire+0x16b5/0x174e
[<ffffffff8106461f>] ? save_trace+0x3f/0xb3
[<ffffffff810688fa>] lock_acquire+0xf0/0x116
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81364351>] mutex_lock_nested+0x64/0x2ce
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffff81065abc>] ? trace_hardirqs_on+0xd/0xf
[<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffffa0139c02>] cma_req_handler+0x418/0x644 [rdma_cm]
[<ffffffffa012ee88>] cm_process_work+0x32/0x119 [ib_cm]
[<ffffffffa0130299>] cm_req_handler+0x928/0x982 [ib_cm]
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffffa0130326>] cm_work_handler+0x33/0xfe5 [ib_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffff81042b6e>] process_one_work+0x2bd/0x4a6
[<ffffffff81042ac1>] ? process_one_work+0x210/0x4a6
[<ffffffff813669f3>] ? _raw_spin_unlock_irq+0x2b/0x40
[<ffffffff8104316e>] worker_thread+0x1d6/0x350
[<ffffffff81042f98>] ? rescuer_thread+0x241/0x241
[<ffffffff81046a32>] kthread+0x84/0x8c
[<ffffffff8136e854>] kernel_thread_helper+0x4/0x10
[<ffffffff81366d59>] ? retint_restore_args+0xe/0xe
[<ffffffff810469ae>] ? __init_kthread_worker+0x56/0x56
[<ffffffff8136e850>] ? gs_change+0xb/0xb
The actual locking is fine, since we're dealing with different locks,
but from the same lock class. cma_disable_callback() acquires the
listening id mutex, whereas rdma_destroy_id() acquires the mutex for
the new connection id. To fix this, delay the call to
rdma_destroy_id() until we've released the listening id mutex.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Since XRC support was added, the uverbs code has locked SRQ, CQ and PD
objects needed during QP and SRQ creation in different orders
depending on the the code path. This leads to the (at least
theoretical) possibility of deadlock, and triggers the lockdep splat
below.
Fix this by making sure we always lock the SRQ first, then CQs and
finally the PD.
======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc5+ #34 Not tainted
-------------------------------------------------------
ibv_srq_pingpon/2484 is trying to acquire lock:
(SRQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
but task is already holding lock:
(CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (CQ-uobj){+++++.}:
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00b16c3>] ib_uverbs_create_qp+0x180/0x684 [ib_uverbs]
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
-> #1 (PD-uobj){++++++}:
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00af8ad>] __uverbs_create_xsrq+0x96/0x386 [ib_uverbs]
[<ffffffffa00b31b9>] ib_uverbs_detach_mcast+0x1cd/0x1e6 [ib_uverbs]
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
-> #0 (SRQ-uobj){+++++.}:
[<ffffffff81070898>] __lock_acquire+0xa29/0xd06
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
Chain exists of:
SRQ-uobj --> PD-uobj --> CQ-uobj
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(CQ-uobj);
lock(PD-uobj);
lock(CQ-uobj);
lock(SRQ-uobj);
*** DEADLOCK ***
3 locks held by ibv_srq_pingpon/2484:
#0: (QP-uobj){+.+...}, at: [<ffffffffa00b162c>] ib_uverbs_create_qp+0xe9/0x684 [ib_uverbs]
#1: (PD-uobj){++++++}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
#2: (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
stack backtrace:
Pid: 2484, comm: ibv_srq_pingpon Not tainted 3.4.0-rc5+ #34
Call Trace:
[<ffffffff8137eff0>] print_circular_bug+0x1f8/0x209
[<ffffffff81070898>] __lock_acquire+0xa29/0xd06
[<ffffffffa00af37c>] ? __idr_get_uobj+0x20/0x5e [ib_uverbs]
[<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffff81070fd0>] lock_acquire+0xbf/0xfe
[<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffff81070eee>] ? lock_release+0x166/0x189
[<ffffffff81384f28>] down_read+0x34/0x43
[<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs]
[<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs]
[<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs]
[<ffffffff81070fec>] ? lock_acquire+0xdb/0xfe
[<ffffffff81070c09>] ? lock_release_non_nested+0x94/0x213
[<ffffffff810d470f>] ? might_fault+0x40/0x90
[<ffffffff810d470f>] ? might_fault+0x40/0x90
[<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs]
[<ffffffff810fe47f>] vfs_write+0xa7/0xee
[<ffffffff810ff736>] ? fget_light+0x3b/0x99
[<ffffffff810fe65f>] sys_write+0x45/0x69
[<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Add names for our lockdep classes, so instead of having to decipher
lockdep output with mysterious names:
Chain exists of:
key#14 --> key#11 --> key#13
lockdep will give us something nicer:
Chain exists of:
SRQ-uobj --> PD-uobj --> CQ-uobj
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Change sizeof(array)/sizeof(array[0]) to ARRAY_SIZE.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Function import_ep() is incorrectly using ep->dst instead of the dst
ptr passed in. This causes a crash when accepting new rdma connections
becase ep->dst is not initialized yet.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
Just as we don't allow PDs, CQs, etc. to be destroyed if there are QPs
that are attached to them, don't let a QP be destroyed if there are
multicast group(s) attached to it. Use the existing usecnt field of
struct ib_qp which was added by commit 0e0ec7e ("RDMA/core: Export
ib_open_qp() to share XRC TGT QPs") to track this.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull xen fixes from Konrad Rzeszutek Wilk:
- fix to Kconfig to make it fit within 80 line characters,
- two bootup fixes (AMD 8-core and with PCI BIOS),
- cleanup code in a Xen PV fb driver,
- and a crash fix when trying to see non-existent PTE's
* tag 'stable/for-linus-3.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/Kconfig: fix Kconfig layout
xen/pci: don't use PCI BIOS service for configuration space accesses
xen/pte: Fix crashes when trying to see non-existent PGD/PMD/PUD/PTEs
xen/apic: Return the APIC ID (and version) for CPU 0.
drivers/video/xen-fbfront.c: add missing cleanup code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull two percpu fixes from Tejun Heo:
"One adds missing KERN_CONT on split printk()s and the other makes
the percpu allocator avoid using PMD_SIZE as atom_size on x86_32.
Using PMD_SIZE led to vmalloc area exhaustion on certain
configurations (x86_32 android) and the only cost of using PAGE_SIZE
instead is static percpu area not being aligned to large page
mapping."
* 'for-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu, x86: don't use PMD_SIZE as embedded atom_size on 32bit
percpu: use KERN_CONT in pcpu_dump_alloc_info()
|
|
Pull ARM fixes from Russell King:
"This is mainly audit fixes, found by folks who happened to enable this
feature and then found it broke their user applications."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7414/1: SMP: prevent use of the console when using idmap_pgd
ARM: 7412/1: audit: use only AUDIT_ARCH_ARM regardless of endianness
ARM: 7411/1: audit: fix treatment of saved ip register during syscall tracing
ARM: 7410/1: Add extra clobber registers for assembly in kernel_execve
|
|
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:
9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment
Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.
Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).
Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.
iptables -I PREROUTING -t raw -p tcp --dport 8888 \
-j CT --helper ftp
And you disabled the automatic helper assignment.
I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This makes logging a bit clearer as it gives the actual bus location and
makes things like board hookup a bit smoother.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
This refreshes the "timeout" attribute in existing expectations if one is
given.
The use case for this would be for userspace helpers to extend the lifetime
of the expectation when requested, as this is not possible right now
without deleting/recreating the expectation.
I use this specifically for forwarding DCERPC traffic through:
DCERPC has a port mapper daemon that chooses a (seemingly) random port for
future traffic to go to. We expect this traffic (with a reasonable
timeout), but sometimes the port mapper will tell the client to continue
using the same port. This allows us to extend the expectation accordingly.
Signed-off-by: Kelvie Wong <kelvie@ieee.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
To build ip_vs as a module sysctl_rmem_max and sysctl_wmem_max
needs to be exported.
The dependency was added by "ipvs: wakeup master thread" patch.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.
This quiets the sparse warnings:
warning: symbol '__ipvs_proto_data_get' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Functions not referenced outside of a source file should be marked
static to prevent it from being exposed globally.
This quiets the sparse warnings:
warning: symbol 'ip_vs_ftp_init' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
cp->flags is marked volatile but ip_vs_bind_dest
can safely modify the flags, so save some CPU cycles by
using temp variable.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.
The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.
Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.
Make sure the backup socks do not block on reading.
Special thanks to Aleksey Chudov for helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.
sync_refresh_period: in seconds, difference in reported connection
timer that triggers new sync message. It can be used to
avoid sync messages for the specified period (or half of
the connection timeout if it is lower) if connection state
is not changed from last sync.
sync_retries: integer, 0..3, defines sync retries with period of
sync_refresh_period/8. Useful to protect against loss of
sync messages.
Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.
Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.
As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.
Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,
Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.
As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).
Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.
Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.
Thanks to Pablo Neira Ayuso for his valuable feedback!
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Initially, when the synced connection is created we
use the forwarding method provided by master but once we
bind to destination it can be changed. As result, we must
update the application and the transmitter.
As ip_vs_try_bind_dest is called always for connections
that require dest binding, there is no need to validate the
cp and dest pointers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kind of connections we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for first time. Now
logic will look just like in ip_vs_unbind_dest.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method we should get it from conn_flags just
like we do it for IP_VS_CONN_F_FWD_MASK bits when binding
to real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.
This is safe since it will always run from a process context.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
They are called only on initialization.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter removes the vlan header temporarily and then feeds the packet
to ip(6)tables.
When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), then bridge netfilter will also set the
in-interface to the vlan interface; if such an interface exists.
This is needed to make iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.
Also update Documentation with current brnf default settings.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
[ Note: flows that already got a helper will keep using it even
if automatic helper assignment has been disabled ]
Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.
There are good reasons to stop supporting automatic helper
assignment, for further information, please read:
http://www.netfilter.org/news.html#2012-04-03
This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If RSS is disabled on the PF (efx->n_rx_channels == 1) we try to set
up the indirection table so that VFs can use it, setting
efx->rss_spread = efx_vf_size(efx). But if SR-IOV was disabled at
compile time, this evaluates to 0 and we end up dividing by zero when
initialising the table.
I considered changing the fallback definition of efx_vf_size() to
return 1, but its value is really meaningless if we are not going to
enable VFs. Therefore add a condition of efx_sriov_wanted(efx) in
efx_probe_interrupts().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
|
|
Use devres to implement dev_get_regmap(). This should mean that in almost
all cases devices wishing to take advantage of framework features based on
regmap shouldn't need to explicitly pass the regmap into the framework.
This simplifies device setup a bit.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
* ret variable initialization removed as useless
* similar code strings concatenated and functions code
flow became more plain
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Make the return value explicitly true or false.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|