Allow master and backup servers to use many threads
for sync traffic. Add a sysctl var "sync_ports" to define the
number of threads. Every thread will use a single UDP port:
thread 0 will use the default port 8848 while the last thread
will use port 8848+sync_ports-1.
The sync traffic for connections is scheduled to the many
master threads based on the cp address, but one connection is
always assigned to the same thread to avoid reordering of the
sync messages.
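A minimal userspace sketch of that per-connection thread selection;
the multiplicative hash and the uint32_t address are illustrative
assumptions, not the kernel's actual code:

#include <stdint.h>

/* Map a connection's client (cp) address to one of sync_ports master
 * threads.  A given connection always hashes to the same thread, so
 * its sync messages cannot be reordered.  Thread i is assumed to bind
 * UDP port 8848 + i. */
static unsigned int select_sync_thread(uint32_t client_addr,
                                       unsigned int sync_ports)
{
    /* simple multiplicative hash; the real kernel hash differs */
    return (client_addr * 2654435761u) % sync_ports;
}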
Remove ip_vs_sync_switch_mode because this check
for a sync mode change is still risky. Instead, check for a mode
change under sync_buff_lock.
Make sure the backup sockets do not block on reading.
Special thanks to Aleksey Chudov for helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Add two new sysctl vars to control the sync rate, with the
main goal of reducing the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should also be useful for normal connections
with high traffic.
sync_refresh_period: in seconds, the difference in the reported
connection timer that triggers a new sync message. It can be used
to avoid sync messages for the specified period (or half of
the connection timeout if it is lower) if the connection state
has not changed since the last sync.
sync_retries: integer, 0..3, defines sync retries with a period of
sync_refresh_period/8. Useful to protect against loss of
sync messages.
Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only a single sync message is sent
if sync_refresh_period is also 0.
Add a new field, "sync_endtime", to the connection structure to
hold the reported time when the connection expires. The 2 lowest
bits represent the retry count.
As sysctl_sync_period can now be 0, use ACCESS_ONCE to
avoid division by zero.
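A minimal userspace sketch of the refresh decision described above,
assuming times in seconds; the retry handling via the 2 low bits of
sync_endtime is omitted:

#include <stdbool.h>
#include <stdint.h>

/* Decide whether a sync message is due for a connection.  The quiet
 * period is sync_refresh_period, capped at half the connection
 * timeout, and any state change forces a message. */
static bool sync_message_due(uint32_t now, uint32_t last_sync,
                             uint32_t conn_timeout, bool state_changed,
                             uint32_t sync_refresh_period)
{
    uint32_t period = sync_refresh_period;

    if (state_changed || period == 0)
        return true;
    if (period > conn_timeout / 2)
        period = conn_timeout / 2;
    return now - last_sync >= period;
}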
Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
A high rate of sync messages in the master can lead to
overflowing the socket buffer and dropping the messages.
A fixed sleep of 1 second without wakeup events is not suitable
for loaded masters.
Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups, but to avoid sending long bursts we
wake up the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.
Add a hard limit for the queued messages before sending
by using the "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents a number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.
As suggested by Pablo, add a new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. The default value is 0 (preserve
system defaults).
Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.
Change the master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to new messages or
the kthread_should_stop event.
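A simplified, single-threaded sketch of the sync_qlen_max bound; the
kernel version naturally needs locking around the queue:

#include <stdbool.h>

/* Queue a sync message on the master side, enforcing the hard limit
 * so a slow sender cannot consume unbounded memory. */
struct sync_queue {
    unsigned long qlen;          /* messages currently queued */
    unsigned long sync_qlen_max; /* hard limit (sysctl) */
};

static bool enqueue_sync_msg(struct sync_queue *q)
{
    if (q->qlen >= q->sync_qlen_max)
        return false;            /* over the limit: drop the message */
    q->qlen++;                   /* caller links in the message */
    return true;
}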
Thanks to Pablo Neira Ayuso for his valuable feedback!
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure cp->flags are
updated even if cp is not yet bound to a dest. If cp->flags
are not updated, ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state, rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.
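A userspace sketch of that single-bit counter logic; the flag value is
an illustrative stand-in, not the kernel's definition:

#include <stdatomic.h>

#define F_INACTIVE 0x0100 /* stand-in for IP_VS_CONN_F_INACTIVE */

struct dest_counters {
    atomic_int activeconns;
    atomic_int inactconns;
};

/* On bind, pick the counter purely from the INACTIVE bit, with no
 * protocol-state special cases. */
static void count_on_bind(struct dest_counters *d, unsigned int cp_flags)
{
    if (cp_flags & F_INACTIVE)
        atomic_fetch_add(&d->inactconns, 1);
    else
        atomic_fetch_add(&d->activeconns, 1);
}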
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Initially, when the synced connection is created we
use the forwarding method provided by the master, but once we
bind to the destination it can change. As a result, we must
update the application and the transmitter.
As ip_vs_try_bind_dest is always called for connections
that require dest binding, there is no need to validate the
cp and dest pointers.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kinds of connections, we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for the first time. Now
the logic looks just like that in ip_vs_unbind_dest.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method, we should get it from conn_flags just
like we do for the IP_VS_CONN_F_FWD_MASK bits when binding
to the real server.
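A sketch of the masking this implies when binding; both mask values
are illustrative stand-ins:

#define F_FWD_MASK 0x0007 /* stand-in for IP_VS_CONN_F_FWD_MASK */
#define F_NOOUTPUT 0x0080 /* stand-in for IP_VS_CONN_F_NOOUTPUT */

/* Take the forwarding-method bits and NOOUTPUT together from the
 * master's conn_flags. */
static unsigned int bind_flags(unsigned int cp_flags,
                               unsigned int conn_flags)
{
    unsigned int from_master = F_FWD_MASK | F_NOOUTPUT;

    return (cp_flags & ~from_master) | (conn_flags & from_master);
}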
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.
This is safe since it will always run from a process context.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
Schedulers are initialized and bound to services only
on commands.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
They are called only on initialization.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
|
|
If the net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter removes the vlan header temporarily and then feeds the packet
to ip(6)tables.
When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), bridge netfilter will also set the
in-interface to the vlan interface, if such an interface exists.
This is needed to make the iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.
Also update the Documentation with the current brnf default settings.
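For example (assuming the knob sits next to the existing brnf sysctls):
echo 1 > /proc/sys/net/bridge/bridge-nf-pass-vlan-input-device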
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, e.g.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper
[ Note: flows that already got a helper will keep using it even
if automatic helper assignment has been disabled ]
Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helpers to flows.
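For example, a rule along these lines attaches the FTP helper
explicitly (port and table are illustrative):
iptables -t raw -A PREROUTING -p tcp --dport 21 -j CT --helper ftp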
There are good reasons to stop supporting automatic helper
assignment; for further information, please read:
http://www.netfilter.org/news.html#2012-04-03
This patch also adds a message to inform that automatic helper
assignment is deprecated and will be removed soon (it is
printed only once, for the first flow that gets a helper attached,
to make it as unobtrusive as possible).
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If RSS is disabled on the PF (efx->n_rx_channels == 1) we try to set
up the indirection table so that VFs can use it, setting
efx->rss_spread = efx_vf_size(efx). But if SR-IOV was disabled at
compile time, this evaluates to 0 and we end up dividing by zero when
initialising the table.
I considered changing the fallback definition of efx_vf_size() to
return 1, but its value is really meaningless if we are not going to
enable VFs. Therefore, add an efx_sriov_wanted(efx) condition in
efx_probe_interrupts().
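A userspace sketch of the guard described; the names mirror the commit
message, and the logic is a simplification:

#include <stdbool.h>

/* Only size the RSS spread for VFs when SR-IOV is actually wanted;
 * otherwise fall back to the RX channel count, which is never 0. */
static unsigned int pick_rss_spread(unsigned int n_rx_channels,
                                    bool sriov_wanted,
                                    unsigned int vf_size)
{
    if (sriov_wanted && vf_size > n_rx_channels)
        return vf_size;
    return n_rx_channels;
}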
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
|
|
Use devres to implement dev_get_regmap(). This should mean that in almost
all cases devices wishing to take advantage of framework features based on
regmap shouldn't need to explicitly pass the regmap into the framework.
This simplifies device setup a bit.
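A kernel-context usage sketch; the ID register at 0x00 is an
assumption for illustration:

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/regmap.h>

/* Framework code can now look the regmap up by device instead of
 * having it passed in; NULL selects the device's default regmap. */
static int framework_read_id(struct device *dev, unsigned int *id)
{
    struct regmap *map = dev_get_regmap(dev, NULL);

    if (!map)
        return -ENODEV;
    return regmap_read(map, 0x00, id);
}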
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
* ret variable initialization removed as it was useless
* similar code strings concatenated, making the function's code
flow plainer
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Make the return value explicitly true or false.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case of a short marker, the number of received packets was not
incremented, causing a zero divide when computing the filling rate.
Reported-by: Hans Petter Selasky <hans.petter.selasky@bitfrost.no>
Signed-off-by: Jean-François Moine <moinejf@free.fr>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
With the embed percpu first chunk allocator, x86 uses either PAGE_SIZE
or PMD_SIZE for atom_size. PMD_SIZE is used when the CPU supports PSE so
that percpu areas are aligned to PMD mappings and possibly allow using
PMD mappings in vmalloc areas in the future. Using a larger atom_size
doesn't waste actual memory; however, it does require larger vmalloc
space allocations later on for !first chunks.
With a reasonably sized vmalloc area, PMD_SIZE shouldn't be a problem,
but x86_32 at this point is anything but reasonable in terms of
address space, and using the larger atom_size reportedly leads to
frequent percpu allocation failures on certain setups.
As there is no reason not to use PMD_SIZE on x86_64, where vmalloc
space is plentiful and most x86_64 configurations support PSE, fix the
issue by always using PMD_SIZE on x86_64 and PAGE_SIZE on x86_32.
v2: drop the cpu_has_pse test and make x86_64 always use PMD_SIZE and
x86_32 PAGE_SIZE, as suggested by hpa.
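In essence (a sketch; the macro name is invented, and the real commit
sets atom_size in the x86 percpu setup code):

#ifdef CONFIG_X86_64
# define PCPU_ATOM_SIZE PMD_SIZE  /* vmalloc space is plentiful */
#else
# define PCPU_ATOM_SIZE PAGE_SIZE /* x86_32: conserve address space */
#endif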
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yanmin Zhang <yanmin.zhang@intel.com>
Reported-by: ShuoX Liu <shuox.liu@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4F97BA98.6010001@intel.com>
Cc: stable@vger.kernel.org
|
|
This patch removes a redundant metadata block check. See description below.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some
guest memory pages into host memory so that they can be safely accessed
from usermode. It does this using get_user_pages_fast(). When the VPA is
unregistered, or the VCPUs are cleaned up, these pages are released using
put_page().
However, get_user_pages() is invoked on the specific memory area of the
VPA, which could lie within hugepages. In case the pinned page is huge,
we explicitly find the head page of the compound page before calling
put_page() on it.
At least with the latest kernel, this is not correct. put_page() already
handles finding the correct head page of a compound page, and also deals
with various counts on the individual tail page which are important for
transparent huge pages. We don't support transparent hugepages on Power,
but even so, bypassing this count maintenance can lead (when the VM ends)
to a hugepage being released back to the pool with a non-zero mapcount on
one of the tail pages. This can then lead to a bad_page() when the page
is released from the hugepage pool.
This removes the explicit compound_head() call to correct this bug.
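In essence, the change is:

/* before (buggy): second-guesses put_page() on compound pages */
put_page(compound_head(page));

/* after: put_page() finds the head page and maintains the tail-page
 * counts itself */
put_page(page);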
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Commit 943bc7e110f2 ("x86: Fix section warnings") added
__cpuinitdata here, while for functions __cpuinit should be
used.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <sp@numascale.com>
Link: http://lkml.kernel.org/r/4FA947910200007800082470@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Steffen Persvold <sp@numascale.com>
|
|
This reverts commit 785f857d1cb0856b612b46a0545b74aa2596e44a.
The commit causes a problem with the wrong D3 state after suspend
because the call of hda_set_power_state() involves the power-up
sequence, which changes the power_count, and this confuses the resume
sequence that checks the power_count as well.
Originally, this go-to-D3 sequence should be a simple task without the
power-up sequence. But it'd need some proper sanity checks in the
case of the power-saved state, so it's not too easy to write now in the
3.4-rc cycle.
In short, the safest option now is to revert the affecting commit.
Of course, we need to clean up and robustify the power-saving code
better for the 3.5 kernel.
Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Include the header to pickup the definitions of the exported symbols.
Quiets the following sparse warnings:
warning: symbol 'vb2_dma_contig_memops' was not declared. Should it be static?
warning: symbol 'vb2_dma_contig_init_ctx' was not declared. Should it be static?
warning: symbol 'vb2_dma_contig_cleanup_ctx' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The function vb2_dma_contig_vaddr returns a void *, not an integer.
Quiets the sparse noise:
warning: Using plain integer as NULL pointer
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The vb2_get_vma() function is called by videobuf2-dma-contig. Export it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Commit 3b4c34aac7abea4754059084d0eef667a1993ac8
("s5p-fimc: Add support for VIDIOC_PREPARE_BUF/CREATE_BUFS ioctls")
added a handler for the VIDIOC_CREATE_BUFS ioctl, but the queue_setup
callback wasn't updated to properly interpret the pixel format.
In this situation memory corruption may happen with the VIDIOC_CREATE_BUFS
ioctl. Update the queue_setup op to fix this.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The call for alc_auto_parse_customize_define() must be done after the
fixup pre-probe initialization. Otherwise SKU_IGNORE fixup won't work
properly (e.g. HP RP5800 with ALC662 codec).
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Replace __HAVE_ARCH_TASK_ALLOCATOR and __HAVE_ARCH_THREAD_ALLOCATOR
with proper config switches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20120505150142.371309416@linutronix.de
|
|
Exactly the same as the core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20120505150142.252861878@linutronix.de
|
|
No point in using kmalloc for allocating 2 pages.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Lennox Wu <lennox.wu@gmail.com>
Link: http://lkml.kernel.org/r/20120505150142.123383955@linutronix.de
|
|
The core now has a threadinfo allocator which uses a kmemcache when
THREAD_SIZE < PAGE_SIZE.
Deal with the xstate cleanup in the new arch_release_task_struct()
function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Mundt <lethal@linux-sh.org>
Link: http://lkml.kernel.org/r/20120505150142.189348931@linutronix.de
|
|
Let the core code allocate and handle the kgdb cleanup with the
arch_release_thread_info() function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120505150141.996582377@linutronix.de
|
|
The core now has a threadinfo allocator which uses a kmemcache when
THREAD_SIZE < PAGE_SIZE.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20120505150142.059161130@linutronix.de
|
|
No point in using kmalloc for allocating order-0, order-1, or order-2
pages for threadinfo.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Link: http://lkml.kernel.org/r/20120505150141.936950979@linutronix.de
|
|
The core now has a threadinfo allocator which uses a kmemcache when
THREAD_SIZE < PAGE_SIZE.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Link: http://lkml.kernel.org/r/20120505150141.812612113@linutronix.de
|
|
No reason why m32r needs to use kmalloc to allocate 2 pages instead of
using the core allocator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Link: http://lkml.kernel.org/r/20120505150141.875430830@linutronix.de
|
|
The core now has a threadinfo allocator which uses a kmemcache when
THREAD_SIZE < PAGE_SIZE.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120505150141.751600045@linutronix.de
|
|
There is no functional difference. __get_free_pages() ends up calling
alloc_pages_node().
This also allocates only one page, which matches THREAD_SIZE, instead
of an extra page for nothing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Link: http://lkml.kernel.org/r/20120505150141.681236240@linutronix.de
|
|
The only difference is the free_thread_info function, which frees
xstate.
Use the new arch_release_task_struct() function instead and switch
over to the core allocator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.559556763@linutronix.de
Cc: x86@kernel.org
|
|
There is no functional difference. __get_free_pages() ends up calling
alloc_pages_node().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Salter <msalter@redhat.com>
Link: http://lkml.kernel.org/r/20120505150141.621728944@linutronix.de
|
|
Several architectures have their own kmemcache based thread allocator
because THREAD_SIZE is smaller than PAGE_SIZE. Add it to the core code
conditionally on THREAD_SIZE < PAGE_SIZE so the private copies can go.
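A kernel-context sketch of what such a conditional core allocator
looks like; the names and GFP flags are illustrative:

#if THREAD_SIZE < PAGE_SIZE
static struct kmem_cache *thread_info_cache;

static struct thread_info *alloc_thread_info_node(struct task_struct *tsk,
                                                  int node)
{
    return kmem_cache_alloc_node(thread_info_cache, GFP_KERNEL, node);
}

static void free_thread_info(struct thread_info *ti)
{
    kmem_cache_free(thread_info_cache, ti);
}
#endif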
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120505150141.491002124@linutronix.de
|
|
Reason: Pull in the separate branch which was created so arch/tile can
base further work on it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When setting a TRY crop on the sub-device, the mutex was erroneously
acquired rather than released on the exit path. This bug is present in
kernels starting from v3.2.
Cc: stable@vger.kernel.org
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Use the core allocator and deal with the extra cleanup in
arch_release_thread_info().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Link: http://lkml.kernel.org/r/20120505150142.311126440@linutronix.de
|