Age | Commit message (Collapse) | Author |
|
In the following commit:
7675104990ed ("sched: Implement lockless wake-queues")
we gained lockless wake-queues.
The -RT kernel managed to lockup itself with those. There could be multiple
attempts for task X to enqueue it for a wakeup _even_ if task X is already
running.
The reason is that task X could be runnable but not yet on CPU. The the
task performing the wakeup did not leave the CPU it could performe
multiple wakeups.
With the proper timming task X could be running and enqueued for a
wakeup. If this happens while X is performing a fork() then its its
child will have a !NULL `wake_q` member copied.
This is not a problem as long as the child task does not participate in
lockless wakeups :)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7675104990ed ("sched: Implement lockless wake-queues")
Link: http://lkml.kernel.org/r/20151221171710.GA5499@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some of the sched bitfieds (notably sched_reset_on_fork) can be set
on other than current, this can cause the r-m-w to race with other
updates.
Since all the sched bits are serialized by scheduler locks, pull them
in a separate word.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org
Cc: mhocko@kernel.org
Cc: vdavydov@parallels.com
Link: http://lkml.kernel.org/r/20151125150207.GM11639@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Our global init task can have sub-threads, so ->pid check is not reliable
enough for is_global_init(), we need to check tgid instead. This has been
spotted by Oleg and a fix was proposed by Richard a long time ago (see the
link below).
Oleg wrote:
: Because is_global_init() is only true for the main thread of /sbin/init.
:
: Just look at oom_unkillable_task(). It tries to not kill init. But, say,
: select_bad_process() can happily find a sub-thread of is_global_init()
: and still kill it.
I recently hit the problem in question; re-sending the patch (to the
best of my knowledge it has never been submitted) with updated function
comment. Credit goes to Oleg and Richard.
Suggested-by: Richard Guy Briggs <rgb@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W . Biederman <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge E . Hallyn <serge.hallyn@ubuntu.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://www.redhat.com/archives/linux-audit/2013-December/msg00086.html
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Make 'r' 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
on 32-bit systems:
UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
signed integer overflow:
87950 * 47742 cannot be represented in type 'int'
The most likely effect of this bug are bad load average numbers
resulting in weird scheduling. It's also likely that this can
persist for a longer time - until the system goes idle for
a long time so that all load avg numbers get reset.
[ This is the CFS load average metric, not the procfs output, which
is separate. ]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
There's a race on CPU unplug where we free the swevent hash array
while it can still have events on. This will result in a
use-after-free which is BAD.
Simply do not free the hash array on unplug. This leaves the thing
around and no use-after-free takes place.
When the last swevent dies, we do a for_each_possible_cpu() iteration
anyway to clean these up, at which time we'll free it, so no leakage
will occur.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
I managed to tickle this warning:
[ 2338.884942] ------------[ cut here ]------------
[ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
[ 2338.900504] Modules linked in:
[ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
[ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 2338.923071] ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
[ 2338.931382] ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
[ 2338.939678] 0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
[ 2338.947987] Call Trace:
[ 2338.950726] [<ffffffff815c680c>] dump_stack+0x4e/0x82
[ 2338.956474] [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
[ 2338.963195] [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
[ 2338.969720] [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
[ 2338.976244] [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
[ 2338.982575] [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
[ 2338.988810] [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
[ 2338.995339] [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
[ 2339.001669] [<ffffffff8121e297>] search_binary_handler+0x97/0x200
[ 2339.008581] [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
[ 2339.016072] [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
[ 2339.021819] [<ffffffff819fc165>] stub_execve+0x5/0x5
[ 2339.027469] [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
[ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
what we expected it to be.
This is because context switches can swap the task_struct::perf_event_ctxp[]
pointer around. Therefore you have to either disable preemption when looking
at current, or hold ctx->lock.
Fix perf_event_enable_on_exec(), it loads current->perf_event_ctxp[]
before disabling interrupts, therefore a preemption in the right place
can swap contexts around and we're using the wrong one.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: syzkaller <syzkaller@googlegroups.com>
Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
arch/x86/built-in.o: In function `arch_setup_additional_pages':
(.text+0x587): undefined reference to `pvclock_pvti_cpu0_va'
KVM_GUEST selects PARAVIRT_CLOCK, so we can make pvclock_pvti_cpu0_va depend
on KVM_GUEST.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/444d38a9bcba832685740ea1401b569861d09a72.1451446564.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This reverts commit e958e079e254 ("dmaengine: mic_x100: add missing
spin_unlock").
The above patch is incorrect. There is nothing wrong with the original
code. The spin_lock is acquired in the "prep" functions and released
in "submit".
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
|
|
When a qdisc is using per cpu stats (currently just the ingress
qdisc) only the bstats are being freed. This also free's the qstats.
Fixes: b0ab6f92752b9f9d8 ("net: sched: enable per cpu qstats")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The LSR instruction cannot be used to perform a zero right shift since a
0 as the immediate value (imm5) in the LSR instruction encoding means
that a shift of 32 is perfomed. See DecodeIMMShift() in the ARM ARM.
Make the JIT skip generation of the LSR if a zero-shift is requested.
This was found using american fuzzy lop.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This socket-lookup path did not pass along the skb in question
in my original BPF-based socket selection patch. The skb in the
udpN_lib_lookup2 path can be used for BPF-based socket selection just
like it is in the 'traditional' udpN_lib_lookup path.
udpN_lib_lookup2 kicks in when there are greater than 10 sockets in
the same hlist slot. Coincidentally, I chose 10 sockets per
reuseport group in my functional test, so the lookup2 path was not
excersised. This adds an additional set of tests with 20 sockets.
Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Fixes: 3ca8e4029969 ("soreuseport: BPF selection functional test")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit acf673a3187edf72068ee2f92f4dc47d66baed47 fixed a user triggerable free
memory scribble but in doing so replaced it with a different one that allows
the user to control the data and scribble even more.
sixpack_close is called by the tty layer in tty context. The tty context is
protected by sp_get() and sp_put(). However network layer activity via
sp_xmit() is not protected this way. We must therefore stop the queue
otherwise the user gets to dump a buffer mostly of their choice into freed
kernel pages.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During initialization, when creating the send descriptor queues (SDQs),
we specify the CPU egress traffic class of each SDQ. The maximum number
of classes of this type is different in the two ASICs supported by this
PCI driver.
New firmware versions check this value is set correctly, which causes
errors on the Spectrum ASIC, as its max exposed egress traffic class is
lower than 7.
Solve this by setting this field to 3, which is an acceptable value for
both ASICs.
Note that we currently do not expose the QoS capabilities of the ASICs,
so setting this to an hardcoded value is OK for now.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value. All the BPF JITs fail to clear A if this is used as
the first instruction in a filter. This was found using american fuzzy
lop.
Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs. Except for ARM, the
rest have only been compile-tested.
Fixes: 3480593131e0 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
brcfmac
* fix IBSS which got broken over time
* new USB id for bcm43242 dongle
* arp offload configuration through inet notifier
ath9k
* add random number generator support (CONFIG_ATH9K_HWRNG)
iwlwifi
* Make scan parameters low latency aware
* Fix in the NL80211_FEATURE_FULL_AP_CLIENT_STATE state case
* Fix enable injection mode (Chaya Rachel)
* Various cleanups (Dan / Julia / myself)
* Allow to stay more time on popular channels (David Spinadel)
* Bug fixes for D0i3 (Eliad / Luca)
* Fixes for GO uAPSD (myself)
* Start of TSO support (myself)
* Rate control bug fixes (Eyal / Gregory)
* Start the work on 9000 devices (Johannes / Sara / Oren)
* Start the work on a new Tx queue allocation model (Liad)
* Debug infrastructure enhancements (Golan)
mwifiex
* add a debugfs file for chip reset
* advertise SMS4 cipher suite
* increase ap and station interface limit to 3
* enable MSI support on newer pcie devices (8897 onwards)
rtlwifi
* fix lots of module parameter usage
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gcc fails to see that the use of the 'last_offset' variable
in hns_nic_reuse_page() is used correctly and issues a bogus
warning:
drivers/net/ethernet/hisilicon/hns/hns_enet.c: In function 'hns_nic_reuse_page':
drivers/net/ethernet/hisilicon/hns/hns_enet.c:541:6: warning: 'last_offset' may be used uninitialized in this function [-Wmaybe-uninitialized]
This simplifies the function to make it more obvious what is
going on to both readers and compilers, which makes the warning
go away.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The only user was removed in commit
029f7f3b8701cc7a ("netfilter: ipv6: nf_defrag: avoid/free clone operations").
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sprintf() can access memory outside of the range of the character array,
and is risky in some situations. The driver specified prop_name string
can be longer than NAME_MAX here (only an attacker will do that though)
and so blindly copying it into the character array of size NAME_MAX
isn't safe. Instead we must use snprintf() here.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
stm_is_locked_sr() takes the status register (SR) value as the last
parameter, not the second.
Reported-by: Bayi Cheng <bayi.cheng@mediatek.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Bayi Cheng <bayi.cheng@mediatek.com>
|
|
Spansion and Winbond have occasionally used the same manufacturer ID,
and they don't support the same features. Particularly, writing SR=0
seems to break read access for Spansion's s25fl064k. Unfortunately, we
don't currently have a way to differentiate these Spansion and Winbond
parts, so rather than regressing support for these Spansion flash, let's
drop the new Winbond lock/unlock support for now. We can try to address
Winbond support during the next release cycle.
Original discussion:
http://patchwork.ozlabs.org/patch/549173/
http://patchwork.ozlabs.org/patch/553683/
Fixes: 357ca38d4751 ("mtd: spi-nor: support lock/unlock/is_locked for Winbond")
Fixes: c6fc2171b249 ("mtd: spi-nor: disable protection for Winbond flash at startup")
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reported-by: Felix Fietkau <nbd@openwrt.org>
Cc: Felix Fietkau <nbd@openwrt.org>
|
|
|
|
|
|
|
|
|
|
[I stole this patch from Eric Biederman. He wrote:]
> There is no defined mechanism to pass network namespace information
> into /sbin/bridge-stp therefore don't even try to invoke it except
> for bridge devices in the initial network namespace.
>
> It is possible for unprivileged users to cause /sbin/bridge-stp to be
> invoked for any network device name which if /sbin/bridge-stp does not
> guard against unreasonable arguments or being invoked twice on the
> same network device could cause problems.
[Hannes: changed patch using netns_eq]
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IOCTL SIOCRTMSG does nothing but return EINVAL.
So comment it as unused.
SIOCRTMSG is only used in:
* net/ipv4/af_inet.c
* include/uapi/linux/sockios.h
inet_ioctl calls ip_rt_ioctl.
ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
otherwise.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two more fixes:
1. The recordmcount change had an output that used sprintf()
(incorrectly) when it should have been a fprintf() to stderr.
2. The printk_formats file could crash if someone added a
trace_printk() in the core kernel, and also added one in a module.
This does not affect production kernels. Only kernels where
developers add trace_printk() for debugging can crash"
* tag 'trace-v4.4-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix setting of start_index in find_next()
ftrace/scripts: Fix incorrect use of sprintf in recordmcount
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull tile bugfix from Chris Metcalf:
"This fixes a bug that Sudip's buildbot found for tilepro allmodconfig.
I've tagged it for stable only back to 3.19, which was when most of
the other affected architectures added their support for working
around this issue"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro
|
|
Saeed Mahameed says:
====================
Introduce mlx5 ethernet timestamping
This patch series introduces the support for ConnectX-4 timestamping
and the PTP kernel interface.
Changes from V2:
net/mlx5_core: Introduce access function to read internal_timer
- Remove one line function
- Change function name
net/mlx5e: Add HW timestamping (TS) support:
- Data path performance optimization (caching tstamp struct in rq,sq)
- Change read/write_lock_irqsave to read/write_lock
- Move ioctl functions to en_clock file
- Changed overflow start algorithm according to comments from Richard
- Move timestamp init/cleanup to open/close ndos.
In details:
1st patch prevents the driver from modifying skb->data and SKB CB in
device xmit function.
2nd patch adds the needed low level helpers for:
- Fetching the hardware clock (hardware internal timer)
- Parsing CQEs timestamps
- Device frequency capability
3rd patch adds new en_clock.c file that handles all needed timestamping
operations:
- Internal clock structure initialization and other helper functions
- Added the needed ioctl for setting/getting the current timestamping
configuration.
- used this configuration in RX/TX data path to fill the SKB with
the timestamp.
4th patch Introduces PTP (PHC) support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a PHC support to the mlx5_en driver. Use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled. This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.
The driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-4 card.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for enable/disable HW timestamping for incoming and/or
outgoing packets. To enable/disable HW timestamping appropriate
ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and
HWTSAMP_TX_ON/OFF only are supported. Make all relevant changes in
RX/TX flows to consider TS request and plant HW timestamps into
relevant structures.
Add internal clock for converting hardware timestamp to nanoseconds. In
addition, add a service task to catch internal clock overflow, to make
sure timestamping is accurate.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A preparation step which adds support for reading the hardware
internal timer and the hardware timestamping from the CQE.
In addition, advertize device_frequency_khz HCA capability.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the SKB is cloned, or has an elevated users count, someone else
can be looking at it at the same time.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
and 'regmap/topic/seq' into regmap-next
|
|
'regmap/topic/irq-type' into regmap-next
|
|
|
|
|
|
Unlike the registers file we don't have any substantial performance
concerns rendering the entire file (it involves no device accesses) so
just use seq_printf() to simplify the code.
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Some of devices supports the trigger level for interrupt
like rising/falling edge specially for GPIOs. The interrupt
support of such devices may have uses the generic regmap irq
framework for implementation.
Add support to configure the trigger type device interrupt
register via regmap-irq framework. The regmap-irq framework
configures the trigger register only if the details of trigger
type registers are provided.
[Fixed use of terery operator for legibility -- broonie]
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Xin Long says:
====================
sctp: use transport hashtable to replace association's with rhashtable
for telecom center, the usual case is that a server is connected by thousands
of clients. but if the server with only one enpoint(udp style) use the same
sport and dport to communicate with every clients, and every assoc in server
will be hashed in the same chain of global assoc hashtable due to currently we
choose dport and sport as the hash key.
when a packet is received, sctp_rcv try to find the assoc with sport and dport,
since that chain is too long to find it fast, it make the performance turn to
very low, some test data is as follow:
in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
and use one of them to send data to server]
===== test on net-next
-- perf top
server:
55.73% [kernel] [k] sctp_assoc_is_match
6.80% [kernel] [k] sctp_assoc_lookup_paddr
4.81% [kernel] [k] sctp_v4_cmp_addr
3.12% [kernel] [k] _raw_spin_unlock_irqrestore
1.94% [kernel] [k] sctp_cmp_addr_exact
client:
46.01% [kernel] [k] sctp_endpoint_lookup_assoc
5.55% libc-2.17.so [.] __libc_calloc
5.39% libc-2.17.so [.] _int_free
3.92% libc-2.17.so [.] _int_malloc
3.23% [kernel] [k] __memset
-- spent time
time is 487s, send pkt is 10000000
we need to change the way to calculate the hash key, to use lport +
rport + paddr as the hash key can avoid this issue.
besides, this patchset will use transport hashtable to replace
association hashtable to lookup with rhashtable api. get transport
first then get association by t->asoc. and also it will make tcp
style work better.
===== test with this patchset:
-- perf top
server:
15.98% [kernel] [k] _raw_spin_unlock_irqrestore
9.92% [kernel] [k] __pv_queued_spin_lock_slowpath
7.22% [kernel] [k] copy_user_generic_string
2.38% libpthread-2.17.so [.] __recvmsg_nocancel
1.88% [kernel] [k] sctp_recvmsg
client:
11.90% [kernel] [k] sctp_hash_cmp
8.52% [kernel] [k] rht_deferred_worker
4.94% [kernel] [k] __pv_queued_spin_lock_slowpath
3.95% [kernel] [k] sctp_bind_addr_match
2.49% [kernel] [k] __memset
-- spent time
time is 22s, send pkt is 10000000
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_endpoint_lookup_assoc is called in the protection of sock lock
there is no need to call local_bh_disable in this function. so remove
them.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
transport hashtable will replace the association hashtable,
so association hashtable is not used in sctp any more, so
drop the codes about that.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Traversal the transport rhashtable, get the association only once through
the condition assoc->peer.primary_path != transport.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.
lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.
this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport
hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport
init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Removed the unused quirks. These quirks don't used anywhere.
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Use to_pci_dev() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Use to_platform_device() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The misc control register is 32bit wide, the used readw/writew
accessors only mainipulate the low 16bit of this register. It
currently doesn't matter as all the bit changed are located in
the lower half, but together with the u32 variable used to hold
the contents of the register it is seriously confusing.
Switch to 32bit accessors to avoid any future breakage.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|