Age | Commit message (Collapse) | Author |
|
An FoU socket is currently bound to the wildcard-address. While this
works fine, there are several use-cases where the use of the
wildcard-address is not desirable. For example, I use FoU on some
multi-homed servers and would like to use FoU on only one of the
interfaces.
This commit adds support for binding FoU sockets to a given source
address/interface, as well as connecting the socket to a given
destination address/port. udp_tunnel already provides the required
infrastructure, so most of the code added is for exposing and setting
the different attributes (local address, peer address, etc.).
The lookups performed when we add, delete or get an FoU-socket has also
been updated to compare all the attributes a user can set. Since the
comparison now involves several elements, I have added a separate
comparison-function instead of open-coding.
In order to test the code and ensure that the new comparison code works
correctly, I started by creating a wildcard socket bound to port 1234 on
my machine. I then tried to create a non-wildcarded socket bound to the
same port, as well as fetching and deleting the socket (including source
address, peer address or interface index in the netlink request). Both
the create, fetch and delete request failed. Deleting/fetching the
socket was only successful when my netlink request attributes matched
those used to create the socket.
I then repeated the tests, but with a socket bound to a local ip
address, a socket bound to a local address + interface, and a bound
socket that was also «connected» to a peer. Add only worked when no
socket with the matching source address/interface (or wildcard) existed,
while fetch/delete was only successful when all attributes matched.
In addition to testing that the new code work, I also checked that the
current behavior is kept. If none of the new attributes are provided,
then an FoU-socket is configured as before (i.e., wildcarded). If any
of the new attributes are provided, the FoU-socket is configured as
expected.
v1->v2:
* Fixed building with IPv6 disabled (kbuild).
* Fixed a return type warning and make the ugly comparison function more
readable (kbuild).
* Describe more in detail what has been tested (thanks David Miller).
* Make peer port required if peer address is specified.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"Fixes here and there, a couple new device IDs, as usual:
1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.
2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.
3) Fix documentation for some eBPF helpers, from Quentin Monnet.
4) Some UAPI bpf header sync with tools, also from Quentin Monnet.
5) Set descriptor ownership bit at the right time for jumbo frames in
stmmac driver, from Aaro Koskinen.
6) Set IFF_UP properly in tun driver, from Eric Dumazet.
7) Fix load/store doubleword instruction generation in powerpc eBPF
JIT, from Naveen N. Rao.
8) nla_nest_start() return value checks all over, from Kangjie Lu.
9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
merge window. From Marcelo Ricardo Leitner and Xin Long.
10) Fix memory corruption with large MTUs in stmmac, from Aaro
Koskinen.
11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
Dumazet.
12) Fix topology subscription cancellation in tipc, from Erik Hugne.
13) Memory leak in genetlink error path, from Yue Haibing.
14) Valid control actions properly in packet scheduler, from Davide
Caratti.
15) Even if we get EEXIST, we still need to rehash if a shrink was
delayed. From Herbert Xu.
16) Fix interrupt mask handling in interrupt handler of r8169, from
Heiner Kallweit.
17) Fix leak in ehea driver, from Wen Yang"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
dpaa2-eth: fix race condition with bql frame accounting
chelsio: use BUG() instead of BUG_ON(1)
net: devlink: skip info_get op call if it is not defined in dumpit
net: phy: bcm54xx: Encode link speed and activity into LEDs
tipc: change to check tipc_own_id to return in tipc_net_stop
net: usb: aqc111: Extend HWID table by QNAP device
net: sched: Kconfig: update reference link for PIE
net: dsa: qca8k: extend slave-bus implementations
net: dsa: qca8k: remove leftover phy accessors
dt-bindings: net: dsa: qca8k: support internal mdio-bus
dt-bindings: net: dsa: qca8k: fix example
net: phy: don't clear BMCR in genphy_soft_reset
bpf, libbpf: clarify bump in libbpf version info
bpf, libbpf: fix version info and add it to shared object
rxrpc: avoid clang -Wuninitialized warning
tipc: tipc clang warning
net: sched: fix cleanup NULL pointer exception in act_mirr
r8169: fix cable re-plugging issue
net: ethernet: ti: fix possible object reference leak
net: ibm: fix possible object reference leak
...
|
|
VirtualBox 6.0.x has a new feature where the guest kernel driver passes
info about the origin of the request (e.g. userspace or kernelspace) to
the hypervisor.
If we do not pass this information then when running the 6.0.x userspace
guest-additions tools on a 6.0.x host, some requests will get denied
with a VERR_VERSION_MISMATCH error, breaking vboxservice.service and
the mounting of shared folders marked to be auto-mounted.
This commit implements passing the requestor info to the host, fixing this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If an xfrmi is associated to a vrf layer 3 master device,
xfrm_policy_check() fails after traffic decapsulation. The input
interface is replaced by the layer 3 master device, and hence
xfrmi_decode_session() can't match the xfrmi anymore to satisfy
policy checking.
Extend ingress xfrmi lookup to honor the original layer 3 slave
device, allowing xfrm interfaces to operate within a vrf domain.
Fixes: f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-03-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) introduce bpf_tcp_check_syncookie() helper for XDP and tc, from Lorenz.
2) allow bpf_skb_ecn_set_ce() in tc, from Peter.
3) numerous bpf tc tunneling improvements, from Willem.
4) and other miscellaneous improvements from Adrian, Alan, Daniel, Ivan, Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is hard to say what KEY_SCREEN and KEY_ZOOM mean, but historically DVB
folks have used them to indicate switch to full screen mode. Later, they
converged on using KEY_ZOOM to switch into full screen mode and KEY)SCREEN
to control aspect ratio (see Documentation/media/uapi/rc/rc-tables.rst).
Let's commit to these uses, and define:
- KEY_FULL_SCREEN (and make KEY_ZOOM its alias)
- KEY_ASPECT_RATIO (and make KEY_SCREEN its alias)
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Fix for name mismatch and omitted colons in the
security_list_options documentation.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The shm_* hooks were changed in the commit
"shm/security: Pass kern_ipc_perm not shmid_kernel into the
shm security hooks" (7191adff2a55). The type of the argument
shp was changed from shmid_kernel to kern_ipc_perm. This patch
updates the documentation for the hooks accordingly.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The sem_* hooks were changed in the commit
"sem/security: Pass kern_ipc_perm not sem_array into the
sem security hooks" (aefad9593ec5). The type of the argument
sma was changed from sem_array to kern_ipc_perm. This patch
updates the documentation for the hooks accordingly.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The msg_queue_* hooks were changed in the commit
"msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue
security hooks" (d8c6e8543294). The type of the argument msq was changed
from msq_queue to kern_ipc_perm. This patch updates the documentation
for the hooks accordingly.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
This patch updates the documentation for the audit_* hooks
to use the same arguments names as in the hook's declarations.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The path_chmod hook was changed in the commit
"switch security_path_chmod() to struct path *" (cdcf116d44e7).
The argument @mnt was removed from the hook, @dentry was changed
to @path. This patch updates the documentation accordingly.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The socket_getpeersec_dgram hook was changed in the commit
"[AF_UNIX]: Kernel memory leak fix for af_unix datagram
getpeersec patch" (dc49c1f94e34). The arguments @secdata
and @seclen were changed to @sock and @secid. This patch
updates the documentation accordingly.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The task_setscheduler hook was changed in the commit
"security: remove unused parameter from security_task_setscheduler()"
(b0ae19811375). The arguments @policy, @lp were removed from the hook.
This patch updates the documentation accordingly.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
This patch slightly fixes the documentation for the
socket_post_create hook. The documentation states that
i_security field is accessible through inode field of socket
structure (i.e., 'sock->inode->i_security'). There is no inode
field in the socket structure. The i_security field is accessible
through SOCK_INODE macro. The patch updates the documentation
to reflect this.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The syslog hook was changed in the commit
"capabilities/syslog: open code cap_syslog logic to
fix build failure" (12b3052c3ee8). The argument @from_file
was removed from the hook. This patch updates the
documentation for the syslog hook accordingly.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The @type argument of the sb_copy_data hook was removed
in the commit "LSM/SELinux: Interfaces to allow FS to control
mount options" (e0007529893c). This commit removes the description
of the @type argument from the LSM documentation.
Signed-off-by: Denis Efremov <efremov@ispras.ru>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
|
|
The cleanup_srcu_struct_quiesced() function was added because NVME
used WQ_MEM_RECLAIM workqueues and SRCU did not, which meant that
NVME workqueues waiting on SRCU workqueues could result in deadlocks
during low-memory conditions. However, SRCU now also has WQ_MEM_RECLAIM
workqueues, so there is no longer a potential for deadlock. Furthermore,
it turns out to be extremely hard to use cleanup_srcu_struct_quiesced()
correctly due to the fact that SRCU callback invocation accesses the
srcu_struct structure's per-CPU data area just after callbacks are
invoked. Therefore, the usual practice of using srcu_barrier() to wait
for callbacks to be invoked before invoking cleanup_srcu_struct_quiesced()
fails because SRCU's callback-invocation workqueue handler might be
delayed, which can result in cleanup_srcu_struct_quiesced() being invoked
(and thus freeing the per-CPU data) before the SRCU's callback-invocation
workqueue handler is finished using that per-CPU data. Nor is this a
theoretical problem: KASAN emitted use-after-free warnings because of
this problem on actual runs.
In short, NVME can now safely invoke cleanup_srcu_struct(), which
avoids the use-after-free scenario. And cleanup_srcu_struct_quiesced()
is quite difficult to use safely. This commit therefore removes
cleanup_srcu_struct_quiesced(), switching its sole user back to
cleanup_srcu_struct(). This effectively reverts the following pair
of commits:
f7194ac32ca2 ("srcu: Add cleanup_srcu_struct_quiesced()")
4317228ad9b8 ("nvme: Avoid flush dependency in delete controller flow")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
|
|
The rcu_head_after_call_rcu() function reads the rhp->func pointer twice,
which can result in a false-positive WARN_ON_ONCE() if the callback
were passed to call_rcu() between the two reads. Although racing
rcu_head_after_call_rcu() with call_rcu() is to be a dubious use case
(the return value is not reliable in that case), intermittent and
irreproducible warnings are also quite dubious. This commit therefore
uses a single READ_ONCE() to pick up the value of rhp->func once, then
tests that value twice, thus guaranteeing consistent processing within
rcu_head_after_call_rcu()().
Neverthless, racing rcu_head_after_call_rcu() with call_rcu() is still
a dubious use case.
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
[ paulmck: Add blank line after declaration per checkpatch.pl. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
|
|
Previously the green and amber LEDs on this quad PHY were solid, to
indicate an encoding of the link speed (10/100/1000).
This keeps the LEDs always on just as before, but now they flash on
Rx/Tx activity.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit
bf904d2762ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
removed the sole usage of kclist_add_remap() but it did not remove the
left-over definition from the include file.
Fix the same.
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1553583028-17804-1-git-send-email-bhsharma@redhat.com
|
|
In commit 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().
And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
protocols, this is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, which leads that those entries are left in
net->xfrm.state_all before exit net. Fix this by replacing
IPSEC_PROTO_ANY with zero.
This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.
Fixes: 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This reverts commit 1aec4211204d9463d1fd209eb50453de16254599.
Steven Rostedt reports that it causes a hang at bootup and bisected it
to this commit.
The troigger is apparently a module alias for "parport_lowlevel" that
points to "parport_pc", which causes a hang with
modprobe -q -- parport_lowlevel
blocking forever with a backtrace like this:
wait_for_completion_killable+0x1c/0x28
call_usermodehelper_exec+0xa7/0x108
__request_module+0x351/0x3d8
get_lowlevel_driver+0x28/0x41 [parport]
__parport_register_driver+0x39/0x1f4 [parport]
daisy_drv_init+0x31/0x4f [parport]
parport_bus_init+0x5d/0x7b [parport]
parport_default_proc_register+0x26/0x1000 [parport]
do_one_initcall+0xc2/0x1e0
do_init_module+0x50/0x1d4
load_module+0x1c2e/0x21b3
sys_init_module+0xef/0x117
Supid says:
"Due to the new device model daisy driver will now try to find the
parallel ports while trying to register its driver so that it can bind
with them. Now, since daisy driver is loaded while parport bus is
initialising the list of parport is still empty and it tries to load
the lowlevel driver, which has an alias set to parport_pc, now causes
a deadlock"
But I don't think the daisy driver should be loaded by the parport
initialization in the first place, so let's revert the whole change.
If the daisy driver can just initialize separately on its own (like a
driver should), instead of hooking into the parport init sequence
directly, this issue probably would go away.
Reported-and-bisected-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The G12A Documentation lacked these 2 reset lines, but they are present and
used for each USB 2 PHYs.
Add them to the dt-bindings for the upcoming USB support.
Fixes: dbfc54534dfc ("dt-bindings: reset: meson: add g12a bindings")
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
|
|
Apparently without it it is incorrect syntax and causes a warning about
undocumented struct field.
Fixes: b230d5aba2d1 ("LSM: add new hook for kernfs node initialization")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Rather than setting debug output flags during early init, its makes
more sense to simply re-define ACPI_DEBUG_DEFAULT specifically for
Linux.
ACPICA commit 60903715711f4b00ca1831779a8a23279a66497d
Link: https://github.com/acpica/acpica/commit/60903715
Fixes: ce5cbf53496b ("ACPI: Set debug output flags independent of ACPICA")
Reported-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Tested-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
With this patch multicast packets with a limited number of destinations
(current default: 16) will be split and transmitted by the originator as
individual unicast transmissions.
Wifi broadcasts with their low bitrate are still a costly undertaking.
In a mesh network this cost multiplies with the overall size of the mesh
network. Therefore using multiple unicast transmissions instead of
broadcast flooding is almost always less burdensome for the mesh
network.
The maximum amount of unicast packets can be configured via the newly
introduced multicast_fanout parameter. If this limit is exceeded
distribution will fall back to classic broadcast flooding.
The multicast-to-unicast conversion is performed on the initial
multicast sender node and counts on a final destination node, mesh-wide
basis (and not next hop, neighbor node basis).
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
All files got a SPDX-License-Identifier with commit 7db7d9f369a4
("batman-adv: Add SPDX license identifier above copyright header"). All the
required information about the license conditions can be found in
LICENSES/.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
Timers are added to the timer wheel off by one. This is required in
case a timer is queued directly before incrementing jiffies to prevent
early timer expiry.
When reading a timer trace and relying only on the expiry time of the timer
in the timer_start trace point and on the now in the timer_expiry_entry
trace point, it seems that the timer fires late. With the current
timer_expiry_entry trace point information only now=jiffies is printed but
not the value of base->clk. This makes it impossible to draw a conclusion
to the index of base->clk and makes it impossible to examine timer problems
without additional trace points.
Therefore add the base->clk value to the timer_expire_entry trace
point, to be able to calculate the index the timer base is located at
during collecting expired timers.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20190321120921.16463-5-anna-maria@linutronix.de
|
|
Since commit 04b8eb7a4ccd ("symbol lookup: introduce
dereference_symbol_descriptor()") %pf is deprecated, because %ps is smart
enough to handle function pointer dereference on platforms where such a
dereference is required.
While at it add proper line breaks to stay in the 80 character limit.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20190321120921.16463-4-anna-maria@linutronix.de
|
|
Some drivers are becoming more dependent on NET_DEVLINK being selected
in configuration. With upcoming compat functions, the behavior would be
wrong in case devlink was not compiled in. So make the drivers select
NET_DEVLINK and rely on the functions being there, not just stubs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add spinlock to protect port type and type_dev pointer consistency.
Without that, userspace may see inconsistent type and type_dev
combinations.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v1->v2:
- rebased
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Prevent potential NULL pointer dereferences in the HPET and HyperV
code
- Exclude the GART aperture from /proc/kcore to prevent kernel
crashes on access
- Use the correct macros for Cyrix I/O on Geode processors
- Remove yet another kernel address printk leak
- Announce microcode reload completion as requested by quite some
people. Microcode loading has become popular recently.
- Some 'Make Clang' happy fixlets
- A few cleanups for recently added code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/gart: Exclude GART aperture from kcore
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
x86/mm/pti: Make local symbols static
x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors
x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
x86/microcode: Announce reload operation's completion
x86/hyperv: Prevent potential NULL pointer dereference
x86/hpet: Prevent potential NULL pointer dereference
x86/lib: Fix indentation issue, remove extra tab
x86/boot: Restrict header scope to make Clang happy
x86/mm: Don't leak kernel addresses
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem:
- Remove secondary GIC support on systems w/o device-tree support
- A set of small fixlets in various irqchip drivers
- static and fall-through annotations
- Kernel doc and typo fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Mark expected switch case fall-through
genirq/devres: Remove excess parameter from kernel doc
irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static
irqchip/mbigen: Don't clear eventid when freeing an MSI
irqchip/stm32: Don't set rising configuration registers at init
irqchip/stm32: Don't clear rising/falling config registers at init
dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support
irqchip/mmp: Make mmp_irq_domain_ops static
irqchip/brcmstb-l2: Make two init functions static
genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
irqchip/gic: Drop support for secondary GIC in non-DT systems
irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
|
|
Pull auxdisplay updates from Miguel Ojeda:
"A few fixes and improvements for auxdisplay:
- Series to fix a memory leak in hd44780 while introducing
charlcd_free(). From Andy Shevchenko
- Series to clean up the Kconfig menus and a couple of improvements
for charlcd. From Mans Rullgard"
* tag 'auxdisplay-for-linus-v5.1-rc2' of git://github.com/ojeda/linux:
auxdisplay: charlcd: make backlight initial state configurable
auxdisplay: charlcd: simplify init message display
auxdisplay: deconfuse configuration
auxdisplay: hd44780: Convert to use charlcd_free()
auxdisplay: panel: Convert to use charlcd_free()
auxdisplay: charlcd: Introduce charlcd_free() helper
auxdisplay: charlcd: Move to_priv() to charlcd namespace
auxdisplay: hd44780: Fix memory leak on ->remove()
|
|
This patch adds a new opcode to INFO IOCTL that returns the device status.
This will allow users to query the device status in order to avoid sending
command submissions while device is in reset.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-03-20
This series includes updates to mlx5 driver,
1) Compiler warnings cleanup from Saeed Mahameed
2) Parav Pandit simplifies sriov enable/disables
3) Gustavo A. R. Silva, Removes a redundant assignment
4) Moshe Shemesh, Adds Geneve tunnel stateless offload support
5) Eli Britstein, Adds the Support for VLAN modify action and
Replaces TC VLAN pop and push actions with VLAN modify
Note: This series includes two simple non-mlx5 patches,
1) Declare IANA_VXLAN_UDP_PORT definition in include/net/vxlan.h,
and use it in some drivers.
2) Declare GENEVE_UDP_PORT definition in include/net/geneve.h,
and use it in mlx5 and nfp drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Often times, recvmsg() system calls and BH handling for a particular
TCP socket are done on different cpus.
This means the incoming skb had to be allocated on a cpu,
but freed on another.
This incurs a high spinlock contention in slab layer for small rpc,
but also a high number of cache line ping pongs for larger packets.
A full size GRO packet might use 45 page fragments, meaning
that up to 45 put_page() can be involved.
More over performing the __kfree_skb() in the recvmsg() context
adds a latency for user applications, and increase probability
of trapping them in backlog processing, since the BH handler
might found the socket owned by the user.
This patch, combined with the prior one increases the rpc
performance by about 10 % on servers with large number of cores.
(tcp_rr workload with 10,000 flows and 112 threads reach 9 Mpps
instead of 8 Mpps)
This also increases single bulk flow performance on 40Gbit+ links,
since in this case there are often two cpus working in tandem :
- CPU handling the NIC rx interrupts, feeding the receive queue,
and (after this patch) freeing the skbs that were consumed.
- CPU in recvmsg() system call, essentially 100 % busy copying out
data to user space.
Having at most one skb in a per-socket cache has very little risk
of memory exhaustion, and since it is protected by socket lock,
its management is essentially free.
Note that if rps/rfs is used, we do not enable this feature, because
there is high chance that the same cpu is handling both the recvmsg()
system call and the TCP rx path, but that another cpu did the skb
allocations in the device driver right before the RPS/RFS logic.
To properly handle this case, it seems we would need to record
on which cpu skb was allocated, and use a different channel
to give skbs back to this cpu.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.
20.69% [kernel] [k] queued_spin_lock_slowpath
5.64% [kernel] [k] _raw_spin_lock
3.83% [kernel] [k] syscall_return_via_sysret
3.48% [kernel] [k] __entry_text_start
1.76% [kernel] [k] __netif_receive_skb_core
1.64% [kernel] [k] __fget
For each sendmsg(), we allocate one skb, and free it at the time ACK packet comes.
In many cases, ACK packets are handled by another cpus, and this unfortunately
incurs heavy costs for slab layer.
This patch uses an extra pointer in socket structure, so that we try to reuse
the same skb and avoid these expensive costs.
We cache at most one skb per socket so this should be safe as far as
memory pressure is concerned.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We prefer static_branch_unlikely() over static_key_false() these days.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The queue is marked not empty after acquiring the seqlock,
and it's up to the NOLOCK qdisc clearing such flag on dequeue.
Since the empty status lays on the same cache-line of the
seqlock, it's always hot on cache during the updates.
This makes the empty flag update a little bit loosy. Given
the lack of synchronization between enqueue and dequeue, this
is unavoidable.
v2 -> v3:
- qdisc_is_empty() has a const argument (Eric)
v1 -> v2:
- use really an 'empty' flag instead of 'not_empty', as
suggested by Eric
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add documentation to the tcp_ca_state enum, since this enum is
exposed in uapi.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Sowmini Varadhan <sowmini05@gmail.com>
Acked-by: Sowmini Varadhan <sowmini05@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Valentin reported that unplugging a CPU occasionally results in a warning
in the tick broadcast code which is triggered when an offline CPU is in the
broadcast mask.
This happens because the outgoing CPU is not removing itself from the
broadcast masks, especially not from the broadcast_force_mask. The removal
happens on the control CPU after the outgoing CPU is dead. It's a long
standing issue, but the warning is harmless.
Rework the hotplug mechanism so that the outgoing CPU removes itself from
the broadcast masks after disabling interrupts and removing itself from the
online mask.
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.de
|
|
Pull io_uring fixes and improvements from Jens Axboe:
"The first five in this series are heavily inspired by the work Al did
on the aio side to fix the races there.
The last two re-introduce a feature that was in io_uring before it got
merged, but which I pulled since we didn't have a good way to have
BVEC iters that already have a stable reference. These aren't
necessarily related to block, it's just how io_uring pins fixed
buffers"
* tag 'io_uring-20190323' of git://git.kernel.dk/linux-block:
block: add BIO_NO_PAGE_REF flag
iov_iter: add ITER_BVEC_FLAG_NO_REF flag
io_uring: mark me as the maintainer
io_uring: retry bulk slab allocs as single allocs
io_uring: fix poll races
io_uring: fix fget/fput handling
io_uring: add prepped flag
io_uring: make io_read/write return an integer
io_uring: use regular request ref counts
|
|
Pull block fixes from Jens Axboe:
"A set of fixes/changes that should go into this series. This contains:
- Kernel doc / comment updates (Bart, Shenghui)
- Un-export of core-only used function (Bart)
- Fix race on loop file access (Dongli)
- pf/pcd queue cleanup fixes (me)
- Use appropriate helper for RESTART bit set (Yufen)
- Use named identifier for classic poll (Yufen)"
* tag 'for-linus-20190323' of git://git.kernel.dk/linux-block:
sbitmap: trivial - update comment for sbitmap_deferred_clear_bit
blkcg: Fix kernel-doc warnings
blk-iolatency: #include "blk.h"
block: Unexport blk_mq_add_to_requeue_list()
block: add BLK_MQ_POLL_CLASSIC for hybrid poll and return EINVAL for unexpected value
blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx
loop: access lo_backing_file only when the loop device is Lo_bound
blk-mq: use blk_mq_sched_mark_restart_hctx to set RESTART
paride/pcd: cleanup queues when detection fails
paride/pf: cleanup queues when detection fails
|
|
Pull ceph fixes from Ilya Dryomov:
"A follow up for the new alloc_size logic and a blacklisting fix,
marked for stable"
* tag 'ceph-for-5.1-rc2' of git://github.com/ceph/ceph-client:
rbd: drop wait_for_latest_osdmap()
libceph: wait for latest osdmap in ceph_monc_blacklist_add()
rbd: set io_min, io_opt and discard_granularity to alloc_size
|
|
On machines where the GART aperture is mapped over physical RAM,
/proc/kcore contains the GART aperture range. Accessing the GART range via
/proc/kcore results in a kernel crash.
vmcore used to have the same issue, until it was fixed with commit
2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging
existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
when attempting to read the aperture region, and so it won't read from the
actual memory.
Apply the same workaround for kcore. First implement the same hook
infrastructure for kcore, then reuse the hook functions introduced in the
previous vmcore fix. Just with some minor adjustment, rename some functions
for more general usage, and simplify the hook infrastructure a bit as there
is no module usage yet.
Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Dave Young <dyoung@redhat.com>
Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
|
|
When pushing tunnel headers, annotate skbs in the same way as tunnel
devices.
For GSO packets, the network stack requires certain fields set to
segment packets with tunnel headers. gro_gse_segment depends on
transport and inner mac header, for instance.
Add an option to pass this information.
Remove the restriction on len_diff to network header length, which
is too short, e.g., for GRE protocols.
Changes
v1->v2:
- document new flags
- BPF_F_ADJ_ROOM_MASK moved
v2->v3:
- BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_skb_adjust_room adjusts gso_size of gso packets to account for the
pushed or popped header room.
This is not allowed with UDP, where gso_size delineates datagrams. Add
an option to avoid these updates and allow this call for datagrams.
It can also be used with TCP, when MSS is known to allow headroom,
e.g., through MSS clamping or route MTU.
Changes v1->v2:
- document flag BPF_F_ADJ_ROOM_FIXED_GSO
- do not expose BPF_F_ADJ_ROOM_MASK through uapi, as it may change.
Link: https://patchwork.ozlabs.org/patch/1052497/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_skb_adjust_room net allows inserting room in an skb.
Existing mode BPF_ADJ_ROOM_NET inserts room after the network header
by pulling the skb, moving the network header forward and zeroing the
new space.
Add new mode BPF_ADJUST_ROOM_MAC that inserts room after the mac
header. This allows inserting tunnel headers in front of the network
header without having to recreate the network header in the original
space, avoiding two copies.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|