summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-10-26sunrpc: don't pass on-stack memory to sg_set_bufJ. Bruce Fields
As of ac4e97abce9b "scatterlist: sg_set_buf() argument must be in linear mapping", sg_set_buf hits a BUG when make_checksum_v2->xdr_process_buf, among other callers, passes it memory on the stack. We only need a scatterlist to pass this to the crypto code, and it seems like overkill to require kmalloc'd memory just to encrypt a few bytes, but for now this seems the best fix. Many of these callers are in the NFS write paths, so we allocate with GFP_NOFS. It might be possible to do without allocations here entirely, but that would probably be a bigger project. Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-26netfilter: nft_ct: add notrack supportPablo Neira Ayuso
This patch adds notrack support. I decided to add a new expression, given that this doesn't fit into the existing set operation. Notrack doesn't need a source register, and an hypothetical NFT_CT_NOTRACK key makes no sense since matching the untracked state is done through NFT_CT_STATE. I'm placing this new notrack expression into nft_ct.c, I think a single module is too much. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26netfilter: nft_meta: permit pkttype mangling in ip/ip6 preroutingLiping Zhang
After supporting this, we can combine it with hash expression to emulate the 'cluster match'. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26netfilter: nft_numgen: start round robin from zeroLiping Zhang
Currently we start round robin from 1, but it's better to start round robin from 0. This is to keep consistent with xt_statistic in iptables. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26netfilter: nf_tables: allow expressions to return STOLENFlorian Westphal
Currently not supported, we'd oops as skb was (or is) free'd elsewhere. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26netfilter: nfnetlink_log: Use GFP_NOWARN for skb allocationCalvin Owens
Since the code explicilty falls back to a smaller allocation when the large one fails, we shouldn't complain when that happens. Signed-off-by: Calvin Owens <calvinowens@fb.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26netfilter: xt_multiport: Use switch case instead of multiple condition checksGao Feng
There are multiple equality condition checks in the original codes, so it is better to use switch case instead of them. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-26cfg80211: process events caused by suspend before suspendingJohannes Berg
When suspending without WoWLAN, cfg80211 will ask drivers to disconnect. Even when the driver does this synchronously, and immediately returns with a notification, cfg80211 schedules the handling thereof to a workqueue, and may then call back into the driver when the driver was already suspended/ing. Fix this by processing all events caused by cfg80211_leave_all() directly after that function returns. The driver still needs to do the right thing here and wait for the firmware response, but that is - at least - true for mwifiex where this occurred. Reported-by: Amitkumar Karwar <akarwar@marvell.com> Tested-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-23net: ip, diag -- Add diag interface for raw socketsCyrill Gorcunov
In criu we are actively using diag interface to collect sockets present in the system when dumping applications. And while for unix, tcp, udp[lite], packet, netlink it works as expected, the raw sockets do not have. Thus add it. v2: - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@) - implement @destroy for diag requests (by dsa@) v3: - add export of raw_abort for IPv6 (by dsa@) - pass net-admin flag into inet_sk_diag_fill due to changes in net-next branch (by dsa@) v4: - use @pad in struct inet_diag_req_v2 for raw socket protocol specification: raw module carries sockets which may have custom protocol passed from socket() syscall and sole @sdiag_protocol is not enough to match underlied ones - start reporting protocol specifed in socket() call when sockets are raw ones for the same reason: user space tools like ss may parse this attribute and use it for socket matching v5 (by eric.dumazet@): - use sock_hold in raw_sock_get instead of atomic_inc, we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock when looking up so counter won't be zero here. v6: - use sdiag_raw_protocol() helper which will access @pad structure used for raw sockets protocol specification: we can't simply rename this member without breaking uapi v7: - sine sdiag_raw_protocol() helper is not suitable for uapi lets rather make an alias structure with proper names. __check_inet_diag_req_raw helper will catch if any of structure unintentionally changed. CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2016-10-21 Here are some more Bluetooth fixes for the 4.9 kernel: - Fix to btwilink driver probe function return value - Power management fix to hci_bcm - Fix to encoding name in scan response data Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23lwt: Remove unused len fieldThomas Graf
The field is initialized by ILA and MPLS but never used. Remove it. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23net: sctp, forbid negative lengthJiri Slaby
Most of getsockopt handlers in net/sctp/socket.c check len against sizeof some structure like: if (len < sizeof(int)) return -EINVAL; On the first look, the check seems to be correct. But since len is int and sizeof returns size_t, int gets promoted to unsigned size_t too. So the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is false. Fix this in sctp by explicitly checking len < 0 before any getsockopt handler is called. Note that sctp_getsockopt_events already handled the negative case. Since we added the < 0 check elsewhere, this one can be removed. If not checked, this is the result: UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19 shift exponent 52 is too large for 32-bit type 'int' CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422 Call Trace: [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300 ... [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp] [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp] [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23ipv6: do not increment mac header when it's unsetJason A. Donenfeld
Otherwise we'll overflow the integer. This occurs when layer 3 tunneled packets are handed off to the IPv6 layer. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23net: allow to kill a task which waits net_mutex in copy_new_nsAndrey Vagin
net_mutex can be locked for a long time. It may be because many namespaces are being destroyed or many processes decide to create a network namespace. Both these operations are heavy, so it is better to have an ability to kill a process which is waiting net_mutex. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID framesShmulik Ladkani
META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci) is zero, then no vlan accel tag is present. This is incorrect for zero VID vlan accel packets, making the following match fail: tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ... Apparently 'int_vlan_tag' was implemented prior VLAN_TAG_PRESENT was introduced in 05423b2 "vlan: allow null VLAN ID to be used" (and at time introduced, the 'vlan_tx_tag_get' call in em_meta was not adapted). Fix, testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's value. Fixes: 05423b2413 ("vlan: allow null VLAN ID to be used") Fixes: 1a31f2042e ("netsched: Allow meta match on vlan tag on receive") Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22bpf: add helper for retrieving current numa node idDaniel Borkmann
Use case is mainly for soreuseport to select sockets for the local numa node, but since generic, lets also add this for other networking and tracing program types. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22udp: use it's own memory accounting schemaPaolo Abeni
Completely avoid default sock memory accounting and replace it with udp-specific accounting. Since the new memory accounting model encapsulates completely the required locking, remove the socket lock on both enqueue and dequeue, and avoid using the backlog on enqueue. Be sure to clean-up rx queue memory on socket destruction, using udp its own sk_destruct. Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport. nr readers Kpps (vanilla) Kpps (patched) 1 170 440 3 1250 2150 6 3000 3650 9 4200 4450 12 5700 6250 v4 -> v5: - avoid unneeded test in first_packet_length v3 -> v4: - remove useless sk_rcvqueues_full() call v2 -> v3: - do not set the now unsed backlog_rcv callback v1 -> v2: - add memory pressure support - fixed dropwatch accounting for ipv6 Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22udp: implement memory accounting helpersPaolo Abeni
Avoid using the generic helpers. Use the receive queue spin lock to protect the memory accounting operation, both on enqueue and on dequeue. On dequeue perform partial memory reclaiming, trying to leave a quantum of forward allocated memory. On enqueue use a custom helper, to allow some optimizations: - use a plain spin_lock() variant instead of the slightly costly spin_lock_irqsave(), - avoid dst_force check, since the calling code has already dropped the skb dst - avoid orphaning the skb, since skb_steal_sock() already did the work for us The above needs custom memory reclaiming on shutdown, provided by the udp_destruct_sock(). v5 -> v6: - don't orphan the skb on enqueue v4 -> v5: - replace the mem_lock with the receive queue spin lock - ensure that the bh is always allowed to enqueue at least a skb, even if sk_rcvbuf is exceeded v3 -> v4: - reworked memory accunting, simplifying the schema - provide an helper for both memory scheduling and enqueuing v1 -> v2: - use a udp specific destrctor to perform memory reclaiming - remove a couple of helpers, unneeded after the above cleanup - do not reclaim memory on dequeue if not under memory pressure - reworked the fwd accounting schema to avoid potential integer overflow Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22net/socket: factor out helpers for memory and queue manipulationPaolo Abeni
Basic sock operations that udp code can use with its own memory accounting schema. No functional change is introduced in the existing APIs. v4 -> v5: - avoid whitespace changes v2 -> v4: - avoid exporting __sock_enqueue_skb v1 -> v2: - avoid export sock_rmem_free Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22ipv4: use the right lock for ping_group_rangeWANG Cong
This reverts commit a681574c99be23e4d20b769bf0e543239c364af5 ("ipv4: disable BH in set_ping_group_range()") because we never read ping_group_range in BH context (unlike local_port_range). Then, since we already have a lock for ping_group_range, those using ip_local_ports.lock for ping_group_range are clearly typos. We might consider to share a same lock for both ping_group_range and local_port_range w.r.t. space saving, but that should be for net-next. Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()") Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL") Cc: Eric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22netns: revert "netns: avoid disabling irq for netns id"Paul Moore
This reverts commit bc51dddf98c9 ("netns: avoid disabling irq for netns id") as it was found to cause problems with systems running SELinux/audit, see the mailing list thread below: * http://marc.info/?t=147694653900002&r=1&w=2 Eventually we should be able to reintroduce this code once we have rewritten the audit multicast code to queue messages much the same way we do for unicast messages. A tracking issue for this can be found below: * https://github.com/linux-audit/audit-kernel/issues/23 Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Reported-by: Elad Raz <e@eladraz.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21net: remove MTU limits on a few ether_setup callersJarod Wilson
These few drivers call ether_setup(), but have no ndo_change_mtu, and thus were overlooked for changes to MTU range checking behavior. They previously had no range checks, so for feature-parity, set their min_mtu to 0 and max_mtu to ETH_MAX_MTU (65535), instead of the 68 and 1500 inherited from the ether_setup() changes. Fine-tuning can come after we get back to full feature-parity here. CC: netdev@vger.kernel.org Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> CC: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> CC: R Parameswaran <parameswaran.r7@gmail.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21ipv6: fix a potential deadlock in do_ipv6_setsockopt()WANG Cong
Baozeng reported this deadlock case: CPU0 CPU1 ---- ---- lock([ 165.136033] sk_lock-AF_INET6); lock([ 165.136033] rtnl_mutex); lock([ 165.136033] sk_lock-AF_INET6); lock([ 165.136033] rtnl_mutex); Similar to commit 87e9f0315952 ("ipv4: fix a potential deadlock in mcast getsockopt() path") this is due to we still have a case, ipv6_sock_mc_close(), where we acquire sk_lock before rtnl_lock. Close this deadlock with the similar solution, that is always acquire rtnl lock first. Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket") Reported-by: Baozeng Ding <sploving1@gmail.com> Tested-by: Baozeng Ding <sploving1@gmail.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from Geert Uytterhoeven. 2) Fix wrong timeout in set elements added from packet path via nft_dynset, from Anders K. Pedersen. 3) Remove obsolete nf_conntrack_events_retry_timeout sysctl documentation, from Nicolas Dichtel. 4) Ensure proper initialization of log flags via xt_LOG, from Liping Zhang. 5) Missing alias to autoload ipcomp, also from Liping Zhang. 6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping. 7) Wrong integer type in the new nft_parse_u32_check() function, from Dan Carpenter. 8) Another wrong integer type declaration in nft_exthdr_init, also from Dan Carpenter. 9) Fix insufficient mode validation in nft_range. 10) Fix compilation warning in nft_range due to possible uninitialized value, from Arnd Bergmann. 11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to calm down kmemcheck, from Florian Westphal. 12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached, from Nicolas Dichtel. 13) Fix nf_queue() after conversion to single-linked hook list, related to incorrect bypass flag handling and incorrect hook point of reinjection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21batman-adv: fix splat on disabling an interfaceLinus Lüssing
As long as there is still a reference for a hard interface held, there might still be a forwarding packet relying on its attributes. Therefore avoid setting hard_iface->soft_iface to NULL when disabling a hard interface. This fixes the following, potential splat: batman_adv: bat0: Interface deactivated: eth1 batman_adv: bat0: Removing interface: eth1 cgroup: new mount options do not match the existing superblock, will be ignored batman_adv: bat0: Interface deactivated: eth3 batman_adv: bat0: Removing interface: eth3 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 1986 at ./net/batman-adv/bat_iv_ogm.c:549 batadv_iv_send_outstanding_bat_ogm_packet+0x145/0x643 [batman_adv] Modules linked in: batman_adv(O-) <...> CPU: 3 PID: 1986 Comm: kworker/u8:2 Tainted: G W O 4.6.0-rc6+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet [batman_adv] 0000000000000000 ffff88001d93bca0 ffffffff8126c26b 0000000000000000 0000000000000000 ffff88001d93bcf0 ffffffff81051615 ffff88001f19f818 000002251d93bd68 0000000000000046 ffff88001dc04a00 ffff88001becbe48 Call Trace: [<ffffffff8126c26b>] dump_stack+0x67/0x90 [<ffffffff81051615>] __warn+0xc7/0xe5 [<ffffffff8105164b>] warn_slowpath_null+0x18/0x1a [<ffffffffa0356f24>] batadv_iv_send_outstanding_bat_ogm_packet+0x145/0x643 [batman_adv] [<ffffffff8108b01f>] ? __lock_is_held+0x32/0x54 [<ffffffff810689a2>] process_one_work+0x2a8/0x4f5 [<ffffffff81068856>] ? process_one_work+0x15c/0x4f5 [<ffffffff81068df2>] worker_thread+0x1d5/0x2c0 [<ffffffff81068c1d>] ? process_scheduled_works+0x2e/0x2e [<ffffffff81068c1d>] ? process_scheduled_works+0x2e/0x2e [<ffffffff8106dd90>] kthread+0xc0/0xc8 [<ffffffff8144de82>] ret_from_fork+0x22/0x40 [<ffffffff8106dcd0>] ? __init_kthread_worker+0x55/0x55 ---[ end trace 647f9f325123dc05 ]--- What happened here is, that there was still a forw_packet (here: a BATMAN IV OGM) in the queue of eth3 with the forw_packet->if_incoming set to eth1 and the forw_packet->if_outgoing set to eth3. When eth3 is to be deactivated and removed, then this thread waits for the forw_packet queued on eth3 to finish. Because eth1 was deactivated and removed earlier and by that had forw_packet->if_incoming->soft_iface, set to NULL, the splat when trying to send/flush the OGM on eth3 occures. Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> [sven@narfation.org: Reduced size of Oops message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-20ipv4/6: use core net MTU range checkingJarod Wilson
ipv4/ip_tunnel: - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen - preserve all ndo_change_mtu checks for now to prevent regressions ipv6/ip6_tunnel: - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - preserve all ndo_change_mtu checks for now to prevent regressions ipv6/ip6_vti: - min_mtu = 1280, max_mtu = 65535 - remove redundant vti6_change_mtu ipv6/sit: - min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen - remove redundant ipip6_tunnel_change_mtu CC: netdev@vger.kernel.org CC: "David S. Miller" <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in misc driversJarod Wilson
firewire-net: - set min/max_mtu - remove fwnet_change_mtu nes: - set max_mtu - clean up nes_netdev_change_mtu xpnet: - set min/max_mtu - remove xpnet_dev_change_mtu hippi: - set min/max_mtu - remove hippi_change_mtu batman-adv: - set max_mtu - remove batadv_interface_change_mtu - initialization is a little async, not 100% certain that max_mtu is set in the optimal place, don't have hardware to test with rionet: - set min/max_mtu - remove rionet_change_mtu slip: - set min/max_mtu - streamline sl_change_mtu um/net_kern: - remove pointless ndo_change_mtu hsi/clients/ssi_protocol: - use core MTU range checking - remove now redundant ssip_pn_set_mtu ipoib: - set a default max MTU value - Note: ipoib's actual max MTU can vary, depending on if the device is in connected mode or not, so we'll just set the max_mtu value to the max possible, and let the ndo_change_mtu function continue to validate any new MTU change requests with checks for CM or not. Note that ipoib has no min_mtu set, and thus, the network core's mtu > 0 check is the only lower bounds here. mptlan: - use net core MTU range checking - remove now redundant mpt_lan_change_mtu fddi: - min_mtu = 21, max_mtu = 4470 - remove now redundant fddi_change_mtu (including export) fjes: - min_mtu = 8192, max_mtu = 65536 - The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to get past the core net MTU range checks so fjes_change_mtu can validate a new MTU against what it supports (see fjes_support_mtu in fjes_hw.c) hsr: - min_mtu = 0 (calls ether_setup, max_mtu is 1500) f_phonet: - min_mtu = 6, max_mtu = 65541 u_ether: - min_mtu = 14, max_mtu = 15412 phonet/pep-gprs: - min_mtu = 576, max_mtu = 65530 - remove redundant gprs_set_mtu CC: netdev@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: Stefan Richter <stefanr@s5r6.in-berlin.de> CC: Faisal Latif <faisal.latif@intel.com> CC: linux-rdma@vger.kernel.org CC: Cliff Whickman <cpw@sgi.com> CC: Robin Holt <robinmholt@gmail.com> CC: Jes Sorensen <jes@trained-monkey.org> CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sathya Prakash <sathya.prakash@broadcom.com> CC: Chaitra P B <chaitra.basappa@broadcom.com> CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> CC: MPT-FusionLinux.pdl@broadcom.com CC: Sebastian Reichel <sre@kernel.org> CC: Felipe Balbi <balbi@kernel.org> CC: Arvid Brodin <arvid.brodin@alten.se> CC: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in core net infraJarod Wilson
geneve: - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu - This one isn't quite as straight-forward as others, could use some closer inspection and testing macvlan: - set min/max_mtu tun: - set min/max_mtu, remove tun_net_change_mtu vxlan: - Merge __vxlan_change_mtu back into vxlan_change_mtu - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function - This one is also not as straight-forward and could use closer inspection and testing from vxlan folks bridge: - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function openvswitch: - set min/max_mtu, remove internal_dev_change_mtu - note: max_mtu wasn't checked previously, it's been set to 65535, which is the largest possible size supported sch_teql: - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535) macsec: - min_mtu = 0, max_mtu = 65535 macvlan: - min_mtu = 0, max_mtu = 65535 ntb_netdev: - min_mtu = 0, max_mtu = 65535 veth: - min_mtu = 68, max_mtu = 65535 8021q: - min_mtu = 0, max_mtu = 65535 CC: netdev@vger.kernel.org CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Tom Herbert <tom@herbertland.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Paolo Abeni <pabeni@redhat.com> CC: Jiri Benc <jbenc@redhat.com> CC: WANG Cong <xiyou.wangcong@gmail.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Pravin B Shelar <pshelar@ovn.org> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Patrick McHardy <kaber@trash.net> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Pravin Shelar <pshelar@nicira.com> CC: Maxim Krasnyansky <maxk@qti.qualcomm.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in WAN driversJarod Wilson
- set min/max_mtu in all hdlc drivers, remove hdlc_change_mtu - sent max_mtu in lec driver, remove lec_change_mtu - set min/max_mtu in x25_asy driver CC: netdev@vger.kernel.org CC: Krzysztof Halasa <khc@pm.waw.pl> CC: Krzysztof Halasa <khalasa@piap.pl> CC: Jan "Yenya" Kasprzak <kas@fi.muni.cz> CC: Francois Romieu <romieu@fr.zoreil.com> CC: Kevin Curtis <kevin.curtis@farsite.co.uk> CC: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: use core MTU range checking in wireless driversJarod Wilson
- set max_mtu in wil6210 driver - set max_mtu in atmel driver - set min/max_mtu in cisco airo driver, remove airo_change_mtu - set min/max_mtu in ipw2100/ipw2200 drivers, remove libipw_change_mtu - set min/max_mtu in p80211netdev, remove wlan_change_mtu - set min/max_mtu in net/mac80211/iface.c and remove ieee80211_change_mtu - set min/max_mtu in wimax/i2400m and remove i2400m_change_mtu - set min/max_mtu in intersil/hostap and remove prism2_change_mtu - set min/max_mtu in intersil/orinoco - set min/max_mtu in tty/n_gsm and remove gsm_change_mtu CC: netdev@vger.kernel.org CC: linux-wireless@vger.kernel.org CC: Maya Erez <qca_merez@qca.qualcomm.com> CC: Simon Kelley <simon@thekelleys.org.uk> CC: Stanislav Yakovlev <stas.yakovlev@gmail.com> CC: Johannes Berg <johannes@sipsolutions.net> CC: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20ipv4: disable BH in set_ping_group_range()Eric Dumazet
In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port range") Cong added BH protection in set_local_port_range() but missed that same fix was needed in set_ping_group_range() Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Eric Salo <salo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20udp: must lock the socket in udp_disconnect()Eric Dumazet
Baozeng Ding reported KASAN traces showing uses after free in udp_lib_get_port() and other related UDP functions. A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash. I could write a reproducer with two threads doing : static int sock_fd; static void *thr1(void *arg) { for (;;) { connect(sock_fd, (const struct sockaddr *)arg, sizeof(struct sockaddr_in)); } } static void *thr2(void *arg) { struct sockaddr_in unspec; for (;;) { memset(&unspec, 0, sizeof(unspec)); connect(sock_fd, (const struct sockaddr *)&unspec, sizeof(unspec)); } } Problem is that udp_disconnect() could run without holding socket lock, and this was causing list corruptions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net: add recursion limit to GROSabrina Dubroca
Currently, GRO can do unlimited recursion through the gro_receive handlers. This was fixed for tunneling protocols by limiting tunnel GRO to one level with encap_mark, but both VLAN and TEB still have this problem. Thus, the kernel is vulnerable to a stack overflow, if we receive a packet composed entirely of VLAN headers. This patch adds a recursion counter to the GRO layer to prevent stack overflow. When a gro_receive function hits the recursion limit, GRO is aborted for this skb and it is processed normally. This recursion counter is put in the GRO CB, but could be turned into a percpu counter if we run out of space in the CB. Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report. Fixes: CVE-2016-7039 Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.") Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20ipv6: properly prevent temp_prefered_lft sysctl raceJiri Bohac
The check for an underflow of tmp_prefered_lft is always false because tmp_prefered_lft is unsigned. The intention of the check was to guard against racing with an update of the temp_prefered_lft sysctl, potentially resulting in an underflow. As suggested by David Miller, the best way to prevent the race is by reading the sysctl variable using READ_ONCE. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20netfilter: fix nf_queue handlingPablo Neira Ayuso
nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace list_head with single linked list") for two reasons: 1) If the bypass flag is set on, there are no userspace listeners and we still have more hook entries to iterate over, then jump to the next hook. Otherwise accept the packet. On nf_reinject() path, the okfn() needs to be invoked. 2) We should not re-enter the same hook on packet reinjection. If the packet is accepted, we have to skip the current hook from where the packet was enqueued, otherwise the packets gets enqueued over and over again. This restores the previous list_for_each_entry_continue() behaviour happening from nf_iterate() that was dealing with these two cases. This patch introduces a new nf_queue() wrapper function so this fix becomes simpler. Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reachedNicolas Dichtel
When the maximum evictions number is reached, do not wait 5 seconds before the next run. CC: Florian Westphal <fw@strlen.de> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-20tcp: relax listening_hash operationsEric Dumazet
softirq handlers use RCU protection to lookup listeners, and write operations all happen from process context. We do not need to block BH for dump operations. Also SYN_RECV since request sockets are stored in the ehash table : 1) inet_diag_dump_icsk() no longer need to clear cb->args[3] and cb->args[4] that were used as cursors while iterating the old per listener hash table. 2) Also factorize a test : No need to scan listening_hash[] if r->id.idiag_dport is not zero. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Improve HNCDSC AEN handlerGavin Shan
This improves AEN handler for Host Network Controller Driver Status Change (HNCDSC): * The channel's lock should be hold when accessing its state. * Do failover when host driver isn't ready. * Configure channel when host driver becomes ready. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Choose hot channel as active one if necessaryGavin Shan
The issue was found on BCM5718 which has two NCSI channels in one package: C0 and C1. C0 is in link-up state while C1 is in link-down state. C0 is chosen as active channel until unplugging and plugging C0's cable: On unplugging C0's cable, LSC (Link State Change) AEN packet received on C0 to report link-down event. After that, C1 is chosen as active channel. LSC AEN for link-up event is lost on C0 when plugging C0's cable back. We lose the network even C0 is usable. This resolves the issue by recording the (hot) channel that was ever chosen as active one. The hot channel is chosen to be active one if none of available channels in link-up state. With this, C0 is still the active one after unplugging C0's cable. LSC AEN packet received on C0 when plugging its cable back. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Fix stale link state of inactive channels on failoverGavin Shan
The issue was found on BCM5718 which has two NCSI channels in one package: C0 and C1. Both of them are connected to different LANs, means they are in link-up state and C0 is chosen as the active one until resetting BCM5718 happens as below. Resetting BCM5718 results in LSC (Link State Change) AEN packet received on C0, meaning LSC AEN is missed on C1. When LSC AEN packet received on C0 to report link-down, it fails over to C1 because C1 is in link-up state as software can see. However, C1 is in link-down state in hardware. It means the link state is out of synchronization between hardware and software, resulting in inappropriate channel (C1) selected as active one. This resolves the issue by sending separate GLS (Get Link Status) commands to all channels in the package before trying to do failover. The last link states of all channels in the package are retrieved. With it, C0 (not C1) is selected as active one as expected. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/ncsi: Avoid if statements in ncsi_suspend_channel()Gavin Shan
There are several if/else statements in the state machine implemented by switch/case in ncsi_suspend_channel() to avoid duplicated code. It makes the code a bit hard to be understood. This drops if/else statements in ncsi_suspend_channel() to improve the code readability as Joel Stanley suggested. Also, it becomes easy to add more states in the state machine without affecting current code. No logical changes introduced by this. Suggested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20ila: Fix tailroom allocation of lwtstateThomas Graf
Tailroom is supposed to be of length sizeof(struct ila_lwt) but sizeof(struct ila_params) is currently allocated. This leads to the dst_cache and connected member of ila_lwt being referenced out of bounds. struct ila_lwt { struct ila_params p; struct dst_cache dst_cache; u32 connected : 1; }; Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20net/sched: act_mirred: Use passed lastuse argumentPaul Blakey
stats_update callback is called by NIC drivers doing hardware offloading of the mirred action. Lastuse is passed as argument to specify when the stats was actually last updated and is not always the current time. Fixes: 9798e6fe4f9b ('net: act_mirred: allow statistic updates from offloaded actions') Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19openvswitch: remove unnecessary EXPORT_SYMBOLsJiri Benc
Some symbols exported to other modules are really used only by openvswitch.ko. Remove the exports. Tested by loading all 4 openvswitch modules, nothing breaks. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19openvswitch: remove unused functionsJiri Benc
ovs_vport_deferred_free is not used anywhere. It's the only caller of free_vport_rcu thus this one can be removed, too. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19Bluetooth: Fix append max 11 bytes of name to scan rsp dataMichał Narajowski
Append maximum of 10 + 1 bytes of name to scan response data. Complete name is appended only if exists and is <= 10 characters. Else append short name if exists or shorten complete name if not. This makes sure name is consistent across multiple advertising instances. Signed-off-by: Michał Narajowski <michal.narajowski@codecoup.pl> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-10-19netfilter: x_tables: suppress kmemcheck warningFlorian Westphal
Markus Trippelsdorf reports: WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88001e605480) 4055601e0088ffff000000000000000090686d81ffffffff0000000000000000 u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u ^ |RIP: 0010:[<ffffffff8166e561>] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160 [..] [<ffffffff8166e561>] nf_register_net_hook+0x51/0x160 [<ffffffff8166eaaf>] nf_register_net_hooks+0x3f/0xa0 [<ffffffff816d6715>] ipt_register_table+0xe5/0x110 [..] This warning is harmless; we copy 'uninitialized' data from the hook ops but it will not be used. Long term the structures keeping run-time data should be disentangled from those only containing config-time data (such as where in the list to insert a hook), but thats -next material. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-19Merge branch 'gup_flag-cleanups'Linus Torvalds
Merge the gup_flags cleanups from Lorenzo Stoakes: "This patch series adjusts functions in the get_user_pages* family such that desired FOLL_* flags are passed as an argument rather than implied by flags. The purpose of this change is to make the use of FOLL_FORCE explicit so it is easier to grep for and clearer to callers that this flag is being used. The use of FOLL_FORCE is an issue as it overrides missing VM_READ/VM_WRITE flags for the VMA whose pages we are reading from/writing to, which can result in surprising behaviour. The patch series came out of the discussion around commit 38e088546522 ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"), which addressed a BUG_ON() being triggered when a page was faulted in with PROT_NONE set but having been overridden by FOLL_FORCE. do_numa_page() was run on the assumption the page _must_ be one marked for NUMA node migration as an actual PROT_NONE page would have been dealt with prior to this code path, however FOLL_FORCE introduced a situation where this assumption did not hold. See https://marc.info/?l=linux-mm&m=147585445805166 for the patch proposal" Additionally, there's a fix for an ancient bug related to FOLL_FORCE and FOLL_WRITE by me. [ This branch was rebased recently to add a few more acked-by's and reviewed-by's ] * gup_flag-cleanups: mm: replace access_process_vm() write parameter with gup_flags mm: replace access_remote_vm() write parameter with gup_flags mm: replace __access_remote_vm() write parameter with gup_flags mm: replace get_user_pages_remote() write/force parameters with gup_flags mm: replace get_user_pages() write/force parameters with gup_flags mm: replace get_vaddr_frames() write/force parameters with gup_flags mm: replace get_user_pages_locked() write/force parameters with gup_flags mm: replace get_user_pages_unlocked() write/force parameters with gup_flags mm: remove write/force parameters from __get_user_pages_unlocked() mm: remove write/force parameters from __get_user_pages_locked() mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
2016-10-19tcp: do not export sysctl_tcp_low_latencyEric Dumazet
Since commit b2fb4f54ecd4 ("tcp: uninline tcp_prequeue()") we no longer access sysctl_tcp_low_latency from a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19switchdev: Execute bridge ndos only for bridge portsIdo Schimmel
We recently got the following warning after setting up a vlan device on top of an offloaded bridge and executing 'bridge link': WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum] [...] CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1 Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015 0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3 0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b 0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990 Call Trace: [<ffffffff8135eaa3>] dump_stack+0x63/0x90 [<ffffffff8108c43b>] __warn+0xcb/0xf0 [<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20 [<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum] [<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum] [<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140 [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140 [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140 [<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0 [<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90 [<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190 [<ffffffff8161a1b2>] netlink_dump+0x122/0x290 [<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190 [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230 [<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220 [<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0 [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230 [<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890 [<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0 [<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30 [<ffffffff8161c92c>] netlink_unicast+0x18c/0x240 [<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0 [<ffffffff815c5a48>] sock_sendmsg+0x38/0x50 [<ffffffff815c6031>] SYSC_sendto+0x101/0x190 [<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90 [<ffffffff815c6b6e>] SyS_sendto+0xe/0x10 [<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4 The problem is that the 8021q module propagates the call to ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't recognize the netdev, as it's not offloaded. While we can ignore calls being made to non-bridge ports inside the driver, a better fix would be to push this check up to the switchdev layer. Note that these ndos can be called for non-bridged netdev, but this only happens in certain PF drivers which don't call the corresponding switchdev functions anyway. Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Tamir Winetroub <tamirw@mellanox.com> Tested-by: Tamir Winetroub <tamirw@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>