summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-29bnxt_en: Support all variants of the 5750X chip family.Michael Chan
Define the 57508, 57504, and 57502 chip IDs that are all part of the BNXT_CHIP_P5 family of chips. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Refactor bnxt_init_one() and turn on TPA support on 57500 chips.Michael Chan
With the new TPA feature in the 57500 chips, we need to discover the feature first before setting up the netdev features. Refactor the the firmware probe and init logic more cleanly into 2 functions and and make these calls before setting up the netdev features. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Support TPA counters on 57500 chips.Michael Chan
Support the new expanded TPA v2 counters on 57500 B0 chips for ethtool -S. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Allocate the larger per-ring statistics block for 57500 chips.Michael Chan
The new TPA implemantation has additional TPA counters that extend the per-ring statistics block. Allocate the proper size accordingly. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Refactor ethtool ring statistics logic.Michael Chan
The current code assumes that the per ring statistics counters are fixed. In newer chips that support a newer version of TPA, the TPA counters are also changed. Refactor the code by defining these counter names in arrays so that it is easy to add a new array for a new set of counters supported by the newer chips. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Add hardware GRO setup function for 57500 chips.Michael Chan
Add a more optimized hardware GRO function to setup the SKB on 57500 chips. Some workaround code is no longer needed on 57500 chips and the pseudo checksum is also calculated in hardware, so no need to do the software pseudo checksum in the driver. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Add TPA ID mapping logic for 57500 chips.Michael Chan
The new TPA feature on 57500 supports a larger number of concurrent TPAs (up to 1024) divided among the functions. We need to add some logic to map the hardware TPA ID to a software index that keeps track of each TPA in progress. A 1:1 direct mapping without translation would be too wasteful as we would have to allocate 1024 TPA structures for each RX ring on each PCI function. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Add fast path logic for TPA on 57500 chips.Michael Chan
With all the previous refactoring, the TPA fast path can now be modified slightly to support TPA on the new chips. The main difference is that the agg completions are retrieved differently using the bnxt_get_tpa_agg_p5() function on the new chips. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Set TPA GRO mode flags on 57500 chips properly.Michael Chan
On 57500 chips, hardware GRO mode cannot be determined from the TPA end, so we need to check bp->flags to determine if we are in hardware GRO mode or not. Modify bnxt_set_features so that the TPA flags in bp->flags don't change until the device is closed. This will ensure that the fast path can safely rely on bp->flags to determine the TPA mode. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Refactor tunneled hardware GRO logic.Michael Chan
The 2 GRO functions to set up the hardware GRO SKB fields for 2 different hardware chips have practically identical logic for tunneled packets. Refactor the logic into a separate bnxt_gro_tunnel() function that can be used by both functions. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Handle standalone RX_AGG completions.Michael Chan
On the new 57500 chips, these new RX_AGG completions are not coalesced at the TPA_END completion. Handle these by storing them in the array in the bnxt_tpa_info struct, as they are seen when processing the CMPL ring. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.Michael Chan
Add an aggregation array to bnxt_tpa_info struct to keep track of the aggregation completions. The aggregation completions are not completed at the TPA_END completion on 57500 chips so we need to keep track of them. The array is only allocated on the new chips when required. An agg_count field is also added to keep track of the number of these completions. The maximum concurrent TPA is now discovered from firmware instead of the hardcoded 64. Add a new bp->max_tpa to keep track of maximum configured TPA. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Refactor TPA logic.Michael Chan
Refactor the TPA logic slightly, so that the code can be more easily extended to support TPA on the new 57500 chips. In particular, the logic to get the next aggregation completion is refactored into a new function bnxt_get_agg() so that this operation is made more generalized. This operation will be different on the new chip in TPA mode. The logic to recycle the aggregation buffers has a new start index parameter added for the same purpose. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Add TPA structure definitions for BCM57500 chips.Michael Chan
The new chips have a slightly modified TPA interface for LRO/GRO_HW. Modify the TPA structures so that the same structures can also be used on the new chips. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29bnxt_en: Update firmware interface spec. to 1.10.0.89.Michael Chan
Among the changes are new CoS discard counters and new ctx_hw_stats_ext struct for the latest 5750X B0 chips. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29can: fix ioctl function removalOliver Hartkopp
Commit 60649d4e0af ("can: remove obsolete empty ioctl() handler") replaced the almost empty can_ioctl() function with sock_no_ioctl() which always returns -EOPNOTSUPP. Even though we don't have any ioctl() functions on socket/network layer we need to return -ENOIOCTLCMD to be able to forward ioctl commands like SIOCGIFINDEX to the network driver layer. This patch fixes the wrong return codes in the CAN network layer protocols. Reported-by: kernel test robot <rong.a.chen@intel.com> Fixes: 60649d4e0af ("can: remove obsolete empty ioctl() handler") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: hamradio: baycom_epp: Mark expected switch fall-throughGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warning (Building: i386): drivers/net/hamradio/baycom_epp.c: In function ‘transmit’: drivers/net/hamradio/baycom_epp.c:491:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (i) { ^ drivers/net/hamradio/baycom_epp.c:504:3: note: here default: /* fall through */ ^~~~~~~ Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: wan: sdla: Mark expected switch fall-throughGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warning (Building: i386): drivers/net/wan/sdla.c: In function ‘sdla_errors’: drivers/net/wan/sdla.c:414:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (cmd == SDLA_INFORMATION_WRITE) ^ drivers/net/wan/sdla.c:417:3: note: here default: ^~~~~~~ Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: sctp: drop unneeded likely() call around IS_ERR()Enrico Weigelt
IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabledPetr Machata
Spectrum systems have a configurable limit on how far into the packet they parse. By default, the limit is 96 bytes. An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and sequence ID of a PTP event is only available 32 bytes into payload, for a total of 94 bytes. When an additional 802.1q header is present as well (such as when ptp4l is running on a VLAN port), the parsing limit is exceeded. Such packets are not recognized as PTP, and are not timestamped. Therefore generalize the current VXLAN-specific parsing depth setting to allow reference-counted requests from other modules as well. Keep it in the VXLAN module, because the MPRS register also configures UDP destination port number used for VXLAN, and is thus closely tied to the VXLAN code anyway. Then invoke the new interfaces from both VXLAN (in obvious places), as well as from PTP code, when the (global) timestamping configuration changes from disabled to enabled or vice versa. Fixes: 8748642751ed ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29Merge branch 'devmap_hash'Alexei Starovoitov
Toke Høiland-Jørgensen says: ==================== This series adds a new map type, devmap_hash, that works like the existing devmap type, but using a hash-based indexing scheme. This is useful for the use case where a devmap is indexed by ifindex (for instance for use with the routing table lookup helper). For this use case, the regular devmap needs to be sized after the maximum ifindex number, not the number of devices in it. A hash-based indexing scheme makes it possible to size the map after the number of devices it should contain instead. This was previously part of my patch series that also turned the regular bpf_redirect() helper into a map-based one; for this series I just pulled out the patches that introduced the new map type. Changelog: v5: - Dynamically set the number of hash buckets by rounding up max_entries to the nearest power of two (mirroring the regular hashmap), as suggested by Jesper. v4: - Remove check_memlock parameter that was left over from an earlier patch series. - Reorder struct members to avoid holes. v3: - Rework the split into different patches - Use spin_lock_irqsave() - Also add documentation and bash completion definitions for bpftool v2: - Split commit adding the new map type so uapi and tools changes are separate. Changes to these patches since the previous series: - Rebase on top of the other devmap changes (makes this one simpler!) - Don't enforce key==val, but allow arbitrary indexes. - Rename the type to devmap_hash to reflect the fact that it's just a hashmap now. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29tools: Add definitions for devmap_hash map typeToke Høiland-Jørgensen
This adds selftest and bpftool updates for the devmap_hash map type. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29tools/libbpf_probes: Add new devmap_hash typeToke Høiland-Jørgensen
This adds the definition for BPF_MAP_TYPE_DEVMAP_HASH to libbpf_probes.c in tools/lib/bpf. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29tools/include/uapi: Add devmap_hash BPF map typeToke Høiland-Jørgensen
This adds the devmap_hash BPF map type to the uapi headers in tools/. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29xdp: Add devmap_hash map type for looking up devices by hashed indexToke Høiland-Jørgensen
A common pattern when using xdp_redirect_map() is to create a device map where the lookup key is simply ifindex. Because device maps are arrays, this leaves holes in the map, and the map has to be sized to fit the largest ifindex, regardless of how many devices actually are actually needed in the map. This patch adds a second type of device map where the key is looked up using a hashmap, instead of being used as an array index. This allows maps to be densely packed, so they can be smaller. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29xdp: Refactor devmap allocation code for reuseToke Høiland-Jørgensen
The subsequent patch to add a new devmap sub-type can re-use much of the initialisation and allocation code, so refactor it into separate functions. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29include/bpf.h: Remove map_insert_ctx() stubsToke Høiland-Jørgensen
When we changed the device and CPU maps to use linked lists instead of bitmaps, we also removed the need for the map_insert_ctx() helpers to keep track of the bitmaps inside each map. However, it seems I forgot to remove the function definitions stubs, so remove those here. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-29netfilter: ipset: Fix rename concurrency with listingJozsef Kadlecsik
Shijie Luo reported that when stress-testing ipset with multiple concurrent create, rename, flush, list, destroy commands, it can result ipset <version>: Broken LIST kernel message: missing DATA part! error messages and broken list results. The problem was the rename operation was not properly handled with respect of listing. The patch fixes the issue. Reported-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-07-29netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and ↵Stefano Brivio
hash:ip,mac sets In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the KADT functions for sets matching on MAC addreses the copy of source or destination MAC address depending on the configured match. This was done correctly for hash:mac, but for hash:ip,mac and bitmap:ip,mac, copying and pasting the same code block presents an obvious problem: in these two set types, the MAC address is the second dimension, not the first one, and we are actually selecting the MAC address depending on whether the first dimension (IP address) specifies source or destination. Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags. This way, mixing source and destination matches for the two dimensions of ip,mac set types works as expected. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 192.0.2.1/24 dev veth1 ip -net A addr add 192.0.2.2/24 dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16 ip netns exec A ipset add test_bitmap 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP ip netns exec A ipset create test_hash hash:ip,mac ip netns exec A ipset add test_hash 192.0.2.1,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset correctly matches a test packet: # ping -c1 192.0.2.2 >/dev/null # echo $? 0 Reported-by: Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-07-29netfilter: ipset: Actually allow destination MAC address for hash:ip,mac ↵Stefano Brivio
sets too In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the KADT check that prevents matching on destination MAC addresses for hash:mac sets, but forgot to remove the same check for hash:ip,mac set. Drop this check: functionality is now commented in man pages and there's no reason to restrict to source MAC address matching anymore. Reported-by: Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-07-29Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost fixes from Michael Tsirkin: - Fixes in the iommu and balloon devices. - Disable the meta-data optimization for now - I hope we can get it fixed shortly, but there's no point in making users suffer crashes while we are working on that. * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: disable metadata prefetch optimization iommu/virtio: Update to most recent specification balloon: fix up comments mm/balloon_compaction: avoid duplicate page removal
2019-07-29net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operationsRasmus Villemoes
We have an ERPS (Ethernet Ring Protection Switching) setup involving mv88e6250 switches which we're in the process of switching to a BSP based on the mainline driver. Breaking any link in the ring works as expected, with the ring reconfiguring itself quickly and traffic continuing with almost no noticable drops. However, when plugging back the cable, we see 5+ second stalls. This has been tracked down to the userspace application in charge of the protocol missing a few CCM messages on the good link (the one that was not unplugged), causing it to broadcast a "signal fail". That message eventually reaches its link partner, which responds by blocking the port. Meanwhile, the first node has continued to block the port with the just plugged-in cable, breaking the network. And the reason for those missing CCM messages has in turn been tracked down to the VTU apparently being too busy servicing load/purge operations that the normal lookups are delayed. Initial state, the link between C and D is blocked in software. _____________________ / \ | | A ----- B ----- C *---- D Unplug the cable between C and D. _____________________ / \ | | A ----- B ----- C * * D Reestablish the link between C and D. _____________________ / \ | | A ----- B ----- C *---- D Somehow, enough VTU/ATU operations happen inside C that prevents the application from receving the CCM messages from B in a timely manner, so a Signal Fail message is sent by C. When B receives that, it responds by blocking its port. _____________________ / \ | | A ----- B *---* C *---- D Very shortly after this, the signal fail condition clears on the BC link (some CCM messages finally make it through), so C unblocks the port. However, a guard timer inside B prevents it from removing the blocking before 5 seconds have elapsed. It is not unlikely that our userspace ERPS implementation could be smarter and/or is simply buggy. However, this patch fixes the symptoms we see, and is a small optimization that should not break anything (knock wood). The idea is simply to avoid doing an VTU load of an entry identical to the one already present. To do that, we need to know whether mv88e6xxx_vtu_get() actually found an existing entry, or has just prepared a struct mv88e6xxx_vtu_entry for us to load. To that end, let vlan->valid be an output parameter. The other two callers of mv88e6xxx_vtu_get() are not affected by this patch since they pass new=false. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: spider_net: Mark expected switch fall-throughGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/net/ethernet/toshiba/spider_net.c: In function 'spider_net_release_tx_chain': drivers/net/ethernet/toshiba/spider_net.c:783:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (!brutal) { ^ drivers/net/ethernet/toshiba/spider_net.c:792:3: note: here case SPIDER_NET_DESCR_RESPONSE_ERROR: ^~~~ Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: ehea: Mark expected switch fall-throughGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warning: drivers/net/ethernet/ibm/ehea/ehea_main.c: In function 'ehea_mem_notifier': include/linux/printk.h:311:2: warning: this statement may fall through [-Wimplicit-fallthrough=] printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/ibm/ehea/ehea_main.c:3253:3: note: in expansion of macro 'pr_info' pr_info("memory offlining canceled"); ^~~~~~~ drivers/net/ethernet/ibm/ehea/ehea_main.c:3256:2: note: here case MEM_ONLINE: ^~~~ Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29Merge tag 'platform-drivers-x86-v5.3-3' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver fixes from Andy Shevchenko: "Business as usual, a few fixes and new IDs: - PC Engines APU got one fix for software dependencies to automatically load them and another fix for mapping of key button in the front to issue restart event. - OLPC driver is now probed automatically based on module device table. - Intel PMC core driver supports Intel Ice Lake NNPI processor. - WMI driver missed description of a new field in the structure that has been added" * tag 'platform-drivers-x86-v5.3-3' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: pcengines-apuv2: use KEY_RESTART for front button platform/x86: intel_pmc_core: Add ICL-NNPI support to PMC Core Platform: OLPC: add SPI MODULE_DEVICE_TABLE platform/x86: wmi: add missing struct parameter description platform/x86: pcengines-apuv2: Fix softdep statement
2019-07-29mvpp2: refactor the HW checksum setupMatteo Croce
The hardware can only offload checksum calculation on first port due to the Tx FIFO size limitation, and has a maximum L3 offset of 128 bytes. Document this in a comment and move duplicated code in a function. Fixes: 576193f2d579 ("net: mvpp2: jumbo frames support") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: fix ifindex collision during namespace removalJiri Pirko
Commit aca51397d014 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") introduced a possibility to hit a BUG in case device is returning back to init_net and two following conditions are met: 1) dev->ifindex value is used in a name of another "dev%d" device in init_net. 2) dev->name is used by another device in init_net. Under real life circumstances this is hard to get. Therefore this has been present happily for over 10 years. To reproduce: $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 3: enp0s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip netns add ns1 $ ip -n ns1 link add dummy1ns1 type dummy $ ip -n ns1 link add dummy2ns1 type dummy $ ip link set enp0s2 netns ns1 $ ip -n ns1 link set enp0s2 name dummy0 [ 100.858894] virtio_net virtio0 dummy0: renamed from enp0s2 $ ip link add dev4 type dummy $ ip -n ns1 a 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: dummy1ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff 3: dummy2ns1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff 4: dummy0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff 4: dev4: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff $ ip netns del ns1 [ 158.717795] default_device_exit: failed to move dummy0 to init_net: -17 [ 158.719316] ------------[ cut here ]------------ [ 158.720591] kernel BUG at net/core/dev.c:9824! [ 158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI [ 158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18 [ 158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [ 158.727508] Workqueue: netns cleanup_net [ 158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.750638] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.752944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 158.762758] Call Trace: [ 158.763882] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.766148] ? devlink_nl_cmd_set_doit+0x520/0x520 [ 158.768034] ? dev_change_net_namespace+0xbb0/0xbb0 [ 158.769870] ops_exit_list.isra.0+0xa8/0x150 [ 158.771544] cleanup_net+0x446/0x8f0 [ 158.772945] ? unregister_pernet_operations+0x4a0/0x4a0 [ 158.775294] process_one_work+0xa1a/0x1740 [ 158.776896] ? pwq_dec_nr_in_flight+0x310/0x310 [ 158.779143] ? do_raw_spin_lock+0x11b/0x280 [ 158.780848] worker_thread+0x9e/0x1060 [ 158.782500] ? process_one_work+0x1740/0x1740 [ 158.784454] kthread+0x31b/0x420 [ 158.786082] ? __kthread_create_on_node+0x3f0/0x3f0 [ 158.788286] ret_from_fork+0x3a/0x50 [ 158.789871] ---[ end trace defd6c657c71f936 ]--- [ 158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f [ 158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e [ 158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282 [ 158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000 [ 158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64 [ 158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c [ 158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000 [ 158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72 [ 158.829899] FS: 0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000 [ 158.834923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0 [ 158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fix this by checking if a device with the same name exists in init_net and fallback to original code - dev%d to allocate name - in case it does. This was found using syzkaller. Fixes: aca51397d014 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29r8169: make use of xmit_moreHeiner Kallweit
There was a previous attempt to use xmit_more, but the change had to be reverted because under load sometimes a transmit timeout occurred [0]. Maybe this was caused by a missing memory barrier, the new attempt keeps the memory barrier before the call to netif_stop_queue like it is used by the driver as of today. The new attempt also changes the order of some calls as suggested by Eric. [0] https://lkml.org/lkml/2019/2/10/39 Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29mvpp2: refactor MTU change codeMatteo Croce
The MTU change code can call napi_disable() with the device already down, leading to a deadlock. Also, lot of code is duplicated unnecessarily. Rework mvpp2_change_mtu() to avoid the deadlock and remove duplicated code. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29rocker: fix memory leaks of fib_work on two error return pathsColin Ian King
Currently there are two error return paths that leak memory allocated to fib_work. Fix this by kfree'ing fib_work before returning. Addresses-Coverity: ("Resource leak") Fixes: 19a9d136f198 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway") Fixes: dbcc4fa718ee ("rocker: Fail attempts to use routes with nexthop objects") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: stmmac: manage errors returned by of_get_mac_address()Martin Blumenstingl
Commit d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") added support for reading the MAC address from an nvmem-cell. This required changing the logic to return an error pointer upon failure. If stmmac is loaded before the nvmem provider driver then of_get_mac_address() return an error pointer with -EPROBE_DEFER. Propagate this error so the stmmac driver will be probed again after the nvmem provider driver is loaded. Default to a random generated MAC address in case of any other error, instead of using the error pointer as MAC address. Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29staging/octeon: Allow test build on !MIPSMatthew Wilcox (Oracle)
Add compile test support by moving all includes of files under asm/octeon into octeon-ethernet.h, and if we're not on MIPS, stub out all the calls into the octeon support code in octeon-stubs.h Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29Do not dereference 'siw_crypto_shash' before checkingBernard Metzler
Reported-by: "Dan Carpenter" <dan.carpenter@oracle.com> Fixes: f29dd55b0236 ("rdma/siw: queue pair methods") Link: https://lore.kernel.org/r/OF61E386ED.49A73798-ON00258444.003BD6A6-00258444.003CC8D9@notes.na.collabserv.com Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29net/af_iucv: mark expected switch fall-throughsGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warnings: net/iucv/af_iucv.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 537:3, 519:6, 2246:6, 510:6 Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29arcnet: com20020-isa: Mark expected switch fall-throughsGustavo A. R. Silva
Mark switch cases where we are expecting to fall through. This patch fixes the following warnings: drivers/net/arcnet/com20020-isa.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 205:13, 203:10, 209:7, 201:11, 207:8 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29ALSA: pcm: fix lost wakeup event scenarios in snd_pcm_drainYuki Tsunashima
lost wakeup can occur after enabling irq, therefore put task into interruptible before enabling interrupts, without this change, task can be put to sleep and snd_pcm_drain will delay Fixes: f2b3614cefb6 ("ALSA: PCM - Don't check DMA time-out too shortly") Signed-off-by: Yuki Tsunashima <ytsunashima@jp.adit-jv.com> Signed-off-by: Suresh Udipi <sudipi@jp.adit-jv.com> [ported from 4.9] Signed-off-by: Adam Miartus <amiartus@de.adit-jv.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-07-29RDMA/qedr: Fix the hca_type and hca_rev returned in device attributesMichal Kalderon
There was a place holder for hca_type and vendor was returned in hca_rev. Fix the hca_rev to return the hw revision and fix the hca_type to return an informative string representing the hca. Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Link: https://lore.kernel.org/r/20190728111338.21930-1-michal.kalderon@marvell.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29net: bridge: delete local fdb on device init failureNikolay Aleksandrov
On initialization failure we have to delete the local fdb which was inserted due to the default pvid creation. This problem has been present since the inception of default_pvid. Note that currently there are 2 cases: 1) in br_dev_init() when br_multicast_init() fails 2) if register_netdevice() fails after calling ndo_init() This patch takes care of both since br_vlan_flush() is called on both occasions. Also the new fdb delete would be a no-op on normal bridge device destruction since the local fdb would've been already flushed by br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is called last when adding a port thus nothing can fail after it. Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: ag71xx: use resource_size for the ioremap sizeDing Xiang
use resource_size to calcuate ioremap size and make the code simpler. Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-29net: sched: Fix a possible null-pointer dereference in dequeue_func()Jia-Ju Bai
In dequeue_func(), there is an if statement on line 74 to check whether skb is NULL: if (skb) When skb is NULL, it is used on line 77: prefetch(&skb->end); Thus, a possible null-pointer dereference may occur. To fix this bug, skb->end is used when skb is not NULL. This bug is found by a static analysis tool STCheck written by us. Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>