summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-02-28selftests: forwarding: Add initial testing frameworkIdo Schimmel
Add initial framework to test packet forwarding functionality. The tests can run on actual devices using loop-backed cables or using veth pairs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28virtio-net: disable NAPI only when enabled during XDP setJason Wang
We try to disable NAPI to prevent a single XDP TX queue being used by multiple cpus. But we don't check if device is up (NAPI is enabled), this could result stall because of infinite wait in napi_disable(). Fixing this by checking device state through netif_running() before. Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipvlan: use per device spinlock to protect addrs list updatesPaolo Abeni
This changeset moves ipvlan address under RCU protection, using a per ipvlan device spinlock to protect list mutation and RCU read access to protect list traversal. Also explicitly use RCU read lock to traverse the per port ipvlans list, so that we can now perform a full address lookup without asserting the RTNL lock. Overall this allows the ipvlan driver to check fully for duplicate addresses - before this commit ipv6 addresses assigned by autoconf via prefix delegation where accepted without any check - and avoid the following rntl assertion failure still in the same code path: RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (124) WARNING: CPU: 15 PID: 0 at drivers/net/ipvlan/ipvlan_core.c:124 ipvlan_addr_busy+0x97/0xa0 [ipvlan] Modules linked in: ipvlan(E) ixgbe CPU: 15 PID: 0 Comm: swapper/15 Tainted: G E 4.16.0-rc2.ipvlan+ #1782 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 RIP: 0010:ipvlan_addr_busy+0x97/0xa0 [ipvlan] RSP: 0018:ffff881ff9e03768 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff881fdf2a9000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 0000000000000300 RBP: ffff881fdf2a8000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: ffff881ff9e034c0 R12: ffff881fe07bcc00 R13: 0000000000000001 R14: ffffffffa02002b0 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff881ff9e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc5c1a4f248 CR3: 000000207e012005 CR4: 00000000001606e0 Call Trace: <IRQ> ipvlan_addr6_event+0x6c/0xd0 [ipvlan] notifier_call_chain+0x49/0x90 atomic_notifier_call_chain+0x6a/0x100 ipv6_add_addr+0x5f9/0x720 addrconf_prefix_rcv_add_addr+0x244/0x3c0 addrconf_prefix_rcv+0x2f3/0x790 ndisc_router_discovery+0x633/0xb70 ndisc_rcv+0x155/0x180 icmpv6_rcv+0x4ac/0x5f0 ip6_input_finish+0x138/0x6a0 ip6_input+0x41/0x1f0 ipv6_rcv+0x4db/0x8d0 __netif_receive_skb_core+0x3d5/0xe40 netif_receive_skb_internal+0x89/0x370 napi_gro_receive+0x14f/0x1e0 ixgbe_clean_rx_irq+0x4ce/0x1020 [ixgbe] ixgbe_poll+0x31a/0x7a0 [ixgbe] net_rx_action+0x296/0x4f0 __do_softirq+0xcf/0x4f5 irq_exit+0xf5/0x110 do_IRQ+0x62/0x110 common_interrupt+0x91/0x91 </IRQ> v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event() Fixes: e9997c2938b2 ("ipvlan: fix check for IP addresses in control path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28ipvlan: egress mcast packets are not exceptionalPaolo Abeni
Currently, if IPv6 is enabled on top of an ipvlan device in l3 mode, the following warning message: Dropped {multi|broad}cast of type= [86dd] is emitted every time that a RS is generated and dmseg is soon filled with irrelevant messages. Replace pr_warn with pr_debug, to preserve debuggability, without scaring the sysadmin. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28x86/platform/intel-mid: Handle Intel Edison reboot correctlySebastian Panceac
When the Intel Edison module is powered with 3.3V, the reboot command makes the module stuck. If the module is powered at a greater voltage, like 4.4V (as the Edison Mini Breakout board does), reboot works OK. The official Intel Edison BSP sends the IPCMSG_COLD_RESET message to the SCU by default. The IPCMSG_COLD_BOOT which is used by the upstream kernel is only sent when explicitely selected on the kernel command line. Use IPCMSG_COLD_RESET unconditionally which makes reboot work independent of the power supply voltage. [ tglx: Massaged changelog ] Fixes: bda7b072de99 ("x86/platform/intel-mid: Implement power off sequence") Signed-off-by: Sebastian Panceac <sebastian@resin.io> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1519810849-15131-1-git-send-email-sebastian@resin.io
2018-02-28nvmet: fix PSDT field check in command formatMax Gurtovoy
PSDT field section according to NVM_Express-1.3: "This field specifies whether PRPs or SGLs are used for any data transfer associated with the command. PRPs shall be used for all Admin commands for NVMe over PCIe. SGLs shall be used for all Admin and I/O commands for NVMe over Fabrics. This field shall be set to 01b for NVMe over Fabrics 1.0 implementations. Suggested-by: Idan Burstein <idanb@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
2018-02-28Merge branch 'mlxsw-mq-red-offload'David S. Miller
Jiri Pirko says: ==================== mlxsw: Offload multi-queue RED support Nogah says: Support a two level hierarchy of offloaded qdiscs in mlxsw, with sch_prio being the root qdisc and sch_red as the children. +----------+ | sch_prio | +----+-----+ | | +----------------------------------+ | | | | | | | | | +---v---+ +----v---+ +-----v--+ |sch_red| |sch_red | |sch_red | +-------+ +--------+ +--------+ When setting sch_prio as the root qdisc on a physical port, mlxsw will offload it. When adding it with sch_red as a child qdisc, it will offload it as well. Relocating child qdisc or connecting them to more then one child will result in unoffloading them. Relocating child qdisc more then once is highly unrecommended and might cause a miss match between the kernel configuration and the offloaded one. The offloaded configuration will be aligned with the one shown in the show command. Changing the priomap parameter of sch_prio might cause a band that its configuration was changed and it has offloaded sch_red set on it, to lose some stats data as if sch_red was unoffloaded and offloaded again. However, it won't affect the data on this band that will have sch_red continuously. Patch 1 adds support for setting RED as the child of root qdisc. Patches 2-4 add support for RED bstasts for offloaded child qdiscs. Patches 5-6 handle backlog related changes for offloaded child qdiscs. Patches 7-8 update PRIO in mlxsw to be able to have RED as child on its bands. Patch 9 adds offload handles for PRIO graft operations. In mlxsw it will cause the driver to stop offloading the child in question. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: prio: Handle graft commandNogah Frankel
Handle graft command for an offloaded sch_prio. Grafting a qdisc to any place other than under its original parent is not supported by mlxsw and will cause the grafted qdisc to stop being offloaded. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: sch: prio: Add offload ability for grafting a childNogah Frankel
Offload sch_prio graft command for capable drivers. Warn in case of a failure, unless the graft was done as part of a destroy operation (the new qdisc is a noop) or if all the qdiscs (the parent, the old child, and the new one) are not offloaded. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: prio: Delete child qdiscs when removing bandsNogah Frankel
When the number the bands of sch_prio is decreased, child qdiscs on the deleted bands would get deleted as well. This change and deletions are being done under sch_tree_lock of the sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if it is offloaded. Un-offloading can't be done inside this lock. Move the offload command to be done before reducing the number of bands, so unoffloading of the qdiscs that are about to be deleted could be done outside of the lock. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: Update sch_prio stats to include sch_red related dropsNogah Frankel
sch_prio as root qdisc should count all the drops its children have. Since it is possible for it to have sch_red children, it needs to count RED early drops. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: sch: Don't warn on missmatching qlen and backlog for offloaded qdiscsNogah Frankel
Offloaded qdiscs are allowed to expose only parts of their statistics. It means that if backlog is being exposed and qlen is not, it might trigger a warning in qdisc_tree_reduce_backlog. Do not warn in case the qdisc that was removed was an offloaded one. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: Update backlog handling of a child qdiscsNogah Frankel
When removing a child qdisc its backlog will be decreased from the parent backlog. The driver backlog count should do the same. When the parent changes its configuration, the child might need to clean its stats. However, the backlog can't be cleaned with the rest of the stats, because it reflects a momentary value that needs to be synced with the core, not the history of the qdisc. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: Collect stats for sch_red based on priomapNogah Frankel
Priority counters count packets according to their packet priority. Collect the stats for sch_red based on these counters, so the qdisc bstats will be the sum of counters matching the priorities marked in the qdisc priomap. Changing the mapping of the priorities to bands while traffic is running can result in losing the stats of the bands qdiscs from their last dump call to this change, as if the qdisc was unoffloaded and re-offloaded. It will not affect the traffic behaviour according to sch_red. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: Add priority map per qdiscNogah Frankel
Add priority map per qdisc, to indicate which priorities are being directed through this qdisc. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: Add priority countersNogah Frankel
Add TX packets and bytes counters per switch priority per port. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28mlxsw: spectrum: qdiscs: Support qdisc per tclassNogah Frankel
Add the option to set a qdisc per tclass. Match the qdisc to the tclass by parent ID. Supported currently for sch_red only. It allows offloading sch_prio as root qdisc and sch_red as its child. (However, doing so might corrupt the stats for both parent and child.) Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net: mvpp2: Add hardware offloading for VLAN filteringMaxime Chevallier
Marvell PPv2 controller allows for generic packet filtering. This commit adds entries to implement VLAN filtering. The approach taken is : - Filter entries that would match on the presence of the VLAN tag (existing VLAN detection, DSA / EDSA detection) will set the next lookup ID to be for the VID. - For each VLAN existing on a given port, we add an entry that matches this specific VID. If the incoming packet matches the VID entry, it is set for the next lookup in the chain (LU_L2). - A Guard entry is added for each port, that will match if the incoming packet didn't match any of the above VID entries. This entry tags the packet to be dropped. Due to this design, and the fact that the total 256 filter entries are also used for other purposes, we have a limit of 10 VLANs per port. To accommodate the case where we would need more VLANS on one port, this patch implements the ndo_set_features to allow for disabling of VLAN filtering using ethtool. The default config has VLAN filtering disabled. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/tcp/illinois: replace broken algorithm reference linkJoey Pabalinas
The link to the pdf containing the algorithm description is now a dead link; it seems http://www.ifp.illinois.edu/~srikant/ has been moved to https://sites.google.com/a/illinois.edu/srikant/ and none of the original papers can be found there... I have replaced it with the only working copy I was able to find. n.b. there is also a copy available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6350&rep=rep1&type=pdf However, this seems to only be a *cached* version, so I am unsure exactly how reliable that link can be expected to remain over time and have decided against using that one. Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com> 1 file changed, 1 insertion(+), 1 deletion(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28r8169: convert remaining feature flag and remove enum featuresHeiner Kallweit
Now that only one feature flag is left we can convert it and remove enum features. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch 'macmace-cleanups'David S. Miller
Finn Thain says: ==================== Fixes, cleanup and modernization for macmace driver Changes since v4 of combined patch series: - Removed redundant and non-portable MACH_IS_MAC tests. - Omitted patches unrelated to macmace driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/macmace: Drop redundant MACH_IS_MAC testFinn Thain
The MACH_IS_MAC test is redundant here because the platform device won't get registered unless MACH_IS_MAC. Adopt module_platform_driver() convention. Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28net/macmace: Fix and clean up log messagesFinn Thain
Don't log the unexpanded "eth%d" format string. Log the chip revision in the probe message (consistent with mace.c). Drop redundant debug messages for FIFO events recorded in the interface statistics (also consistent with mace.c). Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28test_bpf: reduce MAX_TESTRUNSEric Dumazet
For tests that are using the maximal number of BPF instruction, each run takes 20 usec. Looping 10,000 times on them totals 200 ms, which is bad when the loop is not preemptible. test_bpf: #264 BPF_MAXINSNS: Call heavy transformations jited:1 19248 18548 PASS test_bpf: #269 BPF_MAXINSNS: ld_abs+get_processor_id jited:1 20896 PASS Lets divide by ten the number of iterations, so that max latency is 20ms. We could use need_resched() to break the loop earlier if we believe 20 ms is too much. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-28inet: whitespace cleanupStephen Hemminger
Ran simple script to find/remove trailing whitespace and blank lines at EOF because that kind of stuff git whines about and editors leave behind. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: purge write queue upon RSTSoheil Hassas Yeganeh
When the connection is reset, there is no point in keeping the packets on the write queue until the connection is closed. RFC 793 (page 70) and RFC 793-bis (page 64) both suggest purging the write queue upon RST: https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07 Moreover, this is essential for a correct MSG_ZEROCOPY implementation, because userspace cannot call close(fd) before receiving zerocopy signals even when the connection is reset. Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28emulex/benet: Constify *be_misconfig_evt_port_state[]Hernán Gonzalez
Note: This is compile only tested as I have no access to the hw. No benefit gained except for some self-documenting. add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0) Function old new delta Total: Before=2757703, After=2757703, chg +0.00% Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28qlogic/qed: Constify *pkt_type_str[]Hernán Gonzalez
Note: This is compile only tested as I have no access to the hw. Constifying and declaring as static saves 24 bytes. add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24) Function old new delta pkt_type_str 24 - -24 Total: Before=3599256, After=3599232, chg -0.00% Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar> Acked-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch 'tcp-revert-a-F-RTO-extension-due-to-broken-middle-boxes'David S. Miller
Yuchung Cheng says: ==================== tcp: revert a F-RTO extension due to broken middle-boxes This patch series reverts a (non-standard) TCP F-RTO extension that aimed to detect more spurious timeouts. Unfortunately it could result in poor performance due to broken middle-boxes that modify TCP packets. E.g. https://www.spinics.net/lists/netdev/msg484154.html We believe the best and simplest solution is to just revert the change. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: revert F-RTO extension to detect more spurious timeoutsYuchung Cheng
This reverts commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427. While the patch could detect more spurious timeouts, it could cause poor TCP performance on broken middle-boxes that modifies TCP packets (e.g. receive window, SACK options). Since the performance gain is much smaller compared to the potential loss. The best solution is to fully revert the change. Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts") Reported-by: Teodor Milkov <tm@del.bg> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28tcp: revert F-RTO middle-box workaroundYuchung Cheng
This reverts commit cc663f4d4c97b7297fb45135ab23cfd508b35a77. While fixing some broken middle-boxes that modifies receive window fields, it does not address middle-boxes that strip off SACK options. The best solution is to fully revert this patch and the root F-RTO enhancement. Fixes: cc663f4d4c97 ("tcp: restrict F-RTO to work-around broken middle-boxes") Reported-by: Teodor Milkov <tm@del.bg> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch 's390-qeth-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2018-02-27 please apply some more qeth patches for -net and stable. One patch fixes a performance bug in the TSO path. Then there's several more fixes for IP management on L3 devices - including a revert, so that the subsequent fix cleanly applies to earlier kernels. The final patch takes care of a race in the control IO code that causes qeth to miss the cmd response, and subsequently trigger device recovery. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28s390/qeth: fix IPA command submission raceJulian Wiedmann
If multiple IPA commands are build & sent out concurrently, fill_ipacmd_header() may assign a seqno value to a command that's different from what send_control_data() later assigns to this command's reply. This is due to other commands passing through send_control_data(), and incrementing card->seqno.ipa along the way. So one IPA command has no reply that's waiting for its seqno, while some other IPA command has multiple reply objects waiting for it. Only one of those waiting replies wins, and the other(s) times out and triggers a recovery via send_ipa_cmd(). Fix this by making sure that the same seqno value is assigned to a command and its reply object. Do so immediately before submitting the command & while holding the irq_pending "lock", to produce nicely ascending seqnos. As a side effect, *all* IPA commands now use a reply object that's waiting for its actual seqno. Previously, early IPA commands that were submitted while the card was still DOWN used the "catch-all" IDX seqno. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28s390/qeth: fix IP address lookup for L3 devicesJulian Wiedmann
Current code ("qeth_l3_ip_from_hash()") matches a queried address object against objects in the IP table by IP address, Mask/Prefix Length and MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually require is either a) "is this IP address registered" (ie. match by IP address only), before adding a new address. b) or "is this address object registered" (ie. match all relevant attributes), before deleting an address. Right now 1. the ADD path is too strict in its lookup, and eg. doesn't detect conflicts between an existing NORMAL address and a new VIPA address (because the NORMAL address will have mask != 0, while VIPA has a mask == 0), 2. the DELETE path is not strict enough, and eg. allows del_rxip() to delete a VIPA address as long as the IP address matches. Fix all this by adding helpers (_addr_match_ip() and _addr_match_all()) that do the appropriate checking. Note that the ADD path for NORMAL addresses is special, as qeth keeps track of how many times such an address is in use (and there is no immediate way of returning errors to the caller). So when a requested NORMAL address _fully_ matches an existing one, it's not considered a conflict and we merely increment the refcount. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Revert "s390/qeth: fix using of ref counter for rxip addresses"Julian Wiedmann
This reverts commit cb816192d986f7596009dedcf2201fe2e5bc2aa7. The issue this attempted to fix never actually occurs. l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address was previously added to the card. If so, it returns -EEXIST and doesn't call l3_add_ip(). As a result, the "address exists" path in l3_add_ip() is never taken for rxip addresses, and this patch had no effect. Fixes: cb816192d986 ("s390/qeth: fix using of ref counter for rxip addresses") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28s390/qeth: fix double-free on IP add/remove raceJulian Wiedmann
Registering an IPv4 address with the HW takes quite a while, so we temporarily drop the ip_htable lock. Any concurrent add/remove of the same IP adjusts the IP's use count, and (on remove) is then blocked by addr->in_progress. After the register call has completed, we check the use count for concurrently attempted add/remove calls - and possibly straight-away deregister the IP again. This happens via l3_delete_ip(), which 1) looks up the queried IP in the htable (getting a reference to the *same* queried object), 2) deregisters the IP from the HW, and 3) frees the IP object. The caller in l3_add_ip() then does a second free on the same object. For this case, skip all the extra checks and lookups in l3_delete_ip() and just deregister & free the IP object ourselves. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28s390/qeth: fix IP removal on offline cardsJulian Wiedmann
If the HW is not reachable, then none of the IPs in qeth's internal table has been registered with the HW yet. So when deleting such an IP, there's no need to stage it for deregistration - just drop it from the table. This fixes the "add-delete-add" scenario on an offline card, where the the second "add" merely increments the IP's use count. But as the IP is still set to DISP_ADDR_DELETE from the previous "delete" step, l3_recover_ip() won't register it with the HW when the card goes online. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28s390/qeth: fix overestimated count of buffer elementsJulian Wiedmann
qeth_get_elements_for_range() doesn't know how to handle a 0-length range (ie. start == end), and returns 1 when it should return 0. Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all of the skb's linear data) are skipped when mapping the skb into regular buffer elements. This overestimation may cause several performance-related issues: 1. sub-optimal IO buffer selection, where the next buffer gets selected even though the skb would actually still fit into the current buffer. 2. forced linearization, if the element count for a non-linear skb exceeds QETH_MAX_BUFFER_ELEMENTS. Rather than modifying qeth_get_elements_for_range() and adding overhead to every caller, fix up those callers that are in risk of passing a 0-length range. Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28Merge branch 'SFP-updates'David S. Miller
Russell King says: ==================== SFP updates Included in this series are a further few updates for SFP support: - Adding support for Fiberstore's non-standard BiDi modules operating at 1310nm/1550nm wavelengths rather than the 1000BASE-BX standard of 1310nm/1490nm. - Adding support for negotiating the PHY interface mode with the MAC, so that modules supporting faster speeds and Gigabit ethernet work with Gigabit-only MACs. - Adding support for high power (>1W) SFP modules. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28sfp: add high power module supportJon Nettleton
This patch is the result of work by both Jon Nettleton and Russell King. Jon wrote the original patch, adding support for SFP modules which require a power level greater than '1'. Russell's changes: - Fix the power levels for big-endian, and make the code flow better. - Convert to use device_property_read_u8() - Warn for power levels exceeding host level SFF-8431 says: "To avoid exceeding system power supply limits and cooling capacity, all modules at power up by default shall operate with up to 1.0 W. Hosts supporting Power Level II or III operation may enable a Power Level II or III module through the 2-wire interface. Power Level II or III modules shall assert the power level declaration bit of SFF-8472." Print a warning for modules that exceed the host power level, and leave them operating in power level 1. - Fix i2c write The first byte of any write after the bus address is always the device address. In order to write a value to device D, address I, value V, we need to generate on the bus: S DDDDDDDD A IIIIIIII A VVVVVVVV A P where S = start, R = restart, A = ack, P = stop. Splitting this as two: S DDDDDDDD A IIIIIIII A R DDDDDDDD A VVVVVVVV A P results in the device's address register being written first by I and then by V - the addressed register within the device is not written. - Avoid power mode switching if 0xa2 is not implemented Some modules indicate that they support power level II or power level III, but do not implement address 0xa2, meaning that the bit to set them to high power mode is not accessible. These modules appear to have the sff8472_compliance field set to zero, and also do not implement diagnostics. Detect this, but also ensure that the module does not require the address switching mode, which we do not implement. - Use mW for power level rather than power level number. - Fix high power mode transition We must not switch to SFP_MOD_PRESENT state until we have finished initialising, because the remaining state machines check for that state. Add SFP_MOD_HPOWER as an intermediate state. - Use definition for I2C register address rather than constant. Signed-off-by: Jon Nettleton <jon@solid-run.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28dt-bindings: add maximum power level to SFP bindingRussell King
Add the new maximum power level property to the SFP binding. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28phylink,sfp: negotiate interface format with MACRussell King
Negotiate the interface format with the MAC rather than requiring it to be a fixed type specified solely by the SFP module. This allows modules that can work with several different interface signalling formats to select a format compatible with the MAC - for example, a Fiber module supporing Gigabit ethernet and faster connected to a Gigabit only MAC needs to select the 1000BASE-X mode. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28sfp: support 1G BiDi (eg, FiberStore SFP-GE-BX) modulesRussell King
Some BiDi modules (eg, FiberStore SFP-GE-BX) are not compliant with 1000BASE-BX as they use different wavelengths from the 1000BASE-BX standard (eg, 1310nm/1550nm rather than 1310nm/1490nm). These modules support 1000BASE-X ethernet, so detect them by a failure to find any other support, the 8B10B encoding and a bit rate that falls within the 1Gbps window. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28team: Use extack to report enslavement failuresIdo Schimmel
Use extack inside team's enslavement function and also propagate it to the netdevice notifier to allow enslaved ports to report the failure reason. Example: $ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}' $ ip link set dev lo master team0 Error: Loopback device can't be added as a team port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28gianfar: Fix Rx byte accounting for ndev statsClaudiu Manoil
Don't include in the Rx bytecount of the packet sent up the stack: the FCB (frame control block), and the padding bytes inserted by the controller into the frame payload, nor the FCS. All these are being pulled out of the skb by gfar_process_frame(). This issue is old, likely from the driver's beginnings, however it was amplified by recent: commit d903ec77118c ("gianfar: simplify FCS handling and fix memory leak") which basically added the FCS to the Rx bytecount, and so brought this to my attention. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28fm10k: bump version numberJacob Keller
We're aligned with latest version released on SourceForge, so update the version number to match. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-28fm10k: fix incorrect warning for function prototypeJacob Keller
Recent kernels now complain about incorrect function prototype comments, in order to ensure comments are accurate to the function. However, it incorrectly associates the comment above the fm10k_pci_tbl[] as a function header comment. Fix this by removing the extra "*" in the comment. This normally indicates that the function is a doxygen style function header comment. Once removed, the logic no longer kicks in and the following warning is fixed: warning: cannot understand function prototype: 'const struct pci_device_id fm10k_pci_tbl[] = ' Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-28fm10k: fix function doxygen commentsJacob Keller
Several function header comments had incorrect function parameter definitions. Recent versions of the upstream kernel have started to warn about these issues. Fix up the comments which do not match in order to resolve these new warnings. While fixing these, update the copyright year also. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-28objtool: Fix another switch table detection issueJosh Poimboeuf
Continue the switch table detection whack-a-mole. Add a check to distinguish KASAN data reads from switch data reads. The switch jump tables in .rodata have relocations associated with them. This fixes the following warning: crypto/asymmetric_keys/x509_cert_parser.o: warning: objtool: x509_note_pkey_algo()+0xa4: sibling call from callable instruction with modified stack frame Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/d7c8853022ad47d158cb81e953a40469fc08a95e.1519784382.git.jpoimboe@redhat.com
2018-02-28x86/xen: Zero MSR_IA32_SPEC_CTRL before suspendJuergen Gross
Older Xen versions (4.5 and before) might have problems migrating pv guests with MSR_IA32_SPEC_CTRL having a non-zero value. So before suspending zero that MSR and restore it after being resumed. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Link: https://lkml.kernel.org/r/20180226140818.4849-1-jgross@suse.com