Age  Commit message  Author
2019-07-01net/mlx5: E-Switch, Use iterator for vlan and min-inline setupsBodong Wang
Use the defined iterators to traverse VF reps/vports. Also, rely on the number of VFs rather than the counter of enabled vports: the PF will also be enabled from the ECPF side, so the counter will differ from the number of VFs. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Reg/unreg function changed event at correct stageBodong Wang
When driver is doing eswitch mode change, it's critical to keep number of enabled VFs unchanged. However, it can be changed on the fly once function changed event is registered. To remove this uncertainty, function changed event should not be registered before all setups, and first be unregistered before all cleanups. Wrap this functionality together with vport event handler. Fixes: 61fc880839e6 ("net/mlx5: E-Switch, Handle representors creation in handler context") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Consolidate eswitch function number of VFsBodong Wang
The number of enabled VFs is key for the eswitch manager to do flow steering initialization and vport configuration. However, the number of enabled VFs may come from two sources. PF: the number of VFs is provided by the PF's own enabled SR-IOV. ECPF: the number of VFs is provided by the SR-IOV enabled on its peer PF; SR-IOV can't be enabled from the ECPF itself. The current driver handles the two cases in different stages and passes the number of enabled VFs through a large number of internal functions. It is usually hard to find out where the real number of VFs comes from because of the layers of argument passing. This patch consolidates that number at the entry point of eswitch setup and maintains a copy so that eswitch functions can refer to it directly. The eswitch driver shall always use this number when referring to the number of enabled VFs and must not use other numbers, such as the one from SR-IOV. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Refactor eswitch SR-IOV interfaceBodong Wang
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Handle host PF vport mac/guid for ECPFBodong Wang
When ECPF is eswitch manager, it has the privilege to query and configure the mac and node guid of host PF. While vport number of host PF is 0, the vport command should be issued with other_vport set in this case as the cmd is issued by ECPF vport(0xfffe). Add a specific function to query own vport mac. Low level functions are used by vport manager to query/modify any vport mac and node guid. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Use correct flags when configuring vlanBodong Wang
Before the offending commit, vlan will be configured if either vlan or qos is set. After the change with new set flags, function callers should provide flags accordingly. Fixes: e33dfe316cf3 ("net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Reduce dependency on enabled_vfs counter and num_vfsParav Pandit
When enabling SR-IOV, the PCI core already returns an error code if SR-IOV is already enabled. Hence, remove this duplicate check from the mlx5_core driver. While at it, make mlx5_device_disable_sriov() clean up VFs in the reverse order of mlx5_device_enable_sriov(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Don't handle VF func change if host PF is disabledBodong Wang
When the ECPF eswitch manager is in offloads mode, it monitors functions changed events from the host PF side and acts according to the number of VFs enabled/disabled. As the ECPF and the host PF work in two independent hosts, it's possible that the host PF OS reboots while the ECPF system stays on and continues monitoring events from the host PF. When the kernel on the host PF side is booting, the PCI iov driver does sriov_init and compute_max_vf_buses by iterating over all valid numbers of VFs. This triggers FLR and generates functions changed events, even though the host PF HCA is not enabled at this time. However, the ECPF is not aware of this and still handles these events as usual. The ECPF system will see a massive number of reps being created and then destroyed immediately after creation finishes. To eliminate this noise, a bit is added to the host parameter context to indicate that the host PF is disabled. The ECPF will not handle the VF changed event if this bit is set. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devicesParav Pandit
As mlx5_get_next_phys_dev is used only for PCI PF devices use case, limit it to search only for PCI devices. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Move pci status reg access mutex to mlx5_pci_initParav Pandit
mlx5_pci_init() performs pci specific initialization of the mlx5_core_dev struct. Hence move pci_status_mutex to pci initialization routine mlx5_pci_init(). This allows reusing mlx5_mdev_init() to non PCI devices. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_typeHuy Nguyen
Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5 device types. mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep mlx5_coredev_type in mlx5_core_dev structure. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01RDMA/mlx5: Cleanup rep when doing unloadBodong Wang
When an IB rep is loaded, the netdev for the same vport is saved for later reference. However, it's not cleaned up when doing unload. For ECPF, the kernel crashes when the driver refers to the already removed netdev.
The following steps lead to the call trace shown below:
1. Create n VFs from host PF
2. Destroy the VFs
3. Run "rdma link" from ARM
Call trace:
mlx5_ib_get_netdev+0x9c/0xe8 [mlx5_ib]
mlx5_query_port_roce+0x268/0x558 [mlx5_ib]
mlx5_ib_rep_query_port+0x14/0x34 [mlx5_ib]
ib_query_port+0x9c/0xfc [ib_core]
fill_port_info+0x74/0x28c [ib_core]
nldev_port_get_doit+0x1a8/0x1e8 [ib_core]
rdma_nl_rcv_msg+0x16c/0x1c0 [ib_core]
rdma_nl_rcv+0xe8/0x144 [ib_core]
netlink_unicast+0x184/0x214
netlink_sendmsg+0x288/0x354
sock_sendmsg+0x18/0x2c
__sys_sendto+0xbc/0x138
__arm64_sys_sendto+0x28/0x34
el0_svc_common+0xb0/0x100
el0_svc_handler+0x6c/0x84
el0_svc+0x8/0xc
Clean up the rep and netdev reference when unloading the IB rep. Fixes: 26628e2d58c9 ("RDMA/mlx5: Move to single device multiport ports in switchdev mode") Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01{IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mappingBodong Wang
In the single IB device mode, the mapping between vport number and rep relies on a counter. However for dynamic vport allocation, it is desired to keep consistent map of eswitch vport and IB port. Hence, simplify code to remove the free running counter and instead use the available vport index during load/unload sequence from the eswitch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Suggested-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Use vport index when init repBodong Wang
Driver is referring to the array index when doing rep initialization, using vport is confusing as it's normally interpreted as vport number. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Added MCQI and MCQS registers' description to ifcShay Agroskin
Given a fw component index, the MCQI register allows us to query this component's information (e.g. its version and capabilities). Given a fw component index, the MCQS register allows us to query the status of a fw component, including its type and state (e.g. PRESENT/IN_USE). It can be used to find the index of a component of a specific type, by sequentially increasing the component index, and querying each time the type of the returned component. If max component index is reached, 'last_index_flag' is set by the HCA. These registers' description was added to query the running and pending fw version of the HCA. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Add hardware definitions for sub functionsParav Pandit
Update mlx5 device interface data structures for: 1. New command definitions for allocating, deallocating SF 2. Query SF partition 3. Eswitch SF fields 4. HCA CAP SF fields 5. Extend Eswitch functions command for SF Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.gitKalle Valo
ath.git patches for 5.3. Major changes: ath10k * fixes for SDIO support * add support for firmware logging via WMI
2019-07-01ipv4: don't set IPv6 only flags to IPv4 addressesMatteo Croce
Avoid the situation where an IPV6 only flag is applied to an IPv4 address:
# ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute
# ip -4 addr show dev dummy0
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 192.0.2.1/24 scope global noprefixroute dummy0
valid_lft forever preferred_lft forever
Or worse, by sending a malicious netlink command:
# ip -4 addr show dev dummy0
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0
valid_lft forever preferred_lft forever
Signed-off-by: Matteo Croce <mcroce@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01samples: pktgen: allow to specify destination portDaniel T. Lee
Currently, kernel pktgen has the feature to specify udp destination port for sending packet. (e.g. pgset "udp_dst_min 9") But on samples, each of the scripts doesn't have any option to achieve this. This commit adds the DST_PORT option to specify the target port(s) in the script. -p : ($DST_PORT) destination PORT range (e.g. 433-444) is also allowed Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01samples: pktgen: add some helper functions for port parsingDaniel T. Lee
This commit adds port parsing and port validation helper functions that parse a single port or a range of ports from a given string (e.g. 1234, 443-444). The helpers will be used to set the target port(s) in samples/pktgen. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
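The sample scripts are shell, so the following C fragment is only an illustration of the parsing rule described above, not the helper added by the commit. It is a minimal sketch that assumes a port is a decimal number in 0..65535 and that a range must be written low-high; the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse one decimal port starting at s; *end is set past the digits.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 * The 0..65535 limit is an assumption of this sketch. */
static int example_parse_one(const char *s, char **end, unsigned int *port)
{
    unsigned long v;

    if (*s < '0' || *s > '9')   /* reject empty input, signs, whitespace */
        return -1;
    errno = 0;
    v = strtoul(s, end, 10);
    if (errno || v > 65535)
        return -1;
    *port = (unsigned int)v;
    return 0;
}

/* Parse "N" or "N-M" into [*min, *max]. Returns 0 on success, -1 otherwise. */
static int example_parse_ports(const char *s, unsigned int *min, unsigned int *max)
{
    char *end;

    assert(s && min && max);
    if (example_parse_one(s, &end, min))
        return -1;
    if (*end == '\0') {         /* single port */
        *max = *min;
        return 0;
    }
    if (*end != '-' || example_parse_one(end + 1, &end, max) || *end != '\0')
        return -1;
    return *min <= *max ? 0 : -1;
}
```

Calling example_parse_ports("443-444", &lo, &hi) fills lo and hi with the two bounds, while a reversed range or trailing garbage is rejected.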
2019-07-01net:gue.h:Fix shifting signed 32-bit value by 31 bits problemVandana BN
Fix GUE_PFLAG_REMCSUM to use "U" cast to avoid shifting signed 32-bit value by 31 bits problem. Signed-off-by: Vandana BN <bnvandana@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
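To illustrate the class of problem being fixed: in C, the literal 1 has type int, so shifting it left by 31 bits moves a bit into the sign position of a 32-bit int, which is undefined behaviour. An unsigned operand avoids this. The sketch below uses a hypothetical macro name rather than the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

/* EXAMPLE_FLAG_TOP is a hypothetical name, not the kernel macro.
 * Writing (1 << 31) would shift a signed int into its sign bit;
 * the U suffix makes the shifted operand unsigned, which is well defined. */
#define EXAMPLE_FLAG_TOP (1U << 31)

/* Compile-time check that the flag is exactly the top bit of 32 bits. */
static_assert(EXAMPLE_FLAG_TOP == 0x80000000U, "flag must be bit 31");

/* Returns non-zero when the top flag bit is set in a 32-bit flags word. */
static inline int example_top_flag_set(uint32_t flags)
{
    return (flags & EXAMPLE_FLAG_TOP) != 0;
}
```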
2019-07-01ipv6: icmp: allow flowlabel reflection in echo repliesEric Dumazet
Extend the flowlabel_reflect bitmask to allow conditional reflection of incoming flowlabels in echo replies. Note that this takes precedence over auto flowlabels. Add a flowlabel_reflect enum to replace hard-coded values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01net: dst.h: Fix shifting signed 32-bit value by 31 bits problemVandana BN
Fix DST_FEATURE_ECN_CA to use "U" cast to avoid shifting signed 32-bit value by 31 bits problem. Signed-off-by: Vandana BN <bnvandana@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01Documentation/networking: fix default_ttl typo in mpls-sysctlHangbin Liu
default_ttl should be integer instead of bool Reported-by: Ying Xu <yinxu@redhat.com> Fixes: a59166e47086 ("mpls: allow TTL propagation from IP packets to be configured") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01xfrm: remove get_mtu indirection from xfrm_typeFlorian Westphal
esp4_get_mtu and esp6_get_mtu are exactly the same, the only difference is a single sizeof() (ipv4 vs. ipv6 header). Merge both into xfrm_state_mtu() and remove the indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-06-30net: openvswitch: fix csum updates for MPLS actionsJohn Hurley
Skbs may have their checksum value populated by HW. If this is a checksum calculated over the entire packet then the CHECKSUM_COMPLETE field is marked. Changes to the data pointer on the skb throughout the network stack still try to maintain this complete csum value if it is required through functions such as skb_postpush_rcsum. The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when changes are made to packet data without a push or a pull. This occurs when the ethertype of the MAC header is changed or when MPLS lse fields are modified. The modification is carried out using the csum_partial function to get the csum of a buffer and add it into the larger checksum. The buffer is an inversion of the data to be removed followed by the new data. Because the csum is calculated over 16 bits and these values align with 16 bits, the effect is the removal of the old value from the CHECKSUM_COMPLETE and addition of the new value. However, the csum fed into the function and the outcome of the calculation are also inverted. This would only make sense if it was the new value rather than the old that was inverted in the input buffer. Fix the issue by removing the bit inverts in the csum_partial calculation. The bug was verified and the fix tested by comparing the folded value of the updated CHECKSUM_COMPLETE value with the folded value of a full software checksum calculation (reset skb->csum to 0 and run skb_checksum_complete(skb)). Prior to the fix the outcomes differed but after they produce the same result. Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Fixes: bc7cc5999fd3 ("openvswitch: update checksum in {push,pop}_mpls") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
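The arithmetic behind the fix can be sketched outside the kernel. In ones' complement arithmetic, replacing a 16-bit word in a summed buffer is equivalent to adding the inverse of the old word and then adding the new word, with no further inversion of the running sum or the result. The helpers below are hypothetical stand-ins written for this illustration; they are not csum_partial() or any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t example_fold16(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* Ones' complement sum of n 16-bit words (not inverted), analogous to
 * the value kept for a complete checksum over the packet. */
static uint16_t example_sum(const uint16_t *w, size_t n)
{
    uint32_t acc = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += w[i];
    return example_fold16(acc);
}

/* Update an existing sum after old_word was replaced by new_word:
 * add the inverted old word (removing it) and add the new word.
 * Neither the incoming sum nor the result is inverted. */
static uint16_t example_sum_replace(uint16_t sum, uint16_t old_word,
                                    uint16_t new_word)
{
    return example_fold16((uint32_t)sum + (uint16_t)~old_word + new_word);
}
```

Comparing example_sum_replace() against a full example_sum() over the modified buffer mirrors, in miniature, the verification approach the commit message describes.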
2019-06-30Merge tag 'mlx5e-updates-2019-06-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2019-06-28 This series adds some misc updates for mlx5e driver 1) Allow adding the same mac more than once in MPFS table 2) Move to HW checksumming advertising 3) Report netdevice MPLS features 4) Correct physical port name of the PF representor 5) Reduce stack usage in mlx5_eswitch_termtbl_create 6) Refresh TIR improvement for representors 7) Expose same physical switch_id for all representors ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30Merge branch '10GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2019-06-28 This series contains a smorgasbord of updates to many of the Intel drivers. Gustavo A. R. Silva updates the ice and iavf drivers to use the struct_size() helper where possible. Miguel increases the pause and refresh time for flow control in the e1000e driver during reset for certain devices. Dann Frazier fixes a potential NULL pointer dereference in ixgbe driver when using non-IPSec enabled devices. Colin Ian King fixes a potential overflow during a shift in the ixgbe driver. Also fixes a potential NULL pointer dereference in the iavf driver by adding a check. Venkatesh Srinivas converts the e1000 driver to use dma_wmb() instead of wmb() for doorbell writes to avoid SFENCEs in the transmit and receive paths. Arjan updates the e1000e driver to improve boot time by over 100 msec by reducing the usleep ranges during system startup. Artem updates the igb driver register dump in ethtool, first prepares the register dump for future additions of registers in the dump, then secondly, adds the RR2DCDELAY register to the dump. When dealing with time-sensitive networks, this register is helpful in determining your latency from the device to the ring. Alex fixes the ixgbevf driver to use the current cached link state, rather than trying to re-check the value from the PF. Harshitha adds support for MACVLAN offloads in i40e by using channels as MACVLAN interfaces. Detlev Casanova updates the e1000e driver to use delayed work instead of timers to run the watchdog. Vitaly fixes an issue in e1000e, where when disconnecting and reconnecting the physical cable connection, the NIC enters a DMoff state. This state causes a mismatch in link and duplexing, so check the PCIm function state and perform a PHY reset when in this state to resolve the issue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30Merge branch 'bnxt_en-Bug-fixes'David S. Miller
Michael Chan says: ==================== bnxt_en: Bug fixes. Miscellaneous bug fix patches, including two resource handling fixes for the RDMA driver, a PCI shutdown patch to add pci_disable_device(), a patch to fix ethtool selftest crash, and the last one suppresses an unnecessary error message. Please also queue patches 1, 2, and 3 for -stable. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30bnxt_en: Suppress error messages when querying DSCP DCB capabilities.Michael Chan
Some firmware versions do not support this so use the silent variant to send the message to firmware to suppress the harmless error. This error message is unnecessarily alarming the user. Fixes: afdc8a84844a ("bnxt_en: Add DCBNL DSCP application protocol support.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30bnxt_en: Cap the returned MSIX vectors to the RDMA driver.Michael Chan
In an earlier commit to improve NQ reservations on 57500 chips, we set the resv_irqs on the 57500 VFs to the fixed value assigned by the PF regardless of how many are actually used. The current code assumes that resv_irqs minus the ones used by the network driver must be the ones for the RDMA driver. This is no longer true and we may return more MSIX vectors than requested, causing inconsistency. Fix it by capping the value. Fixes: 01989c6b69d9 ("bnxt_en: Improve NQ reservations.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30bnxt_en: Fix statistics context reservation logic for RDMA driver.Michael Chan
The current logic assumes that the RDMA driver uses one statistics context adjacent to the ones used by the network driver. This assumption is not true and the statistics context used by the RDMA driver is tied to its MSIX base vector. This wrong assumption can cause RDMA driver failure after changing ethtool rings on the network side. Fix the statistics reservation logic accordingly. Fixes: 780baad44f0f ("bnxt_en: Reserve 1 stat_ctx for RDMA driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30bnxt_en: Fix ethtool selftest crash under error conditions.Michael Chan
After ethtool loopback packet tests, we re-open the nic for the next IRQ test. If the open fails, we must not proceed with the IRQ test or we will crash with NULL pointer dereference. Fix it by checking the bnxt_open_nic() return code before proceeding. Reported-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Fixes: 67fea463fd87 ("bnxt_en: Add interrupt test to ethtool -t selftest.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30bnxt_en: Disable bus master during PCI shutdown and driver unload.Michael Chan
Some chips with older firmware can continue to perform DMA read from context memory even after the memory has been freed. In the PCI shutdown method, we need to call pci_disable_device() to shutdown DMA to prevent this DMA before we put the device into D3hot. DMA memory request in D3hot state will generate PCI fatal error. Similarly, in the driver remove method, the context memory should only be freed after DMA has been shutdown for correctness. Fixes: 98f04cf0f1fc ("bnxt_en: Check context memory requirements from firmware.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-30Merge tag 'iwlwifi-next-for-kalle-2019-06-29' of ↵Kalle Valo
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Patches intended for v5.3 * Work on the new debugging framework continues; * Update the FW API for CSI; * Special SAR implementation for South Korea; * Fixes in the module init error paths; * Debugging infra work continues; * A bunch of RF-kill fixes by Emmanuel; * A fix for AP mode, also related to RF-kill, by Johannes. * A few clean-ups; * Other small fixes and improvements;
2019-06-30Merge tag 'mt76-for-kvalo-2019-06-27' of https://github.com/nbd168/wirelessKalle Valo
mt76 patches for 5.3 * use NAPI polling for tx cleanup on mt7603/mt7615 * various fixes for mt7615 * unify some code between mt7603 and mt7615 * fix locking issues on mt76x02 * add support for toggling edcca on mt7603 * fix reading target tx power with ext PA on mt7603/mt7615 * fix initializing channel maximum power * fix rate control / tx status reporting issues on mt76x02/mt7603 * add support for eeprom calibration data from mtd on mt7615 * support configuring tx power on mt7615 * fix external PA support on mt76x0 * per-chain signal reporting on mt7615 * rx/tx buffer fixes for USB devices
2019-06-29r8169: remove not needed call to dma_sync_single_for_deviceHeiner Kallweit
DMA_API_HOWTO.txt includes an example explaining when dma_sync_single_for_device() is not needed, and that example matches our use case. The buffer isn't changed by the CPU and direction is DMA_FROM_DEVICE, so we can remove the call to dma_sync_single_for_device(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29r8169: consider that 32 Bit DMA is the defaultHeiner Kallweit
Documentation/DMA-API-HOWTO.txt states: By default, the kernel assumes that your device can address 32-bits of DMA addressing. For a 64-bit capable device, this needs to be increased, and for a device with limitations, it needs to be decreased. Therefore we don't need the 32 Bit DMA fallback configuration and can remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29r8169: improve handling VLAN tagHeiner Kallweit
The VLAN tag is stored in the descriptor in network byte order. Using swab16 works on little endian host systems only. Better play safe and use ntohs or htons respectively. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
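The point about byte order can be shown in user space. A field stored in network (big-endian) byte order should be converted with ntohs(), which is a byte swap on little-endian hosts and a no-op on big-endian ones; an unconditional swap gives the right answer only on little-endian hosts. This is a simplified sketch with hypothetical function names and a bare two-byte field, not the driver's descriptor layout.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Unconditional byte swap, comparable in effect to the kernel's swab16().
 * Correct for a big-endian field only when the host is little-endian. */
static uint16_t example_swab16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Read a 16-bit field stored in network byte order from raw bytes.
 * ntohs() yields the host-order value regardless of host endianness. */
static uint16_t example_read_be16_field(const uint8_t *bytes)
{
    uint16_t raw;

    assert(bytes != NULL);
    memcpy(&raw, bytes, sizeof(raw));
    return ntohs(raw);
}
```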
2019-06-29net: dsa: mv88e6xxx: wait after reset deactivationBaruch Siach
Add a 1ms delay after reset deactivation. Otherwise the chip returns bogus ID value. This is observed with 88E6390 (Peridot) chip. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29bnx2x: Prevent ptp_task to be rescheduled indefinitelyGuilherme G. Piccoli
Currently the bnx2x ptp worker tries to read a register with timestamp information in case of TX packet timestamping and, in case it fails, the routine reschedules itself indefinitely. This was reported as a kworker always at 100% of CPU usage, which was narrowed down to be bnx2x ptp_task. By following the ioctl handler, we could narrow down the problem to an NTP tool (chrony) requesting HW timestamping from the bnx2x NIC with the RX filter zeroed; this isn't reproducible, for example, with ptp4l (from linuxptp) since this tool requests a supported RX filter. It seems the NIC FW timestamp mechanism cannot work well with RX_FILTER_NONE: the driver's PTP filter init routine skips a register write to the adapter if there's not a supported filter request. This patch addresses the problem of the bnx2x ptp thread's everlasting reschedule by retrying the register read 10 times; between the read attempts the thread sleeps for an increasing amount of time, starting at 1ms, to give the FW some time to perform the timestamping. If it still fails after all retries, we bail out in order to prevent unbounded resource consumption by bnx2x. The patch also adds an ethtool statistic for accounting the skipped TX timestamp packets and reduces the priority of timestamping error messages to prevent log flooding. The code was tested using both linuxptp and chrony. Reported-and-tested-by: Przemyslaw Hausman <przemyslaw.hausman@canonical.com> Suggested-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
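The bounded-retry pattern described above can be sketched generically. The message states 10 attempts and a sleep that grows from 1ms; the doubling schedule, the callback interface, and every name below are assumptions made for this illustration, not the driver's code. The fake hardware helpers exist only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_MAX_TRIES 10

typedef bool (*example_read_fn)(void *ctx);
typedef void (*example_sleep_fn)(unsigned int ms, void *ctx);

/* Try to read up to EXAMPLE_MAX_TRIES times, sleeping between attempts.
 * The delay starts at 1 ms and doubles (doubling is an assumption).
 * Returns the attempt number that succeeded, or 0 after giving up. */
static int example_poll_timestamp(example_read_fn try_read,
                                  example_sleep_fn sleep_ms, void *ctx)
{
    unsigned int delay_ms = 1;

    assert(try_read != NULL && sleep_ms != NULL);
    for (int attempt = 1; attempt <= EXAMPLE_MAX_TRIES; attempt++) {
        if (try_read(ctx))
            return attempt;
        if (attempt < EXAMPLE_MAX_TRIES) {
            sleep_ms(delay_ms, ctx);
            delay_ms *= 2;
        }
    }
    return 0;   /* bail out instead of rescheduling forever */
}

/* Fake hardware used to exercise the loop: becomes ready after a
 * configurable number of reads and records the total time slept. */
struct example_fake_hw {
    int reads_until_ready;
    unsigned int slept_ms;
};

static bool example_fake_read(void *ctx)
{
    struct example_fake_hw *hw = ctx;

    hw->reads_until_ready--;
    return hw->reads_until_ready <= 0;
}

static void example_fake_sleep(unsigned int ms, void *ctx)
{
    struct example_fake_hw *hw = ctx;

    hw->slept_ms += ms;
}
```

The key property is that the worst case is bounded: a device that never becomes ready costs a fixed number of reads and a fixed total sleep, after which the caller can count the packet as skipped.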
2019-06-29selftests: rtnetlink: skip ipsec offload tests if netdevsim isn't presentFlorian Westphal
running the script on systems without netdevsim now prints: SKIP: ipsec_offload can't load netdevsim instead of error message & failed status. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29igmp: fix memory leak in igmpv3_del_delrec()Eric Dumazet
im->tomb and/or im->sources might not be NULL, but we currently overwrite their values blindly. Using swap() will make sure the following call to kfree_pmc(pmc) will properly free the psf structures. Tested with the C repro provided by syzbot, which basically does:
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
setsockopt(3, SOL_IP, IP_ADD_MEMBERSHIP, "\340\0\0\2\177\0\0\1\0\0\0\0", 12) = 0
ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=0}) = 0
setsockopt(3, SOL_IP, IP_MSFILTER, "\340\0\0\2\177\0\0\1\1\0\0\0\1\0\0\0\377\377\377\377", 20) = 0
ioctl(3, SIOCSIFFLAGS, {ifr_name="lo", ifr_flags=IFF_UP}) = 0
exit_group(0) = ?
BUG: memory leak
unreferenced object 0xffff88811450f140 (size 64):
comm "softirq", pid 0, jiffies 4294942448 (age 32.070s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 ................
00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
backtrace:
[<00000000c7bad083>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<00000000c7bad083>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<00000000c7bad083>] slab_alloc mm/slab.c:3326 [inline]
[<00000000c7bad083>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
[<000000009acc4151>] kmalloc include/linux/slab.h:547 [inline]
[<000000009acc4151>] kzalloc include/linux/slab.h:742 [inline]
[<000000009acc4151>] ip_mc_add1_src net/ipv4/igmp.c:1976 [inline]
[<000000009acc4151>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2100
[<000000004ac14566>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2484
[<0000000052d8f995>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:959
[<000000004ee1e21f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1248
[<0000000066cdfe74>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2618
[<000000009383a786>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3126
[<00000000d8ac0c94>] __sys_setsockopt+0x98/0x120 net/socket.c:2072
[<000000001b1e9666>] __do_sys_setsockopt net/socket.c:2083 [inline]
[<000000001b1e9666>] __se_sys_setsockopt net/socket.c:2080 [inline]
[<000000001b1e9666>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2080
[<00000000420d395e>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
[<000000007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Reported-by: syzbot+6ca1abd0db68b5173a4f@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
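The idea behind using a swap instead of a plain assignment can be sketched in user space. If the destination pointer may already own an allocation, overwriting it loses that allocation; swapping hands the old allocation to the donor object, whose teardown then frees it. The struct, macro, and function names below are hypothetical and much simpler than the kernel's igmp structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object that owns a source list. */
struct example_node {
    int *sources;
};

/* Minimal pointer swap for this sketch (the kernel has a generic swap()). */
#define EXAMPLE_SWAP(a, b) do { int *tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Move src's list into dst, then release src.
 * A plain "dst->sources = src->sources" would leak whatever dst already
 * owned. After the swap, src holds dst's previous list (possibly NULL),
 * so freeing src's members also frees that old list. */
static void example_restore(struct example_node *dst, struct example_node *src)
{
    assert(dst != NULL && src != NULL);
    EXAMPLE_SWAP(dst->sources, src->sources);
    free(src->sources);     /* dst's former list, or NULL */
    free(src);
}
```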
2019-06-29Merge branch 'em_ipt-add-support-for-addrtype'David S. Miller
Nikolay Aleksandrov says: ==================== em_ipt: add support for addrtype We would like to be able to use the addrtype from tc for ACL rules and em_ipt seems the best place to add support for the already existing xt match. The biggest issue is that addrtype revision 1 (with ipv6 support) is NFPROTO_UNSPEC and currently em_ipt can't differentiate between v4/v6 if such xt match is used because it passes the match's family instead of the packet one. The first 3 patches make em_ipt match only on IP traffic (currently both policy and addrtype recognize such traffic only) and make it pass the actual packet's protocol instead of the xt match family when it's unspecified. They also add support for NFPROTO_UNSPEC xt matches. The last patch allows to add addrtype rules via em_ipt. We need to keep the user-specified nfproto for dumping in order to be compatible with libxtables, we cannot dump NFPROTO_UNSPEC as the nfproto or we'll get an error from libxtables, thus the nfproto is limited to ipv4/ipv6 in patch 03 and is recorded. v3: don't use the user nfproto for matching, only for dumping, more information is available in the commit message in patch 03 v2: change patch 02 to set the nfproto only when unspecified and drop patch 04 from v1 (Eyal Birger) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29net: sched: em_ipt: add support for addrtype matchingNikolay Aleksandrov
Allow em_ipt to use addrtype for matching. Restrict the use only to revision 1 which has IPv6 support. Since it's a NFPROTO_UNSPEC xt match we use the user-specified nfproto for matching, in case it's unspecified both v4/v6 will be matched by the rule. v2: no changes, was patch 5 in v1 Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29net: sched: em_ipt: keep the user-specified nfproto and dump itNikolay Aleksandrov
If we dump NFPROTO_UNSPEC as nfproto, user-space libxtables can't handle it and would exit with an error like: "libxtables: unhandled NFPROTO in xtables_set_nfproto" In order to avoid the error, return the user-specified nfproto. If we don't record it then the match family is used, which can be NFPROTO_UNSPEC. Even if we add support to mask NFPROTO_UNSPEC in iproute2 we have to be compatible with older versions, which would also be allowed to add NFPROTO_UNSPEC matches (e.g. addrtype after the last patch). v3: don't use the user nfproto for matching, only for dumping the rule, also don't allow the nfproto to be unspecified (explained above) v2: adjust changes to missing patch, was patch 04 in v1 Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29net: sched: em_ipt: set the family based on the packet if it's unspecifiedNikolay Aleksandrov
Set the family based on the packet if it's unspecified otherwise protocol-neutral matches will have wrong information (e.g. NFPROTO_UNSPEC). In preparation for using NFPROTO_UNSPEC xt matches. v2: set the nfproto only when unspecified Suggested-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29net: sched: em_ipt: match only on ip/ipv6 trafficNikolay Aleksandrov
Restrict matching only to ip/ipv6 traffic and make sure we can use the headers, otherwise matches will be attempted on any protocol which can be unexpected by the xt matches. Currently policy supports only ipv4/6. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29Merge branch 'Sub-ns-increment-fixes-in-Macb-PTP'David S. Miller
Harini Katakam says: ==================== Sub ns increment fixes in Macb PTP The subns increment register fields are not captured correctly in the driver. Fix the same and also increase the subns incr resolution. Sub ns resolution was increased to 24 bits in r1p06f2 version. To my knowledge, this PTP driver, with its current BD time stamp implementation, is only useful to that version or above. So, I have increased the resolution unconditionally. Please let me know if there are any IP versions incompatible with this, as there is no register to obtain this information from. Changes from RFC: None ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-29net: macb: Fix SUBNS increment and increase resolutionHarini Katakam
The subns increment register has 24 bits as follows: RegBit[15:0] = Subns[23:8]; RegBit[31:24] = Subns[7:0] Fix the same in the driver and increase sub ns resolution to the best capable, 24 bits. This should be the case on all GEM versions that this PTP driver supports. Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
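The register layout stated above (RegBit[15:0] = Subns[23:8], RegBit[31:24] = Subns[7:0]) can be expressed as a pair of pack/unpack helpers. This is a minimal sketch with hypothetical function names, not the driver's macros; register bits 23:16 are left as zero here because the message does not describe them.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 24-bit sub-nanosecond increment into the register layout:
 *   register bits 15:0  <- subns bits 23:8
 *   register bits 31:24 <- subns bits 7:0
 * Register bits 23:16 are left zero in this sketch. */
static uint32_t example_pack_subns(uint32_t subns)
{
    assert(subns <= 0xffffffu);     /* value must fit in 24 bits */
    return ((subns >> 8) & 0xffffu) | ((subns & 0xffu) << 24);
}

/* Inverse of example_pack_subns(): recover the 24-bit value. */
static uint32_t example_unpack_subns(uint32_t reg)
{
    return ((reg & 0xffffu) << 8) | (reg >> 24);
}
```

For instance, a 24-bit value of 0xABCDEF places 0xABCD in the low half of the register and 0xEF in the top byte.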