summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-09-09igc: Fix wrong timestamp latency numbersVinicius Costa Gomes
The previous timestamping latency numbers were obtained by interpolating the i210 numbers with the i225 crystal clock value. That calculation was wrong. Use the correct values from real measurements. Fixes: 81b055205e8b ("igc: Add support for RX timestamping") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-09-09i40e: always propagate error value in i40e_set_vsi_promisc()Stefan Assmann
The for loop in i40e_set_vsi_promisc() reports errors via dev_err() but does not propagate the error up the call chain. Instead it continues the loop and potentially overwrites the reported error value. This results in the error being recorded in the log buffer, but the caller might never know anything went the wrong way. To avoid this situation i40e_set_vsi_promisc() needs to temporarily store the error after reporting it. This is still not optimal as multiple different errors may occur, so store the first error and hope that's the main issue. Fixes: 37d318d7805f (i40e: Remove scheduling while atomic possibility) Reported-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-09-09i40e: fix return of uninitialized aq_ret in i40e_set_vsi_promiscStefan Assmann
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c: In function ‘i40e_set_vsi_promisc’: drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1176:14: error: ‘aq_ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized] i40e_status aq_ret; In case the code inside the if statement and the for loop does not get executed aq_ret will be uninitialized when the variable gets returned at the end of the function. Avoid this by changing num_vlans from int to u16, so aq_ret always gets set. Making this change in additional places as num_vlans should never be negative. Fixes: 37d318d7805f ("i40e: Remove scheduling while atomic possibility") Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-09-09net: dsa: b53: Report VLAN table occupancy via devlinkFlorian Fainelli
We already maintain an array of VLANs used by the switch so we can simply iterate over it to report the occupancy via devlink. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09Merge branch 'net-qed-disable-aRFS-in-NPAR-and-100G'David S. Miller
Igor Russkikh says: ==================== net: qed disable aRFS in NPAR and 100G This patchset fixes some recent issues found by customers. v3: resending on Dmitry's behalf v2: correct hash in Fixes tag ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: qed: RDMA personality shouldn't fail VF loadDmitry Bogdanov
Fix the assert during VF driver installation when the personality is iWARP Fixes: 1fe614d10f45 ("qed: Relax VF firmware requirements") Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: qede: Disable aRFS for NPAR and 100GDmitry Bogdanov
In some configurations ARFS cannot be used, so disable it if device is not capable. Fixes: e4917d46a653 ("qede: Add aRFS support") Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: qed: Disable aRFS for NPAR and 100GDmitry Bogdanov
In CMT and NPAR the PF is unknown when the GFS block processes the packet. Therefore cannot use searcher as it has a per PF database, and thus ARFS must be disabled. Fixes: d51e4af5c209 ("qed: aRFS infrastructure support") Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09Merge branch 'Marvell-PP2-2-PTP-support'David S. Miller
Russell King says: ==================== Marvell PP2.2 PTP support This series adds PTP support for PP2.2 hardware to the mvpp2 driver. Tested on the Macchiatobin eth1 port. Note that on the Macchiatobin, eth0 uses a separate TAI block from eth1, and there is no hardware synchronisation between the two. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: mvpp2: ptp: add support for transmit timestampingRussell King
Add support for timestamping transmit packets. We allocate SYNC messages to queue 1, every other message to queue 0. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: mvpp2: ptp: add support for receive timestampingRussell King
Add support for receive timestamping. When enabled, the hardware adds a timestamp into the receive queue descriptor for all received packets with no filtering. Hence, we can only support NONE or ALL receive filter modes. The timestamp in the receive queue contains two bit sof seconds and the full nanosecond timestamp. This has to be merged with the remainder of the seconds from the TAI clock to arrive at a full timestamp before we can convert it to a ktime for the skb hardware timestamp field. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: mvpp2: ptp: add TAI supportRussell King
Add support for the TAI block in the mvpp2.2 hardware. Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: mvpp2: check first level interrupt status registersRussell King
Check the first level interrupt status registers to determine how to further process the port interrupt. We will need this to know whether to invoke the link status processing and/or the PTP processing for both XLG and GMAC. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: mvpp2: rename mis-named "link status" interruptRussell King
The link interrupt is used for way more than just the link status; it comes from a collection of units to do with the port. The Marvell documentation describes the interrupt as "GOP port X interrupt". Since we are adding PTP support, and the PTP interrupt uses this, rename it to be more inline with the documentation. This interrupt is also mis-named in the DT binding, but we leave that alone. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: mvpp2: restructure "link status" interrupt handlingRussell King
The "link status" interrupt is used for more than just link status. Restructure mvpp2_link_status_isr() so we can add additional handling. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09Merge branch 'devlink-show-controller-number'David S. Miller
Parav Pandit says: ==================== devlink show controller number Currently a devlink instance that supports an eswitch handles eswitch ports of two type of controllers. (1) controller discovered on same system where eswitch resides. This is the case where PCI PF/VF of a controller and devlink eswitch instance both are located on a single system. (2) controller located on external system. This is the case where a controller is plugged in one system and its devlink eswitch ports are located in a different system. In this case devlink instance of the eswitch only have access to ports of the controller. However, there is no way to describe that a eswitch devlink port belongs to which controller (mainly which external host controller). This problem is more prevalent when port attribute such as PF and VF numbers are overlapping between multiple controllers of same eswitch. Due to this, for a specific switch_id, unique phys_port_name cannot be constructed for such devlink ports. This short series overcomes this limitation by defining two new attributes. (a) external: Indicates if port belongs to external controller (b) controller number: Indicates a controller number of the port Based on this a unique phys_port_name is prepared using controller number. phys_port_name construction using unique controller number is only applicable to external controller ports. This ensures that for non smartnic usecases where there is no external controller, phys_port_name stays same as before. Patch summary: Patch-1 Added mlx5 driver to read controller number Patch-2 Adds the missing comment for the port attributes Patch-3 Move structure comments away from structure fields Patch-4 external attribute added for PCI port flavours Patch-5 Add controller number Patch-6 Use controller number to build phys_port_name --- Changelog: v2->v3: - Updated diagram to get rid of controller 'A' and 'B' - Kept ports of single controller together in diagram - Updated diagram for pf1's VF and SF and its ports v1->v2: - Added text diagram of multiple controllers - Updated example for a VF - Addressed comments from Jiri and Jakub - Moved controller number attribute to PCI port flavours This enables to better, hirerchical view with controller and its PF, VF numbers - Split 'external' and 'controller number' attributes as two different attributes - Merged mlx5_core driver to avoid compiliation break ==================== Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09devlink: Use controller while building phys_port_nameParav Pandit
Now that controller number attribute is available, use it when building phsy_port_name for external controller ports. An example devlink port and representor netdev name consist of controller annotation for external controller with controller number = 1, for a VF 1 of PF 0: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev ens2f0c1pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "ens2f0c1pf0vf1", "flavour": "pcivf", "controller": 1, "pfnum": 0, "vfnum": 1, "external": true, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } } Controller number annotation is skipped for non external controllers to maintain backward compatibility. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09devlink: Introduce controller numberParav Pandit
A devlink port may be for a controller consist of PCI device. A devlink instance holds ports of two types of controllers. (1) controller discovered on same system where eswitch resides This is the case where PCI PF/VF of a controller and devlink eswitch instance both are located on a single system. (2) controller located on external host system. This is the case where a controller is located in one system and its devlink eswitch ports are located in a different system. When a devlink eswitch instance serves the devlink ports of both controllers together, PCI PF/VF numbers may overlap. Due to this a unique phys_port_name cannot be constructed. For example in below such system controller-0 and controller-1, each has PCI PF pf0 whose eswitch ports can be present in controller-0. These results in phys_port_name as "pf0" for both. Similar problem exists for VFs and upcoming Sub functions. An example view of two controller systems: --------------------------------------------------------- | | | --------- --------- ------- ------- | ----------- | | vf(s) | | sf(s) | |vf(s)| |sf(s)| | | server | | ------- ----/---- ---/----- ------- ---/--- ---/--- | | pci rc |=== | pf0 |______/________/ | pf1 |___/_______/ | | connect | | ------- ------- | ----------- | | controller_num=1 (no eswitch) | ------|-------------------------------------------------- (internal wire) | --------------------------------------------------------- | devlink eswitch ports and reps | | ----------------------------------------------------- | | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | | | |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | | | ----------------------------------------------------- | | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | | | |pf1 | pf1vfN | pf1sfN | pf1 | pf1vfN |pf0sfN | | | ----------------------------------------------------- | | | | | | --------- --------- ------- ------- | | | vf(s) | | sf(s) | |vf(s)| |sf(s)| | | ------- ----/---- ---/----- ------- ---/--- ---/--- | | | pf0 |______/________/ | pf1 |___/_______/ | | ------- ------- | | | | local controller_num=0 (eswitch) | --------------------------------------------------------- An example devlink port for external controller with controller number = 1 for a VF 1 of PF 0: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "ens2f0pf0vf1", "flavour": "pcivf", "controller": 1, "pfnum": 0, "vfnum": 1, "external": true, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09devlink: Introduce external controller flagParav Pandit
A devlink eswitch port may represent PCI PF/VF ports of a controller. A controller either located on same system or it can be an external controller located in host where such NIC is plugged in. Add the ability for driver to specify if a port is for external controller. Use such flag in the mlx5_core driver. An example of an external controller having VF1 of PF0 belong to controller 1. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "ens2f0pf0vf1", "flavour": "pcivf", "pfnum": 0, "vfnum": 1, "external": true, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09devlink: Move structure comments outside of structureParav Pandit
To add more fields to the PCI PF and VF port attributes, follow standard structure comment format. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09devlink: Add comment block for missing port attributesParav Pandit
Add comment block for physical, PF and VF port attributes. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net/mlx5: E-switch, Read controller number from deviceParav Pandit
ECPF supports one external host controller. Read controller number from the device. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: stmmac: dwmac-intel-plat: remove redundant null check before ↵Zhang Changzhong
clk_disable_unprepare() Because clk_prepare_enable() and clk_disable_unprepare() already checked NULL clock parameter, so the additional checks are unnecessary, just remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: pxa168_eth: remove redundant null check before clk_disable_unprepare()Zhang Changzhong
Because clk_prepare_enable() and clk_disable_unprepare() already checked NULL clock parameter, so the additional checks are unnecessary, just remove them. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09Merge branch 'SMSC-Cleanups-and-clock-setup'David S. Miller
Marco Felsch says: ==================== SMSC: Cleanups and clock setup this small series cleans the smsc-phy code a bit and adds the support to specify the phy clock source. Adding the phy clock source support is also the main purpose of this series. Each file has its own changelog. Thanks a lot to Florian and Andrew for reviewing it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: phy: smsc: LAN8710/20: remove PHY_RST_AFTER_CLK_EN flagMarco Felsch
Don't reset the phy without respect to the PHY library state machine because this breaks the phy IRQ mode. The same behaviour can be archived now by specifying the refclk. Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: phy: smsc: LAN8710/20: add phy refclk in supportMarco Felsch
Add support to specify the clock provider for the PHY refclk and don't rely on 'magic' host clock setup. [1] tried to address this by introducing a flag and fixing the corresponding host. But this commit breaks the IRQ support since the irq setup during .config_intr() is thrown away because the reset comes from the side without respecting the current PHY state within the PHY library state machine. Furthermore the commit fixed the problem only for FEC based hosts other hosts acting like the FEC are not covered. This commit goes the other way around to address the bug fixed by [1]. Instead of resetting the device from the side every time the refclk gets (re-)enabled it requests and enables the clock till the device gets removed. Now the PHY library is the only place where the PHY gets reset to respect the PHY library state machine. [1] commit 7f64e5b18ebb ("net: phy: smsc: LAN8710/20: add PHY_RST_AFTER_CLK_EN flag") Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09dt-bindings: net: phy: smsc: document reference clockMarco Felsch
Add support to specify the reference clock for the phy. Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: phy: smsc: simplify config_init callbackMarco Felsch
Exit the driver specific config_init hook early if energy detection is disabled. We can do this because we don't need to clear the interrupt status here. Clearing the status should be removed anyway since this is handled by the phy_enable_interrupts(). Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: phy: smsc: skip ENERGYON interrupt if disabledMarco Felsch
Don't enable the interrupt if the platform disable the energy detection by "smsc,disable-energy-detect". Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09arm64: dts: ns2: Fixed QSPI compatible stringFlorian Fainelli
The string was incorrectly defined before from least to most specific, swap the compatible strings accordingly. Fixes: ff73917d38a6 ("ARM64: dts: Add QSPI Device Tree node for NS2") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2020-09-09ARM: dts: BCM5301X: Fixed QSPI compatible stringFlorian Fainelli
The string was incorrectly defined before from least to most specific, swap the compatible strings accordingly. Fixes: 1c8f40650723 ("ARM: dts: BCM5301X: convert to iProc QSPI") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2020-09-09ARM: dts: NSP: Fixed QSPI compatible stringFlorian Fainelli
The string was incorrectly defined before from least to most specific, swap the compatible strings accordingly. Fixes: 329f98c1974e ("ARM: dts: NSP: Add QSPI nodes to NSPI and bcm958625k DTSes") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2020-09-09ARM: dts: bcm: HR2: Fixed QSPI compatible stringFlorian Fainelli
The string was incorrectly defined before from least to most specific, swap the compatible strings accordingly. Fixes: b9099ec754b5 ("ARM: dts: Add Broadcom Hurricane 2 DTS include file") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2020-09-09dt-bindings: spi: Fix spi-bcm-qspi compatible orderingFlorian Fainelli
The binding is currently incorrectly defining the compatible strings from least specifice to most specific instead of the converse. Re-order them from most specific (left) to least specific (right) and fix the examples as well. Fixes: 5fc78f4c842a ("spi: Broadcom BRCMSTB, NSP, NS2 SoC bindings") Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2020-09-09net: cavium: Fix a bunch of kerneldoc parameter issuesWang Hai
Rename ptp to ptp_info. Fix W=1 compile warnings (invalid kerneldoc): drivers/net/ethernet/cavium/common/cavium_ptp.c:94: warning: Excess function parameter 'ptp' description in 'cavium_ptp_adjfine' drivers/net/ethernet/cavium/common/cavium_ptp.c:141: warning: Excess function parameter 'ptp' description in 'cavium_ptp_adjtime' drivers/net/ethernet/cavium/common/cavium_ptp.c:163: warning: Excess function parameter 'ptp' description in 'cavium_ptp_gettime' drivers/net/ethernet/cavium/common/cavium_ptp.c:185: warning: Excess function parameter 'ptp' description in 'cavium_ptp_settime' drivers/net/ethernet/cavium/common/cavium_ptp.c:208: warning: Excess function parameter 'ptp' description in 'cavium_ptp_enable' Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09Merge tag 'wireless-drivers-2020-09-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.9 First set of fixes for v5.9, small but important. brcmfmac * fix a throughput regression on bcm4329 mt76 * fix a regression with stations reconnecting on mt7616 * properly free tx skbs, it was working by accident before mwifiex * fix a regression with 256 bit encryption keys wlcore * revert AES CMAC support as it caused a regression ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09Merge branch 'wireguard-fixes'David S. Miller
Jason A. Donenfeld says: ==================== wireguard fixes for 5.9-rc5 Yesterday, Eric reported a race condition found by syzbot. This series contains two commits, one that fixes the direct issue, and another that addresses the more general issue, as a defense in depth. 1) The basic problem syzbot unearthed was that one particular mutation of handshake->entry was not protected by the handshake mutex like the other cases, so this patch basically just reorders a line to make sure the mutex is actually taken at the right point. Most of the work here went into making sure the race was fully understood and making a reproducer (which syzbot was unable to do itself, due to the rarity of the race). 2) Eric's initial suggestion for fixing this was taking a spinlock around the hash table replace function where the null ptr deref was happening. This doesn't address the main problem in the most precise possible way like (1) does, but it is a good suggestion for defense-in-depth, in case related issues come up in the future, and basically costs nothing from a performance perspective. I thought it aided in implementing a good general rule: all mutators of that hash table take the table lock. So that's part of this series as a companion. Both of these contain Fixes: tags and are good candidates for stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09wireguard: peerlookup: take lock before checking hash in replace operationJason A. Donenfeld
Eric's suggested fix for the previous commit's mentioned race condition was to simply take the table->lock in wg_index_hashtable_replace(). The table->lock of the hash table is supposed to protect the bucket heads, not the entires, but actually, since all the mutator functions are already taking it, it makes sense to take it too for the test to hlist_unhashed, as a defense in depth measure, so that it no longer races with deletions, regardless of what other locks are protecting individual entries. This is sensible from a performance perspective because, as Eric pointed out, the case of being unhashed is already the unlikely case, so this won't add common contention. And comparing instructions, this basically doesn't make much of a difference other than pushing and popping %r13, used by the new `bool ret`. More generally, I like the idea of locking consistency across table mutator functions, and this might let me rest slightly easier at night. Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/ Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09wireguard: noise: take lock when removing handshake entry from tableJason A. Donenfeld
Eric reported that syzkaller found a race of this variety: CPU 1 CPU 2 -------------------------------------------|--------------------------------------- wg_index_hashtable_replace(old, ...) | if (hlist_unhashed(&old->index_hash)) | | wg_index_hashtable_remove(old) | hlist_del_init_rcu(&old->index_hash) | old->index_hash.pprev = NULL hlist_replace_rcu(&old->index_hash, ...) | *old->index_hash.pprev | Syzbot wasn't actually able to reproduce this more than once or create a reproducer, because the race window between checking "hlist_unhashed" and calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or similar there helps make this demonstrable using this simple script: #!/bin/bash set -ex trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT ip link add wg0 type wireguard ip link add wg1 type wireguard wg set wg0 private-key <(wg genkey) listen-port 9999 wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1 wg set wg0 peer $(wg show wg1 public-key) ip link set wg0 up yes link set wg1 up | ip -force -batch - & pid1=$! yes link set wg1 down | ip -force -batch - & pid2=$! wait The fundumental underlying problem is that we permit calls to wg_index_ hashtable_remove(handshake.entry) without requiring the caller to take the handshake mutex that is intended to protect members of handshake during mutations. This is consistently the case with calls to wg_index_ hashtable_insert(handshake.entry) and wg_index_hashtable_replace( handshake.entry), but it's missing from a pertinent callsite of wg_ index_hashtable_remove(handshake.entry). So, this patch makes sure that mutex is taken. The original code was a little bit funky though, in the form of: remove(handshake.entry) lock(), memzero(handshake.some_members), unlock() remove(handshake.entry) The original intention of that double removal pattern outside the lock appears to be some attempt to prevent insertions that might happen while locks are dropped during expensive crypto operations, but actually, all callers of wg_index_hashtable_insert(handshake.entry) take the write lock and then explicitly check handshake.state, as they should, which the aforementioned memzero clears, which means an insertion should already be impossible. And regardless, the original intention was necessarily racy, since it wasn't guaranteed that something else would run after the unlock() instead of after the remove(). So, from a soundness perspective, it seems positive to remove what looks like a hack at best. The crash from both syzbot and from the script above is as follows: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline] RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174 Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5 RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010 RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000 R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000 R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500 FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820 wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline] wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/ Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09cxgb4/ch_ipsec: Registering xfrmdev_ops with cxgb4Ayush Sawal
As ch_ipsec was removed without clearing xfrmdev_ops and netdev feature(esp-hw-offload). When a recalculation of netdev feature is triggered by changing tls feature(tls-hw-tx-offload) from user request, it causes a page fault due to absence of valid xfrmdev_ops. Fixes: 6dad4e8ab3ec ("chcr: Add support for Inline IPSec") Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09perf: Stop using deprecated bpf_program__title()Andrii Nakryiko
Switch from deprecated bpf_program__title() API to bpf_program__section_name(). Also drop unnecessary error checks because neither bpf_program__title() nor bpf_program__section_name() can fail or return NULL. Fixes: 521095842027 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20200908180127.1249-1-andriin@fb.com
2020-09-09Merge branch 'ksz9477-dsa-switch-driver-improvements'David S. Miller
Paul Barker says: ==================== ksz9477 dsa switch driver improvements These changes were made while debugging the ksz9477 driver for use on a custom board which uses the ksz9893 switch supported by this driver. The patches have been runtime tested on top of Linux 5.8.4, I couldn't runtime test them on top of 5.9-rc3 due to unrelated issues. They have been build tested on top of net-next. These changes can also be pulled from: https://gitlab.com/pbarker.dev/staging/linux.git tag: for-net-next/ksz-v3_2020-09-09 Changes from v2: * Fixed incorrect type in assignment error. Reported-by: kernel test robot <lkp@intel.com> Changes from v1: * Rebased onto net-next. * Dropped unnecessary `#include <linux/printk.h>`. * Instead of printing the phy mode in `ksz9477_port_setup()`, modify the existing print in `ksz9477_config_cpu_port()` to always produce output and to be more clear. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: dsa: microchip: Implement recommended reset timingPaul Barker
The datasheet for the ksz9893 and ksz9477 switches recommend waiting at least 100us after the de-assertion of reset before trying to program the device through any interface. Also switch the existing msleep() call to usleep_range() as recommended in Documentation/timers/timers-howto.rst. The 2ms range used here is somewhat arbitrary, as long as the reset is asserted for at least 10ms we should be ok. Signed-off-by: Paul Barker <pbarker@konsulko.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: dsa: microchip: Disable RGMII in-band status on KSZ9893Paul Barker
We can't assume that the link partner supports the in-band status reporting which is enabled by default on the KSZ9893 when using RGMII for the upstream port. Signed-off-by: Paul Barker <pbarker@konsulko.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: dsa: microchip: Improve phy mode messagePaul Barker
Always print the selected phy mode for the CPU port when using the ksz9477 driver. If the phy mode was changed, also print the previous mode to aid in debugging. To make the message more clear, prefix it with the port number which it applies to and improve the language a little. Signed-off-by: Paul Barker <pbarker@konsulko.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09net: dsa: microchip: Make switch detection more informativePaul Barker
To make switch detection more informative print the result of the ksz9477/ksz9893 compatibility check. With debug output enabled also print the contents of the Chip ID registers as a 40-bit hex string. As this detection is the first communication with the switch performed by the driver, making it easy to see any errors here will help identify issues with SPI data corruption or reset sequencing. Signed-off-by: Paul Barker <pbarker@konsulko.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09selftests/bpf: Fix test_sysctl_loop{1, 2} failure due to clang changeYonghong Song
Andrii reported that with latest clang, when building selftests, we have error likes: error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*): Looks like the BPF stack limit of 512 bytes is exceeded. Please move large on stack variables into BPF per-cpu array map. The error is triggered by the following LLVM patch: https://reviews.llvm.org/D87134 For example, the following code is from test_sysctl_loop1.c: static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx) { volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string"; ... } Without the above LLVM patch, the compiler did optimization to load the string (59 bytes long) with 7 64bit loads, 1 8bit load and 1 16bit load, occupying 64 byte stack size. With the above LLVM patch, the compiler only uses 8bit loads, but subregister is 32bit. So stack requirements become 4 * 59 = 236 bytes. Together with other stuff on the stack, total stack size exceeds 512 bytes, hence compiler complains and quits. To fix the issue, removing "volatile" key word or changing "volatile" to "const"/"static const" does not work, the string is put in .rodata.str1.1 section, which libbpf did not process it and errors out with libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1 libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name' in section '.rodata.str1.1' Defining the string const as global variable can fix the issue as it puts the string constant in '.rodata' section which is recognized by libbpf. In the future, when libbpf can process '.rodata.str*.*' properly, the global definition can be changed back to local definition. Defining tcp_mem_name as a global, however, triggered a verifier failure. ./test_progs -n 7/21 libbpf: load bpf program failed: Permission denied libbpf: -- BEGIN DUMP LOG --- libbpf: invalid stack off=0 size=1 verification time 6975 usec stack depth 160+64 processed 889 insns (limit 1000000) max_states_per_insn 4 total_states 14 peak_states 14 mark_read 10 libbpf: -- END LOG -- libbpf: failed to load program 'sysctl_tcp_mem' libbpf: failed to load object 'test_sysctl_loop2.o' test_bpf_verif_scale:FAIL:114 #7/21 test_sysctl_loop2.o:FAIL This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code like const char tcp_mem_name[] = "<...long string...>"; ... char name[64]; ... for (i = 0; i < sizeof(tcp_mem_name); ++i) if (name[i] != tcp_mem_name[i]) return 0; In the above code, if sizeof(tcp_mem_name) > 64, name[i] access may be out of bound. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and 79 for test_sysctl_loop2.c. Without promotion-to-global change, old compiler generates code where the overflowed stack access is actually filled with valid value, so hiding the bpf program bug. With promotion-to-global change, the code is different, more specifically, the previous loading constants to stack is gone, and "name" occupies stack[-64:0] and overflow access triggers a verifier error. To fix the issue, adjust "name" buffer size properly. Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200909171542.3673449-1-yhs@fb.com
2020-09-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT, from Michael Zhou. 2) do_ip_vs_set_ctl() dereferences uninitialized value, from Peilin Ye. 3) Support for userdata in tables, from Jose M. Guisado. 4) Do not increment ct error and invalid stats at the same time, from Florian Westphal. 5) Remove ct ignore stats, also from Florian. 6) Add ct stats for clash resolution, from Florian Westphal. 7) Bump reference counter bump on ct clash resolution only, this is safe because bucket lock is held, again from Florian. 8) Use ip_is_fragment() in xt_HMARK, from YueHaibing. 9) Add wildcard support for nft_socket, from Balazs Scheidler. 10) Remove superfluous IPVS dependency on iptables, from Yaroslav Bolyukin. 11) Remove unused definition in ebt_stp, from Wang Hai. 12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT in selftests/net, from Fabian Frederick. 13) Add userdata support for nft_object, from Jose M. Guisado. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-09hsr: avoid newline at end of message in NL_SET_ERR_MSG_MODYe Bin
clean follow coccicheck warning: net//hsr/hsr_netlink.c:94:8-42: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD net//hsr/hsr_netlink.c:87:30-57: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD net//hsr/hsr_netlink.c:79:29-53: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>