linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message	Author
2022-02-14	net: dev: Remove preempt_disable() and get_cpu() in netif_rx_internal().	Sebastian Andrzej Siewior
	The preempt_disable() section was introduced in commit cece1945bffcf ("net: disable preemption before call smp_processor_id()") and adds it in case this function is invoked from preemptible context and because get_cpu() was added later on. The get_cpu() usage was added in commit b0e28f1effd1d ("net: netif_rx() must disable preemption") because ip_dev_loopback_xmit() invoked netif_rx() with preemption enabled, causing a warning in smp_processor_id(). The function netif_rx() should only be invoked from an interrupt context, which implies disabled preemption. The commit e30b38c298b55 ("ip: Fix ip_dev_loopback_xmit()") addressed this and replaced netif_rx() with netif_rx_ni() in ip_dev_loopback_xmit(). Based on the discussion on the list, the former patch (b0e28f1effd1d) should not have been applied, only the latter (e30b38c298b55). Remove get_cpu() and preempt_disable() since the function is supposed to be invoked from a context with stable per-CPU pointers. Bottom halves have to be disabled at this point because the function may raise softirqs which need to be processed. Link: https://lkml.kernel.org/r/20100415.013347.98375530.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	ice: Simplify tracking status of RDMA support	Dave Ertman
	The status of support for RDMA is currently being tracked with two separate status flags. This is unnecessary with the current state of the driver. Simplify status tracking down to a single flag. Rename the helper function to denote the RDMA specific status and universally use the helper function to test the status bit. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	Merge branch 'ocelot-stats'	David S. Miller
	Colin Foster says: ==================== use bulk reads for ocelot statistics Ocelot loops over memory regions to gather stats on different ports. These regions are mostly contiguous, and are ordered. This patch set uses that information to break the stats reads into regions that can get read in bulk. The motivation is general cleanup, but also SPI. Performing two back-to-back reads on a SPI bus requires toggling the CS line, holding, re-toggling the CS line, sending 3 address bytes, sending N padding bytes, then actually performing the read. Bulk reads could reduce almost all of that overhead, but require that the reads are performed via regmap_bulk_read. Verified with eth0 hooked up to the CPU port: NIC statistics: Good Rx Frames: 905 Rx Octets: 78848 Good Tx Frames: 691 Tx Octets: 52516 Rx + Tx 65-127 Octet Frames: 1574 Rx + Tx 128-255 Octet Frames: 22 Net Octets: 131364 Rx DMA chan 0: head_enqueue: 1 Rx DMA chan 0: tail_enqueue: 1032 Rx DMA chan 0: busy_dequeue: 628 Rx DMA chan 0: good_dequeue: 905 Tx DMA chan 0: head_enqueue: 346 Tx DMA chan 0: tail_enqueue: 345 Tx DMA chan 0: misqueued: 345 Tx DMA chan 0: empty_dequeue: 346 Tx DMA chan 0: good_dequeue: 691 p00_rx_octets: 52516 p00_rx_unicast: 691 p00_rx_frames_65_to_127_octets: 691 p00_tx_octets: 78848 p00_tx_unicast: 905 p00_tx_frames_65_to_127_octets: 883 p00_tx_frames_128_255_octets: 22 p00_tx_green_prio_0: 905 And with swp2 connected to swp3 with STP enabled: NIC statistics: tx_packets: 379 tx_bytes: 19708 rx_packets: 1 rx_bytes: 46 rx_octets: 64 rx_multicast: 1 rx_frames_below_65_octets: 1 rx_classified_drops: 1 tx_octets: 44630 tx_multicast: 387 tx_broadcast: 290 tx_frames_below_65_octets: 379 tx_frames_65_to_127_octets: 294 tx_frames_128_255_octets: 4 tx_green_prio_0: 298 tx_green_prio_7: 379 NIC statistics: tx_packets: 1 tx_bytes: 52 rx_packets: 713 rx_bytes: 34148 rx_octets: 46982 rx_multicast: 407 rx_broadcast: 306 rx_frames_below_65_octets: 399 rx_frames_65_to_127_octets: 310 
rx_frames_128_to_255_octets: 4 rx_classified_drops: 399 rx_green_prio_0: 314 tx_octets: 64 tx_multicast: 1 tx_frames_below_65_octets: 1 tx_green_prio_7: 1 v1 > v2: reword commit messages v2 > v3: correctly mark this for net-next when sending v3 > v4: calloc array instead of zalloc per review v4 > v5: Apply CR suggestions for whitespace Fix calloc / zalloc mixup Properly destroy workqueues Add third commit to split long macros v5 > v6: Fix functionality - v5 was improperly tested Add bugfix for ethtool mutex lock Remove unnecessary ethtool stats reads v6 > v7: Remove mutex bug patch that was applied via net Rename function based on CR Add missed error check ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	net: mscc: ocelot: use bulk reads for stats	Colin Foster
	Create and utilize bulk regmap reads instead of single access for gathering stats. The background reading of statistics happens frequently, and over a few contiguous memory regions. High speed PCIe buses and MMIO access will probably see negligible performance increase. Lower speed buses like SPI and I2C could see significant performance increase, since the bus configuration and register access times account for a large percentage of data transfer time. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
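The idea of splitting ordered, mostly contiguous statistics registers into bulk-readable regions can be sketched in plain C. This is a minimal userspace illustration, not the driver's code: `struct stats_region` and `build_regions()` are hypothetical names, and the register indices in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one bulk read: first register and count. */
struct stats_region {
	uint32_t first;	/* index of the first register in the run */
	size_t count;	/* number of consecutive registers in the run */
};

/*
 * Split a sorted array of register indices into runs of consecutive
 * indices. Returns the number of regions written to out[] (at most max).
 */
static size_t build_regions(const uint32_t *regs, size_t n,
			    struct stats_region *out, size_t max)
{
	size_t nregions = 0;

	for (size_t i = 0; i < n; i++) {
		if (nregions > 0 &&
		    regs[i] == out[nregions - 1].first + out[nregions - 1].count) {
			out[nregions - 1].count++;	/* extends the current run */
			continue;
		}
		if (nregions == max)
			break;				/* no room for another region */
		out[nregions].first = regs[i];
		out[nregions].count = 1;
		nregions++;
	}
	return nregions;
}
```

With the indices 0, 1, 2, 3, 64, 65, 128 this yields three regions, so seven single reads would collapse into three bulk reads. Each region could then be handed to one bulk accessor call such as regmap_bulk_read.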
2022-02-14	net: mscc: ocelot: add ability to perform bulk reads	Colin Foster
	Regmap supports bulk register reads. Ocelot does not. This patch adds support for Ocelot to invoke bulk regmap reads. That will allow any driver that performs consecutive reads over memory regions to optimize that access. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	net: ocelot: align macros for consistency	Colin Foster
	In the ocelot.h file, several read / write macros were split across multiple lines, while others weren't. Split all macros that exceed the 80 character column width and match the style of the rest of the file. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	net: mscc: ocelot: remove unnecessary stat reading from ethtool	Colin Foster
	The ocelot_update_stats function only needs to read from one port, yet it was updating the stats for all ports. Update to only read the stats that are necessary. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	ipv6: Add reasons for skb drops to __udp6_lib_rcv	David Ahern
	Add reasons to __udp6_lib_rcv for skb drops. The only twist is that the NO_SOCKET takes precedence over the CSUM or other counters for that path (motivation behind this patch - csum counter was misleading). Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	Merge branch 'dm9051'	David S. Miller
	Joseph CHAMG says: ==================== ADD DM9051 ETHERNET DRIVER DM9051 is an SPI interface chip that needs CS/MOSI/MISO/clock lines and an interrupt GPIO pin ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	net: Add dm9051 driver	Joseph CHAMG
	Add the Davicom DM9051 SPI Ethernet driver. The driver works on device platforms that have an SPI master. Signed-off-by: Joseph CHAMG <josright123@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	dt-bindings: net: Add Davicom dm9051 SPI ethernet controller	Joseph CHAMG
	This is a new YAML-based data file for configuring the Davicom DM9051 with device tree. Signed-off-by: Joseph CHAMG <josright123@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	net/smc: Add comment for smc_tx_pending	Tony Lu
	The previous patch introduces a lock-free version of smc_tx_work() to avoid unnecessary lock contention; it expects the lock to be held by the caller. So this adds a comment to remind people to keep an eye out for locks. Suggested-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	Generate netlink notification when default IPv6 route preference changes	Kalash Nainwal
	Generate RTM_NEWROUTE netlink notification when the route preference changes on an existing kernel generated default route in response to RA messages. Currently netlink notifications are generated only when this route is added or deleted but not when the route preference changes, which can cause userspace routing application state to go out of sync with kernel. Signed-off-by: Kalash Nainwal <kalash@arista.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14	net/sched: act_police: more accurate MTU policing	Davide Caratti
	In current Linux, MTU policing does not take into account that packets at the TC ingress have the L2 header pulled. Thus, the same TC police action (with the same value of tcfp_mtu) behaves differently for ingress/egress. In addition, the full GSO size is compared to tcfp_mtu: as a consequence, the policer drops GSO packets even when individual segments have the L2 + L3 + L4 + payload length below the configured value of tcfp_mtu. Improve the accuracy of MTU policing as follows: - account for mac_len for non-GSO packets at TC ingress. - compare MTU threshold with the segmented size for GSO packets. Also, add a kselftest that verifies the correct behavior. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
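The two adjustments listed in this commit can be sketched as a small length check. This is a simplified userspace model under stated assumptions: `struct pkt` and `pkt_within_mtu()` are hypothetical and do not mirror struct sk_buff or the actual act_police code, and `gso_seg_len` is assumed to already hold the largest per-segment length including all headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a packet; not the kernel's struct sk_buff. */
struct pkt {
	uint32_t len;		/* length as currently seen by the policer */
	uint32_t mac_len;	/* L2 header length */
	uint32_t gso_seg_len;	/* largest segment incl. headers; 0 if not GSO */
	bool at_ingress;	/* true when the L2 header has been pulled */
};

static bool pkt_within_mtu(const struct pkt *p, uint32_t limit)
{
	uint32_t len;

	if (p->gso_seg_len)	/* GSO: judge one segment, not the aggregate */
		return p->gso_seg_len <= limit;

	len = p->len;
	if (p->at_ingress)	/* add back the pulled L2 header */
		len += p->mac_len;
	return len <= limit;
}
```

In this model a 1500-byte payload with a 14-byte L2 header is treated as 1514 bytes on both ingress and egress, so the same limit gives the same verdict in either direction.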
2022-02-13	etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overread	Kees Cook
	With GCC 12, -Wstringop-overread was warning about an implicit cast from char[6] to char[8]. However, the extra 2 bytes are always thrown away, alignment doesn't matter, and the risk of hitting the edge of unallocated memory has been accepted, so this prototype can just be converted to a regular char *. Silences: net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread] 4618 \| orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’ 375 \| static inline bool ether_addr_equal_64bits(const u8 addr1[6+2], \| ^~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
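Why the comparison reads 8 bytes from a 6-byte address, and why the extra 2 bytes do not matter, can be shown with a userspace sketch. `mac_equal_64bits()` is a hypothetical stand-in, not the kernel's ether_addr_equal_64bits(); it assumes both buffers have at least 8 readable bytes and relies on the GCC/Clang `__BYTE_ORDER__` macro, falling back to little-endian when that macro is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two 6-byte MAC addresses with one 64-bit load each. Both
 * pointers must reference at least 8 readable bytes; the 2 trailing
 * bytes are loaded and then shifted out of the result.
 */
static bool mac_equal_64bits(const uint8_t *a, const uint8_t *b)
{
	uint64_t x, y, fold;

	memcpy(&x, a, sizeof(x));	/* alignment-safe 8-byte load */
	memcpy(&y, b, sizeof(y));
	fold = x ^ y;			/* zero bits where the inputs agree */

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (fold >> 16) == 0;	/* address sits in the high 48 bits */
#else
	return (fold << 16) == 0;	/* address sits in the low 48 bits */
#endif
}
```

Two buffers that share the first six bytes compare equal even when bytes 6 and 7 differ, which is the sense in which the extra bytes are thrown away.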
2022-02-13	net: lan966x: Fix when CONFIG_IPV6 is not set	Horatiu Vultur
	When CONFIG_IPV6 is not set, then the linking of the lan966x driver fails with the following error: drivers/net/ethernet/microchip/lan966x/lan966x_main.c:444: undefined reference to `ipv6_mc_check_mld' The fix consists in adding a check also for IS_ENABLED(CONFIG_IPV6) Fixes: 47aeea0d57e80c ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	net: lan966x: Fix when CONFIG_PTP_1588_CLOCK is compiled as module	Horatiu Vultur
	When CONFIG_PTP_1588_CLOCK is compiled as a module, then the linking of the lan966x fails because it can't find references to the following functions 'ptp_clock_index', 'ptp_clock_register' and 'ptp_clock_unregister' The fix consists in adding CONFIG_PTP_1588_CLOCK_OPTIONAL as a dependency for the driver. Fixes: d096459494a887 ("net: lan966x: Add support for ptp clocks") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	Merge branch 'lan743x-enhancements'	David S. Miller
	Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series adds support of the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. The PCI1xxxx family of devices consists of a PCIe switch with a variety of embedded PCI endpoints on its downstream ports. The PCI11010 / PCI11414 devices include an Ethernet 10/100/1000/2500 function as one of those embedded endpoints. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	net: lan743x: Add support for Clause-45 MDIO PHY management	Raju Lakkaraju
	Add support for Clause-45 MDIO PHY management Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	net: lan743x: Add support for SGMII interface	Raju Lakkaraju
	This change facilitates the selection between SGMII and (R)GMII interfaces. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	net: lan743x: Increase MSI(x) vectors to 16 and Int de-assertion timers to 10	Raju Lakkaraju
	Increase MSI / MSI-X vectors supported from 8 to 16 and Interrupt De-assertion timers from 8 to 10 Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	net: lan743x: Add support for 4 Tx queues	Raju Lakkaraju
	Add support for 4 Tx queues Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	net: lan743x: Add PCI11010 / PCI11414 device IDs	Raju Lakkaraju
	PCI11010/PCI11414 devices are enhancement of Ethernet LAN743x chip family. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-13	net: wwan: iosm: Enable M.2 7360 WWAN card support	M Chetan Kumar
	This patch enables Intel M.2 7360 WWAN card support on the IOSM driver. The control path implementation is reused, whereas the data path implementation uses a different protocol called MUX Aggregation. The major portion of this patch covers the MUX Aggregation protocol implementation used for IP traffic communication. For the M.2 7360 WWAN card, the driver exposes 2 wwan AT ports for control communication. The user space application or the modem manager uses a wwan AT port for data path establishment. During probe, the driver reads the mux protocol device capability register to learn the mux protocol version supported by the device, based on which the right mux protocol is initialized for data path communication. An overview of the Aggregation Protocol: 1> An IP packet is encapsulated with a 16 octet padding header to form a Datagram & the start offset of the Datagram is indexed into the Datagram Header (DH). 2> Multiple such Datagrams are composed & the start offset of each DH is indexed into the Datagram Table Header (DTH). 3> The Datagram Table (DT) is IP session specific & the table_length item in the DTH holds the number of composed datagrams pertaining to that particular IP session. 4> And finally the offset of the first DTH is indexed into the DBH (Datagram Block Header). So in the TX/RX flow a Datagram Block (Datagram Block Header + Payload) is exchanged between driver & device. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	Revert "net: ethernet: cavium: use div64_u64() instead of do_div()"	Jakub Kicinski
	This reverts commit 038fcdaf0470de89619bc4cc199e329391e6566c. Christophe points out div64_u64() and do_div() have different calling conventions. One updates the param, the other returns the result. Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/all/056a7276-c6f0-cd7e-9e46-1d8507a0b6b1@wanadoo.fr/ Fixes: 038fcdaf0470 ("net: ethernet: cavium: use div64_u64() instead of do_div()") Link: https://lore.kernel.org/r/20220211020544.3262694-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
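The difference in calling conventions that motivated this revert can be illustrated with two userspace stand-ins. These are hypothetical helpers, not the kernel definitions: the real do_div() is a macro that takes the dividend as an lvalue rather than a pointer, but the in-place update and the remainder return value are modelled the same way here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for do_div(): overwrites *n with the quotient, returns the remainder. */
static uint32_t do_div_like(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Stand-in for div64_u64(): leaves its arguments alone, returns the quotient. */
static uint64_t div64_u64_like(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}
```

Swapping one for the other without touching the call site changes behaviour: code written for the in-place style expects the dividend variable to hold the quotient afterwards, while the returning style leaves that variable unchanged and its result is lost if the return value is ignored.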
2022-02-11	net: moxa: use GFP_KERNEL	Julia Lawall
	Platform_driver probe functions aren't called with locks held and thus don't need GFP_ATOMIC. Use GFP_KERNEL instead. Problem found with Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220210204223.104181-1-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-11	octeontx2-af: fix array bound error	Hariprasad Kelam
	This patch fixes below error by using proper data type. drivers/net/ethernet/marvell/octeontx2/af/rpm.c: In function 'rpm_cfg_pfc_quanta_thresh': include/linux/find.h:40:23: error: array subscript 'long unsigned int[0]' is partly outside array bounds of 'u16[1]' {aka 'short unsigned int[1]'} [-Werror=array-bounds] 40 \| val = *addr & GENMASK(size - 1, offset); Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20220211155539.13931-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-11	Merge tag 'wireless-next-2022-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next	David S. Miller
	wireless-next patches for v5.18 First set of patches for v5.18, with both wireless and stack patches. rtw89 now has AP mode support and wcn36xx has survey support. But otherwise pretty normal. Major changes: ath11k * add LDPC FEC type in 802.11 radiotap header * enable RX PPDU stats in monitor co-exist mode wcn36xx * implement survey reporting brcmfmac * add CYW43570 PCIE device rtw88 * rtw8821c: enable RFE 6 devices rtw89 * AP mode support mt76 * mt7916 support * background radar detection support
2022-02-11	Merge branch 'ipv6-loopback'	David S. Miller
	Eric Dumazet says: ==================== ipv6: remove addrconf reliance on loopback Second patch in this series removes the IPv6 requirement that the netns loopback device be the last device being dismantled. This was needed because rt6_uncached_list_flush_dev() and ip6_dst_ifdown() had to switch dst dev to a known device (loopback). Instead of loopback, we can use the (hidden) blackhole_netdev which is also always there. This will allow future simplifications of netdev_run_todo() and other parts of the stack like default_device_exit_batch(). Last two patches are optimizations for both IP families. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	ipv4: add (struct uncached_list)->quarantine list	Eric Dumazet
	This is an optimization to keep the per-cpu lists as short as possible: Whenever rt_flush_dev() changes one rtable dst.dev matching the disappearing device, it can transfer the object to a quarantine list, waiting for a final rt_del_uncached_list(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	ipv6: add (struct uncached_list)->quarantine list	Eric Dumazet
	This is an optimization to keep the per-cpu lists as short as possible: Whenever rt6_uncached_list_flush_dev() changes one rt6_info matching the disappearing device, it can transfer the object to a quarantine list, waiting for a final rt6_uncached_list_del(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
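The quarantine idea in the two commits above (IPv4 and IPv6) can be sketched with a minimal linked list: entries that referenced the disappearing device are repointed and moved off the main list, so later flushes walk a shorter list. Everything here is a hypothetical userspace model; `struct route`, `flush_dev()` and the integer device ids are illustrative and are not the kernel's rtable, rt6_info or list_head API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *item, struct node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_del(struct node *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->prev = item->next = item;
}

static size_t list_len(const struct node *head)
{
	size_t n = 0;

	for (const struct node *pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* Hypothetical route entry; dev_id stands in for the dst device pointer. */
struct route {
	struct node link;	/* first member, so a node pointer is also a route pointer */
	int dev_id;
};

/* Repoint and quarantine every route that references dev_id. Returns the count moved. */
static size_t flush_dev(struct node *head, struct node *quarantine,
			int dev_id, int blackhole_id)
{
	size_t moved = 0;
	struct node *pos = head->next;

	while (pos != head) {
		struct node *next = pos->next;	/* saved because pos may move */
		struct route *rt = (struct route *)pos;

		if (rt->dev_id == dev_id) {
			rt->dev_id = blackhole_id;	/* device that never goes away */
			list_del(pos);
			list_add_tail(pos, quarantine);
			moved++;
		}
		pos = next;
	}
	return moved;
}
```

After a flush, the main list contains only entries that still reference live devices, and the quarantined entries wait for their final deletion without being scanned again.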
2022-02-11	ipv6: give an IPv6 dev to blackhole_netdev	Eric Dumazet
	IPv6 addrconf notifiers want the loopback device to be the last device being dismantled at netns deletion. This caused many limitations and workarounds. Back in linux-5.3, Mahesh added a per host blackhole_netdev that can be used whenever we need to make sure objects no longer refer to a disappearing device. If we attach to blackhole_netdev an ip6_ptr (allocate an idev), then we can use this special device (which is never freed) in place of the loopback_dev (which can be freed). This will permit improvements in netdev_run_todo() and other parts of the stack where we had steps to make sure loopback_dev was the last device to disappear. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	ipv6: get rid of net->ipv6.rt6_stats->fib_rt_uncache	Eric Dumazet
	This counter has never been visible, there is little point trying to maintain it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	dsa: mv88e6xxx: make serdes SGMII/Fiber tx amplitude configurable	Holger Brunck
	The mv88e6352, mv88e6240 and mv88e6176 have a serdes interface. This patch allows configuring the output swing to a desired value in the phy-handle of the port. The value, which is peak to peak, has to be specified in microvolts. As the chips only support eight dedicated values, we return EINVAL if the value in the DTS does not match one of these values. Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	dt-bindings: phy: Add `tx-p2p-microvolt` property binding	Marek Behún
	Common PHYs and network PCSes often have the possibility to specify peak-to-peak voltage on the differential pair - the default voltage sometimes needs to be changed for a particular board. Add properties `tx-p2p-microvolt` and `tx-p2p-microvolt-names` for this purpose. The second property is needed to specify the mode for the corresponding voltage in the `tx-p2p-microvolt` property, if the voltage is to be used only for a specific mode. More voltage-mode pairs can be specified. Example usage with only one voltage (it will be used for all supported PHY modes, the `tx-p2p-microvolt-names` property is not needed in this case): tx-p2p-microvolt = <915000>; Example usage with voltages for multiple modes: tx-p2p-microvolt = <915000>, <1100000>, <1200000>; tx-p2p-microvolt-names = "2500base-x", "usb", "pcie"; Add these properties into a separate file phy/transmit-amplitude.yaml, which should be referenced by any binding that uses it. Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
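How a consumer might resolve a voltage from the two properties can be sketched as a lookup over parallel arrays. This is an illustrative userspace model only: `tx_p2p_lookup()` is a hypothetical helper, not a kernel or devicetree API, and the arrays in the usage note simply reuse the example values from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Pick the voltage for a PHY mode from parallel arrays that mirror
 * `tx-p2p-microvolt` and `tx-p2p-microvolt-names`.
 * Returns 0 and stores the voltage in *uv on success, -1 if no entry applies.
 */
static int tx_p2p_lookup(const uint32_t *volts, size_t nvolts,
			 const char *const *names, size_t nnames,
			 const char *mode, uint32_t *uv)
{
	if (nvolts == 1 && nnames == 0) {	/* single value: applies to every mode */
		*uv = volts[0];
		return 0;
	}
	for (size_t i = 0; i < nvolts && i < nnames; i++) {
		if (strcmp(names[i], mode) == 0) {
			*uv = volts[i];
			return 0;
		}
	}
	return -1;				/* no voltage configured for this mode */
}
```

With voltages 915000, 1100000, 1200000 and names "2500base-x", "usb", "pcie", a lookup for "usb" gives 1100000, while a mode that is not listed gets no override and would keep the hardware default.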
2022-02-11	ipv6: Reject routes configurations that specify dsfield (tos)	Guillaume Nault
	The ->rtm_tos option is normally used to route packets based on both the destination address and the DS field. However it's ignored for IPv6 routes. Setting ->rtm_tos for IPv6 is thus invalid as the route is going to work only on the destination address anyway, so it won't behave as specified. Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
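The validation described here amounts to refusing a request whose DS field is set, rather than silently ignoring it. A minimal sketch, assuming a hypothetical `struct route_req` with a single `tos` field standing in for ->rtm_tos; the error code chosen below is an assumption of this sketch, not taken from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down route request; only the field discussed above. */
struct route_req {
	uint8_t tos;	/* stands in for rtm_tos */
};

static int ipv6_route_validate(const struct route_req *req)
{
	if (req->tos != 0)
		return -EINVAL;	/* the DS field would be ignored, so refuse it */
	return 0;
}
```

Failing early tells the caller that the route would not behave as specified, instead of installing a route that matches on the destination address alone.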
2022-02-11	Merge branch 'dsa-cleanup'	David S. Miller
	Vladimir Oltean says: ==================== More aggressive DSA cleanup This series deletes some code which is apparently not needed. I've had these patches in my tree for a while, and testing on my boards didn't reveal any issues. Compared to the RFC v1 series, the only change is the addition of patch 3. https://patchwork.kernel.org/project/netdevbpf/cover/20220107184842.550334-1-vladimir.oltean@nxp.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	net: dsa: remove lockdep class for DSA slave address list	Vladimir Oltean
	Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA slave interfaces to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	net: dsa: remove lockdep class for DSA master address list	Vladimir Oltean
	Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), suggested by Cong Wang, the DSA interfaces and their master have different dev->nested_level, which makes netif_addr_lock() stop complaining about potentially recursive locking on the same lock class. So we no longer need DSA masters to have their own lockdep class. Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	net: dsa: remove ndo_get_phys_port_name and ndo_get_port_parent_id	Vladimir Oltean
	There are no legacy ports, DSA registers a devlink instance with ports unconditionally for all switch drivers. Therefore, delete the old-style ndo operations used for determining bridge forwarding domains. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	Merge branch 'smc-optimizations'	David S. Miller
	D. Wythe says: ==================== net/smc: Optimizing performance in short-lived scenarios This patch set aims to optimize the performance of SMC in short-lived link scenarios, which is quite unsatisfactory right now. In our benchmark, we test it with the following script: ./wrk -c 10000 -t 4 -H 'Connection: Close' -d 20 http://smc-server Current performance figures look like this: Running 20s test @ http://11.213.45.6 4 threads and 10000 connections 4956 requests in 20.06s, 3.24MB read Socket errors: connect 0, read 0, write 672, timeout 0 Requests/sec: 247.07 Transfer/sec: 165.28KB There are many reasons for this phenomenon. This patch set doesn't solve all of them, but it alleviates the problem considerably. Patch 1/5 (Make smc_tcp_listen_work() independent): Separate smc_tcp_listen_work() from smc_listen_work() and make them independent of each other, so that a busy SMC handshake can no longer affect new TCP connections. This avoids discarding a large number of TCP connections once they pile up, which undoubtedly raises the connection establishment time. Patch 2/5 (Limit SMC backlog connections): Since patch 1 has separated smc_tcp_listen_work() from smc_listen_work(), an unrestricted TCP accept has come into being. This patch tries to put a limit on SMC backlog connections, following the implementation of TCP. Patch 3/5 (Limit SMC visits when handshake workqueue congested): Considering the complexity of the SMC handshake right now, its performance in short-lived link scenarios is still quite poor, although this may not be the main scenario of SMC. This patch tries to constrain the SMC handshake when the handshake workqueue is congested, which in our opinion is the sign of SMC handshakes stacking up. Patch 4/5 (Dynamic control handshake limitation by socket options): This patch allows applications to dynamically control the SMC handshake limitation. Since SMC did not support setting SMC socket options before, this patch also has to add support for SMC's own socket options. 
Patch 5/5 (Add global configure for handshake limitation by netlink): This patch provides a way to benefit from handshake limitation without modifying any application code, which is quite useful for most existing applications. After this patch set, performance figures look like this: Running 20s test @ http://11.213.45.6 4 threads and 10000 connections 693253 requests in 20.10s, 452.88MB read Requests/sec: 34488.13 Transfer/sec: 22.53MB That's quite a good performance improvement, about 6 to 7 times in my environment. --- changelog: v1 -> v2: - fix compile warning - fix invalid dependencies in kconfig v2 -> v3: - correct spelling mistakes - fix useless variable declare v3 -> v4 - make smc_tcp_ls_wq be static v4 -> v5 - add dynamic control for SMC auto fallback by socket options - add global configure for SMC auto fallback through netlink v5 -> v6 - move auto fallback to net namespace scope - remove auto fallback attribute in SMC_GEN_SYS_INFO - add independent attributes for auto fallback v6 -> v7 - fix wording and the naming issues, rename 'auto fallback' to handshake limitation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	net/smc: Add global configure for handshake limitation by netlink	D. Wythe
	Although we can control SMC handshake limitation through socket options, applications that need it must modify their code, which is quite troublesome for many existing applications. This patch makes the global default value of SMC handshake limitation configurable through netlink, providing a way to constrain the handshake without modifying any application code. Suggested-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	net/smc: Dynamic control handshake limitation by socket options	D. Wythe
	This patch adds dynamic control of the SMC handshake limitation for every SMC socket. In a production environment, the same application may handle different service types and may have different requirements for SMC handshake limitation. This patch uses socket options to achieve this; since there is no socket option level for SMC yet, it has to be implemented at the same time. This patch does the following: - add new socket option level: SOL_SMC. - add new SMC socket option: SMC_LIMIT_HS. - provide getter/setter for SMC socket options. Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	net/smc: Limit SMC visits when handshake workqueue congested	D. Wythe
	This patch provides a mechanism to constrain incoming SMC connections according to the pressure on the SMC handshake process. At present, frequent visits cause incoming connections to be backlogged in the SMC handshake queue, raising the connection establishment time, which is quite unacceptable for applications based on short-lived connections. There are two ways to implement this mechanism: 1. Put the limitation after TCP is established. 2. Put the limitation before TCP is established. In the first way, we need to wait for and receive the CLC messages that the client will potentially send, and then actively reply with a decline message. In a sense this is also a sort of SMC handshake, and it affects the connection establishment time along the way. In the second way, the only problem is that we need to inject SMC logic into TCP when it is about to reply to the incoming SYN; since we already do that, it is no longer a problem. The advantage is obvious: little additional processing is required to implement the constraint. This patch uses the second way. After this patch, connections beyond the constraint will not be given any SMC indication, and SMC will not be involved in any of their subsequent processing. Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	net/smc: Limit backlog connections	D. Wythe
	The current implementation does not handle backlog semantics. One potential risk is that the server can be flooded by an unbounded number of connections, even if the client is SMC-incapable. This patch puts a limit on backlog connections. Following the TCP implementation, we divide SMC connections into two categories: 1. Half SMC connections, which include all connections where TCP is established but SMC is not. 2. Full SMC connections, which include all connections where SMC is established. For half SMC connections, since every half SMC connection starts with TCP being established, we can achieve our goal by applying a limit before TCP is established. As in the TCP implementation, this limit is based not only on the half SMC connections but also on the full connections, so it is also a constraint on full SMC connections. For full SMC connections, although we know exactly where they start, it is quite hard to apply a limit before that point. The easiest way is to block and wait before receiving the SMC confirm CLC message, but that happens under the protection of smc_server_lgr_pending, a global lock, which would make the limit apply to the entire host instead of a single listen socket. Another way is to drop the full connections, but considering the cost of SMC connections, we prefer to keep full SMC connections. Even so, a limit on full SMC connections still exists; see commits about half SMC connection below. After this patch, the limits on backend connections are as follows: For SMC: 1. A client with SMC capability can make at most 2 * backlog full SMC connections, or 1 * backlog half SMC connections and 1 * backlog full SMC connections. 2. A client without SMC capability can only make 1 * backlog half TCP connections and 1 * backlog full TCP connections. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
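The resulting upper bounds can be written out as simple arithmetic on the listen backlog. The sketch below only restates the numbers given in the commit message; the function name and the dictionary keys are hypothetical labels chosen for illustration.

```python
def backlog_limits(backlog, smc_capable):
    """Upper bounds on queued connections for one listen socket.

    Restates the limits from the commit message; the keys are
    illustrative labels, not kernel identifiers.
    """
    if smc_capable:
        return {
            # Either all of them are full SMC connections ...
            "full_smc_only": 2 * backlog,
            # ... or they split into half and full SMC connections.
            "half_smc": 1 * backlog,
            "full_smc_alongside_half": 1 * backlog,
        }
    return {
        "half_tcp": 1 * backlog,
        "full_tcp": 1 * backlog,
    }


if __name__ == "__main__":
    print(backlog_limits(128, smc_capable=True))
    print(backlog_limits(128, smc_capable=False))
```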
2022-02-11	net/smc: Make smc_tcp_listen_work() independent	D. Wythe
	In a multithreaded benchmark with 10K connections, the backend TCP connections are established very slowly, and many TCP connections stay in the SYN_SENT state. Client: smc_run wrk -c 10000 -t 4 http://server The netstat output on the server host shows: 145042 times the listen queue of a socket overflowed 145042 SYNs to LISTEN sockets dropped One reason for this issue is that smc_tcp_listen_work() shares the same workqueue (smc_hs_wq) with smc_listen_work(), while smc_listen_work() does a blocking wait for the SMC connection to be established. Once the workqueue becomes congested, it blocks the accept() from the TCP listen socket. This patch creates an independent workqueue (smc_tcp_ls_wq) for smc_tcp_listen_work(), separating it from smc_listen_work(), which is quite acceptable considering that smc_tcp_listen_work() runs very fast. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11	dt-bindings: net: dsa: realtek: convert to YAML schema, add MDIO	Luiz Angelo Daros de Luca
	Schema changes: - support for mdio-connected switches (mdio driver), recognized by checking the presence of property "reg" - new compatible strings for rtl8367s and rtl8367rb - "interrupt-controller" was not added as a required property. It might still work by polling the ports when it is missing. Example changes: - renamed "switch_intc" to make it unique between examples - removed "dsa-mdio" from mdio compatible property - renamed phy@0 to ethernet-phy@0 (not tested with real HW), since phy@ requires #phy-cells Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-10	Merge tag 'net-5.17-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter and can. Current release - new code bugs: - sparx5: fix get_stat64 out-of-bound access and crash - smc: fix netdev ref tracker misuse Previous releases - regressions: - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid overflows - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP over IP - bonding: fix rare link activation misses in 802.3ad mode Previous releases - always broken: - tcp: fix tcp sock mem accounting in zero-copy corner cases - remove the cached dst when uncloning an skb dst and its metadata, since we only have one ref it'd lead to an UaF - netfilter: - conntrack: don't refresh sctp entries in closed state - conntrack: re-init state for retransmitted syn-ack, avoid connection establishment getting stuck with strange stacks - ctnetlink: disable helper autoassign, avoid it getting lost - nft_payload: don't allow transport header access for fragments - dsa: fix use of devres for mdio throughout drivers - eth: amd-xgbe: disable interrupts during pci removal - eth: dpaa2-eth: unregister netdev before disconnecting the PHY - eth: ice: fix IPIP and SIT TSO offload" * tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister net: mscc: ocelot: fix mutex lock error during ethtool stats read ice: Avoid RTNL lock when re-creating auxiliary device ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler ice: fix IPIP and SIT TSO offload ice: fix an error code in ice_cfg_phy_fec() net: mpls: Fix GCC 12 warning dpaa2-eth: unregister the netdev before disconnecting from the PHY skbuff: cleanup double word in comment net: macb: Align the dma and coherent dma masks mptcp: netlink: process IPv6 addrs in creating listening sockets selftests: mptcp: add missing join check net: usb: qmi_wwan: Add support for Dell DW5829e vlan: move dev_put into vlan_dev_uninit vlan: introduce vlan_dev_free_egress_priority ax25: fix UAF bugs of net_device caused by rebinding operation net: dsa: fix panic when DSA master device unbinds on shutdown net: amd-xgbe: disable interrupts during pci removal tipc: rate limit warning for received illegal binding update net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE ...
2022-02-10	Merge tag 'linux-kselftest-fixes-5.17-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Build and run-time fixes to pidfd, clone3, and ir tests" * tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ir: fix build with ancient kernel headers selftests: fixup build warnings in pidfd / clone3 tests pidfd: fix test failure due to stack overflow on some arches