Age | Commit message | Author
2023-08-03 | wifi: ath12k: move HE capabilities processing to a new function | Aloka Dixit
The function ath12k_mac_copy_sband_iftype_data() is currently used for HE capabilities propagation, but it can be extended to include EHT data. Move the HE-specific functionality to ath12k_mac_copy_he_cap() to make EHT additions easier. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230725224034.14045-3-quic_alokad@quicinc.com
2023-08-03 | wifi: ath12k: rename HE capabilities setup/copy functions | Aloka Dixit
Functions ath12k_mac_setup_he_cap() and ath12k_mac_copy_he_cap() propagate HE and 6GHz capabilities to the userspace using an instance of struct ieee80211_sband_iftype_data. This structure now has a new member 'eht_cap' to include EHT capabilities as well. Rename the above mentioned functions to indicate that their use is not limited to HE. Also, replace the local variable 'band' with 'sband' and reuse 'band' for the type enum nl80211_band. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230725224034.14045-2-quic_alokad@quicinc.com
2023-08-03 | bonding: support balance-alb with openvswitch | Mateusz Kowalski
Commit d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge") introduced support for balance-alb mode for interfaces connected to the Linux bridge by fixing the missing matching of the MAC entry in the FDB. In our testing we discovered that it still does not work when the bond is connected to the OVS bridge, as shown in the diagram below:

eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
                        |
                      bond0.150(mac:eth0_mac)
                        |
                      ovs_bridge(ip:bridge_ip,mac:eth0_mac)

This patch fixes it by checking not only if the device is a bridge but also if it is an openvswitch. Signed-off-by: Mateusz Kowalski <mko@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/9fe7297c-609e-208b-c77b-3ceef6eb51a4@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-03 | Merge patch "can: esd_usb: Add support for esd CAN-USB/3" | Marc Kleine-Budde
Frank Jungclaus <frank.jungclaus@esd.eu> says: After having applied a vast number of improvements to the existing CAN-USB/2 driver, here now is a new attempt to add support for the esd CAN-USB/3 CAN FD interface. Besides this patch, the following to-dos are left for follow-up patches:
* In principle, the esd CAN-USB/3 supports Transmitter Delay Compensation (TDC), but currently only the automatic TDC mode is supported by this driver. An implementation for manual TDC configuration will follow.
* Rework the code to no longer switch directly on the USB product IDs to handle different device settings for each supported USB device. Instead use the driver_info member within struct usb_device_id to hold / point to specific properties for each supported device.
* Try to switch from synchronous send usb_bulk_msg() to asynchronous communication by means of usb_submit_urb() where it is feasible.
Link: https://lore.kernel.org/all/20230728150857.2374886-1-frank.jungclaus@esd.eu Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-08-03 | can: esd_usb: Add support for esd CAN-USB/3 | Frank Jungclaus
Add support for esd CAN-USB/3 and CAN FD to esd_usb.c. Signed-off-by: Frank Jungclaus <frank.jungclaus@esd.eu> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/all/20230728150857.2374886-2-frank.jungclaus@esd.eu Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-08-03 | xen/netback: Fix buffer overrun triggered by unusual packet | Ross Lagerwall
It is possible that a guest can send a packet that contains a head + 18 slots and yet has a len <= XEN_NETBACK_TX_COPY_LEN. This causes nr_slots to underflow in xenvif_get_requests() which then causes the subsequent loop's termination condition to be wrong, causing a buffer overrun of queue->tx_map_ops. Rework the code to account for the extra frag_overflow slots. This is CVE-2023-34319 / XSA-432. Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com>
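The failure mode described above (an unsigned slot count wrapping around and invalidating a loop bound) can be illustrated in isolation. This is only a sketch of the arithmetic hazard, not the netback code: every name below is hypothetical, the array size of 8 is arbitrary, and the actual patch reworks the slot accounting rather than clamping as shown here.

```c
/* Hypothetical stand-in for the size of a fixed array of map operations. */
#define DEMO_MAX_OPS 8u

/* Unchecked subtraction: wraps to a huge value when consumed > total. */
static unsigned int demo_remaining_unchecked(unsigned int total,
					     unsigned int consumed)
{
	return total - consumed;
}

/* Checked variant: never reports more work than actually remains. */
static unsigned int demo_remaining_checked(unsigned int total,
					   unsigned int consumed)
{
	return consumed > total ? 0u : total - consumed;
}

/* Loop bound clamped to the array size, so indexing stays in range. */
static unsigned int demo_loop_bound(unsigned int total, unsigned int consumed)
{
	unsigned int n = demo_remaining_checked(total, consumed);

	return n > DEMO_MAX_OPS ? DEMO_MAX_OPS : n;
}
```

A loop that runs "while remaining is non-zero" with the unchecked value would iterate far past the end of any fixed-size array, which is the shape of the overrun the commit message describes.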
2023-08-02 | udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES | David Howells
__ip_append_data() can get into an infinite loop when asked to splice into a partially-built UDP message that has more than the frag-limit data and up to the MTU limit. Something like:

        pipe(pfd);
        sfd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(sfd, ...);
        send(sfd, buffer, 8161, MSG_CONFIRM|MSG_MORE);
        write(pfd[1], buffer, 8);
        splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);

where the amount of data given to send() is dependent on the MTU size (in this instance an interface with an MTU of 8192).

The problem is that the calculation of the amount to copy in __ip_append_data() goes negative in two places, and, in the second place, this gets subtracted from the length remaining, thereby increasing it. This happens when pagedlen > 0 (which happens for MSG_ZEROCOPY and MSG_SPLICE_PAGES), because the terms in:

        copy = datalen - transhdrlen - fraggap - pagedlen;

then mostly cancel when pagedlen is substituted for, leaving just -fraggap. This causes:

        length -= copy + transhdrlen;

to increase the length to more than the amount of data in msg->msg_iter, which causes skb_splice_from_iter() to be unable to fill the request and it returns less than 'copied' - which means that length never gets to 0 and we never exit the loop.

Fix this by:

 (1) Insert a note about the dodgy calculation of 'copy'.
 (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above equation, so that 'offset' isn't regressed and 'length' isn't increased, which will mean that length and thus copy should match the amount left in the iterator.
 (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if we're asked to splice more than is in the iterator. It might be better to not give the warning or even just give a 'short' write.

[!] Note that this ought to also affect MSG_ZEROCOPY, but MSG_ZEROCOPY avoids the problem by simply assuming that everything asked for got copied, not just the amount that was in the iterator. This is a potential bug for the future.

Fixes: 7ac7c987850c ("udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES") Reported-by: syzbot+f527b971b4bdc8e79f9e@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> cc: David Ahern <dsahern@kernel.org> cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/1420063.1690904933@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | Merge branch 'introduce-ndo_hwtstamp_get-and-ndo_hwtstamp_set' | Jakub Kicinski
Vladimir Oltean says: ==================== Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() Based on previous RFCs from Maxim Georgiev: https://lore.kernel.org/netdev/20230502043150.17097-1-glipus@gmail.com/ this series attempts to introduce new API for the hardware timestamping control path (SIOCGHWTSTAMP and SIOCSHWTSTAMP handling). I don't have any board with phylib hardware timestamping, so I would appreciate testing (especially on lan966x, the most intricate conversion). I was, however, able to test netdev level timestamping, because I also have some more unsubmitted conversions in progress: https://github.com/vladimiroltean/linux/commits/ndo-hwtstamp-v9 I hope that the concerns expressed in the comments of previous series were addressed, and that Köry Maincent's series: https://lore.kernel.org/netdev/20230406173308.401924-1-kory.maincent@bootlin.com/ can make progress in parallel with the conversion of the rest of drivers. ==================== Link: https://lore.kernel.org/r/20230801142824.1772134-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers | Vladimir Oltean
It is desirable that the new .ndo_hwtstamp_set() API gives more uniformity, less overhead and future flexibility w.r.t. the PHY timestamping behavior. Currently there are some drivers which allow PHY timestamping through the procedure mentioned in Documentation/networking/timestamping.rst. They don't do anything locally if phy_has_hwtstamp() is set, except for lan966x which installs PTP packet traps. Centralize that behavior in a new dev_set_hwtstamp_phylib() function, which calls either phy_mii_ioctl() for the phylib PHY, or .ndo_hwtstamp_set() of the netdev, based on a single policy (currently simplistic: phy_has_hwtstamp()). Any driver converted to .ndo_hwtstamp_set() will automatically opt into the centralized phylib timestamping policy. Unconverted drivers still get to choose whether they let the PHY handle timestamping or not. Netdev drivers with integrated PHY drivers that don't use phylib presumably don't set dev->phydev, and those will always see HWTSTAMP_SOURCE_NETDEV requests even when converted. The timestamping policy will remain 100% up to them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-13-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
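The centralized policy described here (PHY first if it can timestamp, otherwise the netdev) can be sketched as a userspace model. All types and names below are hypothetical stand-ins: demo_netdev is not struct net_device, the phy_set and ndo_set callbacks merely play the roles of phy_mii_ioctl() and .ndo_hwtstamp_set(), and the real dev_set_hwtstamp_phylib() has a different signature.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_handler { DEMO_NONE, DEMO_PHY, DEMO_NETDEV };

struct demo_cfg { int tx_type; int rx_filter; };

struct demo_netdev {
	bool phy_has_hwtstamp;	/* plays the role of phy_has_hwtstamp() */
	int (*phy_set)(struct demo_netdev *dev, struct demo_cfg *cfg);
	int (*ndo_set)(struct demo_netdev *dev, struct demo_cfg *cfg);
	enum demo_handler last;	/* records who handled the last request */
};

static int demo_phy_set(struct demo_netdev *dev, struct demo_cfg *cfg)
{
	(void)cfg;
	dev->last = DEMO_PHY;
	return 0;
}

static int demo_ndo_set(struct demo_netdev *dev, struct demo_cfg *cfg)
{
	(void)cfg;
	dev->last = DEMO_NETDEV;
	return 0;
}

/* Single policy point: the driver no longer makes this decision itself. */
static int demo_set_hwtstamp_phylib(struct demo_netdev *dev,
				    struct demo_cfg *cfg)
{
	if (dev->phy_has_hwtstamp && dev->phy_set)
		return dev->phy_set(dev, cfg);
	if (dev->ndo_set)
		return dev->ndo_set(dev, cfg);
	return -EOPNOTSUPP;
}
```

Keeping the branch in one helper is what lets a converted driver drop its own phy_has_hwtstamp() check, as the commit message explains.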
2023-08-02 | net: phy: provide phylib stubs for hardware timestamping operations | Vladimir Oltean
net/core/dev_ioctl.c (built-in code) will want to call phy_mii_ioctl() for hardware timestamping purposes. This is not directly possible, because phy_mii_ioctl() is a symbol provided under CONFIG_PHYLIB. Do something similar to what was done in DSA in commit 5a17818682cf ("net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub"), and arrange some indirect calls to phy_mii_ioctl() through a stub structure containing function pointers, that's provided by phylib as built-in even when CONFIG_PHYLIB=m, and which phy_init() populates at runtime (module insertion). Note: maybe the ownership of the ethtool_phy_ops singleton is backwards, and the methods exposed by that should be later merged into phylib_stubs. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-12-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
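The stub arrangement can be pictured as a pointer to a table of function pointers that built-in code owns and the module fills in when it loads. The following is a minimal userspace sketch with hypothetical names throughout; it has no locking, whereas the series relies on rtnl_lock() around publishing and clearing the real stubs.

```c
#include <errno.h>
#include <stddef.h>

/* Table of operations that the "module" provides. */
struct demo_phylib_stubs {
	int (*hwtstamp)(int cmd);
};

/* Owned by "built-in" code; NULL while the module is not loaded. */
static const struct demo_phylib_stubs *demo_stubs;

/* Module-side implementation; it simply echoes the command here. */
static int demo_phy_hwtstamp(int cmd)
{
	return cmd;
}

static const struct demo_phylib_stubs demo_ops = {
	.hwtstamp = demo_phy_hwtstamp,
};

/* Module insertion and removal publish and clear the table. */
static void demo_phy_init(void) { demo_stubs = &demo_ops; }
static void demo_phy_exit(void) { demo_stubs = NULL; }

/* Built-in caller: works whether or not the module is present. */
static int demo_core_hwtstamp(int cmd)
{
	if (!demo_stubs)
		return -EOPNOTSUPP;
	return demo_stubs->hwtstamp(cmd);
}
```

The built-in caller never references a module symbol directly, so it links even when the provider is built as a module.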
2023-08-02 | net: transfer rtnl_lock() requirement from ethtool_set_ethtool_phy_ops() to caller | Vladimir Oltean
phy_init() and phy_exit() will have to do more stuff under rtnl_lock() in a future change. Since rtnl_unlock() -> netdev_run_todo() does a lot of stuff under the hood, it's a pity to lock and unlock the rtnetlink mutex twice in a row. Change the calling convention such that the only caller of ethtool_set_ethtool_phy_ops(), phy_device.c, provides a context where the rtnl_mutex is already acquired. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-11-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() | Vladimir Oltean
The hardware timestamping through ndo_eth_ioctl() is going away. Convert the lan966x driver to the new API before that can be removed. After removing the timestamping logic from lan966x_port_ioctl(), the rest is equivalent to phy_do_ioctl(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-10-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: sparx5: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() | Vladimir Oltean
The hardware timestamping through ndo_eth_ioctl() is going away. Convert the sparx5 driver to the new API before that can be removed. After removing the timestamping logic from sparx5_port_ioctl(), the rest is equivalent to phy_do_ioctl(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-9-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: fec: delete fec_ptp_disable_hwts() | Vladimir Oltean
Commit 340746398b67 ("net: fec: fix hardware time stamping by external devices") was overly cautious with calling fec_ptp_disable_hwts() when cmd == SIOCSHWTSTAMP and use_fec_hwts == false, because use_fec_hwts is based on a runtime invariant (phy_has_hwtstamp()). Thus, if use_fec_hwts is false, then fep->hwts_tx_en and fep->hwts_rx_en cannot be changed at runtime; their values depend on the initial memory allocation, which already sets them to zeroes. If the core will ever gain support for switching timestamping layers, it will arrange for a more organized calling convention and disable timestamping in the previous layer as a first step. This means that the code in the FEC driver is not necessary in any case. The purpose of this change is to arrange the phy_has_hwtstamp() code in a way in which it can be refactored away into generic logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-8-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: fec: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() | Vladimir Oltean
The hardware timestamping through ndo_eth_ioctl() is going away. Convert the FEC driver to the new API before that can be removed. After removing the timestamping logic from fec_enet_ioctl(), the rest is equivalent to phy_do_ioctl_running(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20230801142824.1772134-7-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: bonding: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set() | Maxim Georgiev
bonding is one of the stackable net devices which pass the hardware timestamping ops to the real device through ndo_eth_ioctl(). This prevents converting any device driver to the new hwtimestamping API without regressions. Remove that limitation in bonding by using the newly introduced helpers for timestamping through lower devices, that handle both the new and the old driver API. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-6-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: macvlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set() | Maxim Georgiev
macvlan is one of the stackable net devices which pass the hardware timestamping ops to the real device through ndo_eth_ioctl(). This prevents converting any device driver to the new hwtimestamping API without regressions. Remove that limitation in macvlan by using the newly introduced helpers for timestamping through lower devices, that handle both the new and the old driver API. macvlan only implements ndo_eth_ioctl() for these 2 operations, so delete that method. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: vlan: convert to ndo_hwtstamp_get() / ndo_hwtstamp_set() | Maxim Georgiev
8021q is one of the stackable net devices which pass the hardware timestamping ops to the real device through ndo_eth_ioctl(). This prevents converting any device driver to the new hwtimestamping API without regressions. Remove that limitation in the vlan driver by using the newly introduced helpers for timestamping through lower devices, that handle both the new and the old driver API. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: add hwtstamping helpers for stackable net devices | Maxim Georgiev
The stackable net devices with hwtstamping support (vlan, macvlan, bonding) only pass the hwtstamping ops to the lower (real) device. These drivers are the first that need to be converted to the new timestamping API, because if they aren't prepared to handle that, then no real device driver can be converted to the new API either. After studying what vlan_dev_ioctl(), macvlan_eth_ioctl() and bond_eth_ioctl() have in common, here we propose two generic implementations of ndo_hwtstamp_get() and ndo_hwtstamp_set() which can be called by those 3 drivers, with "dev" being their lower device. These helpers cover both cases, when the lower driver is converted to the new API or unconverted. We need some hacks in case of an unconverted driver, namely to stuff some pointers in struct kernel_hwtstamp_config which shouldn't have been there (since the new API isn't supposed to need it). These will be removed when all drivers have been converted to the new API. Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230801142824.1772134-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: add NDOs for configuring hardware timestamping | Maxim Georgiev
Current hardware timestamping API for NICs requires implementing .ndo_eth_ioctl() for SIOCGHWTSTAMP and SIOCSHWTSTAMP. That API has some boilerplate such as request parameter translation between user and kernel address spaces, handling possible translation failures correctly, etc. Since it is the same all across the board, it would be desirable to handle it through generic code. Here we introduce .ndo_hwtstamp_get() and .ndo_hwtstamp_set(), which implement that boilerplate and allow drivers to just act upon requests. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Maxim Georgiev <glipus@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20230801142824.1772134-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
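The split of responsibilities described above (generic code handles the user/kernel translation, the driver only acts on the request) might look roughly like this in a userspace model. Everything here is an assumption made for illustration: the struct layouts are simplified, memcpy() stands in for copy_from_user() and copy_to_user(), and the driver's limit on rx_filter is invented.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the user-facing and in-kernel configs. */
struct demo_user_cfg   { int flags; int tx_type; int rx_filter; };
struct demo_kernel_cfg { int flags; int tx_type; int rx_filter; };

typedef int (*demo_set_fn)(struct demo_kernel_cfg *cfg);

/* Generic boilerplate: copy in, translate, call the driver, copy out. */
static int demo_generic_set(const struct demo_user_cfg *uin,
			    struct demo_user_cfg *uout, demo_set_fn drv_set)
{
	struct demo_user_cfg tmp;
	struct demo_kernel_cfg kcfg;
	int err;

	if (!uin || !uout)
		return -EFAULT;

	memcpy(&tmp, uin, sizeof(tmp));	/* stands in for copy_from_user() */
	kcfg.flags = tmp.flags;
	kcfg.tx_type = tmp.tx_type;
	kcfg.rx_filter = tmp.rx_filter;

	err = drv_set(&kcfg);
	if (err)
		return err;

	tmp.flags = kcfg.flags;
	tmp.tx_type = kcfg.tx_type;
	tmp.rx_filter = kcfg.rx_filter;
	memcpy(uout, &tmp, sizeof(tmp));	/* stands in for copy_to_user() */
	return 0;
}

/* Driver side: no address-space handling, only validation and setup. */
static int demo_driver_set(struct demo_kernel_cfg *cfg)
{
	if (cfg->rx_filter > 1)	/* hypothetical hardware limit */
		return -ERANGE;
	return 0;
}
```

The driver callback is a few lines long because the error-prone copying lives in one shared place, which is the motivation the commit message gives for the new NDOs.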
2023-08-02 | Merge branch 'net-extend-alloc_skb_with_frags-max-size' | Jakub Kicinski
Eric Dumazet says: ==================== net: extend alloc_skb_with_frags() max size alloc_skb_with_frags(), while being able to use high order allocations, limits the payload size to PAGE_SIZE * MAX_SKB_FRAGS. Reviewing Tahsin Erdogan's patch [1], it was clear to me we need to remove this limitation. [1] https://lore.kernel.org/netdev/20230731230736.109216-1-trdgn@amazon.com/ v2: Addressed Willem's feedback on 1st patch. ==================== Link: https://lore.kernel.org/r/20230801205254.400094-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: tap: change tap_alloc_skb() to allow bigger paged allocations | Eric Dumazet
tap_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations. Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x. Also add logic to increase the linear part if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/packet: change packet_alloc_skb() to allow bigger paged allocations | Eric Dumazet
packet_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations. Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max size by 8x. Also add logic to increase the linear part if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: tun: change tun_alloc_skb() to allow bigger paged allocations | Eric Dumazet
tun_alloc_skb() is currently calling sock_alloc_send_pskb() forcing order-0 page allocations. Switch to PAGE_ALLOC_COSTLY_ORDER, to increase max allocation size by 8x. Also add logic to increase the linear part if needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net: allow alloc_skb_with_frags() to allocate bigger packets | Eric Dumazet
Refactor alloc_skb_with_frags() to allow bigger packet allocations. Instead of assuming that only order-0 allocations will be attempted, use the caller-supplied max order. v2: try harder to use high-order pages, per Willem's feedback. Link: https://lore.kernel.org/netdev/CANn89iJQfmc_KeUr3TeXvsLQwo3ZymyoCr7Y6AnHrkWSuz0yAg@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tahsin Erdogan <trdgn@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230801205254.400094-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
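The "8x" figure quoted for the tap, packet and tun conversions in this series follows from simple arithmetic on the old PAGE_SIZE * MAX_SKB_FRAGS limit. The sketch below assumes a 4096-byte page and 17 fragments, both of which vary by architecture and kernel configuration, and uses 3 for PAGE_ALLOC_COSTLY_ORDER; the macro and function names are hypothetical.

```c
/* Assumed values; real kernels derive these from arch and config. */
#define DEMO_PAGE_SIZE		4096ul
#define DEMO_MAX_SKB_FRAGS	17ul
#define DEMO_COSTLY_ORDER	3u

/* Upper bound on paged payload when each frag may be an order-N page. */
static unsigned long demo_max_paged_len(unsigned int order)
{
	return (DEMO_PAGE_SIZE << order) * DEMO_MAX_SKB_FRAGS;
}
```

Under these assumptions order 0 allows 69632 bytes of paged data and order 3 allows 557056 bytes, a factor of 2^3 = 8.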
2023-08-02 | sctp: Remove unused function declarations | Yue Haibing
These declarations are never implemented since beginning of git history. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230731141030.32772-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | Merge branch 'mlx5-ipsec-fixes' | Jakub Kicinski
Leon Romanovsky says: ==================== mlx5 IPsec fixes The following patches are a combination of Jianbo's work on IPsec eswitch mode together with our internal review toward the addition of TCP protocol selector support to IPsec packet offload. Although it is not a fix, the first patch helps make the second one clearer, so I'm asking to apply it anyway as part of this series. ==================== Link: https://lore.kernel.org/r/cover.1690803944.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5e: Set proper IPsec source port in L4 selector | Leon Romanovsky
Fix typo in setup_fte_upper_proto_match() where destination UDP port was used instead of source port. Fixes: a7385187a386 ("net/mlx5e: IPsec, support upper protocol selector field offload") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/ffc024a4d192113103f392b0502688366ca88c1f.1690803944.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio | Jianbo Liu
In the cited commit, a new type of FS_TYPE_PRIO_CHAINS fs_prio was added to support multiple parallel namespaces for multi-chains. And we skip all the flow tables under the fs_node of this type unconditionally when searching for the next or previous flow table to connect for a new table. As this search function is also used to find the new root table when the old one is being deleted, it will skip the entire FS_TYPE_PRIO_CHAINS fs_node next to the old root. However, the new root table should be chosen from it if there is any table in it. Fix it by skipping only the flow tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the closest FT for a fs_node. Besides, complete the connecting from FTs of the previous priority of the prio, because there should be multiple prevs after this fs_prio type is introduced. And also the next FT should be chosen from the first flow table next to the prio in the same FS_TYPE_PRIO_CHAINS fs_prio, if this prio is the first child. Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5: fs_core: Make find_closest_ft more generic | Jianbo Liu
As find_closest_ft_recursive is called to find the closest FT, the first parameter of find_closest_ft can be changed from fs_prio to fs_node. Thus this function is extended to find the closest FT for the nodes of any type, not only prios, but also the sub namespaces. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/d3962c2b443ec8dde7a740dc742a1f052d5e256c.1690803944.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | Merge branch 'mlx5-ipsec-packet-offload-support-in-eswitch-mode' | Jakub Kicinski
Leon Romanovsky says: ==================== mlx5 IPsec packet offload support in eswitch mode This series from Jianbo adds mlx5 IPsec packet offload support in eswitch offloaded mode. It works exactly like "regular" IPsec, nothing special, except now users can switch to switchdev before adding IPsec rules.

        devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Same configurations as here: https://lore.kernel.org/netdev/cover.1670005543.git.leonro@nvidia.com/

Packet offload mode:
        ip xfrm state offload packet dev <if-name> dir <in|out>
        ip xfrm policy .... offload packet dev <if-name>
Crypto offload mode:
        ip xfrm state offload crypto dev <if-name> dir <in|out>
or (backward compatibility)
        ip xfrm state offload dev <if-name> dir <in|out>

v0: https://lore.kernel.org/all/cover.1689064922.git.leonro@nvidia.com ==================== Link: https://lore.kernel.org/r/cover.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev | Jianbo Liu
For IPsec packet offload mode, the order of TC offload and IPsec offload on the same netdevice is not aligned with the order in the non-offload software. For example, for RX, the software performs TC first and then IPsec transformation, but the implementation for offload does that in the opposite way. To resolve the difference for now, either IPsec offload or TC offload, not both, is allowed for a specific interface. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/8e2e5e3b0984d785066e8663aaf97b3ba1bb873f.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5e: Add get IPsec offload stats for uplink representor | Jianbo Liu
As IPsec offload is supported in switchdev mode, HW stats can be obtained from the uplink rep. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/b43c91c452f1db9c35c10639a029aa10fd8b7895.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5e: Modify and restore TC rules for IPSec TX rules | Jianbo Liu
After IPsec policy/state TX rules are added, any TC flow rule which forwards packets to uplink is modified to forward to IPsec TX tables. As these tables are destroyed dynamically, whenever there is no reference to them, the destinations of this kind of rule must be restored to uplink. There is a special case for packet encapsulation, as the packet_reformat_id in the extended destination is used to reformat packets, but only for the VPORT destination. To forward a packet to the IPsec table and do encapsulation in one FTE, move the packet_reformat_id to the flow context, instead of using the extended destination. As a limitation, multiple encapsulations with table forwarding, and one together with other VPORT destinations, are not allowed, so add a check when offloading TC rules. TC rules are not allowed before an IPsec TX rule is added, so TC rules only need to be restored after the IPsec TX rules are flushed. As they are saved in the vport_rep rhashtables, we walk all the rules in the rhashtables, find TC rules with destinations pointing to IPsec tables, and modify them one by one. To avoid concurrency issues, this handling is done under the protection of eswitch mode_lock. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/7bcb2c7e2ecf0e0d06b095c8dcc6a37ea7f02faf.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5e: Make IPsec offload work together with eswitch and TC | Jianbo Liu
The eswitch mode is not allowed to change if there are any IPsec rules. Besides, by using mlx5_esw_try_lock() to get eswitch mode lock, IPsec rules are not allowed to be offloaded if there are any TC rules. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/e442b512b21a931fbdfb87d57ae428c37badd58a.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5: Compare with old_dest param to modify rule destination | Jianbo Liu
The rule destination must be compared with the old_dest passed in. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/24adc60d05d7492359ba343c6da1ebbe9fe284f6.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5e: Support IPsec packet offload for TX in switchdev mode | Jianbo Liu
The IPsec encryption is done last, so add a new prio for IPsec offload in FDB, and put it just lower than the slow path prio and higher than the per-vport prio. Three levels are added for TX. The first one is for ip xfrm policy. The SA table is created in the second level for ip xfrm state. The status table is created last to count the number of packets encrypted. The rules which forward packets to uplink are changed to forward them to IPsec TX tables first. These rules are restored after those tables are destroyed, which is done immediately when there is no reference to them, just as is done in legacy mode. The support for slow path is added here, by refreshing uplink's channels. But the handling for TC fast path, which is more complicated, will be added later. Besides, reg c4 is used instead to match reqid. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/cfd0e6ffaf0b8c55ebaa9fb0649b7c504b6b8ec6.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02 | net/mlx5e: Refactor IPsec TX tables creation | Jianbo Liu
Add attribute for IPsec TX creation, pass all needed parameters in it, so tx_create() can be used by eswitch. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/24d5ab988b0db2d39b7fde321b44ffe885d47828.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02net/mlx5e: Handle IPsec offload for RX datapath in switchdev modeJianbo Liu
Reuse the tun opts bits in reg c1 to pass the IPsec obj id to the datapath. As this is only for RX SA and there are only 11 bits, an xarray is used to map the IPsec obj id to an index between 1 and 0x7ff, and that index is written to reg c1 in place of the obj id. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/43d60fbcc9cd672a97d7e2a2f7fe6a3d9e9a776d.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
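The id-to-index mapping described above can be illustrated with a small user-space model. This is only a sketch: it uses a plain array instead of the kernel xarray, and every name in it (ipsec_id_map, ipsec_map_alloc(), ipsec_map_lookup(), ipsec_map_free()) is hypothetical and not taken from the mlx5 driver. It shows the one property the commit message states: a full-width object id is replaced by an index in the range 1..0x7ff so that it fits in 11 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define IPSEC_IDX_MIN 1u
#define IPSEC_IDX_MAX 0x7ffu /* largest value that fits in 11 bits */

/* Hypothetical table: slot i holds the object id mapped to index i.
 * Slot 0 is never handed out, so 0 can mean "no mapping". */
struct ipsec_id_map {
	uint32_t obj_id[IPSEC_IDX_MAX + 1];
	uint8_t used[IPSEC_IDX_MAX + 1];
};

/* Map obj_id to the lowest free index; returns 0 when the range is full. */
static uint32_t ipsec_map_alloc(struct ipsec_id_map *m, uint32_t obj_id)
{
	for (uint32_t i = IPSEC_IDX_MIN; i <= IPSEC_IDX_MAX; i++) {
		if (!m->used[i]) {
			m->used[i] = 1;
			m->obj_id[i] = obj_id;
			return i;
		}
	}
	return 0;
}

/* Datapath side: recover the object id from the 11-bit index.
 * Returns 0 on success, -1 if the index is out of range or unmapped. */
static int ipsec_map_lookup(const struct ipsec_id_map *m, uint32_t idx,
			    uint32_t *obj_id)
{
	if (idx < IPSEC_IDX_MIN || idx > IPSEC_IDX_MAX || !m->used[idx])
		return -1;
	*obj_id = m->obj_id[idx];
	return 0;
}

/* Release an index so that a later allocation can reuse it. */
static void ipsec_map_free(struct ipsec_id_map *m, uint32_t idx)
{
	if (idx >= IPSEC_IDX_MIN && idx <= IPSEC_IDX_MAX)
		m->used[idx] = 0;
}
```

In this model the index, not the object id, is what would be written into the register field, and the receive path would call the lookup helper to get the object id back. The linear scan is chosen for brevity only; it says nothing about how the driver allocates indices.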
2023-08-02net/mlx5e: Support IPsec packet offload for RX in switchdev modeJianbo Liu
As decryption must be done first, add a new prio for IPsec offload in FDB, and put it just lower than the BYPASS prio and higher than the TC prio. Three levels are added for RX. The first one is for ip xfrm policy. The SA table is created in the second level for ip xfrm state. The status table is created last to check the decryption result: on success, packets continue to the next stage; otherwise they are dropped. For now, the setting of reg c1 is removed for switchdev mode, and the datapath processing will be added in the next patch. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/c91063554cf643fb50b99cf093e8a9bf11729de5.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02net/mlx5e: Refactor IPsec RX tables creation and destructionJianbo Liu
Add an attribute for IPsec RX creation, so rx_create() can be used by eswitch in a later patch. Also move the code for TTC dest connect/disconnect, which is needed only in NIC mode, to individual functions. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/87478d928479b6a4eee41901204546ea05741815.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02net/mlx5e: Prepare IPsec packet offload for switchdev modeJianbo Liu
As the uplink representor is created only in switchdev mode, add a local variable for IPsec to indicate the device is in this mode. In this mode, IPsec ROCE is disabled, and crypto offload is kept as it is. However, as the tables for packet offload are created in FDB, ipsec->rx_esw and ipsec->tx_esw are added. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/ee242398f3b0a18007749fe79ff6ff19445a0280.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02net/mlx5e: Change the parameter of IPsec RX skb handle functionJianbo Liu
Refactor the function to pass in reg B value only. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/3b3c53f64660d464893eaecc41298b1ce49c6baa.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02net/mlx5e: Add function to get IPsec offload namespaceJianbo Liu
Add a function to get the namespace for each direction. It will be extended for switchdev mode in a later patch, but there is no functional change for now. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/ac2982c34f1ed3288d4670cacfd7e1b87a8c96d9.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-02Merge tag 'soc-fixes-6.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"A couple of platforms get a lone dts fix each:
- SoCFPGA: Fix incorrect I2C property for SCL signal
- Renesas: Fix interrupt names for MTU3 channels on RZ/G2L and RZ/V2L.
- Juno/Vexpress: remove a dangling symlink
- at91: sam9x60 SoC detection compatible strings
- nspire: Fix arm primecell compatible string
On the NXP i.MX platform, there are multiple issues that get addressed:
- A couple of ARM DTS fixes for i.MX6SLL usbphy and supported CPU frequency of sk-imx53 board
- Add missing pull-up for imx8mn-var-som onboard PHY reset pinmux
- A couple of imx8mm-venice fixes from Tim Harvey to disable disp_blk_ctrl
- A couple of phycore-imx8mm fixes from Yashwanth Varakala to correct VPU label and gpio-line-names
- Fix imx8mp-blk-ctrl driver to register HSIO PLL clock as bus_power_dev child, so that runtime PM can translate into the necessary GPC power domain action
On the driver side, there are two fixes for tegra memory controller drivers addressing regressions from the merge window, a couple of minor correctness fixes for SCMI and SMCCC firmware, as well as a build fix for an lcd backlight driver"
* tag 'soc-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (22 commits)
  backlight: corgi_lcd: fix missing prototype
  memory: tegra: make icc_set_bw return zero if BWMGR not supported
  arm64: dts: renesas: rzg2l: Update overfow/underflow IRQ names for MTU3 channels
  dt-bindings: serial: atmel,at91-usart: update compatible for sam9x60
  ARM: dts: at91: sam9x60: fix the SOC detection
  ARM: dts: nspire: Fix arm primecell compatible string
  firmware: arm_scmi: Fix chan_free cleanup on SMC
  firmware: arm_scmi: Drop OF node reference in the transport channel setup
  soc: imx: imx8mp-blk-ctrl: register HSIO PLL clock as bus_power_dev child
  ARM: dts: nxp/imx: limit sk-imx53 supported frequencies
  firmware: arm_scmi: Fix signed error return values handling
  firmware: smccc: Fix use of uninitialised results structure
  arm64: dts: freescale: Fix VPU G2 clock
  arm64: dts: imx8mn-var-som: add missing pull-up for onboard PHY reset pinmux
  arm64: dts: phycore-imx8mm: Correction in gpio-line-names
  arm64: dts: phycore-imx8mm: Label typo-fix of VPU
  ARM: dts: nxp/imx6sll: fix wrong property name in usbphy node
  arm64: dts: imx8mm-venice-gw7904: disable disp_blk_ctrl
  arm64: dts: imx8mm-venice-gw7903: disable disp_blk_ctrl
  arm64: dts: arm: Remove the dangling vexpress-v2m-rs1.dtsi symlink
  ...
2023-08-02Merge tag 'bitmap-6.5-rc5' of https://github.com:/norov/linuxLinus Torvalds
Pull bitmap fixes from Yury Norov: - Fix for bitmap documentation - Fix for kernel build under certain configurations * tag 'bitmap-6.5-rc5' of https://github.com:/norov/linux: lib/bitmap: workaround const_eval test build failure cpumask: eliminate kernel-doc warnings
2023-08-02Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer()YueHaibing
Since commit 30fbee49b071 ("Staging: hv: vmbus: Get rid of the unused function vmbus_ontimer()") this declaration is not used anymore, so it can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20230725142108.27280-1-yuehaibing@huawei.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-08-02x86/hyperv: add noop functions to x86_init mpparse functionsSaurabh Sengar
Hyper-V can run VMs at different privilege "levels" known as Virtual Trust Levels (VTL). Sometimes, it chooses to run two different VMs at different levels but they share some of their address space. In such setups VTL2 (higher level VM) has visibility of all of the VTL0 (level 0) memory space. When CONFIG_X86_MPPARSE is enabled for VTL2, the VTL2 kernel performs a search within the low memory to locate MP tables. However, in systems where VTL0 manages the low memory and may contain valid tables, this scanning can result in incorrect MP table information being provided to the VTL2 kernel, which mistakenly considers VTL0's MP table as its own. Add noop functions to avoid the MP parse scan by VTL2. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/1687537688-5397-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
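The noop-hook approach described above can be sketched in user space as a table of function pointers whose default entries perform a scan and whose platform-specific override replaces them with do-nothing functions. This is a minimal model, not the kernel code: the struct, the hook names, the scan counter and vtl_init_platform_model() are all assumptions made for illustration.

```c
/* Hypothetical hook table modelled on the idea of per-platform
 * init callbacks; the field names are illustrative only. */
struct mpparse_hooks {
	void (*find_config)(void);
	void (*get_config)(unsigned int early);
};

/* Counts how many times low memory would have been scanned. */
static int mp_scans_done;

/* Default hooks: stand-ins for the real MP table search. */
static void default_find_config(void)
{
	mp_scans_done++;
}

static void default_get_config(unsigned int early)
{
	(void)early;
	mp_scans_done++;
}

/* Noop hooks: same signatures, no side effects. */
static void hook_noop(void)
{
}

static void hook_uint_noop(unsigned int unused)
{
	(void)unused;
}

static struct mpparse_hooks mp_hooks = {
	.find_config = default_find_config,
	.get_config = default_get_config,
};

/* Platform setup for the higher trust level in this model:
 * install the noops so that no MP table scan happens. */
static void vtl_init_platform_model(void)
{
	mp_hooks.find_config = hook_noop;
	mp_hooks.get_config = hook_uint_noop;
}

/* Generic boot path: always calls through the hook table,
 * unaware of whether the hooks do anything. */
static void boot_parse_mp_tables(void)
{
	mp_hooks.find_config();
	mp_hooks.get_config(1);
}
```

The point of the design is that the generic boot path stays unchanged: it keeps calling the hooks, and only the platform setup decides whether those calls touch low memory at all.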
2023-08-02Merge branch 'bpf-xdp-add-tracepoint-to-xdp-attaching-failure'Alexei Starovoitov
Leon Hwang says: ==================== bpf, xdp: Add tracepoint to xdp attaching failure This series introduces a new tracepoint in bpf_xdp_link_attach(). With this tracepoint, the error message is captured when an error happens in dev_xdp_attach(), e.g. invalid attaching flags. v4 -> v5: * Initialise the extack variable. * Fix code style issue of variable declaration lines. v3 -> v4: * Fix selftest-crashed issue. ==================== Link: https://lore.kernel.org/r/20230801142621.7925-1-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-02selftests/bpf: Add testcase for xdp attaching failure tracepointLeon Hwang
Add a test case for the xdp attaching failure tracepoint, using a bpf tracepoint program while attaching XDP to a device with an invalid flags option. The bpf tracepoint program retrieves the error message from the tracepoint and puts it into a perf buffer. The test code receives the error message from the perf buffer and asserts that it is "Invalid XDP flags for BPF link attachment". Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20230801142621.7925-3-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>