summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-23net/sched: fix a qdisc modification with ambiguous command requestJamal Hadi Salim
When replacing an existing root qdisc, with one that is of the same kind, the request boils down to essentially a parameterization change i.e not one that requires allocation and grafting of a new qdisc. syzbot was able to create a scenario which resulted in a taprio qdisc replacing an existing taprio qdisc with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to create and graft scenario. The fix ensures that only when the qdisc kinds are different that we should allow a create and graft, otherwise it goes into the "change" codepath. While at it, fix the code and comments to improve readability. While syzbot was able to create the issue, it did not zone on the root cause. Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down. v1->V2 changes: - remove "inline" function definition (Vladmir) - remove extrenous braces in branches (Vladmir) - change inline function names (Pedro) - Run tdc tests (Victor) v2->v3 changes: - dont break else/if (Simon) Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/ Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23net: dsa: rzn1-a5psw: remove redundant logsAlexis Lothoré
Remove debug logs in port vlan management, since there are already multiple tracepoints defined for those operations in DSA Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23net: Avoid address overwrite in kernel_connectJordan Rife
BPF programs that run on connect can rewrite the connect address. For the connect system call this isn't a problem, because a copy of the address is made when it is moved into kernel space. However, kernel_connect simply passes through the address it is given, so the caller may observe its address value unexpectedly change. A practical example where this is problematic is where NFS is combined with a system such as Cilium which implements BPF-based load balancing. A common pattern in software-defined storage systems is to have an NFS mount that connects to a persistent virtual IP which in turn maps to an ephemeral server IP. This is usually done to achieve high availability: if your server goes down you can quickly spin up a replacement and remap the virtual IP to that endpoint. With BPF-based load balancing, mounts will forget the virtual IP address when the address rewrite occurs because a pointer to the only copy of that address is passed down the stack. Server failover then breaks, because clients have forgotten the virtual IP address. Reconnects fail and mounts remain broken. This patch was tested by setting up a scenario like this and ensuring that NFS reconnects worked after applying the patch. Signed-off-by: Jordan Rife <jrife@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23virtio_net: Introduce skb_vnet_common_hdr to avoid typecastingFeng Liu
The virtio_net driver currently deals with different versions and types of virtio net headers, such as virtio_net_hdr_mrg_rxbuf, virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies on multiple type casts to convert memory between different structures, potentially leading to bugs when there are changes in these structures. Introduces the "struct skb_vnet_common_hdr" as a unifying header structure using a union. With this approach, various virtio net header structures can be converted by accessing different members of this structure, thus eliminating the need for type casting and reducing the risk of potential bugs. For example following code: static struct sk_buff *page_to_skb(struct virtnet_info *vi, struct receive_queue *rq, struct page *page, unsigned int offset, unsigned int len, unsigned int truesize, unsigned int headroom) { [...] struct virtio_net_hdr_mrg_rxbuf *hdr; [...] hdr_len = vi->hdr_len; [...] ok: hdr = skb_vnet_hdr(skb); memcpy(hdr, hdr_p, hdr_len); [...] } When VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20 But the sizeof(*hdr) is 12, memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr, which make a potential risk of bug. And this risk can be avoided by introducing struct skb_vnet_common_hdr. Change log v1->v2 feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com> feedback from Simon Horman <horms@kernel.org> 1. change to use net-next tree. 2. move skb_vnet_common_hdr inside kernel file instead of the UAPI header. v2->v3 feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com> 1. fix typo in commit message. 2. add original struct virtio_net_hdr into union 3. remove virtio_net_hdr_mrg_rxbuf variable in receive_buf; Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23dp83640: Use list_for_each_entry() helperJinjie Ruan
Convert list_for_each() to list_for_each_entry() where applicable. No functional changed. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23entry: Remove empty addr_limit_user_check()Mark Rutland
Back when set_fs() was a generic API for altering the address limit, addr_limit_user_check() was a safety measure to prevent userspace being able to issue syscalls with an unbound limit. With the the removal of set_fs() as a generic API, the last user of addr_limit_user_check() was removed in commit: b5a5a01d8e9a44ec ("arm64: uaccess: remove addr_limit_user_check()") ... as since that commit, no architecture defines TIF_FSCHECK, and hence addr_limit_user_check() always expands to nothing. Remove addr_limit_user_check(), updating the comment in exit_to_user_mode_prepare() to no longer refer to it. At the same time, the comment is reworded to be a little more generic so as to cover kmap_assert_nomap() in addition to lockdep_sys_exit(). No functional change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230821163526.2319443-1-mark.rutland@arm.com
2023-08-23dt-bindings: pinctrl: renesas,rza2: Use 'additionalProperties' for child nodesRob Herring
A schema under 'additionalProperties' works better for matching any property/node other than the ones explicitly listed. Convert the schema to use that rather than the wildcard and if/then schema. Drop 'phandle' properties which never need to be explicitly listed while we're here. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230819010928.916438-1-robh@kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-08-23dma-buf/sw_sync: Avoid recursive lock during fence signalRob Clark
If a signal callback releases the sw_sync fence, that will trigger a deadlock as the timeline_fence_release recurses onto the fence->lock (used both for signaling and the the timeline tree). To avoid that, temporarily hold an extra reference to the signalled fences until after we drop the lock. (This is an alternative implementation of https://patchwork.kernel.org/patch/11664717/ which avoids some potential UAF issues with the original patch.) v2: Remove now obsolete comment, use list_move_tail() and list_del_init() Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free") Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230818145939.39697-1-robdclark@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2023-08-23media: vcodec: Fix potential array out-of-bounds in encoder queue_setupWei Chen
variable *nplanes is provided by user via system call argument. The possible value of q_data->fmt->num_planes is 1-3, while the value of *nplanes can be 1-8. The array access by index i can cause array out-of-bounds. Fix this bug by checking *nplanes against the array size. Fixes: 4e855a6efa54 ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver") Signed-off-by: Wei Chen <harperchen1110@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-08-23PCI: fu740: Set the number of MSI vectorsYong-Xuan Wang
The iMSI-RX module of the DW PCIe controller provides multiple sets of MSI_CTRL_INT_i_* registers, and each set is capable of handling 32 MSI interrupts. However, the fu740 PCIe controller driver only enabled one set of MSI_CTRL_INT_i_* registers, as the total number of supported interrupts was not specified. Set the supported number of MSI vectors to enable all the MSI_CTRL_INT_i_* registers on the fu740 PCIe core, allowing the system to fully utilize the available MSI interrupts. Link: https://lore.kernel.org/r/20230807055621.2431-1-yongxuan.wang@sifive.com Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
2023-08-23Merge branch 'mlx4-aux-bus'David S. Miller
Petr Pavlu says: ==================== Convert mlx4 to use auxiliary bus This series converts the mlx4 drivers to use auxiliary bus, similarly to how mlx5 was converted [1]. The first 6 patches are preparatory changes, the remaining 4 are the final conversion. Initial motivation for this change was to address a problem related to loading mlx4_en/mlx4_ib by mlx4_core using request_module_nowait(). When doing such a load in initrd, the operation is asynchronous to any init control and can get unexpectedly affected/interrupted by an eventual root switch. Using an auxiliary bus leaves these module loads to udevd which better integrates with systemd processing. [2] General benefit is to get rid of custom interface logic and instead use a common facility available for this task. An obvious risk is that some new bug is introduced by the conversion. Leon Romanovsky was kind enough to check for me that the series passes their verification tests. Changes since v2 [3]: * Use 'void *' as the event param of mlx4_dispatch_event(). Changes since v1 [4]: * Fix a missing definition of the err variable in mlx4_en_add(). * Remove not needed comments about the event type in mlx4_en_event() and mlx4_ib_event(). [1] https://lore.kernel.org/netdev/20201101201542.2027568-1-leon@kernel.org/ [2] https://lore.kernel.org/netdev/0a361ac2-c6bd-2b18-4841-b1b991f0635e@suse.com/ [3] https://lore.kernel.org/netdev/20230813145127.10653-1-petr.pavlu@suse.com/ [4] https://lore.kernel.org/netdev/20230804150527.6117-1-petr.pavlu@suse.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Delete custom device management logicPetr Pavlu
After the conversion to use the auxiliary bus, the custom device management is not needed anymore and can be deleted. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Connect the infiniband part to the auxiliary busPetr Pavlu
Use the auxiliary bus to perform device management of the infiniband part of the mlx4 driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Connect the ethernet part to the auxiliary busPetr Pavlu
Use the auxiliary bus to perform device management of the ethernet part of the mlx4 driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Register mlx4 devices to an auxiliary virtual busPetr Pavlu
Add an auxiliary virtual bus to model the mlx4 driver structure. The code is added along the current custom device management logic. Subsequent patches switch mlx4_en and mlx4_ib to the auxiliary bus and the old interface is then removed. Structure mlx4_priv gains a new adev dynamic array to keep track of its auxiliary devices. Access to the array is protected by the global mlx4_intf mutex. Functions mlx4_register_device() and mlx4_unregister_device() are updated to expose auxiliary devices on the bus in order to load mlx4_en and/or mlx4_ib. Functions mlx4_register_auxiliary_driver() and mlx4_unregister_auxiliary_driver() are added to substitute mlx4_register_interface() and mlx4_unregister_interface(), respectively. Function mlx4_do_bond() is adjusted to walk over the adev array and re-adds a specific auxiliary device if its driver sets the MLX4_INTFF_BONDING flag. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Avoid resetting MLX4_INTFF_BONDING per driverPetr Pavlu
The mlx4_core driver has a logic that allows a sub-driver to set the MLX4_INTFF_BONDING flag which then causes that function mlx4_do_bond() asks the sub-driver to fully re-probe a device when its bonding configuration changes. Performing this operation is disallowed in mlx4_register_interface() when it is detected that any mlx4 device is multifunction (SRIOV). The code then resets MLX4_INTFF_BONDING in the driver flags. Move this check directly into mlx4_do_bond(). It provides a better separation as mlx4_core no longer directly modifies the sub-driver flags and it will allow to get rid of explicitly keeping track of all mlx4 devices by the intf.c code when it is switched to an auxiliary bus. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Move the bond work to the core driverPetr Pavlu
Function mlx4_en_queue_bond_work() is used in mlx4_en to start a bond reconfiguration. It gathers data about a new port map setting, takes a reference on the netdev that triggered the change and queues a work object on mlx4_en_priv.mdev.workqueue to perform the operation. The scheduled work is mlx4_en_bond_work() which calls mlx4_bond()/mlx4_unbond() and consequently mlx4_do_bond(). At the same time, function mlx4_change_port_types() in mlx4_core might be invoked to change the port type configuration. As part of its logic, it re-registers the whole device by calling mlx4_unregister_device(), followed by mlx4_register_device(). The two operations can result in concurrent access to the data about currently active interfaces on the device. Functions mlx4_register_device() and mlx4_unregister_device() lock the intf_mutex to gain exclusive access to this data. The current implementation of mlx4_do_bond() doesn't do that which could result in an unexpected behavior. An updated version of mlx4_do_bond() for use with an auxiliary bus goes and locks the intf_mutex when accessing a new auxiliary device array. However, doing so can then result in the following deadlock: * A two-port mlx4 device is configured as an Ethernet bond. * One of the ports is changed from eth to ib, for instance, by writing into a mlx4_port<x> sysfs attribute file. * mlx4_change_port_types() is called to update port types. It invokes mlx4_unregister_device() to unregister the device which locks the intf_mutex and starts removing all associated interfaces. * Function mlx4_en_remove() gets invoked and starts destroying its first netdev. This triggers mlx4_en_netdev_event() which recognizes that the configured bond is broken. It runs mlx4_en_queue_bond_work() which takes a reference on the netdev. Removing the netdev now cannot proceed until the work is completed. * Work function mlx4_en_bond_work() gets scheduled. It calls mlx4_unbond() -> mlx4_do_bond(). The latter function tries to lock the intf_mutex but that is not possible because it is held already by mlx4_unregister_device(). This particular case could be possibly solved by unregistering the mlx4_en_netdev_event() notifier in mlx4_en_remove() earlier, but it seems better to decouple mlx4_en more and break this reference order. Avoid then this scenario by recognizing that the bond reconfiguration operates only on a mlx4_dev. The logic to queue and execute the bond work can be moved into the mlx4_core driver. Only a reference on the respective mlx4_dev object is needed to be taken during the work's lifetime. This removes a call from mlx4_en that can directly result in needing to lock the intf_mutex, it remains a privilege of the core driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Get rid of the mlx4_interface.activate callbackPetr Pavlu
The mlx4_interface.activate callback was introduced in commit 79857cd31fe7 ("net/mlx4: Postpone the registration of net_device"). It dealt with a situation when a netdev notifier received a NETDEV_REGISTER event for a new net_device created by mlx4_en but the same device was not yet visible to mlx4_get_protocol_dev(). The callback can be removed now that mlx4_get_protocol_dev() is gone. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Replace the mlx4_interface.event callback with a notifierPetr Pavlu
Use a notifier to implement mlx4_dispatch_event() in preparation to switch mlx4_en and mlx4_ib to be an auxiliary device. A problem is that if the mlx4_interface.event callback was replaced with something as mlx4_adrv.event then the implementation of mlx4_dispatch_event() would need to acquire a lock on a given device before executing this callback. That is necessary because otherwise there is no guarantee that the associated driver cannot get unbound when the callback is running. However, taking this lock is not possible because mlx4_dispatch_event() can be invoked from the hardirq context. Using an atomic notifier allows the driver to accurately record when it wants to receive these events and solves this problem. A handler registration is done by both mlx4_en and mlx4_ib at the end of their mlx4_interface.add callback. This matches the current situation when mlx4_add_device() would enable events for a given device immediately after this callback, by adding the device on the mlx4_priv.list. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Use 'void *' as the event param of mlx4_dispatch_event()Petr Pavlu
Function mlx4_dispatch_event() takes an 'unsigned long' as its event parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR), a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit integer (remaining events). In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device, the mlx4_interface.event callback is replaced with a notifier and function mlx4_dispatch_event() gets updated to invoke atomic_notifier_call_chain(). This requires forwarding the input 'param' value from the former function to the latter. A problem is that the notifier call takes 'void *' as its 'param' value, compared to 'unsigned long' used by mlx4_dispatch_event(). Re-passing the value would need either punning it to 'void *' or passing down the address of the input 'param'. Both approaches create a number of unnecessary casts. Change instead the input 'param' of mlx4_dispatch_event() from 'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly, callers using an int value are adjusted to pass its address. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Rename member mlx4_en_dev.nb to netdev_nbPetr Pavlu
Rename the mlx4_en_dev.nb notifier_block member to netdev_nb in preparation to add a mlx4 core notifier_block. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23mlx4: Get rid of the mlx4_interface.get_dev callbackPetr Pavlu
Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev() and the associated mlx4_interface.get_dev callbacks. This is done in preparation to use an auxiliary bus to model the mlx4 driver structure. The change is motivated by the following situation: * The mlx4_en interface is being initialized by mlx4_en_add() and mlx4_en_activate(). * The latter activate function calls mlx4_en_init_netdev() -> register_netdev() to register a new net_device. * A netdev event NETDEV_REGISTER is raised for the device. * The netdev notififier mlx4_ib_netdev_event() is called and it invokes mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() -> mlx4_en_get_netdev() [via mlx4_interface.get_dev]. This chain creates a problem when mlx4_en gets switched to be an auxiliary driver. It contains two device calls which would both need to take a respective device lock. Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer call mlx4_get_protocol_dev() but instead to utilize the information passed in net_device.parent and net_device.dev_port. This data is sufficient to determine that an updated port is one that the mlx4_ib driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to date. Following that, update mlx4_ib_get_netdev() to also not call mlx4_get_protocol_dev() and instead scan all current netdevs to find find a matching one. Note that mlx4_ib_get_netdev() is called early from ib_register_device() and cannot use data tracked in mlx4_ib_dev.iboe.netdevs which is not at that point yet set. Finally, remove function mlx4_get_protocol_dev() and the mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they became unused. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23qed/qede: Remove unused declarationsYue Haibing
Commit 8cd160a29415 ("qede: convert to new udp_tunnel_nic infra") removed qede_udp_tunnel_{add,del}() but not the declarations. Commit 0ebcebbef1cc ("qed: Read device port count from the shmem") removed qed_device_num_engines() but not its declaration. Commit 1e128c81290a ("qed: Add support for hardware offloaded FCoE.") declared but never implemented qed_fcoe_set_pf_params(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23octeontx2-pf: Use PTP HW timestamp counter atomic update featureSai Krishna
Some of the newer silicon versions in CN10K series supports a feature where in the current PTP timestamp in HW can be updated atomically without losing any cpu cycles unlike read/modify/write register. This patch uses this feature so that PTP accuracy can be improved while adjusting the master offset in HW. There is no need for SW timecounter when using this feature. So removed references to SW timecounter wherever appropriate. Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-22net/mlx5e: Support IPsec upper TCP protocol selectorLeon Romanovsky
Support TCP as protocol selector for policy and state in IPsec packet offload mode. Example of state configuration is as follows: ip xfrm state add src 192.168.25.3 dst 192.168.25.1 \ proto esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \ 0x54a7588d36873b031e4bd46301be5a86b3a53879 128 mode transport \ offload packet dev re0 dir in sel src 192.168.25.3 dst 192.168.25.1 \ proto tcp dport 9003 Acked-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5e: Support IPsec upper protocol selector field offload for RXEmeel Hakim
Support RX policy/state upper protocol selector field offload, to enable selecting RX traffic for IPsec operation based on l4 protocol UDP with specific source/destination port. Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Store vport in struct mlx5_devlink_port and use it in port opsJiri Pirko
Instead of using internal devlink_port->index to perform vport lookup in every devlink port op, store the vport pointer to the container struct mlx5_devlink_port and use it directly in port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Check vhca_resource_manager capability in each op and add extack msgJiri Pirko
Since the follow-up patch is going to remove mlx5_devlink_port_fn_get_vport() entirely, move the vhca_resource_manager capability checking to individual ops. Add proper extack message on the way. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Relax mlx5_devlink_eswitch_get() return value checkingJiri Pirko
If called from port ops, it is not needed to perform the checks in mlx5_devlink_eswitch_get(). The reason is devlink port would not be registered if the checks are not true. Introduce relaxed version mlx5_devlink_eswitch_nocheck_get() and use it in port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Return -EOPNOTSUPP in mlx5_devlink_port_fn_migratable_set() directlyJiri Pirko
Instead of initializing "err" variable, just return "-EOPNOTSUPP" directly where it is needed. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Reduce number of vport lookups passing vport pointer instead of indexJiri Pirko
During devlink port init/cleanup and register/unregister calls, there are many lookups of vport. Instead of passing vport_num as argument to functions, pass the vport struct pointer directly and avoid repeated lookups. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Embed struct devlink_port into driver structureJiri Pirko
Struct devlink_port is usually embedded in a driver-specific struct which allows to carry driver context to devlink port ops. Introduce a container struct to include devlink_port struct in preparation to also include driver context for devlink port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in opsJiri Pirko
Currently each PF/VF/SF devlink port op called into mlx5 code calls is_port_function_supported() to check if the port is either PF, VF or SF. So make sure that the ops are registered with devlink port only for those and avoid the is_port_function_supported() checks in ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable()Jiri Pirko
Since the previous patch removed the only users of these functions, remove them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF codeJiri Pirko
Similar to the PF/VF helpers, introduce a set of load/unload helpers for SF vports. From there, call mlx5_eswitch_load/unload_vport() which are common for PFs/VFs and newly introduced SF helpers. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Allow mlx5_esw_offloads_devlink_port_register() to register SFsJiri Pirko
Currently there is a separate set of functions used to register/unregister the SF. The only difference is currently the ops struct. Move the struct up and use it for SFs in mlx5_esw_offloads_devlink_port_register(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Push devlink port PF/VF init/cleanup calls out of ↵Jiri Pirko
devlink_port_register/unregister() In order to prepare for mlx5_esw_offloads_devlink_port_register/unregister() to be used for SFs as well, push out the PF/VF specific init/cleanup calls outside. Introduce mlx5_eswitch_load/unload_pf_vf_vport() and call them from there. Use these new helpers of PF/VF loading and make mlx5_eswitch_local/unload_vport() reusable for SFs. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Push out SF devlink port init and cleanup code to separate helpersJiri Pirko
Similar to what was done for PFs/VFs, introduce devlink port init and cleanup helpers for SFs and manage the vport->dl_port pointer there. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-22net/mlx5: Rework devlink port alloc/free into init/cleanupJiri Pirko
In order to prepare the devlink port registration function to be common for PFs/VFs and SFs, change the existing devlink port allocation and free functions into PF/VF init and cleanup, so similar helpers could be later on introduced for SFs. Make the init/cleanup helpers responsible for setting/clearing the vport->dl_port pointer. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-23tools/nolibc: avoid undesired casts in the __sysret() macroWilly Tarreau
Having __sysret() as an inline function has the unfortunate effect of adding casts and large constants comparisons after the syscall returns that significantly inflate some light code that's otherwise syscall- heavy. Even nolibc-test grew by ~1%. Let's switch back to a macro for this, and use it only with signed arguments. Note that it is also possible to design a slightly more complex macro covering unsigned and pointers but we only have 3 such syscalls so it is pointless, and these were just addressed not to use this macro anymore. Now for the argument (the local variable containing the syscall return value), any negative value is an error, that results in -1 being returned and errno to be assigned the opposite value. This may be revisited again in the future if really needed but for now let's get back to something sane. Fixes: 428905da6ec4 ("tools/nolibc: sys.h: add a syscall return helper") Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/ Cc: Zhangjin Wu <falcon@tinylab.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Thomas Weißschuh <thomas@t-8ch.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret()Willy Tarreau
The __sysret() function causes some undesirable casts so we'll revert it. In order to keep it simple it will now only support integer return values like in the past, so we must basically revert the changes that were made to these 3 syscalls which return a pointer so that they simply rely on their own test and the SET_ERRNO() macro. Fixes: 4201cfce15fe ("tools/nolibc: clean up sbrk() routine") Fixes: 924e9539aeaa ("tools/nolibc: clean up mmap() routine") Fixes: d27447bc2e0a ("tools/nolibc: sys.h: apply __sysret() helper") Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/ Cc: Zhangjin Wu <falcon@tinylab.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Thomas Weißschuh <thomas@t-8ch.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: silence ppc64 compile warningsZhangjin Wu
Silence the following warnings reported by the new -Wall -Wextra options with pure assembly code. In file included from sysroot/powerpc/include/stdio.h:13, from nolibc-test.c:13: sysroot/powerpc/include/arch.h: In function '_start': sysroot/powerpc/include/arch.h:192:32: warning: unused variable 'r2' [-Wunused-variable] 192 | register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start; | ^~ sysroot/powerpc/include/arch.h:187:97: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var] 187 | void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void) | ^~~~~~ Since only elfv2 ABI requires to save the TOC/GOT pointer to r2 register, when using elfv1 ABI, the old C code is simply ignored by the compiler, but the compiler can not ignore the inline assembly code and will introduce build failure or running segfaults. So, let's further only add the new assembly code for elfv2 ABI with the checking of _CALL_ELF == 2. Link: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf Link: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: libc-test: use HOSTCC instead of CCZhangjin Wu
libc-test is mainly added to compare the behavior of nolibc to the system libc, it is meaningless and error-prone with cross compiling. Let's use HOSTCC instead of CC to avoid wrongly use cross compiler when CROSS_COMPILE is passed or customized. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Fixes: cfb672f94f6e ("selftests/nolibc: add run-libc-test target") Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: stackprotector.h: make __stack_chk_init staticZhangjin Wu
This allows to generate smaller text/data/dec size. As the _start_c() function added by crt.h, __stack_chk_init() is called from _start_c() instead of the assembly _start. So, it is able to mark it with static now. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: allow report with existing test logZhangjin Wu
After the tests finish, it is valuable to report and summarize with existing test log. This avoid rerun or run the tests again when not necessary. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add test support for ppc64Zhangjin Wu
Kernel uses ARCH=powerpc for both 32-bit and 64-bit PowerPC, here adds a ppc64 variant for big endian 64-bit PowerPC, users can pass XARCH=ppc64 to test it. The powernv machine of qemu-system-ppc64 is used with powernv_be_defconfig. As the document [1] shows: PowerNV (as Non-Virtualized) is the “bare metal” platform using the OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can be used as an hypervisor OS, running KVM guests, or simply as a host OS. Notes, - differs from little endian 64-bit PowerPC, vmlinux is used instead of zImage, because big endian zImage [2] only boot on qemu with x-vof=on (added from qemu v7.0) and a fixup patch [3] for qemu v7.0.51: - since the VSX support may be disabled in kernel side, to avoid "illegal instruction" errors due to missing VSX kernel support, let's simply let compiler not generate vector/scalar (VSX) instructions via the '-mno-vsx' option. - as 'man gcc' shows, '-mmultiple' is used to generate code that uses the load multiple word instructions and the store multiple word instructions. Those instructions do not work when the processor is in little-endian mode (except PPC740/PPC750), so, we only enable it for big endian powerpc. - for big endian ppc64, as the help message from arch/powerpc/Kconfig shows, the V2 ABI is standard for 64-bit little-endian, but for big-endian it is less well tested by kernel and toolchain, so, use elfv1 as-is, no need to explicitly ask toolchain to use elfv2 here. [1]: https://qemu.readthedocs.io/en/latest/system/ppc/powernv.html [2]: https://github.com/linuxppc/issues/issues/402 [3]: https://lore.kernel.org/qemu-devel/20220504065536.3534488-1-aik@ozlabs.ru/ Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230722121019.GD17311@1wt.eu/ Link: https://lore.kernel.org/lkml/20230719043353.GC5331@1wt.eu/ Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add test support for ppc64leZhangjin Wu
Kernel uses ARCH=powerpc for both 32-bit and 64-bit PowerPC, here adds a ppc64le variant for little endian 64-bit PowerPC, users can pass XARCH=ppc64le to test it. The powernv machine of qemu-system-ppc64le is used for there is just a working powernv_defconfig. As the document [1] shows: PowerNV (as Non-Virtualized) is the “bare metal” platform using the OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can be used as an hypervisor OS, running KVM guests, or simply as a host OS. Notes, - since the VSX support may be disabled in kernel side, to avoid "illegal instruction" errors due to missing VSX kernel support, let's simply let compiler not generate vector/scalar (VSX) instructions via the '-mno-vsx' option. - little endian ppc64 prefers elfv2 to elfv1 if the toolchain (e.g. gcc 13.1.0) supports it, let's align with kernel, otherwise, our elfv1 binary will not run on kernel with elfv2 ABI. [1]: https://qemu.readthedocs.io/en/latest/system/ppc/powernv.html Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230722120747.GC17311@1wt.eu/ Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add test support for ppcZhangjin Wu
Kernel uses ARCH=powerpc for both 32-bit and 64-bit PowerPC, here adds a ppc variant for 32-bit PowerPC and uses it as the default variant of powerpc architecture. Users can pass XARCH=ppc (or ARCH=powerpc) to test 32-bit PowerPC. The default qemu-system-ppc g3beige machine [1] is used to run 32-bit powerpc kernel with pmac32_defconfig. The missing PMACZILOG serial tty and console are enabled in another patch [2]. Note, - zImage doesn't boot due to "qemu-system-ppc: Some ROM regions are overlapping" error, so, vmlinux is used instead. - since the VSX support may be disabled in kernel side, to avoid "illegal instruction" errors due to missing VSX kernel support, let's simply let compiler not generate vector/scalar (VSX) instructions via the '-mno-vsx' option. - as 'man gcc' shows, '-mmultiple' is used to generate code that uses the load multiple word instructions and the store multiple word instructions. Those instructions do not work when the processor is in little-endian mode (except PPC740/PPC750), so, we only enable it for big endian powerpc. [1]: https://qemu.readthedocs.io/en/latest/system/ppc/powermac.html [2]: https://lore.kernel.org/lkml/bb7b5f9958b3e3a20f6573ff7ce7c5dc566e7e32.1690982937.git.tanyuan@tinylab.org/ Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/ZL9leVOI25S2+0+g@1wt.eu/ Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23selftests/nolibc: add XARCH and ARCH mapping supportZhangjin Wu
Most of the CPU architectures have different variants, but kernel usually only accepts parts of them via the ARCH variable, the others should be customized via kernel config files. To simplify testing, a new XARCH variable is added to extend the kernel's ARCH with a few variants of the same architecture, and it is used to customize variant specific variables, at last XARCH is converted to the kernel's ARCH: e.g. make run XARCH=<one of the supported variants> | \ | `-> variant specific variables: | IMAGE, DEFCONFIG, QEMU_ARCH, QEMU_ARGS, CFLAGS ... \ `---> kernel's ARCH XARCH and ARCH are carefully mapped to allow users to pass architecture variants via XARCH or pass architecture via ARCH from cmdline. PowerPC is the first user and also a very good reference architecture of this mapping, it has variants with different combinations of 32-bit/64-bit and bit endian/little endian. To use this mapping, the other architectures can refer to PowerPC, If the target architecture only has one variant, XARCH is simply an alias of ARCH, no additional mapping required. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230702171715.GD16233@1wt.eu/ Link: https://lore.kernel.org/lkml/20230730061801.GA7690@1wt.eu/ Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add support for powerpc64Zhangjin Wu
This follows the 64-bit PowerPC ABI [1], refers to the slides: "A new ABI for little-endian PowerPC64 Design & Implementation" [2] and the musl code in arch/powerpc64/crt_arch.h. First, stdu and clrrdi are used instead of stwu and clrrwi for powerpc64. Second, the stack frame size is increased to 32 bytes for powerpc64, 32 bytes is the minimal stack frame size supported described in [2]. Besides, the TOC pointer (GOT pointer) must be saved to r2. This works on both little endian and big endian 64-bit PowerPC. [1]: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf [2]: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>