summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-21vringh: replace kmap_atomic() with kmap_local_page()Stefano Garzarella
kmap_atomic() is deprecated in favor of kmap_local_page() since commit f3ba3c710ac5 ("mm/highmem: Provide kmap_local*"). With kmap_local_page() the mappings are per thread, CPU local, can take page-faults, and can be called from any context (including interrupts). Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. kmap_atomic() is implemented like a kmap_local_page() which also disables page-faults and preemption (the latter only for !PREEMPT_RT kernels, otherwise it only disables migration). The code within the mappings/un-mappings in getu16_iotlb() and putu16_iotlb() don't depend on the above-mentioned side effects of kmap_atomic(), so that mere replacements of the old API with the new one is all that is required (i.e., there is no need to explicitly add calls to pagefault_disable() and/or preempt_disable()). This commit reuses a "boiler plate" commit message from Fabio, who has already did this change in several places. Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-4-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-vdpa: use bind_mm/unbind_mm device callbacksStefano Garzarella
When the user call VHOST_SET_OWNER ioctl and the vDPA device has `use_va` set to true, let's call the bind_mm callback. In this way we can bind the device to the user address space and directly use the user VA. The unbind_mm callback is called during the release after stopping the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa: add bind_mm/unbind_mm callbacksStefano Garzarella
These new optional callbacks is used to bind/unbind the device to a specific address space so the vDPA framework can use VA when these callbacks are implemented. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vringh: fix typos in the vringh_init_* documentationStefano Garzarella
Replace `userpace` with `userspace`. Cc: Simon Horman <simon.horman@corigine.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230331080208.17002-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support specifying bounce buffer size via sysfsXie Yongji
As discussed in [1], this adds sysfs interface to support specifying bounce buffer size in virtio-vdpa case. It would be a performance tuning parameter for high throughput workloads. [1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/ Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-12-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Delay iova domain creationXie Yongji
Delay creating iova domain until the vduse device is registered to vdpa bus. This is a preparation for adding sysfs interface to support specifying bounce buffer size for the iova domain. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-11-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Signal vq trigger eventfd directly if possibleXie Yongji
Now the vdpa callback will associate an trigger eventfd in some cases. For performance reasons, VDUSE can signal it directly during irq injection. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-10-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa: Add eventfd for the vdpa callbackXie Yongji
Add eventfd for the vdpa callback so that user can signal it directly instead of triggering the callback. It will be used for vhost-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-9-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Add sysfs interface for irq callback affinityXie Yongji
Add sysfs interface for each vduse virtqueue to get/set the affinity for irq callback. This might be useful for performance tuning when the irq callback affinity mask contains more than one CPU. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-8-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support get_vq_affinity callbackXie Yongji
This implements get_vq_affinity callback so that the virtio-blk driver can build the blk-mq queues based on the irq callback affinity. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-7-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support set_vq_affinity callbackXie Yongji
Since virtio-vdpa bus driver already support interrupt affinity spreading mechanism, let's implement the set_vq_affinity callback to bring it to vduse device. After we get the virtqueue's affinity, we can spread IRQs between CPUs in the affinity mask, in a round-robin manner, to run the irq callback. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Refactor allocation for vduse virtqueuesXie Yongji
Allocate memory for vduse virtqueues one by one instead of doing one allocation for all of them. This is a preparation for adding sysfs interface for virtqueues. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio-vdpa: Support interrupt affinity spreading mechanismXie Yongji
To support interrupt affinity spreading mechanism, this makes use of group_cpus_evenly() to create an irq callback affinity mask for each virtqueue of vdpa device. Then we will unify set_vq_affinity callback to pass the affinity to the vdpa device driver. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-4-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vdpa: Add set/get_vq_affinity callbacks in vdpa_config_opsXie Yongji
This introduces set/get_vq_affinity callbacks in vdpa_config_ops to support virtqueue affinity management for vdpa device drivers. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-3-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21lib/group_cpus: Export group_cpus_evenly()Xie Yongji
Export group_cpus_evenly() so that some modules can make use of it to group CPUs evenly according to NUMA and CPU locality. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-2-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/mlx5: Extend driver support for new featuresEli Cohen
Extend the possible list for features that can be supported by firmware. Note that different versions of firmware may or may not support these features. The driver is made aware of them by querying the firmware. While doing this, improve the code so we use enum names instead of hard coded numerical values. The new features supported by the driver are the following: VIRTIO_NET_F_MRG_RXBUF VIRTIO_NET_F_HOST_ECN VIRTIO_NET_F_GUEST_ECN VIRTIO_NET_F_GUEST_TSO6 VIRTIO_NET_F_GUEST_TSO4 Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
2023-04-21vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by defaultEli Cohen
Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF. Current firmware versions show degradation in packet rate when using MRG_RXBUF. Users who favor memory saving over packet rate could enable this feature but we want to keep it off by default. One can still enable it when creating the vdpa device using vdpa tool by providing features that include it. For example: $ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230321112809.221432-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2023-04-21virtio_ring: Allow non power of 2 sizes for packed virtqueueFeng Liu
According to the Virtio Specification, the Queue Size parameter of a virtqueue corresponds to the maximum number of descriptors in that queue, and it does not have to be a power of 2 for packed virtqueues. However, the virtio_pci_modern driver enforced a power of 2 check for virtqueue sizes, which is unnecessary and restrictive for packed virtuqueue. Split virtqueue still needs to check the virtqueue size is power_of_2 which has been done in vring_alloc_queue_split of the virtio_ring layer. To validate this change, we tested various virtqueue sizes for packed rings, including 128, 256, 512, 100, 200, 500, and 1000, with CONFIG_PAGE_POISONING enabled, and all tests passed successfully. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230315185458.11638-2-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Reduce vhost_scsi_mutex useMike Christie
We on longer need to hold the vhost_scsi_mutex the entire time we set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related to the tpg list, the port link/unlink functions use the tv_tpg_mutex while accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer queue events after the virtqueue's backend has been cleared and flushed, and we don't drop our refcount to the tpg until after we have stopped cmds and wait for outstanding cmds to complete. This moves the vhost_scsi_mutex use to it's documented use of being used to access the tpg list. We then don't need to hold it while a flush is being performed causing other device's vhost_scsi_set_endpoint and vhost_scsi_make_tpg/vhost_scsi_drop_tpg calls to have to wait on a flakey device. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Drop vhost_scsi_mutex use in port calloutsMike Christie
We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and vhost_scsi_port_unlink see if vhost_scsi_clear_endpoint has cleared tpg->vhost_scsi and it can't be freed while they are using. However, we currently set the tpg->vhost_scsi pointer while holding tv_tpg_mutex. So, we can just hold that while calling vhost_scsi_hotplug/hotunplug. We then don't need to hold the vhost_scsi_mutex while vhost_scsi_clear_endpoint is holding it and doing a flush which could cause the LUN map/unmap to have to wait on another device's flush. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Check for a cleared backend before queueing an eventMike Christie
We currenly hold the vhost_scsi_mutex while clearing the endpoint and while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed from uder us, and to make sure anything queued is handled by the full call in vhost_scsi_clear_endpoint. This patch removes the need for the vhost_scsi_mutex for the latter case. In the next patches, we won't hold the vhost_scsi_mutex while flushing so this patch adds a check for the clearing of the virtqueue from vhost_scsi_clear_endpoint. We then know that once vhost_scsi_clear_endpoint has cleared the backend that no new events will be queued, and the flush after the vhost_vq_set_backend(vq, NULL) call will see everything that's been queued to that point. So the flush will then handle all events without the need for the vhost_scsi_mutex. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Drop device mutex use in vhost_scsi_do_plugMike Christie
We don't need the device mutex in vhost_scsi_do_plug because: 1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not change on us and the vhost_scsi can't be freed from under us if it was set. 2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint has completed that vhost_scsi_do_plug can't send new events and any queued ones have completed. So this patch drops the device mutex use in vhost_scsi_do_plug. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-scsi: Delay releasing our refcount on the tpgMike Christie
We currently hold the vhost_scsi_mutex the entire time we are running vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents userspace from being able to free the se_tpg from under us after we have called target_undepend_item. However, it forces management operations for for other devices to have to wait on a flakey device's: vhost_scsi_clear_endpoint -> vhost_scsi_flush() call which can which can take a long time. This moves the target_undepend_item call and the tpg unsetup code to after we have stopped new IO from starting up and after we have waited on running IO. We can then release our refcount on the tpg and session knowing our device is no longer accessing them. We can then drop the vhost_scsi_mutex use during thee flush call in later patches in this set, when we have removed other reasons for holding it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230321020624.13323-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio_ring: Use const to annotate read-only pointer paramsFeng Liu
Add const to make the read-only pointer parameters clear, similar to many existing functions. To implement this change, the commit also introduces the use of `container_of_const` to implement `to_vvq`, which ensures the const-ness of read-only parameters and avoids accidental modification of their members. Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Message-Id: <20230310053428.3376-4-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio_ring: Avoid using inline for small functionsFeng Liu
According to kernel coding style [1], defining inline functions is not necessary and beneficial for simple functions. Hence clean up the code by removing the inline keyword. It is verified with GCC 12.2.0, the generated code with/without inline is same. Additionally tested with pktgen and iperf, and verified the result, the pps test results are the same in the cases of with/without inline. Iperf and pps of pktgen for virtio-net didn't change before and after the change. [1] https://www.kernel.org/doc/html/v6.2-rc3/process/coding-style.html#the-inline-disease Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20230310053428.3376-3-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21tools/virtio: virtio_test -h,--help should return directlyRong Tao
When we get help information, we should return directly, and we should not execute test cases. Move the exit() directly into the help() function and remove it from case '?'. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_822CEBEB925205EA1573541CD1C2604F4805@qq.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21tools/virtio: virtio_test: Fix indentationRong Tao
Replace eight spaces with Tab. Signed-off-by: Rong Tao <rtoax@foxmail.com> Message-Id: <tencent_89579C514BC4020324A1A4ACA44B5B95BB07@qq.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21virtio: Reorder fields in 'struct virtqueue'Christophe JAILLET
Group some variables based on their sizes to reduce hole and avoid padding. On x86_64, this shrinks the size of 'struct virtqueue' from 72 to 68 bytes. It saves a few bytes of memory. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <8f3d2e49270a2158717e15008e7ed7228196ba02.1676707807.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Lafreniere <peter@n8pjl.ca> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-04-21vhost: use struct_size and size_add to compute flex array sizesJacob Keller
The vhost_get_avail_size and vhost_get_used_size functions compute the size of structures with flexible array members with an additional 2 bytes if the VIRTIO_RING_F_EVENT_IDX feature flag is set. Convert these functions to use struct_size() and size_add() instead of coding the calculation by hand. This ensures that the calculations will saturate at SIZE_MAX rather than overflowing. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Cc: kvm@vger.kernel.org Message-Id: <20230227214127.3678392-1-jacob.e.keller@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa/mlx5: Avoid losing link state updatesEli Cohen
Current code ignores link state updates if VIRTIO_NET_F_STATUS was not negotiated. However, link state updates could be received before feature negotiation was completed , therefore causing link state events to be lost, possibly leaving the link state down. Modify the code so link state notifier is registered after DRIVER_OK was negotiated and carry the registration only if VIRTIO_NET_F_STATUS was negotiated. Unregister the notifier when the device is reset. Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230417110343.138319-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21usb: dwc3: gadget: Refactor EP0 forced stall/restart into a separate APIWesley Cheng
Several sequences utilize the same routine for forcing the control endpoint back into the SETUP phase. This is required, because those operations need to ensure that EP0 is back in the default state. Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com> Link: https://lore.kernel.org/r/20230420212759.29429-3-quic_wcheng@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-21usb: dwc3: gadget: Execute gadget stop after halting the controllerWesley Cheng
Do not call gadget stop until the poll for controller halt is completed. DEVTEN is cleared as part of gadget stop, so the intention to allow ep0 events to continue while waiting for controller halt is not happening. Fixes: c96683798e27 ("usb: dwc3: ep0: Don't prepare beyond Setup stage") Cc: stable@vger.kernel.org Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com> Link: https://lore.kernel.org/r/20230420212759.29429-2-quic_wcheng@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20Merge branch 'net-extend-drop-reasons'Jakub Kicinski
Johannes Berg says: ==================== net: extend drop reasons Here's v4 of the extended drop reasons, with fixes to kernel-doc and checkpatch. ==================== Link: https://lore.kernel.org/r/20230419125254.20789-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20mac80211: use the new drop reasons infrastructureJohannes Berg
It can be really hard to analyse or debug why packets are going missing in mac80211, so add the needed infrastructure to use use the new per-subsystem drop reasons. We actually use two drop reason subsystems here because of the different handling of frames that are dropped but still go to monitor for old versions of hostapd, and those that are just completely unusable (e.g. crypto failed.) Annotate a few reasons here just to illustrate this, we'll need to go through and annotate more of them later. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: extend drop reasons for multiple subsystemsJohannes Berg
Extend drop reasons to make them usable by subsystems other than core by reserving the high 16 bits for a new subsystem ID, of which 0 of course is used for the existing reasons immediately. To still be able to have string reasons, restructure that code a bit to make the loopup under RCU, the only user of this (right now) is drop_monitor. Link: https://lore.kernel.org/netdev/00659771ed54353f92027702c5bbb84702da62ce.camel@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: move dropreason.h to dropreason-core.hJohannes Berg
This will, after the next patch, hold only the core drop reasons and minimal infrastructure. Fix a small kernel-doc issue while at it, to avoid the move triggering a checker. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20ipv6: add icmpv6_error_anycast_as_unicast for ICMPv6Mahesh Bandewar
ICMPv6 error packets are not sent to the anycast destinations and this prevents things like traceroute from working. So create a setting similar to ECHO when dealing with Anycast sources (icmpv6_echo_ignore_anycast). Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20230419013238.2691167-1-maheshb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20Merge branch 'ethtool-mm-api-consolidation'Jakub Kicinski
Vladimir Oltean says: ==================== ethtool mm API consolidation This series consolidates the behavior of the 2 drivers that implement the ethtool MAC Merge layer by making NXP ENETC commit its preemptible traffic classes to hardware only when MM TX is active (same as Ocelot). Then, after resolving an issue with the ENETC driver, it restricts user space from entering 2 states which don't make sense: - pmac-enabled off tx-enabled on verify-enabled * - pmac-enabled * tx-enabled off verify-enabled on Then, it introduces a selftest (ethtool_mm.sh) which puts everything together and tests all valid configurations known to me. This is simultaneously the v2 of "[PATCH net-next 0/2] ethtool mm API improvements": https://lore.kernel.org/netdev/20230415173454.3970647-1-vladimir.oltean@nxp.com/ which had caused some problems to openlldp. Those were solved in the meantime, see: https://github.com/intel/openlldp/commit/11171b474f6f3cbccac5d608b7f26b32ff72c651 and of "[RFC PATCH net-next] selftests: forwarding: add a test for MAC Merge layer": https://lore.kernel.org/netdev/20230210221243.228932-1-vladimir.oltean@nxp.com/ ==================== Link: https://lore.kernel.org/r/20230418111459.811553-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20selftests: forwarding: add a test for MAC Merge layerVladimir Oltean
The MAC Merge layer (IEEE 802.3-2018 clause 99) does all the heavy lifting for Frame Preemption (IEEE 802.1Q-2018 clause 6.7.2), a TSN feature for minimizing latency. Preemptible traffic is different on the wire from normal traffic in incompatible ways. If we send a preemptible packet and the link partner doesn't support preemption, it will drop it as an error frame and we will never know. The MAC Merge layer has a control plane of its own, which can be manipulated (using ethtool) in order to negotiate this capability with the link partner (through LLDP). Actually the TLV format for LLDP solves this problem only partly, because both partners only advertise: - if they support preemption (RX and TX) - if they have enabled preemption (TX) so we cannot tell the link partner what to do - we cannot force it to enable reception of our preemptible packets. That is fully solved by the verification feature, where the local device generates some small probe frames which look like preemptible frames with no useful content, and the link partner is obliged to respond to them if it supports the standard. If the verification times out, we know that preemption isn't active in our TX direction on the link. Having clarified the definition, this selftest exercises the manual (ethtool) configuration path of 2 link partners (with and without verification), and the LLDP code path, using the openlldp project. The test also verifies the TX activity of the MAC Merge layer by sending traffic through a traffic class configured as preemptible (using mqprio). There isn't a good way to make this really portable (user space cannot find out how many traffic classes there are for a device), but I chose num_tc 4 here, that should work reasonably well. I also know that some devices (stmmac) only permit TXQ0 to be preemptible, so this is why PREEMPTIBLE_PRIO was strategically chosen as 0. Even if other hardware is more configurable, this test should cover the baseline. This is not really a "forwarding" selftest, but I put it near the other "ethtool" selftests. $ ./ethtool_mm.sh eno0 swp0 TEST: Manual configuration with verification: eno0 to swp0 [ OK ] TEST: Manual configuration with verification: swp0 to eno0 [ OK ] TEST: Manual configuration without verification: eno0 to swp0 [ OK ] TEST: Manual configuration without verification: swp0 to eno0 [ OK ] TEST: Manual configuration with failed verification: eno0 to swp0 [ OK ] TEST: Manual configuration with failed verification: swp0 to eno0 [ OK ] TEST: LLDP [ OK ] Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20selftests: forwarding: introduce helper for standard ethtool countersVladimir Oltean
Counters for the MAC Merge layer and preemptible MAC have standardized so far on using structured ethtool stats as opposed to the driver specific names and meanings. Benefit from that rare opportunity and introduce a helper to lib.sh for querying standardized counters, in the hope that these will take off for other uses as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20selftests: forwarding: generalize bail_on_lldpad from mlxswPetr Machata
mlxsw selftests often invoke a bail_on_lldpad() helper to make sure LLDPAD is not running, to prevent conflicts between the QoS configuration applied through TC or DCB command line tool, and the DCB configuration that LLDPAD might apply. This helper might be useful to others. Move the function to lib.sh, and parameterize to make reusable in other contexts. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20selftests: forwarding: sch_tbf_*: Add a pre-run hookPetr Machata
The driver-specific wrappers of these selftests invoke bail_on_lldpad to make sure that LLDPAD doesn't trample the configuration. The function bail_on_lldpad is going to move to lib.sh in the next patch. With that, it won't be visible for the wrappers before sourcing the framework script. And after sourcing it, it is too late: the selftest will have run by then. One option might be to source NUM_NETIFS=0 lib.sh from the wrapper, but even if that worked (it might, it might not), that seems cumbersome. lib.sh is doing fair amount of stuff, and even if it works today, it does not look particularly solid as a solution. Instead, introduce a hook, sch_tbf_pre_hook(), that when available, gets invoked. Move the bail to the hook. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: ethtool: mm: sanitize some UAPI configurationsVladimir Oltean
The verify-enabled boolean (ETHTOOL_A_MM_VERIFY_ENABLED) was intended to be a sub-setting of tx-enabled (ETHTOOL_A_MM_TX_ENABLED). IOW, MAC Merge TX can be enabled with or without verification, but verification with TX disabled makes no sense. The pmac-enabled boolean (ETHTOOL_A_MM_PMAC_ENABLED) was intended to be a global toggle from an API perspective, whereas tx-enabled just handles the TX direction. IOW, the pMAC can be enabled with or without TX, but it doesn't make sense to enable TX if the pMAC is not enabled. Add two checks which sanitize and reject these invalid cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: enetc: include MAC Merge / FP registers in register dumpVladimir Oltean
These have been useful in debugging various problems related to frame preemption, so make them available through ethtool --register-dump for later too. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: enetc: only commit preemptible TCs to hardware when MM TX is activeVladimir Oltean
This was left as TODO in commit 01e23b2b3bad ("net: enetc: add support for preemptible traffic classes") since it's relatively complicated. Where this makes a difference is with a configuration as follows: ethtool --set-mm eno0 pmac-enabled on tx-enabled on verify-enabled on Preemptible packets should only be sent when the MAC Merge TX direction becomes active (i.o.w. when the verification process succeeds, aka when the link partner confirms it can process preemptible traffic). But the tc qdisc with the preemptible traffic classes is offloaded completely asynchronously w.r.t. the MM becoming active. The ENETC manual does suggest that this should be handled in the driver: "On startup, software should wait for the verification process to complete (MMCSR[VSTS]=011) before initiating traffic". Adding the necessary logic allows future selftests to uphold the claim that an inactive or disabled MAC Merge layer should never send data packets through the pMAC. This change moves enetc_set_ptcfpr() from enetc.c to enetc_ethtool.c, where its only caller is now - enetc_mm_commit_preemptible_tcs(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: enetc: report mm tx-active based on tx-enabled and verify-statusVladimir Oltean
The MMCSR register contains 2 fields with overlapping meaning: - LPA (Local preemption active): This read-only status bit indicates whether preemption is active for this port. This bit will be set if preemption is both enabled and has completed the verification process. - TXSTS (Merge status): This read-only status field provides the state of the MAC Merge sublayer transmit status as defined in IEEE Std 802.3-2018 Clause 99. 00 Transmit preemption is inactive 01 Transmit preemption is active 10 Reserved 11 Reserved However none of these 2 fields offer reliable reporting to software. When connecting ENETC to a link partner which is not capable of Frame Preemption, the expectation is that ENETC's verification should fail (VSTS=4) and its MM TX direction should be inactive (LPA=0, TXSTS=00) even though the MM TX is enabled (ME=1). But surprise, the LPA bit of MMCSR stays set even if VSTS=4 and ME=1. OTOH, the TXSTS field has the opposite problem. I cannot get its value to change from 0, even when connecting to a link partner capable of frame preemption, which does respond to its verification frames (ME=1 and VSTS=3, "SUCCEEDED"). The only option with such buggy hardware seems to be to reimplement the formula for calculating tx-active in software, which is for tx-enabled to be true, and for the verify-status to be either SUCCEEDED, or DISABLED. Without reliable tx-active reporting, we have no good indication when to commit the preemptible traffic classes to hardware, which makes it possible (but not desirable) to send preemptible traffic to a link partner incapable of receiving it. However, currently we do not have the logic to wait for TX to be active yet, so the impact is limited. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: enetc: fix MAC Merge layer remaining enabled until a link down eventVladimir Oltean
Current enetc_set_mm() is designed to set the priv->active_offloads bit ENETC_F_QBU for enetc_mm_link_state_update() to act on, but if the link is already up, it modifies the ENETC_MMCSR_ME ("Merge Enable") bit directly. The problem is that it only *sets* ENETC_MMCSR_ME if the link is up, it doesn't *clear* it if needed. So subsequent enetc_get_mm() calls still see tx-enabled as true, up until a link down event, which is when enetc_mm_link_state_update() will get called. This is not a functional issue as far as I can assess. It has only come up because I'd like to uphold a simple API rule in core ethtool code: the pMAC cannot be disabled if TX is going to be enabled. Currently, the fact that TX remains enabled for longer than expected (after the enetc_set_mm() call that disables it) is going to violate that rule, which is how it was caught. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20wwan: core: add print for wwan port attach/disconnectSlark Xiao
Refer to USB serial device or net device, there is a notice to let end user know the status of device, like attached or disconnected. Add attach/disconnect print for wwan device as well. Signed-off-by: Slark Xiao <slark_xiao@163.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Link: https://lore.kernel.org/r/20230420023617.3919569-1-slark_xiao@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20fuse_dev_ioctl(): switch to fdget()Al Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-04-20cgroup_get_from_fd(): switch to fdget_raw()Al Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>