summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-01-25Bluetooth: L2CAP: Try harder to accept device not knowing optionsBastien Nocera
The current implementation of L2CAP options negotiation will continue the negotiation when a device responds with L2CAP_CONF_UNACCEPT ("unaccepted options"), but not when the device replies with L2CAP_CONF_UNKNOWN ("unknown options"). Trying to continue the negotiation without ERTM support will allow Bluetooth-capable XBox One controllers (notably models 1708 and 1797) to connect. btmon before patch: > ACL Data RX: Handle 256 flags 0x02 dlen 16 #64 [hci0] 59.182702 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 64 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 23 #65 [hci0] 59.182744 L2CAP: Configure Request (0x04) ident 3 len 15 Destination CID: 64 Flags: 0x0000 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Basic (0x00) TX window size: 0 Max transmit: 0 Retransmission timeout: 0 Monitor timeout: 0 Maximum PDU size: 0 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #66 [hci0] 59.183948 L2CAP: Configure Request (0x04) ident 1 len 8 Destination CID: 64 Flags: 0x0000 Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 1480 < ACL Data TX: Handle 256 flags 0x00 dlen 18 #67 [hci0] 59.183994 L2CAP: Configure Response (0x05) ident 1 len 10 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 1480 > ACL Data RX: Handle 256 flags 0x02 dlen 15 #69 [hci0] 59.187676 L2CAP: Configure Response (0x05) ident 3 len 7 Source CID: 64 Flags: 0x0000 Result: Failure - unknown options (0x0003) 04 . < ACL Data TX: Handle 256 flags 0x00 dlen 12 #70 [hci0] 59.187722 L2CAP: Disconnection Request (0x06) ident 4 len 4 Destination CID: 64 Source CID: 64 > ACL Data RX: Handle 256 flags 0x02 dlen 12 #73 [hci0] 59.192714 L2CAP: Disconnection Response (0x07) ident 4 len 4 Destination CID: 64 Source CID: 64 btmon after patch: > ACL Data RX: Handle 256 flags 0x02 dlen 16 #248 [hci0] 103.502970 L2CAP: Connection Response (0x03) ident 5 len 8 Destination CID: 65 Source CID: 65 Result: Connection pending (0x0001) Status: No further information available (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 16 #249 [hci0] 103.504184 L2CAP: Connection Response (0x03) ident 5 len 8 Destination CID: 65 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 23 #250 [hci0] 103.504398 L2CAP: Configure Request (0x04) ident 6 len 15 Destination CID: 65 Flags: 0x0000 Option: Retransmission and Flow Control (0x04) [mandatory] Mode: Basic (0x00) TX window size: 0 Max transmit: 0 Retransmission timeout: 0 Monitor timeout: 0 Maximum PDU size: 0 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #251 [hci0] 103.505472 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 65 Flags: 0x0000 Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 1480 < ACL Data TX: Handle 256 flags 0x00 dlen 18 #252 [hci0] 103.505689 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 1480 > ACL Data RX: Handle 256 flags 0x02 dlen 15 #254 [hci0] 103.509165 L2CAP: Configure Response (0x05) ident 6 len 7 Source CID: 65 Flags: 0x0000 Result: Failure - unknown options (0x0003) 04 . < ACL Data TX: Handle 256 flags 0x00 dlen 12 #255 [hci0] 103.509426 L2CAP: Configure Request (0x04) ident 7 len 4 Destination CID: 65 Flags: 0x0000 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #257 [hci0] 103.511870 L2CAP: Connection Request (0x02) ident 8 len 4 PSM: 1 (0x0001) Source CID: 66 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #259 [hci0] 103.514121 L2CAP: Configure Response (0x05) ident 7 len 6 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Signed-off-by: Florian Dollinger <dollinger.florian@gmx.de> Co-developed-by: Florian Dollinger <dollinger.florian@gmx.de> Reviewed-by: Luiz Augusto Von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: L2CAP: Fix handling fragmented lengthLuiz Augusto von Dentz
Bluetooth Core Specification v5.2, Vol. 3, Part A, section 1.4, table 1.1: 'Start Fragments always either begin with the first octet of the Basic L2CAP header of a PDU or they have a length of zero (see [Vol 2] Part B, Section 6.6.2).' Apparently this was changed by the following errata: https://www.bluetooth.org/tse/errata_view.cfm?errata_id=10216 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: btusb: fix memory leak on suspend and resumeVamshi K Sthambamkadi
kmemleak report: unreferenced object 0xffff9b1127f00500 (size 208): comm "kworker/u17:2", pid 500, jiffies 4294937470 (age 580.136s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 60 ed 05 11 9b ff ff 00 00 00 00 00 00 00 00 .`.............. backtrace: [<000000006ab3fd59>] kmem_cache_alloc_node+0x17a/0x480 [<0000000051a5f6f9>] __alloc_skb+0x5b/0x1d0 [<0000000037e2d252>] hci_prepare_cmd+0x32/0xc0 [bluetooth] [<0000000010b586d5>] hci_req_add_ev+0x84/0xe0 [bluetooth] [<00000000d2deb520>] hci_req_clear_event_filter+0x42/0x70 [bluetooth] [<00000000f864bd8c>] hci_req_prepare_suspend+0x84/0x470 [bluetooth] [<000000001deb2cc4>] hci_prepare_suspend+0x31/0x40 [bluetooth] [<000000002677dd79>] process_one_work+0x209/0x3b0 [<00000000aaa62b07>] worker_thread+0x34/0x400 [<00000000826d176c>] kthread+0x126/0x140 [<000000002305e558>] ret_from_fork+0x22/0x30 unreferenced object 0xffff9b1125c6ee00 (size 512): comm "kworker/u17:2", pid 500, jiffies 4294937470 (age 580.136s) hex dump (first 32 bytes): 04 00 00 00 0d 00 00 00 05 0c 01 00 11 9b ff ff ................ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace: [<000000009f07c0cc>] slab_post_alloc_hook+0x59/0x270 [<0000000049431dc2>] __kmalloc_node_track_caller+0x15f/0x330 [<00000000027a42f6>] __kmalloc_reserve.isra.70+0x31/0x90 [<00000000e8e3e76a>] __alloc_skb+0x87/0x1d0 [<0000000037e2d252>] hci_prepare_cmd+0x32/0xc0 [bluetooth] [<0000000010b586d5>] hci_req_add_ev+0x84/0xe0 [bluetooth] [<00000000d2deb520>] hci_req_clear_event_filter+0x42/0x70 [bluetooth] [<00000000f864bd8c>] hci_req_prepare_suspend+0x84/0x470 [bluetooth] [<000000001deb2cc4>] hci_prepare_suspend+0x31/0x40 [bluetooth] [<000000002677dd79>] process_one_work+0x209/0x3b0 [<00000000aaa62b07>] worker_thread+0x34/0x400 [<00000000826d176c>] kthread+0x126/0x140 [<000000002305e558>] ret_from_fork+0x22/0x30 unreferenced object 0xffff9b112b395788 (size 8): comm "kworker/u17:2", pid 500, jiffies 4294937470 (age 580.136s) hex dump (first 8 bytes): 20 00 00 00 00 00 04 00 ....... backtrace: [<0000000052dc28d2>] kmem_cache_alloc_trace+0x15e/0x460 [<0000000046147591>] alloc_ctrl_urb+0x52/0xe0 [btusb] [<00000000a2ed3e9e>] btusb_send_frame+0x91/0x100 [btusb] [<000000001e66030e>] hci_send_frame+0x7e/0xf0 [bluetooth] [<00000000bf6b7269>] hci_cmd_work+0xc5/0x130 [bluetooth] [<000000002677dd79>] process_one_work+0x209/0x3b0 [<00000000aaa62b07>] worker_thread+0x34/0x400 [<00000000826d176c>] kthread+0x126/0x140 [<000000002305e558>] ret_from_fork+0x22/0x30 In pm sleep-resume context, while the btusb device rebinds, it enters hci_unregister_dev(), whilst there is a possibility of hdev receiving PM_POST_SUSPEND suspend_notifier event, leading to generation of msg frames. When hci_unregister_dev() completes, i.e. hdev context is destroyed/freed, those intermittently sent msg frames cause memory leak. BUG details: Below is stack trace of thread that enters hci_unregister_dev(), marks the hdev flag HCI_UNREGISTER to 1, and then goes onto to wait on notifier lock - refer unregister_pm_notifier(). hci_unregister_dev+0xa5/0x320 [bluetoot] btusb_disconnect+0x68/0x150 [btusb] usb_unbind_interface+0x77/0x250 ? kernfs_remove_by_name_ns+0x75/0xa0 device_release_driver_internal+0xfe/0x1 device_release_driver+0x12/0x20 bus_remove_device+0xe1/0x150 device_del+0x192/0x3e0 ? usb_remove_ep_devs+0x1f/0x30 usb_disable_device+0x92/0x1b0 usb_disconnect+0xc2/0x270 hub_event+0x9f6/0x15d0 ? rpm_idle+0x23/0x360 ? rpm_idle+0x26b/0x360 process_one_work+0x209/0x3b0 worker_thread+0x34/0x400 ? process_one_work+0x3b0/0x3b0 kthread+0x126/0x140 ? kthread_park+0x90/0x90 ret_from_fork+0x22/0x30 Below is stack trace of thread executing hci_suspend_notifier() which processes the PM_POST_SUSPEND event, while the unbinding thread is waiting on lock. hci_suspend_notifier.cold.39+0x5/0x2b [bluetooth] blocking_notifier_call_chain+0x69/0x90 pm_notifier_call_chain+0x1a/0x20 pm_suspend.cold.9+0x334/0x352 state_store+0x84/0xf0 kobj_attr_store+0x12/0x20 sysfs_kf_write+0x3b/0x40 kernfs_fop_write+0xda/0x1c0 vfs_write+0xbb/0x250 ksys_write+0x61/0xe0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix hci_suspend_notifer(), not to act on events when flag HCI_UNREGISTER is set. Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: Put HCI device if inquiry procedure interruptsPan Bian
Jump to the label done to decrement the reference count of HCI device hdev on path that the Inquiry procedure is interrupted. Fixes: 3e13fa1e1fab ("Bluetooth: Fix hci_inquiry ioctl usage") Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: drop HCI device reference before returnPan Bian
Call hci_dev_put() to decrement reference count of HCI device hdev if fails to duplicate memory. Fixes: 0b26ab9dce74 ("Bluetooth: AMP: Handle Accept phylink command status evt") Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: disable advertisement filters during suspendHoward Chung
This adds logic to disable and reenable advertisement filters during suspend and resume. After this patch, we would only receive packets from devices in allow list during suspend. Signed-off-by: Howard Chung <howardchung@google.com> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: advmon offload MSFT interleave scanning integrationArchie Pusaka
When MSFT extension is supported, we don't have to interleave the scan as we could just do allowlist scan. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: advmon offload MSFT handle filter enablementArchie Pusaka
Implements the feature to disable/enable the filter used for advertising monitor on MSFT controller, effectively have the same effect as "remove all monitors" and "add all previously removed monitors". This feature would be needed when suspending, where we would not want to get packets from anything outside the allowlist. Note that the integration with the suspending part is not included in this patch. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Yun-Hao Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: advmon offload MSFT handle controller resetArchie Pusaka
When the controller is powered off, the registered advertising monitor is removed from the controller. This patch handles the re-registration of those monitors when the power is on. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Yun-Hao Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: advmon offload MSFT remove monitorArchie Pusaka
Implements the monitor removal functionality for advertising monitor offloading to MSFT controllers. Supply handle = 0 to remove all monitors. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Yun-Hao Chung <howardchung@google.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: advmon offload MSFT add monitorArchie Pusaka
Enables advertising monitor offloading to the controller, if MSFT extension is supported. The kernel won't adjust the monitor parameters to match what the controller supports - that is the user space's responsibility. This patch only manages the addition of monitors. Monitor removal is going to be handled by another patch. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Yun-Hao Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25Bluetooth: advmon offload MSFT add rssi supportArchie Pusaka
MSFT needs rssi parameter for monitoring advertisement packet, therefore we should supply them from mgmt. This adds a new opcode to add advertisement monitor with rssi parameters. Signed-off-by: Archie Pusaka <apusaka@chromium.org> Reviewed-by: Manish Mandlik <mmandlik@chromium.org> Reviewed-by: Miao-chen Chou <mcchou@chromium.org> Reviewed-by: Yun-Hao Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-01-25SUNRPC: Correct a commentChuck Lever
Clean up: The rq_argpages field was removed from struct svc_rqst in the pre-git era. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom()Chuck Lever
The Receive completion handler doesn't look at the contents of the Receive buffer. The DMA sync isn't terribly expensive but it's one less thing that needs to be done by the Receive completion handler, which is single-threaded (per svc_xprt). This helps scalability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-01-25svcrdma: Reduce Receive doorbell rateChuck Lever
This is similar to commit e340c2d6ef2a ("xprtrdma: Reduce the doorbell rate (Receive)") which added Receive batching to the client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25svcrdma: Deprecate stat variables that are no longer usedChuck Lever
Clean up. We are not permitted to remove old proc files. Instead, convert these variables to stubs that are only ever allowed to display a value of zero. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25svcrdma: Restore read and write statsChuck Lever
Now that we have an efficient mechanism to update these two stats, let's start maintaining them again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25svcrdma: Convert rdma_stat_sq_starve to a per-CPU counterChuck Lever
Avoid the overhead of a memory bus lock cycle for counting a value that is hardly every used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25svcrdma: Convert rdma_stat_recv to a per-CPU counterChuck Lever
Receives are frequent events. Avoid the overhead of a memory bus lock cycle for counting a value that is hardly every used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25svcrdma: Refactor svc_rdma_init() and svc_rdma_clean_up()Chuck Lever
Setting up the proc variables is about to get more complicated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25Merge 5.11-rc5 into tty-nextGreg Kroah-Hartman
We need the fixes in here and this resolves a merge issue in drivers/tty/tty_io.c Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPENPengcheng Yang
Upon receiving a cumulative ACK that changes the congestion state from Disorder to Open, the TLP timer is not set. If the sender is app-limited, it can only wait for the RTO timer to expire and retransmit. The reason for this is that the TLP timer is set before the congestion state changes in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() until after tcp_fastretrans_alert() returns and remove the FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. This commit has two additional benefits: 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to avoid spurious RTO caused by RTO timer early expires. 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder timer is set. Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Link: https://lore.kernel.org/netdev/1611311242-6675-1-git-send-email-yangpc@wangsu.com Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1611464834-23030-1-git-send-email-yangpc@wangsu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23udp: allow forwarding of plain (non-fraglisted) UDP GRO packetsAlexander Lobakin
Commit 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") actually not only added a support for fraglisted UDP GRO, but also tweaked some logics the way that non-fraglisted UDP GRO started to work for forwarding too. Commit 2e4ef10f5850 ("net: add GSO UDP L4 and GSO fraglists to the list of software-backed types") added GSO UDP L4 to the list of software GSO to allow virtual netdevs to forward them as is up to the real drivers. Tests showed that currently forwarding and NATing of plain UDP GRO packets are performed fully correctly, regardless if the target netdevice has a support for hardware/driver GSO UDP L4 or not. Add the last element and allow to form plain UDP GRO packets if we are on forwarding path, and the new NETIF_F_GRO_UDP_FWD is enabled on a receiving netdevice. If both NETIF_F_GRO_FRAGLIST and NETIF_F_GRO_UDP_FWD are set, fraglisted GRO takes precedence. This keeps the current behaviour and is generally more optimal for now, as the number of NICs with hardware USO offload is relatively small. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23net: introduce a netdev feature for UDP GRO forwardingAlexander Lobakin
Introduce a new netdev feature, NETIF_F_GRO_UDP_FWD, to allow user to turn UDP GRO on and off for forwarding. Defaults to off to not change current datapath. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23tcp: make TCP_USER_TIMEOUT accurate for zero window probesEnke Chen
The TCP_USER_TIMEOUT is checked by the 0-window probe timer. As the timer has backoff with a max interval of about two minutes, the actual timeout for TCP_USER_TIMEOUT can be off by up to two minutes. In this patch the TCP_USER_TIMEOUT is made more accurate by taking it into account when computing the timer value for the 0-window probes. This patch is similar to and builds on top of the one that made TCP_USER_TIMEOUT accurate for RTOs in commit b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy"). Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT") Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210122191306.GA99540@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23NFC: fix resource leak when target index is invalidPan Bian
Goto to the label put_dev instead of the label error to fix potential resource leak on path that the target index is invalid. Fixes: c4fbb6515a4d ("NFC: The core part should generate the target index") Signed-off-by: Pan Bian <bianpan2016@163.com> Link: https://lore.kernel.org/r/20210121152748.98409-1-bianpan2016@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23NFC: fix possible resource leakPan Bian
Put the device to avoid resource leak on path that the polling flag is invalid. Fixes: a831b9132065 ("NFC: Do not return EBUSY when stopping a poll that's already stopped") Signed-off-by: Pan Bian <bianpan2016@163.com> Link: https://lore.kernel.org/r/20210121153745.122184-1-bianpan2016@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-23net: mrp: move struct definitions out of uapiRasmus Villemoes
None of these are actually used in the kernel/userspace interface - there's a userspace component of implementing MRP, and userspace will need to construct certain frames to put on the wire, but there's no reason the kernel should provide the relevant definitions in a UAPI header. In fact, some of those definitions were broken until previous commit, so only keep the few that are actually referenced in the kernel code, and move them to the br_private_mrp.h header. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22mlxsw: Register physical ports as a devlink resourceDanielle Ratson
The switch ASIC has a limited capacity of physical ('flavour physical' in devlink terminology) ports that it can support. While each system is brought up with a different number of ports, this number can be increased via splitting up to the ASIC's limit. Expose physical ports as a devlink resource so that user space will have visibility to the maximum number of ports that can be supported and the current occupancy. In addition, add a "Generic Resources" section in devlink-resource documentation so the different drivers will be aligned by the same resource name when exposing to user space. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22sch_htb: Stats for offloaded HTBMaxim Mikityanskiy
This commit adds support for statistics of offloaded HTB. Bytes and packets counters for leaf and inner nodes are supported, the values are taken from per-queue qdiscs, and the numbers that the user sees should have the same behavior as the software (non-offloaded) HTB. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22sch_htb: Hierarchical QoS hardware offloadMaxim Mikityanskiy
HTB doesn't scale well because of contention on a single lock, and it also consumes CPU. This patch adds support for offloading HTB to hardware that supports hierarchical rate limiting. In the offload mode, HTB passes control commands to the driver using ndo_setup_tc. The driver has to replicate the whole hierarchy of classes and their settings (rate, ceil) in the NIC. Every modification of the HTB tree caused by the admin results in ndo_setup_tc being called. After this setup, the HTB algorithm is done completely in the NIC. An SQ (send queue) is created for every leaf class and attached to the hierarchy, so that the NIC can calculate and obey aggregated rate limits, too. In the future, it can be changed, so that multiple SQs will back a single leaf class. ndo_select_queue is responsible for selecting the right queue that serves the traffic class of each packet. The data path works as follows: a packet is classified by clsact, the driver selects a hardware queue according to its class, and the packet is enqueued into this queue's qdisc. This solution addresses two main problems of scaling HTB: 1. Contention by flow classification. Currently the filters are attached to the HTB instance as follows: # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80 classid 1:10 It's possible to move classification to clsact egress hook, which is thread-safe and lock-free: # tc filter add dev eth0 egress protocol ip flower dst_port 80 action skbedit priority 1:10 This way classification still happens in software, but the lock contention is eliminated, and it happens before selecting the TX queue, allowing the driver to translate the class to the corresponding hardware queue in ndo_select_queue. Note that this is already compatible with non-offloaded HTB and doesn't require changes to the kernel nor iproute2. 2. Contention by handling packets. HTB is not multi-queue, it attaches to a whole net device, and handling of all packets takes the same lock. When HTB is offloaded, it registers itself as a multi-queue qdisc, similarly to mq: HTB is attached to the netdev, and each queue has its own qdisc. Some features of HTB may be not supported by some particular hardware, for example, the maximum number of classes may be limited, the granularity of rate and ceil parameters may be different, etc. - so, the offload is not enabled by default, a new parameter is used to enable it: # tc qdisc replace dev eth0 root handle 1: htb offload Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: sched: Add extack to Qdisc_class_ops.deleteMaxim Mikityanskiy
In a following commit, sch_htb will start using extack in the delete class operation to pass hardware errors in offload mode. This commit prepares for that by adding the extack parameter to this callback and converting usage of the existing qdiscs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22tcp: Add receive timestamp support for receive zerocopy.Arjun Roy
tcp_recvmsg() uses the CMSG mechanism to receive control information like packet receive timestamps. This patch adds CMSG fields to struct tcp_zerocopy_receive, and provides receive timestamps if available to the user. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22tcp: Remove CMSG magic numbers for tcp_recvmsg().Arjun Roy
At present, tcp_recvmsg() uses flags to track if any CMSGs are pending and what those CMSGs are. These flags are currently magic numbers, used only within tcp_recvmsg(). To prepare for receive timestamp support in tcp receive zerocopy, gently refactor these magic numbers into enums. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: mark IGMPv3/MLDv2 fast-leave deletesNikolay Aleksandrov
Mark groups which were deleted due to fast leave/EHT. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: handle block pg delete for all casesNikolay Aleksandrov
A block report can result in empty source and host sets for both include and exclude groups so if there are no hosts left we can safely remove the group. Pull the block group handling so it can cover both cases and add a check if EHT requires the delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: add EHT host filter_mode handlingNikolay Aleksandrov
We should be able to handle host filter mode changing. For exclude mode we must create a zero-src entry so the group will be kept even without any S,G entries (non-zero source sets). That entry doesn't count to the entry limit and can always be created, its timer is refreshed on new exclude reports and if we change the host filter mode to include then it gets removed and we rely only on the non-zero source sets. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: optimize TO_INCLUDE EHT timeoutsNikolay Aleksandrov
This is an optimization specifically for TO_INCLUDE which sends queries for the older entries and thus lowers the S,G timers to LMQT. If we have the following situation for a group in either include or exclude mode: - host A was interested in srcs X and Y, but is timing out - host B sends TO_INCLUDE src Z, the bridge lowers X and Y's timeouts to LMQT - host B sends BLOCK src Z after LMQT time has passed => since host B is the last host we can delete the group, but if we still have host A's EHT entries for X and Y (i.e. if they weren't lowered to LMQT previously) then we'll have to wait another LMQT time before deleting the group, with this optimization we can directly remove it regardless of the group mode as there are no more interested hosts Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: add EHT include and exclude handlingNikolay Aleksandrov
Add support for IGMPv3/MLDv2 include and exclude EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - is_include: create missing entries - to_include: flush existing entries and create a new set from the report, obviously if the src set is empty then we delete the group - group exclude - is_exclude: create missing entries - to_exclude: flush existing entries and create a new set from the report, any empty source set entries are removed If the group is in a different mode then we just flush all entries reported by the host and we create a new set with the new mode entries created from the report. If the report is include type, the source list is empty and the group has empty sources' set then we remove it. Any source set entries which are empty are removed as well. If the group is in exclude mode it can exist without any S,G entries (allowing for all traffic to pass). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: add EHT allow/block handlingNikolay Aleksandrov
Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - allow: create missing entries - block: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - group exclude - allow - host include: create missing entries - host exclude: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries - block - host include: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - host exclude: create missing entries Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: add EHT host delete functionNikolay Aleksandrov
Now that we can delete set entries, we can use that to remove EHT hosts. Since the group's host set entries exist only when there are related source set entries we just have to flush all source set entries joined by the host set entry and it will be automatically removed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: add EHT source set handling functionsNikolay Aleksandrov
Add EHT source set and set-entry create, delete and lookup functions. These allow to manipulate source sets which contain their own host sets with entries which joined that S,G. We're limiting the maximum number of tracked S,G entries per host to PG_SRC_ENT_LIMIT (currently 32) which is the current maximum of S,G entries for a group. There's a per-set timer which will be used to destroy the whole set later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: add EHT host handling functionsNikolay Aleksandrov
Add functions to create, destroy and lookup an EHT host. These are per-host entries contained in the eht_host_tree in net_bridge_port_group which are used to store a list of all sources (S,G) entries joined for that group by each host, the host's current filter mode and total number of joined entries. No functional changes yet, these would be used in later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: add EHT structures and definitionsNikolay Aleksandrov
Add EHT structures for tracking hosts and sources per group. We keep one set for each host which has all of the host's S,G entries, and one set for each multicast source which has all hosts that have joined that S,G. For each host, source entry we record the filter_mode and we keep an expiry timer. There is also one global expiry timer per source set, it is updated with each set entry update, it will be later used to lower the set's timer instead of lowering each entry's timer separately. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: calculate idx position without changing ptrNikolay Aleksandrov
We need to preserve the srcs pointer since we'll be passing it for EHT handling later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: __grp_src_block_incl can modify pgNikolay Aleksandrov
Prepare __grp_src_block_incl() for being able to cause a notification due to changes. Currently it cannot happen, but EHT would change that since we'll be deleting sources immediately. Make sure that if the pg is deleted we don't return true as that would cause the caller to access freed pg. This patch shouldn't cause any functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: pass host src address to IGMPv3/MLDv2 functionsNikolay Aleksandrov
We need to pass the host address so later it can be used for explicit host tracking. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22net: bridge: multicast: rename src_size to addr_sizeNikolay Aleksandrov
Rename src_size argument to addr_size in preparation for passing host address as an argument to IGMPv3/MLDv2 functions. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22mptcp: implement delegated actionsPaolo Abeni
On MPTCP-level ack reception, the packet scheduler may select a subflow other then the current one. Prior to this commit we rely on the workqueue to trigger action on such subflow. This changeset introduces an infrastructure that allows any MPTCP subflow to schedule actions (MPTCP xmit) on others subflows without resorting to (multiple) process reschedule. A dummy NAPI instance is used instead. When MPTCP needs to trigger action an a different subflow, it enqueues the target subflow on the NAPI backlog and schedule such instance as needed. The dummy NAPI poll method walks the sockets backlog and tries to acquire the (BH) socket lock on each of them. If the socket is owned by the user space, the action will be completed by the sock release cb, otherwise push is started. This change leverages the delegated action infrastructure to avoid invoking the MPTCP worker to spool the pending data, when the packet scheduler picks a subflow other then the one currently processing the incoming MPTCP-level ack. Additionally we further refine the subflow selection invoking the packet scheduler for each chunk of data even inside __mptcp_subflow_push_pending(). v1 -> v2: - fix possible UaF at shutdown time, resetting sock ops after removing the ulp context Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22mptcp: schedule work for better snd subflow selectionPaolo Abeni
Otherwise the packet scheduler policy will not be enforced when pushing pending data at MPTCP-level ack reception time. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>