linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-12-02	Bluetooth: L2CAP: Fix u8 overflow	Sungwoo Kim
	By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases multiple times and eventually it will wrap around the maximum number (i.e., 255). This patch prevents this by adding a boundary check with L2CAP_MAX_CONF_RSP Btmon log: Bluetooth monitor ver 5.64 = Note: Linux version 6.1.0-rc2 (x86_64) 0.264594 = Note: Bluetooth subsystem version 2.22 0.264636 @ MGMT Open: btmon (privileged) version 1.22 {0x0001} 0.272191 = New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0) [hci0] 13.877604 @ RAW Open: 9496 (privileged) version 2.22 {0x0002} 13.890741 = Open Index: 00:00:00:00:00:00 [hci0] 13.900426 (...) > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #32 [hci0] 14.273106 invalid packet size (12 != 1033) 08 00 01 00 02 01 04 00 01 10 ff ff ............ > ACL Data RX: Handle 200 flags 0x00 dlen 1547 #33 [hci0] 14.273561 invalid packet size (14 != 1547) 0a 00 01 00 04 01 06 00 40 00 00 00 00 00 ........@..... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #34 [hci0] 14.274390 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04 ........@....... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #35 [hci0] 14.274932 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00 ........@....... = bluetoothd: Bluetooth daemon 5.43 14.401828 > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #36 [hci0] 14.275753 invalid packet size (12 != 1033) 08 00 01 00 04 01 04 00 40 00 00 00 ........@... Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02	Bluetooth: silence a dmesg error message in hci_request.c	Mateusz Jończyk
	On kernel 6.1-rcX, I have been getting the following dmesg error message on every boot, resume from suspend and rfkill unblock of the Bluetooth device: Bluetooth: hci0: HCI_REQ-0xfcf0 After some investigation, it turned out to be caused by commit dd50a864ffae ("Bluetooth: Delete unreferenced hci_request code") which modified hci_req_add() in net/bluetooth/hci_request.c to always print an error message when it is executed. In my case, the function was executed by msft_set_filter_enable() in net/bluetooth/msft.c, which provides support for Microsoft vendor opcodes. As explained by Brian Gix, "the error gets logged because it is using a deprecated (but still working) mechanism to issue HCI opcodes" [1]. So this is just a debugging tool to show that a deprecated function is executed. As such, it should not be included in the mainline kernel. See for example commit 771c035372a0 ("deprecate the '__deprecated' attribute warnings entirely and for good") Additionally, this error message is cryptic and the user is not able to do anything about it. [1] Link: https://lore.kernel.org/lkml/beb8dcdc3aee4c5c833aa382f35995f17e7961a1.camel@intel.com/ Fixes: dd50a864ffae ("Bluetooth: Delete unreferenced hci_request code") Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Cc: Brian Gix <brian.gix@intel.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02	Bluetooth: hci_conn: add missing hci_dev_put() in iso_listen_bis()	Wang ShaoBo
	hci_get_route() takes reference, we should use hci_dev_put() to release it when not need anymore. Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support") Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02	Bluetooth: 6LoWPAN: add missing hci_dev_put() in get_l2cap_conn()	Wang ShaoBo
	hci_get_route() takes reference, we should use hci_dev_put() to release it when not need anymore. Fixes: 6b8d4a6a0314 ("Bluetooth: 6LoWPAN: Use connected oriented channel instead of fixed one") Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02	Bluetooth: btusb: Add debug message for CSR controllers	Ismael Ferreras Morezuelas
	The rationale of showing this is that it's potentially critical information to diagnose and find more CSR compatibility bugs in the future and it will save a lot of headaches. Given that clones come from a wide array of vendors (some are actually Barrot, some are something else) and these numbers are what let us find differences between actual and fake ones, it will be immensely helpful to scour the Internet looking for this pattern and building an actual database to find correlations and improve the checks. Cc: stable@vger.kernel.org Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02	Bluetooth: btusb: Fix CSR clones again by re-adding ERR_DATA_REPORTING quirk	Ismael Ferreras Morezuelas
	A patch series by a Qualcomm engineer essentially removed my quirk/workaround because they thought it was unnecessary. It wasn't, and it broke everything again: https://patchwork.kernel.org/project/netdevbpf/list/?series=661703&archive=both&state=* He argues that the quirk is not necessary because the code should check if the dongle says if it's supported or not. The problem is that for these Chinese CSR clones they say that it would work: = New Index: 00:00:00:00:00:00 (Primary,USB,hci0) = Open Index: 00:00:00:00:00:00 < HCI Command: Read Local Version Information (0x04\|0x0001) plen 0 > HCI Event: Command Complete (0x0e) plen 12 > [hci0] 11.276039 Read Local Version Information (0x04\|0x0001) ncmd 1 Status: Success (0x00) HCI version: Bluetooth 5.0 (0x09) - Revision 2064 (0x0810) LMP version: Bluetooth 5.0 (0x09) - Subversion 8978 (0x2312) Manufacturer: Cambridge Silicon Radio (10) ... < HCI Command: Read Local Supported Features (0x04\|0x0003) plen 0 > HCI Event: Command Complete (0x0e) plen 68 > [hci0] 11.668030 Read Local Supported Commands (0x04\|0x0002) ncmd 1 Status: Success (0x00) Commands: 163 entries ... Read Default Erroneous Data Reporting (Octet 18 - Bit 2) Write Default Erroneous Data Reporting (Octet 18 - Bit 3) ... ... < HCI Command: Read Default Erroneous Data Reporting (0x03\|0x005a) plen 0 = Close Index: 00:1A:7D:DA:71:XX So bring it back wholesale. Fixes: 63b1a7dd38bf ("Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING") Fixes: e168f6900877 ("Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR") Fixes: 766ae2422b43 ("Bluetooth: hci_sync: Check LMP feature bit instead of quirk") Cc: stable@vger.kernel.org Cc: Zijun Hu <quic_zijuhu@quicinc.com> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Tested-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> Signed-off-by: Ismael Ferreras Morezuelas <swyterzone@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-12-02	wifi: ath10k: fix QCOM_SMEM dependency	Kalle Valo
	Nathan noticed that when HWSPINLOCK is disabled there's a Kconfig warning: WARNING: unmet direct dependencies detected for QCOM_SMEM Depends on [n]: (ARCH_QCOM [=y] \|\| COMPILE_TEST [=n]) && HWSPINLOCK [=n] Selected by [m]: - ATH10K_SNOC [=m] && NETDEVICES [=y] && WLAN [=y] && WLAN_VENDOR_ATH [=y] && ATH10K [=m] && (ARCH_QCOM [=y] \|\| COMPILE_TEST [=n]) The problem here is that QCOM_SMEM depends on HWSPINLOCK so we cannot select QCOM_SMEM and instead we neeed to use 'depends on'. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/all/Y4YsyaIW+CPdHWv3@dev-arch.thelio-3990X/ Fixes: 4d79f6f34bbb ("wifi: ath10k: Store WLAN firmware version in SMEM image table") Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221202103027.25974-1-kvalo@kernel.org
2022-12-02	KVM: Document the interaction between KVM_CAP_HALT_POLL and halt_poll_ns	David Matlack
	Clarify the existing documentation about how KVM_CAP_HALT_POLL and halt_poll_ns interact to make it clear that VMs using KVM_CAP_HALT_POLL ignore halt_poll_ns. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20221201195249.3369720-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02	KVM: Move halt-polling documentation into common directory	David Matlack
	Move halt-polling.rst into the common KVM documentation directory and out of the x86-specific directory. Halt-polling is a common feature and the existing documentation is already written as such. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20221201195249.3369720-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-02	Merge tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme into block-6.1	Jens Axboe
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.1 - fix SRCU protection of nvme_ns_head list (Caleb Sander) - clear the prp2 field when not used (Lei Rao)" * tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme: nvme: fix SRCU protection of nvme_ns_head list nvme-pci: clear the prp2 field when not used
2022-12-02	tsnep: Rework RX buffer allocation	Gerhard Engleder
	Refill RX queue in batches of descriptors to improve performance. Refill is allowed to fail as long as a minimum number of descriptors is active. Thus, a limited number of failed RX buffer allocations is now allowed for normal operation. Previously every failed allocation resulted in a dropped frame. If the minimum number of active descriptors is reached, then RX buffers are still reused and frames are dropped. This ensures that the RX queue never runs empty and always continues to operate. Prework for future XDP support. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	tsnep: Throttle interrupts	Gerhard Engleder
	Without interrupt throttling, iperf server mode generates a CPU load of 100% (A53 1.2GHz). Also the throughput suffers with less than 900Mbit/s on a 1Gbit/s link. The reason is a high interrupt load with interrupts every ~20us. Reduce interrupt load by throttling of interrupts. Interrupt delay default is 64us. For iperf server mode the CPU load is significantly reduced to ~20% and the throughput reaches the maximum of 941MBit/s. Interrupts are generated every ~140us. RX and TX coalesce can be configured with ethtool. RX coalesce has priority over TX coalesce if the same interrupt is used. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	tsnep: Add ethtool::get_channels support	Gerhard Engleder
	Allow user space to read number of TX and RX queue. This is useful for device dependent qdisc configurations like TAPRIO with hardware offload. Also ethtool::get_per_queue_coalesce / set_per_queue_coalesce requires that interface. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	tsnep: Consistent naming of struct net_device	Gerhard Engleder
	Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	Documentation: bonding: correct xmit hash steps	Jonathan Toppins
	Correct xmit hash steps for layer3+4 as introduced by commit 49aefd131739 ("bonding: do not discard lowest hash bit for non layer3+4 hashing"). Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	Documentation: bonding: update miimon default to 100	Jonathan Toppins
	With commit c1f897ce186a ("bonding: set default miimon value for non-arp modes if not set") the miimon default was changed from zero to 100 if arp_interval is also zero. Document this fact in bonding.rst. Fixes: c1f897ce186a ("bonding: set default miimon value for non-arp modes if not set") Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()	Xiongfeng Wang
	for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() for the error path to avoid reference count leak. Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-3-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02	iommu/vt-d: Fix PCI device refcount leak in has_external_pci()	Xiongfeng Wang
	for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() before 'return true' to avoid reference count leak. Fixes: 89a6079df791 ("iommu/vt-d: Force IOMMU on for platform opt in hint") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02	iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()	Yang Yingliang
	As comment of pci_get_domain_bus_and_slot() says, it returns a pci device with refcount increment, when finish using it, the caller must decrease the reference count by calling pci_dev_put(). So call pci_dev_put() after using the 'pdev' to avoid refcount leak. Besides, if the 'pdev' is null or intel_svm_prq_report() returns error, there is no need to trace this fault. Fixes: 06f4b8d09dba ("iommu/vt-d: Remove unnecessary SVA data accesses in page fault path") Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221119144028.2452731-1-yangyingliang@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02	iommu/vt-d: Add a fix for devices need extra dtlb flush	Jacob Pan
	QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in address translation service (ATS). These devices may inadvertently issue ATS invalidation completion before posted writes initiated with translated address that utilized translations matching the invalidation address range, violating the invalidation completion ordering. This patch adds an extra device TLB invalidation for the affected devices, it is needed to ensure no more posted writes with translated address following the invalidation completion. Therefore, the ordering is preserved and data-corruption is prevented. Device TLBs are invalidated under the following six conditions: 1. Device driver does DMA API unmap IOVA 2. Device driver unbind a PASID from a process, sva_unbind_device() 3. PASID is torn down, after PASID cache is flushed. e.g. process exit_mmap() due to crash 4. Under SVA usage, called by mmu_notifier.invalidate_range() where VM has to free pages that were unmapped 5. userspace driver unmaps a DMA buffer 6. Cache invalidation in vSVA usage (upcoming) For #1 and #2, device drivers are responsible for stopping DMA traffic before unmap/unbind. For #3, iommu driver gets mmu_notifier to invalidate TLB the same way as normal user unmap which will do an extra invalidation. The dTLB invalidation after PASID cache flush does not need an extra invalidation. Therefore, we only need to deal with #4 and #5 in this patch. #1 is also covered by this patch due to common code path with #5. Tested-by: Yuzhang Luo <yuzhang.luo@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20221130062449.1360063-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02	net: thunderbolt: Use bitwise types in the struct thunderbolt_ip_frame_header	Andy Shevchenko
	The main usage of the struct thunderbolt_ip_frame_header is to handle the packets on the media layer. The header is bound to the protocol in which the byte ordering is crucial. However the data type definition doesn't use that and sparse is unhappy, for example (17 altogether): .../thunderbolt.c:718:23: warning: cast to restricted __le32 .../thunderbolt.c:966:42: warning: incorrect type in assignment (different base types) .../thunderbolt.c:966:42: expected unsigned int [usertype] frame_count .../thunderbolt.c:966:42: got restricted __le32 [usertype] Switch to the bitwise types in the struct thunderbolt_ip_frame_header to reduce this, but not completely solving (9 left), because the same data type is used for Rx header handled locally (in CPU byte order). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	net: thunderbolt: Switch from __maybe_unused to pm_sleep_ptr() etc	Andy Shevchenko
	Letting the compiler remove these functions when the kernel is built without CONFIG_PM_SLEEP support is simpler and less heavier for builds than the use of __maybe_unused attributes. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	net: devlink: convert port_list into xarray	Jiri Pirko
	Some devlink instances may contain thousands of ports. Storing them in linked list and looking them up is not scalable. Convert the linked list into xarray. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	Merge branch 'vmxnet3-fixes'	David S. Miller
	Ronak Doshi says: ==================== vmxnet3: couple of fixes This series fixes following issues: Patch 1: This patch provides a fix to correctly report encapsulated LRO'ed packet. Patch 2: This patch provides a fix to use correct intrConf reference. Changes in v2: - declare generic descriptor to be used - remove white spaces - remove single quote around commit reference in patch 2 - remove if check for encap_lro ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	vmxnet3: use correct intrConf reference when using extended queues	Ronak Doshi
	Commit 39f9895a00f4 ("vmxnet3: add support for 32 Tx/Rx queues") added support for 32Tx/Rx queues. As a part of this patch, intrConf structure was extended to incorporate increased queues. This patch fixes the issue where incorrect reference is being used. Fixes: 39f9895a00f4 ("vmxnet3: add support for 32 Tx/Rx queues") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-02	vmxnet3: correctly report encapsulated LRO packet	Ronak Doshi
	Commit dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the pathc did not report correctly the encapsulated packet which is LRO'ed by the hypervisor. This patch fixes this issue by using correct callback for the LRO'ed encapsulated packet. Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-01	Merge branch 'hsr'	Jakub Kicinski
	Sebastian Andrzej Siewior says: ==================== I started playing with HSR and run into a problem. Tested latest upstream -rc and noticed more problems. Now it appears to work. For testing I have a small three node setup with iperf and ping. While iperf doesn't complain ping reports missing packets and duplicates. ==================== Link: https://lore.kernel.org/r/20221129164815.128922-1-bigeasy@linutronix.de/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: Add a basic HSR test.	Sebastian Andrzej Siewior
	This test adds a basic HSRv0 network with 3 nodes. In its current shape it sends and forwards packets, announcements and so merges nodes based on MAC A/B information. It is able to detect duplicate packets and packetloss should any occur. Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	hsr: Use a single struct for self_node.	Sebastian Andrzej Siewior
	self_node_db is a list_head with one entry of struct hsr_node. The purpose is to hold the two MAC addresses of the node itself. It is convenient to recycle the structure. However having a list_head and fetching always the first entry is not really optimal. Created a new data strucure contaning the two MAC addresses named hsr_self_node. Access that structure like an RCU protected pointer so it can be replaced on the fly without blocking the reader. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	hsr: Synchronize sequence number updates.	Sebastian Andrzej Siewior
	hsr_register_frame_out() compares new sequence_nr vs the old one recorded in hsr_node::seq_out and if the new sequence_nr is higher then it will be written to hsr_node::seq_out as the new value. This operation isn't locked so it is possible that two frames with the same sequence number arrive (via the two slave devices) and are fed to hsr_register_frame_out() at the same time. Both will pass the check and update the sequence counter later to the same value. As a result the content of the same packet is fed into the stack twice. This was noticed by running ping and observing DUP being reported from time to time. Instead of using the hsr_priv::seqnr_lock for the whole receive path (as it is for sending in the master node) add an additional lock that is only used for sequence number checks and updates. Add a per-node lock that is used during sequence number reads and updates. Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	hsr: Synchronize sending frames to have always incremented outgoing seq nr.	Sebastian Andrzej Siewior
	Sending frames via the hsr (master) device requires a sequence number which is tracked in hsr_priv::sequence_nr and protected by hsr_priv::seqnr_lock. Each time a new frame is sent, it will obtain a new id and then send it via the slave devices. Each time a packet is sent (via hsr_forward_do()) the sequence number is checked via hsr_register_frame_out() to ensure that a frame is not handled twice. This make sense for the receiving side to ensure that the frame is not injected into the stack twice after it has been received from both slave ports. There is no locking to cover the sending path which means the following scenario is possible: CPU0 CPU1 hsr_dev_xmit(skb1) hsr_dev_xmit(skb2) fill_frame_info() fill_frame_info() hsr_fill_frame_info() hsr_fill_frame_info() handle_std_frame() handle_std_frame() skb1's sequence_nr = 1 skb2's sequence_nr = 2 hsr_forward_do() hsr_forward_do() hsr_register_frame_out(, 2) // okay, send) hsr_register_frame_out(, 1) // stop, lower seq duplicate Both skbs (or their struct hsr_frame_info) received an unique id. However since skb2 was sent before skb1, the higher sequence number was recorded in hsr_register_frame_out() and the late arriving skb1 was dropped and never sent. This scenario has been observed in a three node HSR setup, with node1 + node2 having ping and iperf running in parallel. From time to time ping reported a missing packet. Based on tracing that missing ping packet did not leave the system. It might be possible (didn't check) to drop the sequence number check on the sending side. But if the higher sequence number leaves on wire before the lower does and the destination receives them in that order and it will drop the packet with the lower sequence number and never inject into the stack. Therefore it seems the only way is to lock the whole path from obtaining the sequence number and sending via dev_queue_xmit() and assuming the packets leave on wire in the same order (and don't get reordered by the NIC). Cover the whole path for the master interface from obtaining the ID until after it has been forwarded via hsr_forward_skb() to ensure the skbs are sent to the NIC in the order of the assigned sequence numbers. Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	hsr: Disable netpoll.	Sebastian Andrzej Siewior
	The hsr device is a software device. Its net_device_ops::ndo_start_xmit() routine will process the packet and then pass the resulting skb to dev_queue_xmit(). During processing, hsr acquires a lock with spin_lock_bh() (hsr_add_node()) which needs to be promoted to the _irq() suffix in order to avoid a potential deadlock. Then there are the warnings in dev_queue_xmit() (due to local_bh_disable() with disabled interrupts) left. Instead trying to address those (there is qdisc and…) for netpoll sake, just disable netpoll on hsr. Disable netpoll on hsr and replace the _irqsave() locking with _bh(). Fixes: f421436a591d3 ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	hsr: Avoid double remove of a node.	Sebastian Andrzej Siewior
	Due to the hashed-MAC optimisation one problem become visible: hsr_handle_sup_frame() walks over the list of available nodes and merges two node entries into one if based on the information in the supervision both MAC addresses belong to one node. The list-walk happens on a RCU protected list and delete operation happens under a lock. If the supervision arrives on both slave interfaces at the same time then this delete operation can occur simultaneously on two CPUs. The result is the first-CPU deletes the from the list and the second CPUs BUGs while attempting to dereference a poisoned list-entry. This happens more likely with the optimisation because a new node for the mac_B entry is created once a packet has been received and removed (merged) once the supervision frame has been received. Avoid removing/ cleaning up a hsr_node twice by adding a `removed' field which is set to true after the removal and checked before the removal. Fixes: f266a683a4804 ("net/hsr: Better frame dispatch") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	hsr: Add a rcu-read lock to hsr_forward_skb().	Sebastian Andrzej Siewior
	hsr_forward_skb() a skb and keeps information in an on-stack hsr_frame_info. hsr_get_node() assigns hsr_frame_info::node_src which is from a RCU list. This pointer is used later in hsr_forward_do(). I don't see a reason why this pointer can't vanish midway since there is no guarantee that hsr_forward_skb() is invoked from an RCU read section. Use rcu_read_lock() to protect hsr_frame_info::node_src from its assignment until it is no longer used. Fixes: f266a683a4804 ("net/hsr: Better frame dispatch") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	Revert "net: hsr: use hlist_head instead of list_head for mac addresses"	Sebastian Andrzej Siewior
	The hlist optimisation (which not only uses hlist_head instead of list_head but also splits hsr_priv::node_db into an array of 256 slots) does not consider the "node merge": Upon starting the hsr network (with three nodes) a packet that is sent from node1 to node3 will also be sent from node1 to node2 and then forwarded to node3. As a result node3 will receive 2 packets because it is not able to filter out the duplicate. Each packet received will create a new struct hsr_node with macaddress_A only set the MAC address it received from (the two MAC addesses from node1). At some point (early in the process) two supervision frames will be received from node1. They will be processed by hsr_handle_sup_frame() and one frame will leave early ("Node has already been merged") and does nothing. The other frame will be merged as portB and have its MAC address written to macaddress_B and the hsr_node (that was created for it as macaddress_A) will be removed. From now on HSR is able to identify a duplicate because both packets sent from one node will result in the same struct hsr_node because hsr_get_node() will find the MAC address either on macaddress_A or macaddress_B. Things get tricky with the optimisation: If sender's MAC address is saved as macaddress_A then the lookup will work as usual. If the MAC address has been merged into macaddress_B of another hsr_node then the lookup won't work because it is likely that the data structure is in another bucket. This results in creating a new struct hsr_node and not recognising a possible duplicate. A way around it would be to add another hsr_node::mac_list_B and attach it to the other bucket to ensure that this hsr_node will be looked up either via macaddress_A _or_ macaddress_B. I however prefer to revert it because it sounds like an academic problem rather than real life workload plus it adds complexity. I'm not an HSR expert with what is usual size of a network but I would guess 40 to 60 nodes. With 10.000 nodes and assuming 60us for pass-through (from node to node) then it would take almost 600ms for a packet to almost wrap around which sounds a lot. Revert the hash MAC addresses optimisation. Fixes: 4acc45db71158 ("net: hsr: use hlist_head instead of list_head for mac addresses") Cc: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	sctp: delete free member from struct sctp_sched_ops	Xin Long
	After commit 9ed7bfc79542 ("sctp: fix memory leak in sctp_stream_outq_migrate()"), sctp_sched_set_sched() is the only place calling sched->free(), and it can actually be replaced by sched->free_sid() on each stream, and yet there's already a loop to traverse all streams in sctp_sched_set_sched(). This patch adds a function sctp_sched_free_sched() where it calls sched->free_sid() for each stream to replace sched->free() calls in sctp_sched_set_sched() and then deletes the unused free member from struct sctp_sched_ops. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/e10aac150aca2686cb0bd0570299ec716da5a5c0.1669849471.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	Merge branch '1GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-11-30 (e1000e, igb) This series contains updates to e1000e and igb drivers. Akihiko Odaki fixes calculation for checking whether space for next frame exists for e1000e and properly sets MSI-X vector to fix failing ethtool interrupt test for igb. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igb: Allocate MSI-X vector when testing e1000e: Fix TX dispatch condition ==================== Link: https://lore.kernel.org/r/20221130194228.3257787-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	Merge branch 'mptcp-pm-listener-events-selftests-cleanup'	Jakub Kicinski
	Matthieu Baerts says: ==================== mptcp: PM listener events + selftests cleanup Thanks to the patch 6/11, the MPTCP path manager now sends Netlink events when MPTCP listening sockets are created and closed. The reason why it is needed is explained in the linked ticket [1]: MPTCP for Linux, when not using the in-kernel PM, depends on the userspace PM to create extra listening sockets before announcing addresses and ports. Let's call these "PM listeners". With the existing MPTCP netlink events, a userspace PM can create PM listeners at startup time, or in response to an incoming connection. Creating sockets in response to connections is not optimal: ADD_ADDRs can't be sent until the sockets are created and listen()ed, and if all connections are closed then it may not be clear to the userspace PM daemon that PM listener sockets should be cleaned up. Hence this feature request: to add MPTCP netlink events for listening socket close & create, so PM listening sockets can be managed based on application activity. [1] https://github.com/multipath-tcp/mptcp_net-next/issues/313 Selftests for these new Netlink events have been added in patches 9,11/11. The remaining patches introduce different cleanups and small improvements in MPTCP selftests to ease the maintenance and the addition of new tests. ==================== Link: https://lore.kernel.org/r/20221130140637.409926-1-matthieu.baerts@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: listener test for in-kernel PM	Geliang Tang
	This patch adds test coverage for listening sockets created by the in-kernel path manager in mptcp_join.sh. It adds the listener event checking in the existing "remove single address with port" test. The output looks like this: 003 remove single address with port syn[ ok ] - synack[ ok ] - ack[ ok ] add[ ok ] - echo [ ok ] - pt [ ok ] syn[ ok ] - synack[ ok ] - ack[ ok ] syn[ ok ] - ack [ ok ] rm [ ok ] - rmsf [ ok ] invert CREATE_LISTENER 10.0.2.1:10100[ ok ] CLOSE_LISTENER 10.0.2.1:10100 [ ok ] Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: make evts global in mptcp_join	Geliang Tang
	This patch moves evts_ns1 and evts_ns2 out of do_transfer() as two global variables in mptcp_join.sh. Init them in init() and remove them in cleanup(). Add a new helper reset_with_events() to save the outputs of 'pm_nl_ctl events' command in them. And a new helper kill_events_pids() to kill pids of 'pm_nl_ctl events' command. Use these helpers in userspace pm tests. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: listener test for userspace PM	Geliang Tang
	This patch adds test coverage for listening sockets created by userspace processes. It adds a new test named test_listener() and a new verifying helper verify_listener_events(). The new output looks like this: CREATE_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK] DESTROY_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK] MP_PRIO TX [OK] MP_PRIO RX [OK] CREATE_LISTENER 10.0.2.2:37106 [OK] CLOSE_LISTENER 10.0.2.2:37106 [OK] Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: make evts global in userspace_pm	Geliang Tang
	This patch makes server_evts and client_evts global in userspace_pm.sh, then these two variables could be used in test_announce(), test_remove() and test_subflows(). The local variable 'evts' in these three functions then could be dropped. Also move local variable 'file' as a global one. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: enhance userspace pm tests	Geliang Tang
	Some userspace pm tests failed since pm listener events have been added. Now MPTCP_EVENT_LISTENER_CREATED event becomes the first item in the events list like this: type:15,family:2,sport:10006,saddr4:0.0.0.0 type:1,token:3701282876,server_side:1,family:2,saddr4:10.0.1.1,... And no token value in this MPTCP_EVENT_LISTENER_CREATED event. This patch fixes this by specifying the type 1 item to search for token values. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	mptcp: add pm listener events	Geliang Tang
	This patch adds two new MPTCP netlink event types for PM listening socket create and close, named MPTCP_EVENT_LISTENER_CREATED and MPTCP_EVENT_LISTENER_CLOSED. Add a new function mptcp_event_pm_listener() to push the new events with family, port and addr to userspace. Invoke mptcp_event_pm_listener() with MPTCP_EVENT_LISTENER_CREATED in mptcp_listen() and mptcp_pm_nl_create_listen_socket(), invoke it with MPTCP_EVENT_LISTENER_CLOSED in __mptcp_close_ssk(). Signed-off-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: declare var as local	Matthieu Baerts
	Just to avoid classical Bash pitfall where variables are accidentally overridden by other functions because the proper scope has not been defined. That's also what is done in other MPTCP selftests scripts where all non local variables are defined at the beginning of the script and the others are defined with the "local" keyword. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: clearly declare global ns vars	Matthieu Baerts
	It is clearer to declare these global variables at the beginning of the file as it is done in other MPTCP selftests rather than in functions in the middle of the script. So for uniformity reason, we can do the same here in mptcp_sockopt.sh. Suggested-by: Geliang Tang <geliang.tang@suse.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: uniform 'rndh' variable	Matthieu Baerts
	The definition of 'rndh' was probably copied from one script to another but some times, 'sec' was not defined, not used and/or not spelled properly. Here all the 'rndh' are now defined the same way. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: removed defined but unused vars	Matthieu Baerts
	Some variables were set but never used. This was not causing any issues except adding some confusion and having shellcheck complaining about them. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	selftests: mptcp: run mptcp_inq from a clean netns	Matthieu Baerts
	A new "sandbox" net namespace is available where no other netfilter rules have been added. Use this new netns instead of re-using "ns1" and clean it. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01	bnxt: report FEC block stats via standard interface	Jakub Kicinski
	I must have missed that these stats are only exposed via the unstructured ethtool -S when they got merged. Plumb in the structured form. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20221130013108.90062-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>