linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-06-28	Bluetooth: qca: Fix BT enable failure again for QCA6390 after warm reboot	Zijun Hu
	Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev") will cause below regression issue: BT can't be enabled after below steps: cold boot -> enable BT -> disable BT -> warm reboot -> BT enable failure if property enable-gpios is not configured within DT\|ACPI for QCA6390. The commit is to fix a use-after-free issue within qca_serdev_shutdown() by adding condition to avoid the serdev is flushed or wrote after closed but also introduces this regression issue regarding above steps since the VSC is not sent to reset controller during warm reboot. Fixed by sending the VSC to reset controller within qca_serdev_shutdown() once BT was ever enabled, and the use-after-free issue is also fixed by this change since the serdev is still opened before it is flushed or wrote. Verified by the reported machine Dell XPS 13 9310 laptop over below two kernel commits: commit e00fc2700a3f ("Bluetooth: btusb: Fix triggering coredump implementation for QCA") of bluetooth-next tree. commit b23d98d46d28 ("Bluetooth: btusb: Fix triggering coredump implementation for QCA") of linus mainline tree. Fixes: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev") Cc: stable@vger.kernel.org Reported-by: Wren Turkal <wt@penguintechs.org> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218726 Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Tested-by: Wren Turkal <wt@penguintechs.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-28	Bluetooth: hci_event: Fix setting of unicast qos interval	Luiz Augusto von Dentz
	qos->ucast interval reffers to the SDU interval, and should not be set to the interval value reported by the LE CIS Established event since the latter reffers to the ISO interval. These two interval are not the same thing: BLUETOOTH CORE SPECIFICATION Version 5.3 \| Vol 6, Part G Isochronous interval: The time between two consecutive BIS or CIS events (designated ISO_Interval in the Link Layer) SDU interval: The nominal time between two consecutive SDUs that are sent or received by the upper layer. So this instead uses the following formula from the spec to calculate the resulting SDU interface: BLUETOOTH CORE SPECIFICATION Version 5.4 \| Vol 6, Part G page 3075: Transport_Latency_C_To_P = CIG_Sync_Delay + (FT_C_To_P) × ISO_Interval + SDU_Interval_C_To_P Transport_Latency_P_To_C = CIG_Sync_Delay + (FT_P_To_C) × ISO_Interval + SDU_Interval_P_To_C Link: https://github.com/bluez/bluez/issues/823 Fixes: 2be22f1941d5 ("Bluetooth: hci_event: Fix parsing of CIS Established Event") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-28	Bluetooth: btintel_pcie: Fix REVERSE_INULL issue reported by coverity	Vijay Satija
	check pdata return of skb_pull_data, instead of data. Fixes: c2b636b3f788 ("Bluetooth: btintel_pcie: Add support for PCIe transport") Signed-off-by: Vijay Satija <vijay.satija@intel.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-28	Bluetooth: hci_bcm4377: Fix msgid release	Hector Martin
	We are releasing a single msgid, so the order argument to bitmap_release_region must be zero. Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards") Cc: stable@vger.kernel.org Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Sven Peter <sven@svenpeter.dev> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-28	Bluetooth: Add quirk to ignore reserved PHY bits in LE Extended Adv Report	Sven Peter
	Some Broadcom controllers found on Apple Silicon machines abuse the reserved bits inside the PHY fields of LE Extended Advertising Report events for additional flags. Add a quirk to drop these and correctly extract the Primary/Secondary_PHY field. The following excerpt from a btmon trace shows a report received with "Reserved" for "Primary PHY" on a 4388 controller: > HCI Event: LE Meta Event (0x3e) plen 26 LE Extended Advertising Report (0x0d) Num reports: 1 Entry 0 Event type: 0x2515 Props: 0x0015 Connectable Directed Use legacy advertising PDUs Data status: Complete Reserved (0x2500) Legacy PDU Type: Reserved (0x2515) Address type: Random (0x01) Address: 00:00:00:00:00:00 (Static) Primary PHY: Reserved Secondary PHY: No packets SID: no ADI field (0xff) TX power: 127 dBm RSSI: -60 dBm (0xc4) Periodic advertising interval: 0.00 msec (0x0000) Direct address type: Public (0x00) Direct address: 00:00:00:00:00:00 (Apple, Inc.) Data length: 0x00 Cc: stable@vger.kernel.org Fixes: 2e7ed5f5e69b ("Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync") Reported-by: Janne Grunau <j@jannau.net> Closes: https://lore.kernel.org/all/Zjz0atzRhFykROM9@robin Tested-by: Janne Grunau <j@jannau.net> Signed-off-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-06-28	ice: Distinguish driver reset and removal for AQ shutdown	Piotr Gardocki
	Admin queue command for shutdown AQ contains a flag to indicate driver unload. However, the flag is always set in the driver, even for resets. It can cause the firmware to consider driver as unloaded once the PF reset is triggered on all ports of device, which could lead to unexpected results. Add an additional function parameter to functions that shutdown AQ, indicating whether the driver is actually unloading. Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28	ice: Allow different FW API versions based on MAC type	Paul Greenwalt
	Allow the driver to be compatible with different FW API versions based on the device's MAC type. Currently, E810 is only compatible with one FW API version. Now the driver can be compatible with different FW API versions for both E810 and E830. For example, E810 FW API version is 1.5.0 and E830 is 1.7.0. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28	ice: Check all ice_vsi_rebuild() errors in function	Eric Joyner
	Check the return value from ice_vsi_rebuild() and prevent the usage of incorrectly configured VSI. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28	ice: Add get/set hw address for VFs using devlink commands	Karthik Sundaravel
	Changing the MAC address of the VFs is currently unsupported via devlink. Add the function handlers to set and get the HW address for the VFs. Signed-off-by: Karthik Sundaravel <ksundara@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28	MAINTAINERS: update Intel Ethernet maintainers	Jesse Brandeburg
	Since Jesse has moved to a new role, replace him with a new maintainer to work with Tony on representing Intel networking drivers in the kernel. Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-28	Merge tag 'nfsd-6.10-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Due to a late review, revert and re-fix a recent crasher fix * tag 'nfsd-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "nfsd: fix oops when reading pool_stats before server is started" nfsd: initialise nfsd_info.mutex early.
2024-06-28	Merge tag 'bcachefs-2024-06-28' of https://evilpiepirate.org/git/bcachefs	Linus Torvalds
	Pull bcachefs fixes from Kent Overstreet: "Simple stuff: - NULL ptr/err ptr deref fixes - fix for getting wedged on shutdown after journal error - fix missing recalc_capacity() call, capacity now changes correctly after a device goes read only however: our capacity calculation still doesn't take into account when we have mixed ro/rw devices and the ro devices have data on them, that's going to be a more involved fix to separate accounting for "capacity used on ro devices" and "capacity used on rw devices" - boring syzbot stuff Slightly more involved: - discard, invalidate workers are now per device this has the effect of simplifying how we take device refs in these paths, and the device ref cleanup fixes a longstanding race between the device removal path and the discard path - fixes for how the debugfs code takes refs on btree_trans objects we have debugfs code that prints in use btree_trans objects. It uses closure_get() on trans->ref, which is mainly for the cycle detector, but the debugfs code was using it on a closure that may have hit 0, which is not allowed; for performance reasons we cannot avoid having not-in-use transactions on the global list. Introduce some new primitives to fix this and make the synchronization here a whole lot saner" * tag 'bcachefs-2024-06-28' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix kmalloc bug in __snapshot_t_mut bcachefs: Discard, invalidate workers are now per device bcachefs: Fix shift-out-of-bounds in bch2_blacklist_entries_gc bcachefs: slab-use-after-free Read in bch2_sb_errors_from_cpu bcachefs: Add missing bch2_journal_do_writes() call bcachefs: Fix null ptr deref in journal_pins_to_text() bcachefs: Add missing recalc_capacity() call bcachefs: Fix btree_trans list ordering bcachefs: Fix race between trans_put() and btree_transactions_read() closures: closure_get_not_zero(), closure_return_sync() bcachefs: Make btree_deadlock_to_text() clearer bcachefs: fix seqmutex_relock() bcachefs: Fix freeing of error pointers
2024-06-28	Merge tag 'block-6.10-20240628' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fixes from Jens Axboe: "NVMe fixes via Keith: - Fabrics fixes (Hannes) - Missing module description (Jeff) - Clang warning fix (Nathan)" * tag 'block-6.10-20240628' of git://git.kernel.dk/linux: nvmet-fc: Remove __counted_by from nvmet_fc_tgt_queue.fod[] nvmet: make 'tsas' attribute idempotent for RDMA nvme: fixup comment for nvme RDMA Provider Type nvme-apple: add missing MODULE_DESCRIPTION() nvmet: do not return 'reserved' for empty TSAS values nvme: fix NVME_NS_DEAC may incorrectly identifying the disk as EXT_LBA.
2024-06-28	Merge tag 'iommu-fixes-v6.10-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Two cache flushing fixes for Intel and AMD drivers - AMD guest translation enabling fix - Update IOMMU tree location in MAINTAINERS file * tag 'iommu-fixes-v6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: MAINTAINERS: Update IOMMU tree location iommu/amd: Fix GT feature enablement again iommu/vt-d: Fix missed device TLB cache tag iommu/amd: Invalidate cache before removing device from domain list
2024-06-28	Merge tag 'gpio-fixes-for-v6.10-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: "An assortment of driver fixes and two commits addressing a bad behavior of the GPIO uAPI when reconfiguring requested lines. - fix a race condition in i2c transfers by adding a missing i2c lock section in gpio-pca953x - validate the number of obtained interrupts in gpio-davinci - add missing raw_spinlock_init() in gpio-graniterapids - fix bad character device behavior: disallow GPIO line reconfiguration without set direction both in v1 and v2 uAPI" * tag 'gpio-fixes-for-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: cdev: Ignore reconfiguration without direction gpiolib: cdev: Disallow reconfiguration without direction (uAPI v1) gpio: graniterapids: Add missing raw_spinlock_init() gpio: davinci: Validate the obtained number of IRQs gpio: pca953x: fix pca953x_irq_bus_sync_unlock race
2024-06-28	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A pair of small arm64 fixes for -rc6. One is a fix for the recently merged uffd-wp support (which was triggering a spurious warning) and the other is a fix to the clearing of the initial idmap pgd in some configurations Summary: - Fix spurious page-table warning when clearing PTE_UFFD_WP in a live pte - Fix clearing of the idmap pgd when using large addressing modes" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Clear the initial ID map correctly before remapping arm64: mm: Permit PTE SW bits to change in live mappings
2024-06-28	Merge tag 'v6.10-rc-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat fixes from Len Brown: "Fix three recent minor turbostat regressions" * tag 'v6.10-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: Add local build_bug.h header for snapshot target tools/power turbostat: Fix unc freq columns not showing with '-q' or '-l' tools/power turbostat: option '-n' is ambiguous
2024-06-28	netfilter: xt_recent: Lift restrictions on max hitcount value	Phil Sutter
	Support tracking of up to 65535 packets per table entry instead of just 255 to better facilitate longer term tracking or higher throughput scenarios. Note how this aligns sizes of struct recent_entry's 'nstamps' and 'index' fields when 'nstamps' was larger before. This is unnecessary as the value of 'nstamps' grows along with that of 'index' after being initialized to 1 (see recent_entry_update()). Its value will thus never exceed that of 'index' and therefore does not need to provide space for larger values. Requested-by: Fabio <pedretti.fabio@gmail.com> Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1745 Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-28	selftests: netfilter: nft_queue.sh: add test for disappearing listener	Florian Westphal
	If userspace program exits while the queue its subscribed to has packets those need to be discarded. commit dc21c6cc3d69 ("netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be triggered in this case. Add a test case to cover this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-28	tty: mxser: Remove __counted_by from mxser_board.ports[]	Nathan Chancellor
	Work for __counted_by on generic pointers in structures (not just flexible array members) has started landing in Clang 19 (current tip of tree). During the development of this feature, a restriction was added to __counted_by to prevent the flexible array member's element type from including a flexible array member itself such as: struct foo { int count; char buf[]; }; struct bar { int count; struct foo data[] __counted_by(count); }; because the size of data cannot be calculated with the standard array size formula: sizeof(struct foo) * count This restriction was downgraded to a warning but due to CONFIG_WERROR, it can still break the build. The application of __counted_by on the ports member of 'struct mxser_board' triggers this restriction, resulting in: drivers/tty/mxser.c:291:2: error: 'counted_by' should not be applied to an array with element of unknown size because 'struct mxser_port' is a struct type with a flexible array member. This will be an error in a future compiler version [-Werror,-Wbounds-safety-counted-by-elt-type-unknown-size] 291 \| struct mxser_port ports[] __counted_by(nports); \| ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Remove this use of __counted_by to fix the warning/error. However, rather than remove it altogether, leave it commented, as it may be possible to support this in future compiler releases. Cc: <stable@vger.kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/2026 Fixes: f34907ecca71 ("mxser: Annotate struct mxser_board with __counted_by") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240529-drop-counted-by-ports-mxser-board-v1-1-0ab217f4da6d@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-28	randomize_kstack: Remove non-functional per-arch entropy filtering	Kees Cook
	An unintended consequence of commit 9c573cd31343 ("randomize_kstack: Improve entropy diffusion") was that the per-architecture entropy size filtering reduced how many bits were being added to the mix, rather than how many bits were being used during the offsetting. All architectures fell back to the existing default of 0x3FF (10 bits), which will consume at most 1KiB of stack space. It seems that this is working just fine, so let's avoid the confusion and update everything to use the default. The prior intent of the per-architecture limits were: arm64: capped at 0x1FF (9 bits), 5 bits effective powerpc: uncapped (10 bits), 6 or 7 bits effective riscv: uncapped (10 bits), 6 bits effective x86: capped at 0xFF (8 bits), 5 (x86_64) or 6 (ia32) bits effective s390: capped at 0xFF (8 bits), undocumented effective entropy Current discussion has led to just dropping the original per-architecture filters. The additional entropy appears to be safe for arm64, x86, and s390. Quoting Arnd, "There is no point pretending that 15.75KB is somehow safe to use while 15.00KB is not." Co-developed-by: Yuntao Liu <liuyuntao12@huawei.com> Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com> Fixes: 9c573cd31343 ("randomize_kstack: Improve entropy diffusion") Link: https://lore.kernel.org/r/20240617133721.377540-1-liuyuntao12@huawei.com Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20240619214711.work.953-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-28	string: kunit: add missing MODULE_DESCRIPTION() macros	Jeff Johnson
	make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/string_kunit.o WARNING: modpost: missing MODULE_DESCRIPTION() in lib/string_helpers_kunit.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240531-md-lib-string-v1-1-2738cf057d94@quicinc.com Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-28	ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models	Niklas Cassel
	We got another report that CT1000BX500SSD1 does not work with LPM. If you look in libata-core.c, we have six different Crucial devices that are marked with ATA_HORKAGE_NOLPM. This model would have been the seventh. (This quirk is used on Crucial models starting with both CT* and Crucial_CT*) It is obvious that this vendor does not have a great history of supporting LPM properly, therefore, add the ATA_HORKAGE_NOLPM quirk for all Crucial BX SSD1 models. Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Alessandro Maggio <alex.tkd.alex@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832 Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240627105551.4159447-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-06-28	selftests/harness: Fix tests timeout and race condition	Mickaël Salaün
	We cannot use CLONE_VFORK because we also need to wait for the timeout signal. Restore tests timeout by using the original fork() call in __run_test() but also in __TEST_F_IMPL(). Also fix a race condition when waiting for the test child process. Because test metadata are shared between test processes, only the parent process must set the test PID (child). Otherwise, t->pid may be set to zero, leading to inconsistent error cases: # RUN layout1.rule_on_mountpoint ... # rule_on_mountpoint: Test ended in some other way [127] # OK layout1.rule_on_mountpoint ok 20 layout1.rule_on_mountpoint As safeguards, initialize the "status" variable with a valid exit code, and handle unknown test exits as errors. The use of fork() introduces a new race condition in landlock/fs_test.c which seems to be specific to hostfs bind mounts, but I haven't found the root cause and it's difficult to trigger. I'll try to fix it with another patch. Cc: Christian Brauner <brauner@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Closes: https://lore.kernel.org/r/9341d4db-5e21-418c-bf9e-9ae2da7877e1@sirena.org.uk Fixes: a86f18903db9 ("selftests/harness: Fix interleaved scheduling leading to race conditions") Fixes: 24cf65a62266 ("selftests/harness: Share _metadata between forked processes") Link: https://lore.kernel.org/r/20240621180605.834676-1-mic@digikod.net Tested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-06-28	can: gs_usb: add VID/PID for Xylanta SAINT3 product family	Marc Kleine-Budde
	Add support for the Xylanta SAINT3 product family. Cc: Andy Jackson <andy@xylanta.com> Cc: Ken Aitchison <ken@xylanta.com> Tested-by: Andy Jackson <andy@xylanta.com> Link: https://lore.kernel.org/all/20240625140353.769373-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-06-28	MAINTAINERS: Update IOMMU tree location	Joerg Roedel
	Update the maintainers entries to the new location of the IOMMU tree. Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-28	Merge tag 'ieee802154-for-net-2024-06-27' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan into main
2024-06-28	Merge branch 'mlx5-fixes' into main	David S. Miller
	Tariq Toukan says: ==================== mlx5 fixes 2024-06-27 This patchset provides fixes from the team to the mlx5 core and EN drivers. The first 3 patches by Daniel replace a buggy cap field with a newly introduced one. Patch 4 by Chris de-couples ingress ACL creation from a specific flow, so it's invoked by other flows if needed. Patch 5 by Jianbo fixes a possible missing cleanup of QoS objects. Patches 6 and 7 by Leon fixes IPsec stats logic to better reflect the traffic. Series generated against: commit 02ea312055da ("octeontx2-pf: Fix coverity and klockwork issues in octeon PF driver") V2: Fixed wrong cited SHA in patch 6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	net/mlx5e: Approximate IPsec per-SA payload data bytes count	Leon Romanovsky
	ConnectX devices lack ability to count payload data byte size which is needed for SA to return to libreswan for rekeying. As a solution let's approximate that by decreasing headers size from total size counted by flow steering. The calculation doesn't take into account any other headers which can be in the packet (e.g. IP extensions). Fixes: 5a6cddb89b51 ("net/mlx5e: Update IPsec per SA packets/bytes count") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	net/mlx5e: Present succeeded IPsec SA bytes and packet	Leon Romanovsky
	IPsec SA statistics presents successfully decrypted and encrypted packet and bytes, and not total handled by this SA. So update the calculation logic to take into account failures. Fixes: 6fb7f9408779 ("net/mlx5e: Connect mlx5 IPsec statistics with XFRM core") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	net/mlx5e: Add mqprio_rl cleanup and free in mlx5e_priv_cleanup()	Jianbo Liu
	In the cited commit, mqprio_rl cleanup and free are mistakenly removed in mlx5e_priv_cleanup(), and it causes the leakage of host memory and firmware SCHEDULING_ELEMENT objects while changing eswitch mode. So, add them back. Fixes: 0bb7228f7096 ("net/mlx5e: Fix mqprio_rl handling on devlink reload") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	net/mlx5: E-switch, Create ingress ACL when needed	Chris Mi
	Currently, ingress acl is used for three features. It is created only when vport metadata match and prio tag are enabled. But active-backup lag mode also uses it. It is independent of vport metadata match and prio tag. And vport metadata match can be disabled using the following devlink command: # devlink dev param set pci/0000:08:00.0 name esw_port_metadata \ value false cmode runtime If ingress acl is not created, will hit panic when creating drop rule for active-backup lag mode. If always create it, there will be about 5% performance degradation. Fix it by creating ingress acl when needed. If esw_port_metadata is true, ingress acl exists, then create drop rule using existing ingress acl. If esw_port_metadata is false, create ingress acl and then create drop rule. Fixes: 1749c4c51c16 ("net/mlx5: E-switch, add drop rule support to ingress ACL") Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	net/mlx5: Use max_num_eqs_24b when setting max_io_eqs	Daniel Jurgens
	Due a bug in the device max_num_eqs doesn't always reflect a written value. As a result, setting max_io_eqs may not work but appear successful. Instead write max_num_eqs_24b, which reflects correct value. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	net/mlx5: Use max_num_eqs_24b capability if set	Daniel Jurgens
	A new capability with more bits is added. If it's set use that value as the maximum number of EQs available. This cap is also writable by the vhca_resource_manager to allow limiting the number of EQs available to SFs and VFs. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	net/mlx5: IFC updates for changing max EQs	Daniel Jurgens
	Expose new capability to support changing the number of EQs available to other functions. Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	Merge branch 'net-selftests-mirroring-cleanup' into main	David S. Miller
	Petr Machata says: ==================== selftest: Clean-up and stabilize mirroring tests The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy. mirror_test() accommodated this noisiness by giving the counters an allowance of several packets. But that only works up to a point, and on busy systems won't be always enough. In this patch set, clean up and stabilize the mirroring selftests. The original intention was to port the tests over to UDP, but the logic of ICMP ends up being so entangled in the mirroring selftests that the changes feel overly invasive. Instead, ICMP is kept, but where possible, we match on ICMP message type, thus filtering out hits by other ICMP messages. Where this is not practical (where the counter tap is put on a device that carries encapsulated packets), switch the counter condition to _at least_ X observed packets. This is less robust, but barely so -- probably the only scenario that this would not catch is something like erroneous packet duplication, which would hopefully get caught by the numerous other tests in this extensive suite. - Patches #1 to #3 clean up parameters at various helpers. - Patches #4 to #6 stabilize the mirroring selftests as described above. - Mirroring tests currently allow testing SW datapath even on HW netdevices by trapping traffic to the SW datapath. This complicates the tests a bit without a good reason: to test SW datapath, just run the selftests on the veth topology. Thus in patch #7, drop support for this dual SW/HW testing. - At this point, some cleanups were either made possible by the previous patches, or were always possible. In patches #8 to #11, realize these cleanups. - In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mlxsw: mirror_gre: Obey TESTS	Petr Machata
	This test is unusual in that overriding TESTS does not change the tests to be run. Split the individual tests into several functions and invoke them through tests_run() as appropriate. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: libs: Drop unused functions	Petr Machata
	Nothing calls these. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: libs: Drop slow_path_trap_install()/_uninstall()	Petr Machata
	These functions are not used anymore. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror_gre_lag_lacp: Drop unnecessary code	Petr Machata
	The selftest does not use functions from mirror_gre_lib, ditch the import. It does not use arping either, so drop the require_command as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mlxsw: mirror_gre: Simplify	Petr Machata
	After the previous patch, the function test_span_failable() is always called with should_fail=1. Drop the argument and streamline the code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: Drop dual SW/HW testing	Petr Machata
	The mirroring tests are currently run in a skip_hw and optionally a skip_sw mode. The former tests the SW datapath, the latter the HW datapath, if available. In order to be able to test SW datapath on HW loopbacks, traps are installed on ingress to get traffic from the HW datapath to the SW one. This adds an unnecessary complexity when it would be much simpler to just use a veth-based topology to test the SW datapath. Thus drop all the code that supports this dual testing. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: mirror_test(): Allow exact count of packets	Petr Machata
	The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy, because besides the primary ICMP traffic, any amount of other service traffic is mirrored as well. mirror_test() accommodated this noisiness by giving the counters an allowance of several packets. But in the previous patch, where possible, counter taps were changed to match only on an exact ICMP message. At least in those cases, we can demand an exact number of packets to match. Where the tap is installed on a connective netdevice, the exact matching is not practical (though with u32, anything is possible). In those places, there should still be some leeway -- and probably bigger than before, because experience shows that these tests are very noisy. To that end, change mirror_test() so that it can be either called with an exact number to expect, or with an expression. Where leeway is needed, adjust callers to pass a ">= 10" instead of mere 10. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: do_test_span_dir_ips(): Install accurate taps	Petr Machata
	The mirroring selftests work by sending ICMP traffic between two hosts. Along the way, this traffic is mirrored to a gretap netdevice, and counter taps are then installed strategically along the path of the mirrored traffic to verify the mirroring took place. The problem with this is that besides mirroring the primary traffic, any other service traffic is mirrored as well. At the same time, because the tests need to work in HW-offloaded scenarios, the ability of the device to do arbitrary packet inspection should not be taken for granted. Most tests therefore simply use matchall, one uses flower to match on IP address. As a result, the selftests are noisy, because besides the primary ICMP traffic, any amount of other service traffic is mirrored as well. However, often the counter tap is installed at the remote end of the gretap tunnel. Since this is a SW-datapath scenario anyway, we can make the filter arbitrarily accurate. Thus in this patch, add parameters forward_type and backward_type to several mirroring test helpers, as some other helpers already have. Then change do_test_span_dir_ips() to instead of installing one generic tap and using it for test in both directions, install the tap for each direction separately, matching on the ICMP type given by these parameters. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror_gre_lag_lacp: Check counters at tunnel	Petr Machata
	The test works by sending packets through a tunnel, whence they are forwarded to a LAG. One of the LAG children is removed from the LAG prior to the exercise, and the test then counts how many packets pass through the other one. The issue with this is that it counts all packets, not just the encapsulated ones. So instead add a second gretap endpoint to receive the sent packets, and check reception counters there. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: lib: tc_rule_stats_get(): Move default to argument definition	Petr Machata
	The argument $dir has a fallback value of "ingress". Move the fallback from the usage site to the argument definition block to make the fact clearer. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: mirror: Drop direction argument from several functions	Petr Machata
	The argument is not used by these functions except to propagate it for ultimately no purpose. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	selftests: libs: Expand "$@" where possible	Petr Machata
	In some functions, argument-forwarding through "$@" without listing the individual arguments explicitly is fundamental to the operation of a function. E.g. xfail_on_veth() should be able to run various tests in the fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction mechanism. For functions such as simple_if_init(), $@ is a handy way to pass an array. In other functions, it's merely a mechanism to save some typing, which however ends up obscuring the real arguments and makes life hard for those that end up reading the code. This patch adds some of the implicit function arguments and correspondingly expands $@'s. In several cases this will come in handy as following patches adjust the parameter lists. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	Merge branch 'net-flash-modees-firmware' into main	David S. Miller
	Danielle Ratson says: ==================== Add ability to flash modules' firmware CMIS compliant modules such as QSFP-DD might be running a firmware that can be updated in a vendor-neutral way by exchanging messages between the host and the module as described in section 7.2.2 of revision 4.0 of the CMIS standard. According to the CMIS standard, the firmware update process is done using a CDB commands sequence. CDB (Command Data Block Message Communication) reads and writes are performed on memory map pages 9Fh-AFh according to the CMIS standard, section 8.12 of revision 4.0. Add a pair of new ethtool messages that allow: * User space to trigger firmware update of transceiver modules * The kernel to notify user space about the progress of the process The user interface is designed to be asynchronous in order to avoid RTNL being held for too long and to allow several modules to be updated simultaneously. The interface is designed with CMIS compliant modules in mind, but kept generic enough to accommodate future use cases, if these arise. The kernel interface that will implement the firmware update using CDB command will include 2 layers that will be added under ethtool: * The upper layer that will be triggered from the module layer, is cmis_ fw_update. * The lower one is cmis_cdb. In the future there might be more operations to implement using CDB commands. Therefore, the idea is to keep the cmis_cdb interface clean and the cmis_fw_update specific to the cdb commands handling it. The communication between the kernel and the driver will be done using two ethtool operations that enable reading and writing the transceiver module EEPROM. The operation ethtool_ops::get_module_eeprom_by_page, that is already implemented, will be used for reading from the EEPROM the CDB reply, e.g. reading module setting, state, etc. The operation ethtool_ops::set_module_eeprom_by_page, that is added in the current patchset, will be used for writing to the EEPROM the CDB command such as start firmware image, run firmware image, etc. Therefore in order for a driver to implement module flashing, that driver needs to implement the two functions mentioned above. Patchset overview: Patch #1-#2: Implement the EEPROM writing in mlxsw. Patch #3: Define the interface between the kernel and user space. Patch #4: Add ability to notify the flashing firmware progress. Patch #5: Veto operations during flashing. Patch #6: Add extended compliance codes. Patch #7: Add the cdb layer. Patch #8: Add the fw_update layer. Patch #9: Add ability to flash transceiver modules' firmware. v8: Patch #7: * In the ethtool_cmis_wait_for_cond() evaluate the condition once more to decide if the error code should be -ETIMEDOUT or something else. * s/netdev_err/netdev_err_once. v7: Patch #4: * Return -ENOMEM instead of PTR_ERR(attr) on ethnl_module_fw_flash_ntf_put_err(). Patch #9: * Fix Warning for not unlocking the spin_lock in the error flow on module_flash_fw_work_list_add(). * Avoid the fall-through on ethnl_sock_priv_destroy(). v6: * Squash some of the last patch to patch #5 and patch #9. Patch #3: * Add paragraph in .rst file. Patch #4: * Reserve '1' more place on SKB for NUL terminator in the error message string. * Add more prints on error flow, re-write the printing function and add ethnl_module_fw_flash_ntf_put_err(). * Change the communication method so notification will be sent in unicast instead of multicast. * Add new 'struct ethnl_module_fw_flash_ntf_params' that holds the relevant info for unicast communication and use it to send notification to the specific socket. * s/nla_put_u64_64bit/nla_put_uint/ Patch #7: * In ethtool_cmis_cdb_init(), Use 'const' for the 'params' parameter. Patch #8: * Add a list field to struct ethtool_module_fw_flash for module_fw_flash_work_list that will be presented in the next patch. * Move ethtool_cmis_fw_update() cleaning to a new function that will be represented in the next patch. * Move some of the fields in struct ethtool_module_fw_flash to a separate struct, so ethtool_cmis_fw_update() will get only the relevant parameters for it. * Edit the relevant functions to get the relevant params for them. * s/CMIS_MODULE_READY_MAX_DURATION_USEC/CMIS_MODULE_READY_MAX_DURATION_MSEC Patch #9: * Add a paragraph in the commit message. * Rename labels in module_flash_fw_schedule(). * Add info to genl_sk_priv_() and implement the relevant callbacks, in order to handle properly a scenario of closing the socket from user space before the work item was ended. Add a list the holds all the ethtool_module_fw_flash struct that corresponds to the in progress work items. * Add a new enum for the socket types. * Use both above to identify a flashing socket, add it to the list and when closing socket affect only the flashing type. * Create a new function that will get the work item instead of ethtool_cmis_fw_update(). * Edit the relevant functions to get the relevant params for them. * The new function will call the old ethtool_cmis_fw_update(), and do the cleaning, so the existence of the list should be completely isolated in module.c. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-28	ethtool: Add ability to flash transceiver modules' firmware	Danielle Ratson
	Add the ability to flash the modules' firmware by implementing the interface between the user space and the kernel. Example from a succeeding implementation: # ethtool --flash-module-firmware swp40 file test.bin Transceiver module firmware flashing started for device swp40 Transceiver module firmware flashing in progress for device swp40 Progress: 99% Transceiver module firmware flashing completed for device swp40 In addition, add infrastructure that allows modules to set socket-specific private data. This ensures that when a socket is closed from user space during the flashing process, the right socket halts sending notifications to user space until the work item is completed. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>