summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-05-17Merge tag 'dmaengine-fix-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "This has a bunch of idxd driver fixes, dmatest revert and bunch of smaller driver fixes: - a bunch of idxd potential mem leak fixes - dmatest revert for waiting for interrupt fix as that causes issue - a couple of ti k3 udma fixes for locking and cap_mask - mediatek deadlock fix and unused variable cleanup fix" * tag 'dmaengine-fix-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: mediatek: drop unused variable dmaengine: fsl-edma: Fix return code for unhandled interrupts dmaengine: mediatek: Fix a possible deadlock error in mtk_cqdma_tx_status() dmaengine: idxd: Fix ->poll() return value dmaengine: idxd: Refactor remove call with idxd_cleanup() helper dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe dmaengine: idxd: fix memory leak in error handling path of idxd_alloc dmaengine: idxd: Add missing cleanups in cleanup internals dmaengine: idxd: Add missing cleanup for early error out in idxd_setup_internals dmaengine: idxd: fix memory leak in error handling path of idxd_setup_groups dmaengine: idxd: fix memory leak in error handling path of idxd_setup_engines dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs dmaengine: ptdma: Move variable condition check to the first place and remove redundancy dmaengine: idxd: Fix allowing write() from different address spaces dmaengine: ti: k3-udma: Add missing locking dmaengine: ti: k3-udma: Use cap_mask directly from dma_device structure instead of a local copy dmaengine: Revert "dmaengine: dmatest: Fix dmatest waiting less when interrupted" dmaengine: idxd: cdev: Fix uninitialized use of sva in idxd_cdev_open
2025-05-17Merge tag 'phy-fixes-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy fixes from Vinod Koul: "A bunch of renesas fixes and few smaller fixes in other drivers: - Rensas fixes for unbind ole detection, irq, locking etc - tegra fixes for error handling at init and UTMI power states and stray unlock fix - rockchip missing assignment and pll output fixes - startfive usb host detection fixes" * tag 'phy-fixes-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: Fix error handling in tegra_xusb_port_init phy: renesas: rcar-gen3-usb2: Set timing registers only once phy: renesas: rcar-gen3-usb2: Assert PLL reset on PHY power off phy: renesas: rcar-gen3-usb2: Lock around hardware registers and driver data phy: renesas: rcar-gen3-usb2: Move IRQ request in probe phy: renesas: rcar-gen3-usb2: Fix role detection on unbind/bind phy: tegra: xusb: remove a stray unlock phy: phy-rockchip-samsung-hdptx: Fix PHY PLL output 50.25MHz error phy: starfive: jh7110-usb: Fix USB 2.0 host occasional detection failure phy: rockchip-samsung-dcphy: Add missing assignment phy: can-transceiver: Re-instate "mux-states" property presence check phy: qcom-qmp-ufs: check for mode type for phy setting phy: tegra: xusb: Use a bitmask for UTMI pad power state tracking
2025-05-17Merge tag 'soundwire-6.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fix from Vinod Koul: - Fix for irq domain creation race in the core * tag 'soundwire-6.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: soundwire: bus: Fix race on the creation of the IRQ domain
2025-05-17Merge tag 'irq-urgent-2025-05-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc irqchip driver fixes from Ingo Molnar: - Remove the MSI_CHIP_FLAG_SET_ACK flag from 5 irqchip drivers that did not require it - Fix IRQ handling delays in the riscv-imsic irqchip driver * tag 'irq-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/riscv-imsic: Start local sync timer on correct CPU irqchip: Drop MSI_CHIP_FLAG_SET_ACK from unsuspecting MSI drivers
2025-05-17Merge tag 'i2c-for-6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: - designware: cleanup properly on probe failure * tag 'i2c-for-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: designware: Fix an error handling path in i2c_dw_pci_probe()
2025-05-16Merge tag 'drm-fixes-2025-05-17' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly drm fixes, I'll be honest and say I think this is larger than I'd prefer at this point, the main blow out point is that xe has two larger fixes. One is a fix for active context utilisation reporting, it's for a reported regression and will end up in stable anyways, so I don't see any point in holding it up. The second is a fix for mixed cpu/gpu atomics, which are currently broken, but are also not something your average desktop/laptop user is going to hit in normal operation, and having them fixed now is better than threading them through stable later. Other than those, it's mostly the usual, a bunch of amdgpu randoms and a few other minor fixes. dma-buf: - Avoid memory reordering in fence handling meson: - Avoid integer overflow in mode-clock calculations panel-mipi-dbi: - Fix output with drm_client_setup_with_fourcc() amdgpu: - Fix CSA unmap - Fix MALL size reporting on GFX11.5 - AUX fix - DCN 3.5 fix - VRR fix - DP MST fix - DML 2.1 fixes - Silence DP AUX spam - DCN 4.0.1 cursor fix - VCN 4.0.5 fix ivpu: - Fix buffer size in debugfs code gpuvm: - Add timeslicing and allocation restriction for SVM xe: - Fix shrinker debugfs name - Add HW workaround to Xe2 - Fix SVM when mixing GPU and CPU atomics - Fix per client engine utilization due to active contexts not saving timestamp with lite restore enabled" * tag 'drm-fixes-2025-05-17' of https://gitlab.freedesktop.org/drm/kernel: (24 commits) drm/xe: Add WA BB to capture active context utilization drm/xe: Save the gt pointer in lrc and drop the tile drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value drm/xe: Timeslice GPU on atomic SVM fault drm/gpusvm: Add timeslicing support to GPU SVM drm/xe: Strict migration policy for atomic SVM faults drm/gpusvm: Introduce devmem_only flag for allocation drm/xe/xe2hpg: Add Wa_22021007897 drm/amdgpu: read back register after written for VCN v4.0.5 Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor" dma-buf: insert memory barrier before updating num_fences drm/xe: Fix the gem shrinker name drm/amd/display: Avoid flooding unnecessary info messages drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp drm/amd/display: check stream id dml21 wrapper to get plane_id drm/amd/display: fix link_set_dpms_off multi-display MST corner case drm/amd/display: Defer BW-optimization-blocked DRR adjustments Revert: "drm/amd/display: Enable urgent latency adjustment on DCN35" drm/amd/display: Correct the reply value when AUX write incomplete drm/amdgpu: fix incorrect MALL size for GFX1151 ...
2025-05-16Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "Fix to zone block devices to make the maximum segment count match what the block layer is capable of" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer
2025-05-16Merge tag 'block-6.15-20250515' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - fixes for atomic writes (Alan Adamson) - fixes for polled CQs in nvmet-epf (Damien Le Moal) - fix for polled CQs in nvme-pci (Keith Busch) - fix compile on odd configs that need to be forced to inline (Kees Cook) - one more quirk (Ilya Guterman) - Fix for missing allocation of an integrity buffer for some cases - Fix for a regression with ublk command cancelation * tag 'block-6.15-20250515' of git://git.kernel.dk/linux: ublk: fix dead loop when canceling io command nvme-pci: add NVME_QUIRK_NO_DEEPEST_PS quirk for SOLIDIGM P44 Pro nvme: all namespaces in a subsystem must adhere to a common atomic write size nvme: multipath: enable BLK_FEAT_ATOMIC_WRITES for multipathing nvmet: pci-epf: remove NVMET_PCI_EPF_Q_IS_SQ nvmet: pci-epf: improve debug message nvmet: pci-epf: cleanup nvmet_pci_epf_raise_irq() nvmet: pci-epf: do not fall back to using INTX if not supported nvmet: pci-epf: clear completion queue IRQ flag on delete nvme-pci: acquire cq_poll_lock in nvme_poll_irqdisable nvme-pci: make nvme_pci_npages_prp() __always_inline block: always allocate integrity buffer when required
2025-05-16Merge tag 'acpi-6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Fix ACPI PPTT parsing code to address a regression introduced recently and add more sanity checking of data supplied by the platform firmware to avoid using invalid data (Jeremy Linton)" * tag 'acpi-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PPTT: Fix processor subtable walk
2025-05-16Merge tag 'spi-fix-v6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small driver specific fixes, the most substantial one being the Tegra one which fixes spurious errors with default delays for chip select hold times" * tag 'spi-fix-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-sun4i: fix early activation spi: tegra114: Use value to check for invalid delays spi: loopback-test: Do not split 1024-byte hexdumps
2025-05-16Merge tag 'regulator-fix-v6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "This fixes an invalid memory access in the MAX20086 driver which could occur during error handling for failed probe due to a hidden use of devres in the core DT parsing code" * tag 'regulator-fix-v6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: max20086: fix invalid memory access
2025-05-16Merge tag 'gpio-fixes-for-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix an interrupt storm on system wake-up in gpio-pca953x - fix an out-of-bounds write in gpio-virtuser - update MAINTAINERS with an entry for the sloppy logic analyzer * tag 'gpio-fixes-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: virtuser: fix potential out-of-bound write gpio: pca953x: fix IRQ storm on system wake up MAINTAINERS: add me as maintainer for the gpio sloppy logic analyzer
2025-05-16Merge tag 'sound-6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A handful small fixes. The only significant change is the fix for MIDI 2.0 UMP handling in ALSA sequencer, but as MIDI 2.0 stuff is still new and rarely used, the impact should be pretty limited. Other than that, quirks for USB-audio and a few cosmetic fixes and changes in drivers that should be safe to apply" * tag 'sound-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add sample rate quirk for Microdia JP001 USB Camera ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2() ALSA: sh: SND_AICA should depend on SH_DMA_API ALSA: usb-audio: Add sample rate quirk for Audioengine D1 ALSA: ump: Fix a typo of snd_ump_stream_msg_device_info ALSA/hda: intel-sdw-acpi: Correct sdw_intel_acpi_scan() function parameter ALSA: seq: Fix delivery of UMP events to group ports
2025-05-16Merge tag 'drm-xe-fixes-2025-05-15-1' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Core Changes: - Add timeslicing and allocation restriction for SVM Driver Changes: - Fix shrinker debugfs name - Add HW workaround to Xe2 - Fix SVM when mixing GPU and CPU atomics - Fix per client engine utilization due to active contexts not saving timestamp with lite restore enabled. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/qil4scyn6ucnt43u5ju64bi7r7n5r36k4pz5rsh2maz7isle6g@lac3jpsjrrvs
2025-05-16Merge tag 'drm-misc-fixes-2025-05-15' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: dma-buf: - Avoid memory reordering in fence handling ivpu: - Fix buffer size in debugfs code meson: - Avoid integer overflow in mode-clock calculations panel-mipi-dbi: - Fix output with drm_client_setup_with_fourcc() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250515125534.GA41174@linux.fritz.box
2025-05-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Four small fixes for crashes: - Double free in rxe - UAF in irdma from early freeing the rf - Off by one undoing the IRQ allocations during error unwind in irdma - Another race with device rename and uevent generation. uevents accesses the struct device name and UAF when it is changed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Fix "KASAN: slab-use-after-free Read in ib_register_device" problem ice, irdma: fix an off by one in error handling code irdma: free iwdev->rf after removing MSI-X RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug
2025-05-15Merge tag 'hid-for-linus-2025051501' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - fix a few potential memory leaks in the wacom driver (Qasim Ijaz) - AMD SFH fixes when there is only one SRA sensor (Mario Limonciello) - HID-BPF dispatch UAF fix that happens on removal of the Logitech DJ receiver (Rong Zhang) - various minor fixes and usual device ID additions * tag 'hid-for-linus-2025051501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: bpf: abort dispatch if device destroyed HID: quirks: Add ADATA XPG alpha wireless mouse support HID: hid-steam: Remove the unused variable connected HID: amd_sfh: Avoid clearing reports for SRA sensor HID: amd_sfh: Fix SRA sensor when it's the only sensor HID: wacom: fix shift OOB in kfifo allocation for zero pktlen HID: uclogic: Add NULL check in uclogic_input_configured() HID: wacom: fix memory leak on size mismatch in wacom_wac_queue_flush() HID: wacom: handle kzalloc() allocation failure in wacom_wac_queue_flush() HID: thrustmaster: fix memory leak in thrustmaster_interrupts() HID: hid-appletb-kbd: Fix wrong date and kernel version in sysfs interface docs HID: bpf: fix BTN_STYLUS for the XP Pen ACK05 remote
2025-05-15Merge tag 'net-6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Bluetooth and wireless. A few more fixes for the locking changes trickling in. Nothing too alarming, I suspect those will continue for another release. Other than that things are slowing down nicely. Current release - fix to a fix: - Bluetooth: hci_event: use key encryption size when its known - tools: ynl-gen: allow multi-attr without nested-attributes again Current release - regressions: - locking fixes: - lock lower level devices when updating features - eth: bnxt_en: bring back rtnl_lock() in the bnxt_open() path - devmem: fix panic when Netlink socket closes after module unload Current release - new code bugs: - eth: txgbe: fixes for FW communication on new AML devices Previous releases - always broken: - sched: flush gso_skb list too during ->change(), avoid potential null-deref on reconfig - wifi: mt76: disable NAPI on driver removal - hv_netvsc: fix error 'nvsp_rndis_pkt_complete error status: 2'" * tag 'net-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits) net: devmem: fix kernel panic when netlink socket close after module unload tsnep: fix timestamping with a stacked DSA driver net/tls: fix kernel panic when alloc_page failed bnxt_en: bring back rtnl_lock() in the bnxt_open() path mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devices wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request octeontx2-pf: Do not reallocate all ntuple filters wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl wifi: mt76: disable napi on driver removal Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer() hv_netvsc: Remove rmsg_pgcnt hv_netvsc: Preserve contiguous PFN grouping in the page buffer array hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messages Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges octeontx2-af: Fix CGX Receive counters net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capability net: libwx: Fix FW mailbox unknown command net: libwx: Fix FW mailbox reply timeout net: txgbe: Fix to calculate EEPROM checksum for AML devices octeontx2-pf: macsec: Fix incorrect max transmit size in TX secy ...
2025-05-15ublk: fix dead loop when canceling io commandMing Lei
Commit: f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd") adds a request state check in ublk_cancel_cmd(), and if the request is started, skips canceling this uring_cmd. However, the current uring_cmd may be in ACTIVE state, without block request coming to the uring command. Meantime, if the cached request in tag_set.tags[tag] has been delivered to ublk server and reycycled, then this uring_cmd can't be canceled. ublk requests are aborted in ublk char device release handler, which depends on canceling all ACTIVE uring_cmd. So it causes a dead loop. Fix this issue by not taking a stale request into account when canceling uring_cmd in ublk_cancel_cmd(). Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-block/mruqwpf4tqenkbtgezv5oxwq7ngyq24jzeyqy4ixzvivatbbxv@4oh2wzz4e6qn/ Fixes: f40139fde527 ("ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250515162601.77346-1-ming.lei@redhat.com [axboe: rewording of commit message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-15tsnep: fix timestamping with a stacked DSA driverGerhard Engleder
This driver is susceptible to a form of the bug explained in commit c26a2c2ddc01 ("gianfar: Fix TX timestamping with a stacked DSA driver") and in Documentation/networking/timestamping.rst section "Other caveats for MAC drivers", specifically it timestamps any skb which has SKBTX_HW_TSTAMP, and does not consider if timestamping has been enabled in adapter->hwtstamp_config.tx_type. Evaluate the proper TX timestamping condition only once on the TX path (in tsnep_xmit_frame_ring()) and store the result in an additional TX entry flag. Evaluate the new TX entry flag in the TX confirmation path (in tsnep_tx_poll()). This way SKBTX_IN_PROGRESS is set by the driver as required, but never evaluated. SKBTX_IN_PROGRESS shall not be evaluated as it can be set by a stacked DSA driver and evaluating it would lead to unwanted timestamps. Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250514195657.25874-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15Merge tag 'wireless-2025-05-15' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple of stragglers: - mac80211: fix syzbot/ubsan in scan counted-by - mt76: fix NAPI handling on driver remove - mt67: fix multicast/ipv6 receive * tag 'wireless-2025-05-15' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtbl wifi: mt76: disable napi on driver removal ==================== Link: https://patch.msgid.link/20250515121749.61912-4-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15bnxt_en: bring back rtnl_lock() in the bnxt_open() pathMichael Chan
Error recovery, PCIe AER, resume, and TX timeout will invoke bnxt_open() with netdev_lock only. This will cause RTNL assert failure in netif_set_real_num_tx_queues(), netif_set_real_num_tx_queues(), and netif_set_real_num_tx_queues(). Example error recovery assert: RTNL: assertion failed at net/core/dev.c (3178) WARNING: CPU: 3 PID: 3392 at net/core/dev.c:3178 netif_set_real_num_tx_queues+0x1fd/0x210 Call Trace: <TASK> ? __pfx_bnxt_msix+0x10/0x10 [bnxt_en] __bnxt_open_nic+0x1ef/0xb20 [bnxt_en] bnxt_open+0xda/0x130 [bnxt_en] bnxt_fw_reset_task+0x21f/0x780 [bnxt_en] process_scheduled_works+0x9d/0x400 For now, bring back rtnl_lock() in all these code paths that can invoke bnxt_open(). In the bnxt_queue_start() error path, we don't have rtnl_lock held so we just change it to call netif_close() instead of bnxt_reset_task() for simplicity. This error path is unlikely so it should be fine. Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250514062908.2766677-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15mlxsw: spectrum_router: Fix use-after-free when deleting GRE net devicesIdo Schimmel
The driver only offloads neighbors that are constructed on top of net devices registered by it or their uppers (which are all Ethernet). The device supports GRE encapsulation and decapsulation of forwarded traffic, but the driver will not offload dummy neighbors constructed on top of GRE net devices as they are not uppers of its net devices: # ip link add name gre1 up type gre tos inherit local 192.0.2.1 remote 198.51.100.1 # ip neigh add 0.0.0.0 lladdr 0.0.0.0 nud noarp dev gre1 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 NOARP (Note that the neighbor is not marked with 'offload') When the driver is reloaded and the existing configuration is replayed, the driver does not perform the same check regarding existing neighbors and offloads the previously added one: # devlink dev reload pci/0000:01:00.0 $ ip neigh show dev gre1 nud noarp 0.0.0.0 lladdr 0.0.0.0 offload NOARP If the neighbor is later deleted, the driver will ignore the notification (given the GRE net device is not its upper) and will therefore keep referencing freed memory, resulting in a use-after-free [1] when the net device is deleted: # ip neigh del 0.0.0.0 lladdr 0.0.0.0 dev gre1 # ip link del dev gre1 Fix by skipping neighbor replay if the net device for which the replay is performed is not our upper. [1] BUG: KASAN: slab-use-after-free in mlxsw_sp_neigh_entry_update+0x1ea/0x200 Read of size 8 at addr ffff888155b0e420 by task ip/2282 [...] Call Trace: <TASK> dump_stack_lvl+0x6f/0xa0 print_address_description.constprop.0+0x6f/0x350 print_report+0x108/0x205 kasan_report+0xdf/0x110 mlxsw_sp_neigh_entry_update+0x1ea/0x200 mlxsw_sp_router_rif_gone_sync+0x2a8/0x440 mlxsw_sp_rif_destroy+0x1e9/0x750 mlxsw_sp_netdevice_ipip_ol_event+0x3c9/0xdc0 mlxsw_sp_router_netdevice_event+0x3ac/0x15e0 notifier_call_chain+0xca/0x150 call_netdevice_notifiers_info+0x7f/0x100 unregister_netdevice_many_notify+0xc8c/0x1d90 rtnl_dellink+0x34e/0xa50 rtnetlink_rcv_msg+0x6fb/0xb70 netlink_rcv_skb+0x131/0x360 netlink_unicast+0x426/0x710 netlink_sendmsg+0x75a/0xc20 __sock_sendmsg+0xc1/0x150 ____sys_sendmsg+0x5aa/0x7b0 ___sys_sendmsg+0xfc/0x180 __sys_sendmsg+0x121/0x1b0 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: 8fdb09a7674c ("mlxsw: spectrum_router: Replay neighbours when RIF is made") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/c53c02c904fde32dad484657be3b1477884e9ad6.1747225701.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15irqchip/riscv-imsic: Start local sync timer on correct CPUAndrew Bresticker
When starting the local sync timer to synchronize the state of a remote CPU it should be added on the CPU to be synchronized, not the initiating CPU. This results in interrupt delivery being delayed until the timer eventually runs (due to another mask/unmask/migrate operation) on the target CPU. Fixes: 0f67911e821c ("irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector") Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/all/20250514171320.3494917-1-abrestic@rivosinc.com
2025-05-15dmaengine: mediatek: drop unused variableVinod Koul
Commit 157ae5ffd76a dmaengine: mediatek: Fix a possible deadlock error in mtk_cqdma_tx_status() fixed locks but kept unused varibale leading to warning and build failure (due to warning treated as errors) drivers/dma/mediatek/mtk-cqdma.c: In function 'mtk_cqdma_find_active_desc': drivers/dma/mediatek/mtk-cqdma.c:423:23: error: unused variable 'flags' [-Werror=unused-variable] 423 | unsigned long flags; | ^~~~~ Fix by dropping this unused flag Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 157ae5ffd76a ("dmaengine: mediatek: Fix a possible deadlock error in mtk_cqdma_tx_status()") Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-05-15octeontx2-pf: Do not reallocate all ntuple filtersSubbaraya Sundeep
If ntuple filters count is modified followed by unicast filters count using devlink then the ntuple count set by user is ignored and all the ntuple filters are being reallocated. Fix this by storing the ntuple count set by user. Without this patch, say if user tries to modify ntuple count as 8 followed by ucast filter count as 4 using devlink commands then ntuple count is being reverted to default value 16 i.e, not retaining user set value 8. Fixes: 39c469188b6d ("octeontx2-pf: Add ucast filter count configurability via devlink.") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747054357-5850-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-15wifi: mt76: mt7925: fix missing hdr_trans_tlv command for broadcast wtblMing Yen Hsieh
Ensure that the hdr_trans_tlv command is included in the broadcast wtbl to prevent the IPv6 and multicast packet from being dropped by the chip. Cc: stable@vger.kernel.org Fixes: cb1353ef3473 ("wifi: mt76: mt7925: integrate *mlo_sta_cmd and *sta_cmd") Reported-by: Benjamin Xiao <fossben@pm.me> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://lore.kernel.org/lkml/EmWnO5b-acRH1TXbGnkx41eJw654vmCR-8_xMBaPMwexCnfkvKCdlU5u19CGbaapJ3KRu-l3B-tSUhf8CCQwL0odjo6Cd5YG5lvNeB-vfdg=@pm.me/ Link: https://patch.msgid.link/20250509010421.403022-1-mingyen.hsieh@mediatek.com Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-05-15wifi: mt76: disable napi on driver removalFedor Pchelkin
A warning on driver removal started occurring after commit 9dd05df8403b ("net: warn if NAPI instance wasn't shut down"). Disable tx napi before deleting it in mt76_dma_cleanup(). WARNING: CPU: 4 PID: 18828 at net/core/dev.c:7288 __netif_napi_del_locked+0xf0/0x100 CPU: 4 UID: 0 PID: 18828 Comm: modprobe Not tainted 6.15.0-rc4 #4 PREEMPT(lazy) Hardware name: ASUS System Product Name/PRIME X670E-PRO WIFI, BIOS 3035 09/05/2024 RIP: 0010:__netif_napi_del_locked+0xf0/0x100 Call Trace: <TASK> mt76_dma_cleanup+0x54/0x2f0 [mt76] mt7921_pci_remove+0xd5/0x190 [mt7921e] pci_device_remove+0x47/0xc0 device_release_driver_internal+0x19e/0x200 driver_detach+0x48/0x90 bus_remove_driver+0x6d/0xf0 pci_unregister_driver+0x2e/0xb0 __do_sys_delete_module.isra.0+0x197/0x2e0 do_syscall_64+0x7b/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Tested with mt7921e but the same pattern can be actually applied to other mt76 drivers calling mt76_dma_cleanup() during removal. Tx napi is enabled in their *_dma_init() functions and only toggled off and on again inside their suspend/resume/reset paths. So it should be okay to disable tx napi in such a generic way. Found by Linux Verification Center (linuxtesting.org). Fixes: 2ac515a5d74f ("mt76: mt76x02: use napi polling for tx cleanup") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com> Link: https://patch.msgid.link/20250506115540.19045-1-pchelkin@ispras.ru Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-05-14Drivers: hv: vmbus: Remove vmbus_sendpacket_pagebuffer()Michael Kelley
With the netvsc driver changed to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(), the latter has no remaining callers. Remove it. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-6-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Remove rmsg_pgcntMichael Kelley
init_page_array() now always creates a single page buffer array entry for the rndis message, even if the rndis message crosses a page boundary. As such, the number of page buffer array entries used for the rndis message must no longer be tracked -- it is always just 1. Remove the rmsg_pgcnt field and use "1" where the value is needed. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-5-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Preserve contiguous PFN grouping in the page buffer arrayMichael Kelley
Starting with commit dca5161f9bd0 ("hv_netvsc: Check status in SEND_RNDIS_PKT completion message") in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher level network protocols presumably recover and resend the packet so there is no functional error, but performance is slightly impacted. Commit dca5161f9bd0 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux. The root cause of the problem is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that two rndis header PFNs be in a single "GPA range" data struture, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error. The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre- allocated send buffer slot because the pre-allocated slots don't have page crossing issues. After commit 14ad6ed30a10 ("net: allow small head cache usage with large MAX_SKB_FRAGS values") in the 6.14-rc4 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text. Commit 14ad6ed30a10 subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed30a10 are perfectly valid -- they just had the side effect of making the netvsc bug more prominent. Current code in init_page_array() creates a separate page buffer array entry for each PFN required to identify the data to be transmitted. Contiguous PFNs get separate entries in the page buffer array, and any information about contiguity is lost. Fix the core issue by having init_page_array() construct the page buffer array to represent contiguous ranges rather than individual pages. When these ranges are subsequently passed to netvsc_build_mpb_array(), it can build GPA ranges that contain multiple PFNs, as required to avoid the error "nvsp_rndis_pkt_complete error status: 2". If instead the network packet is sent by copying into a pre-allocated send buffer slot, the copy proceeds using the contiguous ranges rather than individual pages, but the result of the copying is the same. Also fix rndis_filter_send_request() to construct a contiguous range, since it has its own page buffer array. This change has a side benefit in CoCo VMs in that netvsc_dma_map() calls dma_map_single() on each contiguous range instead of on each page. This results in fewer calls to dma_map_single() but on larger chunks of memory, which should reduce contention on the swiotlb. Since the page buffer array now contains one entry for each contiguous range instead of for each individual page, the number of entries in the array can be reduced, saving 208 bytes of stack space in netvsc_xmit() when MAX_SKG_FRAGS has the default value of 17. [1] https://bugzilla.kernel.org/show_bug.cgi?id=217503 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217503 Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-4-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14hv_netvsc: Use vmbus_sendpacket_mpb_desc() to send VMBus messagesMichael Kelley
netvsc currently uses vmbus_sendpacket_pagebuffer() to send VMBus messages. This function creates a series of GPA ranges, each of which contains a single PFN. However, if the rndis header in the VMBus message crosses a page boundary, the netvsc protocol with the host requires that both PFNs for the rndis header must be in a single "GPA range" data structure, which isn't possible with vmbus_sendpacket_pagebuffer(). As the first step in fixing this, add a new function netvsc_build_mpb_array() to build a VMBus message with multiple GPA ranges, each of which may contain multiple PFNs. Use vmbus_sendpacket_mpb_desc() to send this VMBus message to the host. There's no functional change since higher levels of netvsc don't maintain or propagate knowledge of contiguous PFNs. Based on its input, netvsc_build_mpb_array() still produces a separate GPA range for each PFN and the behavior is the same as with vmbus_sendpacket_pagebuffer(). But the groundwork is laid for a subsequent patch to provide the necessary grouping. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-3-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple rangesMichael Kelley
vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver and is hardcoded to create a single GPA range. To allow it to also be used by the netvsc driver to create multiple GPA ranges, no longer hardcode as having a single GPA range. Allow the calling driver to specify the rangecount in the supplied descriptor. Update the storvsc driver to reflect this new approach. Cc: <stable@vger.kernel.org> # 6.1.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14octeontx2-af: Fix CGX Receive countersHariprasad Kelam
Each CGX block supports 4 logical MACs (LMACS). Receive counters CGX_CMR_RX_STAT0-8 are per LMAC and CGX_CMR_RX_STAT9-12 are per CGX. Due a bug in previous patch, stale Per CGX counters values observed. Fixes: 66208910e57a ("octeontx2-af: Support to retrieve CGX LMAC stats") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20250513071554.728922-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: ethernet: mtk_eth_soc: fix typo for declaration MT7988 ESW capabilityBo-Cun Chen
Since MTK_ESW_BIT is a bit number rather than a bitmap, it causes MTK_HAS_CAPS to produce incorrect results. This leads to the ETH driver not declaring MAC capabilities correctly for the MT7988 ESW. Fixes: 445eb6448ed3 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/b8b37f409d1280fad9c4d32521e6207f63cd3213.1747110258.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: libwx: Fix FW mailbox unknown commandJiawen Wu
For the new SW-FW interaction, missing the error return if there is an unknown command. It causes the driver to mistakenly believe that the interaction is complete. This problem occurs when new driver is paired with old firmware, which does not support the new mailbox commands. Fixes: 2e5af6b2ae85 ("net: txgbe: Add basic support for new AML devices") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/64DBB705D35A0016+20250513021009.145708-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: libwx: Fix FW mailbox reply timeoutJiawen Wu
For the new SW-FW interaction, the timeout waiting for the firmware to return is too short. So that some mailbox commands cannot be completed. Use the 'timeout' parameter instead of fixed timeout value for flexible configuration. Fixes: 2e5af6b2ae85 ("net: txgbe: Add basic support for new AML devices") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/5D5BDE3EA501BDB8+20250513021009.145708-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14net: txgbe: Fix to calculate EEPROM checksum for AML devicesJiawen Wu
In the new firmware version, the shadow ram reserves some space to store I2C information, so the checksum calculation needs to skip this section. Otherwise, the driver will fail to probe because the invalid EEPROM checksum. Fixes: 2e5af6b2ae85 ("net: txgbe: Add basic support for new AML devices") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1C6BF7A937237F5A+20250513021009.145708-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-14octeontx2-pf: macsec: Fix incorrect max transmit size in TX secySubbaraya Sundeep
MASCEC hardware block has a field called maximum transmit size for TX secy. Max packet size going out of MCS block has be programmed taking into account full packet size which has L2 header,SecTag and ICV. MACSEC offload driver is configuring max transmit size as macsec interface MTU which is incorrect. Say with 1500 MTU of real device, macsec interface created on top of real device will have MTU of 1468(1500 - (SecTag + ICV)). This is causing packets from macsec interface of size greater than or equal to 1468 are not getting transmitted out because driver programmed max transmit size as 1468 instead of 1514(1500 + ETH_HDR_LEN). Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1747053756-4529-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tpm: tis: Double the timeout B to 4sMichal Suchanek
With some Infineon chips the timeouts in tpm_tis_send_data (both B and C) can reach up to about 2250 ms. Timeout C is retried since commit de9e33df7762 ("tpm, tpm_tis: Workaround failed command reception on Infineon devices") Timeout B still needs to be extended. The problem is most commonly encountered with context related operation such as load context/save context. These are issued directly by the kernel, and there is no retry logic for them. When a filesystem is set up to use the TPM for unlocking the boot fails, and restarting the userspace service is ineffective. This is likely because ignoring a load context/save context result puts the real TPM state and the TPM state expected by the kernel out of sync. Chips known to be affected: tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54) Description: SLB9672 Firmware Revision: 15.22 tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22) Firmware Revision: 7.83 tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1A, rev-id 16) Firmware Revision: 5.63 Link: https://lore.kernel.org/linux-integrity/Z5pI07m0Muapyu9w@kitsune.suse.cz/ Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-15char: tpm: tpm-buf: Add sanity check fallback in read helpersPurva Yeshi
Fix Smatch-detected issue: drivers/char/tpm/tpm-buf.c:208 tpm_buf_read_u8() error: uninitialized symbol 'value'. drivers/char/tpm/tpm-buf.c:225 tpm_buf_read_u16() error: uninitialized symbol 'value'. drivers/char/tpm/tpm-buf.c:242 tpm_buf_read_u32() error: uninitialized symbol 'value'. Zero-initialize the return values in tpm_buf_read_u8(), tpm_buf_read_u16(), and tpm_buf_read_u32() to guard against uninitialized data in case of a boundary overflow. Add defensive initialization ensures the return values are always defined, preventing undefined behavior if the unexpected happens. Signed-off-by: Purva Yeshi <purvayeshi550@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-15tpm: Mask TPM RC in tpm2_start_auth_session()Jarkko Sakkinen
tpm2_start_auth_session() does not mask TPM RC correctly from the callers: [ 28.766528] tpm tpm0: A TPM error (2307) occurred start auth session Process TPM RCs inside tpm2_start_auth_session(), and map them to POSIX error codes. Cc: stable@vger.kernel.org # v6.10+ Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions") Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Closes: https://lore.kernel.org/linux-integrity/Z_NgdRHuTKP6JK--@gondor.apana.org.au/ Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-14drm/xe: Add WA BB to capture active context utilizationUmesh Nerlige Ramappa
Context Timestamp (CTX_TIMESTAMP) in the LRC accumulates the run ticks of the context, but only gets updated when the context switches out. In order to check how long a context has been active before it switches out, two things are required: (1) Determine if the context is running: To do so, we program the WA BB to set an initial value for CTX_TIMESTAMP in the LRC. The value chosen is 1 since 0 is the initial value when the LRC is initialized. During a query, we just check for this value to determine if the context is active. If the context switched out, it would overwrite this location with the actual CTX_TIMESTAMP MMIO value. Note that WA BB runs as the last part of the context restore, so reusing this LRC location will not clobber anything. (2) Calculate the time that the context has been active for: The CTX_TIMESTAMP ticks only when the context is active. If a context is active, we just use the CTX_TIMESTAMP MMIO as the new value of utilization. While doing so, we need to read the CTX_TIMESTAMP MMIO for the specific engine instance. Since we do not know which instance the context is running on until it is scheduled, we also read the ENGINE_ID MMIO in the WA BB and store it in the PPHSWP. Using the above 2 instructions in a WA BB, capture active context utilization. v2: (Matt Brost) - This breaks TDR, fix it by saving the CTX_TIMESTAMP register "drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value" - Drop tile from LRC if using gt "drm/xe: Save the gt pointer in LRC and drop the tile" v3: - Remove helpers for bb_per_ctx_ptr (Matt) - Add define for context active value (Matt) - Use 64 bit CTX TIMESTAMP for platforms that support it. For platforms that don't, live with the rare race. (Matt, Lucas) - Convert engine id to hwe and get the MMIO value (Lucas) - Correct commit message on when WA BB runs (Lucas) v4: - s/GRAPHICS_VER(...)/xe->info.has_64bit_timestamp/ (Matt) - Drop support for active utilization on a VF (CI failure) - In xe_lrc_init ensure the lrc value is 0 to begin with (CI regression) v5: - Minor checkpatch fix - Squash into previous commit and make TDR use 32-bit time - Update code comment to match commit msg Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4532 Cc: <stable@vger.kernel.org> # v6.13+ Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-8-umesh.nerlige.ramappa@intel.com (cherry picked from commit 82b98cadb01f63cdb159e596ec06866d00f8e8c7) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/xe: Save the gt pointer in lrc and drop the tileUmesh Nerlige Ramappa
Save the gt pointer in the lrc so that it can used for gt based helpers. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-7-umesh.nerlige.ramappa@intel.com (cherry picked from commit 741d3ef8b8b88fab2729ca89de1180e49bc9cef0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC valueUmesh Nerlige Ramappa
For determining actual job execution time, save the current value of the CTX_TIMESTAMP register rather than the value saved in LRC since the current register value is the closest to the start time of the job. v2: Define MI_STORE_REGISTER_MEM to fix compile error v3: Place MI_STORE_REGISTER_MEM sorted by MI_INSTR (Lucas) Fixes: 65921374c48f ("drm/xe: Emit ctx timestamp copy in ring ops") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-6-umesh.nerlige.ramappa@intel.com (cherry picked from commit 38b14233e5deff51db8faec287b4acd227152246) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/xe: Timeslice GPU on atomic SVM faultMatthew Brost
Ensure GPU can make forward progress on an atomic SVM GPU fault by giving the GPU a timeslice of 5ms v2: - Reduce timeslice to 5ms - Double timeslice on retry - Split out GPU SVM changes into independent patch v5: - Double timeslice in a few more places Fixes: 2f118c949160 ("drm/xe: Add SVM VRAM migration") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250512135500.1405019-5-matthew.brost@intel.com (cherry picked from commit a5d8d3be1dea8154edbbea481081469627665659) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/gpusvm: Add timeslicing support to GPU SVMMatthew Brost
Add timeslicing support to GPU SVM which will guarantee the GPU a minimum execution time on piece of physical memory before migration back to CPU. Intended to implement strict migration policies which require memory to be in a certain placement for correct execution. Required for shared CPU and GPU atomics on certain devices. Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250512135500.1405019-4-matthew.brost@intel.com (cherry picked from commit 8dc1812b5b3a42311d28eb385eed88e2053ad3cb) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/xe: Strict migration policy for atomic SVM faultsMatthew Brost
Mixing GPU and CPU atomics does not work unless a strict migration policy of GPU atomics must be device memory. Enforce a policy of must be in VRAM with a retry loop of 3 attempts, if retry loop fails abort fault. Removing always_migrate_to_vram modparam as we now have real migration policy. v2: - Only retry migration on atomics - Drop alway migrate modparam v3: - Only set vram_only on DGFX (Himal) - Bail on get_pages failure if vram_only and retry count exceeded (Himal) - s/vram_only/devmem_only - Update xe_svm_range_is_valid to accept devmem_only argument v4: - Fix logic bug get_pages failure v5: - Fix commit message (Himal) - Mention removing always_migrate_to_vram in commit message (Lucas) - Fix xe_svm_range_is_valid to check for devmem pages - Bail on devmem_only && !migrate_devmem (Thomas) v6: - Add READ_ONCE barriers for opportunistic checks (Thomas) - Pair READ_ONCE with WRITE_ONCE (Thomas) v7: - Adjust comments (Thomas) Fixes: 2f118c949160 ("drm/xe: Add SVM VRAM migration") Cc: stable@vger.kernel.org Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250512135500.1405019-3-matthew.brost@intel.com (cherry picked from commit a9ac0fa455b050d03e3032501368048fb284d318) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/gpusvm: Introduce devmem_only flag for allocationHimal Prasad Ghimiray
This commit adds a new flag, devmem_only, to the drm_gpusvm structure. The purpose of this flag is to ensure that the get_pages function allocates memory exclusively from the device's memory. If the allocation from device memory fails, the function will return an -EFAULT error. Required for shared CPU and GPU atomics on certain devices. v3: - s/vram_only/devmem_only/ Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250512135500.1405019-2-matthew.brost@intel.com (cherry picked from commit 8a9b978ebd47df9e0694c34748c2d6fa0c31eb4d) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-14drm/xe/xe2hpg: Add Wa_22021007897Aradhya Bhatia
Add Wa_22021007897 for the Xe2_HPG (graphics version: 20.01) IP. It is a permanent workaround, and applicable on all the steppings. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://lore.kernel.org/r/20250512065004.2576-1-aradhya.bhatia@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit e5c13e2c505b73a8667ef9a0fd5cbd4227e483e6) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>