summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2024-10-06platform/x86/intel: power-domains: Add Diamond Rapids supportSrinivas Pandruvada
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to tpmi_cpu_ids to support domaid id mappings. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20241003215554.3013807-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86: ISST: Add Diamond Rapids to support listSrinivas Pandruvada
Add Diamond Rapids (INTEL_PANTHERCOVE_X) to SST support list by adding to isst_cpu_ids. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20241003215554.3013807-2-srinivas.pandruvada@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-06platform/x86:intel/pmc: Disable ACPI PM Timer disabling on Sky and Kaby LakeHans de Goede
There have been multiple reports that the ACPI PM Timer disabling is causing Sky and Kaby Lake systems to hang on all suspend (s2idle, s3, hibernate) methods. Remove the acpi_pm_tmr_ctl_offset and acpi_pm_tmr_disable_bit settings from spt_reg_map to disable the ACPI PM Timer disabling on Sky and Kaby Lake to fix the hang on suspend. Fixes: e86c8186d03a ("platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Closes: https://lore.kernel.org/linux-pm/18784f62-91ff-4d88-9621-6c88eb0af2b5@molgen.mpg.de/ Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219346 Cc: Marek Maslanka <mmaslanka@google.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Todd Brandt <todd.e.brandt@intel.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360/0596KF Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20241003202614.17181-2-hdegoede@redhat.com
2024-10-06platform/x86: dell-laptop: Do not fail when encountering unsupported batteriesArmin Wolf
If the battery hook encounters a unsupported battery, it will return an error. This in turn will cause the battery driver to automatically unregister the battery hook. On machines with multiple batteries however, this will prevent the battery hook from handling the primary battery, since it will always get unregistered upon encountering one of the unsupported batteries. Fix this by simply ignoring unsupported batteries. Reviewed-by: Pali Rohár <pali@kernel.org> Fixes: ab58016c68cc ("platform/x86:dell-laptop: Add knobs to change battery charge settings") Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20241001212835.341788-4-W_Armin@gmx.de Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-05Merge tag 'for-linus-6.12a-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "Fix Xen config issue introduced in the merge window" * tag 'for-linus-6.12a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Fix config option reference in XEN_PRIVCMD definition
2024-10-05Merge tag 'cxl-fixes-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fix from Ira Weiny: - Fix calculation for SBDF in error injection * tag 'cxl-fixes-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: EINJ, CXL: Fix CXL device SBDF calculation
2024-10-05Merge tag 'i2c-for-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: - Fix potential deadlock during runtime suspend and resume (stm32f7) * tag 'i2c-for-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: stm32f7: Do not prepare/unprepare clock during runtime suspend/resume
2024-10-05Merge tag 'spi-fix-v6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small set of driver specific fixes that came in since the merge window, about half of which is fixes for correctness in the use of the runtime PM APIs done as part of a broader cleanup" * tag 'spi-fix-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: s3c64xx: fix timeout counters in flush_fifo spi: atmel-quadspi: Fix wrong register value written to MR spi: spi-cadence: Fix missing spi_controller_is_target() check spi: spi-cadence: Fix pm_runtime_set_suspended() with runtime pm enabled spi: spi-imx: Fix pm_runtime_set_suspended() with runtime pm enabled
2024-10-05platform/x86: ISST: Fix the KASAN report slab-out-of-bounds bugZach Wade
Attaching SST PCI device to VM causes "BUG: KASAN: slab-out-of-bounds". kasan report: [ 19.411889] ================================================================== [ 19.413702] BUG: KASAN: slab-out-of-bounds in _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.415634] Read of size 8 at addr ffff888829e65200 by task cpuhp/16/113 [ 19.417368] [ 19.418627] CPU: 16 PID: 113 Comm: cpuhp/16 Tainted: G E 6.9.0 #10 [ 19.420435] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.20192059.B64.2207280713 07/28/2022 [ 19.422687] Call Trace: [ 19.424091] <TASK> [ 19.425448] dump_stack_lvl+0x5d/0x80 [ 19.426963] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.428694] print_report+0x19d/0x52e [ 19.430206] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 19.431837] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.433539] kasan_report+0xf0/0x170 [ 19.435019] ? _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.436709] _isst_if_get_pci_dev+0x3d5/0x400 [isst_if_common] [ 19.438379] ? __pfx_sched_clock_cpu+0x10/0x10 [ 19.439910] isst_if_cpu_online+0x406/0x58f [isst_if_common] [ 19.441573] ? __pfx_isst_if_cpu_online+0x10/0x10 [isst_if_common] [ 19.443263] ? ttwu_queue_wakelist+0x2c1/0x360 [ 19.444797] cpuhp_invoke_callback+0x221/0xec0 [ 19.446337] cpuhp_thread_fun+0x21b/0x610 [ 19.447814] ? __pfx_cpuhp_thread_fun+0x10/0x10 [ 19.449354] smpboot_thread_fn+0x2e7/0x6e0 [ 19.450859] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 19.452405] kthread+0x29c/0x350 [ 19.453817] ? __pfx_kthread+0x10/0x10 [ 19.455253] ret_from_fork+0x31/0x70 [ 19.456685] ? __pfx_kthread+0x10/0x10 [ 19.458114] ret_from_fork_asm+0x1a/0x30 [ 19.459573] </TASK> [ 19.460853] [ 19.462055] Allocated by task 1198: [ 19.463410] kasan_save_stack+0x30/0x50 [ 19.464788] kasan_save_track+0x14/0x30 [ 19.466139] __kasan_kmalloc+0xaa/0xb0 [ 19.467465] __kmalloc+0x1cd/0x470 [ 19.468748] isst_if_cdev_register+0x1da/0x350 [isst_if_common] [ 19.470233] isst_if_mbox_init+0x108/0xff0 [isst_if_mbox_msr] [ 19.471670] do_one_initcall+0xa4/0x380 [ 19.472903] do_init_module+0x238/0x760 [ 19.474105] load_module+0x5239/0x6f00 [ 19.475285] init_module_from_file+0xd1/0x130 [ 19.476506] idempotent_init_module+0x23b/0x650 [ 19.477725] __x64_sys_finit_module+0xbe/0x130 [ 19.476506] idempotent_init_module+0x23b/0x650 [ 19.477725] __x64_sys_finit_module+0xbe/0x130 [ 19.478920] do_syscall_64+0x82/0x160 [ 19.480036] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 19.481292] [ 19.482205] The buggy address belongs to the object at ffff888829e65000 which belongs to the cache kmalloc-512 of size 512 [ 19.484818] The buggy address is located 0 bytes to the right of allocated 512-byte region [ffff888829e65000, ffff888829e65200) [ 19.487447] [ 19.488328] The buggy address belongs to the physical page: [ 19.489569] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888829e60c00 pfn:0x829e60 [ 19.491140] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 19.492466] anon flags: 0x57ffffc0000840(slab|head|node=1|zone=2|lastcpupid=0x1fffff) [ 19.493914] page_type: 0xffffffff() [ 19.494988] raw: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001 [ 19.496451] raw: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000 [ 19.497906] head: 0057ffffc0000840 ffff88810004cc80 0000000000000000 0000000000000001 [ 19.499379] head: ffff888829e60c00 0000000080200018 00000001ffffffff 0000000000000000 [ 19.500844] head: 0057ffffc0000003 ffffea0020a79801 ffffea0020a79848 00000000ffffffff [ 19.502316] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000 [ 19.503784] page dumped because: kasan: bad access detected [ 19.505058] [ 19.505970] Memory state around the buggy address: [ 19.507172] ffff888829e65100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 19.508599] ffff888829e65180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 19.510013] >ffff888829e65200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 19.510014] ^ [ 19.510016] ffff888829e65280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 19.510018] ffff888829e65300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 19.515367] ================================================================== The reason for this error is physical_package_ids assigned by VMware VMM are not continuous and have gaps. This will cause value returned by topology_physical_package_id() to be more than topology_max_packages(). Here the allocation uses topology_max_packages(). The call to topology_max_packages() returns maximum logical package ID not physical ID. Hence use topology_logical_package_id() instead of topology_physical_package_id(). Fixes: 9a1aac8a96dc ("platform/x86: ISST: PUNIT device mapping with Sub-NUMA clustering") Cc: stable@vger.kernel.org Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zach Wade <zachwade.k@gmail.com> Link: https://lore.kernel.org/r/20240923144508.1764-1-zachwade.k@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-04net: phy: bcm84881: Fix some error handling pathsChristophe JAILLET
If phy_read_mmd() fails, the error code stored in 'bmsr' should be returned instead of 'val' which is likely to be 0. Fixes: 75f4d8d10e01 ("net: phy: add Broadcom BCM84881 PHY driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://patch.msgid.link/3e1755b0c40340d00e089d6adae5bca2f8c79e53.1727982168.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Bluetooth: btusb: Don't fail external suspend requestsLuiz Augusto von Dentz
Commit 4e0a1d8b0675 ("Bluetooth: btusb: Don't suspend when there are connections") introduces a check for connections to prevent auto-suspend but that actually ignored the fact the .suspend callback can be called for external suspend requests which Documentation/driver-api/usb/power-management.rst states the following: 'External suspend calls should never be allowed to fail in this way, only autosuspend calls. The driver can tell them apart by applying the :c:func:`PMSG_IS_AUTO` macro to the message argument to the ``suspend`` method; it will return True for internal PM events (autosuspend) and False for external PM events.' In addition to that align system suspend with USB suspend by using hci_suspend_dev since otherwise the stack would be expecting events such as advertising reports which may not be delivered while the transport is suspended. Fixes: 4e0a1d8b0675 ("Bluetooth: btusb: Don't suspend when there are connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Kiran K <kiran.k@intel.com>
2024-10-04net: pse-pd: Fix enabled status mismatchKory Maincent
PSE controllers like the TPS23881 can forcefully turn off their configuration state. In such cases, the is_enabled() and get_status() callbacks will report the PSE as disabled, while admin_state_enabled will show it as enabled. This mismatch can lead the user to attempt to enable it, but no action is taken as admin_state_enabled remains set. The solution is to disable the PSE before enabling it, ensuring the actual status matches admin_state_enabled. Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241002121706.246143-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Merge tag 'riscv-for-linus-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - PERF_TYPE_BREAKPOINT now returns -EOPNOTSUPP instead of -ENOENT, which aligns to other ports and is a saner value - The KASAN-related stack size increasing logic has been moved to a C header, to avoid dependency issues * tag 'riscv-for-linus-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix kernel stack size when KASAN is enabled drivers/perf: riscv: Align errno for unsupported perf event
2024-10-04ibmvnic: Inspect header requirements before using scrq directNick Child
Previously, the TX header requirement for standard frames was ignored. This requirement is a bitstring sent from the VIOS which maps to the type of header information needed during TX. If no header information, is needed then send subcrq direct can be used (which can be more performant). This bitstring was previously ignored for standard packets (AKA non LSO, non CSO) due to the belief that the bitstring was over-cautionary. It turns out that there are some configurations where the backing device does need header information for transmission of standard packets. If the information is not supplied then this causes continuous "Adapter error" transport events. Therefore, this bitstring should be respected and observed before considering the use of send subcrq direct. Fixes: 74839f7a8268 ("ibmvnic: Introduce send sub-crq direct") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001163200.1802522-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Merge tag 'acpi-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix up the ACPI IRQ override quirk list and add two new entries to it, add a new quirk to the ACPI backlight (video) driver, and fix the ACPI battery driver. Specifics: - Add a quirk for Dell OptiPlex 5480 AIO to the ACPI backlight (video) driver (Hans de Goede) - Prevent the ACPI battery driver from crashing when unregistering a battery hook and simplify battery hook locking in it (Armin Wolf) - Fix up the ACPI IRQ override quirk list and add quirks for Asus Vivobook X1704VAP and Asus ExpertBook B2502CVA to it (Hans de Goede)" * tag 'acpi-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: battery: Fix possible crash when unregistering a battery hook ACPI: battery: Simplify battery hook locking ACPI: video: Add backlight=native quirk for Dell OptiPlex 5480 AIO ACPI: resource: Add Asus ExpertBook B2502CVA to irq1_level_low_skip_override[] ACPI: resource: Add Asus Vivobook X1704VAP to irq1_level_low_skip_override[] ACPI: resource: Loosen the Asus E1404GAB DMI match to also cover the E1404GA ACPI: resource: Remove duplicate Asus E1504GAB IRQ override
2024-10-04Merge tag 'pm-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two cpufreq issues, one in the core and one in the intel_pstate driver: - Fix CPU device node reference counting in the cpufreq core (Miquel Sabaté Solà) - Turn the spinlock used by the intel_pstate driver in hard IRQ context into a raw one to prevent the driver from crashing when PREEMPT_RT is enabled (Uwe Kleine-König)" * tag 'pm-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Avoid a bad reference count on CPU node cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
2024-10-04Merge tag 'gpio-fixes-for-v6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix a potential NULL-pointer dereference in gpiolib core - fix a probe() regression from the v6.12 merge window and an older bug leading to missed interrupts in gpio-davinci * tag 'gpio-fixes-for-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: Fix potential NULL pointer dereference in gpiod_get_label() gpio: davinci: Fix condition for irqchip registration gpio: davinci: fix lazy disable
2024-10-04Merge tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly fixes, xe and amdgpu lead the way, with panthor, and few core components getting various fixes. Nothing seems too out of the ordinary. atomic: - Use correct type when reading damage rectangles display: - Fix kernel docs dp-mst: - Fix DSC decompression detection hdmi: - Fix infoframe size sched: - Update maintainers - Fix race condition whne queueing up jobs - Fix locking in drm_sched_entity_modify_sched() - Fix pointer deref if entity queue changes sysfb: - Disable sysfb if framebuffer parent device is unknown amdgpu: - DML2 fix - DSC fix - Dispclk fix - eDP HDR fix - IPS fix - TBT fix i915: - One fix for bitwise and logical "and" mixup in PM code xe: - Restore pci state on resume - Fix locking on submission, queue and vm - Fix UAF on queue destruction - Fix resource release on freq init error path - Use rw_semaphore to reduce contention on ASID->VM lookup - Fix steering for media on Xe2_HPM - Tuning updates to Xe2 - Resume TDR after GT reset to prevent jobs running forever - Move id allocation to avoid userspace using a guessed number to trigger UAF - Fix OA stream close preventing pbatch buffers to complete - Fix NPD when migrating memory on LNL - Fix memory leak when aborting binds panthor: - Fix locking - Set FOP_UNSIGNED_OFFSET in fops instance - Acquire lock in panthor_vm_prepare_map_op_ctx() - Avoid uninitialized variable in tick_ctx_cleanup() - Do not block scheduler queue if work is pending - Do not add write fences to the shared BOs vbox: - Fix VLA handling" * tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernel: (41 commits) drm/xe: Fix memory leak when aborting binds drm/xe: Prevent null pointer access in xe_migrate_copy drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close drm/xe/queue: move xa_alloc to prevent UAF drm/xe/vm: move xa_alloc to prevent UAF drm/xe: Clean up VM / exec queue file lock usage. drm/xe: Resume TDR after GT reset drm/xe/xe2: Add performance tuning for L3 cache flushing drm/xe/xe2: Extend performance tuning to media GT drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM drm/xe: Use helper for ASID -> VM in GPU faults and access counters drm/xe: Convert to USM lock to rwsem drm/xe: use devm_add_action_or_reset() helper drm/xe: fix UAF around queue destruction drm/xe/guc_submit: add missing locking in wedged_fini drm/xe: Restore pci state upon resume drm/amd/display: Fix system hang while resume with TBT monitor drm/amd/display: Enable idle workqueue for more IPS modes drm/amd/display: Add HDR workaround for specific eDP drm/amd/display: avoid set dispclk to 0 ...
2024-10-04net: dsa: sja1105: fix reception from VLAN-unaware bridgesVladimir Oltean
The blamed commit introduced an unexpected regression in the sja1105 driver. Packets from VLAN-unaware bridge ports get received correctly, but the protocol stack can't seem to decode them properly. For ds->untag_bridge_pvid users (thus also sja1105), the blamed commit did introduce a functional change: dsa_switch_rcv() used to call dsa_untag_bridge_pvid(), which looked like this: err = br_vlan_get_proto(br, &proto); if (err) return skb; /* Move VLAN tag from data to hwaccel */ if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) { skb = skb_vlan_untag(skb); if (!skb) return NULL; } and now it calls dsa_software_vlan_untag() which has just this: /* Move VLAN tag from data to hwaccel */ if (!skb_vlan_tag_present(skb)) { skb = skb_vlan_untag(skb); if (!skb) return NULL; } thus lacks any skb->protocol == bridge VLAN protocol check. That check is deferred until a later check for skb->vlan_proto (in the hwaccel area). The new code is problematic because, for VLAN-untagged packets, skb_vlan_untag() blindly takes the 4 bytes starting with the EtherType and turns them into a hwaccel VLAN tag. This is what breaks the protocol stack. It would be tempting to "make it work as before" and only call skb_vlan_untag() for those packets with the skb->protocol actually representing a VLAN. But the premise of the newly introduced dsa_software_vlan_untag() core function is not wrong. Drivers set ds->untag_bridge_pvid or ds->untag_vlan_aware_bridge_pvid presumably because they send all traffic to the CPU reception path as VLAN-tagged. So why should we spend any additional CPU cycles assuming that the packet may be VLAN-untagged? And why does the sja1105 driver opt into ds->untag_bridge_pvid if it doesn't always deliver packets to the CPU as VLAN-tagged? The answer to the latter question is indeed more interesting: it doesn't need to. This got done in commit 884be12f8566 ("net: dsa: sja1105: add support for imprecise RX"), because I thought it would be needed, but I didn't realize that it doesn't actually make a difference. As explained in the commit message of the blamed patch, ds->untag_bridge_pvid only makes a difference in the VLAN-untagged receive path of a bridge port. However, in that operating mode, tag_sja1105.c makes use of VLAN tags with the ETH_P_SJA1105 TPID, and it decodes and consumes these VLAN tags as if they were DSA tags (aka tag_8021q operation). Even if commit 884be12f8566 ("net: dsa: sja1105: add support for imprecise RX") added this logic in sja1105_bridge_vlan_add(): /* Always install bridge VLANs as egress-tagged on the CPU port. */ if (dsa_is_cpu_port(ds, port)) flags = 0; that was for _bridge_ VLANs, which are _not_ committed to hardware in VLAN-unaware mode (aka the mode where ds->untag_bridge_pvid does anything at all). Even prior to that change, the tag_8021q VLANs were always installed as egress-tagged on the CPU port, see dsa_switch_tag_8021q_vlan_add(): u16 flags = 0; // egress-tagged, non-PVID if (dsa_port_is_user(dp)) flags |= BRIDGE_VLAN_INFO_UNTAGGED | BRIDGE_VLAN_INFO_PVID; err = dsa_port_do_tag_8021q_vlan_add(dp, info->vid, flags); if (err) return err; Whether the sja1105 driver needs the new flag, ds->untag_vlan_aware_bridge_pvid, rather than ds->untag_bridge_pvid, is a separate discussion. To fix the current bug in VLAN-unaware bridge mode, I would argue that the sja1105 driver should not request something it doesn't need, rather than complicating the core DSA helper. Whereas before the blamed commit, this setting was harmless, now it has caused breakage. Fixes: 93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241001140206.50933-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04Merge tag 'block-6.12-20241004' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix another use-after-free in aoe - Fixup wrong nested non-saving irq disable/restore in blk-iocost - Fixup a kerneldoc complaint introduced by a merge window patch * tag 'block-6.12-20241004' of git://git.kernel.dk/linux: aoe: fix the potential use-after-free problem in more places blk_iocost: remove some duplicate irq disable/enables block: fix blk_rq_map_integrity_sg kernel-doc
2024-10-04Merge branches 'acpi-video' and 'acpi-battery'Rafael J. Wysocki
Merge an ACPI backlight (video) quirk and ACPI battery driver fix and cleanup for 6.12-rc2: - Add a quirk for Dell OptiPlex 5480 AIO to the ACPI backlight (video) driver (Hans de Goede). - Prevent the ACPI battery driver from crashing when unregistering a battery hook and simplify battery hook locking in it (Armin Wolf). * acpi-video: ACPI: video: Add backlight=native quirk for Dell OptiPlex 5480 AIO * acpi-battery: ACPI: battery: Fix possible crash when unregistering a battery hook ACPI: battery: Simplify battery hook locking
2024-10-04thermal: core: Free tzp copy along with the thermal zoneRafael J. Wysocki
The object pointed to by tz->tzp may still be accessed after being freed in thermal_zone_device_unregister(), so move the freeing of it to the point after the removal completion has been completed at which it cannot be accessed any more. Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure") Cc: 6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/4623516.LvFx2qVVIh@rjwysocki.net
2024-10-04thermal: core: Reference count the zone in thermal_zone_get_by_id()Rafael J. Wysocki
There are places in the thermal netlink code where nothing prevents the thermal zone object from going away while being accessed after it has been returned by thermal_zone_get_by_id(). To address this, make thermal_zone_get_by_id() get a reference on the thermal zone device object to be returned with the help of get_device(), under thermal_list_lock, and adjust all of its callers to this change with the help of the cleanup.h infrastructure. Fixes: 1ce50e7d408e ("thermal: core: genetlink support for events/cmd/sampling") Cc: 6.8+ <stable@vger.kernel.org> # 6.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/6112242.lOV4Wx5bFT@rjwysocki.net
2024-10-04Merge tag 'drm-xe-fixes-2024-10-03' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Restore pci state on resume (Rodrigo Vivi) - Fix locking on submission, queue and vm (Matthew Auld, Matthew Brost) - Fix UAF on queue destruction (Matthew Auld) - Fix resource release on freq init error path (He Lugang) - Use rw_semaphore to reduce contention on ASID->VM lookup (Matthew Brost) - Fix steering for media on Xe2_HPM (Gustavo Sousa) - Tuning updates to Xe2 (Gustavo Sousa) - Resume TDR after GT reset to prevent jobs running forever (Matthew Brost) - Move id allocation to avoid userspace using a guessed number to trigger UAF (Matthew Auld, Matthew Brost) - Fix OA stream close preventing pbatch buffers to complete (José) - Fix NPD when migrating memory on LNL (Zhanjun Dong) - Fix memory leak when aborting binds (Matthew Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/2fiv63yanlal5mpw3mxtotte6yvkvtex74c7mkjxca4bazlyja@o4iejcfragxy
2024-10-03Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-09-30 (ice, idpf) This series contains updates to ice and idpf drivers: For ice: Michal corrects setting of dst VSI on LAN filters and adds clearing of port VLAN configuration during reset. Gui-Dong Han corrects failures to decrement refcount in some error paths. Przemek resolves a memory leak in ice_init_tx_topology(). Arkadiusz prevents setting of DPLL_PIN_STATE_SELECTABLE to an improper value. Dave stops clearing of VLAN tracking bit to allow for VLANs to be properly restored after reset. For idpf: Ahmed sets uninitialized dyn_ctl_intrvl_s value. Josh corrects use and reporting of mailbox size. Larysa corrects order of function calls during de-initialization. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: idpf: deinit virtchnl transaction manager after vport and vectors idpf: use actual mbx receive payload length idpf: fix VF dynamic interrupt ctl register initialization ice: fix VLAN replay after reset ice: disallow DPLL_PIN_STATE_SELECTABLE for dpll output pins ice: fix memleak in ice_init_tx_topology() ice: clear port vlan config during reset ice: Fix improper handling of refcount in ice_sriov_set_msix_vec_count() ice: Fix improper handling of refcount in ice_dpll_init_rclk_pins() ice: set correct dst VSI in only LAN filters ==================== Link: https://patch.msgid.link/20240930223601.3137464-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03net: phy: aquantia: remove usage of phy_set_max_speedAbhishek Chauhan
Remove the use of phy_set_max_speed in phy driver as the function is mainly used in MAC driver to set the max speed. Instead use get_features to fix up Phy PMA capabilities for AQR111, AQR111B0, AQR114C and AQCS109 Fixes: 038ba1dc4e54 ("net: phy: aquantia: add AQR111 and AQR111B0 PHY ID") Fixes: 0974f1f03b07 ("net: phy: aquantia: remove false 5G and 10G speed ability for AQCS109") Fixes: c278ec644377 ("net: phy: aquantia: add support for AQR114C PHY ID") Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20241001224626.2400222-3-quic_abchauha@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03net: phy: aquantia: AQR115c fix up PMA capabilitiesAbhishek Chauhan
AQR115c reports incorrect PMA capabilities which includes 10G/5G and also incorrectly disables capabilities like autoneg and 10Mbps support. AQR115c as per the Marvell databook supports speeds up to 2.5Gbps with autonegotiation. Fixes: 0ebc581f8a4b ("net: phy: aquantia: add support for aqr115c") Link: https://lore.kernel.org/all/20240913011635.1286027-1-quic_abchauha@quicinc.com/T/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20241001224626.2400222-2-quic_abchauha@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03sfc: Don't invoke xdp_do_flush() from netpoll.Sebastian Andrzej Siewior
Yury reported a crash in the sfc driver originated from netpoll_send_udp(). The netconsole sends a message and then netpoll invokes the driver's NAPI function with a budget of zero. It is dedicated to allow driver to free TX resources, that it may have used while sending the packet. In the netpoll case the driver invokes xdp_do_flush() unconditionally, leading to crash because bpf_net_context was never assigned. Invoke xdp_do_flush() only if budget is not zero. Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") Reported-by: Yury Vostrikov <mon@unformed.ru> Closes: https://lore.kernel.org/5627f6d1-5491-4462-9d75-bc0612c26a22@app.fastmail.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20241002125837.utOcRo6Y@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03net: phy: dp83869: fix memory corruption when enabling fiberIngo van Lil
When configuring the fiber port, the DP83869 PHY driver incorrectly calls linkmode_set_bit() with a bit mask (1 << 10) rather than a bit number (10). This corrupts some other memory location -- in case of arm64 the priv pointer in the same structure. Since the advertising flags are updated from supported at the end of the function the incorrect line isn't needed at all and can be removed. Fixes: a29de52ba2a1 ("net: dp83869: Add ability to advertise Fiber connection") Signed-off-by: Ingo van Lil <inguin@gmx.de> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20241002161807.440378-1-inguin@gmx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03drm/nouveau/gsp: remove extraneous ; after mutexColin Ian King
The mutex field has two following semicolons, replace this with just one semicolon. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240917120856.1877733-1-colin.i.king@gmail.com
2024-10-03gpiolib: Fix potential NULL pointer dereference in gpiod_get_label()Lad Prabhakar
In `gpiod_get_label()`, it is possible that `srcu_dereference_check()` may return a NULL pointer, leading to a scenario where `label->str` is accessed without verifying if `label` itself is NULL. This patch adds a proper NULL check for `label` before accessing `label->str`. The check for `label->str != NULL` is removed because `label->str` can never be NULL if `label` is not NULL. This fixes the issue where the label name was being printed as `(efault)` when dumping the sysfs GPIO file when `label == NULL`. Fixes: 5a646e03e956 ("gpiolib: Return label, if set, for IRQ only line") Fixes: a86d27693066 ("gpiolib: fix the speed of descriptor label setting with SRCU") Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20241003131351.472015-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-10-03Merge tag 'net-6.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from ieee802154, bluetooth and netfilter. Current release - regressions: - eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc - eth: am65-cpsw: fix forever loop in cleanup code Current release - new code bugs: - eth: mlx5: HWS, fixed double-free in error flow of creating SQ Previous releases - regressions: - core: avoid potential underflow in qdisc_pkt_len_init() with UFO - core: test for not too small csum_start in virtio_net_hdr_to_skb() - vrf: revert "vrf: remove unnecessary RCU-bh critical section" - bluetooth: - fix uaf in l2cap_connect - fix possible crash on mgmt_index_removed - dsa: improve shutdown sequence - eth: mlx5e: SHAMPO, fix overflow of hd_per_wq - eth: ip_gre: fix drops of small packets in ipgre_xmit Previous releases - always broken: - core: fix gso_features_check to check for both dev->gso_{ipv4_,}max_size - core: fix tcp fraglist segmentation after pull from frag_list - netfilter: nf_tables: prevent nf_skb_duplicated corruption - sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start - mac802154: fix potential RCU dereference issue in mac802154_scan_worker - eth: fec: restart PPS after link state change" * tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits) sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems doc: net: napi: Update documentation for napi_schedule_irqoff net/ncsi: Disable the ncsi work before freeing the associated structure net: phy: qt2025: Fix warning: unused import DeviceId gso: fix udp gso fraglist segmentation after pull from frag_list bridge: mcast: Fail MDB get request on empty entry vrf: revert "vrf: Remove unnecessary RCU-bh critical section" net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code net: phy: realtek: Check the index value in led_hw_control_get ppp: do not assume bh is held in ppp_channel_bridge_input() selftests: rds: move include.sh to TEST_FILES net: test for not too small csum_start in virtio_net_hdr_to_skb() net: gso: fix tcp fraglist segmentation after pull from frag_list ipv4: ip_gre: Fix drops of small packets in ipgre_xmit net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check net: add more sanity checks to qdisc_pkt_len_init() net: avoid potential underflow in qdisc_pkt_len_init() with UFO net: ethernet: ti: cpsw_ale: Fix warning on some platforms net: microchip: Make FDMA config symbol invisible ...
2024-10-03drm/xe: Fix memory leak when aborting bindsMatthew Brost
Make sure to call xe_pt_update_ops_fini in xe_pt_update_ops_abort to free any memory the bind allocated. Caught by kmemleak when running Vulkan CTS tests on LNL. The leak seems to happen only when there's some kind of failure happening, like the lack of memory. Example output: unreferenced object 0xffff9120bdf62000 (size 8192): comm "deqp-vk", pid 115008, jiffies 4310295728 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 1b 05 f9 28 01 00 00 40 ...........(...@ 00 00 00 00 00 00 00 00 1b 15 f9 28 01 00 00 40 ...........(...@ backtrace (crc 7a56be79): [<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0 [<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe] [<ffffffffc08e8309>] xe_pt_insert_entry+0xb9/0x140 [xe] [<ffffffffc08eab6d>] xe_pt_stage_bind_entry+0x12d/0x5b0 [xe] [<ffffffffc08ecbca>] xe_pt_walk_range+0xea/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08e9eff>] xe_pt_stage_bind.constprop.0+0x25f/0x580 [xe] [<ffffffffc08eb21a>] bind_op_prepare+0xea/0x6e0 [xe] [<ffffffffc08ebab8>] xe_pt_update_ops_prepare+0x1c8/0x440 [xe] [<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe] [<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe] [<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe] [<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm] unreferenced object 0xffff9120bdf72000 (size 8192): comm "deqp-vk", pid 115008, jiffies 4310295728 hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace (crc 23b2f0b5): [<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0 [<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe] [<ffffffffc08e8453>] xe_pt_stage_unbind_post_descend+0xb3/0x150 [xe] [<ffffffffc08ecd26>] xe_pt_walk_range+0x246/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08ece31>] xe_pt_walk_shared+0xc1/0x110 [xe] [<ffffffffc08e7b2a>] xe_pt_stage_unbind+0x9a/0xd0 [xe] [<ffffffffc08e913d>] unbind_op_prepare+0xdd/0x270 [xe] [<ffffffffc08eb9f6>] xe_pt_update_ops_prepare+0x106/0x440 [xe] [<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe] [<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe] [<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe] [<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm] [<ffffffffc05e95a0>] drm_ioctl+0x280/0x4e0 [drm] Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2877 Fixes: a708f6501c69 ("drm/xe: Update PT layer with better error handling") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927232228.3255246-1-matthew.brost@intel.com (cherry picked from commit 63e0695597a044c96bf369e4d8ba031291449d95) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: Prevent null pointer access in xe_migrate_copyZhanjun Dong
xe_migrate_copy designed to copy content of TTM resources. When source resource is null, it will trigger a NULL pointer dereference in xe_migrate_copy. To avoid this situation, update lacks source flag to true for this case, the flag will trigger xe_migrate_clear rather than xe_migrate_copy. Issue trace: <7> [317.089847] xe 0000:00:02.0: [drm:xe_migrate_copy [xe]] Pass 14, sizes: 4194304 & 4194304 <7> [317.089945] xe 0000:00:02.0: [drm:xe_migrate_copy [xe]] Pass 15, sizes: 4194304 & 4194304 <1> [317.128055] BUG: kernel NULL pointer dereference, address: 0000000000000010 <1> [317.128064] #PF: supervisor read access in kernel mode <1> [317.128066] #PF: error_code(0x0000) - not-present page <6> [317.128069] PGD 0 P4D 0 <4> [317.128071] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI <4> [317.128074] CPU: 1 UID: 0 PID: 1440 Comm: kunit_try_catch Tainted: G U N 6.11.0-rc7-xe #1 <4> [317.128078] Tainted: [U]=USER, [N]=TEST <4> [317.128080] Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.3221.D80.2407291239 07/29/2024 <4> [317.128082] RIP: 0010:xe_migrate_copy+0x66/0x13e0 [xe] <4> [317.128158] Code: 00 00 48 89 8d e0 fe ff ff 48 8b 40 10 4c 89 85 c8 fe ff ff 44 88 8d bd fe ff ff 65 48 8b 3c 25 28 00 00 00 48 89 7d d0 31 ff <8b> 79 10 48 89 85 a0 fe ff ff 48 8b 00 48 89 b5 d8 fe ff ff 83 ff <4> [317.128162] RSP: 0018:ffffc9000167f9f0 EFLAGS: 00010246 <4> [317.128164] RAX: ffff8881120d8028 RBX: ffff88814d070428 RCX: 0000000000000000 <4> [317.128166] RDX: ffff88813cb99c00 RSI: 0000000004000000 RDI: 0000000000000000 <4> [317.128168] RBP: ffffc9000167fbb8 R08: ffff88814e7b1f08 R09: 0000000000000001 <4> [317.128170] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88814e7b1f08 <4> [317.128172] R13: ffff88814e7b1f08 R14: ffff88813cb99c00 R15: 0000000000000001 <4> [317.128174] FS: 0000000000000000(0000) GS:ffff88846f280000(0000) knlGS:0000000000000000 <4> [317.128176] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [317.128178] CR2: 0000000000000010 CR3: 000000011f676004 CR4: 0000000000770ef0 <4> [317.128180] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4> [317.128182] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 <4> [317.128184] PKRU: 55555554 <4> [317.128185] Call Trace: <4> [317.128187] <TASK> <4> [317.128189] ? show_regs+0x67/0x70 <4> [317.128194] ? __die_body+0x20/0x70 <4> [317.128196] ? __die+0x2b/0x40 <4> [317.128198] ? page_fault_oops+0x15f/0x4e0 <4> [317.128203] ? do_user_addr_fault+0x3fb/0x970 <4> [317.128205] ? lock_acquire+0xc7/0x2e0 <4> [317.128209] ? exc_page_fault+0x87/0x2b0 <4> [317.128212] ? asm_exc_page_fault+0x27/0x30 <4> [317.128216] ? xe_migrate_copy+0x66/0x13e0 [xe] <4> [317.128263] ? __lock_acquire+0xb9d/0x26f0 <4> [317.128265] ? __lock_acquire+0xb9d/0x26f0 <4> [317.128267] ? sg_free_append_table+0x20/0x80 <4> [317.128271] ? lock_acquire+0xc7/0x2e0 <4> [317.128273] ? mark_held_locks+0x4d/0x80 <4> [317.128275] ? trace_hardirqs_on+0x1e/0xd0 <4> [317.128278] ? _raw_spin_unlock_irqrestore+0x31/0x60 <4> [317.128281] ? __pm_runtime_resume+0x60/0xa0 <4> [317.128284] xe_bo_move+0x682/0xc50 [xe] <4> [317.128315] ? lock_is_held_type+0xaa/0x120 <4> [317.128318] ttm_bo_handle_move_mem+0xe5/0x1a0 [ttm] <4> [317.128324] ttm_bo_validate+0xd1/0x1a0 [ttm] <4> [317.128328] shrink_test_run_device+0x721/0xc10 [xe] <4> [317.128360] ? find_held_lock+0x31/0x90 <4> [317.128363] ? lock_release+0xd1/0x2a0 <4> [317.128365] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10 [kunit] <4> [317.128370] xe_bo_shrink_kunit+0x11/0x20 [xe] <4> [317.128397] kunit_try_run_case+0x6e/0x150 [kunit] <4> [317.128400] ? trace_hardirqs_on+0x1e/0xd0 <4> [317.128402] ? _raw_spin_unlock_irqrestore+0x31/0x60 <4> [317.128404] kunit_generic_run_threadfn_adapter+0x1e/0x40 [kunit] <4> [317.128407] kthread+0xf5/0x130 <4> [317.128410] ? __pfx_kthread+0x10/0x10 <4> [317.128412] ret_from_fork+0x39/0x60 <4> [317.128415] ? __pfx_kthread+0x10/0x10 <4> [317.128416] ret_from_fork_asm+0x1a/0x30 <4> [317.128420] </TASK> Fixes: 266c85885263 ("drm/xe/xe2: Handle flat ccs move for igfx.") Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927161308.862323-2-zhanjun.dong@intel.com (cherry picked from commit 59a1c9c7e1d02b43b415ea92627ce095b7c79e47) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream closeJosé Roberto de Souza
Mesa testing on Xe2+ revealed that when OA metrics are collected for an exec_queue, after the OA stream is closed, future batch buffers submitted on that exec_queue do not complete. Not resetting OAC_CONTEXT_ENABLE on OA stream close resolves these hangs and should not have any adverse effects. v2: Make the change that we don't reset the bit clearer (Ashutosh) Also make the same fix for OAC as OAR (Ashutosh) Bspec: 60314 Fixes: 2f4a730fcd2d ("drm/xe/oa: Add OAR support") Fixes: 14e077f8006d ("drm/xe/oa: Add OAC support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2821 Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924213713.3497992-1-ashutosh.dixit@intel.com (cherry picked from commit 0c8650b09a365f4a31fca1d1d1e9d99c56071128) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe/queue: move xa_alloc to prevent UAFMatthew Auld
Evil user can guess the next id of the queue before the ioctl completes and then call queue destroy ioctl to trigger UAF since create ioctl is still referencing the same queue. Move the xa_alloc all the way to the end to prevent this. v2: - Rebase Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-4-matthew.auld@intel.com (cherry picked from commit 16536582ddbebdbdf9e1d7af321bbba2bf955a87) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe/vm: move xa_alloc to prevent UAFMatthew Auld
Evil user can guess the next id of the vm before the ioctl completes and then call vm destroy ioctl to trigger UAF since create ioctl is still referencing the same vm. Move the xa_alloc all the way to the end to prevent this. v2: - Rebase Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240925071426.144015-3-matthew.auld@intel.com (cherry picked from commit dcfd3971327f3ee92765154baebbaece833d3ca9) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: Clean up VM / exec queue file lock usage.Matthew Brost
Both the VM / exec queue file lock protect the lookup and reference to the object, nothing more. These locks are not intended anything else underneath them. XA have their own locking too, so no need to take the VM / exec queue file lock aside from when doing a lookup and reference get. Add some kernel doc to make this clear and cleanup a few typos too. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240921011712.2681510-1-matthew.brost@intel.com (cherry picked from commit fe4f5d4b661666a45b48fe7f95443f8fefc09c8c) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: Resume TDR after GT resetMatthew Brost
Not starting the TDR after GT reset on exec queue which have been restarted can lead to jobs being able to be run forever. Fix this by restarting the TDR. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240724235919.1917216-1-matthew.brost@intel.com (cherry picked from commit 8ec5a4e5ce97d6ee9f5eb5b4ce4cfc831976fdec) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe/xe2: Add performance tuning for L3 cache flushingGustavo Sousa
A recommended performance tuning for LNL related to L3 cache flushing was recently introduced in Bspec. Implement it. Unlike the other existing tuning settings, we limit this one for LNL only, since there is no info about whether this would be applicable to other platforms yet. In the future we can come back and use IP version ranges if applicable. v2: - Fix reference to Bspec. (Sai Teja, Tejas) - Use correct register name for "Tuning: L3 RW flush all Cache". (Sai Teja) - Use SCRATCH3_LBCF (with the underscore) for better readability. v3: - Limit setting to LNL only. (Matt) Bspec: 72161 Cc: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-5-gustavo.sousa@intel.com (cherry picked from commit 876253165f3eaaacacb8c8bed16a9df4b6081479) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe/xe2: Extend performance tuning to media GTGustavo Sousa
With exception of "Tuning: L3 cache - media", we are currently applying recommended performance tuning settings only for the primary GT. Let's also implement them for the media GT when applicable. According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist only in Xe2_LPM and their offsets do not match their primary GT counterparts. Furthermore, the range where CCCHKNREG1 belongs is not listed as a multicast range on the media GT. As such, we need to have Xe2_LPM-specific definitions for those registers and apply the setting only for that specific IP. Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the offset on the media GT matches the one on the primary one. So we can simply have a copy of "Tuning: Stateless compression control" for the media GT. v2: - Fix implementation with respect to multicast vs non-multicast registers. (Matt) - Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning: Compression Overfetch - media". v3: - STATELESS_COMPRESSION_CTRL on Xe2_HPM is also a multicast register, do not define a XE2HPM_STATELESS_COMPRESSION_CTRL register. (Tejas) Bspec: 72161 Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-3-gustavo.sousa@intel.com (cherry picked from commit e1f813947ccf2326cfda4558b7d31430d7860c4b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPMGustavo Sousa
According to Bspec, Xe2 steering tables must be used for Xe2_HPM, just as it is with Xe2_LPM. Update our driver to reflect that. Bspec: 71186 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240920211459.255181-2-gustavo.sousa@intel.com (cherry picked from commit 21ae035ae5c33ef176f4062bd9d4aa973dde240b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: Use helper for ASID -> VM in GPU faults and access countersMatthew Brost
Normalize both code paths with a helper. Fixes a possible leak access counter path too. Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918160503.2021315-1-matthew.brost@intel.com (cherry picked from commit dc0dce6d63d22e8319e27b6a41be7368376f9471) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: Convert to USM lock to rwsemMatthew Brost
Remove contention from GPU fault path for ASID->VM lookup. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240918054436.1971839-1-matthew.brost@intel.com (cherry picked from commit 1378c633a3fbfeb344c486ffda0e920a21e62712) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: use devm_add_action_or_reset() helperHe Lugang
Use devm_add_action_or_reset() to release resources in case of failure, because the cleanup function will be automatically called. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: He Lugang <helugang@uniontech.com> Link: https://patchwork.freedesktop.org/patch/msgid/9631BC17D1E028A2+20240911102215.84865-1-helugang@uniontech.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit fdc81c43f0c14ace6383024a02585e3fcbd1ceba) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: fix UAF around queue destructionMatthew Auld
We currently do stuff like queuing the final destruction step on a random system wq, which will outlive the driver instance. With bad timing we can teardown the driver with one or more work workqueue still being alive leading to various UAF splats. Add a fini step to ensure user queues are properly torn down. At this point GuC should already be nuked so queue itself should no longer be referenced from hw pov. v2 (Matt B) - Looks much safer to use a waitqueue and then just wait for the xa_array to become empty before triggering the drain. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2317 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240923145647.77707-2-matthew.auld@intel.com (cherry picked from commit 861108666cc0e999cffeab6aff17b662e68774e3) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe/guc_submit: add missing locking in wedged_finiMatthew Auld
Any non-wedged queue can have a zero refcount here and can be running concurrently with an async queue destroy, therefore dereferencing the queue ptr to check wedge status after the lookup can trigger UAF if queue is not wedged. Fix this by keeping the submission_state lock held around the check to postpone the free and make the check safe, before dropping again around the put() to avoid the deadlock. Fixes: 8ed9aaae39f3 ("drm/xe: Force wedged state and block GT reset upon any GPU hang") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924150947.118433-2-matthew.auld@intel.com (cherry picked from commit d28af0b6b9580b9f90c265a7da0315b0ad20bbfd) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03drm/xe: Restore pci state upon resumeRodrigo Vivi
The pci state was saved, but not restored. Restore right after the power state transition request like every other driver. v2: Use right fixes tag, since this was there initialy, but accidentally removed. Fixes: f6761c68c0ac ("drm/xe/display: Improve s2idle handling.") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912214507.456897-1-rodrigo.vivi@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> (cherry picked from commit ec2d1539e159f53eae708e194c449cfefa004994) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-03Merge tag 'drm-intel-fixes-2024-10-02' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - One fix for bitwise and logical "and" mixup in PM code Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zv1l75s9Z4Gl4lDH@jlahtine-mobl.ger.corp.intel.com
2024-10-03Merge tag 'drm-misc-fixes-2024-10-02' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: panthor: - Set FOP_UNSIGNED_OFFSET in fops instance - Acquire lock in panthor_vm_prepare_map_op_ctx() - Avoid ninitialized variable in tick_ctx_cleanup() - Do not block scheduler queue if work is pending - Do not add write fences to the shared BOs scheduler: - Fix locking in drm_sched_entity_modify_sched() - Fix pointer deref if entity queue changes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20241002151528.GA300287@linux.fritz.box