Age | Commit message (Collapse) | Author |
|
The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(),
dereferences a "data" pointer.
On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However
the interrupt handler may still be invoked afterwards and will dereference
that NULL pointer.
That's because the interrupt is requested using a devm_*() helper and the
driver core releases devm_*() resources *after* calling ->remove().
pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt
Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link
Control Register, but that won't prevent execution of pcie_bwnotif_irq():
The interrupt for bandwidth notifications may be shared with AER, DPC,
PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the
interrupt is requested.
There's a similar race on bind: pcie_bwnotif_probe() requests the
interrupt when the "data" pointer still points to NULL. A NULL pointer
deref may thus likewise occur if AER, DPC, PME or hotplug raise an
interrupt in-between the bandwidth controller's call to devm_request_irq()
and assignment of the "data" pointer.
Drop the devm_*() usage and reorder requesting of the interrupt to fix the
issue.
While at it, drop a stray but harmless no_free_ptr() invocation when
assigning the "data" pointer in pcie_bwnotif_probe().
Ilpo points out that the locking on unbind and bind needs to be symmetric,
so move the call to pcie_bwnotif_disable() inside the critical section
protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem.
Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV.
The issue is no longer reproducible with the present commit.
Evert found that attaching a USB-C monitor prevented the hang. The
machine contains an ASMedia USB 3.2 controller below a hotplug-capable
Root Port. So one possible explanation is that the controller gets
hot-removed on shutdown unless something is connected. And the ensuing
hotplug interrupt occurs exactly when the bandwidth controller is
unregistering. The precise cause could not be determined because the
screen had already turned black when the hang occurred.
Link: https://lore.kernel.org/r/ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de
Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Evert Vorster <evorster@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Evert Vorster <evorster@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
interface"
There is a garbage value problem in fbnic_mac_get_sensor_asic(). 'fw_cmpl'
is uninitialized which makes 'sensor' and '*val' to be stored garbage
value. Revert commit d85ebade02e8 ("eth: fbnic: Add hardware monitoring
support via HWMON interface") to avoid this problem.
Fixes: d85ebade02e8 ("eth: fbnic: Add hardware monitoring support via HWMON interface")
Signed-off-by: Su Hui <suhui@nfschina.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250106023647.47756-1-suhui@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
of_thermal_zone_find() calls of_parse_phandle_with_args(), but does not
release the OF node reference obtained by it.
Add a of_node_put() call when the call is successful.
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization")
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Link: https://patch.msgid.link/20241224031809.950461-1-joe@pf.is.s.u-tokyo.ac.jp
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_dev_irq_override() gets called approx. 30 times during boot (15 legacy
IRQs * 2 override_table entries). Of these 30 calls at max 1 will match
the non DMI checks done by acpi_dev_irq_override(). The dmi_check_system()
check is by far the most expensive check done by acpi_dev_irq_override(),
make this call the last check done by acpi_dev_irq_override() so that it
will be called at max 1 time instead of 30 times.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20241228165253.42584-1-hdegoede@redhat.com
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The TongFang GM5HG0A is a TongFang barebone design which is sold under
various brand names.
The ACPI IRQ override for the keyboard IRQ must be used on these AMD Zen
laptops in order for the IRQ to work.
At least on the SKIKK Vanaheim variant the DMI product- and board-name
strings have been replaced by the OEM with "Vanaheim" so checking that
board-name contains "GM5HG0A" as is usually done for TongFang barebones
quirks does not work.
The DMI OEM strings do contain "GM5HG0A". I have looked at the dmidecode
for a few other TongFang devices and the TongFang code-name string being
in the OEM strings seems to be something which is consistently true.
Add a quirk checking one of the DMI_OEM_STRING(s) is "GM5HG0A" in the hope
that this will work for other OEM versions of the "GM5HG0A" too.
Link: https://www.skikk.eu/en/laptops/vanaheim-15-rtx-4060
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219614
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patch.msgid.link/20241228164845.42381-1-hdegoede@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Like the Vivobook X1704VAP the X1504VAP has its keyboard IRQ (1) described
as ActiveLow in the DSDT, which the kernel overrides to EdgeHigh which
breaks the keyboard.
Add the X1504VAP to the irq1_level_low_skip_override[] quirk table to fix
this.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219224
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patch.msgid.link/20241220181352.25974-1-hdegoede@redhat.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When booting with a dock connected, the igc driver may get stuck for ~40
seconds if PCIe link is lost during initialization.
This happens because the driver access device after EECD register reads
return all F's, indicating failed reads. Consequently, hw->hw_addr is set
to NULL, which impacts subsequent rd32() reads. This leads to the driver
hanging in igc_get_hw_semaphore_i225(), as the invalid hw->hw_addr
prevents retrieving the expected value.
To address this, a validation check and a corresponding return value
catch is added for the EECD register read result. If all F's are
returned, indicating PCIe link loss, the driver will return -ENXIO
immediately. This avoids the 40-second hang and significantly improves
boot time when using a dock with an igc NIC.
Log before the patch:
[ 0.911913] igc 0000:70:00.0: enabling device (0000 -> 0002)
[ 0.912386] igc 0000:70:00.0: PTM enabled, 4ns granularity
[ 1.571098] igc 0000:70:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
[ 43.449095] igc_get_hw_semaphore_i225: igc 0000:70:00.0 (unnamed net_device) (uninitialized): Driver can't access device - SMBI bit is set.
[ 43.449186] igc 0000:70:00.0: probe with driver igc failed with error -13
[ 46.345701] igc 0000:70:00.0: enabling device (0000 -> 0002)
[ 46.345777] igc 0000:70:00.0: PTM enabled, 4ns granularity
Log after the patch:
[ 1.031000] igc 0000:70:00.0: enabling device (0000 -> 0002)
[ 1.032097] igc 0000:70:00.0: PTM enabled, 4ns granularity
[ 1.642291] igc 0000:70:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
[ 5.480490] igc 0000:70:00.0: enabling device (0000 -> 0002)
[ 5.480516] igc 0000:70:00.0: PTM enabled, 4ns granularity
Fixes: ab4056126813 ("igc: Add NVM support")
Cc: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ptp4l application reports too high offset when ran on E823 device
with a 100GB/s link. Those values cannot go under 100ns, like in a
working case when using 100 GB/s cable.
This is due to incorrect frequency settings on the PHY clocks for
100 GB/s speed. Changes are introduced to align with the internal
hardware documentation, and correctly initialize frequency in PHY
clocks with the frequency values that are in our HW spec.
To reproduce the issue run ptp4l as a Time Receiver on E823 device,
and observe the offset, which will never approach values seen
in the PTP working case.
Reproduction output:
ptp4l -i enp137s0f3 -m -2 -s -f /etc/ptp4l_8275.conf
ptp4l[5278.775]: master offset 12470 s2 freq +41288 path delay -3002
ptp4l[5278.837]: master offset 10525 s2 freq +39202 path delay -3002
ptp4l[5278.900]: master offset -24840 s2 freq -20130 path delay -3002
ptp4l[5278.963]: master offset 10597 s2 freq +37908 path delay -3002
ptp4l[5279.025]: master offset 8883 s2 freq +36031 path delay -3002
ptp4l[5279.088]: master offset 7267 s2 freq +34151 path delay -3002
ptp4l[5279.150]: master offset 5771 s2 freq +32316 path delay -3002
ptp4l[5279.213]: master offset 4388 s2 freq +30526 path delay -3002
ptp4l[5279.275]: master offset -30434 s2 freq -28485 path delay -3002
ptp4l[5279.338]: master offset -28041 s2 freq -27412 path delay -3002
ptp4l[5279.400]: master offset 7870 s2 freq +31118 path delay -3002
Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support")
Reviewed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Mask admin command returned max phase adjust value for both input and
output pins. Only 31 bits are relevant, last released data sheet wrongly
points that 32 bits are valid - see [1] 3.2.6.4.1 Get CCU Capabilities
Command for reference. Fix of the datasheet itself is in progress.
Fix the min/max assignment logic, previously the value was wrongly
considered as negative value due to most significant bit being set.
Example of previous broken behavior:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-get --json '{"id":1}'| grep phase-adjust
'phase-adjust': 0,
'phase-adjust-max': 16723,
'phase-adjust-min': -16723,
Correct behavior with the fix:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
--do pin-get --json '{"id":1}'| grep phase-adjust
'phase-adjust': 0,
'phase-adjust-max': 2147466925,
'phase-adjust-min': -2147466925,
[1] https://cdrdv2.intel.com/v1/dl/getContent/613875?explicitVersion=true
Fixes: 90e1c90750d7 ("ice: dpll: implement phase related callbacks")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
During fuzz testing, the following warning was discovered:
different return values (15 and 11) from vsnprintf("%*pbl
", ...)
test:keyward is WARNING in kvasprintf
WARNING: CPU: 55 PID: 1168477 at lib/kasprintf.c:30 kvasprintf+0x121/0x130
Call Trace:
kvasprintf+0x121/0x130
kasprintf+0xa6/0xe0
bitmap_print_to_buf+0x89/0x100
core_siblings_list_read+0x7e/0xb0
kernfs_file_read_iter+0x15b/0x270
new_sync_read+0x153/0x260
vfs_read+0x215/0x290
ksys_read+0xb9/0x160
do_syscall_64+0x56/0x100
entry_SYSCALL_64_after_hwframe+0x78/0xe2
The call trace shows that kvasprintf() reported this warning during the
printing of core_siblings_list. kvasprintf() has several steps:
(1) First, calculate the length of the resulting formatted string.
(2) Allocate a buffer based on the returned length.
(3) Then, perform the actual string formatting.
(4) Check whether the lengths of the formatted strings returned in
steps (1) and (2) are consistent.
If the core_cpumask is modified between steps (1) and (3), the lengths
obtained in these two steps may not match. Indeed our test includes cpu
hotplugging, which should modify core_cpumask while printing.
To fix this issue, cache the cpumask into a temporary variable before
calling cpumap_print_{list, cpumask}_to_buf(), to keep it unchanged
during the printing process.
Fixes: bb9ec13d156e ("topology: use bin_attribute to break the size limitation of cpumap ABI")
Cc: stable <stable@kernel.org>
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20241114110141.94725-1-lihuafei1@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In pmc_core_ssram_get_pmc(), the physical addresses for hidden SSRAM
devices are retrieved from the MMIO region of the primary SSRAM device.
If additional devices are not present, the address returned is zero.
Currently, the code does not check for this condition, resulting in
ioremap() incorrectly attempting to map address 0.
Add a check for a zero address and return 0 if no additional devices
are found, as it is not an error for the device to be absent.
Fixes: a01486dc4bb1 ("platform/x86/intel/pmc: Cleanup SSRAM discovery")
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20250106174653.1497128-1-david.e.box@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Add Clearwater Forest (INTEL_ATOM_DARKMONT_X) to SST support list by
adding to isst_cpu_ids.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250103155255.1488139-2-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Add Clearwater Forest support (INTEL_ATOM_DARKMONT_X) to tpmi_cpu_ids
to support domaid id mappings.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20250103155255.1488139-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Wakeup for IRQ1 should be disabled only in cases where i8042 had
actually enabled it, otherwise "wake_depth" for this IRQ will try to
drop below zero and there will be an unpleasant WARN() logged:
kernel: atkbd serio0: Disabling IRQ1 wakeup source to avoid platform firmware bug
kernel: ------------[ cut here ]------------
kernel: Unbalanced IRQ 1 wake disable
kernel: WARNING: CPU: 10 PID: 6431 at kernel/irq/manage.c:920 irq_set_irq_wake+0x147/0x1a0
The PMC driver uses DEFINE_SIMPLE_DEV_PM_OPS() to define its dev_pm_ops
which sets amd_pmc_suspend_handler() to the .suspend, .freeze, and
.poweroff handlers. i8042_pm_suspend(), however, is only set as
the .suspend handler.
Fix the issue by call PMC suspend handler only from the same set of
dev_pm_ops handlers as i8042_pm_suspend(), which currently means just
the .suspend handler.
To reproduce this issue try hibernating (S4) the machine after a fresh boot
without putting it into s2idle first.
Fixes: 8e60615e8932 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN")
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Link: https://lore.kernel.org/r/c8f28c002ca3c66fbeeb850904a1f43118e17200.1736184606.git.mail@maciej.szmigiero.name
[ij: edited the commit message.]
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Commit 79d2e1919a27 ("staging: gpib: fix Makefiles") uses the corresponding
config symbols to let Makefiles include the driver sources appropriately in
the kernel build.
Unfortunately, the Makefile in the tnt4882 directory refers to the
non-existing config GPIB_TNT4882. The actual config name for this driver is
GPIB_NI_PCI_ISA, as can be observed in the gpib Makefile.
Probably, this is caused by the subtle differences between the config
names, directory names and file names in ./drivers/staging/gpib/, where
often config names and directory names are identical or at least close in
naming, but in this case, it is not.
Change the reference in the tnt4882 Makefile from the non-existing config
GPIB_TNT4882 to the existing config GPIB_NI_PCI_ISA.
Fixes: 79d2e1919a27 ("staging: gpib: fix Makefiles")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Link: https://lore.kernel.org/r/20250107135032.34424-1-lukas.bulwahn@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The error handling for the case `con_index == 0` should involve dropping
the pm usage counter, as ucsi_ccg_sync_control() gets it at the
beginning. Fix it.
Cc: stable <stable@kernel.org>
Fixes: e56aac6e5a25 ("usb: typec: fix potential array underflow in ucsi_ccg_sync_control()")
Signed-off-by: GONG Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20250107015750.2778646-1-gongruiqi1@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This fixes data corruption when accessing the internal SD card in mass
storage mode.
I am actually not too sure why. I didn't figure a straightforward way to
reproduce the issue, but i seem to get garbage when issuing a lot (over 50)
of large reads (over 120 sectors) are done in a quick succession. That is,
time seems to matter here -- larger reads are fine if they are done with
some delay between them.
But I'm not great at understanding this sort of things, so I'll assume
the issue other, smarter, folks were seeing with similar phones is the
same problem and I'll just put my quirk next to theirs.
The "Software details" screen on the phone is as follows:
V 04.06
07-08-13
RM-849
(c) Nokia
TL;DR version of the device descriptor:
idVendor 0x0421 Nokia Mobile Phones
idProduct 0x06c2
bcdDevice 4.06
iManufacturer 1 Nokia
iProduct 2 Nokia 208
The patch assumes older firmwares are broken too (I'm unable to test, but
no biggie if they aren't I guess), and I have no idea if newer firmware
exists.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: stable <stable@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20250101212206.2386207-1-lkundrak@v3.sk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We should do reverse selection of other components from
CONFIG_USB_F_MIDI2 which is tristate, instead of
CONFIG_USB_CONFIGFS_F_MIDI2 which is bool, for satisfying subtle
module dependencies.
Fixes: 8b645922b223 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20250101131124.27599-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
During ARP failure, tid is not inserted but _c4iw_free_ep()
attempts to remove tid which results in error.
This patch fixes the issue by avoiding removal of uninserted tid.
Fixes: 59437d78f088 ("cxgb4/chtls: fix ULD connection failures due to wrong TID base")
Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://patch.msgid.link/20250103092327.1011925-1-anumula@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
DIM work will call the firmware to adjust the coalescing parameters on
the RX rings. We should cancel DIM work before we call the firmware
to free the RX rings. Otherwise, FW will reject the call from DIM
work if the RX ring has been freed. This will generate an error
message like this:
bnxt_en 0000:21:00.1 ens2f1np1: hwrm req_type 0x53 seq id 0x6fca error 0x2
and cause unnecessary concern for the user. It is also possible to
modify the coalescing parameters of the wrong ring if the ring has
been re-allocated.
To prevent this, cancel DIM work right before freeing the RX rings.
We also have to add a check in NAPI poll to not schedule DIM if the
RX rings are shutting down. Check that the VNIC is active before we
schedule DIM. The VNIC is always disabled before we free the RX rings.
Fixes: 0bc0b97fca73 ("bnxt_en: cleanup DIM work on device shutdown")
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop()
which could cause a memory leak.
Fixes: bbf33d1d9805 ("bnxt_en: update all firmware calls to use the new APIs")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an array size limit to the for-loop to be sure we don't try
to reference a fw_version string off the end of the fw info names
array. We know that our firmware only has a limited number
of firmware slot names, but we shouldn't leave this unchecked.
Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250103195147.7408-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When running YouTube videos and Steam games simultaneously,
the tester found a system hang / race condition issue with
the multi-display configuration setting. Adding a lock to
the buddy allocator's trim function would be the solution.
<log snip>
[ 7197.250436] general protection fault, probably for non-canonical address 0xdead000000000108
[ 7197.250447] RIP: 0010:__alloc_range+0x8b/0x340 [amddrm_buddy]
[ 7197.250470] Call Trace:
[ 7197.250472] <TASK>
[ 7197.250475] ? show_regs+0x6d/0x80
[ 7197.250481] ? die_addr+0x37/0xa0
[ 7197.250483] ? exc_general_protection+0x1db/0x480
[ 7197.250488] ? drm_suballoc_new+0x13c/0x93d [drm_suballoc_helper]
[ 7197.250493] ? asm_exc_general_protection+0x27/0x30
[ 7197.250498] ? __alloc_range+0x8b/0x340 [amddrm_buddy]
[ 7197.250501] ? __alloc_range+0x109/0x340 [amddrm_buddy]
[ 7197.250506] amddrm_buddy_block_trim+0x1b5/0x260 [amddrm_buddy]
[ 7197.250511] amdgpu_vram_mgr_new+0x4f5/0x590 [amdgpu]
[ 7197.250682] amdttm_resource_alloc+0x46/0xb0 [amdttm]
[ 7197.250689] ttm_bo_alloc_resource+0xe4/0x370 [amdttm]
[ 7197.250696] amdttm_bo_validate+0x9d/0x180 [amdttm]
[ 7197.250701] amdgpu_bo_pin+0x15a/0x2f0 [amdgpu]
[ 7197.250831] amdgpu_dm_plane_helper_prepare_fb+0xb2/0x360 [amdgpu]
[ 7197.251025] ? try_wait_for_completion+0x59/0x70
[ 7197.251030] drm_atomic_helper_prepare_planes.part.0+0x2f/0x1e0
[ 7197.251035] drm_atomic_helper_prepare_planes+0x5d/0x70
[ 7197.251037] drm_atomic_helper_commit+0x84/0x160
[ 7197.251040] drm_atomic_nonblocking_commit+0x59/0x70
[ 7197.251043] drm_mode_atomic_ioctl+0x720/0x850
[ 7197.251047] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[ 7197.251049] drm_ioctl_kernel+0xb9/0x120
[ 7197.251053] ? srso_alias_return_thunk+0x5/0xfbef5
[ 7197.251056] drm_ioctl+0x2d4/0x550
[ 7197.251058] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[ 7197.251063] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[ 7197.251186] __x64_sys_ioctl+0xa0/0xf0
[ 7197.251190] x64_sys_call+0x143b/0x25c0
[ 7197.251193] do_syscall_64+0x7f/0x180
[ 7197.251197] ? srso_alias_return_thunk+0x5/0xfbef5
[ 7197.251199] ? amdgpu_display_user_framebuffer_create+0x215/0x320 [amdgpu]
[ 7197.251329] ? drm_internal_framebuffer_create+0xb7/0x1a0
[ 7197.251332] ? srso_alias_return_thunk+0x5/0xfbef5
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Fixes: 4a5ad08f5377 ("drm/amdgpu: Add address alignment support to DCC buffers")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3318ba94e56b9183d0304577c74b33b6b01ce516)
Cc: stable@vger.kernel.org
|
|
atomic scheduling will be triggered in interrupt handler for
AC/DC mode switch as following backtrace.
Call Trace:
<IRQ>
dump_stack_lvl
__schedule_bug
__schedule
schedule
schedule_preempt_disabled
__mutex_lock
smu_cmn_send_smc_msg_with_param
smu_v13_0_irq_process
amdgpu_irq_dispatch
amdgpu_ih_process
amdgpu_irq_handler
__handle_irq_event_percpu
handle_irq_event
handle_edge_irq
__common_interrupt
common_interrupt
</IRQ>
<TASK>
asm_common_interrupt
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Kun Liu <Kun.Liu2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 03cc84b102d1a832e8dfc59344346dedcebcdf42)
Cc: stable@vger.kernel.org
|
|
kfd_process_wq_release() signals eviction fence by
dma_fence_signal() which wanrs if dma_fence
is NULL.
kfd_process->ef is initialized by kfd_process_device_init_vm()
through ioctl. That means the fence is NULL for a new
created kfd_process, and close a kfd_process right
after open it will trigger the warning.
This commit conditionally signals the eviction fence
in kfd_process_wq_release() only when it is available.
[ 503.660882] WARNING: CPU: 0 PID: 9 at drivers/dma-buf/dma-fence.c:467 dma_fence_signal+0x74/0xa0
[ 503.782940] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu]
[ 503.789640] RIP: 0010:dma_fence_signal+0x74/0xa0
[ 503.877620] Call Trace:
[ 503.880066] <TASK>
[ 503.882168] ? __warn+0xcd/0x260
[ 503.885407] ? dma_fence_signal+0x74/0xa0
[ 503.889416] ? report_bug+0x288/0x2d0
[ 503.893089] ? handle_bug+0x53/0xa0
[ 503.896587] ? exc_invalid_op+0x14/0x50
[ 503.900424] ? asm_exc_invalid_op+0x16/0x20
[ 503.904616] ? dma_fence_signal+0x74/0xa0
[ 503.908626] kfd_process_wq_release+0x6b/0x370 [amdgpu]
[ 503.914081] process_one_work+0x654/0x10a0
[ 503.918186] worker_thread+0x6c3/0xe70
[ 503.921943] ? srso_alias_return_thunk+0x5/0xfbef5
[ 503.926735] ? srso_alias_return_thunk+0x5/0xfbef5
[ 503.931527] ? __kthread_parkme+0x82/0x140
[ 503.935631] ? __pfx_worker_thread+0x10/0x10
[ 503.939904] kthread+0x2a8/0x380
[ 503.943132] ? __pfx_kthread+0x10/0x10
[ 503.946882] ret_from_fork+0x2d/0x70
[ 503.950458] ? __pfx_kthread+0x10/0x10
[ 503.954210] ret_from_fork_asm+0x1a/0x30
[ 503.958142] </TASK>
[ 503.960328] ---[ end trace 0000000000000000 ]---
Fixes: 967d226eaae8 ("dma-buf: add WARN_ON() illegal dma-fence signaling")
Signed-off-by: Zhu Lingshan <lingshan.zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2774ef7625adb5fb9e9265c26a59dca7b8fd171e)
Cc: stable@vger.kernel.org
|
|
[Why]
Wrapper functions for dcn_bw_ceil2() and dcn_bw_floor2()
should check for granularity is non zero to avoid assert and
divide-by-zero error in dcn_bw_ functions.
[How]
Add check for granularity 0.
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f6e09701c3eb2ccb8cb0518e0b67f1c69742a4ec)
Cc: stable@vger.kernel.org
|
|
Initialize the process context address before setting the shader debugger.
[ 260.781212] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[ 260.781236] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 260.781255] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A40
[ 260.781270] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 260.781284] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 260.781296] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 260.781308] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
[ 260.781320] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 260.781332] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
[ 260.782017] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[ 260.782039] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 260.782058] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A41
[ 260.782073] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 260.782087] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[ 260.782098] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 260.782110] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
[ 260.782122] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 260.782137] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
[ 260.782155] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
[ 260.782166] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
Fixes: 438b39ac74e2 ("drm/amdkfd: pause autosuspend when creating pdd")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3849
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5b231f5bc9ff02ec5737f2ec95cdf15ac95088e9)
Cc: stable@vger.kernel.org
|
|
dm_get_plane_scale doesn't take into account plane scaled size equal to
zero, leading to a kernel oops due to division by zero. Fix by setting
out-scale size as zero when the dst size is zero, similar to what is
done by drm_calc_scale(). This issue started with the introduction of
cursor ovelay mode that uses this function to assess cursor mode changes
via dm_crtc_get_cursor_mode() before checking plane state.
[Dec17 17:14] Oops: divide error: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000018] CPU: 5 PID: 1660 Comm: surface-DP-1 Not tainted 6.10.0+ #231
[ +0.000007] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024
[ +0.000004] RIP: 0010:dm_get_plane_scale+0x3f/0x60 [amdgpu]
[ +0.000553] Code: 44 0f b7 41 3a 44 0f b7 49 3e 83 e0 0f 48 0f a3 c2 73 21 69 41 28 e8 03 00 00 31 d2 41 f7 f1 31 d2 89 06 69 41 2c e8 03 00 00 <41> f7 f0 89 07 e9 d7 d8 7e e9 44 89 c8 45 89 c1 41 89 c0 eb d4 66
[ +0.000005] RSP: 0018:ffffa8df0de6b8a0 EFLAGS: 00010246
[ +0.000006] RAX: 00000000000003e8 RBX: ffff9ac65c1f6e00 RCX: ffff9ac65d055500
[ +0.000003] RDX: 0000000000000000 RSI: ffffa8df0de6b8b0 RDI: ffffa8df0de6b8b4
[ +0.000004] RBP: ffff9ac64e7a5800 R08: 0000000000000000 R09: 0000000000000a00
[ +0.000003] R10: 00000000000000ff R11: 0000000000000054 R12: ffff9ac6d0700010
[ +0.000003] R13: ffff9ac65d054f00 R14: ffff9ac65d055500 R15: ffff9ac64e7a60a0
[ +0.000004] FS: 00007f869ea00640(0000) GS:ffff9ac970080000(0000) knlGS:0000000000000000
[ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000003] CR2: 000055ca701becd0 CR3: 000000010e7f2000 CR4: 0000000000350ef0
[ +0.000004] Call Trace:
[ +0.000007] <TASK>
[ +0.000006] ? __die_body.cold+0x19/0x27
[ +0.000009] ? die+0x2e/0x50
[ +0.000007] ? do_trap+0xca/0x110
[ +0.000007] ? do_error_trap+0x6a/0x90
[ +0.000006] ? dm_get_plane_scale+0x3f/0x60 [amdgpu]
[ +0.000504] ? exc_divide_error+0x38/0x50
[ +0.000005] ? dm_get_plane_scale+0x3f/0x60 [amdgpu]
[ +0.000488] ? asm_exc_divide_error+0x1a/0x20
[ +0.000011] ? dm_get_plane_scale+0x3f/0x60 [amdgpu]
[ +0.000593] dm_crtc_get_cursor_mode+0x33f/0x430 [amdgpu]
[ +0.000562] amdgpu_dm_atomic_check+0x2ef/0x1770 [amdgpu]
[ +0.000501] drm_atomic_check_only+0x5e1/0xa30 [drm]
[ +0.000047] drm_mode_atomic_ioctl+0x832/0xcb0 [drm]
[ +0.000050] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [drm]
[ +0.000047] drm_ioctl_kernel+0xb3/0x100 [drm]
[ +0.000062] drm_ioctl+0x27a/0x4f0 [drm]
[ +0.000049] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10 [drm]
[ +0.000055] amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[ +0.000360] __x64_sys_ioctl+0x97/0xd0
[ +0.000010] do_syscall_64+0x82/0x190
[ +0.000008] ? __pfx_drm_mode_createblob_ioctl+0x10/0x10 [drm]
[ +0.000044] ? srso_return_thunk+0x5/0x5f
[ +0.000006] ? drm_ioctl_kernel+0xb3/0x100 [drm]
[ +0.000040] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? __check_object_size+0x50/0x220
[ +0.000007] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? drm_ioctl+0x2a4/0x4f0 [drm]
[ +0.000039] ? __pfx_drm_mode_createblob_ioctl+0x10/0x10 [drm]
[ +0.000043] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? __pm_runtime_suspend+0x69/0xc0
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? amdgpu_drm_ioctl+0x71/0x90 [amdgpu]
[ +0.000366] ? srso_return_thunk+0x5/0x5f
[ +0.000006] ? syscall_exit_to_user_mode+0x77/0x210
[ +0.000007] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? do_syscall_64+0x8e/0x190
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000006] ? do_syscall_64+0x8e/0x190
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000008] RIP: 0033:0x55bb7cd962bc
[ +0.000007] Code: 4c 89 6c 24 18 4c 89 64 24 20 4c 89 74 24 28 0f 57 c0 0f 11 44 24 30 89 c7 48 8d 54 24 08 b8 10 00 00 00 be bc 64 38 c0 0f 05 <49> 89 c7 48 83 3b 00 74 09 4c 89 c7 ff 15 62 64 99 00 48 83 7b 18
[ +0.000005] RSP: 002b:00007f869e9f4da0 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
[ +0.000007] RAX: ffffffffffffffda RBX: 00007f869e9f4fb8 RCX: 000055bb7cd962bc
[ +0.000004] RDX: 00007f869e9f4da8 RSI: 00000000c03864bc RDI: 000000000000003b
[ +0.000003] RBP: 000055bb9ddcbcc0 R08: 00007f86541b9920 R09: 0000000000000009
[ +0.000004] R10: 0000000000000004 R11: 0000000000000217 R12: 00007f865406c6b0
[ +0.000003] R13: 00007f86541b5290 R14: 00007f865410b700 R15: 000055bb9ddcbc18
[ +0.000009] </TASK>
Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3729
Reported-by: Fabio Scaccabarozzi <fsvm88@gmail.com>
Co-developed-by: Fabio Scaccabarozzi <fsvm88@gmail.com>
Signed-off-by: Fabio Scaccabarozzi <fsvm88@gmail.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ab75a0d2e07942ae15d32c0a5092fd336451378c)
Cc: stable@vger.kernel.org
|
|
As the hw supports up to 4 surfaces, increase the maximum number of
surfaces to prevent the DC error when trying to use more than three
planes.
[drm:dc_state_add_plane [amdgpu]] *ERROR* Surface: can not attach plane_state 000000003e2cb82c! Maximum is: 3
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3693
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b8d6daffc871a42026c3c20bff7b8fa0302298c1)
Cc: stable@vger.kernel.org
|
|
DC driver is using two different values to define the maximum number of
surfaces: MAX_SURFACES and MAX_SURFACE_NUM. Consolidate MAX_SURFACES as
the unique definition for surface updates across DC.
It fixes page fault faced by Cosmic users on AMD display versions that
support two overlay planes, since the introduction of cursor overlay
mode.
[Nov26 21:33] BUG: unable to handle page fault for address: 0000000051d0f08b
[ +0.000015] #PF: supervisor read access in kernel mode
[ +0.000006] #PF: error_code(0x0000) - not-present page
[ +0.000005] PGD 0 P4D 0
[ +0.000007] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000006] CPU: 4 PID: 71 Comm: kworker/u32:6 Not tainted 6.10.0+ #300
[ +0.000006] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0131 01/30/2024
[ +0.000007] Workqueue: events_unbound commit_work [drm_kms_helper]
[ +0.000040] RIP: 0010:copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu]
[ +0.000847] Code: 8b 10 49 89 94 24 f8 00 00 00 48 8b 50 08 49 89 94 24 00 01 00 00 8b 40 10 41 89 84 24 08 01 00 00 49 8b 45 78 48 85 c0 74 0b <0f> b6 00 41 88 84 24 90 64 00 00 49 8b 45 60 48 85 c0 74 3b 48 8b
[ +0.000010] RSP: 0018:ffffc203802f79a0 EFLAGS: 00010206
[ +0.000009] RAX: 0000000051d0f08b RBX: 0000000000000004 RCX: ffff9f964f0a8070
[ +0.000004] RDX: ffff9f9710f90e40 RSI: ffff9f96600c8000 RDI: ffff9f964f000000
[ +0.000004] RBP: ffffc203802f79f8 R08: 0000000000000000 R09: 0000000000000000
[ +0.000005] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9f96600c8000
[ +0.000004] R13: ffff9f9710f90e40 R14: ffff9f964f000000 R15: ffff9f96600c8000
[ +0.000004] FS: 0000000000000000(0000) GS:ffff9f9970000000(0000) knlGS:0000000000000000
[ +0.000005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000005] CR2: 0000000051d0f08b CR3: 00000002e6a20000 CR4: 0000000000350ef0
[ +0.000005] Call Trace:
[ +0.000011] <TASK>
[ +0.000010] ? __die_body.cold+0x19/0x27
[ +0.000012] ? page_fault_oops+0x15a/0x2d0
[ +0.000014] ? exc_page_fault+0x7e/0x180
[ +0.000009] ? asm_exc_page_fault+0x26/0x30
[ +0.000013] ? copy_stream_update_to_stream.isra.0+0x30d/0x750 [amdgpu]
[ +0.000739] ? dc_commit_state_no_check+0xd6c/0xe70 [amdgpu]
[ +0.000470] update_planes_and_stream_state+0x49b/0x4f0 [amdgpu]
[ +0.000450] ? srso_return_thunk+0x5/0x5f
[ +0.000009] ? commit_minimal_transition_state+0x239/0x3d0 [amdgpu]
[ +0.000446] update_planes_and_stream_v2+0x24a/0x590 [amdgpu]
[ +0.000464] ? srso_return_thunk+0x5/0x5f
[ +0.000009] ? sort+0x31/0x50
[ +0.000007] ? amdgpu_dm_atomic_commit_tail+0x159f/0x3a30 [amdgpu]
[ +0.000508] ? srso_return_thunk+0x5/0x5f
[ +0.000009] ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu]
[ +0.000377] ? srso_return_thunk+0x5/0x5f
[ +0.000009] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x160/0x390 [drm]
[ +0.000058] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? dma_fence_default_wait+0x8c/0x260
[ +0.000010] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? wait_for_completion_timeout+0x13b/0x170
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000005] ? dma_fence_wait_timeout+0x108/0x140
[ +0.000010] ? commit_tail+0x94/0x130 [drm_kms_helper]
[ +0.000024] ? process_one_work+0x177/0x330
[ +0.000008] ? worker_thread+0x266/0x3a0
[ +0.000006] ? __pfx_worker_thread+0x10/0x10
[ +0.000004] ? kthread+0xd2/0x100
[ +0.000006] ? __pfx_kthread+0x10/0x10
[ +0.000006] ? ret_from_fork+0x34/0x50
[ +0.000004] ? __pfx_kthread+0x10/0x10
[ +0.000005] ? ret_from_fork_asm+0x1a/0x30
[ +0.000011] </TASK>
Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode")
Suggested-by: Leo Li <sunpeng.li@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3693
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1c86c81a86c60f9b15d3e3f43af0363cf56063e7)
Cc: stable@vger.kernel.org
|
|
[WHY & HOW]
commit 7fb363c57522 ("drm/amd/display: Let drm_crtc_vblank_on/off manage interrupts")
lets drm_crtc_vblank_* to manage interrupts in amdgpu_dm_crtc_set_vblank,
and amdgpu_irq_get/put do not need to be called here. Part of that
patch got lost somehow, so fix it up.
Fixes: 7fb363c57522 ("drm/amd/display: Let drm_crtc_vblank_on/off manage interrupts")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3782305ce5807c18fbf092124b9e8303cf1723ae)
Cc: stable@vger.kernel.org
|
|
Pull vfio fix from Alex Williamson:
- Fix a missed order alignment requirement of the pfn when inserting
mappings through the new huge fault handler introduced in v6.12 (Alex
Williamson)
* tag 'vfio-v6.13-rc7' of https://github.com/awilliam/linux-vfio:
vfio/pci: Fallback huge faults for unaligned pfn
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A randconfig build fix and a performance fix:
- Fix the CONFIG_RESET_CONTROLLER=n path signature of
clk_imx8mp_audiomix_reset_controller_register() to appease
randconfig
- Speed up the sdhci clk on TH1520 by a factor of 4 by adding
a fixed factor clk"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: clk-imx8mp-audiomix: fix function signature
clk: thead: Fix TH1520 emmc and shdci clock rate
|
|
The existing SW-FW interaction flow on the driver is wrong. Follow this
wrong flow, driver would never return error if there is a unknown command.
Since firmware writes back 'firmware ready' and 'unknown command' in the
mailbox message if there is an unknown command sent by driver. So reading
'firmware ready' does not timeout. Then driver would mistakenly believe
that the interaction has completed successfully.
It tends to happen with the use of custom firmware. Move the check for
'unknown command' out of the poll timeout for 'firmware ready'. And adjust
the debug log so that mailbox messages are always printed when commands
timeout.
Fixes: 1efa9bfe58c5 ("net: libwx: Implement interaction with firmware")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250103081013.1995939-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2025-01-03
Keisuke Nishimura provided a fix to check for kfifo_alloc() in the ca8210
driver.
Lizhi Xu provided a fix a corrupted list, found by syzkaller, by checking local
interfaces first.
* tag 'ieee802154-for-net-2025-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
mac802154: check local interfaces before deleting sdata list
ieee802154: ca8210: Add missing check for kfifo_alloc() in ca8210_probe()
====================
Link: https://patch.msgid.link/20250103160046.469363-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fix from Wim Van Sebroeck:
- fix error message during stm32 driver probe
* tag 'linux-watchdog-6.13-rc6' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: stm32_iwdg: fix error message during driver probe
|
|
The struct device_node *next pointer is not initialized, and it is
used in an error path in which it may have never been modified by
function mtk_drm_of_get_ddp_ep_cid().
Since the error path is relying on that pointer being NULL for the
OVL Adaptor and/or invalid component check and since said pointer
is being used in prints for %pOF, in the case that it points to a
bogus address, the print may cause a KP.
To resolve that, initialize the *next pointer to NULL before usage.
Fixes: 4c932840db1d ("drm/mediatek: Implement OF graphs support for display paths")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/633f3c6d-d09f-447c-95f1-dfb4114c50e6@stanley.mountain/
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241112105030.93337-1-angelogioacchino.delregno@collabora.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
Check the return value of drm_dp_dpcd_readb() to confirm that
AUX communication is successful. To simplify the code, replace
drm_dp_dpcd_readb() and DP_GET_SINK_COUNT() with drm_dp_read_sink_count().
Fixes: f70ac097a2cf ("drm/mediatek: Add MT8195 Embedded DisplayPort driver")
Signed-off-by: Liankun Yang <liankun.yang@mediatek.com>
Reviewed-by: Guillaume Ranquet <granquet@baylibre.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20241218113448.2992-1-liankun.yang@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
|
|
Pull block fixes from Jens Axboe:
"Collection of fixes for block. Particularly the target name overflow
has been a bit annoying, as it results in overwriting random memory
and hence shows up as triggering various other bugs.
- NVMe pull request via Keith:
- Fix device specific quirk for PRP list alignment (Robert)
- Fix target name overflow (Leo)
- Fix target write granularity (Luis)
- Fix target sleeping in atomic context (Nilay)
- Remove unnecessary tcp queue teardown (Chunguang)
- Simple cdrom typo fix"
* tag 'block-6.13-20250103' of git://git.kernel.dk/linux:
cdrom: Fix typo, 'devicen' to 'device'
nvme-tcp: remove nvme_tcp_destroy_io_queues()
nvmet-loop: avoid using mutex in IO hotpath
nvmet: propagate npwg topology
nvmet: Don't overflow subsysnqn
nvme-pci: 512 byte aligned dma pool segment quirk
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireles and netfilter.
Nothing major here. Over the last two weeks we gathered only around
two-thirds of our normal weekly fix count, but delaying sending these
until -rc7 seemed like a really bad idea.
AFAIK we have no bugs under investigation. One or two reverts for
stuff for which we haven't gotten a proper fix will likely come in the
next PR.
Current release - fix to a fix:
- netfilter: nft_set_hash: unaligned atomic read on struct
nft_set_ext
- eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
Previous releases - regressions:
- net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
- mptcp:
- fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust
- fix TCP options overflow
- prevent excessive coalescing on receive, fix throughput
- net: fix memory leak in tcp_conn_request() if map insertion fails
- wifi: cw1200: fix potential NULL dereference after conversion to
GPIO descriptors
- phy: micrel: dynamically control external clock of KSZ PHY, fix
suspend behavior
Previous releases - always broken:
- af_packet: fix VLAN handling with MSG_PEEK
- net: restrict SO_REUSEPORT to inet sockets
- netdev-genl: avoid empty messages in NAPI get
- dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X
- eth:
- gve: XDP fixes around transmit, queue wakeup etc.
- ti: icssg-prueth: fix firmware load sequence to prevent time
jump which breaks timesync related operations
Misc:
- netlink: specs: mptcp: add missing attr and improve documentation"
* tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
net: ti: icssg-prueth: Fix firmware load sequence.
mptcp: prevent excessive coalescing on receive
mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
mptcp: fix recvbuffer adjust on sleeping rcvmsg
ila: serialize calls to nf_register_net_hooks()
af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
af_packet: fix vlan_get_tci() vs MSG_PEEK
net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init()
net: restrict SO_REUSEPORT to inet sockets
net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
net: sfc: Correct key_len for efx_tc_ct_zone_ht_params
net: wwan: t7xx: Fix FSM command timeout issue
sky2: Add device ID 11ab:4373 for Marvell 88E8075
mptcp: fix TCP options overflow.
net: mv643xx_eth: fix an OF node reference leak
gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
eth: bcmsysport: fix call balance of priv->clk handling routines
net: llc: reset skb->transport_header
netlink: specs: mptcp: fix missing doc
...
|
|
Pull rdma fixes from Jason Gunthorpe:
"A lot of fixes accumulated over the holiday break:
- Static tool fixes, value is already proven to be NULL, possible
integer overflow
- Many bnxt_re fixes:
- Crashes due to a mismatch in the maximum SGE list size
- Don't waste memory for user QPs by creating kernel-only
structures
- Fix compatability issues with older HW in some of the new HW
features recently introduced: RTS->RTS feature, work around 9096
- Do not allow destroy_qp to fail
- Validate QP MTU against device limits
- Add missing validation on madatory QP attributes for RTR->RTS
- Report port_num in query_qp as required by the spec
- Fix creation of QPs of the maximum queue size, and in the
variable mode
- Allow all QPs to be used on newer HW by limiting a work around
only to HW it affects
- Use the correct MSN table size for variable mode QPs
- Add missing locking in create_qp() accessing the qp_tbl
- Form WQE buffers correctly when some of the buffers are 0 hop
- Don't crash on QP destroy if the userspace doesn't setup the
dip_ctx
- Add the missing QP flush handler call on the DWQE path to avoid
hanging on error recovery
- Consistently use ENXIO for return codes if the devices is
fatally errored
- Try again to fix VLAN support on iwarp, previous fix was reverted
due to breaking other cards
- Correct error path return code for rdma netlink events
- Remove the seperate net_device pointer in siw and rxe which
syzkaller found a way to UAF
- Fix a UAF of a stack ib_sge in rtrs
- Fix a regression where old mlx5 devices and FW were wrongly
activing new device features and failing"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
RDMA/mlx5: Enable multiplane mode only when it is supported
RDMA/bnxt_re: Fix error recovery sequence
RDMA/rtrs: Ensure 'ib_sge list' is accessible
RDMA/rxe: Remove the direct link to net_device
RDMA/hns: Fix missing flush CQE for DWQE
RDMA/hns: Fix warning storm caused by invalid input in IO path
RDMA/hns: Fix accessing invalid dip_ctx during destroying QP
RDMA/hns: Fix mapping error of zero-hop WQE buffer
RDMA/bnxt_re: Fix the locking while accessing the QP table
RDMA/bnxt_re: Fix MSN table size for variable wqe mode
RDMA/bnxt_re: Add send queue size check for variable wqe
RDMA/bnxt_re: Disable use of reserved wqes
RDMA/bnxt_re: Fix max_qp_wrs reported
RDMA/siw: Remove direct link to net_device
RDMA/nldev: Set error code in rdma_nl_notify_event
RDMA/bnxt_re: Fix reporting hw_ver in query_device
RDMA/bnxt_re: Fix to export port num to ib_query_qp
RDMA/bnxt_re: Fix setting mandatory attributes for modify_qp
RDMA/bnxt_re: Add check for path mtu in modify_qp
RDMA/bnxt_re: Fix the check for 9060 condition
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- A small Kconfig fixup for the i.MX.
In principle this could come in from the SoC tree but the bug was
introduced from the pin control tree so let's fix it from here.
- Fix a sleep in atomic context in the MCP23xxx GPIO expander by
disabling the regmap locking and using explicit mutex locks.
* tag 'pinctrl-v6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mcp23s08: Fix sleeping in atomic context due to regmap locking
ARM: imx: Re-introduce the PINCTRL selection
|
|
Pull drm fixes from Dave Airlie:
"Happy New Year.
It was fairly quiet for holidays period, certainly nothing that worth
getting off the couch before I needed to, this is for the past two
weeks, i915, xe and some adv7511, I expect we will see some amdgpu etc
happening next week, but otherwise all quiet.
i915:
- Fix C10 pll programming sequence [cx0_phy]
- Fix power gate sequence. [dg1]
xe:
- uapi: Revert some devcoredump file format changes breaking a mesa
debug tool
- Fixes around waits when moving to system
- Fix a typo when checking for LMEM provisioning
- Fix a fault on fd close after unbind
- A couple of OA fixes squashed for stable backporting
adv7511:
- fix UAF
- drop single lane support
- audio infoframe fix"
* tag 'drm-fixes-2025-01-03' of https://gitlab.freedesktop.org/drm/kernel:
xe/oa: Fix query mode of operation for OAR/OAC
drm/i915/dg1: Fix power gate sequence.
drm/i915/cx0_phy: Fix C10 pll programming sequence
drm/xe: Fix fault on fd close after unbind
drm/xe/pf: Use correct function to check LMEM provisioning
drm/xe: Wait for migration job before unmapping pages
drm/xe: Use non-interruptible wait when moving BO to system
drm/xe: Revert some changes that break a mesa debug tool
drm: adv7511: Drop dsi single lane support
dt-bindings: display: adi,adv7533: Drop single lane support
drm: adv7511: Fix use-after-free in adv7533_attach_dsi()
drm/bridge: adv7511_audio: Update Audio InfoFrame properly
|
|
Once a sim device is instantiated and actively used, allowing rmdir for
its configfs serves no purpose and can be confusing. Effectively,
arbitrary users start depending on its existence.
Make the subsystem itself depend on the configfs entry for a sim device
while it is in active use.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250103141829.430662-5-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Once a virtuser device is instantiated and actively used, allowing rmdir
for its configfs serves no purpose and can be confusing. Userspace
interacts with the virtual consumer at arbitrary times, meaning it
depends on its existence.
Make the subsystem itself depend on the configfs entry for a virtuser
device while it is in active use.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250103141829.430662-4-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Creating a virtuser device via configfs with multiple conn_ids fails due
to incorrect indexing of lookup entries. Correct the indexing logic to
ensure proper functionality when multiple gpio_virtuser_lookup are
created.
Fixes: 91581c4b3f29 ("gpio: virtuser: new virtual testing driver for the GPIO API")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250103141829.430662-3-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
When a virtuser device is created via configfs and the probe fails due
to an incorrect lookup table, the table is not removed. This prevents
subsequent probe attempts from succeeding, even if the issue is
corrected, unless the device is released. Additionally, cleanup is also
needed in the less likely case of platform_device_register_full()
failure.
Besides, a consistent memory leak in lookup_table->dev_id was spotted
using kmemleak by toggling the live state between 0 and 1 with a correct
lookup table.
Introduce gpio_virtuser_remove_lookup_table() as the counterpart to the
existing gpio_virtuser_make_lookup_table() and call it from all
necessary points to ensure proper cleanup.
Fixes: 91581c4b3f29 ("gpio: virtuser: new virtual testing driver for the GPIO API")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Link: https://lore.kernel.org/r/20250103141829.430662-2-koichiro.den@canonical.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Caching RS parity bytes is already done in fec_decode_bufs() now,
no need to use yet another buffer for conversion to uint16_t.
This patch removes that double copy of RS parity bytes.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
This patch fixes an issue that was fixed in the commit
df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size")
but later broken again in the commit
8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
If the Reed-Solomon roots setting spans multiple blocks, the code does not
use proper parity bytes and randomly fails to repair even trivial errors.
This bug cannot happen if the sector size is multiple of RS roots
setting (Android case with roots 2).
The previous solution was to find a dm-bufio block size that is multiple
of the device sector size and roots size. Unfortunately, the optimization
in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
is incorrect and uses data block size for some roots (for example, it uses
4096 block size for roots = 20).
This patch uses a different approach:
- It always uses a configured data block size for dm-bufio to avoid
possible misaligned IOs.
- and it caches the processed parity bytes, so it can join it
if it spans two blocks.
As the RS calculation is called only if an error is detected and
the process is computationally intensive, copying a few more bytes
should not introduce performance issues.
The issue was reported to cryptsetup with trivial reproducer
https://gitlab.com/cryptsetup/cryptsetup/-/issues/923
Reproducer (with roots=20):
# create verity device with RS FEC
dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \
awk '/^Root hash/{ print $3 }' >roothash
# create an erasure that should always be repairable with this roots setting
dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none
# try to read it through dm-verity
veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash)
dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
Even now the log says it cannot repair it:
: verity-fec: 7:1: FEC 0: failed to correct: -74
: device-mapper: verity: 7:1: data block 0 is corrupted
...
With this fix, errors are properly repaired.
: verity-fec: 7:1: FEC 0: corrected 4 errors
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
The PFN must also be aligned to the fault order to insert a huge
pfnmap. Test the alignment and fallback when unaligned.
Fixes: f9e54c3a2f5b ("vfio/pci: implement huge_fault support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219619
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
Reported-by: Precific <precification@posteo.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Precific <precification@posteo.de>
Link: https://lore.kernel.org/r/20250102183416.1841878-1-alex.williamson@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|