summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-29Drivers: hv: vmbus: Prevent load re-ordering when reading ring bufferMichael Kelley
When reading a packet from a host-to-guest ring buffer, there is no memory barrier between reading the write index (to see if there is a packet to read) and reading the contents of the packet. The Hyper-V host uses store-release when updating the write index to ensure that writes of the packet data are completed first. On the guest side, the processor can reorder and read the packet data before the write index, and sometimes get stale packet data. Getting such stale packet data has been observed in a reproducible case in a VM on ARM64. Fix this by using virt_load_acquire() to read the write index, ensuring that reads of the packet data cannot be reordered before it. Preventing such reordering is logically correct, and with this change, getting stale data can no longer be reproduced. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/1648394710-33480-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-03-29docs: gpu: i915.rst: Fix DRRS documentationJosé Roberto de Souza
intel_drrs_enable() and intel_drrs_disable() were renamed to intel_drrs_activate() and intel_drrs_deactivate() in commit 54903c7a6b40 ("drm/i915: s/enable/active/ for DRRS") and it is causing warnings when generating the kernel documentation. But as for a while DRRS has its own file, so here just let the tool generate the documentation for all exported and documented functions in intel_drrs.c. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220325183832.146472-1-jose.souza@intel.com
2022-03-29Merge tag 'nvme-5.18-2022-03-29' of git://git.infradead.org/nvme into ↵Jens Axboe
for-5.18/drivers Pull NVMe fixes from Christoph: "- fix multipath hang when disk goes live over reconnect (Anton Eidelman) - fix RCU hole that allowed for endless looping in multipath round robin (Chris Leech) - remove redundant assignment after left shift (Colin Ian King) - add quirks for Samsung X5 SSDs (Monish Kumar R) - fix the read-only state for zoned namespaces with unsupposed features (Pankaj Raghav) - use a private workqueue instead of the system workqueue in nvmet (Sagi Grimberg) - allow duplicate NSIDs for private namespaces (Sungup Moon) - expose use_threaded_interrupts read-only in sysfs (Xin Hao)" * tag 'nvme-5.18-2022-03-29' of git://git.infradead.org/nvme: nvme-multipath: fix hang when disk goes live over reconnect nvme: fix RCU hole that allowed for endless looping in multipath round robin nvme: allow duplicate NSIDs for private namespaces nvmet: remove redundant assignment after left shift nvmet: use a private workqueue instead of the system workqueue nvme-pci: add quirks for Samsung X5 SSDs nvme-pci: expose use_threaded_interrupts read-only in sysfs nvme: fix the read-only state for zoned namespaces with unsupposed features
2022-03-29PCI: hv: Propagate coherence from VMbus device to PCI deviceMichael Kelley
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus device and as a PCI device. The coherence of the VMbus device is set based on the VMbus node in ACPI, but the PCI device has no ACPI node and defaults to not hardware coherent. This results in extra software coherence management overhead on ARM64 when devices are hardware coherent. Fix this by setting up the PCI host bus so that normal PCI mechanisms will propagate the coherence of the VMbus device to the PCI device. There's no effect on x86/x64 where devices are always hardware coherent. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Boqun Feng <boqun.feng@gmail.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/1648138492-2191-3-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-03-29Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus deviceMichael Kelley
VMbus synthetic devices are not represented in the ACPI DSDT -- only the top level VMbus device is represented. As a result, on ARM64 coherence information in the _CCA method is not specified for synthetic devices, so they default to not hardware coherent. Drivers for some of these synthetic devices have been recently updated to use the standard DMA APIs, and they are incurring extra overhead of unneeded software coherence management. Fix this by propagating coherence information from the VMbus node in ACPI to the individual synthetic devices. There's no effect on x86/x64 where devices are always hardware coherent. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/1648138492-2191-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-03-29Drivers: hv: vmbus: Fix potential crash on module unloadGuilherme G. Piccoli
The vmbus driver relies on the panic notifier infrastructure to perform some operations when a panic event is detected. Since vmbus can be built as module, it is required that the driver handles both registering and unregistering such panic notifier callback. After commit 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") though, the panic notifier registration is done unconditionally in the module initialization routine whereas the unregistering procedure is conditionally guarded and executes only if HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE capability is set. This patch fixes that by unconditionally unregistering the panic notifier in the module's exit routine as well. Fixes: 74347a99e73a ("x86/Hyper-V: Unload vmbus channel in hv panic callback") Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220315203535.682306-1-gpiccoli@igalia.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-03-29Drivers: hv: vmbus: Fix initialization of device object in ↵Andrea Parri (Microsoft)
vmbus_device_register() Initialize the device's dma_{mask,parms} pointers and the device's dma_mask value before invoking device_register(). Address the following trace with 5.17-rc7: [ 49.646839] WARNING: CPU: 0 PID: 189 at include/linux/dma-mapping.h:543 netvsc_probe+0x37a/0x3a0 [hv_netvsc] [ 49.646928] Call Trace: [ 49.646930] <TASK> [ 49.646935] vmbus_probe+0x40/0x60 [hv_vmbus] [ 49.646942] really_probe+0x1ce/0x3b0 [ 49.646948] __driver_probe_device+0x109/0x180 [ 49.646952] driver_probe_device+0x23/0xa0 [ 49.646955] __device_attach_driver+0x76/0xe0 [ 49.646958] ? driver_allows_async_probing+0x50/0x50 [ 49.646961] bus_for_each_drv+0x84/0xd0 [ 49.646964] __device_attach+0xed/0x170 [ 49.646967] device_initial_probe+0x13/0x20 [ 49.646970] bus_probe_device+0x8f/0xa0 [ 49.646973] device_add+0x41a/0x8e0 [ 49.646975] ? hrtimer_init+0x28/0x80 [ 49.646981] device_register+0x1b/0x20 [ 49.646983] vmbus_device_register+0x5e/0xf0 [hv_vmbus] [ 49.646991] vmbus_add_channel_work+0x12d/0x190 [hv_vmbus] [ 49.646999] process_one_work+0x21d/0x3f0 [ 49.647002] worker_thread+0x4a/0x3b0 [ 49.647005] ? process_one_work+0x3f0/0x3f0 [ 49.647007] kthread+0xff/0x130 [ 49.647011] ? kthread_complete_and_exit+0x20/0x20 [ 49.647015] ret_from_fork+0x22/0x30 [ 49.647020] </TASK> [ 49.647021] ---[ end trace 0000000000000000 ]--- Fixes: 743b237c3a7b0 ("scsi: storvsc: Add Isolation VM support for storvsc driver") Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220315141053.3223-1-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-03-29Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in ↵Andrea Parri (Microsoft)
isolated guests hv_panic_page might contain guest-sensitive information, do not dump it over to Hyper-V by default in isolated guests. While at it, update some comments in hyperv_{panic,die}_event(). Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/20220301141135.2232-1-parri.andrea@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-03-29drm/edid: split drm_add_edid_modes() to twoJani Nikula
Reduce the size of the function that actually modifies the EDID. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/437c3c79f68d1144444fb2dd18a678f3aa97272c.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: add more general struct edid constness in the interfacesJani Nikula
With this, the remaining non-const parts are the ones that actually modify the EDID, for example to fix corrupt EDID. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/c7156f22494b585c55a00a6732462bde0cc19dbf.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: constify struct edid passed around in callbacks and closureJani Nikula
Finalize detailed timing parsing constness by making struct edid also const in callbacks and closure. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/8229be49b5a7686856c17428aa9291b4c4cf1bd5.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: constify struct edid passed to detailed blocksJani Nikula
Constify the first level of struct edid in detailed timing parsing. Also switch to struct edid instead of u8. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/ec7acb8a9ee0e2868bbb2afdfdd7db114ae180e1.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: constify struct detailed_timing in parsing callbacksJani Nikula
Moving one level higher, constify struct detailed_timing pointers in callbacks. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/9b617068d2349a574a837ad6207b1d45c4d79eb5.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: constify struct detailed_timing in lower level parsingJani Nikula
Start constifying the struct detailed_timing pointers being passed around from bottom up. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/0b7fafcc7784db0003e454544916c273a9eb1250.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: use struct detailed_timing member access in gtf2 functionsJani Nikula
Use struct detailed_timing member access instead of direct offsets to avoid casting. Use BUILD_BUG_ON() for sanity check. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/9fe5f5c39039e585fecfffb390297d49262e5fd3.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: use struct detailed_timing member access in is_rb()Jani Nikula
Use struct detailed_timing member access instead of direct offsets to avoid casting. Use BUILD_BUG_ON() for sanity check. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/c069669c2fe8f9c3061c7d1a413c75a33ec48813.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: pass a timing pointer to is_detailed_timing_descriptor()Jani Nikula
Use struct member access instead of direct offsets to avoid a cast. Use BUILD_BUG_ON() for sanity check. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/0b5213383e14f11c6a505b10a7342fb2ff4f2a11.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: pass a timing pointer to is_display_descriptor()Jani Nikula
Use struct member access instead of direct offsets to avoid lots of casts all over the place. Use BUILD_BUG_ON() for sanity check. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/ccc54b45ea628874c0290dd64114da6cefff1819.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: fix reduced blanking support checkJani Nikula
The reduced blanking bit is valid only for CVT, indicated by display range limits flags 0x04. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/5dea5ee24065450716bbc177dd6850d3193dbeec.1648477901.git.jani.nikula@intel.com
2022-03-29drm/edid: don't modify EDID while parsingJani Nikula
We'll want to keep the EDID immutable while parsing. Stop modifying the EDID because of the quirks. In theory, this does have userspace implications, but the userspace is supposed to use the modes exposed via KMS API, not by parsing the EDID directly. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/45d5cf067eaad49b321ac82836090d9de524374e.1648477901.git.jani.nikula@intel.com
2022-03-29drm/i915: Add a DP1.2 compatible way to read LTTPR capabilitiesImre Deak
At least some DELL monitors (P2715Q) with DPCD_REV 1.2 return corrupted DPCD register values when reading from the 0xF0000- LTTPR range with an AUX transaction block size bigger than 1. The DP standard requires 0 to be returned - as for any other reserved/invalid addresses - but these monitors return the DPCD_REV register value repeated in each byte of the read buffer. This will in turn corrupt the values returned by the LTTPRs between the source and the monitor: LTTPRs must adjust the values they read from the downstream DPRX, for instance right-shift/init the downstream DP_PHY_REPEATER_CNT value. Since the value returned by the monitor's DPRX is non-zero the adjusted values will be corrupt. Reading the LTTPR registers one-by-one instead of reading all of them with a single AUX transfer works around the issue. According to the DP standard's 0xF0000 register description: "LTTPR-related registers at DPCD Addresses F0000h through F02FFh are valid only for DPCD r1.4 (or higher)." While it's unclear if DPCD r1.4 refers to the DPCD_REV or to the LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV register (tickets filed at the VESA site to clarify this haven't been addressed), one possibility is that it's a restriction due to non-compliant monitors described above. Disabling the non-transparent LTTPR mode for all such monitors is not a viable solution: the transparent LTTPR mode has its own issue causing link training failures and this would affect a lot of monitors in use with DPCD_REV < 1.4. Instead this patch works around the problem by reading the LTTPR common and PHY cap registers one-by-one for any monitor with a DPCD_REV < 1.4. The standard requires the DPCD capabilities to be read after the LTTPR common capabilities are read, so re-read the DPCD capabilities after the LTTPR common and PHY caps were read out. v2: - Use for instead of a while loop. (Ville) - Add to code comment the monitor model with the problem. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4531 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220322143844.42616-1-imre.deak@intel.com
2022-03-29tilcdc: tilcdc_external: fix an incorrect NULL check on list iteratorXiaomeng Tong
The bug is here: if (!encoder) { The list iterator value 'encoder' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, use a new variable 'iter' as the list iterator, while use the original variable 'encoder' as a dedicated pointer to point to the found element. Cc: stable@vger.kernel.org Fixes: ec9eab097a500 ("drm/tilcdc: Add drm bridge support for attaching drm bridge drivers") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Reviewed-by: Jyri Sarha <jyri.sarha@iki.fi> Tested-by: Jyri Sarha <jyri.sarha@iki.fi> Signed-off-by: Jyri Sarha <jyri.sarha@iki.fi> Link: https://patchwork.freedesktop.org/patch/msgid/20220327061516.5076-1-xiam0nd.tong@gmail.com
2022-03-29gma500: fix an incorrect NULL check on list iteratorXiaomeng Tong
The bug is here: return crtc; The list iterator value 'crtc' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, return 'crtc' when found, otherwise return NULL. Cc: stable@vger.kernel.org fixes: 89c78134cc54d ("gma500: Add Poulsbo support") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220327052028.2013-1-xiam0nd.tong@gmail.com
2022-03-29drm/amdgpu: drop amdgpu_gtt_nodeChristian König
We have the BO pointer in the base structure now as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Link: https://patchwork.freedesktop.org/patch/msgid/20220321132601.2161-6-christian.koenig@amd.com
2022-03-29drm/ttm: rework bulk move handling v5Christian König
Instead of providing the bulk move structure for each LRU update set this as property of the BO. This should avoid costly bulk move rebuilds with some games under RADV. v2: some name polishing, add a few more kerneldoc words. v3: add some lockdep v4: fix bugs, handle pin/unpin as well v5: improve kerneldoc Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220321132601.2161-5-christian.koenig@amd.com
2022-03-29drm/ttm: de-inline ttm_bo_pin/unpinChristian König
Those functions are going to become more complex, don't inline them any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220321132601.2161-4-christian.koenig@amd.com
2022-03-29net: lan966x: fix kernel oops on ioctl when I/F is downMichael Walle
ioctls handled by phy_mii_ioctl() will cause a kernel oops when the interface is down. Fix it by making sure there is a PHY attached. Fixes: 735fec995b21 ("net: lan966x: Implement SIOCSHWTSTAMP and SIOCGHWTSTAMP") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220328220350.3118969-1-michael@walle.cc Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29Merge branch 'fix-uaf-bugs-caused-by-ax25_release'Paolo Abeni
Duoming Zhou says: ==================== Fix UAF bugs caused by ax25_release() The first patch fixes UAF bugs in ax25_send_control, and the second patch fixes UAF bugs in ax25 timers. ==================== Link: https://lore.kernel.org/r/cover.1648472006.git.duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29ax25: Fix UAF bugs in ax25 timersDuoming Zhou
There are race conditions that may lead to UAF bugs in ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(), ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call ax25_release() to deallocate ax25_dev. One of the UAF bugs caused by ax25_release() is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | ... | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25_std_establish_data_link() | ax25_start_t1timer() | ax25_dev_device_down() //(3) mod_timer(&ax25->t1timer,..) | | ax25_release() (wait a time) | ... | ax25_dev_put(ax25_dev) //(4)FREE ax25_t1timer_expiry() | ax25->ax25_dev->values[..] //USE| ... ... | We increase the refcount of ax25_dev in position (1) and (2), and decrease the refcount of ax25_dev in position (3) and (4). The ax25_dev will be freed in position (4) and be used in ax25_t1timer_expiry(). The fail log is shown below: ============================================================== [ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0 [ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574 [ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14 [ 106.116942] Call Trace: ... [ 106.116942] ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] call_timer_fn+0x122/0x3d0 [ 106.116942] __run_timers.part.0+0x3f6/0x520 [ 106.116942] run_timer_softirq+0x4f/0xb0 [ 106.116942] __do_softirq+0x1c2/0x651 ... This patch adds del_timer_sync() in ax25_release(), which could ensure that all timers stop before we deallocate ax25_dev. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29ax25: fix UAF bug in ax25_send_control()Duoming Zhou
There are UAF bugs in ax25_send_control(), when we call ax25_release() to deallocate ax25_dev. The possible race condition is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25->state = AX25_STATE_1 | ... | ax25_dev_device_down() //(3) (Thread 3) ax25_release() | ax25_dev_put() //(4) FREE | case AX25_STATE_1: | ax25_send_control() | alloc_skb() //USE | The refcount of ax25_dev increases in position (1) and (2), and decreases in position (3) and (4). The ax25_dev will be freed before dereference sites in ax25_send_control(). The following is part of the report: [ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210 [ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602 [ 102.297448] Call Trace: [ 102.303751] ax25_send_control+0x33/0x210 [ 102.303751] ax25_release+0x356/0x450 [ 102.305431] __sock_release+0x6d/0x120 [ 102.305431] sock_close+0xf/0x20 [ 102.305431] __fput+0x11f/0x420 [ 102.305431] task_work_run+0x86/0xd0 [ 102.307130] get_signal+0x1075/0x1220 [ 102.308253] arch_do_signal_or_restart+0x1df/0xc00 [ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0 [ 102.308253] syscall_exit_to_user_mode+0x19/0x50 [ 102.308253] do_syscall_64+0x48/0x90 [ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 102.308253] RIP: 0033:0x405ae7 This patch defers the free operation of ax25_dev and net_device after all corresponding dereference sites in ax25_release() to avoid UAF. Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29drm/i915/migrate: move the sanity checkMatthew Auld
Move the sanity check that both src and dst are never both system memory, which should never happen on discrete, and likely means we have a bug. The only exception is on integrated where we trigger this path in the selftests. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220324172143.377104-2-matthew.auld@intel.com
2022-03-29drm/i915/ttm: limit where we apply TTM_PL_FLAG_CONTIGUOUSMatthew Auld
We only need this when allocating device local-memory, where this influences the drm_buddy. Currently there is some funny behaviour where an "in limbo" system memory object is lacking the relevant placement flags etc. before we first allocate the ttm_tt, leading to ttm performing a move when not needed, since the current placement is seen as not compatible. Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 2ed38cec5606 ("drm/i915: opportunistically apply ALLOC_CONTIGIOUS") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Nirmoy Das <nirmoy.das@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220324172143.377104-1-matthew.auld@intel.com
2022-03-29drm/i915: avoid concurrent writes to aux_invFei Yang
GPU hangs have been observed when multiple engines write to the same aux_inv register at the same time. To avoid this each engine should only invalidate its own auxiliary table. The function gen12_emit_flush_xcs() currently invalidate the auxiliary table for all engines because the rq->engine is not necessarily the engine eventually carrying out the request, and potentially the engine could even be a virtual one (with engine->instance being -1). With the MMIO remap feature, we can actually set bit 17 of MI_LRI instruction and let the hardware to figure out the local aux_inv register at runtime to avoid invalidating auxiliary table for all engines. Bspec: 45728 v2: Invalidate AUX table for indirect context as well. Cc: Stuart Summers <stuart.summers@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220328171650.1900674-1-fei.yang@intel.com
2022-03-29openvswitch: Fixed nd target mask field in the flow dump.Martin Varghese
IPv6 nd target mask was not getting populated in flow dump. In the function __ovs_nla_put_key the icmp code mask field was checked instead of icmp code key field to classify the flow as neighbour discovery. ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0), skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0), ct_zone(0/0),ct_mark(0/0),ct_label(0/0), eth(src=00:00:00:00:00:00/00:00:00:00:00:00, dst=00:00:00:00:00:00/00:00:00:00:00:00), eth_type(0x86dd), ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no), icmpv6(type=135,code=0), nd(target=2001::2/::, sll=00:00:00:00:00:00/00:00:00:00:00:00, tll=00:00:00:00:00:00/00:00:00:00:00:00), packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2 Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c) Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29nvme-multipath: fix hang when disk goes live over reconnectAnton Eidelman
nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a fresh ANA log from the ctrl. This is essential to have an up to date path states for both existing namespaces and for those scan_work may discover once the ctrl is up. This happens in the following cases: 1) A new ctrl is being connected. 2) An existing ctrl is successfully reconnected. 3) An existing ctrl is being reset. While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and nvme_read_ana_log() may call nvme_update_ns_ana_state(). This result in a hang when the ANA state of an existing namespace changes and makes the disk live: nvme_mpath_set_live() issues IO to the namespace through the ctrl, which does NOT have IO queues yet. See sample hang below. Solution: - nvme_update_ns_ana_state() to call set_live only if ctrl is live - nvme_read_ana_log() call from nvme_mpath_init_identify() therefore only fetches and parses the ANA log; any erros in this process will fail the ctrl setup as appropriate; - a separate function nvme_mpath_update() is called in nvme_start_ctrl(); this parses the ANA log without fetching it. At this point the ctrl is live, therefore, disks can be set live normally. Sample failure: nvme nvme0: starting error recovery nvme nvme0: Reconnecting in 10 seconds... block nvme0n6: no usable path - requeuing I/O INFO: task kworker/u8:3:312 blocked for more than 122 seconds. Tainted: G E 5.14.5-1.el7.elrepo.x86_64 #1 Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp] Call Trace: __schedule+0x2a2/0x7e0 schedule+0x4e/0xb0 io_schedule+0x16/0x40 wait_on_page_bit_common+0x15c/0x3e0 do_read_cache_page+0x1e0/0x410 read_cache_page+0x12/0x20 read_part_sector+0x46/0x100 read_lba+0x121/0x240 efi_partition+0x1d2/0x6a0 bdev_disk_changed.part.0+0x1df/0x430 bdev_disk_changed+0x18/0x20 blkdev_get_whole+0x77/0xe0 blkdev_get_by_dev+0xd2/0x3a0 __device_add_disk+0x1ed/0x310 device_add_disk+0x13/0x20 nvme_mpath_set_live+0x138/0x1b0 [nvme_core] nvme_update_ns_ana_state+0x2b/0x30 [nvme_core] nvme_update_ana_state+0xca/0xe0 [nvme_core] nvme_parse_ana_log+0xac/0x170 [nvme_core] nvme_read_ana_log+0x7d/0xe0 [nvme_core] nvme_mpath_init_identify+0x105/0x150 [nvme_core] nvme_init_identify+0x2df/0x4d0 [nvme_core] nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core] nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp] nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp] process_one_work+0x1bd/0x360 worker_thread+0x50/0x3d0 Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29nvme: fix RCU hole that allowed for endless looping in multipath round robinChris Leech
Make nvme_ns_remove match the assumptions elsewhere. 1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is running in __nvme_find_path or nvme_round_robin_path that will re-assign this ns to current_path. 2) Any matching current_path entries need to be cleared before removing from the siblings list, to prevent calling nvme_round_robin_path with an "old" ns that's off list. 3) Finally the list_del_rcu can happen, and then synchronize again before releasing any reference counts. Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29nvme: allow duplicate NSIDs for private namespacesSungup Moon
A NVMe subsystem with multiple controller can have private namespaces that use the same NSID under some conditions: "If Namespace Management, ANA Reporting, or NVM Sets are supported, the NSIDs shall be unique within the NVM subsystem. If the Namespace Management, ANA Reporting, and NVM Sets are not supported, then NSIDs: a) for shared namespace shall be unique; and b) for private namespace are not required to be unique." Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec. Make sure this specific setup is supported in Linux. Fixes: 9ad1927a3bc2 ("nvme: always search for namespace head") Signed-off-by: Sungup Moon <sungup.moon@samsung.com> [hch: refactored and fixed the controller vs subsystem based naming conflict] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-03-29nvmet: remove redundant assignment after left shiftColin Ian King
The left shift is followed by a re-assignment back to cc_css, the assignment is redundant. Fix this by replacing the "<<=" operator with "<<" instead. This cleans up the clang scan build warning: drivers/nvme/target/core.c:1124:10: warning: Although the value stored to 'cc_css' is used in the enclosing expression, the value is never actually read from 'cc_css' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29nvmet: use a private workqueue instead of the system workqueueSagi Grimberg
Any attempt to flush kernel-global WQs has possibility of deadlock so we should simply stop using them, instead introduce nvmet_wq which is the generic nvmet workqueue for work elements that don't explicitly require a dedicated workqueue (by the mere fact that they are using the system_wq). Changes were done using the following replaces: - s/schedule_work(/queue_work(nvmet_wq, /g - s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g - s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29dma-buf: handle empty dma_fence_arrays gracefullyChristian König
A bug inside the new sync-file merge code created empty dma_fence_array instances. Warn about that and handle those without crashing. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220329070001.134180-2-christian.koenig@amd.com
2022-03-29dma-buf/sync-file: fix logic error in new fence merge codeChristian König
When the array is empty because everything is signaled we can't use add_fence() to add something because that would filter the signaled fence again. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 519f490db07e ("dma-buf/sync-file: fix warning about fence containers") Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220329070001.134180-1-christian.koenig@amd.com
2022-03-28selftests/bpf: Fix clang compilation errorsYonghong Song
llvm upstream patch ([1]) added to issue warning for code like void test() { int j = 0; for (int i = 0; i < 1000; i++) j++; return; } This triggered several errors in selftests/bpf build since compilation flag -Werror is used. ... test_lpm_map.c:212:15: error: variable 'n_matches' set but not used [-Werror,-Wunused-but-set-variable] size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups; ^ test_lpm_map.c:212:26: error: variable 'n_matches_after_delete' set but not used [-Werror,-Wunused-but-set-variable] size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups; ^ ... prog_tests/get_stack_raw_tp.c:32:15: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable] static __u64 cnt; ^ ... For test_lpm_map.c, 'n_matches'/'n_matches_after_delete' are changed to be volatile in order to silent the warning. I didn't remove these two declarations since they are referenced in a commented code which might be used by people in certain cases. For get_stack_raw_tp.c, the variable 'cnt' is removed. [1] https://reviews.llvm.org/D122271 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220325200304.2915588-1-yhs@fb.com
2022-03-28Merge branch 'xsk: another round of fixes'Alexei Starovoitov
Maciej Fijalkowski says: ==================== Hello, yet another fixes for XSK from Magnus and me. Magnus addresses the fact that xp_alloc() can return NULL, so this needs to be handled to avoid clearing entries in the SW ring on driver side. Then he addresses the off-by-one problem in Tx desc cleaning routine for ice ZC driver. From my side, I am adding protection to ZC Rx processing loop so that cleaning of descriptors wouldn't go over already processed entries. Then I also fix an issue with assigning XSK pool to Tx queues. This is directed to bpf tree. Thanks! Maciej Fijalkowski (2): ice: xsk: stop Rx processing when ntc catches ntu ice: xsk: fix indexing in ice_tx_xsk_pool() ==================== Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-03-28ice: xsk: Fix indexing in ice_tx_xsk_pool()Maciej Fijalkowski
Ice driver tries to always create XDP rings array to be num_possible_cpus() sized, regardless of user's queue count setting that can be changed via ethtool -L for example. Currently, ice_tx_xsk_pool() calculates the qid by decrementing the ring->q_index by the count of XDP queues, but ring->q_index is set to 'i + vsi->alloc_txq'. When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool() will do OOB access and in the final result ring would not get xsk_pool pointer assigned. Then, each ice_xsk_wakeup() call will fail with error and it will not be possible to get into NAPI and do the processing from driver side. Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the setting of ring->q_index. Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
2022-03-28ice: xsk: Stop Rx processing when ntc catches ntuMaciej Fijalkowski
This can happen with big budget values and some breakage of re-filling descriptors as we do not clear the entry that ntu is pointing at the end of ice_alloc_rx_bufs_zc. So if ntc is at ntu then it might be the case that status_error0 has an old, uncleared value and ntc would go over with processing which would result in false results. Break Rx loop when ntc == ntu to avoid broken behavior. Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-4-maciej.fijalkowski@intel.com
2022-03-28ice: xsk: Eliminate unnecessary loop iterationMagnus Karlsson
The NIC Tx ring completion routine cleans entries from the ring in batches. However, it processes one more batch than it is supposed to. Note that this does not matter from a functionality point of view since it will not find a set DD bit for the next batch and just exit the loop. But from a performance perspective, it is faster to terminate the loop before and not issue an expensive read over PCIe to get the DD bit. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-3-maciej.fijalkowski@intel.com
2022-03-28xsk: Do not write NULL in SW ring at allocation failureMagnus Karlsson
For the case when xp_alloc_batch() is used but the batched allocation cannot be used, there is a slow path that uses the non-batched xp_alloc(). When it fails to allocate an entry, it returns NULL. The current code wrote this NULL into the entry of the provided results array (pointer to the driver SW ring usually) and returned. This might not be what the driver expects and to make things simpler, just write successfully allocated xdp_buffs into the SW ring,. The driver might have information in there that is still important after an allocation failure. Note that at this point in time, there are no drivers using xp_alloc_batch() that could trigger this slow path. But one might get added. Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-2-maciej.fijalkowski@intel.com
2022-03-28Merge branch 'kprobes: rethook: x86: Replace kretprobe trampoline with rethook'Alexei Starovoitov
Masami Hiramatsu says: ==================== Here are the 3rd version for generic kretprobe and kretprobe on x86 for replacing the kretprobe trampoline with rethook. The previous version is here[1] [1] https://lore.kernel.org/all/164821817332.2373735.12048266953420821089.stgit@devnote2/T/#u This version fixed typo and build issues for bpf-next and CONFIG_RETHOOK=y error. I also add temporary mitigation lines for ANNOTATE_NOENDBR macro issue for bpf-next tree [2/4]. This will be removed after merging kernel IBT series. Background: This rethook came from Jiri's request of multiple kprobe for bpf[2]. He tried to solve an issue that starting bpf with multiple kprobe will take a long time because bpf-kprobe will wait for RCU grace period for sync rcu events. Jiri wanted to attach a single bpf handler to multiple kprobes and he tried to introduce multiple-probe interface to kprobe. So I asked him to use ftrace and kretprobe-like hook if it is only for the function entry and exit, instead of adding ad-hoc interface to kprobes. For this purpose, I introduced the fprobe (kprobe like interface for ftrace) with the rethook (this is a generic return hook feature for fprobe exit handler)[3]. [2] https://lore.kernel.org/all/20220104080943.113249-1-jolsa@kernel.org/T/#u [3] https://lore.kernel.org/all/164191321766.806991.7930388561276940676.stgit@devnote2/T/#u The rethook is basically same as the kretprobe trampoline. I just made it decoupled from kprobes. Eventually, the all arch dependent kretprobe trampolines will be replaced with the rethook trampoline instead of cloning and set HAVE_RETHOOK=y. When I port the rethook for all arch which supports kretprobe, the legacy kretprobe specific code (which is for CONFIG_KRETPROBE_ON_RETHOOK=n) will be removed eventually. ==================== Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-03-28x86,kprobes: Fix optprobe trampoline to generate complete pt_regsMasami Hiramatsu
Currently the optprobe trampoline template code ganerate an almost complete pt_regs on-stack, everything except regs->ss. The 'regs->ss' points to the top of stack, which is not a valid segment decriptor. As same as the rethook does, complete the job by also pushing ss. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164826166027.2455864.14759128090648961900.stgit@devnote2
2022-03-28x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regsPeter Zijlstra
Currently arch_rethook_trampoline() generates an almost complete pt_regs on-stack, everything except regs->ss that is, that currently points to the fake return address, which is not a valid segment descriptor. Since interpretation of regs->[sb]p should be done in the context of regs->ss, and we have code actually doing that (see arch/x86/lib/insn-eval.c for instance), complete the job by also pushing ss. This ensures that anybody who does do look at regs->ss doesn't mysteriously malfunction, avoiding much future pain. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/164826164851.2455864.17272661073069737350.stgit@devnote2