linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-11-22	nvmem: rmem: Fix return value check in rmem_read()	Wei Yongjun
	In case of error, the function memremap() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Fixes: 5a3fa75a4d9c ("nvmem: Add driver to expose reserved memory as nvmem") Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20221118063840.6357-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22	Merge tag 'iio-fixes-for-6.1c' of ↵	Greg Kroah-Hartman
	https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Jonathan writes: "3rd set of IIO fixes for the 6.1 cycle. Usual mixed bunch of driver fixes. * sw-triggers - Fix failure to cleanup up list registration in an error path. * aspeed,adc - Drop the trim valid dts property as it doesn't account for unprogrammed OTP and that can be easily detected without it. * avago,apds9960: - Fix register address for gesture gain. * bosch,bma400 - Fix a memory leak in an error path. * rohm,rpr0521 - Fix missing dependency on IIO_BUFFER/IIO_TRIGGERED_BUFFER. * ti,afe4403/4404 - Fix out of band read by moving reads down to where they are used." * tag 'iio-fixes-for-6.1c' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: dt-bindings: iio: adc: Remove the property "aspeed,trim-data-valid" iio: adc: aspeed: Remove the trim valid dts property. iio: core: Fix entry not deleted when iio_register_sw_trigger_type() fails iio: accel: bma400: Fix memory leak in bma400_get_steps_reg() iio: light: rpr0521: add missing Kconfig dependencies iio: health: afe4404: Fix oob read in afe4404_[read\|write]_raw iio: health: afe4403: Fix oob read in afe4403_read_raw iio: light: apds9960: fix wrong register for gesture gain
2022-11-22	Merge tag 'fpga-for-6.1-final' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga into work-linus Xu writes: FPGA Manager changes for 6.1-final Intel m10 bmc secure update - Russ's change fixes Kconfig dependencies All patches have been reviewed on the mailing list, and have been in the last linux-next releases (as part of our for-6.1 branch) Signed-off-by: Xu Yilun <yilun.xu@intel.com> * tag 'fpga-for-6.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/fpga/linux-fpga: fpga: m10bmc-sec: Fix kconfig dependencies
2022-11-22	usb: cdnsp: fix issue with ZLP - added TD_SIZE = 1	Pawel Laszczak
	Patch modifies the TD_SIZE in TRB before ZLP TRB. The TD_SIZE in TRB before ZLP TRB must be set to 1 to force processing ZLP TRB by controller. cc: <stable@vger.kernel.org> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") Signed-off-by: Pawel Laszczak <pawell@cadence.com> Reviewed-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20221115092218.421267-1-pawell@cadence.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22	usb: dwc3: gadget: Clear ep descriptor last	Thinh Nguyen
	Until the endpoint is disabled, its descriptors should remain valid. When its requests are removed from ep disable, the request completion routine may attempt to access the endpoint's descriptor. Don't clear the descriptors before that. Fixes: f09ddcfcb8c5 ("usb: dwc3: gadget: Prevent EP queuing while stopping transfers") Cc: stable@vger.kernel.org Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/45db7c83b209259115bf652af210f8b2b3b1a383.1668561364.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22	usb: dwc3: exynos: Fix remove() function	Marek Szyprowski
	The core DWC3 device node was not properly removed by the custom dwc3_exynos_remove_child() function. Replace it with generic of_platform_depopulate() which does that job right. Fixes: adcf20dcd262 ("usb: dwc3: exynos: Use of_platform API to create dwc3 core pdev") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Cc: stable@vger.kernel.org Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org> Link: https://lore.kernel.org/r/20221110154131.2577-1-m.szyprowski@samsung.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22	usb: cdnsp: Fix issue with Clear Feature Halt Endpoint	Pawel Laszczak
	During handling Clear Halt Endpoint Feature request, driver invokes Reset Endpoint command. Because this command has some issue with transition endpoint from Running to Idle state the driver must stop the endpoint by using Stop Endpoint command. cc: <stable@vger.kernel.org> Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") Reviewed-by: Peter Chen <peter.chen@kernel.org> Signed-off-by: Pawel Laszczak <pawell@cadence.com> Link: https://lore.kernel.org/r/20221110063005.370656-1-pawell@cadence.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22	usb: dwc3: gadget: Disable GUSB2PHYCFG.SUSPHY for End Transfer	Thinh Nguyen
	If there's a disconnection while operating in eSS, there may be a delay in VBUS drop response from the connector. In that case, the internal link state may drop to operate in usb2 speed while the controller thinks the VBUS is still high. The driver must make sure to disable GUSB2PHYCFG.SUSPHY when sending endpoint command while in usb2 speed. The End Transfer command may be called, and only that command needs to go through at this point. Let's keep it simple and unconditionally disable GUSB2PHYCFG.SUSPHY whenever we issue the command. This scenario is not seen in real hardware. In a rare case, our prototype type-c controller/interface may have a slow response triggerring this issue. Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/5651117207803c26e2f22ddf4e5ce9e865dcf7c7.1668045468.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22	usb: gadget: uvc: also use try_format in set_format	Michael Grzeschik
	Since e219a712bc06 (usb: gadget: uvc: add v4l2 try_format api call) the try_format function is available. With this function includes checks for valid configurations programmed in the configfs. We use this function to ensure to return valid values on the set_format callback. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Fixes: e219a712bc06 ("usb: gadget: uvc: add v4l2 try_format api call") Link: https://lore.kernel.org/r/20221026182240.363055-1-m.grzeschik@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-22	fbcon: Use kzalloc() in fbcon_prepare_logo()	Tetsuo Handa
	A kernel built with syzbot's config file reported that scr_memcpyw(q, save, array3_size(logo_lines, new_cols, 2)) causes uninitialized "save" to be copied. ---------- [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0 [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1 Console: switching to colour frame buffer device 128x48 ===================================================== BUG: KMSAN: uninit-value in do_update_region+0x4b8/0xba0 do_update_region+0x4b8/0xba0 update_region+0x40d/0x840 fbcon_switch+0x3364/0x35e0 redraw_screen+0xae3/0x18a0 do_bind_con_driver+0x1cb3/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...) Uninit was stored to memory at: fbcon_prepare_logo+0x143b/0x1940 fbcon_init+0x2c1b/0x31c0 visual_init+0x3e7/0x820 do_bind_con_driver+0x14a4/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...) Uninit was created at: __kmem_cache_alloc_node+0xb69/0x1020 __kmalloc+0x379/0x680 fbcon_prepare_logo+0x704/0x1940 fbcon_init+0x2c1b/0x31c0 visual_init+0x3e7/0x820 do_bind_con_driver+0x14a4/0x1df0 do_take_over_console+0x11cb/0x13f0 fbcon_fb_registered+0xacc/0xfd0 register_framebuffer+0x1179/0x1320 __drm_fb_helper_initial_config_and_unlock+0x23ad/0x2b40 drm_fbdev_client_hotplug+0xbea/0xda0 drm_fbdev_generic_setup+0x65e/0x9d0 vkms_init+0x9f3/0xc76 (...snipped...) CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00356-g8f2975c2bb4c #924 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 ---------- Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/cad03d25-0ea0-32c4-8173-fd1895314bce@I-love.SAKURA.ne.jp
2022-11-22	tsnep: Fix rotten packets	Gerhard Engleder
	If PTP synchronisation is done every second, then sporadic the interval is higher than one second: ptp4l[696.582]: master offset -17 s2 freq -1891 path delay 573 ptp4l[697.582]: master offset -22 s2 freq -1901 path delay 573 ptp4l[699.368]: master offset -1 s2 freq -1887 path delay 573 ^^^^^^^ Should be 698.582! This problem is caused by rotten packets, which are received after polling but before interrupts are enabled again. This can be fixed by checking for pending work and rescheduling if necessary after interrupts has been enabled again. Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://lore.kernel.org/r/20221119211825.81805-1-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22	dma-buf: fix racing conflict of dma_heap_add()	Dawei Li
	Racing conflict could be: task A task B list_for_each_entry strcmp(h->name)) list_for_each_entry strcmp(h->name) kzalloc kzalloc ...... ..... device_create device_create list_add list_add The root cause is that task B has no idea about the fact someone else(A) has inserted heap with same name when it calls list_add, so a potential collision occurs. Fixes: c02a81fba74f ("dma-buf: Add dma-buf heaps framework") Signed-off-by: Dawei Li <set_pte_at@outlook.com> Acked-by: Andrew Davis <afd@ti.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/TYCP286MB2323873BBDF88020781FB986CA3B9@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
2022-11-22	octeontx2-pf: Remove duplicate MACSEC setting	Zheng Bin
	Commit 4581dd480c9e ("net: octeontx2-pf: mcs: consider MACSEC setting") has already added "depends on MACSEC \|\| !MACSEC", so remove it. Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20221119133616.3583538-1-zhengbin13@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22	bnx2x: fix pci device refcount leak in bnx2x_vf_is_pcie_pending()	Yang Yingliang
	As comment of pci_get_domain_bus_and_slot() says, it returns a pci device with refcount increment, when finish using it, the caller must decrement the reference count by calling pci_dev_put(). Call pci_dev_put() before returning from bnx2x_vf_is_pcie_pending() to avoid refcount leak. Fixes: b56e9670ffa4 ("bnx2x: Prepare device and initialize VF database") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221119070202.1407648-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22	regulator: twl6030: fix get status of twl6032 regulators	Andreas Kemnade
	Status is reported as always off in the 6032 case. Status reporting now matches the logic in the setters. Once of the differences to the 6030 is that there are no groups, therefore the state needs to be read out in the lower bits. Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Link: https://lore.kernel.org/r/20221120221208.3093727-3-andreas@kemnade.info Signed-off-by: Mark Brown <broonie@kernel.org>
2022-11-22	regulator: twl6030: re-add TWL6032_SUBCLASS	Andreas Kemnade
	In former times, info->feature was populated via the parent driver by pdata/regulator_init_data->driver_data for all regulators when USB_PRODUCT_ID_LSB indicates a TWL6032. Today, the information is not set, so re-add it at the regulator definitions. Fixes: 25d82337705e2 ("regulator: twl: make driver DT only") Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Link: https://lore.kernel.org/r/20221120221208.3093727-2-andreas@kemnade.info Signed-off-by: Mark Brown <broonie@kernel.org>
2022-11-22	net: sparx5: fix error handling in sparx5_port_open()	Liu Jian
	If phylink_of_phy_connect() fails, the port should be disabled. If sparx5_serdes_set()/phy_power_on() fails, the port should be disabled and the phylink should be stopped and disconnected. Fixes: 946e7fd5053a ("net: sparx5: add port module support") Fixes: f3cad2611a77 ("net: sparx5: add hostmode with phylink support") Signed-off-by: Liu Jian <liujian56@huawei.com> Tested-by: Bjarni Jonasson <bjarni.jonasson@microchip.com> Reviewed-by: Steen Hegelund <steen.hegelund@microchip.com> Link: https://lore.kernel.org/r/20221117125918.203997-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22	sfc: fix potential memleak in __ef100_hard_start_xmit()	Zhang Changzhong
	The __ef100_hard_start_xmit() returns NETDEV_TX_OK without freeing skb in error handling case, add dev_kfree_skb_any() to fix it. Fixes: 51b35a454efd ("sfc: skeleton EF100 PF driver") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/1668671409-10909-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22	net: wwan: iosm: use ACPI_FREE() but not kfree() in ipc_pcie_read_bios_cfg()	Wang ShaoBo
	acpi_evaluate_dsm() should be coupled with ACPI_FREE() to free the ACPI memory, because we need to track the allocation of acpi_object when ACPI_DBG_TRACK_ALLOCATIONS enabled, so use ACPI_FREE() instead of kfree(). Fixes: d38a648d2d6c ("net: wwan: iosm: fix memory leak in ipc_pcie_read_bios_cfg") Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/r/20221118062447.2324881-1-bobo.shaobowang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-22	HID: usbhid: Add ALWAYS_POLL quirk for some mice	Ankit Patel
	Some additional USB mouse devices are needing ALWAYS_POLL quirk without which they disconnect and reconnect every 60s. Add below devices to the known quirk list. CHERRY VID 0x046a, PID 0x000c MICROSOFT VID 0x045e, PID 0x0783 PRIMAX VID 0x0461, PID 0x4e2a Signed-off-by: Ankit Patel <anpatel@nvidia.com> Signed-off-by: Haotien Hsu <haotienh@nvidia.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-11-22	Merge tag 'gvt-fixes-2022-11-11' of https://github.com/intel/gvt-linux into ↵	Tvrtko Ursulin
	drm-intel-fixes gvt-fixes-2022-11-11 - kvm reference fix from Sean Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221111090208.GQ30028@zhen-hp.sh.intel.com
2022-11-21	ice: fix handling of burst Tx timestamps	Jacob Keller
	Commit 1229b33973c7 ("ice: Add low latency Tx timestamp read") refactored PTP timestamping logic to use a threaded IRQ instead of a separate kthread. This implementation introduced ice_misc_intr_thread_fn and redefined the ice_ptp_process_ts function interface to return a value of whether or not the timestamp processing was complete. ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts and convert it into either IRQ_HANDLED if there were no more timestamps to be processed, or IRQ_WAKE_THREAD if the thread should continue processing. This is not correct, as the kernel does not re-schedule threaded IRQ functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ function. This results in the ice_ptp_process_ts function (and in turn the ice_ptp_tx_tstamp function) from only being called exactly once per interrupt. If an application sends a burst of Tx timestamps without waiting for a response, the interrupt will trigger for the first timestamp. However, later timestamps may not have arrived yet. This can result in dropped or discarded timestamps. Worse, on E822 hardware this results in the interrupt logic getting stuck such that no future interrupts will be triggered. The result is complete loss of Tx timestamp functionality. Fix this by modifying the ice_misc_intr_thread_fn to perform its own polling of the ice_ptp_process_ts function. We sleep for a few microseconds between attempts to avoid wasting significant CPU time. The value was chosen to allow time for the Tx timestamps to complete without wasting so much time that we overrun application wait budgets in the worst case. The ice_ptp_process_ts function also currently returns false in the event that the Tx tracker is not initialized. This would result in the threaded IRQ handler never exiting if it gets started while the tracker is not initialized. Fix the function to appropriately return true when the tracker is not initialized. Note that this will not reproduce with default ptp4l behavior, as the program always synchronously waits for a timestamp response before sending another timestamp request. Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Fixes: 1229b33973c7 ("ice: Add low latency Tx timestamp read") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20221118222729.1565317-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21	Merge branch '40GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-11-18 (iavf) Ivan Vecera resolves issues related to reset by adding back call to netif_tx_stop_all_queues() and adding calls to dev_close() to ensure device is properly closed during reset. Stefan Assmann removes waiting for setting of MAC address as this breaks ARP. Slawomir adds setting of __IAVF_IN_REMOVE_TASK bit to prevent deadlock between remove and shutdown. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: Fix race condition between iavf_shutdown and iavf_remove iavf: remove INITIAL_MAC_SET to allow gARP to work properly iavf: Do not restart Tx queues after reset task failure iavf: Fix a crash during reset task ==================== Link: https://lore.kernel.org/r/20221118222439.1565245-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21	net: phy: at803x: fix error return code in at803x_probe()	Wei Yongjun
	Fix to return a negative error code from the ccr read error handling case instead of 0, as done elsewhere in this function. Fixes: 3265f4218878 ("net: phy: at803x: add fiber support") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221118103635.254256-1-weiyongjun@huaweicloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-21	net/mlx5e: Fix possible race condition in macsec extended packet number ↵	Emeel Hakim
	update routine Currenty extended packet number (EPN) update routine is accessing macsec object without holding the general macsec lock hence facing a possible race condition when an EPN update occurs while updating or deleting the SA. Fix by holding the general macsec lock before accessing the object. Fixes: 4411a6c0abd3 ("net/mlx5e: Support MACsec offload extended packet number (EPN)") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5e: Fix MACsec update SecY	Emeel Hakim
	Currently updating SecY destroys and re-creates RX SA objects, the re-created RX SA objects are not identical to the destroyed objects and it disagree on the encryption enabled property which holds the value false after recreation, this value is not supported with offload which leads to no traffic after an update. Fix by recreating an identical objects. Fixes: 5a39816a75e5 ("net/mlx5e: Add MACsec offload SecY support") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5e: Fix MACsec SA initialization routine	Emeel Hakim
	Currently as part of MACsec SA initialization routine extended packet number (EPN) object attribute is always being set without checking if EPN is actually enabled, the above could lead to a NULL dereference. Fix by adding such a check. Fixes: 4411a6c0abd3 ("net/mlx5e: Support MACsec offload extended packet number (EPN)") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5e: Remove leftovers from old XSK queues enumeration	Tariq Toukan
	Before the cited commit, for N channels, a dedicated set of N queues was created to support XSK, in indices [N, 2N-1], doubling the number of queues. In addition, changing the number of channels was prohibited, as it would shift the indices. Remove these two leftovers, as we moved XSK to a new queueing scheme, starting from index 0. Fixes: 3db4c85cde7a ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5e: Offload rule only when all encaps are valid	Chris Mi
	The cited commit adds a for loop to support multiple encapsulations. But it only checks if the last encap is valid. Fix it by setting slow path flag when one of the encap is invalid. Fixes: f493f15534ec ("net/mlx5e: Move flow attr reformat action bit to per dest flags") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5e: Fix missing alignment in size of MTT/KLM entries	Tariq Toukan
	In the cited patch, an alignment required by the HW spec was mistakenly dropped. Bring it back to fix error completions like the below: mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x40b, ci 0x0, qn 0x104f, opcode 0xd, syndrome 0x2, vendor syndrome 0x68 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 86 00 68 02 25 00 10 4f 00 00 bb d2 WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x0, len: 192 00000000: 00 00 00 25 00 10 4f 0c 00 00 00 00 00 18 2e 00 00000010: 90 00 00 00 00 02 00 00 00 00 00 00 20 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000080: 08 00 00 00 48 6a 00 02 08 00 00 00 0e 10 00 02 00000090: 08 00 00 00 0c db 00 02 08 00 00 00 0e 82 00 02 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: 9f123f740428 ("net/mlx5e: Improve MTT/KSM alignment") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: Fix sync reset event handler error flow	Moshe Shemesh
	When sync reset now event handling fails on mlx5_pci_link_toggle() then no reset was done. However, since mlx5_cmd_fast_teardown_hca() was already done, the firmware function is closed and the driver is left without firmware functionality. Fix it by setting device error state and reopen the firmware resources. Reopening is done by the thread that was called for devlink reload fw_activate as it already holds the devlink lock. Fixes: 5ec697446f46 ("net/mlx5: Add support for devlink reload action fw activate") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: E-Switch, Set correctly vport destination	Roi Dayan
	The cited commit moved from using reformat_id integer to packet_reformat pointer which introduced the possibility to null pointer dereference. When setting packet reformat flag and pkt_reformat pointer must exists so checking MLX5_ESW_DEST_ENCAP is not enough, we need to make sure the pkt_reformat is valid and check for MLX5_ESW_DEST_ENCAP_VALID. If the dest encap valid flag does not exists then pkt_reformat can be either invalid address or null. Also, to make sure we don't try to access invalid pkt_reformat set it to null when invalidated and invalidate it before calling add flow code as its logically more correct and to be safe. Fixes: 2b688ea5efde ("net/mlx5: Add flow steering actions to fs_cmd shim layer") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: Lag, avoid lockdep warnings	Eli Cohen
	ldev->lock is used to serialize lag change operations. Since multiport eswtich functionality was added, we now change the mode dynamically. However, acquiring ldev->lock is not allowed as it could possibly lead to a deadlock as reported by the lockdep mechanism. [ 836.154963] WARNING: possible circular locking dependency detected [ 836.155850] 5.19.0-rc5_net_56b7df2 #1 Not tainted [ 836.156549] ------------------------------------------------------ [ 836.157418] handler1/12198 is trying to acquire lock: [ 836.158178] ffff888187d52b58 (&ldev->lock){+.+.}-{3:3}, at: mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.159575] [ 836.159575] but task is already holding lock: [ 836.160474] ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200 [ 836.161669] which lock already depends on the new lock. [ 836.162905] [ 836.162905] the existing dependency chain (in reverse order) is: [ 836.164008] -> #3 (&block->cb_lock){++++}-{3:3}: [ 836.164946] down_write+0x25/0x60 [ 836.165548] tcf_block_get_ext+0x1c6/0x5d0 [ 836.166253] ingress_init+0x74/0xa0 [sch_ingress] [ 836.167028] qdisc_create.constprop.0+0x130/0x5e0 [ 836.167805] tc_modify_qdisc+0x481/0x9f0 [ 836.168490] rtnetlink_rcv_msg+0x16e/0x5a0 [ 836.169189] netlink_rcv_skb+0x4e/0xf0 [ 836.169861] netlink_unicast+0x190/0x250 [ 836.170543] netlink_sendmsg+0x243/0x4b0 [ 836.171226] sock_sendmsg+0x33/0x40 [ 836.171860] ____sys_sendmsg+0x1d1/0x1f0 [ 836.172535] ___sys_sendmsg+0xab/0xf0 [ 836.173183] __sys_sendmsg+0x51/0x90 [ 836.173836] do_syscall_64+0x3d/0x90 [ 836.174471] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 836.175282] [ 836.175282] -> #2 (rtnl_mutex){+.+.}-{3:3}: [ 836.176190] __mutex_lock+0x6b/0xf80 [ 836.176830] register_netdevice_notifier+0x21/0x120 [ 836.177631] rtnetlink_init+0x2d/0x1e9 [ 836.178289] netlink_proto_init+0x163/0x179 [ 836.178994] do_one_initcall+0x63/0x300 [ 836.179672] kernel_init_freeable+0x2cb/0x31b [ 836.180403] kernel_init+0x17/0x140 [ 836.181035] ret_from_fork+0x1f/0x30 [ 836.181687] -> #1 (pernet_ops_rwsem){+.+.}-{3:3}: [ 836.182628] down_write+0x25/0x60 [ 836.183235] unregister_netdevice_notifier+0x1c/0xb0 [ 836.184029] mlx5_ib_roce_cleanup+0x94/0x120 [mlx5_ib] [ 836.184855] __mlx5_ib_remove+0x35/0x60 [mlx5_ib] [ 836.185637] mlx5_eswitch_unregister_vport_reps+0x22f/0x440 [mlx5_core] [ 836.186698] auxiliary_bus_remove+0x18/0x30 [ 836.187409] device_release_driver_internal+0x1f6/0x270 [ 836.188253] bus_remove_device+0xef/0x160 [ 836.188939] device_del+0x18b/0x3f0 [ 836.189562] mlx5_rescan_drivers_locked+0xd6/0x2d0 [mlx5_core] [ 836.190516] mlx5_lag_remove_devices+0x69/0xe0 [mlx5_core] [ 836.191414] mlx5_do_bond_work+0x441/0x620 [mlx5_core] [ 836.192278] process_one_work+0x25c/0x590 [ 836.192963] worker_thread+0x4f/0x3d0 [ 836.193609] kthread+0xcb/0xf0 [ 836.194189] ret_from_fork+0x1f/0x30 [ 836.194826] -> #0 (&ldev->lock){+.+.}-{3:3}: [ 836.195734] __lock_acquire+0x15b8/0x2a10 [ 836.196426] lock_acquire+0xce/0x2d0 [ 836.197057] __mutex_lock+0x6b/0xf80 [ 836.197708] mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.198575] tc_act_parse_mirred+0x25b/0x800 [mlx5_core] [ 836.199467] parse_tc_actions+0x168/0x5a0 [mlx5_core] [ 836.200340] __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core] [ 836.201241] mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core] [ 836.202187] tc_setup_cb_add+0xd7/0x200 [ 836.202856] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 836.203739] fl_change+0xbbe/0x1730 [cls_flower] [ 836.204501] tc_new_tfilter+0x407/0xd90 [ 836.205168] rtnetlink_rcv_msg+0x406/0x5a0 [ 836.205877] netlink_rcv_skb+0x4e/0xf0 [ 836.206535] netlink_unicast+0x190/0x250 [ 836.207217] netlink_sendmsg+0x243/0x4b0 [ 836.207915] sock_sendmsg+0x33/0x40 [ 836.208538] ____sys_sendmsg+0x1d1/0x1f0 [ 836.209219] ___sys_sendmsg+0xab/0xf0 [ 836.209878] __sys_sendmsg+0x51/0x90 [ 836.210510] do_syscall_64+0x3d/0x90 [ 836.211137] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 836.211954] other info that might help us debug this: [ 836.213174] Chain exists of: [ 836.213174] &ldev->lock --> rtnl_mutex --> &block->cb_lock 836.214650] Possible unsafe locking scenario: [ 836.214650] [ 836.215574] CPU0 CPU1 [ 836.216255] ---- ---- [ 836.216943] lock(&block->cb_lock); [ 836.217518] lock(rtnl_mutex); [ 836.218348] lock(&block->cb_lock); [ 836.219212] lock(&ldev->lock); [ 836.219758] [ 836.219758] * DEADLOCK * [ 836.219758] [ 836.220747] 2 locks held by handler1/12198: [ 836.221390] #0: ffff8881d4de2930 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200 [ 836.222646] #1: ffff88810c9a92c0 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core] [ 836.224063] stack backtrace: [ 836.224799] CPU: 6 PID: 12198 Comm: handler1 Not tainted 5.19.0-rc5_net_56b7df2 #1 [ 836.225923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 836.227476] Call Trace: [ 836.227929] <TASK> [ 836.228332] dump_stack_lvl+0x57/0x7d [ 836.228924] check_noncircular+0x104/0x120 [ 836.229562] __lock_acquire+0x15b8/0x2a10 [ 836.230201] lock_acquire+0xce/0x2d0 [ 836.230776] ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.231614] ? find_held_lock+0x2b/0x80 [ 836.232221] __mutex_lock+0x6b/0xf80 [ 836.232799] ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.233636] ? mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.234451] ? xa_load+0xc3/0x190 [ 836.234995] mlx5_lag_do_mirred+0x3b/0x70 [mlx5_core] [ 836.235803] tc_act_parse_mirred+0x25b/0x800 [mlx5_core] [ 836.236636] ? tc_act_can_offload_mirred+0x135/0x210 [mlx5_core] [ 836.237550] parse_tc_actions+0x168/0x5a0 [mlx5_core] [ 836.238364] __mlx5e_add_fdb_flow+0x263/0x480 [mlx5_core] [ 836.239202] mlx5e_configure_flower+0x8a0/0x1820 [mlx5_core] [ 836.240076] ? lock_acquire+0xce/0x2d0 [ 836.240668] ? tc_setup_cb_add+0x5b/0x200 [ 836.241294] tc_setup_cb_add+0xd7/0x200 [ 836.241917] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 836.242709] fl_change+0xbbe/0x1730 [cls_flower] [ 836.243408] tc_new_tfilter+0x407/0xd90 [ 836.244043] ? tc_del_tfilter+0x880/0x880 [ 836.244672] rtnetlink_rcv_msg+0x406/0x5a0 [ 836.245310] ? netlink_deliver_tap+0x7a/0x4b0 [ 836.245991] ? if_nlmsg_stats_size+0x2b0/0x2b0 [ 836.246675] netlink_rcv_skb+0x4e/0xf0 [ 836.258046] netlink_unicast+0x190/0x250 [ 836.258669] netlink_sendmsg+0x243/0x4b0 [ 836.259288] sock_sendmsg+0x33/0x40 [ 836.259857] ____sys_sendmsg+0x1d1/0x1f0 [ 836.260473] ___sys_sendmsg+0xab/0xf0 [ 836.261064] ? lock_acquire+0xce/0x2d0 [ 836.261669] ? find_held_lock+0x2b/0x80 [ 836.262272] ? __fget_files+0xb9/0x190 [ 836.262871] ? __fget_files+0xd3/0x190 [ 836.263462] __sys_sendmsg+0x51/0x90 [ 836.264064] do_syscall_64+0x3d/0x90 [ 836.264652] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 836.265425] RIP: 0033:0x7fdbe5e2677d [ 836.266012] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 ba ee ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 ee ee ff ff 48 [ 836.268485] RSP: 002b:00007fdbe48a75a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 836.269598] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fdbe5e2677d [ 836.270576] RDX: 0000000000000000 RSI: 00007fdbe48a7640 RDI: 000000000000003c [ 836.271565] RBP: 00007fdbe48a8368 R08: 0000000000000000 R09: 0000000000000000 [ 836.272546] R10: 00007fdbe48a84b0 R11: 0000000000000293 R12: 0000557bd17dc860 [ 836.273527] R13: 0000000000000000 R14: 0000557bd17dc860 R15: 00007fdbe48a7640 [ 836.274521] </TASK> To avoid using mode holding ldev->lock in the configure flow, we queue a work to the lag workqueue and cease wait on a completion object. In addition, we remove the lock from mlx5_lag_do_mirred() since it is not really protecting anything. It should be noted that an actual deadlock has not been observed. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: Fix handling of entry refcount when command is not issued to FW	Moshe Shemesh
	In case command interface is down, or the command is not allowed, driver did not increment the entry refcount, but might have decrement as part of forced completion handling. Fix that by always increment and decrement the refcount to make it symmetric for all flows. Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reported-by: Jack Wang <jinpu.wang@ionos.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: cmdif, Print info on any firmware cmd failure to tracepoint	Moshe Shemesh
	While moving to new CMD API (quiet API), some pre-existing flows may call the new API function that in case of error, returns the error instead of printing it as previously done. For such flows we bring back the print but to tracepoint this time for sys admins to have the ability to check for errors especially for commands using the new quiet API. Tracepoint output example: devlink-1333 [001] ..... 822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22) Fixes: f23519e542e5 ("net/mlx5: cmdif, Add new api for command execution") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: SF: Fix probing active SFs during driver probe phase	Shay Drory
	When SF devices and SF port representors are located on different functions, unloading and reloading of SF parent driver doesn't recreate the existing SF present in the device. Fix it by querying SFs and probe active SFs during driver probe phase. Fixes: 90d010b8634b ("net/mlx5: SF, Add auxiliary device support") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: Fix FW tracer timestamp calculation	Moshe Shemesh
	Fix a bug in calculation of FW tracer timestamp. Decreasing one in the calculation should effect only bits 52_7 and not effect bits 6_0 of the timestamp, otherwise bits 6_0 are always set in this calculation. Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel tracing support") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Feras Daoud <ferasda@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	net/mlx5: Do not query pci info while pci disabled	Roy Novich
	The driver should not interact with PCI while PCI is disabled. Trying to do so may result in being unable to get vital signs during PCI reset, driver gets timed out and fails to recover. Fixes: fad1783a6d66 ("net/mlx5: Print more info on pci error handlers") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-11-21	device-dax: Fix duplicate 'hmem' device registration	Dan Williams
	So called "soft-reserved" memory is an EFI conventional memory range with the EFI_MEMORY_SP attribute set. That attribute indicates that the memory is not part of the platform general purpose memory pool and may want some consideration from the system administrator about whether to keep that memory set aside for dedicated access through device-dax (map a device file), or assigned to the page allocator as another general purpose memory node target. Absent an ACPI HMAT table the default device-dax registration creates coarse grained devices that are delineated by EFI Memory Map entries. With the HMAT the devices are delineated by the finer grained ranges associated with the proximity domain of the memory target. I.e. the HMAT describes the properties of performance differentiated memory and each unique performance description results in a unique target proximity domain where each memory proximity domain has an associated SRAT entry that delineates the address range. The intent was that SRAT-defined device-dax instances are registered first. Then any left-over address range with the EFI_MEMORY_SP attribute, but not covered by the SRAT, would have a coarse grained device-dax instance established. However, the scheme to detect what ranges are left to be assigned to a device was buggy and resulted in multiple overlapping device-dax instances. Fix this by using explicit tracking for which ranges have been handled. Now, this new approach may leave memory stranded in the presence of broken platform firmware that fails to fully describe all EFI_MEMORY_SP ranges in the HMAT. That requires a deeper fix if it becomes a problem in practice. Reported-by: "Tallam Mahendra Kumar" <tallam.mahendra.kumar@intel.com> Reported-by: Mustafa Hajeer <mustafa.hajeer@intel.com> Debugged-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166890823379.4183293.15333502171004313377.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-21	drm/amd/amdgpu: reserve vm invalidation engine for firmware	Jack Xiao
	If mes enabled, reserve VM invalidation engine 5 for firmware. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x
2022-11-21	drm/amdgpu: Enable Aldebaran devices to report CU Occupancy	Ramesh Errabolu
	Allow user to know number of compute units (CU) that are in use at any given moment. Enable access to the method kgd_gfx_v9_get_cu_occupancy that computes CU occupancy. Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-11-21	drm/amdgpu: fix userptr HMM range handling v2	Christian König
	The basic problem here is that it's not allowed to page fault while holding the reservation lock. So it can happen that multiple processes try to validate an userptr at the same time. Work around that by putting the HMM range object into the mutex protected bo list for now. v2: make sure range is set to NULL in case of an error Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> CC: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21	drm/amdgpu: always register an MMU notifier for userptr	Christian König
	Since switching to HMM we always need that because we no longer grab references to the pages. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> CC: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21	drm/amdgpu/dm/mst: Fix uninitialized var in ↵	Lyude Paul
	pre_compute_mst_dsc_configs_for_state() Coverity noticed this one, so let's fix it. Fixes: ba891436c2d2b2 ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking") Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Cc: stable@vger.kernel.org # v5.6+
2022-11-21	drm/amdgpu/dm/dp_mst: Don't grab mst_mgr->lock when computing DSC state	Lyude Paul
	Now that we've fixed the issue with using the incorrect topology manager, we're actually grabbing the topology manager's lock - and consequently deadlocking. Luckily for us though, there's actually nothing in AMD's DSC state computation code that really should need this lock. The one exception is the mutex_lock() in dm_dp_mst_is_port_support_mode(), however we grab no locks beneath &mgr->lock there so that should be fine to leave be. Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171 Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share") Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21	drm/amdgpu/dm/mst: Use the correct topology mgr pointer in amdgpu_dm_connector	Lyude Paul
	This bug hurt me. Basically, it appears that we've been grabbing the entirely wrong mutex in the MST DSC computation code for amdgpu! While we've been grabbing: amdgpu_dm_connector->mst_mgr That's zero-initialized memory, because the only connectors we'll ever actually be doing DSC computations for are MST ports. Which have mst_mgr zero-initialized, and instead have the correct topology mgr pointer located at: amdgpu_dm_connector->mst_port->mgr; I'm a bit impressed that until now, this code has managed not to crash anyone's systems! It does seem to cause a warning in LOCKDEP though: [ 66.637670] DEBUG_LOCKS_WARN_ON(lock->magic != lock) This was causing the problems that appeared to have been introduced by: commit 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state") This wasn't actually where they came from though. Presumably, before the only thing we were doing with the topology mgr pointer was attempting to grab mst_mgr->lock. Since the above commit however, we grab much more information from mst_mgr including the atomic MST state and respective modesetting locks. This patch also implies that up until now, it's quite likely we could be susceptible to race conditions when going through the MST topology state for DSC computations since we technically will not have grabbed any lock when going through it. So, let's fix this by adjusting all the respective code paths to look at the right pointer and skip things that aren't actual MST connectors from a topology. Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171 Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share") Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21	drm/display/dp_mst: Fix drm_dp_mst_add_affected_dsc_crtcs() return code	Lyude Paul
	Looks like that we're accidentally dropping a pretty important return code here. For some reason, we just return -EINVAL if we fail to get the MST topology state. This is wrong: error codes are important and should never be squashed without being handled, which here seems to have the potential to cause a deadlock. Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Fixes: 8ec046716ca8 ("drm/dp_mst: Add helper to trigger modeset on affected DSC MST CRTCs") Cc: <stable@vger.kernel.org> # v5.6+ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21	drm/amdgpu/mst: Stop ignoring error codes and deadlocking	Lyude Paul
	It appears that amdgpu makes the mistake of completely ignoring the return values from the DP MST helpers, and instead just returns a simple true/false. In this case, it seems to have come back to bite us because as a result of simply returning false from compute_mst_dsc_configs_for_state(), amdgpu had no way of telling when a deadlock happened from these helpers. This could definitely result in some kernel splats. V2: * Address Wayne's comments (fix another bunch of spots where we weren't passing down return codes) Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share") Cc: Harry Wentland <harry.wentland@amd.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21	drm/amd/display: Align dcn314_smu logging with other DCNs	Roman Li
	[Why] Assert on non-OK response from SMU is unnecessary. It was replaced with respective log message on other asics in the past with commit: "drm/amd/display: Removing assert statements for Linux" [How] Remove assert and add dbg logging as on other DCNs. Signed-off-by: Roman Li <roman.li@amd.com> Reviewed-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-21	HID: core: fix shift-out-of-bounds in hid_report_raw_event	ZhangPeng
	Syzbot reported shift-out-of-bounds in hid_report_raw_event. microsoft 0003:045E:07DA.0001: hid_field_extract() called with n (128) > 32! (swapper/0) ====================================================================== UBSAN: shift-out-of-bounds in drivers/hid/hid-core.c:1323:20 shift exponent 127 is too large for 32-bit type 'int' CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc4-syzkaller-00159-g4bbf3422df78 #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_shift_out_of_bounds+0x3a6/0x420 lib/ubsan.c:322 snto32 drivers/hid/hid-core.c:1323 [inline] hid_input_fetch_field drivers/hid/hid-core.c:1572 [inline] hid_process_report drivers/hid/hid-core.c:1665 [inline] hid_report_raw_event+0xd56/0x18b0 drivers/hid/hid-core.c:1998 hid_input_report+0x408/0x4f0 drivers/hid/hid-core.c:2066 hid_irq_in+0x459/0x690 drivers/hid/usbhid/hid-core.c:284 __usb_hcd_giveback_urb+0x369/0x530 drivers/usb/core/hcd.c:1671 dummy_timer+0x86b/0x3110 drivers/usb/gadget/udc/dummy_hcd.c:1988 call_timer_fn+0xf5/0x210 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers+0x76a/0x980 kernel/time/timer.c:1790 run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1803 __do_softirq+0x277/0x75b kernel/softirq.c:571 __irq_exit_rcu+0xec/0x170 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1107 ====================================================================== If the size of the integer (unsigned n) is bigger than 32 in snto32(), shift exponent will be too large for 32-bit type 'int', resulting in a shift-out-of-bounds bug. Fix this by adding a check on the size of the integer (unsigned n) in snto32(). To add support for n greater than 32 bits, set n to 32, if n is greater than 32. Reported-by: syzbot+8b1641d2f14732407e23@syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Signed-off-by: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>