linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-12-07	arm64: dts: apple: t8103: Mark PCIe PERST# polarity active low in DT	Marc Zyngier
	As the name indicates, PERST# is active low. Fix the DT description to match the HW behaviour. Fixes: ff2a8d91d80c ("arm64: apple: Add PCIe node") Link: https://lore.kernel.org/r/20211123180636.80558-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net> Reviewed-by: Mark Kettenis <kettenis@openbsd.org> Acked-by: Arnd Bergmann <arnd@arndb.de>
2021-12-07	clk: versatile: clk-icst: use after free on error path	Dan Carpenter
	This frees "name" and then tries to display in as part of the error message on the next line. Swap the order. Fixes: 1b2189f3aa50 ("clk: versatile: clk-icst: Ensure clock names are unique") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211117072604.GC5237@kili Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2021-12-07	net/qla3xxx: fix an error code in ql_adapter_up()	Dan Carpenter
	The ql_wait_for_drvr_lock() fails and returns false, then this function should return an error code instead of returning success. The other problem is that the success path prints an error message netdev_err(ndev, "Releasing driver lock\n"); Delete that and re-order the code a little to make it more clear. Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211207082416.GA16110@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07	Merge tag 'linux-can-fixes-for-5.16-20211207' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== can 2021-12-07 The 1st patch is by Vincent Mailhol and fixes a use after free in the pch_can driver. Dan Carpenter fixes a use after free in the ems_pcmcia sja1000 driver. The remaining 7 patches target the m_can driver. Brian Silverman contributes a patch to disable and ignore the ELO interrupt, which is currently not handled in the driver and may lead to an interrupt storm. Vincent Mailhol's patch fixes a memory leak in the error path of the m_can_read_fifo() function. The remaining patches are contributed by Matthias Schiffer, first a iomap_read_fifo() and iomap_write_fifo() functions are fixed in the PCI glue driver, then the clock rate for the Intel Ekhart Lake platform is fixed, the last 3 patches add support for the custom bit timings on the Elkhart Lake platform. * tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: m_can: pci: use custom bit timings for Elkhart Lake can: m_can: make custom bittiming fields const Revert "can: m_can: remove support for custom bit timing" can: m_can: pci: fix incorrect reference clock rate can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo() can: m_can: m_can_read_fifo: fix memory leak in error branch can: m_can: Disable and ignore ELO interrupt can: sja1000: fix use after free in ems_pcmcia_add_card() can: pch_can: pch_can_rx_normal: fix use after free ==================== Link: https://lore.kernel.org/r/20211207102420.120131-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07	xfs: remove all COW fork extents when remounting readonly	Darrick J. Wong
	As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount <filesystem> # (0) fsstress <run only readonly ops> & # (1) while true; do fsstress <run all ops> mount -o remount,ro # (2) fsstress <run only readonly ops> mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. Solve this race by forcing the xfs_blockgc_free_space to run synchronously, which causes xfs_icwalk to return to inodes that were skipped because the blockgc code couldn't take the IOLOCK. This is safe to do here because the VFS has already prohibited new writer threads. Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
2021-12-07	drm/amdgpu: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi	Claudio Suarez
	Once EDID is parsed, the monitor HDMI support information is available through drm_display_info.is_hdmi. The amdgpu driver still calls drm_detect_hdmi_monitor() to retrieve the same information, which is less efficient. Change to drm_display_info.is_hdmi This is a TODO task in Documentation/gpu/todo.rst Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Claudio Suarez <cssk@net-c.es> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amdgpu: use drm_edid_get_monitor_name() instead of duplicating the code	Claudio Suarez
	Use drm_edid_get_monitor_name() instead of duplicating the code that parses the EDID in dm_helpers_parse_edid_caps() Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Claudio Suarez <cssk@net-c.es> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amdgpu: update drm_display_info correctly when the edid is read	Claudio Suarez
	drm_display_info is updated by drm_get_edid() or drm_connector_update_edid_property(). In the amdgpu driver it is almost always updated when the edid is read in amdgpu_connector_get_edid(), but not always. Change amdgpu_connector_get_edid() and amdgpu_connector_free_edid() to keep drm_display_info updated. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Claudio Suarez <cssk@net-c.es> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Fix out of bounds access on DNC31 stream encoder regs	Nicholas Kazlauskas
	[Why] During dcn31_stream_encoder_create, if PHYC/D get remapped to F/G on B0 then we'll index 5 or 6 into a array of length 5 - leading to an access violation on some configs during device creation. [How] Software won't be touching PHYF/PHYG directly, so just extend the array to cover all possible engine IDs. Even if it does by try to access one of these registers by accident the offset will be 0 and we'll get a warning during the access. Fixes: 2fe9a0e1173f ("drm/amd/display: Fix DCN3 B0 DP Alt Mapping") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amdgpu: skip umc ras error count harvest	Stanley.Yang
	remove in recovery stat check, skip umc ras err cnt harvest in amdgpu_ras_log_on_err_counter Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amdgpu: free vkms_output after use	Flora Cui
	Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amdgpu: drop the critial WARN_ON in amdgpu_vkms	Flora Cui
	Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Reduce stack usage	Aric Cyr
	Reduce stack usage by moving an unnecessary structure copy to a pointer. Reviewed-by: Joshua Aberback <Joshua.Aberback@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Query DMCUB for dp alt status	Nicholas Kazlauskas
	[Why] To avoid hanging RDPCSPIPE when INTERCEPTB isn't set. DMCUB owns control of that bit so DMCUB should manage returning the information driver needs for link encoder control. [How] Add a new DMCUB command to return dp alt disable and dp4 information. Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: [FW Promotion] Release 0.0.96	Anthony Koo
	Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: add a debug option to force dp2 lt fallback method	Wenjing Liu
	[why] A debug option is needed to temporarily force dp2 new link training fallback method for debugging purpose. Reviewed-by: George Shen <George.Shen@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Rename a struct field to describe a cea component better	Oliver Logush
	[why] Need to fix the code so it does not use reserved keywords [how] Change the total_length member of the cea struct Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Oliver Logush <oliver.logush@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Adding dpia debug bits for hpd delay	Meenakshikumar Somasundaram
	[Why] Need to have dpia debug bits for configuring hpd delay. [How] Added hpd_delay_in_ms variable in dpia_debug_options. Reviewed-by: Jimmy Kizito <Jimmy.Kizito@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: meenakshikumar somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Move link_enc init logic to DC	Jude Shih
	[Why] We shouldn't be accessing res_pool funcs from DM level, therefore, we should create API and let the flow be done in DC level. [How] We create new interface dp_get_link_enc to access and get the correct link_enc Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Jude Shih <shenshih@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Fix bug in debugfs crc_win_update entry	Wayne Lin
	[Why] crc_rd_wrk shouldn't be null in crc_win_update_set(). Current programming logic is inconsistent in crc_win_update_set(). [How] Initially, return if crc_rd_wrk is NULL. Later on, we can use member of crc_rd_wrk safely. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 9a65df193108 ("drm/amd/display: Use PSP TA to read out crc") Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: prevent reading unitialized links	Mikita Lipski
	[why/how] The function can be called on boot or after suspend when links are not initialized, to prevent it guard it with NULL pointer check Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Mikita Lipski <mikita.lipski@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Added Check For dc->res_pool	Jarif Aftab
	[WHY] -To ensure dc->res_pool has been initialized [HOW] -Check if dc->res_pool is true in the if statement Reviewed-by: Martin Leung <Martin.Leung@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Jarif Aftab <jaraftab@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	Merge tag 'platform-drivers-x86-v5.16-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "Various bug-fixes and hardware-id additions" * tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel: hid: add quirk to support Surface Go 3 platform/x86: amd-pmc: Fix s2idle failures on certain AMD laptops platform/x86: touchscreen_dmi: Add TrekStor SurfTab duo W1 touchscreen info platform/x86: lg-laptop: Recognize more models platform/x86: thinkpad_acpi: Add lid_logo_dot to the list of safe LEDs platform/x86: thinkpad_acpi: Restore missing hotkey_tablet_mode and hotkey_radio_sw sysfs-attr
2021-12-07	drm/amd/display: Prevent PSR disable/reenable in HPD IRQ	Wyatt Wood
	[Why] When HPD IRQ occurs, it triggers a PSR disable and reenable directly through dc layer. Since it does not pass through the power layer, the layer that tracks whether PSR is enabled or disabled and which masks are set, this layer is now out of sync with the real PSR state in FW. Theoretically PSR can be enabled during hw programming sequences or any other situation where we must disable PSR. [How] Check if PSR is enabled before doing PSR disable/reenable. Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Wyatt Wood <wyatt.wood@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Fix DPIA outbox timeout after S3/S4/reset	Nicholas Kazlauskas
	[Why] The HW interrupt gets disabled after S3/S4/reset so we don't receive notifications for HPD or AUX from DMUB - leading to timeout and black screen with (or without) DPIA links connected. [How] Re-enable the interrupt after S3/S4/reset like we do for the other DC interrupts. Guard both instances of the outbox interrupt enable or we'll hang during restore on ASIC that don't support it. Fixes: 524a0ba6fab955 ("drm/amd/display: Fix DPIA outbox timeout after GPU reset") Reviewed-by: Jude Shih <Jude.Shih@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Add W/A for PHY tests with certain LTTPR	George Shen
	[Why] Certain LTTPR require output VS/PE to be explicitly set during PHY test automation. [How] Add vendor-specific sequence to set LTTPR output VS/PE. Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: George Shen <George.Shen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amd/display: Apply LTTPR workarounds to non-transparent mode	George Shen
	[Why] Some of the vendor-specific workarounds added for transparent mode also need to be applied to non-transparent mode in order to succeed link training consistently. [How] Remove transparent mode check for the required workarounds. Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Pavle Kotarac <Pavle.Kotarac@amd.com> Signed-off-by: George Shen <George.Shen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amdgpu: only skip get ecc info for aldebaran	Stanley.Yang
	skip get ecc info for aldebarn through check ip version do not affect other asic type Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	drm/amdkfd: Correct the value of the no_atomic_fw_version variable	chen gong
	145: navi10 IP_VERSION(10, 1, 10) navi12 IP_VERSION(10, 1, 2) navi14 IP_VERSION(10, 1, 1) 92: sienna_cichlid IP_VERSION(10, 3, 0) navy_flounder IP_VERSION(10, 3, 2) vangogh IP_VERSION(10, 3, 1) dimgrey_cavefish IP_VERSION(10, 3, 4) beige_goby IP_VERSION(10, 3, 5) yellow_carp IP_VERSION(10, 3, 3) Signed-off-by: chen gong <curry.gong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Graham Sider <Graham.Sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-07	RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ	Tatyana Nikolova
	Completion events (CEs) are lost if the application is allowed to arm the CQ more than two times when no new CE for this CQ has been generated by the HW. Check if arming has been done for the CQ and if not, arm the CQ for any event otherwise promote to arm the CQ for any event only when the last arm event was solicited. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20211201231509.1930-2-shiraz.saleem@intel.com Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	RDMA/irdma: Report correct WC errors	Shiraz Saleem
	Return IBV_WC_REM_OP_ERR for responder QP errors instead of IBV_WC_REM_ACCESS_ERR. Return IBV_WC_LOC_QP_OP_ERR for errors detected on the SQ with bad opcodes Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20211201231509.1930-1-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	RDMA/irdma: Fix a potential memory allocation issue in ↵	Christophe JAILLET
	'irdma_prm_add_pble_mem()' 'pchunk->bitmapbuf' is a bitmap. Its size (in number of bits) is stored in 'pchunk->sizeofbitmap'. When it is allocated, the size (in bytes) is computed by: size_in_bits >> 3 There are 2 issues (numbers bellow assume that longs are 64 bits): - there is no guarantee here that 'pchunk->bitmapmem.size' is modulo BITS_PER_LONG but bitmaps are stored as longs (sizeofbitmap=8 bits will only allocate 1 byte, instead of 8 (1 long)) - the number of bytes is computed with a shift, not a round up, so we may allocate less memory than needed (sizeofbitmap=65 bits will only allocate 8 bytes (i.e. 1 long), when 2 longs are needed = 16 bytes) Fix both issues by using 'bitmap_zalloc()' and remove the useless 'bitmapmem' from 'struct irdma_chunk'. While at it, remove some useless NULL test before calling kfree/bitmap_free. Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions") Link: https://lore.kernel.org/r/5e670b640508e14b1869c3e8e4fb970d78cbe997.1638692171.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	RDMA/irdma: Fix a user-after-free in add_pble_prm	Shiraz Saleem
	When irdma_hmc_sd_one fails, 'chunk' is freed while its still on the PBLE info list. Add the chunk entry to the PBLE info list only after successful setting of the SD in irdma_hmc_sd_one. Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager") Link: https://lore.kernel.org/r/20211207152135.2192-1-shiraz.saleem@intel.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	IB/hfi1: Fix leak of rcvhdrtail_dummy_kvaddr	Mike Marciniszyn
	This buffer is currently allocated in hfi1_init(): if (reinit) ret = init_after_reset(dd); else ret = loadtime_init(dd); if (ret) goto done; /* allocate dummy tail memory for all receive contexts */ dd->rcvhdrtail_dummy_kvaddr = dma_alloc_coherent(&dd->pcidev->dev, sizeof(u64), &dd->rcvhdrtail_dummy_dma, GFP_KERNEL); if (!dd->rcvhdrtail_dummy_kvaddr) { dd_dev_err(dd, "cannot allocate dummy tail memory\n"); ret = -ENOMEM; goto done; } The reinit triggered path will overwrite the old allocation and leak it. Fix by moving the allocation to hfi1_alloc_devdata() and the deallocation to hfi1_free_devdata(). Link: https://lore.kernel.org/r/20211129192008.101968.91302.stgit@awfm-01.cornelisnetworks.com Cc: stable@vger.kernel.org Fixes: 46b010d3eeb8 ("staging/rdma/hfi1: Workaround to prevent corruption during packet delivery") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	IB/hfi1: Fix early init panic	Mike Marciniszyn
	The following trace can be observed with an init failure such as firmware load failures: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0010 [#1] SMP PTI CPU: 0 PID: 537 Comm: kworker/0:3 Tainted: G OE --------- - - 4.18.0-240.el8.x86_64 #1 Workqueue: events work_for_cpu_fn RIP: 0010:0x0 Code: Bad RIP value. RSP: 0000:ffffae5f878a3c98 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff95e48e025c00 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff95e48e025c00 RBP: ffff95e4bf3660a4 R08: 0000000000000000 R09: ffffffff86d5e100 R10: ffff95e49e1de600 R11: 0000000000000001 R12: ffff95e4bf366180 R13: ffff95e48e025c00 R14: ffff95e4bf366028 R15: ffff95e4bf366000 FS: 0000000000000000(0000) GS:ffff95e4df200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000f86a0a003 CR4: 00000000001606f0 Call Trace: receive_context_interrupt+0x1f/0x40 [hfi1] __free_irq+0x201/0x300 free_irq+0x2e/0x60 pci_free_irq+0x18/0x30 msix_free_irq.part.2+0x46/0x80 [hfi1] msix_clean_up_interrupts+0x2b/0x70 [hfi1] hfi1_init_dd+0x640/0x1a90 [hfi1] do_init_one.isra.19+0x34d/0x680 [hfi1] local_pci_probe+0x41/0x90 work_for_cpu_fn+0x16/0x20 process_one_work+0x1a7/0x360 worker_thread+0x1cf/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x112/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x35/0x40 The free_irq() results in a callback to the registered interrupt handler, and rcd->do_interrupt is NULL because the receive context data structures are not fully initialized. Fix by ensuring that the do_interrupt is always assigned and adding a guards in the slow path handler to detect and handle a partially initialized receive context and noop the receive. Link: https://lore.kernel.org/r/20211129192003.101968.33612.stgit@awfm-01.cornelisnetworks.com Cc: stable@vger.kernel.org Fixes: b0ba3c18d6bf ("IB/hfi1: Move normal functions from hfi1_devdata to const array") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	IB/hfi1: Insure use of smp_processor_id() is preempt disabled	Mike Marciniszyn
	The following BUG has just surfaced with our 5.16 testing: BUG: using smp_processor_id() in preemptible [00000000] code: mpicheck/1581081 caller is sdma_select_user_engine+0x72/0x210 [hfi1] CPU: 0 PID: 1581081 Comm: mpicheck Tainted: G S 5.16.0-rc1+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 Call Trace: <TASK> dump_stack_lvl+0x33/0x42 check_preemption_disabled+0xbf/0xe0 sdma_select_user_engine+0x72/0x210 [hfi1] ? _raw_spin_unlock_irqrestore+0x1f/0x31 ? hfi1_mmu_rb_insert+0x6b/0x200 [hfi1] hfi1_user_sdma_process_request+0xa02/0x1120 [hfi1] ? hfi1_write_iter+0xb8/0x200 [hfi1] hfi1_write_iter+0xb8/0x200 [hfi1] do_iter_readv_writev+0x163/0x1c0 do_iter_write+0x80/0x1c0 vfs_writev+0x88/0x1a0 ? recalibrate_cpu_khz+0x10/0x10 ? ktime_get+0x3e/0xa0 ? __fget_files+0x66/0xa0 do_writev+0x65/0x100 do_syscall_64+0x3a/0x80 Fix this long standing bug by moving the smp_processor_id() to after the rcu_read_lock(). The rcu_read_lock() implicitly disables preemption. Link: https://lore.kernel.org/r/20211129191958.101968.87329.stgit@awfm-01.cornelisnetworks.com Cc: stable@vger.kernel.org Fixes: 0cb2aa690c7e ("IB/hfi1: Add sysfs interface for affinity setup") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	IB/hfi1: Correct guard on eager buffer deallocation	Mike Marciniszyn
	The code tests the dma address which legitimately can be 0. The code should test the kernel logical address to avoid leaking eager buffer allocations that happen to map to a dma address of 0. Fixes: 60368186fd85 ("IB/hfi1: Fix user-space buffers mapping with IOMMU enabled") Link: https://lore.kernel.org/r/20211129191952.101968.17137.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-12-07	nvme: fix use after free when disconnecting a reconnecting ctrl	Ruozhu Li
	A crash happens when trying to disconnect a reconnecting ctrl: 1) The network was cut off when the connection was just established, scan work hang there waiting for some IOs complete. Those I/Os were retried because we return BLK_STS_RESOURCE to blk in reconnecting. 2) After a while, I tried to disconnect this connection. This procedure also hangs because it tried to obtain ctrl->scan_lock. It should be noted that now we have switched the controller state to NVME_CTRL_DELETING. 3) In nvme_check_ready(), we always return true when ctrl->state is NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom device which was already freed. To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom device only when queue state is live. If not, return host path error to the block layer Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-07	nvme-multipath: set ana_log_size to 0 after free ana_log_buf	Hou Tao
	Set ana_log_size to 0 when ana_log_buf is freed to make sure nvme_mpath_init_identify will do the right thing when retrying after an earlier failure. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-12-07	drm/bridge: parade-ps8640: Add backpointer to drm_device in drm_dp_aux	Douglas Anderson
	When we added the support for the AUX channel in commit 13afcdd7277e ("drm/bridge: parade-ps8640: Add support for AUX channel") we forgot to set "drm_dev" to avoid the warning splat at the beginning of drm_dp_aux_register(). Since everything was working I guess I never noticed the splat when testing against mainline. In any case, it's easy to fix. This is basically just like commit 6cba3fe43341 ("drm/dp: Add backpointer to drm_device in drm_dp_aux") but just for the parade-ps8640. Fixes: 13afcdd7277e ("drm/bridge: parade-ps8640: Add support for AUX channel") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Philip Chen <philipchen@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211206162907.1.I1f5d1eba741e4663050ec1b8e39a753f6e42e38b@changeid
2021-12-07	PCI: apple: Follow the PCIe specifications when resetting the port	Marc Zyngier
	While the Apple PCIe driver works correctly when directly booted from the firmware, it fails to initialise when the kernel is booted from a bootloader using PCIe such as u-boot. That's because we're missing a proper reset of the port (we only clear the reset, but never assert it). The PCIe spec requirements are two-fold: - PERST# must be asserted before setting up the clocks and stay asserted for at least 100us (Tperst-clk) - Once PERST# is deasserted, the OS must wait for at least 100ms "from the end of a Conventional Reset" before we can start talking to the devices Implementing this results in a booting system. [bhelgaas: #PERST -> PERST#, update spec references to current] Fixes: 1e33888fbe44 ("PCI: apple: Add initial hardware bring-up") Link: https://lore.kernel.org/r/20211123180636.80558-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net> Acked-by: Pali Rohár <pali@kernel.org> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2021-12-07	drm/panel: Update Boe-tv110c9m and Inx-hj110iz initial code	yangcong
	At present, we have enough panel to confirm the effect, update the initial code to achieve the best effect. Such as gamma, Gop timing. They are all minor modifications and doesn't affect the lighting of the panel. a)Boe-tv110c9m panel Optimized touch horizontal grain. b)Inx-hj110iz panel Optimized GOP timing and gamma. Signed-off-by: yangcong <yangcong5@huaqin.corp-partner.google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211201023230.344976-1-yangcong5@huaqin.corp-partner.google.com
2021-12-07	netfs: fix parameter of cleanup()	Jeffle Xu
	The order of these two parameters is just reversed. gcc didn't warn on that, probably because 'void *' can be converted from or to other pointer types without warning. Cc: stable@vger.kernel.org Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers") Fixes: e1b1240c1ff5 ("netfs: Add write_begin helper") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ # v1
2021-12-07	drm/i915: Allow cdclk squasher to be reconfigured live	Ville Syrjälä
	Supposedly we should be able to change the cdclk squasher waveform even when many pipes are active. Make it so. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211119131348.725220-6-mika.kahola@intel.com
2021-12-07	drm/i915/display/dg2: Read CD clock from squasher table	Mika Kahola
	To calculate CD clock with squasher unit, we set CD clock ratio to fixed value of 34. The CD clock value is read from CD clock squasher table. BSpec: 54034 v2: Read ratio from register (Ville) Drop unnecessary local variable (Ville) Get CD clock from the given table v3: Calculate CD clock frequency based on waveform bit pattern (Ville) [v4: vsyrjala: Actually do a proper blind readout from the hardware] [v5: vsyrjala: Use has_cdclk_squasher()] Signed-off-by: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211119131348.725220-5-mika.kahola@intel.com
2021-12-07	drm/i915/display/dg2: Set CD clock squashing registers	Mika Kahola
	Set CD clock squashing registers based on selected CD clock. v2: use slk_cdclk_decimal() to compute decimal values instead of a specific table (Ville) Set waveform based on CD clock table (Ville) Drop unnecessary local variable (Ville) v3: Correct function naming (Ville) Correct if-else structure (Ville) [v4: vsyrjala: Fix spaces vs. tabs] [v5: vsyrjala: Fix cd2x divider calculation (Uma), Add warn to waveform lookup (Uma), Handle bypass freq in waveform lookup, Generalize waveform handling in bxt_set_cdclk()] Signed-off-by: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211119131348.725220-4-mika.kahola@intel.com
2021-12-07	drm/i915/display/dg2: Sanitize CD clock	Mika Kahola
	In case of CD clock squashing the divider is always 1. We don't need to calculate the divider in use so let's skip that for DG2. v2: Drop unnecessary local variable (Ville) v3: Avoid if-else structure (Ville) [v4: vsyrjala: Fix cd2x divider calculation (Uma), Introduce has_cdclk_squasher()] Signed-off-by: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211119131348.725220-3-mika.kahola@intel.com
2021-12-07	drm/i915/display/dg2: Introduce CD clock squashing table	Mika Kahola
	For CD clock squashing method, we need to define corresponding CD clock table for reference clocks, dividers and ratios for all CD clock options. BSpec: 54034 v2: Add CD squashing waveforms as part of CD clock table (Ville) v3: Waveform is 16 bits wide (Ville) [v4: vsyrjala: Nuke the non-squasher based table, Set .divider=2 for consistency, Pack intel_cdclk_vals a bit nicer] v5: Fix error in waveform value (Swati) v6 (Lucas): Rebase on upstream v7 (MattR): Drop 40.8, 81.6, and 122.4 MHz frequencies to reflect new bspec update. Signed-off-by: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211119131348.725220-2-mika.kahola@intel.com
2021-12-07	drm/i915/selftests: Follow up on increase timeout in i915_gem_contexts selftests	Bruce Chang
	Follow up on below commit, to increase the timeout further on new platforms, to accomodate the additional time required for the completion of guc submissions for numerous requests created in loop. commit 5e076529e2652244ec20a86d8f99ba634a16c4f4 Author: Matthew Brost <matthew.brost@intel.com> Date: Mon Jul 26 20:17:03 2021 -0700 drm/i915/selftests: Increase timeout in i915_gem_contexts selftests Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211207003845.12419-1-yu.bruce.chang@intel.com
2021-12-07	netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock	David Howells
	Taking sb_writers whilst holding mmap_lock isn't allowed and will result in a lockdep warning like that below. The problem comes from cachefiles needing to take the sb_writers lock in order to do a write to the cache, but being asked to do this by netfslib called from readpage, readahead or write_begin[1]. Fix this by always offloading the write to the cache off to a worker thread. The main thread doesn't need to wait for it, so deadlock can be avoided. This can be tested by running the quick xfstests on something like afs or ceph with lockdep enabled. WARNING: possible circular locking dependency detected 5.15.0-rc1-build2+ #292 Not tainted ------------------------------------------------------ holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5 but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&mm->mmap_lock#2){++++}-{3:3}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b __might_fault+0x87/0xb1 strncpy_from_user+0x25/0x18c removexattr+0x7c/0xe5 __do_sys_fremovexattr+0x73/0x96 do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (sb_writers#10){.+.+}-{0:0}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b cachefiles_write+0x2b3/0x4bb netfs_rreq_do_write_to_cache+0x3b5/0x432 netfs_readpage+0x2de/0x39d filemap_read_page+0x51/0x94 filemap_get_pages+0x26f/0x413 filemap_read+0x182/0x427 new_sync_read+0xf0/0x161 vfs_read+0x118/0x16e ksys_read+0xb8/0x12e do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}: check_noncircular+0xe4/0x129 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b down_read+0x40/0x4a filemap_fault+0x276/0x7a5 __do_fault+0x96/0xbf do_fault+0x262/0x35a __handle_mm_fault+0x171/0x1b5 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c exc_page_fault+0x85/0xa5 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_lock#2); lock(sb_writers#10); lock(&mm->mmap_lock#2); lock(mapping.invalidate_lock#3); * DEADLOCK * 1 lock held by holetest/65517: #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c stack backtrace: CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0xe4/0x129 ? print_circular_bug+0x207/0x207 ? validate_chain+0x461/0x4a8 ? add_chain_block+0x88/0xd9 ? hlist_add_head_rcu+0x49/0x53 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 ? check_prev_add+0x3a4/0x3a4 ? mark_lock+0xa5/0x1c6 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b ? filemap_fault+0x276/0x7a5 ? rcu_read_unlock+0x59/0x59 ? add_to_page_cache_lru+0x13c/0x13c ? lock_is_held_type+0x7b/0xd3 down_read+0x40/0x4a ? filemap_fault+0x276/0x7a5 filemap_fault+0x276/0x7a5 ? pagecache_get_page+0x2dd/0x2dd ? __lock_acquire+0x8bc/0x949 ? pte_offset_kernel.isra.0+0x6d/0xc3 __do_fault+0x96/0xbf ? do_fault+0x124/0x35a do_fault+0x262/0x35a ? handle_pte_fault+0x1c1/0x20d __handle_mm_fault+0x171/0x1b5 ? handle_pte_fault+0x20d/0x20d ? __lock_release+0x151/0x254 ? mark_held_locks+0x1f/0x78 ? rcu_read_unlock+0x3a/0x59 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c ? pgtable_bad+0x70/0x70 ? rcu_read_lock_bh_held+0xab/0xab exc_page_fault+0x85/0xa5 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x40192f Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b RSP: 002b:00007f9931867eb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0 RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700 R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0 R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000 Fixes: 726218fdc22c ("netfs: Define an interface to talk to a cache") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> cc: Jan Kara <jack@suse.cz> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1] Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1