linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-03-23	Merge tag 'gfs2-v6.3-rc3-fix' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: - Reinstate commit 970343cd4904 ("GFS2: free disk inode which is deleted by remote node -V2") as reverting that commit could cause gfs2_put_super() to hang. * tag 'gfs2-v6.3-rc3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Reinstate "GFS2: free disk inode which is deleted by remote node -V2"
2023-03-23	Bluetooth: HCI: Fix global-out-of-bounds	Sungwoo Kim
	To loop a variable-length array, hci_init_stage_sync(stage) considers that stage[i] is valid as long as stage[i-1].func is valid. Thus, the last element of stage[].func should be intentionally invalid as hci_init0[], le_init2[], and others did. However, amp_init1[] and amp_init2[] have no invalid element, letting hci_init_stage_sync() keep accessing amp_init1[] over its valid range. This patch fixes this by adding {} in the last of amp_init1[] and amp_init2[]. ================================================================== BUG: KASAN: global-out-of-bounds in hci_dev_open_sync ( /v6.2-bzimage/net/bluetooth/hci_sync.c:3154 /v6.2-bzimage/net/bluetooth/hci_sync.c:3343 /v6.2-bzimage/net/bluetooth/hci_sync.c:4418 /v6.2-bzimage/net/bluetooth/hci_sync.c:4609 /v6.2-bzimage/net/bluetooth/hci_sync.c:4689) Read of size 8 at addr ffffffffaed1ab70 by task kworker/u5:0/1032 CPU: 0 PID: 1032 Comm: kworker/u5:0 Not tainted 6.2.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04 Workqueue: hci1 hci_power_on Call Trace: <TASK> dump_stack_lvl (/v6.2-bzimage/lib/dump_stack.c:107 (discriminator 1)) print_report (/v6.2-bzimage/mm/kasan/report.c:307 /v6.2-bzimage/mm/kasan/report.c:417) ? hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:3154 /v6.2-bzimage/net/bluetooth/hci_sync.c:3343 /v6.2-bzimage/net/bluetooth/hci_sync.c:4418 /v6.2-bzimage/net/bluetooth/hci_sync.c:4609 /v6.2-bzimage/net/bluetooth/hci_sync.c:4689) kasan_report (/v6.2-bzimage/mm/kasan/report.c:184 /v6.2-bzimage/mm/kasan/report.c:519) ? hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:3154 /v6.2-bzimage/net/bluetooth/hci_sync.c:3343 /v6.2-bzimage/net/bluetooth/hci_sync.c:4418 /v6.2-bzimage/net/bluetooth/hci_sync.c:4609 /v6.2-bzimage/net/bluetooth/hci_sync.c:4689) hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:3154 /v6.2-bzimage/net/bluetooth/hci_sync.c:3343 /v6.2-bzimage/net/bluetooth/hci_sync.c:4418 /v6.2-bzimage/net/bluetooth/hci_sync.c:4609 /v6.2-bzimage/net/bluetooth/hci_sync.c:4689) ? __pfx_hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:4635) ? mutex_lock (/v6.2-bzimage/./arch/x86/include/asm/atomic64_64.h:190 /v6.2-bzimage/./include/linux/atomic/atomic-long.h:443 /v6.2-bzimage/./include/linux/atomic/atomic-instrumented.h:1781 /v6.2-bzimage/kernel/locking/mutex.c:171 /v6.2-bzimage/kernel/locking/mutex.c:285) ? __pfx_mutex_lock (/v6.2-bzimage/kernel/locking/mutex.c:282) hci_power_on (/v6.2-bzimage/net/bluetooth/hci_core.c:485 /v6.2-bzimage/net/bluetooth/hci_core.c:984) ? __pfx_hci_power_on (/v6.2-bzimage/net/bluetooth/hci_core.c:969) ? read_word_at_a_time (/v6.2-bzimage/./include/asm-generic/rwonce.h:85) ? strscpy (/v6.2-bzimage/./arch/x86/include/asm/word-at-a-time.h:62 /v6.2-bzimage/lib/string.c:161) process_one_work (/v6.2-bzimage/kernel/workqueue.c:2294) worker_thread (/v6.2-bzimage/./include/linux/list.h:292 /v6.2-bzimage/kernel/workqueue.c:2437) ? __pfx_worker_thread (/v6.2-bzimage/kernel/workqueue.c:2379) kthread (/v6.2-bzimage/kernel/kthread.c:376) ? __pfx_kthread (/v6.2-bzimage/kernel/kthread.c:331) ret_from_fork (/v6.2-bzimage/arch/x86/entry/entry_64.S:314) </TASK> The buggy address belongs to the variable: amp_init1+0x30/0x60 The buggy address belongs to the physical page: page:000000003a157ec6 refcount:1 mapcount:0 mapping:0000000000000000 ia flags: 0x200000000001000(reserved\|node=0\|zone=2) raw: 0200000000001000 ffffea0005054688 ffffea0005054688 000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffffaed1aa00: f9 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 00 00 00 00 ffffffffaed1aa80: 00 00 00 00 f9 f9 f9 f9 00 00 00 00 00 00 00 00 >ffffffffaed1ab00: 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 f9 f9 ^ ffffffffaed1ab80: f9 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 00 00 00 f9 ffffffffaed1ac00: f9 f9 f9 f9 00 06 f9 f9 f9 f9 f9 f9 00 00 02 f9 This bug is found by FuzzBT, a modified version of Syzkaller. Other contributors for this bug are Ruoyu Wu and Peng Hui. Fixes: d0b137062b2d ("Bluetooth: hci_sync: Rework init stages") Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-23	Bluetooth: mgmt: Fix MGMT add advmon with RSSI command	Howard Chung
	The MGMT command: MGMT_OP_ADD_ADV_PATTERNS_MONITOR_RSSI uses variable length argument. This causes host not able to register advmon with rssi. This patch has been locally tested by adding monitor with rssi via btmgmt on a kernel 6.1 machine. Reviewed-by: Archie Pusaka <apusaka@chromium.org> Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Signed-off-by: Howard Chung <howardchung@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-23	Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished ↵	Zheng Wang
	work In btsdio_probe, &data->work was bound with btsdio_work.In btsdio_send_frame, it was started by schedule_work. If we call btsdio_remove with an unfinished job, there may be a race condition and cause UAF bug on hdev. Fixes: ddbaf13e3609 ("[Bluetooth] Add generic driver for Bluetooth SDIO devices") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-23	Bluetooth: L2CAP: Fix responding with wrong PDU type	Luiz Augusto von Dentz
	L2CAP_ECRED_CONN_REQ shall be responded with L2CAP_ECRED_CONN_RSP not L2CAP_LE_CONN_RSP: L2CAP LE EATT Server - Reject - run Listening for connections New client connection with handle 0x002a Sending L2CAP Request from client Client received response code 0x15 Unexpected L2CAP response code (expected 0x18) L2CAP LE EATT Server - Reject - test failed > ACL Data RX: Handle 42 flags 0x02 dlen 26 LE L2CAP: Enhanced Credit Connection Request (0x17) ident 1 len 18 PSM: 39 (0x0027) MTU: 64 MPS: 64 Credits: 5 Source CID: 65 Source CID: 66 Source CID: 67 Source CID: 68 Source CID: 69 < ACL Data TX: Handle 42 flags 0x00 dlen 16 LE L2CAP: LE Connection Response (0x15) ident 1 len 8 invalid size 00 00 00 00 00 00 06 00 L2CAP LE EATT Server - Reject - run Listening for connections New client connection with handle 0x002a Sending L2CAP Request from client Client received response code 0x18 L2CAP LE EATT Server - Reject - test passed Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-23	Bluetooth: btqcomsmd: Fix command timeout after setting BD address	Stephan Gerhold
	On most devices using the btqcomsmd driver (e.g. the DragonBoard 410c and other devices based on the Qualcomm MSM8916/MSM8909/... SoCs) the Bluetooth firmware seems to become unresponsive for a while after setting the BD address. On recent kernel versions (at least 5.17+) this often causes timeouts for subsequent commands, e.g. the HCI reset sent by the Bluetooth core during initialization: Bluetooth: hci0: Opcode 0x c03 failed: -110 Unfortunately this behavior does not seem to be documented anywhere. Experimentation suggests that the minimum necessary delay to avoid the problem is ~150us. However, to be sure add a sleep for > 1ms in case it is a bit longer on other firmware versions. Older kernel versions are likely also affected, although perhaps with slightly different errors or less probability. Side effects can easily hide the issue in most cases, e.g. unrelated incoming interrupts that cause the necessary delay. Fixes: 1511cc750c3d ("Bluetooth: Introduce Qualcomm WCNSS SMD based HCI driver") Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-23	Bluetooth: btinel: Check ACPI handle for NULL before accessing	Kiran K
	Older platforms and Virtual platforms which doesn't have support for bluetooth device in ACPI firmware will not have valid ACPI handle. Check for validity of handle before accessing. dmesg log from simics environment (virtual platform): BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: acpi_ns_walk_namespace+0x5c/0x278 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI Modules linked in: bnep intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate input_leds joydev serio_raw mac_hid btusb(OE) btintel(OE) bluetooth(OE) lpc_ich compat(OE) ecdh_generic i7core_edac i5500_temp shpchp binfmt_misc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid e1000e psmouse ahci pata_acpi libahci ptp pps_core floppy CPU: 0 PID: 35 Comm: kworker/u3:0 Tainted: G OE 4.15.0-140-generic #144-Ubuntu Hardware name: Simics Simics, BIOS Simics 01/01/2011 Workqueue: hci0 hci_power_on [bluetooth] RIP: 0010:acpi_ns_walk_namespace+0x5c/0x278 RSP: 0000:ffffaa9c0049bba8 EFLAGS: 00010246 RAX: 0000000000000001 RBX: 0000000000001001 RCX: 0000000000000010 RDX: ffffffff92ea7e27 RSI: ffffffff92ea7e10 RDI: 00000000000000c8 RBP: ffffaa9c0049bbf8 R08: 0000000000000000 R09: ffffffffc05b39d0 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000000 R14: ffffffffc05b39d0 R15: ffffaa9c0049bc70 FS: 0000000000000000(0000) GS:ffff8be73fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000075f0e000 CR4: 00000000000006f0 Fixes: 294d749b5df5 ("Bluetooth: btintel: Iterate only bluetooth device ACPI entries") Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-03-23	riscv: Handle zicsr/zifencei issues between clang and binutils	Nathan Chancellor
	There are two related issues that appear in certain combinations with clang and GNU binutils. The first occurs when a version of clang that supports zicsr or zifencei via '-march=' [1] (i.e, >= 17.x) is used in combination with a version of GNU binutils that do not recognize zicsr and zifencei in the '-march=' value (i.e., < 2.36): riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o The second occurs when a version of clang that does not support zicsr or zifencei via '-march=' (i.e., <= 16.x) is used in combination with a version of GNU as that defaults to a newer ISA base spec, which requires specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >= 2.38): ../arch/riscv/kernel/kexec_relocate.S: Assembler messages: ../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required clang-12: error: assembler command failed with exit code 1 (use -v to see invocation) This is the same issue addressed by commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38") (see [2] for additional information) but older versions of clang miss out on it because the cc-option check fails: clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr' clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr' To resolve the first issue, only attempt to add zicsr and zifencei to the march string when using the GNU assembler 2.38 or newer, which is when the default ISA spec was updated, requiring these extensions to be specified explicitly. LLVM implements an older version of the base specification for all currently released versions, so these instructions are available as part of the 'i' extension. If LLVM's implementation is updated in the future, a CONFIG_AS_IS_LLVM condition can be added to CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. To resolve the second issue, use version 2.2 of the base ISA spec when using an older version of clang that does not support zicsr or zifencei via '-march=', as that is the spec version most compatible with the one clang/LLVM implements and avoids the need to specify zicsr and zifencei explicitly due to still being a part of 'i'. [1]: https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694e15bf8a16 [2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/ Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1808 Co-developed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23	SUNRPC: fix shutdown of NFS TCP client socket	Siddharth Kawar
	NFS server Duplicate Request Cache (DRC) algorithms rely on NFS clients reconnecting using the same local TCP port. Unique NFS operations are identified by the per-TCP connection set of XIDs. This prevents file corruption when non-idempotent NFS operations are retried. Currently, NFS client TCP connections are using different local TCP ports when reconnecting to NFS servers. After an NFS server initiates shutdown of the TCP connection, the NFS client's TCP socket is set to NULL after the socket state has reached TCP_LAST_ACK(9). When reconnecting, the new socket attempts to reuse the same local port but fails with EADDRNOTAVAIL (99). This forces the socket to use a different local TCP port to reconnect to the remote NFS server. State Transition and Events: TCP_CLOSE_WAIT(8) TCP_LAST_ACK(9) connect(fail EADDRNOTAVAIL(99)) TCP_CLOSE(7) bind on new port connect success dmesg excerpts showing reconnect switching from TCP local port of 926 to 763 after commit 7c81e6a9d75b: [13354.947854] NFS call mkdir testW ... [13405.654781] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.654813] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1 [13405.654826] RPC: xs_data_ready... [13405.654892] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.654895] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [13405.654899] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.654900] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [13405.654950] RPC: xs_connect scheduled xprt 00000000037d0f03 [13405.654975] RPC: xs_bind 0.0.0.0:926: ok (0) [13405.654980] RPC: worker connecting xprt 00000000037d0f03 via tcp to 10.101.6.228 (port 2049) [13405.654991] RPC: 00000000037d0f03 connect status 99 connected 0 sock state 7 [13405.655001] RPC: xs_tcp_state_change client 00000000037d0f03... [13405.655002] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [13405.655024] RPC: xs_connect scheduled xprt 00000000037d0f03 [13405.655038] RPC: xs_bind 0.0.0.0:763: ok (0) [13405.655041] RPC: worker connecting xprt 00000000037d0f03 via tcp to 10.101.6.228 (port 2049) [13405.655065] RPC: 00000000037d0f03 connect status 115 connected 0 sock state 2 State Transition and Events with patch applied: TCP_CLOSE_WAIT(8) TCP_LAST_ACK(9) TCP_CLOSE(7) connect(reuse of port succeeds) dmesg excerpts showing reconnect on same TCP local port of 936 with patch applied: [ 257.139935] NFS: mkdir(0:59/560857152), testQ [ 257.139937] NFS call mkdir testQ ... [ 307.822702] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1 [ 307.822714] RPC: xs_data_ready... [ 307.822817] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.822821] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.822825] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.822826] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.823606] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.823609] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.823629] RPC: xs_tcp_state_change client 00000000ce702f14... [ 307.823632] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 307.823676] RPC: xs_connect scheduled xprt 00000000ce702f14 [ 307.823704] RPC: xs_bind 0.0.0.0:936: ok (0) [ 307.823709] RPC: worker connecting xprt 00000000ce702f14 via tcp to 10.101.1.30 (port 2049) [ 307.823748] RPC: 00000000ce702f14 connect status 115 connected 0 sock state 2 ... [ 314.916193] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3 [ 314.916251] RPC: xs_connect scheduled xprt 00000000ce702f14 [ 314.916282] RPC: xs_bind 0.0.0.0:936: ok (0) [ 314.916292] RPC: worker connecting xprt 00000000ce702f14 via tcp to 10.101.1.30 (port 2049) [ 314.916342] RPC: 00000000ce702f14 connect status 115 connected 0 sock state 2 Fixes: 7c81e6a9d75b ("SUNRPC: Tweak TCP socket shutdown in the RPC client") Signed-off-by: Siddharth Rajendra Kawar <sikawar@microsoft.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-23	docs: create a top-level arch/ directory	Jonathan Corbet
	As the first step in bringing some order to our architecture-specific documentation, create a top-level arch/ directory and move arch.rst as its index.rst file. There is no change in the rendered docs at this point. Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	Merge tag 'nvme-6.3-2023-03-23' of git://git.infradead.org/nvme into block-6.3	Jens Axboe
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - send Identify with CNS 06h only to I/O controllers (Martin George) - fix nvme_tcp_term_pdu to match spec (Caleb Sander)" * tag 'nvme-6.3-2023-03-23' of git://git.infradead.org/nvme: nvme-tcp: fix nvme_tcp_term_pdu to match spec nvme: send Identify with CNS 06h only to I/O controllers
2023-03-23	Reinstate "GFS2: free disk inode which is deleted by remote node -V2"	Bob Peterson
	It turns out that reverting commit 970343cd4904 ("GFS2: free disk inode which is deleted by remote node -V2") causes a regression related to evicting inodes that were unlinked on a different cluster node. We could also have simply added a call to d_mark_dontcache() to function gfs2_try_evict(), but the original pre-revert code is better tested and proven. This reverts commit 445cb1277e10d7e19b631ef8a64aa3f055df377d. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-03-23	usb: dwc2: fix a race, don't power off/on phy for dual-role mode	Fabrice Gasnier
	When in dual role mode (dr_mode == USB_DR_MODE_OTG), platform probe successively basically calls: - dwc2_gadget_init() - dwc2_hcd_init() - dwc2_lowlevel_hw_disable() since recent change [1] - usb_add_gadget_udc() The PHYs (and so the clocks it may provide) shouldn't be disabled for all SoCs, in OTG mode, as the HCD part has been initialized. On STM32 this creates some weird race condition upon boot, when: - initially attached as a device, to a HOST - and there is a gadget script invoked to setup the device part. Below issue becomes systematic, as long as the gadget script isn't started by userland: the hardware PHYs (and so the clocks provided by the PHYs) remains disabled. It ends up in having an endless interrupt storm, before the watchdog resets the platform. [ 16.924163] dwc2 49000000.usb-otg: EPs: 9, dedicated fifos, 952 entries in SPRAM [ 16.962704] dwc2 49000000.usb-otg: DWC OTG Controller [ 16.966488] dwc2 49000000.usb-otg: new USB bus registered, assigned bus number 2 [ 16.974051] dwc2 49000000.usb-otg: irq 77, io mem 0x49000000 [ 17.032170] hub 2-0:1.0: USB hub found [ 17.042299] hub 2-0:1.0: 1 port detected [ 17.175408] dwc2 49000000.usb-otg: Mode Mismatch Interrupt: currently in Host mode [ 17.181741] dwc2 49000000.usb-otg: Mode Mismatch Interrupt: currently in Host mode [ 17.189303] dwc2 49000000.usb-otg: Mode Mismatch Interrupt: currently in Host mode ... The host part is also not functional, until the gadget part is configured. The HW may only be disabled for peripheral mode (original init), e.g. dr_mode == USB_DR_MODE_PERIPHERAL, until the gadget driver initializes. But when in USB_DR_MODE_OTG, the HW should remain enabled, as the HCD part is able to run, while the gadget part isn't necessarily configured. I don't fully get the of purpose the original change, that claims disabling the hardware is missing. It creates conditions on SOCs using the PHY initialization to be completely non working in OTG mode. Original change [1] should be reworked to be platform specific. [1] https://lore.kernel.org/r/20221206-dwc2-gadget-dual-role-v1-2-36515e1092cd@theobroma-systems.com Fixes: ade23d7b7ec5 ("usb: dwc2: power on/off phy for peripheral mode in dual-role mode") Cc: stable <stable@kernel.org> Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Reviewed-by: Quentin Schulz <quentin.schulz@theobroma-systems.com> Tested-by: Quentin Schulz <quentin.schulz@theobroma-systems.com> Link: https://lore.kernel.org/r/20230315144433.3095859-1-fabrice.gasnier@foss.st.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-23	usb: dwc2: fix a devres leak in hw_enable upon suspend resume	Fabrice Gasnier
	Each time the platform goes to low power, PM suspend / resume routines call: __dwc2_lowlevel_hw_enable -> devm_add_action_or_reset(). This adds a new devres each time. This may also happen at runtime, as dwc2_lowlevel_hw_enable() can be called from udc_start(). This can be seen with tracing: - echo 1 > /sys/kernel/debug/tracing/events/dev/devres_log/enable - go to low power - cat /sys/kernel/debug/tracing/trace A new "ADD" entry is found upon each low power cycle: ... devres_log: 49000000.usb-otg ADD 82a13bba devm_action_release (8 bytes) ... devres_log: 49000000.usb-otg ADD 49889daf devm_action_release (8 bytes) ... A second issue is addressed here: - regulator_bulk_enable() is called upon each PM cycle (suspend/resume). - regulator_bulk_disable() never gets called. So the reference count for these regulators constantly increase, by one upon each low power cycle, due to missing regulator_bulk_disable() call in __dwc2_lowlevel_hw_disable(). The original fix that introduced the devm_add_action_or_reset() call, fixed an issue during probe, that happens due to other errors in dwc2_driver_probe() -> dwc2_core_reset(). Then the probe fails without disabling regulators, when dr_mode == USB_DR_MODE_PERIPHERAL. Rather fix the error path: disable all the low level hardware in the error path, by using the "hsotg->ll_hw_enabled" flag. Checking dr_mode has been introduced to avoid a dual call to dwc2_lowlevel_hw_disable(). "ll_hw_enabled" should achieve the same (and is used currently in the remove() routine). Fixes: 54c196060510 ("usb: dwc2: Always disable regulators on driver teardown") Fixes: 33a06f1300a7 ("usb: dwc2: Fix error path in gadget registration") Cc: stable <stable@kernel.org> Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Link: https://lore.kernel.org/r/20230316084127.126084-1-fabrice.gasnier@foss.st.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-23	docs: describe how to quickly build a trimmed kernel	Thorsten Leemhuis
	Add a text explaining how to quickly build a kernel, as that's something users will often have to do when they want to report an issue or test proposed fixes. This is a huge and frightening task for quite a few users these days, as many rely on pre-compiled kernels and have never built their own. They find help on quite a few websites explaining the process in various ways, but those howtos often omit important details or make things too hard for the 'quickly build just for testing' case that 'localmodconfig' is really useful for. Hence give users something at hand to guide them, as that makes it easier for them to help with testing, debugging, and fixing the kernel. To keep the complexity at bay, the document explicitly focuses on how to compile the kernel on commodity distributions running on commodity hardware. People that deal with less common distributions or hardware will often know their way around already anyway. The text describes a few oddities of Arch and Debian that were found by the author and a few volunteers that tested the described procedure. There are likely more such quirks that need to be covered as well as a few things the author will have missed -- but one has to start somewhere. The document heavily uses anchors and links to them, which makes things slightly harder to read in the source form. But the intended target audience is way more likely to read rendered versions of this text on pages like docs.kernel.org anyway -- and there those anchors and links allow easy jumps to the reference section and back, which makes the document a lot easier to work with for the intended target audience. Aspects relevant for bisection were left out on purpose, as that is a related, but in the end different use case. The rough plan is to have a second document with a similar style to cover bisection. The idea is to reuse a few bits from this document and link quite often to entries in the reference section with the help of the anchors in this text. Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/1a788a8e7ba8a2063df08668f565efa832016032.1678021408.git.linux@leemhuis.info Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	Merge tag 'zonefs-6.3-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: - Silence a false positive smatch warning about an uninitialized variable - Fix an error message to provide more useful information about invalid zone append write results * tag 'zonefs-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Fix error message in zonefs_file_dio_append() zonefs: Prevent uninitialized symbol 'size' warning
2023-03-23	coding-style: fix title of Greg K-H's talk	Jakub Wilk
	The talk title was inadvertently mangled in 8c27ceff3604 ("docs: fix locations of several documents that got moved"). Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Link: https://lore.kernel.org/r/20230322215311.6579-1-jwilk@jwilk.net Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	Documentation/x86: Update split lock documentation	Tony Luck
	commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") added a delay and serialization of split locks. Commit 727209376f49 ("x86/split_lock: Add sysctl to control the misery mode") provided a sysctl to turn off the misery. Update the split lock documentation to describe the current state of the code. Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20230315225722.104607-1-tony.luck@intel.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	docs/zh_CN: fix a wrong format	Zang Leigang
	Add a missing markup for the code snippet at the end of lru_sort.rst Signed-off-by: Zang Leigang <zangleigang@hisilicon.com> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/20230316024519.27992-1-zangleigang@hisilicon.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	Documentation: kernel-parameters: sort all "no..." parameters	Randy Dunlap
	Sort all of the "no..." kernel parameters into the correct order as specified in kernel-parameters.rst: "into English Dictionary order (defined as ignoring all punctuation and sorting digits before letters in a case insensitive manner)". No other changes here, just movement of lines. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230317002635.16540-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	doc:it_IT: translation alignment	Federico Vaga
	Major update for maintainer-pgp-guide commit e4412739472b ("Documentation: raise minimum supported version of binutils to 2.25") commit 67fe6792a7fb ("Documentation: stable: Document alternative for referring upstream commit hash") commit 8763a30bc15b ("docs: deprecated.rst: Add note about DECLARE_FLEX_ARRAY() usage commit 2f993509a97e ("docs: process/5.Posting.rst: clarify use of Reported-by: tag") commit a31323bef2b6 ("timers: Update the documentation to reflect on the new timer_shutdown() API") Signed-off-by: Federico Vaga <federico.vaga@vaga.pv.it> Link: https://lore.kernel.org/r/20230319134624.21327-1-federico.vaga@vaga.pv.it Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	Documentation: maintainer-tip: Rectify link to "Describe your changes" ↵	Bagas Sanjaya
	section of submitting-patches.rst The general changelog rules for the tip tree refers to "Describe your changes" section of submitting patches guide. However, the internal link reference targets to non-existent "submittingpatches" label, which brings reader to the top of the linked doc. Correct the target. No changes to submitting-patches.rst since the required label is already there. Fixes: 31c9d7c8297558 ("Documentation/process: Add tip tree handbook") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230320124327.174881-1-bagasdotme@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23	usb: chipidea: core: fix possible concurrent when switch role	Xu Yang
	The user may call role_store() when driver is handling ci_handle_id_switch() which is triggerred by otg event or power lost event. Unfortunately, the controller may go into chaos in this case. Fix this by protecting it with mutex lock. Fixes: a932a8041ff9 ("usb: chipidea: core: add sysfs group") cc: <stable@vger.kernel.org> Acked-by: Peter Chen <peter.chen@kernel.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20230317061516.2451728-2-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-23	usb: chipdea: core: fix return -EINVAL if request role is the same with ↵	Xu Yang
	current role It should not return -EINVAL if the request role is the same with current role, return non-error and without do anything instead. Fixes: a932a8041ff9 ("usb: chipidea: core: add sysfs group") cc: <stable@vger.kernel.org> Acked-by: Peter Chen <peter.chen@kernel.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20230317061516.2451728-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-23	cifs: print session id while listing open files	Shyam Prasad N
	In the output of /proc/fs/cifs/open_files, we only print the tree id for the tcon of each open file. It becomes difficult to know which tcon these files belong to with just the tree id. This change dumps ses id in addition to all other data today. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23	cifs: dump pending mids for all channels in DebugData	Shyam Prasad N
	Currently, we only dump the pending mid information only on the primary channel in /proc/fs/cifs/DebugData. If multichannel is active, we do not print the pending MID list on secondary channels. This change will dump the pending mids for all the channels based on server->conn_id. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23	cifs: empty interface list when server doesn't support query interfaces	Shyam Prasad N
	When querying server interfaces returns -EOPNOTSUPP, clear the list of interfaces. Assumption is that multichannel would be disabled too. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23	cifs: do not poll server interfaces too regularly	Shyam Prasad N
	We have the server interface list hanging off the tcon structure today for reasons unknown. So each tcon which is connected to a file server can query them separately, which is really unnecessary. To avoid this, in the query function, we will check the time of last update of the interface list, and avoid querying the server if it is within a certain range. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23	arm64: dts: qcom: qrb5165-rb5: Use proper WSA881x shutdown GPIO polarity	Krzysztof Kozlowski
	The WSA881x shutdown GPIO is active low (SD_N), but Linux driver assumed DTS always comes with active high. Since Linux drivers were updated to handle proper flag, correct the DTS. The change is not backwards compatible with older Linux kernel. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230322193051.826167-5-krzysztof.kozlowski@linaro.org
2023-03-23	arm64: dts: qcom: sm8250-mtp: Use proper WSA881x shutdown GPIO polarity	Krzysztof Kozlowski
	The WSA881x shutdown GPIO is active low (SD_N), but Linux driver assumed DTS always comes with active high. Since Linux drivers were updated to handle proper flag, correct the DTS. The change is not backwards compatible with older Linux kernel. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230322193051.826167-4-krzysztof.kozlowski@linaro.org
2023-03-23	arm64: dts: qcom: sdm850-samsung-w737: Use proper WSA881x shutdown GPIO polarity	Krzysztof Kozlowski
	The WSA881x shutdown GPIO is active low (SD_N), but Linux driver assumed DTS always comes with active high. Since Linux drivers were updated to handle proper flag, correct the DTS. The change is not backwards compatible with older Linux kernel. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230322193051.826167-3-krzysztof.kozlowski@linaro.org
2023-03-23	arm64: dts: qcom: sdm850-lenovo-yoga-c630: Use proper WSA881x shutdown GPIO ↵	Krzysztof Kozlowski
	polarity The WSA881x shutdown GPIO is active low (SD_N), but Linux driver assumed DTS always comes with active high. Since Linux drivers were updated to handle proper flag, correct the DTS. The change is not backwards compatible with older Linux kernel. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230322193051.826167-2-krzysztof.kozlowski@linaro.org
2023-03-23	efi/libstub: randomalloc: Return EFI_OUT_OF_RESOURCES on failure	Ard Biesheuvel
	The logic in efi_random_alloc() will iterate over the memory map twice, once to count the number of candidate slots, and another time to locate the chosen slot after randomization. If there is insufficient memory to do the allocation, the second loop will run to completion without actually having located a slot, but we currently return EFI_SUCCESS in this case, as we fail to initialize status to the appropriate error value of EFI_OUT_OF_RESOURCES. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-03-23	ASoC: codecs: lpass: fix the order or clks turn off during suspend	Srinivas Kandagatla
	The order in which clocks are stopped matters as some of the clock like NPL are derived from MCLK. Without this patch, Dragonboard RB5 DSP would crash with below error: qcom_q6v5_pas 17300000.remoteproc: fatal error received: ABT_dal.c:278:ABTimeout: AHB Bus hang is detected, Number of bus hang detected := 2 , addr0 = 0x3370000 , addr1 = 0x0!!! Turn off fsgen first, followed by npl and then finally mclk, which is exactly the opposite order of enable sequence. Fixes: 1dc3459009c3 ("ASoC: codecs: lpass: register mclk after runtime pm") Reported-by: Amit Pundir <amit.pundir@linaro.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Amit Pundir <amit.pundir@linaro.org> Link: https://lore.kernel.org/r/20230323110125.23790-1-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2023-03-23	pwm: Zero-initialize the pwm_state passed to driver's .get_state()	Uwe Kleine-König
	This is just to ensure that .usage_power is properly initialized and doesn't contain random stack data. The other members of struct pwm_state should get a value assigned in a successful call to .get_state(). So in the absence of bugs in driver implementations, this is only a safe-guard and no fix. Reported-by: Munehisa Kamata <kamatam@amazon.com> Link: https://lore.kernel.org/r/20230310214004.2619480-1-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2023-03-23	pwm: meson: Explicitly set .polarity in .get_state()	Uwe Kleine-König
	The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly. This fixes a regression that was possible since commit c73a3107624d ("pwm: Handle .get_state() failures") which stopped to zero-initialize the state passed to the .get_state() callback. This was reported at https://forum.odroid.com/viewtopic.php?f=177&t=46360 . While this was an unintended side effect, the real issue is the driver's callback not setting the polarity. There is a complicating fact, that the .apply() callback fakes support for inversed polarity. This is not (and cannot) be matched by .get_state(). As fixing this isn't easy, only point it out in a comment to prevent authors of other drivers from copying that approach. Fixes: c375bcbaabdb ("pwm: meson: Read the full hardware state in meson_pwm_get_state()") Reported-by: Munehisa Kamata <kamatam@amazon.com> Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/r/20230310191405.2606296-1-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2023-03-23	pwm: sprd: Explicitly set .polarity in .get_state()	Uwe Kleine-König
	The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly. Fixes: 8aae4b02e8a6 ("pwm: sprd: Add Spreadtrum PWM support") Link: https://lore.kernel.org/r/20230228135508.1798428-5-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2023-03-23	pwm: iqs620a: Explicitly set .polarity in .get_state()	Uwe Kleine-König
	The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly. Fixes: 6f0841a8197b ("pwm: Add support for Azoteq IQS620A PWM generator") Reviewed-by: Jeff LaBundy <jeff@labundy.com> Link: https://lore.kernel.org/r/20230228135508.1798428-4-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2023-03-23	pwm: cros-ec: Explicitly set .polarity in .get_state()	Uwe Kleine-König
	The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly. Reviewed-by: Guenter Roeck <groeck@chromium.org> Fixes: 1f0d3bb02785 ("pwm: Add ChromeOS EC PWM driver") Link: https://lore.kernel.org/r/20230228135508.1798428-3-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2023-03-23	pwm: hibvt: Explicitly set .polarity in .get_state()	Uwe Kleine-König
	The driver only both polarities. Complete the implementation of .get_state() by setting .polarity according to the configured hardware state. Fixes: d09f00810850 ("pwm: Add PWM driver for HiSilicon BVT SOCs") Link: https://lore.kernel.org/r/20230228135508.1798428-2-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2023-03-23	drm/amd/display: Set dcn32 caps.seamless_odm	Hersen Wu
	[Why & How] seamless_odm set was not picked up while merging commit 2d017189e2b3 ("drm/amd/display: Blank eDP on enable drv if odm enabled") Fixes: 2d017189e2b3 ("drm/amd/display: Blank eDP on enable drv if odm enabled") Reviewed-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-23	drm/amd/display: fix wrong index used in dccg32_set_dpstreamclk	Hersen Wu
	[Why & How] When merging commit 9af611f29034 ("drm/amd/display: Fix DCN32 DPSTREAMCLK_CNTL programming"), index change was not picked up. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Fixes: 9af611f29034 ("drm/amd/display: Fix DCN32 DPSTREAMCLK_CNTL programming") Reviewed-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-23	drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi	Kai-Heng Feng
	S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default"). The root cause is still not clear for now. So extend and apply the ASPM quirk from commit e02fe3bc7aba ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to workaround the issue on Navi cards too. Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-03-23	drm/amd/display: remove outdated 8bpc comments	Alex Hung
	[Why] The commit c76e483cd916 ("drm/amd/display: Don't restrict bpc to 8 bpc") removes the historical 8bpc dependency and sets max_bpc to 16. [How] The comment that states "8bpc for non-edp" needs to be removed as well. Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-23	drm/amdgpu/gfx: set cg flags to enter/exit safe mode	Jane Jian
	sriov needs to enter/exit safe mode in update umd p state add the cg flag to let it enter or exit while needed Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-23	drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs	YuBiao Wang
	[Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. The fences in this case are never signaled and next time when we try to free the sa_bo, kernel will hang. [How] During pre_asic_reset, driver will clear job fences and afterwards the fences' refcount will be reduced to 1. For drm_sched_jobs it will be released in job_free_cb, and for non-sched jobs like ib_test, it's meant to be released in sa_bo_free but only when the fences are signaled. So we have to force signal the non_sched bad job's fence during pre_asic_reset or the clear is not complete. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-23	drm/amdgpu: add mes resume when do gfx post soft reset	Tong Liu01
	[why] when gfx do soft reset, mes will also do reset, if mes is not resumed when do recover from soft reset, mes is unable to respond in later sequence [how] resume mes when do gfx post soft reset Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-23	drm/amdgpu: skip ASIC reset for APUs when go to S4	Tim Huang
	For GC IP v11.0.4/11, PSP TMR need to be reserved for ASIC mode2 reset. But for S4, when psp suspend, it will destroy the TMR that fails the ASIC reset. [ 96.006101] amdgpu 0000:62:00.0: amdgpu: MODE2 reset [ 100.409717] amdgpu 0000:62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000011 SMN_C2PMSG_82:0x00000002 [ 100.411593] amdgpu 0000:62:00.0: amdgpu: Mode2 reset failed! [ 100.412470] amdgpu 0000:62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62 [ 100.414020] amdgpu 0000:62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62 [ 100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs [ 100.416608] amdgpu 0000:62:00.0: PM: failed to freeze async: error -62 We can skip the reset on APUs, assuming we can resume them properly. Verified on some GFX11, GFX10 and old GFX9 APUs. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-03-23	drm/amdgpu: reposition the gpu reset checking for reuse	Tim Huang
	Move the amdgpu_acpi_should_gpu_reset out of CONFIG_SUSPEND to share it with hibernate case. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-03-23	vhost: use vhost_tasks for worker threads	Mike Christie
	For vhost workers we use the kthread API which inherit's its values from and checks against the kthreadd thread. This results in the wrong RLIMITs being checked, so while tools like libvirt try to control the number of threads based on the nproc rlimit setting we can end up creating more threads than the user wanted. This patch has us use the vhost_task helpers which will inherit its values/checks from the thread that owns the device similar to if we did a clone in userspace. The vhost threads will now be counted in the nproc rlimits. And we get features like cgroups and mm sharing automatically, so we can remove those calls. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>