summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-13x86/xen: don't let xen_pv_play_dead() returnJuergen Gross
A function called via the paravirt play_dead() hook should not return to the caller. xen_pv_play_dead() has a problem in this regard, as it currently will return in case an offlined cpu is brought to life again. This can be changed only by doing basically a longjmp() to cpu_bringup_and_idle(), as the hypercall for bringing down the cpu will just return when the cpu is coming up again. Just re-initializing the cpu isn't possible, as the Xen hypervisor will deny that operation. So introduce xen_cpu_bringup_again() resetting the stack and calling cpu_bringup_and_idle(), which can be called after HYPERVISOR_vcpu_op() in xen_pv_play_dead(). Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221125063248.30256-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13drivers/xen/hypervisor: Expose Xen SIF flags to userspacePer Bilse
/proc/xen is a legacy pseudo filesystem which predates Xen support getting merged into Linux. It has largely been replaced with more normal locations for data (/sys/hypervisor/ for info, /dev/xen/ for user devices). We want to compile xenfs support out of the dom0 kernel. There is one item which only exists in /proc/xen, namely /proc/xen/capabilities with "control_d" being the signal of "you're in the control domain". This ultimately comes from the SIF flags provided at VM start. This patch exposes all SIF flags in /sys/hypervisor/start_flags/ as boolean files, one for each bit, returning '1' if set, '0' otherwise. Two known flags, 'privileged' and 'initdomain', are explicitly named, and all remaining flags can be accessed via generically named files, as suggested by Andrew Cooper. Signed-off-by: Per Bilse <per.bilse@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230103130213.2129753-1-per.bilse@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-12tracing: Make trace_define_field_ext() staticSteven Rostedt (Google)
trace_define_field_ext() is not used outside of trace_events.c, it should be static. Link: https://lore.kernel.org/oe-kbuild-all/202302130750.679RaRog-lkp@intel.com/ Fixes: b6c7abd1c28a ("tracing: Fix TASK_COMM_LEN in trace event format file") Reported-by: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-12Linux 6.2-rc8v6.2-rc8Linus Torvalds
2023-02-12MAINTAINERS: Add myself as maintainer for arch/sh (SUPERH)John Paul Adrian Glaubitz
Both Rich Felker and Yoshinori Sato haven't done any work on arch/sh for a while. As I have been maintaining Debian's sh4 port since 2014, I am interested to keep the architecture alive. Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12Merge tag 'trace-v6.2-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Fix showing of TASK_COMM_LEN instead of its value The TASK_COMM_LEN was converted from a macro into an enum so that BTF would have access to it. But this unfortunately caused TASK_COMM_LEN to display in the format fields of trace events, as they are created by the TRACE_EVENT() macro and such, macros convert to their values, where as enums do not. To handle this, instead of using the field itself to be display, save the value of the array size as another field in the trace_event_fields structure, and use that instead. Not only does this fix the issue, but also converts the other trace events that have this same problem (but were not breaking tooling). With this change, the original work around b3bc8547d3be6 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well") could be reverted (but that should be done in the merge window)" * tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix TASK_COMM_LEN in trace event format file
2023-02-12Merge tag 'for-6.2-rc7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - one more fix for a tree-log 'write time corruption' report, update the last dir index directly and don't keep in the log context - do VFS-level inode lock around FIEMAP to prevent a deadlock with concurrent fsync, the extent-level lock is not sufficient - don't cache a single-device filesystem device to avoid cases when a loop device is reformatted and the entry gets stale * tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: free device in btrfs_close_devices for a single device filesystem btrfs: lock the inode in shared mode before starting fiemap btrfs: simplify update of last_dir_index_offset when logging a directory
2023-02-12Merge tag 'usb-6.2-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are 2 small USB driver fixes that resolve some reported regressions and one new device quirk. Specifically these are: - new quirk for Alcor Link AK9563 smartcard reader - revert of u_ether gadget change in 6.2-rc1 that caused problems - typec pin probe fix All of these have been in linux-next with no reported problems" * tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: core: add quirk for Alcor Link AK9563 smartcard reader usb: typec: altmodes/displayport: Fix probe pin assign check Revert "usb: gadget: u_ether: Do not make UDC parent of the net device"
2023-02-12Merge tag 'efi-fixes-for-v6.2-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "A fix from Darren to widen the SMBIOS match for detecting Ampere Altra machines with problematic firmware. In the mean time, we are working on a more precise check, but this is still work in progress" * tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: arm64: efi: Force the use of SetVirtualAddressMap() on eMAG and Altra Max machines
2023-02-12Merge tag 'powerpc-6.2-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix interrupt exit race with security mitigation switching. - Don't select ARCH_WANTS_NO_INSTR until warnings are fixed. - Build fix for CONFIG_NUMA=n. Thanks to Nicholas Piggin, Randy Dunlap, and Sachin Sant. * tag 'powerpc-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch powerpc/kexec_file: fix implicit decl error powerpc: Don't select ARCH_WANTS_NO_INSTR
2023-02-12Fix page corruption caused by racy check in __free_pagesDavid Chen
When we upgraded our kernel, we started seeing some page corruption like the following consistently: BUG: Bad page state in process ganesha.nfsd pfn:1304ca page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca flags: 0x17ffffc0000000() raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000 raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 Call Trace: dump_stack+0x74/0x96 bad_page.cold+0x63/0x94 check_new_page_bad+0x6d/0x80 rmqueue+0x46e/0x970 get_page_from_freelist+0xcb/0x3f0 ? _cond_resched+0x19/0x40 __alloc_pages_nodemask+0x164/0x300 alloc_pages_current+0x87/0xf0 skb_page_frag_refill+0x84/0x110 ... Sometimes, it would also show up as corruption in the free list pointer and cause crashes. After bisecting the issue, we found the issue started from commit e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages"): if (put_page_testzero(page)) free_the_page(page, order); else if (!PageHead(page)) while (order-- > 0) free_the_page(page + (1 << order), order); So the problem is the check PageHead is racy because at this point we already dropped our reference to the page. So even if we came in with compound page, the page can already be freed and PageHead can return false and we will end up freeing all the tail pages causing double free. Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages") Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12tracing: Fix TASK_COMM_LEN in trace event format fileYafang Shao
After commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"), the content of the format file under /sys/kernel/tracing/events/task/task_newtask was changed from field:char comm[16]; offset:12; size:16; signed:0; to field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0; John reported that this change breaks older versions of perfetto. Then Mathieu pointed out that this behavioral change was caused by the use of __stringify(_len), which happens to work on macros, but not on enum labels. And he also gave the suggestion on how to fix it: :One possible solution to make this more robust would be to extend :struct trace_event_fields with one more field that indicates the length :of an array as an actual integer, without storing it in its stringified :form in the type, and do the formatting in f_show where it belongs. The result as follows after this change, $ cat /sys/kernel/tracing/events/task/task_newtask/format field:char comm[16]; offset:12; size:16; signed:0; Link: https://lore.kernel.org/lkml/Y+QaZtz55LIirsUO@google.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230210155921.4610-1-laoar.shao@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230212151303.12353-1-laoar.shao@gmail.com Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Kajetan Puchalski <kajetan.puchalski@arm.com> CC: Qais Yousef <qyousef@layalina.io> Fixes: 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") Reported-by: John Stultz <jstultz@google.com> Debugged-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-11Merge tag 'spi-fix-v6.2-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of hopefully final fixes for spi: one driver specific fix for an issue with very large transfers and a fix for an issue with the locking fixes in spidev merged earlier this release cycle which was missed" * tag 'spi-fix-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spidev: fix a recursive locking error spi: dw: Fix wrong FIFO level setting for long xfers
2023-02-11mmc: omap: drop TPS65010 dependencyArnd Bergmann
The original TPS65010 dependency was only needed for MACH_OMAP_H2, which is now gone, but I messed up the conversion when I removed that symbol. Now the missing TPS65010 causes a boot failure on other machines such as the SX1. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 0d7bb85e9413 ("ARM: omap1: remove unused board files") Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-11Merge tag 'x86-urgent-2023-02-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix a kprobes bug, plus add a new Intel model number to the upstream <asm/intel-family.h> header for drivers to use" * tag 'x86-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Lunar Lake M x86/kprobes: Fix 1 byte conditional jump target
2023-02-11Merge tag 'locking-urgent-2023-02-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix an rtmutex missed-wakeup bug" * tag 'locking-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rtmutex: Ensure that the top waiter is always woken up
2023-02-11Merge tag 'cxl-fixes-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "Two fixups for CXL (Compute Express Link) in presence of passthrough decoders. This primarily helps developers using the QEMU CXL emulation, but with the impending arrival of CXL switches these types of topologies will be of interest to end users. - Fix a crash when shutting down regions in the presence of passthrough decoders - Fix region creation to understand passthrough decoders instead of the narrower definition of passthrough ports" * tag 'cxl-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Fix passthrough-decoder detection cxl/region: Fix null pointer dereference for resetting decoder
2023-02-11Merge tag 'libnvdimm-fixes-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A fix for an issue that could causes users to inadvertantly reserve too much capacity when debugging the KMSAN and persistent memory namespace, a lockdep fix, and a kernel-doc build warning: - Resolve the conflict between KMSAN and NVDIMM with respect to reserving pmem namespace / volume capacity for larger sizeof(struct page) - Fix a lockdep warning in the the NFIT code - Fix a kernel-doc build warning" * tag 'libnvdimm-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE ACPI: NFIT: fix a potential deadlock during NFIT teardown dax: super.c: fix kernel-doc bad line warning
2023-02-11Merge tag 'fixes-2023-02-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock revert from Mike Rapoport: "Revert 'mm: Always release pages to the buddy allocator in memblock_free_late()' The pages being freed by memblock_free_late() have already been initialized, but if they are in the deferred init range, __free_one_page() might access nearby uninitialized pages when trying to coalesce buddies, which will cause a crash. A proper fix will be more involved so revert this change for the time being" * tag 'fixes-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
2023-02-11nfsd: don't destroy global nfs4_file table in per-net shutdownJeff Layton
The nfs4_file table is global, so shutting it down when a containerized nfsd is shut down is wrong and can lead to double-frees. Tear down the nfs4_file_rhltable in nfs4_state_shutdown instead of nfs4_state_shutdown_net. Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017 Reported-by: JianHong Yin <jiyin@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-11perf/x86/intel/uncore: Add Meteor Lake supportKan Liang
The uncore subsystem for Meteor Lake is similar to the previous Alder Lake. The main difference is that MTL provides PMU support for different tiles, while ADL only provides PMU support for the whole package. On ADL, there are CBOX, ARB, and clockbox uncore PMON units. On MTL, they are split into CBOX/HAC_CBOX, ARB/HAC_ARB, and cncu/sncu which provides a fixed counter for clockticks. Also, new MSR addresses are introduced on MTL. The IMC uncore PMON is the same as Alder Lake. Add new PCIIDs of IMC for Meteor Lake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230210190238.1726237-1-kan.liang@linux.intel.com
2023-02-11x86/entry: Fix unwinding from kprobe on PUSH/POP instructionJosh Poimboeuf
If a kprobe (INT3) is set on a stack-modifying single-byte instruction, like a single-byte PUSH/POP or a LEAVE, ORC fails to unwind past it: Call Trace: <TASK> dump_stack_lvl+0x57/0x90 handler_pre+0x33/0x40 [kprobe_example] aggr_pre_handler+0x49/0x90 kprobe_int3_handler+0xe3/0x180 do_int3+0x3a/0x80 exc_int3+0x7d/0xc0 asm_exc_int3+0x35/0x40 RIP: 0010:kernel_clone+0xe/0x3a0 Code: cc e8 16 b2 bf 00 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 cc <53> 48 89 fb 48 83 ec 68 4c 8b 27 65 48 8b 04 25 28 00 00 00 48 89 RSP: 0018:ffffc9000074fda0 EFLAGS: 00000206 RAX: 0000000000808100 RBX: ffff888109de9d80 RCX: 0000000000000000 RDX: 0000000000000011 RSI: ffff888109de9d80 RDI: ffffc9000074fdc8 RBP: ffff8881019543c0 R08: ffffffff81127e30 R09: 00000000e71742a5 R10: ffff888104764a18 R11: 0000000071742a5e R12: ffff888100078800 R13: ffff888100126000 R14: 0000000000000000 R15: ffff888100126005 ? __pfx_call_usermodehelper_exec_async+0x10/0x10 ? kernel_clone+0xe/0x3a0 ? user_mode_thread+0x5b/0x80 ? __pfx_call_usermodehelper_exec_async+0x10/0x10 ? call_usermodehelper_exec_work+0x77/0xb0 ? process_one_work+0x299/0x5f0 ? worker_thread+0x4f/0x3a0 ? __pfx_worker_thread+0x10/0x10 ? kthread+0xf2/0x120 ? __pfx_kthread+0x10/0x10 ? ret_from_fork+0x29/0x50 </TASK> The problem is that #BP saves the pointer to the instruction immediately *after* the INT3, rather than to the INT3 itself. The instruction replaced by the INT3 hasn't actually run, but ORC assumes otherwise and expects the wrong stack layout. Fix it by annotating the #BP exception as a non-signal stack frame, which tells the ORC unwinder to decrement the instruction pointer before looking up the corresponding ORC entry. Reported-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/baafcd3cc1abb14cb757fe081fa696012a5265ee.1676068346.git.jpoimboe@kernel.org
2023-02-11x86/unwind/orc: Add 'signal' field to ORC metadataJosh Poimboeuf
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and exceptions) rather than decremented (like for call stack return addresses) when used to find the next ORC entry. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org
2023-02-11x86/perf/zhaoxin: Add stepping check for ZXCsilviazhao
Some of Nano series processors will lead GP when accessing PMC fixed counter. Meanwhile, their hardware support for PMC has not announced externally. So exclude Nano CPUs from ZXC by checking stepping information. This is an unambiguous way to differentiate between ZXC and Nano CPUs. Following are Nano and ZXC FMS information: Nano FMS: Family=6, Model=F, Stepping=[0-A][C-D] ZXC FMS: Family=6, Model=F, Stepping=E-F OR Family=6, Model=0x19, Stepping=0-3 Fixes: 3a4ac121c2ca ("x86/perf: Add hardware performance events support for Zhaoxin CPU.") Reported-by: Arjan <8vvbbqzo567a@nospam.xutrox.com> Reported-by: Kevin Brace <kevinbrace@gmx.com> Signed-off-by: silviazhao <silviazhao-oc@zhaoxin.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
2023-02-11perf/x86/intel/ds: Fix the conversion from TSC to perf timeKan Liang
The time order is incorrect when the TSC in a PEBS record is used. $perf record -e cycles:upp dd if=/dev/zero of=/dev/null count=10000 $ perf script --show-task-events perf-exec 0 0.000000: PERF_RECORD_COMM: perf-exec:915/915 dd 915 106.479872: PERF_RECORD_COMM exec: dd:915/915 dd 915 106.483270: PERF_RECORD_EXIT(915:915):(914:914) dd 915 106.512429: 1 cycles:upp: ffffffff96c011b7 [unknown] ([unknown]) ... ... The perf time is from sched_clock_cpu(). The current PEBS code unconditionally convert the TSC to native_sched_clock(). There is a shift between the two clocks. If the TSC is stable, the shift is consistent, __sched_clock_offset. If the TSC is unstable, the shift has to be calculated at runtime. This patch doesn't support the conversion when the TSC is unstable. The TSC unstable case is a corner case and very unlikely to happen. If it happens, the TSC in a PEBS record will be dropped and fall back to perf_event_clock(). Fixes: 47a3aeb39e8d ("perf/x86/intel/pebs: Fix PEBS timestamps overwritten") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/CAM9d7cgWDVAq8-11RbJ2uGfwkKD6fA-OMwOKDrNUrU_=8MgEjg@mail.gmail.com/
2023-02-11sched/rt: pick_next_rt_entity(): check list_entryPietro Borrello
Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") removed any path which could make pick_next_rt_entity() return NULL. However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of pick_next_rt_entity()) still checks the error condition, which can never happen, since list_entry() never returns NULL. Remove the BUG_ON check, and instead emit a warning in the only possible error condition here: the queue being empty which should never happen. Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it
2023-02-11sched/deadline: Add more reschedule cases to prio_changed_dl()Valentin Schneider
I've been tracking down an issue on a ~5.17ish kernel where: CPUx CPUy <DL task p0 owns an rtmutex M> <p0 depletes its runtime, gets throttled> <rq switches to the idle task> <DL task p1 blocks on M, boost/replenish p0> <No call to resched_curr() happens here> [idle task keeps running here until *something* accidentally sets TIF_NEED_RESCHED] On that kernel, it is quite easy to trigger using rt-tests's deadline_test [1] with the test running on isolated CPUs (this reduces the chance of something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the issue even more obvious as the hung task detector chimes in). I haven't been able to reproduce this using a mainline kernel, even if I revert 2972e3050e35 ("tracing: Make trace_marker{,_raw} stream-like") which gets rid of the lock involved in the above test, *but* I cannot convince myself the issue isn't there from looking at the code. Make prio_changed_dl() issue a reschedule if the current task isn't a deadline one. While at it, ensure a reschedule is emitted when a queued-but-not-current task gets boosted with an earlier deadline that current's. [1]: https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230206140612.701871-1-vschneid@redhat.com
2023-02-11sched/fair: sanitize vruntime of entity being placedZhang Qiao
When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards. However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can inverse the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu. To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic sheduler time scale. [rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Co-developed-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de
2023-02-11sched/fair: Remove capacity inversion detectionVincent Guittot
Remove the capacity inversion detection which is now handled by util_fits_cpu() returning -1 when we need to continue to look for a potential CPU with better performance. This ends up almost reverting patches below except for some comments: commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection") commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()") commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230201143628.270912-3-vincent.guittot@linaro.org
2023-02-11sched/fair: unlink misfit task from cpu overutilizedVincent Guittot
By taking into account uclamp_min, the 1:1 relation between task misfit and cpu overutilized is no more true as a task with a small util_avg may not fit a high capacity cpu because of uclamp_min constraint. Add a new state in util_fits_cpu() to reflect the case that task would fit a CPU except for the uclamp_min hint which is a performance requirement. Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we can use this new value to take additional action to select the best CPU that doesn't match uclamp_min hint. When util_fits_cpu() returns -1, we will continue to look for a possible CPU with better performance, which replaces Capacity Inversion detection with capacity_orig_of() - thermal_load_avg to detect a capacity inversion. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-and-tested-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Link: https://lore.kernel.org/r/20230201143628.270912-2-vincent.guittot@linaro.org
2023-02-11objtool: mem*() are not uaccess safePeter Zijlstra
For mysterious raisins I listed the new __asan_mem*() functions as being uaccess safe, this is giving objtool fails on KASAN builds because these functions call out to the actual __mem*() functions which are not marked uaccess safe. Removing it doesn't make the robots unhappy. Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Bisected-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230126182302.GA687063@paulmck-ThinkPad-P17-Gen-1
2023-02-11x86/tsc: Do feature check as the very first thingBorislav Petkov (AMD)
Do the feature check as the very first thing in the function. Everything else comes after that and is meaningless work if the TSC CPUID bit is not even set. Switch to cpu_feature_enabled() too, while at it. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
2023-02-11x86/tsc: Make recalibrate_cpu_khz() export GPL onlyBorislav Petkov (AMD)
A quick search doesn't reveal any use outside of the kernel - which would be questionable to begin with anyway - so make the export GPL only. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
2023-02-11x86/cacheinfo: Remove unused trace variableBorislav Petkov (AMD)
15cd8812ab2c ("x86: Remove the CPU cache size printk's") removed the last use of the trace local var. Remove it too and the useless trace cache case. No functional changes. Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230210234541.9694-1-bp@alien8.de Link: http://lore.kernel.org/r/20220705073349.1512-1-jiapeng.chong@linux.alibaba.com
2023-02-11ALSA: hda: Fix codec device field initializanCezary Rojewski
Commit f2bd1c5ae2cb ("ALSA: hda: Fix page fault in snd_hda_codec_shutdown()") relocated initialization of several codec device fields. Due to differences between codec_exec_verb() and snd_hdac_bus_exec_bus() in how they handle VERB execution - the latter does not touch PM - assigning ->exec_verb to codec_exec_verb() causes PM to be engaged before it is configured for the device. Configuration of PM for the ASoC HDAudio sound card is done with snd_hda_set_power_save() during skl_hda_audio_probe() whereas the assignment happens early, in snd_hda_codec_device_init(). Revert to previous behavior to avoid problems caused by too early PM manipulation. Suggested-by: Jason Montleon <jmontleo@redhat.com> Link: https://lore.kernel.org/regressions/CALFERdzKUodLsm6=Ub3g2+PxpNpPtPq3bGBLbff=eZr9_S=YVA@mail.gmail.com Fixes: f2bd1c5ae2cb ("ALSA: hda: Fix page fault in snd_hda_codec_shutdown()") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20230210165541.3543604-1-cezary.rojewski@intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-10Merge branch 'sk-sk_forward_alloc-fixes'Jakub Kicinski
Kuniyuki Iwashima says: ==================== sk->sk_forward_alloc fixes. The first patch fixes a negative sk_forward_alloc by adding sk_rmem_schedule() before skb_set_owner_r(), and second patch removes an unnecessary WARN_ON_ONCE(). v2: https://lore.kernel.org/netdev/20230209013329.87879-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20230207183718.54520-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20230210002202.81442-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues().Kuniyuki Iwashima
Christoph Paasch reported that commit b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues(). [0 - 2] Also, we can reproduce it by a program in [3]. In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy() to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in inet_csk_destroy_sock(). The same check has been in inet_sock_destruct() from at least v2.6, we can just remove the WARN_ON_ONCE(). However, among the users of sk_stream_kill_queues(), only CAIF is not calling inet_sock_destruct(). Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor(). [0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.com/ [1]: https://github.com/multipath-tcp/mptcp_net-next/issues/341 [2]: WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0 Modules linked in: CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9 RSP: 0018:ffff88810570fc68 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005 RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488 R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458 FS: 00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0 Call Trace: <TASK> inet_csk_destroy_sock+0x1a1/0x320 __tcp_close+0xab6/0xe90 tcp_close+0x30/0xc0 inet_release+0xe9/0x1f0 inet6_release+0x4c/0x70 __sock_release+0xd2/0x280 sock_close+0x15/0x20 __fput+0x252/0xa20 task_work_run+0x169/0x250 exit_to_user_mode_prepare+0x113/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f7fdf7ae28d Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000 R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8 </TASK> [3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/ Fixes: b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Christoph Paasch <christophpaasch@icloud.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.Kuniyuki Iwashima
Eric Dumazet pointed out [0] that when we call skb_set_owner_r() for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called, resulting in a negative sk_forward_alloc. We add a new helper which clones a skb and sets its owner only when sk_rmem_schedule() succeeds. Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv() because tcp_send_synack() can make sk_forward_alloc negative before ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases. [0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/ Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()") Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10i40e: Add checking for null for nlmsg_find_attr()Natalia Petrova
The result of nlmsg_find_attr() 'br_spec' is dereferenced in nla_for_each_nested(), but it can take NULL value in nla_find() function, which will result in an error. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Signed-off-by: Natalia Petrova <n.petrova@fintech.ru> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230209172833.3596034-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10ice: xsk: Fix cleaning of XDP_TX framesLarysa Zaremba
Incrementation of xsk_frames inside the for-loop produces infinite loop, if we have both normal AF_XDP-TX and XDP_TXed buffers to complete. Split xsk_frames into 2 variables (xsk_frames and completed_frames) to eliminate this bug. Fixes: 29322791bc8b ("ice: xsk: change batched Tx descriptor cleaning") Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230209160130.1779890-1-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net/sched: tcindex: update imperfect hash filters respecting rcuPedro Tammela
The imperfect hash area can be updated while packets are traversing, which will cause a use-after-free when 'tcf_exts_exec()' is called with the destroyed tcf_ext. CPU 0: CPU 1: tcindex_set_parms tcindex_classify tcindex_lookup tcindex_lookup tcf_exts_change tcf_exts_exec [UAF] Stop operating on the shared area directly, by using a local copy, and update the filter with 'rcu_replace_pointer()'. Delete the old filter version only after a rcu grace period elapsed. Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change") Reported-by: valis <sec@valis.email> Suggested-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10sctp: sctp_sock_filter(): avoid list_entry() on possibly empty listPietro Borrello
Use list_is_first() to check whether tsp->asoc matches the first element of ep->asocs, as the list is not guaranteed to have an entry. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniroma1.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: ethernet: ti: am65-cpsw: Add RX DMA Channel Teardown QuirkSiddharth Vadapalli
In TI's AM62x/AM64x SoCs, successful teardown of RX DMA Channel raises an interrupt. The process of servicing this interrupt involves flushing all pending RX DMA descriptors and clearing the teardown completion marker (TDCM). The am65_cpsw_nuss_rx_packets() function invoked from the RX NAPI callback services the interrupt. Thus, it is necessary to wait for this handler to run, drain all packets and clear TDCM, before calling napi_disable() in am65_cpsw_nuss_common_stop() function post channel teardown. If napi_disable() executes before ensuring that TDCM is cleared, the TDCM remains set when the interfaces are down, resulting in an interrupt storm when the interfaces are brought up again. Since the interrupt raised to indicate the RX DMA Channel teardown is specific to the AM62x and AM64x SoCs, add a quirk for it. Fixes: 4f7cce272403 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g") Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20230209084432.189222-1-s-vadapalli@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Two clk driver fixes - Use devm_kasprintf() to avoid overflows when forming clk names in the Microchip PolarFire driver - Fix the pretty broken Ingenic JZ4760 M/N/OD calculation to actually work and find proper divisors" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: ingenic: jz4760: Update M/N/OD calculation algorithm clk: microchip: mpfs-ccc: Use devm_kasprintf() for allocating formatted strings
2023-02-10Merge tag 'pinctrl-v6.2-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some assorted pin control fixes, the most interesting will be the Intel patch fixing a classic problem: laptop touchpad IRQs... - Some pin drive register fixes in the Mediatek driver. - Return proper error code in the Aspeed driver, and revert and ill-advised force-disablement patch that needs to be reworked. - Fix AMD driver debug output. - Fix potential NULL dereference in the Single driver. - Fix a group definition error in the Qualcomm SM8450 LPASS driver. - Restore pins used in direct IRQ mode in the Intel driver (This fixes some laptop touchpads!)" * tag 'pinctrl-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: intel: Restore the pins that used to be in Direct IRQ mode pinctrl: qcom: sm8450-lpass-lpi: correct swr_rx_data group pinctrl: aspeed: Revert "Force to disable the function's signal" pinctrl: single: fix potential NULL dereference pinctrl: amd: Fix debug output for debounce time pinctrl: aspeed: Fix confusing types in return value pinctrl: mediatek: Fix the drive register definition of some Pins
2023-02-10io_uring,audit: don't log IORING_OP_MADVISERichard Guy Briggs
fadvise and madvise both provide hints for caching or access pattern for file and memory respectively. Skip them. Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Link: https://lore.kernel.org/r/b5dfdcd541115c86dbc774aa9dd502c964849c5f.1675282642.git.rgb@redhat.com Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-10Merge tag 'pci-v6.2-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Bjorn Helgaas: - Move to a shared PCI git tree (Bjorn Helgaas) - Add Krzysztof Wilczyński as another PCI maintainer (Lorenzo Pieralisi) - Revert a couple ASPM patches to fix suspend/resume regressions (Bjorn Helgaas) * tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming" Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume" MAINTAINERS: Promote Krzysztof to PCI controller maintainer MAINTAINERS: Move to shared PCI tree
2023-02-10Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"Bjorn Helgaas
This reverts commit 5e85eba6f50dc288c22083a7e213152bcc4b8208. Thomas Witt reported that 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates Control Register programming") broke suspend/resume on a Tuxedo Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard. The main symptom is: iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible and the machine is only partially usable after resume. It can't run dmesg and can't do a clean reboot. This happens on every suspend/resume cycle. Revert 5e85eba6f50d until we can figure out the root cause. Fixes: 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates Control Register programming") Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 Reported-by: Thomas Witt <kernel@witt.link> Tested-by: Thomas Witt <kernel@witt.link> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v6.1+ Cc: Vidya Sagar <vidyas@nvidia.com>
2023-02-10Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"Bjorn Helgaas
This reverts commit 4ff116d0d5fd8a025604b0802d93a2d5f4e465d1. Tasev Nikola and Mark Enriquez reported that resume from suspend was broken in v6.1-rc1. Tasev bisected to a47126ec29f5 ("PCI/PTM: Cache PTM Capability offset"), but we can't figure out how that could be related. Mark saw the same symptoms and bisected to 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"), which does have a connection: it restores L1 Substates configuration while ASPM L1 may be enabled: pci_restore_state pci_restore_aspm_l1ss_state aspm_program_l1ss pci_write_config_dword(PCI_L1SS_CTL1, ctl1) # L1SS restore pci_restore_pcie_state pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++]) # L1 restore which is a problem because PCIe r6.0, sec 5.5.4, requires that: If setting either or both of the enable bits for ASPM L1 PM Substates, both ports must be configured as described in this section while ASPM L1 is disabled. Separately, Thomas Witt reported that 5e85eba6f50d ("PCI/ASPM: Refactor L1 PM Substates Control Register programming") broke suspend/resume, and it depends on 4ff116d0d5fd. Revert 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") to fix the resume issue and enable revert of 5e85eba6f50d to fix the issue Thomas reported. Note that reverting 4ff116d0d5fd means L1 Substates config may be lost on suspend/resume. As far as we know the system will use more power but will still *work* correctly. Fixes: 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be> Reported-by: Mark Enriquez <enriquezmark36@gmail.com> Reported-by: Thomas Witt <kernel@witt.link> Tested-by: Mark Enriquez <enriquezmark36@gmail.com> Tested-by: Thomas Witt <kernel@witt.link> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v6.1+ Cc: Vidya Sagar <vidyas@nvidia.com>
2023-02-10Merge branch 'for-next/signal' into for-next/coreCatalin Marinas
* for-next/signal: : Signal handling cleanups arm64/signal: Only read new data when parsing the ZT context arm64/signal: Only read new data when parsing the ZA context arm64/signal: Only read new data when parsing the SVE context arm64/signal: Avoid rereading context frame sizes arm64/signal: Make interface for restore_fpsimd_context() consistent arm64/signal: Remove redundant size validation from parse_user_sigframe() arm64/signal: Don't redundantly verify FPSIMD magic