linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-02-13	ALSA: hda/realtek - fixed wrong gpio assigned	Kailang Yang
	GPIO2 PIN use for output. Mask Dir and Data need to assign for 0x4. Not 0x3. This fixed was for Lenovo Desktop(0x17aa1056). GPIO2 use for AMP enable. Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/8d02bb9ac8134f878cd08607fdf088fd@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-13	crypto: certs: fix FIPS selftest dependency	Arnd Bergmann
	The selftest code is built into the x509_key_parser module, and depends on the pkcs7_message_parser module, which in turn has a dependency on the key parser, creating a dependency loop and a resulting link failure when the pkcs7 code is a loadable module: ld: crypto/asymmetric_keys/selftest.o: in function `fips_signature_selftest': crypto/asymmetric_keys/selftest.c:205: undefined reference to `pkcs7_parse_message' ld: crypto/asymmetric_keys/selftest.c:209: undefined reference to `pkcs7_supply_detached_data' ld: crypto/asymmetric_keys/selftest.c:211: undefined reference to `pkcs7_verify' ld: crypto/asymmetric_keys/selftest.c:215: undefined reference to `pkcs7_validate_trust' ld: crypto/asymmetric_keys/selftest.c:219: undefined reference to `pkcs7_free_message' Avoid this by only allowing the selftest to be enabled when either both parts are loadable modules, or both are built-in. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-02-13	Merge tag 'iwlwifi-next-for-kalle-2023-01-30' of ↵	Kalle Valo
	http://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next iwlwifi updates towards v6.3, this patch-set contains: * EHT rate reporting * Sniffer support for EHT and a few fixes in the related code * A few general cleanups and small bugfixes * Bump FW API to 74 for AX devices * Fix a compilation error in mei (it still depends on BROKEN) * STEP equalizer support - transfer some Phy related parameters from the BIOS to the firmware
2023-02-13	xen/grant-dma-iommu: Implement a dummy probe_device() callback	Oleksandr Tyshchenko
	Update stub IOMMU driver (which main purpose is to reuse generic IOMMU device-tree bindings by Xen grant DMA-mapping layer on Arm) according to the recent changes done in the following commit 57365a04c921 ("iommu: Move bus setup to IOMMU device registration"). With probe_device() callback being called during IOMMU device registration, the uninitialized callback just leads to the "kernel NULL pointer dereference" issue during boot. Fix that by adding a dummy callback. Looks like the release_device() callback is not mandatory to be implemented as IOMMU framework makes sure that callback is initialized before dereferencing. Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Tested-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20230208153649.3604857-1-olekstysh@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13	nvme-pci: add bogus ID quirk for ADATA SX6000PNP	Daniel Wagner
	Yet another device which needs a quirk: nvme nvme1: globally duplicate IDs for nsid 1 nvme nvme1: VID:DID 10ec:5763 model:ADATA SX6000PNP firmware:V9002s94 Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1207827 Reported-by: Gustavo Freitas <freitasmgustavo@gmail.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-13	xen/pvcalls-back: fix permanently masked event channel	Volodymyr Babchuk
	There is a sequence of events that can lead to a permanently masked event channel, because xen_irq_lateeoi() is newer called. This happens when a backend receives spurious write event from a frontend. In this case pvcalls_conn_back_write() returns early and it does not clears the map->write counter. As map->write > 0, pvcalls_back_ioworker() returns without calling xen_irq_lateeoi(). This leaves the event channel in masked state, a backend does not receive any new events from a frontend and the whole communication stops. Move atomic_set(&map->write, 0) to the very beginning of pvcalls_conn_back_write() to fix this issue. Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Reported-by: Oleksii Moisieiev <oleksii_moisieiev@epam.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230119211037.1234931-1-volodymyr_babchuk@epam.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13	xen: Allow platform PCI interrupt to be shared	David Woodhouse
	When we don't use the per-CPU vector callback, we ask Xen to deliver event channel interrupts as INTx on the PCI platform device. As such, it can be shared with INTx on other PCI devices. Set IRQF_SHARED, and make it return IRQ_HANDLED or IRQ_NONE according to whether the evtchn_upcall_pending flag was actually set. Now I can share the interrupt: 11: 82 0 IO-APIC 11-fasteoi xen-platform-pci, ens4 Drop the IRQF_TRIGGER_RISING. It has no effect when the IRQ is shared, and besides, the only effect it was having even beforehand was to trigger a debug message in both I/OAPIC and legacy PIC cases: [ 0.915441] genirq: No set_type function for IRQ 11 (IO-APIC) [ 0.951939] genirq: No set_type function for IRQ 11 (XT-PIC) Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/f9a29a68d05668a3636dd09acd94d970269eaec6.camel@infradead.org Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13	x86/xen/time: prefer tsc as clocksource when it is invariant	Krister Johansen
	Kvm elects to use tsc instead of kvm-clock when it can detect that the TSC is invariant. (As of commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if invariant TSC is exposed")). Notable cloud vendors[1] and performance engineers[2] recommend that Xen users preferentially select tsc over xen-clocksource due the performance penalty incurred by the latter. These articles are persuasive and tailored to specific use cases. In order to understand the tradeoffs around this choice more fully, this author had to reference the documented[3] complexities around the Xen configuration, as well as the kernel's clocksource selection algorithm. Many users may not attempt this to correctly configure the right clock source in their guest. The approach taken in the kvm-clock module spares users this confusion, where possible. Both the Intel SDM[4] and the Xen tsc documentation explain that marking a tsc as invariant means that it should be considered stable by the OS and is elibile to be used as a wall clock source. In order to obtain better out-of-the-box performance, and reduce the need for user tuning, follow kvm's approach and decrease the xen clock rating so that tsc is preferable, if it is invariant, stable, and the tsc will never be emulated. [1] https://aws.amazon.com/premiumsupport/knowledge-center/manage-ec2-linux-clock-source/ [2] https://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html [3] https://xenbits.xen.org/docs/unstable/man/xen-tscmode.7.html [4] Intel 64 and IA-32 Architectures Sofware Developer's Manual Volume 3b: System Programming Guide, Part 2, Section 17.17.1, Invariant TSC Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Code-reviewed-by: David Reaver <me@davidreaver.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20221216162118.GB2633@templeofstupid.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13	x86/xen: mark xen_pv_play_dead() as __noreturn	Juergen Gross
	Mark xen_pv_play_dead() and related to that xen_cpu_bringup_again() as "__noreturn". Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221125063248.30256-3-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13	x86/xen: don't let xen_pv_play_dead() return	Juergen Gross
	A function called via the paravirt play_dead() hook should not return to the caller. xen_pv_play_dead() has a problem in this regard, as it currently will return in case an offlined cpu is brought to life again. This can be changed only by doing basically a longjmp() to cpu_bringup_and_idle(), as the hypercall for bringing down the cpu will just return when the cpu is coming up again. Just re-initializing the cpu isn't possible, as the Xen hypervisor will deny that operation. So introduce xen_cpu_bringup_again() resetting the stack and calling cpu_bringup_and_idle(), which can be called after HYPERVISOR_vcpu_op() in xen_pv_play_dead(). Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221125063248.30256-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-13	drivers/xen/hypervisor: Expose Xen SIF flags to userspace	Per Bilse
	/proc/xen is a legacy pseudo filesystem which predates Xen support getting merged into Linux. It has largely been replaced with more normal locations for data (/sys/hypervisor/ for info, /dev/xen/ for user devices). We want to compile xenfs support out of the dom0 kernel. There is one item which only exists in /proc/xen, namely /proc/xen/capabilities with "control_d" being the signal of "you're in the control domain". This ultimately comes from the SIF flags provided at VM start. This patch exposes all SIF flags in /sys/hypervisor/start_flags/ as boolean files, one for each bit, returning '1' if set, '0' otherwise. Two known flags, 'privileged' and 'initdomain', are explicitly named, and all remaining flags can be accessed via generically named files, as suggested by Andrew Cooper. Signed-off-by: Per Bilse <per.bilse@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230103130213.2129753-1-per.bilse@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-02-12	tracing: Make trace_define_field_ext() static	Steven Rostedt (Google)
	trace_define_field_ext() is not used outside of trace_events.c, it should be static. Link: https://lore.kernel.org/oe-kbuild-all/202302130750.679RaRog-lkp@intel.com/ Fixes: b6c7abd1c28a ("tracing: Fix TASK_COMM_LEN in trace event format file") Reported-by: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-13	zonefs: make kobj_type structure constant	Thomas Weißschuh
	Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2023-02-12	Linux 6.2-rc8v6.2-rc8	Linus Torvalds

2023-02-12	MAINTAINERS: Add myself as maintainer for arch/sh (SUPERH)	John Paul Adrian Glaubitz
	Both Rich Felker and Yoshinori Sato haven't done any work on arch/sh for a while. As I have been maintaining Debian's sh4 port since 2014, I am interested to keep the architecture alive. Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12	Merge tag 'trace-v6.2-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Fix showing of TASK_COMM_LEN instead of its value The TASK_COMM_LEN was converted from a macro into an enum so that BTF would have access to it. But this unfortunately caused TASK_COMM_LEN to display in the format fields of trace events, as they are created by the TRACE_EVENT() macro and such, macros convert to their values, where as enums do not. To handle this, instead of using the field itself to be display, save the value of the array size as another field in the trace_event_fields structure, and use that instead. Not only does this fix the issue, but also converts the other trace events that have this same problem (but were not breaking tooling). With this change, the original work around b3bc8547d3be6 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types as well") could be reverted (but that should be done in the merge window)" * tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix TASK_COMM_LEN in trace event format file
2023-02-12	Merge tag 'for-6.2-rc7-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - one more fix for a tree-log 'write time corruption' report, update the last dir index directly and don't keep in the log context - do VFS-level inode lock around FIEMAP to prevent a deadlock with concurrent fsync, the extent-level lock is not sufficient - don't cache a single-device filesystem device to avoid cases when a loop device is reformatted and the entry gets stale * tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: free device in btrfs_close_devices for a single device filesystem btrfs: lock the inode in shared mode before starting fiemap btrfs: simplify update of last_dir_index_offset when logging a directory
2023-02-12	Merge tag 'usb-6.2-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are 2 small USB driver fixes that resolve some reported regressions and one new device quirk. Specifically these are: - new quirk for Alcor Link AK9563 smartcard reader - revert of u_ether gadget change in 6.2-rc1 that caused problems - typec pin probe fix All of these have been in linux-next with no reported problems" * tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: core: add quirk for Alcor Link AK9563 smartcard reader usb: typec: altmodes/displayport: Fix probe pin assign check Revert "usb: gadget: u_ether: Do not make UDC parent of the net device"
2023-02-12	Merge tag 'efi-fixes-for-v6.2-4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "A fix from Darren to widen the SMBIOS match for detecting Ampere Altra machines with problematic firmware. In the mean time, we are working on a more precise check, but this is still work in progress" * tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: arm64: efi: Force the use of SetVirtualAddressMap() on eMAG and Altra Max machines
2023-02-12	Merge tag 'powerpc-6.2-5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix interrupt exit race with security mitigation switching. - Don't select ARCH_WANTS_NO_INSTR until warnings are fixed. - Build fix for CONFIG_NUMA=n. Thanks to Nicholas Piggin, Randy Dunlap, and Sachin Sant. * tag 'powerpc-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/interrupt: Fix interrupt exit race with security mitigation switch powerpc/kexec_file: fix implicit decl error powerpc: Don't select ARCH_WANTS_NO_INSTR
2023-02-12	Fix page corruption caused by racy check in __free_pages	David Chen
	When we upgraded our kernel, we started seeing some page corruption like the following consistently: BUG: Bad page state in process ganesha.nfsd pfn:1304ca page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca flags: 0x17ffffc0000000() raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000 raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 Call Trace: dump_stack+0x74/0x96 bad_page.cold+0x63/0x94 check_new_page_bad+0x6d/0x80 rmqueue+0x46e/0x970 get_page_from_freelist+0xcb/0x3f0 ? _cond_resched+0x19/0x40 __alloc_pages_nodemask+0x164/0x300 alloc_pages_current+0x87/0xf0 skb_page_frag_refill+0x84/0x110 ... Sometimes, it would also show up as corruption in the free list pointer and cause crashes. After bisecting the issue, we found the issue started from commit e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages"): if (put_page_testzero(page)) free_the_page(page, order); else if (!PageHead(page)) while (order-- > 0) free_the_page(page + (1 << order), order); So the problem is the check PageHead is racy because at this point we already dropped our reference to the page. So even if we came in with compound page, the page can already be freed and PageHead can return false and we will end up freeing all the tail pages causing double free. Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages") Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12	tracing: Fix TASK_COMM_LEN in trace event format file	Yafang Shao
	After commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"), the content of the format file under /sys/kernel/tracing/events/task/task_newtask was changed from field:char comm[16]; offset:12; size:16; signed:0; to field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0; John reported that this change breaks older versions of perfetto. Then Mathieu pointed out that this behavioral change was caused by the use of __stringify(_len), which happens to work on macros, but not on enum labels. And he also gave the suggestion on how to fix it: :One possible solution to make this more robust would be to extend :struct trace_event_fields with one more field that indicates the length :of an array as an actual integer, without storing it in its stringified :form in the type, and do the formatting in f_show where it belongs. The result as follows after this change, $ cat /sys/kernel/tracing/events/task/task_newtask/format field:char comm[16]; offset:12; size:16; signed:0; Link: https://lore.kernel.org/lkml/Y+QaZtz55LIirsUO@google.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230210155921.4610-1-laoar.shao@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230212151303.12353-1-laoar.shao@gmail.com Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Kajetan Puchalski <kajetan.puchalski@arm.com> CC: Qais Yousef <qyousef@layalina.io> Fixes: 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") Reported-by: John Stultz <jstultz@google.com> Debugged-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-12	hwmon: (mlxreg-fan) Return zero speed for broken fan	Vadim Pasternak
	Currently for broken fan driver returns value calculated based on error code (0xFF) in related fan speed register. Thus, for such fan user gets fan{n}_fault to 1 and fan{n}_input with misleading value. Add check for fan fault prior return speed value and return zero if fault is detected. Fixes: 65afb4c8e7e4 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20230212145730.24247-1-vadimp@nvidia.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-02-12	dm table: check that a dm device doesn't reference itself	Benjamin Marzinski
	If a DM device's table references itself, it will crash the kernel with an infinite recursion. Check for a self-reference in dm_get_device(). This is a quick check, but it won't catch more complicated circular references. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-12	dm raid: fix some spelling mistakes in comments	Yu Zhe
	Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-11	Merge tag 'spi-fix-v6.2-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A couple of hopefully final fixes for spi: one driver specific fix for an issue with very large transfers and a fix for an issue with the locking fixes in spidev merged earlier this release cycle which was missed" * tag 'spi-fix-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spidev: fix a recursive locking error spi: dw: Fix wrong FIFO level setting for long xfers
2023-02-11	mmc: omap: drop TPS65010 dependency	Arnd Bergmann
	The original TPS65010 dependency was only needed for MACH_OMAP_H2, which is now gone, but I messed up the conversion when I removed that symbol. Now the missing TPS65010 causes a boot failure on other machines such as the SX1. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 0d7bb85e9413 ("ARM: omap1: remove unused board files") Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-11	Merge tag 'x86-urgent-2023-02-11' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix a kprobes bug, plus add a new Intel model number to the upstream <asm/intel-family.h> header for drivers to use" * tag 'x86-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Lunar Lake M x86/kprobes: Fix 1 byte conditional jump target
2023-02-11	Merge tag 'locking-urgent-2023-02-11' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix an rtmutex missed-wakeup bug" * tag 'locking-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rtmutex: Ensure that the top waiter is always woken up
2023-02-11	Merge tag 'cxl-fixes-6.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "Two fixups for CXL (Compute Express Link) in presence of passthrough decoders. This primarily helps developers using the QEMU CXL emulation, but with the impending arrival of CXL switches these types of topologies will be of interest to end users. - Fix a crash when shutting down regions in the presence of passthrough decoders - Fix region creation to understand passthrough decoders instead of the narrower definition of passthrough ports" * tag 'cxl-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Fix passthrough-decoder detection cxl/region: Fix null pointer dereference for resetting decoder
2023-02-11	Merge tag 'libnvdimm-fixes-6.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A fix for an issue that could causes users to inadvertantly reserve too much capacity when debugging the KMSAN and persistent memory namespace, a lockdep fix, and a kernel-doc build warning: - Resolve the conflict between KMSAN and NVDIMM with respect to reserving pmem namespace / volume capacity for larger sizeof(struct page) - Fix a lockdep warning in the the NFIT code - Fix a kernel-doc build warning" * tag 'libnvdimm-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE ACPI: NFIT: fix a potential deadlock during NFIT teardown dax: super.c: fix kernel-doc bad line warning
2023-02-11	Merge tag 'fixes-2023-02-11' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock revert from Mike Rapoport: "Revert 'mm: Always release pages to the buddy allocator in memblock_free_late()' The pages being freed by memblock_free_late() have already been initialized, but if they are in the deferred init range, __free_one_page() might access nearby uninitialized pages when trying to coalesce buddies, which will cause a crash. A proper fix will be more involved so revert this change for the time being" * tag 'fixes-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
2023-02-11	nfsd: don't destroy global nfs4_file table in per-net shutdown	Jeff Layton
	The nfs4_file table is global, so shutting it down when a containerized nfsd is shut down is wrong and can lead to double-frees. Tear down the nfs4_file_rhltable in nfs4_state_shutdown instead of nfs4_state_shutdown_net. Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017 Reported-by: JianHong Yin <jiyin@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-11	perf/x86/intel/uncore: Add Meteor Lake support	Kan Liang
	The uncore subsystem for Meteor Lake is similar to the previous Alder Lake. The main difference is that MTL provides PMU support for different tiles, while ADL only provides PMU support for the whole package. On ADL, there are CBOX, ARB, and clockbox uncore PMON units. On MTL, they are split into CBOX/HAC_CBOX, ARB/HAC_ARB, and cncu/sncu which provides a fixed counter for clockticks. Also, new MSR addresses are introduced on MTL. The IMC uncore PMON is the same as Alder Lake. Add new PCIIDs of IMC for Meteor Lake. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230210190238.1726237-1-kan.liang@linux.intel.com
2023-02-11	x86/perf/zhaoxin: Add stepping check for ZXC	silviazhao
	Some of Nano series processors will lead GP when accessing PMC fixed counter. Meanwhile, their hardware support for PMC has not announced externally. So exclude Nano CPUs from ZXC by checking stepping information. This is an unambiguous way to differentiate between ZXC and Nano CPUs. Following are Nano and ZXC FMS information: Nano FMS: Family=6, Model=F, Stepping=[0-A][C-D] ZXC FMS: Family=6, Model=F, Stepping=E-F OR Family=6, Model=0x19, Stepping=0-3 Fixes: 3a4ac121c2ca ("x86/perf: Add hardware performance events support for Zhaoxin CPU.") Reported-by: Arjan <8vvbbqzo567a@nospam.xutrox.com> Reported-by: Kevin Brace <kevinbrace@gmx.com> Signed-off-by: silviazhao <silviazhao-oc@zhaoxin.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=212389
2023-02-11	perf/x86/intel/ds: Fix the conversion from TSC to perf time	Kan Liang
	The time order is incorrect when the TSC in a PEBS record is used. $perf record -e cycles:upp dd if=/dev/zero of=/dev/null count=10000 $ perf script --show-task-events perf-exec 0 0.000000: PERF_RECORD_COMM: perf-exec:915/915 dd 915 106.479872: PERF_RECORD_COMM exec: dd:915/915 dd 915 106.483270: PERF_RECORD_EXIT(915:915):(914:914) dd 915 106.512429: 1 cycles:upp: ffffffff96c011b7 [unknown] ([unknown]) ... ... The perf time is from sched_clock_cpu(). The current PEBS code unconditionally convert the TSC to native_sched_clock(). There is a shift between the two clocks. If the TSC is stable, the shift is consistent, __sched_clock_offset. If the TSC is unstable, the shift has to be calculated at runtime. This patch doesn't support the conversion when the TSC is unstable. The TSC unstable case is a corner case and very unlikely to happen. If it happens, the TSC in a PEBS record will be dropped and fall back to perf_event_clock(). Fixes: 47a3aeb39e8d ("perf/x86/intel/pebs: Fix PEBS timestamps overwritten") Reported-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/CAM9d7cgWDVAq8-11RbJ2uGfwkKD6fA-OMwOKDrNUrU_=8MgEjg@mail.gmail.com/
2023-02-11	sched/rt: pick_next_rt_entity(): check list_entry	Pietro Borrello
	Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") removed any path which could make pick_next_rt_entity() return NULL. However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of pick_next_rt_entity()) still checks the error condition, which can never happen, since list_entry() never returns NULL. Remove the BUG_ON check, and instead emit a warning in the only possible error condition here: the queue being empty which should never happen. Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd1ac6b@diag.uniroma1.it
2023-02-11	sched/deadline: Add more reschedule cases to prio_changed_dl()	Valentin Schneider
	I've been tracking down an issue on a ~5.17ish kernel where: CPUx CPUy <DL task p0 owns an rtmutex M> <p0 depletes its runtime, gets throttled> <rq switches to the idle task> <DL task p1 blocks on M, boost/replenish p0> <No call to resched_curr() happens here> [idle task keeps running here until something accidentally sets TIF_NEED_RESCHED] On that kernel, it is quite easy to trigger using rt-tests's deadline_test [1] with the test running on isolated CPUs (this reduces the chance of something unrelated setting TIF_NEED_RESCHED on the idle tasks, making the issue even more obvious as the hung task detector chimes in). I haven't been able to reproduce this using a mainline kernel, even if I revert 2972e3050e35 ("tracing: Make trace_marker{,_raw} stream-like") which gets rid of the lock involved in the above test, but I cannot convince myself the issue isn't there from looking at the code. Make prio_changed_dl() issue a reschedule if the current task isn't a deadline one. While at it, ensure a reschedule is emitted when a queued-but-not-current task gets boosted with an earlier deadline that current's. [1]: https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20230206140612.701871-1-vschneid@redhat.com
2023-02-11	sched/fair: sanitize vruntime of entity being placed	Zhang Qiao
	When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards. However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can inverse the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu. To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic sheduler time scale. [rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Co-developed-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de
2023-02-11	sched/fair: Remove capacity inversion detection	Vincent Guittot
	Remove the capacity inversion detection which is now handled by util_fits_cpu() returning -1 when we need to continue to look for a potential CPU with better performance. This ends up almost reverting patches below except for some comments: commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection") commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()") commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230201143628.270912-3-vincent.guittot@linaro.org
2023-02-11	sched/fair: unlink misfit task from cpu overutilized	Vincent Guittot
	By taking into account uclamp_min, the 1:1 relation between task misfit and cpu overutilized is no more true as a task with a small util_avg may not fit a high capacity cpu because of uclamp_min constraint. Add a new state in util_fits_cpu() to reflect the case that task would fit a CPU except for the uclamp_min hint which is a performance requirement. Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we can use this new value to take additional action to select the best CPU that doesn't match uclamp_min hint. When util_fits_cpu() returns -1, we will continue to look for a possible CPU with better performance, which replaces Capacity Inversion detection with capacity_orig_of() - thermal_load_avg to detect a capacity inversion. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-and-tested-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Link: https://lore.kernel.org/r/20230201143628.270912-2-vincent.guittot@linaro.org
2023-02-11	objtool: mem*() are not uaccess safe	Peter Zijlstra
	For mysterious raisins I listed the new __asan_mem() functions as being uaccess safe, this is giving objtool fails on KASAN builds because these functions call out to the actual __mem() functions which are not marked uaccess safe. Removing it doesn't make the robots unhappy. Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Bisected-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230126182302.GA687063@paulmck-ThinkPad-P17-Gen-1
2023-02-11	x86/tsc: Do feature check as the very first thing	Borislav Petkov (AMD)
	Do the feature check as the very first thing in the function. Everything else comes after that and is meaningless work if the TSC CPUID bit is not even set. Switch to cpu_feature_enabled() too, while at it. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y5990CUCuWd5jfBH@zn.tnic
2023-02-11	x86/tsc: Make recalibrate_cpu_khz() export GPL only	Borislav Petkov (AMD)
	A quick search doesn't reveal any use outside of the kernel - which would be questionable to begin with anyway - so make the export GPL only. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y599miBzWRAuOwhg@zn.tnic
2023-02-11	x86/cacheinfo: Remove unused trace variable	Borislav Petkov (AMD)
	15cd8812ab2c ("x86: Remove the CPU cache size printk's") removed the last use of the trace local var. Remove it too and the useless trace cache case. No functional changes. Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230210234541.9694-1-bp@alien8.de Link: http://lore.kernel.org/r/20220705073349.1512-1-jiapeng.chong@linux.alibaba.com
2023-02-11	ALSA: hda: make kobj_type structure constant	Thomas Weißschuh
	Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definition to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230211-kobj_type-sound-v1-1-17107ceb25b7@weissschuh.net Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-11	ALSA: hda: Fix codec device field initializan	Cezary Rojewski
	Commit f2bd1c5ae2cb ("ALSA: hda: Fix page fault in snd_hda_codec_shutdown()") relocated initialization of several codec device fields. Due to differences between codec_exec_verb() and snd_hdac_bus_exec_bus() in how they handle VERB execution - the latter does not touch PM - assigning ->exec_verb to codec_exec_verb() causes PM to be engaged before it is configured for the device. Configuration of PM for the ASoC HDAudio sound card is done with snd_hda_set_power_save() during skl_hda_audio_probe() whereas the assignment happens early, in snd_hda_codec_device_init(). Revert to previous behavior to avoid problems caused by too early PM manipulation. Suggested-by: Jason Montleon <jmontleo@redhat.com> Link: https://lore.kernel.org/regressions/CALFERdzKUodLsm6=Ub3g2+PxpNpPtPq3bGBLbff=eZr9_S=YVA@mail.gmail.com Fixes: f2bd1c5ae2cb ("ALSA: hda: Fix page fault in snd_hda_codec_shutdown()") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20230210165541.3543604-1-cezary.rojewski@intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-02-10	Merge branch 'sk-sk_forward_alloc-fixes'	Jakub Kicinski
	Kuniyuki Iwashima says: ==================== sk->sk_forward_alloc fixes. The first patch fixes a negative sk_forward_alloc by adding sk_rmem_schedule() before skb_set_owner_r(), and second patch removes an unnecessary WARN_ON_ONCE(). v2: https://lore.kernel.org/netdev/20230209013329.87879-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20230207183718.54520-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20230210002202.81442-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10	net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues().	Kuniyuki Iwashima
	Christoph Paasch reported that commit b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues(). [0 - 2] Also, we can reproduce it by a program in [3]. In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy() to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in inet_csk_destroy_sock(). The same check has been in inet_sock_destruct() from at least v2.6, we can just remove the WARN_ON_ONCE(). However, among the users of sk_stream_kill_queues(), only CAIF is not calling inet_sock_destruct(). Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor(). [0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.com/ [1]: https://github.com/multipath-tcp/mptcp_net-next/issues/341 [2]: WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0 Modules linked in: CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9 RSP: 0018:ffff88810570fc68 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005 RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488 R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458 FS: 00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0 Call Trace: <TASK> inet_csk_destroy_sock+0x1a1/0x320 __tcp_close+0xab6/0xe90 tcp_close+0x30/0xc0 inet_release+0xe9/0x1f0 inet6_release+0x4c/0x70 __sock_release+0xd2/0x280 sock_close+0x15/0x20 __fput+0x252/0xa20 task_work_run+0x169/0x250 exit_to_user_mode_prepare+0x113/0x120 syscall_exit_to_user_mode+0x1d/0x40 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f7fdf7ae28d Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000 R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8 </TASK> [3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/ Fixes: b5fc29233d28 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Christoph Paasch <christophpaasch@icloud.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10	dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.	Kuniyuki Iwashima
	Eric Dumazet pointed out [0] that when we call skb_set_owner_r() for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called, resulting in a negative sk_forward_alloc. We add a new helper which clones a skb and sets its owner only when sk_rmem_schedule() succeeds. Note that we move skb_set_owner_r() forward in (dccp\|tcp)_v6_do_rcv() because tcp_send_synack() can make sk_forward_alloc negative before ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases. [0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/ Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()") Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>