summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2023-06-06efi/libstub: Implement support for unaccepted memoryKirill A. Shutemov
UEFI Specification version 2.9 introduces the concept of memory acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, requiring memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific for the Virtual Machine platform. Accepting memory is costly and it makes VMM allocate memory for the accepted guest physical address range. It's better to postpone memory acceptance until memory is needed. It lowers boot time and reduces memory overhead. The kernel needs to know what memory has been accepted. Firmware communicates this information via memory map: a new memory type -- EFI_UNACCEPTED_MEMORY -- indicates such memory. Range-based tracking works fine for firmware, but it gets bulky for the kernel: e820 (or whatever the arch uses) has to be modified on every page acceptance. It leads to table fragmentation and there's a limited number of entries in the e820 table. Another option is to mark such memory as usable in e820 and track if the range has been accepted in a bitmap. One bit in the bitmap represents a naturally aligned power-2-sized region of address space -- unit. For x86, unit size is 2MiB: 4k of the bitmap is enough to track 64GiB or physical address space. In the worst-case scenario -- a huge hole in the middle of the address space -- It needs 256MiB to handle 4PiB of the address space. Any unaccepted memory that is not aligned to unit_size gets accepted upfront. The bitmap is allocated and constructed in the EFI stub and passed down to the kernel via EFI configuration table. allocate_e820() allocates the bitmap if unaccepted memory is present, according to the size of unaccepted region. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20230606142637.5171-4-kirill.shutemov@linux.intel.com
2023-06-06mm: Add support for unaccepted memoryKirill A. Shutemov
UEFI Specification version 2.9 introduces the concept of memory acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific to the Virtual Machine platform. There are several ways the kernel can deal with unaccepted memory: 1. Accept all the memory during boot. It is easy to implement and it doesn't have runtime cost once the system is booted. The downside is very long boot time. Accept can be parallelized to multiple CPUs to keep it manageable (i.e. via DEFERRED_STRUCT_PAGE_INIT), but it tends to saturate memory bandwidth and does not scale beyond the point. 2. Accept a block of memory on the first use. It requires more infrastructure and changes in page allocator to make it work, but it provides good boot time. On-demand memory accept means latency spikes every time kernel steps onto a new memory block. The spikes will go away once workload data set size gets stabilized or all memory gets accepted. 3. Accept all memory in background. Introduce a thread (or multiple) that gets memory accepted proactively. It will minimize time the system experience latency spikes on memory allocation while keeping low boot time. This approach cannot function on its own. It is an extension of #2: background memory acceptance requires functional scheduler, but the page allocator may need to tap into unaccepted memory before that. The downside of the approach is that these threads also steal CPU cycles and memory bandwidth from the user's workload and may hurt user experience. Implement #1 and #2 for now. #2 is the default. Some workloads may want to use #1 with accept_memory=eager in kernel command line. #3 can be implemented later based on user's demands. Support of unaccepted memory requires a few changes in core-mm code: - memblock accepts memory on allocation. It serves early boot memory allocations and doesn't limit them to pre-accepted pool of memory. - page allocator accepts memory on the first allocation of the page. When kernel runs out of accepted memory, it accepts memory until the high watermark is reached. It helps to minimize fragmentation. EFI code will provide two helpers if the platform supports unaccepted memory: - accept_memory() makes a range of physical addresses accepted. - range_contains_unaccepted_memory() checks anything within the range of physical addresses requires acceptance. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mike Rapoport <rppt@linux.ibm.com> # memblock Link: https://lore.kernel.org/r/20230606142637.5171-2-kirill.shutemov@linux.intel.com
2023-06-06regulator: Add X-Powers AXP15060/AXP313a PMICMark Brown
Merge series from Andre Przywara <andre.przywara@arm.com>: This patch series adds support for the X-Powers AXP15060 and AXP313a PMIC, which are general purpose PMICs as seen on different boards with different SOCs, mostly from Allwinner. This is mostly a repost of the previous patches, combining both the AXP313a and AXP15060 series, rebased on top of v6.4-rc3, and omitting the patches that already got merged. The first two patches are the successors of the AXP313a v10 post, the third patch is based on Shengyu's AXP15060 v3 post. There were no code changes, just some tiny context differences due to the rebase, plus I added the newly gained tags. As the DT bindings and the AXP15060 MFD part are already in the tree, this is just completing support with the MFD part for the AXP313a, and the regulator support for both PMICs.
2023-06-06firmware: arm_scmi: Add Powercap protocol enable supportCristian Marussi
SCMI powercap protocol v3.2 supports disabling the powercap on a zone by zone basis by providing a zero valued powercap. Expose new operations to enable/disable powercapping on a per-zone base. Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20230531152039.2363181-3-cristian.marussi@arm.com Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-06-06Merge tag 'platform-drivers-x86-v6.4-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: - various Microsoft Surface support fixes - one fix for the INT3472 driver * tag 'platform-drivers-x86-v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: int3472: Avoid crash in unregistering regulator gpio platform/surface: aggregator_tabletsw: Add support for book mode in POS subsystem platform/surface: aggregator_tabletsw: Add support for book mode in KIP subsystem platform/surface: aggregator: Allow completion work-items to be executed in parallel platform/surface: aggregator: Make to_ssam_device_driver() respect constness
2023-06-06tracing/probes: Add tracepoint support on fprobe_eventsMasami Hiramatsu (Google)
Allow fprobe_events to trace raw tracepoints so that user can trace tracepoints which don't have traceevent wrappers. This new event is always available if the fprobe_events is enabled (thus no kconfig), because the fprobe_events depends on the trace-event and traceporint. e.g. # echo 't sched_overutilized_tp' >> dynamic_events # echo 't 9p_client_req' >> dynamic_events # cat dynamic_events t:tracepoints/sched_overutilized_tp sched_overutilized_tp t:tracepoints/_9p_client_req 9p_client_req The event name is based on the tracepoint name, but if it is started with digit character, an underscore '_' will be added. NOTE: to avoid further confusion, this renames TPARG_FL_TPOINT to TPARG_FL_TEVENT because this flag is used for eprobe (trace-event probe). And reuse TPARG_FL_TPOINT for this raw tracepoint probe. Link: https://lore.kernel.org/all/168507471874.913472.17214624519622959593.stgit@mhiramat.roam.corp.google.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305020453.afTJ3VVp-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-06-06tracing/probes: Add fprobe events for tracing function entry and exit.Masami Hiramatsu (Google)
Add fprobe events for tracing function entry and exit instead of kprobe events. With this change, we can continue to trace function entry/exit even if the CONFIG_KPROBES_ON_FTRACE is not available. Since CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS, it is not available if the architecture only supports CONFIG_DYNAMIC_FTRACE_WITH_ARGS. And that means kprobe events can not probe function entry/exit effectively on such architecture. But this can be solved if the dynamic events supports fprobe events. The fprobe event is a new dynamic events which is only for the function (symbol) entry and exit. This event accepts non register fetch arguments so that user can trace the function arguments and return values. The fprobe events syntax is here; f[:[GRP/][EVENT]] FUNCTION [FETCHARGS] f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS] E.g. # echo 'f vfs_read $arg1' >> dynamic_events # echo 'f vfs_read%return $retval' >> dynamic_events # cat dynamic_events f:fprobes/vfs_read__entry vfs_read arg1=$arg1 f:fprobes/vfs_read__exit vfs_read%return arg1=$retval # echo 1 > events/fprobes/enable # head -n 20 trace | tail # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | sh-142 [005] ...1. 448.386420: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386436: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.386451: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386458: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.386469: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386476: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.602073: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.602089: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 Link: https://lore.kernel.org/all/168507469754.913472.6112857614708350210.stgit@mhiramat.roam.corp.google.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/all/202302011530.7vm4O8Ro-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-06-06fprobe: Pass return address to the handlersMasami Hiramatsu (Google)
Pass return address as 'ret_ip' to the fprobe entry and return handlers so that the fprobe user handler can get the reutrn address without analyzing arch-dependent pt_regs. Link: https://lore.kernel.org/all/168507467664.913472.11642316698862778600.stgit@mhiramat.roam.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-06-06dt-bindings: gpio: Remove FSI domain ports on Tegra234Prathamesh Shete
Ports S, T, U and V are in a separate controller that is part of the FSI domain. Remove their definitions from the MAIN controller definitions to get rid of the confusion. This technically breaks ABI compatibility with old device trees. However it doesn't cause issues in practice. The GPIO pins impacted by this are used for non-critical functionality. Fixes: a8b10f3d12cfc ("dt-bindings: gpio: Add Tegra234 support") Signed-off-by: Prathamesh Shete <pshete@nvidia.com> [treding@nvidia.com: rewrite commit message] Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2023-06-06wifi: mac80211: provide a helper to fetch the medium synchronization delayEmmanuel Grumbach
There are drivers which need this information. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230604120651.b1043f3126e2.Iad3806f8bf8df07f52ef0a02cc3d0373c44a8c93@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06wifi: mac80211: fetch and store the EML capability informationEmmanuel Grumbach
We need to teach the low level driver about the EML capability which includes information for EMLSR / EMLMR operation. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230504134511.828474-11-gregory.greenman@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06wifi: mac80211: skip EHT BSS membership selectorJohannes Berg
Skip the EHT BSS membership selector for getting rates. While at it, add the definitions for GLK and EPS, and sort the list. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230504134511.828474-9-gregory.greenman@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-06s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctlTony Krowiak
Realize the VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve the information for the VFIO device request IRQ. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230530223538.279198-2-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-06gro: decrease size of CBRichard Gobert
The GRO control block (NAPI_GRO_CB) is currently at its maximum size. This commit reduces its size by putting two groups of fields that are used only at different times into a union. Specifically, the fields frag0 and frag0_len are the fields that make up the frag0 optimisation mechanism, which is used during the initial parsing of the SKB. The fields last and age are used after the initial parsing, while the SKB is stored in the GRO list, waiting for other packets to arrive. There was one location in dev_gro_receive that modified the frag0 fields after setting last and age. I changed this accordingly without altering the code behaviour. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230601161407.GA9253@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-06watch_queue: prevent dangling pipe pointerSiddh Raman Pant
NULL the dangling pipe reference while clearing watch_queue. If not done, a reference to a freed pipe remains in the watch_queue, as this function is called before freeing a pipe in free_pipe_info() (see line 834 of fs/pipe.c). The sole use of wqueue->defunct is for checking if the watch queue has been cleared, but wqueue->pipe is also NULLed while clearing. Thus, wqueue->defunct is superfluous, as wqueue->pipe can be checked for NULL. Hence, the former can be removed. Tested with keyutils testsuite. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Siddh Raman Pant <code@siddh.me> Acked-by: David Howells <dhowells@redhat.com> Message-Id: <20230605143616.640517-1-code@siddh.me> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-06Merge tag 'v6.4-rc4' into wpan-next/stagingMiquel Raynal
Linux 6.4-rc4
2023-06-05Bluetooth: ISO: use correct CIS order in Set CIG Parameters eventPauli Virtanen
The order of CIS handle array in Set CIG Parameters response shall match the order of the CIS_ID array in the command (Core v5.3 Vol 4 Part E Sec 7.8.97). We send CIS_IDs mainly in the order of increasing CIS_ID (but with "last" CIS first if it has fixed CIG_ID). In handling of the reply, we currently assume this is also the same as the order of hci_conn in hdev->conn_hash, but that is not true. Match the correct hci_conn to the correct handle by matching them based on the CIG+CIS combination. The CIG+CIS combination shall be unique for ISO_LINK hci_conn at state >= BT_BOUND, which we maintain in hci_le_set_cig_params. Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: fix debugfs registrationJohan Hovold
Since commit ec6cef9cd98d ("Bluetooth: Fix SMP channel registration for unconfigured controllers") the debugfs interface for unconfigured controllers will be created when the controller is configured. There is however currently nothing preventing a controller from being configured multiple time (e.g. setting the device address using btmgmt) which results in failed attempts to register the already registered debugfs entries: debugfs: File 'features' in directory 'hci0' already present! debugfs: File 'manufacturer' in directory 'hci0' already present! debugfs: File 'hci_version' in directory 'hci0' already present! ... debugfs: File 'quirk_simultaneous_discovery' in directory 'hci0' already present! Add a controller flag to avoid trying to register the debugfs interface more than once. Fixes: ec6cef9cd98d ("Bluetooth: Fix SMP channel registration for unconfigured controllers") Cc: stable@vger.kernel.org # 4.0 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-05Bluetooth: hci_sync: add lock to protect HCI_UNREGISTERZhengping Jiang
When the HCI_UNREGISTER flag is set, no jobs should be scheduled. Fix potential race when HCI_UNREGISTER is set after the flag is tested in hci_cmd_sync_queue. Fixes: 0b94f2651f56 ("Bluetooth: hci_sync: Fix queuing commands when HCI_UNREGISTER is set") Signed-off-by: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-06-06firewire: fix warnings to generate UAPI documentationTakashi Sakamoto
Any target to generate UAPI documentation reports warnings to missing annotation for padding member in structures added recently. This commit suppresses the warnings. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/lkml/20230531135306.43613a59@canb.auug.org.au/ Fixes: 7c22d4a92bb2 ("firewire: cdev: add new event to notify request subaction with time stamp") Fixes: fc2b52cf2e0e ("firewire: cdev: add new event to notify response subaction with time stamp") Link: https://lore.kernel.org/r/20230601144937.121179-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2023-06-05lib/string_helpers: Change returned value of the strreplace()Andy Shevchenko
It's more useful to return the pointer to the string itself with strreplace(), so it may be used like attr->name = strreplace(name, '/', '_'); While at it, amend the kernel documentation. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605170553.7835-3-andriy.shevchenko@linux.intel.com
2023-06-05Merge branch 'drm-i915-use-ref_tracker-library-for-tracking-wakerefs'Jakub Kicinski
Andrzej Hajda says: ==================== drm/i915: use ref_tracker library for tracking wakerefs This is reviewed series of ref_tracker patches, ready to merge via network tree, rebased on net-next/main. i915 patches will be merged later via intel-gfx tree. ==================== Merge on top of an -rc tag in case it's needed in another tree. Link: https://lore.kernel.org/r/20230224-track_gt-v9-0-5b47a33f55d1@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05lib/ref_tracker: add printing to memory bufferAndrzej Hajda
Similar to stack_(depot|trace)_snprint the patch adds helper to printing stats to memory buffer. It will be helpful in case of debugfs. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05lib/ref_tracker: improve printing statsAndrzej Hajda
In case the library is tracking busy subsystem, simply printing stack for every active reference will spam log with long, hard to read, redundant stack traces. To improve readabilty following changes have been made: - reports are printed per stack_handle - log is more compact, - added display name for ref_tracker_dir - it will differentiate multiple subsystems, - stack trace is printed indented, in the same printk call, - info about dropped references is printed as well. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05lib/ref_tracker: add unlocked leak print helperAndrzej Hajda
To have reliable detection of leaks, caller must be able to check under the same lock both: tracked counter and the leaks. dir.lock is natural candidate for such lock and unlocked print helper can be called with this lock taken. As a bonus we can reuse this helper in ref_tracker_dir_exit. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-05bpf: netfilter: Add BPF_NETFILTER bpf_attach_typeFlorian Westphal
Andrii Nakryiko writes: And we currently don't have an attach type for NETLINK BPF link. Thankfully it's not too late to add it. I see that link_create() in kernel/bpf/syscall.c just bypasses attach_type check. We shouldn't have done that. Instead we need to add BPF_NETLINK attach type to enum bpf_attach_type. And wire all that properly throughout the kernel and libbpf itself. This adds BPF_NETFILTER and uses it. This breaks uabi but this wasn't in any non-rc release yet, so it should be fine. v2: check link_attack prog type in link_create too Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs") Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4BzZ69YgrQW7DHCJUT_X+GqMq_ZQQPBwopaJJVGFD5=d5Vg@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20230605131445.32016-1-fw@strlen.de
2023-06-05KVM: Fix comment for KVM_ENABLE_CAPBinbin Wu
Fix comment for vcpu ioctl version of KVM_ENABLE_CAP. KVM provides ioctl KVM_ENABLE_CAP to allow userspace to enable an extension which is not enabled by default. For vcpu ioctl version, it is available with the capability KVM_CAP_ENABLE_CAP. For vm ioctl version, it is available with the capability KVM_CAP_ENABLE_CAP_VM. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20230518091339.1102-2-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-05sched/clock: Provide local_clock_noinstr()Peter Zijlstra
Now that all ARCH_WANTS_NO_INSTR architectures (arm64, loongarch, s390, x86) provide sched_clock_noinstr(), use this to provide local_clock_noinstr(). This local_clock_noinstr() will be safe to use from noinstr code with the assumption that any such noinstr code is non-preemptible (it had better be, entry code will have IRQs disabled while __cpuidle must have preemption disabled). Specifically, preempt_enable_notrace(), a common part of many a sched_clock() implementation calls out to schedule() -- even though, per the above, it will never trigger -- which frustrates noinstr validation. vmlinux.o: warning: objtool: local_clock+0xb5: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102715.978624636@infradead.org
2023-06-05clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing ↵Peter Zijlstra
U64_MAX Currently hv_read_tsc_page_tsc() (ab)uses the (valid) time value of U64_MAX as an error return. This breaks the clean wrap-around of the clock. Modify the function signature to return a boolean state and provide another u64 pointer to store the actual time on success. This obviates the need to steal one time value and restores the full counter width. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102715.775630881@infradead.org
2023-06-05math64: Always inline u128 version of mul_u64_u64_shr()Peter Zijlstra
In order to prevent the following complaint from happening, always inline the u128 variant of mul_u64_u64_shr() -- which is what x86_64 will use. vmlinux.o: warning: objtool: read_hv_sched_clock_tsc+0x5a: call to mul_u64_u64_shr.constprop.0() leaves .noinstr.text section It should compile into something like: asm("mul %[mul];" "shrd %rdx, %rax, %cl" : "+&a" (a) : "c" shift, [mul] "r" (mul) : "d"); Which is silly not to inline, but it happens. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102715.637420396@infradead.org
2023-06-05seqlock/latch: Provide raw_read_seqcount_latch_retry()Peter Zijlstra
The read side of seqcount_latch consists of: do { seq = raw_read_seqcount_latch(&latch->seq); ... } while (read_seqcount_latch_retry(&latch->seq, seq)); which is asymmetric in the raw_ department, and sure enough, read_seqcount_latch_retry() includes (explicit) instrumentation where raw_read_seqcount_latch() does not. This inconsistency becomes a problem when trying to use it from noinstr code. As such, fix it by renaming and re-implementing raw_read_seqcount_latch_retry() without the instrumentation. Specifically the instrumentation in question is kcsan_atomic_next(0) in do___read_seqcount_retry(). Loosing this annotation is not a problem because raw_read_seqcount_latch() does not pass through kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102715.233598176@infradead.org
2023-06-05sched: Unconditionally use full-fat wait_task_inactive()Peter Zijlstra
While modifying wait_task_inactive() for PREEMPT_RT; the build robot noted that UP got broken. This led to audit and consideration of the UP implementation of wait_task_inactive(). It looks like the UP implementation is also broken for PREEMPT; consider task_current_syscall() getting preempted between the two calls to wait_task_inactive(). Therefore move the wait_task_inactive() implementation out of CONFIG_SMP and unconditionally use it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230602103731.GA630648%40hirez.programming.kicks-ass.net
2023-06-05block: mark early_lookup_bdev as __initChristoph Hellwig
early_lookup_bdev is now only used during the early boot code as it should, so mark it __init to not waste run time memory on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05dm: remove dm_get_dev_tChristoph Hellwig
Open code dm_get_dev_t in the only remaining caller, and propagate the exact error code from lookup_bdev and early_lookup_bdev. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: move more code to early-lookup.cChristoph Hellwig
blk_lookup_devt is only used by code in early-lookup.c, so move it there. printk_all_partitions and it's helper bdevt_str are only used by the early init code in init/do_mounts.c, so they should go there as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05init: improve the name_to_dev_t interfaceChristoph Hellwig
name_to_dev_t has a very misleading name, that doesn't make clear it should only be used by the early init code, and also has a bad calling convention that doesn't allow returning different kinds of errors. Rename it to early_lookup_bdev to make the use case clear, and return an errno, where -EINVAL means the string could not be parsed, and -ENODEV means it the string was valid, but there was no device found for it. Also stub out the whole call for !CONFIG_BLOCK as all the non-block root cases are always covered in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05init: handle ubi/mtd root mounting like all other root typesChristoph Hellwig
Assign a Root_Generic magic value for UBI/MTD root and handle the root mounting in mount_root like all other root types. Besides making the code more clear this also means that UBI/MTD root can be used together with an initrd (not that anyone should care). Also factor parsing of the root name into a helper now that it can be easily done and will get more complicated with subsequent patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05init: remove pointless Root_* valuesChristoph Hellwig
Remove all unused defines, and just use the expanded versions for the SCSI disk majors. I've decided to keep Root_RAM0 even if it could be expanded as there is a lot of special casing for it in the init code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05driver core: return bool from driver_probe_doneChristoph Hellwig
bool is the most sensible return value for a yes/no return. Also add __init as this funtion is only called from the early boot code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230531125535.676098-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05drm/i915/huc: differentiate the 2 steps of the MTL HuC auth flowDaniele Ceraolo Spurio
Before we add the second step of the MTL HuC auth (via GSC), we need to have the ability to differentiate between them. To do so, the huc authentication check is duplicated for GuC and GSC auth, with GSC-enabled binaries being considered fully authenticated only after the GSC auth step. To report the difference between the 2 auth steps, a new case is added to the HuC getparam. This way, the clear media driver can start submitting before full auth, as partial auth is enough for those workloads. v2: fix authentication status check for DG2 v3: add a better comment at the top of the HuC file to explain the different approaches to load and auth (John) v4: update call to intel_huc_is_authenticated in the pxp code to check for GSC authentication v5: drop references to meu and esclamation mark in huc_auth print (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> #v2 Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-5-daniele.ceraolospurio@intel.com
2023-06-05fs: add a method to shut down the file systemChristoph Hellwig
Add a new ->shutdown super operation that can be used to tell the file system to shut down, and call it from newly created holder ops when the block device under a file system shuts down. This only covers the main block device for "simple" file systems using get_tree_bdev / mount_bdev. File systems their own get_tree method or opening additional devices will need to set up their own blk_holder_ops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: add a mark_dead holder operationChristoph Hellwig
Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead to notify the holder that the block device it is using has been marked dead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05Merge tag 'qcom-driver-fixes-for-6.4' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm driver fixes for 6.4 Error paths is corrected across icc-bwmon, rpmh-rsc, ramp_controller and rmtfs. The ice module is renamed qcom_ice, to avoid clashing with existing "ice" driver. SA8155P-specific RPMh power-domains are introduced to avoid the code trying to access resources that exists on SM8150, but not on SA8155P. Lastly, changes to the EDAC driver to fix an issue where the driver performs mmio based on the wrong register map. * tag 'qcom-driver-fixes-for-6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: EDAC/qcom: Get rid of hardcoded register offsets EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup() dt-bindings: cache: qcom,llcc: Fix SM8550 description soc: qcom: rpmhpd: Add SA8155P power domains dt-bindings: power: qcom,rpmpd: Add SA8155P soc: qcom: Rename ice to qcom_ice to avoid module name conflict soc: qcom: rmtfs: Fix error code in probe() soc: qcom: ramp_controller: Fix an error handling path in qcom_ramp_controller_probe() soc: qcom: rpmh-rsc: drop redundant unsigned >=0 comparision soc: qcom: icc-bwmon: fix incorrect error code passed to dev_err_probe() Link: https://lore.kernel.org/r/20230601141058.2246039-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-06-05ACPI: scan: Reduce overhead related to devices with dependenciesRafael J. Wysocki
Notice that all of the objects for which the acpi_scan_check_dep() return value is greater than 0 are present in acpi_dep_list as consumers (there may be multiple entries for one object, but that is not a problem), so after carrying out the initial ACPI namespace walk in which devices with dependencies are skipped, acpi_bus_scan() can simply walk acpi_dep_list and enumerate all of the unique consumer objects from there and their descendants instead of walking the entire target branch of the ACPI namespace and looking for device objects that have not been enumerated yet in it. Because walking acpi_dep_list is generally less overhead than walking the entire ACPI namespace, use the observation above to reduce the system initialization overhead related to ACPI, which is particularly important on large systems. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2023-06-05Merge branch 'topic/midi20' into for-nextTakashi Iwai
Pull fixes for a couple of minor issues spotted by bots. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-06-05ALSA: usb-audio: Use __le16 for 16bit USB descriptor fieldsTakashi Iwai
Use proper notion for 16bit values for fixing the sparse warnings. Fixes: f8ddb0fb3289 ("ALSA: usb-audio: Define USB MIDI 2.0 specs") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202305260528.wcqjXso8-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202305270534.odwHL9F0-lkp@intel.com/ Link: https://lore.kernel.org/r/20230605144758.6677-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-06-05coresight: Fix CTI module refcount leak by making it a helper deviceJames Clark
The CTI module has some hard coded refcounting code that has a leak. For example running perf and then trying to unload it fails: perf record -e cs_etm// -a -- ls rmmod coresight_cti rmmod: ERROR: Module coresight_cti is in use The coresight core already handles references of devices in use, so by making CTI a normal helper device, we get working refcounting for free. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230425143542.2305069-14-james.clark@arm.com
2023-06-05coresight: Enable and disable helper devices adjacent to the pathJames Clark
Currently CATU is the only helper device, and its enable and disable calls are hard coded. To allow more helper devices to be added in a generic way, remove these hard coded calls and just enable and disable all helper devices. This has to apply to helpers adjacent to the path, because they will never be in the path. CATU was already discovered in this way, so there is no change there. One change that is needed is for CATU to call back into ETR to allocate the buffer. Because the enable call was previously hard coded, it was done at a point where the buffer was already allocated, but this is no longer the case. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230425143542.2305069-13-james.clark@arm.com
2023-06-05coresight: Make refcount a property of the connectionJames Clark
This removes the need to do an additional lookup for the total number of ports used and also removes the need to allocate an array of refcounts which is just another representation of a connection array. This was only used for link type devices, for regular devices a single refcount on the coresight device is used. There is a both an input and output refcount in case two link type devices are connected together so that they don't overwrite each other's counts. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230425143542.2305069-11-james.clark@arm.com