summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-24iommu: Add iommu_domain ops for dirty trackingJoao Martins
Add to iommu domain operations a set of callbacks to perform dirty tracking, particulary to start and stop tracking and to read and clear the dirty data. Drivers are generally expected to dynamically change its translation structures to toggle the tracking and flush some form of control state structure that stands in the IOVA translation path. Though it's not mandatory, as drivers can also enable dirty tracking at boot, and just clear the dirty bits before setting dirty tracking. For each of the newly added IOMMU core APIs: iommu_cap::IOMMU_CAP_DIRTY_TRACKING: new device iommu_capable value when probing for capabilities of the device. .set_dirty_tracking(): an iommu driver is expected to change its translation structures and enable dirty tracking for the devices in the iommu_domain. For drivers making dirty tracking always-enabled, it should just return 0. .read_and_clear_dirty(): an iommu driver is expected to walk the pagetables for the iova range passed in and use iommu_dirty_bitmap_record() to record dirty info per IOVA. When detecting that a given IOVA is dirty it should also clear its dirty state from the PTE, *unless* the flag IOMMU_DIRTY_NO_CLEAR is passed in -- flushing is steered from the caller of the domain_op via iotlb_gather. The iommu core APIs use the same data structure in use for dirty tracking for VFIO device dirty (struct iova_bitmap) abstracted by iommu_dirty_bitmap_record() helper function. domain::dirty_ops: IOMMU domains will store the dirty ops depending on whether the iommu device supports dirty tracking or not. iommu drivers can then use this field to figure if the dirty tracking is supported+enforced on attach. The enforcement is enable via domain_alloc_user() which is done via IOMMUFD hwpt flag introduced later. Link: https://lore.kernel.org/r/20231024135109.73787-5-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24iommufd/iova_bitmap: Move symbols to IOMMUFD namespaceJoao Martins
Have the IOVA bitmap exported symbols adhere to the IOMMUFD symbol export convention i.e. using the IOMMUFD namespace. In doing so, import the namespace in the current users. This means VFIO and the vfio-pci drivers that use iova_bitmap_set(). Link: https://lore.kernel.org/r/20231024135109.73787-4-joao.m.martins@oracle.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24vfio: Move iova_bitmap into iommufdJoao Martins
Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking the user bitmaps, so move to the common dependency into IOMMUFD. In doing so, create the symbol IOMMUFD_DRIVER which designates the builtin code that will be used by drivers when selected. Today this means MLX5_VFIO_PCI and PDS_VFIO_PCI. IOMMU drivers will do the same (in future patches) when supporting dirty tracking and select IOMMUFD_DRIVER accordingly. Given that the symbol maybe be disabled, add header definitions in iova_bitmap.h for when IOMMUFD_DRIVER=n Link: https://lore.kernel.org/r/20231024135109.73787-3-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24vfio/iova_bitmap: Export more API symbolsJoao Martins
In preparation to move iova_bitmap into iommufd, export the rest of API symbols that will be used in what could be used by modules, namely: iova_bitmap_alloc iova_bitmap_free iova_bitmap_for_each Link: https://lore.kernel.org/r/20231024135109.73787-2-joao.m.martins@oracle.com Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24RDMA/hfi1: Remove redundant assignment to pointer ppdColin Ian King
Pointer ppd is being assigned a value in a for-loop however it is never read. The assignment is redundant and can be removed. Cleans up clang scan build warning: drivers/infiniband/hw/hfi1/init.c:1030:3: warning: Value stored to 'ppd' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20231023141733.667807-1-colin.i.king@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-24arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=nMaria Yu
The counting of module PLTs has been broken when CONFIG_RANDOMIZE_BASE=n since commit: 3e35d303ab7d22c4 ("arm64: module: rework module VA range selection") Prior to that commit, when CONFIG_RANDOMIZE_BASE=n, the kernel image and all modules were placed within a 128M region, and no PLTs were necessary for B or BL. Hence count_plts() and partition_branch_plt_relas() skipped handling B and BL when CONFIG_RANDOMIZE_BASE=n. After that commit, modules can be placed anywhere within a 2G window regardless of CONFIG_RANDOMIZE_BASE, and hence PLTs may be necessary for B and BL even when CONFIG_RANDOMIZE_BASE=n. Unfortunately that commit failed to update count_plts() and partition_branch_plt_relas() accordingly. Due to this, module_emit_plt_entry() may fail if an insufficient number of PLT entries have been reserved, resulting in modules failing to load with -ENOEXEC. Fix this by counting PLTs regardless of CONFIG_RANDOMIZE_BASE in count_plts() and partition_branch_plt_relas(). Fixes: 3e35d303ab7d ("arm64: module: rework module VA range selection") Signed-off-by: Maria Yu <quic_aiquny@quicinc.com> Cc: <stable@vger.kernel.org> # 6.5.x Acked-by: Ard Biesheuvel <ardb@kernel.org> Fixes: 3e35d303ab7d ("arm64: module: rework module VA range selection") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231024010954.6768-1-quic_aiquny@quicinc.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-24arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helperJames Morse
ACPI, irqchip and the architecture code all inspect the MADT enabled bit for a GICC entry in the MADT. The addition of an 'online capable' bit means all these sites need updating. Move the current checks behind a helper to make future updates easier. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Acked-by: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/E1quv5D-00AeNJ-U8@rmk-PC.armlinux.org.uk Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-24RDMA/mlx5: Change the key being sent for MPV device affiliationPatrisious Haddad
Change the key that we send from IB driver to EN driver regarding the MPV device affiliation, since at that stage the IB device is not yet initialized, so its index would be zero for different IB devices and cause wrong associations between unrelated master and slave devices. Instead use a unique value from inside the core device which is already initialized at this stage. Fixes: 0d293714ac32 ("RDMA/mlx5: Send events from IB driver about device affiliation state") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.1697705185.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-24dt-bindings: timer: fsl,imxgpt: Add optional osc_per clockAlexander Stein
Since commit bad3db104f89 ("ARM: imx: source gpt per clk from OSC for system timer") osc_per can be used for clocking the GPT which is not scaled when entering low bus mode. This clock source is available only on i.MX6Q (incl. i.MX6QP) and i.MX6DL. Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230810144451.1459985-7-alexander.stein@ew.tq-group.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-10-24netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctxPhil Sutter
Relieve the dump callback from having to check nlmsg_type upon each call. Prep work for set element reset locking. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24drm/i915/perf: Determine context valid in OA reportsUmesh Nerlige Ramappa
When supporting OA for TGL, it was seen that the context valid bit in the report ID was not defined, however revisiting the spec seems to have this bit defined. The bit is used to determine if a context is valid on a context switch and is essential to determine active and idle periods for a context. Re-enable the context valid bit for gen12 platforms. BSpec: 52196 (description of report_id) v2: Include BSpec reference (Ashutosh) Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230802202854.1224547-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit 7eeaedf79989a8f131939782832e21e9218ed2a0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-10-24x86/microcode/intel: Add a minimum required revision for late loadingAshok Raj
In general users, don't have the necessary information to determine whether late loading of a new microcode version is safe and does not modify anything which the currently running kernel uses already, e.g. removal of CPUID bits or behavioural changes of MSRs. To address this issue, Intel has added a "minimum required version" field to a previously reserved field in the microcode header. Microcode updates should only be applied if the current microcode version is equal to, or greater than this minimum required version. Thomas made some suggestions on how meta-data in the microcode file could provide Linux with information to decide if the new microcode is suitable candidate for late loading. But even the "simpler" option requires a lot of metadata and corresponding kernel code to parse it, so the final suggestion was to add the 'minimum required version' field in the header. When microcode changes visible features, microcode will set the minimum required version to its own revision which prevents late loading. Old microcode blobs have the minimum revision field always set to 0, which indicates that there is no information and the kernel considers it unsafe. This is a pure OS software mechanism. The hardware/firmware ignores this header field. For early loading there is no restriction because OS visible features are enumerated after the early load and therefore a change has no effect. The check is always enabled, but by default not enforced. It can be enforced via Kconfig or kernel command line. If enforced, the kernel refuses to late load microcode with a minimum required version field which is zero or when the currently loaded microcode revision is smaller than the minimum required revision. If not enforced the load happens independent of the revision check to stay compatible with the existing behaviour, but it influences the decision whether the kernel is tainted or not. If the check signals that the late load is safe, then the kernel is not tainted. Early loading is not affected by this. [ tglx: Massaged changelog and fixed up the implementation ] Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.776467264@linutronix.de
2023-10-24x86/microcode: Prepare for minimal revision checkThomas Gleixner
Applying microcode late can be fatal for the running kernel when the update changes functionality which is in use already in a non-compatible way, e.g. by removing a CPUID bit. There is no way for admins which do not have access to the vendors deep technical support to decide whether late loading of such a microcode is safe or not. Intel has added a new field to the microcode header which tells the minimal microcode revision which is required to be active in the CPU in order to be safe. Provide infrastructure for handling this in the core code and a command line switch which allows to enforce it. If the update is considered safe the kernel is not tainted and the annoying warning message not emitted. If it's enforced and the currently loaded microcode revision is not safe for late loading then the load is aborted. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de
2023-10-24x86/microcode: Handle "offline" CPUs correctlyThomas Gleixner
Offline CPUs need to be parked in a safe loop when microcode update is in progress on the primary CPU. Currently, offline CPUs are parked in mwait_play_dead(), and for Intel CPUs, its not a safe instruction, because the MWAIT instruction can be patched in the new microcode update that can cause instability. - Add a new microcode state 'UCODE_OFFLINE' to report status on per-CPU basis. - Force NMI on the offline CPUs. Wake up offline CPUs while the update is in progress and then return them back to mwait_play_dead() after microcode update is complete. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.660850472@linutronix.de
2023-10-24x86/apic: Provide apic_force_nmi_on_cpu()Thomas Gleixner
When SMT siblings are soft-offlined and parked in one of the play_dead() variants they still react on NMI, which is problematic on affected Intel CPUs. The default play_dead() variant uses MWAIT on modern CPUs, which is not guaranteed to be safe when updated concurrently. Right now late loading is prevented when not all SMT siblings are online, but as they still react on NMI, it is possible to bring them out of their park position into a trivial rendezvous handler. Provide a function which allows to do that. I does sanity checks whether the target is in the cpus_booted_once_mask and whether the APIC driver supports it. Mark X2APIC and XAPIC as capable, but exclude 32bit and the UV and NUMACHIP variants as that needs feedback from the relevant experts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.603100036@linutronix.de
2023-10-24x86/microcode: Protect against instrumentationThomas Gleixner
The wait for control loop in which the siblings are waiting for the microcode update on the primary thread must be protected against instrumentation as instrumentation can end up in #INT3, #DB or #PF, which then returns with IRET. That IRET reenables NMI which is the opposite of what the NMI rendezvous is trying to achieve. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.545969323@linutronix.de
2023-10-24x86/microcode: Rendezvous and load in NMIThomas Gleixner
stop_machine() does not prevent the spin-waiting sibling from handling an NMI, which is obviously violating the whole concept of rendezvous. Implement a static branch right in the beginning of the NMI handler which is nopped out except when enabled by the late loading mechanism. The late loader enables the static branch before stop_machine() is invoked. Each CPU has an nmi_enable in its control structure which indicates whether the CPU should go into the update routine. This is required to bridge the gap between enabling the branch and actually being at the point where it is required to enter the loader wait loop. Each CPU which arrives in the stopper thread function sets that flag and issues a self NMI right after that. If the NMI function sees the flag clear, it returns. If it's set it clears the flag and enters the rendezvous. This is safe against a real NMI which hits in between setting the flag and sending the NMI to itself. The real NMI will be swallowed by the microcode update and the self NMI will then let stuff continue. Otherwise this would end up with a spurious NMI. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.489900814@linutronix.de
2023-10-24x86/microcode: Replace the all-in-one rendevous handlerThomas Gleixner
with a new handler which just separates the control flow of primary and secondary CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.433704135@linutronix.de
2023-10-24x86/microcode: Provide new control functionsThomas Gleixner
The current all in one code is unreadable and really not suited for adding future features like uniform loading with package or system scope. Provide a set of new control functions which split the handling of the primary and secondary CPUs. These will replace the current rendezvous all in one function in the next step. This is intentionally a separate change because diff makes an complete unreadable mess otherwise. So the flow separates the primary and the secondary CPUs into their own functions which use the control field in the per CPU ucode_ctrl struct. primary() secondary() wait_for_all() wait_for_all() apply_ucode() wait_for_release() release() apply_ucode() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.377922731@linutronix.de
2023-10-24x86/microcode: Add per CPU control fieldThomas Gleixner
Add a per CPU control field to ucode_ctrl and define constants for it which are going to be used to control the loading state machine. In theory this could be a global control field, but a global control does not cover the following case: 15 primary CPUs load microcode successfully 1 primary CPU fails and returns with an error code With global control the sibling of the failed CPU would either try again or the whole operation would be aborted with the consequence that the 15 siblings do not invoke the apply path and end up with inconsistent software state. The result in dmesg would be inconsistent too. There are two additional fields added and initialized: ctrl_cpu and secondaries. ctrl_cpu is the CPU number of the primary thread for now, but with the upcoming uniform loading at package or system scope this will be one CPU per package or just one CPU. Secondaries hands the control CPU a CPU mask which will be required to release the secondary CPUs out of the wait loop. Preparatory change for implementing a properly split control flow for primary and secondary CPUs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.319959519@linutronix.de
2023-10-24x86/microcode: Add per CPU result stateThomas Gleixner
The microcode rendezvous is purely acting on global state, which does not allow to analyze fails in a coherent way. Introduce per CPU state where the results are written into, which allows to analyze the return codes of the individual CPUs. Initialize the state when walking the cpu_present_mask in the online check to avoid another for_each_cpu() loop. Enhance the result print out with that. The structure is intentionally named ucode_ctrl as it will gain control fields in subsequent changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.632681010@linutronix.de
2023-10-24x86/microcode: Sanitize __wait_for_cpus()Thomas Gleixner
The code is too complicated for no reason: - The return value is pointless as this is a strict boolean. - It's way simpler to count down from num_online_cpus() and check for zero. - The timeout argument is pointless as this is always one second. - Touching the NMI watchdog every 100ns does not make any sense, neither does checking every 100ns. This is really not a hotpath operation. Preload the atomic counter with the number of online CPUs and simplify the whole timeout logic. Delay for one microsecond and touch the NMI watchdog once per millisecond. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.204251527@linutronix.de
2023-10-24x86/microcode: Clarify the late load logicThomas Gleixner
reload_store() is way too complicated. Split the inner workings out and make the following enhancements: - Taint the kernel only when the microcode was actually updated. If. e.g. the rendezvous fails, then nothing happened and there is no reason for tainting. - Return useful error codes Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20231002115903.145048840@linutronix.de
2023-10-24x86/microcode: Handle "nosmt" correctlyThomas Gleixner
On CPUs where microcode loading is not NMI-safe the SMT siblings which are parked in one of the play_dead() variants still react to NMIs. So if an NMI hits while the primary thread updates the microcode the resulting behaviour is undefined. The default play_dead() implementation on modern CPUs is using MWAIT which is not guaranteed to be safe against a microcode update which affects MWAIT. Take the cpus_booted_once_mask into account to detect this case and refuse to load late if the vendor specific driver does not advertise that late loading is NMI safe. AMD stated that this is safe, so mark the AMD driver accordingly. This requirement will be partially lifted in later changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.087472735@linutronix.de
2023-10-24x86/microcode: Clean up mc_cpu_down_prep()Thomas Gleixner
This function has nothing to do with suspend. It's a hotplug callback. Remove the bogus comment. Drop the pointless debug printk. The hotplug core provides tracepoints which track the invocation of those callbacks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115903.028651784@linutronix.de
2023-10-24x86/microcode: Get rid of the schedule work indirectionThomas Gleixner
Scheduling work on all CPUs to collect the microcode information is just another extra step for no value. Let the CPU hotplug callback registration do it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.354748138@linutronix.de
2023-10-24x86/microcode: Mop up early loading leftoversThomas Gleixner
Get rid of the initrd_gone hack which was required to keep find_microcode_in_initrd() functional after init. As find_microcode_in_initrd() is now only used during init, mark it accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.298854846@linutronix.de
2023-10-24x86/microcode/amd: Use cached microcode for AP loadThomas Gleixner
Now that the microcode cache is initialized before the APs are brought up, there is no point in scanning builtin/initrd microcode during AP loading. Convert the AP loader to utilize the cache, which in turn makes the CPU hotplug callback which applies the microcode after initrd/builtin is gone, obsolete as the early loading during late hotplug operations including the resume path depends now only on the cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.243426023@linutronix.de
2023-10-24x86/microcode/amd: Cache builtin/initrd microcode earlyThomas Gleixner
There is no reason to scan builtin/initrd microcode on each AP. Cache the builtin/initrd microcode in an early initcall so that the early AP loader can utilize the cache. The existing fs initcall which invoked save_microcode_in_initrd_amd() is still required to maintain the initrd_gone flag. Rename it accordingly. This will be removed once the AP loader code is converted to use the cache. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211723.187566507@linutronix.de
2023-10-24x86/microcode/amd: Cache builtin microcode tooThomas Gleixner
save_microcode_in_initrd_amd() fails to cache builtin microcode and only scans initrd. Use find_blobs_in_containers() instead which covers both. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231010150702.495139089@linutronix.de
2023-10-24x86/microcode/amd: Use correct per CPU ucode_cpu_infoThomas Gleixner
find_blobs_in_containers() is invoked on every CPU but overwrites unconditionally ucode_cpu_info of CPU0. Fix this by using the proper CPU data and move the assignment into the call site apply_ucode_from_containers() so that the function can be reused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231010150702.433454320@linutronix.de
2023-10-24x86/microcode: Remove pointless apply() invocationThomas Gleixner
Microcode is applied on the APs during early bringup. There is no point in trying to apply the microcode again during the hotplug operations and neither at the point where the microcode device is initialized. Collect CPU info and microcode revision in setup_online_cpu() for now. This will move to the CPU hotplug callback later. [ bp: Leave the starting notifier for the following scenario: - boot, late load, suspend to disk, resume without the starting notifier, only the last core manages to update the microcode upon resume: # rdmsr -a 0x8b 10000bf 10000bf 10000bf 10000bf 10000bf 10000dc <---- This is on an AMD F10h machine. For the future, one should check whether potential unification of the CPU init path could cover the resume path too so that this can be simplified even more. tglx: This is caused by the odd handling of APs which try to find the microcode blob in builtin or initrd instead of caching the microcode blob during early init before the APs are brought up. Will be cleaned up in a later step. ] Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231017211723.018821624@linutronix.de
2023-10-24x86/microcode/intel: Rework intel_find_matching_signature()Thomas Gleixner
Take a cpu_signature argument and work from there. Move the match() helper next to the callsite as there is no point for having it in a header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.797820205@linutronix.de
2023-10-24x86/microcode/intel: Reuse intel_cpu_collect_info()Thomas Gleixner
No point for an almost duplicate function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.741173606@linutronix.de
2023-10-24x86/microcode/intel: Rework intel_cpu_collect_info()Thomas Gleixner
Nothing needs struct ucode_cpu_info. Make it take struct cpu_signature, let it return a boolean and simplify the implementation. Rename it now that the silly name clash with collect_cpu_info() is gone. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.851573238@linutronix.de
2023-10-24x86/microcode/intel: Unify microcode apply() functionsThomas Gleixner
Deduplicate the early and late apply() functions. [ bp: Rename the function which does the actual application to __apply_microcode() to differentiate it from microcode_ops.apply_microcode(). ] Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231017211722.795508212@linutronix.de
2023-10-24x86/microcode/intel: Switch to kvmalloc()Thomas Gleixner
Microcode blobs are getting larger and might soon reach the kmalloc() limit. Switch over kvmalloc(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.564323243@linutronix.de
2023-10-24x86/microcode/intel: Save the microcode only after a successful late-loadThomas Gleixner
There are situations where the late microcode is loaded into memory but is not applied: 1) The rendezvous fails 2) The microcode is rejected by the CPUs If any of this happens then the pointer which was updated at firmware load time is stale and subsequent CPU hotplug operations either fail to update or create inconsistent microcode state. Save the loaded microcode in a separate pointer before the late load is attempted and when successful, update the hotplug pointer accordingly via a new microcode_ops callback. Remove the pointless fallback in the loader to a microcode pointer which is never populated. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231002115902.505491309@linutronix.de
2023-10-24x86/microcode/intel: Simplify early loadingThomas Gleixner
The early loading code is overly complicated: - It scans the builtin/initrd for microcode not only on the BSP, but also on all APs during early boot and then later in the boot process it scans again to duplicate and save the microcode before initrd goes away. That's a pointless exercise because this can be simply done before bringing up the APs when the memory allocator is up and running. - Saving the microcode from within the scan loop is completely non-obvious and a left over of the microcode cache. This can be done at the call site now which makes it obvious. Rework the code so that only the BSP scans the builtin/initrd microcode once during early boot and save it away in an early initcall for later use. [ bp: Test and fold in a fix from tglx ontop which handles the need to distinguish what save_microcode() does depending on when it is called: - when on the BSP during early load, it needs to find a newer revision than the one currently loaded on the BSP - later, before SMP init, it still runs on the BSP and gets the BSP revision just loaded and uses that revision to know which patch to save for the APs. For that it needs to find the exact one as on the BSP. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231017211722.629085215@linutronix.de
2023-10-24bpf: Improve JEQ/JNE branch taken logicAndrii Nakryiko
When determining if an if/else branch will always or never be taken, use signed range knowledge in addition to currently used unsigned range knowledge. If either signed or unsigned range suggests that condition is always/never taken, return corresponding branch_taken verdict. Current use of unsigned range for this seems arbitrary and unnecessarily incomplete. It is possible for *signed* operations to be performed on register, which could "invalidate" unsigned range for that register. In such case branch_taken will be artificially useless, even if we can still tell that some constant is outside of register value range based on its signed bounds. veristat-based validation shows zero differences across selftests, Cilium, and Meta-internal BPF object files. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/bpf/20231022205743.72352-2-andrii@kernel.org
2023-10-24perf: hisi: Fix use-after-free when register pmu failsJunhao He
When we fail to register the uncore pmu, the pmu context may not been allocated. The error handing will call cpuhp_state_remove_instance() to call uncore pmu offline callback, which migrate the pmu context. Since that's liable to lead to some kind of use-after-free. Use cpuhp_state_remove_instance_nocalls() instead of cpuhp_state_remove_instance() so that the notifiers don't execute after the PMU device has been failed to register. Fixes: a0ab25cd82ee ("drivers/perf: hisi: Add support for HiSilicon PA PMU driver") FIxes: 3bf30882c3c7 ("drivers/perf: hisi: Add support for HiSilicon SLLC PMU driver") Signed-off-by: Junhao He <hejunhao3@huawei.com> Link: https://lore.kernel.org/r/20231024113630.13472-1-hejunhao3@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2023-10-24ASoC: mediatek: mt7986: add sample rate checkerMaso Huang
mt7986 only supports 8/12/16/24/32/48/96/192 kHz Signed-off-by: Maso Huang <maso.huang@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20231024035019.11732-4-maso.huang@mediatek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-10-24ASoC: mediatek: mt7986: remove the mt7986_wm8960_priv structureMaso Huang
Remove the mt7986_wm8960_priv structure. Signed-off-by: Maso Huang <maso.huang@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20231024035019.11732-3-maso.huang@mediatek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-10-24ASoC: mediatek: mt7986: drop the remove callback of mt7986_wm8960Maso Huang
Drop the remove callback of mt7986_wm8960. Signed-off-by: Maso Huang <maso.huang@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20231024035019.11732-2-maso.huang@mediatek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-10-24bpf: Fold smp_mb__before_atomic() into atomic_set_release()Paul E. McKenney
The bpf_user_ringbuf_drain() BPF_CALL function uses an atomic_set() immediately preceded by smp_mb__before_atomic() so as to order storing of ring-buffer consumer and producer positions prior to the atomic_set() call's clearing of the ->busy flag, as follows: smp_mb__before_atomic(); atomic_set(&rb->busy, 0); Although this works given current architectures and implementations, and given that this only needs to order prior writes against a later write. However, it does so by accident because the smp_mb__before_atomic() is only guaranteed to work with read-modify-write atomic operations, and not at all with things like atomic_set() and atomic_read(). Note especially that smp_mb__before_atomic() will not, repeat *not*, order the prior write to "a" before the subsequent non-read-modify-write atomic read from "b", even on strongly ordered systems such as x86: WRITE_ONCE(a, 1); smp_mb__before_atomic(); r1 = atomic_read(&b); Therefore, replace the smp_mb__before_atomic() and atomic_set() with atomic_set_release() as follows: atomic_set_release(&rb->busy, 0); This is no slower (and sometimes is faster) than the original, and also provides a formal guarantee of ordering that the original lacks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/ec86d38e-cfb4-44aa-8fdb-6c925922d93c@paulmck-laptop
2023-10-24bpf: Fix unnecessary -EBUSY from htab_lock_bucketSong Liu
htab_lock_bucket uses the following logic to avoid recursion: 1. preempt_disable(); 2. check percpu counter htab->map_locked[hash] for recursion; 2.1. if map_lock[hash] is already taken, return -BUSY; 3. raw_spin_lock_irqsave(); However, if an IRQ hits between 2 and 3, BPF programs attached to the IRQ logic will not able to access the same hash of the hashtab and get -EBUSY. This -EBUSY is not really necessary. Fix it by disabling IRQ before checking map_locked: 1. preempt_disable(); 2. local_irq_save(); 3. check percpu counter htab->map_locked[hash] for recursion; 3.1. if map_lock[hash] is already taken, return -BUSY; 4. raw_spin_lock(). Similarly, use raw_spin_unlock() and local_irq_restore() in htab_unlock_bucket(). Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked") Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/7a9576222aa40b1c84ad3a9ba3e64011d1a04d41.camel@linux.ibm.com Link: https://lore.kernel.org/bpf/20231012055741.3375999-1-song@kernel.org
2023-10-24ACPI: scan: Rename acpi_scan_device_not_present() to be about enumerationJames Morse
acpi_scan_device_not_present() is called when a device in the hierarchy is not available for enumeration. Historically enumeration was only based on whether the device was present. To add support for only enumerating devices that are both present and enabled, this helper should be renamed. It was only ever about enumeration, rename it acpi_scan_device_not_enumerated(). No change in behaviour is intended. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-24netfilter: nf_tables: set->ops->insert returns opaque set element in case of ↵Pablo Neira Ayuso
EEXIST Return struct nft_elem_priv instead of struct nft_set_ext for consistency with ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv") and to prepare the introduction of element timeout updates from control path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: shrink memory consumption of set elementsPablo Neira Ayuso
Instead of copying struct nft_set_elem into struct nft_trans_elem, store the pointer to the opaque set element object in the transaction. Adapt set backend API (and set backend implementations) to take the pointer to opaque set element representation whenever required. This patch deconstifies .remove() and .activate() set backend API since these modify the set element opaque object. And it also constify nft_set_elem_ext() this provides access to the nft_set_ext struct without updating the object. According to pahole on x86_64, this patch shrinks struct nft_trans_elem size from 216 to 24 bytes. This patch also reduces stack memory consumption by removing the template struct nft_set_elem object, using the opaque set element object instead such as from the set iterator API, catchall elements and the get element command. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24drivers/perf: hisi_pcie: Initialize event->cpu only on successYicong Yang
Initialize the event->cpu only on success. To be more reasonable and keep consistent with other PMUs. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20231024092954.42297-3-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>