summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2021-12-09genirq/msi: Fixup includesThomas Gleixner
Remove the kobject.h include from msi.h as it's not required and add a sysfs.h include to the core code instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20211206210224.103502021@linutronix.de
2021-12-09genirq/msi: Remove unused domain callbacksThomas Gleixner
No users and there is no need to grow them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211126223824.322987915@linutronix.de Link: https://lore.kernel.org/r/20211206210224.041777889@linutronix.de
2021-12-09genirq/msi: Guard sysfs codeThomas Gleixner
No point in building unused code when CONFIG_SYSFS=n. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20211206210223.985907940@linutronix.de
2021-12-09Merge tag 'drm-misc-next-2021-11-29' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.17: UAPI Changes: Cross-subsystem Changes: * Move 'nomodeset' kernel boot option into DRM subsystem Core Changes: * Replace several DRM_*() logging macros with drm_*() equivalents * panel: Add quirk for Lenovo Yoga Book X91F/L * ttm: Documentation fixes Driver Changes: * Cleanup nomodeset handling in drivers * Fixes * bridge/anx7625: Fix reading EDID; Fix error code * bridge/megachips: Probe both bridges before registering * vboxvideo: Fix ERR_PTR usage Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YaSVz15Q7dAlEevU@linux-uq9g.fritz.box
2021-12-08net: wwan: make debugfs optionalSergey Ryazanov
Debugfs interface is optional for the regular modem use. Some distros and users will want to disable this feature for security or kernel size reasons. So add a configuration option that allows to completely disable the debugfs interface of the WWAN devices. A primary considered use case for this option was embedded firmwares. For example, in OpenWrt, you can not completely disable debugfs, as a lot of wireless stuff can only be configured and monitored with the debugfs knobs. At the same time, reducing the size of a kernel and modules is an essential task in the world of embedded software. Disabling the WWAN and IOSM debugfs interfaces allows us to save 50K (x86-64 build) of space for module storage. Not much, but already considerable when you only have 16MB of storage. So it is hard to just disable whole debugfs. Users need some fine grained set of options to control which debugfs interface is important and should be available and which is not. The new configuration symbol is enabled by default and is hidden under the EXPERT option. So a regular user would not be bothered by another one configuration question. While an embedded distro maintainer will be able to a little more reduce the final image size. Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Acked-by: M Chetan Kumar <m.chetan.kumar@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08Merge tag 'linux-can-next-for-5.17-20211208' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== can-next 2021-12-08 The first patch is by Vincent Mailhol and replaces the custom CAN units with generic one form linux/units.h. The next 3 patches are by Evgeny Boger and add Allwinner R40 support to the sun4i CAN driver. Andy Shevchenko contributes 4 patches to the hi311x CAN driver, consisting of cleanups and converting the driver to the device property API. * tag 'linux-can-next-for-5.17-20211208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: hi311x: hi3110_can_probe(): convert to use dev_err_probe() can: hi311x: hi3110_can_probe(): make use of device property API can: hi311x: hi3110_can_probe(): try to get crystal clock rate from property can: hi311x: hi3110_can_probe(): use devm_clk_get_optional() to get the input clock ARM: dts: sun8i: r40: add node for CAN controller can: sun4i_can: add support for R40 CAN controller dt-bindings: net: can: add support for Allwinner R40 CAN controller can: bittiming: replace CAN units with the generic ones from linux/units.h ==================== Link: https://lore.kernel.org/r/20211208125055.223141-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== bpf 2021-12-08 We've added 12 non-merge commits during the last 22 day(s) which contain a total of 29 files changed, 659 insertions(+), 80 deletions(-). The main changes are: 1) Fix an off-by-two error in packet range markings and also add a batch of new tests for coverage of these corner cases, from Maxim Mikityanskiy. 2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh. 3) Fix two functional regressions and a build warning related to BTF kfunc for modules, from Kumar Kartikeya Dwivedi. 4) Fix outdated code and docs regarding BPF's migrate_disable() use on non- PREEMPT_RT kernels, from Sebastian Andrzej Siewior. 5) Add missing includes in order to be able to detangle cgroup vs bpf header dependencies, from Jakub Kicinski. 6) Fix regression in BPF sockmap tests caused by missing detachment of progs from sockets when they are removed from the map, from John Fastabend. 7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF dispatcher, from Björn Töpel. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add selftests to cover packet access corner cases bpf: Fix the off-by-two error in range markings treewide: Add missing includes masked by cgroup -> bpf dependency tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets bpf: Fix bpf_check_mod_kfunc_call for built-in modules bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL mips, bpf: Fix reference to non-existing Kconfig symbol bpf: Make sure bpf_disable_instrumentation() is safe vs preemption. Documentation/locking/locktypes: Update migrate_disable() bits. bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap bpf, sockmap: Attach map progs to psock early for feature probes bpf, x86: Fix "no previous prototype" warning ==================== Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: keep the bridge_dev and bridge_num as part of the same structureVladimir Oltean
The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: make dp->bridge_num one-basedVladimir Oltean
I have seen too many bugs already due to the fact that we must encode an invalid dp->bridge_num as a negative value, because the natural tendency is to check that invalid value using (!dp->bridge_num). Latest example can be seen in commit 1bec0f05062c ("net: dsa: fix bridge_num not getting cleared after ports leaving the bridge"). Convert the existing users to assume that dp->bridge_num == 0 is the encoding for invalid, and valid bridge numbers start from 1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08x86/sev: Use CC_ATTR attribute to generalize string I/O unrollKuppuswamy Sathyanarayanan
INS/OUTS are not supported in TDX guests and cause #UD. Kernel has to avoid them when running in TDX guest. To support existing usage, string I/O operations are unrolled using IN/OUT instructions. AMD SEV platform implements this support by adding unroll logic in ins#bwl()/outs#bwl() macros with SEV-specific checks. Since TDX VM guests will also need similar support, use CC_ATTR_GUEST_UNROLL_STRING_IO and generic cc_platform_has() API to implement it. String I/O helpers were the last users of sev_key_active() interface and sev_enable_key static key. Remove them. [ bp: Move comment too and do not delete it. ] Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20211206135505.75045-2-kirill.shutemov@linux.intel.com
2021-12-08PM: hibernate: Allow ACPI hardware signature to be honouredDavid Woodhouse
Theoretically, when the hardware signature in FACS changes, the OS is supposed to gracefully decline to attempt to resume from S4: "If the signature has changed, OSPM will not restore the system context and can boot from scratch" In practice, Windows doesn't do this and many laptop vendors do allow the signature to change especially when docking/undocking, so it would be a bad idea to simply comply with the specification by default in the general case. However, there are use cases where we do want the compliant behaviour and we know it's safe. Specifically, when resuming virtual machines where we know the hypervisor has changed sufficiently that resume will fail. We really want to be able to *tell* the guest kernel not to try, so it boots cleanly and doesn't just crash. This patch provides a way to opt in to the spec-compliant behaviour on the command line. A follow-up patch may do this automatically for certain "known good" machines based on a DMI match, or perhaps just for all hypervisor guests since there's no good reason a hypervisor would change the hardware_signature that it exposes to guests *unless* it wants them to obey the ACPI specification. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-08PM: runtime: Fix pm_runtime_active() kerneldoc commentRafael J. Wysocki
The kerneldoc comment of pm_runtime_active() does not reflect the behavior of the function, so update it accordingly. Fixes: 403d2d116ec0 ("PM: runtime: Add kerneldoc comments to multiple helpers") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-12-08Merge branch 'kvm-on-hv-msrbm-fix' into HEADPaolo Bonzini
Merge bugfix for enlightened MSR Bitmap, before adding support to KVM for exposing the feature to nested guests.
2021-12-08clk: gate: Add devm_clk_hw_register_gate()Horatiu Vultur
Add devm_clk_hw_register_gate() - devres-managed version of clk_hw_register_gate() Suggested-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Link: https://lore.kernel.org/r/20211103085102.1656081-2-horatiu.vultur@microchip.com
2021-12-08KVM: Add helpers to wake/query blocking vCPUSean Christopherson
Add helpers to wake and query a blocking vCPU. In addition to providing nice names, the helpers reduce the probability of KVM neglecting to use kvm_arch_vcpu_get_wait(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: stats: Add stat to detect if vcpu is currently blockingJing Zhang
Add a "blocking" stat that userspace can use to detect the case where a vCPU is not being run because of an vCPU/guest action, e.g. HLT or WFS on x86, WFI on arm64, etc... Current guest/host/halt stats don't show this well, e.g. if a guest halts for a long period of time then the vCPU could could appear pathologically blocked due to a host condition, when in reality the vCPU has been put into a not-runnable state by the guest. Originally-by: Cannon Matthews <cannonmatthews@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> [sean: renamed stat to "blocking", massaged changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Split out a kvm_vcpu_block() helper from kvm_vcpu_halt()Sean Christopherson
Factor out the "block" part of kvm_vcpu_halt() so that x86 can emulate non-halt wait/sleep/block conditions that should not be subjected to halt-polling. No functional change intended. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()Sean Christopherson
Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting the actual "block" sequences into a separate helper (to be named kvm_vcpu_block()). x86 will use the standalone block-only path to handle non-halt cases where the vCPU is not runnable. Rename block_ns to halt_ns to match the new function name. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Force PPC to define its own rcuwait objectSean Christopherson
Do not define/reference kvm_vcpu.wait if __KVM_HAVE_ARCH_WQP is true, and instead force the architecture (PPC) to define its own rcuwait object. Allowing common KVM to directly access vcpu->wait without a guard makes it all too easy to introduce potential bugs, e.g. kvm_vcpu_block(), kvm_vcpu_on_spin(), and async_pf_execute() all operate on vcpu->wait, not the result of kvm_arch_vcpu_get_wait(), and so may do the wrong thing for PPC. Due to PPC's shenanigans with respect to callbacks and waits (it switches to the virtual core's wait object at KVM_RUN!?!?), it's not clear whether or not this fixes any bugs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: x86/mmu: Propagate memslot const qualifierBen Gardon
In preparation for implementing in-place hugepage promotion, various functions will need to be called from zap_collapsible_spte_range, which has the const qualifier on its memslot argument. Propagate the const qualifier to the various functions which will be needed. This just serves to simplify the following patch. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20211115234603.2908381-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Optimize gfn lookup in kvm_zap_gfn_range()Maciej S. Szmigiero
Introduce a memslots gfn upper bound operation and use it to optimize kvm_zap_gfn_range(). This way this handler can do a quick lookup for intersecting gfns and won't have to do a linear scan of the whole memslot set. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <ef242146a87a335ee93b441dcf01665cb847c902.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Keep memslots in tree-based structures instead of array-based onesMaciej S. Szmigiero
The current memslot code uses a (reverse gfn-ordered) memslot array for keeping track of them. Because the memslot array that is currently in use cannot be modified every memslot management operation (create, delete, move, change flags) has to make a copy of the whole array so it has a scratch copy to work on. Strictly speaking, however, it is only necessary to make copy of the memslot that is being modified, copying all the memslots currently present is just a limitation of the array-based memslot implementation. Two memslot sets, however, are still needed so the VM continues to run on the currently active set while the requested operation is being performed on the second, currently inactive one. In order to have two memslot sets, but only one copy of actual memslots it is necessary to split out the memslot data from the memslot sets. The memslots themselves should be also kept independent of each other so they can be individually added or deleted. These two memslot sets should normally point to the same set of memslots. They can, however, be desynchronized when performing a memslot management operation by replacing the memslot to be modified by its copy. After the operation is complete, both memslot sets once again point to the same, common set of memslot data. This commit implements the aforementioned idea. For tracking of gfns an ordinary rbtree is used since memslots cannot overlap in the guest address space and so this data structure is sufficient for ensuring that lookups are done quickly. The "last used slot" mini-caches (both per-slot set one and per-vCPU one), that keep track of the last found-by-gfn memslot, are still present in the new code. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Use interval tree to do fast hva lookup in memslotsMaciej S. Szmigiero
The current memslots implementation only allows quick binary search by gfn, quick lookup by hva is not possible - the implementation has to do a linear scan of the whole memslots array, even though the operation being performed might apply just to a single memslot. This significantly hurts performance of per-hva operations with higher memslot counts. Since hva ranges can overlap between memslots an interval tree is needed for tracking them. [sean: handle interval tree updates in kvm_replace_memslot()] Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <d66b9974becaa9839be9c4e1a5de97b177b4ac20.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Resolve memslot ID via a hash table instead of via a static arrayMaciej S. Szmigiero
Memslot ID to the corresponding memslot mappings are currently kept as indices in static id_to_index array. The size of this array depends on the maximum allowed memslot count (regardless of the number of memslots actually in use). This has become especially problematic recently, when memslot count cap was removed, so the maximum count is now full 32k memslots - the maximum allowed by the current KVM API. Keeping these IDs in a hash table (instead of an array) avoids this problem. Resolving a memslot ID to the actual memslot (instead of its index) will also enable transitioning away from an array-based implementation of the whole memslots structure in a later commit. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <117fb2c04320e6cd6cf34f205a72eadb0aa8d5f9.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Integrate gfn_to_memslot_approx() into search_memslots()Maciej S. Szmigiero
s390 arch has gfn_to_memslot_approx() which is almost identical to search_memslots(), differing only in that in case the gfn falls in a hole one of the memslots bordering the hole is returned. Add this lookup mode as an option to search_memslots() so we don't have two almost identical functions for looking up a memslot by its gfn. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [sean: tweaked helper names to keep gfn_to_memslot_approx() in s390] Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <171cd89b52c718dbe180ecd909b4437a64a7e2ec.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Stop passing kvm_userspace_memory_region to arch memslot hooksSean Christopherson
Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now that its use has been removed in all architectures. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Let/force architectures to deal with arch specific memslot dataSean Christopherson
Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch code to handle propagating arch specific data from "new" to "old" when necessary. This is a baby step towards dynamically allocating "new" from the get go, and is a (very) minor performance boost on x86 due to not unnecessarily copying arch data. For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE and FLAGS_ONLY. This is functionally a nop as the previous behavior would overwrite the pointer for CREATE, and eventually discard/ignore it for DELETE. For x86, copy the arch data only for FLAGS_ONLY changes. Unlike PPC HV, x86 needs to reallocate arch data in the MOVE case as the size of x86's allocations depend on the alignment of the memslot's gfn. Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to match the "commit" prototype. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> [mss: add missing RISCV kvm_arch_prepare_memory_region() change] Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Require total number of memslot pages to fit in an unsigned longSean Christopherson
Explicitly disallow creating more memslot pages than can fit in an unsigned long, KVM doesn't correctly handle a total number of memslot pages that doesn't fit in an unsigned long and remedying that would be a waste of time. For a 64-bit kernel, this is a nop as memslots are not allowed to overlap in the gfn address space. With a 32-bit kernel, userspace can at most address 3gb of virtual memory, whereas wrapping the total number of pages would require 4tb+ of guest physical memory. Even with x86's second address space for SMM, userspace would need to alias all of guest memory more than one _thousand_ times. And on older x86 hardware with MAXPHYADDR < 43, the guest couldn't actually access any of those aliases even if userspace lied about guest.MAXPHYADDR. On 390 and arm64, this is a nop as they don't support 32-bit hosts. On x86, practically speaking this is simply acknowledging reality as the existing kvm_mmu_calculate_default_mmu_pages() assumes the total number of pages fits in an "unsigned long". On PPC, this is likely a nop as every flavor of PPC KVM assumes gfns (and gpas!) fit in unsigned long. arch/powerpc/kvm/book3s_32_mmu_host.c goes a step further and fails the build if CONFIG_PTE_64BIT=y, which presumably means that it does't support 64-bit physical addresses. On MIPS, this is also likely a nop as the core MMU helpers assume gpas fit in unsigned long, e.g. see kvm_mips_##name##_pte. And finally, RISC-V is a "don't care" as it doesn't exist in any release, i.e. there is no established ABI to break. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <1c2c91baf8e78acccd4dad38da591002e61c013c.1638817638.git.maciej.szmigiero@oracle.com>
2021-12-08KVM: Convert kvm_for_each_vcpu() to using xa_for_each_range()Marc Zyngier
Now that the vcpu array is backed by an xarray, use the optimised iterator that matches the underlying data structure. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-8-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s indexMarc Zyngier
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu index. Unfortunately, we're about to move rework the iterator, which requires this to be upgrade to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Convert the kvm->vcpus array to a xarrayMarc Zyngier
At least on arm64 and x86, the vcpus array is pretty huge (up to 1024 entries on x86) and is mostly empty in the majority of the cases (running 1k vcpu VMs is not that common). This mean that we end-up with a 4kB block of unused memory in the middle of the kvm structure. Instead of wasting away this memory, let's use an xarray instead, which gives us almost the same flexibility as a normal array, but with a reduced memory usage with smaller VMs. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-6-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: Move wiping of the kvm->vcpus array to common codeMarc Zyngier
All architectures have similar loops iterating over the vcpus, freeing one vcpu at a time, and eventually wiping the reference off the vcpus array. They are also inconsistently taking the kvm->lock mutex when wiping the references from the array. Make this code common, which will simplify further changes. The locking is dropped altogether, as this should only be called when there is no further references on the kvm structure. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-2-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08can: bittiming: replace CAN units with the generic ones from linux/units.hVincent Mailhol
In [1], we introduced a set of units in linux/can/bittiming.h. Since then, generic SI prefixes were added to linux/units.h in [2]. Those new prefixes can perfectly replace CAN specific ones. This patch replaces all occurrences of the CAN units with their corresponding prefix (from linux/units) and the unit (as a comment) according to below table. CAN units SI metric prefix (from linux/units) + unit (as a comment) ------------------------------------------------------------------------ CAN_KBPS KILO /* BPS */ CAN_MBPS MEGA /* BPS */ CAM_MHZ MEGA /* Hz */ The definition are then removed from linux/can/bittiming.h [1] commit 1d7750760b70 ("can: bittiming: add CAN_KBPS, CAN_MBPS and CAN_MHZ macros") [2] commit 26471d4a6cf8 ("units: Add SI metric prefix definitions") Link: https://lore.kernel.org/all/20211124014536.782550-1-mailhol.vincent@wanadoo.fr Suggested-by: Jimmy Assarsson <extja@kvaser.com> Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07net: phy: Remove unnecessary indentation in the comments of phy_deviceYanteng Si
Fix warning as: linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation. linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent. linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation. Suggested-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07Merge tag 'wireless-drivers-next-2021-12-07' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.17 First set of patches for v5.17. The biggest change is the iwlmei driver for Intel's AMT devices. Also now WCN6855 support in ath11k should be usable. Major changes: ath10k * fetch (pre-)calibration data via nvmem subsystem ath11k * enable 802.11 power save mode in station mode for qca6390 and wcn6855 * trace log support * proper board file detection for WCN6855 based on PCI ids * BSS color change support rtw88 * add debugfs file to force lowest basic rate * add quirk to disable PCI ASPM on HP 250 G7 Notebook PC mwifiex * add quirk to disable deep sleep with certain hardware revision in Surface Book 2 devices iwlwifi * add iwlmei driver for co-operating with Intel's Active Management Technology (AMT) devices * tag 'wireless-drivers-next-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (87 commits) iwlwifi: mei: fix linking when tracing is not enabled rtlwifi: rtl8192de: Style clean-ups mwl8k: Use named struct for memcpy() region intersil: Use struct_group() for memcpy() region libertas_tf: Use struct_group() for memcpy() region libertas: Use struct_group() for memcpy() region wlcore: no need to initialise statics to false rsi: Fix out-of-bounds read in rsi_read_pkt() rsi: Fix use-after-free in rsi_rx_done_handler() brcmfmac: Configure keep-alive packet on suspend wilc1000: remove '-Wunused-but-set-variable' warning in chip_wakeup() iwlwifi: mvm: read the rfkill state and feed it to iwlmei iwlwifi: mvm: add vendor commands needed for iwlmei iwlwifi: integrate with iwlmei iwlwifi: mei: add debugfs hooks iwlwifi: mei: add the driver to allow cooperation with CSME mei: bus: add client dma interface mwifiex: Ignore BTCOEX events from the 88W8897 firmware mwifiex: Ensure the version string from the firmware is 0-terminated mwifiex: Add quirk to disable deep sleep with certain hardware revision ... ==================== Link: https://lore.kernel.org/r/20211207144211.A9949C341C1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: watchdog: add net device refcount trackerEric Dumazet
Add a netdevice_tracker inside struct net_device, to track the self reference when a device has an active watchdog timer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07vlan: add net device refcount trackerEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: eql: add net device refcount trackerEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07locktorture,rcutorture,torture: Always log error messageLi Zhijian
Unconditionally log messages corresponding to errors. Acked-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07rcu/nocb: Invoke rcu_core() at the start of deoffloadingFrederic Weisbecker
On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process, some work, such as callbacks acceleration and invocation, may be left unattended due to the volatile checks on the offloaded state. In the worst case this work is postponed until the next rcu_pending() check that can take a jiffy to reach, which can be a problem in case of callbacks flooding. Solve that with invoking rcu_core() early in the de-offloading process. This way any work dismissed by an ongoing rcu_core() call fooled by a preempting deoffloading process will be caught up by a nearby future recall to rcu_core(), this time fully aware of the de-offloading state. Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07rcu/nocb: Prepare state machine for a new stepFrederic Weisbecker
Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the segcblist flags because it is an exclusive state that doesn't mix up with the other flags. Remove it in favour of: _ A flag specifying that rcu_core() needs to perform callbacks execution and acceleration and _ A flag specifying we want the nocb lock to be held in any needed circumstances This clarifies the code and is more flexible: It allows to have a state where rcu_core() runs with locking while offloading hasn't started yet. This is a necessary step to prepare for triggering rcu_core() at the very beginning of the de-offloading process so that rcu_core() won't dismiss work while being preempted by the de-offloading process, at least not without a pending subsequent rcu_core() that will quickly catch up. Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-12-07tcp: expose __tcp_sock_set_cork and __tcp_sock_set_nodelayMaxim Galaganov
Expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use in MPTCP setsockopt code -- namely for syncing MPTCP socket options with subflows inside sync_socket_options() while already holding the subflow socket lock. Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07skbuff: introduce skb_pull_dataLuiz Augusto von Dentz
Like skb_pull but returns the original data pointer before pulling the data after performing a check against sbk->len. This allows to change code that does "struct foo *p = (void *)skb->data;" which is hard to audit and error prone, to: p = skb_pull_data(skb, sizeof(*p)); if (!p) return; Which is both safer and cleaner. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-12-07regulator: fix bullet lists of regulator_ops commentYanteng Si
Since 89a6a5e56c82("regulator: add property parsing and callbacks to set protection limits") which introduced a warning: Documentation/driver-api/regulator:166: ./include/linux/regulator/driver.h:96: WARNING: Unexpected indentation. Documentation/driver-api/regulator:166: ./include/linux/regulator/driver.h:98: WARNING: Block quote ends without a blank line; unexpected unindent. Let's fix them. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20211207123230.2262047-1-siyanteng@loongson.cn Signed-off-by: Mark Brown <broonie@kernel.org>
2021-12-07locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.hSebastian Andrzej Siewior
The printk header file includes ratelimit_types.h for its __ratelimit() based usage. It is required for the static initializer used in printk_ratelimited(). It uses a raw_spinlock_t and includes the spinlock_types.h. PREEMPT_RT substitutes spinlock_t with a rtmutex based implementation and so its spinlock_t implmentation (provided by spinlock_rt.h) includes rtmutex.h and atomic.h which leads to recursive includes where defines are missing. By including only the raw_spinlock_t defines it avoids the atomic.h related includes at this stage. An example on powerpc: | CALL scripts/atomic/check-atomics.sh |In file included from include/linux/bug.h:5, | from include/linux/page-flags.h:10, | from kernel/bounds.c:10: |arch/powerpc/include/asm/page_32.h: In function ‘clear_page’: |arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function â=80=98__WARNâ=80=99 [-Werror=3Dimplicit-function-declaration] | 87 | __WARN(); \ | | ^~~~~~ |arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99 | 48 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | | ^~~~~~~ |arch/powerpc/include/asm/bug.h:58:17: error: invalid application of ‘sizeofâ€=99 to incomplete type ‘struct bug_entryâ€=99 | 58 | "i" (sizeof(struct bug_entry)), \ | | ^~~~~~ |arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro ‘BUG_ENTRYâ€=99 | 89 | BUG_ENTRY(PPC_TLNEI " %4, 0", \ | | ^~~~~~~~~ |arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99 | 48 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | | ^~~~~~~ |In file included from arch/powerpc/include/asm/ptrace.h:298, | from arch/powerpc/include/asm/hw_irq.h:12, | from arch/powerpc/include/asm/irqflags.h:12, | from include/linux/irqflags.h:16, | from include/asm-generic/cmpxchg-local.h:6, | from arch/powerpc/include/asm/cmpxchg.h:526, | from arch/powerpc/include/asm/atomic.h:11, | from include/linux/atomic.h:7, | from include/linux/rwbase_rt.h:6, | from include/linux/rwlock_types.h:55, | from include/linux/spinlock_types.h:74, | from include/linux/ratelimit_types.h:7, | from include/linux/printk.h:10, | from include/asm-generic/bug.h:22, | from arch/powerpc/include/asm/bug.h:109, | from include/linux/bug.h:5, | from include/linux/page-flags.h:10, | from kernel/bounds.c:10: |include/linux/thread_info.h: In function â=80=98copy_overflowâ=80=99: |include/linux/thread_info.h:210:2: error: implicit declaration of function â=80=98WARNâ=80=99 [-Werror=3Dimplicit-function-declaration] | 210 | WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count); | | ^~~~ The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN (from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON() statements. On POWERPC64 there are missing atomic64 defines while building 32bit VDSO: | VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o |In file included from include/linux/atomic.h:80, | from include/linux/rwbase_rt.h:6, | from include/linux/rwlock_types.h:55, | from include/linux/spinlock_types.h:74, | from include/linux/ratelimit_types.h:7, | from include/linux/printk.h:10, | from include/linux/kernel.h:19, | from arch/powerpc/include/asm/page.h:11, | from arch/powerpc/include/asm/vdso/gettimeofday.h:5, | from include/vdso/datapage.h:137, | from lib/vdso/gettimeofday.c:5, | from <command-line>: |include/linux/atomic-arch-fallback.h: In function ‘arch_atomic64_incâ€=99: |include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function ‘arch_atomic64_add’; did you mean ‘arch_atomic_add’? [-Werror=3Dimpl |icit-function-declaration] | 1447 | arch_atomic64_add(1, v); | | ^~~~~~~~~~~~~~~~~ | | arch_atomic_add The generic fallback is not included, atomics itself are not used. If kernel.h does not include printk.h then it comes later from the bug.h include. Allow asm/spinlock_types.h to be included from linux/spinlock_types_raw.h. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211129174654.668506-12-bigeasy@linutronix.de
2021-12-07KVM: Drop stale kvm_is_transparent_hugepage() declarationVitaly Kuznetsov
kvm_is_transparent_hugepage() was removed in commit 205d76ff0684 ("KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") but its declaration in include/linux/kvm_host.h persisted. Drop it. Fixes: 205d76ff0684 (""KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap()") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211018151407.2107363-1-vkuznets@redhat.com
2021-12-06net: fix recent csum changesEric Dumazet
Vladimir reported csum issues after my recent change in skb_postpull_rcsum() Issue here is the following: initial skb->csum is the csum of [part to be pulled][rest of packet] Old code: skb->csum = csum_sub(skb->csum, csum_partial(pull, pull_length, 0)); New code: skb->csum = ~csum_partial(pull, pull_length, ~skb->csum); This is broken if the csum of [pulled part] happens to be equal to skb->csum, because end result of skb->csum is 0 in new code, instead of being 0xffffffff David Laight suggested to use skb->csum = -csum_partial(pull, pull_length, -skb->csum); I based my patches on existing code present in include/net/seg6.h, update_csum_diff4() and update_csum_diff16() which might need a similar fix. I guess that my tests, mostly pulling 40 bytes of IPv6 header were not providing enough entropy to hit this bug. v2: added wsum_negate() to make sparse happy. Fixes: 29c3002644bd ("net: optimize skb_postpull_rcsum()") Fixes: 0bd28476f636 ("gro: optimize skb_gro_postpull_rcsum()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Suggested-by: David Laight <David.Laight@ACULAB.COM> Cc: David Lebrun <dlebrun@google.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211204045356.3659278-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06netpoll: add net device refcount tracker to struct netpollEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06ipmr, ip6mr: add net device refcount tracker to struct vif_deviceEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-06net: linkwatch: add net device refcount trackerEric Dumazet
Add a netdevice_tracker inside struct net_device, to track the self reference when a device is in lweventlist. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>