path: root/include/linux
Age | Commit message | Author
2022-11-18KVM: x86: avoid memslot check in NX hugepage recovery if it cannot succeedPaolo Bonzini
Since gfn_to_memslot() is relatively expensive, it helps to skip it if the memslot cannot possibly have dirty logging enabled. In order to do this, add to struct kvm a counter of the number of memslots with dirty logging enabled. While the correct value can only be read with slots_lock taken, the NX recovery thread is content with using an approximate value. Therefore, the counter is an atomic_t. Based on https://lore.kernel.org/kvm/20221027200316.2221027-2-dmatlack@google.com/ by David Matlack. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
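A minimal sketch of the fast path this enables inside the recovery loop (the field name is an assumption based on the description above, not verified against the tree):

    /* Sketch only: skip the expensive lookup when no memslot can have
     * dirty logging enabled. The approximate, lock-free read is fine
     * for the NX recovery thread. */
    if (atomic_read(&kvm->nr_memslots_dirty_logging) == 0)
        continue;   /* hypothetical loop body: nothing to check here */
    slot = gfn_to_memslot(kvm, gfn);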
2022-11-18regulator: Add of_regulator_bulk_get_all()Mark Brown
Merge series from Corentin Labbe <clabbe@baylibre.com>: This adds a new of_regulator_bulk_get_all() which grabs all supply properties in a DT node, for use in implementing generic handling for things like MDIO PHYs where the physical standardisation of the bus does not extend to power supplies.
2022-11-18regulator: Add of_regulator_bulk_get_allCorentin Labbe
It works exactly like regulator_bulk_get(), but instead of working on a provided list of names, it seeks out all consumer properties matching xxx-supply. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Link: https://lore.kernel.org/r/20221115073603.3425396-2-clabbe@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
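A hedged usage sketch; the prototype is inferred from the description above, and foo_get_supplies() is a hypothetical caller:

    /* Assumed prototype:
     * int of_regulator_bulk_get_all(struct device *dev,
     *                               struct device_node *np,
     *                               struct regulator_bulk_data **consumers);
     * Returns the number of supplies found, or a negative errno;
     * *consumers is allocated and filled on success. */
    static int foo_get_supplies(struct device *dev)
    {
        struct regulator_bulk_data *consumers;
        int num;

        num = of_regulator_bulk_get_all(dev, dev->of_node, &consumers);
        if (num < 0)
            return num;     /* zero supplies is not an error */

        return regulator_bulk_enable(num, consumers);
    }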
2022-11-18ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accessesMark Rutland
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch adds new ftrace_regs_{get,set}_*() helpers which can be used to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y, these can always be used on any ftrace_regs, and when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are available. A new ftrace_regs_has_args(fregs) helper is added which code can use to check when these are usable. Co-developed-by: Florent Revest <revest@chromium.org> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
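A sketch of the helper shape, assuming the generic pt_regs-backed defaults (illustrative, not verbatim from <linux/ftrace.h>):

    #define ftrace_regs_get_instruction_pointer(fregs) \
        instruction_pointer(arch_ftrace_get_regs(fregs))
    #define ftrace_regs_set_instruction_pointer(fregs, ip) \
        instruction_pointer_set(arch_ftrace_get_regs(fregs), ip)

    /* True when the regs in fregs are usable; always true when
     * CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y. */
    static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs)
    {
        if (IS_ENABLED(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS))
            return true;
        return ftrace_get_regs(fregs) != NULL;
    }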
2022-11-18ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()Mark Rutland
In subsequent patches we'll add a slew of ftrace_regs_{get,set}_*() helpers. In preparation, this patch renames ftrace_instruction_pointer_set() to ftrace_regs_set_instruction_pointer(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18ftrace: pass fregs to arch_ftrace_set_direct_caller()Mark Rutland
In subsequent patches we'll arrange for architectures to have an ftrace_regs which is entirely distinct from pt_regs. In preparation for this, we need to minimize the use of pt_regs to where strictly necessary in the core ftrace code. This patch changes the prototype of arch_ftrace_set_direct_caller() to take ftrace_regs rather than pt_regs, and moves the extraction of the pt_regs into arch_ftrace_set_direct_caller(). On x86, arch_ftrace_set_direct_caller() can be used even when CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n, and <linux/ftrace.h> defines struct ftrace_regs. Due to this, it's necessary to define arch_ftrace_set_direct_caller() as a macro to avoid using an incomplete type. I've also moved the body of arch_ftrace_set_direct_caller() after the CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y definition of struct ftrace_regs. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20221103170520.931305-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18Merge tag 'wireless-next-2022-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-nextDavid S. Miller
Kalle Valo says:
====================
wireless-next patches for v6.2

Second set of patches for v6.2. Only driver patches this time, nothing really special. Unused platform data support was removed from wl1251 and rtw89 got WoWLAN support.

Major changes:

ath11k
 * support configuring channel dwell time during scan

rtw89
 * new dynamic header firmware format support
 * Wake-over-WLAN support

rtl8xxxu
 * enable IEEE80211_HW_SUPPORT_FAST_XMIT
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18mmc: core: Fix ambiguous TRIM and DISCARD argChristian Löhle
Clean up the MMC_TRIM_ARGS define that became ambiguous with the introduction of DISCARD. While at it, let's fix one usage where MMC_TRIM_ARGS falsely included DISCARD too. Fixes: b3bf915308ca ("mmc: core: new discard feature support at eMMC v4.5") Signed-off-by: Christian Loehle <cloehle@hyperstone.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/11376b5714964345908f3990f17e0701@hyperstone.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-11-18efi: random: combine bootloader provided RNG seed with RNG protocol outputArd Biesheuvel
Instead of blindly creating the EFI random seed configuration table if the RNG protocol is implemented and works, check whether such an EFI configuration table was provided by an earlier boot stage and if so, concatenate the existing and the new seeds, leaving it up to the core code to mix it in and credit it the way it sees fit. This can be used for, e.g., systemd-boot, to pass an additional seed to Linux in a way that can be consumed by the kernel very early. In that case, the following definitions should be used to pass the seed to the EFI stub:

    struct linux_efi_random_seed {
        u32 size;    // of the 'seed' array in bytes
        u8  seed[];
    };

The memory for the struct must be allocated as EFI_ACPI_RECLAIM_MEMORY pool memory, and the address of the struct in memory should be installed as an EFI configuration table using the following GUID:

    LINUX_EFI_RANDOM_SEED_TABLE_GUID    1ce1e5bc-7ceb-42f2-81e5-8aadf180f57b

Note that doing so is safe even on kernels that were built without this patch applied, but the seed will simply be overwritten with a seed derived from the EFI RNG protocol, if available. The recommended seed size is 32 bytes, and seeds larger than 512 bytes are considered corrupted and ignored entirely. In order to preserve forward secrecy, seeds from previous bootloaders are memzero'd out, and in order to preserve memory, those older seeds are also freed from memory. Freeing from memory without first memzeroing is not safe to do, as it's possible that nothing else will ever overwrite those pages used by EFI. Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> [ardb: incorporate Jason's followup changes to extend the maximum seed size on the consumer end, memzero() it and drop a needless printk] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18crypto: hisilicon/qm - modify the process of regs dfxKai Ye
The last-register logic and the different-register logic are combined. Use 'u32' instead of 'int' in the regs function input parameter to simplify some checks. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-18crypto: hisilicon/qm - add missing pci_dev_put() in q_num_set()Xiongfeng Wang
pci_get_device() will increase the reference count for the returned pci_dev. We need to use pci_dev_put() to decrease the reference count before q_num_set() returns. Fixes: c8b4b477079d ("crypto: hisilicon - add HiSilicon HPRE accelerator") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-18hwrng: core - treat default_quality as a maximum and default to 1024Jason A. Donenfeld
Most hw_random devices return entropy which is assumed to be of full quality, but driver authors don't bother setting the quality knob. Some hw_random devices return less than full quality entropy, and then driver authors set the quality knob. Therefore, the entropy crediting should be opt-out rather than opt-in per-driver, to reflect the actual reality on the ground. For example, the two Raspberry Pi RNG drivers produce full entropy randomness, and both EDK2 and U-Boot's drivers for these treat them as such. The result is that EFI then uses these numbers and passes them to Linux, and Linux credits them at boot, thereby initializing the RNG. Yet, in Linux, the quality knob was never set to anything, and so on the chance that Linux is booted without EFI, nothing is ever credited. That's annoying. The same pattern appears to repeat itself throughout various drivers. In fact, very, very few drivers have bothered setting quality=1024. Looking at the git history of existing drivers and corresponding mailing list discussion, this conclusion tracks. There's been a decent amount of discussion about drivers that set quality < 1024 -- somebody read and interpreted a datasheet, or made some back of the envelope calculation somehow. But there's been very little, if any, discussion about most drivers where the quality is just set to 1024 or unset (or set to 1000 when the authors misunderstood the API and assumed it was base-10 rather than base-2); in both cases the intent was fairly clear of, "this is a hardware random device; it's fine." So let's invert this logic. A hw_random struct's quality knob now controls the maximum quality a driver can produce, or 0 to specify 1024. Then, the module-wide switch called "default_quality" is changed to represent the maximum quality of any driver. By default it's 1024, and the quality of any particular driver is then given by:

    min(default_quality, rng->quality ?: 1024);

This way, the user can still turn this off for weird reasons (and we can replace whatever driver-specific disabling hacks existed in the past), yet we get proper crediting for relevant RNGs. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
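The crediting rule above reduces to a couple of lines; a sketch of how the core might compute the effective per-driver quality (illustrative only, the helper name is hypothetical):

    static u16 rng_effective_quality(const struct hwrng *rng)
    {
        /* 0 now means "full quality" (1024 entropy bits per 1024 data
         * bits); default_quality is the module-wide cap, 1024 by default. */
        u16 quality = rng->quality ? rng->quality : 1024;

        return min_t(u16, default_quality, quality);
    }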
2022-11-18Merge tag 'intel-pinctrl-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/intel into develLinus Walleij
intel-pinctrl for v6.2-1

* Add Intel Moorefield pin control driver
* Deduplicate COMMUNITY() macro in the Intel pin control drivers
* Switch Freescale GPIO driver to use fwnode instead of of_node
* Miscellaneous cleanups here and there

The following is an automated git shortlog grouped by driver:

alderlake:
 - Deduplicate COMMUNITY macro code

cannonlake:
 - Deduplicate COMMUNITY macro code

device property:
 - Introduce fwnode_device_is_compatible() helper

icelake:
 - Deduplicate COMMUNITY macro code

intel:
 - Add Intel Moorefield pin controller support
 - Use temporary variable for struct device
 - Use str_enable_disable() helper

merrifield:
 - Use temporary variable for struct device

qcom:
 - lpass-lpi: Add missed bitfield.h

soc:
 - fsl: qe: Switch to use fwnode instead of of_node

sunrisepoint:
 - Deduplicate COMMUNITY macro code

tigerlake:
 - Deduplicate COMMUNITY macro code
2022-11-18efi/cper, cxl: Decode CXL Error LogSmita Koralahalli
Print the CXL Error Log field as found in CXL Protocol Error Section. The CXL RAS Capability structure will be reused by OS First Handling and the duplication/appropriate placement will be addressed eventually. Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18efi: x86: Move EFI runtime map sysfs code to arch/x86Ard Biesheuvel
The EFI runtime map code is only wired up on x86, which is the only architecture that has a need for it in its implementation of kexec. So let's move this code under arch/x86 and drop all references to it from generic code. To ensure that efi_runtime_map_init() is invoked at the appropriate time, use a 'sync' subsys_initcall() that will be called right after the EFI initcall made from generic code where the original invocation of efi_runtime_map_init() resided. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Dave Young <dyoung@redhat.com>
2022-11-18efi: memmap: Move manipulation routines into x86 arch treeArd Biesheuvel
The EFI memory map is a description of the memory layout as provided by the firmware, and only x86 manipulates it in various different ways for its own memory bookkeeping. So let's move the memmap routines that are only used by x86 into the x86 arch tree. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18efi: memmap: Move EFI fake memmap support into x86 arch treeArd Biesheuvel
The EFI fake memmap support is specific to x86, which manipulates the EFI memory map in various different ways after receiving it from the EFI stub. On other architectures, we have managed to push back on this, and the EFI memory map is kept pristine. So let's move the fake memmap code into the x86 arch tree, where it arguably belongs. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18efi: libstub: Implement devicepath support for initrd commandline loaderArd Biesheuvel
Currently, the initrd= command line option to the EFI stub only supports loading files that reside on the same volume as the loaded image, which is not workable for loaders like GRUB that don't even implement the volume abstraction (EFI_SIMPLE_FILE_SYSTEM_PROTOCOL), and load the kernel from an anonymous buffer in memory. For this reason, another method was devised that relies on the LoadFile2 protocol. However, the command line loader is rather useful when using the UEFI shell or other generic loaders that have no awareness of Linux specific protocols, so let's make it a bit more flexible by permitting textual device paths to be provided to initrd= as well, provided that they refer to a file hosted on an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL volume. E.g., initrd=PciRoot(0x0)/Pci(0x3,0x0)/HD(1,MBR,0xBE1AFDFA,0x3F,0xFBFC1)/rootfs.cpio.gz Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-11-18Merge tag 'efi-zboot-direct-for-v6.2' into efi/nextArd Biesheuvel
2022-11-17Input: max8997 - convert to modern way to get a reference to a PWMUwe Kleine-König
pwm_request() isn't recommended to be used any more because it relies on a global ID for the PWM, which brings various difficulties. The new way to do things is to find the right PWM using a reference from the platform device. (This can be created either using a device-tree or a platform lookup table, see e.g. commit 5a4412d4a82f ("ARM: pxa: tavorevb: Use PWM lookup table") for how to do this.) There are no in-tree users, so there are no other code locations that need adaptation. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221117073543.3790449-1-u.kleine-koenig@pengutronix.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-11-17sctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.hXin Long
Move these two macros from net/sctp/sctp.h to linux/sctp.h, so that it will be enough to include only linux/sctp.h in nft_exthdr.c and xt_sctp.c. A module should not include net/sctp/sctp.h unless it actually depends on the SCTP module. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Link: https://lore.kernel.org/r/ef6468a687f36da06f575c2131cd4612f6b7be88.1668526821.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
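For reference, the two macros in question round an SCTP chunk length up or down to a 4-byte boundary; their definitions are essentially:

    /* Round a length up, or truncate it down, to a 4-byte boundary. */
    #define SCTP_PAD4(s)   (((s) + 3) & ~3)
    #define SCTP_TRUNC4(s) ((s) & ~3)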
2022-11-17ethtool: doc: clarify what drivers can implement in their get_drvinfo()Vincent Mailhol
Many of the drivers which implement ethtool_ops::get_drvinfo() print the .driver, .version or .bus_info of struct ethtool_drvinfo. To get a glance of the current state, do:

    $ git grep -W "get_drvinfo(struct"

Filling in those three fields is useless because:

 - since [1], the driver version should be the kernel version (at least for upstream drivers). Arguably, out-of-tree drivers might still want to set a custom version, but out of tree is not our focus.
 - since [2], the core is able to provide default values for .driver and .bus_info.

In summary, drivers may provide .fw_version and .erom_version; the rest is expected to be done by the core. In the struct ethtool_ops doc from linux/ethtool.h: rephrase the get_drvinfo() field doc to discourage developers from implementing this callback. In the struct ethtool_drvinfo doc from uapi/linux/ethtool.h: remove the paragraph mentioning what drivers should do. Rationale: no need to repeat what is already written in the struct ethtool_ops doc. But add a note that .fw_version and .erom_version are driver defined. Also update the dummy driver and simply remove the callback in order not to confuse newcomers: most drivers will not need this callback function any more. [1] commit 6a7e25c7fb48 ("net/core: Replace driver version to be kernel version") Link: https://git.kernel.org/torvalds/linux/c/6a7e25c7fb48 [2] commit edaf5df22cb8 ("ethtool: ethtool_get_drvinfo: populate drvinfo fields even if callback exits") Link: https://git.kernel.org/netdev/net-next/c/edaf5df22cb8 Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/r/20221116171828.4093-1-mailhol.vincent@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
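A sketch of the residual use case: a driver that genuinely has firmware version information fills in only that, leaving the rest to the core (foo_priv and its fields are hypothetical):

    static void foo_get_drvinfo(struct net_device *netdev,
                                struct ethtool_drvinfo *drvinfo)
    {
        struct foo_priv *priv = netdev_priv(netdev);

        /* .driver, .bus_info and .version are filled in by the core. */
        snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
                 "%u.%u", priv->fw_major, priv->fw_minor);
    }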
2022-11-17bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}Kumar Kartikeya Dwivedi
This commit implements the delayed release logic for bpf_list_push_front and bpf_list_push_back. Once a node has been added to the list, its pointer changes to PTR_UNTRUSTED. However, it is only released once the lock protecting the list is unlocked. For such PTR_TO_BTF_ID | MEM_ALLOC with PTR_UNTRUSTED set but an active ref_obj_id, it is still permitted to read them as long as the lock is held. Writing to them is not allowed. This allows having read access to pushed items we no longer own until we release the lock guarding the list, allowing a little more flexibility when working with these APIs. Note that enabling write support has fairly tricky interactions with what happens inside the critical section. Just as an example, currently, bpf_obj_drop is not permitted, but if it were, being able to write to the PTR_UNTRUSTED pointer while the object gets released back to the memory allocator would violate safety properties we wish to guarantee (i.e. not crashing the kernel). The memory could be reused for a different type in the BPF program or even in the kernel as it gets eventually kfree'd. Not enabling bpf_obj_drop inside the critical section would appear to prevent all of the above, but that is more of an artificial limitation right now. Since the write support is tangled with how we handle potential aliasing of nodes inside the critical section that may or may not be part of the list anymore, it has been deferred to a future patch. Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-18-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17bpf: Introduce bpf_obj_newKumar Kartikeya Dwivedi
Introduce type safe memory allocator bpf_obj_new for BPF programs. The kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments to kfuncs still requires having them in prototype, unlike BPF helpers which always take 5 arguments and have them checked using bpf_func_proto in verifier, ignoring unset argument types. Introduce __ign suffix to ignore a specific kfunc argument during type checks, then use this to introduce support for passing type metadata to the bpf_obj_new_impl kfunc. The user passes BTF ID of the type it wants to allocate in program BTF, the verifier then rewrites the first argument as the size of this type, after performing some sanity checks (to ensure it exists and it is a struct type). The second argument is also fixed up and passed by the verifier. This is the btf_struct_meta for the type being allocated. It would be needed mostly for the offset array which is required for zero initializing special fields while leaving the rest of storage in uninitialized state. It would also be needed in the next patch to perform proper destruction of the object's special fields. Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free, using the any context BPF memory allocator introduced recently. To this end, a global instance of the BPF memory allocator is initialized on boot to be used for this purpose. This 'bpf_global_ma' serves all allocations for bpf_obj_new. In the future, bpf_obj_new variants will allow specifying a custom allocator. Note that now that bpf_obj_new can be used to allocate objects that can be linked to BPF linked list (when future linked list helpers are available), we need to also free the elements using bpf_mem_free. However, since the draining of elements is done outside the bpf_spin_lock, we need to do migrate_disable around the call since bpf_list_head_free can be called from map free path where migration is enabled. Otherwise, when called from BPF programs migration is already disabled. A convenience macro is included in the bpf_experimental.h header to hide the ugly details of the implementation, leading to user code looking similar to a language-level extension which allocates and constructs fields of a user type:

    struct bar {
        struct bpf_list_node node;
    };

    struct foo {
        struct bpf_spin_lock lock;
        struct bpf_list_head head __contains(bar, node);
    };

    void prog(void) {
        struct foo *f;

        f = bpf_obj_new(typeof(*f));
        if (!f)
            return;
        ...
    }

A key piece of this story is still missing, i.e. the free function, which will come in the next patch. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-14-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
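The convenience macro itself is tiny; roughly as it appears in the selftests' bpf_experimental.h (treat as a sketch, not verbatim):

    extern void *bpf_obj_new_impl(__u64 local_type_id, void *meta) __ksym;

    /* The verifier rewrites the first argument into the type's size and
     * fixes up the hidden btf_struct_meta argument. */
    #define bpf_obj_new(type) \
        ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))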
2022-11-17bpf: Rewrite kfunc argument handlingKumar Kartikeya Dwivedi
As we continue to add more features, argument types, kfunc flags, and different extensions to kfuncs, the code to verify the correctness of the kfunc prototype wrt the passed in registers has become ad-hoc and ugly to read. To make life easier, and make a very clear split between different stages of argument processing, move all the code into verifier.c and refactor into easier to read helpers and functions. This also makes sharing code within the verifier easier with kfunc argument processing. This will be more and more useful in later patches as we are now moving to implement very core BPF helpers as kfuncs, to keep them experimental before baking into UAPI. Remove all kfunc related bits now from btf_check_func_arg_match, as users have been converted away to refactored kfunc argument handling. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-12-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17bpf: Allow locking bpf_spin_lock global variablesKumar Kartikeya Dwivedi
Global variables reside in maps accessible using direct_value_addr callbacks, so giving each load instruction's rewrite a unique reg->id disallows us from holding locks which are global. The reason for preserving reg->id as a unique value for registers that may point to spin lock is that two separate lookups are treated as two separate memory regions, and any possible aliasing is ignored for the purposes of spin lock correctness. This is not great especially for the global variable case, which are served from maps that have max_entries == 1, i.e. they always lead to map values pointing into the same map value. So refactor the active_spin_lock into an 'active_lock' structure which represents the lock identity, and instead of the reg->id, remember two fields, a pointer and the reg->id. The pointer will store reg->map_ptr or reg->btf. It's only necessary to distinguish for the id == 0 case of global variables, but always setting the pointer to a non-NULL value and using the pointer to check whether the lock is held simplifies code in the verifier. This is generic enough to allow it for global variables, map lookups, and allocated objects at the same time. Note that while whether a lock is held can be answered by just comparing active_lock.ptr to NULL, to determine whether the register is pointing to the same held lock requires comparing _both_ ptr and id. Finally, as a result of this refactoring, pseudo load instructions are not given a unique reg->id, as they are doing lookup for the same map value (max_entries is never greater than 1). Essentially, we consider that the tuple of (ptr, id) will always be unique for any kind of argument to bpf_spin_{lock,unlock}. Note that this can be extended in the future to also remember offset used for locking, so that we can introduce multiple bpf_spin_lock fields in the same allocation. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
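A sketch of the refactored lock identity described above (field names follow the description; the real structure lives in the verifier state):

    struct {
        /* reg->map_ptr or reg->btf; non-NULL means a lock is held.
         * Always non-NULL while locked, which also covers the id == 0
         * global-variable case. */
        void *ptr;
        /* reg->id of the register that holds the lock */
        u32 id;
    } active_lock;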
2022-11-17bpf: Verify ownership relationships for user BTF typesKumar Kartikeya Dwivedi
Ensure that there can be no ownership cycles among different types by way of having owning objects that can hold some other type as their element. For instance, a map value can only hold allocated objects, but these are allowed to have another bpf_list_head. To prevent unbounded recursion while freeing resources, elements of bpf_list_head in local kptrs can never have a bpf_list_head which is part of a list in a map value. Later patches will verify this by having dedicated BTF selftests. Also, to make runtime destruction easier, once btf_struct_metas is fully populated, we can stash the metadata of the value type directly in the metadata of the list_head fields, as that allows easier access to the value type's layout to destruct it at runtime from the btf_field entry of the list head itself. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17bpf: Recognize lock and list fields in allocated objectsKumar Kartikeya Dwivedi
Allow specifying bpf_spin_lock, bpf_list_head, bpf_list_node fields in an allocated object. Also update btf_struct_access to reject direct access to these special fields. A bpf_list_head allows implementing map-in-map style use cases, where an allocated object with a bpf_list_head is linked into a list in a map value. This would require embedding a bpf_list_node, support for which is also included. The bpf_spin_lock is used to protect the bpf_list_head and other data. While we strictly don't require holding a bpf_spin_lock while touching the bpf_list_head in such objects, as when we have access to it we have complete ownership of the object, the locking constraint is still kept and may be conditionally lifted in the future. Note that the specification of such types can be done just like map values, e.g.:

    struct bar {
        struct bpf_list_node node;
    };

    struct foo {
        struct bpf_spin_lock lock;
        struct bpf_list_head head __contains(bar, node);
        struct bpf_list_node node;
    };

    struct map_value {
        struct bpf_spin_lock lock;
        struct bpf_list_head head __contains(foo, node);
    };

To recognize such types in user BTF, we build a btf_struct_metas array of metadata items corresponding to each BTF ID. This is done once during the btf_parse stage to avoid having to do it each time during the verification process's requirement to inspect the metadata. Moreover, the computed metadata needs to be passed to some helpers in future patches which requires allocating them and storing them in the BTF that is pinned by the program itself, so that valid access can be assumed to such data during program runtime. A key thing to note is that once a btf_struct_meta is available for a type, both the btf_record and btf_field_offs should be available. It is critical that btf_field_offs is available in case special fields are present, as we extensively rely on special fields being zeroed out in map values and allocated objects in later patches. The code ensures that by bailing out in case of errors and ensuring both are available together. If the record is not available, the special fields won't be recognized, so not having both is also fine (in terms of being a verification error and not a runtime bug). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17bpf: Introduce allocated objects supportKumar Kartikeya Dwivedi
Introduce support for representing pointers to objects allocated by the BPF program, i.e. PTR_TO_BTF_ID that point to a type in program BTF. This is indicated by the presence of MEM_ALLOC type flag in reg->type to avoid having to check btf_is_kernel when trying to match argument types in helpers. Whenever walking such types, any pointers being walked will always yield a SCALAR instead of pointer. In the future we might permit kptr inside such allocated objects (either kernel or program allocated), and it will then form a PTR_TO_BTF_ID of the respective type. For now, such allocated objects will always be referenced in verifier context, hence ref_obj_id == 0 for them is a bug. It is allowed to write to such objects, as long as the fields that are special are not touched (support for which will be added in subsequent patches). Note that once such a pointer is marked PTR_UNTRUSTED, it is no longer allowed to write to it. No PROBE_MEM handling is therefore done for loads into this type unless PTR_UNTRUSTED is part of the register type, since they can never be in an undefined state, and their lifetime will always be valid. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/linux/bpf.h
  1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value")
  aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record")
  f71b2f64177a ("bpf: Refactor map->off_arr handling")
https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18random: always mix cycle counter in add_latent_entropy()Jason A. Donenfeld
add_latent_entropy() is called every time a process forks, in kernel_clone(). This in turn calls add_device_randomness() using the latent entropy global state. add_device_randomness() does two things:

 1) Mixes into the input pool the latent entropy argument passed; and
 2) Mixes in a cycle counter, a sort of measurement of when the event took place, the high precision bits of which are presumably difficult to predict.

(1) is impossible without CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y. But (2) is always possible. However, currently CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n disables both (1) and (2), instead of just (1). This commit causes the CONFIG_GCC_PLUGIN_LATENT_ENTROPY=n case to still do (2) by passing NULL (len 0) to add_device_randomness() when add_latent_entropy() is called. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: PaX Team <pageexec@freemail.hu> Cc: Emese Revfy <re.emese@gmail.com> Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
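The resulting helper is short; a sketch of the behavior described above (compare include/linux/random.h):

    static inline void add_latent_entropy(void)
    {
    #if defined(LATENT_ENTROPY_PLUGIN) && !defined(__CHECKER__)
        add_device_randomness((const void *)&latent_entropy,
                              sizeof(latent_entropy));
    #else
        /* No latent entropy available, but still mix the cycle counter. */
        add_device_randomness(NULL, 0);
    #endif
    }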
2022-11-18hw_random: use add_hwgenerator_randomness() for early entropyJason A. Donenfeld
Rather than calling add_device_randomness(), the add_early_randomness() function should use add_hwgenerator_randomness(), so that the early entropy can be potentially credited, which allows for the RNG to initialize earlier without having to wait for the kthread to come up. This requires some minor API refactoring, by adding a `sleep_after` parameter to add_hwgenerator_randomness(), so that we don't hit a blocking sleep from add_early_randomness(). Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
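A sketch of the refactored API per the description (the exact prototype is an assumption):

    void add_hwgenerator_randomness(const void *buf, size_t len,
                                    size_t entropy, bool sleep_after);

    /* hwrng kthread: may block until more entropy is wanted */
    add_hwgenerator_randomness(buf, len, entropy_bits, true);

    /* add_early_randomness(): credit entropy without sleeping */
    add_hwgenerator_randomness(buf, len, entropy_bits, false);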
2022-11-18random: remove early archrandom abstractionJason A. Donenfeld
The arch_get_random*_early() abstraction is not completely useful and adds complexity, because it's not a given that there will be no calls to arch_get_random*() between random_init_early(), which uses arch_get_random*_early(), and init_cpu_features(). During that gap, crng_reseed() might be called, which uses arch_get_random*(), since it's mostly not init code. Instead we can test whether we're in the early phase in arch_get_random*() itself, and in doing so avoid all ambiguity about where we are. Fortunately, the only architecture that currently implements arch_get_random*_early() also has an alternatives-based cpu feature system, one flag of which determines whether the other flags have been initialized. This makes it possible to do the early check with zero cost once the system is initialized. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18stackprotector: move get_random_canary() into stackprotector.hJason A. Donenfeld
This has nothing to do with random.c and everything to do with stack protectors. Yes, it uses randomness. But many things use randomness. random.h and random.c are concerned with the generation of randomness, not with each and every use. So move this function into the more specific stackprotector.h file where it belongs. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld
This is a simple mechanical transformation done by:

    @@
    expression E;
    @@
    - prandom_u32_max
    + get_random_u32_below
      (E)

Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18random: add helpers for random numbers with given floor or rangeJason A. Donenfeld
Now that we have get_random_u32_below(), it's nearly trivial to make inline helpers to compute get_random_u32_above() and get_random_u32_inclusive(), which will help clean up open coded loops and manual computations throughout the tree. One snag is that in order to make get_random_u32_inclusive() operate on closed intervals, we have to do some (unlikely) special case handling if get_random_u32_inclusive(0, U32_MAX) is called. The least expensive way of doing this is actually to adjust the slowpath of get_random_u32_below() to have its undefined 0 result just return the output of get_random_u32(). We can make this basically free by calling get_random_u32() before the branch, so that the branch latency gets interleaved. Cc: stable@vger.kernel.org # to ease future backports that use this api Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
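A sketch of the two inline helpers (illustrative; compare include/linux/random.h):

    static inline u32 get_random_u32_above(u32 floor)
    {
        /* Uniform in (floor, U32_MAX] */
        return floor + 1 + get_random_u32_below(U32_MAX - floor);
    }

    static inline u32 get_random_u32_inclusive(u32 floor, u32 ceil)
    {
        /* ceil - floor + 1 wraps to 0 for the full range [0, U32_MAX];
         * get_random_u32_below()'s slowpath was adjusted so that a bound
         * of 0 simply returns get_random_u32(). */
        return floor + get_random_u32_below(ceil - floor + 1);
    }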
2022-11-17io_uring: fix multishot accept request leaksPavel Begunkov
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion, thereby leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17Merge tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.

 Current release - regressions:
  - tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()

 Previous releases - regressions:
  - bridge: fix memory leaks when changing VLAN protocol
  - dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims
  - dsa: don't leak tagger-owned storage on switch driver unbind
  - eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6 is removed
  - eth: stmmac: ensure tx function is not running in stmmac_xdp_release()
  - eth: hns3: fix return value check bug of rx copybreak

 Previous releases - always broken:
  - kcm: close race conditions on sk_receive_queue
  - bpf: fix alignment problem in bpf_prog_test_run_skb()
  - bpf: fix writing offset in case of fault in strncpy_from_kernel_nofault
  - eth: macvlan: use built-in RCU list checking
  - eth: marvell: add sleep time after enabling the loopback bit
  - eth: octeon_ep: fix potential memory leak in octep_device_setup()

 Misc:
  - tcp: configurable source port perturb table size
  - bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)"

* tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  net: use struct_group to copy ip/ipv6 header addresses
  net: usb: smsc95xx: fix external PHY reset
  net: usb: qmi_wwan: add Telit 0x103a composition
  netdevsim: Fix memory leak of nsim_dev->fa_cookie
  tcp: configurable source port perturb table size
  l2tp: Serialize access to sk_user_data with sk_callback_lock
  net: thunderbolt: Fix error handling in tbnet_init()
  net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start()
  net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()
  net: dsa: don't leak tagger-owned storage on switch driver unbind
  net/x25: Fix skb leak in x25_lapb_receive_frame()
  net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
  bridge: switchdev: Fix memory leaks when changing VLAN protocol
  net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process
  net: hns3: fix return value check bug of rx copybreak
  net: hns3: fix incorrect hw rss hash type of rx packet
  net: phy: marvell: add sleep time after enabling the loopback bit
  net: ena: Fix error handling in ena_init()
  kcm: close race conditions on sk_receive_queue
  net: ionic: Fix error handling in ionic_init_module()
  ...
2022-11-17random: use rejection sampling for uniform bounded random integersJason A. Donenfeld
Until the very recent commits, many bounded random integers were calculated using `get_random_u32() % max_plus_one`, which not only incurs the price of a division -- indicating performance mostly was not a real issue -- but also does not result in a uniformly distributed output if max_plus_one is not a power of two. Recent commits moved to using `prandom_u32_max(max_plus_one)`, which replaces the division with a faster multiplication, but still does not solve the issue with non-uniform output. For some users, maybe this isn't a problem, and for others, maybe it is, but for the majority of users, probably the question has never been posed and analyzed, and nobody thought much about it, probably assuming random is random is random. In other words, the unthinking expectation of most users is likely that the resultant numbers are uniform. So we implement here an efficient way of generating uniform bounded random integers. Through use of compile-time evaluation, and avoiding divisions as much as possible, this commit introduces no measurable overhead. At least for hot-path uses tested, any potential difference was lost in the noise. On both clang and gcc, code generation is pretty small. The new function, get_random_u32_below(), lives in random.h, rather than prandom.h, and has a "get_random_xxx" function name, because it is suitable for all uses, including cryptography. In order to be efficient, we implement a kernel-specific variant of Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. The kernel's variant takes advantage of constant folding to avoid divisions entirely in the vast majority of cases, works on both 32-bit and 64-bit architectures, and requests a minimal amount of bytes from the RNG. Link: https://arxiv.org/pdf/1805.10941.pdf Cc: stable@vger.kernel.org # to ease future backports that use this api Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
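A sketch of the kernel's variant of Lemire's algorithm as described (compare drivers/char/random.c; illustrative, not verbatim):

    u32 __get_random_u32_below(u32 ceil)
    {
        /* The high 32 bits of a 32x32->64 multiply are nearly uniform
         * in [0, ceil). */
        u64 mult = (u64)ceil * get_random_u32();

        /* Reject results falling in the biased low region, making the
         * output exactly uniform; constant folding removes the division
         * entirely for compile-time ceil. */
        if (unlikely((u32)mult < ceil)) {
            u32 bound = -ceil % ceil;   /* (2^32 - ceil) % ceil */

            while (unlikely((u32)mult < bound))
                mult = (u64)ceil * get_random_u32();
        }
        return mult >> 32;
    }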
2022-11-17bpf: Pass map file to .map_update_batch directlyHou Tao
Currently bpf_map_do_batch() first invokes fdget(batch.map_fd) to get the target map file, then it invokes generic_map_update_batch() to do batch update. generic_map_update_batch() will get the target map file by using fdget(batch.map_fd) again and pass it to bpf_map_update_value(). The problem is that the map file returned by the second fdget() may be NULL or a totally different file compared with the map file in bpf_map_do_batch(). The reason is that the first fdget() only guarantees the liveness of struct file instead of the file descriptor, and the file description may be released by a concurrent close() through pick_file(). This doesn't cause any problem for now, because maps with batch update support don't use the map file in .map_fd_get_ptr() ops. But it is better to fix the potential access of an invalid map file. Using __bpf_map_get() again in generic_map_update_batch() can not fix the problem, because batch.map_fd may be closed and reopened, and the returned map file may be different from the map file got in bpf_map_do_batch(), so just pass the map file directly to .map_update_batch() in bpf_map_do_batch(). Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221116075059.1551277-1-houtao@huaweicloud.com
2022-11-17fbdev: Make fb_modesetting_disabled() static inlineThomas Zimmermann
Make fb_modesetting_disabled() a static-inline function when it is defined in the header file. Avoid the linker error shown below:

    ld: drivers/video/fbdev/core/fbmon.o: in function `fb_modesetting_disabled':
    fbmon.c:(.text+0x1e4): multiple definition of `fb_modesetting_disabled';
    drivers/video/fbdev/core/fbmem.o:fbmem.c:(.text+0x1bac): first defined here

A bug report is at [1]. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Fixes: 0ba2fa8cbd29 ("fbdev: Add support for the nomodeset kernel parameter") Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Link: https://lore.kernel.org/dri-devel/20221117183214.2473e745@canb.auug.org.au/T/#u # 1 Link: https://patchwork.freedesktop.org/patch/msgid/20221117114729.7570-1-tzimmermann@suse.de
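The fix in a nutshell: a header-defined function must be static inline so each including translation unit gets its own copy instead of clashing global symbols. A sketch, assuming the stub returns false when nomodeset support is compiled out:

    /* In the header, when the nomodeset machinery is not built: */
    static inline bool fb_modesetting_disabled(const char *drvname)
    {
        return false;
    }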
2022-11-17KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLLDavid Matlack
Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL on every halt, rather than just sampling the module parameter when the VM is first created. This restores the original behavior of kvm.halt_poll_ns for VMs that have not opted into KVM_CAP_HALT_POLL. Notably, this change restores the ability for admins to disable or change the maximum halt-polling time system-wide for VMs not using KVM_CAP_HALT_POLL. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: acd05785e48c ("kvm: add capability for halt polling") Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20221117001657.1067231-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-17genirq/msi: Remove msi_domain_ops::msi_check()Thomas Gleixner
No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122015.807616900@linutronix.de
2022-11-17PCI/MSI: Move pci_alloc_irq_vectors() to api.cAhmed S. Darwish
To disentangle the maze in msi.c, all exported device-driver MSI APIs are now to be grouped in one file, api.c. Make pci_alloc_irq_vectors() a real function instead of a wrapper and add proper kernel doc to it. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122014.870888193@linutronix.de
2022-11-17genirq: Get rid of GENERIC_MSI_IRQ_DOMAINThomas Gleixner
Adjust to reality and remove another layer of pointless Kconfig indirection. CONFIG_GENERIC_MSI_IRQ is good enough to serve all purposes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122014.524842979@linutronix.de
2022-11-17PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAINThomas Gleixner
What a zoo:

    PCI_MSI
        select GENERIC_MSI_IRQ

    PCI_MSI_IRQ_DOMAIN
        def_bool y
        depends on PCI_MSI
        select GENERIC_MSI_IRQ_DOMAIN

Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are just an indirection to PCI_MSI. Match the reality and just admit that PCI_MSI requires GENERIC_MSI_IRQ_DOMAIN. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122014.467556921@linutronix.de
2022-11-17genirq/msi: Add bus token to struct msi_domain_infoAhmed S. Darwish
Add a bus token member to struct msi_domain_info and let msi_create_irq_domain() set the bus token. That allows removing the bus token updates at the call sites. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122014.294554462@linutronix.de
2022-11-17genirq/irqdomain: Move bus token enum into a separate headerThomas Gleixner
Split the bus token defines out into a separate header file to avoid inclusion of irqdomain.h in msi.h. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122014.237221143@linutronix.de
2022-11-17genirq/msi: Make __msi_domain_free_irqs() staticThomas Gleixner
Now that the last user is gone, confine it to the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122014.179595843@linutronix.de
2022-11-17genirq/msi: Provide msi_domain_ops::post_free()Thomas Gleixner
To prepare for removing the exposure of __msi_domain_free_irqs() provide a post_free() callback in the MSI domain ops which can be used to solve the problem of the only user of __msi_domain_free_irqs() in arch/powerpc. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122014.063153448@linutronix.de