summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2023-12-06bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooksAndrii Nakryiko
Based on upstream discussion ([0]), rework existing bpf_prog_alloc_security LSM hook. Rename it to bpf_prog_load and instead of passing bpf_prog_aux, pass proper bpf_prog pointer for a full BPF program struct. Also, we pass bpf_attr union with all the user-provided arguments for BPF_PROG_LOAD command. This will give LSMs as much information as we can basically provide. The hook is also BPF token-aware now, and optional bpf_token struct is passed as a third argument. bpf_prog_load LSM hook is called after a bunch of sanity checks were performed, bpf_prog and bpf_prog_aux were allocated and filled out, but right before performing full-fledged BPF verification step. bpf_prog_free LSM hook is now accepting struct bpf_prog argument, for consistency. SELinux code is adjusted to all new names, types, and signatures. Note, given that bpf_prog_load (previously bpf_prog_alloc) hook can be used by some LSMs to allocate extra security blob, but also by other LSMs to reject BPF program loading, we need to make sure that bpf_prog_free LSM hook is called after bpf_prog_load/bpf_prog_alloc one *even* if the hook itself returned error. If we don't do that, we run the risk of leaking memory. This seems to be possible today when combining SELinux and BPF LSM, as one example, depending on their relative ordering. Also, for BPF LSM setup, add bpf_prog_load and bpf_prog_free to sleepable LSM hooks list, as they are both executed in sleepable context. Also drop bpf_prog_load hook from untrusted, as there is no issue with refcount or anything else anymore, that originally forced us to add it to untrusted list in c0c852dd1876 ("bpf: Do not mark certain LSM hook arguments as trusted"). We now trigger this hook much later and it should not be an issue anymore. [0] https://lore.kernel.org/bpf/9fe88aef7deabbe87d3fc38c4aea3c69.paul@paul-moore.com/ Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: consistently use BPF token throughout BPF verifier logicAndrii Nakryiko
Remove remaining direct queries to perfmon_capable() and bpf_capable() in BPF verifier logic and instead use BPF token (if available) to make decisions about privileges. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: take into account BPF token when fetching helper protosAndrii Nakryiko
Instead of performing unconditional system-wide bpf_capable() and perfmon_capable() calls inside bpf_base_func_proto() function (and other similar ones) to determine eligibility of a given BPF helper for a given program, use previously recorded BPF token during BPF_PROG_LOAD command handling to inform the decision. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: add BPF token support to BPF_PROG_LOAD commandAndrii Nakryiko
Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of allowed BPF program types and attach types, derived from BPF FS at BPF token creation time. Then make sure we perform bpf_token_capable() checks everywhere where it's relevant. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: add BPF token support to BPF_MAP_CREATE commandAndrii Nakryiko
Allow providing token_fd for BPF_MAP_CREATE command to allow controlled BPF map creation from unprivileged process through delegated BPF token. Wire through a set of allowed BPF map types to BPF token, derived from BPF FS at BPF token creation time. This, in combination with allowed_cmds allows to create a narrowly-focused BPF token (controlled by privileged agent) with a restrictive set of BPF maps that application can attempt to create. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: introduce BPF token objectAndrii Nakryiko
Add new kind of BPF kernel object, BPF token. BPF token is meant to allow delegating privileged BPF functionality, like loading a BPF program or creating a BPF map, from privileged process to a *trusted* unprivileged process, all while having a good amount of control over which privileged operations could be performed using provided BPF token. This is achieved through mounting BPF FS instance with extra delegation mount options, which determine what operations are delegatable, and also constraining it to the owning user namespace (as mentioned in the previous patch). BPF token itself is just a derivative from BPF FS and can be created through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF FS FD, which can be attained through open() API by opening BPF FS mount point. Currently, BPF token "inherits" delegated command, map types, prog type, and attach type bit sets from BPF FS as is. In the future, having an BPF token as a separate object with its own FD, we can allow to further restrict BPF token's allowable set of things either at the creation time or after the fact, allowing the process to guard itself further from unintentionally trying to load undesired kind of BPF programs. But for now we keep things simple and just copy bit sets as is. When BPF token is created from BPF FS mount, we take reference to the BPF super block's owning user namespace, and then use that namespace for checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN} capabilities that are normally only checked against init userns (using capable()), but now we check them using ns_capable() instead (if BPF token is provided). See bpf_token_capable() for details. Such setup means that BPF token in itself is not sufficient to grant BPF functionality. User namespaced process has to *also* have necessary combination of capabilities inside that user namespace. So while previously CAP_BPF was useless when granted within user namespace, now it gains a meaning and allows container managers and sys admins to have a flexible control over which processes can and need to use BPF functionality within the user namespace (i.e., container in practice). And BPF FS delegation mount options and derived BPF tokens serve as a per-container "flag" to grant overall ability to use bpf() (plus further restrict on which parts of bpf() syscalls are treated as namespaced). Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF) within the BPF FS owning user namespace, rounding up the ns_capable() story of BPF token. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06bpf: add BPF token delegation mount options to BPF FSAndrii Nakryiko
Add few new mount options to BPF FS that allow to specify that a given BPF FS instance allows creation of BPF token (added in the next patch), and what sort of operations are allowed under BPF token. As such, we get 4 new mount options, each is a bit mask - `delegate_cmds` allow to specify which bpf() syscall commands are allowed with BPF token derived from this BPF FS instance; - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies a set of allowable BPF map types that could be created with BPF token; - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies a set of allowable BPF program types that could be loaded with BPF token; - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies a set of allowable BPF program attach types that could be loaded with BPF token; delegate_progs and delegate_attachs are meant to be used together, as full BPF program type is, in general, determined through both program type and program attach type. Currently, these mount options accept the following forms of values: - a special value "any", that enables all possible values of a given bit set; - numeric value (decimal or hexadecimal, determined by kernel automatically) that specifies a bit mask value directly; - all the values for a given mount option are combined, if specified multiple times. E.g., `mount -t bpf nodev /path/to/mount -o delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3 mask. Ideally, more convenient (for humans) symbolic form derived from corresponding UAPI enums would be accepted (e.g., `-o delegate_progs=kprobe|tracepoint`) and I intend to implement this, but it requires a bunch of UAPI header churn, so I postponed it until this feature lands upstream or at least there is a definite consensus that this feature is acceptable and is going to make it, just to minimize amount of wasted effort and not increase amount of non-essential code to be reviewed. Attentive reader will notice that BPF FS is now marked as FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init user namespace as long as the process has sufficient *namespaced* capabilities within that user namespace. But in reality we still restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added to allow creating BPF FS context object (i.e., fsopen("bpf")) from inside unprivileged process inside non-init userns, to capture that userns as the owning userns. It will still be required to pass this context object back to privileged process to instantiate and mount it. This manipulation is important, because capturing non-init userns as the owning userns of BPF FS instance (super block) allows to use that userns to constraint BPF token to that userns later on (see next patch). So creating BPF FS with delegation inside unprivileged userns will restrict derived BPF token objects to only "work" inside that intended userns, making it scoped to a intended "container". Also, setting these delegation options requires capable(CAP_SYS_ADMIN), so unprivileged process cannot set this up without involvement of a privileged process. There is a set of selftests at the end of the patch set that simulates this sequence of steps and validates that everything works as intended. But careful review is requested to make sure there are no missed gaps in the implementation and testing. This somewhat subtle set of aspects is the result of previous discussions ([0]) about various user namespace implications and interactions with BPF token functionality and is necessary to contain BPF token inside intended user namespace. [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/ Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231130185229.2688956-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-06ACPI: bus: update acpi_dev_hid_uid_match() to support multiple typesRaag Jadav
Now that we have _UID matching support for both integer and string types, we can support them into acpi_dev_hid_uid_match() helper as well. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-06ACPI: bus: update acpi_dev_uid_match() to support multiple typesRaag Jadav
According to the ACPI specification, a _UID object can evaluate to either a numeric value or a string. Update acpi_dev_uid_match() to support _UID matching for both integer and string types. Suggested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Raag Jadav <raag.jadav@intel.com> [ rjw: Rename auxiliary macros, relocate kerneldoc comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-06Merge tag 'ffa-fixes-6.7' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes Arm FF-A fixes for v6.7 A bunch of fixes addressing issues around the notification support that was added this cycle. They address issue in partition IDs handling in ffa_notification_info_get(), notifications cleanup path and the size of the allocation in ffa_partitions_cleanup(). It also adds check for the notification enabled state so that the drivers registering the callbacks can be rejected if not enabled/supported. It also moves the partitions setup operation after the notification initialisation so that the driver has the correct state for notification enabled/supported before the partitions are initialised/setup. It also now allows FF-A initialisation to complete successfully even when the notification initialisation fails as it is an optional support in the specification. Initial support just allowed it only if the firmware didn't support notifications. Finally, it also adds a fix for smatch warning by declaring ffa_bus_type structure in the header. * tag 'ffa-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_ffa: Fix ffa_notification_info_get() IDs handling firmware: arm_ffa: Fix the size of the allocation in ffa_partitions_cleanup() firmware: arm_ffa: Fix FFA notifications cleanup path firmware: arm_ffa: Add checks for the notification enabled state firmware: arm_ffa: Setup the partitions after the notification initialisation firmware: arm_ffa: Allow FF-A initialisation even when notification fails firmware: arm_ffa: Declare ffa_bus_type structure in the header Link: https://lore.kernel.org/r/20231116191603.929767-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-06Merge tag 'asahi-soc-mailbox-6.8' of https://github.com/AsahiLinux/linux ↵Arnd Bergmann
into soc/drivers Apple SoC mailbox updates for 6.8 This moves the mailbox driver out of the mailbox subsystem and into SoC, next to its only consumer (RTKit). It has been cooking in linux-next for a long while, so it's time to pull it in. * tag 'asahi-soc-mailbox-6.8' of https://github.com/AsahiLinux/linux: soc: apple: mailbox: Add explicit include of platform_device.h soc: apple: mailbox: Rename config symbol to APPLE_MAILBOX mailbox: apple: Delete driver soc: apple: rtkit: Port to the internal mailbox driver soc: apple: mailbox: Add ASC/M3 mailbox driver soc: apple: rtkit: Get rid of apple_rtkit_send_message_wait Link: https://lore.kernel.org/r/6e64472e-c55d-4499-9a61-da59cfd28021@marcan.st Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-06cpu/hotplug: Remove unused CPU hotplug statesZenghui Yu
There are unused hotplug states which either have never been used or the removal of the usage did not remove the state constant. Drop them to reduce the size of the cpuhp_hp_states array. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20231124121615.1604-1-yuzenghui@huawei.com
2023-12-06regulator: event: Add regulator netlink event supportNaresh Solanki
This commit introduces netlink event support to the regulator subsystem. Changes: - Introduce event.c and regnl.h for netlink event handling. - Implement reg_generate_netlink_event to broadcast regulator events. - Update Makefile to include the new event.c file. Signed-off-by: Naresh Solanki <naresh.solanki@9elements.com> Link: https://lore.kernel.org/r/20231205105207.1262928-1-naresh.solanki@9elements.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-12-06net/tcp: Don't store TCP-AO maclen on reqskDmitry Safonov
This extra check doesn't work for a handshake when SYN segment has (current_key.maclen != rnext_key.maclen). It could be amended to preserve rnext_key.maclen instead of current_key.maclen, but that requires a lookup on listen socket. Originally, this extra maclen check was introduced just because it was cheap. Drop it and convert tcp_request_sock::maclen into boolean tcp_request_sock::used_tcp_ao. Fixes: 06b22ef29591 ("net/tcp: Wire TCP-AO to request sockets") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-06mm/slab: move the rest of slub_def.h to mm/slab.hVlastimil Babka
mm/slab.h is the only place to include include/linux/slub_def.h which has allowed switching between SLAB and SLUB. Now we can simply move the contents over and remove slub_def.h. Use this opportunity to fix up some whitespace (alignment) issues. Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2023-12-06mm/slab: move struct kmem_cache_cpu declaration to slub.cVlastimil Babka
Nothing outside SLUB itself accesses the struct kmem_cache_cpu fields so it does not need to be declared in slub_def.h. This allows also to move enum stat_item. Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2023-12-06mm/slab: remove mm/slab.c and slab_def.hVlastimil Babka
Remove the SLAB implementation. Update CREDITS. Also update and properly sort the SLOB entry there. RIP SLAB allocator (1996 - 2024) Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2023-12-06HID: Intel-ish-hid: Ishtp: Add helper functions for client connectionEven Xu
For every ishtp client driver during initialization state, the flow is: 1 - Allocate an ISHTP client instance 2 - Reserve a host id and link the client instance 3 - Search a firmware client using UUID and get related client information 4 - Bind firmware client id to the ISHTP client instance 5 - Set the state the ISHTP client instance to CONNECTING 6 - Send connect request to firmware 7 - Register event callback for messages from the firmware During deinitizalization state, the flow is: 9 - Set the state the ISHTP client instance to ISHTP_CL_DISCONNECTING 10 - Issue disconnect request to firmware 11 - Unlike the client instance 12 - Flush message queue 13 - Free ISHTP client instance Step 2-7 are identical to the steps of client driver initialization and driver reset flow, but reallocation of the RX/TX ring buffers can be avoided in reset flow. Also for step 9-12, they are identical to the steps of client driver failure handling after connect request, driver reset flow and driver removing. So, add two helper functions to simplify client driver code. ishtp_cl_establish_connection() ishtp_cl_destroy_connection() No functional changes are expected. Signed-off-by: Even Xu <even.xu@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-06r8152: add vendor/device ID pair for ASUS USB-C2500Kelly Kane
The ASUS USB-C2500 is an RTL8156 based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Kelly Kane <kelly@hawknetworks.com> Link: https://lore.kernel.org/r/20231203011712.6314-1-kelly@hawknetworks.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-05net: core: synchronize link-watch when carrier is queriedJohannes Berg
There are multiple ways to query for the carrier state: through rtnetlink, sysfs, and (possibly) ethtool. Synchronize linkwatch work before these operations so that we don't have a situation where userspace queries the carrier state between the driver's carrier off->on transition and linkwatch running and expects it to work, when really (at least) TX cannot work until linkwatch has run. I previously posted a longer explanation of how this applies to wireless [1] but with this wireless can simply query the state before sending data, to ensure the kernel is ready for it. [1] https://lore.kernel.org/all/346b21d87c69f817ea3c37caceb34f1f56255884.camel@sipsolutions.net/ Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231204214706.303c62768415.I1caedccae72ee5a45c9085c5eb49c145ce1c0dd5@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05tcp: reorganize tcp_sock fast path variablesCoco Li
The variables are organized according in the following way: - TX read-mostly hotpath cache lines - TXRX read-mostly hotpath cache lines - RX read-mostly hotpath cache lines - TX read-write hotpath cache line - TXRX read-write hotpath cache line - RX read-write hotpath cache line Fastpath cachelines end after rcvq_space. Cache line boundaries are enforced only between read-mostly and read-write. That is, if read-mostly tx cachelines bleed into read-mostly txrx cachelines, we do not care. We care about the boundaries between read and write cachelines because we want to prevent false sharing. Fast path variables span cache lines before change: 12 Fast path variables span cache lines after change: 8 Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231204201232.520025-3-lixiaoyan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-05net-device: reorganize net_device fast path variablesCoco Li
Reorganize fast path variables on tx-txrx-rx order Fastpath variables end after npinfo. Below data generated with pahole on x86 architecture. Fast path variables span cache lines before change: 12 Fast path variables span cache lines after change: 4 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231204201232.520025-2-lixiaoyan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06drivers: base: add arch_cpu_is_hotpluggable()Russell King (Oracle)
The differences between architecture specific implementations of arch_register_cpu() are down to whether the CPU is hotpluggable or not. Rather than overriding the weak version of arch_register_cpu(), provide a function that can be used to provide this detail instead. Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1r5R3M-00CszH-6r@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overriddenJames Morse
Architectures often have extra per-cpu work that needs doing before a CPU is registered, often to determine if a CPU is hotpluggable. To allow the ACPI architectures to use GENERIC_CPU_DEVICES, move the cpu_register() call into arch_register_cpu(), which is made __weak so architectures with extra work can override it. This aligns with the way x86, ia64 and loongarch register hotplug CPUs when they become present. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/E1r5R3B-00Csz6-Uh@rmk-PC.armlinux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-05bpf: support non-r10 register spill/fill to/from stack in precision trackingAndrii Nakryiko
Use instruction (jump) history to record instructions that performed register spill/fill to/from stack, regardless if this was done through read-only r10 register, or any other register after copying r10 into it *and* potentially adjusting offset. To make this work reliably, we push extra per-instruction flags into instruction history, encoding stack slot index (spi) and stack frame number in extra 10 bit flags we take away from prev_idx in instruction history. We don't touch idx field for maximum performance, as it's checked most frequently during backtracking. This change removes basically the last remaining practical limitation of precision backtracking logic in BPF verifier. It fixes known deficiencies, but also opens up new opportunities to reduce number of verified states, explored in the subsequent patches. There are only three differences in selftests' BPF object files according to veristat, all in the positive direction (less states). File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF) -------------------------------------- ------------- --------- --------- ------------- ---------- ---------- ------------- test_cls_redirect_dynptr.bpf.linked3.o cls_redirect 2987 2864 -123 (-4.12%) 240 231 -9 (-3.75%) xdp_synproxy_kern.bpf.linked3.o syncookie_tc 82848 82661 -187 (-0.23%) 5107 5073 -34 (-0.67%) xdp_synproxy_kern.bpf.linked3.o syncookie_xdp 85116 84964 -152 (-0.18%) 5162 5130 -32 (-0.62%) Note, I avoided renaming jmp_history to more generic insn_hist to minimize number of lines changed and potential merge conflicts between bpf and bpf-next trees. Notice also cur_hist_entry pointer reset to NULL at the beginning of instruction verification loop. This pointer avoids the problem of relying on last jump history entry's insn_idx to determine whether we already have entry for current instruction or not. It can happen that we added jump history entry because current instruction is_jmp_point(), but also we need to add instruction flags for stack access. In this case, we don't want to entries, so we need to reuse last added entry, if it is present. Relying on insn_idx comparison has the same ambiguity problem as the one that was fixed recently in [0], so we avoid that. [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@kernel.org/ Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reported-by: Tao Lyu <tao.lyu@epfl.ch> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231205184248.1502704-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-05drivers: perf: arm_pmu: Drop 'pmu_lock' element from 'struct pmu_hw_events'Anshuman Khandual
As 'pmu_lock' element is not being used in any ARM PMU implementation, just drop this from 'struct pmu_hw_events'. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231115092805.737822-3-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-05iov_iter: replace import_single_range() with import_ubuf()Jens Axboe
With the removal of the 'iov' argument to import_single_range(), the two functions are now fully identical. Convert the import_single_range() callers to import_ubuf(), and remove the former fully. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-05iov_iter: remove unused 'iov' argument from import_single_range()Jens Axboe
It is entirely unused, just get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-05mm/slab: remove CONFIG_SLAB code from slab common codeVlastimil Babka
In slab_common.c and slab.h headers, we can now remove all code behind CONFIG_SLAB and CONFIG_DEBUG_SLAB ifdefs, and remove all CONFIG_SLUB ifdefs. Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2023-12-05cpu/hotplug: remove CPUHP_SLAB_PREPARE hooksVlastimil Babka
The CPUHP_SLAB_PREPARE hooks are only used by SLAB which is removed. SLUB defines them as NULL, so we can remove those altogether. Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Tested-by: David Rientjes <rientjes@google.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2023-12-04net/mlx5e: Tidy up IPsec NAT-T SA discoveryLeon Romanovsky
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones. In case they arrive to RX, the SPI and ESP are located in inner header, while the check was performed on outer header instead. That wrong check caused to the situation where received rekeying request was missed and caused to rekey timeout, which "compensated" this failure by completing rekeying. Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net/mlx5e: Honor user choice of IPsec replay window sizeLeon Romanovsky
Users can configure IPsec replay window size, but mlx5 driver didn't honor their choice and set always 32bits. Fix assignment logic to configure right size from the beginning. Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04net: stmmac: fix FPE events losingJianheng Zhang
The status bits of register MAC_FPE_CTRL_STS are clear on read. Using 32-bit read for MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and dwmac5_fpe_send_mpacket() clear the status bits. Then the stmmac interrupt handler missing FPE event status and leads to FPE handshaking failure and retries. To avoid clear status bits of MAC_FPE_CTRL_STS in dwmac5_fpe_configure() and dwmac5_fpe_send_mpacket(), add fpe_csr to stmmac_fpe_cfg structure to cache the control bits of MAC_FPE_CTRL_STS and to avoid reading MAC_FPE_CTRL_STS in those methods. Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure") Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jianheng Zhang <Jianheng.Zhang@synopsys.com> Link: https://lore.kernel.org/r/CY5PR12MB637225A7CF529D5BE0FBE59CBF81A@CY5PR12MB6372.namprd12.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04net: Add NAPI IRQ supportAmritha Nambiar
Add support to associate the interrupt vector number for a NAPI instance. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://lore.kernel.org/r/170147334728.5260.13221803396905901904.stgit@anambiarhost.jf.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04net: Add queue and napi associationAmritha Nambiar
Add the napi pointer in netdev queue for tracking the napi instance for each queue. This achieves the queue<->napi mapping. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://lore.kernel.org/r/170147331483.5260.15723438819994285695.stgit@anambiarhost.jf.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-04bpf: Optimize the free of inner mapHou Tao
When removing the inner map from the outer map, the inner map will be freed after one RCU grace period and one RCU tasks trace grace period, so it is certain that the bpf program, which may access the inner map, has exited before the inner map is freed. However there is no need to wait for one RCU tasks trace grace period if the outer map is only accessed by non-sleepable program. So adding sleepable_refcnt in bpf_map and increasing sleepable_refcnt when adding the outer map into env->used_maps for sleepable program. Although the max number of bpf program is INT_MAX - 1, the number of bpf programs which are being loaded may be greater than INT_MAX, so using atomic64_t instead of atomic_t for sleepable_refcnt. When removing the inner map from the outer map, using sleepable_refcnt to decide whether or not a RCU tasks trace grace period is needed before freeing the inner map. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-04bpf: Defer the free of inner map when necessaryHou Tao
When updating or deleting an inner map in map array or map htab, the map may still be accessed by non-sleepable program or sleepable program. However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map directly through bpf_map_put(), if the ref-counter is the last one (which is true for most cases), the inner map will be freed by ops->map_free() in a kworker. But for now, most .map_free() callbacks don't use synchronize_rcu() or its variants to wait for the elapse of a RCU grace period, so after the invocation of ops->map_free completes, the bpf program which is accessing the inner map may incur use-after-free problem. Fix the free of inner map by invoking bpf_map_free_deferred() after both one RCU grace period and one tasks trace RCU grace period if the inner map has been removed from the outer map before. The deferment is accomplished by using call_rcu() or call_rcu_tasks_trace() when releasing the last ref-counter of bpf map. The newly-added rcu_head field in bpf_map shares the same storage space with work field to reduce the size of bpf_map. Fixes: bba1dc0b55ac ("bpf: Remove redundant synchronize_rcu.") Fixes: 638e4b825d52 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-04bpf: Add map and need_defer parameters to .map_fd_put_ptr()Hou Tao
map is the pointer of outer map, and need_defer needs some explanation. need_defer tells the implementation to defer the reference release of the passed element and ensure that the element is still alive before the bpf program, which may manipulate it, exits. The following three cases will invoke map_fd_put_ptr() and different need_defer values will be passed to these callers: 1) release the reference of the old element in the map during map update or map deletion. The release must be deferred, otherwise the bpf program may incur use-after-free problem, so need_defer needs to be true. 2) release the reference of the to-be-added element in the error path of map update. The to-be-added element is not visible to any bpf program, so it is OK to pass false for need_defer parameter. 3) release the references of all elements in the map during map release. Any bpf program which has access to the map must have been exited and released, so need_defer=false will be OK. These two parameters will be used by the following patches to fix the potential use-after-free problem for map-in-map. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-04vfio/migration: Add debugfs to live migration driverLongfang Liu
There are multiple devices, software and operational steps involved in the process of live migration. An error occurred on any node may cause the live migration operation to fail. This complex process makes it very difficult to locate and analyze the cause when the function fails. In order to quickly locate the cause of the problem when the live migration fails, I added a set of debugfs to the vfio live migration driver. +-------------------------------------------+ | | | | | QEMU | | | | | +---+----------------------------+----------+ | ^ | ^ | | | | | | | | v | v | +---------+--+ +---------+--+ |src vfio_dev| |dst vfio_dev| +--+---------+ +--+---------+ | ^ | ^ | | | | v | | | +-----------+----+ +-----------+----+ |src dev debugfs | |dst dev debugfs | +----------------+ +----------------+ The entire debugfs directory will be based on the definition of the CONFIG_DEBUG_FS macro. If this macro is not enabled, the interfaces in vfio.h will be empty definitions, and the creation and initialization of the debugfs directory will not be executed. vfio | +---<dev_name1> | +---migration | +--state | +---<dev_name2> +---migration +--state debugfs will create a public root directory "vfio" file. then create a dev_name() file for each live migration device. First, create a unified state acquisition file of "migration" in this device directory. Then, create a public live migration state lookup file "state". Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20231106072225.28577-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-12-04pinctrl: Convert unsigned to unsigned intAndy Shevchenko
Simple type conversion with no functional change implied. While at it, adjust indentation where it makes sense. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231129161459.1002323-24-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-04usb: typec: tcpci: add vconn over current fault handling to maxim_coreRD Babiera
Add TCPC_FAULT_STATUS_VCONN_OC constant and corresponding mask definition. Maxim TCPC is capable of detecting VConn over current faults, so add fault to alert mask. When a Vconn over current fault is triggered, put the port in an error recovery state via tcpm_port_error_recovery. Signed-off-by: RD Babiera <rdbabiera@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20231121203845.170234-6-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-04usb: typec: tcpm: add tcpm_port_error_recovery symbolRD Babiera
Add tcpm_port_error_recovery symbol and corresponding event that runs in tcpm_pd_event handler to set the port to the ERROR_RECOVERY state. tcpci drivers can use the symbol to reset the port when tcpc faults affect port functionality. Signed-off-by: RD Babiera <rdbabiera@google.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20231121203845.170234-5-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-04usb: core: Allow subclassed USB drivers to override usb_choose_configuration()Douglas Anderson
For some USB devices we might want to do something different for usb_choose_configuration(). One example here is the r8152 driver where we want to end up using the vendor driver with the preferred interface. The r8152 driver tried to make things work by implementing a USB generic_subclass driver and then overriding the normal config selection after it happened. This is less than ideal and also caused breakage if someone deauthorized and re-authorized the USB device because the USB core ended up going back to it's default logic for choosing the best config. I made an attempt to fix this [1] but it was a bit ugly. Let's do this better and allow USB generic_subclass drivers to override usb_choose_configuration(). [1] https://lore.kernel.org/r/20231130154337.1.Ie00e07f07f87149c9ce0b27ae4e26991d307e14b@changeid Suggested-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20231201102946.v2.2.Iade5fa31997f1a0ca3e1dec0591633b02471df12@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-04spi: spl022: fix sleeping in interrupt contextMark Brown
Merge series from Nam Cao <namcao@linutronix.de>: While running the spl022, I got the following warning: BUG: sleeping function called from invalid context at drivers/spi/spi.c:1428 This is because between spi transfers, spi_transfer_delay_exec() (who may sleep if the delay is >10us) is called in interrupt context. This is a problem for anyone who runs this driver and need more than 10us delay. Patch 1 adds an error reporting mechanism, needed by patch 2 who switch to use the default spi_transfer_one_message(), which fix the problem. The series is tested with polling transfer mode and interrupt transfer mode. I can't test the DMA mode, so some help testing here is very appreciated.
2023-12-04mtd: rawnand: NAND controller write protectDavid Regan
Allow NAND controller to be responsible for write protect pin handling during fast path and exec_op destructive operation when controller_wp flag is set. Signed-off-by: David Regan <dregan@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20231125012438.15191-2-dregan@broadcom.com
2023-12-04mtd: rawnand: Add destructive operationBoris Brezillon
Erase and program operations need the write protect (wp) pin to be de-asserted to take effect. Add the concept of destructive operation and pass the information to exec_op() so controllers know when they should de-assert this pin without having to decode the command opcode. Signed-off-by: Boris Brezillon <bbrezillon@kernel.org> Signed-off-by: David Regan <dregan@broadcom.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20231125012438.15191-1-dregan@broadcom.com
2023-12-04device property: Add fwnode_name_eq()Sakari Ailus
Add fwnode_name_eq() to implement the functionality of of_node_name_eq() on fwnode property API. The same convention of ending the comparison at '@' (besides NUL) is applied on also both ACPI and swnode. The function is intended for comparing unit address-less node names on DT and firmware or swnodes compliant with DT bindings. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Tested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-12-03lib/find: optimize find_*_bit_wrapYury Norov
When an offset is 0, there's no need to search a bitmap from the beginning after the 1st search failed, because each bit has already been tested. Signed-off-by: Yury Norov <yury.norov@gmail.com>
2023-12-03lib/find_bit: Fix the code comments about find_next_bit_wrapGuanjun
The function find_next_bit_wrap only has one memory region to search on. Adjust the comments. Signed-off-by: Guanjun <guanjun@linux.alibaba.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2023-12-03Merge tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull vfio fixes from Alex Williamson: - Fix the lifecycle of a mutex in the pds variant driver such that a reset prior to opening the device won't find it uninitialized. Implement the release path to symmetrically destroy the mutex. Also switch a different lock from spinlock to mutex as the code path has the potential to sleep and doesn't need the spinlock context otherwise (Brett Creeley) - Fix an issue detected via randconfig where KVM tries to symbol_get an undeclared function. The symbol is temporarily declared unconditionally here, which resolves the problem and avoids churn relative to a series pending for the next merge window which resolves some of this symbol ugliness, but also fixes Kconfig dependencies (Sean Christopherson) * tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio: vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart vfio/pds: Fix possible sleep while in atomic context vfio/pds: Fix mutex lock->magic != lock warning