summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2021-12-29notifier: Return an error when a callback has already been registeredBorislav Petkov
Return -EEXIST when a notifier callback has already been registered on a notifier chain. This should avoid any homegrown registration tracking at the callsite like https://lore.kernel.org/amd-gfx/20210512013058.6827-1-mukul.joshi@amd.com for example. This version is an alternative of https://lore.kernel.org/r/20211108101157.15189-1-bp@alien8.de which needed to touch every caller not checking the registration routine's return value. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/YcSWNdUBS8A2ZB3s@zn.tnic
2021-12-28kobject: remove kset from struct kset_uevent_ops callbacksGreg Kroah-Hartman
There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-27driver core: make kobj_type constant.Wedson Almeida Filho
This way instances of kobj_type (which contain function pointers) can be stored in .rodata, which means that they cannot be [easily/accidentally] modified at runtime. Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211224231345.777370-1-wedsonaf@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "9 patches. Subsystems affected by this patch series: mm (kfence, mempolicy, memory-failure, pagemap, pagealloc, damon, and memory-failure), core-kernel, and MAINTAINERS" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() mm/damon/dbgfs: protect targets destructions with kdamond_lock mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid mm: delete unsafe BUG from page_cache_add_speculative() mm, hwpoison: fix condition in free hugetlb page path MAINTAINERS: mark more list instances as moderated kernel/crash_core: suppress unknown crashkernel parameter warning mm: mempolicy: fix THP allocations escaping mempolicy restrictions kfence: fix memory leak when cat kfence objects
2021-12-25kernel/crash_core: suppress unknown crashkernel parameter warningPhilipp Rudo
When booting with crashkernel= on the kernel command line a warning similar to Kernel command line: ro console=ttyS0 crashkernel=256M Unknown kernel command line parameters "crashkernel=256M", will be passed to user space. is printed. This comes from crashkernel= being parsed independent from the kernel parameter handling mechanism. So the code in init/main.c doesn't know that crashkernel= is a valid kernel parameter and prints this incorrect warning. Suppress the warning by adding a dummy early_param handler for crashkernel=. Link: https://lkml.kernel.org/r/20211208133443.6867-1-prudo@redhat.com Fixes: 86d1919a4fb0 ("init: print out unknown kernel parameters") Signed-off-by: Philipp Rudo <prudo@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-23Merge branch 'ucount-rlimit-fixes-for-v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucount fix from Eric Biederman: "This fixes a silly logic bug in the ucount rlimits code, where it was comparing against the wrong limit" * 'ucount-rlimit-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Fix rlimit max values check
2021-12-23Documentation: livepatch: Add livepatch API pageDavid Vernet
The livepatch subsystem has several exported functions and objects with kerneldoc comments. Though the livepatch documentation contains handwritten descriptions of all of these exported functions, they are currently not pulled into the docs build using the kernel-doc directive. In order to allow readers of the documentation to see the full kerneldoc comments in the generated documentation files, this change adds a new Documentation/livepatch/api.rst page which contains kernel-doc directives to link the kerneldoc comments directly in the documentation. With this, all of the hand-written descriptions of the APIs now cross-reference the kerneldoc comments on the new Livepatching APIs page, and running ./scripts/find-unused-docs.sh on kernel/livepatch no longer shows any files as missing documentation. Note that all of the handwritten API descriptions were left alone with the exception of Documentation/livepatch/system-state.rst, which was updated to allow the cross-referencing to work correctly. The file now follows the cross-referencing formatting guidance specified in Documentation/doc-guide/kernel-doc.rst. Furthermore, some comments around klp_shadow_free_all() were updated to say <_, id> rather than <*, id> to match the rest of the file, and to prevent the docs build from emitting an "Inline emphasis start-string without end string" error. Signed-off-by: David Vernet <void@manifault.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20211221145743.4098360-1-void@manifault.com
2021-12-22kthread: Never put_user the set_child_tid addressEric W. Biederman
Kernel threads abuse set_child_tid. Historically that has been fine as set_child_tid was initialized after the kernel thread had been forked. Unfortunately storing struct kthread in set_child_tid after the thread is running makes struct kthread being unusable for storing result codes of the thread. When set_child_tid is set to struct kthread during fork that results in schedule_tail writing the thread id to the beggining of struct kthread (if put_user does not realize it is a kernel address). Solve this by skipping the put_user for all kthreads. Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lkml.kernel.org/r/YcNsG0Lp94V13whH@archlinux-ax161 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-21bpf: Use struct_size() helperXiu Jianfeng
In an effort to avoid open-coded arithmetic in the kernel, use the struct_size() helper instead of open-coded calculation. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://github.com/KSPP/linux/issues/160 Link: https://lore.kernel.org/bpf/20211220113048.2859-1-xiujianfeng@huawei.com
2021-12-21kthread: Warn about failed allocations for the init kthreadEric W. Biederman
Failed allocates are not expected when setting up the initial task and it is not really possible to handle them either. So I added a warning to report if such an allocation failure ever happens. Correct the sense of the warning so it warns when an allocation failure happens not when the allocation succeeded. Oops. Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lkml.kernel.org/r/20211221231611.785b74cf@canb.auug.org.au Link: https://lkml.kernel.org/r/CA+G9fYvLaR5CF777CKeWTO+qJFTN6vAvm95gtzN+7fw3Wi5hkA@mail.gmail.com Link: https://lkml.kernel.org/r/20211216102956.GC10708@xsang-OptiPlex-9020 Fixes: 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-20blktrace: switch trace spinlock to a raw spinlockWander Lairson Costa
The running_trace_lock protects running_trace_list and is acquired within the tracepoint which implies disabled preemption. The spinlock_t typed lock can not be acquired with disabled preemption on PREEMPT_RT because it becomes a sleeping lock. The runtime of the tracepoint depends on the number of entries in running_trace_list and has no limit. The blk-tracer is considered debug code and higher latencies here are okay. Make running_trace_lock a raw_spinlock_t. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20211220192827.38297-1-wander@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-20audit: use struct_size() helper in audit_[send|make]_reply()Xiu Jianfeng
Make use of struct_size() helper instead of an open-coded calculation. Link: https://github.com/KSPP/linux/issues/160 Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-20swiotlb: Add swiotlb bounce buffer remap function for HV IVMTianyu Lan
In Isolation VM with AMD SEV, bounce buffer needs to be accessed via extra address space which is above shared_gpa_boundary (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access physical address will be original physical address + shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of memory(vTOM). Memory addresses below vTOM are automatically treated as private while memory above vTOM is treated as shared. Expose swiotlb_unencrypted_base for platforms to set unencrypted memory base offset and platform calls swiotlb_update_mem_attributes() to remap swiotlb mem to unencrypted address space. memremap() can not be called in the early stage and so put remapping code into swiotlb_update_mem_attributes(). Store remap address and use it to copy data from/to swiotlb bounce buffer. Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211213071407.314309-2-ltykernel@gmail.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-12-20fork: Rename bad_fork_cleanup_threadgroup_lock to bad_fork_cleanup_delayacctEric W. Biederman
I just fixed a bug in copy_process when using the label bad_fork_cleanup_threadgroup_lock. While fixing the bug I looked closer at the label and realized it has been misnamed since 568ac888215c ("cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork"). Fix the name so that fork is easier to understand. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-20fork: Stop protecting back_fork_cleanup_cgroup_lock with CONFIG_NUMAEric W. Biederman
Mark Brown <broonie@kernel.org> reported: > This is also causing further build errors including but not limited to: > > /tmp/next/build/kernel/fork.c: In function 'copy_process': > /tmp/next/build/kernel/fork.c:2106:4: error: label 'bad_fork_cleanup_threadgroup_lock' used but not defined > 2106 | goto bad_fork_cleanup_threadgroup_lock; > | ^~~~ It turns out that I messed up and was depending upon a label protected by an ifdef. Move the label out of the ifdef as the ifdef around the label no longer makes sense (if it ever did). Link: https://lkml.kernel.org/r/YbugCP144uxXvRsk@sirena.org.uk Fixes: 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-19Merge tag 'timers_urgent_for_v5.16_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Make sure the CLOCK_REALTIME to CLOCK_MONOTONIC offset is never positive * tag 'timers_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Really make sure wall_to_monotonic isn't positive
2021-12-19Merge tag 'locking_urgent_for_v5.16_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Fix the rtmutex condition checking when the optimistic spinning of a waiter needs to be terminated * tag 'locking_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
2021-12-19Merge tag 'core_urgent_for_v5.16_rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull signal handlign fix from Borislav Petkov: - Prevent lock contention on the new sigaltstack lock on the common-case path, when no changes have been made to the alternative signal stack. * tag 'core_urgent_for_v5.16_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: signal: Skip the altstack update when not needed
2021-12-18bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument supportKumar Kartikeya Dwivedi
Allow passing PTR_TO_CTX, if the kfunc expects a matching struct type, and punt to PTR_TO_MEM block if reg->type does not fall in one of PTR_TO_BTF_ID or PTR_TO_SOCK* types. This will be used by future commits to get access to XDP and TC PTR_TO_CTX, and pass various data (flags, l4proto, netns_id, etc.) encoded in opts struct passed as pointer to kfunc. For PTR_TO_MEM support, arguments are currently limited to pointer to scalar, or pointer to struct composed of scalars. This is done so that unsafe scenarios (like passing PTR_TO_MEM where PTR_TO_BTF_ID of in-kernel valid structure is expected, which may have pointers) are avoided. Since the argument checking happens basd on argument register type, it is not easy to ascertain what the expected type is. In the future, support for PTR_TO_MEM for kfunc can be extended to serve other usecases. The struct type whose pointer is passed in may have maximum nesting depth of 4, all recursively composed of scalars or struct with scalars. Future commits will add negative tests that check whether these restrictions imposed for kfunc arguments are duly rejected by BPF verifier or not. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217015031.1278167-4-memxor@gmail.com
2021-12-18bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.Hao Luo
Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
2021-12-18bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.Hao Luo
Tag the return type of {per, this}_cpu_ptr with RDONLY_MEM. The returned value of this pair of helpers is kernel object, which can not be updated by bpf programs. Previously these two helpers return PTR_OT_MEM for kernel objects of scalar type, which allows one to directly modify the memory. Now with RDONLY_MEM tagging, the verifier will reject programs that write into RDONLY_MEM. Fixes: 63d9b80dcf2c ("bpf: Introducte bpf_this_cpu_ptr()") Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()") Fixes: 4976b718c355 ("bpf: Introduce pseudo_btf_id") Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-8-haoluo@google.com
2021-12-18bpf: Convert PTR_TO_MEM_OR_NULL to composable types.Hao Luo
Remove PTR_TO_MEM_OR_NULL and replace it with PTR_TO_MEM combined with flag PTR_MAYBE_NULL. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-7-haoluo@google.com
2021-12-18bpf: Introduce MEM_RDONLY flagHao Luo
This patch introduce a flag MEM_RDONLY to tag a reg value pointing to read-only memory. It makes the following changes: 1. PTR_TO_RDWR_BUF -> PTR_TO_BUF 2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com
2021-12-18bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULLHao Luo
We have introduced a new type to make bpf_reg composable, by allocating bits in the type to represent flags. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. This patch switches the qualified reg_types to use this flag. The reg_types changed in this patch include: 1. PTR_TO_MAP_VALUE_OR_NULL 2. PTR_TO_SOCKET_OR_NULL 3. PTR_TO_SOCK_COMMON_OR_NULL 4. PTR_TO_TCP_SOCK_OR_NULL 5. PTR_TO_BTF_ID_OR_NULL 6. PTR_TO_MEM_OR_NULL 7. PTR_TO_RDONLY_BUF_OR_NULL 8. PTR_TO_RDWR_BUF_OR_NULL Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com
2021-12-18bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULLHao Luo
We have introduced a new type to make bpf_ret composable, by reserving high bits to represent flags. One of the flag is PTR_MAYBE_NULL, which indicates a pointer may be NULL. When applying this flag to ret_types, it means the returned value could be a NULL pointer. This patch switches the qualified arg_types to use this flag. The ret_types changed in this patch include: 1. RET_PTR_TO_MAP_VALUE_OR_NULL 2. RET_PTR_TO_SOCKET_OR_NULL 3. RET_PTR_TO_TCP_SOCK_OR_NULL 4. RET_PTR_TO_SOCK_COMMON_OR_NULL 5. RET_PTR_TO_ALLOC_MEM_OR_NULL 6. RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL 7. RET_PTR_TO_BTF_ID_OR_NULL This patch doesn't eliminate the use of these names, instead it makes them aliases to 'RET_PTR_TO_XXX | PTR_MAYBE_NULL'. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-4-haoluo@google.com
2021-12-18bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULLHao Luo
We have introduced a new type to make bpf_arg composable, by reserving high bits of bpf_arg to represent flags of a type. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. When applying this flag to an arg_type, it means the arg can take NULL pointer. This patch switches the qualified arg_types to use this flag. The arg_types changed in this patch include: 1. ARG_PTR_TO_MAP_VALUE_OR_NULL 2. ARG_PTR_TO_MEM_OR_NULL 3. ARG_PTR_TO_CTX_OR_NULL 4. ARG_PTR_TO_SOCKET_OR_NULL 5. ARG_PTR_TO_ALLOC_MEM_OR_NULL 6. ARG_PTR_TO_STACK_OR_NULL This patch does not eliminate the use of these arg_types, instead it makes them an alias to the 'ARG_XXX | PTR_MAYBE_NULL'. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-3-haoluo@google.com
2021-12-18Merge branch 'locking/urgent' into locking/coreThomas Gleixner
Pick up the spin loop condition fix. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2021-12-18locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()Zqiang
Optimistic spinning needs to be terminated when the spinning waiter is not longer the top waiter on the lock, but the condition is negated. It terminates if the waiter is the top waiter, which is defeating the whole purpose. Fixes: c3123c431447 ("locking/rtmutex: Dont dereference waiter lockless") Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211217074207.77425-1-qiang1.zhang@intel.com
2021-12-17timekeeping: Really make sure wall_to_monotonic isn't positiveYu Liao
Even after commit e1d7ba873555 ("time: Always make sure wall_to_monotonic isn't positive") it is still possible to make wall_to_monotonic positive by running the following code: int main(void) { struct timespec time; clock_gettime(CLOCK_MONOTONIC, &time); time.tv_nsec = 0; clock_settime(CLOCK_REALTIME, &time); return 0; } The reason is that the second parameter of timespec64_compare(), ts_delta, may be unnormalized because the delta is calculated with an open coded substraction which causes the comparison of tv_sec to yield the wrong result: wall_to_monotonic = { .tv_sec = -10, .tv_nsec = 900000000 } ts_delta = { .tv_sec = -9, .tv_nsec = -900000000 } That makes timespec64_compare() claim that wall_to_monotonic < ts_delta, but actually the result should be wall_to_monotonic > ts_delta. After normalization, the result of timespec64_compare() is correct because the tv_sec comparison is not longer misleading: wall_to_monotonic = { .tv_sec = -10, .tv_nsec = 900000000 } ts_delta = { .tv_sec = -10, .tv_nsec = 100000000 } Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the issue. Fixes: e1d7ba873555 ("time: Always make sure wall_to_monotonic isn't positive") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
2021-12-16Only output backtracking information in log level 2Christy Lee
Backtracking information is very verbose, don't print it in log level 1 to improve readability. Signed-off-by: Christy Lee <christylee@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211216213358.3374427-4-christylee@fb.com
2021-12-16bpf: Right align verifier states in verifier logs.Christy Lee
Make the verifier logs more readable, print the verifier states on the corresponding instruction line. If the previous line was not a bpf instruction, then print the verifier states on its own line. Before: Validating test_pkt_access_subprog3() func#3... 86: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R10=fp0 ; int test_pkt_access_subprog3(int val, struct __sk_buff *skb) 86: (bf) r6 = r2 87: R2=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) 87: (bc) w7 = w1 88: R1=invP(id=0) R7_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123)); 88: (bf) r1 = r6 89: R1_w=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) 89: (85) call pc+9 Func#4 is global and valid. Skipping. 90: R0_w=invP(id=0) 90: (bc) w8 = w0 91: R0_w=invP(id=0) R8_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123)); 91: (b7) r1 = 123 92: R1_w=invP123 92: (85) call pc+65 Func#5 is global and valid. Skipping. 93: R0=invP(id=0) After: 86: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R10=fp0 ; int test_pkt_access_subprog3(int val, struct __sk_buff *skb) 86: (bf) r6 = r2 ; R2=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) 87: (bc) w7 = w1 ; R1=invP(id=0) R7_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123)); 88: (bf) r1 = r6 ; R1_w=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) 89: (85) call pc+9 Func#4 is global and valid. Skipping. 90: R0_w=invP(id=0) 90: (bc) w8 = w0 ; R0_w=invP(id=0) R8_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123)); 91: (b7) r1 = 123 ; R1_w=invP123 92: (85) call pc+65 Func#5 is global and valid. Skipping. 93: R0=invP(id=0) Signed-off-by: Christy Lee <christylee@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-12-16bpf: Only print scratched registers and stack slots to verifier logs.Christy Lee
When printing verifier state for any log level, print full verifier state only on function calls or on errors. Otherwise, only print the registers and stack slots that were accessed. Log size differences: verif_scale_loop6 before: 234566564 verif_scale_loop6 after: 72143943 69% size reduction kfree_skb before: 166406 kfree_skb after: 55386 69% size reduction Before: 156: (61) r0 = *(u32 *)(r1 +0) 157: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=ctx(id=0,off=0,imm=0) R2_w=invP0 R10=fp0 fp-8_w=00000000 fp-16_w=00\ 000000 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000 fp-56_w=00000000 fp-64_w=00000000 fp-72_w=00000000 fp-80_w=00000\ 000 fp-88_w=00000000 fp-96_w=00000000 fp-104_w=00000000 fp-112_w=00000000 fp-120_w=00000000 fp-128_w=00000000 fp-136_w=00000000 fp-144_w=00\ 000000 fp-152_w=00000000 fp-160_w=00000000 fp-168_w=00000000 fp-176_w=00000000 fp-184_w=00000000 fp-192_w=00000000 fp-200_w=00000000 fp-208\ _w=00000000 fp-216_w=00000000 fp-224_w=00000000 fp-232_w=00000000 fp-240_w=00000000 fp-248_w=00000000 fp-256_w=00000000 fp-264_w=00000000 f\ p-272_w=00000000 fp-280_w=00000000 fp-288_w=00000000 fp-296_w=00000000 fp-304_w=00000000 fp-312_w=00000000 fp-320_w=00000000 fp-328_w=00000\ 000 fp-336_w=00000000 fp-344_w=00000000 fp-352_w=00000000 fp-360_w=00000000 fp-368_w=00000000 fp-376_w=00000000 fp-384_w=00000000 fp-392_w=\ 00000000 fp-400_w=00000000 fp-408_w=00000000 fp-416_w=00000000 fp-424_w=00000000 fp-432_w=00000000 fp-440_w=00000000 fp-448_w=00000000 ; return skb->len; 157: (95) exit Func#4 is safe for any args that match its prototype Validating get_constant() func#5... 158: R1=invP(id=0) R10=fp0 ; int get_constant(long val) 158: (bf) r0 = r1 159: R0_w=invP(id=1) R1=invP(id=1) R10=fp0 ; return val - 122; 159: (04) w0 += -122 160: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=invP(id=1) R10=fp0 ; return val - 122; 160: (95) exit Func#5 is safe for any args that match its prototype Validating get_skb_ifindex() func#6... 161: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R3=invP(id=0) R10=fp0 ; int get_skb_ifindex(int val, struct __sk_buff *skb, int var) 161: (bc) w0 = w3 162: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R3=invP(id=0) R10=fp0 After: 156: (61) r0 = *(u32 *)(r1 +0) 157: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=ctx(id=0,off=0,imm=0) ; return skb->len; 157: (95) exit Func#4 is safe for any args that match its prototype Validating get_constant() func#5... 158: R1=invP(id=0) R10=fp0 ; int get_constant(long val) 158: (bf) r0 = r1 159: R0_w=invP(id=1) R1=invP(id=1) ; return val - 122; 159: (04) w0 += -122 160: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) ; return val - 122; 160: (95) exit Func#5 is safe for any args that match its prototype Validating get_skb_ifindex() func#6... 161: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R3=invP(id=0) R10=fp0 ; int get_skb_ifindex(int val, struct __sk_buff *skb, int var) 161: (bc) w0 = w3 162: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R3=invP(id=0) Signed-off-by: Christy Lee <christylee@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211216213358.3374427-2-christylee@fb.com
2021-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16Merge tag 'audit-pr-20211216' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single patch to fix a problem where the audit queue could grow unbounded when the audit daemon is forcibly stopped" * tag 'audit-pr-20211216' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: improve robustness of the audit queue handling
2021-12-16Merge tag 'net-5.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, wifi, bpf. Relatively large batches of fixes from BPF and the WiFi stack, calm in general networking. Current release - regressions: - dpaa2-eth: fix buffer overrun when reporting ethtool statistics Current release - new code bugs: - bpf: fix incorrect state pruning for <8B spill/fill - iavf: - add missing unlocks in iavf_watchdog_task() - do not override the adapter state in the watchdog task (again) - mlxsw: spectrum_router: consolidate MAC profiles when possible Previous releases - regressions: - mac80211 fixes: - rate control, avoid driver crash for retransmitted frames - regression in SSN handling of addba tx - a memory leak where sta_info is not freed - marking TX-during-stop for TX in in_reconfig, prevent stall - cfg80211: acquire wiphy mutex on regulatory work - wifi drivers: fix build regressions and LED config dependency - virtio_net: fix rx_drops stat for small pkts - dsa: mv88e6xxx: unforce speed & duplex in mac_link_down() Previous releases - always broken: - bpf fixes: - kernel address leakage in atomic fetch - kernel address leakage in atomic cmpxchg's r0 aux reg - signed bounds propagation after mov32 - extable fixup offset - extable address check - mac80211: - fix the size used for building probe request - send ADDBA requests using the tid/queue of the aggregation session - agg-tx: don't schedule_and_wake_txq() under sta->lock, avoid deadlocks - validate extended element ID is present - mptcp: - never allow the PM to close a listener subflow (null-defer) - clear 'kern' flag from fallback sockets, prevent crash - fix deadlock in __mptcp_push_pending() - inet_diag: fix kernel-infoleak for UDP sockets - xsk: do not sleep in poll() when need_wakeup set - smc: avoid very long waits in smc_release() - sch_ets: don't remove idle classes from the round-robin list - netdevsim: - zero-initialize memory for bpf map's value, prevent info leak - don't let user space overwrite read only (max) ethtool parms - ixgbe: set X550 MDIO speed before talking to PHY - stmmac: - fix null-deref in flower deletion w/ VLAN prio Rx steering - dwmac-rk: fix oob read in rk_gmac_setup - ice: time stamping fixes - systemport: add global locking for descriptor life cycle" * tag 'net-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (89 commits) bpf, selftests: Fix racing issue in btf_skc_cls_ingress test selftest/bpf: Add a test that reads various addresses. bpf: Fix extable address check. bpf: Fix extable fixup offset. bpf, selftests: Add test case trying to taint map value pointer bpf: Make 32->64 bounds propagation slightly more robust bpf: Fix signed bounds propagation after mov32 sit: do not call ipip6_dev_free() from sit_init_net() net: systemport: Add global locking for descriptor lifecycle net/smc: Prevent smc_release() from long blocking net: Fix double 0x prefix print in SKB dump virtio_net: fix rx_drops stat for small pkts dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED sfc_ef100: potential dereference of null pointer net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup net: usb: lan78xx: add Allied Telesis AT29M2-AF net/packet: rx_owner_map depends on pg_vec netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc dpaa2-eth: fix ethtool statistics ixgbe: set X550 MDIO speed before talking to PHY ...
2021-12-16add missing bpf-cgroup.h includesJakub Kicinski
We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency, make sure those who actually need more than the definition of struct cgroup_bpf include bpf-cgroup.h explicitly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org
2021-12-16genirq/msi: Convert storage to xarrayThomas Gleixner
The current linked list storage for MSI descriptors is suboptimal in several ways: 1) Looking up a MSI desciptor requires a O(n) list walk in the worst case 2) The upcoming support of runtime expansion of MSI-X vectors would need to do a full list walk to figure out whether a particular index is already associated. 3) Runtime expansion of sparse allocations is even more complex as the current implementation assumes an ordered list (increasing MSI index). Use an xarray which solves all of the above problems nicely. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210749.280627070@linutronix.de
2021-12-16genirq/msi: Simplify sysfs handlingThomas Gleixner
The sysfs handling for MSI is a convoluted maze and it is in the way of supporting dynamic expansion of the MSI-X vectors because it only supports a one off bulk population/free of the sysfs entries. Change it to do: 1) Creating an empty sysfs attribute group when msi_device_data is allocated 2) Populate the entries when the MSI descriptor is initialized 3) Free the entries when a MSI descriptor is detached from a Linux interrupt. 4) Provide functions for the legacy non-irqdomain fallback code to do a bulk population/free. This code won't support dynamic expansion. This makes the code simpler and reduces the number of allocations as the empty attribute group can be shared. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210749.224917330@linutronix.de
2021-12-16genirq/msi: Mop up old interfacesThomas Gleixner
Get rid of the old iterators, alloc/free functions and adjust the core code accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210749.117395027@linutronix.de
2021-12-16genirq/msi: Convert to new functionsThomas Gleixner
Use the new iterator functions and add locking where required. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210749.063705667@linutronix.de
2021-12-16genirq/msi: Make interrupt allocation less convolutedThomas Gleixner
There is no real reason to do several loops over the MSI descriptors instead of just doing one loop. In case of an error everything is undone anyway so it does not matter whether it's a partial or a full rollback. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210749.010234767@linutronix.de
2021-12-16platform-msi: Simplify platform device MSI codeThomas Gleixner
The allocation code is overly complex. It tries to have the MSI index space packed, which is not working when an interrupt is freed. There is no requirement for this. The only requirement is that the MSI index is unique. Move the MSI descriptor allocation into msi_domain_populate_irqs() and use the Linux interrupt number as MSI index which fulfils the unique requirement. This requires to lock the MSI descriptors which makes the lock order reverse to the regular MSI alloc/free functions vs. the domain mutex. Assign a seperate lockdep class for these MSI device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210748.956731741@linutronix.de
2021-12-16genirq/msi: Provide domain flags to allocate/free MSI descriptors automaticallyThomas Gleixner
Provide domain info flags which tell the core to allocate simple descriptors or to free descriptors when the interrupts are freed and implement the required functionality. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210747.928198636@linutronix.de
2021-12-16genirq/msi: Provide msi_alloc_msi_desc() and a simple allocatorThomas Gleixner
Provide msi_alloc_msi_desc() which takes a template MSI descriptor for initializing a newly allocated descriptor. This allows to simplify various usage sites of alloc_msi_entry() and moves the storage handling into the core code. For simple cases where only a linear vector space is required provide msi_add_simple_msi_descs() which just allocates a linear range of MSI descriptors and fills msi_desc::msi_index accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210747.873833567@linutronix.de
2021-12-16genirq/msi: Provide a set of advanced MSI accessors and iteratorsThomas Gleixner
In preparation for dynamic handling of MSI-X interrupts provide a new set of MSI descriptor accessor functions and iterators. They are benefitial per se as they allow to cleanup quite some code in various MSI domain implementations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210747.818635078@linutronix.de
2021-12-16genirq/msi: Provide msi_domain_alloc/free_irqs_descs_locked()Thomas Gleixner
Usage sites which do allocations of the MSI descriptors before invoking msi_domain_alloc_irqs() require to lock the MSI decriptors accross the operation. Provide entry points which can be called with the MSI mutex held and lock the mutex in the existing entry points. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210747.765371053@linutronix.de
2021-12-16genirq/msi: Add mutex for MSI list protectionThomas Gleixner
For upcoming runtime extensions of MSI-X interrupts it's required to protect the MSI descriptor list. Add a mutex to struct msi_device_data and provide lock/unlock functions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210747.708877269@linutronix.de
2021-12-16genirq/msi: Move descriptor list to struct msi_device_dataThomas Gleixner
It's only required when MSI is in use. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211206210747.650487479@linutronix.de
2021-12-16genirq/msi: Provide interface to retrieve Linux interrupt numberThomas Gleixner
This allows drivers to retrieve the Linux interrupt number instead of fiddling with MSI descriptors. msi_get_virq() returns the Linux interrupt number or 0 in case that there is no entry for the given MSI index. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211210221814.780824745@linutronix.de
2021-12-16genirq/msi: Remove the original sysfs interfacesThomas Gleixner
No more users. Refactor the core code accordingly and move the global interface under CONFIG_PCI_MSI_ARCH_FALLBACKS. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211210221814.168362229@linutronix.de