summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2022-02-28bpf: Add config to allow loading modules with BTF mismatchesConnor O'Brien
BTF mismatch can occur for a separately-built module even when the ABI is otherwise compatible and nothing else would prevent successfully loading. Add a new Kconfig to control how mismatches are handled. By default, preserve the current behavior of refusing to load the module. If MODULE_ALLOW_BTF_MISMATCH is enabled, load the module but ignore its BTF information. Suggested-by: Yonghong Song <yhs@fb.com> Suggested-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/CAADnVQJ+OVPnBz8z3vNu8gKXX42jCUqfuvhWAyCQDu8N_yqqwQ@mail.gmail.com Link: https://lore.kernel.org/bpf/20220223012814.1898677-1-connoro@google.com
2022-02-28Merge 5.17-rc6 into driver-core-nextGreg Kroah-Hartman
We need the driver core fix in here as well for future changes. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-28Merge 5.17-rc6 into char-misc-nextGreg Kroah-Hartman
We need the char-misc fixes in here. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-27Merge tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fix from Christoph Hellwig: - fix a swiotlb info leak (Halil Pasic) * tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix info leak with DMA_FROM_DEVICE
2022-02-26Merge tag 'trace-v5.17-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - rtla (Real-Time Linux Analysis tool): - fix typo in man page - Update API -e to -E before it is released - Error message fix and memory leak fix - Partially uninline trace event soft disable to shrink text - Fix function graph start up test - Have triggers affect the trace instance they are in and not top level - Have osnoise sleep in the units it says it uses - Remove unused ftrace stub function - Remove event probe redundant info from event in the buffer - Fix group ownership setting in tracefs - Ensure trace buffer is minimum size to prevent crashes * tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rtla/osnoise: Fix error message when failing to enable trace instance rtla/osnoise: Free params at the exit rtla/hist: Make -E the short version of --entries tracing: Fix selftest config check for function graph start up test tracefs: Set the group ownership in apply_options() not parse_options() tracing/osnoise: Make osnoise_main to sleep for microseconds ftrace: Remove unused ftrace_startup_enable() stub tracing: Ensure trace buffer is at least 4096 bytes large tracing: Uninline trace_trigger_soft_disabled() partly eprobes: Remove redundant event type information tracing: Have traceon and traceoff trigger honor the instance tracing: Dump stacktrace trigger to the corresponding instance rtla: Fix systme -> system typo on man page
2022-02-25tracing: Fix selftest config check for function graph start up testChristophe Leroy
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is required to test direct tramp. Link: https://lkml.kernel.org/r/bdc7e594e13b0891c1d61bc8d56c94b1890eaed7.1640017960.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25bpf: Fix issue with bpf preload module taking over stdout/stdin of kernel.Yucong Sun
In cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") BPF preload was switched from user mode process to use in-kernel light skeleton instead. However, in the kernel context, early in the boot sequence, the first available FD can start from 0, instead of normally 3 for user mode process. So FDs 0 and 1 are then used for loaded BPF programs and prevent init process from setting up stdin/stdout/stderr on FD 0, 1, and 2 as expected. Before the fix: ls -lah /proc/1/fd/* lrwx------1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console After the fix: ls -lah /proc/1/fd/* lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console Fix by closing prog FDs after initialization. struct bpf_prog's themselves are kept alive through direct kernel references taken with bpf_link_get_from_fd(). Fixes: cb80ddc67152 ("bpf: Convert bpf_preload.ko to use light skeleton.") Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220225185923.2535519-1-fallentree@fb.com
2022-02-25tracing/osnoise: Make osnoise_main to sleep for microsecondsDaniel Bristot de Oliveira
osnoise's runtime and period are in the microseconds scale, but it is currently sleeping in the millisecond's scale. This behavior roots in the usage of hwlat as the skeleton for osnoise. Make osnoise to sleep in the microseconds scale. Also, move the sleep to a specialized function. Link: https://lkml.kernel.org/r/302aa6c7bdf2d131719b22901905e9da122a11b2.1645197336.git.bristot@kernel.org Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25ftrace: Remove unused ftrace_startup_enable() stubNathan Chancellor
When building with clang + CONFIG_DYNAMIC_FTRACE=n + W=1, there is a warning: kernel/trace/ftrace.c:7194:20: error: unused function 'ftrace_startup_enable' [-Werror,-Wunused-function] static inline void ftrace_startup_enable(int command) { } ^ 1 error generated. Clang warns on instances of static inline functions in .c files with W=1 after commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). The ftrace_startup_enable() stub has been unused since commit e1effa0144a1 ("ftrace: Annotate the ops operation on update"), where its use outside of the CONFIG_DYNAMIC_TRACE section was replaced by ftrace_startup_all(). Remove it to resolve the warning. Link: https://lkml.kernel.org/r/20220214192847.488166-1-nathan@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25tracing: Ensure trace buffer is at least 4096 bytes largeSven Schnelle
Booting the kernel with 'trace_buf_size=1' give a warning at boot during the ftrace selftests: [ 0.892809] Running postponed tracer tests: [ 0.892893] Testing tracer function: [ 0.901899] Callback from call_rcu_tasks_trace() invoked. [ 0.983829] Callback from call_rcu_tasks_rude() invoked. [ 1.072003] .. bad ring buffer .. corrupted trace buffer .. [ 1.091944] Callback from call_rcu_tasks() invoked. [ 1.097695] PASSED [ 1.097701] Testing dynamic ftrace: .. filter failed count=0 ..FAILED! [ 1.353474] ------------[ cut here ]------------ [ 1.353478] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1951 run_tracer_selftest+0x13c/0x1b0 Therefore enforce a minimum of 4096 bytes to make the selftest pass. Link: https://lkml.kernel.org/r/20220214134456.1751749-1-svens@linux.ibm.com Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25tracing: Uninline trace_trigger_soft_disabled() partlyChristophe Leroy
On a powerpc32 build with CONFIG_CC_OPTIMISE_FOR_SIZE, the inline keyword is not honored and trace_trigger_soft_disabled() appears approx 50 times in vmlinux. Adding -Winline to the build, the following message appears: ./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline] That function is rather big for an inlined function: c003df60 <trace_trigger_soft_disabled>: c003df60: 94 21 ff f0 stwu r1,-16(r1) c003df64: 7c 08 02 a6 mflr r0 c003df68: 90 01 00 14 stw r0,20(r1) c003df6c: bf c1 00 08 stmw r30,8(r1) c003df70: 83 e3 00 24 lwz r31,36(r3) c003df74: 73 e9 01 00 andi. r9,r31,256 c003df78: 41 82 00 10 beq c003df88 <trace_trigger_soft_disabled+0x28> c003df7c: 38 60 00 00 li r3,0 c003df80: 39 61 00 10 addi r11,r1,16 c003df84: 4b fd 60 ac b c0014030 <_rest32gpr_30_x> c003df88: 73 e9 00 80 andi. r9,r31,128 c003df8c: 7c 7e 1b 78 mr r30,r3 c003df90: 41 a2 00 14 beq c003dfa4 <trace_trigger_soft_disabled+0x44> c003df94: 38 c0 00 00 li r6,0 c003df98: 38 a0 00 00 li r5,0 c003df9c: 38 80 00 00 li r4,0 c003dfa0: 48 05 c5 f1 bl c009a590 <event_triggers_call> c003dfa4: 73 e9 00 40 andi. r9,r31,64 c003dfa8: 40 82 00 28 bne c003dfd0 <trace_trigger_soft_disabled+0x70> c003dfac: 73 ff 02 00 andi. r31,r31,512 c003dfb0: 41 82 ff cc beq c003df7c <trace_trigger_soft_disabled+0x1c> c003dfb4: 80 01 00 14 lwz r0,20(r1) c003dfb8: 83 e1 00 0c lwz r31,12(r1) c003dfbc: 7f c3 f3 78 mr r3,r30 c003dfc0: 83 c1 00 08 lwz r30,8(r1) c003dfc4: 7c 08 03 a6 mtlr r0 c003dfc8: 38 21 00 10 addi r1,r1,16 c003dfcc: 48 05 6f 6c b c0094f38 <trace_event_ignore_this_pid> c003dfd0: 38 60 00 01 li r3,1 c003dfd4: 4b ff ff ac b c003df80 <trace_trigger_soft_disabled+0x20> However it is located in a hot path so inlining it is important. But forcing inlining of the entire function by using __always_inline leads to increasing the text size by approx 20 kbytes. Instead, split the fonction in two parts, one part with the likely fast path, flagged __always_inline, and a second part out of line. With this change, on a powerpc32 with CONFIG_CC_OPTIMISE_FOR_SIZE vmlinux text increases by only 1,4 kbytes, which is partly compensated by a decrease of vmlinux data by 7 kbytes. On ppc64_defconfig which has CONFIG_CC_OPTIMISE_FOR_SPEED, this change reduces vmlinux text by more than 30 kbytes. Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25eprobes: Remove redundant event type informationSteven Rostedt (Google)
Currently, the event probes save the type of the event they are attached to when recording the event. For example: # echo 'e:switch sched/sched_switch prev_state=$prev_state prev_prio=$prev_prio next_pid=$next_pid next_prio=$next_prio' > dynamic_events # cat events/eprobes/switch/format name: switch ID: 1717 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:unsigned int __probe_type; offset:8; size:4; signed:0; field:u64 prev_state; offset:12; size:8; signed:0; field:u64 prev_prio; offset:20; size:8; signed:0; field:u64 next_pid; offset:28; size:8; signed:0; field:u64 next_prio; offset:36; size:8; signed:0; print fmt: "(%u) prev_state=0x%Lx prev_prio=0x%Lx next_pid=0x%Lx next_prio=0x%Lx", REC->__probe_type, REC->prev_state, REC->prev_prio, REC->next_pid, REC->next_prio The __probe_type adds 4 bytes to every event. One of the reasons for creating eprobes is to limit what is traced in an event to be able to limit what is written into the ring buffer. Having this redundant 4 bytes to every event takes away from this. The event that is recorded can be retrieved from the event probe itself, that is available when the trace is happening. For user space tools, it could simply read the dynamic_event file to find the event they are for. So there is really no reason to write this information into the ring buffer for every event. Link: https://lkml.kernel.org/r/20220218190057.2f5a19a8@gandalf.local.home Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25tracing: Have traceon and traceoff trigger honor the instanceSteven Rostedt (Google)
If a trigger is set on an event to disable or enable tracing within an instance, then tracing should be disabled or enabled in the instance and not at the top level, which is confusing to users. Link: https://lkml.kernel.org/r/20220223223837.14f94ec3@rorschach.local.home Cc: stable@vger.kernel.org Fixes: ae63b31e4d0e2 ("tracing: Separate out trace events from global variables") Tested-by: Daniel Bristot de Oliveira <bristot@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25ucounts: Fix systemd LimitNPROC with private users regressionEric W. Biederman
Long story short recursively enforcing RLIMIT_NPROC when it is not enforced on the process that creates a new user namespace, causes currently working code to fail. There is no reason to enforce RLIMIT_NPROC recursively when we don't enforce it normally so update the code to detect this case. I would like to simply use capable(CAP_SYS_RESOURCE) to detect when RLIMIT_NPROC is not enforced upon the caller. Unfortunately because RLIMIT_NPROC is charged and checked for enforcement based upon the real uid, using capable() which is euid based is inconsistent with reality. Come as close as possible to testing for capable(CAP_SYS_RESOURCE) by testing for when the real uid would match the conditions when CAP_SYS_RESOURCE would be present if the real uid was the effective uid. Reported-by: Etienne Dechamps <etienne@edechamps.fr> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215596 Link: https://lkml.kernel.org/r/e9589141-cfeb-90cd-2d0e-83a62787239a@edechamps.fr Link: https://lkml.kernel.org/r/87sfs8jmpz.fsf_-_@email.froward.int.ebiederm.org Cc: stable@vger.kernel.org Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-02-25Merge tag 'ixp4xx-cleanup-for-v5.18' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik into arm/soc This cleans out the remaining board files from IXP4xx and makes it an exclusive device tree subarchitecture without any special weirdness in arch/arm/mach-ixp4xx. The biggest noticeable change is the removal of the old PCI driver and along with that the removal of the special DMA coherency code and defines and the DMA bouncing. I tried to convert the IXP4xx to multiplatform on top of this but it didn't work because IXP4xx wants to be big endian and multiplatform config creates a problem like this: ../arch/arm/kernel/head.S: Assembler messages: ../arch/arm/kernel/head.S:94: Error: selected processor does not support `setend be' in ARM mode I think this is because MULTI_V5 turns on CPUs that cannot do big endian, and IXP4xx turn on big endian. (It crashes if I try to boot in little endian mode, sorry. It really wants to run big endian.) But before fixing multiplatform we can fix all of this! The networking patches are dependencies so I am requesting ACKs from the network maintainers on these. * tag 'ixp4xx-cleanup-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-nomadik: ARM: ixp4xx: Convert to SPARSE_IRQ and P2V ARM: ixp4xx: Drop all common code ARM: ixp4xx: Drop custom DMA coherency and bouncing ARM: ixp4xx: Remove feature bit accessors net: ixp4xx_hss: Check features using syscon net: ixp4xx_eth: Drop platform data support soc: ixp4xx-npe: Access syscon regs using regmap soc: ixp4xx: Add features from regmap helper ARM: ixp4xx: Drop UDC info setting function ARM: ixp4xx: Drop stale Kconfig entry ARM: ixp4xx: Delete old PCI driver ARM: ixp4xx: Delete the Goramo MLR boardfile ARM: ixp4xx: Delete Gateway 7001 boardfiles Link: https://lore.kernel.org/r/CACRpkdahK-jaHFqLCpSqiXwAtkSKbhWQZ9jaSo6rRzHfSiECkA@mail.gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-25config: android-recommended: Disable BPF_UNPRIV_DEFAULT_OFF for netdMarijn Suijten
AOSP's `netd` process fails to start on Android S: E ClatdController: getClatEgress4MapFd() failure: Operation not permitted I netd : Initializing ClatdController: 410us E netd : Failed to start trafficcontroller: (Status[code: 1, msg: "Pinned map not accessible or does not exist: (/sys/fs/bpf/map_netd_cookie_tag_map): Operation not permitted"]) E netd : CRITICAL: sleeping 60 seconds, netd exiting with failure, crash loop likely! And on Android R: I ClatdController: 4.9+ kernel and device shipped with P - clat ebpf might work. E ClatdController: getClatEgressMapFd() failure: Operation not permitted I netd : Initializing ClatdController: 1409us E netd : Failed to start trafficcontroller: (Status[code: 1, msg: "Pinned map not accessible or does not exist: (/sys/fs/bpf/map_netd_cookie_tag_map): Operation not permitted"]) These permission issues are caused by 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default") because AOSP does not provide netd the `SYS_ADMIN` capability, and also has no userspace support for the `BPF` capability yet. Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: John Stultz <john.stultz@linaro.org> [John suggested this in https://linaro.atlassian.net/browse/ACK-107?focusedCommentId=117382] Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org> Link: https://lore.kernel.org/r/20220202100528.190794-2-marijn.suijten@somainline.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-25config: android-recommended: Don't explicitly disable CONFIG_AIOMarijn Suijten
Android nowadays (for a couple years already) requires AIO for at least its `adb` "Android Debug Bridge" [1]. Without this config option (`default y`) it simply refuses start, making users unable to connect to their phone for debugging purposes when using these kernel fragments. [1]: https://cs.android.com/android/_/android/platform/packages/modules/adb/+/a2cb8de5e68067a5e1d002886d5f3b42d91371e1 Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org> Link: https://lore.kernel.org/r/20220202100528.190794-1-marijn.suijten@somainline.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-25uaccess: remove CONFIG_SET_FSArnd Bergmann
There are no remaining callers of set_fs(), so CONFIG_SET_FS can be removed globally, along with the thread_info field and any references to it. This turns access_ok() into a cheaper check against TASK_SIZE_MAX. As CONFIG_SET_FS is now gone, drop all remaining references to set_fs()/get_fs(), mm_segment_t, user_addr_max() and uaccess_kernel(). Acked-by: Sam Ravnborg <sam@ravnborg.org> # for sparc32 changes Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Tested-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> # for arc changes Acked-by: Stafford Horne <shorne@gmail.com> # [openrisc, asm-generic] Acked-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-02-24tracing: Dump stacktrace trigger to the corresponding instanceDaniel Bristot de Oliveira
The stacktrace event trigger is not dumping the stacktrace to the instance where it was enabled, but to the global "instance." Use the private_data, pointing to the trigger file, to figure out the corresponding trace instance, and use it in the trigger action, like snapshot_trigger does. Link: https://lkml.kernel.org/r/afbb0b4f18ba92c276865bc97204d438473f4ebc.1645396236.git.bristot@kernel.org Cc: stable@vger.kernel.org Fixes: ae63b31e4d0e2 ("tracing: Separate out trace events from global variables") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/testing/selftests/net/mptcp/mptcp_join.sh 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") 857898eb4b28 ("selftests: mptcp: add missing join check") 6ef84b1517e0 ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions") c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") 09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") 84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr") efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") 3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24Merge tag 'net-5.17-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. Current release - regressions: - bpf: fix crash due to out of bounds access into reg2btf_ids - mvpp2: always set port pcs ops, avoid null-deref - eth: marvell: fix driver load from initrd - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC" Current release - new code bugs: - mptcp: fix race in overlapping signal events Previous releases - regressions: - xen-netback: revert hotplug-status changes causing devices to not be configured - dsa: - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held - fix panic when removing unoffloaded port from bridge - dsa: microchip: fix bridging with more than two member ports Previous releases - always broken: - bpf: - fix crash due to incorrect copy_map_value when both spin lock and timer are present in a single value - fix a bpf_timer initialization issue with clang - do not try bpf_msg_push_data with len 0 - add schedule points in batch ops - nf_tables: - unregister flowtable hooks on netns exit - correct flow offload action array size - fix a couple of memory leaks - vsock: don't check owner in vhost_vsock_stop() while releasing - gso: do not skip outer ip header in case of ipip and net_failover - smc: use a mutex for locking "struct smc_pnettable" - openvswitch: fix setting ipv6 fields causing hw csum failure - mptcp: fix race in incoming ADD_ADDR option processing - sysfs: add check for netdevice being present to speed_show - sched: act_ct: fix flow table lookup after ct clear or switching zones - eth: intel: fixes for SR-IOV forwarding offloads - eth: broadcom: fixes for selftests and error recovery - eth: mellanox: flow steering and SR-IOV forwarding fixes Misc: - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends not report freed skbs as drops - force inlining of checksum functions in net/checksum.h" * tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits) net: mv643xx_eth: process retval from of_get_mac_address ping: remove pr_err from ping_lookup Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC" openvswitch: Fix setting ipv6 fields causing hw csum failure ipv6: prevent a possible race condition with lifetimes net/smc: Use a mutex for locking "struct smc_pnettable" bnx2x: fix driver load from initrd Revert "xen-netback: Check for hotplug-status existence before watching" Revert "xen-netback: remove 'hotplug-status' once it has served its purpose" net/mlx5e: Fix VF min/max rate parameters interchange mistake net/mlx5e: Add missing increment of count net/mlx5e: MPLSoUDP decap, fix check for unsupported matches net/mlx5e: Fix MPLSoUDP encap to use MPLS action information net/mlx5e: Add feature check for set fec counters net/mlx5e: TC, Skip redundant ct clear actions net/mlx5e: TC, Reject rules with forward and drop actions net/mlx5e: TC, Reject rules with drop and modify hdr action net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets net/mlx5e: Fix wrong return value on ioctl EEPROM query failure net/mlx5: Fix possible deadlock on rule deletion ...
2022-02-24Merge branches 'exp.2022.02.24a', 'fixes.2022.02.14a', ↵Paul E. McKenney
'rcu_barrier.2022.02.08a', 'rcu-tasks.2022.02.08a', 'rt.2022.02.01b', 'torture.2022.02.01b' and 'torturescript.2022.02.08a' into HEAD exp.2022.02.24a: Expedited grace-period updates. fixes.2022.02.14a: Miscellaneous fixes. rcu_barrier.2022.02.08a: Make rcu_barrier() no longer exclude CPU hotplug. rcu-tasks.2022.02.08a: RCU-tasks updates. rt.2022.02.01b: Real-time-related updates. torture.2022.02.01b: Torture-test updates. torturescript.2022.02.08a: Torture-test scripting updates.
2022-02-23tracing: Fix allocation of last_cmd in last_cmd_set()Steven Rostedt (Google)
The strncat() used in last_cmd_set() includes the nul byte of length of the string being copied in, when it should only hold the size of the string being copied (not the nul byte). Change it to subtract the length of the allocated space and the nul byte to pass that into the strncat(). Also, assign "len" instead of initializing it to zero and its first update is to do a "+=". Link: https://lore.kernel.org/all/202202140628.fj6e4w4v-lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-23bpf: Cleanup commentsTom Rix
Add leading space to spdx tag Use // for spdx c file comment Replacements resereved to reserved inbetween to in between everytime to every time intutivie to intuitive currenct to current encontered to encountered referenceing to referencing upto to up to exectuted to executed Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220220184055.3608317-1-trix@redhat.com
2022-02-23kernfs: move struct kernfs_root out of the public view.Greg Kroah-Hartman
There is no need to have struct kernfs_root be part of kernfs.h for the whole kernel to see and poke around it. Move it internal to kernfs code and provide a helper function, kernfs_root_to_node(), to handle the one field that kernfs users were directly accessing from the structure. Cc: Imran Khan <imran.f.khan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220222070713.3517679-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h ↵Ingo Molnar
dependencies Remove all headers, except the ones required to make this header build standalone. Also include stats.h in sched.h explicitly - dependencies already require this. Summary of the build speedup gained through the last ~15 scheduler build & header dependency patches: Cumulative scheduler (kernel/sched/) build time speedup on a Linux distribution's config, which enables all scheduler features, compared to the vanilla kernel: _____________________________________________________________________________ | | Vanilla kernel (v5.13-rc7): |_____________________________________________________________________________ | | Performance counter stats for 'make -j96 kernel/sched/' (3 runs): | | 126,975,564,374 instructions # 1.45 insn per cycle ( +- 0.00% ) | 87,637,847,671 cycles # 3.959 GHz ( +- 0.30% ) | 22,136.96 msec cpu-clock # 7.499 CPUs utilized ( +- 0.29% ) | | 2.9520 +- 0.0169 seconds time elapsed ( +- 0.57% ) |_____________________________________________________________________________ | | Patched kernel: |_____________________________________________________________________________ | | Performance counter stats for 'make -j96 kernel/sched/' (3 runs): | | 50,420,496,914 instructions # 1.47 insn per cycle ( +- 0.00% ) | 34,234,322,038 cycles # 3.946 GHz ( +- 0.31% ) | 8,675.81 msec cpu-clock # 3.053 CPUs utilized ( +- 0.45% ) | | 2.8420 +- 0.0181 seconds time elapsed ( +- 0.64% ) |_____________________________________________________________________________ Summary: - CPU time used to build the scheduler dropped by -60.9%, a reduction from 22.1 clock-seconds to 8.7 clock-seconds. - Wall-clock time to build the scheduler dropped by -3.9%, a reduction from 2.95 seconds to 2.84 seconds. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Reorganize, clean up and optimize ↵Ingo Molnar
kernel/sched/build_utility.c dependencies Use all generic headers from kernel/sched/sched.h that are required for it to build. Sort the sections alphabetically. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Reorganize, clean up and optimize kernel/sched/build_policy.c ↵Ingo Molnar
dependencies Use all generic headers from kernel/sched/sched.h that are required for it to build. Sort the sections alphabetically. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Reorganize, clean up and optimize kernel/sched/fair.c ↵Ingo Molnar
dependencies Use all generic headers from kernel/sched/sched.h that are required for it to build. Sort the sections alphabetically. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Reorganize, clean up and optimize kernel/sched/core.c ↵Ingo Molnar
dependencies Use all generic headers from kernel/sched/sched.h that are required for it to build. Sort the sections alphabetically. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Standardize kernel/sched/sched.h header dependenciesIngo Molnar
kernel/sched/sched.h is a weird mix of ad-hoc headers included in the middle of the header. Two of them rely on being included in the middle of kernel/sched/sched.h, due to definitions they require: - "stat.h" needs the rq definitions. - "autogroup.h" needs the task_group definition. Move the inclusion of these two files out of kernel/sched/sched.h, and include them in all files that require them. Move of the rest of the header dependencies to the top of the kernel/sched/sched.h file. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Introduce kernel/sched/build_policy.c and build multiple .c ↵Ingo Molnar
files there Similarly to kernel/sched/build_utility.c, collect all 'scheduling policy' related source code files into kernel/sched/build_policy.c: kernel/sched/idle.c kernel/sched/rt.c kernel/sched/cpudeadline.c kernel/sched/pelt.c kernel/sched/cputime.c kernel/sched/deadline.c With the exception of fair.c, which we continue to build as a separate file for build efficiency and parallelism reasons. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Introduce kernel/sched/build_utility.c and build multiple .c ↵Ingo Molnar
files there Collect all utility functionality source code files into a single kernel/sched/build_utility.c file, via #include-ing the .c files: kernel/sched/clock.c kernel/sched/completion.c kernel/sched/loadavg.c kernel/sched/swait.c kernel/sched/wait_bit.c kernel/sched/wait.c CONFIG_CPU_FREQ: kernel/sched/cpufreq.c CONFIG_CPU_FREQ_GOV_SCHEDUTIL: kernel/sched/cpufreq_schedutil.c CONFIG_CGROUP_CPUACCT: kernel/sched/cpuacct.c CONFIG_SCHED_DEBUG: kernel/sched/debug.c CONFIG_SCHEDSTATS: kernel/sched/stats.c CONFIG_SMP: kernel/sched/cpupri.c kernel/sched/stop_task.c kernel/sched/topology.c CONFIG_SCHED_CORE: kernel/sched/core_sched.c CONFIG_PSI: kernel/sched/psi.c CONFIG_MEMBARRIER: kernel/sched/membarrier.c CONFIG_CPU_ISOLATION: kernel/sched/isolation.c CONFIG_SCHED_AUTOGROUP: kernel/sched/autogroup.c The goal is to amortize the 60+ KLOC header bloat from over a dozen build units into a single build unit. The build time of build_utility.c also roughly matches the build time of core.c and fair.c - allowing better load-balancing of scheduler-only rebuilds. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Fix comment typo in kernel/sched/cpudeadline.cIngo Molnar
File name changed. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: sched/clock: Mark all functions 'notrace', remove ↵Ingo Molnar
CC_FLAGS_FTRACE build asymmetry Mark all non-init functions in kernel/sched.c as 'notrace', instead of turning them all off via CC_FLAGS_FTRACE. This is going to allow the treatment of this file as any other scheduler file, and it can be #include-ed in compound compilation units as well. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Add header guard to kernel/sched/stats.h and ↵Ingo Molnar
kernel/sched/autogroup.h Protect against multiple inclusion. Also include "sched.h" in "stat.h", as it relies on it. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-23sched/headers: Add header guard to kernel/sched/sched.hIngo Molnar
Use the canonical header guard naming of the full path to the header. Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
2022-02-22scsi: block: Remove REQ_OP_WRITE_SAME supportChristoph Hellwig
No more users of REQ_OP_WRITE_SAME or drivers implementing it are left, so remove the infrastructure. [mkp: fold in and tweak sysfs reporting fix] Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22Merge branch 'for-5.17-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix for a subtle bug in the recent release_agent permission check update - Fix for a long-standing race condition between cpuset and cpu hotplug - Comment updates * 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: Fix kernel-doc cgroup-v1: Correct privileges check in release_agent writes cgroup: clarify cgroup_css_set_fork() cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
2022-02-22fork: Use IS_ENABLED() in account_kernel_stack()Sebastian Andrzej Siewior
Not strickly needed but checking CONFIG_VMAP_STACK instead of task_stack_vm_area()' result allows the compiler the remove the else path in the CONFIG_VMAP_STACK case where the pointer can't be NULL. Check for CONFIG_VMAP_STACK in order to use the proper path. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-9-bigeasy@linutronix.de
2022-02-22fork: Only cache the VMAP stack in finish_task_switch()Sebastian Andrzej Siewior
The task stack could be deallocated later, but for fork()/exec() kind of workloads (say a shell script executing several commands) it is important that the stack is released in finish_task_switch() so that in VMAP_STACK case it can be cached and reused in the new task. For PREEMPT_RT it would be good if the wake-up in vfree_atomic() could be avoided in the scheduling path. Far worse are the other free_thread_stack() implementations which invoke __free_pages()/ kmem_cache_free() with disabled preemption. Cache the stack in free_thread_stack() in the VMAP_STACK case and RCU-delay the free path otherwise. Free the stack in the RCU callback. In the VMAP_STACK case this is another opportunity to fill the cache. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-8-bigeasy@linutronix.de
2022-02-22fork: Move task stack accounting to do_exit()Sebastian Andrzej Siewior
There is no need to perform the stack accounting of the outgoing task in its final schedule() invocation which happens with preemption disabled. The task is leaving, the resources will be freed and the accounting can happen in do_exit() before the actual schedule invocation which frees the stack memory. Move the accounting of the stack memory from release_task_stack() to exit_task_stack_account() which then can be invoked from do_exit(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-7-bigeasy@linutronix.de
2022-02-22fork: Move memcg_charge_kernel_stack() into CONFIG_VMAP_STACKSebastian Andrzej Siewior
memcg_charge_kernel_stack() is only used in the CONFIG_VMAP_STACK case. Move memcg_charge_kernel_stack() into the CONFIG_VMAP_STACK block and invoke it from within alloc_thread_stack_node(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-6-bigeasy@linutronix.de
2022-02-22fork: Don't assign the stack pointer in dup_task_struct()Sebastian Andrzej Siewior
All four versions of alloc_thread_stack_node() assign now task_struct::stack in case the allocation was successful. Let alloc_thread_stack_node() return an error code instead of the stack pointer and remove the stack assignment in dup_task_struct(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-5-bigeasy@linutronix.de
2022-02-22fork, IA64: Provide alloc_thread_stack_node() for IA64Sebastian Andrzej Siewior
Provide a generic alloc_thread_stack_node() for IA64 and CONFIG_ARCH_THREAD_STACK_ALLOCATOR which returns stack pointer and sets task_struct::stack so it behaves exactly like the other implementations. Rename IA64's alloc_thread_stack_node() and add the generic version to the fork code so it is in one place _and_ to drastically lower the chances of fat fingering the IA64 code. Do the same for free_thread_stack(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-4-bigeasy@linutronix.de
2022-02-22fork: Duplicate task_struct before stack allocationSebastian Andrzej Siewior
alloc_thread_stack_node() already populates the task_struct::stack member except on IA64. The stack pointer is saved and populated again because IA64 needs it and arch_dup_task_struct() overwrites it. Allocate thread's stack after task_struct has been duplicated as a preparation for further changes. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-3-bigeasy@linutronix.de
2022-02-22fork: Redo ifdefs around task stack handlingSebastian Andrzej Siewior
The use of ifdef CONFIG_VMAP_STACK is confusing in terms what is actually happenning and what can happen. For instance from reading free_thread_stack() it appears that in the CONFIG_VMAP_STACK case it may receive a non-NULL vm pointer but it may also be NULL in which case __free_pages() is used to free the stack. This is however not the case because in the VMAP case a non-NULL pointer is always returned here. Since it looks like this might happen, the compiler creates the correct dead code with the invocation to __free_pages() and everything around it. Twice. Add spaces between the ifdef and the identifer to recognize the ifdef level which is currently in scope. Add the current identifer as a comment behind #else and #endif. Move the code within free_thread_stack() and alloc_thread_stack_node() into the relevant ifdef blocks. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220217102406.3697941-2-bigeasy@linutronix.de
2022-02-22cpuset: Fix kernel-docJiapeng Chong
Fix the following W=1 kernel warnings: kernel/cgroup/cpuset.c:3718: warning: expecting prototype for cpuset_memory_pressure_bump(). Prototype was for __cpuset_memory_pressure_bump() instead. kernel/cgroup/cpuset.c:3568: warning: expecting prototype for cpuset_node_allowed(). Prototype was for __cpuset_node_allowed() instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-22audit: log AUDIT_TIME_* records only from rulesRichard Guy Briggs
AUDIT_TIME_* events are generated when there are syscall rules present that are not related to time keeping. This will produce noisy log entries that could flood the logs and hide events we really care about. Rather than immediately produce the AUDIT_TIME_* records, store the data in the context and log it at syscall exit time respecting the filter rules. Note: This eats the audit_buffer, unlike any others in show_special(). Please see https://bugzilla.redhat.com/show_bug.cgi?id=1991919 Fixes: 7e8eda734d30 ("ntp: Audit NTP parameters adjustment") Fixes: 2d87a0674bd6 ("timekeeping: Audit clock adjustments") Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [PM: fixed style/whitespace issues] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-22cgroup-v1: Correct privileges check in release_agent writesMichal Koutný
The idea is to check: a) the owning user_ns of cgroup_ns, b) capabilities in init_user_ns. The commit 24f600856418 ("cgroup-v1: Require capabilities to set release_agent") got this wrong in the write handler of release_agent since it checked user_ns of the opener (may be different from the owning user_ns of cgroup_ns). Secondly, to avoid possibly confused deputy, the capability of the opener must be checked. Fixes: 24f600856418 ("cgroup-v1: Require capabilities to set release_agent") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/ Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp> Signed-off-by: Tejun Heo <tj@kernel.org>