summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2020-12-14Merge tag 'core-entry-2020-12-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core entry/exit updates from Thomas Gleixner: "A set of updates for entry/exit handling: - More generalization of entry/exit functionality - The consolidation work to reclaim TIF flags on x86 and also for non-x86 specific TIF flags which are solely relevant for syscall related work and have been moved into their own storage space. The x86 specific part had to be merged in to avoid a major conflict. - The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal delivery mode of task work and results in an impressive performance improvement for io_uring. The non-x86 consolidation of this is going to come seperate via Jens. - The selective syscall redirection facility which provides a clean and efficient way to support the non-Linux syscalls of WINE by catching them at syscall entry and redirecting them to the user space emulation. This can be utilized for other purposes as well and has been designed carefully to avoid overhead for the regular fastpath. This includes the core changes and the x86 support code. - Simplification of the context tracking entry/exit handling for the users of the generic entry code which guarantee the proper ordering and protection. - Preparatory changes to make the generic entry code accomodate S390 specific requirements which are mostly related to their syscall restart mechanism" * tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) entry: Add syscall_exit_to_user_mode_work() entry: Add exit_to_user_mode() wrapper entry_Add_enter_from_user_mode_wrapper entry: Rename exit_to_user_mode() entry: Rename enter_from_user_mode() docs: Document Syscall User Dispatch selftests: Add benchmark for syscall user dispatch selftests: Add kselftest for syscall user dispatch entry: Support Syscall User Dispatch on common syscall entry kernel: Implement selective syscall userspace redirection signal: Expose SYS_USER_DISPATCH si_code type x86: vdso: Expose sigreturn address on vdso to the kernel MAINTAINERS: Add entry for common entry code entry: Fix boot for !CONFIG_GENERIC_ENTRY x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs sched: Detect call to schedule from critical entry code context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK x86: Reclaim unused x86 TI flags ...
2020-12-14Merge tag 'fixes-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull misc fixes from Christian Brauner: "This contains several fixes which felt worth being combined into a single branch: - Use put_nsproxy() instead of open-coding it switch_task_namespaces() - Kirill's work to unify lifecycle management for all namespaces. The lifetime counters are used identically for all namespaces types. Namespaces may of course have additional unrelated counters and these are not altered. This work allows us to unify the type of the counters and reduces maintenance cost by moving the counter in one place and indicating that basic lifetime management is identical for all namespaces. - Peilin's fix adding three byte padding to Dmitry's PTRACE_GET_SYSCALL_INFO uapi struct to prevent an info leak. - Two smal patches to convert from the /* fall through */ comment annotation to the fallthrough keyword annotation which I had taken into my branch and into -next before df561f6688fe ("treewide: Use fallthrough pseudo-keyword") made it upstream which fixed this tree-wide. Since I didn't want to invalidate all testing for other commits I didn't rebase and kept them" * tag 'fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: nsproxy: use put_nsproxy() in switch_task_namespaces() sys: Convert to the new fallthrough notation signal: Convert to the new fallthrough notation time: Use generic ns_common::count cgroup: Use generic ns_common::count mnt: Use generic ns_common::count user: Use generic ns_common::count pid: Use generic ns_common::count ipc: Use generic ns_common::count uts: Use generic ns_common::count net: Use generic ns_common::count ns: Add a common refcount into ns_common ptrace: Prevent kernel-infoleak in ptrace_get_syscall_info()
2020-12-14Merge tag 'time-namespace-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull time namespace updates from Christian Brauner: "When time namespaces were introduced we missed to virtualize the 'btime' field in /proc/stat. This confuses tasks which are in another time namespace with a virtualized boottime which is common in some container workloads. This contains Michael's series to fix 'btime' which Thomas asked me to take through my tree. To fix 'btime' virtualization we simply subtract the offset of the time namespace's boottime from btime before printing the stats. Note that since start_boottime of processes are seconds since boottime and the boottime stamp is now shifted according to the time namespace's offset, the offset of the time namespace also needs to be applied before the process stats are given to userspace. This avoids that processes shown by tools such as 'ps' appear as time travelers in the corresponding time namespace. Selftests are included to verify that btime virtualization in /proc/stat works as expected" * tag 'time-namespace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: namespace: make timens_on_fork() return nothing selftests/timens: added selftest for /proc/stat btime fs/proc: apply the time namespace offset to /proc/stat btime timens: additional helper functions for boottime offset handling
2020-12-14Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Expose tag address bits in siginfo. The original arm64 ABI did not expose any of the bits 63:56 of a tagged address in siginfo. In the presence of user ASAN or MTE, this information may be useful. The implementation is generic to other architectures supporting tags (like SPARC ADI, subject to wiring up the arch code). The user will have to opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra bits, if available, become visible in si_addr. - Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the lowest 1GB to cope with the Raspberry Pi 4 limitations, to the detriment of other platforms. With these changes, the kernel scans the Device Tree dma-ranges and the ACPI IORT information before deciding on a smaller ZONE_DMA. - Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building with LTO, there is an increased risk of the compiler converting an address dependency headed by a READ_ONCE() invocation into a control dependency and consequently allowing for harmful reordering by the CPU. - Add CPPC FFH support using arm64 AMU counters. - set_fs() removal on arm64. This renders the User Access Override (UAO) ARMv8 feature unnecessary. - Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs identifier file for SMMUv3, stop event counters support for i.MX8MP, enable the perf events-based hard lockup detector. - Reorganise the kernel VA space slightly so that 52-bit VA configurations can use more virtual address space. - Improve the robustness of the arm64 memory offline event notifier. - Pad the Image header to 64K following the EFI header definition updated recently to increase the section alignment to 64K. - Support CONFIG_CMDLINE_EXTEND on arm64. - Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8 bits for PtrAuth. - Switch to vmapped shadow call stacks. - Miscellaneous clean-ups. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits) perf/imx_ddr: Add system PMU identifier for userspace bindings: perf: imx-ddr: add compatible string arm64: Fix build failure when HARDLOCKUP_DETECTOR_PERF is enabled arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE arm64: mark __system_matches_cap as __maybe_unused arm64: uaccess: remove vestigal UAO support arm64: uaccess: remove redundant PAN toggling arm64: uaccess: remove addr_limit_user_check() arm64: uaccess: remove set_fs() arm64: uaccess cleanup macro naming arm64: uaccess: split user/kernel routines arm64: uaccess: refactor __{get,put}_user arm64: uaccess: simplify __copy_user_flushcache() arm64: uaccess: rename privileged uaccess routines arm64: sdei: explicitly simulate PAN/UAO entry arm64: sdei: move uaccess logic to arch/arm64/ arm64: head.S: always initialize PSTATE arm64: head.S: cleanup SCTLR_ELx initialization arm64: head.S: rename el2_setup -> init_kernel_el arm64: add C wrappers for SET_PSTATE_*() ...
2020-12-14Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-12-14 1) Expose bpf_sk_storage_*() helpers to iterator programs, from Florent Revest. 2) Add AF_XDP selftests based on veth devs to BPF selftests, from Weqaar Janjua. 3) Support for finding BTF based kernel attach targets through libbpf's bpf_program__set_attach_target() API, from Andrii Nakryiko. 4) Permit pointers on stack for helper calls in the verifier, from Yonghong Song. 5) Fix overflows in hash map elem size after rlimit removal, from Eric Dumazet. 6) Get rid of direct invocation of llc in BPF selftests, from Andrew Delgadillo. 7) Fix xsk_recvmsg() to reorder socket state check before access, from Björn Töpel. 8) Add new libbpf API helper to retrieve ring buffer epoll fd, from Brendan Jackman. 9) Batch of minor BPF selftest improvements all over the place, from Florian Lehner, KP Singh, Jiri Olsa and various others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits) selftests/bpf: Add a test for ptr_to_map_value on stack for helper access bpf: Permits pointers on stack for helper calls libbpf: Expose libbpf ring_buffer epoll_fd selftests/bpf: Add set_attach_target() API selftest for module target libbpf: Support modules in bpf_program__set_attach_target() API selftests/bpf: Silence ima_setup.sh when not running in verbose mode. selftests/bpf: Drop the need for LLVM's llc selftests/bpf: fix bpf_testmod.ko recompilation logic samples/bpf: Fix possible hang in xdpsock with multiple threads selftests/bpf: Make selftest compilation work on clang 11 selftests/bpf: Xsk selftests - adding xdpxceiver to .gitignore selftests/bpf: Drop tcp-{client,server}.py from Makefile selftests/bpf: Xsk selftests - Bi-directional Sockets - SKB, DRV selftests/bpf: Xsk selftests - Socket Teardown - SKB, DRV selftests/bpf: Xsk selftests - DRV POLL, NOPOLL selftests/bpf: Xsk selftests - SKB POLL, NOPOLL selftests/bpf: Xsk selftests framework bpf: Only provide bpf_sock_from_file with CONFIG_NET bpf: Return -ENOTSUPP when attaching to non-kernel BTF xsk: Validate socket state in xsk_recvmsg, prior touching socket members ... ==================== Link: https://lore.kernel.org/r/20201214214316.20642-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14rcu-tasks: Move RCU-tasks initialization to before early_initcall()Uladzislau Rezki (Sony)
PowerPC testing encountered boot failures due to RCU Tasks not being fully initialized until core_initcall() time. This commit therefore initializes RCU Tasks (along with Rude RCU and RCU Tasks Trace) just before early_initcall() time, thus allowing waiting on RCU Tasks grace periods from early_initcall() handlers. Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/ Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-12-14bpf: Permits pointers on stack for helper callsYonghong Song
Currently, when checking stack memory accessed by helper calls, for spills, only PTR_TO_BTF_ID and SCALAR_VALUE are allowed. Song discovered an issue where the below bpf program int dump_task(struct bpf_iter__task *ctx) { struct seq_file *seq = ctx->meta->seq; static char[] info = "abc"; BPF_SEQ_PRINTF(seq, "%s\n", info); return 0; } may cause a verifier failure. The verifier output looks like: ; struct seq_file *seq = ctx->meta->seq; 1: (79) r1 = *(u64 *)(r1 +0) ; BPF_SEQ_PRINTF(seq, "%s\n", info); 2: (18) r2 = 0xffff9054400f6000 4: (7b) *(u64 *)(r10 -8) = r2 5: (bf) r4 = r10 ; 6: (07) r4 += -8 ; BPF_SEQ_PRINTF(seq, "%s\n", info); 7: (18) r2 = 0xffff9054400fe000 9: (b4) w3 = 4 10: (b4) w5 = 8 11: (85) call bpf_seq_printf#126 R1_w=ptr_seq_file(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=4,imm=0) R3_w=inv4 R4_w=fp-8 R5_w=inv8 R10=fp0 fp-8_w=map_value last_idx 11 first_idx 0 regs=8 stack=0 before 10: (b4) w5 = 8 regs=8 stack=0 before 9: (b4) w3 = 4 invalid indirect read from stack off -8+0 size 8 Basically, the verifier complains the map_value pointer at "fp-8" location. To fix the issue, if env->allow_ptr_leaks is true, let us also permit pointers on the stack to be accessible by the helper. Reported-by: Song Liu <songliubraving@fb.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201210013349.943719-1-yhs@fb.com
2020-12-14Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add speed testing on 1420-byte blocks for networking Algorithms: - Improve performance of chacha on ARM for network packets - Improve performance of aegis128 on ARM for network packets Drivers: - Add support for Keem Bay OCS AES/SM4 - Add support for QAT 4xxx devices - Enable crypto-engine retry mechanism in caam - Enable support for crypto engine on sdm845 in qce - Add HiSilicon PRNG driver support" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (161 commits) crypto: qat - add capability detection logic in qat_4xxx crypto: qat - add AES-XTS support for QAT GEN4 devices crypto: qat - add AES-CTR support for QAT GEN4 devices crypto: atmel-i2c - select CONFIG_BITREVERSE crypto: hisilicon/trng - replace atomic_add_return() crypto: keembay - Add support for Keem Bay OCS AES/SM4 dt-bindings: Add Keem Bay OCS AES bindings crypto: aegis128 - avoid spurious references crypto_aegis128_update_simd crypto: seed - remove trailing semicolon in macro definition crypto: x86/poly1305 - Use TEST %reg,%reg instead of CMP $0,%reg crypto: x86/sha512 - Use TEST %reg,%reg instead of CMP $0,%reg crypto: aesni - Use TEST %reg,%reg instead of CMP $0,%reg crypto: cpt - Fix sparse warnings in cptpf hwrng: ks-sa - Add dependency on IOMEM and OF crypto: lib/blake2s - Move selftest prototype into header file crypto: arm/aes-ce - work around Cortex-A57/A72 silion errata crypto: ecdh - avoid unaligned accesses in ecdh_set_secret() crypto: ccree - rework cache parameters handling crypto: cavium - Use dma_set_mask_and_coherent to simplify code crypto: marvell/octeontx - Use dma_set_mask_and_coherent to simplify code ...
2020-12-14Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq updates for 5.11-rc1 from Viresh Kumar: "This contains the following updates: - Fix imx's NVMEM_IMX_OCOTP dependency (Arnd Bergmann). - Add support for mt8167 and blacklist mt8516 (Fabien Parent). - Some ->get() callback related cleanups to the tegra194 driver and some optimizations in tegra186 driver (Jon Hunter and Sumit Gupta). - Power scale improvements to arm_scmi driver (Lukasz Luba). - Add missing MODULE_DEVICE_TABLE and MODULE_ALIAS to several drivers (Pali Rohár). - Fix error path in mediatek driver (Qinglang Miao). - Fix memleak in ST's cpufreq driver (Yangtao Li)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (22 commits) cpufreq: arm_scmi: Discover the power scale in performance protocol firmware: arm_scmi: Add power_scale_mw_get() interface cpufreq: tegra194: Rename tegra194_get_speed_common function cpufreq: tegra194: Remove unnecessary frequency calculation cpufreq: tegra186: Simplify cluster information lookup cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning cpufreq: imx: fix NVMEM_IMX_OCOTP dependency cpufreq: vexpress-spc: Add missing MODULE_ALIAS cpufreq: scpi: Add missing MODULE_ALIAS cpufreq: loongson1: Add missing MODULE_ALIAS cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE cpufreq: st: Add missing MODULE_DEVICE_TABLE cpufreq: qcom: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE cpufreq: highbank: Add missing MODULE_DEVICE_TABLE cpufreq: ap806: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: add missing platform_driver_unregister() on error in mtk_cpufreq_driver_init cpufreq: tegra194: get consistent cpuinfo_cur_freq cpufreq: blacklist mt8516 in cpufreq-dt-platdev cpufreq: mediatek: Add support for mt8167 ...
2020-12-14Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"Steven Rostedt (VMware)
It was believed that metag was the only architecture that required the ring buffer to keep 8 byte words aligned on 8 byte architectures, and with its removal, it was assumed that the ring buffer code did not need to handle this case. It appears that sparc64 also requires this. The following was reported on a sparc64 boot up: kernel: futex hash table entries: 65536 (order: 9, 4194304 bytes, linear) kernel: Running postponed tracer tests: kernel: Testing tracer function: kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140 kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140 kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: PASSED Need to put back the 64BIT aligned code for the ring buffer. Link: https://lore.kernel.org/r/CADxRZqzXQRYgKc=y-KV=S_yHL+Y8Ay2mh5ezeZUnpRvg+syWKw@mail.gmail.com Cc: stable@vger.kernel.org Fixes: 86b3de60a0b6 ("ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS") Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-12-14ring-buffer: Add rb_check_bpage in __rb_allocate_pagesQiujun Huang
It may be better to check each page is aligned by 4 bytes. The 2 least significant bits of the address will be used as flags. Link: https://lkml.kernel.org/r/20201015113842.2921-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-12-14ring-buffer: Fix two typos in commentsQiujun Huang
s/inerrupting/interrupting/ s/beween/between/ Link: https://lkml.kernel.org/r/20201014152749.29986-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-12-14tracing: Drop unneeded assignment in ring_buffer_resize()Lukas Bulwahn
Since commit 0a1754b2a97e ("ring-buffer: Return 0 on success from ring_buffer_resize()"), computing the size is not needed anymore. Drop unneeded assignment in ring_buffer_resize(). Link: https://lkml.kernel.org/r/20201214084503.3079-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-12-14tracing: Disable ftrace selftests when any tracer is runningMasami Hiramatsu
Disable ftrace selftests when any tracer (kernel command line options like ftrace=, trace_events=, kprobe_events=, and boot-time tracing) starts running because selftest can disturb it. Currently ftrace= and trace_events= are checked, but kprobe_events has a different flag, and boot-time tracing didn't checked. This unifies the disabled flag and all of those boot-time tracing features sets the flag. This also fixes warnings on kprobe-event selftest (CONFIG_FTRACE_STARTUP_TEST=y and CONFIG_KPROBE_EVENTS=y) with boot-time tracing (ftrace.event.kprobes.EVENT.probes) like below; [ 59.803496] trace_kprobe: Testing kprobe tracing: [ 59.804258] ------------[ cut here ]------------ [ 59.805682] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:1987 kprobe_trace_self_tests_ib [ 59.806944] Modules linked in: [ 59.807335] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc7+ #172 [ 59.808029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/204 [ 59.808999] RIP: 0010:kprobe_trace_self_tests_init+0x5f/0x42b [ 59.809696] Code: e8 03 00 00 48 c7 c7 30 8e 07 82 e8 6d 3c 46 ff 48 c7 c6 00 b2 1a 81 48 c7 c7 7 [ 59.812439] RSP: 0018:ffffc90000013e78 EFLAGS: 00010282 [ 59.813038] RAX: 00000000ffffffef RBX: 0000000000000000 RCX: 0000000000049443 [ 59.813780] RDX: 0000000000049403 RSI: 0000000000049403 RDI: 000000000002deb0 [ 59.814589] RBP: ffffc90000013e90 R08: 0000000000000001 R09: 0000000000000001 [ 59.815349] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffef [ 59.816138] R13: ffff888004613d80 R14: ffffffff82696940 R15: ffff888004429138 [ 59.816877] FS: 0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000 [ 59.817772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.818395] CR2: 0000000001a8dd38 CR3: 0000000002222000 CR4: 00000000000006a0 [ 59.819144] Call Trace: [ 59.819469] ? init_kprobe_trace+0x6b/0x6b [ 59.819948] do_one_initcall+0x5f/0x300 [ 59.820392] ? rcu_read_lock_sched_held+0x4f/0x80 [ 59.820916] kernel_init_freeable+0x22a/0x271 [ 59.821416] ? rest_init+0x241/0x241 [ 59.821841] kernel_init+0xe/0x10f [ 59.822251] ret_from_fork+0x22/0x30 [ 59.822683] irq event stamp: 16403349 [ 59.823121] hardirqs last enabled at (16403359): [<ffffffff810db81e>] console_unlock+0x48e/0x580 [ 59.824074] hardirqs last disabled at (16403368): [<ffffffff810db786>] console_unlock+0x3f6/0x580 [ 59.825036] softirqs last enabled at (16403200): [<ffffffff81c0033a>] __do_softirq+0x33a/0x484 [ 59.825982] softirqs last disabled at (16403087): [<ffffffff81a00f02>] asm_call_irq_on_stack+0x10 [ 59.827034] ---[ end trace 200c544775cdfeb3 ]--- [ 59.827635] trace_kprobe: error on probing function entry. Link: https://lkml.kernel.org/r/160741764955.3448999.3347769358299456915.stgit@devnote2 Fixes: 4d655281eb1b ("tracing/boot Add kprobe event support") Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-12-14Merge branch 'for-5.11' into for-linusPetr Mladek
2020-12-14Merge branch 'for-5.11-null-console' into for-linusPetr Mladek
2020-12-13Merge tag 'x86-urgent-2020-12-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 and membarrier fixes: - Correct a few problems in the x86 and the generic membarrier implementation. Small corrections for assumptions about visibility which have turned out not to be true. - Make the PAT bits for memory encryption correct vs 4K and 2M/1G page table entries as they are at a different location. - Fix a concurrency issue in the the local bandwidth readout of resource control leading to incorrect values - Fix the ordering of allocating a vector for an interrupt. The order missed to respect the provided cpumask when the first attempt of allocating node local in the mask fails. It then tries the node instead of trying the full provided mask first. This leads to erroneous error messages and breaking the (user) supplied affinity request. Reorder it. - Make the INT3 padding detection in optprobe work correctly" * tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Fix optprobe to detect INT3 padding correctly x86/apic/vector: Fix ordering in vector assignment x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP membarrier: Execute SYNC_CORE on the calling thread membarrier: Explicitly sync remote cores when SYNC_CORE is requested membarrier: Add an actual barrier before rseq_preempt() x86/membarrier: Get rid of a dubious optimization
2020-12-12kernel: remove checking for TIF_NOTIFY_SIGNALJens Axboe
It's available everywhere now, no need to check or add dummy defines. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-12signal: kill JOBCTL_TASK_WORKJens Axboe
It's no longer used, get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-12task_work: remove legacy TWA_SIGNAL pathJens Axboe
All archs now support TIF_NOTIFY_SIGNAL. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11tick/sched: Make jiffies update quick check more robustThomas Gleixner
The quick check in tick_do_update_jiffies64() whether jiffies need to be updated is not really correct under all circumstances and on all architectures, especially not on 32bit systems. The quick check does: if (now < READ_ONCE(tick_next_period)) return; and the counterpart in the update is: WRITE_ONCE(tick_next_period, next_update_time); This has two problems: 1) On weakly ordered architectures there is no guarantee that the stores before the WRITE_ONCE() are visible which means that other CPUs can operate on a stale jiffies value. 2) On 32bit the store of tick_next_period which is an u64 is split into two 32bit stores. If the first 32bit store advances tick_next_period far out and the second 32bit store is delayed (virt, NMI ...) then jiffies will become stale until the second 32bit store happens. Address this by seperating the handling for 32bit and 64bit. On 64bit problem #1 is addressed by replacing READ_ONCE() / WRITE_ONCE() with smp_load_acquire() / smp_store_release(). On 32bit problem #2 is addressed by protecting the quick check with the jiffies sequence counter. The load and stores can be plain because the sequence count mechanics provides the required barriers already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/87czzpc02w.fsf@nanos.tec.linutronix.de
2020-12-11bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpersAndrii Nakryiko
Remove bpf_ prefix, which causes these helpers to be reported in verifier dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets fix it as long as it is still possible before UAPI freezes on these helpers. Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11elfcore: fix building with clangArnd Bergmann
kernel/elfcore.c only contains weak symbols, which triggers a bug with clang in combination with recordmcount: Cannot find symbol for section 2: .text. kernel/elfcore.o: failed Move the empty stubs into linux/elfcore.h as inline functions. As only two architectures use these, just use the architecture specific Kconfig symbols to key off the declaration. Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Barret Rhoden <brho@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-11x86,swiotlb: Adjust SWIOTLB bounce buffer size for SEV guestsAshish Kalra
For SEV, all DMA to and from guest has to use shared (un-encrypted) pages. SEV uses SWIOTLB to make this happen without requiring changes to device drivers. However, depending on the workload being run, the default 64MB of it might not be enough and it may run out of buffers to use for DMA, resulting in I/O errors and/or performance degradation for high I/O workloads. Adjust the default size of SWIOTLB for SEV guests using a percentage of the total memory available to guest for the SWIOTLB buffers. Adds a new sev_setup_arch() function which is invoked from setup_arch() and it calls into a new swiotlb generic code function swiotlb_adjust_size() to do the SWIOTLB buffer adjustment. v5 fixed build errors and warnings as Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Co-developed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2020-12-11cpufreq: schedutil: Simplify sugov_update_next_freq()Rafael J. Wysocki
Rearrange a conditional to make it more straightforward. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-12-11genirq/affinity: Add irq_update_affinity_desc()John Garry
Add a function to allow the affinity of an interrupt be switched to managed, such that interrupts allocated for platform devices may be managed. This new interface has certain limitations, and attempts to use it in the following circumstances will fail: - For when the kernel is configured for generic IRQ reservation mode (in config GENERIC_IRQ_RESERVATION_MODE). The reason being that it could conflict with managed vs. non-managed interrupt accounting. - The interrupt is already started, which should not be the case during init - The interrupt is already configured as managed, which means double init Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1606905417-183214-2-git-send-email-john.garry@huawei.com
2020-12-11Revert "genirq: Add fasteoi IPI flow"Valentin Schneider
handle_percpu_devid_fasteoi_ipi() has no more users, and handle_percpu_devid_irq() can do all that it was supposed to do. Get rid of it. This reverts commit c5e5ec033c4ab25c53f1fd217849e75deb0bf7bf. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201109094121.29975-6-valentin.schneider@arm.com
2020-12-11ntp: Consolidate the RTC update implementationThomas Gleixner
The code for the legacy RTC and the RTC class based update are pretty much the same. Consolidate the common parts into one function and just invoke the actual setter functions. For RTC class based devices the update code checks whether the offset is valid for the device, which is usually not the case for the first invocation. If it's not the same it stores the correct offset and lets the caller try again. That's not much different from the previous approach where the first invocation had a pretty low probability to actually hit the allowed window. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.355743355@linutronix.de
2020-12-11ntp: Make the RTC sync offset less obscureThomas Gleixner
The current RTC set_offset_nsec value is not really intuitive to understand. tsched twrite(t2.tv_sec - 1) t2 (seconds increment) The offset is calculated from twrite based on the assumption that t2 - twrite == 1s. That means for the MC146818 RTC the offset needs to be negative so that the write happens 500ms before t2. It's easier to understand when the whole calculation is based on t2. That avoids negative offsets and the meaning is obvious: t2 - twrite: The time defined by the chip when seconds increment after the write. twrite - tsched: The time for the transport to the point where the chip is updated. ==> set_offset_nsec = t2 - tsched ttransport = twrite - tsched tRTCinc = t2 - twrite ==> set_offset_nsec = ttransport + tRTCinc tRTCinc is a chip property and can be obtained from the data sheet. ttransport depends on how the RTC is connected. It is close to 0 for directly accessible RTCs. For RTCs behind a slow bus, e.g. i2c, it's the time required to send the update over the bus. This can be estimated or even calibrated, but that's a different problem. Adjust the implementation and update comments accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.263204937@linutronix.de
2020-12-11ntp, rtc: Move rtc_set_ntp_time() to ntp codeThomas Gleixner
rtc_set_ntp_time() is not really RTC functionality as the code is just a user of RTC. Move it into the NTP code which allows further cleanups. Requested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.166871172@linutronix.de
2020-12-11ntp: Make the RTC synchronization more reliableThomas Gleixner
Miroslav reported that the periodic RTC synchronization in the NTP code fails more often than not to hit the specified update window. The reason is that the code uses delayed_work to schedule the update which needs to be in thread context as the underlying RTC might be connected via a slow bus, e.g. I2C. In the update function it verifies whether the current time is correct vs. the requirements of the underlying RTC. But delayed_work is using the timer wheel for scheduling which is inaccurate by design. Depending on the distance to the expiry the wheel gets less granular to allow batching and to avoid the cascading of the original timer wheel. See 500462a9de65 ("timers: Switch to a non-cascading wheel") and the code for further details. The code already deals with this by splitting the 660 seconds period into a long 659 seconds timer and then retrying with a smaller delta. But looking at the actual granularities of the timer wheel (which depend on the HZ configuration) the 659 seconds timer ends up in an outer wheel level and is affected by a worst case granularity of: HZ Granularity 1000 32s 250 16s 100 40s So the initial timer can be already off by max 12.5% which is not a big issue as the period of the sync is defined as ~11 minutes. The fine grained second attempt schedules to the desired update point with a timer expiring less than a second from now. Depending on the actual delta and the HZ setting even the second attempt can end up in outer wheel levels which have a large enough granularity to make the correctness check fail. As this is a fundamental property of the timer wheel there is no way to make this more accurate short of iterating in one jiffies steps towards the update point. Switch it to an hrtimer instead which schedules the actual update work. The hrtimer will expire precisely (max 1 jiffie delay when high resolution timers are not available). The actual scheduling delay of the work is the same as before. The update is triggered from do_adjtimex() which is a bit racy but not much more racy than it was before: if (ntp_synced()) queue_delayed_work(system_power_efficient_wq, &sync_work, 0); which is racy when the work is currently executed and has not managed to reschedule itself. This becomes now: if (ntp_synced() && !hrtimer_is_queued(&sync_hrtimer)) queue_work(system_power_efficient_wq, &sync_work, 0); which is racy when the hrtimer has expired and the work is currently executed and has not yet managed to rearm the hrtimer. Not a big problem as it just schedules work for nothing. The new implementation has a safe guard in place to catch the case where the hrtimer is queued on entry to the work function and avoids an extra update attempt of the RTC that way. Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Miroslav Lichvar <mlichvar@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201206220542.062910520@linutronix.de
2020-12-11sched/fair: Trivial correction of the newidle_balance() commentBarry Song
idle_balance() has been renamed to newidle_balance(). To differentiate with nohz_idle_balance, it seems refining the comment will be helpful for the readers of the code. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20201202220641.22752-1-song.bao.hua@hisilicon.com
2020-12-11sched/fair: Clear SMT siblings after determining the core is not idleMel Gorman
The clearing of SMT siblings from the SIS mask before checking for an idle core is a small but unnecessary cost. Defer the clearing of the siblings until the scan moves to the next potential target. The cost of this was not measured as it is borderline noise but it should be self-evident. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20201130144020.GS3371@techsingularity.net
2020-12-11sched: Fix kernel-doc markupMauro Carvalho Chehab
Kernel-doc requires that a kernel-doc markup to be immediately below the function prototype, as otherwise it will rename it. So, move sys_sched_yield() markup to the right place. Also fix the cpu_util() markup: Kernel-doc markups should use this format: identifier - description Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/50cd6f460aeb872ebe518a8e9cfffda2df8bdb0a.1606823973.git.mchehab+huawei@kernel.org
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) IPsec compat fixes, from Dmitry Safonov. 2) Fix memory leak in xfrm_user_policy(). Fix from Yu Kuai. 3) Fix polling in xsk sockets by using sk_poll_wait() instead of datagram_poll() which keys off of sk_wmem_alloc and such which xsk sockets do not update. From Xuan Zhuo. 4) Missing init of rekey_data in cfgh80211, from Sara Sharon. 5) Fix destroy of timer before init, from Davide Caratti. 6) Missing CRYPTO_CRC32 selects in ethernet driver Kconfigs, from Arnd Bergmann. 7) Missing error return in rtm_to_fib_config() switch case, from Zhang Changzhong. 8) Fix some src/dest address handling in vrf and add a testcase. From Stephen Suryaputra. 9) Fix multicast handling in Seville switches driven by mscc-ocelot driver. From Vladimir Oltean. 10) Fix proto value passed to skb delivery demux in udp, from Xin Long. 11) HW pkt counters not reported correctly in enetc driver, from Claudiu Manoil. 12) Fix deadlock in bridge, from Joseph Huang. 13) Missing of_node_pur() in dpaa2 driver, fromn Christophe JAILLET. 14) Fix pid fetching in bpftool when there are a lot of results, from Andrii Nakryiko. 15) Fix long timeouts in nft_dynset, from Pablo Neira Ayuso. 16) Various stymmac fixes, from Fugang Duan. 17) Fix null deref in tipc, from Cengiz Can. 18) When mss is biog, coose more resonable rcvq_space in tcp, fromn Eric Dumazet. 19) Revert a geneve change that likely isnt necessary, from Jakub Kicinski. 20) Avoid premature rx buffer reuse in various Intel driversm from Björn Töpel. 21) retain EcT bits during TIS reflection in tcp, from Wei Wang. 22) Fix Tso deferral wrt. cwnd limiting in tcp, from Neal Cardwell. 23) MPLS_OPT_LSE_LABEL attribute is 342 ot 8 bits, from Guillaume Nault 24) Fix propagation of 32-bit signed bounds in bpf verifier and add test cases, from Alexei Starovoitov. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) selftests: fix poll error in udpgro.sh selftests/bpf: Fix "dubious pointer arithmetic" test selftests/bpf: Fix array access with signed variable test selftests/bpf: Add test for signed 32-bit bound check bug bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds. MAINTAINERS: Add entry for Marvell Prestera Ethernet Switch driver net: sched: Fix dump of MPLS_OPT_LSE_LABEL attribute in cls_flower net/mlx4_en: Handle TX error CQE net/mlx4_en: Avoid scheduling restart task if it is already running tcp: fix cwnd-limited bug for TSO deferral where we send nothing net: flow_offload: Fix memory leak for indirect flow block tcp: Retain ECT bits for tos reflection ethtool: fix stack overflow in ethnl_parse_bitset() e1000e: fix S0ix flow to allow S0i3.2 subset entry ice: avoid premature Rx buffer reuse ixgbe: avoid premature Rx buffer reuse i40e: avoid premature Rx buffer reuse igb: avoid transmit queue timeout in xdp path igb: use xdp_do_flush igb: skb add metasize for xdp ...
2020-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-12-10 The following pull-request contains BPF updates for your *net* tree. We've added 21 non-merge commits during the last 12 day(s) which contain a total of 21 files changed, 163 insertions(+), 88 deletions(-). The main changes are: 1) Fix propagation of 32-bit signed bounds from 64-bit bounds, from Alexei. 2) Fix ring_buffer__poll() return value, from Andrii. 3) Fix race in lwt_bpf, from Cong. 4) Fix test_offload, from Toke. 5) Various xsk fixes. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Cong Wang, Hulk Robot, Jakub Kicinski, Jean-Philippe Brucker, John Fastabend, Magnus Karlsson, Maxim Mikityanskiy, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-10bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds.Alexei Starovoitov
The 64-bit signed bounds should not affect 32-bit signed bounds unless the verifier knows that upper 32-bits are either all 1s or all 0s. For example the register with smin_value==1 doesn't mean that s32_min_value is also equal to 1, since smax_value could be larger than 32-bit subregister can hold. The verifier refines the smax/s32_max return value from certain helpers in do_refine_retval_range(). Teach the verifier to recognize that smin/s32_min value is also bounded. When both smin and smax bounds fit into 32-bit subregister the verifier can propagate those bounds. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-12-10exec: Transform exec_update_mutex into a rw_semaphoreEric W. Biederman
Recently syzbot reported[0] that there is a deadlock amongst the users of exec_update_mutex. The problematic lock ordering found by lockdep was: perf_event_open (exec_update_mutex -> ovl_i_mutex) chown (ovl_i_mutex -> sb_writes) sendfile (sb_writes -> p->lock) by reading from a proc file and writing to overlayfs proc_pid_syscall (p->lock -> exec_update_mutex) While looking at possible solutions it occured to me that all of the users and possible users involved only wanted to state of the given process to remain the same. They are all readers. The only writer is exec. There is no reason for readers to block on each other. So fix this deadlock by transforming exec_update_mutex into a rw_semaphore named exec_update_lock that only exec takes for writing. Cc: Jann Horn <jannh@google.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christopher Yeoh <cyeoh@au1.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Fixes: eea9673250db ("exec: Add exec_update_mutex to replace cred_guard_mutex") [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-10bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcuEric W. Biederman
When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance. Using task_lookup_next_fd_rcu simplifies task_file_seq_get_next, by moving the checking for the maximum file descritor into the generic code, and by remvoing the need for capturing and releasing a reference on files_struct. As the reference count of files_struct no longer needs to be maintained bpf_iter_seq_task_file_info can have it's files member removed and task_file_seq_get_next no longer needs it's fstruct argument. The curr_fd local variable does need to become unsigned to be used with fnext_task. As curr_fd is assigned from and assigned a u32 making curr_fd an unsigned int won't cause problems and might prevent them. [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov <oleg@redhat.com> v1: https://lkml.kernel.org/r/20200817220425.9389-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-16-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-10kcmp: In get_file_raw_ptr use task_lookup_fd_rcuEric W. Biederman
Modify get_file_raw_ptr to use task_lookup_fd_rcu. The helper task_lookup_fd_rcu does the work of taking the task lock and verifying that task->files != NULL and then calls files_lookup_fd_rcu. So let use the helper to make a simpler implementation of get_file_raw_ptr. Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Link: https://lkml.kernel.org/r/20201120231441.29911-13-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-10file: Replace fcheck_files with files_lookup_fd_rcuEric W. Biederman
This change renames fcheck_files to files_lookup_fd_rcu. All of the remaining callers take the rcu_read_lock before calling this function so the _rcu suffix is appropriate. This change also tightens up the debug check to verify that all callers hold the rcu_read_lock. All callers that used to call files_check with the files->file_lock held have now been changed to call files_lookup_fd_locked. This change of name has helped remind me of which locks and which guarantees are in place helping me to catch bugs later in the patchset. The need for better names became apparent in the last round of discussion of this set of changes[1]. [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-9-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-10bpf: In bpf_task_fd_query use fget_taskEric W. Biederman
Use the helper fget_task to simplify bpf_task_fd_query. As well as simplifying the code this removes one unnecessary increment of struct files_struct. This unnecessary increment of files_struct.count can result in exec unnecessarily unsharing files_struct and breaking posix locks, and it can result in fget_light having to fallback to fget reducing performance. This simplification comes from the observation that none of the callers of get_files_struct actually need to call get_files_struct that was made when discussing[1] exec and posix file locks. [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov <oleg@redhat.com> v1: https://lkml.kernel.org/r/20200817220425.9389-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-5-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-10kcmp: In kcmp_epoll_target use fget_taskEric W. Biederman
Use the helper fget_task and simplify the code. As well as simplifying the code this removes one unnecessary increment of struct files_struct. This unnecessary increment of files_struct.count can result in exec unnecessarily unsharing files_struct and breaking posix locks, and it can result in fget_light having to fallback to fget reducing performance. Suggested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> v1: https://lkml.kernel.org/r/20200817220425.9389-4-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-4-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-10exec: Simplify unshare_filesEric W. Biederman
Now that exec no longer needs to return the unshared files to their previous value there is no reason to return displaced. Instead when unshare_fd creates a copy of the file table, call put_files_struct before returning from unshare_files. Acked-by: Christian Brauner <christian.brauner@ubuntu.com> v1: https://lkml.kernel.org/r/20200817220425.9389-2-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-2-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-12-09Input: gtco - remove driverDmitry Torokhov
The driver has its own HID descriptor parsing code, that had and still has several issues discovered by syzbot and other tools. Ideally we should move the driver over to the HID subsystem, so that it uses proven parsing code. However the devices in question are EOL, and GTCO is not willing to extend resources for that, so let's simply remove the driver. Note that our HID support has greatly improved over the last 10 years, we may also consider reverting 6f8d9e26e7de ("hid-core.c: Adds all GTCO CalComp Digitizers and InterWrite School Products to blacklist") and see if GTCO devices actually work with normal HID drivers. Link: https://lore.kernel.org/r/X8wbBtO5KidME17K@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-12-09driver core: Add fwnode_init()Saravana Kannan
There are multiple locations in the kernel where a struct fwnode_handle is initialized. Add fwnode_init() so that we have one way of initializing a fwnode_handle. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20201121020232.908850-8-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-09Merge remote-tracking branch 'arm64/for-next/fixes' into for-next/coreCatalin Marinas
* arm64/for-next/fixes: (26 commits) arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE arm64: mte: Fix typo in macro definition arm64: entry: fix EL1 debug transitions arm64: entry: fix NMI {user, kernel}->kernel transitions arm64: entry: fix non-NMI kernel<->kernel transitions arm64: ptrace: prepare for EL1 irq/rcu tracking arm64: entry: fix non-NMI user<->kernel transitions arm64: entry: move el1 irq/nmi logic to C arm64: entry: prepare ret_to_user for function call arm64: entry: move enter_from_user_mode to entry-common.c arm64: entry: mark entry code as noinstr arm64: mark idle code as noinstr arm64: syscall: exit userspace before unmasking exceptions arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect() arm64: pgtable: Fix pte_accessible() ACPI/IORT: Fix doc warnings in iort.c arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist ... # Conflicts: # arch/arm64/include/asm/exception.h # arch/arm64/kernel/sdei.c
2020-12-09Merge remote-tracking branch 'arm64/for-next/scs' into for-next/coreCatalin Marinas
* arm64/for-next/scs: arm64: sdei: Push IS_ENABLED() checks down to callee functions arm64: scs: use vmapped IRQ and SDEI shadow stacks scs: switch to vmapped shadow stacks
2020-12-09perf: Break deadlock involving exec_update_mutexpeterz@infradead.org
Syzbot reported a lock inversion involving perf. The sore point being perf holding exec_update_mutex() for a very long time, specifically across a whole bunch of filesystem ops in pmu::event_init() (uprobes) and anon_inode_getfile(). This then inverts against procfs code trying to take exec_update_mutex. Move the permission checks later, such that we need to hold the mutex over less code. Reported-by: syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>