Age | Commit message (Collapse) | Author |
|
It is possible that on error pg->size can be zero when getting its order,
which would return a -1 value. It is dangerous to pass in an order of -1
to free_pages(). Check if order is greater than or equal to zero before
calling free_pages().
Link: https://lore.kernel.org/lkml/20210330093916.432697c7@gandalf.local.home/
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
The custom devres structure manages only a single pointer which can
can be achieved by using devm_add_action_or_reset() as well which
makes the code simpler.
[ tglx: Fixed return value handling - found by smatch ]
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210301142659.8971-1-brgl@bgdev.pl
|
|
The struct clocksource callbacks enable() and disable() are described as a
way to allow clock sources to enter a power save mode. See commit
4614e6adafa2 ("clocksource: add enable() and disable() callbacks")
But using runtime PM from these callbacks triggers a cyclic lockdep warning when
switching clock source using change_clocksource().
# echo e60f0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource
======================================================
WARNING: possible circular locking dependency detected
------------------------------------------------------
migration/0/11 is trying to acquire lock:
ffff0000403ed220 (&dev->power.lock){-...}-{2:2}, at: __pm_runtime_resume+0x40/0x74
but task is already holding lock:
ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (tk_core.seq.seqcount){----}-{0:0}:
ktime_get+0x28/0xa0
hrtimer_start_range_ns+0x210/0x2dc
generic_sched_clock_init+0x70/0x88
sched_clock_init+0x40/0x64
start_kernel+0x494/0x524
-> #1 (hrtimer_bases.lock){-.-.}-{2:2}:
hrtimer_start_range_ns+0x68/0x2dc
rpm_suspend+0x308/0x5dc
rpm_idle+0xc4/0x2a4
pm_runtime_work+0x98/0xc0
process_one_work+0x294/0x6f0
worker_thread+0x70/0x45c
kthread+0x154/0x160
ret_from_fork+0x10/0x20
-> #0 (&dev->power.lock){-...}-{2:2}:
_raw_spin_lock_irqsave+0x7c/0xc4
__pm_runtime_resume+0x40/0x74
sh_cmt_start+0x1c4/0x260
sh_cmt_clocksource_enable+0x28/0x50
change_clocksource+0x9c/0x160
multi_cpu_stop+0xa4/0x190
cpu_stopper_thread+0x90/0x154
smpboot_thread_fn+0x244/0x270
kthread+0x154/0x160
ret_from_fork+0x10/0x20
other info that might help us debug this:
Chain exists of:
&dev->power.lock --> hrtimer_bases.lock --> tk_core.seq.seqcount
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(tk_core.seq.seqcount);
lock(hrtimer_bases.lock);
lock(tk_core.seq.seqcount);
lock(&dev->power.lock);
*** DEADLOCK ***
2 locks held by migration/0/11:
#0: ffff8000113c9278 (timekeeper_lock){-.-.}-{2:2}, at: change_clocksource+0x2c/0x160
#1: ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190
Rework change_clocksource() so it enables the new clocksource and disables
the old clocksource outside of the timekeeper_lock and seqcount write held
region. There is no requirement that these callbacks are invoked from the
lock held region.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20210211134318.323910-1-niklas.soderlund+renesas@ragnatech.se
|
|
Pull io_uring fixes from Jens Axboe:
- Use thread info versions of flag testing, as discussed last week.
- The series enabling PF_IO_WORKER to just take signals, instead of
needing to special case that they do not in a bunch of places. Ends
up being pretty trivial to do, and then we can revert all the special
casing we're currently doing.
- Kill dead pointer assignment
- Fix hashed part of async work queue trace
- Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS
- Fix a link completion ordering regression in this merge window
- Cancellation fixes
* tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
io_uring: remove unsued assignment to pointer io
io_uring: don't cancel extra on files match
io_uring: don't cancel-track common timeouts
io_uring: do post-completion chore on t-out cancel
io_uring: fix timeout cancel return code
Revert "signal: don't allow STOP on PF_IO_WORKER threads"
Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
kernel: stop masking signals in create_io_thread()
io_uring: handle signals for IO threads like a normal thread
kernel: don't call do_exit() for PF_IO_WORKER threads
io_uring: maintain CQE order of a failed link
io-wq: fix race around pending work on teardown
io_uring: do ctx sqd ejection in a clear context
io_uring: fix provide_buffers sign extension
io_uring: don't skip file_end_write() on reissue
io_uring: correct io_queue_async_work() traces
io_uring: don't use {test,clear}_tsk_thread_flag() for current
|
|
This reverts commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597.
The IO threads allow and handle SIGSTOP now, so don't special case them
anymore in task_set_jobctl_pending().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This reverts commit 15b2219facadec583c24523eed40fa45865f859f.
Before IO threads accepted signals, the freezer using take signals to wake
up an IO thread would cause them to loop without any way to clear the
pending signal. That is no longer the case, so stop special casing
PF_IO_WORKER in the freezer.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.
The IO threads do allow signals now, including SIGSTOP, and we can allow
ptrace attach. Attaching won't reveal anything interesting for the IO
threads, but it will allow eg gdb to attach to a task with io_urings
and IO threads without complaining. And once attached, it will allow
the usual introspection into regular threads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5.
IO threads now take signals just fine, so there's no reason to limit them
specifically. Revert the change that prevented that from happening.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is racy - move the blocking into when the task is created and
we're marking it as PF_IO_WORKER anyway. The IO threads are now
prepared to handle signals like SIGSTOP as well, so clear that from
the mask to allow proper stopping of IO threads.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently module can be unloaded even if there's a trampoline
register in it. It's easily reproduced by running in parallel:
# while :; do ./test_progs -t module_attach; done
# while :; do rmmod bpf_testmod; sleep 0.5; done
Taking the module reference in case the trampoline's ip is
within the module code. Releasing it when the trampoline's
ip is unregistered.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210326105900.151466-1-jolsa@kernel.org
|
|
Right now we're never calling get_signal() from PF_IO_WORKER threads, but
in preparation for doing so, don't handle a fatal signal for them. The
workers have state they need to cleanup when exiting, so just return
instead of calling do_exit() on their behalf. The threads themselves will
detect a fatal signal and do proper shutdown.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix an issue related to device links in the runtime PM framework
and debugfs usage in the Energy Model code.
Specifics:
- Modify the runtime PM device suspend to avoid suspending supplier
devices before the consumer device's status changes to
RPM_SUSPENDED (Rafael Wysocki)
- Change the Energy Model code to prevent it from attempting to
create its main debugfs directory too early (Lukasz Luba)"
* tag 'pm-5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: EM: postpone creating the debugfs dir till fs_initcall
PM: runtime: Defer suspending suppliers
|
|
The name string for BPF_XOR is "xor", not "or". Fix it.
Fixes: 981f94c3e921 ("bpf: Add bitwise atomic instructions")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/bpf/20210325134141.8533-1-xukuohai@huawei.com
|
|
With the introduction of the struct_ops program type, it became possible to
implement kernel functionality in BPF, making it viable to use BPF in place
of a regular kernel module for these particular operations.
Thus far, the only user of this mechanism is for implementing TCP
congestion control algorithms. These are clearly marked as GPL-only when
implemented as modules (as seen by the use of EXPORT_SYMBOL_GPL for
tcp_register_congestion_control()), so it seems like an oversight that this
was not carried over to BPF implementations. Since this is the only user
of the struct_ops mechanism, just enforcing GPL-only for the struct_ops
program type seems like the simplest way to fix this.
Fixes: 0baf26b0fcd7 ("bpf: tcp: Support tcp_congestion_ops in bpf")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210326100314.121853-1-toke@redhat.com
|
|
Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: mm (hugetlb, kasan, gup,
selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64,
gcov, and mailmap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mailmap: update Andrey Konovalov's email address
mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
mm: memblock: fix section mismatch warning again
kfence: make compatible with kmemleak
gcov: fix clang-11+ support
ia64: fix format strings for err_inject
ia64: mca: allocate early mca with GFP_ATOMIC
squashfs: fix xattr id and id lookup sanity checks
squashfs: fix inode lookup sanity checks
z3fold: prevent reclaim/free race for headless pages
selftests/vm: fix out-of-tree build
mm/mmu_notifiers: ensure range_end() is paired with range_start()
kasan: fix per-page tags for non-page_alloc pages
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
|
|
LLVM changed the expected function signatures for llvm_gcda_start_file()
and llvm_gcda_emit_function() in the clang-11 release. Users of
clang-11 or newer may have noticed their kernels failing to boot due to
a panic when enabling CONFIG_GCOV_KERNEL=y +CONFIG_GCOV_PROFILE_ALL=y.
Fix up the function signatures so calling these functions doesn't panic
the kernel.
Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041
Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44
Link: https://lkml.kernel.org/r/20210312224132.3413602-2-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Prasad Sodagudi <psodagud@quicinc.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
"Various fixes, all over:
1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu.
2) Always store the rx queue mapping in veth, from Maciej
Fijalkowski.
3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov.
4) Fix memory leak in octeontx2-af from Colin Ian King.
5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from
Yonghong Song.
6) Fix tx ptp stats in mlx5, from Aya Levin.
7) Check correct ip version in tun decap, fropm Roi Dayan.
8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit.
9) Work item memork leak in mlx5, from Shay Drory.
10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann.
11) Lack of preemptrion awareness in macvlan, from Eric Dumazet.
12) Fix data race in pxa168_eth, from Pavel Andrianov.
13) Range validate stab in red_check_params(), from Eric Dumazet.
14) Inherit vlan filtering setting properly in b53 driver, from
Florian Fainelli.
15) Fix rtnl locking in igc driver, from Sasha Neftin.
16) Pause handling fixes in igc driver, from Muhammad Husaini
Zulkifli.
17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits.
18) Use after free in qlcnic, from Lv Yunlong.
19) fix crash in fritzpci mISDN, from Tong Zhang.
20) Premature rx buffer reuse in igb, from Li RongQing.
21) Missing termination of ip[a driver message handler arrays, from
Alex Elder.
22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25
driver, from Xie He.
23) Use after free in c_can_pci_remove(), from Tong Zhang.
24) Uninitialized variable use in nl80211, from Jarod Wilson.
25) Off by one size calc in bpf verifier, from Piotr Krysiuk.
26) Use delayed work instead of deferrable for flowtable GC, from
Yinjun Zhang.
27) Fix infinite loop in NPC unmap of octeontx2 driver, from
Hariprasad Kelam.
28) Fix being unable to change MTU of dwmac-sun8i devices due to lack
of fifo sizes, from Corentin Labbe.
29) DMA use after free in r8169 with WoL, fom Heiner Kallweit.
30) Mismatched prototypes in isdn-capi, from Arnd Bergmann.
31) Fix psample UAPI breakage, from Ido Schimmel"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits)
psample: Fix user API breakage
math: Export mul_u64_u64_div_u64
ch_ktls: fix enum-conversion warning
octeontx2-af: Fix memory leak of object buf
ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
net: bridge: don't notify switchdev for local FDB addresses
net/sched: act_ct: clear post_ct if doing ct_clear
net: dsa: don't assign an error value to tag_ops
isdn: capi: fix mismatched prototypes
net/mlx5: SF, do not use ecpu bit for vhca state processing
net/mlx5e: Fix division by 0 in mlx5e_select_queue
net/mlx5e: Fix error path for ethtool set-priv-flag
net/mlx5e: Offload tuple rewrite for non-CT flows
net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
net/mlx5: Add back multicast stats for uplink representor
net: ipconfig: ic_dev can be NULL in ic_close_devs
MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one
docs: networking: Fix a typo
r8169: fix DMA being used after buffer free if WoL is enabled
net: ipa: fix init header command validation
...
|
|
The debugfs directory '/sys/kernel/debug/energy_model' is needed before
the Energy Model registration can happen. With the recent change in
debugfs subsystem it's not allowed to create this directory at early
stage (core_initcall). Thus creating this directory would fail.
Postpone the creation of the EM debug dir to later stage: fs_initcall.
It should be safe since all clients: CPUFreq drivers, Devfreq drivers
will be initialized in later stages.
The custom debug log below prints the time of creation the EM debug dir
at fs_initcall and successful registration of EMs at later stages.
[ 1.505717] energy_model: creating rootdir
[ 3.698307] cpu cpu0: EM: created perf domain
[ 3.709022] cpu cpu1: EM: created perf domain
Fixes: 56348560d495 ("debugfs: do not attempt to create a new file before the filesystem is initalized")
Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix ~56 single-word typos in timekeeping & clocksource code comments.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-kernel@vger.kernel.org
|
|
Clang doesn't like format strings that truncate a 32-bit
value to something shorter:
kernel/locking/lockdep.c:709:4: error: format specifies type 'short' but the argument has type 'int' [-Werror,-Wformat]
In this case, the warning is a slightly questionable, as it could realize
that both class->wait_type_outer and class->wait_type_inner are in fact
8-bit struct members, even though the result of the ?: operator becomes an
'int'.
However, there is really no point in printing the number as a 16-bit
'short' rather than either an 8-bit or 32-bit number, so just change
it to a normal %d.
Fixes: de8f5e4f2dc1 ("lockdep: Introduce wait-type checks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322115531.3987555-1-arnd@kernel.org
|
|
Fix ~36 single-word typos in the IRQ, irqchip and irqdomain code comments.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Fix 3 single-word typos in the generic syscall entry code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull io_uring followup fixes from Jens Axboe:
- The SIGSTOP change from Eric, so we properly ignore that for
PF_IO_WORKER threads.
- Disallow sending signals to PF_IO_WORKER threads in general, we're
not interested in having them funnel back to the io_uring owning
task.
- Stable fix from Stefan, ensuring we properly break links for short
send/sendmsg recv/recvmsg if MSG_WAITALL is set.
- Catch and loop when needing to run task_work before a PF_IO_WORKER
threads goes to sleep.
* tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block:
io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
io-wq: ensure task is running before processing task_work
signal: don't allow STOP on PF_IO_WORKER threads
signal: don't allow sending any signals to PF_IO_WORKER threads
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"A change to robustify force-threaded IRQ handlers to always disable
interrupts, plus a DocBook fix.
The force-threaded IRQ handler change has been accelerated from the
normal schedule of such a change to keep the bad pattern/workaround of
spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from
spreading"
* tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Disable interrupts for force threaded handlers
genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
- Get static calls & modules right. Hopefully.
- WW mutex fixes
* tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
static_call: Fix static_call_update() sanity check
static_call: Align static_call_is_init() patching condition
static_call: Fix static_call_set_init()
locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"The freshest pile of shiny x86 fixes for 5.12:
- Add the arch-specific mapping between physical and logical CPUs to
fix devicetree-node lookups
- Restore the IRQ2 ignore logic
- Fix get_nr_restart_syscall() to return the correct restart syscall
number. Split in a 4-patches set to avoid kABI breakage when
backporting to dead kernels"
* tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/of: Fix CPU devicetree-node lookups
x86/ioapic: Ignore IRQ2 again
x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
x86: Move TS_COMPAT back to asm/thread_info.h
kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
|
|
Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.
Longer term, we may want to look into allowing stop of these threads,
as it relates to eg process freezing. For now, this prevents a spin
issue if a SIGSTOP is delivered to the parent task.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread. Linux does allow this kind of behavior for regular threads, but
it's really a compatability thing that we need not care about for the IO
threads.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
non-static key" message
Since this message is printed when dynamically allocated spinlocks (e.g.
kzalloc()) are used without initialization (e.g. spin_lock_init()),
suggest to developers to check whether initialization functions for objects
were called, before making developers wonder what annotation is missing.
[ mingo: Minor tweaks to the message. ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With interrupt force threading all device interrupt handlers are invoked
from kernel threads. Contrary to hard interrupt context the invocation only
disables bottom halfs, but not interrupts. This was an oversight back then
because any code like this will have an issue:
thread(irq_A)
irq_handler(A)
spin_lock(&foo->lock);
interrupt(irq_B)
irq_handler(B)
spin_lock(&foo->lock);
This has been triggered with networking (NAPI vs. hrtimers) and console
drivers where printk() happens from an interrupt which interrupted the
force threaded handler.
Now people noticed and started to change the spin_lock() in the handler to
spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the
interrupt request which in turn breaks RT.
Fix the root cause and not the symptom and disable interrupts before
invoking the force threaded handler which preserves the regular semantics
and the usefulness of the interrupt force threading as a general debugging
tool.
For not RT this is not changing much, except that during the execution of
the threaded handler interrupts are delayed until the handler
returns. Vs. scheduling and softirq processing there is no difference.
For RT kernels there is no issue.
Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading")
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Johan Hovold <johan@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
|
|
Pull io_uring fixes from Jens Axboe:
"Quieter week this time, which was both expected and desired. About
half of the below is fixes for this release, the other half are just
fixes in general. In detail:
- Fix the freezing of IO threads, by making the freezer not send them
fake signals. Make them freezable by default.
- Like we did for personalities, move the buffer IDR to xarray. Kills
some code and avoids a use-after-free on teardown.
- SQPOLL cleanups and fixes (Pavel)
- Fix linked timeout race (Pavel)
- Fix potential completion post use-after-free (Pavel)
- Cleanup and move internal structures outside of general kernel view
(Stefan)
- Use MSG_SIGNAL for send/recv from io_uring (Stefan)"
* tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block:
io_uring: don't leak creds on SQO attach error
io_uring: use typesafe pointers in io_uring_task
io_uring: remove structures from include/linux/io_uring.h
io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
io_uring: fix sqpoll cancellation via task_work
io_uring: add generic callback_head helpers
io_uring: fix concurrent parking
io_uring: halt SQO submission on ctx exit
io_uring: replace sqd rw_semaphore with mutex
io_uring: fix complete_post use ctx after free
io_uring: fix ->flags races by linked timeouts
io_uring: convert io_buffer_idr to XArray
io_uring: allow IO worker threads to be frozen
kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
|
|
When irq_matrix_free() is called for an unallocated vector the
managed_allocated and total_allocated counters get out of sync with the
real state of the matrix. Later, when the last interrupt is freed, these
counters will underflow resulting in UINTMAX because the counters are
unsigned.
While this is certainly a problem of the calling code, this can be catched
in the allocator by checking the allocation bit for the to be freed vector
which simplifies debugging.
An example of the problem described above:
https://lore.kernel.org/lkml/20210318192819.636943062@linutronix.de/
Add the missing sanity check and emit a warning when it triggers.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210319111823.1105248-1-vkuznets@redhat.com
|
|
The syzbot reported a memleak as follows:
BUG: memory leak
unreferenced object 0xffff888101b41d00 (size 120):
comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
backtrace:
[<ffffffff8125dc56>] alloc_pid+0x66/0x560
[<ffffffff81226405>] copy_process+0x1465/0x25e0
[<ffffffff81227943>] kernel_clone+0xf3/0x670
[<ffffffff812281a1>] kernel_thread+0x61/0x80
[<ffffffff81253464>] call_usermodehelper_exec_work
[<ffffffff81253464>] call_usermodehelper_exec_work+0xc4/0x120
[<ffffffff812591c9>] process_one_work+0x2c9/0x600
[<ffffffff81259ab9>] worker_thread+0x59/0x5d0
[<ffffffff812611c8>] kthread+0x178/0x1b0
[<ffffffff8100227f>] ret_from_fork+0x1f/0x30
unreferenced object 0xffff888110ef5c00 (size 232):
comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
backtrace:
[<ffffffff8154a0cf>] kmem_cache_zalloc
[<ffffffff8154a0cf>] __alloc_file+0x1f/0xf0
[<ffffffff8154a809>] alloc_empty_file+0x69/0x120
[<ffffffff8154a8f3>] alloc_file+0x33/0x1b0
[<ffffffff8154ab22>] alloc_file_pseudo+0xb2/0x140
[<ffffffff81559218>] create_pipe_files+0x138/0x2e0
[<ffffffff8126c793>] umd_setup+0x33/0x220
[<ffffffff81253574>] call_usermodehelper_exec_async+0xb4/0x1b0
[<ffffffff8100227f>] ret_from_fork+0x1f/0x30
After the UMD process exits, the pipe_to_umh/pipe_from_umh and
tgid need to be released.
Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.")
Reported-by: syzbot+44908bb56d2bfe56b28e@syzkaller.appspotmail.com
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210317030915.2865-1-qiang.zhang@windriver.com
|
|
Sites that match init_section_contains() get marked as INIT. For
built-in code init_sections contains both __init and __exit text. OTOH
kernel_text_address() only explicitly includes __init text (and there
are no __exit text markers).
Match what jump_label already does and ignore the warning for INIT
sites. Also see the excellent changelog for commit: 8f35eaa5f2de
("jump_label: Don't warn on __exit jump entries")
Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.739542434@infradead.org
|
|
The intent is to avoid writing init code after init (because the text
might have been freed). The code is needlessly different between
jump_label and static_call and not obviously correct.
The existing code relies on the fact that the module loader clears the
init layout, such that within_module_init() always fails, while
jump_label relies on the module state which is more obvious and
matches the kernel logic.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org
|
|
It turns out that static_call_set_init() does not preserve the other
flags; IOW. it clears TAIL if it was set.
Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.519406371@infradead.org
|
|
This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
This patch causes a panic when rebooting my Dell Poweredge r440. I do
not have the full panic log as it's lost at that stage of the reboot and
I do not have a serial console. Reverting this patch makes my system
able to reboot again.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There is no need to keep the dentry pointer around for the created
debugfs file, as it is only needed when removing it from the system.
When it is to be removed, ask debugfs itself for the pointer, to save on
storage and make things a bit simpler.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210216155020.1012407-1-gregkh@linuxfoundation.org
|
|
The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
The synchronize_rcu_tasks() will not wait for such tasks to complete.
In such case the trampoline image will be freed and when the task
wakes up the return IP will point to freed memory causing the crash.
Solve this by adding percpu_ref_get/put for the duration of trampoline
and separate trampoline vs its image life times.
The "half page" optimization has to be removed, since
first_half->second_half->first_half transition cannot be guaranteed to
complete in deterministic time. Every trampoline update becomes a new image.
The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
call_rcu_tasks. Together they will wait for the original function and
trampoline asm to complete. The trampoline is patched from nop to jmp to skip
fexit progs. They are freed independently from the trampoline. The image with
fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
will wait for both sleepable and non-sleepable progs to complete.
Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU
Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
|
|
Given we know the max possible value of ptr_limit at the time of retrieving
the latter, add basic assertions, so that the verifier can bail out if
anything looks odd and reject the program. Nothing triggered this so far,
but it also does not hurt to have these.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
The if condition in irq_matrix_reserve() can be much simpler.
While at it fix a typo in the comment.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210211070953.5914-1-jgross@suse.com
|
|
Instead of having the mov32 with aux->alu_limit - 1 immediate, move this
operation to retrieve_ptr_limit() instead to simplify the logic and to
allow for subsequent sanity boundary checks inside retrieve_ptr_limit().
This avoids in future that at the time of the verifier masking rewrite
we'd run into an underflow which would not sign extend due to the nature
of mov32 instruction.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
retrieve_ptr_limit() computes the ptr_limit for registers with stack and
map_value type. ptr_limit is the size of the memory area that is still
valid / in-bounds from the point of the current position and direction
of the operation (add / sub). This size will later be used for masking
the operation such that attempting out-of-bounds access in the speculative
domain is redirected to remain within the bounds of the current map value.
When masking to the right the size is correct, however, when masking to
the left, the size is off-by-one which would lead to an incorrect mask
and thus incorrect arithmetic operation in the non-speculative domain.
Piotr found that if the resulting alu_limit value is zero, then the
BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading
0xffffffff into AX instead of sign-extending to the full 64 bit range,
and as a result, this allows abuse for executing speculatively out-of-
bounds loads against 4GB window of address space and thus extracting the
contents of kernel memory via side-channel.
Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
The purpose of this patch is to streamline error propagation and in particular
to propagate retrieve_ptr_limit() errors for pointer types that are not defining
a ptr_limit such that register-based alu ops against these types can be rejected.
The main rationale is that a gap has been identified by Piotr in the existing
protection against speculatively out-of-bounds loads, for example, in case of
ctx pointers, unprivileged programs can still perform pointer arithmetic. This
can be abused to execute speculatively out-of-bounds loads without restrictions
and thus extract contents of kernel memory.
Fix this by rejecting unprivileged programs that attempt any pointer arithmetic
on unprotected pointer types. The two affected ones are pointer to ctx as well
as pointer to map. Field access to a modified ctx' pointer is rejected at a
later point in time in the verifier, and 7c6967326267 ("bpf: Permit map_ptr
arithmetic with opcode add and offset 0") only relevant for root-only use cases.
Risk of unprivileged program breakage is considered very low.
Fixes: 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0")
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
On RT a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked soft interrupt processing is
temporarily blocked as well which means that such a warning is a false
positive.
To prevent that check the per CPU state which indicates that a scheduled
out task has soft interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.527563866@linutronix.de
|
|
Provide a local lock based serialization for soft interrupts on RT which
allows the local_bh_disabled() sections and servicing soft interrupts to be
preemptible.
Provide the necessary inline helpers which allow to reuse the bulk of the
softirq processing code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.426370483@linutronix.de
|
|
To allow reuse of the bulk of softirq processing code for RT and to avoid
#ifdeffery all over the place, split protections for various code sections
out into inline helpers so the RT variant can just replace them in one go.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.310118772@linutronix.de
|
|
vtime_account_irq and irqtime_account_irq() base checks on preempt_count()
which fails on RT because preempt_count() does not contain the softirq
accounting which is seperate on RT.
These checks do not need the full preempt count as they only operate on the
hard and softirq sections.
Use irq_count() instead which provides the correct value on both RT and non
RT kernels. The compiler is clever enough to fold the masking for !RT:
99b: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax
- 9a2: 25 ff ff ff 7f and $0x7fffffff,%eax
+ 9a2: 25 00 ff ff 00 and $0xffff00,%eax
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.153926793@linutronix.de
|
|
tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in
the tasklet state to be cleared. This works on !RT nicely because the
corresponding execution can only happen on a different CPU.
On RT softirq processing is preemptible, therefore a task preempting the
softirq processing thread can spin forever.
Prevent this by invoking local_bh_disable()/enable() inside the loop. In
case that the softirq processing thread was preempted by the current task,
current will block on the local lock which yields the CPU to the preempted
softirq processing thread. If the tasklet is processed on a different CPU
then the local_bh_disable()/enable() pair is just a waste of processor
cycles.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.988908275@linutronix.de
|
|
tasklet_kill() spin waits for TASKLET_STATE_SCHED to be cleared invoking
yield() from inside the loop. yield() is an ill defined mechanism and the
result might still be wasting CPU cycles in a tight loop which is
especially painful in a guest when the CPU running the tasklet is scheduled
out.
tasklet_kill() is used in teardown paths and not performance critical at
all. Replace the spin wait with wait_var_event().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.890532921@linutronix.de
|