Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
all writes into sequential write required zones be aligned to the device
physical block size. However, NVMe ZNS does not have this constraint and
allows write operations into sequential zones to be aligned to the
device logical block size. This inconsistency does not help with
software portability across device types.
To solve this, introduce the zone_write_granularity queue limit to
indicate the alignment constraint, in bytes, of write operations into
zones of a zoned block device. This new limit is exported as a
read-only sysfs queue attribute, and the helper
blk_queue_zone_write_granularity() is introduced for drivers to set this
limit.
The function blk_queue_set_zoned() is modified to set this new limit to
the device logical block size by default. NVMe ZNS devices as well as
zoned nullb devices use this default value as is. The scsi disk driver
is modified to call the blk_queue_zone_write_granularity() helper to
set the zone write granularity of host-managed SMR disks to the disk
physical block size.
The accessor functions queue_zone_write_granularity() and
bdev_zone_write_granularity() are also introduced.
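As an illustration only, a driver applying and reading back the new limit
might look roughly like this (hypothetical "foo" driver; the helper and
accessor names are the ones introduced above):

  #include <linux/blkdev.h>

  /* Align zone writes to the physical block size, as the sd driver does
   * for host-managed SMR disks, then read the limit back through the
   * new accessor. */
  static void foo_setup_zone_limits(struct request_queue *q,
  				    unsigned int phys_block_size)
  {
  	blk_queue_zone_write_granularity(q, phys_block_size);
  	pr_debug("zone write granularity: %u\n",
  		 queue_zone_write_granularity(q));
  }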
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When changing the zoned model of host-aware zoned block devices, use
blk_queue_set_zoned() instead of directly assigning the gendisk queue
zoned limit.
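For illustration, the change amounts to something like the following
sketch (hypothetical helper, not the exact sd.c hunk):

  #include <linux/blkdev.h>

  /* Go through the block layer helper instead of writing
   * q->limits.zoned directly; the helper also applies the default
   * zone write granularity. */
  static void foo_set_host_aware(struct gendisk *disk)
  {
  	blk_queue_set_zoned(disk, BLK_ZONED_HA);
  }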
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use blk_queue_set_zoned() to set a nullb device zone model instead of
directly assigning the device queue zoned limit. The initialization of
the device zoned model, as well as the setup of the queue flag
QUEUE_FLAG_ZONE_RESETALL and of the device queue elevator feature, are
moved from null_init_zoned_dev() to null_register_zoned_dev() so that
the initialization of the queue limits is done when the gendisk of the
nullb device is available.
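A rough sketch of the registration-time setup (illustrative only; the
function name and arguments are simplified, not the actual null_blk code):

  #include <linux/blkdev.h>

  /* At registration time the gendisk and its queue exist, so the zoned
   * model, the reset-all flag and the required elevator feature can all
   * be applied safely here. */
  static void foo_register_zoned(struct gendisk *disk,
  				 struct request_queue *q)
  {
  	blk_queue_set_zoned(disk, BLK_ZONED_HM);
  	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
  	blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
  }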
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For a zoned namespace, in nvme_update_ns_info(), call
nvme_update_zone_info() after executing nvme_update_disk_info() so that
the namespace queue logical and physical block size limits are set.
This allows setting the namespace queue max_zone_append_sectors limit
in nvme_update_zone_info() instead of nvme_revalidate_zones(),
simplifying this function. Also use blk_queue_set_zoned() to set the
namespace zoned model.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The description of the zone_append_max_bytes sysfs queue attribute is
missing from Documentation/block/queue-sysfs.rst. Add it.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of imposing rlimit memlock limits for the rings themselves,
ensure that we account them properly under memcg with __GFP_ACCOUNT.
We retain rlimit memlock for registered buffers; this is just for the
ring arrays themselves.
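A minimal sketch of the idea (generic code, not the actual io_uring
allocation path): charge the ring memory to the caller's memory cgroup
via __GFP_ACCOUNT.

  #include <linux/gfp.h>
  #include <linux/mm.h>

  /* GFP_KERNEL_ACCOUNT is GFP_KERNEL | __GFP_ACCOUNT, so the pages are
   * charged to the current task's memcg instead of being capped by
   * RLIMIT_MEMLOCK. */
  static void *ring_pages_alloc(size_t size)
  {
  	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;

  	return (void *)__get_free_pages(gfp, get_order(size));
  }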
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This puts io_uring request allocations under memory cgroup accounting
and limits.
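A sketch of one common way to do this (assumed for illustration, not the
exact patch): create the request cache with SLAB_ACCOUNT so every object
is charged to the allocating task's memcg.

  #include <linux/slab.h>
  #include <linux/types.h>
  #include <linux/errno.h>

  /* Hypothetical request type, stands in for the real io_uring request */
  struct foo_req {
  	u64	user_data;
  	int	res;
  };

  static struct kmem_cache *req_cachep;

  static int __init foo_req_cache_init(void)
  {
  	/* SLAB_ACCOUNT makes every allocation memcg-charged */
  	req_cachep = kmem_cache_create("foo_req", sizeof(struct foo_req), 0,
  				       SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL);
  	return req_cachep ? 0 : -ENOMEM;
  }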
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is the last class of requests that cannot utilize the req alloc
cache. Add a per-ctx req cache that is protected by the completion_lock,
and refill our submit side cache when it gets over our batch count.
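Roughly the shape of such a cache (hypothetical names; a sketch of the
technique rather than the io_uring code): requests freed at completion
time go onto a list under the completion lock, and the submit side splices
the whole list over once it grows past the batch count.

  #include <linux/list.h>
  #include <linux/spinlock.h>

  #define REQ_BATCH	32

  struct foo_ctx {
  	spinlock_t		completion_lock;
  	struct list_head	locked_free_list;	/* filled at completion time */
  	unsigned int		locked_free_nr;
  	struct list_head	submit_free_list;	/* consumed at submit time */
  };

  /* Submit side (serialised by the ring lock): take everything in one
   * splice instead of popping entries one by one. */
  static void foo_refill_submit_cache(struct foo_ctx *ctx)
  {
  	spin_lock_irq(&ctx->completion_lock);
  	if (ctx->locked_free_nr > REQ_BATCH) {
  		list_splice_init(&ctx->locked_free_list, &ctx->submit_free_list);
  		ctx->locked_free_nr = 0;
  	}
  	spin_unlock_irq(&ctx->completion_lock);
  }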
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Abaci reported follow issue:
[ 30.615891] ======================================================
[ 30.616648] WARNING: possible circular locking dependency detected
[ 30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted
[ 30.618035] ------------------------------------------------------
[ 30.618914] a.out/1128 is trying to acquire lock:
[ 30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220
[ 30.620505]
[ 30.620505] but task is already holding lock:
[ 30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
[ 30.622349]
[ 30.622349] which lock already depends on the new lock.
[ 30.622349]
[ 30.623289]
[ 30.623289] the existing dependency chain (in reverse order) is:
[ 30.624243]
[ 30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
[ 30.625263] lock_acquire+0x2c7/0x390
[ 30.625868] __mutex_lock+0xae/0x9f0
[ 30.626451] io_cqring_overflow_flush.part.95+0x6d/0x70
[ 30.627278] io_uring_poll+0xcb/0xd0
[ 30.627890] ep_item_poll.isra.14+0x4e/0x90
[ 30.628531] do_epoll_ctl+0xb7e/0x1120
[ 30.629122] __x64_sys_epoll_ctl+0x70/0xb0
[ 30.629770] do_syscall_64+0x2d/0x40
[ 30.630332] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 30.631187]
[ 30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}:
[ 30.631985] check_prevs_add+0x226/0xb00
[ 30.632584] __lock_acquire+0x1237/0x13a0
[ 30.633207] lock_acquire+0x2c7/0x390
[ 30.633740] __mutex_lock+0xae/0x9f0
[ 30.634258] __ep_eventpoll_poll+0x9f/0x220
[ 30.634879] __io_arm_poll_handler+0xbf/0x220
[ 30.635462] io_issue_sqe+0xa6b/0x13e0
[ 30.635982] __io_queue_sqe+0x10b/0x550
[ 30.636648] io_queue_sqe+0x235/0x470
[ 30.637281] io_submit_sqes+0xcce/0xf10
[ 30.637839] __x64_sys_io_uring_enter+0x3fb/0x5b0
[ 30.638465] do_syscall_64+0x2d/0x40
[ 30.638999] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 30.639643]
[ 30.639643] other info that might help us debug this:
[ 30.639643]
[ 30.640618] Possible unsafe locking scenario:
[ 30.640618]
[ 30.641402] CPU0 CPU1
[ 30.641938] ---- ----
[ 30.642664] lock(&ctx->uring_lock);
[ 30.643425] lock(&ep->mtx);
[ 30.644498] lock(&ctx->uring_lock);
[ 30.645668] lock(&ep->mtx);
[ 30.646321]
[ 30.646321] *** DEADLOCK ***
[ 30.646321]
[ 30.647642] 1 lock held by a.out/1128:
[ 30.648424] #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
[ 30.649954]
[ 30.649954] stack backtrace:
[ 30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1
[ 30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 30.652290] Call Trace:
[ 30.652688] dump_stack+0xac/0xe3
[ 30.653164] check_noncircular+0x11e/0x130
[ 30.653747] ? check_prevs_add+0x226/0xb00
[ 30.654303] check_prevs_add+0x226/0xb00
[ 30.654845] ? add_lock_to_list.constprop.49+0xac/0x1d0
[ 30.655564] __lock_acquire+0x1237/0x13a0
[ 30.656262] lock_acquire+0x2c7/0x390
[ 30.656788] ? __ep_eventpoll_poll+0x9f/0x220
[ 30.657379] ? __io_queue_proc.isra.88+0x180/0x180
[ 30.658014] __mutex_lock+0xae/0x9f0
[ 30.658524] ? __ep_eventpoll_poll+0x9f/0x220
[ 30.659112] ? mark_held_locks+0x5a/0x80
[ 30.659648] ? __ep_eventpoll_poll+0x9f/0x220
[ 30.660229] ? _raw_spin_unlock_irqrestore+0x2d/0x40
[ 30.660885] ? trace_hardirqs_on+0x46/0x110
[ 30.661471] ? __io_queue_proc.isra.88+0x180/0x180
[ 30.662102] ? __ep_eventpoll_poll+0x9f/0x220
[ 30.662696] __ep_eventpoll_poll+0x9f/0x220
[ 30.663273] ? __ep_eventpoll_poll+0x220/0x220
[ 30.663875] __io_arm_poll_handler+0xbf/0x220
[ 30.664463] io_issue_sqe+0xa6b/0x13e0
[ 30.664984] ? __lock_acquire+0x782/0x13a0
[ 30.665544] ? __io_queue_proc.isra.88+0x180/0x180
[ 30.666170] ? __io_queue_sqe+0x10b/0x550
[ 30.666725] __io_queue_sqe+0x10b/0x550
[ 30.667252] ? __fget_files+0x131/0x260
[ 30.667791] ? io_req_prep+0xd8/0x1090
[ 30.668316] ? io_queue_sqe+0x235/0x470
[ 30.668868] io_queue_sqe+0x235/0x470
[ 30.669398] io_submit_sqes+0xcce/0xf10
[ 30.669931] ? xa_load+0xe4/0x1c0
[ 30.670425] __x64_sys_io_uring_enter+0x3fb/0x5b0
[ 30.671051] ? lockdep_hardirqs_on_prepare+0xde/0x180
[ 30.671719] ? syscall_enter_from_user_mode+0x2b/0x80
[ 30.672380] do_syscall_64+0x2d/0x40
[ 30.672901] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 30.673503] RIP: 0033:0x7fd89c813239
[ 30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
[ 30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa
[ 30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239
[ 30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003
[ 30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000
[ 30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0
[ 30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000
This might happen if we do epoll_wait on a uring fd while reading/writing
the epoll fd in an sqe in the same uring instance.
So don't flush the cqring overflow list there; just do a simple check.
Reported-by: Abaci <abaci@linux.alibaba.com>
Fixes: 6c503150ae33 ("io_uring: patch up IOPOLL overflow_flush sync")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
While there are requests in the allocation cache, use them; only once
those are exhausted go for the stashed memory in comp.free_list. As list
manipulations are generally heavy and are not good for caches, flush them
all, or as much as we can, in one go.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: return success/failure from io_flush_cached_reqs()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
__io_queue_sqe() is always called with a non-NULL comp_state, which is
taken directly from the context. Don't pass it around; infer it from the ctx.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
task_work is run without utilizing the req alloc cache, so any deferred
items don't get to take advantage of either the alloc or free side of it.
With task_work now being wrapped by io_uring, we can use the ctx
completion state to both use the req cache and the completion flush
batching.
With this, the only request type that cannot take advantage of the req
cache is IRQ driven IO for regular files / block devices. Anything else,
including IOPOLL polled IO to those same types, will take advantage of it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
task_work is a LIFO list, due to how it's implemented as a lockless
list. For long chains of task_work, this can be problematic as the
first entry added is the last one processed. Similarly, we'd waste
a lot of CPU cycles reversing this list.
Wrap the task_work so we have a single task_work entry per task per
ctx, and use that to run it in the right order.
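For illustration, the usual pattern for this (a generic sketch, not the
io_uring implementation): drain the lockless list once, reverse it, and
run the entries oldest-first.

  #include <linux/llist.h>

  struct foo_work {
  	struct llist_node node;
  	void (*run)(struct foo_work *work);
  };

  /* Entries were llist_add()'ed in submission order; llist hands them
   * back LIFO, so reverse once and then execute FIFO. */
  static void foo_run_work(struct llist_head *list)
  {
  	struct llist_node *node = llist_reverse_order(llist_del_all(list));
  	struct foo_work *work, *next;

  	llist_for_each_entry_safe(work, next, node, node)
  		work->run(work);
  }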
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now that we have the submit_state in the ring itself, we can have io_kiocb
allocations that are persistent across invocations. This reduces the time
spent doing slab allocations and frees.
[sil: rebased]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make io_req_free_batch(), which is used for inline-executed requests and
IOPOLL, return requests back into the allocation cache, avoiding most of
the kmalloc()/kfree() calls for those cases.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't free batch-allocated requests across syscalls.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently batch free handles request memory freeing and ctx ref putting
together. Separate them and use different counters; that will be needed
for reusing request memory.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove fallback_req for now, it gets in the way of other changes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_submit_flush_completions() does completion batching, but may also use
free batching as iopoll does. The main beneficiaries should be buffered
reads/writes and send/recv.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Reincarnation of an old patch that replaces a list in struct
io_compl_batch with an array. It's needed to avoid hooking requests via
their compl.list, because it won't always be available in the future.
It's also nice to split io_submit_flush_completions() to avoid freeing
under locks and to remove an unlock/lock pair with a long comment
describing when it can be done.
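Conceptually the batch becomes something like this (hypothetical names;
a sketch of the data-structure change only):

  /* Before: completions chained through a list field inside each request.
   * After: a small fixed array of request pointers, so requests no longer
   * need a dedicated compl.list hook for completion batching. */
  #define COMPL_BATCH	32

  struct foo_req;

  struct foo_compl_batch {
  	struct foo_req	*reqs[COMPL_BATCH];
  	unsigned int	nr;
  };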
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As submit_state is now retained across syscalls, we can save ourselves
from initialising it from the ground up for each io_submit_sqes(). Set some
fields during ctx allocation, and just keep them always consistent.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: remove unnecessary zeroing of ctx members]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Completion state is closely bound to the ctx; we don't need to store the
ctx inside it, as we always have it around to pass to flush.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
struct io_submit_state is quite big (168 bytes) and going to grow. It's
better to not keep it on stack as it is now. Move it to context, it's
always protected by uring_lock, so it's fine to have only one instance
of it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is no reason to drag io_comp_state into opcode handlers; we just
need a flag, and the actual work will be done in __io_queue_sqe().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This reverts commit c10983e14e8f5d7c8dab0415e0cb7fe8d10aa9e3.
This commit is not meant for drm-misc-next-fixes, and was accidentally
cherry picked over.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
|
|
Fixes the following warnings, which result in interrupts being disabled
on ports B/F:
gpio gpiochip1: (B): detected irqchip that is shared with multiple gpiochips: please fix the driver.
gpio gpiochip5: (F): detected irqchip that is shared with multiple gpiochips: please fix the driver.
- added a separate irqchip for each interrupt-capable gpiochip
- provided unique names for each irqchip
Fixes: d2b091961510 ("gpio: ep93xx: Pass irqchip when adding gpiochip")
Cc: <stable@vger.kernel.org>
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
|
|
The two index spaces and ep93xx_gpio_port are confusing.
Instead, add a separate struct to store the necessary data and remove
ep93xx_gpio_port.
- add a struct to store IRQ-related data for each IRQ-capable chip
- replace the offset array with defined offsets
- add the IRQ register offsets for each IRQ-capable chip into
  ep93xx_gpio_banks
------------[ cut here ]------------
kernel BUG at drivers/gpio/gpio-ep93xx.c:64!
---[ end trace 3f6544e133e9f5ae ]---
Fixes: fd935fc421e74 ("gpio: ep93xx: Do not pingpong irq numbers")
Cc: <stable@vger.kernel.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
|
|
If the kernel gets a SMEP violation or a fault that would have been a
SMEP violation if it had SMEP support, it shouldn't run fixups. Just
OOPS.
[ bp: Massage commit message. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/46160d8babce2abf1d6daa052146002efa24ac56.1612924255.git.luto@kernel.org
|
|
There are several things special for the RAPL Psys energy counter, on
Intel Sapphire Rapids platform.
1. It contains one Psys master package, and only CPUs on the master
package can read a valid value of the Psys energy counter; reading the
MSR on CPUs in a slave package returns 0.
2. The master package does not have to be physical package 0, and when
all the CPUs on the Psys master package are offlined, we lose the Psys
energy counter at runtime.
3. The Psys energy counter can be disabled by BIOS, while all the other
energy counters are not affected.
It is not easy to handle all of these in the current RAPL PMU design
because
a) perf_msr_probe() validates the MSR on some random CPU, which may either
be in the Psys master package or in the Psys slave package.
b) all the RAPL events share the same PMU, and there is no API to remove
the psys-energy event cleanly, without affecting the other events in
the same PMU.
This patch addresses the problems in a simple way.
First, by setting the .no_check bit for the RAPL Psys MSR, the psys-energy
event is always added, so we don't have to check the Psys ENERGY_STATUS
MSR on the master package.
Then, by removing rapl_not_visible(), the psys-energy event is always
available in sysfs. This does not affect the previous code because, for
the RAPL MSRs with .no_check cleared, the .is_visible() callback is always
overridden in the perf_msr_probe() function.
Note that although the RAPL PMU is die-based and the Psys energy counter
MSR on Intel SPR is package-scoped, this is not a problem because there is
only one die in each package on SPR.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20210204161816.12649-3-rui.zhang@intel.com
|
|
In the RAPL ENERGY_COUNTER MSR, only the lower 32 bits represent the
energy counter.
On previous platforms, the higher 32 bits are reserved and always return
zero. But on the Intel Sapphire Rapids platform, the higher 32 bits are
reused for other purposes and return a non-zero value.
Thus check only the lower 32 bits of these ENERGY_COUNTER MSRs, to make
sure the RAPL PMU events are not added erroneously when the higher 32 bits
contain a non-zero value.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20210204161816.12649-2-rui.zhang@intel.com
|
|
In some cases, when probing a perf MSR, we're probing certain bits of the
MSR instead of the whole register; thus only these bits should be checked.
For example, for the RAPL ENERGY_STATUS MSR, only the lower 32 bits
represent the energy counter, and the higher 32 bits are reserved.
Introduce a new mask field in struct perf_msr to allow probing certain
bits of an MSR.
This change is transparent to the current perf_msr_probe() users.
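Roughly, the probe can then do something like the following (an
illustrative sketch assuming a zero mask means "check the whole register";
not the exact perf_msr_probe() hunk):

  #include <linux/types.h>
  #include <asm/msr.h>

  /* Probe only the bits covered by @mask; a zero mask keeps the old
   * whole-register check, so existing users see no behaviour change. */
  static bool foo_msr_counter_usable(unsigned int msr, u64 mask)
  {
  	u64 val;

  	if (rdmsrl_safe(msr, &val))
  		return false;
  	if (mask)
  		val &= mask;
  	return val != 0;
  }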
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20210204161816.12649-1-rui.zhang@intel.com
|
|
Cascade Lake Xeon parts have the same model number as Skylake Xeon
parts, so they are tagged with the intel_pebs_isolation
quirk. However, as with Skylake Xeon H0 stepping parts, the PEBS
isolation issue is fixed in all microcode versions.
Add the Cascade Lake Xeon steppings (5, 6, and 7) to the
isolation_ucodes[] table so that these parts benefit from Andi's
optimization in commit 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary
work in guest filtering").
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20210205191324.2889006-1-jmattson@google.com
|
|
mutex_trylock_recursive() has been removed from the tree; there is no
need to check for it.
Remove traces of mutex_trylock_recursive()'s existence.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210210085248.219210-3-bigeasy@linutronix.de
|
|
There are no users of mutex_trylock_recursive() in the tree as of
v5.11-rc7.
Remove it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210210085248.219210-2-bigeasy@linutronix.de
|
|
Commit 997acaf6b4b5 ("lockdep: report broken irq restoration") makes
compiling s390 fail because the irq enable/disable functions are now
no longer fully contained in header files.
Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
vmlinux.o: warning: objtool: lock_is_held_type()+0x107: call to warn_bogus_irq_restore() leaves .noinstr.text section
As per the general rule that WARNs are allowed to violate noinstr to
get out, annotate it away.
Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lkml.kernel.org/r/YCKyYg53mMp4E7YI@hirez.programming.kicks-ass.net
|
|
The name no_context() has never been very clear. It's only called for
faults from kernel mode, so rename it and change the no-longer-useful
user_mode(regs) check to a WARN_ON_ONCE.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c21940efe676024bb4bc721f7d70c29c420e127e.1612924255.git.luto@kernel.org
|
|
Drop an indentation level and remove the last user_mode(regs) == true
caller of no_context() by directly OOPSing for implicit kernel faults
from usermode.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/6e3d1129494a8de1e59d28012286e3a292a2296e.1612924255.git.luto@kernel.org
|
|
Not all callers of no_context() want to run exception fixups.
Separate the OOPS code out from the fixup code in no_context().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/450f8d8eabafb83a5df349108c8e5ea83a2f939d.1612924255.git.luto@kernel.org
|
|
Merely enabling CONFIG_COMPILE_TEST should not enable additional code.
To fix this, restrict the automatic enabling of GPIO_MXS to ARCH_MXS,
and ask the user in case of compile-testing.
Fixes: 6876ca311bfca5d7 ("gpio: mxs: add COMPILE_TEST support for GPIO_MXS")
Cc: <stable@vger.kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
|
|
Right now, the case of the kernel trying to execute from user memory
is treated more or less just like the kernel getting a page fault on a
user access. In the failure path, it checks for erratum #93, tries to
otherwise fix up the error, and then oopses.
If it manages to jump to the user address space, with or without SMEP,
it should not try to resolve the page fault. This is an error, pure and
simple. Rearrange the code so that this case is caught early, check for
erratum #93, and bail out.
[ bp: Massage commit message. ]
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/ab8719c7afb8bd501c4eee0e36493150fbbe5f6a.1612924255.git.luto@kernel.org
|
|
In general, page fault errors for WRUSS should be just like get_user(),
etc. Fix three bugs in this area:
There is a comment that says that, if the kernel can't handle a page fault
on a user address due to OOM, the OOM-kill-and-retry logic would be
skipped. The code checked kernel *privilege*, not kernel mode, so it
missed WRUSS. This means that the kernel would malfunction if it got OOM
on a WRUSS fault -- this would be a kernel-mode, user-privilege fault, and
the OOM killer would be invoked and the handler would retry the faulting
instruction.
A failed user access from kernel while a fatal signal is pending should
fail even if the instruction in question was WRUSS.
do_sigbus() should not send SIGBUS for WRUSS -- it should handle it like
any other kernel mode failure.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/a7b7bcea730bd4069e6b7e629236bb2cf526c2fb.1612924255.git.luto@kernel.org
|
|
If fault_signal_pending() returns true, then the core mm has unlocked the
mm for us. Add a comment to help future readers of this code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c56de3d103f40e6304437b150aa7b215530d23f7.1612924255.git.luto@kernel.org
|
|
bad_area() and its relatives are called from many places in fault.c, and
exactly one of them wants the F00F workaround.
__bad_area_nosemaphore() no longer contains any kernel fault code, which
prepares for further cleanups.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/e9668729a48ce6754022b0a4415631e8ebdd00e7.1612924255.git.luto@kernel.org
|
|
mm_fault_error() is logically just the end of do_user_addr_fault().
Combine the functions. This makes the code easier to read.
Most of the churn here is from renaming hw_error_code to error_code in
do_user_addr_fault().
This makes no difference at all to the generated code (objdump -dr) as
compared to changing noinline to __always_inline in the definition of
mm_fault_error().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/dedc4d9c9b047e51ce38b991bd23971a28af4e7b.1612924255.git.luto@kernel.org
|
|
Add support for version 2 of the LARI_CONFIG_CHANGE command.
This is needed to support UHB enable/disable from the BIOS.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210210142629.8a0c951bfdea.I850f29d3ff3931388447bda635dfbc742ea1df61@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
WARNING is better than crashing. Since this happened to me,
be on the safe side.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210210142629.d4651427fcda.I1bcecb73676d039e2521309c07fc6b6314a90546@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
When sending REPLY_ERROR, the group ID is not set, which led to it being
set to the wrong value LONG_GROUP later in the default handling.
Fix this by checking for REPLY_ERROR and not changing the group ID.
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210210142629.82578caaea84.I0ca9cfdd4e656d2e88ee7696dd6baf4267e7cb52@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Add support for AX201 and AX211 radio modules, which we call HR2 and
GF, respectively. These modules can be used with the Ma family of
devices and above.
Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210210142629.f8e3080ce633.I7377b421b031796730daf809c4024a3c3ef95fa8@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|
|
Some new devices contain an extra bit in the CRF ID register to denote
that they support CDB. Add definitions and macros to be able to
support it, and add "NO_CDB" to all existing entries.
Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210210142629.7b40184d9899.I3bb2cf9b9afb0457583f786dc52d4d1b1ad75ffc@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
|