Age | Commit message (Collapse) | Author |
|
Linux SATA support in ipr has always been limited to SATA DVDs. The last
systems that had the option of including a SATA DVD was Power 8, which have
been withdrawn for some time now, so this support can be removed.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20230412174015.114764-1-brking@linux.vnet.ibm.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently scsi_debug_device_reset() does not do much apart from setting the
SDEBUG_UA_POR ("Power on, reset, or bus device reset") flag, which is
eventually passed back to the SCSI midlayer later for a "unit attention"
command.
There is a report that blktest scsi/007 test fails due to commit
1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd").
The problem there is that there are dangling scsi_debug queued commands
when we attempt to remove the driver.
scsi/007 test triggers SCSI EH and attempts to abort a timed-out command.
Function scsi_debug_device_reset() is called as part of the EH, but does
not deal with outstanding erroneous command. Prior to the named commit,
removing the driver caused all dangling queued commands to be stopped -
this should have not been necessary.
Fix by aborting outstanding commands on a scsi_device basis from
scsi_debug_device_reset().
Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304071111.e762fcbd-yujie.liu@intel.com
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230416175654.159163-1-john.g.garry@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Evan Green <evan@rivosinc.com> says:
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like
sysfs. I created a small test program and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: .0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
* b4-shazam-merge:
RISC-V: Add hwprobe vDSO function and data
selftests: Test the new RISC-V hwprobe interface
RISC-V: hwprobe: Support probing of misaligned access performance
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: Add a syscall for HW probing
RISC-V: Move struct riscv_cpuinfo to new header
Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Reapply the fix from:
30b2b2196d6e ("cifs: do not include page data when checking signature")
that got lost in the iteratorisation of the cifs driver.
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reported-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@cjr.nz>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Bharath S M <bharathsm@microsoft.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If read() is done in an unbuffered manner, such that, say,
cifs_strict_readv() goes through cifs_user_readv() and thence
__cifs_readv(), it doesn't recognise the EOF and keeps indicating to
userspace that it returning full buffers of data.
This is due to ctx->iter being advanced in cifs_send_async_read() as the
buffer is split up amongst a number of rdata objects. The iterator count
is then used in collect_uncached_read_data() in the non-DIO case to set the
total length read - and thus the return value of sys_read(). But since the
iterator normally gets used up completely during splitting, ctx->total_len
gets overridden to the full amount.
However, prior to that in collect_uncached_read_data(), we've gone through
the list of rdatas and added up the amount of data we actually received
(which we then throw away).
Fix this by removing the bit that overrides the amount read in the non-DIO
case and just going with the total added up in the aforementioned loop.
This was observed by mounting a cifs share with multiple channels, e.g.:
mount //192.168.6.1/test /test/ -o user=shares,pass=...,max_channels=6
and then reading a 1MiB file on the share:
strace cat /xfstest.test/1M >/dev/null
Through strace, the same data can be seen being read again and again.
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
cc: Jérôme Glisse <jglisse@redhat.com>
cc: Long Li <longli@microsoft.com>
cc: Enzo Matsumiya <ematsumiya@suse.de>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Make sure to check device queue mode in the null_validate_conf() and
return error for NULL_Q_RQ as we don't allow legacy I/O path, without
this patch we get OOPs when queue mode is set to 1 from configfs,
following are repro steps :-
modprobe null_blk nr_devices=0
mkdir config/nullb/nullb0
echo 1 > config/nullb/nullb0/memory_backed
echo 4096 > config/nullb/nullb0/blocksize
echo 20480 > config/nullb/nullb0/size
echo 1 > config/nullb/nullb0/queue_mode
echo 1 > config/nullb/nullb0/power
Entering kdb (current=0xffff88810acdd080, pid 2372) on processor 42 Oops: (null)
due to oops @ 0xffffffffc041c329
CPU: 42 PID: 2372 Comm: sh Tainted: G O N 6.3.0-rc5lblk+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:null_add_dev.part.0+0xd9/0x720 [null_blk]
Code: 01 00 00 85 d2 0f 85 a1 03 00 00 48 83 bb 08 01 00 00 00 0f 85 f7 03 00 00 80 bb 62 01 00 00 00 48 8b 75 20 0f 85 6d 02 00 00 <48> 89 6e 60 48 8b 75 20 bf 06 00 00 00 e8 f5 37 2c c1 48 8b 75 20
RSP: 0018:ffffc900052cbde0 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88811084d800 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888100042e00
RBP: ffff8881053d8200 R08: ffffc900052cbd68 R09: ffff888105db2000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
R13: ffff888104765200 R14: ffff88810eec1748 R15: ffff88810eec1740
FS: 00007fd445fd1740(0000) GS:ffff8897dfc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 0000000166a00000 CR4: 0000000000350ee0
DR0: ffffffff8437a488 DR1: ffffffff8437a489 DR2: ffffffff8437a48a
DR3: ffffffff8437a48b DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
nullb_device_power_store+0xd1/0x120 [null_blk]
configfs_write_iter+0xb4/0x120
vfs_write+0x2ba/0x3c0
ksys_write+0x5f/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fd4460c57a7
Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffd3792a4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4460c57a7
RDX: 0000000000000002 RSI: 000055b43c02e4c0 RDI: 0000000000000001
RBP: 000055b43c02e4c0 R08: 000000000000000a R09: 00007fd44615b4e0
R10: 00007fd44615b3e0 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fd446198520 R14: 0000000000000002 R15: 00007fd446198700
</TASK>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Link: https://lore.kernel.org/r/20230416220339.43845-1-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
All ublk commands(control, IO) should have taken ioctl command encoding
from the beginning, because ioctl command encoding defines each code
uniquely, so driver can figure out wrong command sent from userspace
easily; 2) it might help security subsystem for audit uring cmd[1].
Unfortunately we didn't do that way, and it could be one lesson for
ublk driver.
So switch to ioctl command encoding now, we still support commands encoded
in old way, but they become legacy definition. Any new command should take
ioctl encoding.
See ublksrv code for switching to ioctl command encoding in [2].
[1] https://lore.kernel.org/io-uring/CAHC9VhSVzujW9LOj5Km80AjU0EfAuukoLrxO6BEfnXeK_s6bAg@mail.gmail.com/
[2] https://github.com/ming1/ubdsrv/commits/ioctl_cmd_encoding
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ken Kurematsu <k.kurematsu@nskint.co.jp>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230418131810.855959-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230310144736.1546972-1-robh@kernel.org
|
|
A multishot timeout submission will repeatedly generate completions with
the IORING_CQE_F_MORE cflag set. Depending on the value of the `off'
field in the submission, these timeouts can either repeat indefinitely
until cancelled (`off' = 0) or for a fixed number of times (`off' > 0).
Only noseq timeouts (i.e. not dependent on the number of I/O
completions) are supported.
An indefinite timer will be cancelled if the CQ ever overflows.
Signed-off-by: David Wei <davidhwei@meta.com>
Link: https://lore.kernel.org/r/20230418225817.1905027-1-davidhwei@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make rsrc nodes independent from rsrd_data, for that we keep ctx and
rsrc type in nodes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4f259abe9cd4eea6a3b4ed83508635218acd3c3f.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We only have two rsrc types, buffers and files, replace virtual
callbacks for putting resources down with a switch..case.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/02ca727bf8e5f7f820c2f404e95ae88c8f472930.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of passing rsrc_data and a resource to io_rsrc_put_work() just
forward node, that's all the function needs.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/791e8edd28d78797240b74d34e99facbaad62f3b.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_rsrc_put_work() is simple enough to be open coded into its only
caller.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1b36dd46766ced39a9b160767babfa2fce07b8f8.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Unless a node was flushed by io_rsrc_ref_quiesce(), it'll carry a
resource. Replace ->inline_items with an empty flag, which is
initialised to false and only raised in io_rsrc_ref_quiesce().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/75d384c9d2252e12af73b9cf8a44e1699106aeb1.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
struct io_rsrc_node carries a number of resources represented by struct
io_rsrc_put. That was handy before for sync overhead ammortisation, but
all complexity is gone and nodes are simple and lightweight. Let's
allocate a separate node for each resource.
Nodes and io_rsrc_put and not much different in size, and former are
cached, so node allocation should work better. That also removes some
overhead for nested iteration in io_rsrc_node_ref_zero() /
__io_rsrc_put_work().
Another reason for the patch is that it greatly reduces complexity
by moving io_rsrc_node_switch[_start]() inside io_queue_rsrc_removal(),
so users don't have to care about it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c7d3a45b30cc14cd93700a710dd112edc703db98.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For io_queue_rsrc_removal() we should always use the current active rsrc
node, don't pass it directly but let the function grab it from the
context.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d15939b4afea730978b4925685c2577538b823bb.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
->llist was needed for rsrc node destruction offload, which is removed
now. Get rid of the unused field.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8e7d764c3f947489fde88d0927c3060d2e1bb599.1681822823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
So Intel introduced the FSRS ("Fast Short REP STOS") CPU capability bit,
because they seem to have done the (much simpler) REP STOS optimizations
separately and later than the REP MOVS one.
In contrast, when AMD introduced support for FSRM ("Fast Short REP
MOVS"), in the Zen 3 core, it appears to have improved the REP STOS case
at the same time, and since the FSRS bit was added by Intel later, it
doesn't show up on those AMD Zen 3 cores.
And now that we made use of FSRS for the "rep stos" conditional, that
made those AMD machines unnecessarily slower. The Intel situation where
"rep movs" is fast, but "rep stos" isn't, is just odd. The 'stos' case
is a lot simpler with no aliasing, no mutual alignment issues, no
complicated cases.
So this just sets FSRS automatically when FSRM is available on AMD
machines, to get back all the nice REP STOS goodness in Zen 3.
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The old 'copy_user_generic_unrolled' function was oddly implemented for
largely historical reasons: it had been largely based on the uncached
copy case, which has some other concerns.
For example, the __copy_user_nocache() function uses 'movnti' for the
destination stores, and those want the destination to be aligned. In
contrast, the regular copy function doesn't really care, and trying to
align things only complicates matters.
Also, like the clear_user function, the copy function had some odd
handling of the repeat counts, complicating the exception handling for
no really good reason. So as with clear_user, just write it to keep all
the byte counts in the %rcx register, exactly like the 'rep movs'
functionality that this replaces.
Unlike a real 'rep movs', we do allow for this to trash a few temporary
registers to not have to unnecessarily save/restore registers on the
stack.
And like the clearing case, rename this to what it now clearly is:
'rep_movs_alternative', and make it one coherent function, so that it
shows up as such in profiles (instead of the odd split between
"copy_user_generic_unrolled" and "copy_user_short_string", the latter of
which was not about strings at all, and which was shared with the
uncached case).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The old version was oddly written to have the repeat count in multiple
registers. So instead of taking advantage of %rax being zero, it had
some sub-counts in it. All just for a "single word clearing" loop,
which isn't even efficient to begin with.
So get rid of those games, and just keep all the state in the same
registers we got it in (and that we should return things in). That not
only makes this act much more like 'rep stos' (which this function is
replacing), but makes it much easier to actually do the obvious loop
unrolling.
Also rename the function from the now nonsensical 'clear_user_original'
to what it now clearly is: 'rep_stos_alternative'.
End result: if we don't have a fast 'rep stosb', at least we can have a
fast fallback for it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This does the same thing for the user copies as commit 0db7058e8e23
("x86/clear_user: Make it faster") did for clear_user(). In other
words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set,
avoiding the function call entirely.
In order to do that, it makes the calling convention for the out-of-line
case ("copy_user_generic_unrolled") match the 'rep movs' calling
convention, although it does also end up clobbering a number of
additional registers.
Also, to simplify code sharing in the low-level assembly with the
__copy_user_nocache() function (that uses the normal C calling
convention), we end up with a kind of mixed return value for the
low-level asm code: it will return the result in both %rcx (to work as
an alternative for the 'rep movs' case), _and_ in %rax (for the nocache
case).
We could avoid this by wrapping __copy_user_nocache() callers in an
inline asm, but since the cost is just an extra register copy, it's
probably not worth it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is preparatory work for inlining the 'rep movs' case, but also a
cleanup. The __copy_user_nocache() function was mis-used by the rdma
code to do uncached kernel copies that don't actually want user copies
at all, and as a result doesn't want the stac/clac either.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The modern target to use is FSRS (Fast Short REP STOS), and the other
cases should only be used for bigger areas (ie mainly things like page
clearing).
Note! This changes the conditional for the inlining from FSRM ("fast
short rep movs") to FSRS ("fast short rep stos").
We'll have a separate fixup for AMD microarchitectures that have a good
'rep stosb' yet do not set the new Intel-specific FSRS bit (because FSRM
was there first).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The modern target to use is FSRM (Fast Short REP MOVS), and the other
cases should only be used for bigger areas (ie mainly things like page
clearing).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The modern target to use is FSRS (Fast Short REP STOS), and the other
cases should only be used for bigger areas (ie mainly things like page
clearing).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The modern target to use is FSRM (Fast Short REP MOVS), and the other
cases should only be used for bigger areas (ie mainly things like page
copying and clearing).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
hwpoison_user_mappings() is updated to support ksm pages, and add
collect_procs_ksm() to collect processes when the error hit an ksm page.
The difference from collect_procs_anon() is that it also needs to traverse
the rmap-item list on the stable node of the ksm page. At the same time,
add_to_kill_ksm() is added to handle ksm pages. And
task_in_to_kill_list() is added to avoid duplicate addition of tsk to the
to_kill list. This is because when scanning the list, if the pages that
make up the ksm page all come from the same process, they may be added
repeatedly.
Link: https://lkml.kernel.org/r/20230414021741.2597273-3-xialonglong1@huawei.com
Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: ksm: support hwpoison for ksm page", v2.
Currently, ksm does not support hwpoison. As ksm is being used more
widely for deduplication at the system level, container level, and process
level, supporting hwpoison for ksm has become increasingly important.
However, ksm pages were not processed by hwpoison in 2009 [1].
The main method of implementation:
1. Refactor add_to_kill() and add new add_to_kill_*() to better
accommodate the handling of different types of pages.
2. Add collect_procs_ksm() to collect processes when the error hit an
ksm page.
3. Add task_in_to_kill_list() to avoid duplicate addition of tsk to
the to_kill list.
4. Try_to_unmap ksm page (already supported).
5. Handle related processes such as sending SIGBUS.
Tested with poisoning to ksm page from
1) different process
2) one process
and with/without memory_failure_early_kill set, the processes are killed
as expected with the patchset.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?h=01e00f880ca700376e1845cf7a2524ebe68e47d6
This patch (of 2):
The page_address_in_vma() is used to find the user virtual address of page
in add_to_kill(), but it doesn't support ksm due to the ksm page->index
unusable, add an ksm_addr as parameter to add_to_kill(), let's the caller
to pass it, also rename the function to __add_to_kill(), and adding
add_to_kill_anon_file() for handling anonymous pages and file pages,
adding add_to_kill_fsdax() for handling fsdax pages.
Link: https://lkml.kernel.org/r/20230414021741.2597273-1-xialonglong1@huawei.com
Link: https://lkml.kernel.org/r/20230414021741.2597273-2-xialonglong1@huawei.com
Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
sysctl memfd_noexec is pid-namespaced, non-reservable, and inherent to the
child process.
Move the inherence test from init ns to child ns, so init ns can keep the
default value.
Link: https://lkml.kernel.org/r/20230414022801.2545257-1-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@google.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303312259.441e35db-yujie.liu@intel.com
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The va_high_addr_switch selftest is used to test mmap across 128TB
boundary. It divides the selftest cases into two main categories on the
basis of size. One set is used to create mappings that are multiples of
PAGE_SIZE while the other creates mappings that are multiples of
HUGETLB_SIZE.
In order to run the hugetlb testcases the binary must be appended with
"--run-hugetlb" but the file that used to run the test only invokes the
binary, thereby completely skipping the hugetlb testcases. Hence, the
required statement has been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-6-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Arm64 has a default hugepage size of 512MB when CONFIG_ARM64_64K_PAGES=y
is enabled. While testing on arm64 platforms having up to 4PB of virtual
address space, a minimum of 6 hugepages were required for all test cases
to pass. Support for this requirement has been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-5-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The in code comments for the selftest were made on the basis of 128TB
switch, an architecture feature specific to PowerPc and x86 platforms.
Keeping in mind the support added for arm64 platforms which implements a
256TB switch, a more generic explanation has been provided.
Link: https://lkml.kernel.org/r/20230323105243.2807166-4-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As the initial selftest only took into consideration PowperPC and x86
architectures, on adding support for arm64, a platform independent naming
convention is chosen.
Link: https://lkml.kernel.org/r/20230323105243.2807166-3-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/mm: Implement support for arm64 on va".
The va_128TBswitch selftest is designed and implemented for PowerPC and
x86 architectures which support a 128TB switch, up to 256TB of virtual
address space and hugepage sizes of 16MB and 2MB respectively. Arm64
platforms on the other hand support a 256Tb switch, up to 4PB of virtual
address space and a default hugepage size of 512MB when 64k pagesize is
enabled.
These architectural differences require introducing support for arm64
platforms, after which a more generic naming convention is suggested. The
in code comments are amended to provide a more platform independent
explanation of the working of the code and nr_hugepages are configured as
required. Finally, the file running the testcase is modified in order to
prevent skipping of hugetlb testcases of va_high_addr_switch.
This patch (of 5):
Arm64 platforms have the ability to support 64kb pagesize, 512MB default
hugepage size and up to 4PB of virtual address space. The address switch
occurs at 256TB as opposed to 128TB. Hence, the necessary support has
been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-1-chaitanyas.prakash@arm.com
Link: https://lkml.kernel.org/r/20230323105243.2807166-2-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Lists (items) with one item should be just const or enum because it is
shorter and simpler.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20230414104230.23165-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Lists (items) with one item should be just enum because it is shorter,
simpler and does not confuse, if one wants to add new entry with a
fallback. Convert all of them to enums. OTOH, leave unused "oneOf"
entries in anticipation of further growth of the entire binding.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20230414083311.12197-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
video-interface.txt does not exist anymore, as it has been converted
to video-interfaces.yaml.
Instead of referencing video-interfaces.yaml multiple times,
pass it as a $ref to the schema.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Link: https://lore.kernel.org/r/20230412175800.2537812-1-festevam@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Cleanup bindings dropping unneeded quotes. Once all these are fixed,
checking for this can be enabled in yamllint.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20230327170146.4104556-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Add QDU1000 PDC, already used in upstreamed DTS.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230416102831.105136-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
All headers from 'include/dt-bindings/' must be verified by checkpatch
together with Documentation bindings, because all of them are part of the
whole DT bindings system.
The requirement is dual licensed and matching patterns:
* Schemas:
/GPL-2\.0(?:-only)? OR BSD-2-Clause/
* Headers:
/GPL-2\.0(?:-only)? OR \S+/
Above patterns suggested by Rob at:
https://lore.kernel.org/all/CAL_Jsq+-YJsBO+LuPJ=ZQ=eb-monrwzuCppvReH+af7hYZzNaQ@mail.gmail.com
The issue was found during patch review:
https://lore.kernel.org/all/20230313201259.19998-4-ddrokosov@sberdevices.ru/
Link: https://lkml.kernel.org/r/20230404191715.7319-1-ddrokosov@sberdevices.ru
Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As of 4f04cbaf128 ("epoll: use refcount to reduce ep_mutex contention"),
this lock is now specific to nesting cases - inserting an epoll fd onto
another epoll fd. Rename the lock to be less generic.
Link: https://lkml.kernel.org/r/20230411234159.20421-1-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
$lx_dentry_name() generates a full VFS path from a given dentry pointer,
and $lx_i_dentry() returns the dentry pointer associated with the given
inode pointer, if there is one.
Link: https://lkml.kernel.org/r/c9a5ad8efbfbd2cc6559e082734eed7628f43a16.1677631565.git.development@efficientek.com
Signed-off-by: Glenn Washburn <development@efficientek.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Antonio Borneo <antonio.borneo@foss.st.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "GDB VFS utils".
I've created a couple GDB convenience functions that I found useful when
debugging some VFS issues and figure others might find them useful. For
instance, they are useful in setting conditional breakpoints on VFS
functions where you only care if the dentry path is a certain value. I
took the opportunity to create a new "vfs" python module to give VFS
related utilities a home.
This patch (of 2):
This will allow for more VFS specific GDB helpers to be collected in one
place. Move utils.dentry_name into the vfs modules. Also a local
variable in proc.py was changed from vfs to mnt to prevent a naming
collision with the new vfs module.
[akpm@linux-foundation.org: add SPDX-License-Identifier]
Link: https://lkml.kernel.org/r/cover.1677631565.git.development@efficientek.com
Link: https://lkml.kernel.org/r/7bba4c065a8c2c47f1fc5b03a7278005b04db251.1677631565.git.development@efficientek.com
Signed-off-by: Glenn Washburn <development@efficientek.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Antonio Borneo <antonio.borneo@foss.st.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
typeof is (still) a GNU extension, which means that it cannot be used when
building ISO C (e.g. -std=c99). It should therefore be avoided in uapi
headers in favour of the ISO-friendly __typeof__.
Unfortunately this issue could not be detected by
CONFIG_UAPI_HEADER_TEST=y as the __ALIGN_KERNEL() macro is not expanded in
any uapi header.
This matters from a userspace perspective, not a kernel one. uapi
headers and their contents are expected to be usable in a variety of
situations, and in particular when building ISO C applications (with
-std=c99 or similar).
This particular problem can be reproduced by trying to use the
__ALIGN_KERNEL macro directly in application code, say:
#include <linux/const.h>
int align(int x, int a)
{
return __KERNEL_ALIGN(x, a);
}
and trying to build that with -std=c99.
Link: https://lkml.kernel.org/r/20230411092747.3759032-1-kevin.brodsky@arm.com
Fixes: a79ff731a1b2 ("netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reported-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Tested-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Delay accounting does not track the delay of IRQ/SOFTIRQ. While
IRQ/SOFTIRQ could have obvious impact on some workloads productivity, such
as when workloads are running on system which is busy handling network
IRQ/SOFTIRQ.
Get the delay of IRQ/SOFTIRQ could help users to reduce such delay. Such
as setting interrupt affinity or task affinity, using kernel thread for
NAPI etc. This is inspired by "sched/psi: Add PSI_IRQ to track
IRQ/SOFTIRQ pressure"[1]. Also fix some code indent problems of older
code.
And update tools/accounting/getdelays.c:
/ # ./getdelays -p 156 -di
print delayacct stats ON
printing IO accounting
PID 156
CPU count real total virtual total delay total delay average
15 15836008 16218149 275700790 18.380ms
IO count delay total delay average
0 0 0.000ms
SWAP count delay total delay average
0 0 0.000ms
RECLAIM count delay total delay average
0 0 0.000ms
THRASHING count delay total delay average
0 0 0.000ms
COMPACT count delay total delay average
0 0 0.000ms
WPCOPY count delay total delay average
36 7586118 0.211ms
IRQ count delay total delay average
42 929161 0.022ms
[1] commit 52b1364ba0b1("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure")
Link: https://lkml.kernel.org/r/202304081728353557233@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Cc: wangyong <wang.yong12@zte.com.cn>
Cc: junhua huang <huang.junhua@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
join() expects strings but integers are given.
Convert chunks list to strings before passing it to join()
Link: https://lkml.kernel.org/r/20230406221217.1585486-4-f.fainelli@gmail.com
Signed-off-by: Amjad Ouled-Ameur <aouledameur@baylibre.com>
Signed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This GDB script prints the interrupts in the system in the same way that
/proc/interrupts does. This does include the architecture specific part
done by arch_show_interrupts() for x86, ARM, ARM64 and MIPS. Example
output from an ARM64 system:
(gdb) lx-interruptlist
CPU0 CPU1 CPU2 CPU3
10: 3167 1225 1276 2629 GICv2 30 Level arch_timer
13: 0 0 0 0 GICv2 36 Level arm-pmu
14: 0 0 0 0 GICv2 37 Level arm-pmu
15: 0 0 0 0 GICv2 38 Level arm-pmu
16: 0 0 0 0 GICv2 39 Level arm-pmu
28: 0 0 0 0 interrupt-controller@8410640 5 Edge brcmstb-gpio-wake
30: 125 0 0 0 GICv2 128 Level ttyS0
31: 0 0 0 0 interrupt-controller@8416000 0 Level mspi_done
32: 0 0 0 0 interrupt-controller@8410640 3 Edge brcmstb-waketimer
33: 0 0 0 0 interrupt-controller@8418580 8 Edge brcmstb-waketimer-rtc
34: 872 0 0 0 GICv2 230 Level brcm_scmi@0
35: 0 0 0 0 interrupt-controller@8410640 10 Edge 8d0f200.usb-phy
37: 0 0 0 0 GICv2 97 Level PCIe PME
42: 0 0 0 0 GICv2 145 Level xhci-hcd:usb1
43: 94 0 0 0 GICv2 71 Level mmc1
44: 0 0 0 0 GICv2 70 Level mmc0
IPI0: 23 666 154 98 Rescheduling interrupts
IPI1: 247 1053 1701 634 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 Timer broadcast interrupts
IPI5: 7 9 5 0 IRQ work interrupts
IPI6: 0 0 0 0 CPU wake-up interrupts
ERR: 0
Link: https://lkml.kernel.org/r/20230406220451.1583239-1-f.fainelli@gmail.com
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If CONFIG_DEBUG_INFO_REDUCED is enabled in the kernel configuration, we
will typically not be able to load vmlinux-gdb.py and will fail with:
Traceback (most recent call last):
File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/vmlinux-gdb.py", line 25, in <module>
import linux.utils
File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/scripts/gdb/linux/utils.py", line 131, in <module>
atomic_long_counter_offset = atomic_long_type.get_type()['counter'].bitpos
KeyError: 'counter'
Rather be left wondering what is happening only to find out that reduced
debug information is the cause, raise an eror. This was not typically a
problem until e3c8d33e0d62 ("scripts/gdb: fix 'lx-dmesg' on 32 bits arch")
but it has since then.
Link: https://lkml.kernel.org/r/20230406215252.1580538-1-f.fainelli@gmail.com
Fixes: e3c8d33e0d62 ("scripts/gdb: fix 'lx-dmesg' on 32 bits arch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Antonio Borneo <antonio.borneo@foss.st.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Linux makes use of the Radix Tree data structure to store pointers indexed
by integer values. This structure is utilised across many structures in
the kernel including the IRQ descriptor tables, and several filesystems.
This module provides a method to lookup values from a structure given its
head node.
Usage:
The function lx_radix_tree_lookup, must be given a symbol of type struct
radix_tree_root, and an index into that tree.
The object returned is a generic integer value, and must be cast correctly
to the type based on the storage in the data structure.
For example, to print the irq descriptor in the sparse irq_desc_tree at
index 18, try the following:
(gdb) print (struct irq_desc)$lx_radix_tree_lookup(irq_desc_tree, 18)
This script previously existed under commit
e127a73d41ac471d7e3ba950cf128f42d6ee3448 ("scripts/gdb: add a Radix Tree
Parser") and was later reverted with
b447e02548a3304c47b78b5e2d75a4312a8f17e1i (Revert "scripts/gdb: add a
Radix Tree Parser").
This version expects the XArray based radix tree implementation and has
been verified using QEMU/x86 on Linux 6.3-rc5.
[f.fainelli@gmail.com: revive and update for xarray implementation]
[f.fainelli@gmail.com: guard against a NULL node in the while loop]
Link: https://lkml.kernel.org/r/20230405222743.1191674-1-f.fainelli@gmail.com
Link: https://lkml.kernel.org/r/20230404214049.1016811-1-f.fainelli@gmail.com
Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This has a slight benefit for x86 and has no effect on other targets.
The benefit to x86 is it change the codegen for setting a node to block
from `mov %r0, %r1; or $RB_BLACK, %r1` to `lea RB_BLACK(%r0), %r1` which
saves an instructions.
In all other cases it just replace ALU with ALU (or -> and) which
perform the same on all machines I am aware of.
Total instructions in rbtree.o:
Before - 802
After - 782
so it saves about 20 `mov` instructions.
Link: https://lkml.kernel.org/r/20230404221350.3806566-1-goldstein.w.n@gmail.com
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|