Age | Commit message | Author |
|
If an AF_ALG socket bound to a hashing algorithm is sent a zero-length
message with MSG_MORE set, and recvmsg() is then called without first
sending another message without MSG_MORE set to end the operation, an oops
will occur. The crypto context and result no longer get set up in advance
because hash_sendmsg() now defers that as long as possible in the hope that
it can use crypto_ahash_digest() - and because the message is zero-length,
the data wrangling loop is skipped.
Fix this by handling zero-length sends at the top of the hash_sendmsg()
function. If we're not continuing the previous sendmsg(), then just ignore
the send (hash_recvmsg() will invent something when called); if we are
continuing, then we finalise the request at this point if MSG_MORE is not
set, so that any error is obtained here; otherwise the send has no effect
and can be ignored.
Whilst we're at it, remove the code to create a kvmalloc'd scatterlist if
we get more than ALG_MAX_PAGES - this shouldn't happen.
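A minimal sketch of the early zero-length handling described above
(illustrative only; the labels, fields and finalisation call are
assumptions, not the actual patch):

	/* At the top of hash_sendmsg(), before any page wrangling: */
	if (!size) {
		if (!ctx->more)
			goto unlock;	/* hash_recvmsg() will invent a result */
		if (!(msg->msg_flags & MSG_MORE)) {
			/* Finalise the pending request so any error shows up here */
			err = crypto_wait_req(crypto_ahash_final(&ctx->req),
					      &ctx->wait);
		}
		goto unlock;
	}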
Fixes: c662b043cdca ("crypto: af_alg/hash: Support MSG_SPLICE_PAGES")
Reported-by: syzbot+13a08c0bf4d212766c3c@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000b928f705fdeb873a@google.com/
Reported-by: syzbot+14234ccf6d0ef629ec1a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000c047db05fdeb8790@google.com/
Reported-by: syzbot+4e2e47f32607d0f72d43@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000bcca3205fdeb87fb@google.com/
Reported-by: syzbot+472626bb5e7c59fb768f@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000b55d8805fdeb8385@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-and-tested-by: syzbot+6efc50cc1f8d718d6cb7@syzkaller.appspotmail.com
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/427646.1686913832@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devm_clk_notifier_register() allocates a devres resource for the clk
notifier but doesn't register it with the device, so the notifier doesn't
get unregistered on device detach and the allocated resource is leaked.
Fix the issue by registering the resource through devres_add().
This issue was found with kmemleak on a Chromebook.
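The missing step is roughly the following (sketch; the devres struct, its
fields and the release callback name are illustrative assumptions):

	devres = devres_alloc(devm_clk_notifier_release,
			      sizeof(*devres), GFP_KERNEL);
	if (!devres)
		return -ENOMEM;

	ret = clk_notifier_register(clk, nb);
	if (!ret) {
		devres->clk = clk;
		devres->nb = nb;
		devres_add(dev, devres);	/* this call was missing */
	} else {
		devres_free(devres);
	}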
Fixes: 6d30d50d037d ("clk: add devm variant of clk_notifier_register")
Signed-off-by: Fei Shao <fshao@chromium.org>
Link: https://lore.kernel.org/r/20230619112253.v2.1.I13f060c10549ef181603e921291bdea95f83033c@changeid
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The new phy driver attempts to select a driver from another subsystem,
but that fails when the NVMEM subsystem is disabled:
WARNING: unmet direct dependencies detected for NVMEM_MTK_EFUSE
Depends on [n]: NVMEM [=n] && (ARCH_MEDIATEK [=n] || COMPILE_TEST [=y]) && HAS_IOMEM [=y]
Selected by [y]:
- MEDIATEK_GE_SOC_PHY [=y] && NETDEVICES [=y] && PHYLIB [=y] && (ARM64 && ARCH_MEDIATEK [=n] || COMPILE_TEST [=y])
I could not see an actual compile time dependency, so presumably this
is only needed for working correctly and is not technically a dependency
on that particular nvmem driver implementation, so it would likely
be safe to remove the select for compile testing.
To keep the spirit of the original 'select', just replace this with a
'depends on' that ensures that the driver will work but does not get in
the way of build testing.
Fixes: 98c485eaf509b ("net: phy: add driver for MediaTek SoC built-in GE PHYs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/20230616093009.3511692-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rework iterating over DT CPU nodes to iterate over possible CPUs
instead. There's no need to walk the DT CPU nodes again. The number of
possible CPUs is equal to the number of CPUs defined in the DT. Using the
"reg" value for an array index is fragile as it assumes "reg" is 0-N,
which often is not the case.
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-3-8333729ee45d@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Use of_get_cpu_hwid() rather than open coding the reading of the CPU
nodes' "reg" property. The existing code is in fact wrong, as the "reg"
address cells size is 2 cells for arm64. The existing code happens to
work because the DTS files are wrong as well.
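A minimal sketch of the intended usage, assuming the iteration over
possible CPUs from the previous patch (variable names are illustrative):

	for_each_possible_cpu(cpu) {
		struct device_node *np = of_get_cpu_node(cpu, NULL);
		u64 hwid = of_get_cpu_hwid(np, 0);

		/* use hwid instead of open coding a "reg" read */
		of_node_put(np);
	}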
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-2-8333729ee45d@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
drivers/clk/mvebu/ is missing a maintainers entry. Add it to the
existing entry for the Marvell mvebu platforms.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-1-8333729ee45d@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
In bdev_add_partition(), there is no check that the start and end sectors
do not exceed the size of the disk before calling add_partition(). When we
call the block device's ioctl interface directly to add a partition, and
the capacity of the disk has been set to 0 by the driver, the command will
still continue to execute.
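The missing bounds check could look roughly like this (sketch; the error
code and exact placement are assumptions):

	sector_t capacity = get_capacity(disk);

	if (!capacity || start >= capacity || start + length > capacity)
		return -EINVAL;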
Signed-off-by: Min Li <min15.li@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230619091214.31615-1-min15.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull smb server fixes from Steve French:
"Four smb3 server fixes, all also for stable:
- fix potential oops in parsing compounded requests
- fix various paths (mkdir, create etc) where mnt_want_write was not
checked first
- fix slab out of bounds in check_message and write"
* tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate session id and tree id in the compound request
ksmbd: fix out-of-bound read in smb2_write
ksmbd: add mnt_want_write to ksmbd vfs functions
ksmbd: validate command payload size
|
|
Allow unprivileged Persistent Reservation operations on devices
if the write permission check on the device node has passed.
brw-rw---- 1 root disk 259, 0 Jun 13 07:09 /dev/nvme0n1
In the example above, the "disk" group of nvme0n1 is also allowed to
make reservations on the device even without CAP_SYS_ADMIN.
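A minimal sketch of the relaxed check (illustrative; the helper signature
and the use of BLK_OPEN_WRITE are assumptions):

	/* in blkdev_pr_allowed(): write permission on the node is enough */
	if (mode & BLK_OPEN_WRITE)
		return true;
	return capable(CAP_SYS_ADMIN);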
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230613084008.93795-3-jefflexu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Refuse Persistent Reservation operations on partitions as reservation
on partitions doesn't make sense.
Besides, introduce a blkdev_pr_allowed() helper, where more policies can
be added later.
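Such a helper could look roughly like this (sketch; the CAP_SYS_ADMIN check
reflects the pre-existing policy and the exact contents are assumptions):

	static bool blkdev_pr_allowed(struct block_device *bdev)
	{
		/* Reservations on a partition make no sense */
		if (bdev_is_partition(bdev))
			return false;
		return capable(CAP_SYS_ADMIN);
	}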
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230613084008.93795-2-jefflexu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The device /dev/hwctr was introduced to access complete
CPU Measurement facility counter sets via an ioctl system call.
Access to the device is limited to privileged processes
running as root or superuser, and the capability CAP_SYS_ADMIN
is required. The device permissions are read/write for the
device owner root. There is no need for this restriction.
Make the device access permission read/write for all and
reduce the required capability to CAP_PERFMON.
Any user space program with the CAP_PERFMON capability assigned to it
can now read and display the CPU Measurement facility counter sets.
For more details on perf tool usage and security, see the Linux
documentation in Documentation/admin-guide/perf-security.rst.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
During module load, module layout allocation occurs by initially
allowing the architecture to frob the sections. This is performed via
module_frob_arch_sections().
However, the sizes of the module memory types (text, data, rodata, etc.)
are updated correctly only after layout_sections().
After the required module memory sizes for each type have been calculated,
move_module() is responsible for allocating the module memory for each
type from the modules vaddr range.
Given the sequence above, module_frob_arch_sections() updates the
mod_arch_specific got_offset before the module memory text type size
is fully updated in layout_sections(). Hence mod_arch_specific
got_offset currently points to zero.
As per s390 ABI,
R_390_GOTENT : (G + O + A - P) >> 1
where
G=me->mem[MOD_TEXT].base+me->arch.got_offset
O=info->got_offset
A=rela->r_addend
P=loc
Fix the R_390_GOTENT calculation in apply_rela() accordingly.
Note: currently this doesn't break anything because me->arch.got_offset
is zero. However, reordering of functions in the future could break it.
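Expressed as a sketch, the corrected computation follows the ABI formula
above (illustrative; the real code feeds the value into the relocation
helpers):

	/* R_390_GOTENT: (G + O + A - P) >> 1 */
	Elf_Addr G = (Elf_Addr)me->mem[MOD_TEXT].base + me->arch.got_offset;
	Elf_Addr O = info->got_offset;
	Elf_Addr A = rela->r_addend;
	Elf_Addr P = loc;
	Elf_Addr val = (G + O + A - P) >> 1;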
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Fix virtual vs physical address confusion (they are currently the same).
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The Kernel Address Sanitizer uses one shadow byte to encode each 8 bytes
of memory; that is, the start and end address of a memory range are
shifted right by 3 bits when the corresponding shadow memory is created
for that memory range.
The memory mapping routine used expects page-aligned addresses, while the
above described 3-bit shift might turn the shadow memory range start and
end boundaries into non-page-aligned ones in case the size of the original
memory range is less than (PAGE_SIZE << 3). As a result, the resulting
shadow memory range could be short by one page.
Align the start and end addresses on a page boundary when mapping a
shadow memory range to avoid the described issue in the future.
Note that this does not fix a real problem, since currently no virtual
regions of size less than (PAGE_SIZE << 3) exist.
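A minimal sketch of the alignment (illustrative; the actual mapping routine
and variable names in the decompressor differ):

	unsigned long shadow_start, shadow_end;

	shadow_start = round_down((unsigned long)kasan_mem_to_shadow(start),
				  PAGE_SIZE);
	shadow_end = round_up((unsigned long)kasan_mem_to_shadow(end),
			      PAGE_SIZE);
	/* map [shadow_start, shadow_end) */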
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Since commit 3b5c3f000c2e ("s390/kasan: move shadow mapping
to decompressor") the decompressor establishes mappings for
the shadow memory and sets initial protection attributes to
RWX. The decompressed kernel resets protection to RW+NX
later on.
In case a shadow memory range is not aligned on a page boundary
(e.g. as a result of mem= kernel command line parameter use),
the "Checked W+X mappings: FAILED, 1 W+X pages found" warning
hits.
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 557b19709da9 ("s390/kasan: move shadow mapping to decompressor")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
get_elfcorehdr_size() returns a size_t, so there is no real point to
store it in a u32.
Turn 'alloc_size' into a size_t.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0756118c9058338f3040edb91971d0bfd100027b.1686688212.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
[BUG]
David reported an ASSERT() get triggered during fio load on 8 devices
with data/raid6 and metadata/raid1c3:
fio --rw=randrw --randrepeat=1 --size=3000m \
--bsrange=512b-64k --bs_unaligned \
--ioengine=libaio --fsync=1024 \
--name=job0 --name=job1 \
The ASSERT() is from rbio_add_bio() of raid56.c:
	ASSERT(orig_logical >= full_stripe_start &&
	       orig_logical + orig_len <= full_stripe_start +
	       rbio->nr_data * BTRFS_STRIPE_LEN);
This checks whether the target rbio crosses the full stripe boundary.
[100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
[100.795] ------------[ cut here ]------------
[100.796] kernel BUG at fs/btrfs/raid56.c:1622!
[100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
[100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
[100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
[100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
[100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
[100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
[100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
[100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
[100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
[100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
[100.824] Call Trace:
[100.825] <TASK>
[100.825] ? die+0x32/0x80
[100.826] ? do_trap+0x12d/0x160
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.829] ? do_error_trap+0x90/0x130
[100.830] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.831] ? handle_invalid_op+0x2c/0x30
[100.833] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.835] ? exc_invalid_op+0x29/0x40
[100.836] ? asm_exc_invalid_op+0x16/0x20
[100.837] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.837] raid56_parity_write+0x64/0x270 [btrfs]
[100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs]
[100.840] ? btrfs_bio_init+0x80/0x80 [btrfs]
[100.841] ? release_pages+0x503/0x6d0
[100.842] ? folio_unlock+0x2f/0x60
[100.844] ? __folio_put+0x60/0x60
[100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
[100.847] btrfs_submit_bio+0x21/0x60 [btrfs]
[100.847] submit_one_bio+0x6a/0xb0 [btrfs]
[100.849] extent_write_cache_pages+0x395/0x680 [btrfs]
[100.850] ? __extent_writepage+0x520/0x520 [btrfs]
[100.851] ? mark_usage+0x190/0x190
[100.852] extent_writepages+0xdb/0x130 [btrfs]
[100.853] ? extent_write_locked_range+0x480/0x480 [btrfs]
[100.854] ? mark_usage+0x190/0x190
[100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs]
[100.855] ? reacquire_held_locks+0x178/0x280
[100.856] ? writeback_sb_inodes+0x245/0x7f0
[100.857] do_writepages+0x102/0x2e0
[100.858] ? page_writeback_cpu_online+0x10/0x10
[100.859] ? __lock_release.isra.0+0x14a/0x4d0
[100.860] ? reacquire_held_locks+0x280/0x280
[100.861] ? __lock_acquired+0x1e9/0x3d0
[100.862] ? do_raw_spin_lock+0x1b0/0x1b0
[100.863] __writeback_single_inode+0x94/0x450
[100.864] writeback_sb_inodes+0x372/0x7f0
[100.864] ? lock_sync+0xd0/0xd0
[100.865] ? do_raw_spin_unlock+0x93/0xf0
[100.866] ? sync_inode_metadata+0xc0/0xc0
[100.867] ? rwsem_optimistic_spin+0x340/0x340
[100.868] __writeback_inodes_wb+0x70/0x130
[100.869] wb_writeback+0x2d1/0x530
[100.869] ? __writeback_inodes_wb+0x130/0x130
[100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
[100.870] wb_do_writeback+0x3eb/0x480
[100.871] ? wb_writeback+0x530/0x530
[100.871] ? mark_lock_irq+0xcd0/0xcd0
[100.872] wb_workfn+0xe0/0x3f0<
[CAUSE]
Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
BTRFS_STRIPE_LEN") changed how we calculate the map length, to reduce
u64 divisions.
Function btrfs_max_io_len() returns the length to the stripe boundary.
It calculates the full stripe start offset (inside the chunk) with the
following code:
	*full_stripe_start =
		rounddown(*stripe_nr, nr_data_stripes(map)) <<
		BTRFS_STRIPE_LEN_SHIFT;
The calculation itself is fine, but the value returned by rounddown() is
dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
returns int).
Thus the result is also u32; then we do the left shift, which can
overflow u32.
If such an overflow happens, @full_stripe_start will be a value way smaller
than @offset, causing the later "full_stripe_len - (offset -
*full_stripe_start)" to underflow, so the later length calculation has no
stripe boundary limit, resulting in a write bio that exceeds the stripe
boundary.
There are some other locations like this, where a u32 @stripe_nr gets left
shifted, which can lead to a similar overflow.
[FIX]
Fix all left shifts of @stripe_nr by adding a type cast to u64 before the
shift.
The involved @stripe_nr (or similar) variables record the stripe number
inside the chunk, which is small enough to be contained in a u32, but
their byte offset inside the chunk can not fit into a u32.
Thus for those specific left shifts a type cast to u64 is necessary; this
patch does not change the variables themselves, keeping the fix minimal,
and the code will be cleaned up in the future.
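For example, the computation quoted above becomes (sketch):

	*full_stripe_start =
		(u64)rounddown(*stripe_nr, nr_data_stripes(map)) <<
		BTRFS_STRIPE_LEN_SHIFT;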
Reported-by: David Sterba <dsterba@suse.com>
Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Remove all the open coded magic on slot->file_ptr by introducing two
helpers that return the file pointer and the flags instead.
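The two helpers could look roughly like this (sketch; names and the FFS
flag encoding follow the existing fixed-file table conventions, but the
exact definitions are assumptions):

	static inline struct file *io_slot_file(struct io_fixed_file *slot)
	{
		return (struct file *)(slot->file_ptr & FFS_MASK);
	}

	static inline unsigned long io_slot_flags(struct io_fixed_file *slot)
	{
		return slot->file_ptr & ~FFS_MASK;
	}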
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use io_file_from_index instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use io_file_from_index instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Two of the three callers want them, so return the more usual format,
and shift into the FFS_ form only for the fixed file table.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just checking the flag directly makes it a lot more obvious what is
going on here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The SCM inflight mechanism has nothing to do with the fact that a file
might be a regular file or not and if it supports non-blocking
operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The variable is only used once now, so don't bother with it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now that this only checks O_NONBLOCK and FMODE_NOWAIT, the helper is
complete overkill, and the comments are confusing, bordering on wrong.
Just inline the check into the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
smatch warning:
drivers/accel/qaic/qaic_data.c:620 qaic_free_object() error:
dereferencing freed memory 'obj->import_attach'
obj->import_attach is detached and freed using dma_buf_detach(), but is
then used after free to decrease the dmabuf refcount using dma_buf_put().
drm_prime_gem_destroy() handles this issue and performs the proper clean
up instead of open coding it in the driver.
Fixes: ff13be830333 ("accel/qaic: Add datapath")
Reported-by: Sukrut Bellary <sukrut.bellary@linux.com>
Closes: https://lore.kernel.org/all/20230610021200.377452-1-sukrut.bellary@linux.com/
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230614161528.11710-1-quic_jhugo@quicinc.com
|
|
In journal_init_dev(), if the super bdev is used as 'j_dev_bd', then
blkdev_get_by_dev() is called with a NULL holder; otherwise, the holder
will be the journal. However, later in release_journal_dev(), blkdev_put()
is called with the journal unconditionally, causing the following warning:
WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline]
WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901
RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901
Call Trace:
<TASK>
release_journal_dev fs/reiserfs/journal.c:2592 [inline]
free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896
do_journal_release fs/reiserfs/journal.c:1960 [inline]
journal_release+0x276/0x630 fs/reiserfs/journal.c:1971
reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616
generic_shutdown_super+0x158/0x480 fs/super.c:499
kill_block_super+0x64/0xb0 fs/super.c:1422
deactivate_locked_super+0x98/0x160 fs/super.c:330
deactivate_super+0xb1/0xd0 fs/super.c:361
cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247
task_work_run+0x16f/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xadc/0x2a30 kernel/exit.c:874
do_group_exit+0xd4/0x2a0 kernel/exit.c:1024
__do_sys_exit_group kernel/exit.c:1035 [inline]
__se_sys_exit_group kernel/exit.c:1033 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fix this problem by passing in NULL holder in this case.
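A minimal sketch of the idea (illustrative; assumes the super block is
reachable from the journal release path):

	void *holder = journal->j_dev_bd == sb->s_bdev ? NULL : journal;

	blkdev_put(journal->j_dev_bd, holder);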
Reported-by: syzbot+04625c80899f4555de39@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620111322.1014775-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After commit 2736e8eeb0cc ("block: use the holder as indication for
exclusive opens"), blkdev_get_by_dev() will warn if holder is NULL and
mode contains 'FMODE_EXCL'.
The holder passed to blkdev_get_by_dev() from disk_scan_partitions() is
always NULL, hence it should not use 'FMODE_EXCL', which is broken by the
commit. As a consequence, WARN_ON_ONCE() will be triggered from
blkdev_get_by_dev() if a user scans partitions while the device is opened
exclusively.
Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(),
as it used to be.
Reported-by: syzbot+00cd27751f78817f167b@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230618140402.7556-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620043536.707249-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, associating a loop device with a different file descriptor
does not increment its diskseq. This allows the following race
condition:
1. Program X opens a loop device
2. Program X gets the diskseq of the loop device.
3. Program X associates a file with the loop device.
4. Program X passes the loop device major, minor, and diskseq to
something.
5. Program X exits.
6. Program Y detaches the file from the loop device.
7. Program Y attaches a different file to the loop device.
8. The opener finally gets around to opening the loop device and checks
that the diskseq is what it expects it to be. Even though the
diskseq is the expected value, the result is that the opener is
accessing the wrong file.
From discussions with Christoph Hellwig, it appears that
disk_force_media_change() was supposed to call inc_diskseq(), but in
fact it does not. Adding a Fixes: tag to indicate this. Christoph's
Reported-by is because he stated that disk_force_media_change()
calls inc_diskseq(), which is what led me to discover that it should but
does not.
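The implied one-line change (sketch):

	/* in disk_force_media_change(), alongside invalidating the media: */
	inc_diskseq(disk);	/* the missing diskseq bump */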
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Fixes: e6138dc12de9 ("block: add a helper to raise a media changed event")
Cc: stable@vger.kernel.org # 5.15+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix a missing conversion to the new BLK_OPEN constant in swim.
Fixes: 05bdb9965305 ("block: replace fmode_t with a block-specific type for block open flags")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620043051.707196-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.
Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.
A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.
So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de
|
|
Putting CPUs into INIT is a safer way to park CPUs during kexec().
Split the INIT assert/deassert sequence out so it can be reused.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20230615193330.551157083@linutronix.de
|
|
TLDR: It's a mess.
When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.
The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.
Cure this by bringing the offlined CPUs out of MWAIT into HLT.
Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.
That does not help if a stray NMI, MCE or SMI hits the offlined CPUs, as
those make them come out of HLT.
A follow up change will put them into INIT, which protects at least against
NMI and SMI.
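The kick-out-of-MWAIT step might look roughly like this (sketch; the
per-CPU structure, mask and control word names are assumptions):

	for_each_cpu(cpu, &cpus_in_mwait) {
		struct mwait_cpu_dead *md = per_cpu_ptr(&mwait_cpu_dead, cpu);

		/* Writing the monitored line wakes MWAIT; tell it to HLT */
		WRITE_ONCE(md->control, CPUDEAD_MWAIT_KEXEC_HLT);
	}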
Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
|
|
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice, as all that is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
|
|
The wmb()s before sending the IPIs are not synchronizing anything.
If at all then the apic IPI functions have to provide or act as appropriate
barriers.
Remove these cargo cult barriers which have no explanation of what they are
synchronizing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.378358382@linutronix.de
|
|
stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel
CPUs return the content of the highest supported leaf when a non-existent
leaf is read, while AMD CPUs return all zeros for unsupported leafs.
So the result of the test on Intel CPUs is a lottery.
While harmless, it's incorrect and causes the conditional wbinvd() to be
issued where it is not required.
Check whether the leaf is supported before reading it.
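The guard could look like this (sketch; the exact predicate used in the
patch may differ):

	/* Only read leaf 0x8000001f if it actually exists */
	if (cpuid_eax(0x80000000) >= 0x8000001f &&
	    (cpuid_eax(0x8000001f) & BIT(0)))	/* SME/SEV supported */
		native_wbinvd();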
[ tglx: Adjusted changelog ]
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/20230615193330.322186388@linutronix.de
|
|
Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.
That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.
That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:
    CPU0                                  CPU1

    stop_other_cpus()
      send_IPIs(REBOOT);                  stop_this_cpu()
      while (num_online_cpus() > 1);        set_online(false);
      proceed... -> hang
                                            wbinvd()
WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.
But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.
This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.
Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs; CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().
The cpumask cannot plug all holes either, but it's better than a raw
counter and allows restricting the NMI fallback IPI to be sent only to the
CPUs which have not reported within the timeout window.
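Roughly, the scheme is (sketch, not the exact patch; cpus_stop_mask is the
new mask and timed_out() stands in for the real timeout check):

	/* stop_other_cpus(), before sending the REBOOT IPIs */
	cpumask_copy(&cpus_stop_mask, cpu_online_mask);
	cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

	/* stop_this_cpu(), after the optional wbinvd() */
	cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

	/* stop_other_cpus(), waiting for the others */
	while (!cpumask_empty(&cpus_stop_mask) && !timed_out())
		cpu_relax();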
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
ipsec-2023-06-20
|
|
Provide helpers to set and clear sb->s_readonly_remount including
appropriate memory barriers. Also use this opportunity to document what
the barriers pair with and why they are needed.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Message-Id: <20230620112832.5158-1-jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This microSD card never clears the Flush Cache bit after a cache flush has
been started in sd_flush_cache(). This leads e.g. to failure to mount a
file system. Add a quirk which disables the SD cache for this specific
card from the specific manufacturing date of 11/2019, since on newer cards
dated 05/2023 the cache flush works correctly.
Fixes: 08ebf903af57 ("mmc: core: Fixup support for writeback-cache for eMMC and SD")
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20230620102713.7701-1-marex@denx.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The Kconfig currently defaults the governor to schedutil on x86_64
only when intel-pstate and SMP have been selected.
If the kernel is built only with amd-pstate, the default governor
should also be schedutil.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Tested-by: Perry Yuan <Perry.Yuan@amd.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It seems that the Kingston EMMC04G-M627, despite advertising TRIM support,
does not work when the core tries to use REQ_OP_WRITE_ZEROES.
We are seeing I/O errors in OpenWrt under 6.1 on the Zyxel NBG7815 that we
did not previously have, and tracked it down to REQ_OP_WRITE_ZEROES.
Trying to use fstrim also throws errors like:
[93010.835112] I/O error, dev loop0, sector 16902 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2
Disabling TRIM makes the error go away, so let's add a quirk for this eMMC
to disable TRIM.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230619193621.437358-1-robimarko@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
On STM32MP25, the delay block is inside the SoC and configured through
the SYSCFG registers. The algorithm is also different from the one in the
STM32MP1 chips.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-7-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Create an sdmmc_tuning_ops struct to ease support for another
delay block peripheral.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-6-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
In stm32 sdmmc variant revision v3.0, a block gap hardware flow control
should be used with bus speed modes SDR104 and HS200.
It is enabled by writing a non-zero value to the newly added register
MMCI_STM32_FIFOTHRR.
The threshold will be 2^(N-1) bytes, so we can use the ffs() function to
compute the value N to be written to the register. The threshold used
should be the data block size, but must not be bigger than the FIFO size.
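For example (sketch; the field names are illustrative): for a 512-byte
block, ffs(512) = 10 and 2^(10-1) = 512 bytes.

	unsigned int thr = min(host->data->blksz, host->variant->fifosize);

	writel_relaxed(ffs(thr), host->base + MMCI_STM32_FIFOTHRR);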
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-5-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
This is an update of the SDMMC revision v2.2, with just an increased
FIFO size, from 64B to 1kB.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-4-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The alignment for the IDMA size depends on the peripheral version, so it
should be configurable. Add stm32_idmabsize_align in the variant
structure.
Also remove the now unused (and wrong) MMCI_STM32_IDMABNDT_* macros.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-3-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
For STM32MP25, we'll need to distinguish how the delay block is managed.
This is done through a new compatible dedicated to this SoC, as the
delay block registers are located in the SYSCFG peripheral.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230619115120.64474-2-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux
Merge devfreq updates for v6.5 from Chanwoo Choi:
"1. Reorder fieldls in 'struct devfreq_dev_status' in order to shrink
the size of 'struct devfreqw_dev_status' without any behavior
changes.
2. Add exynos-ppmu.c driver as a soft module dependency in order to
prevent the freeze issue between exynos-bus.c devfreq driver and
exynos-ppmu.c devfreq event driver.
3. Fix variable deferencing before NULL check on mtk-cci-devfreq.c"
* tag 'devfreq-next-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: mtk-cci: Fix variable deferencing before NULL check
PM / devfreq: exynos: add Exynos PPMU as a soft module dependency
PM / devfreq: Reorder fields in 'struct devfreq_dev_status'
|