Age | Commit message (Collapse) | Author |
|
Applying MMC_QUIRK_BROKEN_SD_CACHE is broken, as the card's SD quirks are
referenced in sd_parse_ext_reg_perf() prior to the quirks being initialized
in mmc_blk_probe().
To fix this problem, let's split out an SD-specific list of quirks and
apply in mmc_sd_init_card() instead. In this way, sd_read_ext_regs() to has
the available information for not assigning the SD_EXT_PERF_CACHE as one of
the (un)supported features, which in turn allows mmc_sd_init_card() to
properly skip execution of sd_enable_cache().
Fixes: c467c8f08185 ("mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019")
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Co-developed-by: Keita Aihara <keita.aihara@sony.com>
Signed-off-by: Keita Aihara <keita.aihara@sony.com>
Reviewed-by: Dragan Simic <dsimic@manjaro.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240820230631.GA436523@sony.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Subtract network offset to skb->len before performing IPv4 header sanity
checks, then adjust transport offset from offset from mac header.
Jorge Ortiz says:
When small UDP packets (< 4 bytes payload) are sent from eth0,
`meta l4proto udp` condition is not met because `NFT_PKTINFO_L4PROTO` is
not set. This happens because there is a comparison that checks if the
transport header offset exceeds the total length. This comparison does
not take into account the fact that the skb network offset might be
non-zero in egress mode (e.g., 14 bytes for Ethernet header).
Fixes: 0ae8e4cca787 ("netfilter: nf_tables: set transport offset from mac header for netdev/egress")
Reported-by: Jorge Ortiz <jorge.ortiz.escribano@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Move the initialization of parent->mutex into
hda_component_manager_init() so that it is always valid.
In hda_component_manager_bind() do not clear the parent information.
Only zero-fill the per-component data ready for it to be filled in
by the components as they bind.
Previously parent->mutex was being initialized only in
hda_component_manager_bind(). This meant that it was only
initialized if all components appeared and there was a bind callback.
If there wasn't a bind the mutex object was not valid when the
Realtek driver called any of the other functions.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 047b9cbbaa8e ("ALSA: hda: hda_component: Protect shared data with a mutex")
Link: https://patch.msgid.link/20240826094940.45563-1-rf@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Commit 5fbf57a937f4 ("net: netlink: remove the cb_mutex "injection" from
netlink core") has removed the usage of the 'dump_cb_mutex' field from the
struct netlink_sock.
Remove the field itself now. It saves a few bytes in the structure.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the HP Pavilion Aero 13 (13-bg0xxx) (year 2024) to list of
quirks for keyboard LED mute indication.
The laptop has two LEDs (one for speaker and one for mic mute). The
pre-existing quirk ALC245_FIXUP_HP_X360_MUTE_LEDS chains both the quirk for
mic and speaker mute.
Tested on 6.11.0-rc4 with the aforementioned laptop.
Signed-off-by: Hendrik Borghorst <hendrikborghorst@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20240825174351.5687-1-hendrikborghorst@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Like the getrandom() syscall, vDSO getrandom() must also reject unknown
flags. [1]
It would be possible to return -EINVAL from vDSO itself, but in the
possible case that a new flag is added to getrandom() syscall in the
future, it would be easier to get the behavior from the syscall, instead
of erroring until the vDSO is extended to support the new flag or
explicitly falling back.
[1] Designing the API: Planning for Extension
https://docs.kernel.org/process/adding-syscalls.html#designing-the-api-planning-for-extension
Signed-off-by: Yann Droneaud <yann@droneaud.fr>
[Jason: reworded commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Add Jean-Philippe as a reviewer for the Arm SMMUv3 SVA support, since
he's been a consistent contributor to that code over the years and
understands the relevant parts of the architecture much better than me.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240823165454.1064-1-will@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
These three implementations of map_pages() all succeed if a mapping is
requested with no read or write. Since they return back to __iommu_map()
leaving the mapped output as 0 it triggers an infinite loop. Therefore
nothing is using no-access protection bits.
Further, VFIO and iommufd rely on iommu_iova_to_phys() to get back PFNs
stored by map, if iommu_map() succeeds but iommu_iova_to_phys() fails that
will create serious bugs.
Thus remove this never used "nothing to do" concept and just fail map
immediately.
Fixes: e5fc9753b1a8 ("iommu/io-pgtable: Add ARMv7 short descriptor support")
Fixes: e1d3c0fd701d ("iommu: add ARM LPAE page table allocator")
Fixes: 745ef1092bcf ("iommu/io-pgtable: Move Apple DART support to its own file")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/2-v1-1211e1294c27+4b1-iommu_no_prot_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This results in passing 0 or just IOMMU_CACHE to iommu_map(). Most of
the page table formats don't like this:
amdv1 - -EINVAL
armv7s - returns 0, doesn't update mapped
arm-lpae - returns 0 doesn't update mapped
dart - returns 0, doesn't update mapped
VT-D - returns -EINVAL
Unfortunately the three formats that return 0 cause serious problems:
- Returning ret = but not uppdating mapped from domain->map_pages()
causes an infinite loop in __iommu_map()
- Not writing ioptes means that VFIO/iommufd have no way to recover them
and we will have memory leaks and worse during unmap
Since almost nothing can support this, and it is a useless thing to do,
block it early in iommufd.
Cc: stable@kernel.org
Fixes: aad37e71d5c4 ("iommufd: IOCTLs for the io_pagetable")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/1-v1-1211e1294c27+4b1-iommu_no_prot_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The helper intel_context_flush_present() is designed to flush all related
caches when a context entry with the present bit set is modified. It
currently retrieves the domain ID from the context entry and uses it to
flush the IOTLB and context caches. This is incorrect when the context
entry transitions from present to non-present, as the domain ID field is
cleared before calling the helper.
Fix it by passing the domain ID programmed in the context entry before the
change to intel_context_flush_present(). This ensures that the correct
domain ID is used for cache invalidation.
Fixes: f90584f4beb8 ("iommu/vt-d: Add helper to flush caches for context change")
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Closes: https://lore.kernel.org/linux-iommu/20240814162726.5efe1a6e.alex.williamson@redhat.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240815124857.70038-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
I notice a rmap query bug in xfs_io fsmap:
[root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt
EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL
0: 253:16 [0..7]: static fs metadata 0 (0..7) 8
1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16
2: 253:16 [24..39]: inode btree 0 (24..39) 16
3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8
4: 253:16 [48..55]: refcount btree 0 (48..55) 8
5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48
6: 253:16 [104..127]: free space 0 (104..127) 24
......
Bug:
[root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt
[root@fedora ~]#
Normally, we should be able to get one record, but we got nothing.
The root cause of this problem lies in the incorrect setting of rm_owner in
the rmap query. In the case of the initial query where the owner is not
set, __xfs_getfsmap_datadev() first sets info->high.rm_owner to ULLONG_MAX.
This is done to prevent any omissions when comparing rmap items. However,
if the current ag is detected to be the last one, the function sets info's
high_irec based on the provided key. If high->rm_owner is not specified, it
should continue to be set to ULLONG_MAX; otherwise, there will be issues
with interval omissions. For example, consider "start" and "end" within the
same block. If high->rm_owner == 0, it will be smaller than the founded
record in rmapbt, resulting in a query with no records. The main call stack
is as follows:
xfs_ioc_getfsmap
xfs_getfsmap
xfs_getfsmap_datadev_rmapbt
__xfs_getfsmap_datadev
info->high.rm_owner = ULLONG_MAX
if (pag->pag_agno == end_ag)
xfs_fsmap_owner_to_rmap
// set info->high.rm_owner = 0 because fmr_owner == -1ULL
dest->rm_owner = 0
// get nothing
xfs_getfsmap_datadev_rmapbt_query
The problem can be resolved by simply modify the xfs_fsmap_owner_to_rmap
function internal logic to achieve.
After applying this patch, the above problem have been solved:
[root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt
EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL
0: 253:16 [0..7]: static fs metadata 0 (0..7) 8
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Don't bother reporting the number of bytes that we "trimmed" because the
underlying storage isn't required to do anything(!) and failed discard
IOs aren't reported to the caller anyway. It's not like userspace can
use the reported value for anything useful like adjusting the offset
parameter of the next call, and it's not like anyone ever wrote a
manpage about FITRIM's out parameters.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
As a result of the factoring in commit 14dd46cf31f4 ("xfs: split
xfs_inobt_init_cursor"), mount started taking a long time on a
user's filesystem. For Anders, this made mount times regress from
under a second to over 15 minutes for a filesystem with only 30
million inodes in it.
Anders bisected it down to the above commit, but even then the bug
was not obvious. In this commit, over 20 calls to
xfs_inobt_init_cursor() were modified, and some we modified to call
a new function named xfs_finobt_init_cursor().
If that takes you a moment to reread those function names to see
what the rename was, then you have realised why this bug wasn't
spotted during review. And it wasn't spotted on inspection even
after the bisect pointed at this commit - a single missing "f" isn't
the easiest thing for a human eye to notice....
The result is that xfs_finobt_count_blocks() now incorrectly calls
xfs_inobt_init_cursor() so it is now walking the inobt instead of
the finobt. Hence when there are lots of allocated inodes in a
filesystem, mount takes a -long- time run because it now walks a
massive allocated inode btrees instead of the small, nearly empty
free inode btrees. It also means all the finobt space reservations
are wrong, so mount could potentially given ENOSPC on kernel
upgrade.
In hindsight, commit 14dd46cf31f4 should have been two commits - the
first to convert the finobt callers to the new API, the second to
modify the xfs_inobt_init_cursor() API for the inobt callers. That
would have made the bug very obvious during review.
Fixes: 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor")
Reported-by: Anders Blomdell <anders.blomdell@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
willy pointed out that folio_mark_dirty is the correct function to use
to mark an xfile folio dirty because it calls out to the mapping's aops
to mark it dirty. For tmpfs this likely doesn't matter much since it
currently uses nop_dirty_folio, but let's use the abstractions properly.
Reported-by: willy@infradead.org
Fixes: 6907e3c00a40 ("xfs: add file_{get,put}_folio")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
"KjellR" complained on IRC that an old V4 filesystem suddenly stopped
mounting after upgrading from 6.9.11 to 6.10.3, with the following splat
when trying to read the rt bitmap inode:
00000000: 49 4e 80 00 01 02 00 01 00 00 00 00 00 00 00 00 IN..............
00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020: 00 00 00 00 00 00 00 00 43 d2 a9 da 21 0f d6 30 ........C...!..0
00000030: 43 d2 a9 da 21 0f d6 30 00 00 00 00 00 00 00 00 C...!..0........
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000050: 00 00 00 02 00 00 00 00 00 00 00 04 00 00 00 00 ................
00000060: ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
As Dave Chinner points out, this is a V1 inode with both di_onlink and
di_nlink set to 1 and di_flushiter == 0. In other words, this inode was
formatted this way by mkfs and hasn't been touched since then.
Back in the old days of xfsprogs 3.2.3, I observed that libxfs_ialloc
would set di_nlink, but if the filesystem didn't have NLINK, it would
then set di_version = 1. libxfs_iflush_int later sees the V1 inode and
copies the value of di_nlink to di_onlink without zeroing di_onlink.
Eventually this filesystem must have been upgraded to support NLINK
because 6.10 doesn't support !NLINK filesystems, which is how we tripped
over this old behavior. The filesystem doesn't have a realtime section,
so that's why the rtbitmap inode has never been touched.
Fix this by removing the di_onlink/di_nlink checking for all V1/V2
inodes because this is a muddy mess. The V3 inode handling code has
always supported NLINK and written di_onlink==0 so keep that check.
The removal of the V1 inode handling code when we dropped support for
!NLINK obscured this old behavior.
Reported-by: kjell.m.randa@gmail.com
Fixes: 40cb8613d612 ("xfs: check unused nlink fields in the ondisk inode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
|
Coherent surfaces make only sense if the host renders to them using
accelerated apis. Without 3d the entire content of dumb buffers stays
in the guest making all of the extra work they're doing to synchronize
between guest and host useless.
Configurations without 3d also tend to run with very low graphics
memory limits. The pinned console fb, mob cursors and graphical login
manager tend to run out of 16MB graphics memory that those guests use.
Fix it by making sure the coherent dumb buffers are only used on
configs with 3d enabled.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers")
Reported-by: Christian Heusel <christian@heusel.eu>
Closes: https://lore.kernel.org/all/0d0330f3-2ac0-4cd5-8075-7f1cbaf72a8e@heusel.eu
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.9+
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-4-zack.rusin@broadcom.com
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
|
|
Make sure that for external buffers mapping goes through the dma_buf
interface instead of trying to access pages directly.
External buffers might not provide direct access to readable/writable
pages so to make sure the bo's created from external dma_bufs can be
read dma_buf interface has to be used.
Fixes crashes in IGT's kms_prime with vgem. Regular desktop usage won't
trigger this due to the fact that virtual machines will not have
multiple GPUs but it enables better test coverage in IGT.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export")
Cc: <stable@vger.kernel.org> # v6.6+
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.9+
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-3-zack.rusin@broadcom.com
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
|
|
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-2-zack.rusin@broadcom.com
Reviewed-by: Martin Krastev <martin.krastev@broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com>
|
|
policy_unpack_test fails on big endian systems because data byte order
is expected to be little endian but is generated in host byte order.
This results in test failures such as:
# policy_unpack_test_unpack_array_with_null_name: EXPECTATION FAILED at security/apparmor/policy_unpack_test.c:150
Expected array_size == (u16)16, but
array_size == 4096 (0x1000)
(u16)16 == 16 (0x10)
# policy_unpack_test_unpack_array_with_null_name: pass:0 fail:1 skip:0 total:1
not ok 3 policy_unpack_test_unpack_array_with_null_name
# policy_unpack_test_unpack_array_with_name: EXPECTATION FAILED at security/apparmor/policy_unpack_test.c:164
Expected array_size == (u16)16, but
array_size == 4096 (0x1000)
(u16)16 == 16 (0x10)
# policy_unpack_test_unpack_array_with_name: pass:0 fail:1 skip:0 total:1
Add the missing endianness conversions when generating test data.
Fixes: 4d944bcd4e73 ("apparmor: add AppArmor KUnit tests for policy unpack")
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
This reverts commit 7190401fc56fb5f02ee3d04476778ab000bbaf32.
Verifying the clock source sometimes deems the MIPS clock
to be unstable, at least in qemu.
clocksource: timekeeping watchdog on CPU0: Marking clocksource 'MIPS' as unstable because the skew is too large:
clocksource: 'jiffies' wd_nsec: 500000000 wd_now: ffff8bde wd_last: ffff8bac mask: ffffffff
clocksource: 'MIPS' cs_nsec: 940634468 cs_now: 310181c4 cs_last: 28090a09 mask: ffffffff
clocksource: Clocksource 'MIPS' skewed 440634468 ns (440 ms) over watchdog 'jiffies' interval of 500000000 ns (500 ms)
clocksource: 'MIPS' is current clocksource.
If this happens, network interfaces fail to come online.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the
check for non-zero of memblock.reserved.cnt in mmu_init() would always
be true either because memblock.reserved.cnt is initialized to 1 or
because there were memory reservations earlier.
The removal of dummy empty entry in memblock caused this check to fail
because now memblock.reserved.cnt is initialized to 0.
Remove the check for non-zero of memblock.reserved.cnt because it's
perfectly fine to have an empty memblock.reserved array that early in
boot.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
X570-E GAMING does not have VRM temperature sensor.
Signed-off-by: Ross Brown <true.robot.ross@gmail.com>
Signed-off-by: Eugene Shalygin <eugene.shalygin@gmail.com>
Link: https://lore.kernel.org/r/20240730062320.5188-2-eugene.shalygin@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
We have transient failures with btrfs/301, specifically in the part
where we do
for i in $(seq 0 10); do
write 50m to file
rm -f file
done
Sometimes this will result in a transient quota error, and it's because
sometimes we start writeback on the file which results in a delayed
iput, and thus the rm doesn't actually clean the file up. When we're
flushing the quota space we need to run the delayed iputs to make sure
all the unlinks that we think have completed have actually completed.
This removes the small window where we could fail to find enough space
in our quota.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Running the ltp test cve-2015-3290 concurrently reports the following
warnings.
perfevents: irq loop stuck!
WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174
intel_pmu_handle_irq+0x285/0x370
Call Trace:
<NMI>
? __warn+0xa4/0x220
? intel_pmu_handle_irq+0x285/0x370
? __report_bug+0x123/0x130
? intel_pmu_handle_irq+0x285/0x370
? __report_bug+0x123/0x130
? intel_pmu_handle_irq+0x285/0x370
? report_bug+0x3e/0xa0
? handle_bug+0x3c/0x70
? exc_invalid_op+0x18/0x50
? asm_exc_invalid_op+0x1a/0x20
? irq_work_claim+0x1e/0x40
? intel_pmu_handle_irq+0x285/0x370
perf_event_nmi_handler+0x3d/0x60
nmi_handle+0x104/0x330
Thanks to Thomas Gleixner's analysis, the issue is caused by the low
initial period (1) of the frequency estimation algorithm, which triggers
the defects of the HW, specifically erratum HSW11 and HSW143. (For the
details, please refer https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)
The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL
event, but the initial period in the freq mode is 1. The erratum is the
same as the BDM11, which has been supported in the kernel. A minimum
period of 128 is enforced as well on HSW.
HSW143 is regarding that the fixed counter 1 may overcount 32 with the
Hyper-Threading is enabled. However, based on the test, the hardware
has more issues than it tells. Besides the fixed counter 1, the message
'interrupt took too long' can be observed on any counter which was armed
with a period < 32 and two events expired in the same NMI. A minimum
period of 32 is enforced for the rest of the events.
The recommended workaround code of the HSW143 is not implemented.
Because it only addresses the issue for the fixed counter. It brings
extra overhead through extra MSR writing. No related overcounting issue
has been reported so far.
Fixes: 3a632cb229bf ("perf/x86/intel: Add simple Haswell PMU support")
Reported-by: Li Huafei <lihuafei1@huawei.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com
Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
|
|
nr_iovs is capped at 1024, and mode only has a few low values. We can
safely make them u16, in preparation for adding a few more members.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Expose min_wait_usec in io_uring_getevents_arg, replacing the pad member
that is currently in there. The value is in usecs, which is explained in
the name as well.
Note that if min_wait_usec and a normal timeout is used in conjunction,
the normal timeout is still relative to the base time. For example, if
min_wait_usec is set to 100 and the normal timeout is 1000, the max
total time waited is still 1000. This also means that if the normal
timeout is shorter than min_wait_usec, then only the min_wait_usec will
take effect.
See previous commit for an explanation of how this works.
IORING_FEAT_MIN_TIMEOUT is added as a feature flag for this, as
applications doing submit_and_wait_timeout() style operations will
generally not see the -EINVAL from the wait side as they return the
number of IOs submitted. Only if no IOs are submitted will the -EINVAL
bubble back up to the application.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Waiting for events with io_uring has two knobs that can be set:
1) The number of events to wake for
2) The timeout associated with the event
Waiting will abort when either of those conditions are met, as expected.
This adds support for a third event, which is associated with the number
of events to wait for. Applications generally like to handle batches of
completions, and right now they'd set a number of events to wait for and
the timeout for that. If no events have been received but the timeout
triggers, control is returned to the application and it can wait again.
However, if the application doesn't have anything to do until events are
reaped, then it's possible to make this waiting more efficient.
For example, the application may have a latency time of 50 usecs and
wanting to handle a batch of 8 requests at the time. If it uses 50 usecs
as the timeout, then it'll be doing 20K context switches per second even
if nothing is happening.
This introduces the notion of min batch wait time. If the min batch wait
time expires, then we'll return to userspace if we have any events at all.
If none are available, the general wait time is applied. Any request
arriving after the min batch wait time will cause waiting to stop and
return control to the application.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation for having two distinct timeouts and avoid waking the
task if we don't need to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In preparation for expanding how we handle waits, move the actual
schedule and schedule_timeout() handling into a helper.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Rather than need to pass in 2 or 3 separate arguments, add a struct
to encapsulate the timeout and sigset_t parts of waiting. In preparation
for adding another argument for waiting.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new registration opcode IORING_REGISTER_CLOCK, which allows the
user to select which clock id it wants to use with CQ waiting timeouts.
It only allows a subset of all posix clocks and currently supports
CLOCK_MONOTONIC and CLOCK_BOOTTIME.
Suggested-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/98f2bc8a3c36cdf8f0e6a275245e81e903459703.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In addition to current relative timeouts for the waiting loop, where the
timespec argument specifies the maximum time it can wait for, add
support for the absolute mode, with the value carrying a CLOCK_MONOTONIC
absolute time until which we should return control back to the user.
Suggested-by: Lewis Baker <lewissbaker@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4d5b74d67ada882590b2e42aa3aa7117bbf6b55f.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove io_napi_adjust_timeout() and move the adjustments out of the
common path into __io_napi_busy_loop(). Now the limit it's calculated
based on struct io_wait_queue::timeout, for which we query current time
another time. The overhead shouldn't be a problem, it's a polling path,
however that can be optimised later by additionally saving the delta
time value in io_cqring_wait().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/88e14686e245b3b42ff90a3c4d70895d48676206.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
we don't need to set ->napi_prefer_busy_poll if we're not going to poll,
do the checks first and all polling preparation after.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2ad7ede8cc7905328fc62e8c3805fdb11635ae0b.1723039801.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We could just move these two and save some space, but in preparation
for adding another flag, turn them into flags first.
This saves 8 bytes in struct io_buffer_list, making it exactly half
a cacheline on 64-bit archs now rather than 40 bytes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just like what is being done on the recv side, if we only map a single
segment, then use ITER_UBUF for mapping it. That's more efficient than
using an ITER_IOVEC.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
req->buf_list is assigned higher up and is safe to use as we remain
within a locked region, as is the 'bl' variable itself from which it
was assigned. To improve readability, use 'bl' directly rather than
get it from the io_kiocb, if we need to increment the head directly
in the buffer selection path. This makes it readily apparent that
it's the same io_buffer_list being used.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
reverse the order of the element evaluation in an if statement.
for many users that are not using iopoll, the iopoll_list will always
evaluate to false after having made a memory access whereas to_submit is
very likely already loaded in a register.
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/052ca60b5c49e7439e4b8bd33bfab4a09d36d3d6.1722374371.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support for checking and coalescing multi-hugepage-backed fixed
buffers. The coalescing optimizes both time and space consumption caused
by mapping and storing multi-hugepage fixed buffers.
A coalescable multi-hugepage buffer should fully cover its folios
(except potentially the first and last one), and these folios should
have the same size. These requirements are for easier processing later,
also we need same size'd chunks in io_import_fixed for fast iov_iter
adjust.
Signed-off-by: Chenliang Li <cliang01.li@samsung.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20240731090133.4106-3-cliang01.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Store the folio shift and folio mask into imu struct and use it in
iov_iter adjust, as we will have non PAGE_SIZE'd chunks if a
multi-hugepage buffer get coalesced.
Signed-off-by: Chenliang Li <cliang01.li@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20240731090133.4106-2-cliang01.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This info may be useful when attempting to debug a problem involving a
ring using the NAPI feature.
Here is an example of the output:
ip-172-31-39-89 /proc/772/fdinfo # cat 14
pos: 0
flags: 02000002
mnt_id: 16
ino: 10243
SqMask: 0xff
SqHead: 633
SqTail: 633
CachedSqHead: 633
CqMask: 0x3fff
CqHead: 430250
CqTail: 430250
CachedCqTail: 430250
SQEs: 0
CQEs: 0
SqThread: 885
SqThreadCpu: 0
SqTotalTime: 52793826
SqWorkTime: 3590465
UserFiles: 0
UserBufs: 0
PollList:
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=6, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=6, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
op=10, task_works=0
CqOverflowList:
NAPI: enabled
napi_busy_poll_to: 1
napi_prefer_busy_poll: true
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bb184f8b62703ddd3e6e19eae7ab6c67b97e1e10.1722293317.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
144d:c1cc requires the same workaround to enable the speaker amp
as other Samsung models with the ALC298 codec.
Signed-off-by: YOUNGJIN JOO <neoelec@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20240825092515.28728-1-neoelec@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
dma-direct.h is introduced in commit d4b6f1562a3c3284 ("LoongArch: Add
Non-Uniform Memory Access (NUMA) support"). In commit c78c43fe7d42524c
("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"),
ARCH_HAS_PHYS_TO_DMA was deselected and the coresponding phys_to_dma()/
dma_to_phys() functions were removed. However, the unused dma-direct.h
was left behind, which is removed by this patch.
Cc: <stable@vger.kernel.org>
Fixes: c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA")
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The cifs filesystem doesn't quite emulate FALLOC_FL_PUNCH_HOLE correctly
(note that due to lack of protocol support, it can't actually implement it
directly). Whilst it will (partially) invalidate dirty folios in the
pagecache, it doesn't write them back first, and so the EOF marker on the
server may be lower than inode->i_size.
This presents a problem, however, as if the punched hole invalidates the
tail of the locally cached dirty data, writeback won't know it needs to
move the EOF over to account for the hole punch (which isn't supposed to
move the EOF). We could just write zeroes over the punched out region of
the pagecache and write that back - but this is supposed to be a
deallocatory operation.
Fix this by manually moving the EOF over on the server after the operation
if the hole punched would corrupt it.
Note that the FSCTL_SET_ZERO_DATA RPC and the setting of the EOF should
probably be compounded to stop a third party interfering (or, at least,
massively reduce the chance).
This was reproducible occasionally by using fsx with the following script:
truncate 0x0 0x375e2 0x0
punch_hole 0x2f6d3 0x6ab5 0x375e2
truncate 0x0 0x3a71f 0x375e2
mapread 0xee05 0xcf12 0x3a71f
write 0x2078e 0x5604 0x3a71f
write 0x3ebdf 0x1421 0x3a71f *
punch_hole 0x379d0 0x8630 0x40000 *
mapread 0x2aaa2 0x85b 0x40000
fallocate 0x1b401 0x9ada 0x40000
read 0x15f2 0x7d32 0x40000
read 0x32f37 0x7a3b 0x40000 *
The second "write" should extend the EOF to 0x40000, and the "punch_hole"
should operate inside of that - but that depends on whether the VM gets in
and writes back the data first. If it doesn't, the file ends up 0x3a71f in
size, not 0x40000.
Fixes: 31742c5a3317 ("enable fallocate punch hole ("fallocate -p") for SMB3")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
rqst.rq_iter needs to be truncated otherwise we'll
also send the bytes into the stream socket...
This is the logic behind rqst.rq_npages = 0, which was removed in
"cifs: Change the I/O paths to use an iterator rather than a page list"
(d08089f649a0cfb2099c8551ac47eef0cc23fdf2).
Cc: stable@vger.kernel.org
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Reviewed-by: David Howells <dhowells@redhat.com>
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This happens when called from SMB2_read() while using rdma
and reaching the rdma_readwrite_threshold.
Cc: stable@vger.kernel.org
Fixes: a6559cc1d35d ("cifs: split out smb3_use_rdma_offload() helper")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
|
|
Pull bcachefs fixes from Kent Overstreet:
- assorted syzbot fixes
- some upgrade fixes for old (pre 1.0) filesystems
- fix for moving data off a device that was switched to durability=0
after data had been written to it.
- nocow deadlock fix
- fix for new rebalance_work accounting
* tag 'bcachefs-2024-08-24' of git://evilpiepirate.org/bcachefs: (28 commits)
bcachefs: Fix rebalance_work accounting
bcachefs: Fix failure to flush moves before sleeping in copygc
bcachefs: don't use rht_bucket() in btree_key_cache_scan()
bcachefs: add missing inode_walker_exit()
bcachefs: clear path->should_be_locked in bch2_btree_key_cache_drop()
bcachefs: Fix double assignment in check_dirent_to_subvol()
bcachefs: Fix refcounting in discard path
bcachefs: Fix compat issue with old alloc_v4 keys
bcachefs: Fix warning in bch2_fs_journal_stop()
fs/super.c: improve get_tree() error message
bcachefs: Fix missing validation in bch2_sb_journal_v2_validate()
bcachefs: Fix replay_now_at() assert
bcachefs: Fix locking in bch2_ioc_setlabel()
bcachefs: fix failure to relock in btree_node_fill()
bcachefs: fix failure to relock in bch2_btree_node_mem_alloc()
bcachefs: unlock_long() before resort in journal replay
bcachefs: fix missing bch2_err_str()
bcachefs: fix time_stats_to_text()
bcachefs: Fix bch2_bucket_gens_init()
bcachefs: Fix bch2_trigger_alloc assert
...
|
|
Pull smb server fixes from Steve French:
- query directory flex array fix
- fix potential null ptr reference in open
- fix error message in some open cases
- two minor cleanups
* tag '6.11-rc5-server-fixes' of git://git.samba.org/ksmbd:
smb/server: update misguided comment of smb2_allocate_rsp_buf()
smb/server: remove useless assignment of 'file_present' in smb2_open()
smb/server: fix potential null-ptr-deref of lease_ctx_info in smb2_open()
smb/server: fix return value of smb2_open()
ksmbd: the buffer of smb2 query dir response has at least 1 byte
|