Age | Commit message (Collapse) | Author |
|
Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use
memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can
drop the gfp_mask parameter from blkdev_zone_mgmt() as well as
blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated().
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix performance regression introduced by moving the security
permission hook out of do_clone_file_range() and into its caller
vfs_clone_file_range().
This causes the security hook to be called in situation were it
wasn't called before as the fast permission checks were left in
do_clone_file_range().
Fix this by merging the two implementations back together and
restoring the old ordering: fast permission checks first, expensive
ones later.
- Tweak mount_setattr() permission checking so that mount properties on
the real rootfs can be changed.
When we added mount_setattr() we added additional checks compared to
legacy mount(2). If the mount had a parent then verify that the
caller and the mount namespace the mount is attached to match and if
not make sure that it's an anonymous mount.
But the real rootfs falls into neither category. It is neither an
anoymous mount because it is obviously attached to the initial mount
namespace but it also obviously doesn't have a parent mount. So that
means legacy mount(2) allows changing mount properties on the real
rootfs but mount_setattr(2) blocks this. This causes regressions (See
the commit for details).
Fix this by relaxing the check. If the mount has a parent or if it
isn't a detached mount, verify that the mount namespaces of the
caller and the mount are the same. Technically, we could probably
write this even simpler and check that the mount namespaces match if
it isn't a detached mount. But the slightly longer check makes it
clearer what conditions one needs to think about.
* tag 'vfs-6.8-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: relax mount_setattr() permission checks
remap_range: merge do_clone_file_range() into vfs_clone_file_range()
|
|
...and eliminate the stub version when CONFIG_FILE_LOCKING is disabled.
This silences the following warning that crept in recently:
fs/ceph/locks.c: In function 'ceph_count_locks':
fs/ceph/locks.c:380:27: error: unused variable 'lock' [-Werror=unused-variable]
380 | struct file_lock *lock;
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402062210.3YyBVGF1-lkp@intel.com/
Fixes: 75cabec0111b ("filelock: add some new helper functions")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240212-flsplit3-v1-1-019f0ad6bf69@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a new ioctl for getting the sysfs name of a filesystem - the path
under /sys/fs.
This is going to let us standardize exporting data from sysfs across
filesystems, e.g. time stats.
The returned path will always be of the form "$FSTYP/$SYSFS_IDENTIFIER",
where the sysfs identifier may be a UUID (for bcachefs) or a device name
(xfs).
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-6-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that mode is in struct coresight_device, sets can be wrapped. This
also allows us to add a sanity check that there have been no concurrent
modifications of mode. Currently all usages of local_set() were inside
the device's spin locks so this new warning shouldn't be triggered.
coresight_take_mode() could maybe have been used in place of adding
the warning, but there may be use cases which set the mode to the same
mode which are valid but would fail in coresight_take_mode() because
it requires the device to only be in the disabled state.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-13-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
Now that mode is in struct coresight_device accesses can be wrapped.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-12-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
Now that mode is in struct coresight_device, this pattern can be wrapped
in a helper.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-11-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
These are a bit annoying to keep up to date when the function signatures
change. But if CONFIG_CORESIGHT isn't enabled, then they're not used
anyway so just delete them.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-9-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
Refcnt is only ever accessed from either inside the coresight_mutex, or
the device's spinlock, making the atomic type and atomic_dec_return()
calls confusing and unnecessary. The only point of synchronisation
outside of these two types of locks is already done with a compare and
swap on 'mode', which a comment has been added for.
There was one instance of refcnt being used outside of a lock in TPIU,
but that can easily be fixed by making it the same as all the other
devices and adding a spinlock. Potentially in the future all the
refcounting and locking can be moved up into the core code, and all the
mostly duplicate code from the individual devices can be removed.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-8-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
At the moment the core file contains both sysfs functionality and
core functionality, while the Perf mode is in a separate file in
coresight-etm-perf.c
Many of the functions have ambiguous names like
coresight_enable_source() which actually only work in relation to the
sysfs mode. To avoid further confusion, move everything that isn't core
functionality into the sysfs file and append _sysfs to the ambiguous
functions.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-7-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
'enable', which probably should have been 'enabled', is only ever read
in the core code in relation to controlling sources, and specifically
only sources in sysfs mode. Confusingly it's not labelled as such and
relying on it can be a source of bugs like the one fixed by
commit 078dbba3f0c9 ("coresight: Fix crash when Perf and sysfs modes are
used concurrently").
Most importantly, it can only be used when the coresight_mutex is held
which is only done when enabling and disabling paths in sysfs mode, and
not Perf mode. So to prevent its usage spreading and leaking out to
other devices, remove it.
It's use is equivalent to checking if the mode is currently sysfs, as
due to the coresight_mutex lock, mode == CS_MODE_SYSFS can only become
true or untrue when that lock is held, and when mode == CS_MODE_SYSFS
the device is both enabled and in sysfs mode.
The one place it was used outside of the core code is in TPDA, but that
pattern is more appropriately represented using refcounts inside the
device's own spinlock.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-6-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
Most devices use mode, so move the mode definition out of the individual
devices and up to the Coresight device. This will allow the core code to
also know the mode which will be useful in a later commit.
This also fixes the inconsistency of the documentation of the mode field
on the individual device types. For example ETB10 had "this ETB is being
used".
Two devices didn't require an atomic mode type, so these usages have
been converted to atomic_get() and atomic_set() only to make it compile,
but the documentation of the field in struct coresight_device explains
this type of usage.
In the future, manipulation of the mode could be completely moved out of
the individual devices and into the core code because it's almost all
duplicate code, and this change is a step towards that.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-5-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
Activated has the specific meaning of a sink that's selected for use by
the user via sysfs. But comments in some code that's shared by Perf use
the same word, so in those cases change them to just say "selected"
instead. With selected implying either via Perf or "activated" via
sysfs.
coresight_get_enabled_sink() doesn't actually get an enabled sink, it
only gets an activated one, so change that too.
And change the activated variable name to include "sysfs" so it can't
be confused as a general status.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20240129154050.569566-3-james.clark@arm.com
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
|
|
Now that the driver core can properly handle constant struct bus_type,
move the coresight_bustype variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: James Clark <james.clark@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/2024010531-tinfoil-avert-4a57@gregkh
|
|
dev->lstats is notably used from loopback ndo_start_xmit()
and other virtual drivers.
Per cpu stats updates are dirtying per-cpu data,
but the pointer itself is read-only.
Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tp->tcp_usec_ts is a read mostly field, used in rx and tx fast paths.
Fixes: d5fed5addb2b ("tcp: reorganize tcp_sock fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tp->scaling_ratio is a read mostly field, used in rx and tx fast paths.
Fixes: d5fed5addb2b ("tcp: reorganize tcp_sock fast path variables")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Wei Wang <weiwan@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The match function used in gpio_device_find() should not modify the
contents of passed opaque pointer, because such modification would not
be necessary for actual matching and it could lead to quite unreadable,
spaghetti code.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[Bartosz: fix coding style in header]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Linux 6.8-rc4
Pulling this for a bugfix upstream with which the gpio/for-next branch
conflicts.
|
|
In various performance profiles of kernels with BPF programs attached,
bpf_local_storage_lookup() appears as a significant portion of CPU
cycles spent. To enable the compiler generate more optimal code, turn
bpf_local_storage_lookup() into a static inline function, where only the
cache insertion code path is outlined
Notably, outlining cache insertion helps avoid bloating callers by
duplicating setting up calls to raw_spin_{lock,unlock}_irqsave() (on
architectures which do not inline spin_lock/unlock, such as x86), which
would cause the compiler produce worse code by deciding to outline
otherwise inlinable functions. The call overhead is neutral, because we
make 2 calls either way: either calling raw_spin_lock_irqsave() and
raw_spin_unlock_irqsave(); or call __bpf_local_storage_insert_cache(),
which calls raw_spin_lock_irqsave(), followed by a tail-call to
raw_spin_unlock_irqsave() where the compiler can perform TCO and (in
optimized uninstrumented builds) turns it into a plain jump. The call to
__bpf_local_storage_insert_cache() can be elided entirely if
cacheit_lockit is a false constant expression.
Based on results from './benchs/run_bench_local_storage.sh' (21 trials,
reboot between each trial; x86 defconfig + BPF, clang 16) this produces
improvements in throughput and latency in the majority of cases, with an
average (geomean) improvement of 8%:
+---- Hashmap Control --------------------
|
| + num keys: 10
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 14.789 M ops/s | 14.745 M ops/s ( ~ )
| +- hits latency | 67.679 ns/op | 67.879 ns/op ( ~ )
| +- important_hits throughput | 14.789 M ops/s | 14.745 M ops/s ( ~ )
|
| + num keys: 1000
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 12.233 M ops/s | 12.170 M ops/s ( ~ )
| +- hits latency | 81.754 ns/op | 82.185 ns/op ( ~ )
| +- important_hits throughput | 12.233 M ops/s | 12.170 M ops/s ( ~ )
|
| + num keys: 10000
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 7.220 M ops/s | 7.204 M ops/s ( ~ )
| +- hits latency | 138.522 ns/op | 138.842 ns/op ( ~ )
| +- important_hits throughput | 7.220 M ops/s | 7.204 M ops/s ( ~ )
|
| + num keys: 100000
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 5.061 M ops/s | 5.165 M ops/s (+2.1%)
| +- hits latency | 198.483 ns/op | 194.270 ns/op (-2.1%)
| +- important_hits throughput | 5.061 M ops/s | 5.165 M ops/s (+2.1%)
|
| + num keys: 4194304
| : <before> | <after>
| +-+ hashmap (control) sequential get +----------------------+----------------------
| +- hits throughput | 2.864 M ops/s | 2.882 M ops/s ( ~ )
| +- hits latency | 365.220 ns/op | 361.418 ns/op (-1.0%)
| +- important_hits throughput | 2.864 M ops/s | 2.882 M ops/s ( ~ )
|
+---- Local Storage ----------------------
|
| + num_maps: 1
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 33.005 M ops/s | 39.068 M ops/s (+18.4%)
| +- hits latency | 30.300 ns/op | 25.598 ns/op (-15.5%)
| +- important_hits throughput | 33.005 M ops/s | 39.068 M ops/s (+18.4%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 37.151 M ops/s | 44.926 M ops/s (+20.9%)
| +- hits latency | 26.919 ns/op | 22.259 ns/op (-17.3%)
| +- important_hits throughput | 37.151 M ops/s | 44.926 M ops/s (+20.9%)
|
| + num_maps: 10
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 32.288 M ops/s | 38.099 M ops/s (+18.0%)
| +- hits latency | 30.972 ns/op | 26.248 ns/op (-15.3%)
| +- important_hits throughput | 3.229 M ops/s | 3.810 M ops/s (+18.0%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 34.473 M ops/s | 41.145 M ops/s (+19.4%)
| +- hits latency | 29.010 ns/op | 24.307 ns/op (-16.2%)
| +- important_hits throughput | 12.312 M ops/s | 14.695 M ops/s (+19.4%)
|
| + num_maps: 16
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 32.524 M ops/s | 38.341 M ops/s (+17.9%)
| +- hits latency | 30.748 ns/op | 26.083 ns/op (-15.2%)
| +- important_hits throughput | 2.033 M ops/s | 2.396 M ops/s (+17.9%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 34.575 M ops/s | 41.338 M ops/s (+19.6%)
| +- hits latency | 28.925 ns/op | 24.193 ns/op (-16.4%)
| +- important_hits throughput | 11.001 M ops/s | 13.153 M ops/s (+19.6%)
|
| + num_maps: 17
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 28.861 M ops/s | 32.756 M ops/s (+13.5%)
| +- hits latency | 34.649 ns/op | 30.530 ns/op (-11.9%)
| +- important_hits throughput | 1.700 M ops/s | 1.929 M ops/s (+13.5%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 31.529 M ops/s | 36.110 M ops/s (+14.5%)
| +- hits latency | 31.719 ns/op | 27.697 ns/op (-12.7%)
| +- important_hits throughput | 9.598 M ops/s | 10.993 M ops/s (+14.5%)
|
| + num_maps: 24
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 18.602 M ops/s | 19.937 M ops/s (+7.2%)
| +- hits latency | 53.767 ns/op | 50.166 ns/op (-6.7%)
| +- important_hits throughput | 0.776 M ops/s | 0.831 M ops/s (+7.2%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 21.718 M ops/s | 23.332 M ops/s (+7.4%)
| +- hits latency | 46.047 ns/op | 42.865 ns/op (-6.9%)
| +- important_hits throughput | 6.110 M ops/s | 6.564 M ops/s (+7.4%)
|
| + num_maps: 32
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 14.118 M ops/s | 14.626 M ops/s (+3.6%)
| +- hits latency | 70.856 ns/op | 68.381 ns/op (-3.5%)
| +- important_hits throughput | 0.442 M ops/s | 0.458 M ops/s (+3.6%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 17.111 M ops/s | 17.906 M ops/s (+4.6%)
| +- hits latency | 58.451 ns/op | 55.865 ns/op (-4.4%)
| +- important_hits throughput | 4.776 M ops/s | 4.998 M ops/s (+4.6%)
|
| + num_maps: 100
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 5.281 M ops/s | 5.528 M ops/s (+4.7%)
| +- hits latency | 192.398 ns/op | 183.059 ns/op (-4.9%)
| +- important_hits throughput | 0.053 M ops/s | 0.055 M ops/s (+4.9%)
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 6.265 M ops/s | 6.498 M ops/s (+3.7%)
| +- hits latency | 161.436 ns/op | 152.877 ns/op (-5.3%)
| +- important_hits throughput | 1.636 M ops/s | 1.697 M ops/s (+3.7%)
|
| + num_maps: 1000
| : <before> | <after>
| +-+ local_storage cache sequential get +----------------------+----------------------
| +- hits throughput | 0.355 M ops/s | 0.354 M ops/s ( ~ )
| +- hits latency | 2826.538 ns/op | 2827.139 ns/op ( ~ )
| +- important_hits throughput | 0.000 M ops/s | 0.000 M ops/s ( ~ )
| :
| : <before> | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
| +- hits throughput | 0.404 M ops/s | 0.403 M ops/s ( ~ )
| +- hits latency | 2481.190 ns/op | 2487.555 ns/op ( ~ )
| +- important_hits throughput | 0.102 M ops/s | 0.101 M ops/s ( ~ )
The on_lookup test in {cgrp,task}_ls_recursion.c is removed
because the bpf_local_storage_lookup is no longer traceable
and adding tracepoint will make the compiler generate worse
code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/
Signed-off-by: Marco Elver <elver@google.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Sometimes a voltage channel might have an hard failure (eg: a shorted
MOSFET). Hence, add a fault attribute to report such failures.
Signed-off-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20240129-b4-ltc4282-support-v4-2-fe75798164cc@analog.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
HWMON_CHANNEL_INFO() is supposed to be used as initializer for arrays of
const struct hwmon_channel_info *. However, without explicit const,
HWMON_CHANNEL_INFO() creates mutable compound literals, and the const
pointers point at the mutable data. Add const to place the data in
rodata.
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20240117114405.1506775-1-jani.nikula@intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Make sure a warning is issued when a hrtimer gets queued after the
timers have been migrated on the CPU down path and thus said timer
will get ignored
* tag 'timers_urgent_for_v6.8_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Report offline hrtimer enqueue
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Update a potentially stale firmware attribute (Maurizio)
- Fixes for the recent verbose error logging (Keith, Chaitanya)
- Protection information payload size fix for passthrough (Francis)
- Fix for a queue freezing issue in virtblk (Yi)
- blk-iocost underflow fix (Tejun)
- blk-wbt task detection fix (Jan)
* tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
virtio-blk: Ensure no requests in virtqueues before deleting vqs.
blk-iocost: Fix an UBSAN shift-out-of-bounds warning
nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
nvme-core: fix comment to reflect right functions
nvme: move passthrough logging attribute to head
blk-wbt: Fix detection of dirty-throttled tasks
nvme-host: fix the updating of the firmware version
|
|
Some PHY driver might require additional regs call after
genphy_c37_read_status() is called.
Expand genphy_c37_read_status to provide a bool wheather the link has
changed or not to permit PHY driver to skip additional regs call if
nothing has changed.
Every user of genphy_c37_read_status() is updated with the new
additional bool.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devm/of_phy_package_join helper to join PHYs in a PHY package. These
are variant of the manual phy_package_join with the difference that
these will use DT nodes to derive the base_addr instead of manually
passing an hardcoded value.
An additional value is added in phy_package_shared, "np" to reference
the PHY package node pointer in specific PHY driver probe_once and
config_init_once functions to make use of additional specific properties
defined in the PHY package node in DT.
The np value is filled only with of_phy_package_join if a valid PHY
package node is found. A valid PHY package node must have the node name
set to "ethernet-phy-package".
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 57851607326a2beef21e67f83f4f53a90df8445a.
Unfortunately, while we only call that thing once, the callback
*can* be called more than once for the same dentry - all it
takes is rename_lock being touched while we are in d_walk().
For now let's revert it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull ceph fixes from Ilya Dryomov:
"Some fscrypt-related fixups (sparse reads are used only for encrypted
files) and two cap handling fixes from Xiubo and Rishabh"
* tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
ceph: always check dir caps asynchronously
ceph: prevent use-after-free in encode_cap_msg()
ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
libceph: just wait for more data to be available on the socket
libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
libceph: fail sparse-read if the data length doesn't match
|
|
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f53 ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit
43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in
this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Async can schedule a number of interdependent work items. However, since
5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for
unbound workqueues"), unbound workqueues have separate min_active which sets
the number of interdependent work items that can be handled. This default
value is 8 which isn't sufficient for async and can lead to stalls during
resume from suspend in some cases.
Let's use a dedicated unbound workqueue with raised min_active.
Link: http://lkml.kernel.org/r/708a65cc-79ec-44a6-8454-a93d0f3114c3@samsung.com
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Since 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement
for unbound workqueues"), unbound workqueues have separate min_active which
sets the number of interdependent work items that can be handled. This value
is currently initialized to WQ_DFL_MIN_ACTIVE which is 8. This isn't high
enough for some users, let's add an interface to adjust the setting.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This adds the napi busy polling support in io_uring.c. It adds a new
napi_list to the io_ring_ctx structure. This list contains the list of
napi_id's that are currently enabled for busy polling. The list is
synchronized by the new napi_lock spin lock. The current default napi
busy polling time is stored in napi_busy_poll_to. If napi busy polling
is not enabled, the value is 0.
In addition there is also a hash table. The hash table store the napi
id and the pointer to the above list nodes. The hash table is used to
speed up the lookup to the list elements. The hash table is synchronized
with rcu.
The NAPI_TIMEOUT is stored as a timeout to make sure that the time a
napi entry is stored in the napi list is limited.
The busy poll timeout is also stored as part of the io_wait_queue. This
is necessary as for sq polling the poll interval needs to be adjusted
and the napi callback allows only to pass in one value.
This has been tested with two simple programs from the liburing library
repository: the napi client and the napi server program. The client
sends a request, which has a timestamp in its payload and the server
replies with the same payload. The client calculates the roundtrip time
and stores it to calculate the results.
The client is running on host1 and the server is running on host 2 (in
the same rack). The measured times below are roundtrip times. They are
average times over 5 runs each. Each run measures 1 million roundtrips.
no rx coal rx coal: frames=88,usecs=33
Default 57us 56us
client_poll=100us 47us 46us
server_poll=100us 51us 46us
client_poll=100us+ 40us 40us
server_poll=100us
client_poll=100us+ 41us 39us
server_poll=100us+
prefer napi busy poll on client
client_poll=100us+ 41us 39us
server_poll=100us+
prefer napi busy poll on server
client_poll=100us+ 41us 39us
server_poll=100us+
prefer napi busy poll on client + server
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Suggested-by: Olivier Langlois <olivier@trillion01.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230608163839.2891748-5-shr@devkernel.io
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"The only notable change here is the patch that changes the way we deal
with spurious errors from the EFI memory attribute protocol. This will
be backported to v6.6, and is intended to ensure that we will not
paint ourselves into a corner when we tighten this further in order to
comply with MS requirements on signed EFI code.
Note that this protocol does not currently exist in x86 production
systems in the field, only in Microsoft's fork of OVMF, but it will be
mandatory for Windows logo certification for x86 PCs in the future.
- Tighten ELF relocation checks on the RISC-V EFI stub
- Give up if the new EFI memory attributes protocol fails spuriously
on x86
- Take care not to place the kernel in the lowest 16 MB of DRAM on
x86
- Omit special purpose EFI memory from memblock
- Some fixes for the CXL CPER reporting code
- Make the PE/COFF layout of mixed-mode capable images comply with a
strict interpretation of the spec"
* tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
cxl/trace: Remove unnecessary memcpy's
cxl/cper: Fix errant CPER prints for CXL events
efi: Don't add memblocks for soft-reserved memory
efi: runtime: Fix potential overflow of soft-reserved region size
efi/libstub: Add one kernel-doc comment
x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR
x86/efistub: Give up if memory attribute protocol returns an error
riscv/efistub: Tighten ELF relocation check
riscv/efistub: Ensure GP-relative addressing is not used
|
|
In commit 8caab75fd2c2 ("spi: Generalize SPI "master" to "controller"")
some functions and struct members were renamed. Recent work by Uwe
completes this renaming. However, there are plenty of leftovers in
the comments and in-code documentation. Update them as well.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240209165423.2305493-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled the kernel will
be compiled with -ftrivial-auto-var-init=<...> which causes initialization
of stack variables at function entry time.
In order to avoid the performance impact that comes with this users can use
the "uninitialized" attribute to prevent such initialization.
Therefore provide the __uninitialized macro which can be used for cases
where INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled, but only
selected variables should not be initialized.
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240205154844.3757121-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add function to check iommu group mutex lock. So that device drivers can
rely on group mutex lock instead of adding another driver level lock
before modifying driver specific device data structure.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240205115615.6053-10-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Introduce get_amd_iommu_from_dev() and get_amd_iommu_from_dev_data().
And replace rlookup_amd_iommu() with the new helper function where
applicable to avoid unnecessary loop to look up struct amd_iommu from
struct device.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240205115615.6053-4-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Add a new WWAN port that connects to the device fastboot protocol
interface.
Signed-off-by: Jinjian Song <jinjian.song@fibocom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 398aa9a7e77cf23c2a6f882ddd3dcd96f21771dc.
The update to the gadget API to support EBC feature is incomplete. It's
missing at least the following:
* New usage documentation
* Gadget capability check
* Condition for the user to check how many and which endpoints can be
used as "fifo_mode"
* Description of how it can affect completed request (e.g. dwc3 won't
update TRB on completion -- ie. how it can affect request's actual
length report)
Let's revert this until it's ready.
Fixes: 398aa9a7e77c ("usb: dwc3: Support EBC feature of DWC_usb31")
Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/3042f847ff904b4dd4e4cf66a1b9df470e63439e.1707441690.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Refactor pci_epf_alloc_space() API to accept "epc_features" as a parameter.
This is a preparatory work to make the API more robust.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240207213922.1796533-2-cassel@kernel.org
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/stmicro/stmmac/common.h
38cc3c6dcc09 ("net: stmmac: protect updates of 64-bit statistics counters")
fd5a6a71313e ("net: stmmac: est: Per Tx-queue error count for HLBF")
c5c3e1bfc9e0 ("net: stmmac: Offload queueMaxSDU from tc-taprio")
drivers/net/wireless/microchip/wilc1000/netdev.c
c9013880284d ("wifi: fill in MODULE_DESCRIPTION()s for wilc1000")
328efda22af8 ("wifi: wilc1000: do not realloc workqueue everytime an interface is added")
net/unix/garbage.c
11498715f266 ("af_unix: Remove io_uring code for GC.")
1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use existing typedef for dma_filter_fn to avoid duplicating type
definition.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240208202154.630336-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add kerneldoc for dma_filter field in struct pl022_ssp_controller.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240208202154.630336-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Use existing typedef for dma_filter_fn to avoid duplicating type
definition.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240208202154.630336-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Nothing major here, just moving a few things around to reduce the
padding. This reduces the size on a non-debug kernel from 1536 to
1472 bytes, saving a full cacheline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This adds a flag to avoid dipping dereferencing file and then f_op to
figure out if the file has a poll handler defined or not. We generally
call this at least twice for networked workloads, and if using ring
provided buffers, we do it on every buffer selection. Particularly the
latter is troublesome, as it's otherwise a very fast operation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just leave it unset by default, avoiding dipping into the last
cacheline (which is otherwise untouched) for the fast path of using
poll to drive networked traffic. Add a flag that tells us if the
sequence is valid or not, and then we can defer actually assigning
the flag and sequence until someone runs cancelations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We're out of space here, and none of the flags are easily reclaimable.
Bump it to 64-bits and re-arrange the struct a bit to avoid gaps.
Add a specific bitwise type for the request flags, io_request_flags_t.
This will help catch violations of casting this value to a smaller type
on 32-bit archs, like unsigned int.
This creates a hole in the io_kiocb, so move nr_tw up and rsrc_node down
to retain needing only cacheline 0 and 1 for non-polled opcodes.
No functional changes intended in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some weird old filesytems have UUID-like things that we wish to expose
as UUIDs, but are smaller; add a length field so that the new
FS_IOC_(GET|SET)UUID ioctls can handle them in generic code.
And add a helper super_set_uuid(), for setting nonstandard length uuids.
Helper is now required for the new FS_IOC_GETUUID ioctl; if
super_set_uuid() hasn't been called, the ioctl won't be supported.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Merge series from Uwe Kleine-König <u.kleine-koenig@pengutronix.de>:
This series finishes off the removal of some of the legacy names for
SPI controllers and devices.
|