Age | Commit message (Collapse) | Author |
|
Pull block updates from Jens Axboe:
- NVMe pull requests via Keith:
- Target support for PCI-Endpoint transport (Damien)
- TCP IO queue spreading fixes (Sagi, Chaitanya)
- Target handling for "limited retry" flags (Guixen)
- Poll type fix (Yongsoo)
- Xarray storage error handling (Keisuke)
- Host memory buffer free size fix on error (Francis)
- MD pull requests via Song:
- Reintroduce md-linear (Yu Kuai)
- md-bitmap refactor and fix (Yu Kuai)
- Replace kmap_atomic with kmap_local_page (David Reaver)
- Quite a few queue freeze and debugfs deadlock fixes
Ming introduced lockdep support for this in the 6.13 kernel, and it
has (unsurprisingly) uncovered quite a few issues
- Use const attributes for IO schedulers
- Remove bio ioprio wrappers
- Fixes for stacked device atomic write support
- Refactor queue affinity helpers, in preparation for better supporting
isolated CPUs
- Cleanups of loop O_DIRECT handling
- Cleanup of BLK_MQ_F_* flags
- Add rotational support for null_blk
- Various fixes and cleanups
* tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux: (106 commits)
block: Don't trim an atomic write
block: Add common atomic writes enable flag
md/md-linear: Fix a NULL vs IS_ERR() bug in linear_add()
block: limit disk max sectors to (LLONG_MAX >> 9)
block: Change blk_stack_atomic_writes_limits() unit_min check
block: Ensure start sector is aligned for stacking atomic writes
blk-mq: Move more error handling into blk_mq_submit_bio()
block: Reorder the request allocation code in blk_mq_submit_bio()
nvme: fix bogus kzalloc() return check in nvme_init_effects_log()
md/md-bitmap: move bitmap_{start, end}write to md upper layer
md/raid5: implement pers->bitmap_sector()
md: add a new callback pers->bitmap_sector()
md/md-bitmap: remove the last parameter for bimtap_ops->endwrite()
md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()
md: Replace deprecated kmap_atomic() with kmap_local_page()
md: reintroduce md-linear
partitions: ldm: remove the initial kernel-doc notation
blk-cgroup: rwstat: fix kernel-doc warnings in header file
blk-cgroup: fix kernel-doc warnings in header file
nbd: fix partial sending
...
|
|
Pull bcachefs updates from Kent Overstreet:
"Lots of scalability work, another big on-disk format change. On-disk
format version goes from 1.13 to 1.20.
Like 6.11, this is another big and expensive automatic/required on
disk format upgrade. This is planned to be the last big on disk format
upgrade before the experimental label comes off. There will be one
more minor on disk format update for a few things that couldn't make
this release.
Headline improvements:
- Self healing work:
Allocator and reflink now run the exact same check/repair code that
fsck does at runtime, where applicable.
The long term goal here is to remove inconsistent() errors (that
cause us to go emergency read only) by lifting fsck code up to
normal runtime paths; we should only go emergency read-only if we
detect an inconsistency that was due to a runtime bug - or truly
catastrophic damage (corrupted btree roots/interior nodes).
- Reflink repair no longer deletes reflink pointers:
Instead we flip an error bit and log the error, and they can still
be deleted by file deletion. This means a temporary failure to find
an indirect extent (perhaps repaired later by btree node scan)
won't result in unnecessary data loss
- Improvements to rebalance data path option handling:
We can now correctly apply changed filesystem-level io path options
to pending rebalance work, and soon we'll be able to apply
file-level io path option changes to indirect extents
- Fix mount time regression that some users encountered post the 6.11
disk accounting rewrite.
Accounting keys were encoded little endian (typetag in the low
bits) - which didn't anticipate adding accounting keys for every
inode, which aren't stored in memory and we don't want to scan at
mount time.
- fsck time on large filesystems is improved by multiple orders of
magnitude. Previously, 100TB was about the practical max filesystem
size, where users were reporting fsck times of a day+. With the new
changes (which nearly eliminate backpointers fsck overhead), we
fsck'd a filesystem with 10PB of data in 1.5 hours.
The problematic fsck passes were walking every extent and checking
for missing backpointers, and walking every backpointer to check
for dangling backpointers. As we've been adding more and more
runtime self healing there was no reason to keep around the
backpointers -> extents pass; dangling backpointers are just
deleted, and we can do that when using them - thus, backpointers ->
extents is now only run in debug mode.
extents -> backpointers does need to exist, since missing
backpointers would mean we can't find data to move it (for e.g.
copygc, device evacuate, scrub). But the new on disk format version
makes possible a new strategy where we sum up backpointers within a
bucket and check it against the bucket sector counts, and then only
scan for missing backpointers if the counts are off (and then, only
for specific buckets).
Full list of on disk format changes:
- 1.14: backpointer_bucket_gen
Backpointers now have a field for the bucket generation number,
replacing the obsolete bucket_offset field. This is needed for the
new "sum up backpointers within a bucket" code, since backpointers
use the btree write buffer - meaning we will see stale reads, and
this runs online, with the filesystem in full rw mode.
- 1.15: disk_accounting_big_endian
As previously described, fix the endianness of accounting keys so
that accounting keys with the same typetag sort together, and
accounting read can skip types it's not interested in.
- 1.16: reflink_p_may_update_opts:
This version indicates that a new reflink pointer field is
understood and may be used; the field indicates whether the reflink
pointer has permissions to update IO path options (e.g.
compression, replicas) may be updated on the indirect extent it
points to.
This completes the rebalance/reflink data path option handling from
the 6.13 pull request.
- 1.17: inode_depth
Add a new inode field, bi_depth, to accelerate the
check_directory_structure fsck path, which checks for loops in the
filesystem heirarchy.
check_inodes and check_dirents check connectivity, so
check_directory_structure only has to check for loops - by walking
back up to the root from every directory.
But a path can't be a loop if it has a counter that increases
monotonically from root to leaf - adding a depth counter means that
we can check for loops with only local (parent -> child) checks. We
might need to occasionally renumber the depth field in fsck if
directories have been moved around, but then future fsck runs will
be much faster.
- 1.18: persistent_inode_cursors
Previously, the cursor used for inode allocation was only kept in
memory, which meant that users with large filesystems and lots of
files were reporting that the first create after mounting would
take awhile - since it had to scan from the start.
Inode allocation cursors are now persistent, and also include a
generation field (incremented on wraparound, which will only happen
if inode allocation is restricted to 32 bit inodes), so that we
don't have to leave inode_generation keys around after a delete.
The option for 32 bit inode numbers may now also be set on
individual directories, and non-32 bit inode allocations are
disallowed from allocating from the 32 bit part of the inode number
space.
- 1.19: autofix_errors
Runtime self healing is now the default.o
- 1.20: directory size (from Hongbo)
directory i_size is now meaningful, and not 0"
* tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs: (268 commits)
bcachefs: Fix check_inode_hash_info_matches_root()
bcachefs: Document issue with bch_stripe layout
bcachefs: Fix self healing on read error
bcachefs: Pop all the transactions from the abort one
bcachefs: Only abort the transactions in the cycle
bcachefs: Introduce lock_graph_pop_from
bcachefs: Convert open-coded lock_graph_pop_all to helper
bcachefs: Do not allow no fail lock request to fail
bcachefs: Merge the condition to avoid additional invocation
Revert "bcachefs: Fix bch2_btree_node_upgrade()"
bcachefs: bcachefs_metadata_version_directory_size
bcachefs: make directory i_size meaningful
bcachefs: check_unreachable_inodes is not actually PASS_ONLINE yet
bcachefs: Don't use BTREE_ITER_cached when walking alloc btree during fsck
bcachefs: Check for dirents to overwritten inodes
bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth
bcachefs: Don't set btree_path to updtodate if we don't fill
bcachefs: __bch2_btree_pos_to_text()
bcachefs: printbuf_reset() handles tabstops
bcachefs: Silence read-only errors when deleting snapshots
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case (Tycho
Andersen, Kees Cook)
- binfmt_misc: Fix comment typos (Christophe JAILLET)
- move empty argv[0] warning closer to actual logic (Nir Lichtman)
- remove legacy custom binfmt modules autoloading (Nir Lichtman)
- Make sure set_task_comm() always NUL-terminates
- binfmt_flat: Fix integer overflow bug on 32 bit systems (Dan
Carpenter)
- coredump: Do not lock when copying "comm"
- MAINTAINERS: add auxvec.h and set myself as maintainer
* tag 'execve-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_flat: Fix integer overflow bug on 32 bit systems
selftests/exec: add a test for execveat()'s comm
exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
exec: Make sure task->comm is always NUL-terminated
exec: remove legacy custom binfmt modules autoloading
exec: move warning of null argv to be next to the relevant code
fs: binfmt: Fix a typo
MAINTAINERS: exec: Mark Kees as maintainer
MAINTAINERS: exec: Add auxvec.h UAPI
coredump: Do not lock during 'comm' reporting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible changes, features:
- rebuilding of the free space tree at mount time is done in more
transactions, fix potential hangs when the transaction thread is
blocked due to large amount of block groups
- more read IO balancing strategies (experimental config), add two
new ways how to select a device for read if the profiles allow that
(all RAID1*), the current default selects the device by pid which
is good on average but less performant for single reader workloads
- select preferred device for all reads (namely for testing)
- round-robin, balance reads across devices relevant for the
requested IO range
- add encoded write ioctl support to io_uring (read was added in
6.12), basis for writing send stream using that instead of
syscalls, non-blocking mode is not yet implemented
- support FS_IOC_READ_VERITY_METADATA, applications can use the
metadata to do their own verification
- pass inode's i_write_hint to bios, for parity with other
filesystems, ioctls F_GET_RW_HINT/F_SET_RW_HINT
Core:
- in zoned mode: allow to directly reclaim a block group by simply
resetting it, then it can be reused and another block group does
not need to be allocated
- super block validation now also does more comprehensive sys array
validation, adding it to the points where superblock is validated
(post-read, pre-write)
- subpage mode fixes:
- fix double accounting of blocks due to some races
- improved or fixed error handling in a few cases (compression,
delalloc)
- raid stripe tree:
- fix various cases with extent range splitting or deleting
- implement hole punching to extent range
- reduce number of stripe tree lookups during bio submission
- more self-tests
- updated self-tests (delayed refs)
- error handling improvements
- cleanups, refactoring
- remove rest of backref caching infrastructure from relocation,
not needed anymore
- error message updates
- remove unnecessary calls when extent buffer was marked dirty
- unused parameter removal
- code moved to new files
Other code changes: add rb_find_add_cached() to the rb-tree API"
* tag 'for-6.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (127 commits)
btrfs: selftests: add a selftest for deleting two out of three extents
btrfs: selftests: add test for punching a hole into 3 RAID stripe-extents
btrfs: selftests: add selftest for punching holes into the RAID stripe extents
btrfs: selftests: test RAID stripe-tree deletion spanning two items
btrfs: selftests: don't split RAID extents in half
btrfs: selftests: check for correct return value of failed lookup
btrfs: don't use btrfs_set_item_key_safe on RAID stripe-extents
btrfs: implement hole punching for RAID stripe extents
btrfs: fix deletion of a range spanning parts two RAID stripe extents
btrfs: fix tail delete of RAID stripe-extents
btrfs: fix front delete range calculation for RAID stripe extents
btrfs: assert RAID stripe-extent length is always greater than 0
btrfs: don't try to delete RAID stripe-extents if we don't need to
btrfs: selftests: correct RAID stripe-tree feature flag setting
btrfs: add io_uring interface for encoded writes
btrfs: remove the unused locked_folio parameter from btrfs_cleanup_ordered_extents()
btrfs: add extra error messages for delalloc range related errors
btrfs: subpage: dump the involved bitmap when ASSERT() failed
btrfs: subpage: fix the bitmap dump of the locked flags
btrfs: do proper folio cleanup when run_delalloc_nocow() failed
...
|
|
Merge devfreq and OPP (Operating Performance Points) updates for 6.14:
- Clean up the Exynos devfreq driver and devfreq core (Markus Elfring,
Jeongjun Park).
- Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong, Joe
Hattori).
- Implement dev_pm_opp_get_bw() (Neil Armstrong).
- Expose OPP reference counting helpers for Rust (Viresh Kumar).
* pm-devfreq:
PM / devfreq: exynos: remove unused function parameter
PM / devfreq: event: Call of_node_put() only once in devfreq_event_get_edev_by_phandle()
* pm-opp:
PM / OPP: Add reference counting helpers for Rust implementation
OPP: OF: Fix an OF node leak in _opp_add_static_v2()
OPP: fix dev_pm_opp_find_bw_*() when bandwidth table not initialized
OPP: add index check to assert to avoid buffer overflow in _read_freq()
opp: core: Fix off by one in dev_pm_opp_get_bw()
opp: core: implement dev_pm_opp_get_bw
|
|
Record the pending configuration in net_device struct.
ethtool core duplicates the current config and the specific
handlers (for now just ringparam) can modify it.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Separate the HDS config from the ethtool state struct.
The HDS config contains just simple parameters, not state.
Having it as a separate struct will make it easier to clone / copy
and also long term potentially make it per-queue.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs direct-io updates from Christian Brauner:
"File systems that write out of place usually require different
alignment for direct I/O writes than what they can do for reads.
Add a separate dio read align field to statx, as many out of place
write file systems can easily do reads aligned to the device sector
size, but require bigger alignment for writes.
This is usually papered over by falling back to buffered I/O for
smaller writes and doing read-modify-write cycles, but performance for
this sucks, so applications benefit from knowing the actual write
alignment"
* tag 'vfs-6.14-rc1.statx.dio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
xfs: report larger dio alignment for COW inodes
xfs: report the correct read/write dio alignment for reflinked inodes
xfs: cleanup xfs_vn_getattr
fs: add STATX_DIO_READ_ALIGN
fs: reformat the statx definition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs libfs updates from Christian Brauner:
"This improves the stable directory offset behavior in various ways.
Stable offsets are needed so that NFS can reliably read directories on
filesystems such as tmpfs:
- Improve the end-of-directory detection
According to getdents(3), the d_off field in each returned
directory entry points to the next entry in the directory. The
d_off field in the last returned entry in the readdir buffer must
contain a valid offset value, but if it points to an actual
directory entry, then readdir/getdents can loop.
Introduce a specific fixed offset value that is placed in the d_off
field of the last entry in a directory. Some user space
applications assume that the EOD offset value is larger than the
offsets of real directory entries, so the largest valid offset
value is reserved for this purpose. This new value is never
allocated by simple_offset_add().
When ->iterate_dir() returns, getdents{64} inserts the ctx->pos
value into the d_off field of the last valid entry in the readdir
buffer. When it hits EOD, offset_readdir() sets ctx->pos to the EOD
offset value so the last entry is updated to point to the EOD
marker.
When trying to read the entry at the EOD offset, offset_readdir()
terminates immediately.
- Rely on d_children to iterate stable offset directories
Instead of using the mtree to emit entries in the order of their
offset values, use it only to map incoming ctx->pos to a starting
entry. Then use the directory's d_children list, which is already
maintained properly by the dcache, to find the next child to emit.
- Narrow the range of directory offset values returned by
simple_offset_add() to 3 .. (S32_MAX - 1) on all platforms. This
means the allocation behavior is identical on 32-bit systems,
64-bit systems, and 32-bit user space on 64-bit kernels. The new
range still permits over 2 billion concurrent entries per
directory.
- Return ENOSPC when the directory offset range is exhausted. Hitting
this error is almost impossible though.
- Remove the simple_offset_empty() helper"
* tag 'vfs-6.14-rc1.libfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
libfs: Use d_children list to iterate simple_offset directories
libfs: Replace simple_offset end-of-directory detection
Revert "libfs: fix infinite directory reads for offset dir"
Revert "libfs: Add simple_offset_empty()"
libfs: Return ENOSPC when the directory offset range is exhausted
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Add a mountinfo program to demonstrate statmount()/listmount()
Add a new "mountinfo" sample userland program that demonstrates how
to use statmount() and listmount() to get at the same info that
/proc/pid/mountinfo provides
- Remove pointless nospec.h include
- Prepend statmount.mnt_opts string with security_sb_mnt_opts()
Currently these mount options aren't accessible via statmount()
- Add new mount namespaces to mount namespace rbtree outside of the
namespace semaphore
- Lockless mount namespace lookup
Currently we take the read lock when looking for a mount namespace to
list mounts in. We can make this lockless. The simple search case can
just use a sequence counter to detect concurrent changes to the
rbtree
For walking the list of mount namespaces sequentially via nsfs we
keep a separate rcu list as rb_prev() and rb_next() aren't usable
safely with rcu. Currently there is no primitive for retrieving the
previous list member. To do this we need a new deletion primitive
that doesn't poison the prev pointer and a corresponding retrieval
helper
Since creating mount namespaces is a relatively rare event compared
with querying mounts in a foreign mount namespace this is worth it.
Once libmount and systemd pick up this mechanism to list mounts in
foreign mount namespaces this will be used very frequently
- Add extended selftests for lockless mount namespace iteration
- Add a sample program to list all mounts on the system, i.e., in
all mount namespaces
- Improve mount namespace iteration performance
Make finding the last or first mount to start iterating the mount
namespace from an O(1) operation and add selftests for iterating the
mount table starting from the first and last mount
- Use an xarray for the old mount id
While the ida does use the xarray internally we can use it explicitly
which allows us to increment the unique mount id under the xa lock.
This allows us to remove the atomic as we're now allocating both ids
in one go
- Use a shared header for vfs sample programs
- Fix build warnings for new sample program to list all mounts
* tag 'vfs-6.14-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
samples/vfs: fix build warnings
samples/vfs: use shared header
samples/vfs/mountinfo: Use __u64 instead of uint64_t
fs: remove useless lockdep assertion
fs: use xarray for old mount id
selftests: add listmount() iteration tests
fs: cache first and last mount
samples: add test-list-all-mounts
selftests: remove unneeded include
selftests: add tests for mntns iteration
seltests: move nsfs into filesystems subfolder
fs: simplify rwlock to spinlock
fs: lockless mntns lookup for nsfs
rculist: add list_bidir_{del,prev}_rcu()
fs: lockless mntns rbtree lookup
fs: add mount namespace to rbtree late
fs: prepend statmount.mnt_opts string with security_sb_mnt_opts()
mount: remove inlude/nospec.h include
samples: add a mountinfo program to demonstrate statmount()/listmount()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pid_max namespacing update from Christian Brauner:
"The pid_max sysctl is a global value. For a long time the default
value has been 65535 and during the pidfd dicussions Linus proposed to
bump pid_max by default. Based on this discussion systemd started
bumping pid_max to 2^22. So all new systems now run with a very high
pid_max limit with some distros having also backported that change.
The decision to bump pid_max is obviously correct. It just doesn't
make a lot of sense nowadays to enforce such a low pid number. There's
sufficient tooling to make selecting specific processes without typing
really large pid numbers available.
In any case, there are workloads that have expections about how large
pid numbers they accept. Either for historical reasons or
architectural reasons. One concreate example is the 32-bit version of
Android's bionic libc which requires pid numbers less than 65536.
There are workloads where it is run in a 32-bit container on a 64-bit
kernel. If the host has a pid_max value greater than 65535 the libc
will abort thread creation because of size assumptions of
pthread_mutex_t.
That's a fairly specific use-case however, in general specific
workloads that are moved into containers running on a host with a new
kernel and a new systemd can run into issues with large pid_max
values. Obviously making assumptions about the size of the allocated
pid is suboptimal but we have userspace that does it.
Of course, giving containers the ability to restrict the number of
processes in their respective pid namespace indepent of the global
limit through pid_max is something desirable in itself and comes in
handy in general.
Independent of motivating use-cases the existence of pid namespaces
makes this also a good semantical extension and there have been prior
proposals pushing in a similar direction. The trick here is to
minimize the risk of regressions which I think is doable. The fact
that pid namespaces are hierarchical will help us here.
What we mostly care about is that when the host sets a low pid_max
limit, say (crazy number) 100 that no descendant pid namespace can
allocate a higher pid number in its namespace. Since pid allocation is
hierarchial this can be ensured by checking each pid allocation
against the pid namespace's pid_max limit. This means if the
allocation in the descendant pid namespace succeeds, the ancestor pid
namespace can reject it. If the ancestor pid namespace has a higher
limit than the descendant pid namespace the descendant pid namespace
will reject the pid allocation. The ancestor pid namespace will
obviously not care about this.
All in all this means pid_max continues to enforce a system wide limit
on the number of processes but allows pid namespaces sufficient leeway
in handling workloads with assumptions about pid values and allows
containers to restrict the number of processes in a pid namespace
through the pid_max interface"
* tag 'kernel-6.14-rc1.pid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
tests/pid_namespace: add pid_max tests
pid: allow pid_max to be set per pid namespace
|
|
Merge updates related to system sleep, a cpuidle update and an Energy
Model handling code update for 6.14-rc1:
- Allow configuring the system suspend-resume (DPM) watchdog to warn
earlier than panic (Douglas Anderson).
- Implement devm_device_init_wakeup() helper and introduce a device-
managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan).
- Remove direct inclusions of 'pm_wakeup.h' which should be only
included via 'device.h' (Wolfram Sang).
- Clean up two comments in the core system-wide PM code (Rafael
Wysocki, Randy Dunlap).
- Add Clearwater Forest processor support to the intel_idle cpuidle
driver (Artem Bityutskiy).
- Move sched domains rebuild function from the schedutil cpufreq
governor to the Energy Model handling code (Rafael Wysocki).
* pm-sleep:
PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq()
PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic
PM: sleep: convert comment from kernel-doc to plain comment
PM: wakeup: implement devm_device_init_wakeup() helper
PM: sleep: sysfs: don't include 'pm_wakeup.h' directly
PM: sleep: autosleep: don't include 'pm_wakeup.h' directly
PM: sleep: Update stale comment in device_resume()
* pm-cpuidle:
intel_idle: add Clearwater Forest SoC support
* pm-em:
PM: EM: Move sched domains rebuild function from schedutil to EM
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull cred refcount updates from Christian Brauner:
"For the v6.13 cycle we switched overlayfs to a variant of
override_creds() that doesn't take an extra reference. To this end the
{override,revert}_creds_light() helpers were introduced.
This generalizes the idea behind {override,revert}_creds_light() to
the {override,revert}_creds() helpers. Afterwards overriding and
reverting credentials is reference count free unless the caller
explicitly takes a reference.
All callers have been appropriately ported"
* tag 'kernel-6.14-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
cred: fold get_new_cred_many() into get_cred_many()
cred: remove unused get_new_cred()
nfsd: avoid pointless cred reference count bump
cachefiles: avoid pointless cred reference count bump
dns_resolver: avoid pointless cred reference count bump
trace: avoid pointless cred reference count bump
cgroup: avoid pointless cred reference count bump
acct: avoid pointless reference count bump
io_uring: avoid pointless cred reference count bump
smb: avoid pointless cred reference count bump
cifs: avoid pointless cred reference count bump
cifs: avoid pointless cred reference count bump
ovl: avoid pointless cred reference count bump
open: avoid pointless cred reference count bump
nfsfh: avoid pointless cred reference count bump
nfs/nfs4recover: avoid pointless cred reference count bump
nfs/nfs4idmap: avoid pointless reference count bump
nfs/localio: avoid pointless cred reference count bumps
coredump: avoid pointless cred reference count bump
binfmt_misc: avoid pointless cred reference count bump
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
- Rework inode number allocation
Recently we received a patchset that aims to enable file handle
encoding and decoding via name_to_handle_at(2) and
open_by_handle_at(2).
A crucical step in the patch series is how to go from inode number to
struct pid without leaking information into unprivileged contexts.
The issue is that in order to find a struct pid the pid number in the
initial pid namespace must be encoded into the file handle via
name_to_handle_at(2).
This can be used by containers using a separate pid namespace to
learn what the pid number of a given process in the initial pid
namespace is. While this is a weak information leak it could be used
in various exploits and in general is an ugly wart in the design.
To solve this problem a new way is needed to lookup a struct pid
based on the inode number allocated for that struct pid. The other
part is to remove the custom inode number allocation on 32bit systems
that is also an ugly wart that should go away.
Allocate unique identifiers for struct pid by simply incrementing a
64 bit counter and insert each struct pid into the rbtree so it can
be looked up to decode file handles avoiding to leak actual pids
across pid namespaces in file handles.
On both 64 bit and 32 bit the same 64 bit identifier is used to
lookup struct pid in the rbtree. On 64 bit the unique identifier for
struct pid simply becomes the inode number. Comparing two pidfds
continues to be as simple as comparing inode numbers.
On 32 bit the 64 bit number assigned to struct pid is split into two
32 bit numbers. The lower 32 bits are used as the inode number and
the upper 32 bits are used as the inode generation number. Whenever a
wraparound happens on 32 bit the 64 bit number will be incremented by
2 so inode numbering starts at 2 again.
When a wraparound happens on 32 bit multiple pidfds with the same
inode number are likely to exist. This isn't a problem since before
pidfs pidfds used the anonymous inode meaning all pidfds had the same
inode number. On 32 bit sserspace can thus reconstruct the 64 bit
identifier by retrieving both the inode number and the inode
generation number to compare, or use file handles. This gives the
same guarantees on both 32 bit and 64 bit.
- Implement file handle support
This is based on custom export operation methods which allows pidfs
to implement permission checking and opening of pidfs file handles
cleanly without hacking around in the core file handle code too much.
- Support bind-mounts
Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts
for pidfds. This allows pidfds to be safely recovered and checked for
process recycling.
Instead of checking d_ops for both nsfs and pidfs we could in a
follow-up patch add a flag argument to struct dentry_operations that
functions similar to file_operations->fop_flags.
* tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add pidfd bind-mount tests
pidfs: allow bind-mounts
pidfs: lookup pid through rbtree
selftests/pidfd: add pidfs file handle selftests
pidfs: check for valid ioctl commands
pidfs: implement file handle support
exportfs: add permission method
fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh()
exportfs: add open method
fhandle: simplify error handling
pseudofs: add support for export_ops
pidfs: support FS_IOC_GETVERSION
pidfs: remove 32bit inode number handling
pidfs: rework inode number allocation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support caching symlink lengths in inodes
The size is stored in a new union utilizing the same space as
i_devices, thus avoiding growing the struct or taking up any more
space
When utilized it dodges strlen() in vfs_readlink(), giving about
1.5% speed up when issuing readlink on /initrd.img on ext4
- Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set
FOP_DONTCACHE and enable support for RWF_DONTCACHE.
If RWF_DONTCACHE is attempted without the file system supporting
it, it'll get errored with -EOPNOTSUPP
- Enable VBOXGUEST and VBOXSF_FS on ARM64
Now that VirtualBox is able to run as a host on arm64 (e.g. the
Apple M3 processors) we can enable VBOXSF_FS (and in turn
VBOXGUEST) for this architecture.
Tested with various runs of bonnie++ and dbench on an Apple MacBook
Pro with the latest Virtualbox 7.1.4 r165100 installed
Cleanups:
- Delay sysctl_nr_open check in expand_files()
- Use kernel-doc includes in fiemap docbook
- Use page->private instead of page->index in watch_queue
- Use a consume fence in mnt_idmap() as it's heavily used in
link_path_walk()
- Replace magic number 7 with ARRAY_SIZE() in fc_log
- Sort out a stale comment about races between fd alloc and dup2()
- Fix return type of do_mount() from long to int
- Various cosmetic cleanups for the lockref code
Fixes:
- Annotate spinning as unlikely() in __read_seqcount_begin
The annotation already used to be there, but got lost in commit
52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as
statement expressions")
- Fix proc_handler for sysctl_nr_open
- Flush delayed work in delayed fput()
- Fix grammar and spelling in propagate_umount()
- Fix ESP not readable during coredump
In /proc/PID/stat, there is the kstkesp field which is the stack
pointer of a thread. While the thread is active, this field reads
zero. But during a coredump, it should have a valid value
However, at the moment, kstkesp is zero even during coredump
- Don't wake up the writer if the pipe is still full
- Fix unbalanced user_access_end() in select code"
* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
gfs2: use lockref_init for qd_lockref
erofs: use lockref_init for pcl->lockref
dcache: use lockref_init for d_lockref
lockref: add a lockref_init helper
lockref: drop superfluous externs
lockref: use bool for false/true returns
lockref: improve the lockref_get_not_zero description
lockref: remove lockref_put_not_zero
fs: Fix return type of do_mount() from long to int
select: Fix unbalanced user_access_end()
vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
pipe_read: don't wake up the writer if the pipe is still full
selftests: coredump: Add stackdump test
fs/proc: do_task_stat: Fix ESP not readable during coredump
fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
fs: sort out a stale comment about races between fd alloc and dup2
fs: Fix grammar and spelling in propagate_umount()
fs: fc_log replace magic number 7 with ARRAY_SIZE()
fs: use a consume fence in mnt_idmap()
file: flush delayed work in delayed fput()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs netfs updates from Christian Brauner:
"This contains read performance improvements and support for monolithic
single-blob objects that have to be read/written as such (e.g. AFS
directory contents). The implementation of the two parts is interwoven
as each makes the other possible.
- Read performance improvements
The read performance improvements are intended to speed up some
loss of performance detected in cifs and to a lesser extend in afs.
The problem is that we queue too many work items during the
collection of read results: each individual subrequest is collected
by its own work item, and then they have to interact with each
other when a series of subrequests don't exactly align with the
pattern of folios that are being read by the overall request.
Whilst the processing of the pages covered by individual
subrequests as they complete potentially allows folios to be woken
in parallel and with minimum delay, it can shuffle wakeups for
sequential reads out of order - and that is the most common I/O
pattern.
The final assessment and cleanup of an operation is then held up
until the last I/O completes - and for a synchronous sequential
operation, this means the bouncing around of work items just adds
latency.
Two changes have been made to make this work:
(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and
also dispatches retries as necessary).
(2) For readahead and AIO, this work item be done on a workqueue
and can run in parallel with the ultimate consumer of the data;
for synchronous direct or unbuffered reads, the collection is
run in the application thread and not offloaded.
Functions such as smb2_readv_callback() then just tell netfslib
that the subrequest has terminated; netfslib does a minimal bit of
processing on the spot - stat counting and tracing mostly - and
then queues/wakes up the worker. This simplifies the logic as the
collector just walks sequentially through the subrequests as they
complete and walks through the folios, if buffered, unlocking them
as it goes. It also keeps to a minimum the amount of latency
injected into the filesystem's low-level I/O handling
The way netfs supports filesystems using the deprecated
PG_private_2 flag is changed: folios are flagged and added to a
write request as they complete and that takes care of scheduling
the writes to the cache. The originating read request can then just
unlock the pages whatever happens.
- Single-blob object support
Single-blob objects are files for which the content of the file
must be read from or written to the server in a single operation
because reading them in parts may yield inconsistent results. AFS
directories are an example of this as there exists the possibility
that the contents are generated on the fly and would differ between
reads or might change due to third party interference.
Such objects will be written to and retrieved from the cache if one
is present, though we allow/may need to propose multiple
subrequests to do so. The important part is that read from/write to
the *server* is monolithic.
Single blob reading is, for the moment, fully synchronous and does
result collection in the application thread and, also for the
moment, the API is supplied the buffer in the form of a folio_queue
chain rather than using the pagecache.
- Related afs changes
This series makes a number of changes to the kafs filesystem,
primarily in the area of directory handling:
- AFS's FetchData RPC reply processing is made partially
asynchronous which allows the netfs_io_request's outstanding
operation counter to be removed as part of reducing the
collection to a single work item.
- Directory and symlink reading are plumbed through netfslib using
the single-blob object API and are now cacheable with fscache.
This also allows the afs_read struct to be eliminated and
netfs_io_subrequest to be used directly instead.
- Directory and symlink content are now stored in a folio_queue
buffer rather than in the pagecache. This means we don't require
the RCU read lock and xarray iteration to access it, and folios
won't randomly disappear under us because the VM wants them
back.
- The vnode operation lock is changed from a mutex struct to a
private lock implementation. The problem is that the lock now
needs to be dropped in a separate thread and mutexes don't
permit that.
- When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know
what it's likely to look like).
- We now use the in-directory hashtable to reduce the number of
entries we need to scan when doing a lookup. The edit routines
have to maintain the hash chains.
- Cancellation (e.g. by signal) of an async call after the
rxrpc_call has been set up is now offloaded to the worker thread
as there will be a notification from rxrpc upon completion. This
avoids a double cleanup.
- A "rolling buffer" implementation is created to abstract out the
two separate folio_queue chaining implementations I had (one for
read and one for write).
- Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again.
This is used to handle AFS directories, but could also be used to
create bounce buffers for content crypto and transport crypto.
- The was_async argument is dropped from netfs_read_subreq_terminated()
Instead we wake the read collection work item by either queuing it
or waking up the app thread.
- We don't need to use BH-excluding locks when communicating between
the issuing thread and the collection thread as neither of them now
run in BH context.
- Also included are a number of new tracepoints; a split of the
netfslib write collection code to put retrying into its own file
(it gets more complicated with content encryption).
- There are also some minor fixes AFS included, including fixing the
AFS directory format struct layout, reducing some directory
over-invalidation and making afs_mkdir() translate EEXIST to
ENOTEMPY (which is not available on all systems the servers
support).
- Finally, there's a patch to try and detect entry into the folio
unlock function with no folio_queue structs in the buffer (which
isn't allowed in the cases that can get there).
This is a debugging patch, but should be minimal overhead"
* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...
|
|
Merge ACPI battery and fan drivers updates and miscellaneous ACPI
chanages for 6.14:
- Update messages printed by the ACPI battery driver to always
refer to driver extensions as "hooks" to avoid confusion with
similar functionality in the power supply subsystem in the
future (Thomas Weißschuh).
- Fix .probe() error path cleanup in the ACPI fan driver to avoid
memory leaks (Joe Hattori).
- Constify 'struct bin_attribute' in some places in the ACPI subsystem
and mark it as __ro_after_init in one place to prevent binary blob
attributes from being updated (Thomas Weißschuh)
- Add empty stubs for several ACPI-related symbols so that they can be
used when CONFIG_ACPI is unset and use them for removing unnecessary
conditional compilation from the ipu-bridge driver (Ricardo Ribalda).
* acpi-battery:
ACPI: battery: Rename extensions to hook in messages
* acpi-fan:
ACPI: fan: cleanup resources in the error path of .probe()
* acpi-misc:
media: ipu-bridge: Remove unneeded conditional compilations
ACPI: bus: implement acpi_device_hid when !ACPI
ACPI: bus: implement for_each_acpi_consumer_dev when !ACPI
ACPI: header: implement acpi_device_handle when !ACPI
ACPI: bus: implement acpi_get_physical_device_location when !ACPI
ACPI: bus: implement for_each_acpi_dev_match when !ACPI
ACPI: bus: change the prototype for acpi_get_physical_device_location
ACPI: sysfs: Constify 'struct bin_attribute'
ACPI: BGRT: Constify 'struct bin_attribute'
ACPI: BGRT: Mark bin_attribute as __ro_after_init
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge OPP (Operating Performance Points) updates for 6.14 from Viresh
Kumar:
"- Minor cleanups / fixes (Dan Carpenter, Neil Armstrong, and Joe
Hattori).
- Implement dev_pm_opp_get_bw (Neil Armstrong).
- Expose reference counting helpers (Viresh Kumar)."
* tag 'opp-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
PM / OPP: Add reference counting helpers for Rust implementation
OPP: OF: Fix an OF node leak in _opp_add_static_v2()
OPP: fix dev_pm_opp_find_bw_*() when bandwidth table not initialized
OPP: add index check to assert to avoid buffer overflow in _read_freq()
opp: core: Fix off by one in dev_pm_opp_get_bw()
opp: core: implement dev_pm_opp_get_bw
|
|
- newly added support for Intel Touch Host Controller (Even Xu, Xinpeng Sun)
|
|
- dead code removal in intel-ish-hid driver (Dr. David Alan Gilbert)
|
|
- constification of 'struct bin_attribute' in various HID driver (Thomas Weißschuh)
|
|
To ensure that resources such as OPP tables or OPP nodes are not freed
while in use by the Rust implementation, it is necessary to increment
their reference count from Rust code.
This commit introduces a new helper function,
dev_pm_opp_get_opp_table_ref(), to increment the reference count of an
OPP table and declares the existing helper dev_pm_opp_get() in pm_opp.h.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Borislav Petkov:
- Reset hrtimers correctly when a CPU hotplug state traversal happens
"half-ways" and leaves hrtimers not (re-)initialized properly
- Annotate accesses to a timer group's ignore flag to prevent KCSAN
from raising data_race warnings
- Make sure timer group initialization is visible to timer tree walkers
and avoid a hypothetical race
- Fix another race between CPU hotplug and idle entry/exit where timers
on a fully idle system are getting ignored
- Fix a case where an ignored signal is still being handled which it
shouldn't be
* tag 'timers_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimers: Handle CPU state correctly on hotplug
timers/migration: Annotate accesses to ignore flag
timers/migration: Enforce group initialization visibility to tree walkers
timers/migration: Fix another race between hotplug and idle entry/exit
signal/posixtimers: Handle ignore/blocked sequences correctly
|
|
key_being_used_for[] is an unused array of textual names for
the elements of the enum key_being_used_for. It was added in 2015 by
commit 99db44350672 ("PKCS#7: Appropriately restrict authenticated
attributes and content type")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.14
Most likely the last "new features" pull request for v6.14 and this is
a bigger one. Multi-Link Operation (MLO) work continues both in stack
in drivers. Few new devices supported and usual fixes all over.
Major changes:
cfg80211
* Emergency Preparedness Communication Services (EPCS) station mode support
mac80211
* an option to filter a sta from being flushed
* some support for RX Operating Mode Indication (OMI) power saving
* support for adding and removing station links for MLO
iwlwifi
* new device ids
* rework firmware error handling and restart
rtw88
* RTL8812A: RFE type 2 support
* LED support
rtw89
* variant info to support RTL8922AE-VS
mt76
* mt7996: single wiphy multiband support (preparation for MLO)
* mt7996: support for more variants
* mt792x: P2P_DEVICE support
* mt7921u: TP-Link TXE50UH support
ath12k
* enable MLO for QCN9274 (although it seems to be broken with dual
band devices)
* MLO radar detection support
* debugfs: transmit buffer OFDMA, AST entry and puncture stats
* tag 'wireless-next-2025-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (322 commits)
wifi: brcmfmac: fix NULL pointer dereference in brcmf_txfinalize()
wifi: rtw88: add RTW88_LEDS depends on LEDS_CLASS to Kconfig
wifi: wilc1000: unregister wiphy only after netdev registration
wifi: cfg80211: adjust allocation of colocated AP data
wifi: mac80211: fix memory leak in ieee80211_mgd_assoc_ml_reconf()
wifi: ath12k: fix key cache handling
wifi: ath12k: Fix uninitialized variable access in ath12k_mac_allocate() function
wifi: ath12k: Remove ath12k_get_num_hw() helper function
wifi: ath12k: Refactor the ath12k_hw get helper function argument
wifi: ath12k: Refactor ath12k_hw set helper function argument
wifi: mt76: mt7996: add implicit beamforming support for mt7992
wifi: mt76: mt7996: fix beacon command during disabling
wifi: mt76: mt7996: fix ldpc setting
wifi: mt76: mt7996: fix definition of tx descriptor
wifi: mt76: connac: adjust phy capabilities based on band constraints
wifi: mt76: mt7996: fix incorrect indexing of MIB FW event
wifi: mt76: mt7996: fix HE Phy capability
wifi: mt76: mt7996: fix the capability of reception of EHT MU PPDU
wifi: mt76: mt7996: add max mpdu len capability
wifi: mt76: mt7921: avoid undesired changes of the preset regulatory domain
...
====================
Link: https://patch.msgid.link/20250117203529.72D45C4CEDD@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have some leftovers from the switch to linkmode bitmaps which
- have never been used
- are not used any longer
- have no user outside phy_device.c
So remove them.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/5493b96e-88bb-4230-a911-322659ec5167@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For packets with two-step timestamp requests, the hardware timestamp
comes back to the driver through a confirmation mechanism of sorts,
which allows the driver to confidently bump the successful "pkts"
counter.
For one-step PTP, the NIC is supposed to autonomously insert its
hardware TX timestamp in the packet headers while simultaneously
transmitting it. There may be a confirmation that this was done
successfully, or there may not.
None of the current drivers which implement ethtool_ops :: get_ts_stats()
also support HWTSTAMP_TX_ONESTEP_SYNC or HWTSTAMP_TX_ONESTEP_SYNC, so it
is a bit unclear which model to follow. But there are NICs, such as DSA,
where there is no transmit confirmation at all. Here, it would be wrong /
misleading to increment the successful "pkts" counter, because one-step
PTP packets can be dropped on TX just like any other packets.
So introduce a special counter which signifies "yes, an attempt was made,
but we don't know whether it also exited the port or not". I expect that
for one-step PTP packets where a confirmation is available, the "pkts"
counter would be bumped.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250116104628.123555-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: support FW Recovery Mode
Konrad Knitter says:
Enable update of card in FW Recovery Mode
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: support FW Recovery Mode
devlink: add devl guard
pldmfw: enable selected component update
====================
Link: https://patch.msgid.link/20250116212059.1254349-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"Two last minute fixes: one build issue on TI soc drivers, and a
regression in the renesas reset controller driver"
* tag 'soc-fixes-6.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: ti: pruss: Fix pruss APIs
reset: rzg2l-usbphy-ctrl: Assign proper of node to the allocated device
|
|
Currently only stacked devices need to explicitly enable atomic writes by
setting BLK_FEAT_ATOMIC_WRITES_STACKED flag.
This does not work well for device mapper stacking devices, as there many
sets of limits are stacked and what is the 'bottom' and 'top' device can
swapped. This means that BLK_FEAT_ATOMIC_WRITES_STACKED needs to be set
for many queue limits, which is messy.
Generalize enabling atomic writes enabling by ensuring that all devices
must explicitly set a flag - that includes NVMe, SCSI sd, and md raid.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20250116170301.474130-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add device-managed variant of dev_pm_set_wake_irq which automatically
clear the wake irq on device destruction to simplify error handling
and resource management in drivers.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Link: https://patch.msgid.link/20250103-wake_irq-v2-1-e3aeff5e9966@nxp.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Introduce power budget management for the regulator device. Enable tracking
of available power capacity by providing helpers to request and release
power budget allocations.
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20250115-feature_regulator_pw_budget-v2-1-0a44b949e6bc@bootlin.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add kerneldoc and sysfs class documentation.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-19-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
platform_profile_handler is now an internal structure. Move it to
platform_profile.c.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-17-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
In order to protect the platform_profile_handler from API consumers,
allocate it in platform_profile_register() and modify it's signature
accordingly.
Remove the platform_profile_handler from all consumer drivers and
replace them with a pointer to the class device, which is
now returned from platform_profile_register().
Replace *pprof with a pointer to the class device in the rest of
exported symbols.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-16-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
of_alias_scan() has no external callers and returns void.
Do not expose it and delete return value descriptions in its comments.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250114-of_core_fix-v5-1-b8bafd00a86f@quicinc.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next
|
|
Add EEE management to phylink, making use of the phylib implementation.
This will only be used where a MAC driver populates the methods and
capabilities bitfield, otherwise we keep our old behaviour.
Phylink will keep track of the EEE configuration, including the clock
stop abilities at each end of the MAC to PHY link, programming the PHY
appropriately and preserving the LPI configuration should the PHY go
away.
Phylink will call into the MAC driver when LPI needs to be enabled or
disabled, with the requirement that the MAC have LPI disabled prior
to the netdev being brought up (in other words, it will only call
mac_disable_tx_lpi() if it has already called mac_enable_tx_lpi().)
Support for phylink managed EEE is enabled by populating both tx_lpi
MAC operations method pointers, and filling in both LPI interfaces
and capabilities. If the methods are provided but the LPI interfaces
or capabilities remain empty, this indicates to phylink that EEE is
implemented by the driver but the hardware it is driving does not
support EEE, and thus the ethtool set_eee() and get_eee() methods will
return EOPNOTSUPP.
No validation of the LPI timer value is performed by this patch.
For interface modes which do not support LPI, we make no attempt to
manipulate the phylib EEE advertisement, but instead refuse to
activate LPI at the MAC, noting it at debug message level.
We also restrict the advertisement and reported userspace support
linkmode masks according to the lpi_capabilities provided to
phylink by the MAC driver.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1tYADq-0014Pn-J1@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for querying whether the PHY allows the transmit xMII clock
to be stopped while in LPI mode. This will be used by phylink to pass
to the MAC driver so it can configure the generation of the xMII clock
appropriately.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1tYADg-0014Pb-AJ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch enables to update a selected component from PLDM image
containing multiple components.
Example usage:
struct pldmfw;
data.mode = PLDMFW_UPDATE_MODE_SINGLE_COMPONENT;
data.compontent_identifier = DRIVER_FW_MGMT_COMPONENT_ID;
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Konrad Knitter <konrad.knitter@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.13-rc8).
Conflicts:
drivers/net/ethernet/realtek/r8169_main.c
1f691a1fc4be ("r8169: remove redundant hwmon support")
152d00a91396 ("r8169: simplify setting hwmon attribute visibility")
https://lore.kernel.org/20250115122152.760b4e8d@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.c
152f4da05aee ("bnxt_en: add support for rx-copybreak ethtool command")
f0aa6a37a3db ("eth: bnxt: always recalculate features after XDP clearing, fix null-deref")
drivers/net/ethernet/intel/ice/ice_type.h
50327223a8bb ("ice: add lock to protect low latency interface")
dc26548d729e ("ice: Fix quad registers read on E825")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a `probe` callback to platform_profile_ops, which lets drivers
initialize the choices member manually. This is a step towards
unexposing the struct platform_profile_handler from the consumer
drivers.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-6-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Replace *profile_get and *profile_set members with a general *ops
member.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-5-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Devices can now set drvdata to the class device, thus passing the
platform_profile_handler to callbacks is unnecessary. Instead pass the
class device.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-4-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Add *drvdata to platform_profile_register() signature and assign it to
the class device.
While at it, pass specific driver state as drvdata to replace uses of
container_of() with dev_get_drvdata().
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-3-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Instead of holding a reference to the class device, embed it the
platform_profile_handler. This involves manually creating and
registering the device and replacing dev_get_drvdata() with the newly
created to_pprof_handler() macro.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kurt Borja <kuurtb@gmail.com>
Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca>
Link: https://lore.kernel.org/r/20250116002721.75592-2-kuurtb@gmail.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Consider a scenario where a CPU transitions from CPUHP_ONLINE to halfway
through a CPU hotunplug down to CPUHP_HRTIMERS_PREPARE, and then back to
CPUHP_ONLINE:
Since hrtimers_prepare_cpu() does not run, cpu_base.hres_active remains set
to 1 throughout. However, during a CPU unplug operation, the tick and the
clockevents are shut down at CPUHP_AP_TICK_DYING. On return to the online
state, for instance CFS incorrectly assumes that the hrtick is already
active, and the chance of the clockevent device to transition to oneshot
mode is also lost forever for the CPU, unless it goes back to a lower state
than CPUHP_HRTIMERS_PREPARE once.
This round-trip reveals another issue; cpu_base.online is not set to 1
after the transition, which appears as a WARN_ON_ONCE in enqueue_hrtimer().
Aside of that, the bulk of the per CPU state is not reset either, which
means there are dangling pointers in the worst case.
Address this by adding a corresponding startup() callback, which resets the
stale per CPU state and sets the online flag.
[ tglx: Make the new callback unconditionally available, remove the online
modification in the prepare() callback and clear the remaining
state in the starting callback instead of the prepare callback ]
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com
|
|
Add a helper to initialize the lockdep, that is initialize the spinlock
and set a value. Having to open code them isn't a big deal, but having
an initializer feels right for a proper primitive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250115094702.504610-6-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the superfluous externs from the remaining prototypes in lockref.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250115094702.504610-5-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Replace int used as bool with the actual bool type for return values that
can only be true or false.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250115094702.504610-4-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|