Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
- Rework inode number allocation
Recently we received a patchset that aims to enable file handle
encoding and decoding via name_to_handle_at(2) and
open_by_handle_at(2).
A crucical step in the patch series is how to go from inode number to
struct pid without leaking information into unprivileged contexts.
The issue is that in order to find a struct pid the pid number in the
initial pid namespace must be encoded into the file handle via
name_to_handle_at(2).
This can be used by containers using a separate pid namespace to
learn what the pid number of a given process in the initial pid
namespace is. While this is a weak information leak it could be used
in various exploits and in general is an ugly wart in the design.
To solve this problem a new way is needed to lookup a struct pid
based on the inode number allocated for that struct pid. The other
part is to remove the custom inode number allocation on 32bit systems
that is also an ugly wart that should go away.
Allocate unique identifiers for struct pid by simply incrementing a
64 bit counter and insert each struct pid into the rbtree so it can
be looked up to decode file handles avoiding to leak actual pids
across pid namespaces in file handles.
On both 64 bit and 32 bit the same 64 bit identifier is used to
lookup struct pid in the rbtree. On 64 bit the unique identifier for
struct pid simply becomes the inode number. Comparing two pidfds
continues to be as simple as comparing inode numbers.
On 32 bit the 64 bit number assigned to struct pid is split into two
32 bit numbers. The lower 32 bits are used as the inode number and
the upper 32 bits are used as the inode generation number. Whenever a
wraparound happens on 32 bit the 64 bit number will be incremented by
2 so inode numbering starts at 2 again.
When a wraparound happens on 32 bit multiple pidfds with the same
inode number are likely to exist. This isn't a problem since before
pidfs pidfds used the anonymous inode meaning all pidfds had the same
inode number. On 32 bit sserspace can thus reconstruct the 64 bit
identifier by retrieving both the inode number and the inode
generation number to compare, or use file handles. This gives the
same guarantees on both 32 bit and 64 bit.
- Implement file handle support
This is based on custom export operation methods which allows pidfs
to implement permission checking and opening of pidfs file handles
cleanly without hacking around in the core file handle code too much.
- Support bind-mounts
Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts
for pidfds. This allows pidfds to be safely recovered and checked for
process recycling.
Instead of checking d_ops for both nsfs and pidfs we could in a
follow-up patch add a flag argument to struct dentry_operations that
functions similar to file_operations->fop_flags.
* tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add pidfd bind-mount tests
pidfs: allow bind-mounts
pidfs: lookup pid through rbtree
selftests/pidfd: add pidfs file handle selftests
pidfs: check for valid ioctl commands
pidfs: implement file handle support
exportfs: add permission method
fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh()
exportfs: add open method
fhandle: simplify error handling
pseudofs: add support for export_ops
pidfs: support FS_IOC_GETVERSION
pidfs: remove 32bit inode number handling
pidfs: rework inode number allocation
|
|
Yonghong Song says:
====================
Emil Tsalapatis from Meta reported such a case where 'may_goto 0' insn is
generated by clang-19 compiler and this caused verification failure
since 'may_goto 0' is rejected by verifier.
In fact, 'may_goto 0' insn is actually a no-op and it won't hurt
verification. The only side effect is that the verifier will convert
the insn to a sequence of codes like
/* r10 - 8 stores the implicit loop count */
r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto pc+2
r11 -= 1
*(u64 *)(r10 -8) = r11
With this patch set 'may_goto 0' insns are allowed in verification which
also removes those insns.
Changelogs:
v1 -> v2:
- Instead of a separate function, removing 'may_goto 0' in existing
func opt_remove_nops().
====================
Link: https://patch.msgid.link/20250118192019.2123689-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add both asm-based and C-based tests which have 'may_goto 0' insns.
For the following code in C-based test,
int i, tmp[3];
for (i = 0; i < 3 && can_loop; i++)
tmp[i] = 0;
The clang compiler (clang 19 and 20) generates
may_goto 2
may_goto 1
may_goto 0
r1 = 0
r2 = 0
r3 = 0
The above asm codes are due to llvm pass SROAPass. This ensures the
successful verification since tmp[0-2] are initialized. Otherwise,
the code without SROAPass like
may_goto 5
r1 = 0
may_goto 3
r2 = 0
may_goto 1
r3 = 0
will have verification failure.
Although from the source code C-based test should have verification
failure, clang compiler optimization generates code with successful
verification. If gcc generates different asm codes than clang, the
following code can be used for gcc:
int i, tmp[3];
for (i = 0; i < 3; i++)
tmp[i] = 0;
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250118192034.2124952-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since 'may_goto 0' insns are actually no-op, let us remove them.
Otherwise, verifier will generate code like
/* r10 - 8 stores the implicit loop count */
r11 = *(u64 *)(r10 -8)
if r11 == 0x0 goto pc+2
r11 -= 1
*(u64 *)(r10 -8) = r11
which is the pure overhead.
The following code patterns (from the previous commit) are also
handled:
may_goto 2
may_goto 1
may_goto 0
With this commit, the above three 'may_goto' insns are all
eliminated.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250118192029.2124584-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit 011832b97b31 ("bpf: Introduce may_goto instruction") added support
for may_goto insn. The 'may_goto 0' insn is disallowed since the insn is
equivalent to a nop as both branch will go to the next insn.
But it is possible that compiler transformation may generate 'may_goto 0'
insn. Emil Tsalapatis from Meta reported such a case which caused
verification failure. For example, for the following code,
int i, tmp[3];
for (i = 0; i < 3 && can_loop; i++)
tmp[i] = 0;
...
clang 20 may generate code like
may_goto 2;
may_goto 1;
may_goto 0;
r1 = 0; /* tmp[0] = 0; */
r2 = 0; /* tmp[1] = 0; */
r3 = 0; /* tmp[2] = 0; */
Let us permit 'may_goto 0' insn to avoid verification failure for codes
like the above.
Reported-by: Emil Tsalapatis <etsal@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250118192024.2124059-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support caching symlink lengths in inodes
The size is stored in a new union utilizing the same space as
i_devices, thus avoiding growing the struct or taking up any more
space
When utilized it dodges strlen() in vfs_readlink(), giving about
1.5% speed up when issuing readlink on /initrd.img on ext4
- Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
If a file system supports uncached buffered IO, it may set
FOP_DONTCACHE and enable support for RWF_DONTCACHE.
If RWF_DONTCACHE is attempted without the file system supporting
it, it'll get errored with -EOPNOTSUPP
- Enable VBOXGUEST and VBOXSF_FS on ARM64
Now that VirtualBox is able to run as a host on arm64 (e.g. the
Apple M3 processors) we can enable VBOXSF_FS (and in turn
VBOXGUEST) for this architecture.
Tested with various runs of bonnie++ and dbench on an Apple MacBook
Pro with the latest Virtualbox 7.1.4 r165100 installed
Cleanups:
- Delay sysctl_nr_open check in expand_files()
- Use kernel-doc includes in fiemap docbook
- Use page->private instead of page->index in watch_queue
- Use a consume fence in mnt_idmap() as it's heavily used in
link_path_walk()
- Replace magic number 7 with ARRAY_SIZE() in fc_log
- Sort out a stale comment about races between fd alloc and dup2()
- Fix return type of do_mount() from long to int
- Various cosmetic cleanups for the lockref code
Fixes:
- Annotate spinning as unlikely() in __read_seqcount_begin
The annotation already used to be there, but got lost in commit
52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as
statement expressions")
- Fix proc_handler for sysctl_nr_open
- Flush delayed work in delayed fput()
- Fix grammar and spelling in propagate_umount()
- Fix ESP not readable during coredump
In /proc/PID/stat, there is the kstkesp field which is the stack
pointer of a thread. While the thread is active, this field reads
zero. But during a coredump, it should have a valid value
However, at the moment, kstkesp is zero even during coredump
- Don't wake up the writer if the pipe is still full
- Fix unbalanced user_access_end() in select code"
* tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
gfs2: use lockref_init for qd_lockref
erofs: use lockref_init for pcl->lockref
dcache: use lockref_init for d_lockref
lockref: add a lockref_init helper
lockref: drop superfluous externs
lockref: use bool for false/true returns
lockref: improve the lockref_get_not_zero description
lockref: remove lockref_put_not_zero
fs: Fix return type of do_mount() from long to int
select: Fix unbalanced user_access_end()
vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64
pipe_read: don't wake up the writer if the pipe is still full
selftests: coredump: Add stackdump test
fs/proc: do_task_stat: Fix ESP not readable during coredump
fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag
fs: sort out a stale comment about races between fd alloc and dup2
fs: Fix grammar and spelling in propagate_umount()
fs: fc_log replace magic number 7 with ARRAY_SIZE()
fs: use a consume fence in mnt_idmap()
file: flush delayed work in delayed fput()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull /proc/kcore updates from Christian Brauner:
"The performance of /proc/kcore reads has been showing up as a
bottleneck for the drgn debugger. drgn scripts often spend ~25% of
their time in the kernel reading from /proc/kcore.
A lot of this overhead comes from silly inefficiencies. This pull
request contains fixes for the low-hanging fruit. The fixes are all
fairly small and straightforward.
The result is a 25% improvement in read latency in micro-benchmarks
(from ~235 nanoseconds to ~175) and a 15% improvement in execution
time for real-world drgn scripts:
- Make /proc/kcore entry permanent
- Avoid walking the list on every read
- Use percpu_rw_semaphore for kclist_lock
- Make Omar Sandoval the official maintainer for /proc/kcore"
* tag 'vfs-6.14-rc1.kcore' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
MAINTAINERS: add me as /proc/kcore maintainer
proc/kcore: use percpu_rw_semaphore for kclist_lock
proc/kcore: don't walk list on every read
proc/kcore: mark proc entry as permanent
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs netfs updates from Christian Brauner:
"This contains read performance improvements and support for monolithic
single-blob objects that have to be read/written as such (e.g. AFS
directory contents). The implementation of the two parts is interwoven
as each makes the other possible.
- Read performance improvements
The read performance improvements are intended to speed up some
loss of performance detected in cifs and to a lesser extend in afs.
The problem is that we queue too many work items during the
collection of read results: each individual subrequest is collected
by its own work item, and then they have to interact with each
other when a series of subrequests don't exactly align with the
pattern of folios that are being read by the overall request.
Whilst the processing of the pages covered by individual
subrequests as they complete potentially allows folios to be woken
in parallel and with minimum delay, it can shuffle wakeups for
sequential reads out of order - and that is the most common I/O
pattern.
The final assessment and cleanup of an operation is then held up
until the last I/O completes - and for a synchronous sequential
operation, this means the bouncing around of work items just adds
latency.
Two changes have been made to make this work:
(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and
also dispatches retries as necessary).
(2) For readahead and AIO, this work item be done on a workqueue
and can run in parallel with the ultimate consumer of the data;
for synchronous direct or unbuffered reads, the collection is
run in the application thread and not offloaded.
Functions such as smb2_readv_callback() then just tell netfslib
that the subrequest has terminated; netfslib does a minimal bit of
processing on the spot - stat counting and tracing mostly - and
then queues/wakes up the worker. This simplifies the logic as the
collector just walks sequentially through the subrequests as they
complete and walks through the folios, if buffered, unlocking them
as it goes. It also keeps to a minimum the amount of latency
injected into the filesystem's low-level I/O handling
The way netfs supports filesystems using the deprecated
PG_private_2 flag is changed: folios are flagged and added to a
write request as they complete and that takes care of scheduling
the writes to the cache. The originating read request can then just
unlock the pages whatever happens.
- Single-blob object support
Single-blob objects are files for which the content of the file
must be read from or written to the server in a single operation
because reading them in parts may yield inconsistent results. AFS
directories are an example of this as there exists the possibility
that the contents are generated on the fly and would differ between
reads or might change due to third party interference.
Such objects will be written to and retrieved from the cache if one
is present, though we allow/may need to propose multiple
subrequests to do so. The important part is that read from/write to
the *server* is monolithic.
Single blob reading is, for the moment, fully synchronous and does
result collection in the application thread and, also for the
moment, the API is supplied the buffer in the form of a folio_queue
chain rather than using the pagecache.
- Related afs changes
This series makes a number of changes to the kafs filesystem,
primarily in the area of directory handling:
- AFS's FetchData RPC reply processing is made partially
asynchronous which allows the netfs_io_request's outstanding
operation counter to be removed as part of reducing the
collection to a single work item.
- Directory and symlink reading are plumbed through netfslib using
the single-blob object API and are now cacheable with fscache.
This also allows the afs_read struct to be eliminated and
netfs_io_subrequest to be used directly instead.
- Directory and symlink content are now stored in a folio_queue
buffer rather than in the pagecache. This means we don't require
the RCU read lock and xarray iteration to access it, and folios
won't randomly disappear under us because the VM wants them
back.
- The vnode operation lock is changed from a mutex struct to a
private lock implementation. The problem is that the lock now
needs to be dropped in a separate thread and mutexes don't
permit that.
- When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know
what it's likely to look like).
- We now use the in-directory hashtable to reduce the number of
entries we need to scan when doing a lookup. The edit routines
have to maintain the hash chains.
- Cancellation (e.g. by signal) of an async call after the
rxrpc_call has been set up is now offloaded to the worker thread
as there will be a notification from rxrpc upon completion. This
avoids a double cleanup.
- A "rolling buffer" implementation is created to abstract out the
two separate folio_queue chaining implementations I had (one for
read and one for write).
- Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again.
This is used to handle AFS directories, but could also be used to
create bounce buffers for content crypto and transport crypto.
- The was_async argument is dropped from netfs_read_subreq_terminated()
Instead we wake the read collection work item by either queuing it
or waking up the app thread.
- We don't need to use BH-excluding locks when communicating between
the issuing thread and the collection thread as neither of them now
run in BH context.
- Also included are a number of new tracepoints; a split of the
netfslib write collection code to put retrying into its own file
(it gets more complicated with content encryption).
- There are also some minor fixes AFS included, including fixing the
AFS directory format struct layout, reducing some directory
over-invalidation and making afs_mkdir() translate EEXIST to
ENOTEMPY (which is not available on all systems the servers
support).
- Finally, there's a patch to try and detect entry into the folio
unlock function with no folio_queue structs in the buffer (which
isn't allowed in the cases that can get there).
This is a debugging patch, but should be minimal overhead"
* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...
|
|
Hou Tao says:
====================
The patch set continues the previous work [1] to move all the freeings
of htab elements out of bucket lock. One motivation for the patch set is
the locking problem reported by Sebastian [2]: the freeing of bpf_timer
under PREEMPT_RT may acquire a spin-lock (namely softirq_expiry_lock).
However the freeing procedure for htab element has already held a
raw-spin-lock (namely bucket lock), and it will trigger the warning:
"BUG: scheduling while atomic" as demonstrated by the selftests patch.
Another motivation is to reduce the locked scope of bucket lock.
However, the patch set doesn't move all freeing of htab element out of
bucket lock, it still keep the free of special fields in pre-allocated
hash map under the protect of bucket lock in htab_map_update_elem(). The
patch set is structured as follows:
* Patch #1 moves the element freeing out of bucket lock for
htab_lru_map_delete_node(). However the freeing is still in the locked
scope of LRU raw spin lock.
* Patch #2~#3 move the element freeing out of bucket lock for
__htab_map_lookup_and_delete_elem()
* Patch #4 cancels the bpf_timer in two steps to fix the locking
problem in htab_map_update_elem() for PREEMPT_PRT.
* Patch #5 adds a selftest for the locking problem
Please see individual patches for more details. Comments are always
welcome.
---
v3:
* patch #1: update the commit message to state that the freeing of
special field is still in the locked scope of LRU raw spin lock
* patch #4: cancel the bpf_timer in two steps only for PREEMPT_RT
(suggested by Alexei)
v2: https://lore.kernel.org/bpf/20250109061901.2620825-1-houtao@huaweicloud.com
* cancels the bpf timer in two steps instead of breaking the reuse
the refill of per-cpu ->extra_elems into two steps
v1: https://lore.kernel.org/bpf/20250107085559.3081563-1-houtao@huaweicloud.com
[1]: https://lore.kernel.org/bpf/20241106063542.357743-1-houtao@huaweicloud.com
[2]: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
====================
Link: https://patch.msgid.link/20250117101816.2101857-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The main purpose of the test is to demonstrate the lock problem for the
free of bpf_timer under PREEMPT_RT. When freeing a bpf_timer which is
running on other CPU in bpf_timer_cancel_and_free(), hrtimer_cancel()
will try to acquire a spin-lock (namely softirq_expiry_lock), however
the freeing procedure has already held a raw-spin-lock.
The test first creates two threads: one to start timers and the other to
free timers. The start-timers thread will start the timer and then wake
up the free-timers thread to free these timers when the starts complete.
After freeing, the free-timer thread will wake up the start-timer thread
to complete the current iteration. A loop of 10 iterations is used.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
During the update procedure, when overwrite element in a pre-allocated
htab, the freeing of old_element is protected by the bucket lock. The
reason why the bucket lock is necessary is that the old_element has
already been stashed in htab->extra_elems after alloc_htab_elem()
returns. If freeing the old_element after the bucket lock is unlocked,
the stashed element may be reused by concurrent update procedure and the
freeing of old_element will run concurrently with the reuse of the
old_element. However, the invocation of check_and_free_fields() may
acquire a spin-lock which violates the lockdep rule because its caller
has already held a raw-spin-lock (bucket lock). The following warning
will be reported when such race happens:
BUG: scheduling while atomic: test_progs/676/0x00000003
3 locks held by test_progs/676:
#0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
#1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
#2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
Modules linked in: bpf_testmod(O)
Preemption disabled at:
[<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
Tainted: [W]=WARN, [O]=OOT_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x70
dump_stack+0x10/0x20
__schedule_bug+0x120/0x170
__schedule+0x300c/0x4800
schedule_rtlock+0x37/0x60
rtlock_slowlock_locked+0x6d9/0x54c0
rt_spin_lock+0x168/0x230
hrtimer_cancel_wait_running+0xe9/0x1b0
hrtimer_cancel+0x24/0x30
bpf_timer_delete_work+0x1d/0x40
bpf_timer_cancel_and_free+0x5e/0x80
bpf_obj_free_fields+0x262/0x4a0
check_and_free_fields+0x1d0/0x280
htab_map_update_elem+0x7fc/0x1500
bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
bpf_prog_test_run_syscall+0x322/0x830
__sys_bpf+0x135d/0x3ca0
__x64_sys_bpf+0x75/0xb0
x64_sys_call+0x1b5/0xa10
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
...
</TASK>
It seems feasible to break the reuse and refill of per-cpu extra_elems
into two independent parts: reuse the per-cpu extra_elems with bucket
lock being held and refill the old_element as per-cpu extra_elems after
the bucket lock is unlocked. However, it will make the concurrent
overwrite procedures on the same CPU return unexpected -E2BIG error when
the map is full.
Therefore, the patch fixes the lock problem by breaking the cancelling
of bpf_timer into two steps for PREEMPT_RT:
1) use hrtimer_try_to_cancel() and check its return value
2) if the timer is running, use hrtimer_cancel() through a kworker to
cancel it again
Considering that the current implementation of hrtimer_cancel() will try
to acquire a being held softirq_expiry_lock when the current timer is
running, these steps above are reasonable. However, it also has
downside. When the timer is running, the cancelling of the timer is
delayed when releasing the last map uref. The delay is also fixable
(e.g., break the cancelling of bpf timer into two parts: one part in
locked scope, another one in unlocked scope), it can be revised later if
necessary.
It is a bit hard to decide the right fix tag. One reason is that the
problem depends on PREEMPT_RT which is enabled in v6.12. Considering the
softirq_expiry_lock lock exists since v5.4 and bpf_timer is introduced
in v5.15, the bpf_timer commit is used in the fixes tag and an extra
depends-on tag is added to state the dependency on PREEMPT_RT.
Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.")
Depends-on: v6.12+ with PREEMPT_RT enabled
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The freeing of special fields in map value may acquire a spin-lock
(e.g., the freeing of bpf_timer), however, the lookup_and_delete_elem
procedure has already held a raw-spin-lock, which violates the lockdep
rule.
The running context of __htab_map_lookup_and_delete_elem() has already
disabled the migration. Therefore, it is OK to invoke free_htab_elem()
after unlocking the bucket lock.
Fix the potential problem by freeing element after unlocking bucket lock
in __htab_map_lookup_and_delete_elem().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250117101816.2101857-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Use goto statement to bail out early when the target element is not
found, instead of using a large else branch to handle the more likely
case. This change doesn't affect functionality and simply make the code
cleaner.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When bpf_timer is used in LRU hash map, calling check_and_free_fields()
in htab_lru_map_delete_node() will invoke bpf_timer_cancel_and_free() to
free the bpf_timer. If the timer is running on other CPUs,
hrtimer_cancel() will invoke hrtimer_cancel_wait_running() to spin on
current CPU to wait for the completion of the hrtimer callback.
Considering that the deletion has already acquired a raw-spin-lock
(bucket lock). To reduce the time holding the bucket lock, move the
invocation of check_and_free_fields() out of bucket lock. However,
because htab_lru_map_delete_node() is invoked with LRU raw spin lock
being held, the freeing of special fields still happens in a locked
scope.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Link: https://lore.kernel.org/r/20250117101816.2101857-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Merge ACPI battery and fan drivers updates and miscellaneous ACPI
chanages for 6.14:
- Update messages printed by the ACPI battery driver to always
refer to driver extensions as "hooks" to avoid confusion with
similar functionality in the power supply subsystem in the
future (Thomas Weißschuh).
- Fix .probe() error path cleanup in the ACPI fan driver to avoid
memory leaks (Joe Hattori).
- Constify 'struct bin_attribute' in some places in the ACPI subsystem
and mark it as __ro_after_init in one place to prevent binary blob
attributes from being updated (Thomas Weißschuh)
- Add empty stubs for several ACPI-related symbols so that they can be
used when CONFIG_ACPI is unset and use them for removing unnecessary
conditional compilation from the ipu-bridge driver (Ricardo Ribalda).
* acpi-battery:
ACPI: battery: Rename extensions to hook in messages
* acpi-fan:
ACPI: fan: cleanup resources in the error path of .probe()
* acpi-misc:
media: ipu-bridge: Remove unneeded conditional compilations
ACPI: bus: implement acpi_device_hid when !ACPI
ACPI: bus: implement for_each_acpi_consumer_dev when !ACPI
ACPI: header: implement acpi_device_handle when !ACPI
ACPI: bus: implement acpi_get_physical_device_location when !ACPI
ACPI: bus: implement for_each_acpi_dev_match when !ACPI
ACPI: bus: change the prototype for acpi_get_physical_device_location
ACPI: sysfs: Constify 'struct bin_attribute'
ACPI: BGRT: Constify 'struct bin_attribute'
ACPI: BGRT: Mark bin_attribute as __ro_after_init
|
|
This was a suggestion by David Laight, and while I was slightly worried
that some micro-architecture would predict cmov like a conditional
branch, there is little reason to actually believe any core would be
that broken.
Intel documents that their existing cores treat CMOVcc as a data
dependency that will constrain speculation in their "Speculative
Execution Side Channel Mitigations" whitepaper:
"Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also
be used to prevent bounds check bypass by constraining speculative
execution on current family 6 processors (Intel® Core™, Intel® Atom™,
Intel® Xeon® and Intel® Xeon Phi™ processors)"
and while that leaves the future uarch issues open, that's certainly
true of our traditional SBB usage too.
Any core that predicts CMOV will be unusable for various crypto
algorithms that need data-independent timing stability, so let's just
treat CMOV as the safe choice that simplifies the address masking by
avoiding an extra instruction and doesn't need a temporary register.
Suggested-by: David Laight <David.Laight@aculab.com>
Link: https://www.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
'acpi-apei'
Merge assorted changes in ACPI library code for 6.14:
- Use usleep_range() instead of msleep() in acpi_os_sleep() to reduce
excessive delays due to timer inaccuracy, mostly affecting system
suspend and resume (Rafael Wysocki).
- Use str_enabled_disabled() string helpers in the ACPI tables parsing
code to make it easier to follow (Sunil V L).
- Update device properties parsing on systems using ACPI so that
data firmware nodes resulting from _DSD evaluation are treated as
available in firmware nodes walks (Sakari Ailus).
- Fix missing guid_t declaration in linux/prmt.h (Robert Richter).
- Update the GHES handling code to follow the global panic= instead of
overriding it by force-rebooting the system after a fatal hw error
has been reported (Borislav Petkov).
* acpi-osl:
ACPI: OSL: Use usleep_range() in acpi_os_sleep()
* acpi-tables:
ACPI: tables: Use string choice helpers
* acpi-property:
ACPI: property: Consider data nodes as being available
* acpi-prm:
ACPI: PRM: Fix missing guid_t declaration in linux/prmt.h
* acpi-apei:
APEI: GHES: Have GHES honor the panic= setting
|
|
Back when we added SMAP support, all versions of binutils didn't
necessarily understand the 'clac' and 'stac' instructions. So we
implemented those instructions manually as ".byte" sequences.
But we've since upgraded the minimum version of binutils to version
2.25, and that included proper support for the SMAP instructions, and
there's no reason for us to use some line noise to express them any
more.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The fault->mutex serializes the fault read()/write() fops and the
iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it
was conveniently used to fence the fault->deliver in poll() fop and
iommufd_fault_iopf_handler().
However, copy_from/to_user() may sleep if pagefaults are enabled. Thus,
they could take a long time to wait for user pages to swap in, blocking
iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ
handler of an IOMMU driver, resulting in a potential global DOS.
Instead of reusing the mutex to protect the fault->deliver list, add a
separate spinlock, nested under the mutex, to do the job.
iommufd_fault_iopf_handler() would no longer be blocked by
copy_from/to_user().
Add a free_list in iommufd_auto_response_faults(), so the spinlock can
simply fence a fast list_for_each_entry_safe routine.
Provide two deliver list helpers for iommufd_fault_fops_read() to use:
- Fetch the first iopf_group out of the fault->deliver list
- Restore an iopf_group back to the head of the fault->deliver list
Lastly, move the mutex closer to the response in the fault structure,
and update its kdoc accordingly.
Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object")
Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v6.14
This was quite a quiet release for what I imagine are holiday related
reasons, the diffstat is dominated by some Cirrus Logic Kunit tests.
There's the usual mix of small improvements and fixes, plus a few new
drivers and features. The diffstat includes some DRM changes due to
work on HDMI audio.
- Allow clocking on each DAI in an audio graph card to be configured
separately.
- Improved power management for Renesas RZ-SSI.
- KUnit testing for the Cirrus DSP framework.
- Memory to meory operation support for Freescale/NXP platforms.
- Support for pause operations in SOF.
- Support for Allwinner suinv F1C100s, Awinc AW88083, Realtek
ALC5682I-VE
|
|
tasdevice_spi_switch_book()
Clang warns (or errors with CONFIG_WERROR=y):
sound/pci/hda/tas2781_hda_spi.c:110:6: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
110 | if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/pci/hda/tas2781_hda_spi.c:119:9: note: uninitialized use occurs here
119 | return ret;
| ^~~
sound/pci/hda/tas2781_hda_spi.c:110:2: note: remove the 'if' if its condition is always true
110 | if (tas_priv->cur_book != TASDEVICE_BOOK_ID(reg)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/pci/hda/tas2781_hda_spi.c:108:9: note: initialize the variable 'ret' to silence this warning
108 | int ret;
| ^
| = 0
Sink the declaration of ret into the if block and just return 0 at the
end of the function, as there is nothing to do if cur_book has already
been changed.
Fixes: bb5f86ea50ff ("ALSA: hda/tas2781: Add tas2781 hda SPI driver")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501192006.Hm9GmKiV-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20250120-tas2781_hda_spi-fix-wsometimes-uninitialized-v1-1-d7fd104aa63e@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
|
|
|
|
HWMON support has been added for the RTL8221/8251 PHYs integrated together
with the MAC inside the RTL8125/8126 chips. This patch extends temperature
reading support for standalone variants of the mentioned PHYs.
I don't know whether the earlier revisions of the RTL8226 also have a
built-in temperature sensor, so they have been skipped for now.
Tested on RTL8221B-VB-CG.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I noticed that HyStart incorrectly marks the start of rounds,
leading to inaccurate measurements of ACK train lengths and
resetting the `ca->sample_cnt` variable. This inaccuracy can impact
HyStart's functionality in terminating exponential cwnd growth during
Slow-Start, potentially degrading TCP performance.
The issue arises because the changes introduced in commit 4e1fddc98d25
("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
moved the caller of the `bictcp_hystart_reset` function inside the `hystart_update` function.
This modification added an additional condition for triggering the caller,
requiring that (tcp_snd_cwnd(tp) >= hystart_low_window) must also
be satisfied before invoking `bictcp_hystart_reset`.
This fix ensures that `bictcp_hystart_reset` is correctly called
at the start of a new round, regardless of the congestion window size.
This is achieved by moving the condition
(tcp_snd_cwnd(tp) >= hystart_low_window)
from before calling `bictcp_hystart_reset` to after it.
I tested with a client and a server connected through two Linux software routers.
In this setup, the minimum RTT was 150 ms, the bottleneck bandwidth was 50 Mbps,
and the bottleneck buffer size was 1 BDP, calculated as (50M / 1514 / 8) * 0.150 = 619 packets.
I conducted the test twice, transferring data from the server to the client for 1.5 seconds.
Before the patch was applied, HYSTART-DELAY stopped the exponential growth of cwnd when
cwnd = 516, and the bottleneck link was not yet saturated (516 < 619).
After the patch was applied, HYSTART-ACK-TRAIN stopped the exponential growth of cwnd when
cwnd = 632, and the bottleneck link was saturated (632 > 619).
In this test, applying the patch resulted in 300 KB more data delivered.
Fixes: 4e1fddc98d25 ("tcp_cubic: fix spurious Hystart ACK train detections for not-cwnd-limited flows")
Signed-off-by: Mahdi Arghavani <ma.arghavani@yahoo.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Haibo Zhang <haibo.zhang@otago.ac.nz>
Cc: David Eyers <david.eyers@otago.ac.nz>
Cc: Abbas Arghavani <abbas.arghavani@mdu.se>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change resolves warning produced by sparse tool as currently
there is a mismatch between normal generic type in salt and endian
annotated type in macsec driver code. Endian annotated types should
be used here.
Sparse output:
warning: restricted ssci_t degrades to integer
warning: incorrect type in assignment (different base types)
expected restricted ssci_t [usertype] ssci
got unsigned int
warning: restricted __be64 degrades to integer
warning: incorrect type in assignment (different base types)
expected restricted __be64 [usertype] pn
got unsigned long long
Signed-off-by: Ales Nezbeda <anezbeda@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On a 32bit system the "keylen + sizeof(struct tipc_aead_key)" math could
have an integer wrapping issue. It doesn't matter because the "keylen"
is checked on the next line, but just to make life easier for static
analysis tools, let's re-order these conditions and avoid the integer
overflow.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Background: https://lore.kernel.org/r/20250107123615.161095-1-ericwouds@gmail.com
Since adding negotiation of in-band capabilities, it is no longer
sufficient to just look at the MLO_AN_xxx mode and PHY interface to
decide whether to do a major configuration, since the result now
depends on the capabilities of the attaching PHY.
Always trigger a major configuration in this case.
Testing log: https://lore.kernel.org/r/f20c9744-3953-40e7-a9c9-5534b25d2e2a@gmail.com
Reported-by: Eric Woudstra <ericwouds@gmail.com>
Tested-by: Eric Woudstra <ericwouds@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On machines that do not support the balanced profile the value of
last_non_turbo_profile is invalid after initialization which might
cause the driver to switch to an unsupported platform profile later.
Fix this by only setting last_non_turbo_profile to supported platform
profile values.
Fixes: 191e21f1a4c3 ("platform/x86: acer-wmi: use an ACPI bitmap to set the platform profile choices")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Tested-by: Hridesh MG <hridesh699@gmail.com>
Reviewed-by: Hridesh MG <hridesh699@gmail.com>
Link: https://lore.kernel.org/r/20250119201723.11102-3-W_Armin@gmx.de
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
On the Acer Swift SFG14-41, the events 8 - 1 and 8 - 0 are printed on
AC connect/disconnect. Ignore those events to avoid spamming the
kernel log with error messages.
Reported-by: Farhan Anwar <farhan.anwar8@gmail.com>
Closes: https://lore.kernel.org/platform-driver-x86/2ffb529d-e7c8-4026-a3b8-120c8e7afec8@gmail.com
Tested-by: Rayan Margham <rayanmargham4@gmail.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20250119201723.11102-2-W_Armin@gmx.de
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Fix build warnings reported from linux-next.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20250120192504.4a1965a0@canb.auug.org.au
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Follow the advice in Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at() when formatting
the value to be returned to user space.
Signed-off-by: Ai Chao <aichao@kylinos.cn>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250116081129.2902274-1-aichao@kylinos.cn
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Follow the advice in Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at() when formatting
the value to be returned to user space.
Signed-off-by: Ai Chao <aichao@kylinos.cn>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250116081000.2900435-1-aichao@kylinos.cn
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Follow the advice in Documentation/filesystems/sysfs.rst:
show() should only use sysfs_emit() or sysfs_emit_at() when formatting
the value to be returned to user space.
Signed-off-by: Ai Chao <aichao@kylinos.cn>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20250116080836.2890442-1-aichao@kylinos.cn
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
The following patch adds support for HP Victus 16-s1000 laptop series,
by adding and fixing the following functionalities, which can be
accessed through hwmon and platform_profile sysfs:
- Functional measured fan speed reading
- Ability to enable and disable maximum fan speed
- Platform profiles full setting ability for CPU and GPU
It sets appropriates CPU and GPU power settings both on AC and battery
power sources, for low-power, balanced and performance modes.
It has been thoroughly tested on a 16-s1034nf laptop based on a 8C9C DMI
board name, and behavior of the driver on previous boards is left
untouched thanks to the separated lists of DMI board names.
Signed-off-by: Julien ROBIN <julien.robin28@free.fr>
Link: https://lore.kernel.org/r/1c00f906-8500-41d5-be80-f9092b6a49f1@free.fr
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Merge updates of Intel thermal drivers for 6.14:
- Add support for Panther Lake processors in multiple places (Zhang
Rui, Srinivas Pandruvada).
- Remove explicit user_space governor selection from Intel thermal
drivers (Srinivas Pandruvada).
* thermal-intel:
thermal: intel: Fix compile issue when CONFIG_NET is not defined
thermal: intel: int340x: Panther Lake power floor and workload hint support
thermal: intel: int340x: Panther Lake DLVR support
thermal: intel: Remove explicit user_space governor selection
ACPI: DPTF: Support Panther Lake
thermal: intel: int340x: processor: Enable MMIO RAPL for Panther Lake
powercap: intel_rapl: Add support for Panther Lake platform
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge ARM cpufreq updates for 6.14 from Viresh Kumar:
"- Extended support for more SoCs in apple cpufreq driver (Hector Martin
and Nick Chan).
- Add new cpufreq driver for Airoha SoCs (Christian Marangi).
- Fix using cpufreq-dt as module (Andreas Kemnade).
- Minor fixes for Sparc, scmi, and Qcom drivers (Ethan Carter Edwards,
Sibi Sankar and Manivannan Sadhasivam)."
* tag 'cpufreq-arm-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
cpufreq: sparc: change kzalloc to kcalloc
cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
cpufreq: apple-soc: Increase cluster switch timeout to 400us
cpufreq: apple-soc: Use 32-bit read for status register
cpufreq: apple-soc: Allow per-SoC configuration of APPLE_DVFS_CMD_PS1
cpufreq: apple-soc: Drop setting the PS2 field on M2+
dt-bindings: cpufreq: apple,cluster-cpufreq: Add A7-A11, T2 compatibles
dt-bindings: cpufreq: Document support for Airoha EN7581 CPUFreq
cpufreq: fix using cpufreq-dt as module
cpufreq: scmi: Register for limit change notifications
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge OPP (Operating Performance Points) updates for 6.14 from Viresh
Kumar:
"- Minor cleanups / fixes (Dan Carpenter, Neil Armstrong, and Joe
Hattori).
- Implement dev_pm_opp_get_bw (Neil Armstrong).
- Expose reference counting helpers (Viresh Kumar)."
* tag 'opp-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
PM / OPP: Add reference counting helpers for Rust implementation
OPP: OF: Fix an OF node leak in _opp_add_static_v2()
OPP: fix dev_pm_opp_find_bw_*() when bandwidth table not initialized
OPP: add index check to assert to avoid buffer overflow in _read_freq()
opp: core: Fix off by one in dev_pm_opp_get_bw()
opp: core: implement dev_pm_opp_get_bw
|
|
Share some infrastructure between sample programs and fix a build
failure that was reported.
Reported-by: Sasha Levin <sashal@kernel.org>
Link: https://lore.kernel.org/r/Z42UkSXx0MS9qZ9w@lappy
Link: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.13-rc7-511-g109a8e0fa9d6/testrun/26809210/suite/build/test/gcc-8-allyesconfig/log
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
aarp_send_probe_phase1() used to work by calling ndo_do_ioctl of
appletalk drivers ltpc or cops, but these two drivers have been removed
since the following commits:
commit 03dcb90dbf62 ("net: appletalk: remove Apple/Farallon LocalTalk PC
support")
commit 00f3696f7555 ("net: appletalk: remove cops support")
Thus aarp_send_probe_phase1() no longer works, so drop it. (found by
code inspection)
Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When getting the IRQ we use k3_udma_glue_tx_get_irq() which returns
negative error value on error. So not NULL check is not sufficient
to deteremine if IRQ is valid. Check that IRQ is greater then zero
to ensure it is valid.
There is no issue at probe time but at runtime user can invoke
.set_channels which results in the following call chain.
am65_cpsw_set_channels()
am65_cpsw_nuss_update_tx_rx_chns()
am65_cpsw_nuss_remove_tx_chns()
am65_cpsw_nuss_init_tx_chns()
At this point if am65_cpsw_nuss_init_tx_chns() fails due to
k3_udma_glue_tx_get_irq() then tx_chn->irq will be set to a
negative value.
Then, at subsequent .set_channels with higher channel count we
will attempt to free an invalid IRQ in am65_cpsw_nuss_remove_tx_chns()
leading to a kernel warning.
The issue is present in the original commit that introduced this driver,
although there, am65_cpsw_nuss_update_tx_rx_chns() existed as
am65_cpsw_nuss_update_tx_chns().
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace ternary (condition ? "enable" : "disable") syntax with helpers
from string_choices.h because:
1. Simple function call with one argument is easier to read. Ternary
operator has three arguments and with wrapping might lead to quite
long code.
2. Is slightly shorter thus also easier to read.
3. It brings uniformity in the text - same string.
4. Allows deduping by the linker, which results in a smaller binary
file.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch addresses issues with filter counting in block (tcf_block),
particularly for software bypass scenarios, by introducing a more
accurate mechanism using useswcnt.
Previously, filtercnt and skipswcnt were introduced by:
Commit 2081fd3445fe ("net: sched: cls_api: add filter counter") and
Commit f631ef39d819 ("net: sched: cls_api: add skip_sw counter")
filtercnt tracked all tp (tcf_proto) objects added to a block, and
skipswcnt counted tp objects with the skipsw attribute set.
The problem is: a single tp can contain multiple filters, some with skipsw
and others without. The current implementation fails in the case:
When the first filter in a tp has skipsw, both skipswcnt and filtercnt
are incremented, then adding a second filter without skipsw to the same
tp does not modify these counters because tp->counted is already set.
This results in bypass software behavior based solely on skipswcnt
equaling filtercnt, even when the block includes filters without
skipsw. Consequently, filters without skipsw are inadvertently bypassed.
To address this, the patch introduces useswcnt in block to explicitly count
tp objects containing at least one filter without skipsw. Key changes
include:
Whenever a filter without skipsw is added, its tp is marked with usesw
and counted in useswcnt. tc_run() now uses useswcnt to determine software
bypass, eliminating reliance on filtercnt and skipswcnt.
This refined approach prevents software bypass for blocks containing
mixed filters, ensuring correct behavior in tc_run().
Additionally, as atomic operations on useswcnt ensure thread safety and
tp->lock guards access to tp->usesw and tp->counted, the broader lock
down_write(&block->cb_lock) is no longer required in tc_new_tfilter(),
and this resolves a performance regression caused by the filter counting
mechanism during parallel filter insertions.
The improvement can be demonstrated using the following script:
# cat insert_tc_rules.sh
tc qdisc add dev ens1f0np0 ingress
for i in $(seq 16); do
taskset -c $i tc -b rules_$i.txt &
done
wait
Each of rules_$i.txt files above includes 100000 tc filter rules to a
mlx5 driver NIC ens1f0np0.
Without this patch:
# time sh insert_tc_rules.sh
real 0m50.780s
user 0m23.556s
sys 4m13.032s
With this patch:
# time sh insert_tc_rules.sh
real 0m17.718s
user 0m7.807s
sys 3m45.050s
Fixes: 047f340b36fc ("net: sched: make skip_sw actually skip software")
Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Tested-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- improvement of behavior for non-standard LED brightness values (Jason Gerecke)
- PCI Wacom device supports (depends on Intel THC support) (Even Xu)
|
|
- minor code cleanup (Colin Ian King)
|
|
- SteelSeries Arctis 9 support (Christian Mayer)
|
|
- support for MD/GEN 6B controller (Ryan McClelland)
|
|
- improved support for Thinkpad-X12-TAB-1/2 (Vishnu Sankar)
|
|
- newly added support for Intel Touch Host Controller (Even Xu, Xinpeng Sun)
|
|
- dead code removal in intel-ish-hid driver (Dr. David Alan Gilbert)
|