Age | Commit message (Collapse) | Author |
|
Similar to other protocol ops, it's useful for an scmi module driver to get
the number of supported performance domains, hence let's make this
available by adding a new perf protocol callback. Note that, a user is
being added from subsequent changes.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20230825112633.236607-2-ulf.hansson@linaro.org
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
This adds handling of MSG_ZEROCOPY flag on transmission path:
1) If this flag is set and zerocopy transmission is possible (enabled
in socket options and transport allows zerocopy), then non-linear
skb will be created and filled with the pages of user's buffer.
Pages of user's buffer are locked in memory by 'get_user_pages()'.
2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it
calls 'skb_set_owner_w()'. Reason of this change is that
'__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so
to decrease this field correctly, proper skb destructor is needed:
'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'.
If this callback is set, then transport needs extra check to be able
to send provided number of buffers in zerocopy mode. Currently, the
only transport that needs this callback set is virtio, because this
transport adds new buffers to the virtio queue and we need to check,
that number of these buffers is less than size of the queue (it is
required by virtio spec). vhost and loopback transports don't need
this check.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This is preparation patch for MSG_ZEROCOPY support. It adds handling of
non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with
'skb_copy_datagram_iter()'. Main advantage of the second one is that it
can handle paged part of the skb by using 'kmap()' on each page, but if
there are no pages in the skb, it behaves like simple copying to iov
iterator. This patch also adds new field to the control block of skb -
this value shows current offset in the skb to read next portion of data
(it doesn't matter linear it or not). Idea behind this field is that
'skb_copy_datagram_iter()' handles both types of skb internally - it
just needs an offset from which to copy data from the given skb. This
offset is incremented on each read from skb. This approach allows to
simplify handling of both linear and non-linear skbs, because for
linear skb we need to call 'skb_pull()' after reading data from it,
while in non-linear case we need to update 'data_len'.
Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Use the same _LINUX_SCHED_ prefix nomenclature as the other 29 header guards
in include/linux/sched/ do.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It's the only non-trivial header in include/linux/sched/ missing a header guard.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Except on x86, preempt_count is always accessed with READ_ONCE().
Repeated invocations in macros like irq_count() produce repeated loads.
These redundant instructions appear in various fast paths. In the one
shown below, for example, irq_count() is evaluated during kernel entry
if !tick_nohz_full_cpu(smp_processor_id()).
0001ed0a <irq_enter_rcu>:
1ed0a: 4e56 0000 linkw %fp,#0
1ed0e: 200f movel %sp,%d0
1ed10: 0280 ffff e000 andil #-8192,%d0
1ed16: 2040 moveal %d0,%a0
1ed18: 2028 0008 movel %a0@(8),%d0
1ed1c: 0680 0001 0000 addil #65536,%d0
1ed22: 2140 0008 movel %d0,%a0@(8)
1ed26: 082a 0001 000f btst #1,%a2@(15)
1ed2c: 670c beqs 1ed3a <irq_enter_rcu+0x30>
1ed2e: 2028 0008 movel %a0@(8),%d0
1ed32: 2028 0008 movel %a0@(8),%d0
1ed36: 2028 0008 movel %a0@(8),%d0
1ed3a: 4e5e unlk %fp
1ed3c: 4e75 rts
This patch doesn't prevent the pointless btst and beqs instructions
above, but it does eliminate 2 of the 3 pointless move instructions
here and elsewhere.
On x86, preempt_count is per-cpu data and the problem does not arise
presumably because the compiler is free to optimize more effectively.
This patch was tested on m68k and x86. I was expecting no changes
to object code for x86 and mostly that's what I saw. However, there
were a few places where code generation was perturbed for some reason.
The performance issue addressed here is minor on uniprocessor m68k. I
got a 0.01% improvement from this patch for a simple "find /sys -false"
benchmark. For architectures and workloads susceptible to cache line bounce
the improvement is expected to be larger. The only SMP architecture I have
is x86, and as x86 unaffected I have not done any further measurements.
Fixes: 15115830c887 ("preempt: Cleanup the macro maze a bit")
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/0a403120a682a525e6db2d81d1a3ffcc137c3742.1694756831.git.fthain@linux-m68k.org
|
|
do_write_seqcount_begin_nested()
It was brought up by Tetsuo that the following sequence:
write_seqlock_irqsave()
printk_deferred_enter()
could lead to a deadlock if the lockdep annotation within
write_seqlock_irqsave() triggers.
The problem is that the sequence counter is incremented before the lockdep
annotation is performed. The lockdep splat would then attempt to invoke
printk() but the reader side, of the same seqcount, could have a
tty_port::lock acquired waiting for the sequence number to become even again.
The other lockdep annotations come before the actual locking because "we
want to see the locking error before it happens". There is no reason why
seqcount should be different here.
Do the lockdep annotation first then perform the locking operation (the
sequence increment).
Fixes: 1ca7d67cf5d5a ("seqcount: Add lockdep functionality to seqcount/seqlock structures")
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230920104627._DTHgPyA@linutronix.de
Closes: https://lore.kernel.org/20230621130641.-5iueY1I@linutronix.de
|
|
This reverts commit ffb6cf19e06334062744b7e3493f71e500964f8e.
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Expose a couple of new SCMI clock operations to get and set OEM specific
clock configurations when talking to an SCMI v3.2 compliant.
Issuing such requests against an SCMI platform server not supporting v3.2
extension for OEM specific clock configurations will fail.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230826125308.462328-7-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
Add support for v3.2 clock CONFIG_GET command and related new clock
protocol operation state_get() to retrieve the status of a clock.
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20230826125308.462328-4-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
SCMI clock enable/disable operations come in 2 different flavours which
simply just differ in how the underlying SCMI transactions is carried on:
atomic or not.
Currently we expose such SCMI operations through 2 distinctly named
wrappers, that, in turn, are wrapped into another couple of similarly and
distinctly named callbacks inside SCMI clock driver user.
Reduce the churn of duplicated wrappers by adding a param to SCMI clock
enable/disable operations to ask for atomic operation while removing the
_atomic version of such operations.
No functional change.
CC: Michael Turquette <mturquette@baylibre.com>
CC: Stephen Boyd <sboyd@kernel.org>
CC: linux-clk@vger.kernel.org
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20230826125308.462328-2-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
Since commit:
9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
The ordering fallbacks for atomic*_read_acquire() and
atomic*_set_release() erroneously fall back to the implictly relaxed
atomic*_read() and atomic*_set() variants respectively, without any
additional barriers. This loses the ACQUIRE and RELEASE ordering
semantics, which can result in a wide variety of problems, even on
strongly-ordered architectures where the implementation of
atomic*_read() and/or atomic*_set() allows the compiler to reorder those
relative to other accesses.
In practice this has been observed to break bit spinlocks on arm64,
resulting in dentry cache corruption.
The fallback logic was intended to allow ACQUIRE/RELEASE/RELAXED ops to
be defined in terms of FULL ops, but where an op had RELAXED ordering by
default, this unintentionally permitted the ACQUIRE/RELEASE ops to be
defined in terms of the implicitly RELAXED default.
This patch corrects the logic to avoid falling back to implicitly
RELAXED ops, resulting in the same behaviour as prior to commit
9257959a6e5b4fca.
I've verified the resulting assembly on arm64 by generating outlined
wrappers of the atomics. Prior to this patch the compiler generates
sequences using relaxed load (LDR) and store (STR) instructions, e.g.
| <outlined_atomic64_read_acquire>:
| ldr x0, [x0]
| ret
|
| <outlined_atomic64_set_release>:
| str x1, [x0]
| ret
With this patch applied the compiler generates sequences using the
intended load-acquire (LDAR) and store-release (STLR) instructions, e.g.
| <outlined_atomic64_read_acquire>:
| ldar x0, [x0]
| ret
|
| <outlined_atomic64_set_release>:
| stlr x1, [x0]
| ret
To make sure that there were no other victims of the ifdeffery rewrite,
I generated outlined copies of all of the {atomic,atomic64,atomic_long}
atomic operations before and after commit 9257959a6e5b4fca. A diff of
the generated assembly on arm64 shows that only the read_acquire() and
set_release() operations were changed, and only lost their intended
ordering:
| [mark@lakrids:~/src/linux]% diff -u \
| <(aarch64-linux-gnu-objdump -d before-9257959a6e5b4fca.o)
| <(aarch64-linux-gnu-objdump -d after-9257959a6e5b4fca.o)
| --- /proc/self/fd/11 2023-09-19 16:51:51.114779415 +0100
| +++ /proc/self/fd/16 2023-09-19 16:51:51.114779415 +0100
| @@ -1,5 +1,5 @@
|
| -before-9257959a6e5b4fca.o: file format elf64-littleaarch64
| +after-9257959a6e5b4fca.o: file format elf64-littleaarch64
|
|
| Disassembly of section .text:
| @@ -9,7 +9,7 @@
| 4: d65f03c0 ret
|
| 0000000000000008 <outlined_atomic_read_acquire>:
| - 8: 88dffc00 ldar w0, [x0]
| + 8: b9400000 ldr w0, [x0]
| c: d65f03c0 ret
|
| 0000000000000010 <outlined_atomic_set>:
| @@ -17,7 +17,7 @@
| 14: d65f03c0 ret
|
| 0000000000000018 <outlined_atomic_set_release>:
| - 18: 889ffc01 stlr w1, [x0]
| + 18: b9000001 str w1, [x0]
| 1c: d65f03c0 ret
|
| 0000000000000020 <outlined_atomic_add>:
| @@ -1230,7 +1230,7 @@
| 1070: d65f03c0 ret
|
| 0000000000001074 <outlined_atomic64_read_acquire>:
| - 1074: c8dffc00 ldar x0, [x0]
| + 1074: f9400000 ldr x0, [x0]
| 1078: d65f03c0 ret
|
| 000000000000107c <outlined_atomic64_set>:
| @@ -1238,7 +1238,7 @@
| 1080: d65f03c0 ret
|
| 0000000000001084 <outlined_atomic64_set_release>:
| - 1084: c89ffc01 stlr x1, [x0]
| + 1084: f9000001 str x1, [x0]
| 1088: d65f03c0 ret
|
| 000000000000108c <outlined_atomic64_add>:
| @@ -2427,7 +2427,7 @@
| 207c: d65f03c0 ret
|
| 0000000000002080 <outlined_atomic_long_read_acquire>:
| - 2080: c8dffc00 ldar x0, [x0]
| + 2080: f9400000 ldr x0, [x0]
| 2084: d65f03c0 ret
|
| 0000000000002088 <outlined_atomic_long_set>:
| @@ -2435,7 +2435,7 @@
| 208c: d65f03c0 ret
|
| 0000000000002090 <outlined_atomic_long_set_release>:
| - 2090: c89ffc01 stlr x1, [x0]
| + 2090: f9000001 str x1, [x0]
| 2094: d65f03c0 ret
|
| 0000000000002098 <outlined_atomic_long_add>:
I've build tested this with a variety of configs for alpha, arm, arm64,
csky, i386, m68k, microblaze, mips, nios2, openrisc, powerpc, riscv,
s390, sh, sparc, x86_64, and xtensa, for which I've seen no issues. I
was unable to build test for ia64 and parisc due to existing build
breakage in v6.6-rc2.
Fixes: 9257959a6e5b4fca ("locking/atomic: scripts: restructure fallback ifdeffery")
Reported-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lkml.kernel.org/r/20230919171430.2697727-1-mark.rutland@arm.com
|
|
With PREEMPT_RT there is a rt_mutex recursion problem where
sched_submit_work() can use an rtlock (aka spinlock_t). More
specifically what happens is:
mutex_lock() /* really rt_mutex */
...
__rt_mutex_slowlock_locked()
task_blocks_on_rt_mutex()
// enqueue current task as waiter
// do PI chain walk
rt_mutex_slowlock_block()
schedule()
sched_submit_work()
...
spin_lock() /* really rtlock */
...
__rt_mutex_slowlock_locked()
task_blocks_on_rt_mutex()
// enqueue current task as waiter *AGAIN*
// *CONFUSION*
Fix this by making rt_mutex do the sched_submit_work() early, before
it enqueues itself as a waiter -- before it even knows *if* it will
wait.
[[ basically Thomas' patch but with different naming and a few asserts
added ]]
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230908162254.999499-5-bigeasy@linutronix.de
|
|
Add new health error syndrome to indicate that pci data poisoned error
has been received while fetching device ICM data.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In order to have mcast offloads the driver needs the following:
It should know if that mcast comes from wire port, in addition the flow
should not be marked as any specific source, that way it will give the
flexibility for the driver not to be depended on the way iterator
implemented in the FW.
Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
As Linus suggested, __HAVE_ARCH_XYZ is "stupid" and "having historical
uses of it doesn't make it good". So migrate __HAVE_ARCH_SHADOW_MAP to
separate macros named after the respective functions.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add a new API type lskcipher designed for taking straight kernel
pointers instead of SG lists. Its relationship to skcipher will
be analogous to that between shash and ahash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Move the macro CRYPTO_ALG_TYPE_AHASH_MASK out of linux/crypto.h
and into crypto/ahash.c so that it's not visible to users of the
Crypto API.
Also remove the unused CRYPTO_ALG_TYPE_HASH_MASK macro.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Update macro name for LLCC_DRE to LLCC_ECC as per the latest specification.
Signed-off-by: Komal Bajaj <quic_kbajaj@quicinc.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20230830105654.28057-6-quic_kbajaj@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Add the stub nvmem_cell_read_u8() function for drivers running with
CONFIG_NVMEM disabled.
Signed-off-by: Komal Bajaj <quic_kbajaj@quicinc.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Link: https://lore.kernel.org/r/20230830105654.28057-4-quic_kbajaj@quicinc.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Fix 'tr' dereferencing bug when CONFIG_BPF_JIT is turned off.
When CONFIG_BPF_JIT is turned off, 'bpf_trampoline_get()' returns NULL,
which is same as the cases when CONFIG_BPF_JIT is turned on.
Closes: https://lore.kernel.org/r/202309131936.5Nc8eUD0-lkp@intel.com/
Fixes: f7b12b6fea00 ("bpf: verifier: refactor check_attach_btf_id()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230917153846.88732-1-hffilwlqm@gmail.com
|
|
Breno and Josef report a deadlock scenario from cgroup reclaim
re-entering the filesystem:
[ 361.546690] ======================================================
[ 361.559210] WARNING: possible circular locking dependency detected
[ 361.571703] 6.5.0-0_fbk700_debug_rc0_kbuilder_13159_gbf787a128001 #1 Tainted: G S E
[ 361.589704] ------------------------------------------------------
[ 361.602277] find/9315 is trying to acquire lock:
[ 361.611625] ffff88837ba140c0 (&delayed_node->mutex){+.+.}-{4:4}, at: __btrfs_release_delayed_node+0x68/0x4f0
[ 361.631437]
[ 361.631437] but task is already holding lock:
[ 361.643243] ffff8881765b8678 (btrfs-tree-01){++++}-{4:4}, at: btrfs_tree_read_lock+0x1e/0x40
[ 362.904457] mutex_lock_nested+0x1c/0x30
[ 362.912414] __btrfs_release_delayed_node+0x68/0x4f0
[ 362.922460] btrfs_evict_inode+0x301/0x770
[ 362.982726] evict+0x17c/0x380
[ 362.988944] prune_icache_sb+0x100/0x1d0
[ 363.005559] super_cache_scan+0x1f8/0x260
[ 363.013695] do_shrink_slab+0x2a2/0x540
[ 363.021489] shrink_slab_memcg+0x237/0x3d0
[ 363.050606] shrink_slab+0xa7/0x240
[ 363.083382] shrink_node_memcgs+0x262/0x3b0
[ 363.091870] shrink_node+0x1a4/0x720
[ 363.099150] shrink_zones+0x1f6/0x5d0
[ 363.148798] do_try_to_free_pages+0x19b/0x5e0
[ 363.157633] try_to_free_mem_cgroup_pages+0x266/0x370
[ 363.190575] reclaim_high+0x16f/0x1f0
[ 363.208409] mem_cgroup_handle_over_high+0x10b/0x270
[ 363.246678] try_charge_memcg+0xaf2/0xc70
[ 363.304151] charge_memcg+0xf0/0x350
[ 363.320070] __mem_cgroup_charge+0x28/0x40
[ 363.328371] __filemap_add_folio+0x870/0xd50
[ 363.371303] filemap_add_folio+0xdd/0x310
[ 363.399696] __filemap_get_folio+0x2fc/0x7d0
[ 363.419086] pagecache_get_page+0xe/0x30
[ 363.427048] alloc_extent_buffer+0x1cd/0x6a0
[ 363.435704] read_tree_block+0x43/0xc0
[ 363.443316] read_block_for_search+0x361/0x510
[ 363.466690] btrfs_search_slot+0xc8c/0x1520
This is caused by the mem_cgroup_handle_over_high() not respecting the
gfp_mask of the allocation context. We used to only call this function on
resume to userspace, where no locks were held. But c9afe31ec443 ("memcg:
synchronously enforce memory.high for large overcharges") added a call
from the allocation context without considering the gfp.
Link: https://lkml.kernel.org/r/20230914152139.100822-1-hannes@cmpxchg.org
Fixes: c9afe31ec443 ("memcg: synchronously enforce memory.high for large overcharges")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Breno Leitao <leitao@debian.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
MT7988 SoC support 802.11 receive reordering offload in hw while
MT7986 SoC implements it through the firmware running on the mcu.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Introduce partial AMSDU offload support for MT7988 SoC in order to merge
in hw packets belonging to the same AMSDU before passing them to the
WLAN nic.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Similar to MT7986 and MT7622, enable Wireless Ethernet Ditpatcher for
MT7988 in order to offload traffic forwarded from LAN/WLAN to WLAN/LAN
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Introduce mtk_wed_buf structure to store both virtual and physical
addresses allocated in mtk_wed_tx_buffer_alloc() routine. This is a
preliminary patch to add WED support for MT7988 SoC since it relies on a
different dma descriptor layout not storing page dma addresses.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Rename mtk_rxbm_desc structure in mtk_wed_bm_desc since it will be used
even on tx side by MT7988 SoC.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We have data-races while reading np->srcprefs
Switch the field to a plain byte, add READ_ONCE()
and WRITE_ONCE() annotations where needed,
and IPV6_ADDR_PREFERENCES setsockopt() can now be lockless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230918142321.1794107-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Marek reports that a deadlock occurs with the AX88772A PHY used on the
ASIX USB network driver:
asix 1-1.4:1.0 (unnamed net_device) (uninitialized): PHY [usb-001:003:10] driver [Asix Electronics AX88772A] (irq=POLL)
Asix Electronics AX88772A usb-001:003:10: attached PHY driver(mii_bus:phy_addr=usb-001:003:10, irq=POLL)
asix 1-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX AX88772 USB 2.0 Ethernet, a2:99:b6:cd:11:eb
asix 1-1.4:1.0 eth0: configuring for phy/internal link mode
============================================
WARNING: possible recursive locking detected
6.6.0-rc1-00239-g8da77df649c4-dirty #13949 Not tainted
--------------------------------------------
kworker/3:3/71 is trying to acquire lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_start_aneg+0x1c/0x38
but task is already holding lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8
This is because we now consistently call phy_process_state_change()
while holding phydev->lock, but the AX88772A PHY driver then goes on
to call phy_start_aneg() which tries to grab the same lock - causing
deadlock.
Fix this by exporting the unlocked version, and use this in the PHY
driver instead.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: ef113a60d0a9 ("net: phy: call phy_error_precise() while holding the lock")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/E1qiEFs-007g7b-Lq@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into i2c/for-mergewindow
Immutable branch between MFD, I2C and Reboot due for the v6.7 merge window
|
|
Get a null-ptr-deref bug as follows with reproducer [1].
BUG: kernel NULL pointer dereference, address: 0000000000000228
...
RIP: 0010:vlan_dev_hard_header+0x35/0x140 [8021q]
...
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x82/0x150
? exc_page_fault+0x69/0x150
? asm_exc_page_fault+0x26/0x30
? vlan_dev_hard_header+0x35/0x140 [8021q]
? vlan_dev_hard_header+0x8e/0x140 [8021q]
neigh_connected_output+0xb2/0x100
ip6_finish_output2+0x1cb/0x520
? nf_hook_slow+0x43/0xc0
? ip6_mtu+0x46/0x80
ip6_finish_output+0x2a/0xb0
mld_sendpack+0x18f/0x250
mld_ifc_work+0x39/0x160
process_one_work+0x1e6/0x3f0
worker_thread+0x4d/0x2f0
? __pfx_worker_thread+0x10/0x10
kthread+0xe5/0x120
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
[1]
$ teamd -t team0 -d -c '{"runner": {"name": "loadbalance"}}'
$ ip link add name t-dummy type dummy
$ ip link add link t-dummy name t-dummy.100 type vlan id 100
$ ip link add name t-nlmon type nlmon
$ ip link set t-nlmon master team0
$ ip link set t-nlmon nomaster
$ ip link set t-dummy up
$ ip link set team0 up
$ ip link set t-dummy.100 down
$ ip link set t-dummy.100 master team0
When enslave a vlan device to team device and team device type is changed
from non-ether to ether, header_ops of team device is changed to
vlan_header_ops. That is incorrect and will trigger null-ptr-deref
for vlan->real_dev in vlan_dev_hard_header() because team device is not
a vlan device.
Cache eth_header_ops in team_setup(), then assign cached header_ops to
header_ops of team net device when its type is changed from non-ether
to ether to fix the bug.
Fixes: 1d76efe1577b ("team: add support for non-ethernet devices")
Suggested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230918123011.1884401-1-william.xuanziyang@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull NFS client fixes from Anna Schumaker:
"Various O_DIRECT related fixes from Trond:
- Error handling
- Locking issues
- Use the correct commit info for joining page groups
- Fixes for rescheduling IO
Sunrpc bad verifier fixes:
- Report EINVAL errors from connect()
- Revalidate creds that the server has rejected
- Revert "SUNRPC: Fail faster on bad verifier"
Misc:
- Fix pNFS session trunking when MDS=DS
- Fix zero-value filehandles for post-open getattr operations
- Fix compiler warning about tautological comparisons
- Revert 'SUNRPC: clean up integer overflow check' before Trond's fix"
* tag 'nfs-for-6.6-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Silence compiler complaints about tautological comparisons
Revert "SUNRPC: clean up integer overflow check"
NFSv4.1: fix zero value filehandle in post open getattr
NFSv4.1: fix pnfs MDS=DS session trunking
Revert "SUNRPC: Fail faster on bad verifier"
SUNRPC: Mark the cred for revalidation if the server rejects it
NFS/pNFS: Report EINVAL errors from connect() to the server
NFS: More fixes for nfs_direct_write_reschedule_io()
NFS: Use the correct commit info in nfs_join_page_group()
NFS: More O_DIRECT accounting fixes for error paths
NFS: Fix O_DIRECT locking issues
NFS: Fix error handling for O_DIRECT write scheduling
|
|
For the write_atomic callback, the console driver may have unsafe
regions that need to be appropriately marked. Provide functions
that accept the nbcon_write_context struct to allow for the driver
to enter and exit unsafe regions.
Also provide a function for drivers to check if they are still the
owner of the console.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230916192007.608398-9-john.ogness@linutronix.de
|
|
Implement an emit function for nbcon consoles to output printk
messages. It utilizes the lockless printk_get_next_message() and
console_prepend_dropped() functions to retrieve/build the output
message. The emit function includes the required safety points to
check for handover/takeover and calls a new write_atomic callback
of the console driver to output the message. It also includes
proper handling for updating the nbcon console sequence number.
A new nbcon_write_context struct is introduced. This is provided
to the write_atomic callback and includes only the information
necessary for performing atomic writes.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230916192007.608398-8-john.ogness@linutronix.de
|
|
Add an atomic_long_t field @nbcon_seq to the console struct to
store the sequence number for nbcon consoles. For nbcon consoles
this will be used instead of the non-atomic @seq field. The new
field allows for safe atomic sequence number updates without
requiring any locking.
On 64bit systems the new field stores the full sequence number.
On 32bit systems the new field stores the lower 32 bits of the
sequence number, which are expanded to 64bit as needed by
folding the values based on the sequence numbers available in
the ringbuffer.
For 32bit systems, having a 32bit representation in the console
is sufficient. If a console ever gets more than 2^31 records
behind the ringbuffer then this is the least of the problems.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230916192007.608398-7-john.ogness@linutronix.de
|
|
In case of hostile takeovers it must be ensured that the previous
owner cannot scribble over the output buffer of the emergency/panic
context. This is achieved by:
- Adding a global output buffer instance for the panic context.
This is the only situation where hostile takeovers can occur and
there is always at most 1 panic context.
- Allocating an output buffer per non-boot console upon console
registration. This buffer is used by the console owner when not
in panic context. (For boot consoles, the existing shared global
legacy output buffer is used instead. Boot console printing will
be synchronized with legacy console printing.)
- Choosing the appropriate buffer is handled in the acquire/release
functions.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230916192007.608398-5-john.ogness@linutronix.de
|
|
Add per console acquire/release functionality.
The state of the console is maintained in the "nbcon_state" atomic
variable.
The console is locked when:
- The 'prio' field contains the priority of the context that owns the
console. Only higher priority contexts are allowed to take over the
lock. A value of 0 (NBCON_PRIO_NONE) means the console is not locked.
- The 'cpu' field denotes on which CPU the console is locked. It is used
to prevent busy waiting on the same CPU. Also it informs the lock owner
that it has lost the lock in a more complex scenario when the lock was
taken over by a higher priority context, released, and taken on another
CPU with the same priority as the interrupted owner.
The acquire mechanism uses a few more fields:
- The 'req_prio' field is used by the handover approach to make the
current owner aware that there is a context with a higher priority
waiting for the friendly handover.
- The 'unsafe' field allows to take over the console in a safe way in the
middle of emitting a message. The field is set only when accessing some
shared resources or when the console device is manipulated. It can be
cleared, for example, after emitting one character when the console
device is in a consistent state.
- The 'unsafe_takeover' field is set when a hostile takeover took the
console in an unsafe state. The console will stay in the unsafe state
until re-initialized.
The acquire mechanism uses three approaches:
1) Direct acquire when the console is not owned or is owned by a lower
priority context and is in a safe state.
2) Friendly handover mechanism uses a request/grant handshake. It is used
when the current owner has lower priority and the console is in an
unsafe state.
The requesting context:
a) Sets its priority into the 'req_prio' field.
b) Waits (with a timeout) for the owning context to unlock the
console.
c) Takes the lock and clears the 'req_prio' field.
The owning context:
a) Observes the 'req_prio' field set on exit from the unsafe
console state.
b) Gives up console ownership by clearing the 'prio' field.
3) Unsafe hostile takeover allows to take over the lock even when the
console is an unsafe state. It is used only in panic() by the final
attempt to flush consoles in a try and hope mode.
Note that separate record buffers are used in panic(). As a result,
the messages can be read and formatted without any risk even after
using the hostile takeover in unsafe state.
The release function simply clears the 'prio' field.
All operations on @console::nbcon_state are atomic cmpxchg based to
handle concurrency.
The acquire/release functions implement only minimal policies:
- Preference for higher priority contexts.
- Protection of the panic CPU.
All other policy decisions must be made at the call sites:
- What is marked as an unsafe section.
- Whether to spin-wait if there is already an owner and the console is
in an unsafe state.
- Whether to attempt an unsafe hostile takeover.
The design allows to implement the well known:
acquire()
output_one_printk_record()
release()
The output of one printk record might be interrupted with a higher priority
context. The new owner is supposed to reprint the entire interrupted record
from scratch.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230916192007.608398-3-john.ogness@linutronix.de
|
|
The current console/printk subsystem is protected by a Big Kernel Lock,
(aka console_lock) which has ill defined semantics and is more or less
stateless. This puts severe limitations on the console subsystem and
makes forced takeover and output in emergency and panic situations a
fragile endeavour that is based on try and pray.
The goal of non-BKL (nbcon) consoles is to break out of the console lock
jail and to provide a new infrastructure that avoids the pitfalls and
also allows console drivers to be gradually converted over.
The proposed infrastructure aims for the following properties:
- Per console locking instead of global locking
- Per console state that allows to make informed decisions
- Stateful handover and takeover
As a first step, state is added to struct console. The per console state
is an atomic_t using a 32bit bit field.
Reserve state bits, which will be populated later in the series. Wire
it up into the console register/unregister functionality.
It was decided to use a bitfield because using a plain u32 with
mask/shift operations resulted in uncomprehensible code.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230916192007.608398-2-john.ogness@linutronix.de
|
|
Add sw fallback of tx checksum calculation for those tx queues that
don't support tx checksum offloading. DW xGMAC IP can be synthesized
such that it can support tx checksum offloading only for a few
initial tx queues. Also as Serge pointed out, for the DW QoS IP, tx
coe can be individually configured for each tx queue.
So when tx coe is enabled, for any tx queue that doesn't support
tx coe with 'coe-unsupported' flag set will have a sw fallback
happen in the driver for tx checksum calculation when any packets to
be transmitted on these tx queues.
Signed-off-by: Rohan G Thomas <rohan.g.thomas@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 3af5ae22030cb59fab4fba35f5a2b62f47e14df9.
ceph_mds_request_args_ext was already (and remains to be) a union. An
additional anonymous union inside is bogus:
union ceph_mds_request_args_ext {
union {
union ceph_mds_request_args old;
struct { ... } __attribute__ ((packed)) setattr_ext;
};
}
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct ceph_monmap.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a serial port is used for kernel console output, then all
modifications to the UART registers which are done from other contexts,
e.g. getty, termios, are interference points for the kernel console.
So far this has been ignored and the printk output is based on the
principle of hope. The rework of the console infrastructure which aims to
support threaded and atomic consoles, requires to mark sections which
modify the UART registers as unsafe. This allows the atomic write function
to make informed decisions and eventually to restore operational state. It
also allows to prevent the regular UART code from modifying UART registers
while printk output is in progress.
All modifications of UART registers are guarded by the UART port lock,
which provides an obvious synchronization point with the console
infrastructure.
To avoid adding this functionality to all UART drivers, wrap the
spin_[un]lock*() invocations for uart_port::lock into helper functions
which just contain the spin_[un]lock*() invocations for now. In a
subsequent step these helpers will gain the console synchronization
mechanisms.
Converted with coccinelle. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/20230914183831.587273-3-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When a serial port is used for kernel console output, then all
modifications to the UART registers which are done from other contexts,
e.g. getty, termios, are interference points for the kernel console.
So far this has been ignored and the printk output is based on the
principle of hope. The rework of the console infrastructure which aims to
support threaded and atomic consoles, requires to mark sections which
modify the UART registers as unsafe. This allows the atomic write function
to make informed decisions and eventually to restore operational state. It
also allows to prevent the regular UART code from modifying UART registers
while printk output is in progress.
All modifications of UART registers are guarded by the UART port lock,
which provides an obvious synchronization point with the console
infrastructure.
Provide wrapper functions for spin_[un]lock*(port->lock) invocations so
that the console mechanics can be applied later on at a single place and
does not require to copy the same logic all over the drivers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/20230914183831.587273-2-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Similar to what we do in the AdminQ, check for devcmd health
while waiting for an answer.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for freezer to also use saved_state, remove the
CONFIG_PREEMPT_RT compilation guard around saved_state.
On the arm64 platform I tested which did not have CONFIG_PREEMPT_RT,
there was no statistically significant deviation by applying this patch.
Test methodology:
perf bench sched message -g 40 -l 40
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. Since, is_kdump_kernel() is used to restrict resources in
crash dump capture kernel and such restrictions may not be desirable
for fadump, allow is_kdump_kernel() to be defined differently for such
scenarios. With this, is_kdump_kernel() could be false while vmcore is
usable. So, remove unnecessary dependency with is_kdump_kernel(), for
exporting vmcore.
[1] https://docs.kernel.org/powerpc/firmware-assisted-dump.html
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230912082950.856977-1-hbathini@linux.ibm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix an UV boot crash
- Skip spurious ENDBR generation on _THIS_IP_
- Fix ENDBR use in putuser() asm methods
- Fix corner case boot crashes on 5-level paging
- and fix a false positive WARNING on LTO kernels"
* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove LTO flags
x86/boot/compressed: Reserve more memory for page tables
x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
x86/ibt: Suppress spurious ENDBR
x86/platform/uv: Use alternate source for socket to node data
|