Age | Commit message (Collapse) | Author |
|
Harald Freudenberger says:
====================
This is a complete rework of the protected key AES (PAES) implementation.
The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts)
in a real asynchronous fashion:
- init(), exit() and setkey() are synchronous and don't allocate any memory.
- the encrypt/decrypt functions first try to do the job in a synchronous
manner. If this fails, for example the protected key got invalid caused
by a guest suspend/resume or guest migration action, the encrypt/decrypt
is transferred to an instance of the crypto engine (see below) for
asynchronous processing.
These postponed requests are then handled by the crypto engine by
invoking the do_one_request() callback but may of course again run into
a still not converted key or the key is getting invalid. If the key is
still not converted, the first thread does the conversion and updates
the key status in the transformation context. The conversion is
invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC.
Note that once there is an active requests enqueued to get async
processed via crypto engine, further requests also need to go via
crypto engine to keep the request sequence.
This patch together with the pkey/zcrypt/AP extensions to support
the new PKEY_XFLAG_NOMEMMALOC should toughen the paes crypto algorithms
to truly meet the requirements for in-kernel skcipher implementations
and the usage patterns for the dm-crypt and dm-integrity layers.
The new flag PKEY_XFLAG_NOMEMALLOC tells the PKEY layer (and
subsidiary layers) that it must not allocate any memory causing IO
operations. Note that the patches for this pkey/zcrypt/AP extensions
are currently in the features branch but may be seen in the master
branch with the next merge.
There is still some confusion about the way how paes treats the key
within the transformation context. The tfm context may be shared by
multiple requests running en/decryption with the same key. So the tfm
context is supposed to be read-only.
The s390 protected key support is in fact an encrypted key with the
wrapping key sitting in the firmware. On each invocation of a
protected key instruction the firmware unwraps the pkey and performs
the operation. Part of the protected key is a hash about the wrapping
key used - so the firmware is able to detect if a protected key
matches to the wrapping key or not. If there is a mismatch the cpacf
operation fails with cc 1 (key invalid). Such a situation can occur
for example with a kvm live guest migration to another machine where
the guest simple awakens in a new environment. As the wrapping key is
NOT transfered, after the reawakening all protected key cpacf
operations fail with "key invalid". There exist other situations
where a protected key cpacf operation may run into "key invalid" and
thus the code needs to be prepared for such cpacf failures.
The recovery is simple: via pkey API the source key material (in real
cases this is usually a secure key bound to a HSM) needs to generate
a new protected key which is the wrapped by the wrapping key of the
current firmware.
So the paes tfms hold the source key material to be able to
re-generate the protected key at any time. A naive implementation
would hold the protected key in some kind of running context (for
example the request context) and only the source key would be stored
in the tfm context. But the derivation of the protected key from the
source key is an expensive and time consuming process often involving
interaction with a crypto card. And such a naive implementation would
then for every tfm in use trigger the derivation process individual.
So why not store the protected key in tfm context and only the very
first process hitting the "invalid key" cc runs the derivation and
updates the protected key stored in the tfm. The only really important
thing is that the protected key update and cloning from this value
needs to be done in a atomic fashion.
Please note that there are still race conditions where the protected
key stored in the tfm may get updated by an (outdated) protected key
value. This is not an issue and the code handles this correctly by
again re-deriving the protected key. The only fact that matters, is
that the protected key must always be in a state where the cpacf
instructions can figure out if it is valid (the hash part of the
protected key matches to the hash of the wrapping key) or invalid
(and refuse the crypto operation with "invalid key").
Changelog:
v1 - first version. Applied and tested on top of the mentioned
pkey/zcrypt/AP changes. Selftests and multithreaded testcases
executed via AP_ALG interface run successful and even instrumented
code (with some sleeps to force asynch pathes) ran fine.
Code is good enough for a first code review and collecting feedback.
v2 - A new patch which does a slight rework of the cpacf_pcc() inline
function to return the condition code.
A rework of the paes implementation based on feedback from Herbert
and Ingo:
- the spinlock is now consequently used to protect updates and
changes on the protected key and protected key state within
the transformation context.
- setkey() is now synchronous
- the walk is now held in the request context and thus the
postponing of a request to the engine and later processing
can continue at exactly the same state.
- the param block needed for the cpacf instructions is constructed
once and held in the request context.
- if a request can't get handled synchronous, it is postponed
for asynch processing via an instance of the crpyto engine.
With v2 comes a patch which updates the crypto engine docu
in Documentation/crypto. Feel free to use it or drop it or
do some rework - at least it needs some review.
v2 was only posted internal to collect some feedback within IBM.
v3 - Slight improvements based on feedback from Finn.
v4 - With feedback from Holger and Herbert Xu. Holger gave some good
hints about better readability of the code and I picked nearly
all his suggestions. Herbert noted that once a request goes via
engine to keep the sequence as long as there are requests
enqueued the following requests should also go via engine. This
is now realized via a via_engine_ctr atomic counter in the tfm
context.
Stress tested with lots of debug code to run through all the
failure paths of the code. Looks good.
v5 - Fixed two typos and 1 too long line in the commit message found
by Holger. Added Acked-by and Reviewed-by.
Removed patch #3 which updates the crypto engine docu - this
will go separate. All prepared for picking in the s390 subsystem.
====================
Link: https://lore.kernel.org/r/20250514090955.72370-1-freude@linux.ibm.com/
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This is a complete rework of the protected key AES (PAES) implementation.
The goal of this rework is to implement the 4 modes (ecb, cbc, ctr, xts)
in a real asynchronous fashion:
- init(), exit() and setkey() are synchronous and don't allocate any
memory.
- the encrypt/decrypt functions first try to do the job in a synchronous
manner. If this fails, for example the protected key got invalid caused
by a guest suspend/resume or guest migration action, the encrypt/decrypt
is transferred to an instance of the crypto engine (see below) for
asynchronous processing.
These postponed requests are then handled by the crypto engine by
invoking the do_one_request() callback but may of course again run into
a still not converted key or the key is getting invalid. If the key is
still not converted, the first thread does the conversion and updates
the key status in the transformation context. The conversion is
invoked via pkey API with a new flag PKEY_XFLAG_NOMEMALLOC.
Note that once there is an active requests enqueued to get async
processed via crypto engine, further requests also need to go via
crypto engine to keep the request sequence.
This patch together with the pkey/zcrypt/AP extensions to support
the new PKEY_XFLAG_NOMEMMALOC should toughen the paes crypto algorithms
to truly meet the requirements for in-kernel skcipher implementations
and the usage patterns for the dm-crypt and dm-integrity layers.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20250514090955.72370-3-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Some of the pcc sub-functions have a protected key as
input and thus may run into the situation that this
key may be invalid for example due to live guest migration
to another physical hardware.
Rework the inline assembler function cpacf_pcc() to
return the condition code (cc) as return value:
0 - cc code 0 (normal completion)
1 - cc code 1 (prot key wkvp mismatch or src op out of range)
2 - cc code 2 (something invalid, scalar multiply infinity, ...)
Note that cc 3 (partial completion) is handled within the asm code
and never returned.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/20250514090955.72370-2-freude@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
PARAVIRT_XXL is exclusively utilized by XEN_PV, which is only compatible
with 64-bit machines.
Clearly designate PARAVIRT_XXL as 64-bit only and remove ifdefs to
support CONFIG_PGTABLE_LEVELS < 5.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-5-kirill.shutemov@linux.intel.com
|
|
Both Intel and AMD CPUs support 5-level paging, which is expected to
become more widely adopted in the future. All major x86 Linux
distributions have the feature enabled.
Remove CONFIG_X86_5LEVEL and related #ifdeffery for it to make it more readable.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-4-kirill.shutemov@linux.intel.com
|
|
5-level paging only supports SPARSEMEM_VMEMMAP. CONFIG_X86_5LEVEL is
being phased out, making 5-level paging support mandatory.
Make CONFIG_SPARSEMEM_VMEMMAP mandatory for x86-64 and eliminate
any associated conditional statements.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-3-kirill.shutemov@linux.intel.com
|
|
Dynamic memory layout is used by KASLR and 5-level paging.
CONFIG_X86_5LEVEL is going to be removed, making 5-level paging support
unconditional which requires unconditional support of dynamic memory
layout.
Remove CONFIG_DYNAMIC_MEMORY_LAYOUT.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250516123306.3812286-2-kirill.shutemov@linux.intel.com
|
|
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.15-rc7
- designware: cleanup properly on probe failure
|
|
Add a helper to check if an event is in freq mode to improve readability.
No functional changes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250516182853.2610284-2-kan.liang@linux.intel.com
|
|
Pull smb client fixes from Steve French:
- Fix memory leak in mkdir error path
- Fix max rsize miscalculation after channel reconnect
* tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: fix zero rsize error messages
smb: client: fix memory leak during error handling for POSIX mkdir
|
|
Guoyu Yin reported a splat in the ipmr netns cleanup path:
WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_free_table net/ipv4/ipmr.c:440 [inline]
WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361
Modules linked in:
CPU: 2 UID: 0 PID: 14564 Comm: syz.4.838 Not tainted 6.14.0 #1
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:ipmr_free_table net/ipv4/ipmr.c:440 [inline]
RIP: 0010:ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361
Code: ff df 48 c1 ea 03 80 3c 02 00 75 7d 48 c7 83 60 05 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e e9 71 67 7f 00 e8 4c 2d 8a fd 90 <0f> 0b 90 eb 93 e8 41 2d 8a fd 0f b6 2d 80 54 ea 01 31 ff 89 ee e8
RSP: 0018:ffff888109547c58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888108c12dc0 RCX: ffffffff83e09868
RDX: ffff8881022b3300 RSI: ffffffff83e098d4 RDI: 0000000000000005
RBP: ffff888104288000 R08: 0000000000000000 R09: ffffed10211825c9
R10: 0000000000000001 R11: ffff88801816c4a0 R12: 0000000000000001
R13: ffff888108c13320 R14: ffff888108c12dc0 R15: fffffbfff0b74058
FS: 00007f84f39316c0(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f84f3930f98 CR3: 0000000113b56000 CR4: 0000000000350ef0
Call Trace:
<TASK>
ipmr_net_exit_batch+0x50/0x90 net/ipv4/ipmr.c:3160
ops_exit_list+0x10c/0x160 net/core/net_namespace.c:177
setup_net+0x47d/0x8e0 net/core/net_namespace.c:394
copy_net_ns+0x25d/0x410 net/core/net_namespace.c:516
create_new_namespaces+0x3f6/0xaf0 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0xc3/0x180 kernel/nsproxy.c:228
ksys_unshare+0x78d/0x9a0 kernel/fork.c:3342
__do_sys_unshare kernel/fork.c:3413 [inline]
__se_sys_unshare kernel/fork.c:3411 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:3411
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xa6/0x1a0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f84f532cc29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f84f3931038 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 00007f84f5615fa0 RCX: 00007f84f532cc29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000400
RBP: 00007f84f53fba18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f84f5615fa0 R15: 00007fff51c5f328
</TASK>
The running kernel has CONFIG_IP_MROUTE_MULTIPLE_TABLES disabled, and
the sanity check for such build is still too loose.
Address the issue consolidating the relevant sanity check in a single
helper regardless of the kernel configuration. Also share it between
the ipv4 and ipv6 code.
Reported-by: Guoyu Yin <y04609127@gmail.com>
Fixes: 50b94204446e ("ipmr: tune the ipmr_can_free_table() checks.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/372dc261e1bf12742276e1b984fc5a071b7fc5a8.1747321903.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not recycle the page twice in airoha_qdma_rx_process routine in case
of error. Just run dev_kfree_skb() if the skb has been allocated and marked
for recycling. Run page_pool_put_full_page() directly if the skb has not
been allocated yet.
Moreover, rely on DMA address from queue entry element instead of reading
it from the DMA descriptor for DMA syncing in airoha_qdma_rx_process().
Fixes: e12182ddb6e71 ("net: airoha: Enable Rx Scatter-Gather")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250515-airoha-fix-rx-process-error-condition-v2-1-657e92c894b9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, I'll be honest and say I think this is larger than
I'd prefer at this point, the main blow out point is that xe has two
larger fixes.
One is a fix for active context utilisation reporting, it's for a
reported regression and will end up in stable anyways, so I don't see
any point in holding it up.
The second is a fix for mixed cpu/gpu atomics, which are currently
broken, but are also not something your average desktop/laptop user is
going to hit in normal operation, and having them fixed now is better
than threading them through stable later.
Other than those, it's mostly the usual, a bunch of amdgpu randoms and
a few other minor fixes.
dma-buf:
- Avoid memory reordering in fence handling
meson:
- Avoid integer overflow in mode-clock calculations
panel-mipi-dbi:
- Fix output with drm_client_setup_with_fourcc()
amdgpu:
- Fix CSA unmap
- Fix MALL size reporting on GFX11.5
- AUX fix
- DCN 3.5 fix
- VRR fix
- DP MST fix
- DML 2.1 fixes
- Silence DP AUX spam
- DCN 4.0.1 cursor fix
- VCN 4.0.5 fix
ivpu:
- Fix buffer size in debugfs code
gpuvm:
- Add timeslicing and allocation restriction for SVM
xe:
- Fix shrinker debugfs name
- Add HW workaround to Xe2
- Fix SVM when mixing GPU and CPU atomics
- Fix per client engine utilization due to active contexts not saving
timestamp with lite restore enabled"
* tag 'drm-fixes-2025-05-17' of https://gitlab.freedesktop.org/drm/kernel: (24 commits)
drm/xe: Add WA BB to capture active context utilization
drm/xe: Save the gt pointer in lrc and drop the tile
drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value
drm/xe: Timeslice GPU on atomic SVM fault
drm/gpusvm: Add timeslicing support to GPU SVM
drm/xe: Strict migration policy for atomic SVM faults
drm/gpusvm: Introduce devmem_only flag for allocation
drm/xe/xe2hpg: Add Wa_22021007897
drm/amdgpu: read back register after written for VCN v4.0.5
Revert "drm/amd/display: Hardware cursor changes color when switched to software cursor"
dma-buf: insert memory barrier before updating num_fences
drm/xe: Fix the gem shrinker name
drm/amd/display: Avoid flooding unnecessary info messages
drm/amd/display: Fix null check of pipe_ctx->plane_state for update_dchubp_dpp
drm/amd/display: check stream id dml21 wrapper to get plane_id
drm/amd/display: fix link_set_dpms_off multi-display MST corner case
drm/amd/display: Defer BW-optimization-blocked DRR adjustments
Revert: "drm/amd/display: Enable urgent latency adjustment on DCN35"
drm/amd/display: Correct the reply value when AUX write incomplete
drm/amdgpu: fix incorrect MALL size for GFX1151
...
|
|
Currently, when device mtu is updated, vmxnet3 updates netdev mtu, quiesces
the device and then reactivates it for the ESXi to know about the new mtu.
So, technically the OS stack can start using the new mtu before ESXi knows
about the new mtu.
This can lead to issues for TSO packets which use mss as per the new mtu
configured. This patch fixes this issue by moving the mtu write after
device quiesce.
Cc: stable@vger.kernel.org
Fixes: d1a890fa37f2 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Signed-off-by: Ronak Doshi <ronak.doshi@broadcom.com>
Acked-by: Guolin Yang <guolin.yang@broadcom.com>
Changes v1-> v2:
Moved MTU write after destroy of rx rings
Link: https://patch.msgid.link/20250515190457.8597-1-ronak.doshi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When netfilter defrag hooks are loaded (due to the presence of conntrack
rules, for example), fragmented packets entering the bridge will be
defragged by the bridge's pre-routing hook (br_nf_pre_routing() ->
ipv4_conntrack_defrag()).
Later on, in the bridge's post-routing hook, the defragged packet will
be fragmented again. If the size of the largest fragment is larger than
what the kernel has determined as the destination MTU (using
ip_skb_dst_mtu()), the defragged packet will be dropped.
Before commit ac6627a28dbf ("net: ipv4: Consolidate ipv4_mtu and
ip_dst_mtu_maybe_forward"), ip_skb_dst_mtu() would return dst_mtu() as
the destination MTU. Assuming the dst entry attached to the packet is
the bridge's fake rtable one, this would simply be the bridge's MTU (see
fake_mtu()).
However, after above mentioned commit, ip_skb_dst_mtu() ends up
returning the route's MTU stored in the dst entry's metrics. Ideally, in
case the dst entry is the bridge's fake rtable one, this should be the
bridge's MTU as the bridge takes care of updating this metric when its
MTU changes (see br_change_mtu()).
Unfortunately, the last operation is a no-op given the metrics attached
to the fake rtable entry are marked as read-only. Therefore,
ip_skb_dst_mtu() ends up returning 1500 (the initial MTU value) and
defragged packets are dropped during fragmentation when dealing with
large fragments and high MTU (e.g., 9k).
Fix by moving the fake rtable entry's metrics to be per-bridge (in a
similar fashion to the fake rtable entry itself) and marking them as
writable, thereby allowing MTU changes to be reflected.
Fixes: 62fa8a846d7d ("net: Implement read-only protection and COW'ing of metrics.")
Fixes: 33eb9873a283 ("bridge: initialize fake_rtable metrics")
Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Closes: https://lore.kernel.org/netdev/PH0PR10MB4504888284FF4CBA648197D0ACB82@PH0PR10MB4504.namprd10.prod.outlook.com/
Tested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250515084848.727706-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The pointer arithmentic for accessing the tail tag only works
for linear skbs.
For nonlinear skbs, it reads uninitialized memory inside the
skb headroom, essentially randomizing the tag. I have observed
it gets set to 6 most of the time.
Example where ksz9477_rcv thinks that the packet from port 1 comes from port 6
(which does not exist for the ksz9896 that's in use), dropping the packet.
Debug prints added by me (not included in this patch):
[ 256.645337] ksz9477_rcv:323 tag0=6
[ 256.645349] skb len=47 headroom=78 headlen=0 tailroom=0
mac=(64,14) mac_len=14 net=(78,0) trans=78
shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=1 iif=3
priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0
encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
[ 256.645377] dev name=end1 feat=0x0002e10200114bb3
[ 256.645386] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645395] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645403] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645411] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 256.645420] skb headroom: 00000040: ff ff ff ff ff ff 00 1c 19 f2 e2 db 08 06
[ 256.645428] skb frag: 00000000: 00 01 08 00 06 04 00 01 00 1c 19 f2 e2 db 0a 02
[ 256.645436] skb frag: 00000010: 00 83 00 00 00 00 00 00 0a 02 a0 2f 00 00 00 00
[ 256.645444] skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
[ 256.645452] ksz_common_rcv:92 dsa_conduit_find_user returned NULL
Call skb_linearize before trying to access the tag.
This patch fixes ksz9477_rcv which is used by the ksz9896 I have at
hand, and also applies the same fix to ksz8795_rcv which seems to have
the same problem.
Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher@cherry.de>
CC: stable@vger.kernel.org
Fixes: 016e43a26bab ("net: dsa: ksz: Add KSZ8795 tag code")
Fixes: 8b8010fb7876 ("dsa: add support for Microchip KSZ tail tagging")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20250515072920.2313014-1-jakob.unterwurzacher@cherry.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzkaller reports the following issue:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
netdev_lock include/linux/netdevice.h:2751 [inline]
netdev_lock_ops include/net/netdev_lock.h:42 [inline]
dev_set_promiscuity+0x10e/0x260 net/core/dev_api.c:285
bond_set_promiscuity drivers/net/bonding/bond_main.c:922 [inline]
bond_change_rx_flags+0x219/0x690 drivers/net/bonding/bond_main.c:4732
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f5/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286
^^ all of the above is under rcu lock
team_change_rx_flags+0x1b3/0x330 drivers/net/team/team_core.c:1785
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f5/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286
hsr_del_port+0x25e/0x2d0 net/hsr/hsr_slave.c:233
hsr_netdev_notify+0x827/0xb60 net/hsr/hsr_main.c:104
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2214 [inline]
call_netdevice_notifiers net/core/dev.c:2228 [inline]
unregister_netdevice_many_notify+0x15d8/0x2330 net/core/dev.c:11970
rtnl_delete_link net/core/rtnetlink.c:3522 [inline]
rtnl_dellink+0x488/0x710 net/core/rtnetlink.c:3564
rtnetlink_rcv_msg+0x7cc/0xb70 net/core/rtnetlink.c:6955
netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x758/0x8d0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
team_change_rx_flags runs under rcu lock which means we can't grab
instance lock for the lower devices. Switch to team->lock, similar
to what we already do for team_set_mac_address and team_change_mtu.
Fixes: 78cd408356fe ("net: add missing instance lock to dev_set_promiscuity")
Reported-by: syzbot+53485086a41dbb43270a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=53485086a41dbb43270a
Link: https://lore.kernel.org/netdev/6822cc81.050a0220.f2294.00e8.GAE@google.com
Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com>
Link: https://patch.msgid.link/20250514220319.3505158-1-stfomichev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The debugfs summary output could access uninitialized elements in
the freq_in[] and signal_out[] arrays, causing NULL pointer
dereferences and triggering a kernel Oops (page_fault_oops).
This patch adds u8 fields (nr_freq_in, nr_signal_out) to track the
number of initialized elements, with a maximum of 4 per array.
The summary output functions are updated to respect these limits,
preventing out-of-bounds access and ensuring safe array handling.
Widen the label variables because the change confuses GCC about
max length of the strings.
Fixes: ef61f5528fca ("ptp: ocp: add Adva timecard support")
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250514073541.35817-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull NFS client bugfixes from Trond Myklebust:
- NFS: Fix a couple of missed handlers for the ENETDOWN and ENETUNREACH
transport errors
- NFS: Handle Oopsable failure of nfs_get_lock_context in the unlock
path
- NFSv4: Fix a race in nfs_local_open_fh()
- NFSv4/pNFS: Fix a couple of layout segment leaks in layoutreturn
- NFSv4/pNFS Avoid sharing pNFS DS connections between net namespaces
since IP addresses are not guaranteed to refer to the same nodes
- NFS: Don't flush file data while holding multiple directory locks in
nfs_rename()
* tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Avoid flushing data while holding directory locks in nfs_rename()
NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked()
NFSv4/pnfs: Reset the layout state after a layoutreturn
NFS/localio: Fix a race in nfs_local_open_fh()
nfs: nfs3acl: drop useless assignment in nfs3_get_acl()
nfs: direct: drop useless initializer in nfs_direct_write_completion()
nfs: move the nfs4_data_server_cache into struct nfs_net
nfs: don't share pNFS DS connections between net namespaces
nfs: handle failure of nfs_get_lock_context in unlock path
pNFS/flexfiles: Record the RPC errors in the I/O tracepoints
NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors
NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
|
|
The Linux client assumes that all filehandles are non-volatile for
renames within the same directory (otherwise sillyrename cannot work).
However, the existence of the Linux 'subtree_check' export option has
meant that nfs_rename() has always assumed it needs to flush writes
before attempting to rename.
Since NFSv4 does allow the client to query whether or not the server
exhibits this behaviour, and since knfsd does actually set the
appropriate flag when 'subtree_check' is enabled on an export, it
should be OK to optimise away the write flushing behaviour in the cases
where it is clearly not needed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
If there isn't a valid layout, or the layout stateid has changed, the
cleanup after a layout return should clear out the old data.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If there are still layout segments in the layout plh_return_lsegs list
after a layout return, we should be resetting the state to ensure they
eventually get returned as well.
Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
In an effort to move ARM64 away from the legacy MSI setup, convert the
Tegra PCIe driver to the MSI-parent infrastructure and let each device have
its own MSI domain.
[ tglx: Moved the struct out of the function call argument ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-10-maz@kernel.org
|
|
In an effort to move ARM64 away from the legacy MSI setup, convert the
XGENE PCIe driver to the MSI-parent infrastructure and let each device have
its own MSI domain.
[ tglx: Moved the struct out of the function call argument ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-9-maz@kernel.org
|
|
In an effort to move ARM64 away from the legacy MSI setup, convert the
Apple PCIe driver to the MSI-parent infrastructure and let each device have
its own MSI domain.
[ tglx: Moved the struct out of the function call argument ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://lore.kernel.org/all/20250513172819.2216709-8-maz@kernel.org
|
|
Bad MSI implementations multiplex MSIs onto a single downstream interrupt,
meaning they have no concept of individual affinity.
The old MSI code did a reasonable job at this by honouring the
MSI_FLAG_NO_AFFINITY, but the new shiny device MSI code doesn't.
Teach it about the sad reality of existing hardware.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-7-maz@kernel.org
|
|
Switch the MVEBU family of interrupt chip drivers over to the common helper
function to create the interrupt domains.
[ tglx: Moved the struct out of the function call argument and fix up
the of_node_to_fwnode() instances ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-5-maz@kernel.org
|
|
Switch the GIC family of interrupt chip drivers over to the common helper
function to create the interrupt domains.
[ tglx: Moved the struct out of the function call argument ]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-4-maz@kernel.org
|
|
Creating an irq domain that serves as an MSI parent requires
a substantial amount of esoteric boiler-plate code, some of
which is often provided twice (such as the bus token).
To make things a bit simpler for the unsuspecting MSI tinkerer,
provide a helper that does it for them, and serves as documentation
of what needs to be provided.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-3-maz@kernel.org
|
|
Move irq-msi-lib.h into include/linux/irqchip, making it available
to compilation units outside of drivers/irqchip.
This requires some churn in drivers to fetch it from the new location,
generated using this script:
git grep -l -w \"irq-msi-lib.h\" | \
xargs sed -i -e 's:"irq-msi-lib.h":\<linux/irqchip/irq-msi-lib.h\>:'
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250513172819.2216709-2-maz@kernel.org
|
|
Pick up the PCI changes to avoid conflicts.
|
|
Now that the file has been thrown through the mincer, finish the job and
consolidate the coding style.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
irqdomain.c's kernel-doc exists, but is not plugged into Documentation/
yet.
Before plugging it in, fix it first: irq_domain_get_irq_data() and
irq_domain_set_info() were documented twice. Identically, by both
definitions for CONFIG_IRQ_DOMAIN_HIERARCHY and !CONFIG_IRQ_DOMAIN_HIERARCHY.
Therefore, switch the second kernel-doc into an ordinary comment -- change
"/**" to simple "/*". This avoids sphinx's: WARNING: Duplicate C
declaration
Next, in commit b7b377332b96 ("irqdomain: Fix the kernel-doc and plug it
into Documentation"), irqdomain.h's (header) kernel-doc was added into
core-api/genericirq.rst. But given the amount of irqdomain functions and
structures, move all these to core-api/irq/irq-domain.rst now.
Finally, add these newly fixed irqdomain.c's (source) docs there as
well.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-58-jirislaby@kernel.org
|
|
The irqdomain documentaion became obsolete over time. Update and extend
it a bit with respect to the current code and HW.
Most notably the doubled documentation of irq_domain (from .rst and .h)
was unified and let only in .rst. A reference link was added to .h.
Furthermore:
* Add some 'struct' keywords, so that the respective structs are
hyperlinked
* :c:member: use where appropriate to mark a member of a struct
* Rephrase some wording to improve readability/understanding
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-57-jirislaby@kernel.org
|
|
The improvements include:
* Capitals in headlines.
* Add commas: for easier reading, it is always desired to add commas
at some places in text. Like before adverbs or after fronted
sentences.
* 3rd person -> add 's' to verbs.
* End some sentences with period and start a new one. Avoid thus heavy
sentences.
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-56-jirislaby@kernel.org
|
|
Just note in the docs:
1) A PCI device as an example for shared interrupts
2) A sparse tree can be used for interrupts too
3) i8259s which have 8 pins
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-55-jirislaby@kernel.org
|
|
For easier reading, it is always desired to add commas at some places in
text. Like before adverbs or after fronted sentences.
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-54-jirislaby@kernel.org
|
|
Many of irqdomain.h's functions are referenced in Documentation/ but are
not properly documented. Therefore, document these.
And use "Returns:" tag consistently, so that it is properly generated in
the resulting docs.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-53-jirislaby@kernel.org
|
|
This is just expressing it explicitly as irq_domain_instantiate() takes
const already. No functional change.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-52-jirislaby@kernel.org
|
|
This makes the code more compact.
Note that irq_domain_create_hierarchy()'s handling of size is now part of
info's initializer (using the ternary operator). That makes more sense
while reading it.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-51-jirislaby@kernel.org
|
|
irq_linear_revmap() is deprecated and unused now. So remove it.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-50-jirislaby@kernel.org
|
|
irq_linear_revmap() is deprecated, so remove all its uses and supersede
them by an identical call to irq_find_mapping().
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/all/20250319092951.37667-48-jirislaby@kernel.org
|
|
irq_linear_revmap() is deprecated, so remove all its uses and supersede
them by an identical call to irq_find_mapping().
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-47-jirislaby@kernel.org
|
|
irq_linear_revmap() is deprecated, so remove all its uses and supersede
them by an identical call to irq_find_mapping().
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-45-jirislaby@kernel.org
|
|
irq_linear_revmap() is deprecated, so remove all its uses and supersede
them by an identical call to irq_find_mapping().
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/all/20250319092951.37667-44-jirislaby@kernel.org
|
|
irq_linear_revmap() is deprecated, so remove all its uses and supersede
them by an identical call to irq_find_mapping().
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-43-jirislaby@kernel.org
|
|
irq_linear_revmap() is deprecated, so remove all its uses and supersede
them by an identical call to irq_find_mapping().
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> # for 8xx
Link: https://lore.kernel.org/all/20250319092951.37667-42-jirislaby@kernel.org
|
|
Most irq_domain_add_*() functions are unused now, so drop them. The
remaining ones are moved to the deprecated section and will be removed
during the merge window after the patches in various trees have been
merged.
Note: The Chinese docs are touched but unfinished. I cannot parse those.
[ tglx: Remove the leftover in irq-domain.rst and handle merge logistics ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-41-jirislaby@kernel.org
|
|
All irq_domain_add_*() functions are going away. PowerPC is the only
user of irq_domain_add_nomap() and there is no irq_domain_create_nomap()
complement.
Therefore, to align with the rest of the kernel, rename
irq_domain_add_nomap() to irq_domain_create_nomap() and accept a
fwnode_handle instead of a device_node.
[ tglx: Fix up subject prefix ]
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-40-jirislaby@kernel.org
|