Age | Commit message (Collapse) | Author |
|
Unlike the rsa driver, which separates signature decoding and
signature verification into two steps, the ecdsa driver does both in one.
This restricts users to the one signature format currently supported
(X9.62) and prevents addition of others such as P1363, which is needed
by the forthcoming SPDM library (Security Protocol and Data Model) for
PCI device authentication.
Per Herbert's suggestion, change ecdsa to use a "raw" signature encoding
and then implement X9.62 and P1363 as templates which convert their
respective encodings to the raw one. One may then specify
"x962(ecdsa-nist-XXX)" or "p1363(ecdsa-nist-XXX)" to pick the encoding.
The present commit moves X9.62 decoding to a template. A separate
commit is going to introduce another template for P1363 decoding.
The ecdsa driver internally represents a signature as two u64 arrays of
size ECC_MAX_BYTES. This appears to be the most natural choice for the
raw format as it can directly be used for verification without having to
further decode signature data or copy it around.
Repurpose all the existing test vectors for "x962(ecdsa-nist-XXX)" and
create a duplicate of them to test the raw encoding.
Link: https://lore.kernel.org/all/ZoHXyGwRzVvYkcTP@gondor.apana.org.au/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
If <linux/asn1_decoder.h> is the first header included from a .c file
(due to headers being sorted alphabetically), the compiler complains:
include/linux/asn1_decoder.h:18:29: error: unknown type name 'size_t'
Avoid by including <linux/types.h>.
Jonathan notes that the counterpart <linux/asn1_encoder.h> already
includes <linux/types.h>, but additionally includes the unnecessary
<linux/bug.h>. Drop it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The crypto_sig_*() API calls lived in sig.c so far because they needed
access to struct crypto_sig_type: This was necessary to differentiate
between signature algorithms that had already been migrated from
crypto_akcipher to crypto_sig and those that hadn't yet.
Now that all algorithms have been migrated, the API calls can become
static inlines in <crypto/sig.h> to mimic what <crypto/akcipher.h> is
doing.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
A sig_alg backend has just been introduced and all asymmetric
sign/verify algorithms have been migrated to it.
The sign/verify operations can thus be dropped from akcipher_alg.
It is now purely for asymmetric encrypt/decrypt.
Move struct crypto_akcipher_sync_data from internal.h to akcipher.c and
unexport crypto_akcipher_sync_{prep,post}(): They're no longer used by
sig.c but only locally in akcipher.c.
In crypto_akcipher_sync_{prep,post}(), drop various NULL pointer checks
for data->dst as they were only necessary for the verify operation.
In the crypto_sig_*() API calls, remove the forks that were necessary
while algorithms were converted from crypto_akcipher to crypto_sig
one by one.
In struct akcipher_testvec, remove the "params", "param_len" and "algo"
elements as they were only needed for the ecrdsa verify operation.
Remove corresponding dead code from test_akcipher_one() as well.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The virtio crypto driver exposes akcipher sign/verify operations in a
user space ABI. This blocks removal of sign/verify from akcipher_alg.
Herbert opines:
"I would say that this is something that we can break. Breaking it
is no different to running virtio on a host that does not support
these algorithms. After all, a software implementation must always
be present.
I deliberately left akcipher out of crypto_user because the API
is still in flux. We should not let virtio constrain ourselves."
https://lore.kernel.org/all/ZtqoNAgcnXnrYhZZ@gondor.apana.org.au/
"I would remove virtio akcipher support in its entirety. This API
was never meant to be exposed outside of the kernel."
https://lore.kernel.org/all/Ztqql_gqgZiMW8zz@gondor.apana.org.au/
Drop sign/verify support from virtio crypto. There's no strong reason
to also remove encrypt/decrypt support, so keep it.
A key selling point of virtio crypto is to allow guest access to crypto
accelerators on the host. So far the only akcipher algorithm supported
by virtio crypto is RSA. Dropping sign/verify merely means that the
PKCS#1 padding is now always generated or verified inside the guest,
but the actual signature generation/verification (which is an RSA
decrypt/encrypt operation) may still use an accelerator on the host.
Generating or verifying the PKCS#1 padding is cheap, so a hardware
accelerator won't be of much help there. Which begs the question
whether virtio crypto support for sign/verify makes sense at all.
It would make sense for the sign operation if the host has a security
chip to store asymmetric private keys. But the kernel doesn't even
have an asymmetric_key_subtype yet for hardware-based private keys.
There's at least one rudimentary driver for such chips (atmel-ecc.c for
ATECC508A), but it doesn't implement the sign operation. The kernel
would first have to grow support for a hardware asymmetric_key_subtype
and at least one driver implementing the sign operation before exposure
to guests via virtio makes sense.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
A sig_alg backend has just been introduced with the intent of moving all
asymmetric sign/verify algorithms to it one by one.
Migrate the sign/verify operations from rsa-pkcs1pad.c to a separate
rsassa-pkcs1.c which uses the new backend.
Consequently there are now two templates which build on the "rsa"
akcipher_alg:
* The existing "pkcs1pad" template, which is instantiated as an
akcipher_instance and retains the encrypt/decrypt operations of
RSAES-PKCS1-v1_5 (RFC 8017 sec 7.2).
* The new "pkcs1" template, which is instantiated as a sig_instance
and contains the sign/verify operations of RSASSA-PKCS1-v1_5
(RFC 8017 sec 8.2).
In a separate step, rsa-pkcs1pad.c could optionally be renamed to
rsaes-pkcs1.c for clarity. Additional "oaep" and "pss" templates
could be added for RSAES-OAEP and RSASSA-PSS.
Note that it's currently allowed to allocate a "pkcs1pad(rsa)" transform
without specifying a hash algorithm. That makes sense if the transform
is only used for encrypt/decrypt and continues to be supported. But for
sign/verify, such transforms previously did not insert the Full Hash
Prefix into the padding. The resulting message encoding was incompliant
with EMSA-PKCS1-v1_5 (RFC 8017 sec 9.2) and therefore nonsensical.
From here on in, it is no longer allowed to allocate a transform without
specifying a hash algorithm if the transform is used for sign/verify
operations. This simplifies the code because the insertion of the Full
Hash Prefix is no longer optional, so various "if (digest_info)" clauses
can be removed.
There has been a previous attempt to forbid transform allocation without
specifying a hash algorithm, namely by commit c0d20d22e0ad ("crypto:
rsa-pkcs1pad - Require hash to be present"). It had to be rolled back
with commit b3a8c8a5ebb5 ("crypto: rsa-pkcs1pad: Allow hash to be
optional [ver #2]"), presumably because it broke allocation of a
transform which was solely used for encrypt/decrypt, not sign/verify.
Avoid such breakage by allowing transform allocation for encrypt/decrypt
with and without specifying a hash algorithm (and simply ignoring the
hash algorithm in the former case).
So again, specifying a hash algorithm is now mandatory for sign/verify,
but optional and ignored for encrypt/decrypt.
The new sig_alg API uses kernel buffers instead of sglists, which
avoids the overhead of copying signature and digest from sglists back
into kernel buffers. rsassa-pkcs1.c is thus simplified quite a bit.
sig_alg is always synchronous, whereas the underlying "rsa" akcipher_alg
may be asynchronous. So await the result of the akcipher_alg, similar
to crypto_akcipher_sync_{en,de}crypt().
As part of the migration, rename "rsa_digest_info" to "hash_prefix" to
adhere to the spec language in RFC 9580. Otherwise keep the code
unmodified wherever possible to ease reviewing and bisecting. Leave
several simplification and hardening opportunities to separate commits.
rsassa-pkcs1.c uses modern __free() syntax for allocation of buffers
which need to be freed by kfree_sensitive(), hence a DEFINE_FREE()
clause for kfree_sensitive() is introduced herein as a byproduct.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
pkcs1pad_set_pub_key() and pkcs1pad_set_priv_key() are almost identical.
The upcoming migration of sign/verify operations from rsa-pkcs1pad.c
into a separate crypto_template will require another copy of the exact
same functions. When RSASSA-PSS and RSAES-OAEP are introduced, each
will need yet another copy.
Deduplicate the functions into a single one which lives in a common
header file for reuse by RSASSA-PKCS1-v1_5, RSASSA-PSS and RSAES-OAEP.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Commit 6cb8815f41a9 ("crypto: sig - Add interface for sign/verify")
began a transition of asymmetric sign/verify operations from
crypto_akcipher to a new crypto_sig frontend.
Internally, the crypto_sig frontend still uses akcipher_alg as backend,
however:
"The link between sig and akcipher is meant to be temporary. The
plan is to create a new low-level API for sig and then migrate
the signature code over to that from akcipher."
https://lore.kernel.org/r/ZrG6w9wsb-iiLZIF@gondor.apana.org.au/
"having a separate alg for sig is definitely where we want to
be since there is very little that the two types actually share."
https://lore.kernel.org/r/ZrHlpz4qnre0zWJO@gondor.apana.org.au/
Take the next step of that migration and augment the crypto_sig frontend
with a sig_alg backend to which all algorithms can be moved.
During the migration, there will briefly be signature algorithms that
are still based on crypto_akcipher, whilst others are already based on
crypto_sig. Allow for that by building a fork into crypto_sig_*() API
calls (i.e. crypto_sig_maxsize() and friends) such that one of the two
backends is selected based on the transform's cra_type.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ packet
scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
This patchs adds to FQ packet scheduler TCA_FQ_OFFLOAD_HORIZON
attribute.
Its value is capped by the device max_pacing_offload_horizon,
added in the prior patch.
It allows FQ to let packets within pacing offload horizon
to be delivered to the device, which will handle the needed
delay without host involvement.
Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241003121219.2396589-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some network devices have the ability to offload EDT (Earliest
Departure Time) which is the model used for TCP pacing and FQ
packet scheduler.
Some of them implement the timing wheel mechanism described in
https://saeed.github.io/files/carousel-sigcomm17.pdf
with an associated 'timing wheel horizon'.
This patch adds dev->max_pacing_offload_horizon expressing
this timing wheel horizon in nsec units.
This is a read-only attribute.
Unless a driver sets it, dev->max_pacing_offload_horizon
is zero.
v2: addressed Jakub feedback ( https://lore.kernel.org/netdev/20240930152304.472767-2-edumazet@google.com/T/#mf6294d714c41cc459962154cc2580ce3c9693663 )
v3: added yaml doc (also per Jakub feedback)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20241003121219.2396589-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kernel may crash when deleting a genetlink family if there are still
listeners for that family:
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
Call Trace:
__netlink_clear_multicast_users+0x74/0xc0
genl_unregister_family+0xd4/0x2d0
Change the unsafe loop on the list to a safe one, because inside the
loop there is an element removal from this list.
Fixes: b8273570f802 ("genetlink: fix netns vs. netlink table locking (2)")
Cc: stable@vger.kernel.org
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
delack timer is not stopped from inet_csk_clear_xmit_timer()
because we do not define INET_CSK_CLEAR_TIMERS.
This is a conscious choice : inet_csk_clear_xmit_timer()
is often called from another cpu. Calling del_timer()
would cause false sharing and lock contention.
This means that very often, tcp_delack_timer() is called
at the timer expiration, while there is no ACK to transmit.
This can be detected very early, avoiding the socket spinlock.
Notes:
- test about tp->compressed_ack is racy,
but in the unlikely case there is a race, the dedicated
compressed_ack_timer hrtimer would close it.
- Even if the fast path is not taken, reading
icsk->icsk_ack.pending and tp->compressed_ack
before acquiring the socket spinlock reduces
acquisition time and chances of contention.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241002173042.917928-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
icsk->icsk_pending can be read locklessly already.
Following patch in the series will add another lockless read.
Add smp_load_acquire() and smp_store_release() annotations
because following patch will add a test in tcp_write_timer(),
and READ_ONCE()/WRITE_ONCE() alone would possibly lead to races.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241002173042.917928-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one in the core and one in the
intel_pstate driver:
- Fix CPU device node reference counting in the cpufreq core (Miquel
Sabaté Solà)
- Turn the spinlock used by the intel_pstate driver in hard IRQ
context into a raw one to prevent the driver from crashing when
PREEMPT_RT is enabled (Uwe Kleine-König)"
* tag 'pm-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid a bad reference count on CPU node
cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
|
|
The last type of sockets which supports SOF_TIMESTAMPING_OPT_ID is RAW
sockets. To add new option this patch converts all callers (direct and
indirect) of _sock_tx_timestamp to provide sockcm_cookie instead of
tsflags. And while here fix __sock_tx_timestamp to receive tsflags as
__u32 instead of __u16.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241001125716.2832769-3-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg for UDP sockets.
The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://patch.msgid.link/20241001125716.2832769-2-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It no longer serves any purpose and is identical to mlx5_fc_create upon
which it was originally based of.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mlx5_fc_stats and mlx5_fc_pool structs are only used from
fs_counters.c. As such, make them private there.
mlx5_fc_pool is not used or referenced at all outside fs_counters.
mlx5_fc_stats is referenced from mlx5_core_dev, so instead of having it
as a direct member (which requires exporting it from fs_counters), store
a pointer to it, allocate it on init and clear it on destroy.
One caveat is that a simple container_of to get from a 'work' struct to
the outermost mlx5_core_dev struct directly no longer works, so an extra
pointer had to be added to mlx5_fc_stats back to the parent dev.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241001103709.58127-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Slightly high amount of changes in this round, partly because of my
vacation in the last weeks. But all changes are small and nothing
looks worrisome.
The biggest LOCs is MAINTAINERS updates, and there is a core change
for card-ID string creation for non-ASCII inputs. Others are rather
device-specific, such as new quirks and device IDs for ASoC, usual
HD-audio and USB-audio quirks and fixes, as well as regression fixes
in HD-audio HDMI audio and Conexant codec"
* tag 'sound-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin
ALSA: line6: add hw monitor volume control to POD HD500X
ALSA: gus: Fix some error handling paths related to get_bpos() usage
ALSA: hda: Add missing parameter description for snd_hdac_stream_timecounter_init()
ALSA: usb-audio: Add native DSD support for Luxman D-08u
ALSA: core: add isascii() check to card ID generator
MAINTAINERS: ALSA: use linux-sound@vger.kernel.org list
Revert "ALSA: hda: Conditionally use snooping for AMD HDMI"
ASoC: intel: sof_sdw: Add check devm_kasprintf() returned value
ASoC: imx-card: Set card.owner to avoid a warning calltrace if SND=m
ASoC: dt-bindings: davinci-mcasp: Fix interrupts property
ASoC: qcom: sm8250: add qrb4210-rb2-sndcard compatible string
ASoC: dt-bindings: qcom,sm8250: add qrb4210-rb2-sndcard
ALSA: hda: fix trigger_tstamp_latched
ALSA: hda/realtek: Add a quirk for HP Pavilion 15z-ec200
ALSA: hda/generic: Drop obsoleted obey_preferred_dacs flag
ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs
ALSA: silence integer wrapping warning
ASoC: Intel: soc-acpi: arl: Fix some missing empty terminators
ASoC: Intel: soc-acpi-intel-rpl-match: add missing empty item
...
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes, xe and amdgpu lead the way, with panthor, and few core
components getting various fixes. Nothing seems too out of the
ordinary.
atomic:
- Use correct type when reading damage rectangles
display:
- Fix kernel docs
dp-mst:
- Fix DSC decompression detection
hdmi:
- Fix infoframe size
sched:
- Update maintainers
- Fix race condition whne queueing up jobs
- Fix locking in drm_sched_entity_modify_sched()
- Fix pointer deref if entity queue changes
sysfb:
- Disable sysfb if framebuffer parent device is unknown
amdgpu:
- DML2 fix
- DSC fix
- Dispclk fix
- eDP HDR fix
- IPS fix
- TBT fix
i915:
- One fix for bitwise and logical "and" mixup in PM code
xe:
- Restore pci state on resume
- Fix locking on submission, queue and vm
- Fix UAF on queue destruction
- Fix resource release on freq init error path
- Use rw_semaphore to reduce contention on ASID->VM lookup
- Fix steering for media on Xe2_HPM
- Tuning updates to Xe2
- Resume TDR after GT reset to prevent jobs running forever
- Move id allocation to avoid userspace using a guessed number to
trigger UAF
- Fix OA stream close preventing pbatch buffers to complete
- Fix NPD when migrating memory on LNL
- Fix memory leak when aborting binds
panthor:
- Fix locking
- Set FOP_UNSIGNED_OFFSET in fops instance
- Acquire lock in panthor_vm_prepare_map_op_ctx()
- Avoid uninitialized variable in tick_ctx_cleanup()
- Do not block scheduler queue if work is pending
- Do not add write fences to the shared BOs
vbox:
- Fix VLA handling"
* tag 'drm-fixes-2024-10-04' of https://gitlab.freedesktop.org/drm/kernel: (41 commits)
drm/xe: Fix memory leak when aborting binds
drm/xe: Prevent null pointer access in xe_migrate_copy
drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close
drm/xe/queue: move xa_alloc to prevent UAF
drm/xe/vm: move xa_alloc to prevent UAF
drm/xe: Clean up VM / exec queue file lock usage.
drm/xe: Resume TDR after GT reset
drm/xe/xe2: Add performance tuning for L3 cache flushing
drm/xe/xe2: Extend performance tuning to media GT
drm/xe/mcr: Use Xe2_LPM steering tables for Xe2_HPM
drm/xe: Use helper for ASID -> VM in GPU faults and access counters
drm/xe: Convert to USM lock to rwsem
drm/xe: use devm_add_action_or_reset() helper
drm/xe: fix UAF around queue destruction
drm/xe/guc_submit: add missing locking in wedged_fini
drm/xe: Restore pci state upon resume
drm/amd/display: Fix system hang while resume with TBT monitor
drm/amd/display: Enable idle workqueue for more IPS modes
drm/amd/display: Add HDR workaround for specific eDP
drm/amd/display: avoid set dispclk to 0
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify fixes from Jan Kara:
"Fixes for an inotify deadlock and a data race in fsnotify"
* tag 'fsnotify_for_v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
inotify: Fix possible deadlock in fsnotify_destroy_mark
fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- in incremental send, fix invalid clone operation for file that got
its size decreased
- fix __counted_by() annotation of send path cache entries, we do not
store the terminating NUL
- fix a longstanding bug in relocation (and quite hard to hit by
chance), drop back reference cache that can get out of sync after
transaction commit
- wait for fixup worker kthread before finishing umount
- add missing raid-stripe-tree extent for NOCOW files, zoned mode
cannot have NOCOW files but RST is meant to be a standalone feature
- handle transaction start error during relocation, avoid potential
NULL pointer dereference of relocation control structure (reported by
syzbot)
- disable module-wide rate limiting of debug level messages
- minor fix to tracepoint definition (reported by checkpatch.pl)
* tag 'for-6.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: disable rate limiting when debug enabled
btrfs: wait for fixup workers before stopping cleaner kthread during umount
btrfs: fix a NULL pointer dereference when failed to start a new trasacntion
btrfs: send: fix invalid clone operation for file that got its size decreased
btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class
btrfs: drop the backref cache during relocation if we commit
btrfs: also add stripe entries for NOCOW writes
btrfs: send: fix buffer overflow detection when copying path to cache entry
|
|
Pull close_range() fix from Al Viro:
"Fix the logic in descriptor table trimming"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
close_range(): fix the logics in descriptor table trimming
|
|
We already have the possibility to force not binding to hid-generic and
rely on a dedicated driver, but we couldn't do the other way around.
This is useful for BPF programs where we are fixing the report descriptor
and the events, but want to avoid a specialized driver to come after BPF
which would unwind everything that is done there.
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Link: https://patch.msgid.link/20241001-hid-bpf-hid-generic-v3-8-2ef1019468df@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
Currently, hid_bpf_rdesc_fixup() is called once the match between the
HID device and the driver is done. This can be problematic in case
the driver selected by the kernel would change the report descriptor
after the fact.
To give a chance for hid_bpf_rdesc_fixup() to provide hints on to how
to select a dedicated driver or not, move the call to that BPF hook
earlier in the .probe() process, when we get the first match.
However, this means that we might get called more than once (typically
once for hid-generic, and once for hid-vendor-specific). So we store the
result of HID-BPF fixup in struct hid_device. Basically, this means that
->bpf_rdesc can replace ->dev_rdesc when it was used in the code.
In order to not grow struct hid_device, some fields are re-ordered. This
was the output of pahole for the first 128 bytes:
struct hid_device {
__u8 * dev_rdesc; /* 0 8 */
unsigned int dev_rsize; /* 8 4 */
/* XXX 4 bytes hole, try to pack */
__u8 * rdesc; /* 16 8 */
unsigned int rsize; /* 24 4 */
/* XXX 4 bytes hole, try to pack */
struct hid_collection * collection; /* 32 8 */
unsigned int collection_size; /* 40 4 */
unsigned int maxcollection; /* 44 4 */
unsigned int maxapplication; /* 48 4 */
__u16 bus; /* 52 2 */
__u16 group; /* 54 2 */
__u32 vendor; /* 56 4 */
__u32 product; /* 60 4 */
/* --- cacheline 1 boundary (64 bytes) --- */
__u32 version; /* 64 4 */
enum hid_type type; /* 68 4 */
unsigned int country; /* 72 4 */
/* XXX 4 bytes hole, try to pack */
struct hid_report_enum report_enum[3]; /* 80 6216 */
Basically, we got three holes of 4 bytes. We can reorder things a little
and makes those 3 holes a continuous 12 bytes hole, which can be replaced
by the new pointer and the new unsigned int we need.
In terms of code allocation, when not using HID-BPF, we are back to kernel
v6.2 in hid_open_report(). These multiple kmemdup() calls will be fixed
in a later commit.
Link: https://patch.msgid.link/20241001-hid-bpf-hid-generic-v3-1-2ef1019468df@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
Currently, the imx deivice controller has below limitations:
1. can't generate short packet interrupt if IOC not set in dTD. So if one
request span more than one dTDs and only the last dTD set IOC, the usb
request will pending there if no more data comes.
2. the controller can't accurately deliver data to differtent usb requests
in some cases due to short packet. For example: one usb request span 3
dTDs, then if the controller received a short packet the next packet
will go to 2nd dTD of current request rather than the first dTD of next
request.
3. can't build a bus packet use multiple dTDs. For example: controller
needs to send one packet of 512 bytes use dTD1 (200 bytes) + dTD2
(312 bytes), actually the host side will see 200 bytes short packet.
Based on these limits, add CI_HDRC_HAS_SHORT_PKT_LIMIT flag and use it on
imx platforms.
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20240923081203.2851768-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
alloc_ltalkdev in net/appletalk/dev.c is dead since
commit 00f3696f7555 ("net: appletalk: remove cops support")
Removing it (and it's helper) leaves dev.c and if_ltalk.h empty;
remove them and the Makefile entry.
tun.c was including that if_ltalk.h but actually wanted
the uapi version for LTALK_ALEN, fix up the path.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Provide a new register type NT_ARM_GCS reporting the current GCS mode
and pointer for EL0. Due to the interactions with allocation and
deallocation of Guarded Control Stacks we do not permit any changes to
the GCS mode via ptrace, only GCSPR_EL0 may be changed.
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-27-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Use VM_HIGH_ARCH_5 for guarded control stack pages.
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-14-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
In preparation for adding arm64 GCS support make the map_shadow_stack()
SHADOW_STACK_SET_TOKEN flag generic and add _SET_MARKER. The existing
flag indicates that a token usable for stack switch should be added to
the top of the newly mapped GCS region while the new flag indicates that
a top of stack marker suitable for use by unwinders should be added
above that.
For arm64 the top of stack marker is all bits 0.
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-5-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Three architectures (x86, aarch64, riscv) have announced support for
shadow stacks with fairly similar functionality. While x86 is using
arch_prctl() to control the functionality neither arm64 nor riscv uses
that interface so this patch adds arch-agnostic prctl() support to
get and set status of shadow stacks and lock the current configuation to
prevent further changes, with support for turning on and off individual
subfeatures so applications can limit their exposure to features that
they do not need. The features are:
- PR_SHADOW_STACK_ENABLE: Tracking and enforcement of shadow stacks,
including allocation of a shadow stack if one is not already
allocated.
- PR_SHADOW_STACK_WRITE: Writes to specific addresses in the shadow
stack.
- PR_SHADOW_STACK_PUSH: Push additional values onto the shadow stack.
These features are expected to be inherited by new threads and cleared
on exec(), unknown features should be rejected for enable but accepted
for locking (in order to allow for future proofing).
This is based on a patch originally written by Deepak Gupta but modified
fairly heavily, support for indirect landing pads is removed, additional
modes added and the locking interface reworked. The set status prctl()
is also reworked to just set flags, if setting/reading the shadow stack
pointer is required this could be a separate prctl.
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Yury Khrustalev <yury.khrustalev@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-4-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The addition of protection keys means that on arm64 we now use all of the
currently defined VM_HIGH_ARCH_x bits. In order to allow us to allocate a
new flag for GCS pages define VM_HIGH_ARCH_6.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-2-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Backmerging to resolve a conflict with core locally.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The only user of hid_snto32() is Logitech HID++ driver, which always
calls hid_snto32() with valid size (constant, either 12 or 8) and
therefore can simply use sign_extend32().
Make the switch and remove hid_snto32(). Move snto32() and s32ton() to
avoid introducing forward declaration.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://patch.msgid.link/20241003144656.3786064-2-dmitry.torokhov@gmail.com
[bentiss: fix checkpatch warning]
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
serio_pause_rx() and serio_continue_rx() are usually used together to
temporarily stop receiving interrupts/data for a given serio port.
Define "serio_pause_rx" guard for this so that the port is always
resumed once critical section is over.
Example:
scoped_guard(serio_pause_rx, elo->serio) {
elo->expected_packet = toupper(packet[0]);
init_completion(&elo->cmd_done);
}
Link: https://lore.kernel.org/r/20240905041732.2034348-2-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Use the folio that we already have to call folio_test_private_2()
instead. This is the last call to PagePrivate2(), so replace its
PAGEFLAG() definition with FOLIO_FLAG().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20241002040111.1023018-6-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
All callers have now been converted to the folio APIs, so remove
the page API for this flag.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20241002040111.1023018-4-willy@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make the role of the structure members related to UIC command processing
more clear.
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240912223019.3510966-2-bvanassche@acm.org
Reviewed-by: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In order to allow pahole add btf_decl_tag("bpf_fastcall") for kfuncs
supporting bpf_fastcall, mark such functions with KF_FASTCALL in
id_set8 objects.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240916091712.2929279-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since [1] kernel supports __bpf_fastcall attribute for helper function
bpf_get_smp_processor_id(). Update uapi definition for this helper in
order to have this attribute in the generated bpf_helper_defs.h
[1] commit 91b7fbf3936f ("bpf, x86, riscv, arm: no_caller_saved_registers for bpf_get_smp_processor_id()")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240916091712.2929279-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pass a dscp_t variable to ip_route_input_noref(), instead of a plain
u8, to prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input_noref() to consider are:
* arp_process() in net/ipv4/arp.c. This function sets the tos
parameter to 0, which is already a valid dscp_t value, so it
doesn't need to be adjusted for the new prototype.
* ip_route_input(), which already has a dscp_t variable to pass as
parameter. We just need to remove the inet_dscp_to_dsfield()
conversion.
* ipvlan_l3_rcv(), bpf_lwt_input_reroute(), ip_expire(),
ip_rcv_finish_core(), xfrm4_rcv_encap_finish() and
xfrm4_rcv_encap(), which get the DSCP directly from IPv4 headers
and can simply use the ip4h_dscp() helper.
While there, declare the IPv4 header pointers as const in
ipvlan_l3_rcv() and bpf_lwt_input_reroute().
Also, modify the declaration of ip_route_input_noref() in
include/net/route.h so that it matches the prototype of its
implementation in net/ipv4/route.c.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/a8a747bed452519c4d0cc06af32c7e7795d7b627.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pass a dscp_t variable to ip_route_input(), instead of a plain u8, to
prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input() to consider are:
* input_action_end_dx4_finish() and input_action_end_dt4() in
net/ipv6/seg6_local.c. These functions set the tos parameter to 0,
which is already a valid dscp_t value, so they don't need to be
adjusted for the new prototype.
* icmp_route_lookup(), which already has a dscp_t variable to pass as
parameter. We just need to remove the inet_dscp_to_dsfield()
conversion.
* br_nf_pre_routing_finish(), ip_options_rcv_srr() and ip4ip6_err(),
which get the DSCP directly from IPv4 headers. Define a helper to
read the .tos field of struct iphdr as dscp_t, so that these
function don't have to do the conversion manually.
While there, declare *iph as const in br_nf_pre_routing_finish(),
declare its local variables in reverse-christmas-tree order and move
the "err = ip_route_input()" assignment out of the conditional to avoid
checkpatch warning.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/e9d40781d64d3d69f4c79ac8a008b8d67a033e8d.1727807926.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
packing() is now used in some hot paths, and it would be good to get rid
of some ifs and buts that depend on "op", to speed things up a little bit.
With the main implementations now taking size_t endbit, we no longer
have to check for negative values. Update the local integer variables to
also be size_t to match.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-5-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Geert Uytterhoeven described packing() as "really bad API" because of
not being able to enforce const correctness. The same function is used
both when "pbuf" is input and "uval" is output, as in the other way
around.
Create 2 wrapper functions where const correctness can be ensured.
Do ugly type casts inside, to be able to reuse packing() as currently
implemented - which will _not_ modify the input argument.
Also, take the opportunity to change the type of startbit and endbit to
size_t - an unsigned type - in these new function prototypes. When int,
an extra check for negative values is necessary. Hopefully, when
packing() goes away completely, that check can be dropped.
My concern is that code which does rely on the conditional directionality
of packing() is harder to refactor without blowing up in size. So it may
take a while to completely eliminate packing(). But let's make alternatives
available for those who do not need that.
Link: https://lore.kernel.org/netdev/20210223112003.2223332-1-geert+renesas@glider.be/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-4-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It is not necessary to have the kernel-doc duplicated both in the
header and in the implementation. It is better to have it near the
implementation of the function, since in C, a function can have N
declarations, but only one definition.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241002-packing-kunit-tests-and-split-pack-unpack-v2-3-8373e551eae3@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some userspace changes are necessary so add a flag for userspace to
advertise support for preemption when creating the submitqueue.
When this flag is not set preemption will not be allowed in the middle
of the submitted IBs therefore mantaining compatibility with older
userspace.
The flag is rejected if preemption is not supported on the target, this
allows userspace to know whether preemption is supported.
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/618028/
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Add nfs_to_nfsd_file_put_local() interface to fix race with nfsd
module unload. Similarly, use RCU around nfs_open_local_fh()'s error
path call to nfs_to->nfsd_serv_put(). Holding RCU ensures that NFS
will safely _call and return_ from its nfs_to calls into the NFSD
functions nfsd_file_put_local() and nfsd_serv_put().
Otherwise, if RCU isn't used then there is a narrow window when NFS's
reference for the nfsd_file and nfsd_serv are dropped and the NFSD
module could be unloaded, which could result in a crash from the
return instruction for either nfs_to->nfsd_file_put_local() or
nfs_to->nfsd_serv_put().
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:
[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701] ESR = 0x0000000096000007
[232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084] SET = 0, FnV = 0
[232066.589216] EA = 0, S1PTW = 0
[232066.589340] FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683] ISV = 0, ISS = 0x00000007
[232066.589842] CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690] kthread+0x110/0x114
[232066.602830] ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!
Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp"
#0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
#1 [ffff8001116ef760] __schedule at ffff800008dd0650
#2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
#3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
#4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
#5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
#6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
#7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
#8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
#9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]
The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.
Fixes: 0e65a32c8a56 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts and no adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from ieee802154, bluetooth and netfilter.
Current release - regressions:
- eth: mlx5: fix wrong reserved field in hca_cap_2 in mlx5_ifc
- eth: am65-cpsw: fix forever loop in cleanup code
Current release - new code bugs:
- eth: mlx5: HWS, fixed double-free in error flow of creating SQ
Previous releases - regressions:
- core: avoid potential underflow in qdisc_pkt_len_init() with UFO
- core: test for not too small csum_start in virtio_net_hdr_to_skb()
- vrf: revert "vrf: remove unnecessary RCU-bh critical section"
- bluetooth:
- fix uaf in l2cap_connect
- fix possible crash on mgmt_index_removed
- dsa: improve shutdown sequence
- eth: mlx5e: SHAMPO, fix overflow of hd_per_wq
- eth: ip_gre: fix drops of small packets in ipgre_xmit
Previous releases - always broken:
- core: fix gso_features_check to check for both
dev->gso_{ipv4_,}max_size
- core: fix tcp fraglist segmentation after pull from frag_list
- netfilter: nf_tables: prevent nf_skb_duplicated corruption
- sctp: set sk_state back to CLOSED if autobind fails in
sctp_listen_start
- mac802154: fix potential RCU dereference issue in
mac802154_scan_worker
- eth: fec: restart PPS after link state change"
* tag 'net-6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits)
sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
dt-bindings: net: xlnx,axi-ethernet: Add missing reg minItems
doc: net: napi: Update documentation for napi_schedule_irqoff
net/ncsi: Disable the ncsi work before freeing the associated structure
net: phy: qt2025: Fix warning: unused import DeviceId
gso: fix udp gso fraglist segmentation after pull from frag_list
bridge: mcast: Fail MDB get request on empty entry
vrf: revert "vrf: Remove unnecessary RCU-bh critical section"
net: ethernet: ti: am65-cpsw: Fix forever loop in cleanup code
net: phy: realtek: Check the index value in led_hw_control_get
ppp: do not assume bh is held in ppp_channel_bridge_input()
selftests: rds: move include.sh to TEST_FILES
net: test for not too small csum_start in virtio_net_hdr_to_skb()
net: gso: fix tcp fraglist segmentation after pull from frag_list
ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check
net: add more sanity checks to qdisc_pkt_len_init()
net: avoid potential underflow in qdisc_pkt_len_init() with UFO
net: ethernet: ti: cpsw_ale: Fix warning on some platforms
net: microchip: Make FDMA config symbol invisible
...
|