Age | Commit message (Collapse) | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The bch2_subvolume_get_snapshot() call needs to happen before the dirent
lookup - the dirent is in the parent subvolume.
Also, check for loops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
kthread creation checks for pending signals, which is _very_ annoying if
we have to do a long recovery and don't go rw until we've done
significant work.
Check if we'll be going rw and pre-allocate kthreads/workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Print out more info when we find a key (extent, dirent, xattr) for a
missing inode - was there a good inode in an older snapshot, full(ish)
list of keys for that missing inode, so we can make better decisions on
how to repair.
If it looks like it should've been deleted, autofix it. If we ever hit
the non-autofix cases, we'll want to write more repair code (possibly
reconstituting the inode).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
After cd3cdb1ef706 ("Single err message for btree node reads"),
all errors caused __btree_err to return -BCH_ERR_fsck_fix no matter what
the actual error type was if the recovery pass was scanning for btree
nodes. This lead to the code continuing despite things like bad node
formats when they earlier would have caused a jump to fsck_err, because
btree_err only jumps when the return from __btree_err does not match
fsck_fix. Ultimately this lead to undefined behavior by attempting to
unpack a key based on an invalid format.
Make only errors of type -BCH_ERR_btree_node_read_err_fixable cause
__btree_err to return -BCH_ERR_fsck_fix when scanning for btree nodes.
Reported-by: syzbot+cfd994b9cdf00446fd54@syzkaller.appspotmail.com
Fixes: cd3cdb1ef706 ("bcachefs: Single err message for btree node reads")
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree_interior_update_pool has not been initialized before the
filesystem becomes read-write, thus mempool_alloc in bch2_btree_update_start
will trigger pool->alloc NULL pointer dereference in mempool_alloc_noprof
Reported-by: syzbot+2f3859bd28f20fa682e6@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In syzbot's crash, the bset's u64s is larger than the btree node.
Reported-by: syzbot+bfaeaa8e26281970158d@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Dead code cleanup.
Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a mount option for rewinding the journal, bringing the entire
filesystem to where it was at a previous point in time.
This is for extreme disaster recovery scenarios - it's not intended as
an undelete operation.
The option takes a journal sequence number; the desired sequence number
can be determined with 'bcachefs list_journal'
Caveats:
- The 'journal_transaction_names' option must have been enabled (it's on
by default). The option controls emitting of extra debug info in the
journal, so we can see what individual transactions were doing;
It also enables journalling of keys being overwritten, which is what
we rely on here.
- A full fsck run will be automatically triggered since alloc info will
be inconsistent. Only leaf node updates to non-alloc btrees are
rewound, since rewinding interior btree updates isn't possible or
desirable.
- We can't do anything about data that was deleted and overwritten.
Lots of metadata updates after the point in time we're rewinding to
shouldn't cause a problem, since we segragate data and metadata
allocations (this is in order to make repair by btree node scan
practical on larger filesystems; there's a small 64-bit per device
bitmap in the superblock of device ranges with btree nodes, and we try
to keep this small).
However, having discards enabled will cause problems, since buckets
are discarded as soon as they become empty (this is why we don't
implement fstrim: we don't need it).
Hopefully, this feature will be a one-off thing that's never used
again: this was implemented for recovering from the "vfs i_nlink 0 ->
subvol deletion" bug, and that bug was unusually disastrous and
additional safeguards have since been implemented.
But if it does turn out that we need this more in the future, I'll
have to implement an option so that empty buckets aren't discarded
immediately - lagging by perhaps 1% of device capacity.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We should count the terminating NUL byte as part of the ctx_len.
Otherwise, UBSAN logs a warning:
UBSAN: array-index-out-of-bounds in security/selinux/xfrm.c:99:14
index 60 is out of range for type 'char [*]'
The allocation itself is correct so there is no actual out of bounds
indexing, just a warning.
Cc: stable@vger.kernel.org
Suggested-by: Christian Göttsche <cgzones@googlemail.com>
Link: https://lore.kernel.org/selinux/CAEjxPJ6tA5+LxsGfOJokzdPeRomBHjKLBVR6zbrg+_w3ZZbM3A@mail.gmail.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
The following kernel Oops was recently reported by Mesa CI:
[ 800.139824] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000588
[ 800.148619] Mem abort info:
[ 800.151402] ESR = 0x0000000096000005
[ 800.155141] EC = 0x25: DABT (current EL), IL = 32 bits
[ 800.160444] SET = 0, FnV = 0
[ 800.163488] EA = 0, S1PTW = 0
[ 800.166619] FSC = 0x05: level 1 translation fault
[ 800.171487] Data abort info:
[ 800.174357] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 800.179832] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 800.184873] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001014c2000
[ 800.196607] [0000000000000588] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 800.205305] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[ 800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight
[ 800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1 Debian 1:6.12.25-1+rpt1
[ 800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 800.250005] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d]
[ 800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d]
[ 800.267251] sp : ffffffc080003e60
[ 800.270555] x29: ffffffc080003e60 x28: ffffffd842784980 x27: 0224012000000000
[ 800.277687] x26: ffffffd84277f630 x25: ffffff81012fd800 x24: 0000000000000020
[ 800.284818] x23: ffffff8040238b08 x22: 0000000000000570 x21: 0000000000000158
[ 800.291948] x20: 0000000000000000 x19: ffffff8040238000 x18: 0000000000000000
[ 800.299078] x17: ffffffa8c1bd2000 x16: ffffffc080000000 x15: 0000000000000000
[ 800.306208] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 800.313338] x11: 0000000000000040 x10: 0000000000001a40 x9 : ffffffd83b39757c
[ 800.320468] x8 : ffffffd842786420 x7 : 7fffffffffffffff x6 : 0000000000ef32b0
[ 800.327598] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : ffffffd842784980
[ 800.334728] x2 : 0000000000000004 x1 : 0000000000010002 x0 : 000000ba4c0ca382
[ 800.341859] Call trace:
[ 800.344294] v3d_job_update_stats+0x60/0x130 [v3d]
[ 800.349086] v3d_irq+0x124/0x2e0 [v3d]
[ 800.352835] __handle_irq_event_percpu+0x58/0x218
[ 800.357539] handle_irq_event+0x54/0xb8
[ 800.361369] handle_fasteoi_irq+0xac/0x240
[ 800.365458] handle_irq_desc+0x48/0x68
[ 800.369200] generic_handle_domain_irq+0x24/0x38
[ 800.373810] gic_handle_irq+0x48/0xd8
[ 800.377464] call_on_irq_stack+0x24/0x58
[ 800.381379] do_interrupt_handler+0x88/0x98
[ 800.385554] el1_interrupt+0x34/0x68
[ 800.389123] el1h_64_irq_handler+0x18/0x28
[ 800.393211] el1h_64_irq+0x64/0x68
[ 800.396603] default_idle_call+0x3c/0x168
[ 800.400606] do_idle+0x1fc/0x230
[ 800.403827] cpu_startup_entry+0x40/0x50
[ 800.407742] rest_init+0xe4/0xf0
[ 800.410962] start_kernel+0x5e8/0x790
[ 800.414616] __primary_switched+0x80/0x90
[ 800.418622] Code: 8b170277 8b160296 11000421 b9000861 (b9401ac1)
[ 800.424707] ---[ end trace 0000000000000000 ]---
[ 800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
This issue happens when the file descriptor is closed before the jobs
submitted by it are completed. When the job completes, we update the
global GPU stats and the per-fd GPU stats, which are exposed through
fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv`
and its stats were already freed and we can't update the per-fd stats.
Therefore, if the file descriptor was already closed, don't update the
per-fd GPU stats, only update the global ones.
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Link: https://lore.kernel.org/r/20250602151451.10161-1-mcanal@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
|
|
Ido Schimmel says:
====================
seg6: Allow End.X behavior to accept an oif
Patches #1-#3 gradually extend the End.X behavior to accept an output
interface as an optional argument. This is needed for cases where user
space wishes to specify an IPv6 link-local address as the nexthop
address.
Patch #4 adds test cases to the existing End.X selftest to cover the new
functionality.
====================
Link: https://patch.msgid.link/20250612122323.584113-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the current test topology, all the routers are connected to each
other via dedicated links with addresses of the form fcf0:0:x:y::/64.
The test configures rt-3 with an adjacency with rt-4 and rt-4 with an
adjacency with rt-1:
# ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
fcbb:0:300::/48 encap seg6local action End.X nh6 fcf0:0:3:4::4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
# ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
fcbb:0:400::/48 encap seg6local action End.X nh6 fcf0:0:1:4::1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
The routes are used when pinging hs-2 from hs-1 and vice-versa.
Extend the test to also cover End.X behavior with an IPv6 link-local
nexthop address and an output interface. Configure every router
interface with an IPv6 link-local address of the form fe80::x:y/64 and
before re-running the ping tests, replace the previous End.X routes with
routes that use the new IPv6 link-local addresses:
# ip -n rt_3-IgWSBJ -6 route show tab 90 fcbb:0:300::/48
fcbb:0:300::/48 encap seg6local action End.X nh6 fe80::4:3 oif veth-rt-3-4 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
# ip -n rt_4-JdCunK -6 route show tab 90 fcbb:0:400::/48
fcbb:0:400::/48 encap seg6local action End.X nh6 fe80::1:4 oif veth-rt-4-1 flavors next-csid lblen 32 nflen 16 dev dum0 metric 1024 pref medium
The new test cases fail without the previous patch ("seg6: Allow End.X
behavior to accept an oif"):
# ./srv6_end_x_next_csid_l3vpn_test.sh
[...]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv6), link-local
################################################################################
TEST: IPv6 Hosts connectivity: hs-1 -> hs-2 [FAIL]
TEST: IPv6 Hosts connectivity: hs-2 -> hs-1 [FAIL]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv4), link-local
################################################################################
TEST: IPv4 Hosts connectivity: hs-1 -> hs-2 [FAIL]
TEST: IPv4 Hosts connectivity: hs-2 -> hs-1 [FAIL]
Tests passed: 40
Tests failed: 4
And pass with it:
# ./srv6_end_x_next_csid_l3vpn_test.sh
[...]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv6), link-local
################################################################################
TEST: IPv6 Hosts connectivity: hs-1 -> hs-2 [ OK ]
TEST: IPv6 Hosts connectivity: hs-2 -> hs-1 [ OK ]
################################################################################
TEST SECTION: SRv6 VPN connectivity test hosts (h1 <-> h2, IPv4), link-local
################################################################################
TEST: IPv4 Hosts connectivity: hs-1 -> hs-2 [ OK ]
TEST: IPv4 Hosts connectivity: hs-2 -> hs-1 [ OK ]
Tests passed: 44
Tests failed: 0
Without the previous patch, rt-3 and rt-4 resolve the wrong routes for
the link-local nexthops, with the output interface being the input
interface:
# perf script
[...]
ping 1067 [001] 37.554486: fib6:fib6_table_lookup: table 254 oif 0 iif 11 proto 41 cafe::254/0 -> fe80::4:3/0 flowlabel 0xb7973 tos 0 scope 0 flags 2 ==> dev veth-rt-3-1 gw :: err 0
[...]
ping 1069 [002] 41.573360: fib6:fib6_table_lookup: table 254 oif 0 iif 12 proto 41 cafe::254/0 -> fe80::1:4/0 flowlabel 0xb7973 tos 0 scope 0 flags 2 ==> dev veth-rt-4-2 gw :: err 0
But the correct routes are resolved with the patch:
# perf script
[...]
ping 1066 [006] 30.672355: fib6:fib6_table_lookup: table 254 oif 13 iif 1 proto 41 cafe::254/0 -> fe80::4:3/0 flowlabel 0x85941 tos 0 scope 0 flags 6 ==> dev veth-rt-3-4 gw :: err 0
[...]
ping 1066 [006] 30.672411: fib6:fib6_table_lookup: table 254 oif 11 iif 1 proto 41 cafe::254/0 -> fe80::1:4/0 flowlabel 0x91de0 tos 0 scope 0 flags 6 ==> dev veth-rt-4-1 gw :: err 0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend the End.X behavior to accept an output interface as an optional
attribute and make use of it when resolving a route. This is needed when
user space wants to use a link-local address as the nexthop address.
Before:
# ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6
# ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6
$ ip -6 route show
2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 dev sr6 metric 1024 pref medium
2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium
After:
# ip route add 2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6
# ip route add 2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6
$ ip -6 route show
2001:db8:1::/64 encap seg6local action End.X nh6 fe80::1 oif eth0 dev sr6 metric 1024 pref medium
2001:db8:2::/64 encap seg6local action End.X nh6 2001:db8:10::1 dev sr6 metric 1024 pref medium
Note that the oif attribute is not dumped to user space when it was not
specified (as an oif of 0) since each entry keeps track of the optional
attributes that it parsed during configuration (see struct
seg6_local_lwt::parsed_optattrs).
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
seg6_lookup_nexthop() is a wrapper around seg6_lookup_any_nexthop().
Change End.X behavior to invoke seg6_lookup_any_nexthop() directly so
that we would not need to expose the new output interface argument
outside of the seg6local module.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
seg6_lookup_any_nexthop() is called by the different endpoint behaviors
(e.g., End, End.X) to resolve an IPv6 route. Extend the function with an
output interface argument so that it could be used to resolve a route
with a certain output interface. This will be used by subsequent patches
that will extend the End.X behavior with an output interface as an
optional argument.
ip6_route_input_lookup() cannot be used when an output interface is
specified as it ignores this parameter. Similarly, calling
ip6_pol_route() when a table ID was not specified (e.g., End.X behavior)
is wrong.
Therefore, when an output interface is specified without a table ID,
resolve the route using ip6_route_output() which will take the output
interface into account.
Note that no endpoint behavior currently passes both a table ID and an
output interface, so the oif argument passed to ip6_pol_route() is
always zero and there are no functional changes in this regard.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250612122323.584113-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ziwei Xiao says:
====================
gve: Add Rx HW timestamping support
This patch series add the support of Rx HW timestamping, which sends
adminq commands periodically to the device for clock synchronization with
the NIC.
The ability to read the PHC from user space will be added in the
future patch series when adding the actual PTP support. For this patch
series, it's adding the initial ptp to utilize the ptp_schedule_worker
to schedule the work of syncing the NIC clock.
====================
Link: https://patch.msgid.link/20250614000754.164827-1-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Expand the get_ts_info ethtool handler with the new gve_get_ts_info
which advertises support for rx hardware timestamping.
With this patch, the driver now fully supports rx hardware timestamping.
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-9-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement ndo_hwtstamp_get/set to enable hardware RX timestamping,
providing support for SIOC[SG]HWTSTAMP IOCTLs. Included with this support
is the small change necessary to read the rx timestamp out of the rx
descriptor, now that timestamps start being enabled. The gve clock is
only used for hardware timestamps, so started when timestamps are
requested and stopped when not needed.
This version only supports RX hardware timestamping with the rx filter
HWTSTAMP_FILTER_ALL. If the user attempts to configure a more
restrictive filter, the filter will be set to HWTSTAMP_FILTER_ALL in the
returned structure.
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-8-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow the rx path to recover the high 32 bits of the full 64 bit rx
timestamp.
Use the low 32 bits of the last synced nic time and the 32 bits of the
timestamp provided in the rx descriptor to generate a difference, which
is then applied to the last synced nic time to reconstruct the complete
64-bit timestamp.
This scheme remains accurate as long as no more than ~2 seconds have
passed between the last read of the nic clock and the timestamping
application of the received packet.
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-7-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Query the nic clock and store the results. The timestamp delivered
in descriptors has a wraparound time of ~4 seconds so 250ms is chosen
as the sync cadence to provide a balance between performance, and
drift potential when we do start associating host time and nic time.
Leverage PTP's aux_work to query the nic clock periodically.
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Tim Hostetler <thostet@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-6-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adminq commands for queues creation and destruction were not
consistently protected by the driver's adminq_lock. This was previously
benign as these operations were always initiated from contexts holding
kernel-level locks (e.g., rtnl_lock, netdev_lock), which provided
serialization.
Upcoming PTP aux_work will issue adminq commands directly from the
driver to read the NIC clock, without such kernel lock protection.
To prevent race conditions with this new PTP work, this patch ensures
the adminq_lock is held during queues creation and destruction.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-5-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the device supports reading of the nic clock, add support
to initialize and register the PTP clock.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-4-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an adminq command to read NIC's hardware clock. The driver
allocates dma memory and passes that dma memory address to the device.
The device then writes the clock to the given address.
Signed-off-by: Jeff Rogers <jefrogers@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-3-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the device option and negotiation with the device for clock
synchronization with the nic. This option is necessary before the driver
will advertise support for hardware timestamping or other related
features.
Signed-off-by: Jeff Rogers <jefrogers@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250614000754.164827-2-hramamurthy@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To collaborate with hardware servicing events, upon receiving the special
EQE notification from the HW channel, remove the devices on this bus.
Then, after a waiting period based on the device specs, rescan the parent
bus to recover the devices.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1749834034-18498-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Breno Leitao says:
====================
netpoll: Untangle netconsole and netpoll
Initially netpoll and netconsole were created together, and some
functions are in the wrong file. Seperate netconsole-only functions
in netconsole, avoiding exports.
1. Expose netpoll logging macros in the public header to enable consistent
log formatting across netpoll consumers.
2. Relocate netconsole-specific functions from netpoll to the netconsole
module where they are actually used, reducing unnecessary coupling.
3. Remove unnecessary function exports
4. Rename netpoll parsing functions in netconsole to better reflect their
specific usage.
5. Create a test to check that cmdline works fine. This was in my todo
list since [1], this was a good time to add it here to make sure this
patchset doesn't regress.
PS: The code was split in a way that it is easy to review. When copying
the functions from netpoll to netconsole, I do not change than other
than adding `static`. This will make checkpatch unhappy, but, further
patches will address the issues. It is done this way to make it easy for
reviewers.
Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorage.com/ [1]
v2: https://lore.kernel.org/20250611-rework-v2-0-ab1d92b458ca@debian.org
v1: https://lore.kernel.org/20250610-rework-v1-0-7cfde283f246@debian.org
====================
Link: https://patch.msgid.link/20250613-rework-v3-0-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new selftest to verify netconsole module loading with command
line arguments. This test exercises the init_netconsole() path and
validates proper parsing of the netconsole= parameter format.
The test:
- Loads netconsole module with cmdline configuration instead of
dynamic reconfiguration
- Validates message transmission through the configured target
- Adds helper functions for cmdline string generation and module
validation
This complements existing netconsole selftests by covering the
module initialization code path that processes boot-time parameters.
This test is useful to test issues like the one described in [1].
Link: https://lore.kernel.org/netdev/Z36TlACdNMwFD7wv@dev-ushankar.dev.purestorage.com/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-8-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extract the network device and namespace cleanup logic from the
cleanup() function into a new do_cleanup() helper in lib_netcons.sh.
The do_cleanup() function only unconfigure the network and
printk, while cleanup() cleans the netconsole targets plus the network
and printk.
This refactoring let this code to be reused in cases netconsole dynamic
is not being used, as in the upcoming patch.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-7-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Split assignment from conditional checks and use preferred null pointer
check style (!delim instead of == NULL) in netconsole_parser_cmdline().
This improves code readability and follows kernel coding style
conventions.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-6-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename netpoll_parse_options() to netconsole_parser_cmdline() and
netpoll_print_options() to netconsole_print_banner() to better
describe what these functions actually do within the netconsole
context.
Also fix minor code style issues including variable declaration
ordering and spacing.
These functions are specific to netconsole functionality rather
than general netpoll operations, so the new names better reflect
their actual purpose.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-5-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move netpoll_print_options() from net/core/netpoll.c to
drivers/net/netconsole.c and make it static. This function is only used
by netconsole, so there's no need to export it or keep it in the public
netpoll API.
This reduces the netpoll API surface and improves code locality
by keeping netconsole-specific functionality within the netconsole
driver.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-4-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move netpoll_parse_ip_addr() and netpoll_parse_options() from the generic
netpoll module to the netconsole module where they are actually used.
These functions were originally placed in netpoll but are only consumed by
netconsole. This refactoring improves code organization by:
- Removing unnecessary exported symbols from netpoll
- Making netpoll_parse_options() static (no longer needs global visibility)
- Reducing coupling between netpoll and netconsole modules
The functions remain functionally identical - this is purely a code
reorganization to better reflect their actual usage patterns. Here are
the changes:
1) Move both functions from netpoll to netconsole
2) Add static to netpoll_parse_options()
3) Removed the EXPORT_SYMBOL()
PS: This diff does not change the function format, so, it is easy to
review, but, checkpatch will not be happy. A follow-up patch will
address the current issues reported by checkpatch.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-3-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move np_info(), np_err(), and np_notice() macros from internal
implementation to the public netpoll header file to make them
available for use by netpoll consumers.
These logging macros provide consistent formatting for netpoll-related
messages by automatically prefixing log output with the netpoll instance
name.
The goal is to use the exact same format that is being displayed today,
instead of creating something netconsole-specific.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-2-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 97714695ef90 ("net: netconsole: Defer netpoll cleanup to
avoid lock release during list traversal"), netconsole no longer uses
__netpoll_cleanup(). With no remaining users, remove this function
from the exported netpoll API.
The function remains available internally within netpoll for use by
netpoll_cleanup().
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250613-rework-v3-1-0752bf2e6912@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace pr_err() with pr_err_ratelimited() in ptp_clock_settime() to
prevent log flooding when the physical clock is free running, which
happens on some of my hosts. This ensures error messages are
rate-limited and improves kernel log readability.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250613-ptp-v1-1-ee44260ce9e2@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add one test to check that the kernel rejects a negative perturb timer.
Add a second test checking that the kernel rejects
a too big perturb timer.
All test results:
1..2
ok 1 cdc1 - Check that a negative perturb timer is rejected
ok 2 a9f0 - Check that a too big perturb timer is rejected
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250613064136.3911944-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Heiner Kallweit says:
====================
net: phy: make phy_package a separate module
Only a handful of PHY drivers needs the PHY package functionality,
therefore make it a separate module which is built only if needed.
====================
Link: https://patch.msgid.link/eec346a4-e903-48af-8150-0191932a7a0b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Only a handful of PHY drivers needs the PHY package functionality,
therefore build the module only if needed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/42c05496-61b2-4b09-b853-3d99b3dfe95c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make phy_package a separate module, so that this code is only loaded
if needed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/66bb4cce-b6a3-421e-9a7b-5d4a0c75290e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move both functions to phy_package.c, so that phy_core.c no longer
has a dependency on phy_package.c (phy_package_address).
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/8956fa53-3eda-4079-8203-a8fddcc17bf3@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It appears that the GMAC_ANE_ADV and GMAC_ANE_LPA registers are only
available for TBI and RTBI PHY interfaces. In commit 482b3c3ba757
("net: stmmac: Drop TBI/RTBI PCS flags") support for these was dropped,
and thus it no longer makes sense to access these registers.
Remove the *_get_adv_lp() functions, and the now redundant struct
rgmii_adv and STMMAC_PCS_* definitions.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/E1uPkbT-004EyG-OQ@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Several of the tcp_ao events are only called when CONFIG_TCP_AO is
defined. As each event can take up to 5K regardless if they are used or
not, it's best not to define them when they are not used. Add #ifdef
around these events when they are not used.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250612094616.4222daf0@batman.local.home
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
strsep() modifies the address of the pointer passed to it so that it no
longer points to the original address. This means kfree() gets the wrong
pointer.
Fix this by passing unmodified pointer returned from kstrdup() to
kfree().
Found by Linux Verification Center (linuxtesting.org) with Svace.
Fixes: 4df84e846624 ("scsi: elx: efct: Driver initialization routines")
Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru>
Link: https://lore.kernel.org/r/20250612163616.24298-1-v.shevtsov@mt-integration.ru
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
pldmfw calls crc32 code and depends on it being enabled, else
there is a link error as follows. So PLDMFW should select CRC32.
lib/pldmfw/pldmfw.o: In function `pldmfw_flash_image':
pldmfw.c:(.text+0x70f): undefined reference to `crc32_le_base'
This problem was introduced by commit b8265621f488 ("Add pldmfw library
for PLDM firmware update").
It manifests as of commit d69ea414c9b4 ("ice: implement device flash
update via devlink").
And is more likely to occur as of commit 9ad19171b6d6 ("lib/crc: remove
unnecessary prompt for CONFIG_CRC32 and drop 'default y'").
Found by chance while exercising builds based on tinyconfig.
Fixes: b8265621f488 ("Add pldmfw library for PLDM firmware update")
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250613-pldmfw-crc32-v1-1-f3fad109eee6@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
AMD's Family 19h-based Models 70h-7fh support 4 unified memory controllers
(UMC) per processor die.
The amd64_edac driver, however, assumes only 2 UMCs are supported since
max_mcs variable for the models has not been explicitly set to 4. The same
results in incomplete or incorrect memory information being logged to dmesg by
the module during initialization in some instances.
Fixes: 6c79e42169fe ("EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh")
Closes: https://lore.kernel.org/all/27dc093f-ce27-4c71-9e81-786150a040b6@reox.at/
Reported-by: reox <mailinglist@reox.at>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org
Link: https://lore.kernel.org/20250613005233.2330627-1-avadhut.naik@amd.com
|
|
For some reason arm64's Poly1305 code got changed to ignore the padbit
argument. As a result, the output is incorrect when the message length
is not a multiple of 16 (which is not reached with the standard
ChaCha20Poly1305, but bcachefs could reach this). Fix this.
Fixes: a59e5468a921 ("crypto: arm64/poly1305 - Add block-only interface")
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Tested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20250616010654.367302-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
subsystem
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of a SNC NUMA node determines
which domains are summed for the "top level" L3-scoped events.
rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning
zero values for "top-level events" without any error indication.
Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
valid CPUs for reading RMID counter via the MSR interface.
Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
Fixes: 328ea68874642 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files")
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
|
|
On 64-bit systems, writing/reading one u64 is faster than two u32s even
when they're are adjacent in a struct. The compilers won't guarantee
they will combine those; I observed both successful and unsuccessful
attempts with both GCC and Clang, and it's not easy to say what it
depends on.
There's a few places in libeth_xdp winning up to several percent from
combined access (both performance and object code size, especially
when unrolling). Add __LIBETH_WORD_ACCESS and use it there on LE.
Drivers are free to optimize HW-specific callbacks under the same
definition.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|