Age | Commit message (Collapse) | Author |
|
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used
correctly. The problem is that we try to set them up in the request
initialisation, but we the cache may be in the process of setting up still,
and so the state may not be correct. Further, we secondarily sample the
cache state and make contradictory decisions later.
The issue arises because we set up the cache resources, which allows the
cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which
triggers cache writing even if we didn't set the flags when allocating.
Fix this in the following way:
(1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in
->init_request() rather than trying to juggle that in
netfs_alloc_request().
(2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is
to be done, then PG_private_2 is to be used rather than only setting
it if we decide to cache and then having netfs_rreq_unlock_folios()
set the non-PG_private_2 writeback-to-cache if it wasn't set.
(3) Split netfs_rreq_unlock_folios() into two functions, one of which
contains the deprecated code for using PG_private_2 to avoid
accidentally doing the writeback path - and always use it if
USE_PGPRIV2 is set.
(4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always
wait for PG_private_2. This function is deprecated and only used by
ceph anyway, and so label it so.
(5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use
fscache_operation_valid() on the cache_resources instead. This has
the advantage of picking up the result of netfs_begin_cache_read() and
fscache_begin_write_operation() - which are called after the object is
initialised and will wait for the cache to come to a usable state.
Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be
applied on top of that. Without this as well, things like:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
and:
WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386
may happen, along with some UAFs due to PG_private_2 not getting used to
wait on writeback completion.
Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Hristo Venev <hristo@venev.name>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
second writeback flag"
This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a.
Revert the patch that removes the deprecated use of PG_private_2 in
netfslib for the moment as Ceph is actually still using this to track
data copied to the cache.
Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: ceph-devel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The explanatory comment above take_fd() contains a typo, fix that to not
confuse readers.
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20240809135035.748109-1-minipli@grsecurity.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner
Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to
CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First,
NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas
the actual intended config is called NETFS_SUPPORT. Second, the config
renaming misses to adjust the documentation of the functionality of this
config.
Clean up those two points.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Link: https://lore.kernel.org/r/20240731073902.69262-1-lukas.bulwahn@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
After we switch tmpfs dir operations from simple_dir_operations to
simple_offset_dir_operations, every rename happened will fill new dentry
to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
key starting with octx->newx_offset, and then set newx_offset equals to
free key + 1. This will lead to infinite readdir combine with rename
happened at the same time, which fail generic/736 in xfstests(detail show
as below).
1. create 5000 files(1 2 3...) under one dir
2. call readdir(man 3 readdir) once, and get one entry
3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
4. loop 2~3, until readdir return nothing or we loop too many
times(tmpfs break test with the second condition)
We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
directory reads") to fix it, record the last_index when we open dir, and
do not emit the entry which index >= last_index. The file->private_data
now used in offset dir can use directly to do this, and we also update
the last_index when we llseek the dir file.
Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
[brauner: only update last_index after seek when offset is zero like Jan suggested]
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The kernel is writing an object of type __u64, so the ioctl has to be
defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5).
Reported-by: Dmitry V. Levin <ldv@strace.io>
Link: https://lore.kernel.org/r/20240730164554.GA18486@altlinux.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This fixes a NULL pointer dereference bug due to a data race which
looks like this:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
Workqueue: events_unbound netfs_rreq_write_to_cache_work
RIP: 0010:cachefiles_prepare_write+0x30/0xa0
Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
Call Trace:
<TASK>
? __die+0x1f/0x70
? page_fault_oops+0x15d/0x440
? search_module_extables+0xe/0x40
? fixup_exception+0x22/0x2f0
? exc_page_fault+0x5f/0x100
? asm_exc_page_fault+0x22/0x30
? cachefiles_prepare_write+0x30/0xa0
netfs_rreq_write_to_cache_work+0x135/0x2e0
process_one_work+0x137/0x2c0
worker_thread+0x2e9/0x400
? __pfx_worker_thread+0x10/0x10
kthread+0xcc/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
This happened because fscache_cookie_state_machine() was slow and was
still running while another process invoked fscache_unuse_cookie();
this led to a fscache_cookie_lru_do_one() call, setting the
FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by
fscache_cookie_state_machine(), withdrawing the cookie via
cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked
cachefiles_prepare_write(), which found a NULL pointer in this code
line:
struct cachefiles_object *object = cachefiles_cres_object(cres);
The next line crashes, obviously:
struct cachefiles_cache *cache = object->volume->cache;
During cachefiles_prepare_write(), the "n_accesses" counter is
non-zero (via fscache_begin_operation()). The cookie must not be
withdrawn until it drops to zero.
The counter is checked by fscache_cookie_state_machine() before
switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
FSCACHE_COOKIE_STATE_FAILED"), but not for
FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case
FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter,
the function returns and the next fscache_end_cookie_access() call
will queue another fscache_cookie_state_machine() call to handle the
still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When struct file_lease was split out from struct file_lock, the name of
the file_lock slab cache was copied to the new slab cache for
file_lease. This name conflict causes confusion in /proc/slabinfo and
/sys/kernel/slab. In particular, it caused failures in drgn's test case
for slab cache merging.
Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59a/tests/linux_kernel/helpers/test_slab.py#L81
Fixes: c69ff4071935 ("filelock: split leases out of struct file_lock")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
As in commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large
folio mappings"), we can see a performance loss for filesystems
which have not yet been converted to large folios.
Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20240527201735.1898381-1-willy@infradead.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"While the ideapad concurrency fix itself is relatively
straightforward, it required moving code around and adding a bit of
supporting infrastructure to have a clean inter-driver interface. This
shows up in the diffstats.
- ideapad-laptop / lenovo-ymc: Protect VPC calls with a mutex
- amd/pmf: Query HPD data also when ALS is disabled"
* tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands
platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc
platform/x86: ideapad-laptop: introduce a generic notification chain
platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
|
|
Pull fd bitmap fix from Al Viro:
"Fix bitmap corruption on close_range() by cleaning up
copy_fd_bitmaps()"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
|
|
When there are multiple ap interfaces on one band and with WED on,
turning the interface down will cause a kernel panic on MT798X.
Previously, cb_priv was freed in mtk_wed_setup_tc_block() without
marking NULL,and mtk_wed_setup_tc_block_cb() didn't check the value, too.
Assign NULL after free cb_priv in mtk_wed_setup_tc_block() and check NULL
in mtk_wed_setup_tc_block_cb().
----------
Unable to handle kernel paging request at virtual address 0072460bca32b4f5
Call trace:
mtk_wed_setup_tc_block_cb+0x4/0x38
0xffffffc0794084bc
tcf_block_playback_offloads+0x70/0x1e8
tcf_block_unbind+0x6c/0xc8
...
---------
Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Zheng Zhang <everything411@qq.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to
create SKB. skb_shinfo(skb) is located at the end of skb, and its alignment
is affected by the alloc_size passed into napi_build_skb(). The size needs
to be aligned properly for better performance and atomic operations.
Otherwise, on ARM64 CPU, for certain MTU settings like 4000, atomic
operations may panic on the skb_shinfo(skb)->dataref due to alignment fault.
To fix this bug, add proper alignment to the alloc_size calculation.
Sample panic info:
[ 253.298819] Unable to handle kernel paging request at virtual address ffff000129ba5cce
[ 253.300900] Mem abort info:
[ 253.301760] ESR = 0x0000000096000021
[ 253.302825] EC = 0x25: DABT (current EL), IL = 32 bits
[ 253.304268] SET = 0, FnV = 0
[ 253.305172] EA = 0, S1PTW = 0
[ 253.306103] FSC = 0x21: alignment fault
Call trace:
__skb_clone+0xfc/0x198
skb_clone+0x78/0xe0
raw6_local_deliver+0xfc/0x228
ip6_protocol_deliver_rcu+0x80/0x500
ip6_input_finish+0x48/0x80
ip6_input+0x48/0xc0
ip6_sublist_rcv_finish+0x50/0x78
ip6_sublist_rcv+0x1cc/0x2b8
ipv6_list_rcv+0x100/0x150
__netif_receive_skb_list_core+0x180/0x220
netif_receive_skb_list_internal+0x198/0x2a8
__napi_poll+0x138/0x250
net_rx_action+0x148/0x330
handle_softirqs+0x12c/0x3a0
Cc: stable@vger.kernel.org
Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add missed property phys, which indicate how connect to serdes phy.
Fix below warning:
arch/arm64/boot/dts/freescale/fsl-lx2160a-honeycomb.dtb: fsl-mc@80c000000: dpmacs:ethernet@7: Unevaluated properties are not allowed ('phys' was unexpected)
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pawel Dembicki says:
====================
net: dsa: vsc73xx: fix MDIO bus access and PHY opera
This series are extracted patches from net-next series [0].
The VSC73xx driver has issues with PHY configuration. This patch series
fixes most of them.
The first patch synchronizes the register configuration routine with the
datasheet recommendations.
Patches 2-3 restore proper communication on the MDIO bus. Currently,
the write value isn't sent to the MDIO register, and without a busy check,
communication with the PHY can be interrupted. This causes the PHY to
receive improper configuration and autonegotiation could fail.
The fourth patch removes the PHY reset blockade, as it is no longer
required.
After fixing the MDIO operations, autonegotiation became possible.
The last patch removes the blockade, which became unnecessary after
the MDIO operations fix.
[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=874739&state=%2A&archive=both
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the vsc73xx mdio bus work properly, the generic autonegotiation
configuration works well.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Resetting the VSC73xx PHY was problematic because the MDIO bus, without
a busy check, read and wrote incorrect register values.
My investigation indicates that resetting the PHY only triggers changes
in configuration. However, improper register values written earlier
were only exposed after a soft reset.
The reset itself wasn't the issue; rather, the problem stemmed from
incorrect read and write operations.
A 'soft_reset' can now proceed normally. There are no reasons to keep
the VSC73xx from being reset.
This commit removes the reset blockade in the 'vsc73xx_phy_write'
function.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The VSC73xx has a busy flag used during MDIO operations. It is raised
when MDIO read/write operations are in progress. Without it, PHYs are
misconfigured and bus operations do not work as expected.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the 'vsc73xx_phy_write' function, the register value is missing,
and the phy write operation always sends zeros.
This commit passes the value variable into the proper register.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the datasheet description ("Port Mode Procedure" in 5.6.2),
the VSC73XX_MAC_CFG_WEXC_DIS bit is configured only for half duplex mode.
The WEXC_DIS bit is responsible for MAC behavior after an excessive
collision. Let's set it as described in the datasheet.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In axiethernet header fix register defines comment description to be
inline with IP documentation. It updates MAC configuration register,
MDIO configuration register and frame filter control description.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can't dereference "skb" after calling vcc->push() because the skb
is released.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Fix 32-bit PTI for real.
pti_clone_entry_text() is called twice, once before initcalls so that
initcalls can use the user-mode helper and then again after text is
set read only. Setting read only on 32-bit might break up the PMD
mapping, which makes the second invocation of pti_clone_entry_text()
find the mappings out of sync and failing.
Allow the second call to split the existing PMDs in the user mapping
and synchronize with the kernel mapping.
- Don't make acpi_mp_wake_mailbox read-only after init as the mail box
must be writable in the case that CPU hotplug operations happen after
boot. Otherwise the attempt to start a CPU crashes with a write to
read only memory.
- Add a missing sanity check in mtrr_save_state() to ensure that the
fixed MTRR MSRs are supported.
Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but
the WARN_ON() can bring systems down when panic on warn is set.
* tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Check if fixed MTRRs exist before saving them
x86/paravirt: Fix incorrect virt spinlock setting on bare metal
x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox
x86/mm: Fix PTI for i386 some more
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time keeping fixes from Thomas Gleixner:
- Fix a couple of issues in the NTP code where user supplied values are
neither sanity checked nor clamped to the operating range. This
results in integer overflows and eventualy NTP getting out of sync.
According to the history the sanity checks had been removed in favor
of clamping the values, but the clamping never worked correctly under
all circumstances. The NTP people asked to not bring the sanity
checks back as it might break existing applications.
Make the clamping work correctly and add it where it's missing
- If adjtimex() sets the clock it has to trigger the hrtimer subsystem
so it can adjust and if the clock was set into the future expire
timers if needed. The caller should provide a bitmask to tell
hrtimers which clocks have been adjusted.
adjtimex() uses not the proper constant and uses CLOCK_REALTIME
instead, which is 0. So hrtimers adjusts only the clocks, but does
not check for expired timers, which might make them expire really
late. Use the proper bitmask constant instead.
* tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
ntp: Safeguard against time_constant overflow
ntp: Clamp maxerror and esterror to operating range
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three small fixes for interrupt core and drivers:
- The interrupt core fails to honor caller supplied affinity hints
for non-managed interrupts and uses the system default affinity on
startup instead. Set the missing flag in the descriptor to tell the
core to use the provided affinity.
- Fix a shift out of bounds error in the Xilinx driver
- Handle switching to level trigger correctly in the RISCV APLIC
driver. It failed to retrigger the interrupt which causes it to
become stale"
* tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
irqchip/xilinx: Fix shift out of bounds
genirq/irqdesc: Honor caller provided affinity in alloc_desc()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for reported issues for
6.11-rc3. Included in here are:
- usb serial driver MODULE_DESCRIPTION() updates
- usb serial driver fixes
- typec driver fixes
- usb-ip driver fix
- gadget driver fixes
- dt binding update
All of these have been in linux-next with no reported issues"
* tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common()
usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message
usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt()
usb: gadget: f_fs: restore ffs_func_disable() functionality
USB: serial: debug: do not echo input by default
usb: typec: tipd: Delete extra semi-colon
usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch()
usb: gadget: u_serial: Set start_delayed during suspend
usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap()
usb: typec: fsa4480: Check if the chip is really there
usb: gadget: core: Check for unset descriptor
usb: vhci-hcd: Do not drop references before new references are gained
usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed.
usb: gadget: midi2: Fix the response for FB info with block 0xff
dt-bindings: usb: microchip,usb2514: Add USB2517 compatible
USB: serial: garmin_gps: use struct_size() to allocate pkt
USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by
USB: serial: add missing MODULE_DESCRIPTION() macros
USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for reported problems
for 6.11-rc3. Included in here are:
- sc16is7xx serial driver fixes
- uartclk bugfix for a divide by zero issue
- conmakehash userspace build issue fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: vt: conmakehash: cope with abs_srctree no longer in env
serial: sc16is7xx: fix invalid FIFO access with special register set
serial: sc16is7xx: fix TX fifo corruption
serial: core: check uartclk for zero to avoid divide by zero
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core / documentation fixes from Greg KH:
"Here are some small fixes, and some documentation updates for
6.11-rc3. Included in here are:
- embargoed hardware documenation updates based on a lot of review by
legal-types in lots of companies to try to make the process a _bit_
easier for us to manage over time.
- rust firmware documentation fix
- driver detach race fix for the fix that went into 6.11-rc1
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Fix uevent_show() vs driver detach race
Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process
Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes
rust: firmware: fix invalid rustdoc link
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are some small char/misc/other driver fixes for 6.11-rc3 for
reported issues. Included in here are:
- binder driver fixes
- fsi MODULE_DESCRIPTION() additions (people seem to love them...)
- eeprom driver fix
- Kconfig dependency fix to resolve build issues
- spmi driver fixes
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
spmi: pmic-arb: add missing newline in dev_err format strings
spmi: pmic-arb: Pass the correct of_node to irq_domain_add_tree
binder_alloc: Fix sleeping function called from invalid context
binder: fix descriptor lookup for context manager
char: add missing NetWinder MODULE_DESCRIPTION() macros
misc: mrvl-cn10k-dpi: add PCI_IOV dependency
eeprom: ee1004: Fix locking issues in ee1004_probe()
fsi: add missing MODULE_DESCRIPTION() macros
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two core fixes: one to prevent discard type changes (seen on iSCSI)
during intermittent errors and the other is fixing a lockdep problem
caused by the queue limits change.
And one driver fix in ufs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Keep the discard mode stable
scsi: sd: Move sd_read_cpr() out of the q->limits_lock region
scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic
|
|
ueue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-08-07 (igc)
This series contains updates to igc driver only.
Faizal adjusts the size of the MAC internal buffer on i226 devices to
resolve an errata for leaking packet transmits. He also corrects a
condition in which qbv_config_change_errors are incorrectly counted.
Lastly, he adjusts the conditions for resetting the adapter when
changing TSN Tx mode and corrects the conditions in which gtxoffset
register is set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
`ip_hdr(skb)->ihl << 2` is the same as `ip_hdrlen(skb)`
Therefore, we should use a well-defined function not a bit shift
to find the header length.
It also compresses two lines to a single line.
Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com>
Reviewed-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor fixes for recent changes
* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- Two fixes for SMBusAlert handling in the I2C core: one to avoid an
endless loop when scanning for handlers and one to make sure handlers
are always called even if HW has broken behaviour
- I2C header build fix for when ACPI is enabled but I2C isn't
- The testunit gets a rename in the code to match the documentation
- Two fixes for the Qualcomm GENI I2C controller are cleaning up the
error exit patch in the runtime_resume() function. The first is
disabling the clock, the second disables the icc on the way out
* tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: testunit: match HostNotify test name with docs
i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
i2c: qcom-geni: Add missing clk_disable_unprepare in geni_i2c_runtime_resume
i2c: Fix conditional for substituting empty ACPI functions
i2c: smbus: Send alert notifications to all devices if source not found
i2c: smbus: Improve handling of stuck alerts
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- avoid a deadlock with dma-debug and netconsole (Rik van Riel)
* tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: avoid deadlock between dma debug vs printk and netconsole
|
|
Pull more bcachefs fixes from Kent Overstreet:
"A couple last minute fixes for the new disk accounting
- fix a bug that was causing ACLs to seemingly "disappear"
- new on disk format version, bcachefs_metadata_version_disk_accounting_v3
bcachefs_metadata_version_disk_accounting_v2 accidentally included
padding in disk_accounting_key; fortunately, 6.11 isn't out yet so
we can fix this with another version bump"
* tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefs:
bcachefs: bcachefs_metadata_version_disk_accounting_v3
bcachefs: improve bch2_dev_usage_to_text()
bcachefs: bch2_accounting_invalid()
bcachefs: Switch to .get_inode_acl()
|
|
wpa_supplicant 2.11 sends since 1efdba5fdc2c ("Handle PMKSA flush in the
driver for SAE/OWE offload cases") SSID based PMKSA del commands.
brcmfmac is not prepared and tries to dereference the NULL bssid and
pmkid pointers in cfg80211_pmksa. PMKID_V3 operations support SSID based
updates so copy the SSID.
Fixes: a96202acaea4 ("wifi: brcmfmac: cfg80211: Add support for PMKID_V3 operations")
Cc: stable@vger.kernel.org # 6.4.x
Signed-off-by: Janne Grunau <j@jannau.net>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20240803-brcmfmac_pmksa_del_ssid-v1-1-4e85f19135e1@jannau.net
|
|
The section 4.5.2 of the RISC-V AIA specification says that "any write
to a sourcecfg register of an APLIC might (or might not) cause the
corresponding interrupt-pending bit to be set to one if the rectified
input value is high (= 1) under the new source mode."
When the interrupt type is changed in the sourcecfg register, the APLIC
device might not set the corresponding pending bit, so the interrupt might
never become pending.
To handle sourcecfg register changes for level-triggered interrupts in MSI
mode, manually set the pending bit for retriggering interrupt so it gets
retriggered if it was already asserted.
Fixes: ca8df97fe679 ("irqchip/riscv-aplic: Add support for MSI-mode")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240809071049.2454-1-yongxuan.wang@sifive.com
|
|
The device tree property 'xlnx,kind-of-intr' is sanity checked that the
bitmask contains only set bits which are in the range of the number of
interrupts supported by the controller.
The check is done by shifting the mask right by the number of supported
interrupts and checking the result for zero.
The data type of the mask is u32 and the number of supported interrupts is
up to 32. In case of 32 interrupts the shift is out of bounds, resulting in
a mismatch warning. The out of bounds condition is also reported by UBSAN:
UBSAN: shift-out-of-bounds in irq-xilinx-intc.c:332:22
shift exponent 32 is too large for 32-bit type 'unsigned int'
Fix it by promoting the mask to u64 for the test.
Fixes: d50466c90724 ("microblaze: intc: Refactor DT sanity check")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/1723186944-3571957-1-git-send-email-radhey.shyam.pandey@amd.com
|
|
Tariq Toukan says:
====================
mlx5 misc fixes 2024-08-08
This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
====================
Link: https://patch.msgid.link/20240808144107.2095424-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The queue stats API queries the queues according to the
real_num_[tr]x_queues, in case the device is down and channels were not
yet created, don't try to query their statistics.
To trigger the panic, run this command before the interface is brought
up:
./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml --dump qstats-get --json '{"ifindex": 4}'
BUG: kernel NULL pointer dereference, address: 0000000000000c00
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 977 Comm: python3 Not tainted 6.10.0+ #40
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08
RSP: 0018:ffff888116be37d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004
RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0
RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004
R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000
R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000
FS: 00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die+0x1f/0x60
? page_fault_oops+0x14e/0x3d0
? exc_page_fault+0x73/0x130
? asm_exc_page_fault+0x22/0x30
? mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
netdev_nl_stats_by_netdev+0x2a6/0x4c0
? __rmqueue_pcplist+0x351/0x6f0
netdev_nl_qstats_get_dumpit+0xc4/0x1b0
genl_dumpit+0x2d/0x80
netlink_dump+0x199/0x410
__netlink_dump_start+0x1aa/0x2c0
genl_family_rcv_msg_dumpit+0x94/0xf0
? __pfx_genl_start+0x10/0x10
? __pfx_genl_dumpit+0x10/0x10
? __pfx_genl_done+0x10/0x10
genl_rcv_msg+0x116/0x2b0
? __pfx_netdev_nl_qstats_get_dumpit+0x10/0x10
? __pfx_genl_rcv_msg+0x10/0x10
netlink_rcv_skb+0x54/0x100
genl_rcv+0x24/0x40
netlink_unicast+0x21a/0x340
netlink_sendmsg+0x1f4/0x440
__sys_sendto+0x1b6/0x1c0
? do_sock_setsockopt+0xc3/0x180
? __sys_setsockopt+0x60/0xb0
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x50/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f43757132b0
Code: c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 41 54 48 83 ec 20
RSP: 002b:00007ffd258da048 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffd258da0f8 RCX: 00007f43757132b0
RDX: 000000000000001c RSI: 00007f437464b850 RDI: 0000000000000003
RBP: 00007f4375085de0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f43751a6147
</TASK>
Modules linked in: netconsole xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse [last unloaded: netconsole]
CR2: 0000000000000c00
---[ end trace 0000000000000000 ]---
RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08
RSP: 0018:ffff888116be37d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004
RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0
RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004
R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000
R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000
FS: 00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 7b66ae536a78 ("net/mlx5e: Add per queue netdev-genl stats")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240808144107.2095424-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously, an ethtool rx flow with no attrs would not be added to the
NIC as it has no rules to configure the hw with, but it would be
reported as successful to the caller (return code 0). This is confusing
for the user as ethtool then reports "Added rule $num", but no rule was
actually added.
This change corrects that by instead reporting these wrong rules as
-EINVAL.
Fixes: b29c61dac3a2 ("net/mlx5e: Ethtool steering flow validation refactoring")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20240808144107.2095424-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5e_safe_reopen_channels() requires the state lock taken. The
referenced changed in the Fixes tag removed the lock to fix another
issue. This patch adds it back but at a later point (when calling
mlx5e_safe_reopen_channels()) to avoid the deadlock referenced in the
Fixes tag.
Fixes: eab0da38912e ("net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://lore.kernel.org/all/ZplpKq8FKi3vwfxv@gmail.com/T/
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20240808144107.2095424-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During latency tests (netperf TCP_RR) a 30% degradation of HW GRO vs SW
GRO was observed. This is due to SHAMPO triggering timeout filler CQEs
instead of delivering the CQE for the packet.
Having a short timeout for SHAMPO doesn't bring any benefits as it is
the driver that does the merging, not the hardware. On the contrary, it
can have a negative impact: additional filler CQEs are generated due to
the timeout. As there is no way to disable this timeout, this change
sets it to the maximum value.
Instead of using the packet_merge.timeout parameter which is also used
for LRO, set the value directly when filling in the rest of the SHAMPO
parameters in mlx5e_build_rq_param().
Fixes: 99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20240808144107.2095424-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Unconditionally calling the MPIR query on BF separate mode yields the FW
syndrome below [1]. Do not call it unless admin clearly specified the SD
group, i.e. expressing the intention of using the multi-PF netdev
feature.
This fix covers cases not covered in
commit fca3b4791850 ("net/mlx5: Do not query MPIR on embedded CPU function").
[1]
mlx5_cmd_out_err:808:(pid 8267): ACCESS_REG(0x805) op_mod(0x1) failed,
status bad system state(0x4), syndrome (0x685f19), err(-5)
Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240808144107.2095424-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'don-t-take-hw-uso-path-when-packets-can-t-be-checksummed-by-device'
Jakub Sitnicki says:
====================
Don't take HW USO path when packets can't be checksummed by device
This series addresses a recent regression report from syzbot [1].
After enabling UDP_SEGMENT for egress devices which don't support checksum
offload [2], we need to tighten down the checks which let packets take the
HW USO path.
The fix consists of two parts:
1. don't let devices offer USO without checksum offload, and
2. force software USO fallback in presence of IPv6 extension headers.
[1] https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10154dbded6d6a2fecaebdfda206609de0f121a9
v3: https://lore.kernel.org/r/20240807-udp-gso-egress-from-tunnel-v3-0-8828d93c5b45@cloudflare.com
v2: https://lore.kernel.org/r/20240801-udp-gso-egress-from-tunnel-v2-0-9a2af2f15d8d@cloudflare.com
v1: https://lore.kernel.org/r/20240725-udp-gso-egress-from-tunnel-v1-0-5e5530ead524@cloudflare.com
====================
Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-0-f5c5b4149ab9@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After enabling UDP GSO for devices not offering checksum offload, we have
hit a regression where a bad offload warning can be triggered when sending
a datagram with IPv6 extension headers.
Extend the UDP GSO IPv6 tests to cover this scenario.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-3-f5c5b4149ab9@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
checksum offload") we have intentionally allowed UDP GSO packets marked
CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and
checksummed by a software fallback when the egress device lacks these
features.
What was not taken into consideration is that a CHECKSUM_NONE skb can be
handed over to the GSO stack also when the egress device advertises the
tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature.
This will happen when there are IPv6 extension headers present, which we
check for in __ip6_append_data(). Syzbot has discovered this scenario,
producing a warning as below:
ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
Modules linked in:
CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
[...]
Call Trace:
<TASK>
__skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
__dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xef/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
___sys_sendmsg net/socket.c:2639 [inline]
__sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
__do_sys_sendmmsg net/socket.c:2754 [inline]
__se_sys_sendmmsg net/socket.c:2751 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[...]
</TASK>
We are hitting the bad offload warning because when an egress device is
capable of handling segmentation offload requested by
skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce
any segment skbs and return NULL. See the skb_gso_ok() branch in
{__udp,tcp,sctp}_gso_segment helpers.
To fix it, force a fallback to software USO when processing a packet with
IPv6 extension headers, since we don't know if these can checksummed by
all devices which offer USO.
Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20240808-udp-gso-egress-from-tunnel-v4-2-f5c5b4149ab9@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|