Age | Commit message (Collapse) | Author |
|
Support sw SCM_TSTAMP_SND case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_SW_CB. This
callback will occur at the same timestamping point as the user
space's software SCM_TSTAMP_SND. The BPF program can use it to
get the same SCM_TSTAMP_SND timestamp without modifying the
user-space application.
Based on this patch, BPF program will get the software
timestamp when the driver is ready to send the skb. In the
sebsequent patch, the hardware timestamp will be supported.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-8-kerneljasonxing@gmail.com
|
|
Support SCM_TSTAMP_SCHED case for bpf timestamping.
Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SCHED_CB. This
callback will occur at the same timestamping point as the user
space's SCM_TSTAMP_SCHED. The BPF program can use it to get the
same SCM_TSTAMP_SCHED timestamp without modifying the user-space
application.
A new SKBTX_BPF flag is added to mark skb_shinfo(skb)->tx_flags,
ensuring that the new BPF timestamping and the current user
space's SO_TIMESTAMPING do not interfere with each other.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-7-kerneljasonxing@gmail.com
|
|
The subsequent patch will implement BPF TX timestamping. It will
call the sockops BPF program without holding the sock lock.
This breaks the current assumption that all sock ops programs will
hold the sock lock. The sock's fields of the uapi's bpf_sock_ops
requires this assumption.
To address this, a new "u8 is_locked_tcp_sock;" field is added. This
patch sets it in the current sock_ops callbacks. The "is_fullsock"
test is then replaced by the "is_locked_tcp_sock" test during
sock_ops_convert_ctx_access().
The new TX timestamping callbacks added in the subsequent patch will
not have this set. This will prevent unsafe access from the new
timestamping callbacks.
Potentially, we could allow read-only access. However, this would
require identifying which callback is read-safe-only and also requires
additional BPF instruction rewrites in the covert_ctx. Since the BPF
program can always read everything from a socket (e.g., by using
bpf_core_cast), this patch keeps it simple and disables all read
and write access to any socket fields through the bpf_sock_ops
UAPI from the new TX timestamping callback.
Moreover, note that some of the fields in bpf_sock_ops are specific
to tcp_sock, and sock_ops currently only supports tcp_sock. In
the future, UDP timestamping will be added, which will also break
this assumption. The same idea used in this patch will be reused.
Considering that the current sock_ops only supports tcp_sock, the
variable is named is_locked_"tcp"_sock.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com
|
|
This patch introduces a new bpf_skops_tx_timestamping() function
that prepares the "struct bpf_sock_ops" ctx and then executes the
sockops BPF program.
The subsequent patch will utilize bpf_skops_tx_timestamping() at
the existing TX timestamping kernel callbacks (__sk_tstamp_tx
specifically) to call the sockops BPF program. Later, four callback
points to report information to user space based on this patch will
be introduced.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-3-kerneljasonxing@gmail.com
|
|
The new SK_BPF_CB_FLAGS and new SK_BPF_CB_TX_TIMESTAMPING are
added to bpf_get/setsockopt. The later patches will implement the
BPF networking timestamping. The BPF program will use
bpf_setsockopt(SK_BPF_CB_FLAGS, SK_BPF_CB_TX_TIMESTAMPING) to
enable the BPF networking timestamping on a socket.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250220072940.99994-2-kerneljasonxing@gmail.com
|
|
The change is necessary to enable the use of the
`i2c_get_dma_safe_msg_buf()` API, which requires a non-const
`struct i2c_msg *` to operate. The `i2c_get_dma_safe_msg_buf()` function
ensures safe handling of I2C messages when using DMA, making it essential
for scenarios where DMA transfers are involved. By removing the `const`
qualifier, this patch allows drivers to prepare and manage DMA-safe
buffers directly.
Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com>
Link: https://lore.kernel.org/r/20250204091702.4014466-1-billy_tsai@aspeedtech.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Remove the hidden assumption that options are allocated at the end of
the struct, and teach the compiler about them using a flexible array.
With this, we can revert the unsafe_memcpy() call we have in
tun_dst_unclone() [1], and resolve the false field-spanning write
warning caused by the memcpy() in ip_tunnel_info_opts_set().
The layout of struct ip_tunnel_info remains the same with this patch.
Before this patch, there was an implicit padding at the end of the
struct, options would be written at 'info + 1' which is after the
padding.
This will remain the same as this patch explicitly aligns 'options'.
The alignment is needed as the options are later casted to different
structs, and might result in unaligned memory access.
Pahole output before this patch:
struct ip_tunnel_info {
struct ip_tunnel_key key; /* 0 64 */
/* XXX last struct has 1 byte of padding */
/* --- cacheline 1 boundary (64 bytes) --- */
struct ip_tunnel_encap encap; /* 64 8 */
struct dst_cache dst_cache; /* 72 16 */
u8 options_len; /* 88 1 */
u8 mode; /* 89 1 */
/* size: 96, cachelines: 2, members: 5 */
/* padding: 6 */
/* paddings: 1, sum paddings: 1 */
/* last cacheline: 32 bytes */
};
Pahole output after this patch:
struct ip_tunnel_info {
struct ip_tunnel_key key; /* 0 64 */
/* XXX last struct has 1 byte of padding */
/* --- cacheline 1 boundary (64 bytes) --- */
struct ip_tunnel_encap encap; /* 64 8 */
struct dst_cache dst_cache; /* 72 16 */
u8 options_len; /* 88 1 */
u8 mode; /* 89 1 */
/* XXX 6 bytes hole, try to pack */
u8 options[] __attribute__((__aligned__(16))); /* 96 0 */
/* size: 96, cachelines: 2, members: 6 */
/* sum members: 90, holes: 1, sum holes: 6 */
/* paddings: 1, sum paddings: 1 */
/* forced alignments: 1, forced holes: 1, sum forced holes: 6 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(16)));
[1] Commit 13cfd6a6d7ac ("net: Silence false field-spanning write warning in metadata_dst memcpy")
Link: https://lore.kernel.org/all/53D1D353-B8F6-4ADC-8F29-8C48A7C9C6F1@kernel.org/
Suggested-by: Kees Cook <kees@kernel.org>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250219143256.370277-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tunnel options should not be accessed directly, use the ip_tunnel_info()
accessor instead.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250219143256.370277-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc4).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Smaller than usual with no fixes from any subtree.
Current release - regressions:
- core: fix race of rtnl_net_lock(dev_net(dev))
Previous releases - regressions:
- core: remove the single page frag cache for good
- flow_dissector: fix handling of mixed port and port-range keys
- sched: cls_api: fix error handling causing NULL dereference
- tcp:
- adjust rcvq_space after updating scaling ratio
- drop secpath at the same time as we currently drop dst
- eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl().
Previous releases - always broken:
- vsock:
- fix variables initialization during resuming
- for connectible sockets allow only connected
- eth:
- geneve: fix use-after-free in geneve_find_dev()
- ibmvnic: don't reference skb after sending to VIOS"
* tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
Revert "net: skb: introduce and use a single page frag cache"
net: allow small head cache usage with large MAX_SKB_FRAGS values
nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()
tcp: drop secpath at the same time as we currently drop dst
net: axienet: Set mac_managed_pm
arp: switch to dev_getbyhwaddr() in arp_req_set_public()
net: Add non-RCU dev_getbyhwaddr() helper
sctp: Fix undefined behavior in left shift operation
selftests/bpf: Add a specific dst port matching
flow_dissector: Fix port range key handling in BPF conversion
selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range
flow_dissector: Fix handling of mixed port and port-range keys
geneve: Suppress list corruption splat in geneve_destroy_tunnels().
gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().
dev: Use rtnl_net_dev_lock() in unregister_netdev().
net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
net: Add net_passive_inc() and net_passive_dec().
net: pse-pd: pd692x0: Fix power limit retrieval
MAINTAINERS: trim the GVE entry
gve: set xdp redirect target only when it is available
...
|
|
This adds basic support for LEDs for the max77705 PMIC.
Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250123-starqltechn_integration_upstream-v17-7-8b06685b6612@gmail.com
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
Add the core MFD driver for max77705 PMIC. Drivers for sub-devices
will be added in subsequent patches.
Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250123-starqltechn_integration_upstream-v17-5-8b06685b6612@gmail.com
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
Add driver for Maxim 77705 switch-mode charger.
It providing power supply class information to userspace.
The driver is configured through DTS (battery and system related
settings).
Signed-off-by: Dzmitry Sankouski <dsankouski@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Link: https://lore.kernel.org/r/20250123-starqltechn_integration_upstream-v17-3-8b06685b6612@gmail.com
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
Add AXP717_TS_PIN_CFG (register 0x50) to the table of writeable
registers so that the temperature sensor can be configured by the
battery driver.
Signed-off-by: Chris Morgan <macromorgan@hotmail.com>
Link: https://lore.kernel.org/r/20250204155835.161973-3-macroalpha82@gmail.com
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
For existing epoll event loops that can't fully convert to io_uring,
the used approach is usually to add the io_uring fd to the epoll
instance and use epoll_wait() to wait on both "legacy" and io_uring
events. While this work, it isn't optimal as:
1) epoll_wait() is pretty limited in what it can do. It does not support
partial reaping of events, or waiting on a batch of events.
2) When an io_uring ring is added to an epoll instance, it activates the
io_uring "I'm being polled" logic which slows things down.
Rather than use this approach, with EPOLL_WAIT support added to io_uring,
event loops can use the normal io_uring wait logic for everything, as
long as an epoll wait request has been armed with io_uring.
Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this
is an async request. Waiting on io_uring events in general has various
timeout parameters, and those are the ones that should be used when
waiting on any kind of request. If events are immediately available for
reaping, then This opcode will return those immediately. If none are
available, then it will post an async completion when they become
available.
cqe->res will contain either an error code (< 0 value) for a malformed
request, invalid epoll instance, etc. It will return a positive result
indicating how many events were reaped.
IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring
cancelation infrastructure.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs into for-6.15/io_uring-epoll-wait
Merge epoll changes from the VFS tree, which the io_uring changes depend
on.
* 'vfs-6.15.eventpoll' of https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
eventpoll: add epoll_sendevents() helper
eventpoll: abstract out ep_try_send_events() helper
eventpoll: abstract out parameter sanity checking
|
|
* for-6.15/io_uring-rx-zc: (77 commits)
io_uring: Rename KConfig to Kconfig
io_uring/zcrx: fix leaks on failed registration
io_uring/zcrx: recheck ifq on shutdown
io_uring/zcrx: add selftest
net: add documentation for io_uring zcrx
io_uring/zcrx: add copy fallback
io_uring/zcrx: throttle receive requests
io_uring/zcrx: set pp memory provider for an rx queue
io_uring/zcrx: add io_recvzc request
io_uring/zcrx: dma-map area for the device
io_uring/zcrx: implement zerocopy receive pp memory provider
io_uring/zcrx: grab a net device
io_uring/zcrx: add io_zcrx_area
io_uring/zcrx: add interface queue and refill queue
net: add helpers for setting a memory provider on an rx queue
net: page_pool: add memory provider helpers
net: prepare for non devmem TCP memory providers
net: page_pool: add a mp hook to unregister_netdevice*
net: page_pool: add callback for mp info printing
netdev: add io_uring memory provider info
...
|
|
* for-6.15/io_uring: (30 commits)
io_uring: use lockless_cq flag in io_req_complete_post()
io_uring: pass struct io_tw_state by value
io_uring: introduce type alias for io_tw_state
io_uring/rsrc: avoid NULL check in io_put_rsrc_node()
io_uring: pass ctx instead of req to io_init_req_drain()
io_uring: use IO_REQ_LINK_FLAGS more
io_uring/net: improve recv bundles
io_uring/waitid: use generic io_cancel_remove() helper
io_uring/futex: use generic io_cancel_remove() helper
io_uring/cancel: add generic cancel helper
io_uring/waitid: convert to io_cancel_remove_all()
io_uring/futex: convert to io_cancel_remove_all()
io_uring/cancel: add generic remove_all helper
io_uring/kbuf: uninline __io_put_kbufs
io_uring/kbuf: introduce io_kbuf_drop_legacy()
io_uring/kbuf: open code __io_put_kbuf()
io_uring/kbuf: remove legacy kbuf caching
io_uring/kbuf: simplify __io_put_kbuf
io_uring/kbuf: move locking into io_kbuf_drop()
io_uring/kbuf: remove legacy kbuf kmem cache
...
|
|
Fuse allows the value of a symlink to change and this property is exploited
by some filesystems (e.g. CVMFS).
It has been observed, that sometimes after changing the symlink contents,
the value is truncated to the old size.
This is caused by fuse_getattr() racing with fuse_reverse_inval_inode().
fuse_reverse_inval_inode() updates the fuse_inode's attr_version, which
results in fuse_change_attributes() exiting before updating the cached
attributes
This is okay, as the cached attributes remain invalid and the next call to
fuse_change_attributes() will likely update the inode with the correct
values.
The reason this causes problems is that cached symlinks will be
returned through page_get_link(), which truncates the symlink to
inode->i_size. This is correct for filesystems that don't mutate
symlinks, but in this case it causes bad behavior.
The solution is to just remove this truncation. This can cause a
regression in a filesystem that relies on supplying a symlink larger than
the file size, but this is unlikely. If that happens we'd need to make
this behavior conditional.
Reported-by: Laura Promberger <laura.promberger@cern.ch>
Tested-by: Sam Lewis <samclewis@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250220100258.793363-1-mszeredi@redhat.com
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The following functions have been in the mei code for
a long time but have never been used.
mei_txe_setup_satt2() was added in 2014 by
commit 32e2b59fca2c ("mei: txe: add hw-txe.c")
mei_me_cl_rm_by_uuid_id() was added in 2015 by
commit 79563db9ddd3 ("mei: add reference counting for me clients")
mei_cldev_uuid() was added in 2015 by
commit baeacd037697 ("mei: bus: export uuid and protocol version to mei_cl
bus drivers")
mei_cldev_recv_nonblock() was added in 2016 by
commit 076802d00615 ("mei: bus: enable non-blocking RX")
it is the only user of mei_cldev_recv_nonblock_vtag().
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250130012654.255119-1-linux@treblig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Interrupt controller drivers which enable CONFIG_GENERIC_PENDING_IRQ
require to know whether an interrupt can be moved in process context or not
to decide whether they need to invoke the work around for non-atomic MSI
updates or not.
This information can be retrieved via irq_can_move_pcntxt(). That helper
requires access to the top-most interrupt domain data, but the driver which
requires this is usually further down in the hierarchy.
Introduce irq_can_move_in_process_context() which retrieves that
information from the top-most interrupt domain data.
[ tglx: Massaged change log ]
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-6-apatel@ventanamicro.com
|
|
CONFIG_GENERIC_PENDING_IRQ requires an architecture specific implementation
of irq_force_complete_move() for CPU hotplug. At the moment, only x86
implements this unconditionally, but for RISC-V irq_force_complete_move()
is only needed when the RISC-V IMSIC driver is in use and not needed
otherwise.
To allow runtime configuration of this mechanism, introduce a common
irq_force_complete_move() implementation in the interrupt core code, which
only invokes the completion function, when a interrupt chip in the
hierarchy implements it.
Switch X86 over to the new mechanism. No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-5-apatel@ventanamicro.com
|
|
msi_lib_init_dev_msi_info() sets the default irq_eoi()/irq_ack() callbacks
unconditionally. This is correct for all existing users, but prevents the
IMSIC driver to be moved to the MSI library implementation.
Introduce chip_flags in struct msi_parent_ops, which instruct the library
to selectively set the callbacks depending on the flags, and update all
current users to set them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-3-apatel@ventanamicro.com
|
|
Remove return since both device_remove_group() and device_remove_groups()
are void functions.
Fixes: e323b2dddc1c ("driver core: add device_{add|remove}_group() helpers")
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250208-fix_device_remove_group-v1-1-8a5b0ac0ce5c@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Remove return since both class_remove_file() and class_remove_file_ns()
are void functions.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250208-cls_rmv_return-v1-1-091b37945aac@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
After the previous commit is finally safe to revert commit dbae2b062824
("net: skb: introduce and use a single page frag cache"): do it here.
The intended goal of such change was to counter a performance regression
introduced by commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs").
Unfortunately, the blamed commit introduces another regression for the
virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny
size, so that the whole head frag could fit a 512-byte block.
The single page frag cache uses a 1K fragment for such allocation, and
the additional overhead, under small UDP packets flood, makes the page
allocator a bottleneck.
Thanks to commit bf9f1baa279f ("net: add dedicated kmem_cache for
typical/small skb->head"), this revert does not re-introduce the
original regression. Actually, in the relevant test on top of this
revert, I measure a small but noticeable positive delta, just above
noise level.
The revert itself required some additional mangling due to recent updates
in the affected code.
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: dbae2b062824 ("net: skb: introduce and use a single page frag cache")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Sabrina reported the following splat:
WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0
Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48
RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e
RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6
RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c
R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168
R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007
FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
gro_cells_init+0x1ba/0x270
xfrm_input_init+0x4b/0x2a0
xfrm_init+0x38/0x50
ip_rt_init+0x2d7/0x350
ip_init+0xf/0x20
inet_init+0x406/0x590
do_one_initcall+0x9d/0x2e0
do_initcalls+0x23b/0x280
kernel_init_freeable+0x445/0x490
kernel_init+0x20/0x1d0
ret_from_fork+0x46/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
irq event stamp: 584330
hardirqs last enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0
hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0
softirqs last enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470
softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0
on kernel built with MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024)
is smaller than GRO_MAX_HEAD.
Such built additionally contains the revert of the single page frag cache
so that napi_get_frags() ends up using the page frag allocator, triggering
the splat.
Note that the underlying issue is independent from the mentioned
revert; address it ensuring that the small head cache will fit either TCP
and GRO allocation and updating napi_alloc_skb() and __netdev_alloc_skb()
to select kmalloc() usage for any allocation fitting such cache.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2025-02-19
this is a pull request of 12 patches for net-next/master.
The first 4 patches are by Krzysztof Kozlowski and simplify the c_can
driver's c_can_plat_probe() function.
Ciprian Marian Costea contributes 3 patches to add S32G2/S32G3 support
to the flexcan driver.
Ruffalo Lavoisier's patch removes a duplicated word from the mcp251xfd
DT bindings documentation.
Oleksij Rempel extends the J1939 documentation.
The next patch is by Oliver Hartkopp and adds access for the Remote
Request Substitution bit in CAN-XL frames.
Henrik Brix Andersen's patch for the gs_usb driver adds support for
the CANnectivity firmware.
The last patch is by Robin van der Gracht and removes a duplicated
setup of RX FIFO in the rockchip_canfd driver.
linux-can-next-for-6.15-20250219
* tag 'linux-can-next-for-6.15-20250219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: rockchip_canfd: rkcanfd_chip_fifo_setup(): remove duplicated setup of RX FIFO
can: gs_usb: add VID/PID for the CANnectivity firmware
can: canxl: support Remote Request Substitution bit access
can: j1939: Extend stack documentation with buffer size behavior
dt-binding: can: mcp251xfd: remove duplicate word
can: flexcan: add NXP S32G2/S32G3 SoC support
can: flexcan: Add quirk to handle separate interrupt lines for mailboxes
dt-bindings: can: fsl,flexcan: add S32G2/S32G3 SoC support
can: c_can: Use syscon_regmap_lookup_by_phandle_args
can: c_can: Use of_property_present() to test existence of DT property
can: c_can: Simplify handling syscon error path
can: c_can: Drop useless final probe failure message
====================
Link: https://patch.msgid.link/20250219113354.529611-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Basic helper that copies ready events to the specified userspace
address. The event checking is quick and racy, it's up to the caller
to ensure it retries appropriately in case 0 events are copied.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20250219172552.1565603-4-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while
running tests that boil down to:
- create a pair of netns
- run a basic TCP test over ipcomp6
- delete the pair of netns
The xfrm_state found on spi_byaddr was not deleted at the time we
delete the netns, because we still have a reference on it. This
lingering reference comes from a secpath (which holds a ref on the
xfrm_state), which is still attached to an skb. This skb is not
leaked, it ends up on sk_receive_queue and then gets defer-free'd by
skb_attempt_defer_free.
The problem happens when we defer freeing an skb (push it on one CPU's
defer_list), and don't flush that list before the netns is deleted. In
that case, we still have a reference on the xfrm_state that we don't
expect at this point.
We already drop the skb's dst in the TCP receive path when it's no
longer needed, so let's also drop the secpath. At this point,
tcp_filter has already called into the LSM hooks that may require the
secpath, so it should not be needed anymore. However, in some of those
places, the MPTCP extension has just been attached to the skb, so we
cannot simply drop all extensions.
Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613.git.sd@queasysnail.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
User-space cursor-image data is encoded in ARBG8888, while hardware
supports ARGB4444. Implement the format conversion as part of the
format-helper framework, so that other drivers can benefit.
This allows to respect the damage area of the cursor update. In
previous code, all cursor image data had to be converted on each
update. Now, only the changed areas require an update. The hardware
image is always updated completely, as it is required for the
checksum update.
The format-conversion helper still contains the old implementation's
optimization of writing 2 output pixels at the same time.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250217122336.230067-3-tzimmermann@suse.de
|
|
After the previous patch we can remove the forward_alloc_get
proto callback, basically reverting commit 292e6077b040 ("net: introduce
sk_forward_alloc_get()") and commit 66d58f046c9d ("net: use
sk_forward_alloc_get() in sk_get_meminfo()").
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250218-net-next-mptcp-rx-path-refactor-v1-5-4a47d90d7998@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add dedicated helper for finding devices by hardware address when
holding rtnl_lock, similar to existing dev_getbyhwaddr_rcu(). This prevents
PROVE_LOCKING warnings when rtnl_lock is held but RCU read lock is not.
Extract common address comparison logic into dev_addr_cmp().
The context about this change could be found in the following
discussion:
Link: https://lore.kernel.org/all/20250206-scarlet-ermine-of-improvement-1fcac5@leitao/
Cc: kuniyu@amazon.com
Cc: ushankar@purestorage.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-1-d3d6892db9e1@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
priv->plat->fix_mac_speed() is called from stmmac_mac_link_up(), which
is passed the speed as an "int". However, fix_mac_speed() implicitly
casts this to an unsigned int. Some platform glue code print this value
using %u, others with %d. Some implicitly cast it back to an int, and
others to u32.
Good practice is to use one type and only one type to represent a value
being passed around a driver.
Switch all of these over to consistently use "int" when dealing with a
speed passed from stmmac_mac_link_up(), even though the speed will
always be positive.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Link: https://patch.msgid.link/E1tkKmN-004ObM-Ge@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend IPv4 FIB rules to match on source and destination ports using a
mask. Note that the mask is only set when not matching on a range.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for configuring and deleting rules that match on source and
destination ports using a mask as well as support for dumping such rules
to user space.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add attributes that allow matching on source and destination ports with
a mask. Matching on the source port with a mask is needed in deployments
where users encode path information into certain bits of the UDP source
port.
Temporarily set the type of the attributes to 'NLA_REJECT' while support
is being added.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"18 hotfixes. 5 are cc:stable and the remainder address post-6.13
issues or aren't considered necessary for -stable kernels.
10 are for MM and 8 are for non-MM. All are singletons, please see the
changelogs for details"
* tag 'mm-hotfixes-stable-2025-02-19-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
test_xarray: fix failure in check_pause when CONFIG_XARRAY_MULTI is not defined
kasan: don't call find_vm_area() in a PREEMPT_RT kernel
MAINTAINERS: update Nick's contact info
selftests/mm: fix check for running THP tests
mm: hugetlb: avoid fallback for specific node allocation of 1G pages
memcg: avoid dead loop when setting memory.max
mailmap: update Nick's entry
mm: pgtable: fix incorrect reclaim of non-empty PTE pages
taskstats: modify taskstats version
getdelays: fix error format characters
mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()
tools/mm: fix build warnings with musl-libc
mailmap: add entry for Feng Tang
.mailmap: add entries for Jeff Johnson
mm,madvise,hugetlb: check for 0-length range after end address adjustment
mm/zswap: fix inconsistency when zswap_store_page() fails
lib/iov_iter: fix import_iovec_ubuf iovec management
procfs: fix a locking bug in a vmcore_add_device_dump() error path
|
|
Similar to `access_process_vm()` but specific to strings. Also chunks
reads by page and utilizes `strscpy()` for handling null termination.
The primary motivation for this change is to copy strings from
a non-current task/process in BPF. There is already a helper
`bpf_copy_from_user_task()`, which uses `access_process_vm()` but one to
handle strings would be very helpful.
Signed-off-by: Jordan Rome <linux@jordanrome.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/bpf/20250213152125.1837400-1-linux@jordanrome.com
|
|
Report VOLTAGE_MAX_DESIGN for the bq27x00 and bq27x10 fuel gauges. Per the
datasheet, this value is stored in the Charge Termination Voltage Settings
(QV0 and QV1) of the Pack Configuration register.
Tested on the Nokia N900 with bq27200.
Signed-off-by: Sicelo A. Mhlongo <absicsz@gmail.com>
Link: https://lore.kernel.org/r/20250207211521.103357-1-absicsz@gmail.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
If rpc_signal_task() is called while a task is in an rpc_call_done()
callback function, and the latter calls rpc_restart_call(), the task can
end up looping due to the RPC_TASK_SIGNALLED flag being set without the
tk_rpc_status being set.
Removing the redundant mechanism for signalling the task fixes the
looping behaviour.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
It's pretty inconvenient to access the full atomic state from
drm_bridges, so let's change the atomic_post_disable hook prototype to
pass it directly.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250213-bridge-connector-v3-5-e71598f49c8f@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
It's pretty inconvenient to access the full atomic state from
drm_bridges, so let's change the atomic_disable hook prototype to pass
it directly.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250213-bridge-connector-v3-4-e71598f49c8f@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
It's pretty inconvenient to access the full atomic state from
drm_bridges, so let's change the atomic_enable hook prototype to pass it
directly.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250213-bridge-connector-v3-3-e71598f49c8f@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
It's pretty inconvenient to access the full atomic state from
drm_bridges, so let's change the atomic_pre_enable hook prototype to
pass it directly.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250213-bridge-connector-v3-2-e71598f49c8f@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
After some discussions on the mailing-list for an earlier revision of
the series, it was suggested to document the evolution of
drm_atomic_state and its use by drivers to explain some of the confusion
one might still encounter when reading the framework code.
Suggested-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://lore.kernel.org/dri-devel/Z4jtKHY4qN3RNZNG@phenom.ffwll.local/
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Link: https://lore.kernel.org/r/20250213-bridge-connector-v3-1-e71598f49c8f@kernel.org
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
A recent discussion has revealed that using DPM_FLAG_SMART_SUSPEND
unconditionally is generally problematic because it may lead to
situations in which the device's runtime PM information is internally
inconsistent or does not reflect its real state [1].
For this reason, change the handling of DPM_FLAG_SMART_SUSPEND so that
it is only taken into account if it is consistently set by the drivers
of all devices having any PM callbacks throughout dependency graphs in
accordance with the following rules:
- The "smart suspend" feature is only enabled for devices whose drivers
ask for it (that is, set DPM_FLAG_SMART_SUSPEND) and for devices
without PM callbacks unless they have never had runtime PM enabled.
- The "smart suspend" feature is not enabled for a device if it has not
been enabled for the device's parent unless the parent does not take
children into account or it has never had runtime PM enabled.
- The "smart suspend" feature is not enabled for a device if it has not
been enabled for one of the device's suppliers taking runtime PM into
account unless that supplier has never had runtime PM enabled.
Namely, introduce a new device PM flag called smart_suspend that is only
set if the above conditions are met and update all DPM_FLAG_SMART_SUSPEND
users to check power.smart_suspend instead of directly checking the
latter.
At the same time, drop the power.set_active flage introduced recently
in commit 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status
of parents and children") because it is now sufficient to check
power.smart_suspend along with the dev_pm_skip_resume() return value
to decide whether or not pm_runtime_set_active() needs to be called
for the device.
Link: https://lore.kernel.org/linux-pm/CAPDyKFroyU3YDSfw_Y6k3giVfajg3NQGwNWeteJWqpW29BojhQ@mail.gmail.com/ [1]
Fixes: 7585946243d6 ("PM: sleep: core: Restrict power.set_active propagation")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci
Link: https://patch.msgid.link/1914558.tdWV9SEqCh@rjwysocki.net
|
|
The Remote Request Substitution bit is a dominant bit ("0") in the CAN
XL frame. As some CAN XL controllers support to access this bit a new
CANXL_RRS value has been defined for the canxl_frame.flags element.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250124142347.7444-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Compute env->peak_states as a maximum value of sum of
env->explored_states and env->free_list size.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250215110411.3236773-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When fixes from patches 1 and 3 are applied, Patrick Somaru reported
an increase in memory consumption for sched_ext iterator-based
programs hitting 1M instructions limit. For example, 2Gb VMs ran out
of memory while verifying a program. Similar behaviour could be
reproduced on current bpf-next master.
Here is an example of such program:
/* verification completes if given 16G or RAM,
* final env->free_list size is 369,960 entries.
*/
SEC("raw_tp")
__flag(BPF_F_TEST_STATE_FREQ)
__success
int free_list_bomb(const void *ctx)
{
volatile char buf[48] = {};
unsigned i, j;
j = 0;
bpf_for(i, 0, 10) {
/* this forks verifier state:
* - verification of current path continues and
* creates a checkpoint after 'if';
* - verification of forked path hits the
* checkpoint and marks it as loop_entry.
*/
if (bpf_get_prandom_u32())
asm volatile ("");
/* this marks 'j' as precise, thus any checkpoint
* created on current iteration would not be matched
* on the next iteration.
*/
buf[j++] = 42;
j %= ARRAY_SIZE(buf);
}
asm volatile (""::"r"(buf));
return 0;
}
Memory consumption increased due to more states being marked as loop
entries and eventually added to env->free_list.
This commit introduces logic to free states from env->free_list during
verification. A state in env->free_list can be freed if:
- it has no child states;
- it is not used as a loop_entry.
This commit:
- updates bpf_verifier_state->used_as_loop_entry to be a counter
that tracks how many states use this one as a loop entry;
- adds a function maybe_free_verifier_state(), which:
- frees a state if its ->branches and ->used_as_loop_entry counters
are both zero;
- if the state is freed, state->loop_entry->used_as_loop_entry is
decremented, and an attempt is made to free state->loop_entry.
In the example above, this approach reduces the maximum number of
states in the free list from 369,960 to 16,223.
However, this approach has its limitations. If the buf size in the
example above is modified to 64, state caching overflows: the state
for j=0 is evicted from the cache before it can be used to stop
traversal. As a result, states in the free list accumulate because
their branch counters do not reach zero.
The effect of this patch on the selftests looks as follows:
File Program Max free list (A) Max free list (B) Max free list (DIFF)
-------------------------------- ------------------------------------ ----------------- ----------------- --------------------
arena_list.bpf.o arena_list_add 17 3 -14 (-82.35%)
bpf_iter_task_stack.bpf.o dump_task_stack 39 9 -30 (-76.92%)
iters.bpf.o checkpoint_states_deletion 265 89 -176 (-66.42%)
iters.bpf.o clean_live_states 19 0 -19 (-100.00%)
profiler2.bpf.o tracepoint__syscalls__sys_enter_kill 102 1 -101 (-99.02%)
profiler3.bpf.o tracepoint__syscalls__sys_enter_kill 144 0 -144 (-100.00%)
pyperf600_iter.bpf.o on_event 15 0 -15 (-100.00%)
pyperf600_nounroll.bpf.o on_event 1170 1158 -12 (-1.03%)
setget_sockopt.bpf.o skops_sockopt 18 0 -18 (-100.00%)
strobemeta_nounroll1.bpf.o on_event 147 83 -64 (-43.54%)
strobemeta_nounroll2.bpf.o on_event 312 209 -103 (-33.01%)
strobemeta_subprogs.bpf.o on_event 124 86 -38 (-30.65%)
test_cls_redirect_subprogs.bpf.o cls_redirect 15 0 -15 (-100.00%)
timer.bpf.o test1 30 15 -15 (-50.00%)
Measured using "do-not-submit" patches from here:
https://github.com/eddyz87/bpf/tree/get-loop-entry-hungup
Reported-by: Patrick Somaru <patsomaru@meta.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250215110411.3236773-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|