Age | Commit message (Collapse) | Author |
|
Commit 2be22aae6b18 ("phy: qcom-qmp-usb: populate offsets configuration")
introduced register offsets to the driver but for ipq8074/ipq6018 they do
not match what was in the old style device tree. Example from old
ipq6018.dtsi:
<0x00078200 0x130>, /* Tx */
<0x00078400 0x200>, /* Rx */
<0x00078800 0x1f8>, /* PCS */
<0x00078600 0x044>; /* PCS misc */
which would translate to:
{.., .pcs = 0x800, .pcs_misc = 0x600, .tx = 0x200, .rx = 0x400 }
but was translated to:
{.., .pcs = 0x600, .tx = 0x200, .rx = 0x400 }
So split usb_offsets and fix USB initialization for IPQ8074 and IPQ6018.
Tested only on IPQ6018
Fixes: 2be22aae6b18 ("phy: qcom-qmp-usb: populate offsets configuration")
Signed-off-by: Mantas Pucka <mantas@8devices.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/1706026160-17520-2-git-send-email-mantas@8devices.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Current code uses the specified ring buffer size (either the default of 128
Kbytes or a module parameter specified value) to encompass the one page
ring buffer header plus the actual ring itself. When the page size is 4K,
carving off one page for the header isn't significant. But when the page
size is 64K on ARM64, only half of the default 128 Kbytes is left for the
actual ring. While this doesn't break anything, the smaller ring size
could be a performance bottleneck.
Fix this by applying the VMBUS_RING_SIZE macro to the specified ring buffer
size. This macro adds a page for the header, and rounds up the size to a
page boundary, using the page size for which the kernel is built. Use this
new size for subsequent ring buffer calculations. For example, on ARM64
with 64K page size and the default ring size, this results in the actual
ring being 128 Kbytes, which is intended.
Cc: stable@vger.kernel.org # 5.15.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240122170956.496436-1-mhklinux@outlook.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A recent change moved the code that decides to skip
a channel or disable multichannel entirely, into a
helper function.
During this, a mutex_unlock of the session_mutex
should have been removed. Doing that here.
Fixes: f591062bdbf4 ("cifs: handle servers that still advertise multichannel after disabling")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Similar to the rest of the commands, this is a change
to add replay flags on retry. This one does not add a
back-off, considering that we may want to flush a write
ASAP to the server. Considering that this will be a
flush of cached pages, the retrans value is also not
honoured.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
MS-SMB2 states that the header flag SMB2_FLAGS_REPLAY_OPERATION
needs to be set when a command needs to be retried, so that
the server is aware that this is a replay for an operation that
appeared before.
This can be very important, for example, for state changing
operations and opens which get retried following a reconnect;
since the client maybe unaware of the status of the previous
open.
This is particularly important for multichannel scenario, since
disconnection of one connection does not mean that the session
is lost. The requests can be replayed on another channel.
This change also makes use of exponential back-off before replays
and also limits the number of retries to "retrans" mount option
value.
Also, this change does not modify the read/write codepath.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The code to check for replay is not just -EAGAIN. In some
cases, the send request or receive response may result in
network errors, which we're now mapping to -ECONNABORTED.
This change introduces a helper function which checks
if the error returned in one of the above two errors.
And all checks for replays will now use this helper.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When the network stack returns various errors, we today bubble
up the error to the user (in case of soft mounts).
This change translates all network errors except -EINTR and
-EAGAIN to -ECONNABORTED. A similar approach is taken when
we receive network errors when reading from the socket.
The change also forces the cifsd thread to reconnect during
it's next activity.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Inside scsi_eh_wakeup(), scsi_host_busy() is called & checked with host
lock every time for deciding if error handler kthread needs to be waken up.
This can be too heavy in case of recovery, such as:
- N hardware queues
- queue depth is M for each hardware queue
- each scsi_host_busy() iterates over (N * M) tag/requests
If recovery is triggered in case that all requests are in-flight, each
scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called
for the last in-flight request, scsi_host_busy() has been run for (N * M -
1) times, and request has been iterated for (N*M - 1) * (N * M) times.
If both N and M are big enough, hard lockup can be triggered on acquiring
host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169).
Fix the issue by calling scsi_host_busy() outside the host lock. We don't
need the host lock for getting busy count because host the lock never
covers that.
[mkp: Drop unnecessary 'busy' variables pointed out by Bart]
Cc: Ewan Milne <emilne@redhat.com>
Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When a qdisc is deleted from a net device the stack instructs the
underlying driver to remove its flow offload callback from the
associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack
then continues to replay the removal of the filters in the block for
this driver by iterating over the chains in the block and invoking the
'reoffload' operation of the classifier being used. In turn, the
classifier in its 'reoffload' operation prepares and emits a
'FLOW_CLS_DESTROY' command for each filter.
However, the stack does not do the same for chain templates and the
underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when
a qdisc is deleted. This results in a memory leak [1] which can be
reproduced using [2].
Fix by introducing a 'tmplt_reoffload' operation and have the stack
invoke it with the appropriate arguments as part of the replay.
Implement the operation in the sole classifier that supports chain
templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}'
command based on whether a flow offload callback is being bound to a
filter block or being unbound from one.
As far as I can tell, the issue happens since cited commit which
reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains()
in __tcf_block_put(). The order cannot be reversed as the filter block
is expected to be freed after flushing all the chains.
[1]
unreferenced object 0xffff888107e28800 (size 2048):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[......
01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab374e>] __kmalloc+0x4e/0x90
[<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0
[<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0
[<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0
unreferenced object 0xffff88816d2c0400 (size 1024):
comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s)
hex dump (first 32 bytes):
40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8.....
10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m....
backtrace:
[<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320
[<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90
[<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0
[<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460
[<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0
[<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0
[<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180
[<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280
[<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340
[<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0
[<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170
[<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0
[<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440
[<ffffffff83ac6270>] netlink_unicast+0x540/0x820
[<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0
[<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[2]
# tc qdisc add dev swp1 clsact
# tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32
# tc qdisc del dev swp1 clsact
# devlink dev reload pci/0000:06:00.0
Fixes: bbf73830cd48 ("net: sched: traverse chains in block with tcf_get_next_chain()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are missing a lot of config options from net selftests,
it seems:
tun/tap: CONFIG_TUN, CONFIG_MACVLAN, CONFIG_MACVTAP
fib_tests: CONFIG_NET_SCH_FQ_CODEL
l2tp: CONFIG_L2TP, CONFIG_L2TP_V3, CONFIG_L2TP_IP, CONFIG_L2TP_ETH
sctp-vrf: CONFIG_INET_DIAG
txtimestamp: CONFIG_NET_CLS_U32
vxlan_mdb: CONFIG_BRIDGE_VLAN_FILTERING
gre_gso: CONFIG_NET_IPGRE_DEMUX, CONFIG_IP_GRE, CONFIG_IPV6_GRE
srv6_end_dt*_l3vpn: CONFIG_IPV6_SEG6_LWTUNNEL
ip_local_port_range: CONFIG_MPTCP
fib_test: CONFIG_NET_CLS_BASIC
rtnetlink: CONFIG_MACSEC, CONFIG_NET_SCH_HTB, CONFIG_XFRM_INTERFACE
CONFIG_NET_IPGRE, CONFIG_BONDING
fib_nexthops: CONFIG_MPLS, CONFIG_MPLS_ROUTING
vxlan_mdb: CONFIG_NET_ACT_GACT
tls: CONFIG_TLS, CONFIG_CRYPTO_CHACHA20POLY1305
psample: CONFIG_PSAMPLE
fcnal: CONFIG_TCP_MD5SIG
Try to add them in a semi-alphabetical order.
Fixes: 62199e3f1658 ("selftests: net: Add VXLAN MDB test")
Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask")
Fixes: 122db5e3634b ("selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE")
Link: https://lore.kernel.org/r/20240122203528.672004-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current code in netvsc_drv_init() incorrectly assumes that PAGE_SIZE
is 4 Kbytes, which is wrong on ARM64 with 16K or 64K page size. As a
result, the default VMBus ring buffer size on ARM64 with 64K page size
is 8 Mbytes instead of the expected 512 Kbytes. While this doesn't break
anything, a typical VM with 8 vCPUs and 8 netvsc channels wastes 120
Mbytes (8 channels * 2 ring buffers/channel * 7.5 Mbytes/ring buffer).
Unfortunately, the module parameter specifying the ring buffer size
is in units of 4 Kbyte pages. Ideally, it should be in units that
are independent of PAGE_SIZE, but backwards compatibility prevents
changing that now.
Fix this by having netvsc_drv_init() hardcode 4096 instead of using
PAGE_SIZE when calculating the ring buffer size in bytes. Also
use the VMBUS_RING_SIZE macro to ensure proper alignment when running
with page size larger than 4K.
Cc: <stable@vger.kernel.org> # 5.15.x
Fixes: 7aff79e297ee ("Drivers: hv: Enable Hyper-V code to be built on ARM64")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240122162028.348885-1-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
Using skb_ensure_writable_head_tail without a call to skb_unshare causes
the MACsec stack to operate on the original skb rather than a copy in the
macsec_encrypt path. This causes the buffer to be exceeded in space, and
leads to warnings generated by skb_put operations. Opting to revert this
change since skb_copy_expand is more efficient than
skb_ensure_writable_head_tail followed by a call to skb_unshare.
Log:
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:2464!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 21 PID: 61997 Comm: iperf3 Not tainted 6.7.0-rc8_for_upstream_debug_2024_01_07_17_05 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_put+0x113/0x190
Code: 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 70 3b 9d bc 00 00 00 77 0e 48 83 c4 08 4c 89 e8 5b 5d 41 5d c3 <0f> 0b 4c 8b 6c 24 20 89 74 24 04 e8 6d b7 f0 fe 8b 74 24 04 48 c7
RSP: 0018:ffff8882694e7278 EFLAGS: 00010202
RAX: 0000000000000025 RBX: 0000000000000100 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff88816ae0bad4
RBP: ffff88816ae0ba60 R08: 0000000000000004 R09: 0000000000000004
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88811ba5abfa
R13: ffff8882bdecc100 R14: ffff88816ae0ba60 R15: ffff8882bdecc0ae
FS: 00007fe54df02740(0000) GS:ffff88881f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe54d92e320 CR3: 000000010a345003 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? die+0x33/0x90
? skb_put+0x113/0x190
? do_trap+0x1b4/0x3b0
? skb_put+0x113/0x190
? do_error_trap+0xb6/0x180
? skb_put+0x113/0x190
? handle_invalid_op+0x2c/0x30
? skb_put+0x113/0x190
? exc_invalid_op+0x2b/0x40
? asm_exc_invalid_op+0x16/0x20
? skb_put+0x113/0x190
? macsec_start_xmit+0x4e9/0x21d0
macsec_start_xmit+0x830/0x21d0
? get_txsa_from_nl+0x400/0x400
? lock_downgrade+0x690/0x690
? dev_queue_xmit_nit+0x78b/0xae0
dev_hard_start_xmit+0x151/0x560
__dev_queue_xmit+0x1580/0x28f0
? check_chain_key+0x1c5/0x490
? netdev_core_pick_tx+0x2d0/0x2d0
? __ip_queue_xmit+0x798/0x1e00
? lock_downgrade+0x690/0x690
? mark_held_locks+0x9f/0xe0
ip_finish_output2+0x11e4/0x2050
? ip_mc_finish_output+0x520/0x520
? ip_fragment.constprop.0+0x230/0x230
? __ip_queue_xmit+0x798/0x1e00
__ip_queue_xmit+0x798/0x1e00
? __skb_clone+0x57a/0x760
__tcp_transmit_skb+0x169d/0x3490
? lock_downgrade+0x690/0x690
? __tcp_select_window+0x1320/0x1320
? mark_held_locks+0x9f/0xe0
? lockdep_hardirqs_on_prepare+0x286/0x400
? tcp_small_queue_check.isra.0+0x120/0x3d0
tcp_write_xmit+0x12b6/0x7100
? skb_page_frag_refill+0x1e8/0x460
__tcp_push_pending_frames+0x92/0x320
tcp_sendmsg_locked+0x1ed4/0x3190
? tcp_sendmsg_fastopen+0x650/0x650
? tcp_sendmsg+0x1a/0x40
? mark_held_locks+0x9f/0xe0
? lockdep_hardirqs_on_prepare+0x286/0x400
tcp_sendmsg+0x28/0x40
? inet_send_prepare+0x1b0/0x1b0
__sock_sendmsg+0xc5/0x190
sock_write_iter+0x222/0x380
? __sock_sendmsg+0x190/0x190
? kfree+0x96/0x130
vfs_write+0x842/0xbd0
? kernel_write+0x530/0x530
? __fget_light+0x51/0x220
? __fget_light+0x51/0x220
ksys_write+0x172/0x1d0
? update_socket_protocol+0x10/0x10
? __x64_sys_read+0xb0/0xb0
? lockdep_hardirqs_on_prepare+0x286/0x400
do_syscall_64+0x40/0xe0
entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0x7fe54d9018b7
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffdbd4191d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fe54d9018b7
RDX: 0000000000000025 RSI: 0000000000d9859c RDI: 0000000000000004
RBP: 0000000000d9859c R08: 0000000000000004 R09: 0000000000000000
R10: 00007fe54d80afe0 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000025 R14: 00007fe54e00ec00 R15: 0000000000d982a0
</TASK>
Modules linked in: 8021q garp mrp iptable_raw bonding vfio_pci rdma_ucm ib_umad mlx5_vfio_pci mlx5_ib vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ip_gre nf_tables ipip tunnel4 ib_ipoib ip6_gre gre ip6_tunnel tunnel6 geneve openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core zram zsmalloc fuse [last unloaded: ib_uverbs]
---[ end trace 0000000000000000 ]---
Cc: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20240118191811.50271-1-rrameshbabu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
cifs_pick_channel today just selects a channel based
on the policy of least loaded channel. However, it
does not take into account if the channel needs
reconnect. As a result, we can have failures in send
that can be completely avoided.
This change doesn't make a channel a candidate for
this selection if it needs reconnect.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Use cifsi->netfs_ctx.remote_i_size instead of cifsi->server_eof so that
netfslib can refer to it to.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Recent versions of Clang gets confused about the possible size of the
"user" allocation, and CONFIG_FORTIFY_SOURCE ends up emitting a
warning[1]:
repro.c:126:4: warning: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
126 | __write_overflow_field(p_size_field, size);
| ^
for this memset():
int len;
__le16 *user;
...
len = ses->user_name ? strlen(ses->user_name) : 0;
user = kmalloc(2 + (len * 2), GFP_KERNEL);
...
if (len) {
...
} else {
memset(user, '\0', 2);
}
While Clang works on this bug[2], switch to using a direct assignment,
which avoids memset() entirely which both simplifies the code and silences
the false positive warning. (Making "len" size_t also silences the
warning, but the direct assignment seems better.)
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1966 [1]
Link: https://github.com/llvm/llvm-project/issues/77813 [2]
Cc: Steve French <sfrench@samba.org>
Cc: Paulo Alcantara <pc@manguebit.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: llvm@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing and eventfs fixes from Steven Rostedt:
- Fix histogram tracing_map insertion.
The tracing_map_insert copies the value into the elt variable and
then assigns the elt to the entry value. But it is possible that the
entry value becomes visible on other CPUs before the elt is fully
initialized. This is fixed by adding a wmb() between the
initialization of the elt variable and assigning it.
- Have eventfs directory have unique inode numbers.
Having them be all the same proved to be a failure as the 'find'
application will think that the directories are causing loops, as it
checks for directory loops via their inodes. Have the evenfs dir
entries get their inodes assigned when they are referenced and then
save them in the eventfs_inode structure.
* tag 'trace-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Save directory inodes in the eventfs_inode structure
tracing: Ensure visibility when inserting an element into tracing_map
|
|
We need to correct some aspects of the IORING_OP_FIXED_FD_INSTALL
command to take into account the security implications of making an
io_uring-private file descriptor generally accessible to a userspace
task.
The first change in this patch is to enable auditing of the FD_INSTALL
operation as installing a file descriptor into a task's file descriptor
table is a security relevant operation and something that admins/users
may want to audit.
The second change is to disable the io_uring credential override
functionality, also known as io_uring "personalities", in the
FD_INSTALL command. The credential override in FD_INSTALL is
particularly problematic as it affects the credentials used in the
security_file_receive() LSM hook. If a task were to request a
credential override via REQ_F_CREDS on a FD_INSTALL operation, the LSM
would incorrectly check to see if the overridden credentials of the
io_uring were able to "receive" the file as opposed to the task's
credentials. After discussions upstream, it's difficult to imagine a
use case where we would want to allow a credential override on a
FD_INSTALL operation so we are simply going to block REQ_F_CREDS on
IORING_OP_FIXED_FD_INSTALL operations.
Fixes: dc18b89ab113 ("io_uring/openclose: add support for IORING_OP_FIXED_FD_INSTALL")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20240123215501.289566-2-paul@paul-moore.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We encountered a kernel crash triggered by the bpf_tcp_ca testcase as
show below:
Unable to handle kernel paging request at virtual address ff60000088554500
Oops [#1]
...
CPU: 3 PID: 458 Comm: test_progs Tainted: G OE 6.8.0-rc1-kselftest_plain #1
Hardware name: riscv-virtio,qemu (DT)
epc : 0xff60000088554500
ra : tcp_ack+0x288/0x1232
epc : ff60000088554500 ra : ffffffff80cc7166 sp : ff2000000117ba50
gp : ffffffff82587b60 tp : ff60000087be0040 t0 : ff60000088554500
t1 : ffffffff801ed24e t2 : 0000000000000000 s0 : ff2000000117bbc0
s1 : 0000000000000500 a0 : ff20000000691000 a1 : 0000000000000018
a2 : 0000000000000001 a3 : ff60000087be03a0 a4 : 0000000000000000
a5 : 0000000000000000 a6 : 0000000000000021 a7 : ffffffff8263f880
s2 : 000000004ac3c13b s3 : 000000004ac3c13a s4 : 0000000000008200
s5 : 0000000000000001 s6 : 0000000000000104 s7 : ff2000000117bb00
s8 : ff600000885544c0 s9 : 0000000000000000 s10: ff60000086ff0b80
s11: 000055557983a9c0 t3 : 0000000000000000 t4 : 000000000000ffc4
t5 : ffffffff8154f170 t6 : 0000000000000030
status: 0000000200000120 badaddr: ff60000088554500 cause: 000000000000000c
Code: c796 67d7 0000 0000 0052 0002 c13b 4ac3 0000 0000 (0001) 0000
---[ end trace 0000000000000000 ]---
The reason is that commit 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops
CFI") changes the func_addr of arch_prepare_bpf_trampoline in struct_ops
from NULL to non-NULL, while we use func_addr on RV64 to differentiate
between struct_ops and regular trampoline. When the struct_ops testcase
is triggered, it emits wrong prologue and epilogue, and lead to
unpredictable issues. After commit 2cd3e3772e41, we can use
BPF_TRAMP_F_INDIRECT to distinguish them as it always be set in
struct_ops.
Fixes: 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240123023207.1917284-1-pulehui@huaweicloud.com
|
|
Commit 1db9d06aaa55 ("mm/slab: remove CONFIG_SLAB from all Kconfig and
Makefile") removes the config SLAB and makes the SLUB allocator the only
default allocator in the kernel. Hence, the advice on reducing OS jitter
due to kworker kernel threads to build with CONFIG_SLUB instead of
CONFIG_SLAB is obsolete.
Remove the obsolete advice to build with SLUB instead of SLAB.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20231130095515.21586-1-lukas.bulwahn@gmail.com
|
|
This comment about DRM drivers has been there since the first git
commit. It simply doesn't belong in kernel-parameters; remove it.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240111085220.3693059-1-vegard.nossum@oracle.com
|
|
Since the mailing archive list lkml.org is obsolete, change the links into
lore.kernel.org's ones.
Signed-off-by: Hu Haowen <2023002089@link.tyut.edu.cn>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240118090140.4868-1-2023002089@link.tyut.edu.cn
|
|
When on the documentation home page, there won't be any ".current"
element since no entry from the TOC was selected yet. That results in a
javascript error. Fix that by only trying to set the scrollTop if we
have matches for current entries.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240123162157.61819-2-gustavo.sousa@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Merge cpupower utility update for 6.8-rc2 from Shuah Khan:
"This cpupower fixes update for Linux 6.8-rc2 consists of one single fix
to an issue where CFLAGS is passed as a make argument for cpupower, but
bench makefile does not inherit and append to them."
This fixes the problem so the user specified CFLAGS are honored.
* tag 'linux-cpupower-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
tools cpupower bench: Override CFLAGS assignments
|
|
Kernel has its own official true/false definitions.
The defines aren't even used in this file.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.8-rc2
The most visible fix here is the ath11k crash fix which was introduced
in v6.7. We also have a fix for iwlwifi memory corruption and few
smaller fixes in the stack.
* tag 'wireless-2024-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix race condition on enabling fast-xmit
wifi: iwlwifi: fix a memory corruption
wifi: mac80211: fix potential sta-link leak
wifi: cfg80211/mac80211: remove dependency on non-existing option
wifi: cfg80211: fix missing interfaces when dumping
wifi: ath11k: rely on mac80211 debugfs handling for vif
wifi: p54: fix GCC format truncation warning with wiphy->fw_version
====================
Link: https://lore.kernel.org/r/20240122153434.E0254C433C7@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The host and target use two definition of aer type, unify
them into a single one.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Syzkaller reports a warning in _copy_from_iter because an
iov_iter is supposedly used in the wrong direction. The reason
is that syzcaller managed to generate a request with
a transfer direction of SG_DXFER_TO_FROM_DEV. This instructs
the kernel to copy user buffers into the kernel, read into
the copied buffers and then copy the data back to user space.
Thus the iovec is used in both directions.
Detect this situation in the block layer and construct a new
iterator with the correct direction for the copy-in.
Reported-by: syzbot+a532b03fdfee2c137666@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/0000000000009b92c10604d7a5e9@google.com/t/
Reported-by: syzbot+63dec323ac56c28e644f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/0000000000003faaa105f6e7c658@google.com/T/
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240121202634.275068-1-lk@c--e.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Return IRQ_NONE from the interrupt handler when no interrupt was
detected. Because an empty interrupt will cause a null pointer error:
Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000008
Call trace:
complete+0x54/0x100
hisi_sfc_v3xx_isr+0x2c/0x40 [spi_hisi_sfc_v3xx]
__handle_irq_event_percpu+0x64/0x1e0
handle_irq_event+0x7c/0x1cc
Signed-off-by: Devyn Liu <liudingyuan@huawei.com>
Link: https://msgid.link/r/20240123071149.917678-1-liudingyuan@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
shared interrupt register
We can't use devm_platform_ioremap_resource_byname() to remap the
interrupt register that can be shared between
regulator-abb-{ivahd,dspeve,gpu} drivers instances.
The combined helper introduce a call to devm_request_mem_region() that
creates a new busy resource region on PRM_IRQSTATUS_MPU register
(0x4ae06010). The first devm_request_mem_region() call succeeds for
regulator-abb-ivahd but fails for the two other regulator-abb-dspeve
and regulator-abb-gpu.
# cat /proc/iomem | grep -i 4ae06
4ae06010-4ae06013 : 4ae07e34.regulator-abb-ivahd int-address
4ae06014-4ae06017 : 4ae07ddc.regulator-abb-mpu int-address
regulator-abb-dspeve and regulator-abb-gpu are missing due to
devm_request_mem_region() failure (EBUSY):
[ 1.326660] ti_abb 4ae07e30.regulator-abb-dspeve: can't request region for resource [mem 0x4ae06010-0x4ae06013]
[ 1.326660] ti_abb: probe of 4ae07e30.regulator-abb-dspeve failed with error -16
[ 1.327239] ti_abb 4ae07de4.regulator-abb-gpu: can't request region for resource [mem 0x4ae06010-0x4ae06013]
[ 1.327270] ti_abb: probe of 4ae07de4.regulator-abb-gpu failed with error -16
>From arm/boot/dts/dra7.dtsi:
The abb_mpu is the only instance using its own interrupt register:
(0x4ae06014) PRM_IRQSTATUS_MPU_2, ABB_MPU_DONE_ST (bit 7)
The other tree instances (abb_ivahd, abb_dspeve, abb_gpu) share
PRM_IRQSTATUS_MPU register (0x4ae06010) but use different bits
ABB_IVA_DONE_ST (bit 30), ABB_DSPEVE_DONE_ST( bit 29) and
ABB_GPU_DONE_ST (but 28).
The commit b36c6b1887ff ("regulator: ti-abb: Make use of the helper
function devm_ioremap related") overlooked the following comment
implicitly explaining why devm_ioremap() is used in this case:
/*
* We may have shared interrupt register offsets which are
* write-1-to-clear between domains ensuring exclusivity.
*/
Fixes and partially reverts commit b36c6b1887ff ("regulator: ti-abb:
Make use of the helper function devm_ioremap related").
Improve the existing comment to avoid further conversion to
devm_platform_ioremap_resource_byname().
Fixes: b36c6b1887ff ("regulator: ti-abb: Make use of the helper function devm_ioremap related")
Signed-off-by: Romain Naour <romain.naour@skf.com>
Reviewed-by: Yoann Congal <yoann.congal@smile.fr>
Link: https://msgid.link/r/20240123111456.739381-1-romain.naour@smile.fr
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfs fixes from David Howells:
* 'netfs-fixes' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix missing/incorrect unlocking of RCU read lock
afs: Remove afs_dynroot_d_revalidate() as it is redundant
afs: Fix error handling with lookup via FS.InlineBulkStatus
afs: Hide silly-rename files from userspace
cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
netfs, fscache: Prevent Oops in fscache_put_cache()
cifs: Don't use certain unnecessary folio_*() functions
afs: Don't use certain unnecessary folio_*() functions
netfs: Don't use certain unnecessary folio_*() functions
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The eventfs inodes and directories are allocated when referenced. But this
leaves the issue of keeping consistent inode numbers and the number is
only saved in the inode structure itself. When the inode is no longer
referenced, it can be freed. When the file that the inode was representing
is referenced again, the inode is once again created, but the inode number
needs to be the same as it was before.
Just making the inode numbers the same for all files is fine, but that
does not work with directories. The find command will check for loops via
the inode number and having the same inode number for directories triggers:
# find /sys/kernel/tracing
find: File system loop detected;
'/sys/kernel/debug/tracing/events/initcall/initcall_finish' is part of the same file system loop as
'/sys/kernel/debug/tracing/events/initcall'.
[..]
Linus pointed out that the eventfs_inode structure ends with a single
32bit int, and on 64 bit machines, there's likely a 4 byte hole due to
alignment. We can use this hole to store the inode number for the
eventfs_inode. All directories in eventfs are represented by an
eventfs_inode and that data structure can hold its inode number.
That last int was also purposely placed at the end of the structure to
prevent holes from within. Now that there's a 4 byte number to hold the
inode, both the inode number and the last integer can be moved up in the
structure for better cache locality, where the llist and rcu fields can be
moved to the end as they are only used when the eventfs_inode is being
deleted.
Link: https://lore.kernel.org/all/CAMuHMdXKiorg-jiuKoZpfZyDJ3Ynrfb8=X+c7x0Eewxn-YRdCA@mail.gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240122152748.46897388@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 53c41052ba31 ("eventfs: Have the inodes all for files and directories all be the same")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Since commit 086b957cc17f5 ("ALSA: usb-audio: Skip the clock selector
inquiry for single connections") we are already skipping clock selector
inquiry if only one clock source is connected, but we are still sending
a set request. Lets skip that too.
This should fix errors when setting a sample rate on devices that don't
have any controls present within the clock selector. An example of such
device is the new revision of MOTU M Series (07fd:000b):
AudioControl Interface Descriptor:
bLength 8
bDescriptorType 36
bDescriptorSubtype 11 (CLOCK_SELECTOR)
bClockID 1
bNrInPins 1
baCSourceID(0) 2
bmControls 0x00
iClockSelector 0
Perhaps we also should check if clock selectors are readable and writeable
like we already do for clock sources, but this is out of scope of this
patch.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217601
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Link: https://lore.kernel.org/r/20240123134635.54026-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
In the existing implementation, when executing interleaved write and read
operations in the ISR for a transfer length greater than the FIFO size,
the TXFIFO write precedes the RXFIFO read. Consequently, the initially
received data in the RXFIFO is pushed out and lost, leading to a failure
in data integrity. To address this issue, reverse the order of interleaved
operations and conduct the RXFIFO read followed by the TXFIFO write.
Fixes: 6afe2ae8dc48 ("spi: spi-cadence: Interleave write of TX and read of RX FIFO")
Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
Link: https://msgid.link/r/20231218090652.18403-1-amit.kumar-mahapatra@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
If dma_request_chan() fails, no error is shown nor any information is
shown in /sys/kernel/debug/devices_deferred if -EPROBE_DEFER is returned.
Use dev_err_probe to fix both problems.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://msgid.link/r/20240110085403.457089-1-alexander.stein@ew.tq-group.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
SFDP read shall use the mspi reads when using the bcm_qspi_exec_mem_op()
call. This fixes SFDP parameter page read failures seen with parts that
now use SFDP protocol to read the basic flash parameter table.
Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface")
Signed-off-by: Kamal Dasu <kamal.dasu@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://msgid.link/r/20240109210033.43249-1-kamal.dasu@broadcom.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In commit 198bc90e0e73("tcp: make sure init the accept_queue's spinlocks
once"), the spinlocks of accept_queue are initialized only when socket is
created in the inet4 scenario. The locks are not initialized when socket
is created in the inet6 scenario. The kernel reports the following error:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107)
register_lock_class (kernel/locking/lockdep.c:1289)
__lock_acquire (kernel/locking/lockdep.c:5015)
lock_acquire.part.0 (kernel/locking/lockdep.c:5756)
_raw_spin_lock_bh (kernel/locking/spinlock.c:178)
inet_csk_listen_stop (net/ipv4/inet_connection_sock.c:1386)
tcp_disconnect (net/ipv4/tcp.c:2981)
inet_shutdown (net/ipv4/af_inet.c:935)
__sys_shutdown (./include/linux/file.h:32 net/socket.c:2438)
__x64_sys_shutdown (net/socket.c:2445)
do_syscall_64 (arch/x86/entry/common.c:52)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
RIP: 0033:0x7f52ecd05a3d
Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48
RSP: 002b:00007f52ecf5dde8 EFLAGS: 00000293 ORIG_RAX: 0000000000000030
RAX: ffffffffffffffda RBX: 00007f52ecf5e640 RCX: 00007f52ecd05a3d
RDX: 00007f52ecc8b188 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f52ecf5de20 R08: 00007ffdae45c69f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f52ecf5e640
R13: 0000000000000000 R14: 00007f52ecc8b060 R15: 00007ffdae45c6e0
Fixes: 198bc90e0e73 ("tcp: make sure init the accept_queue's spinlocks once")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240122102001.2851701-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
An opaque directory cannot have xwhiteouts, so instead of marking an
xwhiteouts directory with a new xattr, overload overlay.opaque xattr
for marking both opaque dir ('y') and xwhiteouts dir ('x').
This is more efficient as the overlay.opaque xattr is checked during
lookup of directory anyway.
This also prevents unnecessary checking the xattr when reading a
directory without xwhiteouts, i.e. most of the time.
Note that the xwhiteouts marker is not checked on the upper layer and
on the last layer in lowerstack, where xwhiteouts are not expected.
Fixes: bc8df7a3dc03 ("ovl: Add an alternative type of whiteout")
Cc: <stable@vger.kernel.org> # v6.7
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Tested-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
There's a Cirque touchpad that wakes system up without anything touched
the touchpad. The input report is empty when this happens.
The reason is stated in HID over I2C spec, 7.2.8.2:
"If the DEVICE wishes to wake the HOST from its low power state, it can
issue a wake by asserting the interrupt."
This is fine if OS can put system back to suspend by identifying input
wakeup count stays the same on resume, like Chrome OS Dark Resume [0].
But for regular distro such policy is lacking.
Though the change doesn't bring any impact on power consumption for
touchpad is minimal, other i2c-hid device may depends on SLEEP control
power. So use a quirk to limit the change scope.
[0] https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/power_manager/docs/dark_resume.md
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
devm_kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure. Ensure the allocation was successful
by checking the pointer validity.
[jkosina@suse.com: tweak changelog a bit]
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
I analyze the potential sleeping issue of the following processes:
Thread A Thread B
... netlink_create //ref = 1
do_mq_notify ...
sock = netlink_getsockbyfilp ... //ref = 2
info->notify_sock = sock; ...
... netlink_sendmsg
... skb = netlink_alloc_large_skb //skb->head is vmalloced
... netlink_unicast
... sk = netlink_getsockbyportid //ref = 3
... netlink_sendskb
... __netlink_sendskb
... skb_queue_tail //put skb to sk_receive_queue
... sock_put //ref = 2
... ...
... netlink_release
... deferred_put_nlk_sk //ref = 1
mqueue_flush_file
spin_lock
remove_notification
netlink_sendskb
sock_put //ref = 0
sk_free
...
__sk_destruct
netlink_sock_destruct
skb_queue_purge //get skb from sk_receive_queue
...
__skb_queue_purge_reason
kfree_skb_reason
__kfree_skb
...
skb_release_all
skb_release_head_state
netlink_skb_destructor
vfree(skb->head) //sleeping while holding spinlock
In netlink_sendmsg, if the memory pointed to by skb->head is allocated by
vmalloc, and is put to sk_receive_queue queue, also the skb is not freed.
When the mqueue executes flush, the sleeping bug will occur. Use
vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.
Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Server product based on the Atom Darkmont core.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240117191844.56180-1-tony.luck@intel.com
|
|
Add a synthetic feature flag for Zen5.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240104201138.5072-1-bp@alien8.de
|
|
Jakub reported that ASSERT_EQ(cpu, i) in so_incoming_cpu.c seems to
fire somewhat randomly.
# # RUN so_incoming_cpu.before_reuseport.test3 ...
# # so_incoming_cpu.c:191:test3:Expected cpu (32) == i (0)
# # test3: Test terminated by assertion
# # FAIL so_incoming_cpu.before_reuseport.test3
# not ok 3 so_incoming_cpu.before_reuseport.test3
When the test failed, not-yet-accepted CLOSE_WAIT sockets received
SYN with a "challenging" SEQ number, which was sent from an unexpected
CPU that did not create the receiver.
The test basically does:
1. for each cpu:
1-1. create a server
1-2. set SO_INCOMING_CPU
2. for each cpu:
2-1. set cpu affinity
2-2. create some clients
2-3. let clients connect() to the server on the same cpu
2-4. close() clients
3. for each server:
3-1. accept() all child sockets
3-2. check if all children have the same SO_INCOMING_CPU with the server
The root cause was the close() in 2-4. and net.ipv4.tcp_tw_reuse.
In a loop of 2., close() changed the client state to FIN_WAIT_2, and
the peer transitioned to CLOSE_WAIT.
In another loop of 2., connect() happened to select the same port of
the FIN_WAIT_2 socket, and it was reused as the default value of
net.ipv4.tcp_tw_reuse is 2.
As a result, the new client sent SYN to the CLOSE_WAIT socket from
a different CPU, and the receiver's sk_incoming_cpu was overwritten
with unexpected CPU ID.
Also, the SYN had a different SEQ number, so the CLOSE_WAIT socket
responded with Challenge ACK. The new client properly returned RST
and effectively killed the CLOSE_WAIT socket.
This way, all clients were created successfully, but the error was
detected later by 3-2., ASSERT_EQ(cpu, i).
To avoid the failure, let's make sure that (i) the number of clients
is less than the number of available ports and (ii) such reuse never
happens.
Fixes: 6df96146b202 ("selftest: Add test for SO_INCOMING_CPU.")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240120031642.67014-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
On CPUs with weak memory models, reads and updates performed by tcp_push
to the sk variables can get reordered leaving the socket throttled when
it should not. The tasklet running tcp_wfree() may also not observe the
memory updates in time and will skip flushing any packets throttled by
tcp_push(), delaying the sending. This can pathologically cause 40ms
extra latency due to bad interactions with delayed acks.
Adding a memory barrier in tcp_push removes the bug, similarly to the
previous commit bf06200e732d ("tcp: tsq: fix nonagle handling").
smp_mb__after_atomic() is used to not incur in unnecessary overhead
on x86 since not affected.
Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu
22.04 and Apache Tomcat 9.0.83 running the basic servlet below:
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class HelloWorldServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException {
response.setContentType("text/html;charset=utf-8");
OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8");
String s = "a".repeat(3096);
osw.write(s,0,s.length());
osw.flush();
}
}
Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS
c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+
values is observed while, with the patch, the extra latency disappears.
No patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.91ms
75.000% 1.13ms
90.000% 1.46ms
99.000% 1.74ms
99.900% 1.89ms
99.990% 41.95ms <<< 40+ ms extra latency
99.999% 48.32ms
100.000% 48.96ms
With patch and tcp_autocorking=1
./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello
...
50.000% 0.90ms
75.000% 1.13ms
90.000% 1.45ms
99.000% 1.72ms
99.900% 1.83ms
99.990% 2.11ms <<< no 40+ ms extra latency
99.999% 2.53ms
100.000% 2.62ms
Patch has been also tested on x86 (m7i.2xlarge instance) which it is not
affected by this issue and the patch doesn't introduce any additional
delay.
Fixes: 7aa5470c2c09 ("tcp: tsq: move tsq_flags close to sk_wmem_alloc")
Signed-off-by: Salvatore Dipietro <dipiets@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The device fails to initialize otherwise, giving the following error:
[ 3676.671641] usb 2-1.1: 1:1: cannot get freq at ep 0x1
Signed-off-by: Julian Sikorski <belegdol+github@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240123084935.2745-1-belegdol+github@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Avoid a kernel crash in stifb by providing the correct pointer to the fb_info
struct. Prior to commit e2e0b838a184 ("video/sticore: Remove info field from
STI struct") the fb_info struct was at the beginning of the fb struct.
Fixes: e2e0b838a184 ("video/sticore: Remove info field from STI struct")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Customer has reported an issue with specific desktop platform
where two CS42L42 codecs are connected to CS8409 HDA bridge.
If "Master Volume Control" is created then on Ubuntu OS UCM
left/right balance slider in UI audio settings has no effect.
This patch will fix this issue for a target paltform.
Fixes: 20e507724113 ("ALSA: hda/cs8409: Add support for dolphin")
Signed-off-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240122184710.5802-1-vitalyr@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Apollo Lake seems to also suffer from IRQ timing issues. After being up for ~4
minutes, a Pentium N4200 system ends up falling back to workqueue-based IRQ
handling:
[ 208.019906] snd_hda_intel 0000:00:0e.0: IRQ timing workaround is activated
for card #0. Suggest a bigger bdl_pos_adj.
Unfortunately, the Baytrail and Braswell workaround value of 32 samples isn't
enough to fix the issue here. Default to 64 samples.
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Link: https://lore.kernel.org/r/20240122114512.55808-3-rsalvaterra@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
We have self-explanatory constants for Intel HDA devices, let's use them instead
of magic numbers and code comments.
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Reviewed-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Link: https://lore.kernel.org/r/20240122114512.55808-2-rsalvaterra@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
|