summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-04tcp: fix delayed ACKs for MSS boundary conditionNeal Cardwell
This commit fixes poor delayed ACK behavior that can cause poor TCP latency in a particular boundary condition: when an application makes a TCP socket write that is an exact multiple of the MSS size. The problem is that there is painful boundary discontinuity in the current delayed ACK behavior. With the current delayed ACK behavior, we have: (1) If an app reads data when > 1*MSS is unacknowledged, then tcp_cleanup_rbuf() ACKs immediately because of: tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss || (2) If an app reads all received data, and the packets were < 1*MSS, and either (a) the app is not ping-pong or (b) we received two packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately beecause of: ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) || ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) && !inet_csk_in_pingpong_mode(sk))) && (3) *However*: if an app reads exactly 1*MSS of data, tcp_cleanup_rbuf() does not send an immediate ACK. This is true even if the app is not ping-pong and the 1*MSS of data had the PSH bit set, suggesting the sending application completed an application write. Thus if the app is not ping-pong, we have this painful case where >1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a write whose last skb is an exact multiple of 1*MSS can get a 40ms delayed ACK. This means that any app that transfers data in one direction and takes care to align write size or packet size with MSS can suffer this problem. With receive zero copy making 4KB MSS values more common, it is becoming more common to have application writes naturally align with MSS, and more applications are likely to encounter this delayed ACK problem. The fix in this commit is to refine the delayed ACK heuristics with a simple check: immediately ACK a received 1*MSS skb with PSH bit set if the app reads all data. Why? If an skb has a len of exactly 1*MSS and has the PSH bit set then it is likely the end of an application write. So more data may not be arriving soon, and yet the data sender may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send an ACK immediately if the app reads all of the data and is not ping-pong. Note that this logic is also executed for the case where len > MSS, but in that case this logic does not matter (and does not hurt) because tcp_cleanup_rbuf() will always ACK immediately if the app reads data and there is more than an MSS of unACKed data. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Xin Guo <guoxin0309@gmail.com> Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04tcp: fix quick-ack counting to count actual ACKs of new dataNeal Cardwell
This commit fixes quick-ack counting so that it only considers that a quick-ack has been provided if we are sending an ACK that newly acknowledges data. The code was erroneously using the number of data segments in outgoing skbs when deciding how many quick-ack credits to remove. This logic does not make sense, and could cause poor performance in request-response workloads, like RPC traffic, where requests or responses can be multi-segment skbs. When a TCP connection decides to send N quick-acks, that is to accelerate the cwnd growth of the congestion control module controlling the remote endpoint of the TCP connection. That quick-ack decision is purely about the incoming data and outgoing ACKs. It has nothing to do with the outgoing data or the size of outgoing data. And in particular, an ACK only serves the intended purpose of allowing the remote congestion control to grow the congestion window quickly if the ACK is ACKing or SACKing new data. The fix is simple: only count packets as serving the goal of the quickack mechanism if they are ACKing/SACKing new data. We can tell whether this is the case by checking inet_csk_ack_scheduled(), since we schedule an ACK exactly when we are ACKing/SACKing new data. Fixes: fc6415bcb0f5 ("[TCP]: Fix quick-ack decrementing with TSO.") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231001151239.1866845-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-05pmdomain: imx: scu-pd: correct DMA2 channelPeng Fan
Per "dt-bindings/firmware/imx/rsrc.h", `IMX_SC_R_DMA_2_CH0 + 5` not equals to IMX_SC_R_DMA_2_CH5, so there should be two entries in imx8qxp_scu_pd_ranges, otherwise the imx_scu_add_pm_domain may filter out wrong power domains. Fixes: 927b7d15dcf2 ("genpd: imx: scu-pd: enlarge PD range") Reported-by: Dong Aisheng <Aisheng.dong@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20231001123853.200773-1-peng.fan@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-10-04Merge tag 'nf-23-10-04' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter patches for net First patch resolves a regression with vlan header matching, this was broken since 6.5 release. From myself. Second patch fixes an ancient problem with sctp connection tracking in case INIT_ACK packets are delayed. This comes with a selftest, both patches from Xin Long. Patch 4 extends the existing nftables audit selftest, from Phil Sutter. Patch 5, also from Phil, avoids a situation where nftables would emit an audit record twice. This was broken since 5.13 days. Patch 6, from myself, avoids spurious insertion failure if we encounter an overlapping but expired range during element insertion with the 'nft_set_rbtree' backend. This problem exists since 6.2. * tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs selftests: netfilter: Extend nft_audit.sh selftests: netfilter: test for sctp collision processing in nf_conntrack netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp netfilter: nft_payload: rebuild vlan header on h_proto access ==================== Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04page_pool: fix documentation typosRandy Dunlap
Correct grammar for better readability. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20231001003846.29541-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04smb: client: do not start laundromat thread on nohandlecachePaulo Alcantara
Honor 'nohandlecache' mount option by not starting laundromat thread even when SMB server supports directory leases. Do not waste system resources by having laundromat thread running with no directory caching at all. Fixes: 2da338ff752a ("smb3: do not start laundromat thread when dir leases disabled") Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-04smb: use kernel_connect() and kernel_bind()Jordan Rife
Recent changes to kernel_connect() and kernel_bind() ensure that callers are insulated from changes to the address parameter made by BPF SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and ops->bind() with kernel_connect() and kernel_bind() to ensure that SMB mounts do not see their mount address overwritten in such cases. Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/ Cc: <stable@vger.kernel.org> # 6.0+ Signed-off-by: Jordan Rife <jrife@google.com> Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-04tipc: fix a potential deadlock on &tx->lockChengfeng Ye
It seems that tipc_crypto_key_revoke() could be be invoked by wokequeue tipc_crypto_work_rx() under process context and timer/rx callback under softirq context, thus the lock acquisition on &tx->lock seems better use spin_lock_bh() to prevent possible deadlock. This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. tipc_crypto_work_rx() <workqueue> --> tipc_crypto_key_distr() --> tipc_bcast_xmit() --> tipc_bcbase_xmit() --> tipc_bearer_bc_xmit() --> tipc_crypto_xmit() --> tipc_ehdr_build() --> tipc_crypto_key_revoke() --> spin_lock(&tx->lock) <timer interrupt> --> tipc_disc_timeout() --> tipc_bearer_xmit_skb() --> tipc_crypto_xmit() --> tipc_ehdr_build() --> tipc_crypto_key_revoke() --> spin_lock(&tx->lock) <deadlock here> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Link: https://lore.kernel.org/r/20230927181414.59928-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04net: stmmac: dwmac-stm32: fix resume on STM32 MCUBen Wolsieffer
The STM32MP1 keeps clk_rx enabled during suspend, and therefore the driver does not enable the clock in stm32_dwmac_init() if the device was suspended. The problem is that this same code runs on STM32 MCUs, which do disable clk_rx during suspend, causing the clock to never be re-enabled on resume. This patch adds a variant flag to indicate that clk_rx remains enabled during suspend, and uses this to decide whether to enable the clock in stm32_dwmac_init() if the device was suspended. This approach fixes this specific bug with limited opportunity for unintended side-effects, but I have a follow up patch that will refactor the clock configuration and hopefully make it less error prone. Fixes: 6528e02cc9ff ("net: ethernet: stmmac: add adaptation for stm32mp157c.") Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230927175749.1419774-1-ben.wolsieffer@hefring.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04workqueue: Fix UAF report by KASAN in pwq_release_workfn()Zqiang
Currently, for UNBOUND wq, if the apply_wqattrs_prepare() return error, the apply_wqattr_cleanup() will be called and use the pwq_release_worker kthread to release resources asynchronously. however, the kfree(wq) is invoked directly in failure path of alloc_workqueue(), if the kfree(wq) has been executed and when the pwq_release_workfn() accesses wq, this leads to the following scenario: BUG: KASAN: slab-use-after-free in pwq_release_workfn+0x339/0x380 kernel/workqueue.c:4124 Read of size 4 at addr ffff888027b831c0 by task pool_workqueue_/3 CPU: 0 PID: 3 Comm: pool_workqueue_ Not tainted 6.5.0-rc7-next-20230825-syzkaller #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0xc4/0x620 mm/kasan/report.c:475 kasan_report+0xda/0x110 mm/kasan/report.c:588 pwq_release_workfn+0x339/0x380 kernel/workqueue.c:4124 kthread_worker_fn+0x2fc/0xa80 kernel/kthread.c:823 kthread+0x33a/0x430 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> Allocated by task 5054: kasan_save_stack+0x33/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:374 [inline] __kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383 kmalloc include/linux/slab.h:599 [inline] kzalloc include/linux/slab.h:720 [inline] alloc_workqueue+0x16f/0x1490 kernel/workqueue.c:4684 kvm_mmu_init_tdp_mmu+0x23/0x100 arch/x86/kvm/mmu/tdp_mmu.c:19 kvm_mmu_init_vm+0x248/0x2e0 arch/x86/kvm/mmu/mmu.c:6180 kvm_arch_init_vm+0x39/0x720 arch/x86/kvm/x86.c:12311 kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1222 [inline] kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:5089 [inline] kvm_dev_ioctl+0xa31/0x1c20 arch/x86/kvm/../../../virt/kvm/kvm_main.c:5131 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 5054: kasan_save_stack+0x33/0x50 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x15b/0x1b0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:164 [inline] slab_free_hook mm/slub.c:1800 [inline] slab_free_freelist_hook+0x114/0x1e0 mm/slub.c:1826 slab_free mm/slub.c:3809 [inline] __kmem_cache_free+0xb8/0x2f0 mm/slub.c:3822 alloc_workqueue+0xe76/0x1490 kernel/workqueue.c:4746 kvm_mmu_init_tdp_mmu+0x23/0x100 arch/x86/kvm/mmu/tdp_mmu.c:19 kvm_mmu_init_vm+0x248/0x2e0 arch/x86/kvm/mmu/mmu.c:6180 kvm_arch_init_vm+0x39/0x720 arch/x86/kvm/x86.c:12311 kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1222 [inline] kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:5089 [inline] kvm_dev_ioctl+0xa31/0x1c20 arch/x86/kvm/../../../virt/kvm/kvm_main.c:5131 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl fs/ioctl.c:857 [inline] __x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd This commit therefore flush pwq_release_worker in the alloc_and_link_pwqs() before invoke kfree(wq). Reported-by: syzbot+60db9f652c92d5bacba4@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=60db9f652c92d5bacba4 Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-10-04HID: nvidia-shield: Select POWER_SUPPLY Kconfig optionRahul Rameshbabu
Battery information reported by the driver depends on the power supply subsystem. Select the required subsystem when the HID_NVIDIA_SHIELD Kconfig option is enabled. Fixes: 3ab196f88237 ("HID: nvidia-shield: Add battery support for Thunderstrike") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2023-10-04PM: hibernate: Fix copying the zero bitmap to safe pagesPavankumar Kondeti
The following crash is observed 100% of the time during resume from the hibernation on a x86 QEMU system. [ 12.931887] ? __die_body+0x1a/0x60 [ 12.932324] ? page_fault_oops+0x156/0x420 [ 12.932824] ? search_exception_tables+0x37/0x50 [ 12.933389] ? fixup_exception+0x21/0x300 [ 12.933889] ? exc_page_fault+0x69/0x150 [ 12.934371] ? asm_exc_page_fault+0x26/0x30 [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 [ 12.935428] snapshot_write_next+0x7c/0x9f0 [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 [ 12.937035] ? hib_submit_io+0xa5/0x110 [ 12.937501] load_image+0x83/0x1a0 [ 12.937919] swsusp_read+0x17f/0x1d0 [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 [ 12.938967] load_image_and_restore+0x45/0xc0 [ 12.939494] software_resume+0x13c/0x180 [ 12.939994] resume_store+0xa3/0x1d0 The commit being fixed introduced a bug in copying the zero bitmap to safe pages. A temporary bitmap is allocated with PG_ANY flag in prepare_image() to make a copy of zero bitmap after the unsafe pages are marked. Freeing this temporary bitmap with PG_UNSAFE_KEEP later results in an inconsistent state of unsafe pages. Since free bit is left as is for this temporary bitmap after free, these pages are treated as unsafe pages when they are allocated again. This results in incorrect calculation of the number of pages pre-allocated for the image. nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; The allocate_unsafe_pages is estimated to be higher than the actual which results in running short of pages in safe_pages_list. Hence the crash is observed in get_buffer() due to NULL pointer access of safe_pages_list. Fix this issue by creating the temporary zero bitmap from safe pages (free bit not set) so that the corresponding free bits can be cleared while freeing this bitmap. Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") Suggested-by:: Brian Geffon <bgeffon@google.com> Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Reviewed-by: Brian Geffon <bgeffon@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-10-04ipv4: Set offload_failed flag in fibmatch resultsBenjamin Poirier
Due to a small omission, the offload_failed flag is missing from ipv4 fibmatch results. Make sure it is set correctly. The issue can be witnessed using the following commands: echo "1 1" > /sys/bus/netdevsim/new_device ip link add dummy1 up type dummy ip route add 192.0.2.0/24 dev dummy1 echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/fib/fail_route_offload ip route add 198.51.100.0/24 dev dummy1 ip route # 192.168.15.0/24 has rt_trap # 198.51.100.0/24 has rt_offload_failed ip route get 192.168.15.1 fibmatch # Result has rt_trap ip route get 198.51.100.1 fibmatch # Result differs from the route shown by `ip route`, it is missing # rt_offload_failed ip link del dev dummy1 echo 1 > /sys/bus/netdevsim/del_device Fixes: 36c5100e859d ("IPv4: Add "offload failed" indication to routes") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230926182730.231208-1-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Merge tag 'linux-kselftest-fixes-6.6-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "One single fix to Makefile to fix the incorrect TARGET name for uevent test" * tag 'linux-kselftest-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: Fix wrong TARGET in kselftest top level Makefile
2023-10-04Merge tag 'wireless-2023-09-27' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Quite a collection of fixes this time, really too many to list individually. Many stack fixes, even rfkill (found by simulation and the new eevdf scheduler)! Also a bigger maintainers file cleanup, to remove old and redundant information. * tag 'wireless-2023-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (32 commits) wifi: iwlwifi: mvm: Fix incorrect usage of scan API wifi: mac80211: Create resources for disabled links wifi: cfg80211: avoid leaking stack data into trace wifi: mac80211: allow transmitting EAPOL frames with tainted key wifi: mac80211: work around Cisco AP 9115 VHT MPDU length wifi: cfg80211: Fix 6GHz scan configuration wifi: mac80211: fix potential key leak wifi: mac80211: fix potential key use-after-free wifi: mt76: mt76x02: fix MT76x0 external LNA gain handling wifi: brcmfmac: Replace 1-element arrays with flexible arrays wifi: mwifiex: Fix oob check condition in mwifiex_process_rx_packet wifi: rtw88: rtw8723d: Fix MAC address offset in EEPROM rfkill: sync before userspace visibility/changes wifi: mac80211: fix mesh id corruption on 32 bit systems wifi: cfg80211: add missing kernel-doc for cqm_rssi_work wifi: cfg80211: fix cqm_config access race wifi: iwlwifi: mvm: Fix a memory corruption issue wifi: iwlwifi: Ensure ack flag is properly cleared. wifi: iwlwifi: dbg_ini: fix structure packing iwlwifi: mvm: handle PS changes in vif_cfg_changed ... ==================== Link: https://lore.kernel.org/r/20230927095835.25803-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04cpuidle, ACPI: Evaluate LPI arch_flags for broadcast timerOza Pawandeep
Arm® Functional Fixed Hardware Specification defines LPI states, which provide an architectural context loss flags field that can be used to describe the context that might be lost when an LPI state is entered. - Core context Lost - General purpose registers. - Floating point and SIMD registers. - System registers, include the System register based - generic timer for the core. - Debug register in the core power domain. - PMU registers in the core power domain. - Trace register in the core power domain. - Trace context loss - GICR - GICD Qualcomm's custom CPUs preserves the architectural state, including keeping the power domain for local timers active. when core is power gated, the local timers are sufficient to wake the core up without needing broadcast timer. The patch fixes the evaluation of cpuidle arch_flags, and moves only to broadcast timer if core context lost is defined in ACPI LPI. Fixes: a36a7fecfe60 ("ACPI / processor_idle: Add support for Low Power Idle(LPI) states") Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Oza Pawandeep <quic_poza@quicinc.com> Link: https://lore.kernel.org/r/20231003173333.2865323-1-quic_poza@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2023-10-04Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-10-02 We've added 11 non-merge commits during the last 12 day(s) which contain a total of 12 files changed, 176 insertions(+), 41 deletions(-). The main changes are: 1) Fix BPF verifier to reset backtrack_state masks on global function exit as otherwise subsequent precision tracking would reuse them, from Andrii Nakryiko. 2) Several sockmap fixes for available bytes accounting, from John Fastabend. 3) Reject sk_msg egress redirects to non-TCP sockets given this is only supported for TCP sockets today, from Jakub Sitnicki. 4) Fix a syzkaller splat in bpf_mprog when hitting maximum program limits with BPF_F_BEFORE directive, from Daniel Borkmann and Nikolay Aleksandrov. 5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust size_index for selecting a bpf_mem_cache, from Hou Tao. 6) Fix arch_prepare_bpf_trampoline return code for s390 JIT, from Song Liu. 7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off, from Leon Hwang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Use kmalloc_size_roundup() to adjust size_index selftest/bpf: Add various selftests for program limits bpf, mprog: Fix maximum program check on mprog attachment bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets bpf, sockmap: Add tests for MSG_F_PEEK bpf, sockmap: Do not inc copied_seq when PEEK flag set bpf: tcp_read_skb needs to pop skb regardless of seq bpf: unconditionally reset backtrack_state masks on global func exit bpf: Fix tr dereferencing selftests/bpf: Check bpf_cubic_acked() is called via struct_ops s390/bpf: Let arch_prepare_bpf_trampoline return program size ==================== Link: https://lore.kernel.org/r/20231002113417.2309-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Input: goodix - ensure int GPIO is in input for gpio_count == 1 && ↵Hans de Goede
gpio_int_idx == 0 case Add a special case for gpio_count == 1 && gpio_int_idx == 0 to goodix_add_acpi_gpio_mappings(). It seems that on newer x86/ACPI devices the reset and irq GPIOs are no longer listed as GPIO resources instead there is only 1 GpioInt resource and _PS0 does the whole reset sequence for us. This means that we must call acpi_device_fix_up_power() on these devices to ensure that the chip is reset before we try to use it. This part was already fixed in commit 3de93e6ed2df ("Input: goodix - call acpi_device_fix_up_power() in some cases") by adding a call to acpi_device_fix_up_power() to the generic "Unexpected ACPI resources" catch all. But it turns out that this case on some hw needs some more special handling. Specifically the firmware may bootup with the IRQ pin in output mode. The reset sequence from ACPI _PS0 (executed by acpi_device_fix_up_power()) should put the pin in input mode, but the GPIO subsystem has cached the direction at bootup, causing request_irq() to fail due to gpiochip_lock_as_irq() failure: [ 9.119864] Goodix-TS i2c-GDIX1002:00: Unexpected ACPI resources: gpio_count 1, gpio_int_idx 0 [ 9.317443] Goodix-TS i2c-GDIX1002:00: ID 911, version: 1060 [ 9.321902] input: Goodix Capacitive TouchScreen as /devices/pci0000:00/0000:00:17.0/i2c_designware.4/i2c-5/i2c-GDIX1002:00/input/input8 [ 9.327840] gpio gpiochip0: (INT3453:00): gpiochip_lock_as_irq: tried to flag a GPIO set as output for IRQ [ 9.327856] gpio gpiochip0: (INT3453:00): unable to lock HW IRQ 26 for IRQ [ 9.327861] genirq: Failed to request resources for GDIX1002:00 (irq 131) on irqchip intel-gpio [ 9.327912] Goodix-TS i2c-GDIX1002:00: request IRQ failed: -5 Fix this by adding a special case for gpio_count == 1 && gpio_int_idx == 0 which adds an ACPI GPIO lookup table for the int GPIO even though we cannot use it for reset purposes (as there is no reset GPIO). Adding the lookup will make the gpiod_int = gpiod_get(..., GPIOD_IN) call succeed, which will explicitly set the direction to input fixing the issue. Note this re-uses the acpi_goodix_int_first_gpios[] lookup table, since there is only 1 GPIO in the ACPI resources the reset entry in that lookup table will amount to a no-op. Reported-and-tested-by: Michael Smith <1973.mjsmith@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20231003215144.69527-1-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2023-10-04Input: i8042 - add Fujitsu Lifebook E5411 to i8042 quirk tableSzilard Fabian
In the initial boot stage the integrated keyboard of Fujitsu Lifebook E5411 refuses to work and it's not possible to type for example a dm-crypt passphrase without the help of an external keyboard. i8042.nomux kernel parameter resolves this issue but using that a PS/2 mouse is detected. This input device is unused even when the i2c-hid-acpi kernel module is blacklisted making the integrated ELAN touchpad (04F3:308A) not working at all. Since the integrated touchpad is managed by the i2c_designware input driver in the Linux kernel and you can't find a PS/2 mouse port on the computer I think it's safe to not use the PS/2 mouse port at all. Signed-off-by: Szilard Fabian <szfabian@bluemarch.art> Link: https://lore.kernel.org/r/20231004011749.101789-1-szfabian@bluemarch.art Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2023-10-04dt-bindings: media: renesas,vin: Fix field-even-active spellingGeert Uytterhoeven
make dt_binding_check: field-active-even: missing type definition The property is named "field-even-active", not "field-active-even". Fixes: 3ab7801dfab998a2 ("media: dt-bindings: media: rcar-vin: Describe optional ep properties") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/c999eef0a14c8678f56eb698d27b2243e09afed4.1696328563.git.geert+renesas@glider.be Signed-off-by: Rob Herring <robh@kernel.org>
2023-10-04netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failureFlorian Westphal
nft_rbtree_gc_elem() walks back and removes the end interval element that comes before the expired element. There is a small chance that we've cached this element as 'rbe_ge'. If this happens, we hold and test a pointer that has been queued for freeing. It also causes spurious insertion failures: $ cat test-testcases-sets-0044interval_overlap_0.1/testout.log Error: Could not process rule: File exists add element t s { 0 - 2 } ^^^^^^ Failed to insert 0 - 2 given: table ip t { set s { type inet_service flags interval,timeout timeout 2s gc-interval 2s } } The set (rbtree) is empty. The 'failure' doesn't happen on next attempt. Reason is that when we try to insert, the tree may hold an expired element that collides with the range we're adding. While we do evict/erase this element, we can trip over this check: if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new)) return -ENOTEMPTY; rbe_ge was erased by the synchronous gc, we should not have done this check. Next attempt won't find it, so retry results in successful insertion. Restart in-kernel to avoid such spurious errors. Such restart are rare, unless userspace intentionally adds very large numbers of elements with very short timeouts while setting a huge gc interval. Even in this case, this cannot loop forever, on each retry an existing element has been removed. As the caller is holding the transaction mutex, its impossible for a second entity to add more expiring elements to the tree. After this it also becomes feasible to remove the async gc worker and perform all garbage collection from the commit path. Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection") Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04netfilter: nf_tables: Deduplicate nft_register_obj audit logsPhil Sutter
When adding/updating an object, the transaction handler emits suitable audit log entries already, the one in nft_obj_notify() is redundant. To fix that (and retain the audit logging from objects' 'update' callback), Introduce an "audit log free" variant for internal use. Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04dmaengine: mediatek: Fix deadlock caused by synchronize_irq()Duoming Zhou
The synchronize_irq(c->irq) will not return until the IRQ handler mtk_uart_apdma_irq_handler() is completed. If the synchronize_irq() holds a spin_lock and waits the IRQ handler to complete, but the IRQ handler also needs the same spin_lock. The deadlock will happen. The process is shown below: cpu0 cpu1 mtk_uart_apdma_device_pause() | mtk_uart_apdma_irq_handler() spin_lock_irqsave() | | spin_lock_irqsave() //hold the lock to wait | synchronize_irq() | This patch reorders the synchronize_irq(c->irq) outside the spin_lock in order to mitigate the bug. Fixes: 9135408c3ace ("dmaengine: mediatek: Add MediaTek UART APDMA support") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Eugen Hristev <eugen.hristev@collabora.com> Link: https://lore.kernel.org/r/20230806032511.45263-1-duoming@zju.edu.cn Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-10-04dt-bindings: cache: andestech,ax45mp-cache: Fix unit address in exampleGeert Uytterhoeven
The unit address in the example does not match the reg property. Correct the unit address to match reality. Fixes: 3e7bf4685e42786d ("dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/7b93655219a6ad696dd3faa9f36fde6b094694a9.1696330005.git.geert+renesas@glider.be Signed-off-by: Rob Herring <robh@kernel.org>
2023-10-04drm/i915: Invalidate the TLBs on each GTChris Wilson
With multi-GT devices, the object may have been bound on each GT and so we need to invalidate the TLBs across all GT before releasing the pages back to the system. Fixes: d6c531ab4820 ("drm/i915: Invalidate the TLBs on each GT") Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: Matt Roper <matthew.d.roper@intel.com> CC: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231002140742.933530-1-jonathan.cavitt@intel.com (cherry picked from commit 6b8ace7a14e7926b7b914ccd96a8ac657c0d518c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-10-04drm/i915: Register engines early to avoid type confusionMathias Krause
Commit 1ec23ed7126e ("drm/i915: Use uabi engines for the default engine map") switched from using for_each_engine() to for_each_uabi_engine() to iterate over the user engines. While this seems to be a sensible change, it's only safe to do when the engines are actually chained using the rb-tree structure which is not the case during early driver initialization where it can be either a lock-less list or regular double-linked list. In fact, the modesetting initialization code may end up calling default_engines() through the fb helper code while the engines list is still llist_node-based: i915_driver_probe() -> intel_display_driver_probe() -> intel_fbdev_init() -> drm_fb_helper_init() -> drm_client_init() -> drm_client_open() -> drm_file_alloc() -> i915_driver_open() -> i915_gem_open() -> i915_gem_context_open() -> i915_gem_create_context() -> default_engines() Using for_each_uabi_engine() in default_engines() is therefore wrong, as it would try to interpret the llist as rb-tree, making it find no engine at all, as the rb_left and rb_right members will still be NULL, as they haven't been initialized yet. To fix this type confusion register the engines earlier and at the same time reduce the amount of code that has to deal with the intermediate llist state. Reported-by: sanity checks in grsecurity Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 1ec23ed7126e ("drm/i915: Use uabi engines for the default engine map") Signed-off-by: Mathias Krause <minipli@grsecurity.net> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230928182019.10256-2-minipli@grsecurity.net [tursulin: fixed commit tag typo] (cherry picked from commit 2b562f032fc2594fb3fac22b7a2eb3c1969a7ba3) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-10-04drm/i915: Don't set PIPE_CONTROL_FLUSH_L3 for aux invalNirmoy Das
PIPE_CONTROL_FLUSH_L3 is not needed for aux invalidation so don't set that. Fixes: 78a6ccd65fa3 ("drm/i915/gt: Ensure memory quiesced before invalidation") Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.8+ Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Mark Janes <mark.janes@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Acked-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Tested-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230926142401.25687-1-nirmoy.das@intel.com (cherry picked from commit 03d681412b38558aefe4fb0f46e36efa94bb21ef) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-10-04ASoC: dt-bindings: fsl,micfil: Document #sound-dai-cellsFabio Estevam
imx8mp.dtsi passes #sound-dai-cells = <0> in the micfil node. Document #sound-dai-cells to fix the following schema warning: audio-controller@30ca0000: '#sound-dai-cells' does not match any of the regexes: 'pinctrl-[0-9]+' from schema $id: http://devicetree.org/schemas/sound/fsl,micfil.yaml# Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Adam Ford <aford173@gmail.com> Link: https://lore.kernel.org/r/20231004122935.2250889-1-festevam@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-10-04selftests: netfilter: Extend nft_audit.shPhil Sutter
Add tests for sets and elements and deletion of all kinds. Also reorder rule reset tests: By moving the bulk rule add command up, the two 'reset rules' tests become identical. While at it, fix for a failing bulk rule add test's error status getting lost due to its use in a pipe. Avoid this by using a temporary file. Headings in diff output for failing tests contain no useful data, strip them. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04selftests: netfilter: test for sctp collision processing in nf_conntrackXin Long
This patch adds a test case to reproduce the SCTP DATA chunk retransmission timeout issue caused by the improper SCTP collision processing in netfilter nf_conntrack_proto_sctp. In this test, client sends a INIT chunk, but the INIT_ACK replied from server is delayed until the server sends a INIT chunk to start a new connection from its side. After the connection is complete from server side, the delayed INIT_ACK arrives in nf_conntrack_proto_sctp. The delayed INIT_ACK should be dropped in nf_conntrack_proto_sctp instead of updating the vtag with the out-of-date init_tag, otherwise, the vtag in DATA chunks later sent by client don't match the vtag in the conntrack entry and the DATA chunks get dropped. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04netfilter: handle the connecting collision properly in nf_conntrack_proto_sctpXin Long
In Scenario A and B below, as the delayed INIT_ACK always changes the peer vtag, SCTP ct with the incorrect vtag may cause packet loss. Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK 192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151] 192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] * 192.168.1.2 > 192.168.1.1: [COOKIE ECHO] 192.168.1.1 > 192.168.1.2: [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: [COOKIE ACK] Scenario B: INIT_ACK is delayed until the peer completes its own handshake 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] 192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] * This patch fixes it as below: In SCTP_CID_INIT processing: - clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir]. (Scenario E) - set ct->proto.sctp.init[dir]. In SCTP_CID_INIT_ACK processing: - drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C) - drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A) In SCTP_CID_COOKIE_ACK processing: - clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir]. (Scenario D) Also, it's important to allow the ct state to move forward with cookie_echo and cookie_ack from the opposite dir for the collision scenarios. There are also other Scenarios where it should allow the packet through, addressed by the processing above: Scenario C: new CT is created by INIT_ACK. Scenario D: start INIT on the existing ESTABLISHED ct. Scenario E: start INIT after the old collision on the existing ESTABLISHED ct. 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] (both side are stopped, then start new connection again in hours) 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742] Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04netfilter: nft_payload: rebuild vlan header on h_proto accessFlorian Westphal
nft can perform merging of adjacent payload requests. This means that: ether saddr 00:11 ... ether type 8021ad ... is a single payload expression, for 8 bytes, starting at the ethernet source offset. Check that offset+length is fully within the source/destination mac addersses. This bug prevents 'ether type' from matching the correct h_proto in case vlan tag got stripped. Fixes: de6843be3082 ("netfilter: nft_payload: rebuild vlan header when needed") Reported-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-04ibmveth: Remove condition to recompute TCP header checksum.David Wilder
In some OVS environments the TCP pseudo header checksum may need to be recomputed. Currently this is only done when the interface instance is configured for "Trunk Mode". We found the issue also occurs in some Kubernetes environments, these environments do not use "Trunk Mode", therefor the condition is removed. Performance tests with this change show only a fractional decrease in throughput (< 0.2%). Fixes: 7525de2516fb ("ibmveth: Set CHECKSUM_PARTIAL if NULL TCP CSUM.") Signed-off-by: David Wilder <dwilder@us.ibm.com> Reviewed-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04platform/x86: touchscreen_dmi: Add info for the BUSH Bush Windows tabletTomasz Swiatek
Add touchscreen info for the BUSH Bush Windows tablet. It was tested using gslx680_ts_acpi module and on patched kernel installed on device. Link: https://github.com/onitake/gsl-firmware/pull/215 Link: https://github.com/systemd/systemd/pull/29268 Signed-off-by: Tomasz Swiatek <swiatektomasz99@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-10-04platform/mellanox: tmfifo: fix kernel-doc warningsRandy Dunlap
Fix kernel-doc notation for structs and struct members to prevent these warnings: mlxbf-tmfifo.c:73: warning: cannot understand function prototype: 'struct mlxbf_tmfifo_vring ' mlxbf-tmfifo.c:128: warning: cannot understand function prototype: 'struct mlxbf_tmfifo_vdev ' mlxbf-tmfifo.c:146: warning: cannot understand function prototype: 'struct mlxbf_tmfifo_irq_info ' mlxbf-tmfifo.c:158: warning: cannot understand function prototype: 'struct mlxbf_tmfifo_io ' mlxbf-tmfifo.c:182: warning: cannot understand function prototype: 'struct mlxbf_tmfifo ' mlxbf-tmfifo.c:208: warning: cannot understand function prototype: 'struct mlxbf_tmfifo_msg_hdr ' mlxbf-tmfifo.c:138: warning: Function parameter or member 'config' not described in 'mlxbf_tmfifo_vdev' mlxbf-tmfifo.c:212: warning: Function parameter or member 'unused' not described in 'mlxbf_tmfifo_msg_hdr' Fixes: 1357dfd7261f ("platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc") Fixes: bc05ea63b394 ("platform/mellanox: Add BlueField-3 support in the tmfifo driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Closes: lore.kernel.org/r/202309252330.saRU491h-lkp@intel.com Cc: Liming Sun <lsun@mellanox.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Mark Gross <markgross@kernel.org> Cc: Vadim Pasternak <vadimp@nvidia.com> Cc: platform-driver-x86@vger.kernel.org Link: https://lore.kernel.org/r/20230926054013.11450-1-rdunlap@infradead.org Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-10-04platform/x86/intel/ifs: release cpus_read_lock()Jithu Joseph
Couple of error paths in do_core_test() was returning directly without doing a necessary cpus_read_unlock(). Following lockdep warning was observed when exercising these scenarios with PROVE_RAW_LOCK_NESTING enabled: [ 139.304775] ================================================ [ 139.311185] WARNING: lock held when returning to user space! [ 139.317593] 6.6.0-rc2ifs01+ #11 Tainted: G S W I [ 139.324499] ------------------------------------------------ [ 139.330908] bash/11476 is leaving the kernel with locks still held! [ 139.338000] 1 lock held by bash/11476: [ 139.342262] #0: ffffffffaa26c930 (cpu_hotplug_lock){++++}-{0:0}, at: do_core_test+0x35/0x1c0 [intel_ifs] Fix the flow so that all scenarios release the lock prior to returning from the function. Fixes: 5210fb4e1880 ("platform/x86/intel/ifs: Sysfs interface for Array BIST") Cc: stable@vger.kernel.org Signed-off-by: Jithu Joseph <jithu.joseph@intel.com> Link: https://lore.kernel.org/r/20230927184824.2566086-1-jithu.joseph@intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-10-04platform/x86: hp-bioscfg: Fix reference leakArmin Wolf
If a duplicate attribute is found using kset_find_obj(), a reference to that attribute is returned which needs to be disposed accordingly using kobject_put(). Use kobject_put() to dispose the duplicate attribute in such a case. As a side note, a very similar bug was fixed in commit 7295a996fdab ("platform/x86: dell-sysman: Fix reference leak"), so it seems that the bug was copied from that driver. Compile-tested only. Fixes: a34fc329b189 ("platform/x86: hp-bioscfg: bioscfg") Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Armin Wolf <W_Armin@gmx.de> Reviewed-by: Jorge Lopez <jorge.lopez2@hp.com> Link: https://lore.kernel.org/r/20230925142819.74525-3-W_Armin@gmx.de Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-10-04platform/x86: think-lmi: Fix reference leakArmin Wolf
If a duplicate attribute is found using kset_find_obj(), a reference to that attribute is returned which needs to be disposed accordingly using kobject_put(). Move the setting name validation into a separate function to allow for this change without having to duplicate the cleanup code for this setting. As a side note, a very similar bug was fixed in commit 7295a996fdab ("platform/x86: dell-sysman: Fix reference leak"), so it seems that the bug was copied from that driver. Compile-tested only. Fixes: 1bcad8e510b2 ("platform/x86: think-lmi: Fix issues with duplicate attributes") Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20230925142819.74525-2-W_Armin@gmx.de Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-10-04dmaengine: ti: k3-udma-glue: clean up k3_udma_glue_tx_get_irq() returnDan Carpenter
The k3_udma_glue_tx_get_irq() function currently returns negative error codes on error, zero on error and positive values for success. This complicates life for the callers who need to propagate the error code. Also GCC will not warn about unsigned comparisons when you check: if (unsigned_irq <= 0) All the callers have been fixed now but let's just make this easy going forward. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04net: ti: icssg-prueth: Fix signedness bug in prueth_init_tx_chns()Dan Carpenter
The "tx_chn->irq" variable is unsigned so the error checking does not work correctly. Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04net: ethernet: ti: am65-cpsw: Fix error code in am65_cpsw_nuss_init_tx_chns()Dan Carpenter
This accidentally returns success, but it should return a negative error code. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04dmaengine: idxd: use spin_lock_irqsave before wait_event_lock_irqRex Zhang
In idxd_cmd_exec(), wait_event_lock_irq() explicitly calls spin_unlock_irq()/spin_lock_irq(). If the interrupt is on before entering wait_event_lock_irq(), it will become off status after wait_event_lock_irq() is called. Later, wait_for_completion() may go to sleep but irq is disabled. The scenario is warned in might_sleep(). Fix it by using spin_lock_irqsave() instead of the primitive spin_lock() to save the irq status before entering wait_event_lock_irq() and using spin_unlock_irqrestore() instead of the primitive spin_unlock() to restore the irq status before entering wait_for_completion(). Before the change: idxd_cmd_exec() { interrupt is on spin_lock() // interrupt is on wait_event_lock_irq() spin_unlock_irq() // interrupt is enabled ... spin_lock_irq() // interrupt is disabled spin_unlock() // interrupt is still disabled wait_for_completion() // report "BUG: sleeping function // called from invalid context... // in_atomic() irqs_disabled()" } After applying spin_lock_irqsave(): idxd_cmd_exec() { interrupt is on spin_lock_irqsave() // save the on state // interrupt is disabled wait_event_lock_irq() spin_unlock_irq() // interrupt is enabled ... spin_lock_irq() // interrupt is disabled spin_unlock_irqrestore() // interrupt is restored to on wait_for_completion() // No Call trace } Fixes: f9f4082dbc56 ("dmaengine: idxd: remove interrupt disable for cmd_lock") Signed-off-by: Rex Zhang <rex.zhang@intel.com> Signed-off-by: Lijun Pan <lijun.pan@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20230916060619.3744220-1-rex.zhang@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2023-10-04vringh: don't use vringh_kiov_advance() in vringh_iov_xfer()Stefano Garzarella
In the while loop of vringh_iov_xfer(), `partlen` could be 0 if one of the `iov` has 0 lenght. In this case, we should skip the iov and go to the next one. But calling vringh_kiov_advance() with 0 lenght does not cause the advancement, since it returns immediately if asked to advance by 0 bytes. Let's restore the code that was there before commit b8c06ad4d67d ("vringh: implement vringh_kiov_advance()"), avoiding using vringh_kiov_advance(). Fixes: b8c06ad4d67d ("vringh: implement vringh_kiov_advance()") Cc: stable@vger.kernel.org Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-04Merge tag 'xfs-fstrim-busy-tag' of ↵Chandan Babu R
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-6.6-fixesC xfs: reduce AGF hold times during fstrim operations A recent log space overflow and recovery failure was root caused to a long running truncate blocking on the AGF and ending up pinning the tail of the log. The filesystem then hung, the machine was rebooted, and log recoery then refused to run because there wasn't enough space in the log for EFI transaction reservation. The reason the long running truncate got blocked on the AGF for so long was that an fstrim was being run. THe underlying block device was large and very slow (10TB ceph rbd volume) and so discarding all the free space in the AG took a really long time. The current fstrim implementation holds the AGF across the entire operations - both the freee space scan and the issuing of all the discards. The discards are synchronous and single depth, so if there are millions of free spaces, we hold the AGF lock across millions of discard operations. It doesn't really need to be said that this is a Bad Thing. This series reworks the fstrim discard path to use the same mechanisms as online discard. This allows discards to be issued asynchronously without holding the AGF locked, enabling higher discard queue depths (much faster on fast devices) and only requiring the AGF lock to be held whilst we are scanning free space. To do this, we make use of busy extents - we lock the AGF, mark all the extents we want to discard as "busy under discard" so that nothing will be allowed to allocate them, and then drop the AGF lock. We then issue discards on the gathered busy extents and on discard completion remove them from the busy list. This results in AGF lock holds times for fstrim dropping to a few milliseconds each batch of free extents we scan, and so the hours long hold times that can currently occur on large, slow, badly fragmented device no longer occur. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'xfs-fstrim-busy-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: abort fstrim if kernel is suspending xfs: reduce AGF hold times during fstrim operations xfs: move log discard work to xfs_discard.c
2023-10-03nbd: don't call blk_mark_disk_dead nbd_clear_sock_ioctlChristoph Hellwig
blk_mark_disk_dead is the proper interface to shut down a block device, but it also makes the disk unusable forever. nbd_clear_sock_ioctl on the other hand wants to shut down the file system, but allow the block device to be used again when when connected to another socket. Switch nbd to use disk_force_media_change and nbd_bdev_reset to go back to a behavior of the old __invalidate_device call, with the added benefit of incrementing the device generation as there is no guarantee the old content comes back when the device is reconnected. Reported-by: Samuel Holland <samuel.holland@sifive.com> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 0c1c9a27ce90 ("nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20231003153106.1331363-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-04btrfs: error out when reallocating block for defrag using a stale transactionFilipe Manana
At btrfs_realloc_node() we have these checks to verify we are not using a stale transaction (a past transaction with an unblocked state or higher), and the only thing we do is to trigger two WARN_ON(). This however is a critical problem, highly unexpected and if it happens it's most likely due to a bug, so we should error out and turn the fs into error state so that such issue is much more easily noticed if it's triggered. The problem is critical because in btrfs_realloc_node() we COW tree blocks, and using such stale transaction will lead to not persisting the extent buffers used for the COW operations, as allocating tree block adds the range of the respective extent buffers to the ->dirty_pages iotree of the transaction, and a stale transaction, in the unlocked state or higher, will not flush dirty extent buffers anymore, therefore resulting in not persisting the tree block and resource leaks (not cleaning the dirty_pages iotree for example). So do the following changes: 1) Return -EUCLEAN if we find a stale transaction; 2) Turn the fs into error state, with error -EUCLEAN, so that no transaction can be committed, and generate a stack trace; 3) Combine both conditions into a single if statement, as both are related and have the same error message; 4) Mark the check as unlikely, since this is not expected to ever happen. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: error when COWing block from a root that is being deletedFilipe Manana
At btrfs_cow_block() we check if the block being COWed belongs to a root that is being deleted and if so we log an error message. However this is an unexpected case and it indicates a bug somewhere, so we should return an error and abort the transaction. So change this in the following ways: 1) Abort the transaction with -EUCLEAN, so that if the issue ever happens it can easily be noticed; 2) Change the logged message level from error to critical, and change the message itself to print the block's logical address and the ID of the root; 3) Return -EUCLEAN to the caller; 4) As this is an unexpected scenario, that should never happen, mark the check as unlikely, allowing the compiler to potentially generate better code. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: error out when COWing block using a stale transactionFilipe Manana
At btrfs_cow_block() we have these checks to verify we are not using a stale transaction (a past transaction with an unblocked state or higher), and the only thing we do is to trigger a WARN with a message and a stack trace. This however is a critical problem, highly unexpected and if it happens it's most likely due to a bug, so we should error out and turn the fs into error state so that such issue is much more easily noticed if it's triggered. The problem is critical because using such stale transaction will lead to not persisting the extent buffer used for the COW operation, as allocating a tree block adds the range of the respective extent buffer to the ->dirty_pages iotree of the transaction, and a stale transaction, in the unlocked state or higher, will not flush dirty extent buffers anymore, therefore resulting in not persisting the tree block and resource leaks (not cleaning the dirty_pages iotree for example). So do the following changes: 1) Return -EUCLEAN if we find a stale transaction; 2) Turn the fs into error state, with error -EUCLEAN, so that no transaction can be committed, and generate a stack trace; 3) Combine both conditions into a single if statement, as both are related and have the same error message; 4) Mark the check as unlikely, since this is not expected to ever happen. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: always print transaction aborted messages with an error levelFilipe Manana
Commit b7af0635c87f ("btrfs: print transaction aborted messages with an error level") changed the log level of transaction aborted messages from a debug level to an error level, so that such messages are always visible even on production systems where the log level is normally above the debug level (and also on some syzbot reports). Later, commit fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to transaction.c") changed the log level back to debug level when the error number for a transaction abort should not have a stack trace printed. This happened for absolutely no reason. It's always useful to print transaction abort messages with an error level, regardless of whether the error number should cause a stack trace or not. So change back the log level to error level. Fixes: fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to transaction.c") CC: stable@vger.kernel.org # 6.5+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: reject unknown mount options earlyQu Wenruo
[BUG] The following script would allow invalid mount options to be specified (although such invalid options would just be ignored): # mkfs.btrfs -f $dev # mount $dev $mnt1 <<< Successful mount expected # mount $dev $mnt2 -o junk <<< Failed mount expected # echo $? 0 [CAUSE] For the 2nd mount, since the fs is already mounted, we won't go through open_ctree() thus no btrfs_parse_options(), but only through btrfs_parse_subvol_options(). However we do not treat unrecognized options from valid but irrelevant options, thus those invalid options would just be ignored by btrfs_parse_subvol_options(). [FIX] Add the handling for Opt_err to handle invalid options and error out, while still ignore other valid options inside btrfs_parse_subvol_options(). Reported-by: Anand Jain <anand.jain@oracle.com> CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>