summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-26net/sched: act_mirred: Create function tcf_mirred_to_dev and improve readabilityVictor Nogueira
As a preparation for adding block ID to mirred, separate the part of mirred that redirect/mirrors to a dev into a specific function so that it can be called by blockcast for each dev. Also improve readability. Eg. rename use_reinsert to dont_clone and skb2 to skb_to_send. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/sched: cls_api: Expose tc block to the datapathVictor Nogueira
The datapath can now find the block of the port in which the packet arrived at. In the next patch we show a possible usage of this patch in a new version of mirred that multicasts to all ports except for the port in which the packet arrived on. Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/sched: Introduce tc block netdev tracking infraVictor Nogueira
This commit makes tc blocks track which ports have been added to them. And, with that, we'll be able to use this new information to send packets to the block's ports. Which will be done in the patch #3 of this series. Suggested-by: Jiri Pirko <jiri@nvidia.com> Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26keys, dns: Fix missing size check of V1 server-list headerEdward Adam Davis
The dns_resolver_preparse() function has a check on the size of the payload for the basic header of the binary-style payload, but is missing a check for the size of the V1 server-list payload header after determining that's what we've been given. Fix this by getting rid of the the pointer to the basic header and just assuming that we have a V1 server-list payload and moving the V1 server list pointer inside the if-statement. Dealing with other types and versions can be left for when such have been defined. This can be tested by doing the following with KASAN enabled: echo -n -e '\x0\x0\x1\x2' | keyctl padd dns_resolver foo @p and produces an oops like the following: BUG: KASAN: slab-out-of-bounds in dns_resolver_preparse+0xc9f/0xd60 net/dns_resolver/dns_key.c:127 Read of size 1 at addr ffff888028894084 by task syz-executor265/5069 ... Call Trace: dns_resolver_preparse+0xc9f/0xd60 net/dns_resolver/dns_key.c:127 __key_create_or_update+0x453/0xdf0 security/keys/key.c:842 key_create_or_update+0x42/0x50 security/keys/key.c:1007 __do_sys_add_key+0x29c/0x450 security/keys/keyctl.c:134 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x62/0x6a This patch was originally by Edward Adam Davis, but was modified by Linus. Fixes: b946001d3bb1 ("keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry") Reported-and-tested-by: syzbot+94bbb75204a05da3d89f@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/0000000000009b39bc060c73e209@google.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Cc: Edward Adam Davis <eadavis@qq.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jeffrey E Altman <jaltman@auristor.com> Cc: Wang Lei <wang840925@gmail.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: Steve French <sfrench@us.ibm.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-26f2fs: show more discard status by sysfsZhiguo Niu
The current pending_discard attr just only shows the discard_cmd_cnt information. More discard status can be shown so that we can check them through sysfs when needed. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26f2fs: Add error handling for negative returns from do_garbage_collectYongpeng Yang
The function do_garbage_collect can return a value less than 0 due to f2fs_cp_error being true or page allocation failure, as a result of calling f2fs_get_sum_page. However, f2fs_gc does not account for such cases, which could potentially lead to an abnormal total_freed and thus cause subsequent code to behave unexpectedly. Given that an f2fs_cp_error is irrecoverable, and considering that do_garbage_collect already retries page allocation errors through its call to f2fs_get_sum_page->f2fs_get_meta_page_retry, any error reported by do_garbage_collect should immediately terminate the current GC. Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26f2fs: Constrain the modification range of dir_level in the sysfsYongpeng Yang
The {struct f2fs_sb_info}->dir_level can be modified through the sysfs interface, but its value range is not limited. If the value exceeds MAX_DIR_HASH_DEPTH and the mount options include "noinline_dentry", the following error will occur: [root@fedora ~]# mount -o noinline_dentry /dev/sdb /mnt/sdb/ [root@fedora ~]# echo 128 > /sys/fs/f2fs/sdb/dir_level [root@fedora ~]# cd /mnt/sdb/ [root@fedora sdb]# mkdir test [root@fedora sdb]# cd test/ [root@fedora test]# mkdir test mkdir: cannot create directory 'test': Argument list too long Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26f2fs: Use wait_event_freezable_timeout() for freezable kthreadKevin Hao
A freezable kernel thread can enter frozen state during freezing by either calling try_to_freeze() or using wait_event_freezable() and its variants. So for the following snippet of code in a kernel thread loop: wait_event_interruptible_timeout(); try_to_freeze(); We can change it to a simple wait_event_freezable_timeout() and then eliminate the function calls to try_to_freeze() and freezing(). Signed-off-by: Kevin Hao <haokexin@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-26net: remove SOCK_DEBUG macroDenis Kirjanov
Since there are no more users of the macro let's finally burn it Signed-off-by: Denis Kirjanov <dkirjanov@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net: remove SOCK_DEBUG leftoversDenis Kirjanov
SOCK_DEBUG comes from the old days. Let's move logging to standard net core ratelimited logging functions Signed-off-by: Denis Kirjanov <dkirjanov@suse.de> changes in v2: - remove SOCK_DEBUG macro altogether Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26octeontx2-af: Fix marking couple of structure as __packedSuman Ghosh
Couple of structures was not marked as __packed. This patch fixes the same and mark them as __packed. Fixes: 42006910b5ea ("octeontx2-af: cleanup KPU config data") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26Merge branch 'net-smcv2.1-ISM-device-support'David S. Miller
Wen Gu says: ==================== net/smc: implement SMCv2.1 virtual ISM device support The fourth edition of SMCv2 adds the SMC version 2.1 feature updates for SMC-Dv2 with virtual ISM. Virtual ISM are created and supported mainly by OS or hypervisor software, comparable to IBM ISM which is based on platform firmware or hardware. With the introduction of virtual ISM, SMCv2.1 makes some updates: - Introduce feature bitmask to indicate supplemental features. - Reserve a range of CHIDs for virtual ISM. - Support extended GIDs (128 bits) in CLC handshake. So this patch set aims to implement these updates in Linux kernel. And it acts as the first part of SMC-D virtual ISM extension & loopback-ism [1]. [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/ v8->v7: - Patch #7: v7 mistakenly changed the type of gid_ext in smc_clc_msg_accept_confirm to u64 instead of __be64 as previous versions when fixing the rebase conflicts. So fix this mistake. v7->v6: Link: https://lore.kernel.org/netdev/20231219084536.8158-1-guwen@linux.alibaba.com/ - Collect the Reviewed-by tag in v6; - Patch #3: redefine the struct smc_clc_msg_accept_confirm; - Patch #7: Because that the Patch #3 already adds '__packed' to smc_clc_msg_accept_confirm, so Patch #7 doesn't need to do the same thing. But this is a minor change, so I kept the 'Reviewed-by' tag. Other changes in previous versions but not yet acked: - Patch #1: Some minor changes in subject and fix the format issue (length exceeds 80 columns) compared to v3. - Patch #5: removes useless ini->feature_mask assignment in __smc_connect() and smc_listen_v2_check() compared to v4. - Patch #8: new added, compared to v3. v6->v5: Link: https://lore.kernel.org/netdev/1702371151-125258-1-git-send-email-guwen@linux.alibaba.com/ - Add 'Reviewed-by' label given in the previous versions: * Patch #4, #6, #9, #10 have nothing changed since v3; - Patch #2: * fix the format issue (Alignment should match open parenthesis) compared to v5; * remove useless clc->hdr.length assignment in smcr_clc_prep_confirm_accept() compared to v5; - Patch #3: new added compared to v5. - Patch #7: some minor changes like aclc_v2->aclc or clc_v2->clc compared to v5 due to the introduction of Patch #3. Since there were no major changes, I kept the 'Reviewed-by' label. Other changes in previous versions but not yet acked: - Patch #1: Some minor changes in subject and fix the format issue (length exceeds 80 columns) compared to v3. - Patch #5: removes useless ini->feature_mask assignment in __smc_connect() and smc_listen_v2_check() compared to v4. - Patch #8: new added, compared to v3. v5->v4: Link: https://lore.kernel.org/netdev/1702021259-41504-1-git-send-email-guwen@linux.alibaba.com/ - Patch #6: improve the comment of SMCD_CLC_MAX_V2_GID_ENTRIES; - Patch #4: remove useless ini->feature_mask assignment; v4->v3: https://lore.kernel.org/netdev/1701920994-73705-1-git-send-email-guwen@linux.alibaba.com/ - Patch #6: use SMCD_CLC_MAX_V2_GID_ENTRIES to indicate the max gid entries in CLC proposal and using SMC_MAX_V2_ISM_DEVS to indicate the max devices to propose; - Patch #6: use i and i+1 in smc_find_ism_v2_device_serv(); - Patch #2: replace the large if-else block in smc_clc_send_confirm_accept() with 2 subfunctions; - Fix missing byte order conversion of GID and token in CLC handshake, which is in a separate patch sending to net: https://lore.kernel.org/netdev/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com/ - Patch #7: add extended GID in SMC-D lgr netlink attribute; v3->v2: https://lore.kernel.org/netdev/1701343695-122657-1-git-send-email-guwen@linux.alibaba.com/ - Rename smc_clc_fill_fce as smc_clc_fill_fce_v2x; - Remove ISM_IDENT_MASK from drivers/s390/net/ism.h; - Add explicitly assigning 'false' to ism_v2_capable in ism_dev_init(); - Remove smc_ism_set_v2_capable() helper for now, and introduce it in later loopback-ism implementation; v2->v1: - Fix sparse complaint; - Rebase to the latest net-next; ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: manage system EID in SMC stack instead of ISM driverWen Gu
The System EID (SEID) is an internal EID that is used by the SMCv2 software stack that has a predefined and constant value representing the s390 physical machine that the OS is executing on. So it should be managed by SMC stack instead of ISM driver and be consistent for all ISMv2 device (including virtual ISM devices) on s390 architecture. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: disable SEID on non-s390 archs where virtual ISM may be usedWen Gu
The system EID (SEID) is an internal EID used by SMC-D to represent the s390 physical machine that OS is executing on. On s390 architecture, it predefined by fixed string and part of cpuid and is enabled regardless of whether underlay device is virtual ISM or platform firmware ISM. However on non-s390 architectures where SMC-D can be used with virtual ISM devices, there is no similar information to identify physical machines, especially in virtualization scenarios. So in such cases, SEID is forcibly disabled and the user-defined UEID will be used to represent the communicable space. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: support extended GID in SMC-D lgr netlink attributeWen Gu
Virtual ISM devices introduced in SMCv2.1 requires a 128 bit extended GID vs. the existing ISM 64bit GID. So the 2nd 64 bit of extended GID should be included in SMC-D linkgroup netlink attribute as well. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: compatible with 128-bits extended GID of virtual ISM deviceWen Gu
According to virtual ISM support feature defined by SMCv2.1, GIDs of virtual ISM device are UUIDs defined by RFC4122, which are 128-bits long. So some adaptation work is required. And note that the GIDs of existing platform firmware ISM devices still remain 64-bits long. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: define a reserved CHID range for virtual ISM devicesWen Gu
According to virtual ISM support feature defined by SMCv2.1, CHIDs in the range 0xFF00 to 0xFFFF are reserved for use by virtual ISM devices. And two helpers are introduced to distinguish virtual ISM devices from the existing platform firmware ISM devices. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: introduce virtual ISM device support featureWen Gu
This introduces virtual ISM device support feature to SMCv2.1 as the first supplemental feature. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: support SMCv2.x supplemental features negotiationWen Gu
This patch adds SMCv2.x supplemental features negotiation. Supported SMCv2.x supplemental features are represented by feature_mask in FCE field. The negotiation process is as follows. Server Client Proposal(features(c-mask bits)) <----------------------------------------- Accept(features(s-mask bits)) -----------------------------------------> Confirm(features(s&c-mask bits)) <----------------------------------------- Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: unify the structs of accept or confirm message for v1 and v2Wen Gu
The structs of CLC accept and confirm messages for SMCv1 and SMCv2 are separately defined and often casted to each other in the code, which may increase the risk of errors caused by future divergence of them. So unify them into one struct for better maintainability. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: introduce sub-functions for smc_clc_send_confirm_accept()Wen Gu
There is a large if-else block in smc_clc_send_confirm_accept() and it is better to split it into two sub-functions. Suggested-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26net/smc: rename some 'fce' to 'fce_v2x' for clarityWen Gu
Rename some functions or variables with 'fce' in their name but used in SMCv2.1 as 'fce_v2x' for clarity. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26drm/xe: Fix warning on impossible conditionLucas De Marchi
Having a different value for op is not possible: this is already kept out of user-visible warning by the check in xe_wait_user_fence_ioctl() if op > MAX_OP. The warning is useful as if this switch() is not update when a new op is added, it should be triggered. Fix warning as reported by 0-DAY CI Kernel: drivers/gpu/drm/xe/xe_wait_user_fence.c:46:2: warning: variable 'passed' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] Closes: https://lore.kernel.org/oe-kbuild-all/202312170357.KPSinwPs-lkp@intel.com/ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20231218163301.3453285-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-26drm/xe: Fix UBSAN splat in add_preempt_fences()Matthew Brost
add_preempt_fences() calls dma_resv_reserve_fences() with num_fences == 0 resulting in the below UBSAN splat. Short circuit add_preempt_fences() if num_fences == 0. [ 58.652241] ================================================================================ [ 58.660736] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 [ 58.667281] shift exponent 64 is too large for 64-bit type 'long unsigned int' [ 58.674539] CPU: 2 PID: 1170 Comm: xe_gpgpu_fill Not tainted 6.6.0-rc3-guc+ #630 [ 58.674545] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020 [ 58.674547] Call Trace: [ 58.674548] <TASK> [ 58.674550] dump_stack_lvl+0x92/0xb0 [ 58.674555] __ubsan_handle_shift_out_of_bounds+0x15a/0x300 [ 58.674559] ? rcu_is_watching+0x12/0x60 [ 58.674564] ? software_resume+0x141/0x210 [ 58.674575] ? new_vma+0x44b/0x600 [xe] [ 58.674606] dma_resv_reserve_fences.cold+0x40/0x66 [ 58.674612] new_vma+0x4b3/0x600 [xe] [ 58.674638] xe_vm_bind_ioctl+0xffd/0x1e00 [xe] [ 58.674663] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 58.674680] drm_ioctl_kernel+0xc1/0x170 [ 58.674686] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 58.674703] drm_ioctl+0x247/0x4c0 [ 58.674709] ? find_held_lock+0x2b/0x80 [ 58.674716] __x64_sys_ioctl+0x8c/0xb0 [ 58.674720] do_syscall_64+0x3c/0x90 [ 58.674723] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 58.674727] RIP: 0033:0x7fce4bd1aaff [ 58.674730] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 58.674731] RSP: 002b:00007ffc57434050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 58.674734] RAX: ffffffffffffffda RBX: 00007ffc574340e0 RCX: 00007fce4bd1aaff [ 58.674736] RDX: 00007ffc574340e0 RSI: 0000000040886445 RDI: 0000000000000003 [ 58.674737] RBP: 0000000040886445 R08: 0000000000000002 R09: 00007ffc574341b0 [ 58.674739] R10: 000055de43eb3780 R11: 0000000000000246 R12: 00007ffc574340e0 [ 58.674740] R13: 0000000000000003 R14: 00007ffc574341b0 R15: 0000000000000001 [ 58.674747] </TASK> [ 58.674748] ================================================================================ Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231215230203.719244-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-26idpf: avoid compiler introduced padding in virtchnl2_rss_key structPavan Kumar Linga
Size of the virtchnl2_rss_key struct should be 7 bytes but the compiler introduces a padding byte for the structure alignment. This results in idpf sending an additional byte of memory to the device control plane than the expected buffer size. As the control plane enforces virtchnl message size checks to validate the message, set RSS key message fails resulting in the driver load failure. Remove implicit compiler padding by using "__packed" structure attribute for the virtchnl2_rss_key struct. Also there is no need to use __DECLARE_FLEX_ARRAY macro for the 'key_flex' struct field. So drop it. Fixes: 0d7502a9b4a7 ("virtchnl: add virtchnl version 2 ops") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Scott Register <scott.register@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26idpf: fix corrupted frames and skb leaks in singleq modeAlexander Lobakin
idpf_ring::skb serves only for keeping an incomplete frame between several NAPI Rx polling cycles, as one cycle may end up before processing the end of packet descriptor. The pointer is taken from the ring onto the stack before entering the loop and gets written there after the loop exits. When inside the loop, only the onstack pointer is used. For some reason, the logics is broken in the singleq mode, where the pointer is taken from the ring each iteration. This means that if a frame got fragmented into several descriptors, each fragment will have its own skb, but only the last one will be passed up the stack (containing garbage), leaving the rest leaked. Then, on ifdown, rxq::skb is being freed only in the splitq mode, while it can point to a valid skb in singleq as well. This can lead to a yet another skb leak. Just don't touch the ring skb field inside the polling loop, letting the onstack skb pointer work as expected: build a new skb if it's the first frame descriptor and attach a frag otherwise. On ifdown, free rxq::skb unconditionally if the pointer is non-NULL. Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Scott Register <scott.register@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26blk-wbt: remove the separate write cache trackingChristoph Hellwig
Use the queue wide write back cache tracking insted of duplicating the value in strut rq_wb. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231226090747.204969-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-26block: reject invalid operation in submit_bio_noacctChristoph Hellwig
submit_bio_noacct allows completely invalid operations, or operations that are not supported in the bio path. Extent the existing switch statement to rejcect all invalid types. Move the code point for REQ_OP_ZONE_APPEND so that it's not right in the middle of the zone management operations and the switch statement can follow the numerical order of the operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231221070538.1112446-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-26block: renumber QUEUE_FLAG_HW_WCChristoph Hellwig
For the QUEUE_FLAG_HW_WC to actually work, it needs to have a separate number from QUEUE_FLAG_FUA, doh. Fixes: 43c9835b144c ("block: don't allow enabling a cache on devices that don't support it") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231226081524.180289-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-26IB/iser: iscsi_iser.h: fix kernel-doc warning and spellosRandy Dunlap
Drop one kernel-doc comment to prevent a warning: iscsi_iser.h:313: warning: Excess struct member 'mr' description in 'iser_device' and spell 2 words correctly (buffer and deferred). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Link: https://lore.kernel.org/r/20231222234623.25231-1-rdunlap@infradead.org Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-12-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "A couple of bugfixes: one for a regression" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_blk: fix snprintf truncation compiler warning virtio_ring: fix syncs DMA memory with different direction
2023-12-25Merge branch 'nfc-refcounting'David S. Miller
@ 2023-12-19 17:49 Siddh Raman Pant 2023-12-19 17:49 ` [PATCH net-next v7 1/2] nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local Siddh Raman Pant 2023-12-19 17:49 ` [PATCH net-next v7 2/2] nfc: Do not send datagram if socket state isn't LLCP_BOUND Siddh Raman Pant 0 siblings, 2 replies; 4+ messages in thread Siddh Raman Pant says: ==================== [PATCH net-next v7 0/2] nfc: Fix UAF during datagram sending caused by missing refcounting Changes in v7: - Stupidly reverted ordering in recv() too, fix that. - Remove redundant call to nfc_llcp_sock_free(). Changes in v6: - Revert label introduction from v4, and thus also v5 entirely. Changes in v5: - Move reason = LLCP_DM_REJ under the fail_put_sock label. - Checkpatch now warns about == NULL check for new_sk, so fix that, and also at other similar places in the same function. Changes in v4: - Fix put ordering and comments. - Separate freeing in recv() into end labels. - Remove obvious comment and add reasoning. - Picked up r-bs by Suman. Changes in v3: - Fix missing freeing statements. Changes in v2: - Add net-next in patch subject. - Removed unnecessary extra lock and hold nfc_dev ref when holding llcp_sock. - Remove last formatting patch. - Picked up r-b from Krzysztof for LLCP_BOUND patch. --- For connectionless transmission, llcp_sock_sendmsg() codepath will eventually call nfc_alloc_send_skb() which takes in an nfc_dev as an argument for calculating the total size for skb allocation. virtual_ncidev_close() codepath eventually releases socket by calling nfc_llcp_socket_release() (which sets the sk->sk_state to LLCP_CLOSED) and afterwards the nfc_dev will be eventually freed. When an ndev gets freed, llcp_sock_sendmsg() will result in an use-after-free as it (1) doesn't have any checks in place for avoiding the datagram sending. (2) calls nfc_llcp_send_ui_frame(), which also has a do-while loop which can race with freeing. This loop contains the call to nfc_alloc_send_skb() where we dereference the nfc_dev pointer. nfc_dev is being freed because we do not hold a reference to it when we hold a reference to llcp_local. Thus, virtual_ncidev_close() eventually calls nfc_release() due to refcount going to 0. Since state has to be LLCP_BOUND for datagram sending, we can bail out early in llcp_sock_sendmsg(). Please review and let me know if any errors are there, and hopefully this gets accepted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-25nfc: Do not send datagram if socket state isn't LLCP_BOUNDSiddh Raman Pant
As we know we cannot send the datagram (state can be set to LLCP_CLOSED by nfc_llcp_socket_release()), there is no need to proceed further. Thus, bail out early from llcp_sock_sendmsg(). Signed-off-by: Siddh Raman Pant <code@siddh.me> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-25nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_localSiddh Raman Pant
llcp_sock_sendmsg() calls nfc_llcp_send_ui_frame() which in turn calls nfc_alloc_send_skb(), which accesses the nfc_dev from the llcp_sock for getting the headroom and tailroom needed for skb allocation. Parallelly the nfc_dev can be freed, as the refcount is decreased via nfc_free_device(), leading to a UAF reported by Syzkaller, which can be summarized as follows: (1) llcp_sock_sendmsg() -> nfc_llcp_send_ui_frame() -> nfc_alloc_send_skb() -> Dereference *nfc_dev (2) virtual_ncidev_close() -> nci_free_device() -> nfc_free_device() -> put_device() -> nfc_release() -> Free *nfc_dev When a reference to llcp_local is acquired, we do not acquire the same for the nfc_dev. This leads to freeing even when the llcp_local is in use, and this is the case with the UAF described above too. Thus, when we acquire a reference to llcp_local, we should acquire a reference to nfc_dev, and release the references appropriately later. References for llcp_local is initialized in nfc_llcp_register_device() (which is called by nfc_register_device()). Thus, we should acquire a reference to nfc_dev there. nfc_unregister_device() calls nfc_llcp_unregister_device() which in turn calls nfc_llcp_local_put(). Thus, the reference to nfc_dev is appropriately released later. Reported-and-tested-by: syzbot+bbe84a4010eeea00982d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bbe84a4010eeea00982d Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket") Reviewed-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Siddh Raman Pant <code@siddh.me> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-25net: sfp: fix PHY discovery for FS SFP-10G-T moduleMarek BehĂșn
Commit 2f3ce7a56c6e ("net: sfp: rework the RollBall PHY waiting code") changed the long wait before accessing RollBall / FS modules into probing for PHY every 1 second, and trying 25 times. Wei Lei reports that this does not work correctly on FS modules: when initializing, they may report values different from 0xffff in PHY ID registers for some MMDs, causing get_phy_c45_ids() to find some bogus MMD. Fix this by adding the module_t_wait member back, and setting it to 4 seconds for FS modules. Fixes: 2f3ce7a56c6e ("net: sfp: rework the RollBall PHY waiting code") Reported-by: Wei Lei <quic_leiwei@quicinc.com> Signed-off-by: Marek BehĂșn <kabel@kernel.org> Tested-by: Lei Wei <quic_leiwei@quicinc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-25drm/rockchip: vop2: clean up some inconsistent indentingJiapeng Chong
No functional modification involved. drivers/gpu/drm/rockchip/rockchip_drm_vop2.c:1708 rk3588_calc_cru_cfg() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7778 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231219062635.100718-1-jiapeng.chong@linux.alibaba.com
2023-12-25drm/rockchip: vop2: Avoid use regmap_reinit_cache at runtimeAndy Yan
Marek Report a possible irq lock inversion dependency warning when commit 81a06f1d02e5 ("Revert "drm/rockchip: vop2: Use regcache_sync() to fix suspend/resume"") lands linux-next. I can reproduce this warning with: CONFIG_PROVE_LOCKING=y CONFIG_DEBUG_LOCKDEP=y It seems than when use regmap_reinit_cache at runtime whith Mark's commit 3d59c22bbb8d ("drm/rockchip: vop2: Convert to use maple tree register cache"), it will trigger a possible irq lock inversion dependency warning. One solution is switch back to REGCACHE_RBTREE, but it seems that REGCACHE_MAPLE is the future, so I avoid using regmap_reinit_cache, and drop all the regcache when vop is disabled, then we get a fresh start at next enbable time. Fixes: 81a06f1d02e5 ("Revert "drm/rockchip: vop2: Use regcache_sync() to fix suspend/resume"") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/all/98a9f15d-30ac-47bf-9b93-3aa2c9900f7b@samsung.com/ Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> [dropped the large kernel log of the lockdep report from the message] Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231217084415.2373043-1-andyshrk@163.com
2023-12-24lsm: new security_file_ioctl_compat() hookAlfred Piccioni
Some ioctl commands do not require ioctl permission, but are routed to other permissions such as FILE_GETATTR or FILE_SETATTR. This routing is done by comparing the ioctl cmd to a set of 64-bit flags (FS_IOC_*). However, if a 32-bit process is running on a 64-bit kernel, it emits 32-bit flags (FS_IOC32_*) for certain ioctl operations. These flags are being checked erroneously, which leads to these ioctl operations being routed to the ioctl permission, rather than the correct file permissions. This was also noted in a RED-PEN finding from a while back - "/* RED-PEN how should LSM module know it's handling 32bit? */". This patch introduces a new hook, security_file_ioctl_compat(), that is called from the compat ioctl syscall. All current LSMs have been changed to support this hook. Reviewing the three places where we are currently using security_file_ioctl(), it appears that only SELinux needs a dedicated compat change; TOMOYO and SMACK appear to be functional without any change. Cc: stable@vger.kernel.org Fixes: 0b24dcb7f2f7 ("Revert "selinux: simplify ioctl checking"") Signed-off-by: Alfred Piccioni <alpic@google.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: subject tweak, line length fixes, and alignment corrections] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-12-24arm64: dts: rockchip: Fix rk3588 USB power-domain clocksSam Edwards
The QoS blocks saved/restored when toggling the PD_USB power domain are clocked by ACLK_USB. Attempting to access these memory regions without that clock running will result in an indefinite CPU stall. The PD_USB node wasn't specifying this clock dependency, resulting in hangs when trying to toggle the power domain (either on or off), unless we get "lucky" and have ACLK_USB running for another reason at the time. This "luck" can result from the bootloader leaving USB powered/clocked, and if no built-in driver wants USB, Linux will disable the unused PD+CLK on boot when {pd,clk}_ignore_unused aren't given. This can also be unlucky because the two cleanup tasks run in parallel and race: if the CLK is disabled first, the PD deactivation stalls the boot. In any case, the PD cannot then be reenabled (if e.g. the driver loads later) once the clock has been stopped. Fix this by specifying a dependency on ACLK_USB, instead of only ACLK_USB_ROOT. The child-parent relationship means the former implies the latter anyway. Fixes: c9211fa2602b8 ("arm64: dts: rockchip: Add base DT for rk3588 SoC") Cc: stable@vger.kernel.org Signed-off-by: Sam Edwards <CFSworks@gmail.com> Link: https://lore.kernel.org/r/20231216021019.1543811-1-CFSworks@gmail.com [changed to only include the missing clock, not dropping the root-clocks] Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24arm64: dts: rockchip: configure eth pad driver strength for orangepi r1 plus ltsTianling Shen
The default strength is not enough to provide stable connection under 3.3v LDO voltage. Fixes: 387b3bbac5ea ("arm64: dts: rockchip: Add Xunlong OrangePi R1 Plus LTS") Cc: stable@vger.kernel.org # 6.6+ Signed-off-by: Tianling Shen <cnsztl@gmail.com> Link: https://lore.kernel.org/r/20231216040723.17864-1-cnsztl@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24arm64: dts: rockchip: Support poweroff on NanoPC-T6Hugh Cole-Baker
The RK806 on the NanoPC-T6 can be used to power on/off the whole board. Mark it as the system power controller. Signed-off-by: Hugh Cole-Baker <sigmaris@gmail.com> Link: https://lore.kernel.org/r/20231216212134.23314-1-sigmaris@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24arm64: dts: rockchip: rk3308-rock-pi-s gpio-line-names cleanupTrevor Woerner
Perform the following cleanups on a previous patch: - indent lines after "gpio-line-names" - fix D0-D8 -> D0-D7 - sort phandle references Fixes: c45de75d7a9a ("arm64: dts: rockchip: add gpio-line-names to rk3308-rock-pi-s") Signed-off-by: Trevor Woerner <twoerner@gmail.com> Link: https://lore.kernel.org/r/20231219173814.1569-1-twoerner@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24arm64: dts: rockchip: Add support for rk3588 based board Cool Pi CM5 EVBAndy Yan
Cool Pi CM5 EVB works as a mother board connect with CM5. CM5 Specification: - Rockchip RK3588 - LPDDR4 2/4/8/16 GB - TF scard slot - eMMC 8/32/64/128 GB module - Gigabit ethernet x 1 with PHY YT8531 - Gigabit ethernet x 1 drived by PCIE with YT6801S CM5 EVB Specification: - HDMI Type A out x 2 - HDMI Type D in x 1 - USB 2.0 Host x 2 - USB 3.0 OTG x 1 - USB 3.0 Host x 1 - PCIE M.2 E Key for Wireless connection - PCIE M.2 M Key for NVME connection - 40 pin header Signed-off-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20231212124407.1897604-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24dt-bindings: arm: rockchip: Add Cool Pi CM5Andy Yan
Add Cool Pi CM5, a board powered by RK3588 CM5 EVB works with a mother board connect with CM5 Signed-off-by: Andy Yan <andyshrk@163.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20231212124340.1897502-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24arm64: dts: rockchip: Add support for rk3588s based board Cool Pi 4BAndy Yan
CoolPi 4B is a rk3588s based SBC. Specification: - Rockchip RK3588S - LPDDR4 2/4/8/16 GB - TF scard slot - eMMC 8/32/64/128 GB module - Gigabit ethernet drived by PCIE with RTL8111HS - HDMI Type D out - Mini DP out - USB 2.0 Host x 2 - USB 3.0 OTG x 1 - USB 3.0 Host x 1 - WIFI/BT module AIC8800 - 40 pin header Signed-off-by: Andy Yan <andyshrk@163.com> arm64: dts: rockchip: Add support for rk3588s based board Cool Pi 4B Link: https://lore.kernel.org/r/20231212124253.1897438-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24dt-bindings: arm: rockchip: Add Cool Pi 4BAndy Yan
Add Cool Pi 4B, a SBC powered by RK3588S Signed-off-by: Andy Yan <andyshrk@163.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20231212124237.1897378-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24dt-bindings: vendor-prefixes: Add Cool PiAndy Yan
Add vendor prefix for Cool Pi(https://cool-pi.com/) Signed-off-by: Andy Yan <andyshrk@163.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20231212124223.1897314-1-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24arm64: dts: rockchip: add gpio-line-names to rk3328-rock-pi-eTrevor Woerner
Add names to the pins of the general-purpose expansion header as given in the Radxa GPIO page[1] following the conventions in the kernel documentation[2] to make it easier for users to correlate the pins with functions when using utilities such as 'gpioinfo'. Signed-off-by: Trevor Woerner <twoerner@gmail.com> Link: https://lore.kernel.org/r/20231213160556.14424-1-twoerner@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24ARM: dts: rockchip: Remove rockchip,default-sample-phase from rk3036.dtsiAndy Yan
This should be a per board property, should not be put in a soc core dtsi. And when this property convert from default-sample-phase in linux-5.7 by commit 8a385eb57296 ("ARM: dts: rockchip: fix rockchip,default-sample-phase property names"), the emmc on rk3036 kylin board get a initialising error: [ 4.512797] Freeing unused kernel memory: 8192K [ 4.519500] mmc_host mmc1: Bus speed (slot 0) = 37125000Hz (slot req 37500000Hz, actual 37125000HZ div = 0) [ 4.530971] mmc1: error -84 whilst initialising MMC card [ 4.537277] Run /init as init process [ 4.550932] mmc_host mmc1: Bus speed (slot 0) = 300000Hz (slot req 300000Hz, actual 300000HZ div = 0) [ 4.664717] mmc_host mmc1: Bus speed (slot 0) = 37125000Hz (slot req 37500000Hz, actual 37125000HZ div = 0) [ 4.676156] mmc1: error -84 whilst initialising MMC card I think the reason why the emmc on rk3036 kylin board was able to work before linux-5.7 was that the illegal property was not correctly identified by the rockchip dw_mmc driver. Fixes: faea098e1808 ("ARM: dts: rockchip: add core rk3036 dtsi") Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://lore.kernel.org/r/20231218105523.2478315-4-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-12-24ARM: dts: rockchip: Add stdout-path for rk3036 kylinAndy Yan
Add stdout-path to get a uart console when system boot. Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Link: https://lore.kernel.org/r/20231218105523.2478315-3-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>