summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-11btrfs: unify index_cnt and csum_bytes from struct btrfs_inodeFilipe Manana
The index_cnt field of struct btrfs_inode is used only for two purposes: 1) To store the index for the next entry added to a directory; 2) For the data relocation inode to track the logical start address of the block group currently being relocated. For the relocation case we use index_cnt because it's not used for anything else in the relocation use case - we could have used other fields that are not used by relocation such as defrag_bytes, last_unlink_trans or last_reflink_trans for example (among others). Since the csum_bytes field is not used for directories, do the following changes: 1) Put index_cnt and csum_bytes in a union, and index_cnt is only initialized when the inode is a directory. The csum_bytes is only accessed in IO paths for regular files, so we're fine here; 2) Use the defrag_bytes field for relocation, since the data relocation inode is never used for defrag purposes. And to make the naming better, alias it to reloc_block_group_start by using a union. This reduces the size of struct btrfs_inode by 8 bytes in a release kernel, from 1056 bytes down to 1048 bytes. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: remove inode_lock from struct btrfs_root and use xarray locksFilipe Manana
Currently we use the spinlock inode_lock from struct btrfs_root to serialize access to two different data structures: 1) The delayed inodes xarray (struct btrfs_root::delayed_nodes); 2) The inodes xarray (struct btrfs_root::inodes). Instead of using our own lock, we can use the spinlock that is part of the xarray implementation, by using the xa_lock() and xa_unlock() APIs and using the xarray APIs with the double underscore prefix that don't take the xarray locks and assume the caller is using xa_lock() and xa_unlock(). So remove the spinlock inode_lock from struct btrfs_root and use the corresponding xarray locks. This brings 2 benefits: 1) We reduce the size of struct btrfs_root, from 1336 bytes down to 1328 bytes on a 64 bits release kernel config; 2) We reduce lock contention by not using anymore the same lock for changing two different and unrelated xarrays. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: reduce nesting and deduplicate error handling at btrfs_iget_path()Filipe Manana
Make btrfs_iget_path() simpler and easier to read by avoiding nesting of if-then-else statements and having an error label to do all the error handling instead of repeating it a couple times. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: preallocate inodes xarray entry to avoid transaction abortFilipe Manana
When creating a new inode, at btrfs_create_new_inode(), one of the very last steps is to add the inode to the root's inodes xarray. This often requires allocating memory which may fail (even though xarrays have a dedicated kmem_cache which make it less likely to fail), and at that point we are forced to abort the current transaction (as some, but not all, of the inode metadata was added to its subvolume btree). To avoid a transaction abort, preallocate memory for the xarray early at btrfs_create_new_inode(), so that if we fail we don't need to abort the transaction and the insertion into the xarray is guaranteed to succeed. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: use an xarray to track open inodes in a rootFilipe Manana
Currently we use a red black tree (rb-tree) to track the currently open inodes of a root (in struct btrfs_root::inode_tree). This however is not very efficient when the number of inodes is large since rb-trees are binary trees. For example for 100K open inodes, the tree has a depth of 17. Besides that, inserting into the tree requires navigating through it and pulling useless cache lines in the process since the red black tree nodes are embedded within the btrfs inode - on the other hand, by being embedded, it requires no extra memory allocations. We can improve this by using an xarray instead, which is efficient when indices are densely clustered (such as inode numbers), is more cache friendly and behaves like a resizable array, with a much better search and insertion complexity than a red black tree. This only has one small disadvantage which is that insertion will sometimes require allocating memory for the xarray - which may fail (not that often since it uses a kmem_cache) - but on the other hand we can reduce the btrfs inode structure size by 24 bytes (from 1080 down to 1056 bytes) after removing the embedded red black tree node, which after the next patches will allow to reduce the size of the structure to 1024 bytes, meaning we will be able to store 4 inodes per 4K page instead of 3 inodes. This change does a straightforward change to use an xarray, and results in a transaction abort if we can't allocate memory for the xarray when creating an inode - but the next patch changes things so that we don't need to abort. Running the following fs_mark test showed some improvements: $ cat test.sh #!/bin/bash DEV=/dev/nullb0 MNT=/mnt/nullb0 MOUNT_OPTIONS="-o ssd" FILES=100000 THREADS=$(nproc --all) echo "performance" | \ tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor mkfs.btrfs -f $DEV mount $MOUNT_OPTIONS $DEV $MNT OPTS="-S 0 -L 5 -n $FILES -s 0 -t $THREADS -k" for ((i = 1; i <= $THREADS; i++)); do OPTS="$OPTS -d $MNT/d$i" done fs_mark $OPTS umount $MNT Before this patch: FSUse% Count Size Files/sec App Overhead 10 1200000 0 92081.6 12505547 16 2400000 0 138222.6 13067072 23 3600000 0 148833.1 13290336 43 4800000 0 97864.7 13931248 53 6000000 0 85597.3 14384313 After this patch: FSUse% Count Size Files/sec App Overhead 10 1200000 0 93225.1 12571078 16 2400000 0 146720.3 12805007 23 3600000 0 160626.4 13073835 46 4800000 0 116286.2 13802927 53 6000000 0 90087.9 14754892 The test was run with a release kernel config (Debian's default config). Also capturing the insertion times into the rb tree and into the xarray, that is measuring the duration of the old function inode_tree_add() and the duration of the new btrfs_add_inode_to_root() function, gave the following results (in nanoseconds): Before this patch, inode_tree_add() execution times: Count: 5000000 Range: 0.000 - 5536887.000; Mean: 775.674; Median: 729.000; Stddev: 4820.961 Percentiles: 90th: 1015.000; 95th: 1139.000; 99th: 1397.000 0.000 - 7.816: 40 | 7.816 - 37.858: 209 | 37.858 - 170.278: 6059 | 170.278 - 753.961: 2754890 ##################################################### 753.961 - 3326.728: 2232312 ########################################### 3326.728 - 14667.018: 4366 | 14667.018 - 64652.943: 852 | 64652.943 - 284981.761: 550 | 284981.761 - 1256150.914: 221 | 1256150.914 - 5536887.000: 7 | After this patch, btrfs_add_inode_to_root() execution times: Count: 5000000 Range: 0.000 - 2900652.000; Mean: 272.148; Median: 241.000; Stddev: 2873.369 Percentiles: 90th: 342.000; 95th: 432.000; 99th: 572.000 0.000 - 7.264: 104 | 7.264 - 33.145: 352 | 33.145 - 140.081: 109606 # 140.081 - 581.930: 4840090 ##################################################### 581.930 - 2407.590: 43532 | 2407.590 - 9950.979: 2245 | 9950.979 - 41119.278: 514 | 41119.278 - 169902.616: 155 | 169902.616 - 702018.539: 47 | 702018.539 - 2900652.000: 9 | Average, percentiles, standard deviation, etc, are all much better. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: raid56: do extra dumping for CONFIG_BTRFS_ASSERTQu Wenruo
There are several hard-to-hit ASSERT()s hit inside raid56. Unfortunately the ASSERT() expression is a little complex, and except the ASSERT(), there is nothing to provide any clue. Considering if race is involved, it's pretty hard to reproduce. Meanwhile sometimes the dump of the rbio structure can provide some pretty good clues, it's worth to do the extra multi-line dump for btrfs raid56 related code. The dump looks like this: BTRFS critical (device dm-3): bioc logical=4598530048 full_stripe=4598530048 size=0 map_type=0x81 mirror=0 replace_nr_stripes=0 replace_stripe_src=-1 num_stripes=5 BTRFS critical (device dm-3): nr=0 devid=1 physical=1166147584 BTRFS critical (device dm-3): nr=1 devid=2 physical=1145176064 BTRFS critical (device dm-3): nr=2 devid=4 physical=1145176064 BTRFS critical (device dm-3): nr=3 devid=5 physical=1145176064 BTRFS critical (device dm-3): nr=4 devid=3 physical=1145176064 BTRFS critical (device dm-3): rbio flags=0x0 nr_sectors=80 nr_data=4 real_stripes=5 stripe_nsectors=16 scrubp=0 dbitmap=0x0 BTRFS critical (device dm-3): logical=4598530048 assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1702 Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: fix function name in comment for btrfs_remove_ordered_extent()Filipe Manana
Due to a refactoring introduced by commit 53d9981ca20e ("btrfs: split btrfs_alloc_ordered_extent to allocation and insertion helpers"), the function btrfs_alloc_ordered_extent() was renamed to alloc_ordered_extent(), so the comment at btrfs_remove_ordered_extent() is no longer very accurate. Update the comment to refer to the new name "alloc_ordered_extent()". Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: fix misspelled end IO compression callbacksFilipe Manana
Fix typo in the end IO compression callbacks, from "comprssed" to "compressed". Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: remove no longer used btrfs_migrate_to_delayed_refs_rsv()Filipe Manana
The function btrfs_migrate_to_delayed_refs_rsv() is no longer used. Its last use was removed in commit 2f6397e448e6 ("btrfs: don't refill whole delayed refs block reserve when starting transaction"). So remove the function. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: zoned: make btrfs_get_dev_zone() staticFilipe Manana
It's not used outside zoned.c, so make it static. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: pass struct btrfs_io_geometry into handle_ops_on_dev_replace()Johannes Thumshirn
Passing in a 'struct btrfs_io_geometry into handle_ops_on_dev_replace can reduce the number of arguments by two. No functional changes otherwise. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: qgroup: do quick checks if quotas are enabled before starting ioctlsDavid Sterba
The ioctls that add relations, create qgroups or set limits start/join transaction. When quotas are not enabled this is not necessary, there will be errors reported back anyway but this could be also misleading and we should really report that quotas are not enabled. For that use -ENOTCONN. The helper is meant to do a quick check before any other standard ioctl checks are done. If quota is disabled meanwhile we still rely on proper locking inside any active operation changing the qgroup structures. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11regmap: kunit: Add test cases for regmap_multi_reg_(read,write}()Guenter Roeck
Add test cases for regmap_multi_reg_read() and regmap_multi_reg_write(). Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20240711055352.3411807-1-linux@roeck-us.net Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-11Merge tag 'nf-24-07-11' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: Patch #1 fixes a bogus WARN_ON splat in nfnetlink_queue. Patch #2 fixes a crash due to stack overflow in chain loop detection by using the existing chain validation routines Both patches from Florian Westphal. netfilter pull request 24-07-11 * tag 'nf-24-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: prefer nft_chain_validate netfilter: nfnetlink_queue: drop bogus WARN_ON ==================== Link: https://patch.msgid.link/20240711093948.3816-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11Merge tag 'for-netdev' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-11 The following pull-request contains BPF updates for your *net* tree. We've added 4 non-merge commits during the last 2 day(s) which contain a total of 4 files changed, 262 insertions(+), 19 deletions(-). The main changes are: 1) Fixes for a BPF timer lockup and a use-after-free scenario when timers are used concurrently, from Kumar Kartikeya Dwivedi. 2) Fix the argument order in the call to bpf_map_kvcalloc() which could otherwise lead to a compilation error, from Mohammad Shehar Yaar Tausif. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add timer lockup selftest bpf: Defer work in bpf_timer_cancel_and_free bpf: Fail bpf_timer_cancel when callback is being cancelled bpf: fix order of args in call to bpf_map_kvcalloc ==================== Link: https://patch.msgid.link/20240711084016.25757-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socketDaniel Borkmann
When using a BPF program on kernel_connect(), the call can return -EPERM. This causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing the kernel to potentially freeze up. Neil suggested: This will propagate -EPERM up into other layers which might not be ready to handle it. It might be safer to map EPERM to an error we would be more likely to expect from the network system - such as ECONNREFUSED or ENETDOWN. ECONNREFUSED as error seems reasonable. For programs setting a different error can be out of reach (see handling in 4fbac77d2d09) in particular on kernels which do not have f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean"), thus given that it is better to simply remap for consistent behavior. UDP does handle EPERM in xs_udp_send_request(). Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect") Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Co-developed-by: Lex Siegel <usiegl00@gmail.com> Signed-off-by: Lex Siegel <usiegl00@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trondmy@kernel.org> Cc: Anna Schumaker <anna@kernel.org> Link: https://github.com/cilium/cilium/issues/33395 Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.neil.brown.name Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.1720075070.git.daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11net/sched: Fix UAF when resolving a clashChengen Du
KASAN reports the following UAF: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> <TASK> asm_common_interrupt+0x27/0x40 Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 The ct may be dropped if a clash has been resolved but is still passed to the tcf_ct_flow_table_process_conn function for further usage. This issue can be fixed by retrieving ct from skb again after confirming conntrack. Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action") Co-developed-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Gerald Yang <gerald.yang@canonical.com> Signed-off-by: Chengen Du <chengen.du@canonical.com> Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11Documentation/ABI/configfs-tsm: Fix an unexpected indentation sillyBorislav Petkov (AMD)
Fix: Documentation/ABI/testing/configfs-tsm:97: ERROR: Unexpected indentation when building htmldocs with sphinx. I can't say I'm loving those rigid sphinx rules but whatever, make it shut up. Fixes: 627dc671518b ("x86/sev: Extend the config-fs attestation support for an SVSM") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20240701184557.4735ca3d@canb.auug.org.au
2024-07-11x86/sev: Do RMP memory coverage check after max_pfn has been setTom Lendacky
The RMP table is probed early in the boot process before max_pfn has been set, so the logic to check if the RMP covers all of system memory is not valid. Move the RMP memory coverage check from snp_probe_rmptable_info() into snp_rmptable_init(), which is well after max_pfn has been set. Also, fix the calculation to use PFN_UP instead of PHYS_PFN, in order to compute the required RMP size properly. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/bec4364c7e34358cc576f01bb197a7796a109169.1718984524.git.thomas.lendacky@amd.com
2024-07-11x86/sev: Move SEV compilation unitsBorislav Petkov (AMD)
A long time ago it was agreed upon that the coco stuff needs to go where it belongs: https://lore.kernel.org/all/Yg5nh1RknPRwIrb8@zn.tnic and not keep it in arch/x86/kernel. TDX did that and SEV can't find time to do so. So lemme do it. If people have trouble converting their ongoing featuritis patches, ask me for a sed script. No functional changes. Move the instrumentation exclusion bits too, as helpfully caught and reported by the 0day folks. Closes: https://lore.kernel.org/oe-kbuild-all/202406220748.hG3qlmDx-lkp@intel.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-lkp/202407091342.46d7dbb-oliver.sang@intel.com Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Ashish Kalra <ashish.kalra@amd.com> Tested-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/r/20240619093014.17962-1-bp@kernel.org
2024-07-11net: ks8851: Fix potential TX stall after interface reopenRonald Wahl
The amount of TX space in the hardware buffer is tracked in the tx_space variable. The initial value is currently only set during driver probing. After closing the interface and reopening it the tx_space variable has the last value it had before close. If it is smaller than the size of the first send packet after reopeing the interface the queue will be stopped. The queue is woken up after receiving a TX interrupt but this will never happen since we did not send anything. This commit moves the initialization of the tx_space variable to the ks8851_net_open function right before starting the TX queue. Also query the value from the hardware instead of using a hard coded value. Only the SPI chip variant is affected by this issue because only this driver variant actually depends on the tx_space variable in the xmit function. Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun") Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240709195845.9089-1-rwahl@gmx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().Kuniyuki Iwashima
syzkaller triggered the warning [0] in udp_v4_early_demux(). In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount of the looked-up sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace period. Currently, SOCK_RCU_FREE is flagged for a bound socket after being put into the hash table. Moreover, the SOCK_RCU_FREE check is done too early in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race window: CPU1 CPU2 ---- ---- udp_v4_early_demux() udp_lib_get_port() | |- hlist_add_head_rcu() |- sk = __udp4_lib_demux_lookup() | |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk)); `- sock_set_flag(sk, SOCK_RCU_FREE) We had the same bug in TCP and fixed it in commit 871019b22d1b ("net: set SOCK_RCU_FREE before inserting socket into hashtable"). Let's apply the same fix for UDP. [0]: WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Modules linked in: CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52 RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001 RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680 R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e FS: 00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 PKRU: 55555554 Call Trace: <TASK> ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349 ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624 __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738 netif_receive_skb_internal net/core/dev.c:5824 [inline] netif_receive_skb+0x271/0x300 net/core/dev.c:5884 tun_rx_batched drivers/net/tun.c:1549 [inline] tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002 tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x76f/0x8d0 fs/read_write.c:590 ksys_write+0xbf/0x190 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x41/0x50 fs/read_write.c:652 x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fc44a68bc1f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48 RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8 RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000 </TASK> Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240709191356.24010-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11i2c: mark HostNotify target address as usedWolfram Sang
I2C core handles the local target for receiving HostNotify alerts. There is no separate driver bound to that address. That means userspace can access it if desired, leading to further complications if controllers are not capable of reading their own local target. Bind the local target to the dummy driver so it will be marked as "handled by the kernel" if the HostNotify feature is used. That protects aginst userspace access and prevents other drivers binding to it. Fixes: 2a71593da34d ("i2c: smbus: add core function handling SMBus host-notify") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-07-11i2c: testunit: correct Kconfig descriptionWolfram Sang
The testunit has nothing to do with 'eeprom', remove that term. It was a copy&paste leftover. Fixes: a8335c64c5f0 ("i2c: add slave testunit driver") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-07-11netfilter: nf_tables: prefer nft_chain_validateFlorian Westphal
nft_chain_validate already performs loop detection because a cycle will result in a call stack overflow (ctx->level >= NFT_JUMP_STACK_SIZE). It also follows maps via ->validate callback in nft_lookup, so there appears no reason to iterate the maps again. nf_tables_check_loops() and all its helper functions can be removed. This improves ruleset load time significantly, from 23s down to 12s. This also fixes a crash bug. Old loop detection code can result in unbounded recursion: BUG: TASK stack guard page was hit at .... Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN CPU: 4 PID: 1539 Comm: nft Not tainted 6.10.0-rc5+ #1 [..] with a suitable ruleset during validation of register stores. I can't see any actual reason to attempt to check for this from nft_validate_register_store(), at this point the transaction is still in progress, so we don't have a full picture of the rule graph. For nf-next it might make sense to either remove it or make this depend on table->validate_state in case we could catch an error earlier (for improved error reporting to userspace). Fixes: 20a69341f2d0 ("netfilter: nf_tables: add netlink set API") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-11netfilter: nfnetlink_queue: drop bogus WARN_ONFlorian Westphal
Happens when rules get flushed/deleted while packet is out, so remove this WARN_ON. This WARN exists in one form or another since v4.14, no need to backport this to older releases, hence use a more recent fixes tag. Fixes: 3f8019688894 ("netfilter: move nf_reinject into nfnetlink_queue modules") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202407081453.11ac0f63-lkp@intel.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-07-11MAINTAINERS: VIRTIO I2C loses a maintainer, gains a reviewerWolfram Sang
Conghui Chen left, welcome Jian as reviewer. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: "Chen, Jian Jun" <jian.jun.chen@intel.com>
2024-07-11ethtool: netlink: do not return SQI value if link is downOleksij Rempel
Do not attach SQI value if link is down. "SQI values are only valid if link-up condition is present" per OpenAlliance specification of 100Base-T1 Interoperability Test suite [1]. The same rule would apply for other link types. [1] https://opensig.org/automotive-ethernet-specifications/# Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Woojung Huh <woojung.huh@microchip.com> Link: https://patch.msgid.link/20240709061943.729381-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11ppp: reject claimed-as-LCP but actually malformed packetsDmitry Antipov
Since 'ppp_async_encode()' assumes valid LCP packets (with code from 1 to 7 inclusive), add 'ppp_check_packet()' to ensure that LCP packet has an actual body beyond PPP_LCP header bytes, and reject claimed-as-LCP but actually malformed data otherwise. Reported-by: syzbot+ec0723ba9605678b14bf@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ec0723ba9605678b14bf Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11MAINTAINERS: delete entries for Thor ThayerWolfram Sang
The email address bounced. I couldn't find a newer one in recent git history. Delete the entries and let them fallback to subsystem defaults. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-07-11sched: Update MAINTAINERS and CREDITSPeter Zijlstra
Thank you Daniel for having been our friend! Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240708075752.GF11386@noisy.programming.kicks-ass.net
2024-07-11Merge branch 'sched/urgent' into sched/core, to pick up fixes and refresh ↵Ingo Molnar
the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-07-11selftests/bpf: Add timer lockup selftestKumar Kartikeya Dwivedi
Add a selftest that tries to trigger a situation where two timer callbacks are attempting to cancel each other's timer. By running them continuously, we hit a condition where both run in parallel and cancel each other. Without the fix in the previous patch, this would cause a lockup as hrtimer_cancel on either side will wait for forward progress from the callback. Ensure that this situation leads to a EDEADLK error. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240711052709.2148616-1-memxor@gmail.com
2024-07-11net: ethernet: mtk-star-emac: set mac_managed_pm when probingJian Hui Lee
The below commit introduced a warning message when phy state is not in the states: PHY_HALTED, PHY_READY, and PHY_UP. commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") mtk-star-emac doesn't need mdiobus suspend/resume. To fix the warning message during resume, indicate the phy resume/suspend is managed by the mac when probing. Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Signed-off-by: Jian Hui Lee <jianhui.lee@canonical.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240708065210.4178980-1-jianhui.lee@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-11kernel: rerun task_work while freezing in get_signal()Pavel Begunkov
io_uring can asynchronously add a task_work while the task is getting freezed. TIF_NOTIFY_SIGNAL will prevent the task from sleeping in do_freezer_trap(), and since the get_signal()'s relock loop doesn't retry task_work, the task will spin there not being able to sleep until the freezing is cancelled / the task is killed / etc. Run task_works in the freezer path. Keep the patch small and simple so it can be easily back ported, but we might need to do some cleaning after and look if there are other places with similar problems. Cc: stable@vger.kernel.org Link: https://github.com/systemd/systemd/issues/33626 Fixes: 12db8b690010c ("entry: Add support for TIF_NOTIFY_SIGNAL") Reported-by: Julian Orth <ju.orth@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89ed3a52933370deaaf61a0a620a6ac91f1e754d.1720634146.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-11io_uring/io-wq: limit retrying worker initialisationPavel Begunkov
If io-wq worker creation fails, we retry it by queueing up a task_work. tasK_work is needed because it should be done from the user process context. The problem is that retries are not limited, and if queueing a task_work is the reason for the failure, we might get into an infinite loop. It doesn't seem to happen now but it would with the following patch executing task_work in the freezer's loop. For now, arbitrarily limit the number of attempts to create a worker. Cc: stable@vger.kernel.org Fixes: 3146cba99aa28 ("io-wq: make worker creation resilient against signals") Reported-by: Julian Orth <ju.orth@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8280436925db88448c7c85c6656edee1a43029ea.1720634146.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-11platform/x86/amd/pmf: Remove update system state documentShyam Sundar S K
This commit removes the "pmf.rst" document, which was associated with the PMF driver that enabled system state updates based on TA output actions. The driver now uses existing input events (KEY_SCREENLOCK, KEY_SLEEP, and KEY_SUSPEND) instead of defining new udev rules in the "/etc/udev/rules.d/" directory. Consequently, the pmf.rst document is no longer necessary. Therefore, the pmf.rst documentation is being removed. Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20240711052047.1531957-2-Shyam-sundar.S-k@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-11platform/x86/amd/pmf: Use existing input event codes to update system statesShyam Sundar S K
At present, the PMF driver employs custom system state codes to update system states. It is recommended to replace these with existing input event codes (KEY_SLEEP, KEY_SUSPEND, and KEY_SCREENLOCK) to align system updates with the PMF-TA output actions. Co-developed-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20240711052047.1531957-1-Shyam-sundar.S-k@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-11dt-bindings: pwm: at91: Add sama7d65 compatible stringNicolas Ferre
Add compatible string for sama7d65. Like sama7g5, it currently binds to "atmel,sama5d2-pwm" compatibility string for this driver, so add an "enum" to reflect that. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240710163651.343751-1-nicolas.ferre@microchip.com Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2024-07-11erofs: avoid refcounting short-lived pagesGao Xiang
LZ4 always reuses the decompressed buffer as its LZ77 sliding window (dynamic dictionary) for optimal performance. However, in specific cases, the output buffer may not fully contain valid page cache pages, resulting in the use of short-lived pages for temporary purposes. Due to the limited sliding window size, LZ4 shortlived bounce pages can also be reused in a sliding manner, so each bounce page can be vmapped multiple times in different relative positions by design. In order to avoiding double frees, currently, reuse counts are recorded via page refcount, but it will no longer be used as-is in the future world of Memdescs. Just maintain a lookup table to check if a shortlived page is reused. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240711053659.1364989-1-hsiangkao@linux.alibaba.com
2024-07-11xen/arm: Convert comma to semicolonChen Ni
Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-ID: <20240710014208.1719662-1-nichen@iscas.ac.cn> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-10Merge branch 'ice-support-to-dump-phy-config-fec'Jakub Kicinski
Tony Nguyen says: ==================== ice: Support to dump PHY config, FEC Anil Samal says: Implementation to dump PHY configuration and FEC statistics to facilitate link level debugging of customer issues. Implementation has two parts a. Serdes equalization # ethtool -d eth0 Output: Offset Values ------ ------ 0x0000: 00 00 00 00 03 00 00 00 05 00 00 00 01 08 00 40 0x0010: 01 00 00 40 00 00 39 3c 01 00 00 00 00 00 00 00 0x0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 0x01f0: 01 00 00 00 ef be ad de 8f 00 00 00 00 00 00 00 0x0200: 00 00 00 00 ef be ad de 00 00 00 00 00 00 00 00 0x0210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0230: 00 00 00 00 00 00 00 00 00 00 00 00 fa ff 00 00 0x0240: 06 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 0x0250: 0f b0 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 0x0260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x02a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x02b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x02c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x02d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x02e0: 00 00 00 00 00 00 00 00 00 00 00 00 Current implementation appends 176 bytes i.e. 44 bytes * 4 serdes lane. For port with 2 serdes lane, first 88 bytes are valid values and remaining 88 bytes are filled with zero. Similarly for port with 1 serdes lane, first 44 bytes are valid and remaining 132 bytes are marked zero. Each set of serdes equalizer parameter (i.e. set of 44 bytes) follows below order a. rx_equalization_pre2 b. rx_equalization_pre1 c. rx_equalization_post1 d. rx_equalization_bflf e. rx_equalization_bfhf f. rx_equalization_drate g. tx_equalization_pre1 h. tx_equalization_pre3 i. tx_equalization_atten j. tx_equalization_post1 k. tx_equalization_pre2 Where each individual equalizer parameter is of 4 bytes. As ethtool prints values as individual bytes, for little endian machine these values will be in reverse byte order. b. FEC block counts # ethtool -I --show-fec eth0 Output: FEC parameters for eth0: Supported/Configured FEC encodings: Auto RS BaseR Active FEC encoding: RS Statistics: corrected_blocks: 0 uncorrectable_blocks: 0 This series do following: Patch 1 - Implementation to support user provided flag for side band queue command. Patch 2 - Currently driver does not have a way to derive serdes lane number, pcs quad , pcs port from port number. So we introduced a mechanism to derive above info. Ethtool interface extension to include FEC statistics counter. Patch 3 - Ethtool interface extension to include serdes equalizer output. v1: https://lore.kernel.org/netdev/20240702180710.2606969-1-anthony.l.nguyen@intel.com/ ==================== Link: https://patch.msgid.link/20240709202951.2103115-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10ice: Implement driver functionality to dump serdes equalizer valuesAnil Samal
To debug link issues in the field, serdes Tx/Rx equalizer values help to determine the health of serdes lane. Extend 'ethtool -d' option to dump serdes Tx/Rx equalizer. The following list of equalizer param is supported a. rx_equalization_pre2 b. rx_equalization_pre1 c. rx_equalization_post1 d. rx_equalization_bflf e. rx_equalization_bfhf f. rx_equalization_drate g. tx_equalization_pre1 h. tx_equalization_pre3 i. tx_equalization_atten j. tx_equalization_post1 k. tx_equalization_pre2 Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anil Samal <anil.samal@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240709202951.2103115-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10ice: Implement driver functionality to dump fec statisticsAnil Samal
To debug link issues in the field, it is paramount to dump fec corrected/uncorrected block counts from firmware. Firmware requires PCS quad number and PCS port number to read FEC statistics. Current driver implementation does not maintain above physical properties of a port. Add new driver API to derive physical properties of an input port.These properties include PCS quad number, PCS port number, serdes lane count, primary serdes lane number. Extend ethtool option '--show-fec' to support fec statistics. The IEEE standard mandates two sets of counters: - 30.5.1.1.17 aFECCorrectedBlocks - 30.5.1.1.18 aFECUncorrectableBlocks Standard defines above statistics per lane but current implementation supports total FEC statistics per port i.e. sum of all lane per port. Find sample output below FEC parameters for ens21f0np0: Supported/Configured FEC encodings: Auto RS BaseR Active FEC encoding: RS Statistics: corrected_blocks: 0 uncorrectable_blocks: 0 Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anil Samal <anil.samal@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240709202951.2103115-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10ice: Extend Sideband Queue command to support flagsAnil Samal
Current driver implementation for Sideband Queue supports a fixed flag (ICE_AQ_FLAG_RD). To retrieve FEC statistics from firmware, Sideband Queue command is used with a different flag. Extend API for Sideband Queue command to use 'flags' as input argument. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Anil Samal <anil.samal@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240709202951.2103115-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10e1000e: fix force smbus during suspend flowVitaly Lifshits
Commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") resolved a PHY access loss during suspend on Meteor Lake consumer platforms, but it affected corporate systems incorrectly. A better fix, working for both consumer and corporate systems, was proposed in commit bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function"). However, it introduced a regression on older devices, such as [8086:15B8], [8086:15F9], [8086:15BE]. This patch aims to fix the secondary regression, by limiting the scope of the changes to Meteor Lake platforms only. Fixes: bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function") Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940 Reported-by: Dieter Mummenschanz <dmummenschanz@web.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936 Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240709203123.2103296-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10dt-bindings: net: convert enetc to yamlFrank Li
Convert enetc device binding file to yaml. Split to 3 yaml files, 'fsl,enetc.yaml', 'fsl,enetc-mdio.yaml', 'fsl,enetc-ierb.yaml'. Additional Changes: - Add pci<vendor id>,<production id> in compatible string. - Ref to common ethernet-controller.yaml and mdio.yaml. - Add Wei fang, Vladimir and Claudiu as maintainer. - Update ENETC description. - Remove fixed-link part. Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240709214841.570154-1-Frank.Li@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10tcp: avoid too many retransmit packetsEric Dumazet
If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'. The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account. Before blamed commit, the socket would not timeout after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits. Also worth noting that before commit e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4. Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10dt-bindings: net: realtek,rtl82xx: Document RTL8211F LED supportMarek Vasut
The RTL8211F PHY does support LED configuration, document support for LEDs in the binding document. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240708211649.165793-1-marex@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10Merge branch 'fixes-for-bpf-timer-lockup-and-uaf'Alexei Starovoitov
Kumar Kartikeya Dwivedi says: ==================== Fixes for BPF timer lockup and UAF The following patches contain fixes for timer lockups and a use-after-free scenario. This set proposes to fix the following lockup situation for BPF timers. CPU 1 CPU 2 bpf_timer_cb bpf_timer_cb timer_cb1 timer_cb2 bpf_timer_cancel(timer_cb2) bpf_timer_cancel(timer_cb1) hrtimer_cancel hrtimer_cancel In this case, both callbacks will continue waiting for each other to finish synchronously, causing a lockup. The proposed fix adds support for tracking in-flight cancellations *begun by other timer callbacks* for a particular BPF timer. Whenever preparing to call hrtimer_cancel, a callback will increment the target timer's counter, then inspect its in-flight cancellations, and if non-zero, return -EDEADLK to avoid situations where the target timer's callback is waiting for its completion. This does mean that in cases where a callback is fired and cancelled, it will be unable to cancel any timers in that execution. This can be alleviated by maintaining the list of waiting callbacks in bpf_hrtimer and searching through it to avoid interdependencies, but this may introduce additional delays in bpf_timer_cancel, in addition to requiring extra state at runtime which may need to be allocated or reused from bpf_hrtimer storage. Moreover, extra synchronization is needed to delete these elements from the list of waiting callbacks once hrtimer_cancel has finished. The second patch is for a deadlock situation similar to above in bpf_timer_cancel_and_free, but also a UAF scenario that can occur if timer is armed before entering it, if hrtimer_running check causes the hrtimer_cancel call to be skipped. As seen above, synchronous hrtimer_cancel would lead to deadlock (if same callback tries to free its timer, or two timers free each other), therefore we queue work onto the global workqueue to ensure outstanding timers are cancelled before bpf_hrtimer state is freed. Further details are in the patches. ==================== Link: https://lore.kernel.org/r/20240709185440.1104957-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>