summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-09-12net: macb: fix sleep inside spinlockSascha Hauer
macb_set_tx_clk() is called under a spinlock but itself calls clk_set_rate() which can sleep. This results in: | BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 | pps pps1: new PPS source ptp1 | in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 40, name: kworker/u4:3 | preempt_count: 1, expected: 0 | RCU nest depth: 0, expected: 0 | 4 locks held by kworker/u4:3/40: | #0: ffff000003409148 | macb ff0c0000.ethernet: gem-ptp-timer ptp clock registered. | ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c | #1: ffff8000833cbdd8 ((work_completion)(&pl->resolve)){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c | #2: ffff000004f01578 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x44/0x4e8 | #3: ffff000004f06f50 (&bp->lock){....}-{3:3}, at: macb_mac_link_up+0x40/0x2ac | irq event stamp: 113998 | hardirqs last enabled at (113997): [<ffff800080e8503c>] _raw_spin_unlock_irq+0x30/0x64 | hardirqs last disabled at (113998): [<ffff800080e84478>] _raw_spin_lock_irqsave+0xac/0xc8 | softirqs last enabled at (113608): [<ffff800080010630>] __do_softirq+0x430/0x4e4 | softirqs last disabled at (113597): [<ffff80008001614c>] ____do_softirq+0x10/0x1c | CPU: 0 PID: 40 Comm: kworker/u4:3 Not tainted 6.5.0-11717-g9355ce8b2f50-dirty #368 | Hardware name: ... ZynqMP ... (DT) | Workqueue: events_power_efficient phylink_resolve | Call trace: | dump_backtrace+0x98/0xf0 | show_stack+0x18/0x24 | dump_stack_lvl+0x60/0xac | dump_stack+0x18/0x24 | __might_resched+0x144/0x24c | __might_sleep+0x48/0x98 | __mutex_lock+0x58/0x7b0 | mutex_lock_nested+0x24/0x30 | clk_prepare_lock+0x4c/0xa8 | clk_set_rate+0x24/0x8c | macb_mac_link_up+0x25c/0x2ac | phylink_resolve+0x178/0x4e8 | process_one_work+0x1ec/0x51c | worker_thread+0x1ec/0x3e4 | kthread+0x120/0x124 | ret_from_fork+0x10/0x20 The obvious fix is to move the call to macb_set_tx_clk() out of the protected area. This seems safe as rx and tx are both disabled anyway at this point. It is however not entirely clear what the spinlock shall protect. It could be the read-modify-write access to the NCFGR register, but this is accessed in macb_set_rx_mode() and macb_set_rxcsum_feature() as well without holding the spinlock. It could also be the register accesses done in mog_init_rings() or macb_init_buffers(), but again these functions are called without holding the spinlock in macb_hresp_error_task(). The locking seems fishy in this driver and it might deserve another look before this patch is applied. Fixes: 633e98a711ac0 ("net: macb: use resolved link config in mac_link_up()") Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/r/20230908112913.1701766-1-s.hauer@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12wwan: core: Use the bitmap API to allocate bitmapsAndy Shevchenko
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them. It is less verbose and it improves the type checking and semantic. While at it, add missing header inclusion (should be bitops.h, but with the above change it becomes bitmap.h). Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230911131618.4159437-1-andriy.shevchenko@linux.intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12net: dst: remove unnecessary input parameter in dst_alloc and dst_initZhengchao Shao
Since commit 1202cdd66531("Remove DECnet support from kernel") has been merged, all callers pass in the initial_ref value of 1 when they call dst_alloc(). Therefore, remove initial_ref when the dst_alloc() is declared and replace initial_ref with 1 in dst_alloc(). Also when all callers call dst_init(), the value of initial_ref is 1. Therefore, remove the input parameter initial_ref of the dst_init() and replace initial_ref with the value 1 in dst_init. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230911125045.346390-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12Merge branch 'add-support-for-icssg-on-am64x-evm'Paolo Abeni
MD Danish Anwar says: ==================== Add support for ICSSG on AM64x EVM This series adds support for ICSSG driver on AM64x EVM. First patch of the series adds compatible for AM64x EVM in icssg-prueth dt binding. Second patch adds support for AM64x compatible in the ICSSG driver. This series addresses comments on [v1] (which was posted as RFC). This series is based on the latest net-next/main. This series has no dependency. Changes from v1 to v2: *) Made the compatible list in patch 1 alphanumerically ordered as asked by Krzysztof. *) Dropped the RFC tag. *) Added RB tags of Andrew and Roger. [v1] https://lore.kernel.org/all/20230830113724.1228624-1-danishanwar@ti.com/ ==================== Link: https://lore.kernel.org/r/20230911054308.2163076-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12net: ti: icssg-prueth: Add AM64x icssg supportMD Danish Anwar
Add AM64x ICSSG support which is similar to am65x SR2.0, but required: - all ring configured in exposed ring mode - always fill both original and buffer fields in cppi5 desc Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12dt-bindings: net: Add compatible for AM64x in ICSSGMD Danish Anwar
Add compatible for AM64x in icssg-prueth dt bindings. AM64x supports ICSSG similar to AM65x SR2.0. Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-12net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()Liu Jian
I got the below warning when do fuzzing test: BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470 Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9 CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE Hardware name: linux,dummy-virt (DT) Workqueue: pencrypt_parallel padata_parallel_worker Call trace: dump_backtrace+0x0/0x420 show_stack+0x34/0x44 dump_stack+0x1d0/0x248 __kasan_report+0x138/0x140 kasan_report+0x44/0x6c __asan_load4+0x94/0xd0 scatterwalk_copychunks+0x320/0x470 skcipher_next_slow+0x14c/0x290 skcipher_walk_next+0x2fc/0x480 skcipher_walk_first+0x9c/0x110 skcipher_walk_aead_common+0x380/0x440 skcipher_walk_aead_encrypt+0x54/0x70 ccm_encrypt+0x13c/0x4d0 crypto_aead_encrypt+0x7c/0xfc pcrypt_aead_enc+0x28/0x84 padata_parallel_worker+0xd0/0x2dc process_one_work+0x49c/0xbdc worker_thread+0x124/0x880 kthread+0x210/0x260 ret_from_fork+0x10/0x18 This is because the value of rec_seq of tls_crypto_info configured by the user program is too large, for example, 0xffffffffffffff. In addition, TLS is asynchronously accelerated. When tls_do_encryption() returns -EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow, skmsg is released before the asynchronous encryption process ends. As a result, the UAF problem occurs during the asynchronous processing of the encryption module. If the operation is asynchronous and the encryption module returns EINPROGRESS, do not free the record information. Fixes: 635d93981786 ("net/tls: free record only on encryption error") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-11tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()Steven Rostedt (Google)
The eventfs files list is protected by SRCU. In earlier iterations it was protected with just RCU, but because it needed to also call sleepable code, it had to be switch to SRCU. The dcache_dir_open_wrapper() list_for_each_rcu() was missed and did not get converted over to list_for_each_srcu(). That needs to be fixed. Link: https://lore.kernel.org/linux-trace-kernel/20230911120053.ca82f545e7f46ea753deda18@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20230911200654.71ce927c@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ajay Kaher <akaher@vmware.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-11selftests/bpf: Correct map_fd to data_fd in tailcallsLeon Hwang
Get and check data_fd. It should not check map_fd again. Meanwhile, correct some 'return' to 'goto out'. Thank the suggestion from Maciej in "bpf, x64: Fix tailcall infinite loop"[0] discussions. [0] https://lore.kernel.org/bpf/e496aef8-1f80-0f8e-dcdd-25a8c300319a@gmail.com/T/#m7d3b601066ba66400d436b7e7579b2df4a101033 Fixes: 79d49ba048ec ("bpf, testing: Add various tail call test cases") Fixes: 3b0379111197 ("selftests/bpf: Add tailcall_bpf2bpf tests") Fixes: 5e0b0a4c52d3 ("selftests/bpf: Test tail call counting with bpf2bpf and data on stack") Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230906154256.95461-1-hffilwlqm@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-11tracing/synthetic: Print out u64 values properlyTero Kristo
The synth traces incorrectly print pointer to the synthetic event values instead of the actual value when using u64 type. Fix by addressing the contents of the union properly. Link: https://lore.kernel.org/linux-trace-kernel/20230911141704.3585965-1-tero.kristo@linux.intel.com Fixes: ddeea494a16f ("tracing/synthetic: Use union instead of casts") Cc: stable@vger.kernel.org Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-11tracing/synthetic: Fix order of struct trace_dynamic_infoSteven Rostedt (Google)
To make handling BIG and LITTLE endian better the offset/len of dynamic fields of the synthetic events was changed into a structure of: struct trace_dynamic_info { #ifdef CONFIG_CPU_BIG_ENDIAN u16 offset; u16 len; #else u16 len; u16 offset; #endif }; to replace the manual changes of: data_offset = offset & 0xffff; data_offest = len << 16; But if you look closely, the above is: <len> << 16 | offset Which in little endian would be in memory: offset_lo offset_hi len_lo len_hi and in big endian: len_hi len_lo offset_hi offset_lo Which if broken into a structure would be: struct trace_dynamic_info { #ifdef CONFIG_CPU_BIG_ENDIAN u16 len; u16 offset; #else u16 offset; u16 len; #endif }; Which is the opposite of what was defined. Fix this and just to be safe also add "__packed". Link: https://lore.kernel.org/all/20230908154417.5172e343@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230908163929.2c25f3dc@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: ddeea494a16f3 ("tracing/synthetic: Use union instead of casts") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-11iavf: Fix promiscuous mode configuration flow messagesBrett Creeley
Currently when configuring promiscuous mode on the AVF we detect a change in the netdev->flags. We use IFF_PROMISC and IFF_ALLMULTI to determine whether or not we need to request/release promiscuous mode and/or multicast promiscuous mode. The problem is that the AQ calls for setting/clearing promiscuous/multicast mode are treated separately. This leads to a case where we can trigger two promiscuous mode AQ calls in a row with the incorrect state. To fix this make a few changes. Use IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE instead of the previous IAVF_FLAG_AQ_[REQUEST|RELEASE]_[PROMISC|ALLMULTI] flags. In iavf_set_rx_mode() detect if there is a change in the netdev->flags in comparison with adapter->flags and set the IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE aq_required bit. Then in iavf_process_aq_command() only check for IAVF_FLAG_CONFIGURE_PROMISC_MODE and call iavf_set_promiscuous() if it's set. In iavf_set_promiscuous() check again to see which (if any) promiscuous mode bits have changed when comparing the netdev->flags with the adapter->flags. Use this to set the flags which get sent to the PF driver. Add a spinlock that is used for updating current_netdev_promisc_flags and only allows one promiscuous mode AQ at a time. [1] Fixes the fact that we will only have one AQ call in the aq_required queue at any one time. [2] Streamlines the change in promiscuous mode to only set one AQ required bit. [3] This allows us to keep track of the current state of the flags and also makes it so we can take the most recent netdev->flags promiscuous mode state. [4] This fixes the problem where a change in the netdev->flags can cause IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE to be set in iavf_set_rx_mode(), but cleared in iavf_set_promiscuous() before the change is ever made via AQ call. Fixes: 47d3483988f6 ("i40evf: Add driver support for promiscuous mode") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11i40e: fix potential memory leaks in i40e_remove()Andrii Staikov
Instead of freeing memory of a single VSI, make sure the memory for all VSIs is cleared before releasing VSIs. Add releasing of their resources in a loop with the iteration number equal to the number of allocated VSIs. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-11platform/x86: asus-wmi: Support 2023 ROG X16 tablet modeLuke D. Jones
Add quirk for ASUS ROG X16 (GV601V, 2023 versions) Flow 2-in-1 to enable tablet mode with lid flip (all screen rotations). Signed-off-by: Luke D. Jones <luke@ljones.dev> Link: https://lore.kernel.org/r/20230905082813.13470-1-luke@ljones.dev Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-09-11platform/mellanox: NVSW_SN2201 should depend on ACPIGeert Uytterhoeven
The only probing method supported by the Nvidia SN2201 platform driver is probing through an ACPI match table. Hence add a dependency on ACPI, to prevent asking the user about this driver when configuring a kernel without ACPI support. Fixes: 662f24826f95 ("platform/mellanox: Add support for new SN2201 system") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Vadim Pasternak <vadimp@nvidia.com> Acked-by: Andi Shyti <andi.shyti@kernel.org> Link: https://lore.kernel.org/r/ec5a4071691ab08d58771b7732a9988e89779268.1693828363.git.geert+renesas@glider.be Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-09-11platform/mellanox: mlxbf-bootctl: add NET dependency into KconfigDavid Thompson
The latest version of the mlxbf_bootctl driver utilizes "sysfs_format_mac", and this API is only available if NET is defined in the kernel configuration. This patch changes the mlxbf_bootctl Kconfig to depend on NET. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309031058.JvwNDBKt-lkp@intel.com/ Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20230905133243.31550-1-davthompson@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-09-11platform/mellanox: mlxbf-pmc: Fix reading of unprogrammed eventsShravan Kumar Ramani
This fix involves 2 changes: - All event regs have a reset value of 0, which is not a valid event_number as per the event_list for most blocks and hence seen as an error. Add a "disable" event with event_number 0 for all blocks. - The enable bit for each counter need not be checked before reading the event info, and hence removed. Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver") Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/04d0213932d32681de1c716b54320ed894e52425.1693917738.git.shravankr@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-09-11platform/mellanox: mlxbf-pmc: Fix potential buffer overflowsShravan Kumar Ramani
Replace sprintf with sysfs_emit where possible. Size check in mlxbf_pmc_event_list_show should account for "\0". Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver") Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/bef39ef32319a31b32f999065911f61b0d3b17c3.1693917738.git.shravankr@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-09-11platform/mellanox: mlxbf-tmfifo: Drop jumbo framesLiming Sun
This commit drops over-sized network packets to avoid tmfifo queue stuck. Fixes: 1357dfd7261f ("platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc") Signed-off-by: Liming Sun <limings@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/9318936c2447f76db475c985ca6d91f057efcd41.1693322547.git.limings@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-09-11platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptorsLiming Sun
This commit fixes tmfifo console stuck issue when the virtual networking interface is in down state. In such case, the network Rx descriptors runs out and causes the Rx network packet staying in the head of the tmfifo thus blocking the console packets. The fix is to drop the Rx network packet when no more Rx descriptors. Function name mlxbf_tmfifo_release_pending_pkt() is also renamed to mlxbf_tmfifo_release_pkt() to be more approperiate. Fixes: 1357dfd7261f ("platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc") Signed-off-by: Liming Sun <limings@nvidia.com> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/8c0177dc938ae03f52ff7e0b62dbeee74b7bec09.1693322547.git.limings@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-09-11net: ethernet: mtk_eth_soc: fix pse_port configuration for MT7988Lorenzo Bianconi
MT7988 SoC support 3 NICs. Fix pse_port configuration in mtk_flow_set_output_device routine if the traffic is offloaded to eth2. Rely on mtk_pse_port definitions. Fixes: 88efedf517e6 ("net: ethernet: mtk_eth_soc: enable nft hw flowtable_offload for MT7988 SoC") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: ethernet: mtk_eth_soc: fix uninitialized variableDaniel Golle
Variable dma_addr in function mtk_poll_rx can be uninitialized on some of the error paths. In practise this doesn't matter, even random data present in uninitialized stack memory can safely be used in the way it happens in the error path. However, in order to make Smatch happy make sure the variable is always initialized. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11kcm: Fix memory leak in error path of kcm_sendmsg()Shigeru Yoshida
syzbot reported a memory leak like below: BUG: memory leak unreferenced object 0xffff88810b088c00 (size 240): comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s) hex dump (first 32 bytes): 00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634 [<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline] [<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815 [<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline] [<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748 [<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494 [<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548 [<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577 [<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 [<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append newly allocated skbs to 'head'. If some bytes are copied, an error occurred, and jumped to out_error label, 'last_skb' is left unmodified. A later kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the 'head' frag_list and causing the leak. This patch fixes this issue by properly updating the last allocated skb in 'last_skb'. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11r8152: check budget for r8152_poll()Hayes Wang
According to the document of napi, there is no rx process when the budget is 0. Therefore, r8152_poll() has to return 0 directly when the budget is equal to 0. Fixes: d2187f8e4454 ("r8152: divide the tx and rx bottom functions") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11Merge branch 'sha1105-regressions'David S. Miller
Vladimir Oltean says: ==================== Fixes for SJA1105 DSA FDB regressions A report by Yanan Yang has prompted an investigation into the sja1105 driver's behavior w.r.t. multicast. The report states that when adding multicast L2 addresses with "bridge mdb add", only the most recently added address works - the others seem to be overwritten. This is solved by patch 3/5 (with patch 2/5 as a dependency for it). Patches 4/5 and 5/5 fix a series of race conditions introduced during the same patch set as the bug above, namely this one: https://patchwork.kernel.org/project/netdevbpf/cover/20211024171757.3753288-1-vladimir.oltean@nxp.com/ Finally, patch 1/5 fixes an issue found ever since the introduction of multicast forwarding offload in sja1105, which is that the multicast addresses are visible (with the "self" flag) in "bridge fdb show". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: dsa: sja1105: block FDB accesses that are concurrent with a switch resetVladimir Oltean
Currently, when we add the first sja1105 port to a bridge with vlan_filtering 1, then we sometimes see this output: sja1105 spi2.2: port 4 failed to read back entry for be:79:b4:9e:9e:96 vid 3088: -ENOENT sja1105 spi2.2: Reset switch and programmed static config. Reason: VLAN filtering sja1105 spi2.2: port 0 failed to add be:79:b4:9e:9e:96 vid 0 to fdb: -2 It is because sja1105_fdb_add() runs from the dsa_owq which is no longer serialized with switch resets since it dropped the rtnl_lock() in the blamed commit. Either performing the FDB accesses before the reset, or after the reset, is equally fine, because sja1105_static_fdb_change() backs up those changes in the static config, but FDB access during reset isn't ok. Make sja1105_static_config_reload() take the fdb_lock to fix that. Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: dsa: sja1105: serialize sja1105_port_mcast_flood() with other FDB accessesVladimir Oltean
sja1105_fdb_add() runs from the dsa_owq, and sja1105_port_mcast_flood() runs from switchdev_deferred_process_work(). Prior to the blamed commit, they used to be indirectly serialized through the rtnl_lock(), which no longer holds true because dsa_owq dropped that. So, it is now possible that we traverse the static config BLK_IDX_L2_LOOKUP elements concurrently compared to when we change them, in sja1105_static_fdb_change(). That is not ideal, since it might result in data corruption. Introduce a mutex which serializes accesses to the hardware FDB and to the static config elements for the L2 Address Lookup table. I can't find a good reason to add locking around sja1105_fdb_dump(). I'll add it later if needed. Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: dsa: sja1105: fix multicast forwarding working only for last added mdb ↵Vladimir Oltean
entry The commit cited in Fixes: did 2 things: it refactored the read-back polling from sja1105_dynamic_config_read() into a new function, sja1105_dynamic_config_wait_complete(), and it called that from sja1105_dynamic_config_write() too. What is problematic is the refactoring. The refactored code from sja1105_dynamic_config_poll_valid() works like the previous one, but the problem is that it uses another packed_buf[] SPI buffer, and there was code at the end of sja1105_dynamic_config_read() which was relying on the read-back packed_buf[]: /* Don't dereference possibly NULL pointer - maybe caller * only wanted to see whether the entry existed or not. */ if (entry) ops->entry_packing(packed_buf, entry, UNPACK); After the change, the packed_buf[] that this code sees is no longer the entry read back from hardware, but the original entry that the caller passed to the sja1105_dynamic_config_read(), packed into this buffer. This difference is the most notable with the SJA1105_SEARCH uses from sja1105pqrs_fdb_add() - used for both fdb and mdb. There, we have logic added by commit 728db843df88 ("net: dsa: sja1105: ignore the FDB entry for unknown multicast when adding a new address") to figure out whether the address we're trying to add matches on any existing hardware entry, with the exception of the catch-all multicast address. That logic was broken, because with sja1105_dynamic_config_read() not working properly, it doesn't return us the entry read back from hardware, but the entry that we passed to it. And, since for multicast, a match will always exist, it will tell us that any mdb entry already exists at index=0 L2 Address Lookup table. It is index=0 because the caller doesn't know the index - it wants to find it out, and sja1105_dynamic_config_read() does: if (index < 0) { // SJA1105_SEARCH /* Avoid copying a signed negative number to an u64 */ cmd.index = 0; // <- this cmd.search = true; } else { cmd.index = index; cmd.search = false; } So, to the caller of sja1105_dynamic_config_read(), the returned info looks entirely legit, and it will add all mdb entries to FDB index 0. There, they will always overwrite each other (not to mention, potentially they can also overwrite a pre-existing bridge fdb entry), and the user-visible impact will be that only the last mdb entry will be forwarded as it should. The others won't (will be flooded or dropped, depending on the egress flood settings). Fixing is a bit more complicated, and involves either passing the same packed_buf[] to sja1105_dynamic_config_wait_complete(), or moving all the extra processing on the packed_buf[] to sja1105_dynamic_config_wait_complete(). I've opted for the latter, because it makes sja1105_dynamic_config_wait_complete() a bit more self-contained. Fixes: df405910ab9f ("net: dsa: sja1105: wait for dynamic config command completion on writes too") Reported-by: Yanan Yang <yanan.yang@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: dsa: sja1105: propagate exact error code from ↵Vladimir Oltean
sja1105_dynamic_config_poll_valid() Currently, sja1105_dynamic_config_wait_complete() returns either 0 or -ETIMEDOUT, because it just looks at the read_poll_timeout() return code. There will be future changes which move some more checks to sja1105_dynamic_config_poll_valid(). It is important that we propagate their exact return code (-ENOENT, -EINVAL), because callers of sja1105_dynamic_config_read() depend on them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: dsa: sja1105: hide all multicast addresses from "bridge fdb show"Vladimir Oltean
Commit 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to device") has partially hidden some multicast entries from showing up in the "bridge fdb show" output, but it wasn't enough. Addresses which are added through "bridge mdb add" still show up. Hide them all. Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net:ethernet:adi:adin1110: Fix forwarding offloadCiprian Regus
Currently, when a new fdb entry is added (with both ports of the ADIN2111 bridged), the driver configures the MAC filters for the wrong port, which results in the forwarding being done by the host, and not actually hardware offloaded. The ADIN2111 offloads the forwarding by setting filters on the destination MAC address of incoming frames. Based on these, they may be routed to the other port. Thus, if a frame has to be forwarded from port 1 to port 2, the required configuration for the ADDR_FILT_UPRn register should set the APPLY2PORT1 bit (instead of APPLY2PORT2, as it's currently the case). Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Ciprian Regus <ciprian.regus@analog.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11hsr: Fix uninit-value access in fill_frame_info()Ziyang Xuan
Syzbot reports the following uninit-value access problem. ===================================================== BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline] BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616 fill_frame_info net/hsr/hsr_forward.c:601 [inline] hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616 hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223 __netdev_start_xmit include/linux/netdevice.h:4889 [inline] netdev_start_xmit include/linux/netdevice.h:4903 [inline] xmit_one net/core/dev.c:3544 [inline] dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560 __dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340 dev_queue_xmit include/linux/netdevice.h:3082 [inline] packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] __sys_sendto+0x781/0xa30 net/socket.c:2176 __do_sys_sendto net/socket.c:2188 [inline] __se_sys_sendto net/socket.c:2184 [inline] __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 Uninit was created at: slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767 slab_alloc_node mm/slub.c:3478 [inline] kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523 kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559 __alloc_skb+0x318/0x740 net/core/skbuff.c:644 alloc_skb include/linux/skbuff.h:1286 [inline] alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299 sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794 packet_alloc_skb net/packet/af_packet.c:2936 [inline] packet_snd net/packet/af_packet.c:3030 [inline] packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline] sock_sendmsg net/socket.c:753 [inline] __sys_sendto+0x781/0xa30 net/socket.c:2176 __do_sys_sendto net/socket.c:2188 [inline] __se_sys_sendto net/socket.c:2184 [inline] __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82 It is because VLAN not yet supported in hsr driver. Return error when protocol is ETH_P_8021Q in fill_frame_info() now to fix it. Fixes: 451d8123f897 ("net: prp: add packet handling support") Reported-by: syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9 Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11Merge branch 'rule_buf-OOB'David S. Miller
Hangyu Hua says: ==================== Fix possible OOB write when using rule_buf ADD bounds checks in bcmasp_netfilt_get_all_active and mvpp2_ethtool_get_rxnfc and mtk_hwlro_get_fdir_all when using rule_buf from ethtool_get_rxnfc. v2: [PATCH v2 1/3]: use -EMSGSIZE instead of truncating the list sliently. [PATCH v2 3/3]: drop the brackets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in ↵Hangyu Hua
mtk_hwlro_get_fdir_all() rule_locs is allocated in ethtool_get_rxnfc and the size is determined by rule_cnt from user space. So rule_cnt needs to be check before using rule_locs to avoid NULL pointer dereference. Fixes: 7aab747e5563 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: ethernet: mvpp2_main: fix possible OOB write in mvpp2_ethtool_get_rxnfc()Hangyu Hua
rules is allocated in ethtool_get_rxnfc and the size is determined by rule_cnt from user space. So rule_cnt needs to be check before using rules to avoid OOB writing or NULL pointer dereference. Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: ethernet: bcmasp: fix possible OOB write in bcmasp_netfilt_get_all_active()Hangyu Hua
rule_locs is allocated in ethtool_get_rxnfc and the size is determined by rule_cnt from user space. So rule_cnt needs to be check before using rule_locs to avoid OOB writing or NULL pointer dereference. Fixes: c5d511c49587 ("net: bcmasp: Add support for wake on net filters") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-11net: stmmac: fix handling of zero coalescing tx-usecsVincent Whitchurch
Setting ethtool -C eth0 tx-usecs 0 is supposed to disable the use of the coalescing timer but currently it gets programmed with zero delay instead. Disable the use of the coalescing timer if tx-usecs is zero by preventing it from being restarted. Note that to keep things simple we don't start/stop the timer when the coalescing settings are changed, but just let that happen on the next transmit or timer expiry. Fixes: 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10Linux 6.6-rc1v6.6-rc1Linus Torvalds
2023-09-10Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm ci scripts from Dave Airlie: "This is a bunch of ci integration for the freedesktop gitlab instance where we currently do upstream userspace testing on diverse sets of GPU hardware. From my perspective I think it's an experiment worth going with and seeing how the benefits/noise playout keeping these files useful. Ideally I'd like to get this so we can do pre-merge testing on PRs eventually. Below is some info from danvet on why we've ended up making the decision and how we can roll it back if we decide it was a bad plan. Why in upstream? - like documentation, testcases, tools CI integration is one of these things where you can waste endless amounts of time if you accidentally have a version that doesn't match your source code - but also like the above, there's a balance, this is the initial cut of what we think makes sense to keep in sync vs out-of-tree, probably needs adjustment - gitlab supports out-of-repo gitlab integration and that's what's been used for the kernel in drm, but it results in per-driver fragmentation and lots of duplicated effort. the simple act of smashing an arbitrary winner into a topic branch already started surfacing patches on dri-devel and sparking good cross driver team discussions Why gitlab? - it's not any more shit than any of the other CI - drm userspace uses it extensively for everything in userspace, we have a lot of people and experience with this, including integration of hw testing labs - media userspace like gstreamer is also on gitlab.fd.o, and there's discussion to extend this to the media subsystem in some fashion Can this be shared? - there's definitely a pile of code that could move to scripts/ if other subsystem adopt ci integration in upstream kernel git. other bits are more drm/gpu specific like the igt-gpu-tests/tools integration - docker images can be run locally or in other CI runners Will we regret this? - it's all in one directory, intentionally, for easy deletion - probably 1-2 years in upstream to see whether this is worth it or a Big Mistake. that's roughly what it took to _really_ roll out solid CI in the bigger userspace projects we have on gitlab.fd.o like mesa3d" * tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm: drm: ci: docs: fix build warning - add missing escape drm: Add initial ci/ subdirectory
2023-09-10Merge branch 'smc-r-fixes'David S. Miller
Guangguan Wang says: ==================== Two fixes for SMC-R ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in ↵Guangguan Wang
smcr_port_add While doing smcr_port_add, there maybe linkgroup add into or delete from smc_lgr_list.list at the same time, which may result kernel crash. So, use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add. The crash calltrace show below: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014 Workqueue: events smc_ib_port_event_work [smc] RIP: 0010:smcr_port_add+0xa6/0xf0 [smc] RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297 RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918 R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4 R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08 FS: 0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0 PKRU: 55555554 Call Trace: smc_ib_port_event_work+0x18f/0x380 [smc] process_one_work+0x19b/0x340 worker_thread+0x30/0x370 ? process_one_work+0x340/0x340 kthread+0x114/0x130 ? __kthread_cancel_work+0x50/0x50 ret_from_fork+0x1f/0x30 Fixes: 1f90a05d9ff9 ("net/smc: add smcr_port_add() and smcr_link_up() processing") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10net/smc: bugfix for smcr v2 server connect success statisticGuangguan Wang
In the macro SMC_STAT_SERV_SUCC_INC, the smcd_version is used to determin whether to increase the v1 statistic or the v2 statistic. It is correct for SMCD. But for SMCR, smcr_version should be used. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10octeontx2-pf: Fix page pool cache index corruption.Ratheesh Kannoth
The access to page pool `cache' array and the `count' variable is not locked. Page pool cache access is fine as long as there is only one consumer per pool. octeontx2 driver fills in rx buffers from page pool in NAPI context. If system is stressed and could not allocate buffers, refiiling work will be delegated to a delayed workqueue. This means that there are two cosumers to the page pool cache. Either workqueue or IRQ/NAPI can be run on other CPU. This will lead to lock less access, hence corruption of cache pool indexes. To fix this issue, NAPI is rescheduled from workqueue context to refill rx buffers. Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10net: microchip: vcap api: Fix possible memory leak for vcap_dup_rule()Jinjie Ruan
Inject fault When select CONFIG_VCAP_KUNIT_TEST, the below memory leak occurs. If kzalloc() for duprule succeeds, but the following kmemdup() fails, the duprule, ckf and caf memory will be leaked. So kfree them in the error path. unreferenced object 0xffff122744c50600 (size 192): comm "kunit_try_catch", pid 346, jiffies 4294896122 (age 911.812s) hex dump (first 32 bytes): 10 27 00 00 04 00 00 00 1e 00 00 00 2c 01 00 00 .'..........,... 00 00 00 00 00 00 00 00 18 06 c5 44 27 12 ff ff ...........D'... backtrace: [<00000000394b0db8>] __kmem_cache_alloc_node+0x274/0x2f8 [<0000000001bedc67>] kmalloc_trace+0x38/0x88 [<00000000b0612f98>] vcap_dup_rule+0x50/0x460 [<000000005d2d3aca>] vcap_add_rule+0x8cc/0x1038 [<00000000eef9d0f8>] test_vcap_xn_rule_creator.constprop.0.isra.0+0x238/0x494 [<00000000cbda607b>] vcap_api_rule_remove_in_front_test+0x1ac/0x698 [<00000000c8766299>] kunit_try_run_case+0xe0/0x20c [<00000000c4fe9186>] kunit_generic_run_threadfn_adapter+0x50/0x94 [<00000000f6864acf>] kthread+0x2e8/0x374 [<0000000022e639b3>] ret_from_fork+0x10/0x20 Fixes: 814e7693207f ("net: microchip: vcap api: Add a storage state to a VCAP rule") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10net: bcmasp: add missing of_node_putJulia Lawall
for_each_available_child_of_node performs an of_node_get on each iteration, so a break out of the loop requires an of_node_put. This was done using the Coccinelle semantic patch iterators/for_each_child.cocci Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10selftests/net: Improve bind_bhash.sh to accommodate predictable network ↵Juntong Deng
interface names Starting with v197, systemd uses predictable interface network names, the traditional interface naming scheme (eth0) is deprecated, therefore it cannot be assumed that the eth0 interface exists on the host. This modification makes the bind_bhash test program run in a separate network namespace and no longer needs to consider the name of the network interface on the host. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-10Merge tag 'x86-urgent-2023-09-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix preemption delays in the SGX code, remove unnecessarily UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and make the x86 SMP init code a bit more conservative to fix kexec() lockups" * tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Break up long non-preemptible delays in sgx_vepc_release() x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld x86/smp: Don't send INIT to non-present and non-booted CPUs
2023-09-10Merge tag 'perf-urgent-2023-09-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event fix from Ingo Molnar: "Work around a firmware bug in the uncore PMU driver, affecting certain Intel systems" * tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore: Correct the number of CHAs on EMR
2023-09-09Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "perf tools maintainership: - Add git information for perf-tools and perf-tools-next trees and branches to the MAINTAINERS file. That is where development now takes place and myself and Namhyung Kim have write access, more people to come as we emulate other maintainer groups. perf record: - Record kernel data maps when 'perf record --data' is used, so that global variables can be resolved and used in tools that do data profiling. perf trace: - Remove the old, experimental support for BPF events in which a .c file was passed as an event: "perf trace -e hello.c" to then get compiled and loaded. The only known usage for that, that shipped with the kernel as an example for such events, augmented the raw_syscalls tracepoints and was converted to a libbpf skeleton, reusing all the user space components and the BPF code connected to the syscalls. In the end just the way to glue the BPF part and the user space type beautifiers changed, now being performed by libbpf skeletons. The next step is to use BTF to do pretty printing of all syscall types, as discussed with Alan Maguire and others. Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all path/filenames/strings, some of the networking data structures, perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5 seconds: # perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5 0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3 9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4 ? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0 10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ... ? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0 ? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0 3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ... 2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0 3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ... 3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0 4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ... 10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0 Performance counter stats for 'sleep 5': 2,617,347 cycles 1,855,997 instructions # 0.71 insn per cycle 5.002282128 seconds time elapsed 0.000855000 seconds user 0.000852000 seconds sys perf annotate: - Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1) for licensing reasons, and we missed a build test on tools/perf/tests makefile. Since we now default to NDEBUG=1, we ended up segfaulting when building with BUILD_NONDISTRO=1 because a needed initialization routine was being "error checked" via an assert. Fix it by explicitly checking the result and aborting instead if it fails. We better back propagate the error, but at least 'perf annotate' on samples collected for a BPF program is back working when perf is built with BUILD_NONDISTRO=1. perf report/top: - Add back TUI hierarchy mode header, that is seen when using 'perf report/top --hierarchy'. - Fix the number of entries for 'e' key in the TUI that was preventing navigation of lines when expanding an entry. perf report/script: - Support cross platform register handling, allowing a perf.data file collected on one architecture to have registers sampled correctly displayed when analysis tools such as 'perf report' and 'perf script' are used on a different architecture. - Fix handling of event attributes in pipe mode, i.e. when one uses: perf record -o - | perf report -i - When no perf.data files are used. - Handle files generated via pipe mode with a version of perf and then read also via pipe mode with a different version of perf, where the event attr record may have changed, use the record size field to properly support this version mismatch. perf probe: - Accessing global variables from uprobes isn't supported, make the error message state that instead of stating that some minimal kernel version is needed to have that feature. This seems just a tool limitation, the kernel probably has all that is needed. perf tests: - Fix a reference count related leak in the dlfilter v0 API where the result of a thread__find_symbol_fb() is not matched with an addr_location__exit() to drop the reference counts of the resolved components (machine, thread, map, symbol, etc). Add a dlfilter test to make sure that doesn't regresses. - Lots of fixes for the 'perf test' written in shell script related to problems found with the shellcheck utility. - Fixes for 'perf test' shell scripts testing features enabled when perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf counters. - Add perf record sample filtering test, things like the following example, that gets implemented as a BPF filter attached to the event: # perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000' - Improve the way the task_analyzer test checks if libtraceevent is linked, using 'perf version --build-options' instead of the more expensinve 'perf record -e "sched:sched_switch"'. - Add support for riscv in the mmap-basic test. (This went as well via the RiscV tree, same contents). libperf: - Implement riscv mmap support (This went as well via the RiscV tree, same contents). perf script: - New tool that converts perf.data files to the firefox profiler format so that one can use the visualizer at https://profiler.firefox.com/. Done by Anup Sharma as part of this year's Google Summer of Code. One can generate the output and upload it to the web interface but Anup also automated everything: perf script gecko -F 99 -a sleep 60 - Support syscall name parsing on arm64. - Print "cgroup" field on the same line as "comm". perf bench: - Add new 'uprobe' benchmark to measure the overhead of uprobes with/without BPF programs attached to it. - breakpoints are not available on power9, skip that test. perf stat: - Add #num_cpus_online literal to be used in 'perf stat' metrics, and add this extra 'perf test' check that exemplifies its purpose: TEST_ASSERT_VAL("#num_cpus_online", expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0); TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0); TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online); Miscellaneous: - Improve tool startup time by lazily reading PMU, JSON, sysfs data. - Improve error reporting in the parsing of events, passing YYLTYPE to error routines, so that the output can show were the parsing error was found. - Add 'perf test' entries to check the parsing of events improvements. - Fix various leak for things detected by -fsanitize=address, mostly things that would be freed at tool exit, including: - Free evsel->filter on the destructor. - Allow tools to register a thread->priv destructor and use it in 'perf trace'. - Free evsel->priv in 'perf trace'. - Free string returned by synthesize_perf_probe_point() when the caller fails to do all it needs. - Adjust various compiler options to not consider errors some warnings when building with broken headers found in things like python, flex, bison, as we otherwise build with -Werror. Some for gcc, some for clang, some for some specific version of those, some for some specific version of flex or bison, or some specific combination of these components, bah. - Allow customization of clang options for BPF target, this helps building on gentoo where there are other oddities where BPF targets gets passed some compiler options intended for the native build, so building with WERROR=0 helps while these oddities are fixed. - Dont pass ERR_PTR() values to perf_session__delete() in 'perf top' and 'perf lock', fixing some segfaults when handling some odd failures. - Add LTO build option. - Fix format of unordered lists in the perf docs (tools/perf/Documentation) - Overhaul the bison files, using constructs such as YYNOMEM. - Remove unused tokens from the bison .y files. - Add more comments to various structs. - A few LoongArch enablement patches. Vendor events (JSON): - Add JSON metrics for Yitian 710 DDR (aarch64). Things like: EventName, BriefDescription visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.", visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.", op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.", op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.", op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.", - Add AmpereOne metrics (aarch64). - Update N2 and V2 metrics (aarch64) and events using Arm telemetry repo. - Update scale units and descriptions of common topdown metrics on aarch64. Things like: - "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)", - "BriefDescription": "Frontend bound L1 topdown metric", + "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))", + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.", - Update events for intel: meteorlake to 1.04, sapphirerapids to 1.15, Icelake+ metric constraints. - Update files for the power10 platform" * tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits) perf parse-events: Fix driver config term perf parse-events: Fixes relating to no_value terms perf parse-events: Fix propagation of term's no_value when cloning perf parse-events: Name the two term enums perf list: Don't print Unit for "default_core" perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake perf dlfilter: Avoid leak in v0 API test use of resolve_address() perf metric: Add #num_cpus_online literal perf pmu: Remove str from perf_pmu_alias perf parse-events: Make common term list to strbuf helper perf parse-events: Minor help message improvements perf pmu: Avoid uninitialized use of alias->str perf jevents: Use "default_core" for events with no Unit perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test perf test shell stat_bpf_counters: Fix test on Intel perf test shell record_bpf_filter: Skip 6.2 kernel libperf: Get rid of attr.id field perf tools: Convert to perf_record_header_attr_id() libperf: Add perf_record_header_attr_id() perf tools: Handle old data in PERF_RECORD_ATTR ...
2023-09-09Merge tag '6.6-rc-smb3-client-fixes-part2' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: - six smb3 client fixes including ones to allow controlling smb3 directory caching timeout and limits, and one debugging improvement - one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option) - one minor spnego registry update * tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: spnego: add missing OID to oid registry smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU cifs: update internal module version number for cifs.ko smb3: allow controlling maximum number of cached directories smb3: add trace point for queryfs (statfs) nls: Hide new NLS_UCS2_UTILS smb3: allow controlling length of time directory entries are cached with dir leases smb: propagate error code of extract_sharename()