Age | Commit message (Collapse) | Author |
|
macb_set_tx_clk() is called under a spinlock but itself calls clk_set_rate()
which can sleep. This results in:
| BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
| pps pps1: new PPS source ptp1
| in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 40, name: kworker/u4:3
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| 4 locks held by kworker/u4:3/40:
| #0: ffff000003409148
| macb ff0c0000.ethernet: gem-ptp-timer ptp clock registered.
| ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c
| #1: ffff8000833cbdd8 ((work_completion)(&pl->resolve)){+.+.}-{0:0}, at: process_one_work+0x14c/0x51c
| #2: ffff000004f01578 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x44/0x4e8
| #3: ffff000004f06f50 (&bp->lock){....}-{3:3}, at: macb_mac_link_up+0x40/0x2ac
| irq event stamp: 113998
| hardirqs last enabled at (113997): [<ffff800080e8503c>] _raw_spin_unlock_irq+0x30/0x64
| hardirqs last disabled at (113998): [<ffff800080e84478>] _raw_spin_lock_irqsave+0xac/0xc8
| softirqs last enabled at (113608): [<ffff800080010630>] __do_softirq+0x430/0x4e4
| softirqs last disabled at (113597): [<ffff80008001614c>] ____do_softirq+0x10/0x1c
| CPU: 0 PID: 40 Comm: kworker/u4:3 Not tainted 6.5.0-11717-g9355ce8b2f50-dirty #368
| Hardware name: ... ZynqMP ... (DT)
| Workqueue: events_power_efficient phylink_resolve
| Call trace:
| dump_backtrace+0x98/0xf0
| show_stack+0x18/0x24
| dump_stack_lvl+0x60/0xac
| dump_stack+0x18/0x24
| __might_resched+0x144/0x24c
| __might_sleep+0x48/0x98
| __mutex_lock+0x58/0x7b0
| mutex_lock_nested+0x24/0x30
| clk_prepare_lock+0x4c/0xa8
| clk_set_rate+0x24/0x8c
| macb_mac_link_up+0x25c/0x2ac
| phylink_resolve+0x178/0x4e8
| process_one_work+0x1ec/0x51c
| worker_thread+0x1ec/0x3e4
| kthread+0x120/0x124
| ret_from_fork+0x10/0x20
The obvious fix is to move the call to macb_set_tx_clk() out of the
protected area. This seems safe as rx and tx are both disabled anyway at
this point.
It is however not entirely clear what the spinlock shall protect. It
could be the read-modify-write access to the NCFGR register, but this
is accessed in macb_set_rx_mode() and macb_set_rxcsum_feature() as well
without holding the spinlock. It could also be the register accesses
done in mog_init_rings() or macb_init_buffers(), but again these
functions are called without holding the spinlock in macb_hresp_error_task().
The locking seems fishy in this driver and it might deserve another look
before this patch is applied.
Fixes: 633e98a711ac0 ("net: macb: use resolved link config in mac_link_up()")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/20230908112913.1701766-1-s.hauer@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use bitmap_zalloc() and bitmap_free() instead of hand-writing them.
It is less verbose and it improves the type checking and semantic.
While at it, add missing header inclusion (should be bitops.h,
but with the above change it becomes bitmap.h).
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230911131618.4159437-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since commit 1202cdd66531("Remove DECnet support from kernel") has been
merged, all callers pass in the initial_ref value of 1 when they call
dst_alloc(). Therefore, remove initial_ref when the dst_alloc() is
declared and replace initial_ref with 1 in dst_alloc().
Also when all callers call dst_init(), the value of initial_ref is 1.
Therefore, remove the input parameter initial_ref of the dst_init() and
replace initial_ref with the value 1 in dst_init.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20230911125045.346390-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
MD Danish Anwar says:
====================
Add support for ICSSG on AM64x EVM
This series adds support for ICSSG driver on AM64x EVM.
First patch of the series adds compatible for AM64x EVM in icssg-prueth
dt binding. Second patch adds support for AM64x compatible in the ICSSG
driver.
This series addresses comments on [v1] (which was posted as RFC).
This series is based on the latest net-next/main. This series has no
dependency.
Changes from v1 to v2:
*) Made the compatible list in patch 1 alphanumerically ordered as asked
by Krzysztof.
*) Dropped the RFC tag.
*) Added RB tags of Andrew and Roger.
[v1] https://lore.kernel.org/all/20230830113724.1228624-1-danishanwar@ti.com/
====================
Link: https://lore.kernel.org/r/20230911054308.2163076-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add AM64x ICSSG support which is similar to am65x SR2.0, but required:
- all ring configured in exposed ring mode
- always fill both original and buffer fields in cppi5 desc
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add compatible for AM64x in icssg-prueth dt bindings. AM64x supports
ICSSG similar to AM65x SR2.0.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
I got the below warning when do fuzzing test:
BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9
CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE
Hardware name: linux,dummy-virt (DT)
Workqueue: pencrypt_parallel padata_parallel_worker
Call trace:
dump_backtrace+0x0/0x420
show_stack+0x34/0x44
dump_stack+0x1d0/0x248
__kasan_report+0x138/0x140
kasan_report+0x44/0x6c
__asan_load4+0x94/0xd0
scatterwalk_copychunks+0x320/0x470
skcipher_next_slow+0x14c/0x290
skcipher_walk_next+0x2fc/0x480
skcipher_walk_first+0x9c/0x110
skcipher_walk_aead_common+0x380/0x440
skcipher_walk_aead_encrypt+0x54/0x70
ccm_encrypt+0x13c/0x4d0
crypto_aead_encrypt+0x7c/0xfc
pcrypt_aead_enc+0x28/0x84
padata_parallel_worker+0xd0/0x2dc
process_one_work+0x49c/0xbdc
worker_thread+0x124/0x880
kthread+0x210/0x260
ret_from_fork+0x10/0x18
This is because the value of rec_seq of tls_crypto_info configured by the
user program is too large, for example, 0xffffffffffffff. In addition, TLS
is asynchronously accelerated. When tls_do_encryption() returns
-EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
skmsg is released before the asynchronous encryption process ends. As a
result, the UAF problem occurs during the asynchronous processing of the
encryption module.
If the operation is asynchronous and the encryption module returns
EINPROGRESS, do not free the record information.
Fixes: 635d93981786 ("net/tls: free record only on encryption error")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The eventfs files list is protected by SRCU. In earlier iterations it was
protected with just RCU, but because it needed to also call sleepable
code, it had to be switch to SRCU. The dcache_dir_open_wrapper()
list_for_each_rcu() was missed and did not get converted over to
list_for_each_srcu(). That needs to be fixed.
Link: https://lore.kernel.org/linux-trace-kernel/20230911120053.ca82f545e7f46ea753deda18@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20230911200654.71ce927c@gandalf.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <akaher@vmware.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Get and check data_fd. It should not check map_fd again.
Meanwhile, correct some 'return' to 'goto out'.
Thank the suggestion from Maciej in "bpf, x64: Fix tailcall infinite
loop"[0] discussions.
[0] https://lore.kernel.org/bpf/e496aef8-1f80-0f8e-dcdd-25a8c300319a@gmail.com/T/#m7d3b601066ba66400d436b7e7579b2df4a101033
Fixes: 79d49ba048ec ("bpf, testing: Add various tail call test cases")
Fixes: 3b0379111197 ("selftests/bpf: Add tailcall_bpf2bpf tests")
Fixes: 5e0b0a4c52d3 ("selftests/bpf: Test tail call counting with bpf2bpf and data on stack")
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230906154256.95461-1-hffilwlqm@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
The synth traces incorrectly print pointer to the synthetic event values
instead of the actual value when using u64 type. Fix by addressing the
contents of the union properly.
Link: https://lore.kernel.org/linux-trace-kernel/20230911141704.3585965-1-tero.kristo@linux.intel.com
Fixes: ddeea494a16f ("tracing/synthetic: Use union instead of casts")
Cc: stable@vger.kernel.org
Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
To make handling BIG and LITTLE endian better the offset/len of dynamic
fields of the synthetic events was changed into a structure of:
struct trace_dynamic_info {
#ifdef CONFIG_CPU_BIG_ENDIAN
u16 offset;
u16 len;
#else
u16 len;
u16 offset;
#endif
};
to replace the manual changes of:
data_offset = offset & 0xffff;
data_offest = len << 16;
But if you look closely, the above is:
<len> << 16 | offset
Which in little endian would be in memory:
offset_lo offset_hi len_lo len_hi
and in big endian:
len_hi len_lo offset_hi offset_lo
Which if broken into a structure would be:
struct trace_dynamic_info {
#ifdef CONFIG_CPU_BIG_ENDIAN
u16 len;
u16 offset;
#else
u16 offset;
u16 len;
#endif
};
Which is the opposite of what was defined.
Fix this and just to be safe also add "__packed".
Link: https://lore.kernel.org/all/20230908154417.5172e343@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20230908163929.2c25f3dc@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: ddeea494a16f3 ("tracing/synthetic: Use union instead of casts")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently when configuring promiscuous mode on the AVF we detect a
change in the netdev->flags. We use IFF_PROMISC and IFF_ALLMULTI to
determine whether or not we need to request/release promiscuous mode
and/or multicast promiscuous mode. The problem is that the AQ calls for
setting/clearing promiscuous/multicast mode are treated separately. This
leads to a case where we can trigger two promiscuous mode AQ calls in
a row with the incorrect state. To fix this make a few changes.
Use IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE instead of the previous
IAVF_FLAG_AQ_[REQUEST|RELEASE]_[PROMISC|ALLMULTI] flags.
In iavf_set_rx_mode() detect if there is a change in the
netdev->flags in comparison with adapter->flags and set the
IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE aq_required bit. Then in
iavf_process_aq_command() only check for IAVF_FLAG_CONFIGURE_PROMISC_MODE
and call iavf_set_promiscuous() if it's set.
In iavf_set_promiscuous() check again to see which (if any) promiscuous
mode bits have changed when comparing the netdev->flags with the
adapter->flags. Use this to set the flags which get sent to the PF
driver.
Add a spinlock that is used for updating current_netdev_promisc_flags
and only allows one promiscuous mode AQ at a time.
[1] Fixes the fact that we will only have one AQ call in the aq_required
queue at any one time.
[2] Streamlines the change in promiscuous mode to only set one AQ
required bit.
[3] This allows us to keep track of the current state of the flags and
also makes it so we can take the most recent netdev->flags promiscuous
mode state.
[4] This fixes the problem where a change in the netdev->flags can cause
IAVF_FLAG_AQ_CONFIGURE_PROMISC_MODE to be set in iavf_set_rx_mode(),
but cleared in iavf_set_promiscuous() before the change is ever made via
AQ call.
Fixes: 47d3483988f6 ("i40evf: Add driver support for promiscuous mode")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Instead of freeing memory of a single VSI, make sure
the memory for all VSIs is cleared before releasing VSIs.
Add releasing of their resources in a loop with the iteration
number equal to the number of allocated VSIs.
Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add quirk for ASUS ROG X16 (GV601V, 2023 versions) Flow 2-in-1
to enable tablet mode with lid flip (all screen rotations).
Signed-off-by: Luke D. Jones <luke@ljones.dev>
Link: https://lore.kernel.org/r/20230905082813.13470-1-luke@ljones.dev
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The only probing method supported by the Nvidia SN2201 platform driver
is probing through an ACPI match table. Hence add a dependency on
ACPI, to prevent asking the user about this driver when configuring a
kernel without ACPI support.
Fixes: 662f24826f95 ("platform/mellanox: Add support for new SN2201 system")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Vadim Pasternak <vadimp@nvidia.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/ec5a4071691ab08d58771b7732a9988e89779268.1693828363.git.geert+renesas@glider.be
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The latest version of the mlxbf_bootctl driver utilizes
"sysfs_format_mac", and this API is only available if
NET is defined in the kernel configuration. This patch
changes the mlxbf_bootctl Kconfig to depend on NET.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309031058.JvwNDBKt-lkp@intel.com/
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/20230905133243.31550-1-davthompson@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
This fix involves 2 changes:
- All event regs have a reset value of 0, which is not a valid
event_number as per the event_list for most blocks and hence seen
as an error. Add a "disable" event with event_number 0 for all blocks.
- The enable bit for each counter need not be checked before
reading the event info, and hence removed.
Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver")
Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/04d0213932d32681de1c716b54320ed894e52425.1693917738.git.shravankr@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Replace sprintf with sysfs_emit where possible.
Size check in mlxbf_pmc_event_list_show should account for "\0".
Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver")
Signed-off-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/bef39ef32319a31b32f999065911f61b0d3b17c3.1693917738.git.shravankr@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
This commit drops over-sized network packets to avoid tmfifo
queue stuck.
Fixes: 1357dfd7261f ("platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc")
Signed-off-by: Liming Sun <limings@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/9318936c2447f76db475c985ca6d91f057efcd41.1693322547.git.limings@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
This commit fixes tmfifo console stuck issue when the virtual
networking interface is in down state. In such case, the network
Rx descriptors runs out and causes the Rx network packet staying
in the head of the tmfifo thus blocking the console packets. The
fix is to drop the Rx network packet when no more Rx descriptors.
Function name mlxbf_tmfifo_release_pending_pkt() is also renamed
to mlxbf_tmfifo_release_pkt() to be more approperiate.
Fixes: 1357dfd7261f ("platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc")
Signed-off-by: Liming Sun <limings@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: David Thompson <davthompson@nvidia.com>
Link: https://lore.kernel.org/r/8c0177dc938ae03f52ff7e0b62dbeee74b7bec09.1693322547.git.limings@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
MT7988 SoC support 3 NICs. Fix pse_port configuration in
mtk_flow_set_output_device routine if the traffic is offloaded to eth2.
Rely on mtk_pse_port definitions.
Fixes: 88efedf517e6 ("net: ethernet: mtk_eth_soc: enable nft hw flowtable_offload for MT7988 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variable dma_addr in function mtk_poll_rx can be uninitialized on
some of the error paths. In practise this doesn't matter, even random
data present in uninitialized stack memory can safely be used in the
way it happens in the error path.
However, in order to make Smatch happy make sure the variable is
always initialized.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported a memory leak like below:
BUG: memory leak
unreferenced object 0xffff88810b088c00 (size 240):
comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s)
hex dump (first 32 bytes):
00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634
[<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline]
[<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815
[<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline]
[<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748
[<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494
[<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548
[<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577
[<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
[<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append
newly allocated skbs to 'head'. If some bytes are copied, an error occurred,
and jumped to out_error label, 'last_skb' is left unmodified. A later
kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the
'head' frag_list and causing the leak.
This patch fixes this issue by properly updating the last allocated skb in
'last_skb'.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-and-tested-by: syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the document of napi, there is no rx process when the
budget is 0. Therefore, r8152_poll() has to return 0 directly when the
budget is equal to 0.
Fixes: d2187f8e4454 ("r8152: divide the tx and rx bottom functions")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vladimir Oltean says:
====================
Fixes for SJA1105 DSA FDB regressions
A report by Yanan Yang has prompted an investigation into the sja1105
driver's behavior w.r.t. multicast. The report states that when adding
multicast L2 addresses with "bridge mdb add", only the most recently
added address works - the others seem to be overwritten. This is solved
by patch 3/5 (with patch 2/5 as a dependency for it).
Patches 4/5 and 5/5 fix a series of race conditions introduced during
the same patch set as the bug above, namely this one:
https://patchwork.kernel.org/project/netdevbpf/cover/20211024171757.3753288-1-vladimir.oltean@nxp.com/
Finally, patch 1/5 fixes an issue found ever since the introduction of
multicast forwarding offload in sja1105, which is that the multicast
addresses are visible (with the "self" flag) in "bridge fdb show".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when we add the first sja1105 port to a bridge with
vlan_filtering 1, then we sometimes see this output:
sja1105 spi2.2: port 4 failed to read back entry for be:79:b4:9e:9e:96 vid 3088: -ENOENT
sja1105 spi2.2: Reset switch and programmed static config. Reason: VLAN filtering
sja1105 spi2.2: port 0 failed to add be:79:b4:9e:9e:96 vid 0 to fdb: -2
It is because sja1105_fdb_add() runs from the dsa_owq which is no longer
serialized with switch resets since it dropped the rtnl_lock() in the
blamed commit.
Either performing the FDB accesses before the reset, or after the reset,
is equally fine, because sja1105_static_fdb_change() backs up those
changes in the static config, but FDB access during reset isn't ok.
Make sja1105_static_config_reload() take the fdb_lock to fix that.
Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sja1105_fdb_add() runs from the dsa_owq, and sja1105_port_mcast_flood()
runs from switchdev_deferred_process_work(). Prior to the blamed commit,
they used to be indirectly serialized through the rtnl_lock(), which
no longer holds true because dsa_owq dropped that.
So, it is now possible that we traverse the static config BLK_IDX_L2_LOOKUP
elements concurrently compared to when we change them, in
sja1105_static_fdb_change(). That is not ideal, since it might result in
data corruption.
Introduce a mutex which serializes accesses to the hardware FDB and to
the static config elements for the L2 Address Lookup table.
I can't find a good reason to add locking around sja1105_fdb_dump().
I'll add it later if needed.
Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
entry
The commit cited in Fixes: did 2 things: it refactored the read-back
polling from sja1105_dynamic_config_read() into a new function,
sja1105_dynamic_config_wait_complete(), and it called that from
sja1105_dynamic_config_write() too.
What is problematic is the refactoring.
The refactored code from sja1105_dynamic_config_poll_valid() works like
the previous one, but the problem is that it uses another packed_buf[]
SPI buffer, and there was code at the end of sja1105_dynamic_config_read()
which was relying on the read-back packed_buf[]:
/* Don't dereference possibly NULL pointer - maybe caller
* only wanted to see whether the entry existed or not.
*/
if (entry)
ops->entry_packing(packed_buf, entry, UNPACK);
After the change, the packed_buf[] that this code sees is no longer the
entry read back from hardware, but the original entry that the caller
passed to the sja1105_dynamic_config_read(), packed into this buffer.
This difference is the most notable with the SJA1105_SEARCH uses from
sja1105pqrs_fdb_add() - used for both fdb and mdb. There, we have logic
added by commit 728db843df88 ("net: dsa: sja1105: ignore the FDB entry
for unknown multicast when adding a new address") to figure out whether
the address we're trying to add matches on any existing hardware entry,
with the exception of the catch-all multicast address.
That logic was broken, because with sja1105_dynamic_config_read() not
working properly, it doesn't return us the entry read back from
hardware, but the entry that we passed to it. And, since for multicast,
a match will always exist, it will tell us that any mdb entry already
exists at index=0 L2 Address Lookup table. It is index=0 because the
caller doesn't know the index - it wants to find it out, and
sja1105_dynamic_config_read() does:
if (index < 0) { // SJA1105_SEARCH
/* Avoid copying a signed negative number to an u64 */
cmd.index = 0; // <- this
cmd.search = true;
} else {
cmd.index = index;
cmd.search = false;
}
So, to the caller of sja1105_dynamic_config_read(), the returned info
looks entirely legit, and it will add all mdb entries to FDB index 0.
There, they will always overwrite each other (not to mention,
potentially they can also overwrite a pre-existing bridge fdb entry),
and the user-visible impact will be that only the last mdb entry will be
forwarded as it should. The others won't (will be flooded or dropped,
depending on the egress flood settings).
Fixing is a bit more complicated, and involves either passing the same
packed_buf[] to sja1105_dynamic_config_wait_complete(), or moving all
the extra processing on the packed_buf[] to
sja1105_dynamic_config_wait_complete(). I've opted for the latter,
because it makes sja1105_dynamic_config_wait_complete() a bit more
self-contained.
Fixes: df405910ab9f ("net: dsa: sja1105: wait for dynamic config command completion on writes too")
Reported-by: Yanan Yang <yanan.yang@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sja1105_dynamic_config_poll_valid()
Currently, sja1105_dynamic_config_wait_complete() returns either 0 or
-ETIMEDOUT, because it just looks at the read_poll_timeout() return code.
There will be future changes which move some more checks to
sja1105_dynamic_config_poll_valid(). It is important that we propagate
their exact return code (-ENOENT, -EINVAL), because callers of
sja1105_dynamic_config_read() depend on them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to
device") has partially hidden some multicast entries from showing up in
the "bridge fdb show" output, but it wasn't enough. Addresses which are
added through "bridge mdb add" still show up. Hide them all.
Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when a new fdb entry is added (with both ports of the
ADIN2111 bridged), the driver configures the MAC filters for the wrong
port, which results in the forwarding being done by the host, and not
actually hardware offloaded.
The ADIN2111 offloads the forwarding by setting filters on the
destination MAC address of incoming frames. Based on these, they may be
routed to the other port. Thus, if a frame has to be forwarded from port
1 to port 2, the required configuration for the ADDR_FILT_UPRn register
should set the APPLY2PORT1 bit (instead of APPLY2PORT2, as it's
currently the case).
Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Ciprian Regus <ciprian.regus@analog.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Syzbot reports the following uninit-value access problem.
=====================================================
BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline]
BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
fill_frame_info net/hsr/hsr_forward.c:601 [inline]
hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560
__dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3087 [inline]
packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:730 [inline]
sock_sendmsg net/socket.c:753 [inline]
__sys_sendto+0x781/0xa30 net/socket.c:2176
__do_sys_sendto net/socket.c:2188 [inline]
__se_sys_sendto net/socket.c:2184 [inline]
__ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
entry_SYSENTER_compat_after_hwframe+0x70/0x82
Uninit was created at:
slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
slab_alloc_node mm/slub.c:3478 [inline]
kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523
kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559
__alloc_skb+0x318/0x740 net/core/skbuff.c:644
alloc_skb include/linux/skbuff.h:1286 [inline]
alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299
sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794
packet_alloc_skb net/packet/af_packet.c:2936 [inline]
packet_snd net/packet/af_packet.c:3030 [inline]
packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:730 [inline]
sock_sendmsg net/socket.c:753 [inline]
__sys_sendto+0x781/0xa30 net/socket.c:2176
__do_sys_sendto net/socket.c:2188 [inline]
__se_sys_sendto net/socket.c:2184 [inline]
__ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
entry_SYSENTER_compat_after_hwframe+0x70/0x82
It is because VLAN not yet supported in hsr driver. Return error
when protocol is ETH_P_8021Q in fill_frame_info() now to fix it.
Fixes: 451d8123f897 ("net: prp: add packet handling support")
Reported-by: syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Hangyu Hua says:
====================
Fix possible OOB write when using rule_buf
ADD bounds checks in bcmasp_netfilt_get_all_active and
mvpp2_ethtool_get_rxnfc and mtk_hwlro_get_fdir_all when
using rule_buf from ethtool_get_rxnfc.
v2:
[PATCH v2 1/3]: use -EMSGSIZE instead of truncating the list sliently.
[PATCH v2 3/3]: drop the brackets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mtk_hwlro_get_fdir_all()
rule_locs is allocated in ethtool_get_rxnfc and the size is determined by
rule_cnt from user space. So rule_cnt needs to be check before using
rule_locs to avoid NULL pointer dereference.
Fixes: 7aab747e5563 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rules is allocated in ethtool_get_rxnfc and the size is determined by
rule_cnt from user space. So rule_cnt needs to be check before using
rules to avoid OOB writing or NULL pointer dereference.
Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rule_locs is allocated in ethtool_get_rxnfc and the size is determined by
rule_cnt from user space. So rule_cnt needs to be check before using
rule_locs to avoid OOB writing or NULL pointer dereference.
Fixes: c5d511c49587 ("net: bcmasp: Add support for wake on net filters")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Setting ethtool -C eth0 tx-usecs 0 is supposed to disable the use of the
coalescing timer but currently it gets programmed with zero delay
instead.
Disable the use of the coalescing timer if tx-usecs is zero by
preventing it from being restarted. Note that to keep things simple we
don't start/stop the timer when the coalescing settings are changed, but
just let that happen on the next transmit or timer expiry.
Fixes: 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Pull drm ci scripts from Dave Airlie:
"This is a bunch of ci integration for the freedesktop gitlab instance
where we currently do upstream userspace testing on diverse sets of
GPU hardware. From my perspective I think it's an experiment worth
going with and seeing how the benefits/noise playout keeping these
files useful.
Ideally I'd like to get this so we can do pre-merge testing on PRs
eventually.
Below is some info from danvet on why we've ended up making the
decision and how we can roll it back if we decide it was a bad plan.
Why in upstream?
- like documentation, testcases, tools CI integration is one of these
things where you can waste endless amounts of time if you
accidentally have a version that doesn't match your source code
- but also like the above, there's a balance, this is the initial cut
of what we think makes sense to keep in sync vs out-of-tree,
probably needs adjustment
- gitlab supports out-of-repo gitlab integration and that's what's
been used for the kernel in drm, but it results in per-driver
fragmentation and lots of duplicated effort. the simple act of
smashing an arbitrary winner into a topic branch already started
surfacing patches on dri-devel and sparking good cross driver team
discussions
Why gitlab?
- it's not any more shit than any of the other CI
- drm userspace uses it extensively for everything in userspace, we
have a lot of people and experience with this, including
integration of hw testing labs
- media userspace like gstreamer is also on gitlab.fd.o, and there's
discussion to extend this to the media subsystem in some fashion
Can this be shared?
- there's definitely a pile of code that could move to scripts/ if
other subsystem adopt ci integration in upstream kernel git. other
bits are more drm/gpu specific like the igt-gpu-tests/tools
integration
- docker images can be run locally or in other CI runners
Will we regret this?
- it's all in one directory, intentionally, for easy deletion
- probably 1-2 years in upstream to see whether this is worth it or a
Big Mistake. that's roughly what it took to _really_ roll out solid
CI in the bigger userspace projects we have on gitlab.fd.o like
mesa3d"
* tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
drm: ci: docs: fix build warning - add missing escape
drm: Add initial ci/ subdirectory
|
|
Guangguan Wang says:
====================
Two fixes for SMC-R
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
smcr_port_add
While doing smcr_port_add, there maybe linkgroup add into or delete
from smc_lgr_list.list at the same time, which may result kernel crash.
So, use smc_lgr_list.lock to protect smc_lgr_list.list iterate in
smcr_port_add.
The crash calltrace show below:
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
Workqueue: events smc_ib_port_event_work [smc]
RIP: 0010:smcr_port_add+0xa6/0xf0 [smc]
RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297
RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918
R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4
R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08
FS: 0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0
PKRU: 55555554
Call Trace:
smc_ib_port_event_work+0x18f/0x380 [smc]
process_one_work+0x19b/0x340
worker_thread+0x30/0x370
? process_one_work+0x340/0x340
kthread+0x114/0x130
? __kthread_cancel_work+0x50/0x50
ret_from_fork+0x1f/0x30
Fixes: 1f90a05d9ff9 ("net/smc: add smcr_port_add() and smcr_link_up() processing")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the macro SMC_STAT_SERV_SUCC_INC, the smcd_version is used
to determin whether to increase the v1 statistic or the v2
statistic. It is correct for SMCD. But for SMCR, smcr_version
should be used.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The access to page pool `cache' array and the `count' variable
is not locked. Page pool cache access is fine as long as there
is only one consumer per pool.
octeontx2 driver fills in rx buffers from page pool in NAPI context.
If system is stressed and could not allocate buffers, refiiling work
will be delegated to a delayed workqueue. This means that there are
two cosumers to the page pool cache.
Either workqueue or IRQ/NAPI can be run on other CPU. This will lead
to lock less access, hence corruption of cache pool indexes.
To fix this issue, NAPI is rescheduled from workqueue context to refill
rx buffers.
Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Inject fault When select CONFIG_VCAP_KUNIT_TEST, the below memory leak
occurs. If kzalloc() for duprule succeeds, but the following
kmemdup() fails, the duprule, ckf and caf memory will be leaked. So kfree
them in the error path.
unreferenced object 0xffff122744c50600 (size 192):
comm "kunit_try_catch", pid 346, jiffies 4294896122 (age 911.812s)
hex dump (first 32 bytes):
10 27 00 00 04 00 00 00 1e 00 00 00 2c 01 00 00 .'..........,...
00 00 00 00 00 00 00 00 18 06 c5 44 27 12 ff ff ...........D'...
backtrace:
[<00000000394b0db8>] __kmem_cache_alloc_node+0x274/0x2f8
[<0000000001bedc67>] kmalloc_trace+0x38/0x88
[<00000000b0612f98>] vcap_dup_rule+0x50/0x460
[<000000005d2d3aca>] vcap_add_rule+0x8cc/0x1038
[<00000000eef9d0f8>] test_vcap_xn_rule_creator.constprop.0.isra.0+0x238/0x494
[<00000000cbda607b>] vcap_api_rule_remove_in_front_test+0x1ac/0x698
[<00000000c8766299>] kunit_try_run_case+0xe0/0x20c
[<00000000c4fe9186>] kunit_generic_run_threadfn_adapter+0x50/0x94
[<00000000f6864acf>] kthread+0x2e8/0x374
[<0000000022e639b3>] ret_from_fork+0x10/0x20
Fixes: 814e7693207f ("net: microchip: vcap api: Add a storage state to a VCAP rule")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
for_each_available_child_of_node performs an of_node_get
on each iteration, so a break out of the loop requires an
of_node_put.
This was done using the Coccinelle semantic patch
iterators/for_each_child.cocci
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
interface names
Starting with v197, systemd uses predictable interface network names,
the traditional interface naming scheme (eth0) is deprecated, therefore
it cannot be assumed that the eth0 interface exists on the host.
This modification makes the bind_bhash test program run in a separate
network namespace and no longer needs to consider the name of the
network interface on the host.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix preemption delays in the SGX code, remove unnecessarily
UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
make the x86 SMP init code a bit more conservative to fix kexec()
lockups"
* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
x86/smp: Don't send INIT to non-present and non-booted CPUs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf event fix from Ingo Molnar:
"Work around a firmware bug in the uncore PMU driver, affecting certain
Intel systems"
* tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Correct the number of CHAs on EMR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"perf tools maintainership:
- Add git information for perf-tools and perf-tools-next trees and
branches to the MAINTAINERS file. That is where development now
takes place and myself and Namhyung Kim have write access, more
people to come as we emulate other maintainer groups.
perf record:
- Record kernel data maps when 'perf record --data' is used, so that
global variables can be resolved and used in tools that do data
profiling.
perf trace:
- Remove the old, experimental support for BPF events in which a .c
file was passed as an event: "perf trace -e hello.c" to then get
compiled and loaded.
The only known usage for that, that shipped with the kernel as an
example for such events, augmented the raw_syscalls tracepoints and
was converted to a libbpf skeleton, reusing all the user space
components and the BPF code connected to the syscalls.
In the end just the way to glue the BPF part and the user space
type beautifiers changed, now being performed by libbpf skeletons.
The next step is to use BTF to do pretty printing of all syscall
types, as discussed with Alan Maguire and others.
Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
path/filenames/strings, some of the networking data structures,
perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls
and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5
seconds:
# perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5
0.000 ( 9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
9.039 ( 0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
? ( ): gpm/991 ... [continued]: clock_nanosleep()) = 0
10.133 ( ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ...
? ( ): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
30.276 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
30.276 (2000.394 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
1230.814 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
1230.814 (1000.404 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
2030.886 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
? ( ): crond/1172 ... [continued]: clock_nanosleep()) = 0
3242.699 ( ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
2030.886 (2000.385 ms): gpm/991 ... [continued]: clock_nanosleep()) = 0
3728.078 ( ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ...
3242.699 (1000.158 ms): pool-gsd-smart/3051 ... [continued]: clock_nanosleep()) = 0
4031.409 ( ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
10.133 (5000.375 ms): sleep/327642 ... [continued]: clock_nanosleep()) = 0
Performance counter stats for 'sleep 5':
2,617,347 cycles
1,855,997 instructions # 0.71 insn per cycle
5.002282128 seconds time elapsed
0.000855000 seconds user
0.000852000 seconds sys
perf annotate:
- Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1)
for licensing reasons, and we missed a build test on
tools/perf/tests makefile.
Since we now default to NDEBUG=1, we ended up segfaulting when
building with BUILD_NONDISTRO=1 because a needed initialization
routine was being "error checked" via an assert.
Fix it by explicitly checking the result and aborting instead if it
fails.
We better back propagate the error, but at least 'perf annotate' on
samples collected for a BPF program is back working when perf is
built with BUILD_NONDISTRO=1.
perf report/top:
- Add back TUI hierarchy mode header, that is seen when using 'perf
report/top --hierarchy'.
- Fix the number of entries for 'e' key in the TUI that was
preventing navigation of lines when expanding an entry.
perf report/script:
- Support cross platform register handling, allowing a perf.data file
collected on one architecture to have registers sampled correctly
displayed when analysis tools such as 'perf report' and 'perf
script' are used on a different architecture.
- Fix handling of event attributes in pipe mode, i.e. when one uses:
perf record -o - | perf report -i -
When no perf.data files are used.
- Handle files generated via pipe mode with a version of perf and
then read also via pipe mode with a different version of perf,
where the event attr record may have changed, use the record size
field to properly support this version mismatch.
perf probe:
- Accessing global variables from uprobes isn't supported, make the
error message state that instead of stating that some minimal
kernel version is needed to have that feature. This seems just a
tool limitation, the kernel probably has all that is needed.
perf tests:
- Fix a reference count related leak in the dlfilter v0 API where the
result of a thread__find_symbol_fb() is not matched with an
addr_location__exit() to drop the reference counts of the resolved
components (machine, thread, map, symbol, etc). Add a dlfilter test
to make sure that doesn't regresses.
- Lots of fixes for the 'perf test' written in shell script related
to problems found with the shellcheck utility.
- Fixes for 'perf test' shell scripts testing features enabled when
perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
counters.
- Add perf record sample filtering test, things like the following
example, that gets implemented as a BPF filter attached to the
event:
# perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000'
- Improve the way the task_analyzer test checks if libtraceevent is
linked, using 'perf version --build-options' instead of the more
expensinve 'perf record -e "sched:sched_switch"'.
- Add support for riscv in the mmap-basic test. (This went as well
via the RiscV tree, same contents).
libperf:
- Implement riscv mmap support (This went as well via the RiscV tree,
same contents).
perf script:
- New tool that converts perf.data files to the firefox profiler
format so that one can use the visualizer at
https://profiler.firefox.com/. Done by Anup Sharma as part of this
year's Google Summer of Code.
One can generate the output and upload it to the web interface but
Anup also automated everything:
perf script gecko -F 99 -a sleep 60
- Support syscall name parsing on arm64.
- Print "cgroup" field on the same line as "comm".
perf bench:
- Add new 'uprobe' benchmark to measure the overhead of uprobes
with/without BPF programs attached to it.
- breakpoints are not available on power9, skip that test.
perf stat:
- Add #num_cpus_online literal to be used in 'perf stat' metrics, and
add this extra 'perf test' check that exemplifies its purpose:
TEST_ASSERT_VAL("#num_cpus_online",
expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0);
TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online);
Miscellaneous:
- Improve tool startup time by lazily reading PMU, JSON, sysfs data.
- Improve error reporting in the parsing of events, passing YYLTYPE
to error routines, so that the output can show were the parsing
error was found.
- Add 'perf test' entries to check the parsing of events
improvements.
- Fix various leak for things detected by -fsanitize=address, mostly
things that would be freed at tool exit, including:
- Free evsel->filter on the destructor.
- Allow tools to register a thread->priv destructor and use it in
'perf trace'.
- Free evsel->priv in 'perf trace'.
- Free string returned by synthesize_perf_probe_point() when the
caller fails to do all it needs.
- Adjust various compiler options to not consider errors some
warnings when building with broken headers found in things like
python, flex, bison, as we otherwise build with -Werror. Some for
gcc, some for clang, some for some specific version of those, some
for some specific version of flex or bison, or some specific
combination of these components, bah.
- Allow customization of clang options for BPF target, this helps
building on gentoo where there are other oddities where BPF targets
gets passed some compiler options intended for the native build, so
building with WERROR=0 helps while these oddities are fixed.
- Dont pass ERR_PTR() values to perf_session__delete() in 'perf top'
and 'perf lock', fixing some segfaults when handling some odd
failures.
- Add LTO build option.
- Fix format of unordered lists in the perf docs
(tools/perf/Documentation)
- Overhaul the bison files, using constructs such as YYNOMEM.
- Remove unused tokens from the bison .y files.
- Add more comments to various structs.
- A few LoongArch enablement patches.
Vendor events (JSON):
- Add JSON metrics for Yitian 710 DDR (aarch64). Things like:
EventName, BriefDescription
visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.",
op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.",
op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",
- Add AmpereOne metrics (aarch64).
- Update N2 and V2 metrics (aarch64) and events using Arm telemetry
repo.
- Update scale units and descriptions of common topdown metrics on
aarch64. Things like:
- "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
- "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
- Update events for intel: meteorlake to 1.04, sapphirerapids to
1.15, Icelake+ metric constraints.
- Update files for the power10 platform"
* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
perf parse-events: Fix driver config term
perf parse-events: Fixes relating to no_value terms
perf parse-events: Fix propagation of term's no_value when cloning
perf parse-events: Name the two term enums
perf list: Don't print Unit for "default_core"
perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
perf dlfilter: Avoid leak in v0 API test use of resolve_address()
perf metric: Add #num_cpus_online literal
perf pmu: Remove str from perf_pmu_alias
perf parse-events: Make common term list to strbuf helper
perf parse-events: Minor help message improvements
perf pmu: Avoid uninitialized use of alias->str
perf jevents: Use "default_core" for events with no Unit
perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
perf test shell stat_bpf_counters: Fix test on Intel
perf test shell record_bpf_filter: Skip 6.2 kernel
libperf: Get rid of attr.id field
perf tools: Convert to perf_record_header_attr_id()
libperf: Add perf_record_header_attr_id()
perf tools: Handle old data in PERF_RECORD_ATTR
...
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- six smb3 client fixes including ones to allow controlling smb3
directory caching timeout and limits, and one debugging improvement
- one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)
- one minor spnego registry update
* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
spnego: add missing OID to oid registry
smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
cifs: update internal module version number for cifs.ko
smb3: allow controlling maximum number of cached directories
smb3: add trace point for queryfs (statfs)
nls: Hide new NLS_UCS2_UTILS
smb3: allow controlling length of time directory entries are cached with dir leases
smb: propagate error code of extract_sharename()
|