Age | Commit message (Collapse) | Author |
|
This patch initializes debugfs and creates root directory
for each device. The tx_ring and rx_ring debugfs files
are implemented together.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216040532.1566229-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Oleksij Rempel says:
====================
lan78xx: Preparations for PHYlink
This patch set is a third part of the preparatory work for migrating
the lan78xx USB Ethernet driver to the PHYlink framework. During
extensive testing, I observed that resetting the USB adapter can lead to
various read/write errors. While the errors themselves are acceptable,
they generate excessive log messages, resulting in significant log spam.
This set improves error handling to reduce logging noise by addressing
errors directly and returning early when necessary.
====================
Link: https://patch.msgid.link/20241216120941.1690908-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enhance error handling in Wake-on-LAN (WoL) operations:
- Log a warning in `lan78xx_get_wol` if `lan78xx_read_reg` fails.
- Check and handle errors from `device_set_wakeup_enable` and
`phy_ethtool_set_wol` in `lan78xx_set_wol`.
- Ensure proper cleanup with a unified error handling path.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241216120941.1690908-7-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove PHY register handling from `lan78xx_get_regs` and
`lan78xx_get_regs_len`. Since the controller can have different PHYs
attached, the first 32 registers are not universally relevant or the
most interesting. Simplify the implementation to focus on MAC and device
registers.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241216120941.1690908-6-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename `phy_mutex` to `mdiobus_mutex` for clarity, as the mutex protects
MDIO bus access rather than PHY-specific operations. Update all
references to ensure consistency.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241216120941.1690908-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename the generic `done` label to the action-specific `exit_unlock`
label in `lan78xx_mac_reset`. This improves clarity by indicating the
specific cleanup action (mutex unlock) and aligns with best practices
for error handling and cleanup labels.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20241216120941.1690908-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update lan78xx_stop_hw to return -ETIMEDOUT instead of -ETIME when
a timeout occurs. While -ETIME indicates a general timer expiration,
-ETIMEDOUT is more commonly used for signaling operation timeouts and
provides better consistency with standard error handling in the driver.
The -ETIME checks in tx_complete() and rx_complete() are unrelated to
this error handling change. In these functions, the error values are derived
from urb->status, which reflects USB transfer errors. The error value from
lan78xx_stop_hw will be exposed in the following cases:
- usb_driver::suspend
- net_device_ops::ndo_stop (potentially, though currently the return value
is not used).
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20241216120941.1690908-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update `lan78xx_get_regs` to handle errors during register and PHY
reads. Log warnings for failed reads and exit the function early if an
error occurs. Drop all previously logged registers to signal
inconsistent readings to the user space. This ensures that invalid data
is not returned to users.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241216120941.1690908-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We are close to removing page->index. Use page->private instead, which
is least likely to be removed.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://patch.msgid.link/20241216155124.3114-1-willy@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Benefit from the recent conversion of the driver to NAPI and enable GRO
support through the use of napi_gro_receive(). Pass the NAPI pointer
from the bus driver (mlxsw_pci) to the switch driver (mlxsw_spectrum)
through the skb control block where various packet metadata is already
encoded.
The main motivation is to improve forwarding performance through the use
of GRO fraglist [1]. In my testing, when the forwarding data path is
simple (routing between two ports) there is not much difference in
forwarding performance between GRO disabled and GRO enabled with
fraglist.
The improvement becomes more noticeable as the data path becomes more
complex since it is traversed less times with GRO enabled. For example,
with 10 ingress and 10 egress flower filters with different priorities
on the two ports between which routing is performed, there is an
improvement of about 140% in forwarded bandwidth.
[1] https://lore.kernel.org/netdev/20200125102645.4782-1-steffen.klassert@secunet.com/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/21258fe55f608ccf1ee2783a5a4534220af28903.1734354812.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
inetpeer: reduce false sharing and atomic operations
After commit 8c2bd38b95f7 ("icmp: change the order of rate limits"),
there is a risk that a host receiving packets from an unique
source targeting closed ports is using a common inet_peer structure
from many cpus.
All these cpus have to acquire/release a refcount and update
the inet_peer timestamp (p->dtime)
Switch to pure RCU to avoid changing the refcount, and update
p->dtime only once per jiffy.
Tested:
DUT : 128 cores, 32 hw rx queues.
receiving 8,400,000 UDP packets per second, targeting closed ports.
Before the series:
- napi poll can not keep up, NIC drops 1,200,000 packets
per second.
- We use 20 % of cpu cycles
After this series:
- All packets are received (no more hw drops)
- We use 12 % of cpu cycles.
v1: https://lore.kernel.org/20241213130212.1783302-1-edumazet@google.com
====================
Link: https://patch.msgid.link/20241215175629.1248773-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All inet_getpeer() callers except ip4_frag_init() don't need
to acquire a permanent refcount on the inetpeer.
They can switch to full RCU protection.
Move the refcount_inc_not_zero() into ip4_frag_init(),
so that all the other callers no longer have to
perform a pair of expensive atomic operations on
a possibly contended cache line.
inet_putpeer() no longer needs to be exported.
After this patch, my DUT can receive 8,400,000 UDP packets
per second targeting closed ports, using 50% less cpu cycles
than before.
Also change two calls to l3mdev_master_ifindex() by
l3mdev_master_ifindex_rcu() (Ido ideas)
Fixes: 8c2bd38b95f7 ("icmp: change the order of rate limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet_putpeer() will be removed in the following patch,
because we will no longer use refcounts.
Update inetpeer timestamp (p->dtime) at lookup time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All callers of inet_getpeer() want to create an inetpeer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All callers of inet_getpeer_v4() and inet_getpeer_v6()
want to create an inetpeer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241215175629.1248773-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Thomas Weißschuh says:
====================
net: constify 'struct bin_attribute'
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
====================
Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-0-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-4-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-2-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241216-sysfs-const-bin_attr-net-v1-1-ec460b91f274@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Xiao Liang reported that the cited commit changed netns handling
in newlink() of netkit, veth, and vxcan.
Before the patch, if we don't find a netns attribute in the peer
device attributes, we tried to find another netns attribute in
the outer netlink attributes by passing it to rtnl_link_get_net().
Let's restore the original behaviour.
Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()")
Reported-by: Xiao Liang <shaw.leon@gmail.com>
Closes: https://lore.kernel.org/netdev/CABAhCORBVVU8P6AHcEkENMj+gD2d3ce9t=A_o48E0yOQp8_wUQ@mail.gmail.com/#t
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Tested-by: Xiao Liang <shaw.leon@gmail.com>
Link: https://patch.msgid.link/20241216110432.51488-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nsim_pp_hold_write() has two problems:
1) It may return with rtnl held, as found by syzbot.
2) Its return value does not propagate an error if any.
Fixes: 1580cbcbfe77 ("net: netdevsim: add some fake page pool use")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241216083703.1859921-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
page_pool_is_last_ref() releases a reference while the name,
to me at least, suggests it just checks if the refcount is 1.
The semantics of the function are the same as those of
atomic_dec_and_test() and refcount_dec_and_test(), so just
use the _and_test() suffix.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://patch.msgid.link/20241215212938.99210-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On x86-64 calling bpf_get_smp_processor_id() in a kernel with CONFIG_SMP
disabled can trigger the following bug, as pcpu_hot is unavailable:
[ 8.471774] BUG: unable to handle page fault for address: 00000000936a290c
[ 8.471849] #PF: supervisor read access in kernel mode
[ 8.471881] #PF: error_code(0x0000) - not-present page
Fix by inlining a return 0 in the !CONFIG_SMP case.
Fixes: 1ae6921009e5 ("bpf: inline bpf_get_smp_processor_id() helper")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241217195813.622568-1-arighi@nvidia.com
|
|
The Hexagon-specific constant extender optimization in LLVM may crash on
Linux kernel code [1], such as fs/bcache/btree_io.c after
commit 32ed4a620c54 ("bcachefs: Btree path tracepoints") in 6.12:
clang: llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp:745: bool (anonymous namespace)::HexagonConstExtenders::ExtRoot::operator<(const HCE::ExtRoot &) const: Assertion `ThisB->getParent() == OtherB->getParent()' failed.
Stack dump:
0. Program arguments: clang --target=hexagon-linux-musl ... fs/bcachefs/btree_io.c
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'fs/bcachefs/btree_io.c'.
4. Running pass 'Hexagon constant-extender optimization' on function '@__btree_node_lock_nopath'
Without assertions enabled, there is just a hang during compilation.
This has been resolved in LLVM main (20.0.0) [2] and backported to LLVM
19.1.0 but the kernel supports LLVM 13.0.1 and newer, so disable the
constant expander optimization using the '-mllvm' option when using a
toolchain that is not fixed.
Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/issues/99714 [1]
Link: https://github.com/llvm/llvm-project/commit/68df06a0b2998765cb0a41353fcf0919bbf57ddb [2]
Link: https://github.com/llvm/llvm-project/commit/2ab8d93061581edad3501561722ebd5632d73892 [3]
Reviewed-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a race condition between exiting wb_on_itr and completion write
backs. For example, we are in wb_on_itr mode and a Tx completion is
generated by HW, ready to be written back, as we are re-enabling
interrupts:
HW SW
| |
| | idpf_tx_splitq_clean_all
| | napi_complete_done
| |
| tx_completion_wb | idpf_vport_intr_update_itr_ena_irq
That tx_completion_wb happens before the vector is fully re-enabled.
Continuing with this example, it is a UDP stream and the
tx_completion_wb is the last one in the flow (there are no rx packets).
Because the HW generated the completion before the interrupt is fully
enabled, the HW will not fire the interrupt once the timer expires and
the write back will not happen. NAPI poll won't be called. We have
indicated we're back in interrupt mode but nothing else will trigger the
interrupt. Therefore, the completion goes unprocessed, triggering a Tx
timeout.
To mitigate this, fire a SW triggered interrupt upon exiting wb_on_itr.
This interrupt will catch the rogue completion and avoid the timeout.
Add logic to set the appropriate bits in the vector's dyn_ctl register.
Fixes: 9c4a27da0ecc ("idpf: enable WB_ON_ITR")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently the pending_async_copies count is decremented just
before a struct nfsd4_copy is destroyed. After commit aa0ebd21df9c
("NFSD: Add nfsd4_copy time-to-live") nfsd4_copy structures sticks
around for 10 lease periods after the COPY itself has completed,
the pending_async_copies count stays high for a long time. This
causes NFSD to avoid the use of background copy even though the
actual background copy workload might no longer be running.
In this patch, decrement pending_async_copies once async copy thread
is done processing the copy work.
Fixes: aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
SW triggered interrupts are guaranteed to fire after their timer
expires, unlike Tx and Rx interrupts which will only fire after the
timer expires _and_ a descriptor write back is available to be processed
by the driver.
Add the necessary fields, defines, and initializations to enable a SW
triggered interrupt in the vector's dyn_ctl register.
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In accordance with LicheePi 4A BSP the clock that comes to emmc/sdhci
is 198Mhz which is got through frequency division of source clock
VIDEO PLL by 4 [1].
But now the AP_SUBSYS driver sets the CLK EMMC SDIO to the same
frequency as the VIDEO PLL, equal to 792 MHz. This causes emmc/sdhci
to work 4 times slower.
Let's fix this issue by adding fixed factor clock that divides
VIDEO PLL by 4 for emmc/sdhci.
Link: https://github.com/revyos/thead-kernel/blob/7563179071a314f41cdcdbfd8cf6e101e73707f3/drivers/clk/thead/clk-light-fm.c#L454
Fixes: ae81b69fd2b1 ("clk: thead: Add support for T-Head TH1520 AP_SUBSYS clocks")
Signed-off-by: Maksim Kiselev <bigunclemax@gmail.com>
Link: https://lore.kernel.org/r/20241210083029.92620-1-bigunclemax@gmail.com
Tested-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Drew Fustini <dfustini@tenstorrent.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Add a devlink health reporter for MDD events. The 'dump' handler will
return the information captured in each call to ice_handle_mdd_event().
A device reset (CORER/PFR) will put the reporter back in healthy state.
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Co-developed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add Tx hang devlink health reporter, see struct ice_tx_hang_event to see
what exactly is reported. For now dump descriptors with little metadata
and skb diagnostic information.
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Set the cache-line-size parameter of the L2 cache for each core to the
correct value of 64 bytes.
Previously, the L2 cache line size was incorrectly set to 128 bytes
for the Broadcom BCM2712. This causes validation tests for the
Performance Application Programming Interface (PAPI) tool to fail as
they depend on sysfs accurately reporting cache line sizes.
The correct value of 64 bytes is stated in the official documentation of
the ARM Cortex A-72, which is linked in the comments of
arm64/boot/dts/broadcom/bcm2712.dtsi as the source for cache-line-size.
Fixes: faa3381267d0 ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5")
Signed-off-by: Willow Cunningham <willow.e.cunningham@maine.edu>
Link: https://lore.kernel.org/r/20241007212954.214724-1-willow.e.cunningham@maine.edu
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
|
|
[BUG]
There is a bug report in the mailing list where btrfs_run_delayed_refs()
failed to drop the ref count for logical 25870311358464 num_bytes
2113536.
The involved leaf dump looks like this:
item 166 key (25870311358464 168 2113536) itemoff 10091 itemsize 50
extent refs 1 gen 84178 flags 1
ref#0: shared data backref parent 32399126528000 count 0 <<<
ref#1: shared data backref parent 31808973717504 count 1
Notice the count number is 0.
[CAUSE]
There is no concrete evidence yet, but considering 0 -> 1 is also a
single bit flipped, it's possible that hardware memory bitflip is
involved, causing the on-disk extent tree to be corrupted.
[FIX]
To prevent us reading such corrupted extent item, or writing such
damaged extent item back to disk, enhance the handling of
BTRFS_EXTENT_DATA_REF_KEY and BTRFS_SHARED_DATA_REF_KEY keys for both
inlined and key items, to detect such 0 ref count and reject them.
CC: stable@vger.kernel.org # 5.4+
Link: https://lore.kernel.org/linux-btrfs/7c69dd49-c346-4806-86e7-e6f863a66f48@app.fastmail.com/
Reported-by: Frankie Fisher <frankie@terrorise.me.uk>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Btrfs like other file systems can't really deal with I/O not aligned to
it's internal block size (which strangely is called sector size in
btrfs, for historical reasons), but the block layer split helper doesn't
even know about that.
Round down the split boundary so that all I/Os are aligned.
Fixes: d5e4377d5051 ("btrfs: split zone append bios in btrfs_submit_bio")
CC: stable@vger.kernel.org # 6.12
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Otherwise it won't catch bios turned into regular writes by the block
level zone write plugging. The additional test it adds is for emulated
zone append.
Fixes: 9b1ce7f0c6f8 ("block: Implement zone append emulation")
CC: stable@vger.kernel.org # 6.12
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We have been using the following check
if (generation <= root->root_key.offset)
to make decisions about whether or not to visit a node during snapshot
delete. This is because for normal subvolumes this is set to 0, and for
snapshots it's set to the creation generation. The idea being that if
the generation of the node is less than or equal to our creation
generation then we don't need to visit that node, because it doesn't
belong to us, we can simply drop our reference and move on.
However reloc roots don't have their generation stored in
root->root_key.offset, instead that is the objectid of their
corresponding fs root. This means we can incorrectly not walk into
nodes that need to be dropped when deleting a reloc root.
There are a variety of consequences to making the wrong choice in two
distinct areas.
visit_node_for_delete()
1. False positive. We think we are newer than the block when we really
aren't. We don't visit the node and drop our reference to the node
and carry on. This would result in leaked space.
2. False negative. We do decide to walk down into a block that we
should have just dropped our reference to. However this means that
the child node will have refs > 1, so we will switch to
UPDATE_BACKREF, and then the subsequent walk_down_proc() will notice
that btrfs_header_owner(node) != root->root_key.objectid and it'll
break out of the loop, and then walk_up_proc() will drop our reference,
so this appears to be ok.
do_walk_down()
1. False positive. We are in UPDATE_BACKREF and incorrectly decide that
we are done and don't need to update the backref for our lower nodes.
This is another case that simply won't happen with relocation, as we
only have to do UPDATE_BACKREF if the node below us was shared and
didn't have FULL_BACKREF set, and since we don't own that node
because we're a reloc root we actually won't end up in this case.
2. False negative. Again this is tricky because as described above, we
simply wouldn't be here from relocation, because we don't own any of
the nodes because we never set btrfs_header_owner() to the reloc root
objectid, and we always use FULL_BACKREF, we never actually need to
set FULL_BACKREF on any children.
Having spent a lot of time stressing relocation/snapshot delete recently
I've not seen this pop in practice. But this is objectively incorrect,
so fix this to get the correct starting generation based on the root
we're dropping to keep me from thinking there's a problem here.
CC: stable@vger.kernel.org
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Drop "devlink_" prefix from files that sit in devlink/.
I'm going to add more files there, and repeating "devlink" does not feel
good. This is also the scheme used in most other places, most notably the
devlink core files are named like that.
devlink.[ch] stays as is.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add devlink_fmsg_dump_skb() function that adds some diagnostic
information about skb (like length, pkt type, MAC, etc) to devlink
fmsg mechanism using bunch of devlink_fmsg_put() function calls.
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add devlink_fmsg_put() that dispatches based on the type
of the value to put, example: bool -> devlink_fmsg_bool_pair_put().
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Improve CamelCase recognition logic to avoid reporting on
_Generic() use.
Other C keywords, such as _Bool, are intentionally omitted, as those
should be rather avoided in new source code.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The 'select FB_CORE' statement moved from CONFIG_DRM to DRM_CLIENT_LIB,
but there are now configurations that have code calling into fb_core
as built-in even though the client_lib itself is a loadable module:
x86_64-linux-ld: drivers/gpu/drm/drm_fb_helper.o: in function `drm_fb_helper_set_suspend':
drm_fb_helper.c:(.text+0x2c6): undefined reference to `fb_set_suspend'
x86_64-linux-ld: drivers/gpu/drm/drm_fb_helper.o: in function `drm_fb_helper_resume_worker':
drm_fb_helper.c:(.text+0x2e1): undefined reference to `fb_set_suspend'
In addition to DRM_CLIENT_LIB, the 'select' needs to be at least in
DRM_KMS_HELPER and DRM_GEM_SHMEM_HELPER, so add it here.
This patch is the KMS_HELPER part of [1].
Fixes: dadd28d4142f ("drm/client: Add client-lib module")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/series/141411/ # 1
Link: https://patchwork.freedesktop.org/patch/msgid/20241216074450.8590-4-tzimmermann@suse.de
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
- Always try to initialize the idle functions when graph tracer starts
A bug was found that when a CPU is offline when graph tracing starts
and then comes online, that CPU is not traced. The fix to that was to
move the initialization of the idle shadow stack over to the hot plug
online logic, which also handle onlined CPUs. The issue was that it
removed the initialization of the shadow stack when graph tracing
starts, but the callbacks to the hot plug logic do nothing if graph
tracing isn't currently running. Although that fix fixed the onlining
of a CPU during tracing, it broke the CPUs that were already online.
- Have microblaze not try to get the "true parent" in function tracing
If function tracing and graph tracing are both enabled at the same
time the parent of the functions traced by the function tracer may
sometimes be the graph tracing trampoline. The graph tracing hijacks
the return pointer of the function to trace it, but that can
interfere with the function tracing parent output.
This was fixed by using the ftrace_graph_ret_addr() function passing
in the kernel stack pointer using the ftrace_regs_get_stack_pointer()
function. But Al Viro reported that Microblaze does not implement the
kernel_stack_pointer(regs) helper function that
ftrace_regs_get_stack_pointer() uses and fails to compile when
function graph tracing is enabled.
It was first thought that this was a microblaze issue, but the real
cause is that this only works when an architecture implements
HAVE_DYNAMIC_FTRACE_WITH_ARGS, as a requirement for that config is to
have ftrace always pass a valid ftrace_regs to the callbacks. That
also means that the architecture supports
ftrace_regs_get_stack_pointer()
Microblaze does not set HAVE_DYNAMIC_FTRACE_WITH_ARGS nor does it
implement ftrace_regs_get_stack_pointer() which caused it to fail to
build. Only implement the "true parent" logic if an architecture has
that config set"
* tag 'ftrace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Do not find "true_parent" if HAVE_DYNAMIC_FTRACE_WITH_ARGS is not set
fgraph: Still initialize idle shadow stacks when starting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix DirectMap accounting in /proc/meminfo file
- Fix strscpy() return code handling that led to "unsigned 'len' is
never less than zero" warning
- Fix the calculation determining whether to use three- or four-level
paging: account KMSAN modules metadata
* tag 's390-6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: Consider KMSAN modules metadata for paging levels
s390/ipl: Fix never less than zero warning
s390/mm: Fix DirectMap accounting
|
|
Select FB_CORE if GEM's DMA and TTM implementations support fbdev
emulation. Fixes linker errors about missing symbols from the fbdev
subsystem.
Also see [1] for a related SHMEM fix.
Fixes: dadd28d4142f ("drm/client: Add client-lib module")
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/series/141411/ # 1
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241216074450.8590-3-tzimmermann@suse.de
|
|
Do not select BACKLIGHT_CLASS_DEVICE from FB_BACKLIGHT. The latter
only controls backlight support within fbdev core code and data
structures.
Make fbdev drivers depend on BACKLIGHT_CLASS_DEVICE and let users
select it explicitly. Fixes warnings about recursive dependencies,
such as
error: recursive dependency detected!
symbol BACKLIGHT_CLASS_DEVICE is selected by FB_BACKLIGHT
symbol FB_BACKLIGHT is selected by FB_SH_MOBILE_LCDC
symbol FB_SH_MOBILE_LCDC depends on FB_DEVICE
symbol FB_DEVICE depends on FB_CORE
symbol FB_CORE is selected by DRM_GEM_DMA_HELPER
symbol DRM_GEM_DMA_HELPER is selected by DRM_PANEL_ILITEK_ILI9341
symbol DRM_PANEL_ILITEK_ILI9341 depends on BACKLIGHT_CLASS_DEVICE
BACKLIGHT_CLASS_DEVICE is user-selectable, so making drivers adapt to
it is the correct approach in any case. For most drivers, backlight
support is also configurable separately.
v3:
- Select BACKLIGHT_CLASS_DEVICE in PowerMac defconfigs (Christophe)
- Fix PMAC_BACKLIGHT module dependency corner cases (Christophe)
v2:
- s/BACKLIGHT_DEVICE_CLASS/BACKLIGHT_CLASS_DEVICE (Helge)
- Fix fbdev driver-dependency corner case (Arnd)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241216074450.8590-2-tzimmermann@suse.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"The first one fixes a syzbot UAF report caused by a commit introduced
in this cycle, but it also addresses a longstanding memory leak. The
second one resolves a PSI memstall mis-accounting issue.
The remaining patches switch file-backed mounts to use buffered I/Os
by default instead of direct I/Os, since the page cache of underlay
files is typically valid and maybe even dirty. This change also aligns
with the default policy of loopback devices. A mount option has been
added to try to use direct I/Os explicitly.
Summary:
- Fix (pcluster) memory leak and (sbi) UAF after umounting
- Fix a case of PSI memstall mis-accounting
- Use buffered I/Os by default for file-backed mounts"
* tag 'erofs-for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: use buffered I/O for file-backed mounts by default
erofs: reference `struct erofs_device_info` for erofs_map_dev
erofs: use `struct erofs_device_info` for the primary device
erofs: add erofs_sb_free() helper
MAINTAINERS: erofs: update Yue Hu's email address
erofs: fix PSI memstall accounting
erofs: fix rare pcluster memory leak after unmounting
|
|
the lock->wait_lock
Bert reported seeing occasional boot hangs when running with
PREEPT_RT and bisected it down to commit 894d1b3db41c
("locking/mutex: Remove wakeups from under mutex::wait_lock").
It looks like I missed a few spots where we drop the wait_lock and
potentially call into schedule without waking up the tasks on the
wake_q structure. Since the tasks being woken are ww_mutex tasks
they need to be able to run to release the mutex and unblock the
task that currently is planning to wake them. Thus we can deadlock.
So make sure we wake the wake_q tasks when we unlock the wait_lock.
Closes: https://lore.kernel.org/lkml/20241211182502.2915-1-spasswolf@web.de
Fixes: 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241212222138.2400498-1-jstultz@google.com
|
|
The only difference between 5 and 6 is the new counters snapshotting
group, without the following counters snapshotting enabling patches,
it's impossible to utilize the feature in a PEBS record. It's safe to
share the same code path with format 5.
Add format 6, so the end user can at least utilize the legacy PEBS
features.
Fixes: a932aa0e868f ("perf/x86: Add Lunar Lake and Arrow Lake support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241216204505.748363-1-kan.liang@linux.intel.com
|
|
From the perspective of the uncore PMU, the Clearwater Forest is the
same as the previous Sierra Forest. The only difference is the event
list, which will be supported in the perf tool later.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241211161146.235253-1-kan.liang@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fix from Kees Cook:
"Silence a GCC value-range warning that is being ironically triggered
by bounds checking"
* tag 'hardening-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fortify: Hide run-time copy size from value range tracking
|
|
The TP_printk() portion of a trace event is executed at the time a event
is read from the trace. This can happen seconds, minutes, hours, days,
months, years possibly later since the event was recorded. If the print
format contains a dereference to a string via "%s", and that string was
allocated, there's a chance that string could be freed before it is read
by the trace file.
To protect against such bugs, there are two functions that verify the
event. The first one is test_event_printk(), which is called when the
event is created. It reads the TP_printk() format as well as its arguments
to make sure nothing may be dereferencing a pointer that was not copied
into the ring buffer along with the event. If it is, it will trigger a
WARN_ON().
For strings that use "%s", it is not so easy. The string may not reside in
the ring buffer but may still be valid. Strings that are static and part
of the kernel proper which will not be freed for the life of the running
system, are safe to dereference. But to know if it is a pointer to a
static string or to something on the heap can not be determined until the
event is triggered.
This brings us to the second function that tests for the bad dereferencing
of strings, trace_check_vprintf(). It would walk through the printf format
looking for "%s", and when it finds it, it would validate that the pointer
is safe to read. If not, it would produces a WARN_ON() as well and write
into the ring buffer "[UNSAFE-MEMORY]".
The problem with this is how it used va_list to have vsnprintf() handle
all the cases that it didn't need to check. Instead of re-implementing
vsnprintf(), it would make a copy of the format up to the %s part, and
call vsnprintf() with the current va_list ap variable, where the ap would
then be ready to point at the string in question.
For architectures that passed va_list by reference this was possible. For
architectures that passed it by copy it was not. A test_can_verify()
function was used to differentiate between the two, and if it wasn't
possible, it would disable it.
Even for architectures where this was feasible, it was a stretch to rely
on such a method that is undocumented, and could cause issues later on
with new optimizations of the compiler.
Instead, the first function test_event_printk() was updated to look at
"%s" as well. If the "%s" argument is a pointer outside the event in the
ring buffer, it would find the field type of the event that is the problem
and mark the structure with a new flag called "needs_test". The event
itself will be marked by TRACE_EVENT_FL_TEST_STR to let it be known that
this event has a field that needs to be verified before the event can be
printed using the printf format.
When the event fields are created from the field type structure, the
fields would copy the field type's "needs_test" value.
Finally, before being printed, a new function ignore_event() is called
which will check if the event has the TEST_STR flag set (if not, it
returns false). If the flag is set, it then iterates through the events
fields looking for the ones that have the "needs_test" flag set.
Then it uses the offset field from the field structure to find the pointer
in the ring buffer event. It runs the tests to make sure that pointer is
safe to print and if not, it triggers the WARN_ON() and also adds to the
trace output that the event in question has an unsafe memory access.
The ignore_event() makes the trace_check_vprintf() obsolete so it is
removed.
Link: https://lore.kernel.org/all/CAHk-=wh3uOnqnZPpR0PeLZZtyWbZLboZ7cHLCKRWsocvs9Y7hQ@mail.gmail.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20241217024720.848621576@goodmis.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|