Age | Commit message (Collapse) | Author |
|
While tracking an IDPF bug, I found that idpf_vport_splitq_napi_poll()
was not following NAPI rules.
It can indeed return @budget after napi_complete() has been called.
Add two debug conditions in networking core to hopefully catch
this kind of bugs sooner.
IDPF bug will be fixed in a separate patch.
[ 72.441242] repoll requested for device eth1 idpf_vport_splitq_napi_poll [idpf] but napi is not scheduled.
[ 72.446291] list_del corruption. next->prev should be ff31783d93b14040, but was ff31783d93b10080. (next=ff31783d93b10080)
[ 72.446659] kernel BUG at lib/list_debug.c:67!
[ 72.446816] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 72.447031] CPU: 156 UID: 0 PID: 16258 Comm: ip Tainted: G W 6.15.0-dbg-DEV #1944 NONE
[ 72.447340] Tainted: [W]=WARN
[ 72.447702] RIP: 0010:__list_del_entry_valid_or_report (lib/list_debug.c:65)
[ 72.450630] Call Trace:
[ 72.450720] <IRQ>
[ 72.450797] net_rx_action (include/linux/list.h:215 include/linux/list.h:287 net/core/dev.c:7385 net/core/dev.c:7516)
[ 72.450928] ? lock_release (kernel/locking/lockdep.c:?)
[ 72.451059] ? clockevents_program_event (kernel/time/clockevents.c:?)
[ 72.451222] handle_softirqs (kernel/softirq.c:579)
[ 72.451356] ? do_softirq (kernel/softirq.c:480)
[ 72.451480] ? idpf_vc_xn_exec (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:462) idpf
[ 72.451635] do_softirq (kernel/softirq.c:480)
[ 72.451750] </IRQ>
[ 72.451828] <TASK>
[ 72.451905] __local_bh_enable_ip (kernel/softirq.c:?)
[ 72.452051] idpf_vc_xn_exec (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:462) idpf
[ 72.452210] idpf_send_delete_queues_msg (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:2083) idpf
[ 72.452390] idpf_vport_stop (drivers/net/ethernet/intel/idpf/idpf_lib.c:837 drivers/net/ethernet/intel/idpf/idpf_lib.c:868) idpf
[ 72.452541] ? idpf_vport_stop (include/linux/bottom_half.h:? include/linux/netdevice.h:4762 drivers/net/ethernet/intel/idpf/idpf_lib.c:855) idpf
[ 72.452695] idpf_initiate_soft_reset (drivers/net/ethernet/intel/idpf/idpf_lib.c:?) idpf
[ 72.452867] idpf_change_mtu (drivers/net/ethernet/intel/idpf/idpf_lib.c:2189) idpf
[ 72.453015] netif_set_mtu_ext (net/core/dev.c:9437)
[ 72.453157] ? packet_notifier (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/packet/af_packet.c:4240)
[ 72.453292] netif_set_mtu (net/core/dev.c:9515)
[ 72.453416] dev_set_mtu (net/core/dev_api.c:?)
[ 72.453534] bond_change_mtu (drivers/net/bonding/bond_main.c:4833)
[ 72.453666] netif_set_mtu_ext (net/core/dev.c:9437)
[ 72.453803] do_setlink (net/core/rtnetlink.c:3116)
[ 72.453925] ? rtnl_newlink (net/core/rtnetlink.c:3901)
[ 72.454055] ? rtnl_newlink (net/core/rtnetlink.c:3901)
[ 72.454185] ? rtnl_newlink (net/core/rtnetlink.c:3901)
[ 72.454314] ? trace_contention_end (include/trace/events/lock.h:122)
[ 72.454467] ? __mutex_lock (arch/x86/include/asm/preempt.h:85 kernel/locking/mutex.c:611 kernel/locking/mutex.c:746)
[ 72.454597] ? cap_capable (include/trace/events/capability.h:26)
[ 72.454721] ? security_capable (security/security.c:?)
[ 72.454857] rtnl_newlink (net/core/rtnetlink.c:?)
[ 72.454982] ? lock_is_held_type (kernel/locking/lockdep.c:5599 kernel/locking/lockdep.c:5938)
[ 72.455121] ? __lock_acquire (kernel/locking/lockdep.c:?)
[ 72.455256] ? __change_page_attr_set_clr (arch/x86/mm/pat/set_memory.c:685)
[ 72.455438] ? __lock_acquire (kernel/locking/lockdep.c:?)
[ 72.455582] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885)
[ 72.455721] ? lock_acquire (kernel/locking/lockdep.c:5866)
[ 72.455848] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885)
[ 72.455987] ? lock_release (kernel/locking/lockdep.c:?)
[ 72.456117] ? rcu_read_unlock (include/linux/rcupdate.h:341 include/linux/rcupdate.h:871)
[ 72.456249] ? __pfx_rtnl_newlink (net/core/rtnetlink.c:3956)
[ 72.456388] rtnetlink_rcv_msg (net/core/rtnetlink.c:6955)
[ 72.456526] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885)
[ 72.456671] ? lock_acquire (kernel/locking/lockdep.c:5866)
[ 72.456802] ? net_generic (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 include/net/netns/generic.h:45)
[ 72.456929] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6858)
[ 72.457082] netlink_rcv_skb (net/netlink/af_netlink.c:2534)
[ 72.457212] netlink_unicast (net/netlink/af_netlink.c:1313)
[ 72.457344] netlink_sendmsg (net/netlink/af_netlink.c:1883)
[ 72.457476] __sock_sendmsg (net/socket.c:712)
[ 72.457602] ____sys_sendmsg (net/socket.c:?)
[ 72.457735] ? _copy_from_user (arch/x86/include/asm/uaccess_64.h:126 arch/x86/include/asm/uaccess_64.h:134 arch/x86/include/asm/uaccess_64.h:141 include/linux/uaccess.h:178 lib/usercopy.c:18)
[ 72.457875] ___sys_sendmsg (net/socket.c:2620)
[ 72.458042] ? __call_rcu_common (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 arch/x86/include/asm/irqflags.h:159 kernel/rcu/tree.c:3107)
[ 72.458185] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457)
[ 72.458324] ? lock_acquire (kernel/locking/lockdep.c:5866)
[ 72.458451] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457)
[ 72.458588] ? lock_release (kernel/locking/lockdep.c:?)
[ 72.458718] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457)
[ 72.458856] __x64_sys_sendmsg (net/socket.c:2652)
[ 72.458997] ? do_syscall_64 (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 include/linux/entry-common.h:198 arch/x86/entry/syscall_64.c:90)
[ 72.459136] do_syscall_64 (arch/x86/entry/syscall_64.c:?)
[ 72.459259] ? exc_page_fault (arch/x86/mm/fault.c:1542)
[ 72.459387] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 72.459555] RIP: 0033:0x7fd15f17cbd0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250520121908.1805732-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
idpf_vport_splitq_napi_poll() can incorrectly return @budget
after napi_complete_done() has been called.
This violates NAPI rules, because after napi_complete_done(),
current thread lost napi ownership.
Move the test against POLL_MODE before the napi_complete_done().
Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support")
Reported-by: Peter Newman <peternewman@google.com>
Closes: https://lore.kernel.org/netdev/20250520121908.1805732-1-edumazet@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joshua Hay <joshua.a.hay@intel.com>
Cc: Alan Brady <alan.brady@intel.com>
Cc: Madhu Chittim <madhu.chittim@intel.com>
Cc: Phani Burra <phani.r.burra@intel.com>
Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Link: https://patch.msgid.link/20250520124030.1983936-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enic started using netif_get_num_default_rss_queues() to set the number
of RQs used in commit cc94d6c4d40c ("enic: Adjust used MSI-X
wq/rq/cq/interrupt resources in a more robust way")
This resulted in machines with less than 16 cpus using less than 8 RQs.
Allow enic to use at least 8 RQs no matter how many cpus are in the
machine to not impact existing enic workloads after a kernel upgrade.
Reviewed-by: John Daley <johndale@cisco.com>
Reviewed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250521-enic_min_8rq-v1-1-691bd2353273@cisco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is [1/3] part of hinic3 Ethernet driver initial submission.
With this patch hinic3 is a valid kernel module but non-functional
driver.
The driver parts contained in this patch:
Module initialization.
PCI driver registration but with empty id_table.
Auxiliary driver registration.
Net device_ops registration but open/stop are empty stubs.
tx/rx logic.
All major data structures of the driver are fully introduced with the
code that uses them but without their initialization code that requires
management interface with the hw.
Co-developed-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Xin Guo <guoxin09@huawei.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
Co-developed-by: Gur Stavi <gur.stavi@huawei.com>
Signed-off-by: Gur Stavi <gur.stavi@huawei.com>
Link: https://patch.msgid.link/76a137ffdfe115c737c2c224f0c93b60ba53cc16.1747736586.git.gur.stavi@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the misspelling of "Electronics" in copyright headers across:
- s3fwrn5 driver
- virtual_ncidev driver
Signed-off-by: Sumanth Gavini <sumanth.gavini@yahoo.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250520072119.176018-1-sumanth.gavini@yahoo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Logic here always sets hdr->version to 2 if it is not a BE3 or Lancer chip,
even if it is BE2. Use 'else if' to prevent multiple assignments, setting
version 0 for BE2, version 1 for BE3 and Lancer, and version 2 for others.
Fixes potential incorrect version setting when BE2_chip and
BE3_chip/lancer_chip checks could both be true.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250519141731.691136-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The list_first_entry() macro never returns NULL. If the list is
empty then it returns an invalid pointer. Use list_first_entry_or_null()
to check if the list is empty.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505080231.7OXwq4Te-lkp@intel.com/
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
I found that rename fails after cifs mount due to update of
lookup_one_qstr_excl().
mv a/c b/
mv: cannot move 'a/c' to 'b/c': No such file or directory
In order to rename to a new name regardless of whether the dentry is
negative, we need to get the dentry through lookup_one_qstr_excl().
So It will not return error if the name doesn't exist.
Fixes: 204a575e91f3 ("VFS: add common error checks to lookup_one_qstr_excl()")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Subsystem rstat locks are dynamically allocated per-cpu. It was discovered
that a panic can occur during this allocation when the lock size is zero.
This is the case on non-smp systems, since arch_spinlock_t is defined as an
empty struct. Prevent this allocation when !CONFIG_SMP by adding a
pre-processor conditional around the affected block.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Reported-by: Klara Modin <klarasmodin@gmail.com>
Fixes: 748922dcfabd ("cgroup: use subsystem-specific rstat locks to avoid contention")
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
If a shorter than assumed transfer was seen, a partial buffer will have
been filled. For that case it isn't sane to attempt to fill more into
the bundle before posting a completion, as that will cause a gap in
the received data.
Check if the iterator has hit zero and only allow to continue a bundle
operation if that is the case.
Also ensure that for putting finished buffers, only the current transfer
is accounted. Otherwise too many buffers may be put for a short transfer.
Link: https://github.com/axboe/liburing/issues/1409
Cc: stable@vger.kernel.org
Fixes: 7c71a0af81ba ("io_uring/net: improve recv bundles")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Fixes for some SoC clk drivers:
- Define the gate clk for the OTG PHY on Rockchip RK3576 so the nvmem
driver actually works
- Initialize clk_hw_onecell_data::num before accessing the 'hws'
array to keep UBSAN happy
- Fix a perf degradation on the Allwinner D1 MMC clk that was making
things half bad
- Fix the Allwinner SNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT macro to have
proper order of arguments"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: d1: Add missing divider for MMC mod clocks
clk: s2mps11: initialise clk_hw_onecell_data::num before accessing ::hws[] in probe()
clk: sunxi-ng: fix order of arguments in clock macro
clk: rockchip: rk3576: define clk_otp_phy_g
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Convert to a 'fs_str' tracepoint that just emits as a string: this
lets us build up the tracepoint with a printbuf, using our pretty
printers, and they're much easier to manage
- Include locks_held, before and after
- Include the btree node pointer we failed on (error pointer, null, or
real node)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a flag for tracking whether a directory has case-insensitive
descendents - so that overlayfs can disallow mounting, even though the
filesystem supports case insensitivity.
This is a new on disk format version, with a (cheap) upgrade to ensure
the flag is correctly set on existing inodes.
Create, rename and fssetxattr are all plumbed to ensure the new flag is
set, and we've got new fsck code that hooks into check_inode(0.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Move a fsck.c helper into inode.c, eliminate some duplicate and organize
the inode lookup helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a better helper for printing out paths of inodes when we don't know
the subvolume, for fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bi_casefold only makes sense for directories, and since it's one of the
variable length fields setting it unnecessarily wastes space.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There is one in for_each_btree_key_max().
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There's no reason to be running this inside our transaction; it forces
us to copy the key we're updating to a temporary, which we'd like to
skip.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It used to be that we had a fixed maximum number of btree paths to work
with - 64.
That's no longer the case, so bch2_extent_atomic_end() doesn't have to
be as strict.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Accounting has gotten quite heavy, and there's lots of redundancy in
accounting updates within a transaction, as we often add/delete multiple
extents that touch the same accountign counters.
This will reduce the amount of data that we journal, and reduce pressure
downstream on the btree write buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There can be a lot of rendundancy in accounting updates within a single
btree transaction.
Split out accounting updates so that they can be deduped, in the next
commit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This fixes the subvol loop checking and directory loop checking to print
the loop.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Detect buckets with missing backpointers, and run repair on demand.
__bch2_move_data_phys() now calls
bch2_check_bucket_backpointer_mismatch() as it walks buckets, which
checks for missing backpointers by comparing backpointers against bucket
sector counts.
When missing backpointers are detected, we kick off
bch2_check_extents_to_backpointers() asynchronously - right away if
we're trying to evacuate, or with a threshold if we're just running
copygc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add some more helpers, and mismatches is now a superset of the empty
bitmap - simplifies most checks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we request a recovery pass to be run online, i.e. not during
recovery, if it's an online pass it'll now be run in the background,
instead of waiting for the next mount.
To avoid situations where recovery passes are running continuously, this
also includes ratelimiting: if the RUN_RECOVERY_PASS_ratelimit flag is
passed, the pass may be deferred until later - depending on the runtime
and last run stats in the recovery_passes superblock section.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Consolidate the run_explicit_recovery_pass() interfaces by adding a
flags parameter; this will also let us add a RUN_RECOVERY_PASS_ratelimit
flag.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Show recovery pass status in sysfs - important now that we're running
them automatically in the background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We want recovery.curr_pass to be private to the recovery passes code,
for better showing recovery pass status; also, it may rewind and is
generally not the correct member to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Consolidate bch2_run_recovery_passes() and
bch2_run_online_recovery_passes(), prep work for automatically
scheduling and running recovery passes in the background.
- Now takes a mask of which passes to run, automatic background repair
will pass in sb.recovery_passes_required.
- Skips passes that are failing: a pass that failed may be reattempted
after another pass succeeds (some passes depend on repair done by
other passes for successful completion).
- bch2_recovery_passes_match() helper to skip alloc passes on a
filesystem without alloc info.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch_fs has gotten obnoxiously big, let's start organizing thins a bit
better.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Avoid doing more updates if we already have one.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This ensures that our pending rebalance work accounting is accurate
quickly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Improve this so it can be used by fsck.c check_inode(); it provides a
much better error message than the check_inode() version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out a small common helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|