Age | Commit message (Collapse) | Author |
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The XOL (execute out-of-line) buffer is used to single-step the
replaced instruction(s) for uprobes. The RISC-V port was missing a
proper fence.i (i$ flushing) after constructing the XOL buffer, which
can result in incorrect execution of stale/broken instructions.
This was found running the BPF selftests "test_progs:
uprobe_autoattach, attach_probe" on the Spacemit K1/X60, where the
uprobes tests randomly blew up.
Reviewed-by: Guo Ren <guoren@kernel.org>
Fixes: 74784081aac8 ("riscv: Add uprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-2-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The flush_icache_range() function is implemented as a "function-like
macro with unused parameters", which can result in "unused variables"
warnings.
Replace the macro with a static inline function, as advised by
Documentation/process/coding-style.rst.
Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250419111402.1660267-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The single core change is an obvious bug fix (and falls within the LF
guidelines for patches from sanctioned entities). The other driver
changes are a bit larger but likewise pretty obvious"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpi3mr: Add level check to control event logging
scsi: ufs: core: Add NULL check in ufshcd_mcq_compl_pending_transfer()
scsi: core: Clear flags for scsi_cmnd that did not complete
scsi: ufs: Introduce quirk to extend PA_HIBERN8TIME for UFS devices
scsi: ufs: qcom: Add quirks for Samsung UFS devices
scsi: target: iscsi: Fix timeout on deleted connection
scsi: mpi3mr: Reset the pending interrupt flag
scsi: mpi3mr: Fix pending I/O counter
scsi: ufs: mcq: Add NULL check in ufshcd_mcq_abort()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"Fix some Landlock audit issues, add related tests, and updates
documentation"
* tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Update log documentation
landlock: Fix documentation for landlock_restrict_self(2)
landlock: Fix documentation for landlock_create_ruleset(2)
selftests/landlock: Add PID tests for audit records
selftests/landlock: Factor out audit fixture in audit_test
landlock: Log the TGID of the domain creator
landlock: Remove incorrect warning
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc4).
This pull includes wireless and a fix to vxlan which isn't
in Linus's tree just yet. The latter creates with a silent conflict
/ build breakage, so merging it now to avoid causing problems.
drivers/net/vxlan/vxlan_vnifilter.c
094adad91310 ("vxlan: Use a single lock to protect the FDB table")
087a9eb9e597 ("vxlan: vnifilter: Fix unlocked deletion of default FDB entry")
https://lore.kernel.org/20250423145131.513029-1-idosch@nvidia.com
No "normal" conflicts, or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
insn_decoder_test found a problem with decoding APX CTEST instructions:
Found an x86 instruction decoder bug, please report this.
ffffffff810021df 62 54 94 05 85 ff ctestneq
objdump says 6 bytes, but insn_get_length() says 5
It happens because x86-opcode-map.txt doesn't specify arguments for the
instruction and the decoder doesn't expect to see ModRM byte.
Fixes: 690ca3a3067f ("x86/insn: Add support for APX EVEX instructions to the opcode map")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v6.10+
Link: https://lore.kernel.org/r/20250423065815.2003231-1-kirill.shutemov@linux.intel.com
|
|
Perf doesn't work at perf stat for hardware events on certain x86 platforms:
$perf stat -- sleep 1
Performance counter stats for 'sleep 1':
16.44 msec task-clock # 0.016 CPUs utilized
2 context-switches # 121.691 /sec
0 cpu-migrations # 0.000 /sec
54 page-faults # 3.286 K/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
The reason is that the check in x86_pmu_hw_config() for sampling events is
unexpectedly applied to counting events as well.
It should only impact x86 platforms with limit_period used for non-PEBS
events. For Intel platforms, it should only impact some older platforms,
e.g., HSW, BDW and NHM.
Fixes: 88ec7eedbbd2 ("perf/x86: Fix low freqency setting issue")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250423064724.3716211-1-luogengkun@huaweicloud.com
|
|
When a VNI is deleted from a VXLAN device in 'vnifilter' mode, the FDB
entry associated with the default remote (assuming one was configured)
is deleted without holding the hash lock. This is wrong and will result
in a warning [1] being generated by the lockdep annotation that was
added by commit ebe642067455 ("vxlan: Create wrappers for FDB lookup").
Reproducer:
# ip link add vx0 up type vxlan dstport 4789 external vnifilter local 192.0.2.1
# bridge vni add vni 10010 remote 198.51.100.1 dev vx0
# bridge vni del vni 10010 dev vx0
Fix by acquiring the hash lock before the deletion and releasing it
afterwards. Blame the original commit that introduced the issue rather
than the one that exposed it.
[1]
WARNING: CPU: 3 PID: 392 at drivers/net/vxlan/vxlan_core.c:417 vxlan_find_mac+0x17f/0x1a0
[...]
RIP: 0010:vxlan_find_mac+0x17f/0x1a0
[...]
Call Trace:
<TASK>
__vxlan_fdb_delete+0xbe/0x560
vxlan_vni_delete_group+0x2ba/0x940
vxlan_vni_del.isra.0+0x15f/0x580
vxlan_process_vni_filter+0x38b/0x7b0
vxlan_vnifilter_process+0x3bb/0x510
rtnetlink_rcv_msg+0x2f7/0xb70
netlink_rcv_skb+0x131/0x360
netlink_unicast+0x426/0x710
netlink_sendmsg+0x75a/0xc20
__sock_sendmsg+0xc1/0x150
____sys_sendmsg+0x5aa/0x7b0
___sys_sendmsg+0xfc/0x180
__sys_sendmsg+0x121/0x1b0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250423145131.513029-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Some more fixes, notably:
* iwlwifi: various regression and iwlmld fixes
* mac80211: fix TX frames in monitor mode
* brcmfmac: error handling for firmware load
* tag 'wireless-2025-04-24' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: restore missing initialization of async_handlers_list
wifi: brcm80211: fmac: Add error handling for brcmf_usb_dl_writeimage()
wifi: plfxlc: Remove erroneous assert in plfxlc_mac_release
wifi: iwlwifi: fix the check for the SCRATCH register upon resume
wifi: iwlwifi: don't warn if the NIC is gone in resume
wifi: iwlwifi: mld: fix BAID validity check
wifi: iwlwifi: back off on continuous errors
wifi: iwlwifi: mld: only create debugfs symlink if it does not exist
wifi: iwlwifi: mld: inform trans on init failure
wifi: iwlwifi: mld: properly handle async notification in op mode start
Revert "wifi: iwlwifi: make no_160 more generic"
Revert "wifi: iwlwifi: add support for BE213"
wifi: mac80211: restore monitor for outgoing frames
====================
Link: https://patch.msgid.link/20250424120535.56499-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.15, round #2
- Single fix for broken usage of 'multi-MIDR' infrastructure in PI
code, adding an open-coded erratum check for everyone's favorite pile
of sand: Cavium ThunderX
|
|
When memory allocation profiling is disabled at runtime or due to an
error, shutdown_mem_profiling() is called: slab->obj_exts which
previously allocated remains.
It won't be cleared by unaccount_slab() because of
mem_alloc_profiling_enabled() not true. It's incorrect, slab->obj_exts
should always be cleaned up in unaccount_slab() to avoid following error:
[...]BUG: Bad page state in process...
..
[...]page dumped because: page still charged to cgroup
[andriy.shevchenko@linux.intel.com: fold need_slab_obj_ext() into its only user]
Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
Cc: stable@vger.kernel.org
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Tested-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Link: https://patch.msgid.link/20250421075232.2165527-1-quic_zhenhuah@quicinc.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
A previous commit added a 'sync' parameter to io_fallback_tw(), which if
true, means the caller wants to wait on the fallback thread handling it.
But the logic is somewhat messed up, ensure that ctxs are swapped and
flushed appropriately.
Cc: stable@vger.kernel.org
Fixes: dfbe5561ae93 ("io_uring: flush offloaded and delayed task_work on exit")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The GNU coreutils version of truncate, which is the original, accepts a
% prefix for the -s size argument which means the file in question
should be padded to a multiple of the given size. This is currently used
to pad the setup block of bzImage to a multiple of 4k before appending
the decompressor.
busybox reimplements truncate but does not support this idiom, and
therefore fails the build since commit
9c54baab4401 ("x86/boot: Drop CRC-32 checksum and the build tool that generates it")
Since very little build code within the kernel depends on the 'truncate'
utility, work around this incompatibility by avoiding truncate altogether,
and relying on dd to perform the padding.
Fixes: 9c54baab4401 ("x86/boot: Drop CRC-32 checksum and the build tool that generates it")
Reported-by: <phasta@kernel.org>
Tested-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250424101917.1552527-2-ardb+git@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"No fixes from any subtree.
Current release - regressions:
- net: fix the missing unlock for detached devices
Previous releases - regressions:
- sched: fix UAF vulnerability in HFSC qdisc
- lwtunnel: disable BHs when required
- mptcp: pm: defer freeing of MPTCP userspace path manager entries
- tipc: fix NULL pointer dereference in tipc_mon_reinit_self()
- eth: virtio-net: disable delayed refill when pausing rx
Previous releases - always broken:
- phylink: fix suspend/resume with WoL enabled and link down
- eth:
- mlx5: fix null-ptr-deref in mlx5_create_{inner_,}ttc_table()
- xen-netfront: handle NULL returned by xdp_convert_buff_to_frame()
- enetc: fix frame corruption on bpf_xdp_adjust_head/tail() and XDP_PASS
- stmmac: fix dwmac1000 ptp timestamp status offset
- pds_core: prevent possible adminq overflow/stuck condition
Misc:
- a bunch of MAINTAINERS updates"
* tag 'net-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (32 commits)
net: stmmac: fix multiplication overflow when reading timestamp
net: stmmac: fix dwmac1000 ptp timestamp status offset
net: dp83822: Fix OF_MDIO config check
pds_core: make wait_context part of q_info
pds_core: Remove unnecessary check in pds_client_adminq_cmd()
pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result
pds_core: Prevent possible adminq overflow/stuck condition
net: dsa: mt7530: sync driver-specific behavior of MT7531 variants
selftests/tc-testing: Add test for HFSC queue emptying during peek operation
net_sched: hfsc: Fix a potential UAF in hfsc_dequeue() too
net_sched: hfsc: Fix a UAF vulnerability in class handling
selftests: mptcp: diag: use mptcp_lib_get_info_value
mptcp: pm: Defer freeing of MPTCP userspace path manager entries
net: ethernet: mtk_eth_soc: net: revise NETSYSv3 hardware configuration
tipc: fix NULL pointer dereference in tipc_mon_reinit_self()
virtio-net: disable delayed refill when pausing rx
net: phy: leds: fix memory leak
net: phylink: mac_link_(up|down)() clarifications
net: phylink: fix suspend/resume with WoL enabled and link down
net: lwtunnel: disable BHs when required
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Revert acomp multibuffer tests which were buggy
- Fix off-by-one regression in new scomp code
- Lower quality setting on atmel-sha204a as it may not be random
* tag 'v6.15-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: atmel-sha204a - Set hwrng quality to lowest possible
crypto: scomp - Fix off-by-one bug when calculating last page
Revert "crypto: testmgr - Add multibuffer acomp testing"
|
|
Exclude code that relies on sock_cgroup_classid() as preparation of
removal of the function.
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The xt_group matching supports the default hierarchy since commit
c38c4597e4bf3 ("netfilter: implement xt_cgroup cgroup2 path match").
The cgroup v1 matching (based on clsid) and cgroup v2 matching (based on
path) are rather independent. Downgrade the Kconfig dependency to
mere CONFIG_SOCK_GROUP_DATA so that xt_group can be built even without
CONFIG_NET_CLS_CGROUP for path matching.
Also add a message for users when they attempt to specify any clsid.
Link: https://lists.opensuse.org/archives/list/kernel@lists.opensuse.org/thread/S23NOILB7MUIRHSKPBOQKJHVSK26GP6X/
Cc: Jan Engelhardt <ej@inai.de>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication.
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
@depends on patch@
expression E;
@@
-msecs_to_jiffies(E * 1000)
+secs_to_jiffies(E)
-msecs_to_jiffies(E * MSEC_PER_SEC)
+secs_to_jiffies(E)
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Not all VMs allow access to RIP. Check guest_state_protected before
calling kvm_rip_read().
This avoids, for example, hitting WARN_ON_ONCE in vt_cache_reg() for
TDX VMs.
Fixes: 81bf912b2c15 ("KVM: TDX: Implement TDX vcpu enter/exit path")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Message-ID: <20250415104821.247234-3-adrian.hunter@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Not all VMs allow access to RIP. Check guest_state_protected before
calling kvm_rip_read().
This avoids, for example, hitting WARN_ON_ONCE in vt_cache_reg() for
TDX VMs.
Fixes: 81bf912b2c15 ("KVM: TDX: Implement TDX vcpu enter/exit path")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Message-ID: <20250415104821.247234-2-adrian.hunter@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that the AMD IOMMU doesn't signal success incorrectly, WARN if KVM
attempts to track an AMD IRTE entry without metadata.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
WARN if KVM attempts to set vCPU affinity when posted interrupts aren't
enabled, as KVM shouldn't try to enable posting when they're unsupported,
and the IOMMU driver darn well should only advertise posting support when
AMD_IOMMU_GUEST_IR_VAPIC() is true.
Note, KVM consumes is_guest_mode only on success.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Return -EINVAL instead of success if amd_ir_set_vcpu_affinity() is
invoked without use_vapic; lying to KVM about whether or not the IRTE was
configured to post IRQs is all kinds of bad.
Fixes: d98de49a53e4 ("iommu/amd: Enable vAPIC interrupt remapping mode by default")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Take irqfds.lock when adding/deleting an IRQ bypass producer to ensure
irqfd->producer isn't modified while kvm_irq_routing_update() is running.
The only lock held when a producer is added/removed is irqbypass's mutex.
Fixes: 872768800652 ("KVM: x86: select IRQ_BYPASS_MANAGER")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Explicitly treat type differences as GSI routing changes, as comparing MSI
data between two entries could get a false negative, e.g. if userspace
changed the type but left the type-specific data as-is.
Fixes: 515a0c79e796 ("kvm: irqfd: avoid update unmodified entries of the routing")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Restore an IRTE back to host control (remapped or posted MSI mode) if the
*new* GSI route prevents posting the IRQ directly to a vCPU, regardless of
the GSI routing type. Updating the IRTE if and only if the new GSI is an
MSI results in KVM leaving an IRTE posting to a vCPU.
The dangling IRTE can result in interrupts being incorrectly delivered to
the guest, and in the worst case scenario can result in use-after-free,
e.g. if the VM is torn down, but the underlying host IRQ isn't freed.
Fixes: efc644048ecd ("KVM: x86: Update IRTE for posted-interrupts")
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Allocate SVM's interrupt remapping metadata using GFP_ATOMIC as
svm_ir_list_add() is called with IRQs are disabled and irqfs.lock held
when kvm_irq_routing_update() reacts to GSI routing changes.
Fixes: 411b44ba80ab ("svm: Implements update_pi_irte hook to setup posted interrupt")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250404193923.1413163-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Skip IRTE updates if AVIC is disabled/unsupported, as forcing the IRTE
into remapped mode (kvm_vcpu_apicv_active() will never be true) is
unnecessary and wasteful. The IOMMU driver is responsible for putting
IRTEs into remapped mode when an IRQ is allocated by a device, long before
that device is assigned to a VM. I.e. the kernel as a whole has major
issues if the IRTE isn't already in remapped mode.
Opportunsitically kvm_arch_has_irq_bypass() to query for APICv/AVIC, so
so that all checks in KVM x86 incorporate the same information.
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250401161804.842968-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_arch_has_irq_bypass() is a small function and even though it does
not appear in any *really* hot paths, it's also not entirely rare.
Make it inline---it also works out nicely in preparation for using it in
kvm-intel.ko and kvm-amd.ko, since the function is not currently exported.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Loading a driver just to configure blk-cgroup doesn't make sense, as that
assumes and already existing device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blkdev_get_no_open can trigger the legacy autoload of block drivers. A
simple stat of a block device has not historically done that, so disable
this behavior again.
Fixes: 9abcfbd235f5 ("block: Add atomic write support for statx")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
backing_inode is only used once, so remove it and update the comment
describing the bdev lookup to be a bit more clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
These are only to be used by block internal code. Remove the comment
as we grew more users due to reworking block device node opening.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250423053810.1683309-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When the user increased the read-ahead size through sysfs this value
currently get lost if the device is reprobe, including on a resume
from suspend.
As there is no hardware limitation for the read-ahead size there is
no real need to reset it or track a separate hardware limitation
like for max_sectors.
This restores the pre-atomic queue limit behavior in the sd driver as
sd did not use blk_queue_io_opt and thus never updated the read ahead
size to the value based of the optimal I/O, but changes behavior for
all other drivers. As the new behavior seems useful and sd is the
driver for which the readahead size tweaks are most useful that seems
like a worthwhile trade off.
Fixes: 804e498e0496 ("sd: convert to the atomic queue limits API")
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20250424082521.1967286-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some distributions, such as centos stream 9, still have a version of
coreutils which does not yet support the %Hr and %Lr formats for stat(1)
[1, 2]. Running ublk selftests on these distributions results in the
following error in tests that use the _get_disk_dev_t helper:
line 23: ?r: syntax error: operand expected (error token is "?r")
To better accommodate older distributions, rewrite _get_disk_dev_t to
use the much older %t and %T formats for stat instead.
[1] https://github.com/coreutils/coreutils/blob/v9.0/NEWS#L114
[2] https://pkgs.org/download/coreutils
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250423-ublk_selftests-v1-2-7d060e260e76@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_req_post_cqe() sets submit_state.cq_flush so that
*flush_completions() can take care of batch commiting CQEs. Don't commit
it twice by using __io_cq_unlock_post().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/41c416660c509cee676b6cad96081274bcb459f3.1745493861.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull NVMe fix from Christoph:
"nvme fixes for Linux 6.15
- fix an out-of-bounds access in nvmet_enable_port (Richard Weinberger)"
* tag 'nvme-6.15-2025-04-24' of git://git.infradead.org/nvme:
nvmet: fix out-of-bounds access in nvmet_enable_port
|
|
The qcom_spi_block_erase() function returns with error in case of
failure. Change the qcom_spi_send_cmdaddr() function to propagate
these errors to the callers instead of returning with success.
Fixes: 7304d1909080 ("spi: spi-qpic: add driver for QCOM SPI NAND flash Interface")
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Link: https://patch.msgid.link/20250423-qpic-snand-propagate-error-v1-1-4b26ed45fdb5@gmail.com
Reviewed-by: Md Sadre Alam <quic_mdalam@quicinc.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In the latest kernel versions system crashes were noticed occasionally
during suspend/resume. This occurs because the RZ SSI suspend trigger
(called from snd_soc_suspend()) is executed after rz_ssi_pm_ops->suspend()
and it accesses IP registers. After the rz_ssi_pm_ops->suspend() is
executed the IP clocks are disabled and its reset line is asserted.
Since snd_soc_suspend() is invoked through snd_soc_pm_ops->suspend(),
snd_soc_pm_ops is associated with soc_driver (defined in
sound/soc/soc-core.c), and there is no parent-child relationship between
soc_driver and rz_ssi_driver the power management subsystem does not
enforce a specific suspend/resume order between the RZ SSI platform driver
and soc_driver.
To ensure that the suspend/resume function of rz-ssi is executed after
snd_soc_suspend(), use NOIRQ_SYSTEM_SLEEP_PM_OPS().
Fixes: 1fc778f7c833 ("ASoC: renesas: rz-ssi: Add suspend to RAM support")
Cc: stable@vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://patch.msgid.link/20250410141525.4126502-1-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The temperature sensor enabled for mv88q222x devices also functions for
mv88q211x based devices. Unify the two devices probe functions to enable
the sensors for all devices supported by this driver.
The same oddity as for mv88q222x devices exists, the PHY link must be up
for a correct temperature reading to be reported.
# cat /sys/class/hwmon/hwmon9/temp1_input
-75000
# ifconfig end5 up
# cat /sys/class/hwmon/hwmon9/temp1_input
59000
Worth noting is that while the temperature register offsets and layout
are the same between mv88q211x and mv88q222x devices their names in the
datasheets are different. This change keeps the mv88q222x names for the
mv88q211x support.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250418145800.2420751-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Alexis Lothore says:
====================
net: stmmac: fix timestamp snapshots on dwmac1000
this is the v2 of a small series containing two small fixes for the
timestamp snapshot feature on stmmac, especially on dwmac1000 version.
Those issues have been detected on a socfpga (Cyclone V) platform. They
kind of follow the big rework sent by Maxime at the end of last year to
properly split this feature support between different versions of the
DWMAC IP.
v1: https://lore.kernel.org/r/20250422-stmmac_ts-v1-0-b59c9f406041@bootlin.com
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
====================
Link: https://patch.msgid.link/20250423-stmmac_ts-v2-0-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The current way of reading a timestamp snapshot in stmmac can lead to
integer overflow, as the computation is done on 32 bits. The issue has
been observed on a dwmac-socfpga platform returning chaotic timestamp
values due to this overflow. The corresponding multiplication is done
with a MUL instruction, which returns 32 bit values. Explicitly casting
the value to 64 bits replaced the MUL with a UMLAL, which computes and
returns the result on 64 bits, and so returns correctly the timestamps.
Prevent this overflow by explicitly casting the intermediate value to
u64 to make sure that the whole computation is made on u64. While at it,
apply the same cast on the other dwmac variant (GMAC4) method for
snapshot retrieval.
Fixes: 477c3e1f6363 ("net: stmmac: Introduce dwmac1000 timestamping operations")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423-stmmac_ts-v2-2-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When a PTP interrupt occurs, the driver accesses the wrong offset to
learn about the number of available snapshots in the FIFO for dwmac1000:
it should be accessing bits 29..25, while it is currently reading bits
19..16 (those are bits about the auxiliary triggers which have generated
the timestamps). As a consequence, it does not compute correctly the
number of available snapshots, and so possibly do not generate the
corresponding clock events if the bogus value ends up being 0.
Fix clock events generation by reading the correct bits in the timestamp
register for dwmac1000.
Fixes: 477c3e1f6363 ("net: stmmac: Introduce dwmac1000 timestamping operations")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423-stmmac_ts-v2-1-e2cf2bbd61b1@bootlin.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.
Fixes: 5dc39fd5ef35 ("net: phy: DP83822: Add ability to advertise Fiber connection")
Signed-off-by: Johannes Schneider <johannes.schneider@leica-geosystems.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250423044724.1284492-1-johannes.schneider@leica-geosystems.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Regression test for FAN_MARK_MNTFS | FAN_MARK_FLUSH bug.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250418193903.2607617-3-amir73il@gmail.com
|
|
fanotify_mark(fd, FAN_MARK_FLUSH | FAN_MARK_MNTNS, ...) incorrectly
ends up causing removal inode marks.
Fixes: 0f46d81f2bce ("fanotify: notify on mount attach and detach")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250418193903.2607617-2-amir73il@gmail.com
|
|
Kuniyuki Iwashima says:
====================
ipv6: No RTNL for IPv6 routing table.
IPv6 routing tables are protected by each table's lock and work in
the interrupt context, which means we basically don't need RTNL to
modify an IPv6 routing table itself.
Currently, the control paths require RTNL because we may need to
perform device and nexthop lookups; we must prevent dev/nexthop from
going away from the netns.
This, however, can be achieved by RCU as well.
If we are in the RCU critical section while adding an IPv6 route,
synchronize_net() in __dev_change_net_namespace() and
unregister_netdevice_many_notify() guarantee that the dev will not be
moved to another netns or removed.
Also, nexthop is guaranteed not to be freed during the RCU grace period.
If we care about a race between nexthop removal and IPv6 route addition,
we can get rid of RTNL from the control paths.
Patch 1 moves a validation for RTA_MULTIPATH earlier.
Patch 2 removes RTNL for SIOCDELRT and RTM_DELROUTE.
Patch 3 ~ 11 moves validation and memory allocation earlier.
Patch 12 prevents a race between two requests for the same table.
Patch 13 & 14 prevents the nexthop race mentioned above.
Patch 15 removes RTNL for SIOCADDRT and RTM_NEWROUTE.
Test:
The script [0] lets each CPU-X create 100000 routes on table-X in a
batch.
On c7a.metal-48xl EC2 instance with 192 CPUs,
without this series:
$ sudo ./route_test.sh
start adding routes
added 19200000 routes (100000 routes * 192 tables).
total routes: 19200006
Time elapsed: 191577 milliseconds.
with this series:
$ sudo ./route_test.sh
start adding routes
added 19200000 routes (100000 routes * 192 tables).
total routes: 19200006
Time elapsed: 62854 milliseconds.
I changed the number of routes (1000 ~ 100000 per CPU/table) and
consistently saw it finish 3x faster with this series.
[0]
mkdir tmp
NS="test"
ip netns add $NS
ip -n $NS link add veth0 type veth peer veth1
ip -n $NS link set veth0 up
ip -n $NS link set veth1 up
TABLES=()
for i in $(seq $(nproc)); do
TABLES+=("$i")
done
ROUTES=()
for i in {1..100}; do
for j in {1..1000}; do
ROUTES+=("2001:$i:$j::/64")
done
done
for TABLE in "${TABLES[@]}"; do
(
FILE="./tmp/batch-table-$TABLE.txt"
> $FILE
for ROUTE in "${ROUTES[@]}"; do
echo "route add $ROUTE dev veth0 table $TABLE" >> $FILE
done
) &
done
wait
echo "start adding routes"
START_TIME=$(date +%s%3N)
for TABLE in "${TABLES[@]}"; do
ip -n $NS -6 -batch "./tmp/batch-table-$TABLE.txt" &
done
wait
END_TIME=$(date +%s%3N)
ELAPSED_TIME=$((END_TIME - START_TIME))
echo "added $((${#ROUTES[@]} * ${#TABLES[@]})) routes (${#ROUTES[@]} routes * ${#TABLES[@]} tables)."
echo "total routes: $(ip -n $NS -6 route show table all | wc -l)" # Just for debug
echo "Time elapsed: ${ELAPSED_TIME} milliseconds."
ip netns del $NS
rm -fr ./tmp/
v2: https://lore.kernel.org/netdev/20250409011243.26195-1-kuniyu@amazon.com/
v1: https://lore.kernel.org/netdev/20250321040131.21057-1-kuniyu@amazon.com/
====================
Link: https://patch.msgid.link/20250418000443.43734-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Now we are ready to remove RTNL from SIOCADDRT and RTM_NEWROUTE.
The remaining things to do are
1. pass false to lwtunnel_valid_encap_type_attr()
2. use rcu_dereference_rtnl() in fib6_check_nexthop()
3. place rcu_read_lock() before ip6_route_info_create_nh().
Let's complete the RTNL-free conversion.
When each CPU-X adds 100000 routes on table-X in a batch
concurrently on c7a.metal-48xl EC2 instance with 192 CPUs,
without this series:
$ sudo ./route_test.sh
...
added 19200000 routes (100000 routes * 192 tables).
time elapsed: 191577 milliseconds.
with this series:
$ sudo ./route_test.sh
...
added 19200000 routes (100000 routes * 192 tables).
time elapsed: 62854 milliseconds.
I changed the number of routes in each table (1000 ~ 100000)
and consistently saw it finish 3x faster with this series.
Note that now every caller of lwtunnel_valid_encap_type() passes
false as the last argument, and this can be removed later.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-16-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT.
Then, we may be going to add a route tied to a dying nexthop.
The nexthop itself is not freed during the RCU grace period, but
if we link a route after __remove_nexthop_fib() is called for the
nexthop, the route will be leaked.
To avoid the race between IPv6 route addition under RCU vs nexthop
deletion under RTNL, let's add a dead flag and protect it and
nh->f6i_list with a spinlock.
__remove_nexthop_fib() acquires the nexthop's spinlock and sets false
to nh->dead, then calls ip6_del_rt() for the linked route one by one
without the spinlock because fib6_purge_rt() acquires it later.
While adding an IPv6 route, fib6_add() acquires the nexthop lock and
checks the dead flag just before inserting the route.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250418000443.43734-15-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|