Age | Commit message (Collapse) | Author |
|
Helper phy_is_internal() is just used in two places phylib-internally.
So let's remove it from the API.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/f3f35265-80a9-4ed7-ad78-ae22c21e288b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
phy_queue_state_machine() isn't used outside phy.c,
so stop exporting it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/16986d3d-7baf-4b02-a641-e2916d491264@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Stop exporting feature arrays which aren't used outside phylib.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/01886672-4880-4ca8-b7b0-94d40f6e0ec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
outside phylib
Certain fixup-related definitions aren't used outside phy_device.c.
So make them private and remove them from phy.h.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/ea6fde13-9183-4c7c-8434-6c0eb64fc72c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
KVM fixes for 6.14 part 1
- Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated
by KVM to fix a NULL pointer dereference.
- Enter guest mode (L2) from KVM's perspective before initializing the vCPU's
nested NPT MMU so that the MMU is properly tagged for L2, not L1.
- Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the
guest's value may be stale if a VM-Exit is handled in the fastpath.
|
|
KVM is dependent on the PSP SEV driver and PSP SEV driver needs to be
loaded before KVM module. In case of module loading any dependent
modules are automatically loaded but in case of built-in modules there
is no inherent mechanism available to specify dependencies between
modules and ensure that any dependent modules are loaded implicitly.
Add a new external API interface for PSP module initialization which
allows PSP SEV driver to be loaded explicitly if KVM is built-in.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Message-ID: <15279ca0cad56a07cf12834ec544310f85ff5edc.1739226950.git.ashish.kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"Take the newly introduced EFI_MEMORY_HOT_PLUGGABLE memory attribute
into account when placing the kernel image in memory at boot.
Otherwise, the presence of the kernel image could prevent such a
memory region from being unplugged at runtime if it was 'cold
plugged', i.e., already plugged in at boot time (and exposed via the
EFI memory map).
This should ensure that the new EFI_MEMORY_HOT_PLUGGABLE memory
attribute is used consistently by Linux before it ever turns up in
production, ensuring that we can make meaningful use of it without
running the risk of regressing existing users"
* tag 'efi-fixes-for-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Use BIT_ULL() constants for memory attributes
efi: Avoid cold plugged memory for placing the kernel
|
|
Make xpcs_config_eee() private to the XPCS driver, called only from
the phylink pcs_disable_eee() and pcs_enable_eee() methods.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1thRQT-003w7O-Ec@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a function to separate out the EEE clock multiplying factor. This
will be called by the stmmac driver to configure this value.
It would have been better had the driver used the CLK API to retrieve
this clock, get its rate and calculate the appropriate multiplier, but
that door has closed.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1thRQ8-003w70-VT@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are hooks in the stmmac driver into XPCS to control the EEE
settings when LPI is configured at the MAC. This bypasses the layering.
To allow this to be removed from the stmmac driver, add two new
methods for PCS to inform them when the LPI/EEE enablement state
changes at the MAC.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1thRQ3-003w6u-RH@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor clock management in EQoS driver for code reuse and to avoid
redundancy. This way, only minimal changes are required when a new platform
is added.
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Swathi K S <swathi.ks@samsung.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250213041559.106111-1-swathi.ks@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull block fixes from Jens Axboe:
- Fix for request rejection for batch addition
- Fix a few issues for bogus mac partition tables
* tag 'block-6.14-20250214' of git://git.kernel.dk/linux:
partitions: mac: fix handling of bogus partition table
block: cleanup and fix batch completion adding conditions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix a race window where a newly forked task could escape cgroup.kill
- Remove incorrectly included steal time from cpu.stat::usage_usec
- Minor update in selftest
* tag 'cgroup-for-6.14-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Remove steal time from usage_usec
selftests/cgroup: use bash in test_cpuset_v1_hp.sh
cgroup: fix race between fork and cgroup.kill
|
|
Enable support for VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC, to enable the VF
driver the ability to determine what Rx descriptor formats are
available. This requires sending an additional message during
initialization and reset, the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. This
operation requests the supported Rx descriptor IDs available from the
PF.
This is treated the same way that VLAN V2 capabilities are handled. Add
a new set of extended capability flags, used to process send and receipt
of the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message.
This ensures we finish negotiating for the supported descriptor formats
prior to beginning configuration of receive queues.
This change stores the supported format bitmap into the iavf_adapter
structure. Additionally, if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is enabled
by the PF, we need to make sure that the Rx queue configuration
specifies the format.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Support for allowing VF to negotiate the descriptor format requires that
the VF specify which descriptor format to use when requesting Rx queues.
The VF is supposed to request the set of supported formats via the new
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS, and then set one of the supported
formats in the rxdid field of the virtchnl_rxq_info structure.
The virtchnl.h header does not provide an enumeration of the format
values. The existing implementations in the PF directly use the values
from the DDP package.
Make the formats explicit by defining an enumeration of the RXDIDs.
Provide an enumeration for the values as well as the bit positions as
returned by the supported_rxdids data from the
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
To support Rx timestamp offload, VIRTCHNL_OP_1588_PTP_CAPS is sent by
the VF to request PTP capability and responded by the PF what capability
is enabled for that VF.
Hardware captures timestamps which contain only 32 bits of nominal
nanoseconds, as opposed to the 64bit timestamps that the stack expects.
To convert 32b to 64b, we need a current PHC time.
VIRTCHNL_OP_1588_PTP_GET_TIME is sent by the VF and responded by the
PF with the current PHC time.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add support for allowing a VF to enable PTP feature - Rx timestamps
The new capability is gated by VIRTCHNL_VF_CAP_PTP, which must be
set by the VF to request access to the new operations. In addition, the
VIRTCHNL_OP_1588_PTP_CAPS command is used to determine the specific
capabilities available to the VF.
This support includes the following additional capabilities:
* Rx timestamps enabled in the Rx queues (when using flexible advanced
descriptors)
* Read access to PHC time over virtchnl using
VIRTCHNL_OP_1588_PTP_GET_TIME
Extra space is reserved in most structures to allow for future
extension (like set clock, Tx timestamps). Additional opcode numbers
are reserved and space in the virtchnl_ptp_caps structure is
specifically set aside for this.
Additionally, each structure has some space reserved for future
extensions to allow some flexibility.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It is possible to correctly do aging without taking the KVM MMU lock,
or while taking it for read; add a Kconfig to let architectures do so.
Architectures that select KVM_MMU_LOCKLESS_AGING are responsible for
correctness.
Suggested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20250204004038.1680123-3-jthoughton@google.com
[sean: massage shortlog+changelog, fix Kconfig goof and shorten name]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The handoff between the boot stubs and start_secondary() are before IBT is
enabled and is definitely not subject to kCFI. As such, suppress all that for
this function.
Notably when the ENDBR poison would become fatal (ud1 instead of nop) this will
trigger a tripple fault because we haven't set up the IDT to handle #UD yet.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20250207122546.509520369@infradead.org
|
|
With the introduction of kCFI the addition of ENDBR to
SYM_FUNC_START* no longer suffices to make the function indirectly
callable. This now requires the use of SYM_TYPED_FUNC_START.
As such, remove the implicit ENDBR from SYM_FUNC_START* and add some
explicit annotations to fix things up again.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20250207122546.409116003@infradead.org
|
|
Depends on the simplifications from commit 1d7e707af446 ("Revert "x86/module: prepare module loading for ROX allocations of text"")
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
Add the following kfuncs to set and remove xattrs from BPF programs:
bpf_set_dentry_xattr
bpf_remove_dentry_xattr
bpf_set_dentry_xattr_locked
bpf_remove_dentry_xattr_locked
The _locked version of these kfuncs are called from hooks where
dentry->d_inode is already locked. Instead of requiring the user
to know which version of the kfuncs to use, the verifier will pick
the proper kfunc based on the calling hook.
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20250130213549.3353349-5-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The core functionality in target_cpu_store() is also needed in a
subsequent patch for automatically changing the CPU when taking
a CPU offline. As such, factor out the body of target_cpu_store()
into new function vmbus_channel_set_cpu() that can also be used
elsewhere.
No functional change is intended.
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250117203309.192072-2-hamzamahfooz@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250117203309.192072-2-hamzamahfooz@linux.microsoft.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc3).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bluetooth.
Kalle Valo steps down after serving as the WiFi driver maintainer for
over a decade.
Current release - fix to a fix:
- vsock: orphan socket after transport release, avoid null-deref
- Bluetooth: L2CAP: fix corrupted list in hci_chan_del
Current release - regressions:
- eth:
- stmmac: correct Rx buffer layout when SPH is enabled
- iavf: fix a locking bug in an error path
- rxrpc: fix alteration of headers whilst zerocopy pending
- s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH
- Revert "netfilter: flowtable: teardown flow if cached mtu is stale"
Current release - new code bugs:
- rxrpc: fix ipv6 path MTU discovery, only ipv4 worked
- pse-pd: fix deadlock in current limit functions
Previous releases - regressions:
- rtnetlink: fix netns refleak with rtnl_setlink()
- wifi: brcmfmac: use random seed flag for BCM4355 and BCM4364
firmware
Previous releases - always broken:
- add missing RCU protection of struct net throughout the stack
- can: rockchip: bail out if skb cannot be allocated
- eth: ti: am65-cpsw: base XDP support fixes
Misc:
- ethtool: tsconfig: update the format of hwtstamp flags, changes the
uAPI but this uAPI was not in any release yet"
* tag 'net-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
net: pse-pd: Fix deadlock in current limit functions
rxrpc: Fix ipv6 path MTU discovery
Reapply "net: skb: introduce and use a single page frag cache"
s390/qeth: move netif_napi_add_tx() and napi_enable() from under BH
mlxsw: Add return value check for mlxsw_sp_port_get_stats_raw()
ipv6: mcast: add RCU protection to mld_newpack()
team: better TEAM_OPTION_TYPE_STRING validation
Bluetooth: L2CAP: Fix corrupted list in hci_chan_del
Bluetooth: btintel_pcie: Fix a potential race condition
Bluetooth: L2CAP: Fix slab-use-after-free Read in l2cap_send_cmd
net: ethernet: ti: am65_cpsw: fix tx_cleanup for XDP case
net: ethernet: ti: am65-cpsw: fix RX & TX statistics for XDP_TX case
net: ethernet: ti: am65-cpsw: fix memleak in certain XDP cases
vsock/test: Add test for SO_LINGER null ptr deref
vsock: Orphan socket after transport release
MAINTAINERS: Add sctp headers to the general netdev entry
Revert "netfilter: flowtable: teardown flow if cached mtu is stale"
iavf: Fix a locking bug in an error path
rxrpc: Fix alteration of headers whilst zerocopy pending
net: phylink: make configuring clock-stop dependent on MAC support
...
|
|
This patch addresses an issue where some files in case-insensitive
directories become inaccessible due to changes in how the kernel
function, utf8_casefold(), generates case-folded strings from the
commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code
points").
There are good reasons why this change should be made; it's actually
quite stupid that Unicode seems to think that the characters ❤ and ❤️
should be casefolded. Unfortimately because of the backwards
compatibility issue, this commit was reverted in 231825b2e1ff.
This problem is addressed by instituting a brute-force linear fallback
if a lookup fails on case-folded directory, which does result in a
performance hit when looking up files affected by the changing how
thekernel treats ignorable Uniode characters, or when attempting to
look up non-existent file names. So this fallback can be disabled by
setting an encoding flag if in the future, the system administrator or
the manufacturer of a mobile handset or tablet can be sure that there
was no opportunity for a kernel to insert file names with incompatible
encodings.
Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
|
|
This reverts commit 011b0335903832facca86cd8ed05d7d8d94c9c76.
Sabrina reports that the revert may trigger warnings due to intervening
changes, especially the ability to rise MAX_SKB_FRAGS. Let's drop it
and revisit once that part is also ironed out.
Fixes: 011b03359038 ("Revert "net: skb: introduce and use a single page frag cache"")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/6bf54579233038bc0e76056c5ea459872ce362ab.1739375933.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove commented out code.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20250211102057.587182-1-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Many drivers abuse the platform driver/bus system as it provides a
simple way to create and bind a device to a driver-specific set of
probe/release functions. Instead of doing that, and wasting all of the
memory associated with a platform device, here is a "faux" bus that
can be used instead.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/2025021026-atlantic-gibberish-3f0c@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The conditions for whether or not a request is allowed adding to a
completion batch are a bit hard to read, and they also have a few
issues. One is that ioerror may indeed be a random value on passthrough,
and it's being checked unconditionally of whether or not the given
request is a passthrough request or not.
Rewrite the conditions to be separate for easier reading, and only check
ioerror for non-passthrough requests. This fixes an issue with bio
unmapping on passthrough, where it fails getting added to a batch. This
both leads to suboptimal performance, and may trigger a potential
schedule-under-atomic condition for polled passthrough IO.
Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
Link: https://lore.kernel.org/r/20575f0a-656e-4bb3-9d82-dec6c7e3a35c@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix a number of hangs in the netfslib read-retry code, including:
(1) netfs_reissue_read() doubles up the getting of references on
subrequests, thereby leaking the subrequest and causing inode eviction
to wait indefinitely. This can lead to the kernel reporting a hang in
the filesystem's evict_inode().
Fix this by removing the get from netfs_reissue_read() and adding one
to netfs_retry_read_subrequests() to deal with the one place that
didn't double up.
(2) The loop in netfs_retry_read_subrequests() that retries a sequence of
failed subrequests doesn't record whether or not it retried the one
that the "subreq" pointer points to when it leaves the loop. It may
not if renegotiation/repreparation of the subrequests means that fewer
subrequests are needed to span the cumulative range of the sequence.
Because it doesn't record this, the piece of code that discards
now-superfluous subrequests doesn't know whether it should discard the
one "subreq" points to - and so it doesn't.
Fix this by noting whether the last subreq it examines is superfluous
and if it is, then getting rid of it and all subsequent subrequests.
If that one one wasn't superfluous, then we would have tried to go
round the previous loop again and so there can be no further unretried
subrequests in the sequence.
(3) netfs_retry_read_subrequests() gets yet an extra ref on any additional
subrequests it has to get because it ran out of ones it could reuse to
to renegotiation/repreparation shrinking the subrequests.
Fix this by removing that extra ref.
(4) In netfs_retry_reads(), it was using wait_on_bit() to wait for
NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the
sequence - but netfs_read_subreq_terminated() is now using a wait
queue on the request instead and so this wait will never finish.
Fix this by waiting on the wait queue instead. To make this work, a
new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell
the wake-up code to wake up the wait queue rather than requeuing the
request's work item.
Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is
no longer used.
(5) Whilst not strictly anything to do with the hang,
netfs_retry_read_subrequests() was also doubly incrementing the
subreq_counter and re-setting the debug index, leaving a gap in the
trace. This is also fixed.
One of these hangs was observed with 9p and with cifs. Others were forced
by manual code injection into fs/afs/file.c. Firstly, afs_prepare_read()
was created to provide an changing pattern of maximum subrequest sizes:
static int afs_prepare_read(struct netfs_io_subrequest *subreq)
{
struct netfs_io_request *rreq = subreq->rreq;
if (!S_ISREG(subreq->rreq->inode->i_mode))
return 0;
if (subreq->retry_count < 20)
rreq->io_streams[0].sreq_max_len =
umax(200, 2222 - subreq->retry_count * 40);
else
rreq->io_streams[0].sreq_max_len = 3333;
return 0;
}
and pointed to by afs_req_ops. Then the following:
struct netfs_io_subrequest *subreq = op->fetch.subreq;
if (subreq->error == 0 &&
S_ISREG(subreq->rreq->inode->i_mode) &&
subreq->retry_count < 20) {
subreq->transferred = subreq->already_done;
__clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);
__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
afs_fetch_data_notify(op);
return;
}
was inserted into afs_fetch_data_success() at the beginning and struct
netfs_io_subrequest given an extra field, "already_done" that was set to
the value in "subreq->transferred" by netfs_reissue_read().
When reading a 4K file, the subrequests would get gradually smaller, a new
subrequest would be allocated around the 3rd retry and then eventually be
rendered superfluous when the 20th retry was hit and the limit on the first
subrequest was eased.
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20250212222402.3618494-2-dhowells@redhat.com
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Steve French <stfrench@microsoft.com>
cc: Ihor Solodrai <ihor.solodrai@pm.me>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
For some usecases a consumer driver requires its device to remain power-on
from the PM domain perspective during runtime. Using dev PM qos along with
the genpd governors, doesn't work for this case as would potentially
prevent the device from being runtime suspended too.
To support these usecases, let's introduce dev_pm_genpd_rpm_always_on() to
allow consumers drivers to dynamically control the behaviour in genpd for a
device that is attached to it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1738736156-119203-4-git-send-email-shawn.lin@rock-chips.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Some users are reporting journal replay takes a long time when there is
excessive number of revoke blocks in the journal. Reported times are
like:
1048576 records - 95 seconds
2097152 records - 580 seconds
The problem is that hash chains in the revoke table gets excessively
long in these cases. Fix the problem by sizing the revoke table
appropriately before the revoke pass.
Thanks to Alexey Zhuravlev <azhuravlev@ddn.com> for benchmarking the
patch with large numbers of revoke blocks [1].
[1] https://lore.kernel.org/all/20250113183107.7bfef7b6@x390.bzzz77.ru
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://patch.msgid.link/20250121140925.17231-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Provide a helper to determine whether the MAC operations structure
implements the LPI operations, which will be used by both phylink and
DSA.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1thR9g-003vX6-4s@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce the `phy_get_next_update_time` function to allow PHY drivers
to dynamically determine the time (in jiffies) until the next state
update event. This enables more flexible and adaptive polling intervals
based on the link state or other conditions.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250210082358.200751-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add offload_flags to the documentation comment for struct spi_transfer.
This was missed when adding the field.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250212154356.784944ea@canb.auug.org.au/
Fixes: 700a281905f2 ("spi: add offload TX/RX streaming APIs")
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://patch.msgid.link/20250212-spi-offload-fixes-v1-1-e192c69e3bb3@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The whole purpose of the custom CLASS() is to have possibility
to initialise the counter variable _i to 0. This can't be done
with simple __free() macro as it will be not allowed by C language.
OTOH, the CLASS() operates with the pointers and explicit usage of
the scoped variable _data is not needed, since the pointers are kept
the same over the iterations. Simplify the implementation of
for_each_hwgpio_in_range().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250207151149.2119765-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Refactor for_each_requested_gpio_in_range() to deduplicate some code
which is basically repeats the for_each_hwgpio(). In order to achieve
this, split the latter to two, for_each_hwgpio_in_range() and
for_each_hwgpio().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20250207151149.2119765-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Christian Brauner <brauner@kernel.org> says:
Currently, it isn't possible to change the idmapping of an idmapped
mount. This is becoming an obstacle for various use-cases.
/* idmapped home directories with systemd-homed */
On newer systems /home is can be an idmapped mount such that each file
on disk is owned by 65536 and a subfolder exists for foreign id ranges
such as containers. For example, a home directory might look like this
(using an arbitrary folder as an example):
user1@localhost:~/data/mount-idmapped$ ls -al /data/
total 16
drwxrwxrwx 1 65536 65536 36 Jan 27 12:15 .
drwxrwxr-x 1 root root 184 Jan 27 12:06 ..
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 aaa
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 bbb
-rw-r--r-- 1 65536 65536 0 Jan 27 12:07 cc
drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers
When logging in home is mounted as an idmapped mount with the following
idmappings:
65536:$(id -u):1 // uid mapping
65536:$(id -g):1 // gid mapping
2147352576:2147352576:65536 // uid mapping
2147352576:2147352576:65536 // gid mapping
So for a user with uid/gid 1000 an idmapped /home would like like this:
user1@localhost:~/data/mount-idmapped$ ls -aln /mnt/
total 16
drwxrwxrwx 1 1000 1000 36 Jan 27 12:15 .
drwxrwxr-x 1 0 0 184 Jan 27 12:06 ..
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 aaa
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 bbb
-rw-r--r-- 1 1000 1000 0 Jan 27 12:07 cc
drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers
In other words, 65536 is mapped to the user's uid/gid and the range
2147352576 up to 2147352576 + 65536 is an identity mapping for
containers.
When a container is started a transient uid/gid range is allocated
outside of both mappings of the idmapped mount. For example, the
container might get the idmapping:
$ cat /proc/1742611/uid_map
0 537985024 65536
This container will be allowed to write to disk within the allocated
foreign id range 2147352576 to 2147352576 + 65536. To do this an
idmapped mount must be created from an already idmapped mount such that:
- The mappings for the user's uid/gid must be dropped, i.e., the
following mappings are removed:
65536:$(id -u):1 // uid mapping
65536:$(id -g):1 // gid mapping
- A mapping for the transient uid/gid range to the foreign uid/gid range
is added:
2147352576:537985024:65536
In combination this will mean that the container will write to disk
within the foreign id range 2147352576 to 2147352576 + 65536.
/* nested containers */
When the outer container makes use of idmapped mounts it isn't posssible
to create an idmapped mount for the inner container with a differen
idmapping from the outer container's idmapped mount.
There are other usecases and the two above just serve as an illustration
of the problem.
This patchset makes it possible to create a new idmapped mount from an
already idmapped mount. It aims to adhere to current performance
constraints and requirements:
- Idmapped mounts aim to have near zero performance implications for
path lookup. That is why no refernce counting, locking or any other
mechanism can be required that would impact performance.
This works be ensuring that a regular mount transitions to an idmapped
mount once going from a static nop_mnt_idmap mapping to a non-static
idmapping.
- The idmapping of a mount change anymore for the lifetime of the mount
afterwards. This not just avoids UAF issues it also avoids pitfalls
such as generating non-matching uid/gid values.
Changing idmappings could be solved by:
- Idmappings could simply be reference counted (above the simple
reference count when sharing them across multiple mounts).
This would require pairing mnt_idmap_get() with mnt_idmap_put() which
would end up being sprinkled everywhere into the VFS and some
filesystems that access idmappings directly.
It wouldn't just be quite ugly and introduce new complexity it would
have a noticeable performance impact.
- Idmappings could gain RCU protection. This would help the LOOKUP_RCU
case and avoids taking reference counts under RCU.
When not under LOOKUP_RCU reference counts need to be acquired on each
idmapping. This would require pairing mnt_idmap_get() with
mnt_idmap_put() which would end up being sprinkled everywhere into the
VFS and some filesystems that access idmappings directly.
This would have the same downsides as mentioned earlier.
- The earlier solutions work by updating the mnt->mnt_idmap pointer with
the new idmapping. Instead of this it would be possible to change the
idmapping itself to avoid UAF issues.
To do this a sequence counter would have to be added to struct mount.
When retrieving the idmapping to generate uid/gid values the sequence
counter would need to be sampled and the generation of the uid/gid
would spin until the update of the idmap is finished.
This has problems as well but the biggest issue will be that this can
lead to inconsistent permission checking and inconsistent uid/gid
pairs even more than this is already possible today. Specifically,
during creation it could happen that:
idmap = mnt_idmap(mnt);
inode_permission(idmap, ...);
may_create(idmap);
// create file with uid/gid based on @idmap
in between the permission checking and the generation of the uid/gid
value the idmapping could change leading to the permission checking
and uid/gid value that is actually used to create a file on disk being
out of sync.
Similarly if two values are generated like:
idmap = mnt_idmap(mnt)
vfsgid = make_vfsgid(idmap);
// idmapping gets update concurrently
vfsuid = make_vfsuid(idmap);
@vfsgid and @vfsuid could be out of sync if the idmapping was changed
in between. The generation of vfsgid/vfsuid could span a lot of
codelines so to guard against this a sequence count would have to be
passed around.
The performance impact of this solutio are less clear but very likely
not zero.
- Using SRCU similar to fanotify that can sleep. I find that not just
ugly but it would have memory consumption implications and is overall
pretty ugly.
/* solution */
So, to avoid all of these pitfalls creating an idmapped mount from an
already idmapped mount will be done atomically, i.e., a new detached
mount is created and a new set of mount properties applied to it without
it ever having been exposed to userspace at all.
This can be done in two ways. A new flag to open_tree() is added
OPEN_TREE_CLEAR_IDMAP that clears the old idmapping and returns a mount
that isn't idmapped. And then it is possible to set mount attributes on
it again including creation of an idmapped mount.
This has the consequence that a file descriptor must exist in userspace
that doesn't have any idmapping applied and it will thus never work in
unpriviledged scenarios. As a container would be able to remove the
idmapping of the mount it has been given. That should be avoided.
Instead, we add open_tree_attr() which works just like open_tree() but
takes an optional struct mount_attr parameter. This is useful beyond
idmappings as it fills a gap where a mount never exists in userspace
without the necessary mount properties applied.
This is particularly useful for mount options such as
MOUNT_ATTR_{RDONLY,NOSUID,NODEV,NOEXEC}.
To create a new idmapped mount the following works:
// Create a first idmapped mount
struct mount_attr attr = {
.attr_set = MOUNT_ATTR_IDMAP
.userns_fd = fd_userns
};
fd_tree = open_tree(-EBADF, "/", OPEN_TREE_CLONE, &attr, sizeof(attr));
move_mount(fd_tree, "", -EBADF, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
// Create a second idmapped mount from the first idmapped mount
attr.attr_set = MOUNT_ATTR_IDMAP;
attr.userns_fd = fd_userns2;
fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr));
// Create a second non-idmapped mount from the first idmapped mount:
memset(&attr, 0, sizeof(attr));
attr.attr_clr = MOUNT_ATTR_IDMAP;
fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr));
* patches from https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org:
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper
Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add open_tree_attr() which allow to atomically create a detached mount
tree and set mount options on it. If OPEN_TREE_CLONE is used this will
allow the creation of a detached mount with a new set of mount options
without it ever being exposed to userspace without that set of mount
options applied.
Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-3-c25feb0d2eb3@kernel.org
Reviewed-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This adds the STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP options.
It allows the retrieval of idmappings via statmount().
Currently it isn't possible to figure out what idmappings are applied to
an idmapped mount. This information is often crucial. Before statmount()
the only realistic options for an interface like this would have been to
add it to /proc/<pid>/fdinfo/<nr> or to expose it in
/proc/<pid>/mountinfo. Both solution would have been pretty ugly and
would've shown information that is of strong interest to some
application but not all. statmount() is perfect for this.
The idmappings applied to an idmapped mount are shown relative to the
caller's user namespace. This is the most useful solution that doesn't
risk leaking information or confuse the caller.
For example, an idmapped mount might have been created with the
following idmappings:
mount --bind -o X-mount.idmap="0:10000:1000 2000:2000:1 3000:3000:1" /srv /opt
Listing the idmappings through statmount() in the same context shows:
mnt_id: 2147485088
mnt_parent_id: 2147484816
fs_type: btrfs
mnt_root: /srv
mnt_point: /opt
mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
mnt_uidmap[0]: 0 10000 1000
mnt_uidmap[1]: 2000 2000 1
mnt_uidmap[2]: 3000 3000 1
mnt_gidmap[0]: 0 10000 1000
mnt_gidmap[1]: 2000 2000 1
mnt_gidmap[2]: 3000 3000 1
But the idmappings might not always be resolvable in the caller's user
namespace. For example:
unshare --user --map-root
In this case statmount() will skip any mappings that fil to resolve in
the caller's idmapping:
mnt_id: 2147485087
mnt_parent_id: 2147484016
fs_type: btrfs
mnt_root: /srv
mnt_point: /opt
mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/
The caller can differentiate between a mount not being idmapped and a
mount that is idmapped but where all mappings fail to resolve in the
caller's idmapping by check for the STATMOUNT_MNT_{G,U}IDMAP flag being
raised but the number of mappings in ->mnt_{g,u}idmap_num being zero.
Note that statmount() requires that the whole range must be resolvable
in the caller's user namespace. If a subrange fails to map it will still
list the map as not resolvable. This is a practical compromise to avoid
having to find which subranges are resovable and wich aren't.
Idmappings are listed as a string array with each mapping separated by
zero bytes. This allows to retrieve the idmappings and immediately use
them for writing to e.g., /proc/<pid>/{g,u}id_map and it also allow for
simple iteration like:
if (stmnt->mask & STATMOUNT_MNT_UIDMAP) {
const char *idmap = stmnt->str + stmnt->mnt_uidmap;
for (size_t idx = 0; idx < stmnt->mnt_uidmap_nr; idx++) {
printf("mnt_uidmap[%lu]: %s\n", idx, idmap);
idmap += strlen(idmap) + 1;
}
}
Link: https://lore.kernel.org/r/20250204-work-mnt_idmap-statmount-v2-2-007720f39f2e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add map_id_range_up() to verify that the full kernel id range can be
mapped up in a given idmapping. This will be used in follow-up patches.
Link: https://lore.kernel.org/r/20250204-work-mnt_idmap-statmount-v2-1-007720f39f2e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a new gpiod_multi_set_value_cansleep() helper function with fewer
parameters than gpiod_set_array_value_cansleep().
Calling gpiod_set_array_value_cansleep() can get quite verbose. In many
cases, the first arguments all come from the same struct gpio_descs, so
having a separate function where we can just pass that cuts down on the
boilerplate.
Signed-off-by: David Lechner <dlechner@baylibre.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20250210-gpio-set-array-helper-v3-1-d6a673674da8@baylibre.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
Both definitions are unused. Last users have been removed with:
f3ba9d490d6e ("net: s6gmac: remove driver")
2bd229df5e2e ("net: phy: remove state PHY_FORCING")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/f8e7b8ed-a665-41ad-b0ce-cbfdb65262ef@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Consider that an EEE mode may not be broken but simply not supported
by the MAC, and rename function phy_set_eee_broken().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/30deb630-3f6b-4ffb-a1e6-a9736021f43a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This bitmap is used also if the MAC doesn't support an EEE mode.
So the mode isn't necessarily broken in the PHY. Therefore rename
the bitmap.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/6cd11422-dd67-4c87-a642-308de694af92@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is already a macro DEFINE_WAIT_FUNC() to declare a wait_queue_entry
with a specified waking function. But there is not a counterpart for
initializing one wait_queue_entry with a specified waking function. So
introducing init_wait_func() for this, which also could be used in iocost
and rq-qos. Using default_wake_function() in rq_qos_wait() to wake up
waiters, which could remove ->task field from rq_qos_wait_data.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250208090416.38642-1-songmuchun@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of an odd comment, cite the documentation, which says more clearly
what's going on with the programming flow on some of the Intel SoCs.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
Add #include <linux/bits.h> to linux/spi/offload/types.h since this
file uses the BIT macro.
Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: David Lechner <dlechner@baylibre.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20250210-spi-offload-extra-headers-v1-1-0f3356362254@baylibre.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
With some research in some obscure old QSDK, it was possible to find the
MASK of the last register there were still set with raw shift and
convert them to FIELD_PREP API.
This is only a cleanup and modernize the code a bit and doesn't make
any behaviour change.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|