Age | Commit message (Collapse) | Author |
|
Joel Savitz <jsavitz@redhat.com> says:
The two patches are independent of each other. The first patch removes
unnecssary NULL guards from free_nsproxy() and create_new_namespaces()
in line with other usage of the put_*_ns() call sites. The second patch
slightly reduces the size of the kernel when CONFIG_CGROUPS is not
selected.
* patches from https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com:
include/cgroup: separate {get,put}_cgroup_ns no-op case
kernel/nsproxy: remove unnecessary guards
Link: https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When CONFIG_CGROUPS is not selected, {get,put}_cgroup_ns become no-ops
and therefore it is not necessary to compile in the code for changing
the reference count.
When CONFIG_CGROUP is selected, there is no valid case where
either of {get,put}_cgroup_ns() will be called with a NULL argument.
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/20250508184930.183040-3-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In free_nsproxy() and the error path of create_new_namesapces() the
put_*_ns() calls are guarded by unnecessary NULL checks.
put_pid_ns(), put_ipc_ns(), put_uts_ns(), and put_time_ns() will never
receive a NULL argument unless their namespace type is disabled, and in
this case all four become no-ops at compile time anyway. put_mnt_ns()
will never receive a null argument at any time.
This unguarded usage is in line with other call sites of put_*_ns().
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/20250508184930.183040-2-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Using FREEZE_HOLDER_USERSPACE has two consequences:
(1) If userspace freezes the filesystem after mnt_drop_write_file() but
before freeze_super() was called filesystem resizing will fail
because the freeze isn't marked as nestable.
(2) If the kernel has successfully frozen the filesystem via
FREEZE_HOLDER_USERSPACE userspace can simply undo it by using the
FITHAW ioctl.
Fix both issues by using FREEZE_HOLDER_KERNEL. It will nest with
FREEZE_HOLDER_USERSPACE and cannot be undone by userspace.
And it is the correct thing to do because the kernel temporarily freezes
the filesystem.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Christian Brauner <brauner@kernel.org> says:
Now all the pieces are in place to actually allow the power subsystem to
freeze/thaw filesystems during suspend/resume. Filesystems are only
frozen and thawed if the power subsystem does actually own the freeze.
Othwerwise it risks thawing filesystems it didn't own. This could be
done differently be e.g., keeping the filesystems that were actually
frozen on a list and then unfreezing them from that list. This is
disgustingly unclean though and reeks of an ugly hack.
If the filesystem is already frozen by the time we've frozen all
userspace processes we don't care to freeze it again. That's userspace's
job once the process resumes. We only actually freeze filesystems if we
absolutely have to and we ignore other failures to freeze.
We could bubble up errors and fail suspend/resume if the error isn't
EBUSY (aka it's already frozen) but I don't think that this is worth it.
Filesystem freezing during suspend/resume is best-effort. If the user
has 500 ext4 filesystems mounted and 4 fail to freeze for whatever
reason then we simply skip them.
What we have now is already a big improvement and let's see how we fare
with it before making our lives even harder (and uglier) than we have
to.
* patches from https://lore.kernel.org/r/20250402-work-freeze-v2-0-6719a97b52ac@kernel.org:
kernfs: add warning about implementing freeze/thaw
power: freeze filesystems during suspend/resume
Link: https://lore.kernel.org/r/20250402-work-freeze-v2-0-6719a97b52ac@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Christian Brauner <brauner@kernel.org> says:
Allow efivarfs to partake to resync variable state during system
hibernation and suspend. Add freeze/thaw support.
This is a pretty straightforward implementation. We simply add regular
freeze/thaw support for both userspace and the kernel. This works
without any big issues and congrats afaict efivars is the first
pseudofilesystem that adds support for filesystem freezing and thawing.
The simplicity comes from the fact that we simply always resync variable
state after efivarfs has been frozen. It doesn't matter whether that's
because of suspend, userspace initiated freeze or hibernation. Efivars
is simple enough that it doesn't matter that we walk all dentries. There
are no directories and there aren't insane amounts of entries and both
freeze/thaw are already heavy-handed operations. If userspace initiated
a freeze/thaw cycle they would need CAP_SYS_ADMIN in the initial user
namespace (as that's where efivarfs is mounted) so it can't be triggered
by random userspace. IOW, we really really don't care.
* patches from https://lore.kernel.org/r/20250331-work-freeze-v1-0-6dfbe8253b9f@kernel.org:
efivarfs: support freeze/thaw
libfs: export find_next_child()
Link: https://lore.kernel.org/r/20250331-work-freeze-v1-0-6dfbe8253b9f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Sysfs is built on top of kernfs and sysfs provides the power management
infrastructure to support suspend/hibernate by writing to various files
in /sys/power/. As filesystems may be automatically frozen during
suspend/hibernate implementing freeze/thaw support for kernfs
generically will cause deadlocks as the suspending/hibernation
initiating task will hold a VFS lock that it will then wait upon to be
released. If freeze/thaw for kernfs is needed talk to the VFS.
Link: https://lore.kernel.org/r/20250402-work-freeze-v2-4-6719a97b52ac@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow efivarfs to partake to resync variable state during system
hibernation and suspend. Add freeze/thaw support.
This is a pretty straightforward implementation. We simply add regular
freeze/thaw support for both userspace and the kernel. This works
without any big issues and congrats afaict efivars is the first
pseudofilesystem that adds support for filesystem freezing and thawing.
The simplicity comes from the fact that we simply always resync variable
state after efivarfs has been frozen. It doesn't matter whether that's
because of suspend, userspace initiated freeze or hibernation. Efivars
is simple enough that it doesn't matter that we walk all dentries. There
are no directories and there aren't insane amounts of entries and both
freeze/thaw are already heavy-handed operations. We really really don't
need to care.
Link: https://lore.kernel.org/r/20250331-work-freeze-v1-2-6dfbe8253b9f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now all the pieces are in place to actually allow the power subsystem
to freeze/thaw filesystems during suspend/resume. Filesystems are only
frozen and thawed if the power subsystem does actually own the freeze.
We could bubble up errors and fail suspend/resume if the error isn't
EBUSY (aka it's already frozen) but I don't think that this is worth it.
Filesystem freezing during suspend/resume is best-effort. If the user
has 500 ext4 filesystems mounted and 4 fail to freeze for whatever
reason then we simply skip them.
What we have now is already a big improvement and let's see how we fare
with it before making our lives even harder (and uglier) than we have
to.
We add a new sysctl know /sys/power/freeze_filesystems that will allow
userspace to freeze filesystems during suspend/hibernate. For now it
defaults to off. The thaw logic doesn't require checking whether
freezing is enabled because the power subsystem exclusively owns frozen
filesystems for the duration of suspend/hibernate and is able to skip
filesystems it doesn't need to freeze.
Also it is technically possible that filesystem
filesystem_freeze_enabled is true and power freezes the filesystems but
before freezing all processes another process disables
filesystem_freeze_enabled. If power were to place the filesystems_thaw()
call under filesystems_freeze_enabled it would fail to thaw the
fileystems it frozw. The exclusive holder mechanism makes it possible to
iterate through the list without any concern making sure that no
filesystems are left frozen.
Link: https://lore.kernel.org/r/20250402-work-freeze-v2-3-6719a97b52ac@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Export find_next_child() so it can be used by efivarfs.
Keep it internal for now. There's no reason to advertise this
kernel-wide.
Link: https://lore.kernel.org/r/20250331-work-freeze-v1-1-6dfbe8253b9f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Christian Brauner <brauner@kernel.org> says:
Add the necessary infrastructure changes to support freezing for suspend
and hibernate. This should all that's needed to wire up power.
* patches from https://lore.kernel.org/r/20250329-work-freeze-v2-0-a47af37ecc3d@kernel.org:
super: add filesystem freezing helpers for suspend and hibernate
gfs2: pass through holder from the VFS for freeze/thaw
super: use common iterator (Part 2)
super: use a common iterator (Part 1)
super: skip dying superblocks early
super: simplify user_get_super()
super: remove pointless s_root checks
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-0-a47af37ecc3d@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow the power subsystem to support filesystem freeze for
suspend and hibernate.
For some kernel subsystems it is paramount that they are guaranteed that
they are the owner of the freeze to avoid any risk of deadlocks. This is
the case for the power subsystem. Enable it to recognize whether it did
actually freeze the filesystem.
If userspace has 10 filesystems and suspend/hibernate manges to freeze 5
and then fails on the 6th for whatever odd reason (current or future)
then power needs to undo the freeze of the first 5 filesystems. It can't
just walk the list again because while it's unlikely that a new
filesystem got added in the meantime it still cannot tell which
filesystems the power subsystem actually managed to get a freeze
reference count on that needs to be dropped during thaw.
There's various ways out of this ugliness. For example, record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing or make sure that the power subsystem just actually exclusively
freezes things it can freeze and marking such filesystems as being owned
by power for the duration of the suspend or resume cycle. I opted for
the latter as that seemed the clean thing to do even if it means more
code changes.
If hibernation races with filesystem freezing (e.g. DM reconfiguration),
then hibernation need not freeze a filesystem because it's already
frozen but userspace may thaw the filesystem before hibernation actually
happens.
If the race happens the other way around, DM reconfiguration may
unexpectedly fail with EBUSY.
So allow FREEZE_EXCL to nest with other holders. An exclusive freezer
cannot be undone by any of the other concurrent freezers.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Stop using write_cache_pages and use writeback_iter directly. This
removes an indirect call per written folio and makes the code easier
to follow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250507062124.3933305-1-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Brian Foster <bfoster@redhat.com> says:
Here's a bit more fallout and prep. work associated with the folio batch
prototype posted a while back [1]. Work on that is still pending so it
isn't included here, but based on the iter advance cleanups most of
these seemed worthwhile as standalone cleanups. Mainly this just cleans
up some of the helpers and pushes some pos/len trimming further down in
the write begin path.
The fbatch thing is still in prototype stage, but for context the intent
here is that it can mostly now just bolt onto the folio lookup path
because we can advance the range that is skipped and return the next
folio along with the folio subrange for the caller to process.
[1] https://lore.kernel.org/linux-fsdevel/20241213150528.1003662-1-bfoster@redhat.com/
* patches from https://lore.kernel.org/20250506134118.911396-1-bfoster@redhat.com:
iomap: rework iomap_write_begin() to return folio offset and length
iomap: push non-large folio check into get folio path
iomap: helper to trim pos/bytes to within folio
iomap: drop pos param from __iomap_[get|put]_folio()
iomap: drop unnecessary pos param from iomap_write_[begin|end]
iomap: resample iter->pos after iomap_write_begin() calls
Link: https://lore.kernel.org/20250506134118.911396-1-bfoster@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
iomap_write_begin() returns a folio based on current pos and
remaining length in the iter, and each caller then trims the
pos/length to the given folio. Clean this up a bit and let
iomap_write_begin() return the trimmed range along with the folio.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-7-bfoster@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The len param to __iomap_get_folio() is primarily a folio allocation
hint. iomap_write_begin() already trims its local len variable based
on the provided folio, so move the large folio support check closer
to folio lookup.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-6-bfoster@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Several buffered write based iteration callbacks duplicate logic to
trim the current pos and length to within the current folio. Factor
this into a helper to make it easier to relocate closer to folio
lookup.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-5-bfoster@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Both helpers take the iter and pos as parameters. All callers
effectively pass iter->pos, so drop the unnecessary pos parameter.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-4-bfoster@redhat.com
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
iomap_write_begin() and iomap_write_end() both take the iter and
iter->pos as parameters. Drop the unnecessary pos parameter and
sample iter->pos within each function.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-3-bfoster@redhat.com
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In preparation for removing the pos parameter, push the local pos
assignment down after calls to iomap_write_begin().
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-2-bfoster@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
mark_buffer_write_io_error sets sb->s_wb_err to -EIO twice.
Once in mapping_set_error and once in errseq_set.
Only mapping_set_error checks if bh->b_assoc_map->host is NULL.
Discovered during null pointer dereference during writeback
to a failing device:
[<ffffffff9a416dc8>] ? mark_buffer_write_io_error+0x98/0xc0
[<ffffffff9a416dbe>] ? mark_buffer_write_io_error+0x8e/0xc0
[<ffffffff9ad4bda0>] end_buffer_async_write+0x90/0xd0
[<ffffffff9ad4e3eb>] end_bio_bh_io_sync+0x2b/0x40
[<ffffffff9adbafe6>] blk_update_request+0x1b6/0x480
[<ffffffff9adbb3d8>] blk_mq_end_request+0x18/0x30
[<ffffffff9adbc6aa>] blk_mq_dispatch_rq_list+0x4da/0x8e0
[<ffffffff9adc0a68>] __blk_mq_sched_dispatch_requests+0x218/0x6a0
[<ffffffff9adc07fa>] blk_mq_sched_dispatch_requests+0x3a/0x80
[<ffffffff9adbbb98>] blk_mq_run_hw_queue+0x108/0x330
[<ffffffff9adbcf58>] blk_mq_flush_plug_list+0x178/0x5f0
[<ffffffff9adb6741>] __blk_flush_plug+0x41/0x120
[<ffffffff9adb6852>] blk_finish_plug+0x22/0x40
[<ffffffff9ad47cb0>] wb_writeback+0x150/0x280
[<ffffffff9ac5343f>] ? set_worker_desc+0x9f/0xc0
[<ffffffff9ad4676e>] wb_workfn+0x24e/0x4a0
Fixes: 485e9605c0573 ("fs/buffer.c: record blockdev write errors in super_block that it backs")
Signed-off-by: Jeremy Bongio <jbongio@google.com>
Link: https://lore.kernel.org/20250507123010.1228243-1-jbongio@google.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When running in nominal drive mode, the maximum allowed frequency for
the NoC is 800MHz, but the OPP table for the i.MX8MP interconnect device
listed the 1GHz operating point for the NoC, regardless of the active
mode.
The newly introduced imx8mp-nominal.dtsi header reconfigures the clock
controller to observe nominal drive mode limits, so have it modify the
maximum NoC OPP as well.
Fixes: 255fbd9eabe7 ("arm64: dts: imx8mp: Add optional nominal drive mode DTSI")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The partial matching of DAI widget to link names, can cause problems if
one of the widget names is a substring of another. E.g. with names
"Foo1" and Foo10", it's not possible to correctly link up "Foo1".
Modify the logic so that if multiple DAI links match the widget stream
name, prioritize a full match if one is found.
Fixes: fe88788779fc ("ASoC: SOF: topology: Use partial match for connecting DAI link and DAI widget")
Link: https://github.com/thesofproject/linux/issues/5308
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20250509085318.13936-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Keep using the PIO mode for commands on ACE2+ platforms, similarly how
the legacy stack is configured.
Fixes: 05cf17f1bf6d ("ASoC: SOF: Intel: hda-bus: Use PIO mode for Lunar Lake")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250509081308.13784-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The firmware does not provide any information for capture streams via the
shared pipeline registers.
To avoid reporting invalid delay value for capture streams to user space
we need to disable it.
Fixes: af74dbd0dbcf ("ASoC: SOF: ipc4-pcm: allocate time info for pcm delay feature")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Link: https://patch.msgid.link/20250509085951.15696-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The header.numid is set to scontrol->comp_id in bytes_ext_get and it is
ignored during bytes_ext_put.
The use of comp_id is not quite great as it is kernel internal
identification number.
Set the header.numid to SOF_CTRL_CMD_BINARY during get and validate the
numid during put to provide consistent and compatible identification
number as IPC3.
For IPC4 existing tooling also ignored the numid but with the use of
SOF_CTRL_CMD_BINARY the different handling of the blobs can be dropped,
providing better user experience.
Reported-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Closes: https://github.com/thesofproject/linux/issues/5282
Fixes: a062c8899fed ("ASoC: SOF: ipc4-control: Add support for bytes control get and put")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Link: https://patch.msgid.link/20250509085633.14930-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
kvasprintf() failure is often caused by memory allocation which has error
code -ENOMEM, but config_item_set_name() returns -EFAULT for the failure.
Fix by returning -ENOMEM instead of -EFAULT for the failure.
Reviewed-by: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250507-fix_configfs-v3-3-fe2d96de8dc4@quicinc.com
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
|
|
populate_attrs() may override failure for creating attribute files
by success for creating subsequent bin attribute files, and have
wrong return value.
Fix by creating bin attribute files under successfully creating
attribute files.
Fixes: 03607ace807b ("configfs: implement binary attributes")
Cc: stable@vger.kernel.org
Reviewed-by: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250507-fix_configfs-v3-2-fe2d96de8dc4@quicinc.com
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
|
|
Macro type_print() definition ends with semicolon, so will cause
the subsequent macro invocations end with two semicolons.
Fix by deleting the semicolon from the macro definition.
Reviewed-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250507-fix_configfs-v3-1-fe2d96de8dc4@quicinc.com
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
|
|
Determining the SST/MST mode during state computation must be done based
on the output type stored in the CRTC state, which in turn is set once
based on the modeset connector's SST vs. MST type and will not change as
long as the connector is using the CRTC. OTOH the MST mode indicated by
the given connector's intel_dp::is_mst flag can change independently of
the above output type, based on what sink is at any moment plugged to
the connector.
Fix the state computation accordingly.
Cc: Jani Nikula <jani.nikula@intel.com>
Fixes: f6971d7427c2 ("drm/i915/mst: adapt intel_dp_mtp_tu_compute_config() for 128b/132b SST")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4607
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://lore.kernel.org/r/20250507151953.251846-1-imre.deak@intel.com
(cherry picked from commit 0f45696ddb2b901fbf15cb8d2e89767be481d59f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into atomic_writes
large atomic writes for xfs [v12.1]
Currently atomic write support for xfs is limited to writing a single
block as we have no way to guarantee alignment and that the write covers
a single extent.
This series introduces a method to issue atomic writes via a
software-based method.
The software-based method is used as a fallback for when attempting to
issue an atomic write over misaligned or multiple extents.
For xfs, this support is based on reflink CoW support.
The basic idea of this CoW method is to alloc a range in the CoW fork,
write the data, and atomically update the mapping.
Initial mysql performance testing has shown this method to perform ok.
However, there we are only using 16K atomic writes (and 4K block size),
so typically - and thankfully - this software fallback method won't be
used often.
For other FSes which want large atomics writes and don't support CoW, I
think that they can follow the example in [0].
Catherine is currently working on further xfstests for this feature,
which we hope to share soon.
About 17/17, maybe it can be omitted as there is no strong demand to have
it included.
Based on bfecc4091e07 (xfs/next-rc, xfs/for-next) xfs: allow ro mounts
if rtdev or logdev are read-only
[0] https://lore.kernel.org/linux-xfs/20250102140411.14617-1-john.g.garry@oracle.com/
Differences to v12:
- add more review tags
Differences to v11:
- split "xfs: ignore ..." patch
- inline sync_blockdev() in xfs_alloc_buftarg() (Christoph)
- fix xfs_calc_rtgroup_awu_max() for 0 block count (Darrick)
- Add RB tag from Christoph (thanks!)
Differences to v10:
- add "xfs: only call xfs_setsize_buftarg once ..." by Darrick
- symbol renames in "xfs: ignore HW which cannot..." by Darrick
Differences to v9:
- rework "ignore HW which cannot .." patch by Darrick
- Ensure power-of-2 max always for unit min/max when no HW support
With a bit of luck, this should all go splendidly.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
|
|
When increasing the array size in memblock_double_array() and the slab
is not yet available, a call to memblock_find_in_range() is used to
reserve/allocate memory. However, the range returned may not have been
accepted, which can result in a crash when booting an SNP guest:
RIP: 0010:memcpy_orig+0x68/0x130
Code: ...
RSP: 0000:ffffffff9cc03ce8 EFLAGS: 00010006
RAX: ff11001ff83e5000 RBX: 0000000000000000 RCX: fffffffffffff000
RDX: 0000000000000bc0 RSI: ffffffff9dba8860 RDI: ff11001ff83e5c00
RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000002000
R10: 000000207fffe000 R11: 0000040000000000 R12: ffffffff9d06ef78
R13: ff11001ff83e5000 R14: ffffffff9dba7c60 R15: 0000000000000c00
memblock_double_array+0xff/0x310
memblock_add_range+0x1fb/0x2f0
memblock_reserve+0x4f/0xa0
memblock_alloc_range_nid+0xac/0x130
memblock_alloc_internal+0x53/0xc0
memblock_alloc_try_nid+0x3d/0xa0
swiotlb_init_remap+0x149/0x2f0
mem_init+0xb/0xb0
mm_core_init+0x8f/0x350
start_kernel+0x17e/0x5d0
x86_64_start_reservations+0x14/0x30
x86_64_start_kernel+0x92/0xa0
secondary_startup_64_no_verify+0x194/0x19b
Mitigate this by calling accept_memory() on the memory range returned
before the slab is available.
Prior to v6.12, the accept_memory() interface used a 'start' and 'end'
parameter instead of 'start' and 'size', therefore the accept_memory()
call must be adjusted to specify 'start + size' for 'end' when applying
to kernels prior to v6.12.
Cc: stable@vger.kernel.org # see patch description, needs adjustments for <= 6.11
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/da1ac73bf4ded761e21b4e4bb5178382a580cd73.1746725050.git.thomas.lendacky@amd.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
|
|
After a recent change [1] in clang's randstruct implementation to
randomize structures that only contain function pointers, there is an
error because qede_ll_ops get randomized but does not use a designated
initializer for the first member:
drivers/net/ethernet/qlogic/qede/qede_main.c:206:2: error: a randomized struct can only be initialized with a designated initializer
206 | {
| ^
Explicitly initialize the common member using a designated initializer
to fix the build.
Cc: stable@vger.kernel.org
Fixes: 035f7f87b729 ("randstruct: Enable Clang support")
Link: https://github.com/llvm/llvm-project/commit/04364fb888eea6db9811510607bed4b200bcb082 [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20250507-qede-fix-clang-randstruct-v1-1-5ccc15626fba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags
- hci_event: Fix not using key encryption size when its known
* tag 'for-net-2025-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_event: Fix not using key encryption size when its known
Bluetooth: MGMT: Fix MGMT_OP_ADD_DEVICE invalid device flags
====================
Link: https://patch.msgid.link/20250508150927.385675-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.15-2025-05-08:
amdgpu:
- DC FP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250508194102.3242372-1-alexander.deucher@amd.com
|
|
Now that there is the "burst <= 0" fastpath, for all later code, burst
must be strictly greater than zero. Therefore, drop the redundant checks
of this local variable.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Now that unlock_ret releases the lock, then falls into nolock_ret, which
handles ->missed based on the value of ret, the common-case lock-held
code can be collapsed into a single "if" statement with a single-statement
"then" clause.
Yes, we could go further and just assign the "if" condition to ret,
but in the immortal words of MSDOS, "Are you sure?".
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Now that we have a nolock_ret label that handles ->missed correctly
based on the value of ret, we can eliminate a local variable and collapse
several "if" statements on the lock-acquisition-failure code path.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Create a nolock_ret label in order to start consolidating the unlocked
return paths that conditionally invoke ratelimit_state_inc_miss().
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
By making "ret" always be initialized, and moving the final call to
ratelimit_state_inc_miss() out from under the lock, we save a goto and
a couple lines of code. This also saves a couple of lines of code from
the unconditional enable/disable slowpath.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Currently, ___ratelimit() treats a negative ->interval or ->burst as
if it was zero, but this is an accident of the current implementation.
Therefore, splat in this case, which might have the benefit of detecting
use of uninitialized ratelimit_state structures on the one hand or easing
addition of new features on the other.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Currently, if the lock is acquired, the code unconditionally does
an atomic decrement on ->rs_n_left, even if that atomic operation is
guaranteed to return a limit-rate verdict. A limit-rate verdict will
in fact be the common case when something is spewing into a rate limit.
This unconditional atomic operation incurs needless overhead and also
raises the spectre of counter wrap.
Therefore, do the atomic decrement only if there is some chance that
rates won't be limited.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Currently, if the lock could not be acquired, the code unconditionally
does an atomic decrement on ->rs_n_left, even if that atomic operation
is guaranteed to return a limit-rate verdict. This incurs needless
overhead and also raises the spectre of counter wrap.
Therefore, do the atomic decrement only if there is some chance that
rates won't be limited.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Restore the previous semantics where the misses counter is unchanged if
the RATELIMIT_MSG_ON_RELEASE flag is set.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Currently, if rate limiting is disabled, ___ratelimit() does an immediate
early return with no state changes. This can result in false-positive
drops when re-enabling rate limiting. Therefore, mark the ratelimit_state
structure "uninitialized" when rate limiting is disabled.
[ paulmck: Apply Petr Mladek feedback. ]
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
If ->interval is zero, then rate-limiting will be disabled.
Alternatively, if interval is greater than zero and ->burst is zero,
then rate-limiting will be applied unconditionally. The point of this
distinction is to handle current users that pass zero-initialized
ratelimit_state structures to ___ratelimit(), and in such cases the
->lock field will be uninitialized. Acquiring ->lock in this case is
clearly not a strategy to win.
Therefore, make this classification be lockless.
Note that although negative ->interval and ->burst happen to be treated
as if they were zero, this is an accident of the current implementation.
The semantics of negative values for these fields is subject to change
without notice. Especially given that Bert Karwatzki determined that
no current calls to ___ratelimit() ever have negative values for these
fields.
This commit replaces an earlier buggy versions.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Reported-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: "Aithal, Srikanth" <sraithal@amd.com>
Closes: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/257c3b91-e30f-48be-9788-d27a4445a416@sirena.org.uk/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: "Aithal, Srikanth" <sraithal@amd.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
Retain the locked design, but check rate-limiting even when the lock
could not be acquired.
Link: https://lore.kernel.org/all/Z_VRo63o2UsVoxLG@pathway.suse.cz/
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
The ___ratelimit() function special-cases the jiffies-counter value of zero
as "uninitialized". This works well on 64-bit systems, where the jiffies
counter is not going to return to zero for more than half a billion years
on systems with HZ=1000, but similar 32-bit systems take less than 50 days
to wrap the jiffies counter. And although the consequences of wrapping the
jiffies counter seem to be limited to minor confusion on the duration of
the rate-limiting interval that happens to end at time zero, it is almost
no work to avoid this confusion.
Therefore, introduce a RATELIMIT_INITIALIZED bit to the ratelimit_state
structure's ->flags field so that a ->begin value of zero is no longer
special.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
The ___ratelimit() function simply returns zero ("do ratelimiting")
if the trylock fails, but does not adjust the ->missed field. This
means that the resulting dropped printk()s are dropped silently, which
could seriously confuse people trying to do console-log-based debugging.
Therefore, increment the ->missed field upon trylock failure.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
The ratelimit_state structure's ->missed field is sometimes incremented
locklessly, and it would be good to avoid lost counts. This is also
needed to count the number of misses due to trylock failure. Therefore,
convert the ratelimit_state structure's ->missed field to atomic_t.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|