Age | Commit message (Collapse) | Author |
|
A framer is a component in charge of an E1/T1 line interface.
Connected usually to a TDM bus, it converts TDM frames to/from E1/T1
frames. It also provides information related to the E1/T1 line.
The framer framework provides a set of APIs for the framer drivers
(framer provider) to create/destroy a framer and APIs for the framer
users (framer consumer) to obtain a reference to the framer, and
use the framer.
This basic implementation provides a framer abstraction for:
- power on/off the framer
- get the framer status (line state)
- be notified on framer status changes
- get/set the framer configuration
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231128132534.258459-2-herve.codina@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix various bugs / regressions for ext4, including a soft lockup, a
WARN_ON, and a BUG"
* tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: fix soft lockup in journal_finish_inode_data_buffers()
ext4: fix warning in ext4_dio_write_end_io()
jbd2: increase the journal IO's priority
jbd2: correct the printing of write_flags in jbd2_write_superblock()
ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
|
|
166 was skipped by mistake and two clocks:
* CLK_MOUT_CMU_HSI0_USBDPDGB
* CLK_GOUT_HSI0_USBDPDGB
Have an incorrect DGB ending instead of DBG.
This is an ABI break, but as the patch was only applied yesterday this
header has never been in an actual release so it seems better to fix
this early than ignore it.
Fixes: 0a910f160638 ("dt-bindings: clock: Add Google gs101 clock management unit bindings")
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: André Draszik <andre.draszik@linaro.org>
Link: https://lore.kernel.org/r/20231211162331.435900-7-peter.griffin@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
- Fix a couple of potential crashes, one introduced in 6.6 and one
in 5.10
- Fix misbehavior of virtiofs submounts on memory pressure
- Clarify naming in the uAPI for a recent feature
* tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
fuse: share lookup state between submount and its parent
docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
|
|
Xe, is a new driver for Intel GPUs that supports both integrated and
discrete platforms starting with Tiger Lake (first Intel Xe Architecture).
The code is at a stage where it is already functional and has experimental
support for multiple platforms starting from Tiger Lake, with initial
support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan
drivers), as well as in NEO (for OpenCL and Level0).
The new Xe driver leverages a lot from i915.
As for display, the intent is to share the display code with the i915
driver so that there is maximum reuse there. But it is not added
in this patch.
This initial work is a collaboration of many people and unfortunately
the big squashed patch won't fully honor the proper credits. But let's
get some git quick stats so we can at least try to preserve some of the
credits:
Co-developed-by: Matthew Brost <matthew.brost@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Co-developed-by: Francois Dugast <francois.dugast@intel.com>
Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Co-developed-by: Jani Nikula <jani.nikula@intel.com>
Co-developed-by: José Roberto de Souza <jose.souza@intel.com>
Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Co-developed-by: Dave Airlie <airlied@redhat.com>
Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
|
|
Running my yearly branch profiler to see where likely/unlikely annotation
may be added or removed, I discovered this:
correct incorrect % Function File Line
------- --------- - -------- ---- ----
0 457918 100 page_try_dup_anon_rmap rmap.h 264
[..]
458021 0 0 page_try_dup_anon_rmap rmap.h 265
I thought it was interesting that line 264 of rmap.h had a 100% incorrect
annotation, but the line directly below it was 100% correct. Looking at the
code:
if (likely(!is_device_private_page(page) &&
unlikely(page_needs_cow_for_dma(vma, page))))
It didn't make sense. The "likely()" was around the entire if statement
(not just the "!is_device_private_page(page)"), which also included the
"unlikely()" portion of that if condition.
If the unlikely portion is unlikely to be true, that would make the entire
if condition unlikely to be true, so it made no sense at all to say the
entire if condition is true.
What is more likely to be likely is just the first part of the if statement
before the && operation. It's likely to be a misplaced parenthesis. And
after making the if condition broken into a likely() && unlikely(), both
now appear to be correct!
Link: https://lkml.kernel.org/r/20231201145936.5ddfdb50@gandalf.local.home
Fixes:fb3d824d1a46c ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: cleanup and use more folio in page fault", v3.
Rename page_copy_prealloc() to folio_prealloc(), which is used by more
functions, also do more folio conversion in page fault.
This patch (of 5):
Since ksm only support normal page, no swapout/in for ksm large folio too,
add large folio check in ksm_might_need_to_copy(), also convert
page->index to folio->index as page->index is going away.
Then convert ksm_might_need_to_copy() to use more folio api to save nine
compound_head() calls, short 'address' to reduce max-line-length.
Link: https://lkml.kernel.org/r/20231118023232.1409103-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20231118023232.1409103-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/damon: let users feed and tame/auto-tune DAMOS".
Introduce Aim-oriented Feedback-driven DAMOS Aggressiveness Auto-tuning.
It makes DAMOS self-tuned with periodic simple user feedback.
Background: DAMOS Control Difficulty
====================================
DAMOS helps users easily implement access pattern aware system operations.
However, controlling DAMOS in the wild is not that easy.
The basic way for DAMOS control is specifying the target access pattern.
In this approach, the user is assumed to well understand the access
pattern and the characteristics of the system and the workloads. Though
there are useful tools for that, it takes time and effort depending on the
complexity and the dynamicity of the system and the workloads. After all,
the access pattern consists of three ranges, namely the size, the access
rate, and the age of the regions. It means users need to tune six
parameters, which is anyway not a simple task.
One of the worst cases would be DAMOS being too aggressive like a
berserker, and therefore consuming too much system resource and making
unwanted radical system operations. To let users avoid such cases, DAMOS
allows users to set the upper-limit of the schemes' aggressiveness, namely
DAMOS quota. DAMOS further provides its best-effort under the limit by
prioritizing regions based on the access pattern of the regions. For
example, users can ask DAMOS to page out up to 100 MiB of memory regions
per second. Then DAMOS pages out regions that are not accessed for a
longer time (colder) first under the limit. This allows users to set the
target access pattern a bit naive with wider ranges, and focus on tuning
only one parameter, the quota. In other words, the number of parameters
to tune can be reduced from six to one.
Still, however, the optimum value for the quota depends on the system and
the workloads' characteristics, so not that simple. The number of
parameters to tune can also increase again if the user needs to run
multiple schemes.
Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning
=============================================================
Users would use DAMOS since they want to achieve something with it. They
will likely have measurable metrics representing the achievement and the
target number of the metric like SLO, and continuously measure that
anyway. While the additional cost of getting the information is nearly
zero, it could be useful for DAMOS to understand how appropriate its
current aggressiveness is set, and adjust it on its own to make the metric
value more close to the target.
Based on this idea, we introduce a new way of tuning DAMOS with nearly
zero additional effort, namely Aim-oriented Feedback-driven DAMOS
Aggressiveness Auto Tuning. It asks users to provide feedback
representing how well DAMOS is doing relative to the users' aim. Then
DAMOS adjusts its aggressiveness, specifically the quota that provides
the best effort result under the limit, based on the current level of
the aggressiveness and the users' feedback.
Implementation
==============
The implementation asks users to represent the feedback with score
numbers. The scores could be anything including user-space specific
metrics including latency and throughput of special user-space workloads,
and system metrics including free memory ratio, memory pressure stall time
(PSI), and active to inactive LRU lists size ratio. The feedback scores
and the aggressiveness of the given DAMOS scheme are assumed to be
positively proportional, though. Selecting metrics of the assumption is
the users' responsibility.
The core logic uses the below simple feedback loop algorithm to calculate
the next aggressiveness level of the scheme from the current
aggressiveness level and the current feedback (target_score and
current_score). It calculates the compensation for next aggressiveness as
a proportion of current aggressiveness and distance to the target score.
As a result, it arrives at the near-goal state in a short time using big
steps when it's far from the goal, but avoids making unnecessarily radical
changes that could turn out to be a bad decision using small steps when
its near to the goal.
f(n) = max(1, f(n - 1) * ((target_score - current_score) / target_score + 1))
Note that the compensation value becomes negative when it's over
achieving the goal. That's why the feedback metric and the
aggressiveness of the scheme should be positively proportional. The
distance-adaptive speed manipulation is simply applied.
Example Use Cases
=================
If users want to reduce the memory footprint of the system as much as
possible as long as the time spent for handling the resulting memory
pressure is within a threshold, they could use DAMOS scheme that reclaims
cold memory regions aiming for a little level of memory pressure stall
time.
If users want the active/inactive LRU lists well balanced to reduce the
performance impact due to possible future memory pressure, they could use
two schemes. The first one would be set to locate hot pages in the active
LRU list, aiming for a specific active-to-inactive LRU list size ratio,
say, 70%. The second one would be to locate cold pages in the inactive
LRU list, aiming for a specific inactive-to-active LRU list size ratio,
say, 30%. Then, DAMOS will balance the two schemes based on the goal and
feedback.
This aim-oriented auto tuning could also be useful for general
balancing-required access aware system operations such as system memory
auto scaling[3] and tiered memory management[4]. These two example usages
are not what current DAMOS implementation is already supporting, but
require additional DAMOS action developments, though.
Evaluation: subtle memory pressure aiming proactive reclamation
===============================================================
To show if the implementation works as expected, we prepare four different
system configurations on AWS i3.metal instances. The first setup
(original) runs the workload without any DAMOS scheme. The second setup
(not-tuned) runs the workload with a virtual address space-based proactive
reclamation scheme that pages out memory regions that are not accessed for
five seconds or more. The third setup (offline-tuned) runs the same
proactive reclamation DAMOS scheme, but after making it tuned for each
workload offline, using our previous user-space driven automatic tuning
approach, namely DAMOOS[1]. The fourth and final setup (AFDAA) runs the
scheme that is the same as that of 'not-tuned' setup, but aims to keep
0.5% of 'some' memory pressure stall time (PSI) for the last 10 seconds
using the aiming-oriented auto tuning.
For each setup, we run realistic workloads from PARSEC3 and SPLASH-2X
benchmark suites. For each run, we measure RSS and runtime of the
workload, and 'some' memory pressure stall time (PSI) of the system. We
repeat the runs five times and use averaged measurements.
For simple comparison of the results, we normalize the measurements to
those of 'original'. In the case of the PSI, though, the measurement for
'original' was zero, so we normalize the value to that of 'not-tuned'
scheme's result. The normalized results are shown below.
Not-tuned Offline-tuned AFDAA
RSS 0.622688178226118 0.787950678944904 0.740093483278979
runtime 1.11767826657912 1.0564674983585 1.0910833880499
PSI 1 0.727521443794069 0.308498846350299
The 'not-tuned' scheme achieves about 38.7% memory saving but incur about
11.7% runtime slowdown. The 'offline-tuned' scheme achieves about 22.2%
memory saving with about 5.5% runtime slowdown. It also achieves about
28.2% memory pressure stall time saving. AFDAA achieves about 26% memory
saving with about 9.1% runtime slowdown. It also achieves about 69.1%
memory pressure stall time saving. We repeat this test multiple times,
and get consistent results. AFDAA is now integrated in our daily DAMON
performance test setup.
Apparently the aggressiveness of 'AFDAA' setup is somewhere between those
of 'not-tuned' and 'offline-tuned' setup, since its memory saving and
runtime overhead are between those of the other two setups. Actually we
set the memory pressure stall time goal aiming for this middle
aggressiveness. The difference in the two metrics are not significant,
though. However, it shows significant saving of the memory pressure stall
time, which was the goal of the auto-tuning, over the two variants.
Hence, we conclude the automatic tuning is working as expected.
Please note that the AFDAA setup is only for the evaluation, and
therefore intentionally set a bit aggressive. It might not be
appropriate for production environments.
The test code is also available[2], so you could reproduce it on your
system and workloads.
Patches Sequence
================
The first four patches implement the core logic and user interfaces for
the auto tuning. The first patch implements the core logic for the auto
tuning, and the API for DAMOS users in the kernel space. The second
patch implements basic file operations of DAMON sysfs directories and
files that will be used for setting the goals and providing the
feedback. The third patch connects the quota goals files inputs to the
DAMOS core logic. Finally the fourth patch implements a dedicated DAMOS
sysfs command for efficiently committing the quota goals feedback.
Two patches for simple tests of the logic and interfaces follow. The
fifth patch implements the core logic unit test. The sixth patch
implements a selftest for the DAMON Sysfs interface for the goals.
Finally, three patches for documentation follows. The seventh patch
documents the design of the feature. The eighth patch updates the API
doc for the new sysfs files. The final eighth patch updates the usage
document for the features.
References
==========
[1] DAOS paper:
https://www.amazon.science/publications/daos-data-access-aware-operating-system
[2] Evaluation code:
https://github.com/damonitor/damon-tests/commit/3f884e61193f0166b8724554b6d06b0c449a712d
[3] Memory auto scaling RFC idea:
https://lore.kernel.org/damon/20231112195114.61474-1-sj@kernel.org/
[4] DAMON-based tiered memory management RFC idea:
https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/
This patch (of 9)
Users can effectively control the upper-limit aggressiveness of DAMOS
schemes using the quota feature. The quota provides best result under the
limit by prioritizing regions based on the access pattern. That said,
finding the best value, which could depend on dynamic characteristics of
the system and the workloads, is still challenging.
Implement a simple feedback-driven tuning mechanism and use it for
automatic tuning of DAMOS quota. The implementation allows users to
provide the feedback by setting a feedback score returning callback
function. Then DAMOS periodically calls the function back and adjusts the
quota based on the return value of the callback and current quota value.
Note that the absolute-value based time/size quotas still work as the
maximum hard limits of the scheme's aggressiveness. The feedback-driven
auto-tuned quota is applied only if it is not exceeding the manually set
maximum limits. Same for the scheme-target access pattern and filters
like other features.
[sj@kernel.org: document get_score_arg field of struct damos_quota]
Link: https://lkml.kernel.org/r/20231204170106.60992-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231130023652.50284-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231130023652.50284-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, we only shrink the zswap pool when the user-defined limit is
hit. This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed ahead
of time, as this depends on the workload (specifically, on factors such as
memory access patterns and compressibility of the memory pages).
This patch implements a memcg- and NUMA-aware shrinker for zswap, that is
initiated when there is memory pressure. The shrinker does not have any
parameter that must be tuned by the user, and can be opted in or out on a
per-memcg basis.
Furthermore, to make it more robust for many workloads and prevent
overshrinking (i.e evicting warm pages that might be refaulted into
memory), we build in the following heuristics:
* Estimate the number of warm pages residing in zswap, and attempt to
protect this region of the zswap LRU.
* Scale the number of freeable objects by an estimate of the memory
saving factor. The better zswap compresses the data, the fewer pages
we will evict to swap (as we will otherwise incur IO for relatively
small memory saving).
* During reclaim, if the shrinker encounters a page that is also being
brought into memory, the shrinker will cautiously terminate its
shrinking action, as this is a sign that it is touching the warmer
region of the zswap LRU.
As a proof of concept, we ran the following synthetic benchmark: build the
linux kernel in a memory-limited cgroup, and allocate some cold data in
tmpfs to see if the shrinker could write them out and improved the overall
performance. Depending on the amount of cold data generated, we observe
from 14% to 35% reduction in kernel CPU time used in the kernel builds.
[nphamcs@gmail.com: check shrinker enablement early, use less costly stat flushing]
Link: https://lkml.kernel.org/r/20231206194456.3234203-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-7-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since zswap now writes back pages from memcg-specific LRUs, we now need a
new stat to show writebacks count for each memcg.
[nphamcs@gmail.com: rename ZSWP_WB to ZSWPWB]
Link: https://lkml.kernel.org/r/20231205193307.2432803-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-5-nphamcs@gmail.com
Suggested-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, we only have a single global LRU for zswap. This makes it
impossible to perform worload-specific shrinking - an memcg cannot
determine which pages in the pool it owns, and often ends up writing pages
from other memcgs. This issue has been previously observed in practice
and mitigated by simply disabling memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
This patch fully resolves the issue by replacing the global zswap LRU
with memcg- and NUMA-specific LRUs, and modify the reclaim logic:
a) When a store attempt hits an memcg limit, it now triggers a
synchronous reclaim attempt that, if successful, allows the new
hotter page to be accepted by zswap.
b) If the store attempt instead hits the global zswap limit, it will
trigger an asynchronous reclaim attempt, in which an memcg is
selected for reclaim in a round-robin-like fashion.
[nphamcs@gmail.com: use correct function for the onlineness check, use mem_cgroup_iter_break()]
Link: https://lkml.kernel.org/r/20231205195419.2563217-1-nphamcs@gmail.com
[nphamcs@gmail.com: drop the pool's reference at the end of the writeback step]
Link: https://lkml.kernel.org/r/20231206030627.4155634-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-4-nphamcs@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Co-developed-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This patch implements a helper function that try to get a reference to an
memcg's css, as well as checking if it is online. This new function is
almost exactly the same as the existing mem_cgroup_tryget(), except for
the onlineness check. In the !CONFIG_MEMCG case, it always returns true,
analogous to mem_cgroup_tryget(). This is useful for e.g to the new zswap
writeback scheme, where we need to select the next online memcg as a
candidate for the global limit reclaim.
Link: https://lkml.kernel.org/r/20231130194023.4102148-3-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "workload-specific and memory pressure-driven zswap
writeback", v8.
There are currently several issues with zswap writeback:
1. There is only a single global LRU for zswap, making it impossible to
perform worload-specific shrinking - an memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.
2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).
This patch series solves these issues by separating the global zswap LRU
into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e
memcg- and NUMA-aware) zswap writeback under memory pressure. The new
shrinker does not have any parameter that must be tuned by the user, and
can be opted in or out on a per-memcg basis.
As a proof of concept, we ran the following synthetic benchmark: build the
linux kernel in a memory-limited cgroup, and allocate some cold data in
tmpfs to see if the shrinker could write them out and improved the overall
performance. Depending on the amount of cold data generated, we observe
from 14% to 35% reduction in kernel CPU time used in the kernel builds.
This patch (of 6):
The interface of list_lru is based on the assumption that the list node
and the data it represents belong to the same allocated on the correct
node/memcg. While this assumption is valid for existing slab objects LRU
such as dentries and inodes, it is undocumented, and rather inflexible for
certain potential list_lru users (such as the upcoming zswap shrinker and
the THP shrinker). It has caused us a lot of issues during our
development.
This patch changes list_lru interface so that the caller must explicitly
specify numa node and memcg when adding and removing objects. The old
list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and
list_lru_del_obj(), respectively.
It also extends the list_lru API with a new function, list_lru_putback,
which undoes a previous list_lru_isolate call. Unlike list_lru_add, it
does not increment the LRU node count (as list_lru_isolate does not
decrement the node count). list_lru_putback also allows for explicit
memcg and NUMA node selection.
Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
ma_wr_state was previously tracking the end of the node for writing.
Since the implementation of the ma_state end tracking, this is duplicated
work. This patch removes the maple write state tracking of the end of the
node and uses the maple state end instead.
Link: https://lkml.kernel.org/r/20231101171629.3612299-11-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The maple tree node is overloaded to keep status as well as the active
node. This, unfortunately, results in a re-walk on underflow or overflow.
Since the maple state has room, the status can be placed in its own enum
in the structure. Once an underflow/overflow is detected, certain modes
can restore the status to active and others may need to re-walk just that
one node to see the entry.
The status being an enum has the benefit of detecting unhandled status in
switch statements.
[Liam.Howlett@oracle.com: fix comments about MAS_*]
Link: https://lkml.kernel.org/r/20231106154124.614247-1-Liam.Howlett@oracle.com
[Liam.Howlett@oracle.com: update forking to separate maple state and node]
Link: https://lkml.kernel.org/r/20231106154551.615042-1-Liam.Howlett@oracle.com
[Liam.Howlett@oracle.com: fix mas_prev() state separation code]
Link: https://lkml.kernel.org/r/20231207193319.4025462-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20231101171629.3612299-9-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Analysis of the mas_for_each() iteration showed that there is a
significant time spent finding the end of a node. This time can be
greatly reduced if the end of the node is cached in the maple state. Care
must be taken to update & invalidate as necessary.
Link: https://lkml.kernel.org/r/20231101171629.3612299-5-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
__mas_set_range() was created to shortcut resetting the maple state and a
debug check was added to the caller (the vma iterator) to ensure the
internal maple state remains safe to use. Move the debug check from the
vma iterator into the maple tree itself so other users do not incorrectly
use the advanced maple state modification.
Fallout from this change include a large amount of debug setup needed to
be moved to earlier in the header, and the maple_tree.h radix-tree test
code needed to move the inclusion of the header to after the atomic
define. None of those changes have functional changes.
Link: https://lkml.kernel.org/r/20231101171629.3612299-4-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In preparation for pre-content permission events with file access range,
move fsnotify_file_perm() hook out of security_file_permission() and into
the callers.
Callers that have the access range information call the new hook
fsnotify_file_area_perm() with the access range.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231212094440.250945-6-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
filesystem may be modified in the context of fanotify permission events
(e.g. by HSM service), so assert that sb freeze protection is not held.
If the assertion fails, then the following deadlock would be possible:
CPU0 CPU1 CPU2
-------------------------------------------------------------------------
file_start_write()#0
...
fsnotify_perm()
fanotify_get_response() => (read event and fill file)
...
... freeze_super()
... sb_wait_write()
...
vfs_write()
file_start_write()#1
This example demonstrates a use case of an hierarchical storage management
(HSM) service that uses fanotify permission events to fill the content of
a file before access, while a 3rd process starts fsfreeze.
This creates a circular dependeny:
file_start_write()#0 => fanotify_get_response =>
file_start_write()#1 =>
sb_wait_write() =>
file_end_write()#0
Where file_end_write()#0 can never be called and none of the threads can
make progress.
The assertion is checked for both MAY_READ and MAY_WRITE permission
hooks in preparation for a pre-modify permission event.
The assertion is not checked for an open permission event, because
do_open() takes mnt_want_write() in O_TRUNC case, meaning that it is not
safe to write to filesystem in the content of an open permission event.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231212094440.250945-5-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We would like to make changes to the fsnotify access permission hook -
add file range arguments and add the pre modify event.
In preparation for these changes, split the fsnotify_perm() hook into
fsnotify_open_perm() and fsnotify_file_perm().
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231212094440.250945-4-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
generic_copy_file_range() is just a wrapper around splice_file_range(),
which caps the maximum copy length.
The only caller of splice_file_range(), namely __ceph_copy_file_range()
is already ready to cope with short copy.
Move the length capping into splice_file_range() and replace the exported
symbol generic_copy_file_range() with a simple inline helper.
Suggested-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-fsdevel/20231204083849.GC32438@lst.de/
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231212094440.250945-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Not sure why some splice helpers return long, maybe historic reasons.
Change them all to return ssize_t to conform to the splice methods and
to the rest of the helpers.
Suggested-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20231208-horchen-helium-d3ec1535ede5@brauner/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231212094440.250945-2-amir73il@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
io_uring can currently open/close regular files or fixed/direct
descriptors. Or you can instantiate a fixed descriptor from a regular
one, and then close the regular descriptor. But you currently can't turn
a purely fixed/direct descriptor into a regular file descriptor.
IORING_OP_FIXED_FD_INSTALL adds support for installing a direct
descriptor into the normal file table, just like receiving a file
descriptor or opening a new file would do. This is all nicely abstracted
into receive_fd(), and hence adding support for this is truly trivial.
Since direct descriptors are only usable within io_uring itself, it can
be useful to turn them into real file descriptors if they ever need to
be accessed via normal syscalls. This can either be a transitory thing,
or just a permanent transition for a given direct descriptor.
By default, new fds are installed with O_CLOEXEC set. The application
can disable O_CLOEXEC by setting IORING_FIXED_FD_NO_CLOEXEC in the
sqe->install_fd_flags member.
Suggested-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
With io_uring_types.h we see all required definitions to inline
io_uring_cmd_get_task().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/aa8e317f09e651a5f3e72f8c0ad3902084c1f930.1701391955.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now as we can easily include io_uring_types.h, move IOU_F_TWQ_LAZY_WAKE
and inline io_uring_cmd_do_in_task_lazy().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/2ec9fb31dd192d1c5cf26d0a2dec5657d88a8e48.1701391955.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
linux/io_uring.h is slowly becoming a rubbish bin where we put
anything exposed to other subsystems. For instance, the task exit
hooks and io_uring cmd infra are completely orthogonal and don't need
each other's definitions. Start cleaning it up by splitting out all
command bits into a new header file.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.1701391955.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs into for-6.8/io_uring
Merge vfs.file from the VFS tree to avoid conflicts with receive_fd() now
having 3 arguments rather than just 2.
* 'vfs.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
file: remove __receive_fd()
file: stop exposing receive_fd_user()
fs: replace f_rcuhead with f_task_work
file: remove pointless wrapper
file: s/close_fd_get_file()/file_close_fd()/g
Improve __fget_files_rcu() code generation (and thus __fget_light())
file: massage cleanup of files that failed to open
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next
Suzuki writes:
coresight: Updates for Linux v6.8
Updates for the hwtracing subsystem includes :
- Support for CoreSight TPDM DSB set
- Support for tuning Cycle count Threshold for CoreSight ETM via perf
- Support for TRBE on ACPI based systems
- Support for choosing buffer mode in ETR for sysfs mode
- Improvements to HiSilicon PTT driver
- Cleanups to Ultrasoc SMB driver
- Cleanup .remove callback for various Coresight platform drivers
- Remove Leo Yan from Reviewers
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-next-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (32 commits)
coresight: ultrasoc-smb: Use guards to cleanup
coresight: ultrasoc-smb: Convert to platform remove callback returning void
coresight: trbe: Convert to platform remove callback returning void
coresight: replicator: Convert to platform remove callback returning void
coresight: funnel: Convert to platform remove callback returning void
coresight: etm4x: Convert to platform remove callback returning void
coresight: dummy: Convert to platform remove callback returning void
coresight: etm4x: Fix width of CCITMIN field
coresight-tpdm: Correct the property name of MSR number
hwtracing: hisi_ptt: Optimize the trace data committing
hwtracing: hisi_ptt: Disable interrupt after trace end
Documentation: ABI: coresight-tpdm: Fix Bit[3] description indentation
coresight-tpdm: Add nodes for dsb msr support
dt-bindings: arm: Add support for DSB MSR register
coresight-tpdm: Add nodes for timestamp request
coresight-tpdm: Add nodes to configure pattern match output
coresight-tpdm: Add nodes for dsb edge control
coresight-tpdm: Add node to set dsb programming mode
coresight-tpdm: Add nodes to set trigger timestamp and type
coresight-tpdm: Add reset node to TPDM node
...
|
|
Honestly, there's little value in having a helper with and without that
int __user *ufd argument. It's just messy and doesn't really give us
anything. Just expose receive_fd() with that argument and get rid of
that helper.
Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-5-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Not every subsystem needs to have their own specialized helper.
Just us the __receive_fd() helper.
Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-4-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The naming is actively misleading since we switched to
SLAB_TYPESAFE_BY_RCU. rcu_head is #define callback_head. Use
callback_head directly and rename f_rcuhead to f_task_work.
Add comments in there to explain what it's used for.
Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-3-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
That really shouldn't have "get" in there as that implies we're bumping
the reference count which we don't do at all. We used to but not anmore.
Now we're just closing the fd and pick that file from the fdtable
without bumping the reference count. Update the wrong documentation
while at it.
Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-1-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a
performance regression as reported by the kernel test robot.
The __fget_light() function is one of those critical ones for some
loads, and the code generation was unnecessarily impacted. Let's just
write that function to better.
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wiCJtLbFWNURB34b9a_R_unaH3CiMRXfkR0-iihB_z68A@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
events whose count meets a specified threshold condition. For example if
PMEVTYPERn.TC (Threshold Control) is set to 0b101 (Greater than or
equal, count), and the threshold is set to 2, then the PMU counter will
now only increment by 1 when an event would have previously incremented
the PMU counter by 2 or more on a single processor cycle.
Three new Perf event config fields, 'threshold', 'threshold_compare' and
'threshold_count' have been added to control the feature.
threshold_compare maps to the upper two bits of PMEVTYPERn.TC and
threshold_count maps to the first bit of TC. These separate attributes
have been picked rather than enumerating all the possible combinations
of the TC field as in the Arm ARM. The attributes would be used on a
Perf command line like this:
$ perf stat -e stall_slot/threshold=2,threshold_compare=2/
A new capability for reading out the maximum supported threshold value
has also been added:
$ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
0x000000ff
If a threshold higher than threshold_max is provided, then an error is
generated. If FEAT_PMUv3_TH isn't implemented or a 32 bit kernel is
running, then threshold_max reads zero, and attempting to set a
threshold value will also result in an error.
The threshold is per PMU counter, and there are potentially different
threshold_max values per PMU type on heterogeneous systems.
Bits higher than 32 now need to be written into PMEVTYPER, so
armv8pmu_write_evtype() has to be updated to take an unsigned long value
rather than u32 which gives the correct behavior on both aarch32 and 64.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20231211161331.1277825-11-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This mechanism makes it much easier to define and read new attributes
so move it to the arm_pmu.h header so that it can be shared. At the same
time update the existing format attributes to use it.
GENMASK has to be changed to GENMASK_ULL because the config fields are
64 bits even on arm32 where this will also be used now.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20231211161331.1277825-7-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
FEAT_PMUv3_TH (Armv8.8) adds two new fields to PMEVTYPER, so include
them in the mask. These aren't writable on 32 bit kernels as they are in
the high part of the register, so only include them for arm64.
It would be difficult to do this statically in the asm header files for
each platform without resulting in circular includes or #ifdefs inline
in the code. For that reason the ARMV8_PMU_EVTYPE_MASK definition has
been removed and the mask is constructed programmatically.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20231211161331.1277825-6-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Convert the remaining fields to use either GENMASK or be built from
other fields. These all already started at bit 0 so don't need a code
change for the lack of _SHIFT.
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20231211161331.1277825-5-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This is so that FIELD_GET and FIELD_PREP can be used and that the fields
are in a consistent format to arm64/tools/sysreg
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20231211161331.1277825-4-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This is so that FIELD_GET and FIELD_PREP can be used and that the fields
are in a consistent format to arm64/tools/sysreg
Signed-off-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20231211161331.1277825-3-james.clark@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
There may be cases where puncturing isn't possible, and
a connection needs to be downgraded. Add a hardware flag
to support this.
This is likely temporary: it seems we will need to move
puncturing to the chandef/channel context.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.c1e89ea55e93.I37b8ca0ee64d5d7699e351785a9010afc106da3c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add support for setting the TID to link mapping for a non-AP MLD
station.
This is useful in cases user space needs to restrict the possible
set of active links, e.g., since it got a BSS Transition Management
request forcing to use only a subset of the valid links etc.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.da4d56a5f3ff.Iacf88e943326bf9c169c49b728c4a3445fdedc97@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
With the locking rework, more functions need to be called
with the wiphy mutex held. Document that, and for that use
the "Context" description that shows up more nicely in the
generated documentation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.24fa44c7eeb4.I8c9e030ddd78e07c99dd21fe1d5156555390f92e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Sometimes there may be reasons for which a BSS that's
actually found in scan cannot be used to connect to,
for example a nonprimary link of an NSTR mobile AP MLD
cannot be used for normal direct connections to it.
Not indicating these to userspace as we do now of course
avoids being able to connect to them, but it's better if
they're shown to userspace and it can make an appropriate
decision, without e.g. doing an additional ML probe.
Thus add an indication of what a BSS can be used for,
currently "normal" and "MLD link", including a reason
bitmap for it being not usable.
The latter can be extended later for certain BSSes if there
are other reasons they cannot be used.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.0464f25e0b1d.I9f70ca9f1440565ad9a5207d0f4d00a20cca67e7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Current handling of del pmksa with SSID is limited to FILS
security. In the current change the del pmksa support is extended
to SAE/OWE security offloads as well. For OWE/SAE offloads, the
PMK is generated and cached at driver/FW, so user app needs the
capability to request cache deletion based on SSID for drivers
supporting SAE/OWE offload.
Signed-off-by: Vinayak Yadawad <vinayak.yadawad@broadcom.com>
Link: https://msgid.link/ecdae726459e0944c377a6a6f6cb2c34d2e057d0.1701262123.git.vinayak.yadawad@broadcom.com
[drop whitespace-damaged rdev_ops pointer completely, enabling tracing]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The newly added WBRF feature needs this interface for channel
width calculation.
Signed-off-by: Evan Quan <quanliangl@hotmail.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://msgid.link/20231211100630.2170152-4-Jun.Ma2@amd.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Introduce qmc_chan_{get,set}_ts_info() function to allow timeslots
modification at runtime.
The modification is provided using qmc_chan_set_ts_info() and will be
applied on next qmc_chan_start().
qmc_chan_set_ts_info() must be called with the channel rx and/or tx
stopped.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20231205152116.122512-18-herve.codina@bootlin.com
|
|
QMC child devices support is needed to avoid orphan DT nodes that use a
simple DT phandle to reference a QMC channel.
Allow to instantiate child devices and also extend the API to get the
qmc_chan using a child device.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Link: https://lore.kernel.org/r/20231205152116.122512-7-herve.codina@bootlin.com
|
|
In HDLC mode, some status flags related to the data read transfer can be
set by the hardware and need to be known by a QMC consumer for further
analysis.
Extend the API in order to provide these transfer status flags at the
read complete() call.
In TRANSPARENT mode, these flags have no meaning. Keep only one read
complete() API and update the consumers working in transparent mode.
In this case, the newly introduced flags parameter is simply unused.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20231205152116.122512-5-herve.codina@bootlin.com
|
|
A perfect driver would only call dev_iommu_priv_set() from its probe
callback. We've made it functionally correct to call it from the of_xlate
by adding a lock around that call.
lockdep assert that iommu_probe_device_lock is held to discourage misuse.
Exclude PPC kernels with CONFIG_FSL_PAMU turned on because FSL_PAMU uses a
global static for its priv and abuses priv for its domain.
Remove the pointless stores of NULL, all these are on paths where the core
code will free dev->iommu after the op returns.
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v2-16e4def25ebb+820-iommu_fwspec_p1_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Nothing needs this pointer. Return a normal error code with the usual
IOMMU semantic that ENODEV means 'there is no IOMMU driver'.
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Rob Herring <robh@kernel.org>
Tested-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v2-16e4def25ebb+820-iommu_fwspec_p1_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|