Age | Commit message (Collapse) | Author |
|
Christian Brauner <brauner@kernel.org> says:
Currently overlayfs only allows specifying layers through path names.
This is inconvenient for users such as systemd that want to assemble an
overlayfs mount purely based on file descriptors.
When porting overlayfs to the new mount api I already mentioned this.
This enables users to specify both:
fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+", NULL, fd_work);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2);
in addition to:
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+", "/work", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0);
The selftests contain an example for this.
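For orientation, the fd-based calls above fit into the new mount API roughly as sketched below. This is a minimal sketch, not the selftest itself: the helper name mount_overlay_by_fd() and the sys_*() wrappers are made up for illustration, and it assumes a kernel that accepts FSCONFIG_SET_FD for the overlayfs layer options plus a caller with sufficient privileges.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrappers: older C libraries may not export these system calls. */
static int sys_fsopen(const char *fs, unsigned int flags)
{
	return (int)syscall(SYS_fsopen, fs, flags);
}

static int sys_fsconfig(int fd, unsigned int cmd, const char *key,
			const void *value, int aux)
{
	return (int)syscall(SYS_fsconfig, fd, cmd, key, value, aux);
}

static int sys_fsmount(int fd, unsigned int flags, unsigned int attr_flags)
{
	return (int)syscall(SYS_fsmount, fd, flags, attr_flags);
}

static int sys_move_mount(int from_fd, const char *from, int to_fd,
			  const char *to, unsigned int flags)
{
	return (int)syscall(SYS_move_mount, from_fd, from, to_fd, to, flags);
}

/*
 * Hypothetical helper: assemble an overlayfs mount purely from directory
 * file descriptors and attach it at 'target'.
 * Returns 0 on success, -1 on failure.
 */
static int mount_overlay_by_fd(int fd_lower, int fd_upper, int fd_work,
			       const char *target)
{
	int fd_mnt, ret = -1;
	int fd_fs = sys_fsopen("overlay", FSOPEN_CLOEXEC);

	if (fd_fs < 0)
		return -1;

	/* Each layer is passed as an fd in the 'aux' argument. */
	if (sys_fsconfig(fd_fs, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper) ||
	    sys_fsconfig(fd_fs, FSCONFIG_SET_FD, "workdir+", NULL, fd_work) ||
	    sys_fsconfig(fd_fs, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower) ||
	    sys_fsconfig(fd_fs, FSCONFIG_CMD_CREATE, NULL, NULL, 0))
		goto out;

	fd_mnt = sys_fsmount(fd_fs, FSMOUNT_CLOEXEC, 0);
	if (fd_mnt < 0)
		goto out;

	/* Attach the detached mount at the requested path. */
	ret = sys_move_mount(fd_mnt, "", AT_FDCWD, target,
			     MOVE_MOUNT_F_EMPTY_PATH);
	close(fd_mnt);
out:
	close(fd_fs);
	return ret;
}
```

The layer descriptors would typically come from open() with O_PATH | O_DIRECTORY; any failing step makes the helper return -1.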
* patches from https://lore.kernel.org/r/20241014-work-overlayfs-v3-0-32b3fed1286e@kernel.org:
selftests: add overlayfs fd mounting selftests
selftests: use shared header
Documentation,ovl: document new file descriptor based layers
ovl: specify layers via file descriptors
fs: add helper to use mount option as path or fd
Link: https://lore.kernel.org/r/20241014-work-overlayfs-v3-0-32b3fed1286e@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow filesystems to use a mount option either as a
file descriptor or as a path.
Link: https://lore.kernel.org/r/20241014-work-overlayfs-v3-1-32b3fed1286e@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
introduced a per-mm/cpu current concurrency id (mm_cid), which keeps
a reference to the concurrency id allocated for each CPU. This reference
expires shortly after a 100ms delay.
These per-CPU references keep the per-mm-cid data cache-local in
situations where threads are running at least once on each CPU within
each 100ms window, thus keeping the per-cpu reference alive.
However, intermittent workloads behaving in bursts spaced by more than
100ms on each CPU exhibit bad cache locality and degraded performance
compared to purely per-cpu data indexing, because concurrency IDs are
allocated over various CPUs and cores, therefore losing cache locality
of the associated data.
Introduce the following changes to improve per-mm-cid cache locality:
- Add a "recent_cid" field to the per-mm/cpu mm_cid structure to keep
track of which mm_cid value was last used, and use it as a hint to
attempt re-allocating the same concurrency ID the next time this
mm/cpu needs to allocate a concurrency ID,
- Add a per-mm CPUs allowed mask, which keeps track of the union of
CPUs allowed for all threads belonging to this mm. This cpumask is
only set during the lifetime of the mm, never cleared, so it
represents the union of all the CPUs allowed since the beginning of
the mm lifetime (note that the mm_cpumask() is really arch-specific
and tailored to the TLB flush needs, and is thus _not_ a viable
approach for this),
- Add a per-mm nr_cpus_allowed to keep track of the weight of the
per-mm CPUs allowed mask (for fast access),
- Add a per-mm max_nr_cid to keep track of the highest number of
concurrency IDs allocated for the mm. This is used for expanding the
concurrency ID allocation within the upper bound defined by:
min(mm->nr_cpus_allowed, mm->mm_users)
When the next unused CID value reaches this threshold, stop trying
to expand the cid allocation and use the first available cid value
instead.
Spreading allocation to use all the cid values within the range
[ 0, min(mm->nr_cpus_allowed, mm->mm_users) - 1 ]
improves cache locality while preserving mm_cid compactness within the
expected user limits,
- In __mm_cid_try_get, only return cid values within the range
[ 0, mm->nr_cpus_allowed ] rather than [ 0, nr_cpu_ids ]. This
prevents allocating cids above the number of allowed cpus in
rare scenarios where cid allocation races with a concurrent
remote-clear of the per-mm/cpu cid. This improvement is made
possible by the addition of the per-mm CPUs allowed mask,
- In sched_mm_cid_migrate_to, use mm->nr_cpus_allowed rather than
t->nr_cpus_allowed. This criterion was really meant to compare
the number of mm->mm_users to the number of CPUs allowed for the
entire mm. Therefore, the prior comparison worked fine when all
threads shared the same CPUs allowed mask, but not so much in
scenarios where those threads have different masks (e.g. each
thread pinned to a single CPU). This improvement is made
possible by the addition of the per-mm CPUs allowed mask.
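The allocation policy described in the list above can be modelled in user space roughly as follows. This is a simplified, single-threaded sketch under stated assumptions: struct mm_model, cid_get() and cid_put() are hypothetical names, a plain bool array stands in for the cid mask, and none of the atomics or per-CPU handling of the real scheduler code is reproduced.

```c
#include <stdbool.h>

#define MAX_CIDS 64 /* arbitrary limit of this model; nr_cpus_allowed must not exceed it */

struct mm_model {
	bool used[MAX_CIDS]; /* which concurrency IDs are currently allocated */
	int nr_cpus_allowed; /* weight of the per-mm CPUs allowed mask */
	int mm_users;        /* number of users (threads) of the mm */
	int max_nr_cid;      /* highest number of cids handed out so far */
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Allocate a cid for one mm/cpu pair; recent_cid < 0 means "no hint". */
static int cid_get(struct mm_model *mm, int recent_cid)
{
	int limit = min_int(mm->nr_cpus_allowed, mm->mm_users);
	int cid;

	/* 1. Try to re-use the cid this mm/cpu used last (cache locality). */
	if (recent_cid >= 0 && recent_cid < mm->max_nr_cid &&
	    !mm->used[recent_cid]) {
		mm->used[recent_cid] = true;
		return recent_cid;
	}

	/* 2. Expand the allocation while below min(nr_cpus_allowed, mm_users). */
	while (mm->max_nr_cid < limit) {
		cid = mm->max_nr_cid++;
		if (!mm->used[cid]) {
			mm->used[cid] = true;
			return cid;
		}
	}

	/* 3. Threshold reached: fall back to the first available cid. */
	for (cid = 0; cid < mm->nr_cpus_allowed; cid++) {
		if (!mm->used[cid]) {
			mm->used[cid] = true;
			return cid;
		}
	}
	return -1; /* nothing free within [0, nr_cpus_allowed) */
}

static void cid_put(struct mm_model *mm, int cid)
{
	mm->used[cid] = false;
}
```

In this model a CPU that hands its cid back and later asks again with the same hint gets the same cid, as long as nobody else took it in between, which is the behaviour the recent_cid field is meant to encourage.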
* Benchmarks
Each thread increments 16kB worth of 8-bit integers in bursts, with
a configurable delay between each thread's execution. The threads run
one after the other (no threads run concurrently). The order of
thread execution in the sequence is random. The thread execution
sequence begins again after all threads have executed. The 16kB areas
are allocated with rseq_mempool and indexed by either cpu_id, mm_cid
(not cache-local), or cache-local mm_cid. Each thread is pinned to its
own core.
Testing configurations:
8-core/1-L3: Use 8 cores within a single L3
24-core/24-L3: Use 24 cores, 1 core per L3
192-core/24-L3: Use 192 cores (all cores in the system)
384-thread/24-L3: Use 384 HW threads (all HW threads in the system)
Intermittent workload delays between threads: 200ms, 10ms.
Hardware:
CPU(s): 384
On-line CPU(s) list: 0-383
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9654 96-Core Processor
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 2
Caches (sum of all):
L1d: 6 MiB (192 instances)
L1i: 6 MiB (192 instances)
L2: 192 MiB (192 instances)
L3: 768 MiB (24 instances)
Each result is an average of 5 test runs. The cache-local speedup
is calculated as: (mm_cid) / (cache-local mm_cid).
Intermittent workload delay: 200ms
                     per-cpu     mm_cid    cache-local mm_cid    cache-local speedup
                        (ns)       (ns)                  (ns)
8-core/1-L3             1374      19289                  1336                  14.4x
24-core/24-L3           2423      26721                  1594                  16.7x
192-core/24-L3          2291      15826                  2153                   7.3x
384-thread/24-L3        1874      13234                  1907                   6.9x
Intermittent workload delay: 10ms
                     per-cpu     mm_cid    cache-local mm_cid    cache-local speedup
                        (ns)       (ns)                  (ns)
8-core/1-L3              662        756                   686                   1.1x
24-core/24-L3           1378       3648                  1035                   3.5x
192-core/24-L3          1439      10833                  1482                   7.3x
384-thread/24-L3        1503      10570                  1556                   6.8x
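The speedup columns above are consistent with dividing the mm_cid time by the cache-local mm_cid time. A small sketch of that arithmetic, with a hypothetical helper name:

```c
/*
 * Ratio of the time measured with plain mm_cid indexing to the time
 * measured with cache-local mm_cid indexing (both in ns).
 */
static double cache_local_speedup(double mm_cid_ns, double cache_local_ns)
{
	return mm_cid_ns / cache_local_ns;
}
```

For the 8-core/1-L3 row at 200ms this gives 19289 / 1336, which rounds to the 14.4x reported in the table.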
[ This deprecates the prior "sched: NUMA-aware per-memory-map concurrency IDs"
patch series with a simpler and more general approach. ]
[ This patch applies on top of v6.12-rc1. ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/lkml/20240823185946.418340-1-mathieu.desnoyers@efficios.com/
|
|
Sync with sched/urgent to avoid conflicts.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
'struct memstick_device_id' is not modified in these drivers.
Constifying this structure moves some data to a read-only section, so
increases overall security.
Update memstick_dev_match(), memstick_bus_match() and struct
memstick_driver accordingly.
On x86_64, with allmodconfig, as an example:
Before:
======
   text    data     bss     dec     hex filename
  74055    3455      88   77598   12f1e drivers/memstick/core/ms_block.o
After:
=====
   text    data     bss     dec     hex filename
  74087    3423      88   77598   12f1e drivers/memstick/core/ms_block.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/6509d6f6ed64193f04e747a98ccea7492c976ca8.1727540434.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Add UHS-II specific data structures for commands and defines for
registers, as described in Part 1 UHS-II Addendum Version 1.01.
UHS-II related definitions are listed below:
1. UHS-II card capability: sd_uhs2_caps{}
2. UHS-II configuration: sd_uhs2_config{}
3. UHS-II register I/O address and register field definitions: sd_uhs2.h
Signed-off-by: Jason Lai <jason.lai@genesyslogic.com.tw>
Signed-off-by: Victor Shih <victor.shih@genesyslogic.com.tw>
Link: https://lore.kernel.org/r/20240913102836.6144-6-victorshihgli@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
To allow an additional external regulator to be controlled by an mmc host
driver, let's add support for a vqmmc2 regulator to the mmc core.
For an SD UHS-II interface the vqmmc2 regulator may correspond to the so
called vdd2 supply, as described by the SD spec. Initially, only 1.8V is
needed, hence limit the new helper function, mmc_regulator_set_vqmmc2() to
this too.
Note that, to allow for flexibility mmc host drivers need to manage the
enable/disable of the vqmmc2 regulator themselves, while the regulator is
looked up through the common mmc_regulator_get_supply().
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20240913102836.6144-5-victorshihgli@gmail.com
|
|
To inform the users about SD UHS-II cards, let's extend the print at card
insertion with a "UHS-II" substring. Within this change, it seems
reasonable to convert from using "ultra high speed" into "UHS-I speed", for
the UHS-I type, as it should make it clearer.
Note that the new print for UHS-II cards doesn't include the actual
selected speed mode. Instead, this is going to be added in a subsequent
change.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20240913102836.6144-4-victorshihgli@gmail.com
|
|
The SD UHS-II interface was introduced in SD spec v4.00 several years
ago. The interface is fundamentally different, from both an electrical and
a protocol point of view, compared to the legacy SD interface.
However, the legacy SD protocol is supported through a specific transport
layer (SD-TRAN) defined in the UHS-II addendum of the spec. This allows the
SD card to be managed in a very similar way as a legacy SD card, hence a
lot of code can be re-used to support these new types of cards through the
mmc subsystem.
Moreover, an SD card that supports the UHS-II interface shall also be
backwards compatible with the legacy SD interface, which allows a UHS-II
card to be inserted into a legacy slot. As a matter of fact, this is
already supported by the mmc subsystem as of today.
To prepare to add support for UHS-II, this change puts the basic foundation
in the mmc core in place, allowing it to be more easily reviewed before
subsequent changes implement the actual support.
Basically, the approach here adds a new UHS-II bus_ops type and adds a
separate initialization path for the UHS-II card. The intent is to avoid
sprinkling UHS-II specifics across the legacy initialization path, but also
to simplify the implementation of the UHS-II specific bits.
At this point, there is only one new host ops added to manage the various
ios settings needed for UHS-II. Additional host ops that are needed are
added in subsequent changes.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20240913102836.6144-3-victorshihgli@gmail.com
|
|
For open-ended read/write - just send CMD22 before issuing the command.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20241006051148.160278-5-avri.altman@wdc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
SDUC memory addressing spans beyond 2TB and up to 128TB. Therefore, 38
bits are required to access the entire memory space of all sectors.
Those extra 6 bits are to be carried by CMD22 prior to sending
read/write/erase commands: CMD17, CMD18, CMD24, CMD25, CMD32, and CMD33.
CMD22 will carry the higher order 6 bits, and must precede any of the
above commands even if it targets a sector < 2TB.
No error related to address or length is indicated in CMD22 but rather
in the read/write command itself.
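The address split described above can be sketched as follows. This is only an illustration: struct sduc_addr and sduc_split_addr() are hypothetical names, and the sketch assumes sector (512-byte) addressing, with the lower 32 bits travelling in the read/write/erase command argument.

```c
#include <stdint.h>

/* Hypothetical container for the two halves of a 38-bit sector address. */
struct sduc_addr {
	uint32_t cmd_arg;  /* lower 32 bits, argument of CMD17/18/24/25/32/33 */
	uint8_t ext_addr;  /* upper 6 bits, carried by CMD22 */
};

static struct sduc_addr sduc_split_addr(uint64_t sector)
{
	struct sduc_addr a;

	a.cmd_arg = (uint32_t)(sector & 0xffffffffu);
	a.ext_addr = (uint8_t)((sector >> 32) & 0x3f);
	return a;
}
```

For a sector below the 2TB boundary ext_addr is simply 0, but per the text CMD22 is still sent before the command.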
Tested-by: Ricky WU <ricky_wu@realtek.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20241006051148.160278-3-avri.altman@wdc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Ultra Capacity SD cards (SDUC) were introduced in SD 7.0. Those
cards support capacities larger than 2TB, up to and including 128TB.
ACMD41 was extended to support the host-card handshake during
initialization. The card expects the HCS & HO2T bits to be set in
the command argument, and sets the applicable bits in the returned R3
response. Conversely, if an SDUC card is inserted into a
non-supporting host, it will never respond to this ACMD41 until
eventually the host times out and gives up.
Also, add SD CSD version 3.0 - designated for SDUC, and properly parse
the csd register as the c_size field got expanded to 28 bits.
Do not enable SDUC for now - leave it to the last patch in the series.
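As a rough plausibility check of the 28-bit c_size field, the sketch below computes the capacity in 512-byte sectors. It assumes that the capacity formula of the previous CSD version, (C_SIZE + 1) * 512 KiB, carries over unchanged to CSD version 3.0; the helper name is hypothetical.

```c
#include <stdint.h>

#define SDUC_C_SIZE_BITS 28

/*
 * Capacity in 512-byte sectors, assuming c_size counts units of 512 KiB
 * (1024 sectors) minus one.
 */
static uint64_t sduc_capacity_sectors(uint32_t c_size)
{
	c_size &= (1u << SDUC_C_SIZE_BITS) - 1;
	return ((uint64_t)c_size + 1) << 10;
}
```

Under that assumption the largest c_size yields 2^38 sectors, i.e. 2^47 bytes, which matches the 128TB upper bound mentioned above.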
Tested-by: Ricky WU <ricky_wu@realtek.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20241006051148.160278-2-avri.altman@wdc.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
GIGASTONE Gaming Plus microSD cards manufactured on 02/2022 report that
they support poweroff notification and cache, but they are not working
correctly.
Flush Cache bit never gets cleared in sd_flush_cache() and Poweroff
Notification Ready bit also never gets set to 1 within 1 second from the
end of busy of CMD49 in sd_poweroff_notify().
This leads to I/O error and runtime PM error state.
I observed that the same card manufactured on 01/2024 works as expected.
This problem seems similar to the Kingston cards fixed with
commit c467c8f08185 ("mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston
Canvas Go Plus from 11/2019") and should be handled using quirks.
CID for the problematic card is here.
12345641535443002000000145016200
Manufacturer ID is 0x12 and is defined as CID_MANFID_GIGASTONE for now,
but I would like comments on what naming is appropriate, because the MID
list is not public and I am not sure it is right.
Signed-off-by: Keita Aihara <keita.aihara@sony.com>
Link: https://lore.kernel.org/r/20240913094417.GA4191647@sony.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Sean noted that ever since commit 152e11f6df29 ("sched/fair: Implement
delayed dequeue") KVM's preemption notifiers have started
mis-classifying preemption vs blocking.
Notably p->on_rq is no longer sufficient to determine if a task is
runnable or blocked -- the aforementioned commit introduces tasks that
remain on the runqueue even though they will not run again, and
should be considered blocked for many cases.
Add the task_is_runnable() helper to classify things and audit all
external users of the p->on_rq state. Also add a few comments.
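The classification can be pictured with a small user-space model. This is a sketch under assumptions: struct task_model is not the kernel's task_struct, and the sched_delayed field name is an assumption modelled on the delayed-dequeue state described above.

```c
#include <stdbool.h>

/* Minimal stand-in for the scheduler state involved; not the kernel layout. */
struct task_model {
	int on_rq;          /* non-zero while the task sits on a runqueue */
	bool sched_delayed; /* dequeue was delayed: still queued, will not run */
};

/* A task counts as runnable only if it is queued and not delayed-dequeued. */
static bool task_is_runnable(const struct task_model *p)
{
	return p->on_rq && !p->sched_delayed;
}
```

A caller that previously tested p->on_rq alone would treat the delayed-dequeue case as "preempted", whereas this helper reports it as blocked.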
Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20241010091843.GK33184@noisy.programming.kicks-ass.net
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into gpio/for-next
Linux 6.12-rc3
|
|
auxiliary_find_device has been unused since commit
1c5de097bea3 ("net/mlx5: Fix mlx5_get_next_dev() peer device matching")
which was the only use since it was originally added.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20240929141112.69824-1-linux@treblig.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Remove the macro list_for_each_reverse for the following reasons:
- it is the same as list_for_each_prev.
- it is not used in the current kernel tree.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20240917-fix_list-v2-1-d2914665e89f@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for the UART auxiliary devices. This enables access to up to
3 different UARTs, which are implemented in the FPGA.
Signed-off-by: Gerhard Engleder <eg@keba.com>
Link: https://lore.kernel.org/r/20241011191257.19702-9-gerhard@engleder-embedded.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for the battery auxiliary device. This enables monitoring of
the battery.
Signed-off-by: Gerhard Engleder <eg@keba.com>
Link: https://lore.kernel.org/r/20241011191257.19702-8-gerhard@engleder-embedded.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for the fan auxiliary device. This enables monitoring of the
fan.
Signed-off-by: Gerhard Engleder <eg@keba.com>
Link: https://lore.kernel.org/r/20241011191257.19702-7-gerhard@engleder-embedded.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for the SPI controller auxiliary device. This enables access
to the SPI flash of the FPGA and some other SPI devices.
The actual list of SPI devices is detected by reading some bits out of
the previously registered I2C EEPROM.
Signed-off-by: Gerhard Engleder <eg@keba.com>
Link: https://lore.kernel.org/r/20241011191257.19702-4-gerhard@engleder-embedded.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Introduce the function pskb_network_may_pull_reason() and make
pskb_network_may_pull() a simple inline call to it. Its drop reasons
simply come from pskb_may_pull_reason().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pwm: Support for duty_offset
Support a new abstraction for pwm configuration that allows specifying
the time between the start of the period and the rising edge of the signal
("duty offset").
This is used in a patch series by Trevor Gamblin for triggering an ADC
conversion and afterwards reading out the result. See
https://lore.kernel.org/linux-iio/20240909-ad7625_r1-v5-0-60a397768b25@baylibre.com/
for more details.
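To make the "duty offset" notion concrete, here is a small model of such a waveform. It is a sketch only: struct waveform_model and waveform_is_active() are hypothetical names, and the model assumes a non-zero period, an offset smaller than the period, and an active phase that may wrap into the next period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a periodic signal, all values in ns. */
struct waveform_model {
	uint64_t period_length_ns; /* length of one period (must be > 0) */
	uint64_t duty_length_ns;   /* how long the signal stays active */
	uint64_t duty_offset_ns;   /* delay from period start to rising edge */
};

/* True if the output is active t_ns after the start of some period. */
static bool waveform_is_active(const struct waveform_model *wf, uint64_t t_ns)
{
	uint64_t phase = t_ns % wf->period_length_ns;
	/* Time elapsed since the most recent rising edge, modulo the period. */
	uint64_t since_edge = (phase + wf->period_length_ns -
			       wf->duty_offset_ns) % wf->period_length_ns;

	return since_edge < wf->duty_length_ns;
}
```

With a period of 1000ns, a duty length of 300ns and an offset of 200ns, the signal is inactive at the start of the period, rises at 200ns and falls again at 500ns.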
|
|
Simple type conversion with no functional change implied.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20241010181535.3083262-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Linux 6.12-rc2
Resolved movement of asm/unaligned.h to linux/unaligned.h
|
|
Fix the build warnings when CONFIG_FSL_ENETC_MDIO is not enabled.
The detailed warnings are shown as follows.
include/linux/fsl/enetc_mdio.h:62:18: warning: no previous prototype for function 'enetc_hw_alloc' [-Wmissing-prototypes]
62 | struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs)
| ^
include/linux/fsl/enetc_mdio.h:62:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
62 | struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs)
| ^
| static
8 warnings generated.
Fixes: 6517798dd343 ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410102136.jQHZOcS4-lkp@intel.com/
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241011030103.392362-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull NFS client fixes from Anna Schumaker:
"Localio Bugfixes:
- remove duplicated include in localio.c
- fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
- fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
- fix nfsd_file tracepoints to handle NULL rqstp pointers
Other Bugfixes:
- fix program selection loop in svc_process_common
- fix integer overflow in decode_rc_list()
- prevent NULL-pointer dereference in nfs42_complete_copies()
- fix CB_RECALL performance issues when using a large number of
delegations"
* tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: remove revoked delegation from server's delegation list
nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
SUNRPC: Fix integer overflow in decode_rc_list()
sunrpc: fix prog selection loop in svc_process_common
nfs: Remove duplicated include in localio.c
|
|
Remove the scaffold member from the lsm_prop. Remove the
remaining places it is being set.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Create a new LSM hook security_cred_getlsmprop() which, like
security_cred_getsecid(), fetches LSM specific attributes from the
cred structure. The associated data elements in the audit sub-system
are changed from a secid to a lsm_prop to accommodate multiple possible
LSM audit users.
Cc: linux-integrity@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: selinux@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Change the security_inode_getsecid() interface to fill in a
lsm_prop structure instead of a u32 secid. This allows for its
callers to gather data from all registered LSMs. Data is provided
for IMA and audit. Change the name to security_inode_getlsmprop().
Cc: linux-integrity@vger.kernel.org
Cc: selinux@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Change the security_current_getsecid_subj() and
security_task_getsecid_obj() interfaces to fill in a lsm_prop structure
instead of a u32 secid. Audit interfaces will need to collect all
possible security data for possible reporting.
Cc: linux-integrity@vger.kernel.org
Cc: audit@vger.kernel.org
Cc: selinux@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
There may be more than one LSM that provides IPC data for auditing.
Change security_ipc_getsecid() to fill in a lsm_prop structure instead
of the u32 secid. Change the name to security_ipc_getlsmprop() to
reflect the change.
Cc: audit@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Replace the secid value stored in struct audit_context with a struct
lsm_prop. Change the code that uses this value to accommodate the
change. security_audit_rule_match() expects a lsm_prop, so existing
scaffolding can be removed. A call to security_secid_to_secctx()
is changed to security_lsmprop_to_secctx(). The call to
security_ipc_getsecid() is scaffolded.
A new function lsmprop_is_set() is introduced to identify whether
an lsm_prop contains a non-zero value.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject line tweak, fix lsmprop_is_set() typo]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Add a new hook security_lsmprop_to_secctx() and its LSM specific
implementations. The LSM specific code will use the lsm_prop element
allocated for that module. This allows for the possibility that more
than one module may be called upon to translate a secid to a string,
as can occur in the audit code.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Change the secid parameter of security_audit_rule_match
to a lsm_prop structure pointer. Pass the entry from the
lsm_prop structure for the appropriate slot to the LSM hook.
Change the users of security_audit_rule_match to use the
lsm_prop instead of a u32. The scaffolding function lsmprop_init()
fills the structure with the value of the old secid, ensuring that
it is available to the appropriate module hook. The sources of
the secid, security_task_getsecid() and security_inode_getsecid(),
will be converted to use the lsm_prop structure later in the series.
At that point the use of lsmprop_init() is dropped.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
When more than one security module is exporting data to audit and
networking sub-systems a single 32 bit integer is no longer
sufficient to represent the data. Add a structure to be used instead.
The lsm_prop structure definition is intended to keep the LSM
specific information private to the individual security modules.
The module specific information is included in a new set of
header files under include/lsm. Each security module is allowed
to define the information included for its use in the lsm_prop.
SELinux includes a u32 secid. Smack includes a pointer into its
global label list. The conditional compilation based on feature
inclusion is contained in the include/lsm files.
Cc: apparmor@lists.ubuntu.com
Cc: bpf@vger.kernel.org
Cc: selinux@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
[PM: added include/linux/lsm/ to MAINTAINERS, subj tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Fix a typo in comments: wether -> whether.
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Link: https://lore.kernel.org/r/20241010091355.8271-1-algonell@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
With KASAN and PREEMPT_RT enabled, calling task_work_add() in
task_tick_mm_cid() may cause the following splat.
[ 63.696416] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 63.696416] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 610, name: modprobe
[ 63.696416] preempt_count: 10001, expected: 0
[ 63.696416] RCU nest depth: 1, expected: 1
This problem is caused by the following call trace.
sched_tick() [ acquire rq->__lock ]
-> task_tick_mm_cid()
-> task_work_add()
-> __kasan_record_aux_stack()
-> kasan_save_stack()
-> stack_depot_save_flags()
-> alloc_pages_mpol_noprof()
-> __alloc_pages_noprof()
-> get_page_from_freelist()
-> rmqueue()
-> rmqueue_pcplist()
-> __rmqueue_pcplist()
-> rmqueue_bulk()
-> rt_spin_lock()
The rq lock is a raw_spinlock_t. We can't sleep while holding
it. IOW, we can't call alloc_pages() in stack_depot_save_flags().
The task_tick_mm_cid() function with its task_work_add() call was
introduced by commit 223baf9d17f2 ("sched: Fix performance regression
introduced by mm_cid") in v6.4 kernel.
Fortunately, there is a kasan_record_aux_stack_noalloc() variant that
calls stack_depot_save_flags() while not allowing it to allocate
new pages. To allow task_tick_mm_cid() to use task_work without
page allocation, a new TWAF_NO_ALLOC flag is added to enable calling
kasan_record_aux_stack_noalloc() instead of kasan_record_aux_stack()
if set. The task_tick_mm_cid() function is modified to add this new flag.
The possible downside is a missing stack trace in a KASAN report when a
new page allocation would have been required while task_work_add() is
called with TWAF_NO_ALLOC, which should be rare.
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241010014432.194742-1-longman@redhat.com
|
|
Add a quirk similar to eeprom_93xx46 to add an extra clock cycle before
reading data from the EEPROM.
The 93Cx6 family of EEPROMs outputs a "dummy 0 bit" between the writing
of the op-code/address from the host to the EEPROM and the reading of
the actual data from the EEPROM.
More info can be found on page 6 of the AT93C46 datasheet (linked below).
Similar notes are found in other 93xx6 datasheets.
In summary the read operation for a 93Cx6 EEPROM is:
Write to EEPROM: 110[A5-A0] (9 bits)
Read from EEPROM: 0[D15-D0] (17 bits)
Where:
110 is the start bit and READ OpCode
[A5-A0] is the address to read from
0 is a "dummy bit" preceding the actual data
[D15-D0] is the actual data.
Looking at the READ timing diagrams in the 93Cx6 datasheets the dummy
bit should be clocked out on the last address bit clock cycle meaning it
should be discarded naturally.
However, depending on the hardware configuration sometimes this dummy
bit is not discarded. This is the case with Exar PCI UARTs which require
an extra clock cycle between sending the address and reading the data.
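The effect of the extra clock cycle can be illustrated with a small model of the read phase. This is a sketch under assumptions: the bit source, struct bit_stream and eeprom_read_word() are hypothetical and only mimic the EEPROM shifting out 0[D15-D0] MSB first; they are not the driver's interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MSB-first bit source standing in for the EEPROM data line. */
struct bit_stream {
	uint32_t bits;  /* bits still to be shifted out, right-aligned */
	int remaining;  /* number of valid bits left */
};

static int stream_read_bit(void *ctx)
{
	struct bit_stream *s = ctx;

	if (s->remaining <= 0)
		return 0;
	s->remaining--;
	return (int)((s->bits >> s->remaining) & 1);
}

typedef int (*read_bit_fn)(void *ctx);

/* Read one 16-bit word; optionally clock once more to drop the dummy bit. */
static uint16_t eeprom_read_word(read_bit_fn read_bit, void *ctx,
				 bool extra_read_clock)
{
	uint16_t data = 0;
	int i;

	if (extra_read_clock)
		(void)read_bit(ctx); /* discard the "dummy 0 bit" */

	for (i = 0; i < 16; i++)
		data = (uint16_t)((data << 1) | (read_bit(ctx) & 1));
	return data;
}
```

If the hardware does not discard the dummy bit on its own and no extra clock is issued, every word comes back shifted right by one bit, which is the symptom the quirk addresses.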
Datasheet: https://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-5193-SEEPROM-AT93C46D-Datasheet.pdf
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Parker Newman <pnewman@connecttech.com>
Link: https://lore.kernel.org/r/0f23973efefccd2544705a0480b4ad4c2353e407.1727880931.git.pnewman@connecttech.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Cancelling an rx command is signalled using bit 14 of the rx DMA status
register and not bit 11.
This bit is currently unused, but this error becomes apparent, for
example, when tracing the status register when closing the port.
Fixes: eddac5af0654 ("soc: qcom: Add GENI based QUP Wrapper driver")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20241009145110.16847-7-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Since commit ebd2c8f6d2ec ("serial: kill off uart_info") has
removed uart_info, the uart_info declaration looks lonely,
let it go.
Signed-off-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240920053423.1373354-1-siyanteng@cqsoftware.com.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
ftrace_regs was created to hold registers that store information to save
function parameters, return value and stack. Since it is a subset of
pt_regs, it should only be used by its accessor functions. But because
pt_regs can easily be taken from ftrace_regs (on most archs), it is
tempting to use it directly. But when running on other architectures, it
may fail to build or worse, build but crash the kernel!
Instead, make struct ftrace_regs an empty structure and have the
architectures define __arch_ftrace_regs and all the accessor functions
will typecast to it to get to the actual fields. This will help avoid
usage of ftrace_regs directly.
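The pattern can be sketched in portable C as follows. Assumptions are labeled in the comments: an incomplete type stands in for the empty structure (empty structs are a compiler extension), the fields of __arch_ftrace_regs are invented for illustration, and the accessor bodies are not taken from any architecture.

```c
/* Generic code only sees an opaque type and cannot touch any fields. */
struct ftrace_regs;

/* Architecture-private layout; these fields are purely illustrative. */
struct __arch_ftrace_regs {
	unsigned long ip;
	unsigned long sp;
	unsigned long ret;
};

/* Accessors typecast to the architecture type to reach the real fields. */
#define arch_ftrace_regs(fregs) ((struct __arch_ftrace_regs *)(fregs))

static unsigned long
ftrace_regs_get_instruction_pointer(struct ftrace_regs *fregs)
{
	return arch_ftrace_regs(fregs)->ip;
}

static unsigned long
ftrace_regs_get_stack_pointer(struct ftrace_regs *fregs)
{
	return arch_ftrace_regs(fregs)->sp;
}
```

Code that tries to dereference a struct ftrace_regs pointer directly then fails at build time on every architecture, instead of only on those where the layout differs from pt_regs.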
Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Setting the end address for a resource with a given size lacks a helper and
is therefore coded manually unlike the getter side which has a helper for
resource size calculation. Also, almost all callsites that calculate the
end address for a resource also set the start address right before it like
this:
res->start = start_addr;
res->end = res->start + size - 1;
Add resource_set_range(res, start_addr, size) that sets the start address
and calculates the end address to simplify this often repeated fragment.
Also add resource_set_size() for the cases where setting the start address
of the resource is not necessary but mention in its kerneldoc that
resource_set_range() is preferred when setting both addresses.
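A minimal sketch of what the two helpers do, assuming a simplified struct resource with only start and end fields and a 64-bit resource_size_t (the real kernel types carry more fields and are configuration dependent). resource_size() is included only to show that the getter and setter are inverse operations.

```c
#include <stdint.h>

typedef uint64_t resource_size_t;	/* assumption: 64-bit for this sketch */

/* Simplified stand-in for the kernel's struct resource. */
struct resource {
	resource_size_t start;
	resource_size_t end;	/* inclusive end address */
};

/* Getter side: number of bytes covered by [start, end]. */
static inline resource_size_t resource_size(const struct resource *res)
{
	return res->end - res->start + 1;
}

/* Set the end address from a size, keeping the current start address. */
static inline void resource_set_size(struct resource *res, resource_size_t size)
{
	res->end = res->start + size - 1;
}

/* Set both start and end address in one call. */
static inline void resource_set_range(struct resource *res,
				      resource_size_t start,
				      resource_size_t size)
{
	res->start = start;
	resource_set_size(res, size);
}
```

A call such as resource_set_range(res, start_addr, size) then replaces the two open-coded assignments shown above.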
Link: https://lore.kernel.org/r/20240614100606.15830-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
The user thresholds mechanism lets userspace tell the thermal framework
to send a notification when a temperature limit is crossed. There is no
id and no hysteresis, just the temperature and the direction of the
limit crossing. That means we can be notified when a threshold is
crossed on the way up only, on the way down only, or both ways. That
allows hysteresis values to be created if needed.
A threshold can be added, deleted or flushed. The latter means all
thresholds belonging to a thermal zone will be deleted.
When a threshold is added:
- if the same threshold (temperature and direction) exists, an error
is returned
- if a threshold is specified with the same temperature but a
different direction, the specified direction is added
- if there is no threshold with the same temperature then it is
created
When a threshold is deleted:
- if the same threshold (temperature and direction) exists, it is
deleted
- if a threshold is specified with the same temperature but a
different direction, the specified direction is removed
- if there is no threshold with the same temperature, then an error
is returned
When the thresholds are flushed:
- All thresholds related to a thermal zone are deleted
When a threshold is crossed:
- the userspace does not need to know which threshold(s) have been
crossed, it will be notified with the current temperature and the
previous temperature
- if multiple thresholds have been crossed between two updates, only
one notification will be sent to the userspace; it is pointless to
send a notification per threshold crossed, as the userspace can
handle that easily when it has the temperature delta information
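The add and delete rules above can be sketched as follows. This is a userspace illustration under stated assumptions, not the thermal framework code: the fixed-size array, the zone_* names, and the specific error codes (-EEXIST, -ENOENT, -ENOSPC) are all hypothetical, since the text above only says that an error is returned.

```c
#include <errno.h>
#include <stddef.h>

#define DIR_UP		(1u << 0)	/* notify when crossed on the way up */
#define DIR_DOWN	(1u << 1)	/* notify when crossed on the way down */
#define MAX_THRESHOLDS	16		/* arbitrary limit for this sketch */

struct threshold {
	int temperature;
	unsigned int direction;		/* mask of DIR_UP and DIR_DOWN */
};

struct zone {
	struct threshold t[MAX_THRESHOLDS];
	size_t count;
};

static struct threshold *zone_find(struct zone *z, int temperature)
{
	for (size_t i = 0; i < z->count; i++)
		if (z->t[i].temperature == temperature)
			return &z->t[i];
	return NULL;
}

int zone_add(struct zone *z, int temperature, unsigned int direction)
{
	struct threshold *t = zone_find(z, temperature);

	if (t) {
		if (t->direction & direction)
			return -EEXIST;		/* same temperature and direction */
		t->direction |= direction;	/* same temperature, new direction */
		return 0;
	}
	if (z->count == MAX_THRESHOLDS)
		return -ENOSPC;
	z->t[z->count].temperature = temperature;
	z->t[z->count].direction = direction;
	z->count++;
	return 0;
}

int zone_delete(struct zone *z, int temperature, unsigned int direction)
{
	struct threshold *t = zone_find(z, temperature);

	if (!t)
		return -ENOENT;			/* no threshold at this temperature */
	if (t->direction == direction) {
		z->count--;
		*t = z->t[z->count];		/* drop the entry, fill the hole */
	} else {
		t->direction &= ~direction;	/* keep the other direction */
	}
	return 0;
}

void zone_flush(struct zone *z)
{
	z->count = 0;				/* delete every threshold of the zone */
}
```

Keeping both directions in one entry per temperature is what lets a second add with a different direction update the existing threshold rather than create a new one.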
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://patch.msgid.link/20240923100005.2532430-2-daniel.lezcano@linaro.org
[ rjw: Subject edit, use BIT(0) and BIT(1) in symbol definitions ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc3).
No conflicts and no adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
clocksource_change_rating() has been unused since 2017's commit
63ed4e0c67df ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code").
Remove it.
__clocksource_change_rating now only has one use which is ifdef'd.
Move it into the ifdef'd section.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010135446.213098-1-linux@treblig.org
|
|
Simplify return address printing in the function graph tracer by removing
fgraph_extras. Since this feature is only used by the function graph
tracer and the feature flags are directly accessible from the function
graph tracer, fgraph_extras can be removed from the fgraph callback.
Cc: Donglin Peng <dolinux.peng@gmail.com>
Link: https://lore.kernel.org/172857234900.270774.15378354017601069781.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Paolo Abeni says:
====================
net: introduce TX H/W shaping API
We have a plurality of shaping-related driver APIs, but none is flexible
enough to meet existing demand from vendors[1].
This series introduces new device APIs to configure in a flexible way
TX H/W shaping. The new functionalities are exposed via a newly
defined generic netlink interface and include introspection
capabilities. Some self-tests are included, on top of a dummy
netdevsim implementation. Finally a basic implementation for the iavf
driver is provided.
Some usage examples:
* Configure shaping on a given queue:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/net_shaper.yaml \
--do set --json '{"ifindex": '$IFINDEX',
"shaper": {"handle":
{"scope": "queue", "id":'$QUEUEID'},
"bw-max": 2000000}}'
* Container B/W sharing
The orchestration infrastructure wants to group the
container-related queues under a RR scheduling and limit the aggregate
bandwidth:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/net_shaper.yaml \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID2'},
"weight": '$W2'},
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope":"node"},
"bw-max": 10000000}'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}}
Q1 \
    \
Q2 -- node 0 ------- netdev
    / (bw-max: 10M)
Q3 /
* Delegation
A container wants to limit the aggregate bandwidth of 2 of the 3
queues it owns. The starting configuration is the one from the
previous point:
SPEC=Documentation/netlink/specs/net_shaper.yaml
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID2'},
"weight": '$W2'}],
"handle": {"scope": "node"},
"bw-max": 5000000 }'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}}
Q1 -- node 1 --------\
    / (bw-max: 5M)    \
Q2 /                node 0 ------- netdev
                      /(bw-max: 10M)
Q3 ------------------/
In a group operation, when the leaves have different parents prior to
the operation, the user must specify the parent handle for the group.
E.g., starting from the previous config:
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope": "node"},
"bw-max": 3000000 }'
Netlink error: Invalid argument
nl_len = 96 (80) nl_flags = 0x300 nl_type = 2
error: -22
extack: {'msg': 'All the leaves shapers must have the same old parent'}
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope": "node"},
"parent": {"scope": "node", "id": 1},
"bw-max": 3000000 }'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}}
Q1 -- node 2 ---
/(bw-max:3M)\
Q3 / \
---- node 1 \
/ (bw-max: 5M)\
Q2 node 0 ------- netdev
(bw-max: 10M)
* Cleanup:
Still starting from the previous config. To delete a single queue shaper:
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID3'}}'
Q1 -- node 2 ---
(bw-max:3M)\
\
---- node 1 \
/ (bw-max: 5M)\
Q2 node 0 ------- netdev
(bw-max: 10M)
Deleting a node shaper relinks all its leaves to the node's parent:
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id":2}}'
Q1 ---\
\
node 1----- \
/ (bw-max: 5M)\
Q2----/ node 0 ------- netdev
(bw-max: 10M)
Deleting the last shaper under a node shaper deletes the node, too:
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID1'}}'
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID2'}}'
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id": 1}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}
Such a deletion recurses into parents that are left with no leaves:
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id": 0}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}
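The cleanup rules walked through above can be modelled with a toy tree. This is only a sketch of the described semantics, not the net-shaper implementation: the flat table, the toy_* names, the single index space for queues and nodes, and the NO_PARENT marker (standing for the netdev) are all hypothetical.

```c
#include <stdbool.h>

#define NO_PARENT (-1)	/* hypothetical marker: attached to the netdev */

struct toy_shaper {
	bool in_use;
	int parent;	/* table index of the parent node, or NO_PARENT */
	int leaves;	/* number of shapers directly below this one */
};

/*
 * Delete shaper @idx from a table of @n entries:
 *  - its leaves, if any, are relinked to its parent;
 *  - every ancestor left with no leaves is deleted as well.
 */
void toy_shaper_delete(struct toy_shaper *tbl, int n, int idx)
{
	int parent = tbl[idx].parent;

	/* Relink the leaves of a node shaper to the node's parent. */
	for (int i = 0; i < n; i++) {
		if (!tbl[i].in_use || tbl[i].parent != idx)
			continue;
		tbl[i].parent = parent;
		if (parent != NO_PARENT)
			tbl[parent].leaves++;
	}

	tbl[idx].in_use = false;
	tbl[idx].leaves = 0;

	/* Walk up: a parent that lost its last leaf is deleted, too. */
	while (parent != NO_PARENT && --tbl[parent].leaves == 0) {
		int up = tbl[parent].parent;

		tbl[parent].in_use = false;
		parent = up;
	}
}
```

Replaying the sequence above on this model (delete Q3, delete node 2, delete Q1, delete Q2) leaves neither node 1 nor node 0 in use, which matches the two failing get requests.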
v8: https://lore.kernel.org/cover.1727704215.git.pabeni@redhat.com
v7: https://lore.kernel.org/cover.1725919039.git.pabeni@redhat.com
v6: https://lore.kernel.org/cover.1725457317.git.pabeni@redhat.com
v5: https://lore.kernel.org/cover.1724944116.git.pabeni@redhat.com
v4: https://lore.kernel.org/cover.1724165948.git.pabeni@redhat.com
v3: https://lore.kernel.org/cover.1722357745.git.pabeni@redhat.com
RFC v2: https://lore.kernel.org/cover.1721851988.git.pabeni@redhat.com
RFC v1: https://lore.kernel.org/cover.1719518113.git.pabeni@redhat.com
====================
Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds new virtchnl opcodes and structures for rate limit
and quanta size configuration, which include:
1. VIRTCHNL_OP_CONFIG_QUEUE_BW, to configure max bandwidth for each
VF per queue.
2. VIRTCHNL_OP_CONFIG_QUANTA, to configure quanta size per queue.
3. VIRTCHNL_OP_GET_QOS_CAPS, for the VF to query the current QoS
configuration, such as enabled TCs, arbiter type, up2tc and bandwidth of
the VSI node. The configuration was previously set by DCB and the PF, and
now represents the potential QoS capability of the VF. The VF can use it
as a reference to configure queue TC mapping.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Wenjun Wu <wenjun1.wu@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/839002f7bd6f63b985a060a51b079f6e6dbbe237.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|