Age | Commit message (Collapse) | Author |
|
When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:
* If the GP kthread observes the remote target in an extended quiescent
state, then that target must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it exits that extended quiescent state. Also the GP kthread must
observe all accesses performed by the target prior it entering in
EQS.
or:
* If the GP kthread observes the remote target NOT in an extended
quiescent state, then the target further entering in an extended
quiescent state must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it enters that extended quiescent state. Also the GP kthread later
observing that EQS must also observe all accesses performed by the
target prior it entering in EQS.
This ordering is explicitly performed both on the first EQS snapshot
and on the second one as well through the combination of a preceding
full barrier followed by an acquire read. However the second snapshot's
full memory barrier is redundant and not needed to enforce the above
guarantees:
GP kthread Remote target
---- -----
// Access prior GP
WRITE_ONCE(A, 1)
// first snapshot
smp_mb()
x = smp_load_acquire(EQS)
// Access prior GP
WRITE_ONCE(B, 1)
// EQS enter
// implied full barrier by atomic_add_return()
atomic_add_return(RCU_DYNTICKS_IDX, EQS)
// implied full barrier by atomic_add_return()
READ_ONCE(A)
// second snapshot
y = smp_load_acquire(EQS)
z = READ_ONCE(B)
If the GP kthread above fails to observe the remote target in EQS
(x not in EQS), the remote target will observe A == 1 after further
entering in EQS. Then the second snapshot taken by the GP kthread only
need to be an acquire read in order to observe z == 1.
Therefore remove the needless full memory barrier on second snapshot.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
In cloud environments it can be useful to *only* enable the vmexit
mitigation and leave syscalls vulnerable. Add that as an option.
This is similar to the old spectre_bhi=auto option which was removed
with the following commit:
36d4fe147c87 ("x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto")
with the main difference being that this has a more descriptive name and
is disabled by default.
Mitigation switch requested by Maksim Davydov <davydov-max@yandex-team.ru>.
[ bp: Massage. ]
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/2cbad706a6d5e1da2829e5e123d8d5c80330148c.1719381528.git.jpoimboe@kernel.org
|
|
Duplicating the documentation of all the Spectre kernel cmdline options
in two separate files is unwieldy and error-prone. Instead just add a
reference to kernel-parameters.txt from spectre.rst.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Link: https://lore.kernel.org/r/450b5f4ffe891a8cc9736ec52b0c6f225bab3f4b.1719381528.git.jpoimboe@kernel.org
|
|
The direct-call syscall dispatch function doesn't know that the exit()
and exit_group() syscall handlers don't return, so the call sites aren't
optimized accordingly.
Fix that by marking the exit syscall declarations __noreturn.
Fixes the following warnings:
vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
Fixes: 1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/5d8882bc077d8eadcc7fd1740b56dfb781f12288.1719381528.git.jpoimboe@kernel.org
|
|
Josef Bacik <josef@toxicpanda.com> says:
Currently if you want to get mount options for a mount and you're using
statmount(), you still have to open /proc/mounts to parse the mount options.
statmount() does have the ability to store an arbitrary string however,
additionally the way we do that is with a seq_file, which is also how we use
->show_options for the individual file systems.
Extent statmount() to have a flag for fetching the mount options of a mount.
This allows users to not have to parse /proc mount for anything related to a
mount. I've extended the existing statmount() test to validate this feature
works as expected. As you can tell from the ridiculous amount of silly string
parsing, this is a huge win for users and climate change as we will no longer
have to waste several cycles parsing strings anymore.
Josef Bacik (4):
fs: rename show_mnt_opts -> show_vfsmnt_opts
fs: add a helper to show all the options for a mount
fs: export mount options via statmount()
sefltests: extend the statmount test for mount options
fs/internal.h | 5 +
fs/namespace.c | 7 +
fs/proc_namespace.c | 29 ++--
include/uapi/linux/mount.h | 3 +-
.../filesystems/statmount/statmount_test.c | 131 +++++++++++++++++-
5 files changed, 164 insertions(+), 11 deletions(-)
Link: https://lore.kernel.org/r/cover.1719257716.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that we support exporting mount options, via statmount(), add a test
to validate that it works.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/cabe09f0933d9c522da6e7b6cc160254f4f6c3b9.1719257716.git.josef@toxicpanda.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
[brauner: simplify and fix]
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
statmount() can export arbitrary strings, so utilize the __spare1 slot
for a mnt_opts string pointer, and then support asking for and setting
the mount options during statmount(). This calls into the helper for
showing mount options, which already uses a seq_file, so fits in nicely
with our existing mechanism for exporting strings via statmount().
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/3aa6bf8bd5d0a21df9ebd63813af8ab532c18276.1719257716.git.josef@toxicpanda.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
[brauner: only call sb->s_op->show_options()]
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Josef Bacik <josef@toxicpanda.com> says:
Currently the only way to iterate over mount entries in mount namespaces that
aren't your own is to trawl through /proc in order to find /proc/$PID/mountinfo
for the mount namespace that you want. This is hugely inefficient, so extend
both statmount() and listmount() to allow specifying a mount namespace id in
order to get to mounts in other mount namespaces.
There are a few components to this
1. Having a global index of the mount namespace based on the ->seq value in the
mount namespace. This gives us a unique identifier that isn't re-used.
2. Support looking up mount namespaces based on that unique identifier, and
validating the user has permission to access the given mount namespace.
3. Provide a new ioctl() on nsfs in order to extract the unique identifier we
can use for statmount() and listmount().
The code is relatively straightforward, and there is a selftest provided to
validate everything works properly.
This is based on vfs.all as of last week, so must be applied onto a tree that
has Christians error handling rework in this area. If you wish you can pull the
tree directly here
https://github.com/josefbacik/linux/tree/listmount.combined
Christian and I collaborated on this series, which is why there's patches from
both of us in this series.
Christian Brauner (4):
fs: relax permissions for listmount()
fs: relax permissions for statmount()
fs: Allow listmount() in foreign mount namespace
fs: Allow statmount() in foreign mount namespace
Josef Bacik (4):
fs: keep an index of current mount namespaces
fs: export the mount ns id via statmount
fs: add an ioctl to get the mnt ns id from nsfs
selftests: add a test for the foreign mnt ns extensions
fs/mount.h | 2 +
fs/namespace.c | 240 ++++++++++--
fs/nsfs.c | 14 +
include/uapi/linux/mount.h | 6 +-
include/uapi/linux/nsfs.h | 2 +
.../selftests/filesystems/statmount/Makefile | 2 +-
.../filesystems/statmount/statmount.h | 46 +++
.../filesystems/statmount/statmount_test.c | 53 +--
.../filesystems/statmount/statmount_test_ns.c | 360 ++++++++++++++++++
9 files changed, 659 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/filesystems/statmount/statmount.h
create mode 100644 tools/testing/selftests/filesystems/statmount/statmount_test_ns.c
Link: https://lore.kernel.org/r/cover.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This name is more consistent with what the helper does, which is to just
show the vfsmount options.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/fb363c62ffbf78a18095d596a19b8412aa991251.1719257716.git.josef@toxicpanda.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This tests both statmount and listmount to make sure they work with the
extensions that allow us to specify a mount ns to enter in order to find
the mount entries.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/2d1a35bc9ab94b4656c056c420f25e429e7eb0b1.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Update the maintainers entries to the new location of the
IOMMU tree.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan into main
|
|
kexec on pseries disables AIL (reloc_on_exc), required for scv
instruction support, before other CPUs have been shut down. This means
they can execute scv instructions after AIL is disabled, which causes an
interrupt at an unexpected entry location that crashes the kernel.
Change the kexec sequence to disable AIL after other CPUs have been
brought down.
As a refresher, the real-mode scv interrupt vector is 0x17000, and the
fixed-location head code probably couldn't easily deal with implementing
such high addresses so it was just decided not to support that interrupt
at all.
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Closes: https://lore.kernel.org/3b4b2943-49ad-4619-b195-bc416f1d1409@linux.ibm.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Link: https://msgid.link/20240625134047.298759-1-npiggin@gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Tariq Toukan says:
====================
mlx5 fixes 2024-06-27
This patchset provides fixes from the team to the mlx5 core and EN
drivers.
The first 3 patches by Daniel replace a buggy cap field with a newly
introduced one.
Patch 4 by Chris de-couples ingress ACL creation from a specific flow,
so it's invoked by other flows if needed.
Patch 5 by Jianbo fixes a possible missing cleanup of QoS objects.
Patches 6 and 7 by Leon fixes IPsec stats logic to better reflect the
traffic.
Series generated against:
commit 02ea312055da ("octeontx2-pf: Fix coverity and klockwork issues in octeon PF driver")
V2:
Fixed wrong cited SHA in patch 6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ConnectX devices lack ability to count payload data byte size which is
needed for SA to return to libreswan for rekeying.
As a solution let's approximate that by decreasing headers size from
total size counted by flow steering. The calculation doesn't take into
account any other headers which can be in the packet (e.g. IP extensions).
Fixes: 5a6cddb89b51 ("net/mlx5e: Update IPsec per SA packets/bytes count")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPsec SA statistics presents successfully decrypted and encrypted
packet and bytes, and not total handled by this SA. So update the
calculation logic to take into account failures.
Fixes: 6fb7f9408779 ("net/mlx5e: Connect mlx5 IPsec statistics with XFRM core")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the cited commit, mqprio_rl cleanup and free are mistakenly removed
in mlx5e_priv_cleanup(), and it causes the leakage of host memory and
firmware SCHEDULING_ELEMENT objects while changing eswitch mode. So,
add them back.
Fixes: 0bb7228f7096 ("net/mlx5e: Fix mqprio_rl handling on devlink reload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, ingress acl is used for three features. It is created only
when vport metadata match and prio tag are enabled. But active-backup
lag mode also uses it. It is independent of vport metadata match and
prio tag. And vport metadata match can be disabled using the
following devlink command:
# devlink dev param set pci/0000:08:00.0 name esw_port_metadata \
value false cmode runtime
If ingress acl is not created, will hit panic when creating drop rule
for active-backup lag mode. If always create it, there will be about
5% performance degradation.
Fix it by creating ingress acl when needed. If esw_port_metadata is
true, ingress acl exists, then create drop rule using existing
ingress acl. If esw_port_metadata is false, create ingress acl and
then create drop rule.
Fixes: 1749c4c51c16 ("net/mlx5: E-switch, add drop rule support to ingress ACL")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Due a bug in the device max_num_eqs doesn't always reflect a written
value. As a result, setting max_io_eqs may not work but appear
successful. Instead write max_num_eqs_24b, which reflects correct
value.
Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A new capability with more bits is added. If it's set use that value as
the maximum number of EQs available.
This cap is also writable by the vhca_resource_manager to allow limiting
the number of EQs available to SFs and VFs.
Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Expose new capability to support changing the number of EQs available
to other functions.
Fixes: 93197c7c509d ("mlx5/core: Support max_io_eqs for a function")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some production workloads we noticed that connections could
sometimes close extremely prematurely with ETIMEDOUT after
transmitting only 1 TLP and RTO retransmission (when we would normally
expect roughly tcp_retries2 = TCP_RETR2 = 15 RTOs before a connection
closes with ETIMEDOUT).
From tracing we determined that these workloads can suffer from a
scenario where in fast recovery, after some retransmits, a DSACK undo
can happen at a point where the scoreboard is totally clear (we have
retrans_out == sacked_out == lost_out == 0). In such cases, calling
tcp_try_keep_open() means that we do not execute any code path that
clears tp->retrans_stamp to 0. That means that tp->retrans_stamp can
remain erroneously set to the start time of the undone fast recovery,
even after the fast recovery is undone. If minutes or hours elapse,
and then a TLP/RTO/RTO sequence occurs, then the start_ts value in
retransmits_timed_out() (which is from tp->retrans_stamp) will be
erroneously ancient (left over from the fast recovery undone via
DSACKs). Thus this ancient tp->retrans_stamp value can cause the
connection to die very prematurely with ETIMEDOUT via
tcp_write_err().
The fix: we change DSACK undo in fast recovery (TCP_CA_Recovery) to
call tcp_try_to_open() instead of tcp_try_keep_open(). This ensures
that if no retransmits are in flight at the time of DSACK undo in fast
recovery then we properly zero retrans_stamp. Note that calling
tcp_try_to_open() is more consistent with other loss recovery
behavior, since normal fast recovery (CA_Recovery) and RTO recovery
(CA_Loss) both normally end when tp->snd_una meets or exceeds
tp->high_seq and then in tcp_fastretrans_alert() the "default" switch
case executes tcp_try_to_open(). Also note that by inspection this
change to call tcp_try_to_open() implies at least one other nice bug
fix, where now an ECE-marked DSACK that causes an undo will properly
invoke tcp_enter_cwr() rather than ignoring the ECE mark.
Fixes: c7d9d6a185a7 ("tcp: undo on DSACK during recovery")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For users that hold a reference to a pidfd procfs might not even be
available nor is it desirable to parse through procfs just for the sake
of getting namespace file descriptors for a process.
Make it possible to directly retrieve namespace file descriptors from a
pidfd. Pidfds already can be used with setns() to change a set of
namespaces atomically.
Link: https://lore.kernel.org/r/20240627-work-pidfs-v1-4-7e9ab6cc3bb1@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
and call it from open_related_ns().
Link: https://lore.kernel.org/r/20240627-work-pidfs-v1-3-7e9ab6cc3bb1@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
They all contains struct ns_common ns and if there ever is one where
that isn't the case we'll catch it here at build time.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a simple cleanup helper for nsproxy.
Link: https://lore.kernel.org/r/20240627-work-pidfs-v1-2-7e9ab6cc3bb1@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a helper that returns the file descriptor and ensures that the old
variable contains a negative value. This makes it easy to rely on
CLASS(get_unused_fd).
Link: https://lore.kernel.org/r/20240627-work-pidfs-v1-1-7e9ab6cc3bb1@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix the definition of BSS_CHANGED_UNSOL_BCAST_PROBE_RESP so that
not all higher bits get set, 1<<31 is a signed variable, so when
we do
u64 changed = BSS_CHANGED_UNSOL_BCAST_PROBE_RESP;
we get sign expansion, so the value is 0xffff'ffff'8000'0000 and
that's clearly not desired. Use BIT_ULL() to make it unsigned as
well as the right type for the change flags.
Fixes: 178e9d6adc43 ("wifi: mac80211: fix unsolicited broadcast probe config")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240627104257.06174d291db2.Iba0d642916eb78a61f8ab2cc5ca9280783d9c1db@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add common gpio-line-names property for fsl,qoriq-gpio to fix below
warning.
arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-kbox-a-230-ls.dtb: gpio@2300000: 'gpio-line-names' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/gpio/fsl,qoriq-gpio.yaml
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240627202151.456812-1-Frank.Li@nxp.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
In order to utilize the listmount() and statmount() extensions that
allow us to call them on different namespaces we need a way to get the
mnt namespace id from user space. Add an ioctl to nsfs that will allow
us to extract the mnt namespace id in order to make these new extensions
usable.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/180449959d5a756af7306d6bda55f41b9d53e3cb.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This patch makes use of the new mnt_ns_id field in struct mnt_id_req to
allow users to stat mount entries not in their mount namespace. The
rules are the same as listmount(), the user must have CAP_SYS_ADMIN in
their user namespace and the target mount namespace must be a child of
the current namespace.
Co-developed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/52a2e17e50ba7aa420bc8bae1d9e88ff593395c1.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Expand struct mnt_id_req to add an optional mnt_ns_id field. When this
field is populated, listmount() will be performed on the specified mount
namespace, provided the currently application has CAP_SYS_ADMIN in its
user namespace and the mount namespace is a child of the current
namespace.
Co-developed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/49930bdce29a8367a213eb14c1e68e7e49284f86.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In order to allow users to iterate through children mount namespaces via
listmount we need a way for them to know what the ns id for the mount.
Add a new field to statmount called mnt_ns_id which will carry the ns id
for the given mount entry.
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/6dabf437331fb7415d886f7c64b21cb2a50b1c66.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In order to allow for listmount() to be used on different namespaces we
need a way to lookup a mount ns by its id. Keep a rbtree of the current
!anonymous mount name spaces indexed by ID that we can use to look up
the namespace.
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/e5fdd78a90f5b00a75bd893962a70f52a2c015cd.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It is sufficient to have capabilities in the owning user namespace of
the mount namespace to stat a mount regardless of whether it's reachable
or not.
Link: https://lore.kernel.org/r/bf5961d71ec479ba85806766b0d8d96043e67bba.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
util-linux is about to implement listmount() and statmount() support.
Karel requested the ability to scan the mount table in backwards order
because that's what libmount currently does in order to get the latest
mount first. We currently don't support this in listmount(). Add a new
LISTMOUNT_REVERSE flag to allow listing mounts in reverse order. For
example, listing all child mounts of /sys without LISTMOUNT_REVERSE
gives:
/sys/kernel/security @ mnt_id: 4294968369
/sys/fs/cgroup @ mnt_id: 4294968370
/sys/firmware/efi/efivars @ mnt_id: 4294968371
/sys/fs/bpf @ mnt_id: 4294968372
/sys/kernel/tracing @ mnt_id: 4294968373
/sys/kernel/debug @ mnt_id: 4294968374
/sys/fs/fuse/connections @ mnt_id: 4294968375
/sys/kernel/config @ mnt_id: 4294968376
whereas with LISTMOUNT_REVERSE it gives:
/sys/kernel/config @ mnt_id: 4294968376
/sys/fs/fuse/connections @ mnt_id: 4294968375
/sys/kernel/debug @ mnt_id: 4294968374
/sys/kernel/tracing @ mnt_id: 4294968373
/sys/fs/bpf @ mnt_id: 4294968372
/sys/firmware/efi/efivars @ mnt_id: 4294968371
/sys/fs/cgroup @ mnt_id: 4294968370
/sys/kernel/security @ mnt_id: 4294968369
Link: https://lore.kernel.org/r/20240607-vfs-listmount-reverse-v1-4-7877a2bfa5e5@kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It is sufficient to have capabilities in the owning user namespace of
the mount namespace to list all mounts regardless of whether they are
reachable or not.
Link: https://lore.kernel.org/r/8adc0d3f4f7495faacc6a7c63095961f7f1637c7.1719243756.git.josef@toxicpanda.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rely on cleanup helper and simplify error handling
Link: https://lore.kernel.org/r/20240607-vfs-listmount-reverse-v1-3-7877a2bfa5e5@kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Don't copy mount ids to userspace while holding the namespace semaphore.
We really shouldn't do that and I've gone through lenghts avoiding that
in statmount() already.
Limit the number of mounts that can be retrieved in one go to 1 million
mount ids. That's effectively 10 times the default limt of 100000 mounts
that we put on each mount namespace by default. Since listmount() is an
iterator limiting the number of mounts retrievable in one go isn't a
problem as userspace can just pick up where they left off.
Karel menti_ned that libmount will probably be reading the mount table
in "in small steps, 512 nodes per request. Nobody likes a tool that
takes too long in the kernel, and huge servers are unusual use cases.
Libmount will very probably provide API to define size of the step (IDs
per request)."
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20240610-frettchen-liberal-a9a5c53865f8@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a simple cleanup helper so we can cleanup struct path easily.
No need for any extra machinery. Avoid DEFINE_FREE() as it causes a
local copy of struct path to be used. Just rely on path_put() directly
called from a cleanup helper.
Link: https://lore.kernel.org/r/20240607-vfs-listmount-reverse-v1-2-7877a2bfa5e5@kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Retrieving the system framebuffer's parent device in sysfb_init()
increments the parent device's reference count. Hence release the
reference before leaving the init function.
Adding the sysfb platform device acquires and additional reference
for the parent. This keeps the parent device around while the system
framebuffer is in use.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 9eac534db001 ("firmware/sysfb: Set firmware-framebuffer parent device")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Sui Jingfeng <suijingfeng@loongson.cn>
Cc: <stable@vger.kernel.org> # v6.9+
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240625081818.15696-1-tzimmermann@suse.de
|
|
Fixes compilation errors for Makefile snapshot target described in:
commit 231ce08b662a ("tools/power turbostat: Add "snapshot:" Makefile target")
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Commit 78464d7681f7 ("tools/power turbostat: Add columns for clustered
uncore frequency") introduced 'probe_intel_uncore_frequency_cluster()'
in a way which prevents printing uncore frequency columns if either of
the '-q' or '-l' options are used. Systems which do not have multiple
uncore frequencies per package are unaffected by this regression.
Fix the function so that uncore frequency columns are shown when either
the '-l' or '-q' option is used by checking if 'quiet' is true after
adding counters for the uncore frequency columns.
Fixes: 78464d7681f7 ("tools/power turbostat: Add columns for clustered uncore frequency")
Signed-off-by: Adam Hawley <adam.james.hawley@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
In some cases specifying the '-n' command line argument will cause
turbostat to fail. For instance 'turbostat -n 1' works fine; however,
'turbostat -n 1 -d' will fail. This is the result of the first call
to getopt_long_only() where "MP" is specified as the optstring. This can
be easily fixed by changing the optstring from "MP" to "MPn:" to remove
ambiguity between the arguments.
tools/power turbostat: option '-n' is ambiguous; possibilities: '-num_iterations' '-no-msr' '-no-perf'
Fixes: a0e86c90b83c ("tools/power turbostat: Add --no-perf option")
Signed-off-by: David Arcari <darcari@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pyll crypto fix from Herbert Xu:
"Fix a build failure in qat"
* tag 'v6.10-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: qat - fix linking errors when PCI_IOV is disabled
|
|
Pull drm fixes from Dave Airlie:
"Regular fixes, mostly amdgpu with some minor fixes in other places,
along with a fix for a very narrow UAF race in the pid handover code.
core:
- fix refcounting race on pid handover
fbdev:
- Fix fb_info when vmalloc is used, regression from
CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM.
amdgpu:
- SMU 14.x fix
- vram info parsing fix
- mode1 reset fix
- LTTPR fix
- Virtual display fix
- Avoid spurious error in PSP init
i915:
- Fix potential UAF due to race on fence register revocation
nouveau
- nouveau tv mode fixes
panel:
- Add KOE TX26D202VM0BWA timings"
* tag 'drm-fixes-2024-06-28' of https://gitlab.freedesktop.org/drm/kernel:
drm/drm_file: Fix pid refcounting race
drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_ld_modes
drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_hd_modes
drm/amdgpu: Don't show false warning for reg list
drm/amdgpu: avoid using null object of framebuffer
drm/amd/display: Send DP_TOTAL_LTTPR_CNT during detection if LTTPR is present
drm/amdgpu: Fix pci state save during mode-1 reset
drm/amdgpu/atomfirmware: fix parsing of vram_info
drm/amd/swsmu: add MALL init support workaround for smu_v14_0_1
drm/i915/gt: Fix potential UAF by revoke of fence registers
drm/panel: simple: Add missing display timing flags for KOE TX26D202VM0BWA
drm/fbdev-dma: Only set smem_start is enable per module option
|
|
Fix copy-paste error in the code comment. The code refers to
LED blinking configuration, not brightness configuration. It
was likely copied from comment above this one which does
refer to brightness configuration.
Fixes: 4e901018432e ("net: phy: phy_device: Call into the PHY driver to set LED blinking")
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240626030638.512069-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
<maarten.lankhorst@linux.intel.com>, Maxime Ripard
<mripard@kernel.org>, Thomas Zimmermann <tzimmermann@suse.de>
filp->pid is supposed to be a refcounted pointer; however, before this
patch, drm_file_update_pid() only increments the refcount of a struct
pid after storing a pointer to it in filp->pid and dropping the
dev->filelist_mutex, making the following race possible:
process A process B
========= =========
begin drm_file_update_pid
mutex_lock(&dev->filelist_mutex)
rcu_replace_pointer(filp->pid, <pid B>, 1)
mutex_unlock(&dev->filelist_mutex)
begin drm_file_update_pid
mutex_lock(&dev->filelist_mutex)
rcu_replace_pointer(filp->pid, <pid A>, 1)
mutex_unlock(&dev->filelist_mutex)
get_pid(<pid A>)
synchronize_rcu()
put_pid(<pid B>) *** pid B reaches refcount 0 and is freed here ***
get_pid(<pid B>) *** UAF ***
synchronize_rcu()
put_pid(<pid A>)
As far as I know, this race can only occur with CONFIG_PREEMPT_RCU=y
because it requires RCU to detect a quiescent state in code that is not
explicitly calling into the scheduler.
This race leads to use-after-free of a "struct pid".
It is probably somewhat hard to hit because process A has to pass
through a synchronize_rcu() operation while process B is between
mutex_unlock() and get_pid().
Fix it by ensuring that by the time a pointer to the current task's pid
is stored in the file, an extra reference to the pid has been taken.
This fix also removes the condition for synchronize_rcu(); I think
that optimization is unnecessary complexity, since in that case we
would usually have bailed out on the lockless check above.
Fixes: 1c7a387ffef8 ("drm: Update file owner during use")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
DRIVERS
Currently there is no entry for
Documentation/devicetree/bindings/soc/ti/ti,pruss.yaml file. The driver
file corresponding to this binding file is drivers/soc/ti/pruss.c which
is maintained in TI KEYSTONE MULTICORE NAVIGATOR DRIVERS. Add the
binding file also to the same section so that this could be maintained.
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://lore.kernel.org/r/20240625153319.795665-2-danishanwar@ti.com
Signed-off-by: Nishanth Menon <nm@ti.com>
|