|
Fix one kernel-doc comment to silence the warning:
fs/read_write.c:88: warning: Function parameter or member 'maxsize' not described in 'generic_file_llseek_size'
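As a hedged illustration of the kind of fix involved (the parameter descriptions below are paraphrased, not the exact wording added to fs/read_write.c), a complete kernel-doc block names every parameter, including @maxsize:

  /**
   * generic_file_llseek_size - generic llseek implementation for regular files
   * @file:    file structure to seek on
   * @offset:  file offset to seek to
   * @whence:  type of seek
   * @maxsize: max size of this file in the file system
   * @eof:     offset used for SEEK_END position
   */

Documenting @maxsize is what silences the "Function parameter or member ... not described" warning from scripts/kernel-doc.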
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Message-Id: <20230811014359.4960-1-yang.lee@linux.alibaba.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Let's clarify where we take the idmapping of each type from:
- caller
- filesystem
- mount
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Message-Id: <20230625182047.26854-1-aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
For certain types of applications (for example PLC software or
RAN processing), upon occurrence of an event, it is necessary to
complete a certain task in a maximum amount of time (deadline).
One way to express this requirement is with a pair of numbers,
deadline time and execution time, where:
* deadline time: length of time between event and deadline.
* execution time: length of time it takes for processing of event
to occur on a particular hardware platform
(uninterrupted).
The particular values depend on use-case. For the case
where the realtime application executes in a virtualized
guest, an IPI which must be serviced in the host will cause
the following sequence of events:
1) VM-exit
2) execution of IPI (and function call)
3) VM-entry
This causes in excess of 50us of latency, as observed by cyclictest
(this violates the latency requirement of a vRAN application with a 1ms TTI,
for example).
invalidate_bh_lrus() sends an IPI to each CPU that has a non-empty
per-CPU cache:
on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1);
The performance when using the per-CPU LRU cache is as follows:
42 ns per __find_get_block
68 ns per __find_get_block_slow
Given that the main use cases for latency sensitive applications
do not involve block I/O (data necessary for program operation is
locked in RAM), disable per-CPU buffer_head caches for isolated CPUs.
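A minimal sketch of such gating, assuming a helper along the lines of cpu_is_isolated() is available to test for isolated (nohz_full/isolcpus) CPUs; the body and placement are illustrative, not the exact patch:

  /* Do not populate the per-CPU buffer_head LRU on isolated CPUs, so
   * invalidate_bh_lrus() never needs to IPI them. */
  static void bh_lru_install(struct buffer_head *bh)
  {
          if (cpu_is_isolated(smp_processor_id()))
                  return;
          /* ... install bh into this CPU's LRU as before ... */
  }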
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Message-Id: <ZJtBrybavtb1x45V@tpad>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When NFS superblocks are created by automounting, their LSM parameters
aren't set in the fs_context struct prior to sget_fc() being called,
leading to failure to match existing superblocks.
This bug leads to messages like the following appearing in dmesg when
fscache is enabled:
NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
Fix this by adding a new LSM hook to load fc->security for submount
creation.
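A rough sketch of the idea; the hook name below is an assumption used for illustration, not necessarily the exact interface added:

  /* When building the fs_context for a submount, let LSMs copy their
   * state from the parent superblock before sget_fc() runs, so the new
   * context can match existing superblocks. */
  ret = security_fs_context_submount(fc, reference->d_sb);  /* hypothetical hook name */
  if (ret)
          return ERR_PTR(ret);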
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode")
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230808-master-v9-1-e0ecde888221@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull virtio fixes from Michael Tsirkin:
"Just a bunch of bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (26 commits)
virtio-mem: check if the config changed before fake offlining memory
virtio-mem: keep retrying on offline_and_remove_memory() errors in Sub Block Mode (SBM)
virtio-mem: convert most offline_and_remove_memory() errors to -EBUSY
virtio-mem: remove unsafe unplug in Big Block Mode (BBM)
pds_vdpa: fix up debugfs feature bit printing
pds_vdpa: alloc irq vectors on DRIVER_OK
pds_vdpa: clean and reset vqs entries
pds_vdpa: always allow offering VIRTIO_NET_F_MAC
pds_vdpa: reset to vdpa specified mac
virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG case
vdpa/mlx5: Fix crash on shutdown for when no ndev exists
vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessary
vdpa/mlx5: Fix mr->initialized semantics
vdpa/mlx5: Correct default number of queues when MQ is on
virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs()
vduse: Use proper spinlock for IRQ injection
vdpa: Enable strict validation for netlinks ops
vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length check
vdpa: Add queue index attr to vdpa_nl_policy for nlattr length check
vdpa: Add features attr to vdpa_nl_policy for nlattr length check
...
|
|
David Vernet says:
====================
The struct bpf_struct_ops structure in BPF is a framework that allows
subsystems to extend themselves using BPF. In commit 68b04864ca425
("bpf: Create links for BPF struct_ops maps") and commit aef56f2e918bf
("bpf: Update the struct_ops of a bpf_link"), the structure was updated
to include new ->validate() and ->update() callbacks respectively in
support of allowing struct_ops maps to be created with BPF_F_LINK.
The intention was that struct bpf_struct_ops implementations could
support map updates through the link. Because map validation and
registration would take place in two separate steps for struct_ops
maps managed by the link (the first in map update elem, and the latter
in link create), the ->validate() callback was added, and any struct_ops
implementation that wished to use BPF_F_LINK, even just for lifetime
management, would then be required to define both it and ->update().
Not all struct_ops implementations can or will support update, however.
For example, the sched_ext struct_ops implementation proposed in [0]
will not be able to support atomic map updates because it can race with
sysrq, has to cycle tasks through various states in order to safely
transition, etc. It can, however, benefit from letting the BPF link
automatically evict the struct_ops map when the application exits (e.g.
if it crashes).
This patch set therefore:
1. Updates the struct_ops implementation to support default values for
->validate() and ->update() so that struct_ops implementations can
benefit from BPF_F_LINK management even if they can't support
updates.
2. Documents struct bpf_struct_ops so that the semantics are clear and
well defined.
---
v2: https://lore.kernel.org/bpf/0f5ea3de-c6e7-490f-b5ec-b5c7cd288687@gmail.com/T/
Changes from v2 -> v3:
- Add patch 2/2 that documents the struct bpf_struct_ops structure.
- Add Kui-Feng's Acked-by tag to patch 1/2.
v1: https://lore.kernel.org/lkml/20230811150934.GA542801@maniforge/
Changes from v1 -> v2:
- Move the if (!st_map->st_ops->update) check outside of the critical
section before we acquire the update_mutex.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Subsystems that want to implement a struct bpf_struct_ops structure to
enable struct_ops maps must currently reverse engineer how the structure
works. Given that this is meant to be a way for subsystem maintainers to
extend their subsystems using BPF, let's document it to make it a bit
easier on them.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230814185908.700553-3-void@manifault.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also
define the .validate() and .update() callbacks in its corresponding
struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful
in its own right to ensure that the map is unloaded if an application
crashes. For example, with sched_ext, we want to automatically unload
the host-wide scheduler if the application crashes. We would likely
never support updating elements of a sched_ext struct_ops map, so we'd
have to implement these callbacks showing that they _can't_ support
element updates just to benefit from the basic lifetime management of
struct_ops links.
Let's enable struct_ops maps to work with BPF_F_LINK even if they
haven't defined these callbacks, by assuming that a struct_ops map
element cannot be updated by default.
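A rough sketch of what this allows on the kernel side; the callbacks below belong to a hypothetical subsystem used for illustration, not sched_ext itself:

  /* With default handling in place, .validate and .update can simply be
   * omitted; a BPF_F_LINK struct_ops map still gets lifetime management,
   * and element updates through the link are rejected by default. */
  static struct bpf_struct_ops bpf_my_ops = {
          .verifier_ops = &my_verifier_ops,
          .init         = my_ops_init,
          .check_member = my_ops_check_member,
          .init_member  = my_ops_init_member,
          .reg          = my_ops_reg,
          .unreg        = my_ops_unreg,
          /* no .validate, no .update */
          .name         = "my_ops",
  };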
Acked-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230814185908.700553-2-void@manifault.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add several new tcx test cases to improve test coverage. This also includes
a few new tests with ingress instead of clsact qdisc, to cover the fix from
commit dc644b540a2d ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").
# ./test_progs -t tc
[...]
#234 tc_links_after:OK
#235 tc_links_append:OK
#236 tc_links_basic:OK
#237 tc_links_before:OK
#238 tc_links_chain_classic:OK
#239 tc_links_chain_mixed:OK
#240 tc_links_dev_cleanup:OK
#241 tc_links_dev_mixed:OK
#242 tc_links_ingress:OK
#243 tc_links_invalid:OK
#244 tc_links_prepend:OK
#245 tc_links_replace:OK
#246 tc_links_revision:OK
#247 tc_opts_after:OK
#248 tc_opts_append:OK
#249 tc_opts_basic:OK
#250 tc_opts_before:OK
#251 tc_opts_chain_classic:OK
#252 tc_opts_chain_mixed:OK
#253 tc_opts_delete_empty:OK
#254 tc_opts_demixed:OK
#255 tc_opts_detach:OK
#256 tc_opts_detach_after:OK
#257 tc_opts_detach_before:OK
#258 tc_opts_dev_cleanup:OK
#259 tc_opts_invalid:OK
#260 tc_opts_mixed:OK
#261 tc_opts_prepend:OK
#262 tc_opts_replace:OK
#263 tc_opts_revision:OK
[...]
Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
The failure handling procedure destroys page pools for all queues,
including those that haven't had their page pool created yet. This patch
introduces the necessary adjustments to prevent potential risks and
inconsistency with the error handling behavior.
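A minimal sketch of the adjustment, assuming the per-queue page pool pointer can be tested before destruction; names are illustrative:

  /* Only destroy pools that were actually created; queues past the
   * failure point still have a NULL page_pool. */
  for (i = 0; i < dev->real_num_rx_queues; i++) {
          struct veth_rq *rq = &priv->rq[i];

          if (rq->page_pool) {
                  page_pool_destroy(rq->page_pool);
                  rq->page_pool = NULL;
          }
  }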
Fixes: 0ebab78cbcbf ("net: veth: add page_pool for page recycling")
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Link: https://lore.kernel.org/r/20230812023016.10553-1-liangchen.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michal Schmidt says:
====================
octeon_ep: fixes for error and remove paths
I have an Octeon card that's misconfigured in a way that exposes a
couple of bugs in the octeon_ep driver's error paths. It can reproduce
the issues that patches 1 & 4 are fixing. Patches 2 & 3 are a result of
reviewing the nearby code.
====================
Link: https://lore.kernel.org/r/20230810150114.107765-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If it fails to get the device's MAC address, octep_probe exits while
leaving the delayed work intr_poll_task queued. When the work later
runs, it is a use-after-free.
Move the cancelation of intr_poll_task from octep_remove into
octep_device_cleanup. This does not change anything in the octep_remove
flow, but octep_device_cleanup is also called in the octep_probe error
path, where the cancelation is needed.
Note that the cancelation of ctrl_mbox_task has to follow
intr_poll_task's, because the ctrl_mbox_task may be queued by
intr_poll_task.
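A sketch of the resulting ordering in octep_device_cleanup(); the work item types are assumed from the description above and the remaining teardown is elided:

  static void octep_device_cleanup(struct octep_device *oct)
  {
          /* intr_poll_task may queue ctrl_mbox_task, so cancel it first. */
          cancel_delayed_work_sync(&oct->intr_poll_task);
          cancel_work_sync(&oct->ctrl_mbox_task);
          /* ... rest of the teardown ... */
  }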
Fixes: 24d4333233b3 ("octeon_ep: poll for control messages")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-5-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
intr_poll_task may queue ctrl_mbox_task. The function
octep_poll_non_ioq_interrupts_cn93_pf does this.
When removing the driver and canceling these two works, cancel
ctrl_mbox_task last to guarantee it does not run anymore.
Fixes: 24d4333233b3 ("octeon_ep: poll for control messages")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-4-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tx_timeout_task is canceled too early when removing the driver. Nothing
prevents .ndo_tx_timeout from triggering and queuing the work again.
Better cancel it after the netdev is unregistered.
It's harmless for octep_tx_timeout_task to run in the window between the
unregistration and cancelation, because it checks netif_running.
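A sketch of the reordering described above, with the surrounding teardown elided:

  static void octep_remove(struct pci_dev *pdev)
  {
          /* ... */
          unregister_netdev(netdev);
          /* Only after unregistration can .ndo_tx_timeout no longer
           * re-queue the work, so cancel it here. */
          cancel_work_sync(&oct->tx_timeout_task);
          /* ... */
  }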
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-3-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The intention was to wait up to 500 ms for the mbox response.
The third argument to wait_event_interruptible_timeout() is supposed to
be the timeout duration. The driver mistakenly passed absolute time
instead.
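A sketch of the corrected call; the wait queue and condition names are stand-ins:

  /* The third argument is a relative timeout in jiffies, not an
   * absolute expiry time: wait up to 500 ms for the mbox response. */
  ret = wait_event_interruptible_timeout(mbox->wq,
                                         mbox_response_ready(mbox),
                                         msecs_to_jiffies(500));
  if (!ret)
          return -ETIMEDOUT;      /* timed out, no response */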
Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230810150114.107765-2-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On the Zynq UltraScale+ MPSoC Ubuntu platform, when systemctl issues a
suspend, the network manager brings down the interface and the system
goes into suspend. When it wakes up, it enables the interface again.
This leads to a xilinx-psgtr "PLL lock timeout" on interface bringup, as
the power management controller powers down the entire FPD (including
the SERDES) if none of the FPD devices are in use, and the SERDES is not
reinitialized on resume.
$ sudo rtcwake -m no -s 120 -v
$ sudo systemctl suspend <this does ifconfig eth1 down>
$ ifconfig eth1 up
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
phy phy-fd400000.phy.0: phy poweron failed --> -110
The macb driver is called in this way:
1. macb_close: Stop the network interface. In this function, it
resets the MACB IP and disables the PHY and network interface.
2. macb_suspend: It is called in the kernel suspend flow. But because
the network interface has been disabled (netif_running(ndev) is
false), it does nothing and returns directly;
3. The system goes into the suspend state. Some time later, the system
is woken up by the RTC wakeup device;
4. macb_resume: It does nothing because the network interface has
been disabled;
5. macb_open: It is called to enable the network interface again. The
ethernet interface is initialized in this API, but the SERDES, which
was powered off by the PMUFW during the FPD-off suspend, is not
initialized again, so we hit the GT PLL lock issue on open.
To resolve this PLL timeout issue, always do the PS GTR initialization
when the ethernet device is configured as a non-wakeup source.
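A sketch of the idea as it might appear in macb_open(), using the generic PHY API; whether the actual change gates on device_may_wakeup() exactly like this is an assumption:

  /* The FPD (and with it the SerDes) may have been powered off across a
   * non-wakeup suspend, so redo the PS GTR initialization on open. */
  if (!device_may_wakeup(&bp->pdev->dev)) {
          err = phy_init(bp->sgmii_phy);
          if (err)
                  return err;
  }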
Fixes: f22bd29ba19a ("net: macb: Fix ZynqMP SGMII non-wakeup source resume failure")
Fixes: 8b73fa3ae02b ("net: macb: Added ZynqMP-specific initialization")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1691414091-2260697-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a phylink_get_caps implementation for Marvell 88e6060 DSA switch.
This is a fast ethernet switch, with internal PHYs for ports 0 through
4. Port 4 also supports MII, REVMII, REVRMII and SNI. Port 5 supports
MII, REVMII, REVRMII and SNI without an internal PHY.
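A rough sketch of what such an implementation looks like for a fast ethernet switch; the per-port interface handling is simplified (SNI, for instance, is not shown):

  static void mv88e6060_phylink_get_caps(struct dsa_switch *ds, int port,
                                         struct phylink_config *config)
  {
          /* Fast ethernet only: 10/100 plus pause. */
          config->mac_capabilities = MAC_10 | MAC_100 |
                                     MAC_SYM_PAUSE | MAC_ASYM_PAUSE;

          if (port < 5)
                  __set_bit(PHY_INTERFACE_MODE_INTERNAL,
                            config->supported_interfaces);
          if (port == 4 || port == 5) {
                  __set_bit(PHY_INTERFACE_MODE_MII,
                            config->supported_interfaces);
                  __set_bit(PHY_INTERFACE_MODE_REVMII,
                            config->supported_interfaces);
          }
  }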
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/E1qUkx7-003dMX-9b@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In LLVM 16, anonymous items may return names like `(unnamed union at ..)`
rather than empty names [1], which breaks Rust-enabled builds because
bindgen assumed an empty name instead of detecting them via
`clang_Cursor_isAnonymous` [2]:
$ make rustdoc LLVM=1 CLIPPY=1 -j$(nproc)
RUSTC L rust/core.o
BINDGEN rust/bindings/bindings_generated.rs
BINDGEN rust/bindings/bindings_helpers_generated.rs
BINDGEN rust/uapi/uapi_generated.rs
thread 'main' panicked at '"ftrace_branch_data_union_(anonymous_at__/_/include/linux/compiler_types_h_146_2)" is not a valid Ident', .../proc-macro2-1.0.24/src/fallback.rs:693:9
...
thread 'main' panicked at '"ftrace_branch_data_union_(anonymous_at__/_/include/linux/compiler_types_h_146_2)" is not a valid Ident', .../proc-macro2-1.0.24/src/fallback.rs:693:9
...
This was fixed in bindgen 0.62.0. Therefore, upgrade bindgen to
a more recent version, 0.65.1, to support LLVM 16.
Since bindgen 0.58.0 changed the `--{white,black}list-*` flags to
`--{allow,block}list-*` [3], update them on our side too.
In addition, bindgen 0.61.0 moved its CLI utility into a binary crate
called `bindgen-cli` [4]. Thus update the installation command in the
Quick Start guide.
Moreover, bindgen 0.61.0 changed the default functionality to bind
`size_t` to `usize` [5] and added the `--no-size_t-is-usize` flag
to not bind `size_t` as `usize`. Then bindgen 0.65.0 removed
the `--size_t-is-usize` flag [6]. Thus stop passing the flag to bindgen.
Finally, bindgen 0.61.0 added support for the `noreturn` attribute (in
its different forms) [7]. Thus remove the infinite loop in our Rust
panic handler after calling `BUG()`, since bindgen now correctly
generates a `BUG()` binding that returns `!` instead of `()`.
Link: https://github.com/llvm/llvm-project/commit/19e984ef8f49bc3ccced15621989fa9703b2cd5b [1]
Link: https://github.com/rust-lang/rust-bindgen/pull/2319 [2]
Link: https://github.com/rust-lang/rust-bindgen/pull/1990 [3]
Link: https://github.com/rust-lang/rust-bindgen/pull/2284 [4]
Link: https://github.com/rust-lang/rust-bindgen/commit/cc78b6fdb6e829e5fb8fa1639f2182cb49333569 [5]
Link: https://github.com/rust-lang/rust-bindgen/pull/2408 [6]
Link: https://github.com/rust-lang/rust-bindgen/issues/2094 [7]
Signed-off-by: Aakash Sen Sharma <aakashsensharma@gmail.com>
Closes: https://github.com/Rust-for-Linux/linux/issues/1013
Tested-by: Ariel Miculas <amiculas@cisco.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/r/20230612194311.24826-1-aakashsensharma@gmail.com
[ Reworded commit message. Mentioned the `bindgen-cli` binary crate
change, linked to it and updated the Quick Start guide. Re-added a
deleted "as" word in a code comment and reflowed comment to respect
the maximum length. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Now that torture_random() uses swahw32(), its callers no longer see
not-so-random low-order bits, as these are now swapped up into the upper
16 bits of the torture_random() function's return value. This commit
therefore removes the right-shifting of torture_random() return values.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Now that torture_random() uses swahw32(), its callers no longer see
not-so-random low-order bits, as these are now swapped up into the upper
16 bits of the torture_random() function's return value. This commit
therefore removes the right-shifting of torture_random() return values.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
In order to gain better race coverage, move the test start/stop
waits in stutter_wait() to torture_hrtimeout_jiffies().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
In order to gain better race coverage, move the CPU-migration timed
waits in torture_shuffle() to torture_hrtimeout_jiffies().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
In order to gain better race coverage, move the CPU-hotplug-related
timed waits in torture_onoff() to torture_hrtimeout_jiffies().
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Given that it is expected that more code will use torture_hrtimeout_*(),
including for longer timeouts, make it use TASK_IDLE instead of
TASK_UNINTERRUPTIBLE.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit adds a module parameter that causes the locktorture writer
to run at real-time priority.
To use it:
insmod /lib/modules/torture.ko random_shuffle=1
insmod /lib/modules/locktorture.ko torture_type=mutex_lock rt_boost=1 rt_boost_factor=50 nested_locks=3 writer_fifo=1
^^^^^^^^^^^^^
A predecessor to this patch has been helpful to uncover issues with the
proxy-execution series.
[ paulmck: Remove locktorture-specific code from kernel/torture.c. ]
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: kernel-team@android.com
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
[jstultz: Include header change to build, reword commit message]
Signed-off-by: John Stultz <jstultz@google.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit adds a kthread-creation callback to the
_torture_create_kthread() function, which allows callers of a new
torture_create_kthread_cb() macro to specify a function to be invoked
after the kthread is created but before it is awakened for the first time.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: kernel-team@android.com
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: John Stultz <jstultz@google.com>
|
|
In kernels built with CONFIG_PROVE_RCU=y (for example, lockdep kernels),
the following sequence of events can occur:
o rcu_init_tasks_generic() is invoked just before init is spawned.
It invokes rcu_spawn_tasks_kthread() and friends.
o rcu_spawn_tasks_kthread() invokes rcu_spawn_tasks_kthread_generic(),
which uses kthread_run() to create the needed kthread.
o Control returns to rcu_init_tasks_generic(), which, because this
is a CONFIG_PROVE_RCU=y kernel, invokes the version of the
rcu_tasks_initiate_self_tests() function that actually does
something, including invoking synchronize_rcu_tasks(), which
in turn invokes synchronize_rcu_tasks_generic().
o synchronize_rcu_tasks_generic() sees that the ->kthread_ptr is
still NULL, because the newly spawned kthread has not yet
started.
o The new kthread starts, preempting synchronize_rcu_tasks_generic()
just after its check. This kthread invokes rcu_tasks_one_gp(),
which acquires ->tasks_gp_mutex, and, seeing no work, blocks
in rcuwait_wait_event(). Note that this step requires either
a preemptible kernel or a fault-injection-style sleep at the
beginning of mutex_lock().
o synchronize_rcu_tasks_generic() resumes and invokes rcu_tasks_one_gp().
o rcu_tasks_one_gp() attempts to acquire ->tasks_gp_mutex, which
is still held by the newly spawned kthread's rcu_tasks_one_gp()
function. Deadlock.
Because the only reason for ->tasks_gp_mutex is to handle pre-kthread
synchronous grace periods, this commit avoids this deadlock by having
rcu_tasks_one_gp() momentarily release ->tasks_gp_mutex while invoking
rcuwait_wait_event(). This allows the call to rcu_tasks_one_gp() from
synchronize_rcu_tasks_generic() to proceed.
Note that it is not necessary to release the mutex anywhere else in
rcu_tasks_one_gp() because rcuwait_wait_event() is the only function
that can block indefinitely.
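A sketch of the shape of the fix inside rcu_tasks_one_gp(); the field and helper names reflect the description above and may differ slightly in detail:

  /* Release ->tasks_gp_mutex while idle so that a pre-kthread caller of
   * synchronize_rcu_tasks_generic() can take it and run the grace period
   * itself, avoiding the deadlock described above. */
  mutex_unlock(&rtp->tasks_gp_mutex);
  rcuwait_wait_event(&rtp->cbs_wait,
                     (needgpcb = rcu_tasks_need_gpcb(rtp)),
                     TASK_IDLE);
  mutex_lock(&rtp->tasks_gp_mutex);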
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Roy Hopkins <rhopkins@suse.de>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Roy Hopkins <rhopkins@suse.de>
|
|
This reverts commit 6f822e1b5d9dda3d20e87365de138046e3baa03a; this helper is
used by bcachefs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/20230813182636.2966159-4-kent.overstreet@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
bio_iov_iter_get_pages() trims the IO based on the block size of the
block device the IO will be issued to.
However, bcachefs is a multi device filesystem; when we're creating the
bio we don't yet know which block device the bio will be submitted to -
we have to handle the alignment checks elsewhere.
Thus this is needed to avoid a null ptr deref.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/20230813182636.2966159-3-kent.overstreet@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
- bio_set_pages_dirty(), bio_check_pages_dirty() - dio path
- blk_status_to_str() - error messages
- bio_add_folio() - this should definitely be exported for everyone,
it's the modern version of bio_add_page()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-block@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20230813182636.2966159-2-kent.overstreet@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Whenever mlx5 driver is probed or reloaded, it queries some capabilities
in MAX mode via set_hca_cap() API. Afterwards, the driver queries all
capabilities in MAX mode via mlx5_query_hca_caps() API.
Since MAX caps are read only caps, querying them twice is redundant.
Hence, delete the second query.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Each device cap has two modes: MAX and CUR. The driver maintains a
cache of both modes of the capabilities. For most device caps, the MAX
cap mode is never used.
Hence, remove all driver queries of the MAX mode of said caps as well
as their helper macros.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but
there isn't any user who requires them.
In addition, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used.
Thus, drop all usages and definitions of the mentioned caps above.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
sw_function_id contains sfnum, so fix the error message to name the
value properly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Since mlx5_vhca_event_supported() is called in mlx5_sf_dev_supported(),
remove the redundant call.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
There is a helper called mlx5_sf_start_function_id() that
wraps up a query to get base SF function id. Use that instead of
calling MLX5_CAP_GEN() directly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Since mlx5_sf_supported() check is done as a first thing in
mlx5_sf_max_functions(), remove the redundant check.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Instead of using device_put(), use auxiliary_device_uninit() for
auxiliary device uninit which internally just calls device_put().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Firmware doesn't allow flow rules in FDB to do header rewrite and send
packets to both internal and uplink vports. The following syndrome
will be generated when trying to offload such kind of rules:
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 23569): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8c8f08), err(-22)
To avoid this syndrome, add a check before creating the FTE. If a rule
with a header rewrite action forwards packets to both VF and PF, an
error is returned directly.
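A rough sketch of the kind of check, with the destination bookkeeping shown as stand-in booleans rather than the driver's actual helpers:

  /* Firmware rejects FDB rules that both rewrite headers and forward to
   * an internal vport as well as the uplink, so fail such rules early. */
  if (attr->modify_hdr && fwd_to_internal_vport && fwd_to_uplink) {
          esw_warn(esw->dev,
                   "FDB: header rewrite with forwarding to both internal and uplink vports is not allowed\n");
          return -EINVAL;
  }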
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Even if the PF driver had no error on its part of the sync reset flow,
the firmware can see a wider picture, as it syncs all the PFs in the flow.
So, at the end of the sync reset flow, verify with the firmware, by
reading the MFRL register and the initialization segment, that the flow
had no issue from the firmware's point of view too.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Introduce devlink resource for exposing max possible SFs on mlx5
devices.
For example:
$ devlink resource show pci/0000:00:0b.0
pci/0000:00:0b.0:
name max_local_SFs size 5 unit entry dpipe_tables none
name max_external_SFs size 0 unit entry dpipe_tables none
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
A new check for the tx devlink health reporter is introduced for
determining when the PTP port timestamping SQ is considered unhealthy. If
there are enough CQEs considered never to be delivered, the space that can
be utilized on the SQ decreases significantly, impacting performance and
usability of the SQ. The health reporter is triggered when the number of
likely never delivered port timestamping CQEs that utilize the space of the
PTP SQ is greater than 93.75% of the total capacity of the SQ. A devlink
health reporter recover method is also provided for this specific TX error
context that restarts the PTP SQ.
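The 93.75% trigger corresponds to 15/16 of the SQ. A sketch of how the threshold might be expressed; the counter and helper names are illustrative:

  /* Report the PTP SQ as unhealthy once likely-undelivered port
   * timestamping CQEs occupy more than 15/16 (93.75%) of its space. */
  if (undelivered_cqe_count > (sq_wqe_count * 15) / 16)
          mlx5e_trigger_ptpsq_unhealthy_recovery(ptpsq);  /* illustrative name */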
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use a map structure for associating CQEs containing port timestamping
information with the appropriate skb. Track order of WQEs submitted using a
FIFO. Check if the corresponding port timestamping CQEs from the lookup
values in the FIFO are considered dropped due to time elapsed. Return the
lookup value to a freelist after consuming the skb. Reuse the freed lookup
in future WQE submission iterations.
The map structure uses an integer identifier for the key and returns an skb
corresponding to that identifier. Embed the integer identifier in the WQE
submitted to the WQ for the transmit path when the SQ is a PTP (port
timestamping) SQ. The embedded identifier can then be queried using a field
in the CQE of the corresponding port timestamping CQ. In the port
timestamping napi_poll context, the identifier is queried from the CQE
polled from CQ and used to lookup the corresponding skb from the WQE submit
path. The skb reference is removed from map and then embedded with the port
HW timestamp information from the CQE and eventually consumed.
The metadata freelist FIFO is an array containing integer identifiers that
can be pushed and popped in the FIFO. The purpose of this structure is
to track which identifier values can safely be used in a subsequent WQE
submission; it should not contain identifiers that have not yet been
reaped by processing a corresponding CQE completion on the port
timestamping CQ.
The ts_cqe_pending_list structure is a combination of an array and linked
list. The array is pre-populated with the nodes that will be added and
removed from the head of the linked list. Each node contains the unique
identifier value associated with the values submitted in the WQEs and
retrieved in the port timestamping CQEs. When a WQE is submitted, the node
in the array corresponding to the identifier popped from the metadata
freelist is added to the end of the CQE pending list and is marked as
"in-use". The node is removed from the linked list under two conditions.
The first condition is that the corresponding port timestamping CQE is
polled in the PTP napi_poll context. The second condition is that more than
a second has elapsed since the DMA timestamp value corresponding to the WQE
submission. When the first condition occurs, the "in-use" bit in the linked
list node is cleared, and the resources corresponding to the WQE submission
are then released. The second condition, however, indicates that the port
timestamping CQE will likely never be delivered. It's not impossible for
the device to post a CQE after an infinite amount of time though highly
improbable. In order to be resilient to this improbable case, resources
related to the corresponding WQE submission are still kept, the identifier
value is not returned to the freelist, and the "in-use" bit is cleared on
the node to indicate that it's no longer part of the linked list of "likely
to be delivered" port timestamping CQE identifiers. A count for the number
of port timestamping CQEs considered highly likely to never be delivered by
the device is maintained. This count gets decremented in the unlikely event
a port timestamping CQE considered unlikely to ever be delivered is polled
in the PTP napi_poll context.
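A simplified sketch of the metadata freelist FIFO described above, a plain ring of identifiers; the structure and its field names are illustrative rather than the driver's:

  struct ptp_metadata_fifo {
          u16 *data;   /* pre-populated with every identifier */
          u16  mask;   /* size - 1, size being a power of two */
          u16  pc;     /* producer counter */
          u16  cc;     /* consumer counter */
  };

  /* Pop an identifier to embed in the next PTP WQE. */
  static u16 ptp_metadata_fifo_pop(struct ptp_metadata_fifo *f)
  {
          return f->data[f->cc++ & f->mask];
  }

  /* Return an identifier once its port timestamping CQE has been reaped. */
  static void ptp_metadata_fifo_push(struct ptp_metadata_fifo *f, u16 id)
  {
          f->data[f->pc++ & f->mask] = id;
  }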
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
De-duplicate documentation by removing mellanox/mlx5/devlink.rst. Instead,
only use the generic devlink documentation directory to document mlx5
devlink parameters. Avoid providing general devlink tool usage information
in mlx5-specific documentation.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
struct i2c_driver::probe_new is about to go away. Switch the driver to
use the probe callback with the same prototype.
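A sketch of the mechanical change for an affected driver; the structure shown is a placeholder:

  static struct i2c_driver foo_driver = {
          .driver = {
                  .name = "foo",
          },
          /* was: .probe_new = foo_probe, */
          .probe  = foo_probe,   /* same int (*)(struct i2c_client *) prototype */
          .remove = foo_remove,
  };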
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230814210759.26395-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Merge series from Yang Yingliang <yangyingliang@huawei.com>:
I'm trying to rename the legacy name to modern name used in SPI drivers,
this is part2 patchset.
After introducing devm_spi_alloc_host/spi_alloc_host(), the legacy
named function devm_spi_alloc_master/spi_alloc_master() can be replaced.
Also change other legacy names master/slave to the modern names host/target
or controller. All compile tests passed.
Yang Yingliang (20):
spi: amlogic-spifc-a1: switch to use devm_spi_alloc_host()
spi: au1550: switch to use modern name
spi: ep93xx: switch to use modern name
spi: falcon: switch to use modern name
spi: fsi: switch to use spi_alloc_host()
spi: fsl-dspi: switch to use modern name
spi: fsl-espi: switch to use modern name
spi: fsl-lpspi: switch to use modern name
spi: fsl-qspi: switch to use modern name
spi: fsl-spi: switch to use modern name
spi: gpio: switch to use modern name
spi: gxp: switch to use modern name
spi: bcmbca-hsspi: switch to use modern name
spi: hisi-sfc-v3xx: switch to use modern name
spi: img-spfi: switch to use modern name
spi: imx: switch to use modern name
spi: ingenic: switch to use devm_spi_alloc_host()
spi: intel: switch to use modern name
spi: jcore: switch to use modern name
spi: lantiq: switch to use modern name
drivers/spi/spi-amlogic-spifc-a1.c | 2 +-
drivers/spi/spi-au1550.c | 74 ++++++------
drivers/spi/spi-bcmbca-hsspi.c | 66 +++++------
drivers/spi/spi-ep93xx.c | 174 ++++++++++++++---------------
drivers/spi/spi-falcon.c | 34 +++---
drivers/spi/spi-fsi.c | 2 +-
drivers/spi/spi-fsl-dspi.c | 24 ++--
drivers/spi/spi-fsl-espi.c | 76 ++++++-------
drivers/spi/spi-fsl-lpspi.c | 54 ++++-----
drivers/spi/spi-fsl-qspi.c | 10 +-
drivers/spi/spi-fsl-spi.c | 76 ++++++-------
drivers/spi/spi-gpio.c | 72 ++++++------
drivers/spi/spi-gxp.c | 6 +-
drivers/spi/spi-hisi-sfc-v3xx.c | 18 +--
drivers/spi/spi-img-spfi.c | 118 +++++++++----------
drivers/spi/spi-imx.c | 114 +++++++++----------
drivers/spi/spi-ingenic.c | 2 +-
drivers/spi/spi-intel.c | 42 +++----
drivers/spi/spi-jcore.c | 44 ++++----
drivers/spi/spi-lantiq-ssc.c | 96 ++++++++--------
20 files changed, 552 insertions(+), 552 deletions(-)
--
2.25.1
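A sketch of the rename pattern this series applies; the probe context and private-data type are placeholders:

  /* was: master = devm_spi_alloc_master(&pdev->dev, sizeof(*priv)); */
  struct spi_controller *host;

  host = devm_spi_alloc_host(&pdev->dev, sizeof(struct foo_priv));
  if (!host)
          return -ENOMEM;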
|
|
svc_tcp_sendmsg used to factor in the xdr->page_base when sending pages,
but commit 5df5dd03a8f7 ("sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather
then sendpage") dropped that part of the handling. Fix it by setting
the bv_offset of the first bvec.
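A sketch of the shape of the fix; how svc_tcp_sendmsg() names the first page bvec is a stand-in here:

  /* The pages in xdr->pages start at xdr->page_base, so the first bvec
   * covering them must carry that offset. */
  first_page_bvec->bv_offset = offset_in_page(xdr->page_base);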
Fixes: 5df5dd03a8f7 ("sunrpc: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Jiri Pirko says:
====================
devlink: introduce selective dumps
Motivation:
For SFs, one devlink instance per SF is created. There might be
thousands of these on a single host. When a user needs to know the port
handle of a specific SF, they need to dump all devlink ports on the host,
which does not scale well.
Solution:
Allow user to pass devlink handle (and possibly other attributes)
alongside the dump command and dump only objects which are matching
the selection.
Use split ops to generate policies for dump callbacks according to
the attributes used for selection.
The userspace can use ctrl genetlink GET_POLICY command to find out if
the selective dumps are supported by kernel for particular command.
Example:
$ devlink port show
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
$ devlink port show auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
$ devlink port show auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
Extension:
patches #12 and #13 extends selection attributes by port index
for health reporter dumping.
====================
Link: https://lore.kernel.org/r/20230811155714.1736405-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow user to pass port index for health reporter dump request.
Re-generate the related code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-14-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce the possibility for a devlink object to expose the attributes
it supports for selection of dumped objects.
Use this in the health reporter to indicate that it supports port-index
based selection of dump objects. Implement this selection mechanism in
devlink_nl_cmd_health_reporter_get_dump_one().
Example:
$ devlink health
pci/0000:08:00.0:
reporter fw
state healthy error 0 recover 0 auto_dump true
reporter fw_fatal
state healthy error 0 recover 0 grace_period 60000 auto_recover true auto_dump true
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32768:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32769:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32770:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1:
reporter fw
state healthy error 0 recover 0 auto_dump true
reporter fw_fatal
state healthy error 0 recover 0 grace_period 60000 auto_recover true auto_dump true
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1/98304:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1/98305:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.1/98306:
reporter vnic
state healthy error 0 recover 0
$ devlink health show pci/0000:08:00.0
pci/0000:08:00.0:
reporter fw
state healthy error 0 recover 0 auto_dump true
reporter fw_fatal
state healthy error 0 recover 0 grace_period 60000 auto_recover true auto_dump true
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32768:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32769:
reporter vnic
state healthy error 0 recover 0
pci/0000:08:00.0/32770:
reporter vnic
state healthy error 0 recover 0
$ devlink health show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768:
reporter vnic
state healthy error 0 recover 0
The last command is possible because of this patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-13-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|