Age | Commit message (Collapse) | Author |
|
To clear the flow table on flow table free, the following sequence
normally happens in order:
1) gc_step work is stopped to disable any further stats/del requests.
2) All flow table entries are set to teardown state.
3) Run gc_step which will queue HW del work for each flow table entry.
4) Waiting for the above del work to finish (flush).
5) Run gc_step again, deleting all entries from the flow table.
6) Flow table is freed.
But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.
To fix the above, this patch flushes the pending work, then it sets the
teardown flag to all flows in the flowtable and it forces a garbage
collector run to queue work to remove the flows from hardware, then it
flushes this new pending work and (finally) it forces another garbage
collector run to remove the entry from the software flowtable.
Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214] dump_stack+0xbb/0x107
[47773.890818] print_address_description.constprop.0+0x18/0x140
[47773.892990] kasan_report.cold+0x7c/0xd8
[47773.894459] kasan_check_range+0x145/0x1a0
[47773.895174] down_read+0x99/0x460
[47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372] process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031] kasan_save_stack+0x1b/0x40
[47773.922730] __kasan_kmalloc+0x7a/0x90
[47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207] tcf_action_init_1+0x45b/0x700
[47773.925987] tcf_action_init+0x453/0x6b0
[47773.926692] tcf_exts_validate+0x3d0/0x600
[47773.927419] fl_change+0x757/0x4a51 [cls_flower]
[47773.928227] tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303] kasan_save_stack+0x1b/0x40
[47773.938039] kasan_set_track+0x1c/0x30
[47773.938731] kasan_set_free_info+0x20/0x30
[47773.939467] __kasan_slab_free+0xe7/0x120
[47773.940194] slab_free_freelist_hook+0x86/0x190
[47773.941038] kfree+0xce/0x3a0
[47773.941644] tcf_ct_flow_table_cleanup_work
Original patch description and stack trace by Paul Blakey.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <paulb@nvidia.com>
Tested-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Expose nf_flow_table_gc_run() to force a garbage collector run from the
offload infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
mutex is per-netns, move table_netns to the pernet area.
*read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921
Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In modern Chromebooks, the embedded controller has a mechanism where
it will watch a hardware-controlled line that toggles in suspend, and
wake the system up if an expected sleep transition didn't occur. This
can be very useful for detecting power management issues where the
system appears to suspend, but doesn't actually reach its lowest
expected power states.
Sometimes it's useful in debug and test scenarios to be able to control
the duration of that timeout, or even disable the EC timeout mechanism
altogether. Add a debugfs control to set the timeout to values other
than the EC-defined default, for more convenient debug and
development iteration.
Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Prashant Malani <pmalani@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20220822144026.v3.1.Idd188ff3f9caddebc17ac357a13005f93333c21f@changeid
[tzungbi: fix one nit in Documentation/ABI/testing/debugfs-cros-ec.]
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- The psi data structure was changed to be allocated dynamically but
it wasn't being cleared leading to it reporting garbage values and
triggering spurious oom kills.
- A deadlock involving cpuset and cpu hotplug.
- When a controller is moved across cgroup hierarchies,
css->rstat_css_node didn't get RCU drained properly from the previous
list.
* tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Fix race condition at rebind_subsystems()
cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS
sched/psi: Remove unused parameter nbytes of psi_trigger_create()
sched/psi: Zero the memory of struct psi_group
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-08-22
This series provides bug fixes to mlx5 driver.
* tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Unlock on error in mlx5_sriov_enable()
net/mlx5e: Fix use after free in mlx5e_fs_init()
net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup()
net/mlx5: unlock on error path in esw_vfs_changed_event_handler()
net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off
net/mlx5e: TC, Add missing policer validation
net/mlx5e: Fix wrong application of the LRO state
net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
net/mlx5: Fix cmd error logging for manage pages cmd
net/mlx5: Disable irq when locking lag_lock
net/mlx5: Eswitch, Fix forwarding decision to uplink
net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY
net/mlx5e: Properly disable vlan strip on non-UL reps
====================
Link: https://lore.kernel.org/r/20220822195917.216025-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Subsequent patch will render the kdoc from
include/uapi/linux/netlink.h into Documentation.
We need to fix the warnings. While at it move
the comments on struct nlmsghdr to a proper
kdoc comment.
Link: https://lore.kernel.org/r/20220819200221.422801-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
* replace 'syscall' with 'upper layers', still mention that it's being
exported via syscall errno
* describe what happens in set_retval(-EPERM) + return 1
* describe what happens with bind's 'return 3'
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220823222555.523590-5-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The following hooks are per-cgroup hooks but they are not
using cgroup_{common,current}_func_proto, fix it:
* BPF_PROG_TYPE_CGROUP_SKB (cg_skb)
* BPF_PROG_TYPE_CGROUP_SOCK_ADDR (cg_sock_addr)
* BPF_PROG_TYPE_CGROUP_SOCK (cg_sock)
* BPF_PROG_TYPE_LSM+BPF_LSM_CGROUP
Also:
* move common func_proto's into cgroup func_proto handlers
* make sure bpf_{g,s}et_retval are not accessible from recvmsg,
getpeername and getsockname (return/errno is ignored in these
places)
* as a side effect, expose get_current_pid_tgid, get_current_comm_proto,
get_current_ancestor_cgroup_id, get_cgroup_classid to more cgroup
hooks
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220823222555.523590-3-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Split cgroup_base_func_proto into the following:
* cgroup_common_func_proto - common helpers for all cgroup hooks
* cgroup_current_func_proto - common helpers for all cgroup hooks
running in the process context (== have meaningful 'current').
Move bpf_{g,s}et_retval and other cgroup-related helpers into
kernel/bpf/cgroup.c so they closer to where they are being used.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220823222555.523590-2-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Now that we've finally gotten rid of the non-atomic MST users leftover in
the kernel, we can finally get rid of all of the legacy payload code we
have and move as much as possible into the MST atomic state structs. The
main purpose of this is to make the MST code a lot less confusing to work
on, as there's a lot of duplicated logic that doesn't really need to be
here. As well, this should make introducing features like fallback link
retraining and DSC support far easier.
Since the old payload code was pretty gnarly and there's a Lot of changes
here, I expect this might be a bit difficult to review. So to make things
as easy as possible for reviewers, I'll sum up how both the old and new
code worked here (it took me a while to figure this out too!).
The old MST code basically worked by maintaining two different payload
tables - proposed_vcpis, and payloads. proposed_vcpis would hold the
modified payload we wanted to push to the topology, while payloads held the
payload table that was currently programmed in hardware. Modifications to
proposed_vcpis would be handled through drm_dp_allocate_vcpi(),
drm_dp_mst_deallocate_vcpi(), and drm_dp_mst_reset_vcpi_slots(). Then, they
would be pushed via drm_dp_mst_update_payload_step1() and
drm_dp_mst_update_payload_step2().
Furthermore, it's important to note how adding and removing VC payloads
actually worked with drm_dp_mst_update_payload_step1(). When a VC payload
is removed from the VC table, all VC payloads which come after the removed
VC payload's slots must have their time slots shifted towards the start of
the table. The old code handles this by looping through the entire payload
table and recomputing the start slot for every payload in the topology from
scratch. While very much overkill, this ends up doing the right thing
because we always order the VCPIs for payloads from first to last starting
timeslot.
It's important to also note that drm_dp_mst_update_payload_step2() isn't
actually limited to updating a single payload - the driver can use it to
queue up multiple payload changes so that as many of them can be sent as
possible before waiting for the ACT. This is -technically- not against
spec, but as Wayne Lin has pointed out it's not consistently implemented
correctly in hubs - so it might as well be.
drm_dp_mst_update_payload_step2() is pretty self explanatory and basically
the same between the old and new code, save for the fact we don't have a
second step for deleting payloads anymore -and thus rename it to
drm_dp_mst_add_payload_step2().
The new payload code stores all of the current payload info within the MST
atomic state and computes as much of the state as possible ahead of time.
This has the one exception of the starting timeslots for payloads, which
can't be determined at atomic check time since the starting time slots will
vary depending on what order CRTCs are enabled in the atomic state - which
varies from driver to driver. These are still stored in the atomic MST
state, but are only copied from the old MST state during atomic commit
time. Likewise, this is when new start slots are determined.
Adding/removing payloads now works much more closely to how things are
described in the spec. When we delete a payload, we loop through the
current list of payloads and update the start slots for any payloads whose
time slots came after the payload we just deleted. Determining the starting
time slots for new payloads being added is done by simply keeping track of
where the end of the VC table is in
drm_dp_mst_topology_mgr->next_start_slot. Additionally, it's worth noting
that we no longer have a single update_payload() function. Instead, we now
have drm_dp_mst_add_payload_step1|2() and drm_dp_mst_remove_payload(). As
such, it's now left it up to the driver to figure out when to add or remove
payloads. The driver already knows when it's disabling/enabling CRTCs, so
it also already knows when payloads should be added or removed.
Changes since v1:
* Refactor around all of the completely dead code changes that are
happening in amdgpu for some reason when they really shouldn't even be
there in the first place… :\
* Remove mention of sending one ACT per series of payload updates. As Wayne
Lin pointed out, there are apparently hubs on the market that don't work
correctly with this scheme and require a separate ACT per payload update.
* Fix accidental drop of mst_mgr.lock - Wayne Lin
* Remove mentions of allowing multiple ACT updates per payload change,
mention that this is a result of vendors not consistently supporting this
part of the spec and requiring a unique ACT for each payload change.
* Get rid of reference to drm_dp_mst_port in DC - turns out I just got
myself confused by DC and we don't actually need this.
Changes since v2:
* Get rid of fix for not sending payload deallocations if ddps=0 and just
go back to wayne's fix
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817193847.557945-18-lyude@redhat.com
|
|
Currently, we set drm_dp_atomic_payload->time_slots to 0 in order to
indicate that we're about to delete a payload in the current atomic state.
Since we're going to be dropping all of the legacy code for handling the
payload table however, we need to be able to ensure that we still keep
track of the current time slot allocations for each payload so we can reuse
this info when asking the root MST hub to delete payloads. We'll also be
using it to recalculate the start slots of each VC.
So, let's keep track of the intent of a payload in drm_dp_atomic_payload by
adding ->delete, which we set whenever we're planning on deleting a payload
during the current atomic commit.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817193847.557945-16-lyude@redhat.com
|
|
There's another kind of situation where we could potentially race with
nonblocking modesets and MST, especially if we were to only use the locking
provided by atomic modesetting:
* Display 1 begins as enabled on DP-1 in SST mode
* Display 1 switches to MST mode, exposes one sink in MST mode
* Userspace does non-blocking modeset to disable the SST display
* Userspace does non-blocking modeset to enable the MST display with a
different CRTC, but the SST display hasn't been fully taken down yet
* Execution order between the last two commits isn't guaranteed since they
share no drm resources
We can fix this however, by ensuring that we always pull in the atomic
topology state whenever a connector capable of driving an MST display
performs its atomic check - and then tracking CRTC commits happening on the
SST connector in the MST topology state. So, let's add some simple helpers
for doing that and hook them up in various drivers.
v2:
* Use intel_dp_mst_source_support() to check for MST support in i915, fixes
CI failures
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817193847.557945-14-lyude@redhat.com
|
|
As Daniel Vetter pointed out, if we only use the atomic modesetting locks
with MST it's technically possible for a driver with non-blocking modesets
to race when it comes to MST displays - as we make the mistake of not doing
our own CRTC commit tracking in the topology_state object.
This could potentially cause problems if something like this happens:
* User starts non-blocking commit to disable CRTC-1 on MST topology 1
* User starts non-blocking commit to enable CRTC-2 on MST topology 1
There's no guarantee here that the commit for disabling CRTC-2 will only
occur after CRTC-1 has finished, since neither commit shares a CRTC - only
the private modesetting object for MST. Keep in mind this likely isn't a
problem for blocking modesets, only non-blocking.
So, begin fixing this by keeping track of which CRTCs on a topology have
changed by keeping track of which CRTCs we release or allocate timeslots
on. As well, add some helpers for:
* Setting up the drm_crtc_commit structs in the ->commit_setup hook
* Waiting for any CRTC dependencies from the previous topology state
v2:
* Use drm_dp_mst_atomic_setup_commit() directly - Jani
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817193847.557945-9-lyude@redhat.com
|
|
Since we're about to start adding some stuff here, we may as well fill in
any missing documentation that we forgot to write.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817193847.557945-7-lyude@redhat.com
|
|
VCPI is only sort of the correct term here, originally the majority of this
code simply referred to timeslots vaguely as "slots" - and since I started
working on it and adding atomic functionality, the name "VCPI slots" has
been used to represent time slots.
Now that we actually have consistent access to the DisplayPort spec thanks
to VESA, I now know this isn't actually the proper term - as the
specification refers to these as time slots.
Since we're trying to make this code as easy to figure out as possible,
let's take this opportunity to correct this nomenclature and call them by
their proper name - timeslots. Likewise, we rename various functions
appropriately, along with replacing references in the kernel documentation
and various debugging messages.
It's important to note that this patch series leaves the legacy MST code
untouched for the most part, which is fine since we'll be removing it soon
anyhow. There should be no functional changes in this series.
v2:
* Add note that Wayne Lin from AMD suggested regarding slots being between
the source DP Tx and the immediate downstream DP Rx
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817193847.557945-5-lyude@redhat.com
|
|
In retrospect, the name I chose for this originally is confusing, as
there's a lot more info in here then just the VCPI. This really should be
called a payload. Let's make it more obvious that this is meant to be
related to the atomic state and is about payloads by renaming it to
drm_dp_mst_atomic_payload. Also, rename various variables throughout the
code that use atomic payloads.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220817193847.557945-4-lyude@redhat.com
|
|
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely
replaces the flow-dissector logic with custom dissection logic. This
forces implementors to write programs that handle dissection for any
flows expected in the namespace.
It makes sense for flow-dissector BPF programs to just augment the
dissector with custom logic (e.g. dissecting certain flows or custom
protocols), while enjoying the broad capabilities of the standard
dissector for any other traffic.
Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF
programs may return this to indicate no dissection was made, and
fallback to the standard dissector is requested.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com
|
|
Let 'bpf_flow_dissect' callers know the BPF program's retcode and act
accordingly.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220821113519.116765-2-shmulik.ladkani@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Thirteen fixes, almost all for MM.
Seven of these are cc:stable and the remainder fix up the changes
which went into this -rc cycle"
* tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kprobes: don't call disarm_kprobe() for disabled kprobes
mm/shmem: shmem_replace_page() remember NR_SHMEM
mm/shmem: tmpfs fallocate use file_modified()
mm/shmem: fix chattr fsflags support in tmpfs
mm/hugetlb: support write-faults in shared mappings
mm/hugetlb: fix hugetlb not supporting softdirty tracking
mm/uffd: reset write protection when unregister with wp-mode
mm/smaps: don't access young/dirty bit if pte unpresent
mm: add DEVICE_ZONE to FOR_ALL_ZONES
kernel/sys_ni: add compat entry for fadvise64_64
mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
Revert "zram: remove double compression logic"
get_maintainer: add Alan to .get_maintainer.ignore
|
|
The resource name parameter should never be changed by DLM so we declare
it as const. At some point it is handled as a char pointer, a resource
name can be a non printable ascii string as well. This patch change it
to handle it as void pointer as it is offered by DLM API.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
The DLM_LSFL_FS flag is set in lockspaces created directly
for a kernel user, as opposed to those lockspaces created
for user space applications. The user space libdlm allowed
this flag to be set for lockspaces created from user space,
but then used by a kernel user. No kernel user has ever
used this method, so remove the ability to do it.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
This patch adds trace callbacks for user locks. Unfortenately user locks
are handled in a different way than kernel locks in some cases. User
locks never call the dlm_lock()/dlm_unlock() kernel API and use the next
step internal API of dlm. Adding those traces from user API callers
should make it possible for dlm trace system to see lock handling for
user locks as well.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
From current design in sof_machine_check and snd_sof_new_platform_drv,
the SOF can only support ACPI type machine.
1. In sof_machine_check if there is no ACPI machine exist, the function
will return -ENODEV directly, that's we don't expected if we do not
base on ACPI machine.
2. In snd_sof_new_platform_drv the component driver need a driver name
to do ignore_machine, currently the driver name is obtained from
machine->drv_name, and the type of machine is snd_soc_acpi_mach.
So we add a new function named sof_of_machine_select that we can pass
sof_machine_check and obtain info required by snd_sof_new_platform_drv.
Signed-off-by: Chunxu Li <chunxu.li@mediatek.com>
Link: https://lore.kernel.org/r/20220805070449.6611-2-chunxu.li@mediatek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Harshit Mogalapalli says:
In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
can lead to NULL pointer dereference. [..] Kernel panic:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
nf_hook_slow+0xb1/0x170
__br_forward+0x289/0x730
maybe_deliver+0x24b/0x380
br_flood+0xc6/0x390
br_dev_xmit+0xa2e/0x12c0
For some reason ebtables rejects blobs that provide entry points that are
not supported by the table, but what it should instead reject is the
opposite: blobs that DO NOT provide an entry point supported by the table.
t->valid_hooks is the bitmask of hooks (input, forward ...) that will see
packets. Providing an entry point that is not support is harmless
(never called/used), but the inverse isn't: it results in a crash
because the ebtables traverser doesn't expect a NULL blob for a location
its receiving packets for.
Instead of fixing all the individual checks, do what iptables is doing and
reject all blobs that differ from the expected hooks.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Because acpi_bus_get_acpi_device() is completely analogous to
acpi_fetch_acpi_dev(), rename it to acpi_get_acpi_dev() and
add a kerneldoc comment to it.
Accordingly, rename acpi_bus_put_acpi_device() to acpi_put_acpi_dev()
and update all of the users of these two functions.
While at it, move the acpi_fetch_acpi_dev() header next to the
acpi_get_acpi_dev() header in the header file holding them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
|
|
Make it easy for liburing to integrate uapi header with the kernel.
Previously, when this header changes, the liburing side can't directly
copy this header file due to some small differences. Sync them.
Link: https://lore.kernel.org/io-uring/f1feef16-6ea2-0653-238f-4aaee35060b6@kernel.dk
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: Facebook Kernel Team <kernel-team@fb.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After the recently added new scmi_msg_dump traces, the general format of
the various other SCMI traces are not consistent.
As an example the full traces of a simple PERF_LEVEL_SET:
| cpufreq-set-276 scmi_xfer_begin: transfer_id=145 msg_id=7 protocol_id=19 seq=145 poll=0
| cpufreq-set-276 scmi_msg_dump: pt=13 t=CMND msg_id=07 seq=0091 s=0 pyld=000000008066ab13
| cpufreq-set-276 scmi_xfer_response_wait: transfer_id=145 msg_id=7 protocol_id=19 seq=145 tmo_ms=5000 poll=0
| <idle>-0 scmi_msg_dump: pt=13 t=RESP msg_id=07 seq=0091 s=0 pyld=
| <idle>-0 scmi_rx_done: transfer_id=145 msg_id=7 protocol_id=19 seq=145 msg_type=0
| cpufreq-set-276 scmi_xfer_end: transfer_id=145 msg_id=7 protocol_id=19 seq=145 status=0
... where the same information is being reported using different names
(protocol_id= vs pt=) and even worst different bases, which is hard to
read and to parse.
So let us unify them, using the same naming and ordering of the fields
(wherever possible) and moving all the protocol related fields to base-16
while keeping in base-10 timeouts, res_id and values, so that the new
traces would be like:
| cpufreq-set-274 scmi_xfer_begin: pt=13 msg_id=07 seq=0092 transfer_id=92 poll=0
| cpufreq-set-274 scmi_msg_dump: pt=13 t=CMND msg_id=07 seq=0092 s=0 pyld=000000008066ab13
| cpufreq-set-274 scmi_xfer_response_wait: pt=13 msg_id=07 seq=0092 transfer_id=92 tmo_ms=5000 poll=0
| cat-256 scmi_msg_dump: pt=13 t=RESP msg_id=07 seq=0092 s=0 pyld=
| cat-256 scmi_rx_done: pt=13 msg_id=07 seq=0092 transfer_id=92 msg_type=0
| cpufreq-set-274 scmi_xfer_end: pt=13 msg_id=07 seq=0092 transfer_id=92 s=0
Link: https://lore.kernel.org/r/20220818132309.584042-2-cristian.marussi@arm.com
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
|
This reverts commit 9cbffc7a59561be950ecc675d19a3d2b45202b2b.
There are a few more issues to fix that have been reported in the thread
for the original series [1]. We'll need to fix those before this will work.
So, revert it for now.
[1] - https://lore.kernel.org/lkml/20220601070707.3946847-1-saravanak@google.com/
Fixes: 9cbffc7a5956 ("driver core: Delete driver_deferred_probe_check_state()")
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Peng Fan <peng.fan@nxp.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220819221616.2107893-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is a partial revert of commit c295f9831f1d ("net: mscc: ocelot:
switch from {,un}set to {,un}assign for tag_8021q CPU ports"), because
as it turns out, this isn't how tag_8021q CPU ports under a LAG are
supposed to work.
Under that scenario, all user ports are "assigned" to the single
tag_8021q CPU port represented by the logical port corresponding to the
bonding interface. So one CPU port in a LAG would have is_dsa_8021q_cpu
set to true (the one whose physical port ID is equal to the logical port
ID), and the other one to false.
In turn, this makes 2 undesirable things happen:
(1) PGID_CPU contains only the first physical CPU port, rather than both
(2) only the first CPU port will be added to the private VLANs used by
ocelot for VLAN-unaware bridging
To make the driver behave in the same way for both bonded CPU ports, we
need to bring back the old concept of setting up a port as a tag_8021q
CPU port, and this is what deals with VLAN membership and PGID_CPU
updating. But we also need the CPU port "assignment" (the user to CPU
port affinity), and this is what updates the PGID_SRC forwarding rules.
All DSA CPU ports are statically configured for tag_8021q mode when the
tagging protocol is changed to ocelot-8021q. User ports are "assigned"
to one CPU port or the other dynamically (this will be handled by a
future change).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
More logic will be added to dsa_tree_setup_master() and
dsa_tree_teardown_master() in upcoming changes.
Reduce the indentation by one level in these functions by introducing
and using a dedicated iterator for CPU ports of a tree.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This adds 'vsock_data_ready()' which must be called by transport to kick
sleeping data readers. It checks for SO_RCVLOWAT value before waking
user, thus preventing spurious wake ups. Based on 'tcp_data_ready()' logic.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This adds transport specific callback for SO_RCVLOWAT, because in some
transports it may be difficult to know current available number of bytes
ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.
Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
For some reason two masks are used without the AZX prefix, and the
pattern MLCLT should be ML_LCTL for consistency.
Pure rename, no functionality change.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Rander Wang <rander.wang@intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20220822190044.170495-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
CMU_MFCMSCL generates MFC, M2M, MCSC and JPEG clocks for BLK_MFCMSCL.
Add clock indices and binding documentation for CMU_MFCMSCL.
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220809113323.29965-4-semen.protsenko@linaro.org
|
|
CMU_IS generates CSIS, IPP, ITP, VRA and GDC clocks for BLK_IS. Add
clock indices and bindings documentation for CMU_IS domain.
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220809113323.29965-3-semen.protsenko@linaro.org
|
|
CMU_AUD generates Cortex-A32 clock, bus clock and audio clocks for
BLK_AUD. Add clock indices and binding documentation for CMU_AUD.
Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220809113323.29965-2-semen.protsenko@linaro.org
|
|
Add fsys1(for usb and mmc) clock definitions.
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/debb6335cb2bcc935f7572bed25d76a85e80cfaa.1659054220.git.chanho61.park@samsung.com
|
|
Add fsys0(for PCIe) clock definitions.
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/6f70a59164ad2c5ce5581047ca39a91afc1105d9.1659054220.git.chanho61.park@samsung.com
|
|
There are duplicated definitions of peric0 and peric1 cmu blocks. Thus,
they should be defined correctly as numerical order.
Fixes: 680e1c8370a2 ("dt-bindings: clock: add clock binding definitions for Exynos Auto v9")
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220727021357.152421-2-chanho61.park@samsung.com
|
|
There is the following quirk to bypass "WB Flush" in Write Booster.
- UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
If this quirk is not set, there is no knob that can control "WB Flush".
There are three flags that control Write Booster Feature:
1. WB ON/OFF
2. WB Hibern Flush ON/OFF (implicitly)
3. WB Flush ON/OFF (explicit)
The sysfs attribute that controls the WB was implemented. (1)
In the case of "Hibern Flush", it is always good to turn on. Control may
not be required. (2)
Finally, "Flush" may be necessary because the Auto-Hibern8 is not supported
in a specific environment. So the sysfs attribute that controls this is
necessary. (3)
Link: https://lore.kernel.org/r/20220804075354epcms2p8c21c894b4e28840c5fc651875b7f435f@epcms2p8
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Mediatek UFS does not want to toggle write booster during clock scaling.
Permit host driver to disable wb toggling during clock scaling.
Introduce a flag UFSHCD_CAP_WB_WITH_CLK_SCALING to decouple WB and clock
scaling. UFSHCD_CAP_WB_WITH_CLK_SCALING is only valid when
UFSHCD_CAP_CLK_SCALING is set. Just like UFSHCD_CAP_HIBERN8_WITH_CLK_GATING
is valid only when UFSHCD_CAP_CLK_GATING set.
Set UFSHCD_CAP_WB_WITH_CLK_SCALING for qcom to compatible legacy design at
the same time.
Link: https://lore.kernel.org/r/20220804025422.18803-1-peter.wang@mediatek.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
CLOCK_MONOTONIC is not advanced when the system is in suspend. This becomes
problematic when debugging issues related to suspend-resume: the timestamps
printed by ufshcd_print_trs can not be correlated with dmesg entries, which
are timestamped with local_clock().
Change the used clock to local_clock() for the informational timestamp
variables and adds mirroring *_local_clock instances for variables used in
subsequent derevations (to not change the semantics of those derevations).
Link: https://lore.kernel.org/r/20220804065019.v5.1.I699244ea7efbd326a34a6dfd9b5a31e78400cf68@changeid
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Daniil Lunev <dlunev@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The value is only ever set once in bond_3ad_initialize and only ever
read otherwise. There seems to be no reason to set the variable via
bond_3ad_initialize when setting the global variable will do. Change
ad_ticks_per_sec to a const to enforce its read-only usage.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20220818210156.8143-1-wsa+renesas@sang-engineering.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Most of the clock related dt-binding header files are located in
dt-bindings/clock folder. It would be good to keep all the similar
header files at a single location.
This was discovered while investigating the state of ownership of the
files in include/dt-bindings/ according to the MAINTAINERS file.
This change here is similar to commit 8e28918a85a0 ("dt-bindings: clock:
Move ti-dra7-atl.h to dt-bindings/clock") and commit 35d35aae8177
("dt-bindings: clock: Move at91.h to dt-bindigs/clock").
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220613081632.2159-3-lukas.bulwahn@gmail.com
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Most of the clock-related dt-binding header files are located in
include/dt-bindings/clock. It would be good to keep all the similar
header files at a single location.
This was discovered while investigating the state of ownership of the files
in include/dt-bindings/ according to the MAINTAINERS file.
This change here is similar to commit 8e28918a85a0 ("dt-bindings: clock:
Move ti-dra7-atl.h to dt-bindings/clock") and commit 35d35aae8177
("dt-bindings: clock: Move at91.h to dt-bindigs/clock").
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220613081632.2159-2-lukas.bulwahn@gmail.com
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Now that all its callers have been converted to
fscrypt_parse_test_dummy_encryption() and fscrypt_add_test_dummy_key()
instead, fscrypt_set_test_dummy_encryption() can be removed.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220513231605.175121-6-ebiggers@kernel.org
|
|
Add a lock_class_key per mlx5 device to avoid a false positive
"possible circular locking dependency" warning by lockdep, on flows
which lock more than one mlx5 device, such as adding SF.
kernel log:
======================================================
WARNING: possible circular locking dependency detected
5.19.0-rc8+ #2 Not tainted
------------------------------------------------------
kworker/u20:0/8 is trying to acquire lock:
ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]
but task is already holding lock:
ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(¬ifier->n_head)->rwsem){++++}-{3:3}:
down_write+0x90/0x150
blocking_notifier_chain_register+0x53/0xa0
mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
mlx5_init_one+0x261/0x490 [mlx5_core]
probe_one+0x430/0x680 [mlx5_core]
local_pci_probe+0xd6/0x170
work_for_cpu_fn+0x4e/0xa0
process_one_work+0x7c2/0x1340
worker_thread+0x6f6/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
-> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
__lock_acquire+0x2fc7/0x6720
lock_acquire+0x1c1/0x550
__mutex_lock+0x12c/0x14b0
mlx5_init_one+0x2e/0x490 [mlx5_core]
mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
auxiliary_bus_probe+0x9d/0xe0
really_probe+0x1e0/0xaa0
__driver_probe_device+0x219/0x480
driver_probe_device+0x49/0x130
__device_attach_driver+0x1b8/0x280
bus_for_each_drv+0x123/0x1a0
__device_attach+0x1a3/0x460
bus_probe_device+0x1a2/0x260
device_add+0x9b1/0x1b40
__auxiliary_device_add+0x88/0xc0
mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
blocking_notifier_call_chain+0xd5/0x130
mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
process_one_work+0x7c2/0x1340
worker_thread+0x59d/0xec0
kthread+0x28f/0x330
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(¬ifier->n_head)->rwsem);
lock(&dev->intf_state_mutex);
lock(&(¬ifier->n_head)->rwsem);
lock(&dev->intf_state_mutex);
*** DEADLOCK ***
4 locks held by kworker/u20:0/8:
#0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
#1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
#2: ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
#3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460
stack backtrace:
CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x7d
check_noncircular+0x278/0x300
? print_circular_bug+0x460/0x460
? lock_chain_count+0x20/0x20
? register_lock_class+0x1880/0x1880
__lock_acquire+0x2fc7/0x6720
? register_lock_class+0x1880/0x1880
? register_lock_class+0x1880/0x1880
lock_acquire+0x1c1/0x550
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
__mutex_lock+0x12c/0x14b0
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? mlx5_init_one+0x2e/0x490 [mlx5_core]
? _raw_read_unlock+0x1f/0x30
? mutex_lock_io_nested+0x1320/0x1320
? __ioremap_caller.constprop.0+0x306/0x490
? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
? iounmap+0x160/0x160
mlx5_init_one+0x2e/0x490 [mlx5_core]
mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
auxiliary_bus_probe+0x9d/0xe0
really_probe+0x1e0/0xaa0
__driver_probe_device+0x219/0x480
? auxiliary_match_id+0xe9/0x140
driver_probe_device+0x49/0x130
__device_attach_driver+0x1b8/0x280
? driver_allows_async_probing+0x140/0x140
bus_for_each_drv+0x123/0x1a0
? bus_for_each_dev+0x1a0/0x1a0
? lockdep_hardirqs_on_prepare+0x286/0x400
? trace_hardirqs_on+0x2d/0x100
__device_attach+0x1a3/0x460
? device_driver_attach+0x1e0/0x1e0
? kobject_uevent_env+0x22d/0xf10
bus_probe_device+0x1a2/0x260
device_add+0x9b1/0x1b40
? dev_set_name+0xab/0xe0
? __fw_devlink_link_to_suppliers+0x260/0x260
? memset+0x20/0x40
? lockdep_init_map_type+0x21a/0x7d0
__auxiliary_device_add+0x88/0xc0
? auxiliary_device_init+0x86/0xa0
mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
blocking_notifier_call_chain+0xd5/0x130
mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
? lock_downgrade+0x6e0/0x6e0
? lockdep_hardirqs_on_prepare+0x286/0x400
process_one_work+0x7c2/0x1340
? lockdep_hardirqs_on_prepare+0x400/0x400
? pwq_dec_nr_in_flight+0x230/0x230
? rwlock_bug.part.0+0x90/0x90
worker_thread+0x59d/0xec0
? process_one_work+0x1340/0x1340
kthread+0x28f/0x330
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Pull NFS client fixes from Trond Myklebust:
"Stable fixes:
- NFS: Fix another fsync() issue after a server reboot
Bugfixes:
- NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
- NFS: Fix missing unlock in nfs_unlink()
- Add sanity checking of the file type used by __nfs42_ssc_open
- Fix a case where we're failing to set task->tk_rpc_status
Cleanups:
- Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the
fsync() fix"
* tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: RPC level errors should set task->tk_rpc_status
NFSv4.2 fix problems with __nfs42_ssc_open
NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES
NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds
NFS: Fix another fsync() issue after a server reboot
NFS: Fix missing unlock in nfs_unlink()
|