Age | Commit message (Collapse) | Author |
|
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmefc_fcp_req' from
112 to 104 bytes.
This structure is embedded in some other structures (nvme_fc_fcp_op
which itself is embedded in nvme_fcp_op_w_sgl), so it helps reducing the
size of these structures too.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvme_dhchap_queue_context' from
416 to 400 bytes.
This structure is kvcalloc()'ed in nvme_auth_init_ctrl(), so it is likely
that the allocation can be relatively big. Saving 16 bytes per structure
may might a slight difference.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmf_ctrl_options' from 136 to
128 bytes.
When such a structure is allocated in nvmf_create_ctrl(), because of the
way memory allocation works, when 136 bytes were requested, 192 bytes were
allocated.
So this saves 64 bytes per allocation, 1 cache line to hold the whole
structure and a few cycles when zeroing the memory in nvmf_create_ctrl().
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvme_ctrl' from 5368 to 5344
bytes when all CONFIG_* are defined.
This structure is embedded into some other structures, so it helps reducing
their size as well.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmet_sq' from 472 to 464
bytes when CONFIG_NVME_TARGET_AUTH is defined.
This structure is embedded into some other structures, so it helps reducing
their sizes as well.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Notice that the enabled flag is only needed for active trip points,
so drop struct acpi_thermal_state_flags, add a simple "bool valid" field
to the definitions of all trip point structures instead of flags and
add a "bool enabled" field to struct acpi_thermal_active.
Adjust the code using the modified structures accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Move the definition of the acpi_thermal_driver structure closer to the
initialization code that registes the driver, so some function forward
declarations can be dropped.
Also move the module information to the end of the file where it is
usually located.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Move all of the symbol definitions to the initial part of the code so
they all can be found in one place.
While at it, consolidate white space used in there.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Drop the ACPI_TRIPS_REFRESH_DEVICES symbol which is redundant, because
ACPI_TRIPS_DEVICES can be used directly instead of it without any
drawbacks and rename the ACPI_TRIPS_REFRESH_THRESHOLDS to
ACPI_TRIPS_THRESHOLDS to make the code a bit more consistent.
While at it, fix up some formatting white space used in the symbol
definitions.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Use the BIT() macro for defining flag symbols in the ACPI thermal driver
instead of using "raw" values for the flags.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Michal Wilczynski <michal.wilczynski@intel.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
tcp and rdma transports have lots of duplicate code setting up the
different queue mappings. Add common helpers.
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Erase the superfluous line that retrieves the nvme_dev.
Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
There is no ib_stop_cq API and the need for the +1 is for ib_drain_qp.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Call dev_pm_qos_hide_latency_tolerance() in the error unwind patch to
avoid following kmemleak:-
blktests (master) # kmemleak-clear; ./check nvme/044;
blktests (master) # kmemleak-scan ; kmemleak-show
nvme/044 (Test bi-directional authentication) [passed]
runtime 2.111s ... 2.124s
unreferenced object 0xffff888110c46240 (size 96):
comm "nvme", pid 33461, jiffies 4345365353 (age 75.586s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000069ac2cec>] kmalloc_trace+0x25/0x90
[<000000006acc66d5>] dev_pm_qos_update_user_latency_tolerance+0x6f/0x100
[<00000000cc376ea7>] nvme_init_ctrl+0x38e/0x410 [nvme_core]
[<000000007df61b4b>] 0xffffffffc05e88b3
[<00000000d152b985>] 0xffffffffc05744cb
[<00000000f04a4041>] vfs_write+0xc5/0x3c0
[<00000000f9491baf>] ksys_write+0x5f/0xe0
[<000000001c46513d>] do_syscall_64+0x3b/0x90
[<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Link: https://lore.kernel.org/linux-nvme/CAHj4cs-nDaKzMx2txO4dbE+Mz9ePwLtU0e3egz+StmzOUgWUrA@mail.gmail.com/
Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Add missing fault-injection cleanup in nvme_init_ctrl() in the error
unwind path that also fixes following message for blktests:-
linux-block (for-next) # grep debugfs debugfs-err.log
[ 147.853464] debugfs: Directory 'nvme1' with parent '/' already present!
[ 147.853973] nvme1: failed to create debugfs attr
[ 148.802490] debugfs: Directory 'nvme1' with parent '/' already present!
[ 148.803244] nvme1: failed to create debugfs attr
[ 148.877304] debugfs: Directory 'nvme1' with parent '/' already present!
[ 148.877775] nvme1: failed to create debugfs attr
[ 149.816652] debugfs: Directory 'nvme1' with parent '/' already present!
[ 149.818011] nvme1: failed to create debugfs attr
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Free dhchap_secret in nvme_ctrl_dhchap_ctrl_secret_store() before we
return when nvme_auth_generate_key() returns error.
Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Free dhchap_secret in nvme_ctrl_dhchap_secret_store() before we return
fix following kmemleack:-
unreferenced object 0xffff8886376ea800 (size 64):
comm "check", pid 22048, jiffies 4344316705 (age 92.199s)
hex dump (first 32 bytes):
44 48 48 43 2d 31 3a 30 30 3a 6e 78 72 35 4b 67 DHHC-1:00:nxr5Kg
75 58 34 75 6f 41 78 73 4a 61 34 63 2f 68 75 4c uX4uoAxsJa4c/huL
backtrace:
[<0000000030ce5d4b>] __kmalloc+0x4b/0x130
[<000000009be1cdc1>] nvme_ctrl_dhchap_secret_store+0x8f/0x160 [nvme_core]
[<00000000ac06c96a>] kernfs_fop_write_iter+0x12b/0x1c0
[<00000000437e7ced>] vfs_write+0x2ba/0x3c0
[<00000000f9491baf>] ksys_write+0x5f/0xe0
[<000000001c46513d>] do_syscall_64+0x3b/0x90
[<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff8886376eaf00 (size 64):
comm "check", pid 22048, jiffies 4344316736 (age 92.168s)
hex dump (first 32 bytes):
44 48 48 43 2d 31 3a 30 30 3a 6e 78 72 35 4b 67 DHHC-1:00:nxr5Kg
75 58 34 75 6f 41 78 73 4a 61 34 63 2f 68 75 4c uX4uoAxsJa4c/huL
backtrace:
[<0000000030ce5d4b>] __kmalloc+0x4b/0x130
[<000000009be1cdc1>] nvme_ctrl_dhchap_secret_store+0x8f/0x160 [nvme_core]
[<00000000ac06c96a>] kernfs_fop_write_iter+0x12b/0x1c0
[<00000000437e7ced>] vfs_write+0x2ba/0x3c0
[<00000000f9491baf>] ksys_write+0x5f/0xe0
[<000000001c46513d>] do_syscall_64+0x3b/0x90
[<00000000ecf348fe>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes: f50fff73d620 ("nvme: implement In-Band authentication")
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Since 315bada690e0 ("EDAC: Check for GHES preference in the
chipset-specific EDAC drivers"), vendor specific EDAC driver will not
probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device
is present. Make ghes_get_devices() return NULL when the GHES device
list is empty to fix the problem.
Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The inclusion of linux/arm-smccc.h in acpi_ffh is unnecessary and can
be even termed wrong. It is needed in the arm64 architecture callback
implementation and probably is the leftover from the missed cleanup of
the initial implementation.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Zhaoxin CPUs support NONSTOP TSC feature, so do not mark these CPUs
TSC unstable when use the acpi_pad driver.
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It's only used inside the __init section. Mark it __initdata.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
We found a refcount UAF bug as follows:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148
Workqueue: events cpuset_hotplug_workfn
Call trace:
refcount_warn_saturate+0xa0/0x148
__refcount_add.constprop.0+0x5c/0x80
css_task_iter_advance_css_set+0xd8/0x210
css_task_iter_advance+0xa8/0x120
css_task_iter_next+0x94/0x158
update_tasks_root_domain+0x58/0x98
rebuild_root_domains+0xa0/0x1b0
rebuild_sched_domains_locked+0x144/0x188
cpuset_hotplug_workfn+0x138/0x5a0
process_one_work+0x1e8/0x448
worker_thread+0x228/0x3e0
kthread+0xe0/0xf0
ret_from_fork+0x10/0x20
then a kernel panic will be triggered as below:
Unable to handle kernel paging request at virtual address 00000000c0000010
Call trace:
cgroup_apply_control_disable+0xa4/0x16c
rebind_subsystems+0x224/0x590
cgroup_destroy_root+0x64/0x2e0
css_free_rwork_fn+0x198/0x2a0
process_one_work+0x1d4/0x4bc
worker_thread+0x158/0x410
kthread+0x108/0x13c
ret_from_fork+0x10/0x18
The race that cause this bug can be shown as below:
(hotplug cpu) | (umount cpuset)
mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex)
cpuset_hotplug_workfn |
rebuild_root_domains | rebind_subsystems
update_tasks_root_domain | spin_lock_irq(&css_set_lock)
css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id]
while(css_task_iter_next) | &dcgrp->e_csets[ss->id]);
css_task_iter_end | spin_unlock_irq(&css_set_lock)
mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex)
Inside css_task_iter_start/next/end, css_set_lock is hold and then
released, so when iterating task(left side), the css_set may be moved to
another list(right side), then it->cset_head points to the old list head
and it->cset_pos->next points to the head node of new list, which can't
be used as struct css_set.
To fix this issue, switch from all css_sets to only scgrp's css_sets to
patch in-flight iterators to preserve correct iteration, and then
update it->cset_head as well.
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://www.spinics.net/lists/cgroups/msg37935.html
Suggested-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/all/20230526114139.70274-1-xiujianfeng@huaweicloud.com/
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Fixes: 2d8f243a5e6e ("cgroup: implement cgroup->e_csets[]")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
On multiple devices I work on, we noticed that
/sys/firmware/acpi/interrupts/sci_not is non-zero and keeps increasing
over time.
It turns out that there is a race condition between servicing a GPE
interrupt and handling task driven transactions.
If a GPE interrupt is received at the same time ec_poll() is running,
the advance_transaction() clears the GPE flag and the interrupt is not
serviced as acpi_ev_detect_gpe() relies on the GPE flag to call the
handler. As a result, `sci_not' is increased.
To address this, move the GPE status check and clearing from
advance_transaction() directly into acpi_ec_handle_interrupt(), so the
EC GPE only gets cleared in the interrupt handling path.
Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
after ~2012
There have been 2 separate reports now about a non working
"dell_backlight" device getting registered under /sys/class/backlight
1 report for a Raptor Lake based Dell and 1 report for a Meteor Lake
(development) platform.
On hw from the last 10 years dell-laptop will not register "dell_backlight"
because acpi_video_get_backlight_type() will return acpi_backlight_video
there if called before the GPU/kms driver loads. So it does not matter if
the GPU driver's native backlight is registered after dell-laptop loads.
But it seems that on the latest generation laptops the ACPI tables
no longer contain acpi_video backlight control support which causes
acpi_video_get_backlight_type() to return acpi_backlight_vendor causing
"dell_backlight" to get registered if the dell-laptop module is loaded
before the GPU/kms driver.
Vendor specific backlight control like the "dell_backlight" device is
only necessary on quite old hw (from before acpi_video backlight control
was introduced). Work around "dell_backlight" registering on very new
hw (where acpi_video backlight control seems to be no more) by making
acpi_video_get_backlight_type() return acpi_backlight_none instead
of acpi_backlight_vendor as final fallback when the ACPI tables have
support for Windows 8 or later (laptops from after ~2012).
Suggested-by: Matthew Garrett <mjg59@srcf.ucam.org>
Reported-by: AceLan Kao <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/platform-driver-x86/20230607034331.576623-1-acelan.kao@canonical.com/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
freezer_css_{online,offline}()
syzbot is again reporting circular locking dependency between
cpu_hotplug_lock and freezer_mutex. Do like what we did with
commit 57dcd64c7e036299 ("cgroup,freezer: hold cpu_hotplug_lock
before freezer_mutex").
Reported-by: syzbot <syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=2ab700fe1829880a2ec6
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+2ab700fe1829880a2ec6@syzkaller.appspotmail.com>
Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Get rid of the completion wait in svc_rdma_sendto(), and release
pages in the send completion handler again. A subsequent patch will
handle releasing those pages more efficiently.
Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pre-requisite for releasing pages in the send completion handler.
Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
rq_res.head[0].iov_base")
Pre-requisite for releasing pages in the send completion handler.
Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Modified nfsd4_encode_open to encode the op_recall flag properly
for OPEN result with write delegation granted.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
|
|
At LFS 2023, it was suggested we should publicly document the name and
email of reviewers who new contributors can trust. This also gives them
some recognition for their work as reviewers.
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Ensure that Bruce's old e-mail addresses map to his current one so
he doesn't miss out on all the fun.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The physical device's favored NUMA node ID is available when
allocating a rw_ctxt. Use that value instead of relying on the
assumption that the memory allocation happens to be running on a
node close to the device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The physical device's favored NUMA node ID is available when
allocating a send_ctxt. Use that value instead of relying on the
assumption that the memory allocation happens to be running on a
node close to the device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The physical device's favored NUMA node ID is available when
allocating a recv_ctxt. Use that value instead of relying on the
assumption that the memory allocation happens to be running on a
node close to the device.
This clean up eliminates the hack of destroying recv_ctxts that
were not created by the receive CQ thread -- recv_ctxts are now
always allocated on a "good" node.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The physical device's NUMA node ID is available when allocating an
svc_xprt for an incoming connection. Use that value to ensure the
svc_xprt structure is allocated on the NUMA node closest to the
device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The below-mentioned patch was intended to simplify refcounting on the
svc_serv used by locked. The goal was to only ever have a single
reference from the single thread. To that end we dropped a call to
lockd_start_svc() (except when creating thread) which would take a
reference, and dropped the svc_put(serv) that would drop that reference.
Unfortunately we didn't also remove the svc_get() from
lockd_create_svc() in the case where the svc_serv already existed.
So after the patch:
- on the first call the svc_serv was allocated and the one reference
was given to the thread, so there are no extra references
- on subsequent calls svc_get() was called so there is now an extra
reference.
This is clearly not consistent.
The inconsistency is also clear in the current code in lockd_get()
takes *two* references, one on nlmsvc_serv and one by incrementing
nlmsvc_users. This clearly does not match lockd_put().
So: drop that svc_get() from lockd_get() (which used to be in
lockd_create_svc().
Reported-by: Ido Schimmel <idosch@idosch.org>
Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u
Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()")
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating
'wake_batch' is not atomic:
t1: t2:
_blk_mq_tag_busy blk_mq_tag_busy
inc active_queues
// assume 1->2
inc active_queues
// 2 -> 3
blk_mq_update_wake_batch
// calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */
Fix this problem by protecting them wih 'tags->lock', this is not a hot
path, so performance should not be concerned. And now that all writers
are inside the lock, switch 'actives_queues' from atomic to unsigned
int.
Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
WHen the ring exits, cleanup is done and the final cancelation and
waiting on completions is done by io_ring_exit_work. That function is
invoked by kworker, which doesn't take any signals. Because of that, it
doesn't really matter if we wait for completions in TASK_INTERRUPTIBLE
or TASK_UNINTERRUPTIBLE state. However, it does matter to the hung task
detection checker!
Normally we expect cancelations and completions to happen rather
quickly. Some test cases, however, will exit the ring and park the
owning task stopped (eg via SIGSTOP). If the owning task needs to run
task_work to complete requests, then io_ring_exit_work won't make any
progress until the task is runnable again. Hence io_ring_exit_work can
trigger the hung task detection, which is particularly problematic if
panic-on-hung-task is enabled.
As the ring exit doesn't take signals to begin with, have it wait
interruptibly rather than uninterruptibly. io_uring has a separate
stuck-exit warning that triggers independently anyway, so we're not
really missing anything by making this switch.
Cc: stable@vger.kernel.org # 5.10+
Link: https://lore.kernel.org/r/b0e4aaef-7088-56ce-244c-976edeac0e66@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The sysctl net/core/bpf_jit_enable does not work now due to commit
1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc"). The
commit saved the jitted insns into 'rw_image' instead of 'image'
which caused bpf_jit_dump not dumping proper content.
With 'echo 2 > /proc/sys/net/core/bpf_jit_enable', run
'./test_progs -t fentry_test'. Without this patch, one of jitted
image for one particular prog is:
flen=17 proglen=92 pass=4 image=0000000014c64883 from=test_progs pid=1807
00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000040: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
00000050: cc cc cc cc cc cc cc cc cc cc cc cc
With this patch, the jitte image for the same prog is:
flen=17 proglen=92 pass=4 image=00000000b90254b7 from=test_progs pid=1809
00000000: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3
00000010: 0f 1e fa 31 f6 48 8b 57 00 48 83 fa 07 75 2b 48
00000020: 8b 57 10 83 fa 09 75 22 48 8b 57 08 48 81 e2 ff
00000030: 00 00 00 48 83 fa 08 75 11 48 8b 7f 18 be 01 00
00000040: 00 00 48 83 ff 0a 74 02 31 f6 48 bf 18 d0 14 00
00000050: 00 c9 ff ff 48 89 77 00 31 c0 c9 c3
Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20230609005439.3173569-1-yhs@fb.com
|
|
While SDHCI claims to support 64-bit DMA on MSM8916 it does not seem to
be properly functional. It is not immediately obvious because SDHCI is
usually used with IOMMU bypassed on this SoC, and all physical memory
has 32-bit addresses. But when trying to enable the IOMMU it quickly
fails with an error such as the following:
arm-smmu 1e00000.iommu: Unhandled context fault:
fsr=0x402, iova=0xfffff200, fsynr=0xe0000, cbfrsynra=0x140, cb=3
mmc1: ADMA error: 0x02000000
mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00002e02
mmc1: sdhci: Blk size: 0x00000008 | Blk cnt: 0x00000000
mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000013
mmc1: sdhci: Present: 0x03f80206 | Host ctl: 0x00000019
mmc1: sdhci: Power: 0x0000000f | Blk gap: 0x00000000
mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
mmc1: sdhci: Timeout: 0x0000000a | Int stat: 0x00000001
mmc1: sdhci: Int enab: 0x03ff900b | Sig enab: 0x03ff100b
mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
mmc1: sdhci: Caps: 0x322dc8b2 | Caps_1: 0x00008007
mmc1: sdhci: Cmd: 0x0000333a | Max curr: 0x00000000
mmc1: sdhci: Resp[0]: 0x00000920 | Resp[1]: 0x5b590000
mmc1: sdhci: Resp[2]: 0xe6487f80 | Resp[3]: 0x0a404094
mmc1: sdhci: Host ctl2: 0x00000008
mmc1: sdhci: ADMA Err: 0x00000001 | ADMA Ptr: 0x0000000ffffff224
mmc1: sdhci_msm: ----------- VENDOR REGISTER DUMP -----------
mmc1: sdhci_msm: DLL sts: 0x00000000 | DLL cfg: 0x60006400 | DLL cfg2: 0x00000000
mmc1: sdhci_msm: DLL cfg3: 0x00000000 | DLL usr ctl: 0x00000000 | DDR cfg: 0x00000000
mmc1: sdhci_msm: Vndr func: 0x00018a9c | Vndr func2 : 0xf88018a8 Vndr func3: 0x00000000
mmc1: sdhci: ============================================
mmc1: sdhci: fffffffff200: DMA 0x0000ffffffffe100, LEN 0x0008, Attr=0x21
mmc1: sdhci: fffffffff20c: DMA 0x0000000000000000, LEN 0x0000, Attr=0x03
Looking closely it's obvious that only the 32-bit part of the address
(0xfffff200) arrives at the SMMU, the higher 16-bit (0xffff...) get
lost somewhere. This might not be a limitation of the SDHCI itself but
perhaps the bus/interconnect it is connected to, or even the connection
to the SMMU.
Work around this by setting SDHCI_QUIRK2_BROKEN_64_BIT_DMA to avoid
using 64-bit addresses.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230518-msm8916-64bit-v1-1-5694b0f35211@gerhold.net
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
FMODE_NDELAY, FMODE_EXCL and FMODE_WRITE_IOCTL were only used for
block internal purposed and are now entirely unused, so remove them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-31-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Store the file struct used as the holder in file->private_data as an
indicator that this file descriptor was opened exclusively to remove
the last use of FMODE_EXCL.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20230608110258.189493-30-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Always use I_BDEV(file->f_mapping->host) to find the bdev for a file to
free up file->private_data for other uses.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-29-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A few ioctl handlers have fmode_t arguments that are entirely unused,
remove them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20230608110258.189493-27-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
All these helpers are only used in core block code, so move them out of
the public header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-26-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This code has been dead forever, make sure it doesn't show up in code
searches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Link: https://lore.kernel.org/r/20230608110258.189493-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Stop passing the fmode_t around and just use a simple bool to track if
an export is read-only.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20230608110258.189493-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of propagating the fmode_t, just use a bool to track if a mtd
block device was opened for writing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Link: https://lore.kernel.org/r/20230608110258.189493-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of passing a fmode_t and only checking it fo0r FMODE_WRITE, pass
a bool open_for_write to prepare for callers that won't have the fmode_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|