When memory backing is enabled, use null_handle_discard() to free the
backing memory used by a zone when the zone is being reset.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
null_handle_discard() is called from both null_handle_rq() and
null_handle_bio(). As these functions are only passed a nullb_cmd
structure, this forces pointer dereferences to identify the discard
operation code and to access the sector range to be discarded.
Simplify all this by changing the interface of the functions
null_handle_discard() and null_handle_memory_backed() to pass along
the operation code, operation start sector and number of sectors. With
this change null_handle_discard() can be called directly from
null_handle_memory_backed().
Also add a message warning that the discard configuration attribute has
no effect when memory backing is disabled.
No functional change is introduced by this patch.
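As a rough sketch of the reworked call path (the prototypes and field
accesses here are illustrative assumptions, not copied from the patch):

	static blk_status_t null_handle_discard(struct nullb_device *dev,
						sector_t sector,
						sector_t nr_sectors);

	static blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd,
						      enum req_opf op,
						      sector_t sector,
						      sector_t nr_sectors)
	{
		struct nullb_device *dev = cmd->nq->dev;

		if (op == REQ_OP_DISCARD)
			return null_handle_discard(dev, sector, nr_sectors);

		/* ... read/write handling as before ... */
		return BLK_STS_OK;
	}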
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When open zone resource management is enabled, that is, when a null_blk
zoned device is created with zone_max_open different than 0, implicitly
or explicitly opening a zone may require implicitly closing a zone
that is already implicitly open. This operation is done using the
function null_close_first_imp_zone(), which searches for an implicitly
open zone to close starting from the first sequential zone. This
implementation is simple but may result in the same zone, namely the
lowest numbered zone being written, constantly being implicitly closed
and then implicitly reopened on writes.
Avoid this by starting the search for an implicitly open zone to close
from the zone following the last zone that was implicitly closed. The
function null_close_first_imp_zone() is renamed
null_close_imp_open_zone().
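A rough sketch of the rotating search (the field and helper names below
are illustrative only, not necessarily those used by the driver):

	unsigned int i, zno = dev->imp_close_zone_no;

	for (i = 0; i < dev->nr_zones - dev->zone_nr_conv; i++, zno++) {
		/* wrap around, skipping the conventional zones */
		if (zno >= dev->nr_zones)
			zno = dev->zone_nr_conv;
		if (dev->zones[zno].cond == BLK_ZONE_COND_IMP_OPEN) {
			null_close_zone(dev, &dev->zones[zno]);
			dev->imp_close_zone_no = zno + 1;
			return;
		}
	}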
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
With memory backing disabled, using a single spinlock for protecting
zone information and zone resource management prevents the parallel
execution, on multiple queues, of IO requests to different zones.
Furthermore, regardless of the use of memory backing, if a null_blk
device is created without limits on the number of open and active zones,
accounting for zone resource management is not necessary.
From these observations, zone locking is changed as follows to improve
performance:
1) the zone_lock spinlock is renamed zone_res_lock and used only if
zone resource management is necessary, that is, if either
zone_max_open or zone_max_active is not 0. This is indicated using
the new boolean need_zone_res_mgmt in the nullb_device structure.
null_zone_write() is modified to reduce the amount of code executed
with the zone_res_lock spinlock held.
2) With memory backing disabled, per zone locking is changed to a
spinlock per zone.
3) Introduce the structure nullb_zone to replace the use of
struct blk_zone for zone information. This new structure includes a
union of a spinlock and a mutex for zone locking. The spinlock is
used when memory backing is disabled and the mutex is used with
memory backing.
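A rough sketch of the per-zone structure described in (3) (the field
list is illustrative, not exhaustive):

	struct nullb_zone {
		union {
			spinlock_t spinlock;	/* used without memory backing */
			struct mutex mutex;	/* used with memory backing */
		};
		enum blk_zone_type type;
		enum blk_zone_cond cond;
		sector_t start;
		sector_t wp;
		unsigned int len;
		unsigned int capacity;
	};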
With these changes, fio performance with zonemode=zbd for 4K random
read and random write on a dual socket (24 cores per socket) machine
using the none scheduler is as follows:
before patch:
write (psync x 96 jobs) = 465 KIOPS
read (libaio@qd=8 x 96 jobs) = 1361 KIOPS
after patch:
write (psync x 96 jobs) = 456 KIOPS
read (libaio@qd=8 x 96 jobs) = 4096 KIOPS
Write performance remains mostly unchanged but read performance is three
times higher. Performance when using the mq-deadline scheduler is not
changed by this patch as mq-deadline becomes the bottleneck for a
multi-queue device.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Conventional zones do not have a write pointer and so cannot accept zone
append writes. Make sure to fail any zone append write command issued to
a conventional zone.
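A minimal sketch of the check (its exact placement in the write path is
illustrative):

	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) {
		if (append)
			return BLK_STS_IOERR;	/* no write pointer: reject zone append */
		/* regular writes to conventional zones remain allowed */
	}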
Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A null_blk device with zoned mode enabled is currently initialized
with a number of zones equal to the device capacity divided by the zone
size, without considering whether the device capacity is a multiple of
the zone size. If the zone size is not a divisor of the capacity, the
zones end up not covering the entire capacity, potentially resulting in
out-of-bounds accesses to the zone array.
Fix this by adding one last smaller zone with a size equal to the
remainder of the disk capacity divided by the zone size if the capacity
is not a multiple of the zone size. For such a smaller last zone, the zone
capacity is also checked so that it does not exceed the smaller zone
size.
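Illustratively, with the capacity and zone size expressed in 512B sectors
(the variable names are made up for this example):

	nr_zones = DIV_ROUND_UP(dev_capacity_sects, zone_size_sects);

	/* the last zone may be smaller than the configured zone size ... */
	zone_len = min_t(sector_t, zone_size_sects,
			 dev_capacity_sects - zone_start_sect);
	/* ... and its capacity must not exceed that smaller size */
	zone_cap = min_t(sector_t, zone_capacity_sects, zone_len);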
Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: ca4b2a011948 ("null_blk: add zone support")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Dump registers and states prior to leaving the IRQ handler when an AH8 error
occurs.
Link: https://lore.kernel.org/r/1606910644-21185-4-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bao D. Nguyen <nguyenb@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In the current task abort routine, if task abort happens to the device W-LUN,
the code directly jumps to ufshcd_eh_host_reset_handler() to perform a full
reset and restore then returns FAIL or SUCCESS. Commands sent to the device
W-LUN are most likely the SSU cmds sent during UFS PM operations. If such
an SSU cmd enters the task abort routine when ufshcd_eh_host_reset_handler()
flushes eh_work, it will get stuck there since err_handler is serialized
with PM operations.
In order to unblock the above call path, we merely clean up the lrb taken by
this cmd, queue the eh_work and return SUCCESS. Once the cmd is aborted,
the PM operation which sends out the cmd just errors out, then err_handler
shall be able to proceed with the full reset and restore.
In this scenario, the cmd is aborted even before it is actually cleared by
the HW, so set the lrb->in_use flag to prevent subsequent cmds, including
SCSI cmds and dev cmds, from taking the lrb released by the abort. The flag
shall eventually be cleared in __ufshcd_transfer_req_compl() invoked by the full
reset and restore from err_handler.
[mkp: conflict with event logging series]
Link: https://lore.kernel.org/r/1606910644-21185-3-git-send-email-cang@codeaurora.org
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Serialize eh_work with system PM events and async scan to make sure eh_work
does not run in parallel with them.
Link: https://lore.kernel.org/r/1606910644-21185-2-git-send-email-cang@codeaurora.org
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The UFS specification allows different VCC configurations for UFS devices,
for example:
(1). 2.70V - 3.60V (Activated by default in UFS core driver)
(2). 1.70V - 1.95V (Activated if "vcc-supply-1p8" is declared in
device tree)
(3). 2.40V - 2.70V (Supported since UFS 3.x)
With the introduction of UFS 3.x products, an issue arises in which the UFS
driver uses the wrong "min_uV-max_uV" values to configure the voltage of
the VCC regulator on UFS 3.x products using configuration (3).
To solve this issue, we simply remove the pre-defined initial VCC voltage
values from the UFS core driver, for the reasons below:
1. UFS specifications do not define how to detect the VCC configuration
supported by the attached device.
2. Device tree already supports standard regulator properties.
Therefore the VCC voltage shall be defined correctly in the device tree and
shall not be changed by the UFS driver. What the UFS driver needs to do is
simply enable or disable the VCC regulator.
A similar change is applied to VCCQ and VCCQ2 as well.
Note that we keep struct ufs_vreg unchanged. This allows vendors to
configure proper min_uV and max_uV values for any regulators to make
regulator_set_voltage() work during the regulator toggling flow in the
future. Without specific vendor configurations, min_uV and max_uV will be
zero by default and the UFS core driver will only enable or disable the
regulator, without adjusting its voltage.
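Conceptually, the core driver's handling becomes something like the sketch
below (simplified; not the exact ufshcd code):

	if (vreg->min_uV && vreg->max_uV) {
		/* voltage range explicitly provided by vendor/device tree */
		ret = regulator_set_voltage(vreg->reg, vreg->min_uV,
					    vreg->max_uV);
		if (ret)
			return ret;
	}
	return regulator_enable(vreg->reg);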
Link: https://lore.kernel.org/r/20201202091819.22363-1-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use phy_initialization helper instead of direct invocation.
Link: https://lore.kernel.org/r/20201205120041.26869-5-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use phy_initialization helper instead of direct function invocation.
Link: https://lore.kernel.org/r/20201205120041.26869-4-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Introduce a phy_initialization helper since this is the only variant
function without a helper.
Link: https://lore.kernel.org/r/20201205120041.26869-3-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Since the setup_regulators variant function is not used by any vendor, simply
remove it.
Link: https://lore.kernel.org/r/20201205120041.26869-2-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Introduce event_notify implementation on MediaTek UFS platform. A
vendor-specific tracepoint is added that can be used for debugging
purposes.
Link: https://lore.kernel.org/r/20201205115901.26815-5-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Introduce an event_notify variant function to allow vendors to get notified
of important events and connect to any proprietary debugging facilities.
Link: https://lore.kernel.org/r/20201205115901.26815-4-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The UFS error history contains not only a "history of errors" but also a
log of some other events which are not defined as errors.
This patch fixes the confusing naming of related functions and changes the
approach for updating and printing the history in preparation for the next patch.
This patch does not change any functionality.
Link: https://lore.kernel.org/r/20201205115901.26815-3-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add error history for the abort event in the UFS Device W-LUN.
Use a specified value as the parameter of ufshcd_update_reg_hist() to
identify the aborted tag or LUN.
Link: https://lore.kernel.org/r/20201205115901.26815-2-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
kfree(conn) is called inside put_device(&conn->dev) which could lead to
use-after-free. In addition, device_unregister() should be used here rather
than put_device().
Link: https://lore.kernel.org/r/20201120074852.31658-1-miaoqinglang@huawei.com
Fixes: f3c893e3dbb5 ("scsi: iscsi: Fail session and connection on transport registration failure")
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver did not return an error in the case where
pm8001_configure_phy_settings() failed.
Use rc to store the return value of pm8001_configure_phy_settings().
Link: https://lore.kernel.org/r/20201205115551.2079471-1-zhangqilong3@huawei.com
Fixes: 279094079a44 ("[SCSI] pm80xx: Phy settings support for motherboard controller.")
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add the missing destroy_workqueue() before returning from __qedi_probe() in
the error handling path taken when creating the workqueue
qedi->offload_thread fails.
Link: https://lore.kernel.org/r/20201109091518.55941-1-miaoqinglang@huawei.com
Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The R9A06G032 clock driver uses an array of packed structures to reduce
kernel size. However, this array contains pointers, which are no longer
aligned naturally, and cannot be relocated on PPC64. Hence when
compile-testing this driver on PPC64 with CONFIG_RELOCATABLE=y (e.g.
PowerPC allyesconfig), the following warnings are produced:
WARNING: 136 bad relocations
c000000000616be3 R_PPC64_UADDR64 .rodata+0x00000000000cf338
c000000000616bfe R_PPC64_UADDR64 .rodata+0x00000000000cf370
...
Fix this by dropping the __packed attribute from the r9a06g032_clkdesc
definition, trading a small size increase for portability.
This increases the 156-entry clock table by 1 byte per entry, but due to
the compiler generating more efficient code for unpacked accesses, the
net size increase is only 76 bytes (gcc 9.3.0 on arm32).
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 4c3d88526eba2143 ("clk: renesas: Renesas R9A06G032 clock driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201130085743.1656317-1-geert+renesas@glider.be
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # PowerPC allyesconfig build
Acked-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
This issue can be reproduced by having a kernel config with
CONFIG_IMX_MBOX=m and CONFIG_MXC_CLK_SCU=m. It's caused by the Makefile
wanting to build clk-scu.o and clk-imx8qxp.o as different targets but
that doesn't work (e.g. MXC_CLK_SCU = y while CLK_IMX8QXP = n)
"obj-$(CONFIG_MXC_CLK_SCU) += clk-imx-scu.o clk-imx-lpcg-scu.o
clk-imx-scu-$(CONFIG_CLK_IMX8QXP) += clk-scu.o clk-imx8qxp.o"
Having MXC_CLK_SCU=y/m while CLK_IMX8QXP=n will cause a linker problem
like below:
LD [M] drivers/clk/imx/clk-imx-scu.o
arm-poky-linux-gnueabi-ld: no input files
Make MXC_CLK_SCU not selectable by users so it can only be selected by
the CLK_IMX8QXP option, ensuring the two symbols are built together.
Drop COMPILE_TEST too because this option isn't selectable anymore. We
can remove it from MXC_CLK_SCU because CLK_IMX8QXP, which already has
COMPILE_TEST, selects MXC_CLK_SCU.
Fixes: e0d0d4d86c766 ("clk: imx8qxp: Support building i.MX8QXP clock driver as module")
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Link: https://lore.kernel.org/r/20201130084624.21113-1-aisheng.dong@nxp.com
[sboyd@kernel.org: Rework commit text]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The query_gid_table ioctl skips non IB/RoCE ports, which as a result
returns an empty gid table for devices such as EFA which have a GID table,
but are not IB/RoCE.
Fixes: c4b4d548fabc ("RDMA/core: Introduce new GID table query API")
Link: https://lore.kernel.org/r/20201206153238.34878-1-galpress@amazon.com
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
There is a race condition between detaching (A) and a write request (B),
as below:
(1) A: writing back
(2) A: write back done, set bdev state to clean.
(3) A: cached_dev_put() and schedule_work(&dc->detach);
(4) B: write data [0 - 4K] directly into backing and ack to user.
(5) power-failure...
When we restart this bcache device, the bdev is clean but not detached,
and a read of [0 - 4K] will return unexpected old data from the cache
device.
To fix this problem, set the bdev state to none when writeback is done
during detaching. Then, if a power failure happens as above, the data in
the cache will not be used at the next bcache device start since the
device is detached, and we will read the correct data directly from the
backing device.
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
iser_initialize_task_headers() uses in_interrupt() to find out if it is
safe to acquire a mutex.
in_interrupt() is deprecated as it is ill defined and does not provide
what it suggests. Aside from that, it covers only parts of the contexts in
which a mutex may not be acquired.
The following callchains exist:
iscsi_queuecommand() *locks* iscsi_session::frwd_lock
-> iscsi_prep_scsi_cmd_pdu()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
-> iscsi_iser_task_xmit() (iscsi_transport::xmit_task)
-> iscsi_iser_task_xmit_unsol_data()
-> iser_send_data_out()
-> iser_initialize_task_headers()
iscsi_data_xmit() *locks* iscsi_session::frwd_lock
-> iscsi_prep_mgmt_task()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
-> iscsi_prep_scsi_cmd_pdu()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
__iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock
-> iscsi_prep_mgmt_task()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
-> session->tt->xmit_task() (
The only callchain that is close to being invoked in preemptible context is:
iscsi_xmitworker() worker
-> iscsi_data_xmit()
-> iscsi_xmit_task()
-> conn->session->tt->xmit_task() (iscsi_iser_task_xmit())
In iscsi_iser_task_xmit() there is this check:
if (!task->sc)
return iscsi_iser_mtask_xmit(conn, task);
so it does end up in iser_initialize_task_headers() and
iser_initialize_task_headers() relies on iscsi_task::sc == NULL.
Remove conditional locking of iser_conn::state_mutex because there is no
call chain to do so. Remove the goto label and return early now that there
is no clean up needed.
Link: https://lore.kernel.org/r/20201204174256.62xfcvudndt7oufl@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@nvidia.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Currently, DM MR registration flow doesn't set the mlx5_ib_dev pointer and
can cause a NULL pointer dereference if userspace dumps the MR via rdma
tool.
Assign the IB device together with the other fields and remove the
redundant reference of mlx5_ib_dev from mlx5_ib_mr.
Cc: stable@vger.kernel.org
Fixes: 6c29f57ea475 ("IB/mlx5: Device memory mr registration support")
Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
These flags will be returned to the userspace through ABI, so they should
be defined in hns-abi.h. Furthermore, there is no need to include
hns-abi.h in every source files, it just needs to be included in the
common header file.
Link: https://lore.kernel.org/r/1606872560-17823-1-git-send-email-liweihang@huawei.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Some functions have different names between their prototypes and the
kernel-doc markup.
Others need to be fixed, as kernel-doc markups should use this format:
identifier - description
Link: https://lore.kernel.org/r/78b98c41a5a0f4c0106433d305b143028a4168b0.1606823973.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
While creating QPs, the driver adds one extra entry to the SQ size passed
by the ULPs in order to avoid a queue full condition. When ULPs create QPs
with the reported max_qp_wr, the driver creates a QP with 1 more than the
max_wqes supported by the HW, and QP creation fails. To avoid this error,
reduce max_qp_wqes by 1 entry and report it to the stack.
Link: https://lore.kernel.org/r/1606741986-16477-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
atomic_inc_return() is a little neater
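For illustration (the counter name is invented):

	/* before: two separate operations */
	atomic_inc(&id_counter);
	id = atomic_read(&id_counter);

	/* after: one atomic operation that returns the incremented value */
	id = atomic_inc_return(&id_counter);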
Link: https://lore.kernel.org/r/1606726376-7675-1-git-send-email-yejune.deng@gmail.com
Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-next-2020-12-02
Low level mlx5 updates required by both netdev and rdma trees:
net/mlx5: Treat host PF vport as other (non eswitch manager) vport
net/mlx5: Enable host PF HCA after eswitch is initialized
net/mlx5: Rename peer_pf to host_pf
net/mlx5: Make API mlx5_core_is_ecpf accept const pointer
net/mlx5: Export steering related functions
net/mlx5: Expose other function ifc bits
net/mlx5: Expose IP-in-IP TX and RX capability bits
net/mlx5: Update the hardware interface definition for vhca state
net/mlx5: Update the list of the PCI supported devices
net/mlx5: Avoid exposing driver internal command helpers
net/mlx5: Add ts_cqe_to_dest_cqn related bits
net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits
net/mlx5: Check dr mask size against mlx5_match_param size
net/mlx5: Add sampler destination type
net/mlx5: Add sample offload hardware bits and structures
====================
Link: https://lore.kernel.org/r/20201203011010.213440-1-saeedm@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The ch->lock is used to protect the whole enable() and read() of
sh_cmt's implementation of struct clocksource. The enable()
implementation calls pm_runtime_get_sync(), which may result in the clock
source being read(), triggering a cyclic lockdep warning on ch->lock.
The sh_cmt driver implements its own balancing of calls to
sh_cmt_{enable,disable}() with flags in sh_cmt_{start,stop}(). It does
this to deal with the fact that start and stop are shared between the
clock source and clock event providers. While this could be improved
upon, verifying corner cases for any substantial rework across all
devices this driver supports might prove hard.
As a first step, separate the PM handling for the clock event and clock
source. Always get/put the device when enabling/disabling the clock
source but keep the clock event logic unchanged. This allows the sh_cmt
implementation of struct clocksource to call PM without holding
ch->lock, avoiding the deadlock.
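A conceptual sketch of the clocksource enable path after the change (the
helper name is hypothetical; only the ordering matters):

	static int sh_cmt_clocksource_enable(struct clocksource *cs)
	{
		struct sh_cmt_channel *ch = cs_to_sh_cmt(cs);

		/* Take the runtime PM reference before ch->lock is taken,
		 * so pm_runtime_get_sync() never nests inside the spinlock. */
		pm_runtime_get_sync(&ch->cmt->pdev->dev);

		return sh_cmt_start_clocksource(ch);	/* hypothetical helper */
	}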
Triggering and log of the deadlock warning,
# echo e60f0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource
[ 46.948370] ======================================================
[ 46.954730] WARNING: possible circular locking dependency detected
[ 46.961094] 5.10.0-rc6-arm64-renesas-00001-g0e5fd7414e8b #36 Not tainted
[ 46.967985] ------------------------------------------------------
[ 46.974342] migration/0/11 is trying to acquire lock:
[ 46.979543] ffff0000403ed220 (&dev->power.lock){-...}-{2:2}, at: __pm_runtime_resume+0x40/0x74
[ 46.988445]
[ 46.988445] but task is already holding lock:
[ 46.994441] ffff000040ad0298 (&ch->lock){....}-{2:2}, at: sh_cmt_start+0x28/0x210
[ 47.002173]
[ 47.002173] which lock already depends on the new lock.
[ 47.002173]
[ 47.010573]
[ 47.010573] the existing dependency chain (in reverse order) is:
[ 47.018262]
[ 47.018262] -> #3 (&ch->lock){....}-{2:2}:
[ 47.024033] lock_acquire.part.0+0x120/0x330
[ 47.028970] lock_acquire+0x64/0x80
[ 47.033105] _raw_spin_lock_irqsave+0x7c/0xc4
[ 47.038130] sh_cmt_start+0x28/0x210
[ 47.042352] sh_cmt_clocksource_enable+0x28/0x50
[ 47.047644] change_clocksource+0x9c/0x160
[ 47.052402] multi_cpu_stop+0xa4/0x190
[ 47.056799] cpu_stopper_thread+0x90/0x154
[ 47.061557] smpboot_thread_fn+0x244/0x270
[ 47.066310] kthread+0x154/0x160
[ 47.070175] ret_from_fork+0x10/0x20
[ 47.074390]
[ 47.074390] -> #2 (tk_core.seq.seqcount){----}-{0:0}:
[ 47.081136] lock_acquire.part.0+0x120/0x330
[ 47.086070] lock_acquire+0x64/0x80
[ 47.090203] seqcount_lockdep_reader_access.constprop.0+0x74/0x100
[ 47.097096] ktime_get+0x28/0xa0
[ 47.100960] hrtimer_start_range_ns+0x210/0x2dc
[ 47.106164] generic_sched_clock_init+0x70/0x88
[ 47.111364] sched_clock_init+0x40/0x64
[ 47.115853] start_kernel+0x494/0x524
[ 47.120156]
[ 47.120156] -> #1 (hrtimer_bases.lock){-.-.}-{2:2}:
[ 47.126721] lock_acquire.part.0+0x120/0x330
[ 47.136042] lock_acquire+0x64/0x80
[ 47.144461] _raw_spin_lock_irqsave+0x7c/0xc4
[ 47.153721] hrtimer_start_range_ns+0x68/0x2dc
[ 47.163054] rpm_suspend+0x308/0x5dc
[ 47.171473] rpm_idle+0xc4/0x2a4
[ 47.179550] pm_runtime_work+0x98/0xc0
[ 47.188209] process_one_work+0x294/0x6f0
[ 47.197142] worker_thread+0x70/0x45c
[ 47.205661] kthread+0x154/0x160
[ 47.213673] ret_from_fork+0x10/0x20
[ 47.221957]
[ 47.221957] -> #0 (&dev->power.lock){-...}-{2:2}:
[ 47.236292] check_noncircular+0x128/0x140
[ 47.244907] __lock_acquire+0x13b0/0x204c
[ 47.253332] lock_acquire.part.0+0x120/0x330
[ 47.262033] lock_acquire+0x64/0x80
[ 47.269826] _raw_spin_lock_irqsave+0x7c/0xc4
[ 47.278430] __pm_runtime_resume+0x40/0x74
[ 47.286758] sh_cmt_start+0x84/0x210
[ 47.294537] sh_cmt_clocksource_enable+0x28/0x50
[ 47.303449] change_clocksource+0x9c/0x160
[ 47.311783] multi_cpu_stop+0xa4/0x190
[ 47.319720] cpu_stopper_thread+0x90/0x154
[ 47.328022] smpboot_thread_fn+0x244/0x270
[ 47.336298] kthread+0x154/0x160
[ 47.343708] ret_from_fork+0x10/0x20
[ 47.351445]
[ 47.351445] other info that might help us debug this:
[ 47.351445]
[ 47.370225] Chain exists of:
[ 47.370225] &dev->power.lock --> tk_core.seq.seqcount --> &ch->lock
[ 47.370225]
[ 47.392003] Possible unsafe locking scenario:
[ 47.392003]
[ 47.405314] CPU0 CPU1
[ 47.413569] ---- ----
[ 47.421768] lock(&ch->lock);
[ 47.428425] lock(tk_core.seq.seqcount);
[ 47.438701] lock(&ch->lock);
[ 47.447930] lock(&dev->power.lock);
[ 47.455172]
[ 47.455172] *** DEADLOCK ***
[ 47.455172]
[ 47.471433] 3 locks held by migration/0/11:
[ 47.479099] #0: ffff8000113c9278 (timekeeper_lock){-.-.}-{2:2}, at: change_clocksource+0x2c/0x160
[ 47.491834] #1: ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190
[ 47.504727] #2: ffff000040ad0298 (&ch->lock){....}-{2:2}, at: sh_cmt_start+0x28/0x210
[ 47.516541]
[ 47.516541] stack backtrace:
[ 47.528480] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.10.0-rc6-arm64-renesas-00001-g0e5fd7414e8b #36
[ 47.542147] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[ 47.554241] Call trace:
[ 47.560832] dump_backtrace+0x0/0x190
[ 47.568670] show_stack+0x14/0x30
[ 47.576144] dump_stack+0xe8/0x130
[ 47.583670] print_circular_bug+0x1f0/0x200
[ 47.592015] check_noncircular+0x128/0x140
[ 47.600289] __lock_acquire+0x13b0/0x204c
[ 47.608486] lock_acquire.part.0+0x120/0x330
[ 47.616953] lock_acquire+0x64/0x80
[ 47.624582] _raw_spin_lock_irqsave+0x7c/0xc4
[ 47.633114] __pm_runtime_resume+0x40/0x74
[ 47.641371] sh_cmt_start+0x84/0x210
[ 47.649115] sh_cmt_clocksource_enable+0x28/0x50
[ 47.657916] change_clocksource+0x9c/0x160
[ 47.666165] multi_cpu_stop+0xa4/0x190
[ 47.674056] cpu_stopper_thread+0x90/0x154
[ 47.682308] smpboot_thread_fn+0x244/0x270
[ 47.690560] kthread+0x154/0x160
[ 47.697927] ret_from_fork+0x10/0x20
[ 47.708447] clocksource: Switched to clocksource e60f0000.timer
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20201205021921.1456190-2-niklas.soderlund+renesas@ragnatech.se
|
|
Signed-off-by: Adam Ward <Adam.Ward.opensource@diasemi.com>
Link: https://lore.kernel.org/r/2cf324b68d37c4059c7995e8cab5fc9a608ea65d.1607361013.git.Adam.Ward.opensource@diasemi.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Erroneously left in when switching to using of_parse_cb().
Signed-off-by: Adam Ward <Adam.Ward.opensource@diasemi.com>
Link: https://lore.kernel.org/r/c7a9e947a9582fe0150d860b5eab7e093cd832bb.1607361013.git.Adam.Ward.opensource@diasemi.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add uv_sysfs hubless leaves for UV hubless systems.
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lkml.kernel.org/r/20201128034227.120869-4-mike.travis@hpe.com
|
|
Add uv_sysfs leaves to display the info.
Signed-off-by: Mike Travis <mike.travis@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lkml.kernel.org/r/20201128034227.120869-3-mike.travis@hpe.com
|
|
This is all a giant train wreck of error handling, in many cases the MR is
left in some corrupted state where continuing on is going to lead to
chaos, or various unwinds/order is missed.
rereg had three possible completely different actions, depending on flags
and various details about the MR. Split the three actions into three
functions, and call the right action from the start.
For each action carefully design the error handling to fit the action:
- UMR access/PD update is a simple UMR, if it fails the MR isn't changed,
so do nothing
- PAS update over UMR is multiple UMR operations. To keep everything sane
revoke access to the MKey while it is being changed and restore it once
the MR is correct.
- Recreating the mkey should completely build a parallel MR with a fully
loaded PAS then swap and destroy the old one. If it fails the original
should be left untouched. This is handled in the core code. Directly
call the normal MR creation functions, possibly re-using the existing
umem.
Add support for working with ODP MRs. The READ/WRITE access flags can be
changed by UMR and we can trivially convert to/from ODP MRs using the
logic to build a completely new MR.
This new logic also fixes various problems with MRs continuing to work
while their PAS lists are no longer valid, eg during a page size change.
Link: https://lore.kernel.org/r/20201130075839.278575-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This function handles an ODP and regular MR flow all mushed together, even
though the two flows are quite different. Split them into two dedicated
functions.
Link: https://lore.kernel.org/r/20201130075839.278575-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
mlx5 has an ugly flow where it tries to allocate a new MR and replace the
existing MR in the same memory during rereg. This is very complicated and
buggy. Instead of trying to replace in-place inside the driver, provide
support from uverbs to change the entire HW object assigned to a handle
during rereg_mr.
Since destroying a MR is allowed to fail (ie if a MW is pointing at it)
and can't be detected in advance, the algorithm creates a completely new
uobject to hold the new MR and swaps the IDR entries of the two objects.
The old MR in the temporary IDR entry is destroyed, and if it fails
rereg_mr succeeds and destruction is deferred to FD release. This
complexity is why this cannot live in a driver safely.
Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
No reason only one caller checks this. This properly blocks ODP
from the rereg flow if the device does not support ODP.
Link: https://lore.kernel.org/r/20201130075839.278575-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Unknown flags should be EOPNOTSUPP, only zero flags is EINVAL. Flags is
actually the rereg action to perform.
The checking of the start/hca_va/etc is also redundant and ib_umem_get()
does these checks and returns proper error codes.
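A minimal sketch of the intended distinction (simplified, not the exact
driver code):

	if (!flags)
		return ERR_PTR(-EINVAL);	/* no rereg action requested */

	if (flags & ~(IB_MR_REREG_TRANS | IB_MR_REREG_PD | IB_MR_REREG_ACCESS))
		return ERR_PTR(-EOPNOTSUPP);	/* unknown action bits */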
Fixes: 7e6edb9b2e0b ("IB/core: Add user MR re-registration support")
Link: https://lore.kernel.org/r/20201130075839.278575-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Traditionally, Linux unlocks the whole flash because there are legacy
devices which have the write protection bits set by default at startup.
If you actually want to use the flash protection bits, e.g. because there
is a read-only part for a bootloader, this automatic unlocking is
harmful. If there is no hardware write protection in place (usually
called WP#), a startup of the kernel just discards this protection.
I've gone through the datasheets of all the flashes (except the Intel
ones where I could not find any datasheet nor reference) which support
the unlocking feature and looked at how the sector protection was
implemented. The currently supported flashes can be divided into the
following two categories:
(1) block protection bits are non-volatile. Thus they keep their values
at reset and power-cycle
(2) flashes where these bits are volatile. After reset or power-cycle,
the whole memory array is protected.
(a) some devices need a special "Global Unprotect" command, e.g.
the Atmel AT25DF041A.
(b) some devices require clearing the BPn bits in the status
register.
Due to the reasons above, we do not want to clear the bits for flashes
which belong to category (1). Fortunately for us, only Atmel flashes
fall into category (2a). Implement the "Global Protect" and "Global
Unprotect" commands for these. For (2b) we can use normal block
protection locking scheme.
This patch adds a new flag to indicate case (2). Only if we have such a
flash do we unlock the whole flash array. To be backwards compatible,
it also introduces a kernel configuration option which restores the
complete legacy behavior ("Disable write protection on any flashes").
Hopefully, this will clean up "unlock the entire flash for legacy
devices" once and for all.
For reference, here are the actual commits which introduced the legacy
behavior (and extended the behavior to other chip manufacturers):
commit f80e521c916cb ("mtd: m25p80: add support for the Intel/Numonyx {16,32,64}0S33B SPI flash chips")
commit ea60658a08f8f ("mtd: m25p80: disable SST software protection bits by default")
commit 7228982442365 ("[MTD] m25p80: fix bug - ATmel spi flash fails to be copied to")
Actually, this might also fix handling of the Atmel AT25DF flashes,
because the original commit 7228982442365 ("[MTD] m25p80: fix bug -
ATmel spi flash fails to be copied to") was writing a 0 to the status
register, which is a "Global Unprotect". This might not be the case in
the current code which only handles the block protection bits BP2, BP1
and BP0. Thus, it depends on the current contents of the status register
if this unlock actually corresponds to a "Global Unprotect" command. In
the worst case, the current code might leave the AT25DF flashes in a
write protected state.
The commit 191f5c2ed4b6f ("mtd: spi-nor: use 16-bit WRR command when QE
is set on spansion flashes") changed that behavior by just clearing BP2
to BP0 instead of writing a 0 to the status register.
Further, the commit 3e0930f109e76 ("mtd: spi-nor: Rework the disabling
of block write protection") expanded the unlock_all() feature to ANY
flash which supports locking.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201203162959.29589-8-michael@walle.cc
|
|
These flashes have a weird BP bit mapping which isn't supported in
the current locking code. Just add a simple unlock op to unprotect the
entire flash array, which is needed for the legacy behavior.
Fixes: 3e0930f109e7 ("mtd: spi-nor: Rework the disabling of block write protection")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201203162959.29589-7-michael@walle.cc
|
|
For the Atmel and SST parts this flag was already moved to individual
flash parts because it is considered bad, especially because newer flash chips
will automatically inherit the "has locking" support. While this is
unlikely to be the case for the Intel parts, we do it for consistency
reasons.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201203162959.29589-6-michael@walle.cc
|
|
This is considered bad for the following reasons:
(1) We only support the block protection with BPn bits for write
protection. Not all SST parts support this.
(2) A newly added flash chip will automatically inherit the "has
locking" support and thus needs to be explicitly tested. Better to
be opt-in instead of opt-out.
(3) There are already supported flashes which don't support
the locking scheme. So I assume this wasn't properly tested
before adding those chips, which reinforces my previous argument
that locking support should be an opt-in.
Remove the global flag and add individual flags to all flashes
which support BP locking. In particular the following flashes
don't support the BP scheme:
- SST26VF016B
- SST26WF016B
- SST26VF064B
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201203162959.29589-5-michael@walle.cc
|
|
This is considered bad for the following reasons:
(1) We only support the block protection with BPn bits for write
protection. Not all Atmel parts support this.
(2) A newly added flash chip will automatically inherit the "has
locking" support and thus needs to be explicitly tested. Better to
be opt-in instead of opt-out.
(3) There are already supported flashes which don't support
the locking scheme. So I assume this wasn't properly tested
before adding those chips, which reinforces my previous argument
that locking support should be an opt-in.
Remove the global flag and add individual flags to all flashes which
support BP locking. In particular the following flashes don't support
the BP scheme:
- AT26F004
- AT25SL321
- AT45DB081D
Please note that some flashes which are marked as SPI_NOR_HAS_LOCK just
support Global Protection, i.e. not our supported block protection
locking scheme. This is to keep backwards compatibility with the
current "unlock all at boot" mechanism. In particular the following
flashes don't have BP bits:
- AT25DF041A
- AT25DF321
- AT25DF321A
- AT25DF641
- AT26DF081A
- AT26DF161A
- AT26DF321
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201203162959.29589-4-michael@walle.cc
|
|
Just try to unlock the whole SPI-NOR flash array. Don't abort the
probing in case of an error. Justifications:
(1) For some boards, this just works because
spi_nor_write_16bit_sr_and_check() is broken and just checks the
second half of the 16-bit value. Once that is fixed, SPI probe will
fail for boards which have hardware write-protected SPI-NOR flashes.
(2) Until now, hardware write-protection was the only viable solution
to use the block protection bits. This is because this very
function spi_nor_unlock_all() will be called unconditionally on
every Linux boot. Therefore, these bits only make sense in
combination with the hardware write-protection. If we failed
the SPI probe on an error in spi_nor_unlock_all(), we'd break
virtually all users of the block protection bits.
(3) We should try hard to keep the MTD working even if the flash might
not be writable/erasable.
Fixes: 3e0930f109e7 ("mtd: spi-nor: Rework the disabling of block write protection")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201203162959.29589-3-michael@walle.cc
|
|
This flash part actually has 4 block protection bits.
Please note that this patch is just based on information from the
datasheet and wasn't tested.
Fixes: 3e0930f109e7 ("mtd: spi-nor: Rework the disabling of block write protection")
Reported-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201203162959.29589-2-michael@walle.cc
|
|
The S28 flash family uses 2-bit ECC by default with each ECC block being
16 bytes. Under this scheme multi-pass programming to an ECC block is
not allowed. Set the writesize to make sure multi-pass programming is
not attempted on the flash.
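A minimal sketch of the fixup described above (the exact structure member
set by the driver is an assumption here):

	/* in the flash family's init fixup: one ECC block per program op */
	nor->params->writesize = 16;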
Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Link: https://lore.kernel.org/r/20201201102711.8727-4-p.yadav@ti.com
|