Age | Commit message (Collapse) | Author |
|
This moves move pm_runtime_disable on a devm_add_action_or_reset() handler.
And with the use of the devm_uio_register_device() function, the remove
hook is no longer required.
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20201120075625.12272-2-alexandru.ardelean@analog.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This change converts the simple allocations in the driver to used
device-managed allocation functions.
This removes the error path entirely in the probe function, and reduces
some code in the remove function.
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20201120075625.12272-1-alexandru.ardelean@analog.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The pci_get_drvdata() was moved during commit ef84928cff58
("uio/uio_pci_generic: use device-managed function equivalents").
Storing a private object with pci_set_drvdata() doesn't make sense
since that change, since there is no more pci_get_drvdata() call in the
driver to retrieve the information.
This change removes it.
Fixes: ef84928cff58 ("uio/uio_pci_generic: use device-managed function equivalents")
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20201123143447.16829-1-alexandru.ardelean@analog.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This change uses devm_clk_get() to obtain a reference to the clock. It has
the benefit that clk_put() is no longer required, and cleans up the exit &
error path.
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Link: https://lore.kernel.org/r/20201119145059.48326-1-alexandru.ardelean@analog.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
NGD should depend on QCOM_RPROC_COMMON instead of selecting it, as
this will be selected by respective remoteproc driver.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20201201093843.20126-1-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The value of the variable status must be one of the 0, -EIO and -EILSEQ, so
the switch case -ENODATA is unreached. Remove it.
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Link: https://lore.kernel.org/r/1605284071-6901-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There is nothing to prevent multiple commands being executed
simultaneously. Add a mutex to prevent this.
Fixes: 606397d67f41 ("fsi: Add ast2600 master driver")
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Link: https://lore.kernel.org/r/20201120004929.185239-1-joel@jms.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Because the minimum supported DPRC version is 6.0, there is no need to
check for incompatible 6.x versions lower to the minimum one. Just
remove the second half of the check to simplify the logic.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20201123164839.1668409-1-ciorneiioana@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 197f4d6a4a00 ("staging: fsl-mc: fsl-mc object allocator driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1607068967-31991-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
'type' and 'flags' fields were missing from dprc_rsp_get_obj_region
structure therefore the MC Bus driver was not receiving proper flags
from MC like DPRC_REGION_CACHEABLE.
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Cristian Sovaiala <cristian.sovaiala@freescale.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20201124111200.1391-1-laurentiu.tudor@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Some bootloaders might pause the MC firmware before starting the
kernel to ensure that MC will not cause faults as soon as SMMU
probes due to no configuration being in place for the firmware.
Make sure that MC is resumed at probe time as its SMMU setup should
be done by now.
Also included, a comment fix on how PL and BMT bits are packed in
the StreamID.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20201105153050.19662-2-laurentiu.tudor@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A previous patch accidentally dropped an error check, so add it back.
Fixes: aef85b56c3c1 ("bus: fsl-mc: MC control registers are not always available")
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Link: https://lore.kernel.org/r/20201105153050.19662-1-laurentiu.tudor@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Since 5.10-rc1 i.MX is a devicetree-only platform and the NULL check on
of_device_get_match_data() is no longer needed.
This check was only needed when this driver supported both DT and non-DT
platforms.
Remove the unneeded of_device_get_match_data() NULL check.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20201126124643.3371-1-festevam@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The soc_dev_attr is stored soc_dev->attr during soc_device_register() so
it could be used till the cleanup call: soc_device_unregister().
Therefore this memory should not be freed prior, but after unregistering
soc device.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20201207185952.261697-1-krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
I can do more than just review patches here. The plan is to pick up
patches from the list and shuttle them up to gregkh. The korg tree will
be used to hold the pending patches. Move the list away from
linux-arm-msm to just be linux-kernel as SPMI isn't msm specific
anymore.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <linux-arm-msm@vger.kernel.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lore.kernel.org/r/20201207214204.1284946-1-sboyd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In dcon_probe(), when platform_device_add() failes to add the device,
it jumps to call platform_device_unregister() to remove the device,
which is unnecessary. So use platform_device_put() instead.
Fixes: 53c43c5ca133 ("Revert "Staging: olpc_dcon: Remove obsolete driver"")
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Link: https://lore.kernel.org/r/20201120074932.31871-1-jingxiangfeng@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There is a spelling mistake in the Kconfig help text. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201126224602.13878-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'ca91cx42_alloc_consistent()' and
'tsi148_alloc_consistent()' GFP_KERNEL can be used because both functions
are called only from 'vme_alloc_consistent()' (vme.c). This function is
only called from the 'vme_user_probe()' probe function and no lock is
taken in the between.
When memory is allocated in 'ca91cx42_crcsr_init()' and
'tsi148_crcsr_init()' GFP_KERNEL can be used because both functions
are called only from their corresponding probe function and no lock is
taken in the between.
@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
@@
- PCI_DMA_NONE
+ DMA_NONE
@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20201206071352.21949-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We return 'err' in the error branch, but this variable may be set as
zero before. Fix it by setting 'err' as a negative value before we
goto the error label.
Fixes: e03327122e2c ("pci_endpoint_test: Add 2 ioctl commands")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/1605790158-6780-1-git-send-email-wangxiongfeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20201201210147.7917-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
rtsx_pcr:
add callback functions to support runtime PM
add delay_work to put device to D3 after idle
over 10 sec
rts5249:
add extra init flow for rtd3 and set rtd3_en from
config setting
rtsx_pci_sdmmc:
child device support autosuspend
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Link: https://lore.kernel.org/r/20201202065857.19412-1-ricky_wu@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
changed rtsx_pci_disable_aspm() to rtsx_disable_aspm()
do not access ASPM configuration directly
changed pcie_capability_write_word() to _clear_and_set_word()
make sure only change PCI_EXP_LNKCTL bit8
make sure ASPM disable after extra_init_hw()
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Link: https://lore.kernel.org/r/20201202063228.18319-1-ricky_wu@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
enable/disable device ASPM function:
changed write ASPM configuration directly to use write register
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Link: https://lore.kernel.org/r/20201202063124.18262-1-ricky_wu@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
vmci_ctx_get_chkpt_doorbells()
A kernel-infoleak was reported by syzbot, which was caused because
dbells was left uninitialized.
Using kzalloc() instead of kmalloc() fixes this issue.
Reported-by: syzbot+a79e17c39564bedf0930@syzkaller.appspotmail.com
Tested-by: syzbot+a79e17c39564bedf0930@syzkaller.appspotmail.com
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
Link: https://lore.kernel.org/r/20201122224534.333471-1-anant.thazhemadam@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The usage of in_interrupt() in xpc_partition_disengaged() is clearly
intended to avoid canceling the timeout timer when the function is invoked
from the timer callback.
While in_interrupt() is deprecated and ill defined as it does not provide
what the name suggests it catches the intended case.
Add an argument to xpc_partition_disengaged() which is true if called
from timer and otherwise false.
Use del_timer_sync() instead of del_singleshot_timer_sync() which is the
same thing.
Note: This does not prevent reentrancy into the function as the function
has no concurrency control and timer callback and regular task context
callers can happen concurrently on different CPUs or the timer can
interrupt the task context before it is able to cancel it.
While the only driver which is providing the arch_xpc_ops callbacks
(xpc_uv) seems not to have a reentrancy problem and the only negative
effect would be a double dev_info() entry in dmesg, the whole mechanism is
conceptually broken.
But that's not subject of this cleanup endeavour and left as an exercise to
the folks who might have interest to make that code fully correct.
[bigeasy: Add the argument, use del_timer_sync().]
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Cliff Whickman <cpw@sgi.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Steve Wahl <steve.wahl@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20201119103151.ppo45mj53ulbxjx4@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The operation of get/set mode was same with get/set resolution. It is
a typo absolutely. This patch updates these bits operated by get/set
mode.
Signed-off-by: Zhou Xingxing <zhou_x1@hoperun.com>
Link: https://lore.kernel.org/r/1607334545-2091-1-git-send-email-zhou_x1@hoperun.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This works towards the goal mentioned in 2006 in commit 594c8281f905
("[PATCH] Add bus_type probe, remove, shutdown methods.").
The functions are moved to where the other bus_type functions are
defined and renamed to match the already established naming scheme.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20201119124611.2573057-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of overwriting the core driver's probe function handle probing
devices for drivers loaded by platform_driver_probe() in the platform
driver probe function.
The intended goal is to not have to change the probe function to
simplify converting the platform bus to use bus functions.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20201119124611.2573057-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This way all callbacks and structures used to initialize
platform_bus_type are defined just before platform_bus_type and in the
same order. Also move platform_drv_probe_fail just before it's only
user.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20201119124611.2573057-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
It's only used inside drivers/base/dd.c
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Link: https://lore.kernel.org/r/20201123111938.18968-1-jwi@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
These were just some minor typos that have crept in recently and are
easily fixed.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20201127104630.1839171-1-thierry.reding@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Device drivers usually depend on the fact that the devices that they
control are suspended in the same order that they were probed in. In
most cases this is already guaranteed via deferred probe.
However, there's one case where this can still break: if a device is
instantiated before a dependency (for example if it appears before the
dependency in device tree) but gets probed only after the dependency is
probed. Instantiation order would cause the dependency to get probed
later, in which case probe of the original device would be deferred and
the suspend/resume queue would get reordered properly. However, if the
dependency is provided by a built-in driver and the device depending on
that driver is controlled by a loadable module, which may only get
loaded after the root filesystem has become available, we can be faced
with a situation where the probe order ends up being different from the
suspend/resume order.
One example where this happens is on Tegra186, where the ACONNECT is
listed very early in device tree (sorted by unit-address) and depends on
BPMP (listed very late because it has no unit-address) for power domains
and clocks/resets. If the ACONNECT driver is built-in, there is no
problem because it will be probed before BPMP, causing a probe deferral
and that in turn reorders the suspend/resume queue. However, if built as
a module, it will end up being probed after BPMP, and therefore not
result in a probe deferral, and therefore the suspend/resume queue will
stay in the instantiation order. This in turn causes problems because
ACONNECT will be resumed before BPMP, which will result in a hang
because the ACONNECT's power domain cannot be powered on as long as the
BPMP is still suspended.
Fix this by always reordering devices on successful probe. This ensures
that the suspend/resume queue is always in probe order and hence meets
the natural expectations of drivers vs. their dependencies.
Reported-by: Jonathan Hunter <jonathanh@nvidia.com>
Acked-by: Rafael. J. Wysocki <rafael@kernel.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/20201203175756.1405564-1-thierry.reding@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Since btrfs scrub is utilizing its own infrastructure to submit
read/write, scrub is independent from all other routines.
This brings one very neat feature, allow us to read 4K data into offset
0 of a 64K page. So is the writeback routine.
This makes scrub on subpage sector size much easier to implement, and
thanks to previous commits which just changed the implementation to
always do scrub based on sector size, now scrub can handle subpage
filesystem without any problem.
This patch will just remove the restriction on
(sectorsize != PAGE_SIZE), to make scrub finally work on subpage
filesystems.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Btrfs scrub is more flexible than buffered data write path, as we can
read an unaligned subpage data into page offset 0.
This ability makes subpage support much easier, we just need to check
each scrub_page::page_len and ensure we only calculate hash for [0,
page_len) of a page.
There is a small thing to notice: for subpage case, we still do sector
by sector scrub. This means we will submit a read bio for each sector
to scrub, resulting in the same amount of read bios, just like on the 4K
page systems.
This behavior can be considered as a good thing, if we want everything
to be the same as 4K page systems. But this also means, we're wasting
the possibility to submit larger bio using 64K page size. This is
another problem to consider in the future.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
To support subpage tree block scrub, scrub_checksum_tree_block() only
needs to learn 2 new tricks:
- Follow sector size
Now scrub_page only represents one sector, we need to follow it
properly.
- Run checksum on all sectors
Since scrub_page only represents one sector, we need to run checksum
on all sectors, not only (nodesize >> PAGE_SIZE).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
For scrub_pages() and scrub_pages_for_parity(), we currently allocate
one scrub_page structure for one page.
This is fine if we only read/write one sector one time. But for cases
like scrubbing RAID56, we need to read/write the full stripe, which is
in 64K size for now.
For subpage size, we will submit the read in just one page, which is
normally a good thing, but for RAID56 case, it only expects to see one
sector, not the full stripe in its endio function.
This could lead to wrong parity checksum for RAID56 on subpage.
To make the existing code work well for subpage case, here we take a
shortcut by always allocating a full page for one sector.
This should provide the base to make RAID56 work for subpage case.
The cost is pretty obvious now, for one RAID56 stripe now we always need
16 pages. For support subpage situation (64K page size, 4K sector size),
this means we need full one megabyte to scrub just one RAID56 stripe.
And for data scrub, each 4K sector will also need one 64K page.
This is mostly just a workaround, the proper fix for this is a much
larger project, using scrub_block to replace scrub_page, and allow
scrub_block to handle multi pages, csums, and csum_bitmap to avoid
allocating one page for each sector.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Btrfs on-disk format chose to use u64 for almost everything, but there
are a other restrictions that won't let us use more than u32 for things
like extent length (the maximum length is 128MiB for non-hole extents),
or stripe length (we have device number limit).
This means if we don't have extra handling to convert u64 to u32, we
will always have some questionable operations like
"u32 = u64 >> sectorsize_bits" in the code.
This patch will try to address the problem by reducing the width for the
following members/parameters:
- scrub_parity::stripe_len
- @len of scrub_pages()
- @extent_len of scrub_remap_extent()
- @len of scrub_parity_mark_sectors_error()
- @len of scrub_parity_mark_sectors_data()
- @len of scrub_extent()
- @len of scrub_pages_for_parity()
- @len of scrub_extent_for_parity()
For members extracted from on-disk structure, like map->stripe_len, they
will be kept as is. Since that modification would require on-disk format
change.
There will be cases like "u32 = u64 - u64" or "u32 = u64", for such call
sites, extra ASSERT() is added to be extra safe for debug builds.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Refactor btrfs_lookup_bio_sums() by:
- Remove the @file_offset parameter
There are two factors making the @file_offset parameter useless:
* For csum lookup in csum tree, file offset makes no sense
We only need disk_bytenr, which is unrelated to file_offset
* page_offset (file offset) of each bvec is not contiguous.
Pages can be added to the same bio as long as their on-disk bytenr
is contiguous, meaning we could have pages at different file offsets
in the same bio.
Thus passing file_offset makes no sense any more.
The only user of file_offset is for data reloc inode, we will use
a new function, search_file_offset_in_bio(), to handle it.
- Extract the csum tree lookup into search_csum_tree()
The new function will handle the csum search in csum tree.
The return value is the same as btrfs_find_ordered_sum(), returning
the number of found sectors which have checksum.
- Change how we do the main loop
The only needed info from bio is:
* the on-disk bytenr
* the length
After extracting the above info, we can do the search without bio
at all, which makes the main loop much simpler:
for (cur_disk_bytenr = orig_disk_bytenr;
cur_disk_bytenr < orig_disk_bytenr + orig_len;
cur_disk_bytenr += count * sectorsize) {
/* Lookup csum tree */
count = search_csum_tree(fs_info, path, cur_disk_bytenr,
search_len, csum_dst);
if (!count) {
/* Csum hole handling */
}
}
- Use single variable as the source to calculate all other offsets
Instead of all different type of variables, we use only one main
variable, cur_disk_bytenr, which represents the current disk bytenr.
All involved values can be calculated from that variable, and
all those variable will only be visible in the inner loop.
The above refactoring makes btrfs_lookup_bio_sums() way more robust than
it used to be, especially related to the file offset lookup. Now
file_offset lookup is only related to data reloc inode, otherwise we
don't need to bother file_offset at all.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The function btrfs_lookup_bio_sums() is only called for read bios.
While btrfs_find_ordered_sum() is to search ordered extent sums, which
is only for write path.
This means to read a page we either:
- Submit read bio if it's not uptodate
This means we only need to search csum tree for checksums.
- The page is already uptodate
It can be marked uptodate for previous read, or being marked dirty.
As we always mark page uptodate for dirty page.
In that case, we don't need to submit read bio at all, thus no need
to search any checksums.
Remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums().
And since btrfs_lookup_bio_sums() is the only caller for
btrfs_find_ordered_sum(), also remove the implementation.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
To support sectorsize < PAGE_SIZE case, we need to take extra care of
extent buffer accessors.
Since sectorsize is smaller than PAGE_SIZE, one page can contain
multiple tree blocks, we must use eb->start to determine the real offset
to read/write for extent buffer accessors.
This patch introduces two helpers to do this:
- get_eb_page_index()
This is to calculate the index to access extent_buffer::pages.
It's just a simple wrapper around "start >> PAGE_SHIFT".
For sectorsize == PAGE_SIZE case, nothing is changed.
For sectorsize < PAGE_SIZE case, we always get index as 0, and
the existing page shift also works.
- get_eb_offset_in_page()
This is to calculate the offset to access extent_buffer::pages.
This needs to take extent_buffer::start into consideration.
For sectorsize == PAGE_SIZE case, extent_buffer::start is always
aligned to PAGE_SIZE, thus adding extent_buffer::start to
offset_in_page() won't change the result.
For sectorsize < PAGE_SIZE case, adding extent_buffer::start gives
us the correct offset to access.
This patch will touch the following parts to cover all extent buffer
accessors:
- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()
- generic_bin_search()
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
For subpage sized extent buffer, we have ensured no extent buffer will
cross page boundary, thus we would only need one page for any extent
buffer.
Update function num_extent_pages to handle such case. Now
num_extent_pages() returns 1 for subpage sized extent buffer.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
As a preparation for subpage sector size support (allowing filesystem
with sector size smaller than page size to be mounted) if the sector
size is smaller than page size, we don't allow tree block to be read if
it crosses 64K(*) boundary.
The 64K is selected because:
- we are only going to support 64K page size for subpage for now
- 64K is also the maximum supported node size
This ensures that tree blocks are always contained in one page for a
system with 64K page size, which can greatly simplify the handling.
Otherwise we would have to do complex multi-page handling of tree
blocks. Currently there is no way to create such tree blocks.
In kernel we have avoided such tree blocks allocation even on 4K page
size, as it can lead to RAID56 stripe scrubbing.
While btrfs-progs have fixed its chunk allocator since 2016 for convert,
and has extra checks to do the same behavior as the kernel.
Just add such graceful checks in case of an ancient filesystem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Btrfs only support 64K as maximum node size, thus for 4K page system, we
would have at most 16 pages for one extent buffer.
For a system using 64K page size, we would really have just one page.
While we always use 16 pages for extent_buffer::pages, this means for
systems using 64K pages, we are wasting memory for 15 page pointers
which will never be used.
Calculate the array size based on page size and the node size maximum.
- for systems using 4K page size, it will stay 16 pages
- for systems using 64K page size, it will be 1 page
Move the definition of BTRFS_MAX_METADATA_BLOCKSIZE to btrfs_tree.h, to
avoid circular inclusion of ctree.h.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In btree_write_cache_pages() we have a btree page submission routine
buried deeply in a nested loop.
This patch will extract that part of code into a helper function,
submit_eb_page(), to do the same work.
Since submit_eb_page() now can return >0 for successful extent
buffer submission, remove the "ASSERT(ret <= 0);" line.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently btrfs_verify_data_csum() just passes the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.
To support subpage, we need to properly honor per-sector
checksum verification, just like what we did in dio read path.
This patch will do the csum verification in a for loop, starts with
pg_off == start - page_offset(page), with sectorsize increase for
each loop.
For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
will only loop once.
For subpage case, we do the iterate over each sector and if we found any
error, we return error.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Parameter icsum for check_data_csum() is a little hard to understand.
So is the phy_offset for btrfs_verify_data_csum().
Both parameters are calculated values for csum lookup.
Instead of some calculated value, just pass bio_offset and let the
final and only user, check_data_csum(), calculate whatever it needs.
Since we are here, also make the bio_offset parameter and some related
variables to be u32 (unsigned int).
As bio size is limited by its bi_size, which is unsigned int, and has
extra size limit check during various bio operations.
Thus we are ensured that bio_offset won't overflow u32.
Thus for all involved functions, not only rename the parameter from
@phy_offset to @bio_offset, but also reduce its width to u32, so we
won't have suspicious "u32 = u64 >> sector_bits;" lines anymore.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The parameter bio_offset of extent_submit_bio_start_t is very confusing.
If it's really bio_offset (offset to bio), then it should be u32. But
in fact, it's only utilized by dio read, and that member is used as file
offset, which must be u64.
Rename it to dio_file_offset since the only user uses it as file offset,
and add comment for who is using it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
A lock dependency loop exists between the root tree lock, the extent tree
lock, and the free space tree lock.
The root tree lock depends on the free space tree lock because
btrfs_create_tree holds the new tree's lock while adding it to the root
tree.
The extent tree lock depends on the root tree lock because during
umount, we write out space cache v1, which writes inodes in the root
tree, which results in holding the root tree lock while doing a lookup
in the extent tree.
Finally, the free space tree depends on the extent tree because
populate_free_space_tree holds a locked path in the extent tree and then
does a lookup in the free space tree to add the new item.
The simplest of the three to break is the one during tree creation: we
unlock the leaf before inserting the tree node into the root tree, which
fixes the lockdep warning.
[30.480136] ======================================================
[30.480830] WARNING: possible circular locking dependency detected
[30.481457] 5.9.0-rc8+ #76 Not tainted
[30.481897] ------------------------------------------------------
[30.482500] mount/520 is trying to acquire lock:
[30.483064] ffff9babebe03908 (btrfs-free-space-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
[30.484054]
but task is already holding lock:
[30.484637] ffff9babebe24468 (btrfs-extent-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
[30.485581]
which lock already depends on the new lock.
[30.486397]
the existing dependency chain (in reverse order) is:
[30.487205]
-> #2 (btrfs-extent-01#2){++++}-{3:3}:
[30.487825] down_read_nested+0x43/0x150
[30.488306] __btrfs_tree_read_lock+0x39/0x180
[30.488868] __btrfs_read_lock_root_node+0x3a/0x50
[30.489477] btrfs_search_slot+0x464/0x9b0
[30.490009] check_committed_ref+0x59/0x1d0
[30.490603] btrfs_cross_ref_exist+0x65/0xb0
[30.491108] run_delalloc_nocow+0x405/0x930
[30.491651] btrfs_run_delalloc_range+0x60/0x6b0
[30.492203] writepage_delalloc+0xd4/0x150
[30.492688] __extent_writepage+0x18d/0x3a0
[30.493199] extent_write_cache_pages+0x2af/0x450
[30.493743] extent_writepages+0x34/0x70
[30.494231] do_writepages+0x31/0xd0
[30.494642] __filemap_fdatawrite_range+0xad/0xe0
[30.495194] btrfs_fdatawrite_range+0x1b/0x50
[30.495677] __btrfs_write_out_cache+0x40d/0x460
[30.496227] btrfs_write_out_cache+0x8b/0x110
[30.496716] btrfs_start_dirty_block_groups+0x211/0x4e0
[30.497317] btrfs_commit_transaction+0xc0/0xba0
[30.497861] sync_filesystem+0x71/0x90
[30.498303] btrfs_remount+0x81/0x433
[30.498767] reconfigure_super+0x9f/0x210
[30.499261] path_mount+0x9d1/0xa30
[30.499722] do_mount+0x55/0x70
[30.500158] __x64_sys_mount+0xc4/0xe0
[30.500616] do_syscall_64+0x33/0x40
[30.501091] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[30.501629]
-> #1 (btrfs-root-00){++++}-{3:3}:
[30.502241] down_read_nested+0x43/0x150
[30.502727] __btrfs_tree_read_lock+0x39/0x180
[30.503291] __btrfs_read_lock_root_node+0x3a/0x50
[30.503903] btrfs_search_slot+0x464/0x9b0
[30.504405] btrfs_insert_empty_items+0x60/0xa0
[30.504973] btrfs_insert_item+0x60/0xd0
[30.505412] btrfs_create_tree+0x1b6/0x210
[30.505913] btrfs_create_free_space_tree+0x54/0x110
[30.506460] btrfs_mount_rw+0x15d/0x20f
[30.506937] btrfs_remount+0x356/0x433
[30.507369] reconfigure_super+0x9f/0x210
[30.507868] path_mount+0x9d1/0xa30
[30.508264] do_mount+0x55/0x70
[30.508668] __x64_sys_mount+0xc4/0xe0
[30.509186] do_syscall_64+0x33/0x40
[30.509652] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[30.510271]
-> #0 (btrfs-free-space-00){++++}-{3:3}:
[30.510972] __lock_acquire+0x11ad/0x1b60
[30.511432] lock_acquire+0xa2/0x360
[30.511917] down_read_nested+0x43/0x150
[30.512383] __btrfs_tree_read_lock+0x39/0x180
[30.512947] __btrfs_read_lock_root_node+0x3a/0x50
[30.513455] btrfs_search_slot+0x464/0x9b0
[30.513947] search_free_space_info+0x45/0x90
[30.514465] __add_to_free_space_tree+0x92/0x39d
[30.515010] btrfs_create_free_space_tree.cold.22+0x1ee/0x45d
[30.515639] btrfs_mount_rw+0x15d/0x20f
[30.516142] btrfs_remount+0x356/0x433
[30.516538] reconfigure_super+0x9f/0x210
[30.517065] path_mount+0x9d1/0xa30
[30.517438] do_mount+0x55/0x70
[30.517824] __x64_sys_mount+0xc4/0xe0
[30.518293] do_syscall_64+0x33/0x40
[30.518776] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[30.519335]
other info that might help us debug this:
[30.520210] Chain exists of:
btrfs-free-space-00 --> btrfs-root-00 --> btrfs-extent-01#2
[30.521407] Possible unsafe locking scenario:
[30.522037] CPU0 CPU1
[30.522456] ---- ----
[30.522941] lock(btrfs-extent-01#2);
[30.523311] lock(btrfs-root-00);
[30.523952] lock(btrfs-extent-01#2);
[30.524620] lock(btrfs-free-space-00);
[30.525068]
*** DEADLOCK ***
[30.525669] 5 locks held by mount/520:
[30.526116] #0: ffff9babebc520e0 (&type->s_umount_key#37){+.+.}-{3:3}, at: path_mount+0x7ef/0xa30
[30.527056] #1: ffff9babebc52640 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x3d5/0x5c0
[30.527960] #2: ffff9babeae8f2e8 (&cache->free_space_lock#2){+.+.}-{3:3}, at: btrfs_create_free_space_tree.cold.22+0x101/0x45d
[30.529118] #3: ffff9babebe24468 (btrfs-extent-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
[30.530113] #4: ffff9babebd52eb8 (btrfs-extent-00){++++}-{3:3}, at: btrfs_try_tree_read_lock+0x16/0x100
[30.531124]
stack backtrace:
[30.531528] CPU: 0 PID: 520 Comm: mount Not tainted 5.9.0-rc8+ #76
[30.532166] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-4.module_el8.1.0+248+298dec18 04/01/2014
[30.533215] Call Trace:
[30.533452] dump_stack+0x8d/0xc0
[30.533797] check_noncircular+0x13c/0x150
[30.534233] __lock_acquire+0x11ad/0x1b60
[30.534667] lock_acquire+0xa2/0x360
[30.535063] ? __btrfs_tree_read_lock+0x39/0x180
[30.535525] down_read_nested+0x43/0x150
[30.535939] ? __btrfs_tree_read_lock+0x39/0x180
[30.536400] __btrfs_tree_read_lock+0x39/0x180
[30.536862] __btrfs_read_lock_root_node+0x3a/0x50
[30.537304] btrfs_search_slot+0x464/0x9b0
[30.537713] ? trace_hardirqs_on+0x1c/0xf0
[30.538148] search_free_space_info+0x45/0x90
[30.538572] __add_to_free_space_tree+0x92/0x39d
[30.539071] ? printk+0x48/0x4a
[30.539367] btrfs_create_free_space_tree.cold.22+0x1ee/0x45d
[30.539972] btrfs_mount_rw+0x15d/0x20f
[30.540350] btrfs_remount+0x356/0x433
[30.540773] ? shrink_dcache_sb+0xd9/0x100
[30.541203] reconfigure_super+0x9f/0x210
[30.541642] path_mount+0x9d1/0xa30
[30.542040] do_mount+0x55/0x70
[30.542366] __x64_sys_mount+0xc4/0xe0
[30.542822] do_syscall_64+0x33/0x40
[30.543197] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[30.543691] RIP: 0033:0x7f109f7ab93a
[30.546042] RSP: 002b:00007ffc47c4f858 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[30.546770] RAX: ffffffffffffffda RBX: 00007f109f8cf264 RCX: 00007f109f7ab93a
[30.547485] RDX: 0000557e6fc10770 RSI: 0000557e6fc19cf0 RDI: 0000557e6fc19cd0
[30.548185] RBP: 0000557e6fc10520 R08: 0000557e6fc18e30 R09: 0000557e6fc18cb0
[30.548911] R10: 0000000000200020 R11: 0000000000000246 R12: 0000000000000000
[30.549606] R13: 0000557e6fc19cd0 R14: 0000557e6fc10770 R15: 0000557e6fc10520
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we are not using space cache v1, we should not create the free space
object or free space inodes. This comes up when we delete the existing
free space objects/inodes when migrating to v2, only to see them get
recreated for every dirtied block group.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When the filesystem transitions from space cache v1 to v2 or to
nospace_cache, it removes the old cached data, but does not remove
the FREE_SPACE items nor the free space inodes they point to. This
doesn't cause any issues besides being a bit inefficient, since these
items no longer do anything useful.
To fix it, when we are mounting, and plan to disable the space cache,
destroy each block group's free space item and free space inode.
The code to remove the items is lifted from the existing use case of
removing the block group, with a light adaptation to handle whether or
not we have already looked up the free space inode.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|