summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-10-19net: sched: remove one pair of atomic operationsEric Dumazet
__QDISC_STATE_RUNNING is only set/cleared from contexts owning qdisc lock. Thus we can use less expensive bit operations, as we were doing before commit f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount") Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ahmed S. Darwish <a.darwish@linutronix.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-19net: sched: fix logic error in qdisc_run_begin()Eric Dumazet
For non TCQ_F_NOLOCK qdisc, qdisc_run_begin() tries to set __QDISC_STATE_RUNNING and should return true if the bit was not set. test_and_set_bit() returns old bit value, therefore we need to invert. Fixes: 29cbcd858283 ("net: sched: Remove Qdisc::running sequence counter") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ahmed S. Darwish <a.darwish@linutronix.de> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-19Merge tag 'v5.15-next-dts64' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux into arm/dt Biggest change is, that we have now support for a reset controller inside the mmsys. This goes inhand with changes to the driver, that you will find in the soc pull request. Mediatek PCI device tree binding described the root port in a wrong. The IP actaully implements several root complex with everyone having a single root port. We need to fix the DT in an incompatible way to describe the HW as it is. This also fixes a problem that no IRQ bigger then 32 could be handled. The only public available HW that is affected by this is the BananaPi R64. I'm not aware that there is a big user base using the upstream kernel. In this boards PCI is only used for extension cards, so I don't expect any boot problems. - mt8173: add reset for dsi0 to mmsys - move dt-bindings reset controller includes to correct folder - split PCIe node to use new format for mt2712 and mt7622 - mt8183: add audio node to chromebook devices - mt8192: add clock controller node * tag 'v5.15-next-dts64' of git://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux: arm64: dts: mt8183: Add the mmsys reset bit to reset the dsi0 arm64: dts: mt8173: Add the mmsys reset bit to reset the dsi0 dt-bindings: display: mediatek: add dsi reset optional property dt-bindings: mediatek: Add #reset-cells to mmsys system controller arm64: dts: mediatek: Move reset controller constants into common location arm64: dts: mediatek: Split PCIe node for MT2712 and MT7622 arm64: dts: mt8183: add kukui platform audio node arm64: dts: mt8183: add audio node arm64: dts: mediatek: Add mt8192 clock controllers Link: https://lore.kernel.org/r/1a3d63a3-c020-3319-26f6-a2ec338cc42e@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-19scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitionsSreekanth Reddy
Add 22.5 Gbps link rate definitions. Link: https://lore.kernel.org/r/20211018070611.26428-1-sreekanth.reddy@broadcom.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-19Merge brank 'mlx5_mkey' into rdma.git for-nextJason Gunthorpe
A small series to clean up the mlx5 mkey code across the mlx5_core and InfiniBand. * branch 'mlx5_mkey': RDMA/mlx5: Attach ndescs to mlx5_ib_mkey RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key RDMA/mlx5: Remove pd from struct mlx5_core_mkey RDMA/mlx5: Remove size from struct mlx5_core_mkey RDMA/mlx5: Remove iova from struct mlx5_core_mkey Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-19lib/xz: Add MicroLZMA decoderLasse Collin
MicroLZMA is a yet another header format variant where the first byte of a raw LZMA stream (without the end of stream marker) has been replaced with a bitwise-negation of the lc/lp/pb properties byte. MicroLZMA was created to be used in EROFS but can be used by other things too where wasting minimal amount of space for headers is important. This is implemented using most of the LZMA2 code as is so the amount of new code is small. The API has a few extra features compared to the XZ decoder. On the other hand, the API lacks XZ_BUF_ERROR support which is important to take into account when using this API. MicroLZMA doesn't support BCJ filters. In theory they could be added later as there are many unused/reserved values for the first byte of the compressed stream but in practice it is somewhat unlikely to happen due to a few implementation reasons. Link: https://lore.kernel.org/r/20211010213145.17462-5-xiang@kernel.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-19ASoC: Intel: Move soc_intel_is_foo() helpers to a generic headerHans de Goede
The soc_intel_is_foo() helpers from sound/soc/intel/common/soc-intel-quirks.h are useful outside of the sound subsystem too. Move these to include/linux/platform_data/x86/soc.h, so that other code can use them too. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20211018143324.296961-2-hdegoede@redhat.com
2021-10-19perf: Add mem_hops field in perf_mem_data_src structureKajol Jain
Going forward, future generation systems can have more hierarchy within the node/package level but currently we don't have any data source encoding field in perf, which can be used to represent this level of data. Add a new field called 'mem_hops' in the perf_mem_data_src structure which can be used to represent intra-node/package or inter-node/off-package details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value can be used to present different hop levels data. Also add corresponding macros to define mem_hop field values and shift value. Currently we define macro for HOPS_0 which corresponds to data coming from another core but same node. For ex: Encodings for mem_hops fields with L2 cache: L2 - local L2 L2 | REMOTE | HOPS_0 - remote core, same node L2 Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211006140654.298352-3-kjain@linux.ibm.com
2021-10-19perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove ↵Kajol Jain
an extra line Add a comment about PERF_MEM_LVL_* namespace being depricated to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Remove an extra line present in perf_mem__lvl_scnprintf function. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211006140654.298352-2-kjain@linux.ibm.com
2021-10-19block: attempt direct issue of plug listJens Axboe
If we have just one queue type in the plug list, then we can extend our direct issue to cover a full plug list as well. This allows sending a batch of requests for direct issue, which is more efficient than doing one-at-a-time kind of issue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: change plugging to use a singly linked listJens Axboe
Use a singly linked list for the blk_plug. This saves 8 bytes in the blk_plug struct, and makes for faster list manipulations than doubly linked lists. As we don't use the doubly linked lists for anything, singly linked is just fine. This yields a bump in default (merging enabled) performance from 7.0 to 7.1M IOPS, and ~7.5M IOPS with merging disabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: move blk_mq_tag_to_rq() inlineJens Axboe
This is in the fast path of driver issue or completion, and it's a single array index operation. Move it inline to avoid a function call for it. This does mean making struct blk_mq_tags block layer public, but there's not really much in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: don't call blk_status_to_errno in blk_update_requestChristoph Hellwig
We only need to call it to resolve the blk_status_t -> errno mapping for tracing, so move the conversion into the tracepoints that are not called at all when tracing isn't enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: move bdev_read_only() into the headerJens Axboe
This is called for every write in the fast path, move it inline next to get_disk_ro() which is called internally. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: add flag to not fail link after timeoutPavel Begunkov
For some reason non-off IORING_OP_TIMEOUT always fails links, it's pretty inconvenient and unnecessary limits chaining after it to hard linking, which is far from ideal, e.g. doesn't pair well with timeout cancellation. Add a flag forcing it to not fail links on -ETIME. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/17c7ec0fb7a6113cc6be8cdaedcada0ba836ac0e.1633199723.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19io_uring: dump sqe contents if issue failsJens Axboe
I recently had to look at a production problem where a request ended up getting the dreaded -EINVAL error on submit. The most used and hence useless of error codes, as it just tells you that something was wrong with your request, but not more than that. Let's dump the full sqe contents if we run into an issue failure, that'll allow easier diagnosing of a wide variety of issues. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19ethernet: add a helper for assigning port addressesJakub Kicinski
We have 5 drivers which offset base MAC addr by port id. Create a helper for them. This helper takes care of overflows, which some drivers did not do, please complain if that's going to break anything! Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ibAharon Landau
Move mlx5_core_mkey struct to mlx5_ib, as the mlx5_core doesn't use it at this point. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19RDMA/mlx5: Replace struct mlx5_core_mkey by u32 keyAharon Landau
In mlx5_core and vdpa there is no use of mlx5_core_mkey members except for the key itself. As preparation for moving mlx5_core_mkey to mlx5_ib, the occurrences of struct mlx5_core_mkey in all modules except for mlx5_ib are replaced by a u32 key. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19RDMA/mlx5: Remove pd from struct mlx5_core_mkeyAharon Landau
There is no read of mkey->pd, only write. Remove it. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19RDMA/mlx5: Remove size from struct mlx5_core_mkeyAharon Landau
mkey->size is already stored in ibmr->length, no need to store it here. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19RDMA/mlx5: Remove iova from struct mlx5_core_mkeyAharon Landau
iova is already stored in ibmr->iova, no need to store it here. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19net: sch_tbf: Add a graft commandPetr Machata
As another qdisc is linked to the TBF, the latter should issue an event to give drivers a chance to react to the grafting. In other qdiscs, this event is called GRAFT, so follow suit with TBF as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19mmc: sdhci-pci: Remove dead code (struct sdhci_pci_data et al)Andy Shevchenko
The last user of this struct gone a couple of releases ago. Besides that there were not so many users of this API for more than 10 years: 1/ The one is Intel Merrifield, that had been added 2016-08-31 by the commit 3976b0380b31 ("x86/platform/intel-mid: Enable SD card detection on Merrifield") and removed 2021-02-11 by the commit 4590d98f5a4f ("sfi: Remove framework for deprecated firmware"). 2/ The other is Intel Sunrisepoint related, that had been added 2015-02-06 by the commit e1bfad6d936d ("mmc: sdhci-pci: Add support for drive strength selection for SPT") and removed 2017-03-20 by the commit 51ced59cc02e ("mmc: sdhci-pci: Use ACPI DSM to get driver strength for some Intel devices"). Effectively this is a revert of the commit 52c506f0bc72 ("mmc: sdhci-pci: add platform data"). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20211014132613.27861-4-andriy.shevchenko@linux.intel.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-19Merge branch 'timers/drivers/armv8.6_arch_timer' of ↵Will Deacon
https://git.linaro.org/people/daniel.lezcano/linux into for-next/8.6-timers Pull Arm architected timer driver rework from Marc (via Daniel) so that we can add the Armv8.6 support on top. Link: https://lore.kernel.org/r/d0c55386-2f7f-a940-45bb-d80ae5e0f378@linaro.org * 'timers/drivers/armv8.6_arch_timer' of https://git.linaro.org/people/daniel.lezcano/linux: clocksource/drivers/arch_arm_timer: Move workaround synchronisation around clocksource/drivers/arm_arch_timer: Fix masking for high freq counters clocksource/drivers/arm_arch_timer: Drop unnecessary ISB on CVAL programming clocksource/drivers/arm_arch_timer: Remove any trace of the TVAL programming interface clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations clocksource/drivers/arm_arch_timer: Advertise 56bit timer to the core code clocksource/drivers/arm_arch_timer: Move MMIO timer programming over to CVAL clocksource/drivers/arm_arch_timer: Fix MMIO base address vs callback ordering issue clocksource/drivers/arm_arch_timer: Move drop _tval from erratum function names clocksource/drivers/arm_arch_timer: Move system register timer programming over to CVAL clocksource/drivers/arm_arch_timer: Extend write side of timer register accessors to u64 clocksource/drivers/arm_arch_timer: Drop CNT*_TVAL read accessors clocksource/arm_arch_timer: Add build-time guards for unhandled register accesses
2021-10-19Merge tag 'iio-for-5.16a-split-take4' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Jonathan writes: First set of IIO new device and feature support for the 5.16 cycle Counter subsystem changes now sent separately. This has been a busy cycle, so lots here and a few more stragglers to come next week. Big new feature in this cycle is probably output buffer support. This has been in the works for a very long time so it's great to see Mihail pick up the challenge and build upon his predecessors work to finally bring this feature to mainline. New device support ------------------ * adi,adxl313 - New driver and dt bindings for this low power accelerometer. * adi,adxl355 - New driver and dt bindings for this accelerometer. - Later series adds buffer support. * asahi-kasei,ak8975 - Minor additions to driver to support ak09916 * aspeed,aspeed-adc - Substantial rework plus feature additions to add support for the ast2600 including a new dt bindings doc. * atmel,at91_sama5d2 - Rework and support introduced for the sama7g5 parts. * maxim,max31865 - New driver and bindings for this RTD temperature sensor chip. * nxp,imx8qxp - New driver and bindings for the ADC found on the i.MX 8QuadXPlus Soc. * senseair,sunrise - New driver and bindings for this family of carbon dioxide gas sensors. * sensiron,scd4x - New driver and bindings for this carbon dioxide gas sensor. New features ------------ * Output buffer support. Works in a similar fashion to input buffers, but in this case userspace pushes data into the kfifo which is then drained to the device when a trigger occurs. Support added to the ad5766 DAC driver. * Core, devm_iio_map_array_register() to avoid need for devm_add_action_or_reset() based cleanup in fully managed allocation drivers. * Core iio_push_to_buffers_with_ts_unaligned() function to safely handle a few drivers where it really hard to ensure the correct data alignment in an iio_push_to_buffers_with_timestamp() call. Note this uses a bounce buffer so should be avoided whenever possible. Used in the ti,adc108s102, invense,mpu3050 and adi,adis16400. This closes the last known set of drivers with alignment issues at this interface. * maxim,max1027 - Substantial rework to this driver main target of which was supporting use of other triggers than it's own EOC interrupt. - Transfer optimization. * nxp,fxls8962af - Threshold even support including using it as a wakeup source. Cleanups, minor fixes etc ------------------------- Chances of a common type to multiple drivers: * devm_ conversion and drop of .remove() callbacks in: - adi,ad5064 - adi,ad7291 - adi,ad7303 - adi,ad7746 - adi,ad9832 - adi,adis16080 - dialog,da9150-gpadc - intel,mrfld_adc - marvell,berlin2 - maxim,max1363 - maxim,max44000 - nuvoton,nau7802 - st_sensors (includes a lot of rework!) - ti,ads8344 - ti,lp8788 * devm_platform_ioremap_resource() used to reduce boilerplate - cirrus,ep93xx - rockchip,saradc - stm,stm32-dac * Use dev_err_probe() in more places to both not print on deferred probe and ensure a reason for the deferral is available for debug purposes. - adi,ad8801 - capella,cm36651 - linear,ltc1660 - maxim,ds4424 - maxim,max5821 - microchip,mcp4922 - nxp,lpc18xx - onnn,noa1305 - st,lsm9ds0 - st,st_sensors - st,stm32-dac - ti,afe4403 - ti,afe4404 - ti,dac7311 * Drop error returns in SPI and I2C remove() functions as they are ignored and long term plan is to change these all over to returning void. In some cases these patches just make it 'obvious' they always return 0 where it was the case before but not easy to tell. - adi,ad5380 - adi,ad5446 - adi,ad5686 - adi,ad5592r - bosch,bma400 - bosch,bmc150 - fsl,mma7455 - honeywell,hmc5843 - kionix,kxsd9 - maxim,max5487 - meas,ms5611 - ti,afe4403 Driver specific changes * adi,ad5770r - Bring driver inline with documented bindings. * adi,ad7746 - Trivial style fix * adi,ad7949 - Express some magic values as the underlying parts via new #defines. - Make it work with SPI controllers that don't support 14 or 16 bit messages - Support selection of voltage reference from dt including expanding the dt-bindings to cover this new functionality. * adi,ad799x - Implement selection of external reference voltage on AD7991, AD7995 and AD7999. - Add missing dt-bindings doc for devices supported by this driver. * adi,adislib - Move interrupt startup to better location in startup flow. - Handle devices that cannot mask/unmask the drdy pin and must instead mask at the interrupt controller. Applies to the adis16460 and adis16475 from which we then drop equivalent code. * adi,ltc2983 - Add support for optional reset pin. - Fail to probe if no channels specified in dt binding. * asahi-kasei,ak8975 - dt-binding additions of missing vid-supply regulator. * aspeed,aspeed-adc - Typo fix. * fsl,mma7660 - Mark acpi_device_id table __maybe_unused to avoid build warning. * fsl,imx25-gcq - Avoid initializing regulators that aren't used. * invensense,mpu3050 - Drop a dead protection against a clash with the old input driver. * invensense,mpu6050 - Rework code to not use strcpy() and hence avoid possibility of wrong sized buffers. Note this wasn't a bug, but the new code is a lot more readable. - Mark acpi_device_id table __maybe_unused to avoid build warning. * kionix,kxcjk1013 - dt-binding addition to note it supports interrupts. * marvell,berlin2-adc - Enable COMPILE_TEST building. * maxim,max1027 - Avoid returning success in an error path. * nxp,imx8qxp - Fix warning when runtime pm not enabled via __maybe_unused. * ricoh,rn5t618 - Use the new devm_iio_map_array_register() instead of open coding the same. * samsung,exynos_adc - Improve kconfig help text. * st,lsm6dsx - Move max_fifo_size into the fifo_ops structure where the other configuration parameters are found. * st,st_sensors: - Reorder to ensure we turn the power off after removing userspace interfaces. * senseair,sunrise - Add missing I2C dependency. * ti,twl6030 - Small code tidy up. * tag 'iio-for-5.16a-split-take4' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (148 commits) iio: imx8qxp-adc: mark PM functions as __maybe_unused iio: pressure: ms5611: Make ms5611_remove() return void iio: potentiometer: max5487: Don't return an error in .remove() iio: magn: hmc5843: Make hmc5843_common_remove() return void iio: health: afe4403: Don't return an error in .remove() iio: dac: ad5686: Make ad5686_remove() return void iio: dac: ad5592r: Make ad5592r_remove() return void iio: dac: ad5446: Make ad5446_remove() return void iio: dac: ad5380: Make ad5380_remove() return void iio: accel: mma7455: Make mma7455_core_remove() return void iio: accel: kxsd9: Make kxsd9_common_remove() return void iio: accel: bmi088: Make bmi088_accel_core_remove() return void iio: accel: bmc150: Make bmc150_accel_core_remove() return void iio: accel: bma400: Make bma400_remove() return void drivers:iio:dac:ad5766.c: Add trigger buffer iio: triggered-buffer: extend support to configure output buffers iio: kfifo-buffer: Add output buffer support iio: Add output buffer support iio: documentation: Document scd4x calibration use drivers: iio: chemical: Add support for Sensirion SCD4x CO2 sensor ...
2021-10-19counter: drop chrdev_lockDavid Lechner
This removes the chrdev_lock from the counter subsystem. This was intended to prevent opening the chrdev more than once. However, this doesn't work in practice since userspace can duplicate file descriptors and pass file descriptors to other processes. Since this protection can't be relied on, it is best to just remove it. Suggested-by: Greg KH <gregkh@linuxfoundation.org> Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: David Lechner <david@lechnology.com> Link: https://lore.kernel.org/r/20211017185521.3468640-1-david@lechnology.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-19ARM: 9121/1: amba: Drop unused functions about APB/AHB devices addWang Kefeng
No one use the following functions, kill them. amba_aphb_device_add() amba_apb_device_add() amba_apb_device_add_res() amba_ahb_device_add() amba_ahb_device_add_res() Cc: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-19platform_data/mlxreg: Add new field for secured accessVadim Pasternak
Extend structure 'mlxreg_core_data' with the field "secured". The purpose of this field is to restrict access to some attributes, if kernel is configured with security options, like: LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY. Access to some attributes, which for example, allow burning of some hardware components, like FPGA, CPLD, SPI, etcetera can break the system. In case user does not want to allow such access, it can disable it by setting security options. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Michael Shych <michaelsh@nvidia.com> Link: https://lore.kernel.org/r/20211002093238.3771419-7-vadimp@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-10-19platform/mellanox: mlxreg-hotplug: Extend logic for hotplug devices operationsVadim Pasternak
Extend the structure 'mlxreg_hotplug_device" with platform device field to allow transition of the register map and system interrupt line number to underlying hotplug devices, sharing the same register map and same interrupt line with 'mlxreg-hotplug' driver. Extend logic for hotplug devices creation and removing according to the action associated with the hotplug device description. Previously hotplug driver was capable to attach / de-attach upon hotplug events only I2C devices handled by simple I2C drivers. Now it should be able to attach also devices handled by the platform drivers. The motivation is to allow transition of platform data like: - system interrupt line number, sharing with 'mlxreg-hotplug' to underlying hotplug devices. - shared register map of programmable devices on main board to underlying hotplug devices. Additioanlly the number of 'sysfs' attributes is increased, since modular system defines more 'sysfs' attributes. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Michael Shych <michaelsh@nvidia.com> Link: https://lore.kernel.org/r/20211002093238.3771419-4-vadimp@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-10-19platform_data/mlxreg: Add new type to support modular systemsVadim Pasternak
Add new types for the Nvidia modular systems MSN4800 which could be equipped with the different types of replaceable line cards. Add new type to specify the kind of hotplug events for the line cards. The line card events are generated by the programmable device located on the main board. This device implements interrupt controller logic. Line card interrupts are associated with different line cards states during its initialization: insertion, security signature validation, power good state, security validation, hardware-firmware synchronization state, line card PHYs readiness state, firmware availability for line card ports. Also under some circumstances hardware can generate thermal shutdown for particular line card. Add new type specifying the action, which should be performed when particular hotplug event is received. This action defines in which way hotplug event should be handled by hotplug driver. There are the next actions types: - Connect I2C device with empty 'platform_data' field according to the platform topology, if device is configured (for example, power unit micro-controller driver, when power unit is connected to power source (this is what is currently supported). - Connect device with 'platform_data' field set according to the platform topology. The purpose is to pass 'platform_data' through hotplug driver to underlying device (for example line card driver). - No device is associated with hotplug event - just send "udev" event (this is what is currently supported). Extend structure 'mlxreg_hotplug_device' with hotplug action field. Extend structure 'mlxreg_core_data' with: - Registers for line card power and enabling control. - Slot number field, to indicate at which physical slot replaceable line card device is located. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Michael Shych <michaelsh@nvidia.com> Link: https://lore.kernel.org/r/20211002093238.3771419-2-vadimp@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-10-19Merge branch 'timers/drivers/armv8.6_arch_timer' into timers/drivers/nextDaniel Lezcano
The branch is a stable branch shared with ARM maintainers for the first 13th patches of the series: It is based on v5.14-rc3. As stated by the changelog: " [... ] enabling ARMv8.6 support for timer subsystem, and was prompted by a discussion with Oliver around the fact that an ARMv8.6 implementation must have a 1GHz counter, which leads to a number of things to break in the timer code: - the counter rollover can come pretty quickly as we only advertise a 56bit counter, - the maximum timer delta can be remarkably small, as we use the countdown interface which is limited to 32bit... Thankfully, there is a way out: we can compute the minimal width of the counter based on the guarantees that the architecture gives us, and we can use the 64bit comparator interface instead of the countdown to program the timer. Finally, we start making use of the ARMv8.6 ECV features by switching accesses to the counters to a self-synchronising register, removing the need for an ISB. Hopefully, implementations will *not* just stick an invisible ISB there... A side effect of the switch to CVAL is that XGene-1 breaks. I have added a workaround to keep it alive. I have added Oliver's original patch[0] to the series and tweaked a couple of things. Blame me if I broke anything. The whole things has been tested on Juno (sysreg + MMIO timers), XGene-1 (broken sysreg timers), FVP (FEAT_ECV, CNT*CTSS_EL0). " Link: https://lore.kernel.org/r/20211017124225.3018098-1-maz@kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2021-10-19Merge tag 'misc-habanalabs-next-2021-10-18' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next Oded writes: This tag contains habanalabs driver changes for v5.16: - Add a new uAPI (under the memory ioctl) to request from the driver to export a DMA-BUF object that represents a memory region on the device's DRAM. This is needed to enable peer-to-peer over PCIe between habana device and an RDMA adapter (e.g. mlnx5 or efa rdma adapter). - Add debugfs node to dynamically configure CS timeout. Up until now, it was only configurable through kernel module parameter. - Fetch more comprehensive power information from the firmware. - Always take timestamp when waiting for user interrupt, as the user needs that information to optimize the graph runtime compilation. - Modify user interrupt to look on 64-bit user value as fence, instead of 32-bit. - Bypass reset in case of repeated h/w error event after device reset. This is to prevent endless loop of resets to the device. - Fix several bugs in multi CS completion code. - Fix race condition in fd close/open. - Update to latest firmware headers - Add select CRC32 in kconfig - Small fixes, cosmetics * tag 'misc-habanalabs-next-2021-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (25 commits) habanalabs: refactor fence handling in hl_cs_poll_fences habanalabs: context cleanup cosmetics habanalabs: simplify wait for interrupt with timestamp flow habanalabs: initialize hpriv fields before adding new node habanalabs: Unify frequency set/get functionality habanalabs: select CRC32 habanalabs: add support for dma-buf exporter habanalabs: define uAPI to export FD for DMA-BUF habanalabs: fix NULL pointer dereference habanalabs: fix race condition in multi CS completion habanalabs: use only u32 habanalabs: update firmware files habanalabs: bypass reset for continuous h/w error event habanalabs: take timestamp on wait for interrupt habanalabs: prevent race between fd close/open habanalabs: refactor reset log message habanalabs: define soft-reset as inference op habanalabs: fix debugfs device memory MMU VA translation habanalabs: add support for a long interrupt target value habanalabs: remove redundant cs validity checks ...
2021-10-19iio: triggered-buffer: extend support to configure output buffersAlexandru Ardelean
Now that output (kfifo) buffers are supported, we need to extend the {devm_}iio_triggered_buffer_setup_ext() parameter list to take a direction parameter. This allows us to attach an output triggered buffer to a DAC device. Unfortunately it's a bit difficult to add another macro to avoid changing 5 drivers where {devm_}iio_triggered_buffer_setup_ext() is used. Well, it's doable, but may not be worth the trouble vs just updating all these 5 drivers. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Mihail Chindris <mihail.chindris@analog.com> Link: https://lore.kernel.org/r/20211007080035.2531-4-mihail.chindris@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-10-19iio: Add output buffer supportMihail Chindris
Currently IIO only supports buffer mode for capture devices like ADCs. Add support for buffered mode for output devices like DACs. The output buffer implementation is analogous to the input buffer implementation. Instead of using read() to get data from the buffer write() is used to copy data into the buffer. poll() with POLLOUT will wakeup if there is space available. Drivers can remove data from a buffer using iio_pop_from_buffer(), the function can e.g. called from a trigger handler to write the data to hardware. A buffer can only be either a output buffer or an input, but not both. So, for a device that has an ADC and DAC path, this will mean 2 IIO buffers (one for each direction). The direction of the buffer is decided by the new direction field of the iio_buffer struct and should be set after allocating and before registering it. Co-developed-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Co-developed-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Mihail Chindris <mihail.chindris@analog.com> Link: https://lore.kernel.org/r/20211007080035.2531-2-mihail.chindris@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-10-19iio: core: Introduce iio_push_to_buffers_with_ts_unaligned()Jonathan Cameron
Whilst it is almost always possible to arrange for scan data to be read directly into a buffer that is suitable for passing to iio_push_to_buffers_with_timestamp(), there are a few places where leading data needs to be skipped over. For these cases introduce a function that will allocate an appropriate sized and aligned bounce buffer (if not already allocated) and copy the unaligned data into that before calling iio_push_to_buffers_with_timestamp() on the bounce buffer. We tie the lifespace of this buffer to that of the iio_dev.dev which should ensure no memory leaks occur. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20210613151039.569883-2-jic23@kernel.org
2021-10-19iio: adis: handle devices that cannot unmask the drdy pinNuno Sá
Some devices can't mask/unmask the data ready pin and in those cases each driver was just calling '{dis}enable_irq()' to control the trigger state. This change, moves that handling into the library by introducing a new boolean in the data structure that tells the library that the device cannot unmask the pin. On top of controlling the trigger state, we can also use this flag to automatically request the IRQ with 'IRQF_NO_AUTOEN' in case it is set. So far, all users of the library want to start operation with IRQs/DRDY pin disabled so it should be fairly safe to do this inside the library. Signed-off-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20210903141423.517028-3-nuno.sa@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-10-19iio: inkern: introduce devm_iio_map_array_register() short-hand functionAlexandru Ardelean
This change introduces a device-managed variant to the iio_map_array_register() function. It's a simple implementation of calling iio_map_array_register() and registering a callback to iio_map_array_unregister() with the devm_add_action_or_reset(). The function uses an explicit 'dev' parameter to bind the unwinding to. It could have been implemented to implicitly use the parent of the IIO device, however it shouldn't be too expensive to callers to just specify to which device object to bind this unwind call. It would make the API a bit more flexible. Signed-off-by: Alexandru Ardelean <aardelean@deviqon.com> Link: https://lore.kernel.org/r/20210903072917.45769-2-aardelean@deviqon.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2021-10-19Merge tag 'counter-for-5.16a-take2' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Jonathan writes: First set of counter subsystem new feature support for the 5.16 cycle Most interesting element this time is the new chrdev based interface for the counter subsystem. Affects all drivers. Some minor precursor patches. Major parts: * Bring all the sysfs attribute setup into the counter core rather than leaving it to individual drivers. Docs updates accompany these changes. * Move various definitions to a uapi header as now needed from userspace. * Add the chardev interface + extensive documentation and example tool * Add new ABI needed to identify indexes needed for chrdev interface * Implement new interface for the 104-quad-8 * Follow up deals with wrong path for documentation build * Various trivial cleanups and missing feature additions related to this series * tag 'counter-for-5.16a-take2' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: docs: counter: Include counter-chrdev kernel-doc to generic-counter.rst counter: fix docum. build problems after filename change counter: microchip-tcb-capture: Tidy up a false kernel-doc /** marking. counter: 104-quad-8: Add IRQ support for the ACCES 104-QUAD-8 counter: 104-quad-8: Replace mutex with spinlock counter: Implement events_queue_size sysfs attribute counter: Implement *_component_id sysfs attributes counter: Implement signalZ_action_component_id sysfs attribute tools/counter: Create Counter tools docs: counter: Document character device interface counter: Add character device interface counter: Move counter enums to uapi header docs: counter: Update to reflect sysfs internalization counter: Update counter.h comments to reflect sysfs internalization counter: Internalize sysfs interface code counter: stm32-timer-cnt: Provide defines for slave mode selection counter: stm32-lptimer-cnt: Provide defines for clock polarities
2021-10-18mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()Sean Christopherson
Check for a NULL page->mapping before dereferencing the mapping in page_is_secretmem(), as the page's mapping can be nullified while gup() is running, e.g. by reclaim or truncation. BUG: kernel NULL pointer dereference, address: 0000000000000068 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 4173897 Comm: CPU 3/KVM Tainted: G W RIP: 0010:internal_get_user_pages_fast+0x621/0x9d0 Code: <48> 81 7a 68 80 08 04 bc 0f 85 21 ff ff 8 89 c7 be RSP: 0018:ffffaa90087679b0 EFLAGS: 00010046 RAX: ffffe3f37905b900 RBX: 00007f2dd561e000 RCX: ffffe3f37905b934 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe3f37905b900 ... CR2: 0000000000000068 CR3: 00000004c5898003 CR4: 00000000001726e0 Call Trace: get_user_pages_fast_only+0x13/0x20 hva_to_pfn+0xa9/0x3e0 try_async_pf+0xa1/0x270 direct_page_fault+0x113/0xad0 kvm_mmu_page_fault+0x69/0x680 vmx_handle_exit+0xe1/0x5d0 kvm_arch_vcpu_ioctl_run+0xd81/0x1c70 kvm_vcpu_ioctl+0x267/0x670 __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/20211007231502.3552715-1-seanjc@google.com Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: Sean Christopherson <seanjc@google.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Stephen <stephenackerman16@gmail.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-18elfcore: correct reference to CONFIG_UMLLukas Bulwahn
Commit 6e7b64b9dd6d ("elfcore: fix building with clang") introduces special handling for two architectures, ia64 and User Mode Linux. However, the wrong name, i.e., CONFIG_UM, for the intended Kconfig symbol for User-Mode Linux was used. Although the directory for User Mode Linux is ./arch/um; the Kconfig symbol for this architecture is called CONFIG_UML. Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs: UM Referencing files: include/linux/elfcore.h Similar symbols: UML, NUMA Correct the name of the config to the intended one. [akpm@linux-foundation.org: fix um/x86_64, per Catalin] Link: https://lkml.kernel.org/r/20211006181119.2851441-1-catalin.marinas@arm.com Link: https://lkml.kernel.org/r/YV6pejGzLy5ppEpt@arm.com Link: https://lkml.kernel.org/r/20211006082209.417-1-lukas.bulwahn@gmail.com Fixes: 6e7b64b9dd6d ("elfcore: fix building with clang") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Barret Rhoden <brho@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-18mm/migrate: fix CPUHP state to update node demotion orderHuang Ying
The node demotion order needs to be updated during CPU hotplug. Because whether a NUMA node has CPU may influence the demotion order. The update function should be called during CPU online/offline after the node_states[N_CPU] has been updated. That is done in CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD during CPU offline. But in commit 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events"), the function to update node demotion order is called in CPUHP_AP_ONLINE_DYN during CPU online/offline. This doesn't satisfy the order requirement. For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in S0 and P2, P3 in S1), the demotion order is - S0 -> NUMA_NO_NODE - S1 -> NUMA_NO_NODE After P2 and P3 is offlined, because S1 has no CPU now, the demotion order should have been changed to - S0 -> S1 - S1 -> NO_NODE but it isn't changed, because the order updating callback for CPU hotplug doesn't see the new nodemask. After that, if P1 is offlined, the demotion order is changed to the expected order as above. So in this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the update function on them. Link: https://lkml.kernel.org/r/20210929060351.7293-1-ying.huang@intel.com Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-18mm/migrate: add CPU hotplug to demotion #ifdefDave Hansen
Once upon a time, the node demotion updates were driven solely by memory hotplug events. But now, there are handlers for both CPU and memory hotplug. However, the #ifdef around the code checks only memory hotplug. A system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU hotplug events. Update the #ifdef around the common code. Add memory and CPU-specific #ifdefs for their handlers. These memory/CPU #ifdefs avoid unused function warnings when their Kconfig option is off. [arnd@arndb.de: rework hotplug_memory_notifier() stub] Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-19ALSA: memalloc: Drop superfluous snd_dma_buffer_sync() declarationTakashi Iwai
snd_dma_buffer_sync() is declared twice, and the one outside the ifdef CONFIG_HAS_DMA could lead to a build error when CONFIG_HAS_DMA=n. As it's an overlooked leftover after rebase, drop this line. Fixes: a25684a95646 ("ALSA: memalloc: Support for non-contiguous page allocation") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20211019165402.4fa82c38@canb.auug.org.au Link: https://lore.kernel.org/r/20211019060536.26089-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-10-18net/mlx5: Introduce new uplink destination typeMaor Gottlieb
The uplink destination type should be used in rules to steer the packet to the uplink when the device is in steering based LAG mode. Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-18net/mlx5: Add support to create match definerMaor Gottlieb
Introduce new APIs to create and destroy flow matcher for given format id. Flow match definer object is used for defining the fields and mask used for the hash calculation. User should mask the desired fields like done in the match criteria. This object is assigned to flow group of type hash. In this flow group type, packets lookup is done based on the hash result. This patch also adds the required bits to create such flow group. Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-18net/mlx5: Introduce port selection namespaceMaor Gottlieb
Add new port selection flow steering namespace. Flow steering rules in this namespaceare are used to determine the physical port for egress packets. Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-18scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O pathMike Christie
We are only holding the lun_tg_pt_gp_lock in target_alua_state_check() to make sure tg_pt_gp is not freed from under us while we copy the state, delay, ID values. We can instead use RCU here to access the tg_pt_gp. With this patch IOPs can increase up to 10% for jobs like: fio --filename=/dev/sdX --direct=1 --rw=randrw --bs=4k \ --ioengine=libaio --iodepth=64 --numjobs=N when there are multiple sessions (running that fio command to each /dev/sdX or using multipath and there are over 8 paths), or more than 8 queues for the loop or vhost with multiple threads case and numjobs > 8. Link: https://lore.kernel.org/r/20210930020422.92578-5-michael.christie@oracle.com Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-18scsi: target: Fix ordered tag handlingMike Christie
This patch fixes the following bugs: 1. If there are multiple ordered cmds queued and multiple simple cmds completing, target_restart_delayed_cmds() could be called on different CPUs and each instance could start a ordered cmd. They could then run in different orders than they were queued. 2. target_restart_delayed_cmds() and target_handle_task_attr() can race where: 1. target_handle_task_attr() has passed the simple_cmds == 0 check. 2. transport_complete_task_attr() then decrements simple_cmds to 0. 3. transport_complete_task_attr() runs target_restart_delayed_cmds() and it does not see any cmds on the delayed_cmd_list. 4. target_handle_task_attr() adds the cmd to the delayed_cmd_list. The cmd will then end up timing out. 3. If we are sent > 1 ordered cmds and simple_cmds == 0, we can execute them out of order, because target_handle_task_attr() will hit that simple_cmds check first and return false for all ordered cmds sent. 4. We run target_restart_delayed_cmds() after every cmd completion, so if there is more than 1 simple cmd running, we start executing ordered cmds after that first cmd instead of waiting for all of them to complete. 5. Ordered cmds are not supposed to start until HEAD OF QUEUE and all older cmds have completed, and not just simple. 6. It's not a bug but it doesn't make sense to take the delayed_cmd_lock for every cmd completion when ordered cmds are almost never used. Just replacing that lock with an atomic increases IOPs by up to 10% when completions are spread over multiple CPUs and there are multiple sessions/ mqs/thread accessing the same device. This patch moves the queued delayed handling to a per device work to serialze the cmd executions for each device and adds a new counter to track HEAD_OF_QUEUE and SIMPLE cmds. We can then check the new counter to determine when to run the work on the completion path. Link: https://lore.kernel.org/r/20210930020422.92578-3-michael.christie@oracle.com Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-10-18bpf: Rename BTF_KIND_TAG to BTF_KIND_DECL_TAGYonghong Song
Patch set [1] introduced BTF_KIND_TAG to allow tagging declarations for struct/union, struct/union field, var, func and func arguments and these tags will be encoded into dwarf. They are also encoded to btf by llvm for the bpf target. After BTF_KIND_TAG is introduced, we intended to use it for kernel __user attributes. But kernel __user is actually a type attribute. Upstream and internal discussion showed it is not a good idea to mix declaration attribute and type attribute. So we proposed to introduce btf_type_tag as a type attribute and existing btf_tag renamed to btf_decl_tag ([2]). This patch renamed BTF_KIND_TAG to BTF_KIND_DECL_TAG and some other declarations with *_tag to *_decl_tag to make it clear the tag is for declaration. In the future, BTF_KIND_TYPE_TAG might be introduced per [3]. [1] https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/ [2] https://reviews.llvm.org/D111588 [3] https://reviews.llvm.org/D111199 Fixes: b5ea834dde6b ("bpf: Support for new btf kind BTF_KIND_TAG") Fixes: 5b84bd10363e ("libbpf: Add support for BTF_KIND_TAG") Fixes: 5c07f2fec003 ("bpftool: Add support for BTF_KIND_TAG") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com