summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-08-18Merge tag 'qcom-drivers-for-5.15' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers Qualcomm driver updates for v5.15 This fixes the "shared memory state machine" (SMSM) interrupt logic to avoid missing transitions happening while the interrupts are masked. SM6115 support is added to smd-rpm and rpmpd. The Qualcomm SCM firmware driver is once again made possible to compile and load as a kernel module. An out-of-bounds error related to the cooling devices of the AOSS driver is corrected. The binding is converted to YAML and a generic compatible is introduced to reduce the driver churn. The GENI wrapper gains a helper function used in I2C and SPI for switching the serial engine hardware to use the wrapper's DMA-engine. Lastly it contains a number of cleanups and smaller fixes for rpmhpd, socinfo, CPR, mdt_loader and the GENI DT binding. * tag 'qcom-drivers-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: soc: qcom: smsm: Fix missed interrupts if state changes while masked soc: qcom: smsm: Implement support for get_irqchip_state soc: qcom: mdt_loader: be more informative on errors dt-bindings: qcom: geni-se: document iommus soc: qcom: smd-rpm: Add SM6115 compatible soc: qcom: geni: Add support for gpi dma soc: qcom: geni: move GENI_IF_DISABLE_RO to common header PM: AVS: qcom-cpr: Use nvmem_cell_read_variable_le_u32() drivers: soc: qcom: rpmpd: Add SM6115 RPM Power Domains dt-bindings: power: rpmpd: Add SM6115 to rpmpd binding dt-bindings: soc: qcom: smd-rpm: Add SM6115 compatible soc: qcom: aoss: Fix the out of bound usage of cooling_devs firmware: qcom_scm: Allow qcom_scm driver to be loadable as a permenent module soc: qcom: socinfo: Don't print anything if nothing found soc: qcom: rpmhpd: Use corner in power_off soc: qcom: aoss: Add generic compatible dt-bindings: soc: qcom: aoss: Convert to YAML dt-bindings: soc: qcom: aoss: Add SC8180X and generic compatible firmware: qcom_scm: remove a duplicative condition firmware: qcom_scm: Mark string array const Link: https://lore.kernel.org/r/20210816214840.581244-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-18Merge tag 'tegra-for-5.15-soc' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into arm/drivers soc/tegra: Changes for v5.15-rc1 Implements runtime PM support for the FUSE block and prepares the driver to work better in conjunction with the CPUIDLE driver. * tag 'tegra-for-5.15-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: soc/tegra: fuse: Enable fuse clock on suspend for Tegra124 soc/tegra: fuse: Add runtime PM support soc/tegra: fuse: Clear fuse->clk on driver probe failure soc/tegra: pmc: Prevent racing with cpuilde driver soc/tegra: bpmp: Remove unused including <linux/version.h> Link: https://lore.kernel.org/r/20210813162157.2820913-3-thierry.reding@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-08-18block: fix default IO priority handlingDamien Le Moal
The default IO priority is the best effort (BE) class with the normal priority level IOPRIO_NORM (4). However, get_task_ioprio() returns IOPRIO_CLASS_NONE/IOPRIO_NORM as the default priority and get_current_ioprio() returns IOPRIO_CLASS_NONE/0. Let's be consistent with the defined default and have both of these functions return the default priority IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_NORM) when the user did not define another default IO priority for the task. In include/uapi/linux/ioprio.h, introduce the IOPRIO_BE_NORM macro as an alias to IOPRIO_NORM to clarify that this default level applies to the BE priotity class. In include/linux/ioprio.h, define the macro IOPRIO_DEFAULT as IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM) and use this new macro when setting a priority to the default. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-7-damien.lemoal@wdc.com [axboe: drop unnecessary lightnvm change] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18block: Introduce IOPRIO_NR_LEVELSDamien Le Moal
The BFQ scheduler and ioprio_check_cap() both assume that the RT priority class (IOPRIO_CLASS_RT) can have up to 8 different priority levels, similarly to the BE class (IOPRIO_CLASS_iBE). This is controlled using the IOPRIO_BE_NR macro , which is badly named as the number of levels also applies to the RT class. Introduce the class independent IOPRIO_NR_LEVELS macro, defined to 8, to make things clear. Keep the old IOPRIO_BE_NR macro definition as an alias for IOPRIO_NR_LEVELS. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-6-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18block: fix IOPRIO_PRIO_CLASS() and IOPRIO_PRIO_VALUE() macrosDamien Le Moal
The ki_ioprio field of struct kiocb is 16-bits (u16) but often handled as an int in the block layer. E.g. ioprio_check_cap() takes an int as argument. With such implicit int casting function calls, the upper 16-bits of the int argument may be left uninitialized by the compiler, resulting in invalid values for the IOPRIO_PRIO_CLASS() macro (garbage upper bits) and in an error return for functions such as ioprio_check_cap(). Fix this by masking the result of the shift by IOPRIO_CLASS_SHIFT bits in the IOPRIO_PRIO_CLASS() macro. The new macro IOPRIO_CLASS_MASK defines the 3-bits mask for the priority class. Similarly, apply the IOPRIO_PRIO_MASK mask to the data argument of the IOPRIO_PRIO_VALUE() macro to ignore the upper bits of the data value. The IOPRIO_CLASS_MASK mask is also applied to the class argument of this macro before shifting the result by IOPRIO_CLASS_SHIFT bits. While at it, also change the argument name of the IOPRIO_PRIO_CLASS() and IOPRIO_PRIO_DATA() macros from "mask" to "ioprio" to reflect the fact that a priority value should be passed rather than a mask. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-5-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18block: change ioprio_valid() to an inline functionDamien Le Moal
Change the ioprio_valid() macro in include/usapi/linux/ioprio.h to an inline function declared on the kernel side in include/linux/ioprio.h. Also improve checks on the class value by checking the upper bound value. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-4-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18block: improve ioprio class description commentDamien Le Moal
In include/usapi/linux/ioprio.h, change the ioprio class enum comment to remove the outdated reference to CFQ and mention BFQ and mq-deadline instead. Also document the high priority NCQ command use for RT class IOs directed at ATA drives that support NCQ priority. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210811033702.368488-3-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18libata: Introduce ncq_prio_supported sysfs sttributeDamien Le Moal
Currently, the only way a user can determine if a SATA device supports NCQ priority is to try to enable the use of this feature using the ncq_prio_enable sysfs device attribute. If enabling the feature fails, it is because the device does not support NCQ priority. Otherwise, the feature is enabled and success indicates that the device supports NCQ priority. Improve this odd interface by introducing the read-only ncq_prio_supported sysfs device attribute to indicate if a SATA device supports NCQ priority. The value of this attribute reflects the status of device flag ATA_DFLAG_NCQ_PRIO, which is set only for devices supporting NCQ priority. Add this new sysfs attribute to the device attributes group of libahci and libata-sata. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210816014456.2191776-10-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18libata: print feature list on device scanDamien Le Moal
Print a list of features supported by a drive when it is configured in ata_dev_configure() using the new function ata_dev_print_features(). The features printed are not already advertized and are: trusted send-recev support, device attention support, device sleep support, NCQ send-recv support and NCQ priority support. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210816014456.2191776-9-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-18regulator: Minor regulator documentation fixes.Matti Vaittinen
The newly added regulator ramp-delay specifiers in regulator desc lacked the documentation. Add some. Also fix a typo. Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Link: https://lore.kernel.org/r/20210818041513.GA2408290@dc7vkhyh15000m40t6jht-3.rev.dnainternet.fi Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-18iommu: Allow enabling non-strict mode dynamicallyRobin Murphy
Allocating and enabling a flush queue is in fact something we can reasonably do while a DMA domain is active, without having to rebuild it from scratch. Thus we can allow a strict -> non-strict transition from sysfs without requiring to unbind the device's driver, which is of particular interest to users who want to make selective relaxations to critical devices like the one serving their root filesystem. Disabling and draining a queue also seems technically possible to achieve without rebuilding the whole domain, but would certainly be more involved. Furthermore there's not such a clear use-case for tightening up security *after* the device may already have done whatever it is that you don't trust it not to do, so we only consider the relaxation case. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/d652966348c78457c38bf18daf369272a4ebc2c9.1628682049.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu: Express DMA strictness via the domain typeRobin Murphy
Eliminate the iommu_get_dma_strict() indirection and pipe the information through the domain type from the beginning. Besides the flow simplification this also has several nice side-effects: - Automatically implies strict mode for untrusted devices by virtue of their IOMMU_DOMAIN_DMA override. - Ensures that we only end up using flush queues for drivers which are aware of them and can actually benefit. - Allows us to handle flush queue init failure by falling back to strict mode instead of leaving it to possibly blow up later. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/47083d69155577f1367877b1594921948c366eb3.1628682049.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu: Introduce explicit type for non-strict DMA domainsRobin Murphy
Promote the difference between strict and non-strict DMA domains from an internal detail to a distinct domain feature and type, to pave the road for exposing it through the sysfs default domain interface. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/08cd2afaf6b63c58ad49acec3517c9b32c2bb946.1628682049.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu/io-pgtable: Remove non-strict quirkRobin Murphy
IO_PGTABLE_QUIRK_NON_STRICT was never a very comfortable fit, since it's not a quirk of the pagetable format itself. Now that we have a more appropriate way to convey non-strict unmaps, though, this last of the non-quirk quirks can also go, and with the flush queue code also now enforcing its own ordering we can have a lovely cleanup all round. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/155b5c621cd8936472e273a8b07a182f62c6c20d.1628682049.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu: Indicate queued flushes via gather dataRobin Murphy
Since iommu_iotlb_gather exists to help drivers optimise flushing for a given unmap request, it is also the logical place to indicate whether the unmap is strict or not, and thus help them further optimise for whether to expect a sync or a flush_all subsequently. As part of that, it also seems fair to make the flush queue code take responsibility for enforcing the really subtle ordering requirement it brings, so that we don't need to worry about forgetting that if new drivers want to add flush queue support, and can consolidate the existing versions. While we're adding to the kerneldoc, also fill in some info for @freelist which was overlooked previously. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/bf5f8e2ad84e48c712ccbf80fa8c610594c7595f.1628682049.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18iommu: Pull IOVA cookie management into the coreRobin Murphy
Now that everyone has converged on iommu-dma for IOMMU_DOMAIN_DMA support, we can abandon the notion of drivers being responsible for the cookie type, and consolidate all the management into the core code. CC: Yong Wu <yong.wu@mediatek.com> CC: Chunyan Zhang <chunyan.zhang@unisoc.com> CC: Maxime Ripard <mripard@kernel.org> Tested-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/46a2c0e7419c7d1d931762dc7b6a69fa082d199a.1628682048.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-08-18net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()Wei Wang
Add gfp_t mask as an input parameter to mem_cgroup_charge_skmem(), to give more control to the networking stack and enable it to change memcg charging behavior. In the future, the networking stack may decide to avoid oom-kills when fallbacks are more appropriate. One behavior change in mem_cgroup_charge_skmem() by this patch is to avoid force charging by default and let the caller decide when and if force charging is needed through the presence or absence of __GFP_NOFAIL. Signed-off-by: Wei Wang <weiwan@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18net: dsa: tag_sja1105: be dsa_loop-safeVladimir Oltean
Add support for tag_sja1105 running on non-sja1105 DSA ports, by making sure that every time we dereference dp->priv, we check the switch's dsa_switch_ops (otherwise we access a struct sja1105_port structure that is in fact something else). This adds an unconditional build-time dependency between sja1105 being built as module => tag_sja1105 must also be built as module. This was there only for PTP before. Some sane defaults must also take place when not running on sja1105 hardware. These are: - sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols depending on VLAN awareness and switch revision (when an encapsulated VLAN must be sent). Default to 0x8100. - sja1105_rcv_meta_state_machine: this aggregates PTP frames with their metadata timestamp frames. When running on non-sja1105 hardware, don't do that and accept all frames unmodified. - sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c which writes a management route over SPI. When not running on sja1105 hardware, bypass the SPI write and send the frame as-is. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18mptcp: remote addresses fullmeshGeliang Tang
This patch added and managed a new per endpoint flag, named MPTCP_PM_ADDR_FLAG_FULLMESH. In mptcp_pm_create_subflow_or_signal_addr(), if such flag is set, instead of: remote_address((struct sock_common *)sk, &remote); fill a temporary allocated array of all known remote address. After releaseing the pm lock loop on such array and create a subflow for each remote address from the given local. Note that the we could still use an array even for non 'fullmesh' endpoint: with a single entry corresponding to the primary MPC subflow remote address. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18firmware: qcom_scm: Introduce SCM calls to access LMhThara Gopinath
Introduce SCM calls to access/configure limits management hardware(LMH). Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210809191605.3742979-2-thara.gopinath@linaro.org
2021-08-18leds: move default_state read from fwnode to coreDenis Osterland-Heim
This patch introduces a new function to read initial default_state from fwnode. Suggested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Denis Osterland-Heim <Denis.Osterland@diehl.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
2021-08-17scsi: target: Allows backend drivers to fail with specific sense codesSergey Samoylenko
Currently, backend drivers can fail I/O with SAM_STAT_CHECK_CONDITION which gets us TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE. Add a new helper that allows backend drivers to fail with specific sense codes. This is based on a patch from Mike Christie <michael.christie@oracle.com>. Cc: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20210803145410.80147-2-s.samoylenko@yadro.com Reviewed-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Sergey Samoylenko <s.samoylenko@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-17scsi: core: Remove scsi_cmnd.tagJohn Garry
It is never read, so get rid of it. Link: https://lore.kernel.org/r/1628862553-179450-4-git-send-email-john.garry@huawei.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-17PCI: Remove reset_fn field from pci_devAmey Narkhede
"reset_fn" indicates whether the device supports any reset mechanism. Remove the use of reset_fn in favor of the reset_methods array that tracks supported reset mechanisms of a device and their ordering. The octeon driver incorrectly used reset_fn to detect whether the device supports FLR or not. Use pcie_reset_flr() to probe whether it supports FLR. Co-developed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20210817180500.1253-5-ameynarkhede03@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2021-08-17PCI: Add array to track reset method orderingAmey Narkhede
Add reset_methods[] in struct pci_dev to keep track of reset mechanisms supported by the device and their ordering. Refactor probing and reset functions to take advantage of calling convention of reset functions. Co-developed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20210817180500.1253-4-ameynarkhede03@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2021-08-17PCI: Add pcie_reset_flr() with 'probe' argumentAmey Narkhede
Most reset methods are of the form "pci_*_reset(dev, probe)". pcie_flr() was an exception because it relied on a separate pcie_has_flr() function instead of taking a "probe" argument. Add "pcie_reset_flr(dev, probe)" to follow the convention. Remove pcie_has_flr(). Some pcie_flr() callers that did not use pcie_has_flr() remain. [bhelgaas: commit log, rework pcie_reset_flr() to use dev->devcap directly] Link: https://lore.kernel.org/r/20210817180500.1253-3-ameynarkhede03@gmail.com Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2021-08-17PCI: Cache PCIe Device Capabilities registerAmey Narkhede
Add a new member called devcap in struct pci_dev for caching the PCIe Device Capabilities register to avoid reading PCI_EXP_DEVCAP multiple times. Refactor pcie_has_flr() to use cached device capabilities. Link: https://lore.kernel.org/r/20210817180500.1253-2-ameynarkhede03@gmail.com Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2021-08-17workqueue: Remove unused WORK_NO_COLORLai Jiangshan
WORK_NO_COLOR has no user now, just remove it. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-17workqueue: Rename "delayed" (delayed by active management) to "inactive"Lai Jiangshan
There are two kinds of "delayed" work items in workqueue subsystem. One is for timer-delayed work items which are visible to workqueue users. The other kind is for work items delayed by active management which can not be directly visible to workqueue users. We mixed the word "delayed" for both kinds and caused somewhat ambiguity. This patch renames the later one (delayed by active management) to "inactive", because it is used for workqueue active management and most of its related symbols are named with "active" or "activate". All "delayed" and "DELAYED" are carefully checked and renamed one by one to avoid accidentally changing the name of the other kind for timer-delayed. No functional change intended. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-17static_call: Update API documentationPeter Zijlstra
Update the comment with the new features. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/YQwIorQBHEq+s73b@hirez.programming.kicks-ass.net
2021-08-17locking/local_lock: Add PREEMPT_RT supportThomas Gleixner
On PREEMPT_RT enabled kernels local_lock maps to a per CPU 'sleeping' spinlock which protects the critical section while staying preemptible. CPU locality is established by disabling migration. Provide the necessary types and macros to substitute the non-RT variant. Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211306.023630962@linutronix.de
2021-08-17locking/spinlock/rt: Prepare for RT local_lockThomas Gleixner
Add the static and runtime initializer mechanics to support the RT variant of local_lock, which requires the lock type in the lockdep map to be set to LD_LOCK_PERCPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.967526724@linutronix.de
2021-08-17preempt: Adjust PREEMPT_LOCK_OFFSET for RTThomas Gleixner
On PREEMPT_RT regular spinlocks and rwlocks are substituted with rtmutex based constructs. spin/rwlock held regions are preemptible on PREEMPT_RT, so PREEMPT_LOCK_OFFSET has to be 0 to make the various cond_resched_*lock() functions work correctly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.804246275@linutronix.de
2021-08-17locking/rtmutex: Add mutex variant for RTThomas Gleixner
Add the necessary defines, helpers and API functions for replacing struct mutex on a PREEMPT_RT enabled kernel with an rtmutex based variant. No functional change when CONFIG_PREEMPT_RT=n Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211305.081517417@linutronix.de
2021-08-17locking/ww_mutex: Add rt_mutex based lock type and accessorsPeter Zijlstra
Provide the defines for RT mutex based ww_mutexes and fix up the debug logic so it's either enabled by DEBUG_MUTEXES or DEBUG_RT_MUTEXES on RT kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.908012566@linutronix.de
2021-08-17locking/ww_mutex: Abstract out internal lock accessesThomas Gleixner
Accessing the internal wait_lock of mutex and rtmutex is slightly different. Provide helper functions for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.734635961@linutronix.de
2021-08-17locking/mutex: Make mutex::wait_lock rawThomas Gleixner
The wait_lock of mutex is really a low level lock. Convert it to a raw_spinlock like the wait_lock of rtmutex. [ mingo: backmerged the test_lockup.c build fix by bigeasy. ] Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.166863404@linutronix.de
2021-08-17locking/ww_mutex: Move the ww_mutex definitions from <linux/mutex.h> into ↵Thomas Gleixner
<linux/ww_mutex.h> Move the ww_mutex definitions into the ww_mutex specific header where they belong. Preparatory change to allow compiling ww_mutexes standalone. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.110216293@linutronix.de
2021-08-17locking/mutex: Move the 'struct mutex_waiter' definition from ↵Thomas Gleixner
<linux/mutex.h> to the internal header Move the mutex waiter declaration from the public <linux/mutex.h> header to the internal kernel/locking/mutex.h header. There is no reason to expose it outside of the core code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211304.054325923@linutronix.de
2021-08-17locking/rwlock: Provide RT variantThomas Gleixner
Similar to rw_semaphores, on RT the rwlock substitution is not writer fair, because it's not feasible to have a writer inherit its priority to multiple readers. Readers blocked on a writer follow the normal rules of priority inheritance. Like RT spinlocks, RT rwlocks are state preserving across the slow lock operations (contended case). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.882793524@linutronix.de
2021-08-17SUNRPC: Add RPC_AUTH_TLS protocol numbersChuck Lever
Shared by client and server. See: https://www.iana.org/assignments/rpc-authentication-numbers/rpc-authentication-numbers.xhtml Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17sysctl: introduce new proc handler proc_doboolJia He
This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches. Suggested-by: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Jia He <hejianet@gmail.com> [thuth: rebased the patch to the current kernel version] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()Chuck Lever
Some paths through svc_process() leave rqst->rq_procinfo set to NULL, which triggers a crash if tracing happens to be enabled. Fixes: 89ff87494c6e ("SUNRPC: Display RPC procedure names instead of proc numbers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17NFSD: remove vanity commentsNeilBrown
Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After 2 decades, it is time for these to be gone. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17svcrdma: Convert rdma->sc_rw_ctxts to llistChuck Lever
Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts to an llist. The goal is to reduce the average overhead of Send completions, because a transport's completion handlers are single-threaded on one CPU core. This change reduces CPU utilization of each Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17svcrdma: Relieve contention on sc_send_lock.Chuck Lever
/proc/lock_stat indicates the the sc_send_lock is heavily contended when the server is under load from a single client. To address this, convert the send_ctxt free list to an llist. Returning an item to the send_ctxt cache is now waitless, which reduces the instruction path length in the single-threaded Send handler (svc_rdma_wc_send). The goal is to enable the ib_comp_wq worker to handle a higher RPC/RDMA Send completion rate given the same CPU resources. This change reduces CPU utilization of Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17svcrdma: Fewer calls to wake_up() in Send completion handlerChuck Lever
Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's more, the Send wait queue almost never has waiters, so most of the time, this is an expensive no-op. As always, the goal is to reduce the average overhead of each completion, because a transport's completion handlers are single- threaded on one CPU core. This change reduces CPU utilization of the Send completion thread by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Tom Talpey <tom@talpey.com>
2021-08-17tracing: Add trace_event helper macros __string_len() and __assign_str_len()Steven Rostedt (VMware)
There's a few cases that a string that is to be recorded in a trace event, does not have a terminating 'nul' character, and instead, the tracepoint passes in the length of the string to record. Add two helper macros to the trace event code that lets this work easier, than tricks with "%.*s" logic. __string_len() which is similar to __string() for declaration, but takes a length argument. __assign_str_len() which is similar to __assign_str() for assiging the string, but it too takes a length argument. Note, the TRACE_EVENT() macro will allocate the location on the ring buffer to 'len + 1', that will be used to store the string into. It is a requirement that the 'len' used for this is a most the length of the string being recorded. This string can still use __get_str() just like strings created with __string() can use to retrieve the string. Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/ Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17SUNRPC: Add svc_rqst_replace_page() APIChuck Lever
Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly. To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17locking/spinlock: Provide RT variant header: <linux/spinlock_rt.h>Thomas Gleixner
Provide the necessary wrappers around the actual rtmutex based spinlock implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210815211303.712897671@linutronix.de