path: root/include/linux
Age Commit message Author
2021-06-11PM: domains: Drop/restore performance state votes for devices at runtime PMUlf Hansson
A subsystem/driver that needs to manage OPPs for its device should typically drop its vote for the OPP when the device becomes runtime suspended. In this way, the corresponding aggregation of the performance state votes that is managed in genpd for the attached PM domain may find that the aggregated vote can be decreased. Hence, it may allow genpd to set a lower performance state for the PM domain, thus avoiding wasted energy. To accomplish this, a subsystem/driver would typically need to call dev_pm_opp_set_rate|opp() for its device from its ->runtime_suspend() callback to drop the vote for the OPP. Accordingly, it needs another call to dev_pm_opp_set_rate|opp() to restore the vote for the OPP from its ->runtime_resume() callback. To avoid boilerplate code in subsystems/drivers to deal with these things, let's instead manage this internally in genpd. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
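The effect of dropping and restoring a vote can be illustrated with a small userspace model. This is only a sketch and not genpd code: it assumes the domain's performance state is the highest of its devices' votes, and every name in it (toy_domain, toy_runtime_suspend() and so on) is hypothetical.

```c
#include <assert.h>

#define TOY_MAX_DEVS 4

/* Hypothetical per-device data: the live vote and the vote saved at suspend. */
struct toy_dev {
	unsigned int vote;
	unsigned int saved_vote;
};

/* Hypothetical PM domain holding the votes of its attached devices. */
struct toy_domain {
	struct toy_dev devs[TOY_MAX_DEVS];
	int ndevs;
};

/* Aggregate the votes; this model assumes the highest vote wins. */
static unsigned int toy_aggregate(const struct toy_domain *d)
{
	unsigned int state = 0;

	for (int i = 0; i < d->ndevs; i++)
		if (d->devs[i].vote > state)
			state = d->devs[i].vote;
	return state;
}

/* Runtime suspend: remember the current vote, then drop it to zero. */
static void toy_runtime_suspend(struct toy_domain *d, int idx)
{
	assert(idx >= 0 && idx < d->ndevs);
	d->devs[idx].saved_vote = d->devs[idx].vote;
	d->devs[idx].vote = 0;
}

/* Runtime resume: restore the vote that was saved at suspend time. */
static void toy_runtime_resume(struct toy_domain *d, int idx)
{
	assert(idx >= 0 && idx < d->ndevs);
	d->devs[idx].vote = d->devs[idx].saved_vote;
	d->devs[idx].saved_vote = 0;
}
```

With two devices voting 3 and 1, suspending the first lets the aggregate fall to 1, and resuming it brings the aggregate back to 3, without the driver issuing any OPP calls of its own.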
2021-06-11ice: add low level PTP clock access functionsJacob Keller
Add the ice_ptp_hw.c file and some associated definitions to the ice driver folder. This file contains basic low level definitions for functions that interact with the device hardware. For now, only E810-based devices are supported. The ice hardware supports 2 major variants which have different PHYs with different procedures necessary for interacting with the device clock. Because the device captures timestamps in the PHY, each PHY has its own internal timer. The timers are synchronized in hardware by first preparing the source timer and the PHY timer shadow registers, and then issuing a synchronization command. This ensures that both the source timer and PHY timers are programmed simultaneously. The timers themselves are all driven from the same oscillator source. The functions in ice_ptp_hw.c abstract over the differences between how the PHYs in E810 are programmed vs how the PHYs in E822 devices are programmed. This series only implements E810 support, but E822 support will be added in a future change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-11nvmem: prepare basics for FRAM supportJiri Prchal
Added enum and string for FRAM (ferroelectric RAM) to expose it as a file named "fram". Added documentation of the sysfs file. Signed-off-by: Jiri Prchal <jiri.prchal@aksignal.cz> Link: https://lore.kernel.org/r/20210611094601.95131-2-jiri.prchal@aksignal.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-11perf: Add EVENT_ATTR_ID to simplify event attributesQi Liu
Similar EVENT_ATTR macros are defined in many PMU drivers, such as the Arm PMU driver and the Arm SMMU PMU driver. So add a generic macro to simplify the code. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Qi Liu <liuqi115@huawei.com> Link: https://lore.kernel.org/r/1623220863-58233-2-git-send-email-liuqi115@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2021-06-11drivers/soc/litex: remove 8-bit subregister optionGabriel Somlo
Since upstream LiteX recommends that Linux support be limited to designs configured with 32-bit CSR subregisters (see commit a2b71fde in upstream LiteX, https://github.com/enjoy-digital/litex), remove the option to select 8-bit subregisters, significantly reducing the complexity of LiteX CSR (MMIO register) accessor methods. NOTE: for details on the underlying mechanics of LiteX CSR registers, see https://github.com/enjoy-digital/litex/wiki/CSR-Bus or the original LiteX accessors (litex/soc/software/include/hw/common.h in the upstream repository). Signed-off-by: Gabriel Somlo <gsomlo@gmail.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Florent Kermarrec <florent@enjoy-digital.fr> Cc: Mateusz Holenko <mholenko@antmicro.com> Cc: Joel Stanley <joel@jms.id.au> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stafford Horne <shorne@gmail.com>
2021-06-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A mixture of small bug fixes and a small security issue: - WARN_ON when IPoIB is automatically moved between namespaces - Long standing bug where mlx5 would use the wrong page for the doorbell recovery memory if fork is used - Security fix for mlx4 that disables the timestamp feature - Several crashers for mlx5 - Plug a recent mlx5 memory leak for the sig_mr" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/mlx5: Fix initializing CQ fragments buffer RDMA/mlx5: Delete right entry from MR signature database RDMA: Verify port when creating flow rule RDMA/mlx5: Block FDB rules when not in switchdev mode RDMA/mlx4: Do not map the core_clock page to user space unless enabled RDMA/mlx5: Use different doorbell memory for different processes RDMA/ipoib: Fix warning caused by destroying non-initial netns
2021-06-10bootconfig: Share the checksum function with toolsMasami Hiramatsu
Move the checksum calculation function into the header for sharing it with tools/bootconfig. Link: https://lkml.kernel.org/r/162262197470.264090.16325743685807878807.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
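A checksum helper that both the kernel and a userspace tool can share has to be free of kernel-only dependencies. The sketch below assumes a plain byte-wise additive checksum over the data; that scheme and the name toy_checksum() are illustrative assumptions, not a copy of the shared header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Byte-wise additive checksum (assumed scheme for illustration).
 * The sum is a 32-bit unsigned value, so it wraps modulo 2^32.
 */
static uint32_t toy_checksum(const void *data, size_t size)
{
	const unsigned char *p = data;
	uint32_t sum = 0;

	assert(data != NULL || size == 0);
	while (size--)
		sum += *p++;
	return sum;
}
```

Because the function only needs fixed-width integer types, it can live in a header as a static inline and be compiled unchanged on both sides.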
2021-06-10bootconfig: Support mixing a value and subkeys under a keyMasami Hiramatsu
Support mixing a value and subkeys under a key. Since kernel cmdline options will support "aaa.bbb=value1 aaa.bbb.ccc=value2", it is better that bootconfig supports such a configuration too.

Note that this does not change the syntax itself but just accepts a mixed value and subkeys, e.g.

    key = value1
    key.subkey = value2

But this is not accepted:

    key {
        value1
        subkey = value2
    }

That would make value1 a subkey.

Also, the order of the value node under a key is fixed. If there are a value and subkeys, the value is always the first child node of the key. Thus if the user specifies subkeys first, e.g.

    key.subkey = value1
    key = value2

then in the program (and /proc/bootconfig) it will be shown as below:

    key = value2
    key.subkey = value1

Link: https://lkml.kernel.org/r/162262194685.264090.7738574774030567419.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-10bootconfig: Change array value to use child nodeMasami Hiramatsu
It is not possible to put an array value with subkeys under a key node, because both the subkeys and the array elements use the "next" field of the xbc_node. Thus this changes the array values to use the "child" field in the array case. This change is split out so that it can be tested easily. Link: https://lkml.kernel.org/r/162262193838.264090.16044473274501498656.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-10iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variantAl Viro
Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the callers do *not* need to do iov_iter_advance() after it. In case when they end up consuming less than they'd been given they need to do iov_iter_revert() on everything they had not consumed. That, however, needs to be done only on slow paths. All in-tree callers converted. And that kills the last user of iterate_all_kinds(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
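The calling convention described above (copy advances the iterator; the caller reverts whatever it did not consume) can be modelled in userspace. This is a sketch over a hypothetical single-buffer iterator, not the kernel's struct iov_iter; toy_iter, toy_copy_from_iter() and toy_iter_revert() are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-segment iterator, standing in for a real iov_iter. */
struct toy_iter {
	const char *base;
	size_t offset;	/* bytes already consumed */
	size_t count;	/* bytes remaining */
};

/* Copy up to len bytes and advance the iterator by the amount copied. */
static size_t toy_copy_from_iter(char *dst, size_t len, struct toy_iter *it)
{
	size_t n = len < it->count ? len : it->count;

	memcpy(dst, it->base + it->offset, n);
	it->offset += n;
	it->count -= n;
	return n;
}

/* Give back bytes that were copied but not consumed by the caller. */
static void toy_iter_revert(struct toy_iter *it, size_t unroll)
{
	assert(unroll <= it->offset);
	it->offset -= unroll;
	it->count += unroll;
}
```

A caller that copies 6 bytes but can only commit 4 would call toy_iter_revert(&it, 2); on the fast path, where everything copied is consumed, no second call is needed.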
2021-06-10sanitize iov_iter_fault_in_readable()Al Viro
1) constify iov_iter argument; we are not advancing it in this primitive. 2) cap the amount requested by the amount of data in iov_iter. All existing callers should've been safe, but the check is really cheap and doing it here makes for easier analysis, as well as more consistent semantics among the primitives. 3) don't bother with iterate_iovec(). Explicit loop is not any harder to follow, and we get rid of standalone iterate_iovec() users - it's only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10iov_iter: separate direction from flavourAl Viro
Instead of having them mixed in iter->type, use separate ->iter_type and ->data_source (u8 and bool resp.) And don't bother with (pseudo-) bitmap for the former - microoptimizations from being able to check if the flavour is one of two values are not worth the confusion for optimizer. It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE, so we end up with extra headache. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
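A minimal sketch of the resulting layout, under stated assumptions: the field names ->iter_type and ->data_source come from the message above, while the toy_ prefixed type, the enumerator names and the meaning given to data_source here are illustrative guesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flavours are plain enumerators rather than bits, so exactly one applies. */
enum toy_iter_type {
	TOY_ITER_IOVEC,
	TOY_ITER_KVEC,
	TOY_ITER_BVEC,
	TOY_ITER_PIPE,
};

struct toy_iter {
	uint8_t iter_type;	/* one of enum toy_iter_type */
	bool data_source;	/* assumed: true if data is read out of the iterator */
};

/* A flavour test is a single equality check instead of a mask test. */
static bool toy_iter_is_pipe(const struct toy_iter *i)
{
	return i->iter_type == TOY_ITER_PIPE;
}

/* With no bitmap, the flavour can index a table directly. */
static const char *toy_iter_name(const struct toy_iter *i)
{
	static const char *const names[] = { "iovec", "kvec", "bvec", "pipe" };

	assert(i->iter_type < sizeof(names) / sizeof(names[0]));
	return names[i->iter_type];
}
```

Since a combination such as "iovec and pipe" cannot be represented at all, neither the reader nor the compiler has to reason about it.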
2021-06-10iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert()Al Viro
Use corresponding plain variants, revert on short copy. That's the way it should've been done from the very beginning, except that we didn't have iov_iter_revert() back then... [fixed another braino caught by Qian Cai <quic_qiancai@quicinc.com>] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10memory: pl353-smc: Let lower level controller drivers handle initsMiquel Raynal
There is no point in having all these definitions at the SMC bus level, these are extremely tied to the NAND controller driver implementation, are not particularly generic, imply more boilerplate than needed, do not really follow the device model by receiving no argument and some of them are actually buggy. Let's get rid of these right now as there is no current user and keep this driver at a simple level: only the SMC bare initializations. The NAND controller driver which I am going to introduce will take care of redefining properly all these helpers and using them directly. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20210610082040.2075611-13-miquel.raynal@bootlin.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
2021-06-10scsi: blkcg: Add app identifier support for blkcgMuneendra Kumar
Add a unique application identifier (i.e. the fc_app_id member) in blkcg. This allows identification of traffic belonging to a specific application both on the host and in the fabric infrastructure. As an example, this allows the storage stack to uniquely identify traffic belonging to a particular virtual machine. Link: https://lore.kernel.org/r/20210608043556.274139-3-muneendra.kumar@broadcom.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-10scsi: cgroup: Add cgroup_get_from_id()Muneendra Kumar
Add a new function, cgroup_get_from_id(), to retrieve the cgroup associated with a cgroup id. Also export the function cgroup_get_e_css() as this is needed in blk-cgroup.h. Link: https://lore.kernel.org/r/20210608043556.274139-2-muneendra.kumar@broadcom.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-10ACPI: Add \_SB._OSC bit for PRMErik Kaneda
Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-10ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtypeErik Kaneda
Platform Runtime Mechanism (PRM) is a firmware interface that exposes a set of binary executables that can be called either from the AML interpreter or from device drivers, bypassing the AML interpreter. This change implements the AML interpreter path.

According to the specification [1], PRM services are listed in an ACPI table called the PRMT. This patch parses module and handler information listed in the PRMT and registers the PlatformRtMechanism OpRegion handler before ACPI tables are loaded.

Each service is defined by a 16-byte GUID and is called by writing a 26-byte ASL buffer containing the identifier to a FieldUnit object defined inside a PlatformRtMechanism OperationRegion.

    OperationRegion (PRMR, PlatformRtMechanism, 0, 26)
    Field (PRMR, BufferAcc, NoLock, Preserve)
    {
        PRMF, 208 // Write to this field to invoke the OperationRegion Handler
    }

The 26-byte ASL buffer is defined as the following:

    Byte Offset   Byte Length   Description
    =============================================================
         0             1        PRM OperationRegion handler status
         1             8        PRM service status
         9             1        PRM command
        10            16        PRM handler GUID

The ASL caller fills out a 26-byte buffer containing the PRM command and the PRM handler GUID like so:

    /* Local0 is the PRM data buffer */
    Local0 = buffer (26){}

    /* Create byte fields over the buffer */
    CreateByteField (Local0, 0x9, CMD)
    CreateField (Local0, 0x50, 0x80, GUID)

    /* Fill in the command and data fields of the data buffer */
    CMD = 0 // run command
    GUID = ToUUID("xxxx-xx-xxx-xxxx")

    /*
     * Invoke PRM service with an ID that matches GUID and save the
     * result.
     */
    Local0 = (\_SB.PRMT.PRMF = Local0)

Byte offsets 0 - 8 are written by the handler as a status passed back to AML and used by ASL like so:

    /* Create byte fields over the buffer */
    CreateByteField (Local0, 0x0, PSTA)
    CreateQWordField (Local0, 0x1, USTA)

In this ASL code, PSTA contains a status from the OperationRegion and USTA contains a status from the PRM service.

The 26-byte buffer is received by acpi_platformrt_space_handler. This handler will look at the command value and the handler GUID and take the appropriate actions.

    Command value   Action
    =====================================================================
    0               Run the PRM service indicated by the PRM handler GUID (bytes 10-25)
    1               Prevent PRM runtime updates from happening to the service's parent module
    2               Allow PRM updates from happening to the service's parent module

This patch enables command value 0.

Link: https://uefi.org/sites/default/files/resources/Platform%20Runtime%20Mechanism%20-%20with%20legal%20notice.pdf # [1]
Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
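The buffer layout above maps naturally onto a packed C structure. The sketch below is a userspace illustration only: the structure, field and enumerator names are hypothetical, and __attribute__((packed)) is a GCC/Clang extension. The compile-time checks tie each field to the offset given in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of the 26-byte buffer; names are illustrative. */
struct toy_prm_buffer {
	uint8_t  handler_status;	/* offset 0,  1 byte   */
	uint64_t service_status;	/* offset 1,  8 bytes  */
	uint8_t  command;		/* offset 9,  1 byte   */
	uint8_t  handler_guid[16];	/* offset 10, 16 bytes */
} __attribute__((packed));

static_assert(sizeof(struct toy_prm_buffer) == 26, "buffer is 26 bytes");
static_assert(offsetof(struct toy_prm_buffer, service_status) == 1, "offset 1");
static_assert(offsetof(struct toy_prm_buffer, command) == 9, "offset 9");
static_assert(offsetof(struct toy_prm_buffer, handler_guid) == 10, "offset 10");

/* Command values from the table above; the names are made up. */
enum toy_prm_cmd {
	TOY_PRM_CMD_RUN_SERVICE   = 0,
	TOY_PRM_CMD_BLOCK_UPDATES = 1,
	TOY_PRM_CMD_ALLOW_UPDATES = 2,
};

/* Fill in a request the way the ASL caller does: command plus GUID. */
static void toy_prm_fill_request(struct toy_prm_buffer *buf, uint8_t cmd,
				 const uint8_t guid[16])
{
	assert(cmd <= TOY_PRM_CMD_ALLOW_UPDATES);
	memset(buf, 0, sizeof(*buf));	/* status bytes 0-8 start cleared */
	buf->command = cmd;
	memcpy(buf->handler_guid, guid, 16);
}
```

The GUID occupies offsets 10 through 25, which matches the bit offset 0x50 and bit length 0x80 passed to CreateField in the ASL example.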
2021-06-10genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ()Marc Zyngier
Despite the name, handle_domain_irq() deals with non-irqdomain handling for the sake of a handful of legacy ARM platforms. Move such handling into ARM's handle_IRQ(), allowing for better code generation for everyone else. This allows us to get rid of some complexity, and to rearrange the guards on the various helpers in a more logical way. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10genirq: Add generic_handle_domain_irq() helperMarc Zyngier
Provide generic_handle_domain_irq() as a counterpart to handle_domain_irq() for non-root interrupt controllers. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10irqdesc: Fix __handle_domain_irq() commentMarc Zyngier
It appears that the comment about a NULL domain meaning anything has always been wrong. Fix it. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and coMarc Zyngier
In order to start reaping the benefits of irq_resolve_mapping(), start using it in __handle_domain_irq() and handle_domain_nmi(). This involves splitting generic_handle_irq() to be able to directly provide the irq_desc. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10irqdomain: Introduce irq_resolve_mapping()Marc Zyngier
Rework irq_find_mapping() to return both an irq_desc pointer and, optionally, the virtual irq number, and rename the result to __irq_resolve_mapping(). A new helper called irq_resolve_mapping() is provided for code that doesn't need the virtual irq number. irq_find_mapping() is also rewritten in terms of __irq_resolve_mapping(). Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10irqdomain: Protect the linear revmap with RCUMarc Zyngier
It is pretty odd that the radix tree uses RCU while the linear portion doesn't, leading to potential surprises for the users, depending on how the irqdomain has been created. Fix this by moving the update of the linear revmap under the mutex, and the lookup under the RCU read-side lock. The mutex name is updated to reflect that it doesn't only cover the radix-tree anymore. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10irqdomain: Cache irq_data instead of a virq number in the revmapMarc Zyngier
Caching a virq number in the revmap is pretty inefficient, as it means we will need to convert it back to either an irq_data or irq_desc to do anything with it. It is also a bit odd, as the radix tree does cache irq_data pointers. Change the revmap type to be an irq_data pointer instead of an unsigned int, and preserve the current API for now. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10irqdomain: Make normal and nomap irqdomains exclusiveMarc Zyngier
Direct mappings are completely exclusive of normal mappings, meaning that we can refactor the code slightly so that we can get rid of the revmap_direct_max_irq field and use the revmap_size field instead, reducing the size of the irqdomain structure. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10powerpc: Move the use of irq_domain_add_nomap() behind a config optionMarc Zyngier
Only a handful of old PPC systems are still using the old 'nomap' variant of the irqdomain library. Move the associated definitions behind a configuration option, which will allow us to make some more radical changes. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10irqdomain: Reimplement irq_linear_revmap() with irq_find_mapping()Marc Zyngier
irq_linear_revmap() is supposed to be a fast path for domain lookups, but it only exposes low-level details of the irqdomain implementation, details which are better kept private. The *overhead* between the two is only a function call and a couple of tests, so it is likely that no one can show any meaningful difference compared to the cost of taking an interrupt. Reimplement irq_linear_revmap() with irq_find_mapping() in order to preserve source code compatibility, and rename the internal field for good measure. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10irqdomain: Kill irq_domain_add_legacy_isaMarc Zyngier
This helper doesn't have a user anymore, let's remove it. Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10iommu/vt-d: Define counter explicitly as unsigned intParav Pandit
Avoid the checkpatch warning below.

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+ unsigned iommu_refcnt[DMAR_UNITS_SUPPORTED];

Fixes: 29a27719abaa ("iommu/vt-d: Replace iommu_bmp with a refcount") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com Link: https://lore.kernel.org/r/20210610020115.1637656-23-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Removed unused iommu_count in dmar domainParav Pandit
DMAR domain uses per DMAR refcount. It is indexed by iommu seq_id. Older iommu_count is only incremented and decremented but no decisions are taken based on this refcount. This is not of much use. Hence, remove iommu_count and further simplify domain_detach_iommu() by returning void. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com Link: https://lore.kernel.org/r/20210610020115.1637656-21-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Use bitfields for DMAR capabilitiesParav Pandit
IOTLB device presence, iommu coherency and snooping are boolean capabilities. Use them as bits and keep them adjacent.

Structure layout before the reorg.

$ pahole -C dmar_domain drivers/iommu/intel/dmar.o
struct dmar_domain {
        int nid;                                /*     0     4 */
        unsigned int iommu_refcnt[128];         /*     4   512 */
        /* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
        u16 iommu_did[128];                     /*   516   256 */
        /* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
        bool has_iotlb_device;                  /*   772     1 */

        /* XXX 3 bytes hole, try to pack */

        struct list_head devices;               /*   776    16 */
        struct list_head subdevices;            /*   792    16 */
        struct iova_domain iovad __attribute__((__aligned__(8))); /*   808  2320 */
        /* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
        struct dma_pte * pgd;                   /*  3128     8 */
        /* --- cacheline 49 boundary (3136 bytes) --- */
        int gaw;                                /*  3136     4 */
        int agaw;                               /*  3140     4 */
        int flags;                              /*  3144     4 */
        int iommu_coherency;                    /*  3148     4 */
        int iommu_snooping;                     /*  3152     4 */
        int iommu_count;                        /*  3156     4 */
        int iommu_superpage;                    /*  3160     4 */

        /* XXX 4 bytes hole, try to pack */

        u64 max_addr;                           /*  3168     8 */
        u32 default_pasid;                      /*  3176     4 */

        /* XXX 4 bytes hole, try to pack */

        struct iommu_domain domain;             /*  3184    72 */

        /* size: 3256, cachelines: 51, members: 18 */
        /* sum members: 3245, holes: 3, sum holes: 11 */
        /* forced alignments: 1 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

After arranging it for natural padding and to make flags as u8 bits, it saves 8 bytes for the struct.

struct dmar_domain {
        int nid;                                /*     0     4 */
        unsigned int iommu_refcnt[128];         /*     4   512 */
        /* --- cacheline 8 boundary (512 bytes) was 4 bytes ago --- */
        u16 iommu_did[128];                     /*   516   256 */
        /* --- cacheline 12 boundary (768 bytes) was 4 bytes ago --- */
        u8 has_iotlb_device:1;                  /*   772: 0  1 */
        u8 iommu_coherency:1;                   /*   772: 1  1 */
        u8 iommu_snooping:1;                    /*   772: 2  1 */

        /* XXX 5 bits hole, try to pack */
        /* XXX 3 bytes hole, try to pack */

        struct list_head devices;               /*   776    16 */
        struct list_head subdevices;            /*   792    16 */
        struct iova_domain iovad __attribute__((__aligned__(8))); /*   808  2320 */
        /* --- cacheline 48 boundary (3072 bytes) was 56 bytes ago --- */
        struct dma_pte * pgd;                   /*  3128     8 */
        /* --- cacheline 49 boundary (3136 bytes) --- */
        int gaw;                                /*  3136     4 */
        int agaw;                               /*  3140     4 */
        int flags;                              /*  3144     4 */
        int iommu_count;                        /*  3148     4 */
        int iommu_superpage;                    /*  3152     4 */

        /* XXX 4 bytes hole, try to pack */

        u64 max_addr;                           /*  3160     8 */
        u32 default_pasid;                      /*  3168     4 */

        /* XXX 4 bytes hole, try to pack */

        struct iommu_domain domain;             /*  3176    72 */

        /* size: 3248, cachelines: 51, members: 18 */
        /* sum members: 3236, holes: 3, sum holes: 11 */
        /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
        /* forced alignments: 1 */
        /* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210530075053.264218-1-parav@nvidia.com Link: https://lore.kernel.org/r/20210610020115.1637656-20-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
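The saving comes from folding one bool and two ints into adjacent one-bit fields. The following is a reduced sketch with only those three members; the toy_ struct names are hypothetical, and bit-fields of type uint8_t are an implementation-defined extension that GCC and Clang accept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Before: one bool and two ints that only ever hold 0 or 1. */
struct toy_caps_before {
	bool has_iotlb_device;
	int iommu_coherency;
	int iommu_snooping;
};

/* After: three adjacent one-bit fields sharing a single byte. */
struct toy_caps_after {
	uint8_t has_iotlb_device:1;
	uint8_t iommu_coherency:1;
	uint8_t iommu_snooping:1;
};

static_assert(sizeof(struct toy_caps_after) < sizeof(struct toy_caps_before),
	      "bit-fields should not be larger than the separate members");

/*
 * A one-bit field keeps only the lowest bit of what is stored, so any
 * non-zero input is normalised to 1 with !! before the assignment.
 */
static void toy_set_coherency(struct toy_caps_after *c, int hw_value)
{
	c->iommu_coherency = !!hw_value;
}
```

One consequence of the conversion is visible in toy_set_coherency(): code that used to store arbitrary non-zero ints must normalise them, since storing 2 into a one-bit field would otherwise read back as 0.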
2021-06-10iommu/vt-d: Add common code for dmar latency performance monitorsLu Baolu
The execution time of some operations is very performance critical, such as cache invalidation and PRQ processing time. This adds some common code to monitor the execution time range of those operations. The interfaces include enabling/disabling, checking status, updating sampling data and providing a common string format for users. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-14-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Add prq_report trace eventLu Baolu
This adds a new trace event to track the page fault request report. This event will provide almost all information defined in a page request descriptor. A sample output:

| prq_report: dmar0/0000:00:0a.0 seq# 1: rid=0x50 addr=0x559ef6f97 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 2: rid=0x50 addr=0x559ef6f9c rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 3: rid=0x50 addr=0x559ef6f98 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 4: rid=0x50 addr=0x559ef6f9d rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 5: rid=0x50 addr=0x559ef6f99 r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 6: rid=0x50 addr=0x559ef6f9e rw--l pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 7: rid=0x50 addr=0x559ef6f9a r---- pasid=0x2 index=0x1
| prq_report: dmar0/0000:00:0a.0 seq# 8: rid=0x50 addr=0x559ef6f9f rw--l pasid=0x2 index=0x1

This will be helpful for I/O page fault related debugging. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-13-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Allocate/register iopf queue for sva devicesLu Baolu
This allocates and registers the iopf queue infrastructure for devices which want to support IO page fault for SVA. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-11-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-10iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpersLu Baolu
Align the pasid alloc/free code with the generic helpers defined in the iommu core. This also refactored the SVA binding code to improve the readability. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210520031531.712333-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20210610020115.1637656-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-06-09net/mlx5: Bridge, add offload infrastructureVlad Buslov
Create new files bridge.{c|h} in en/rep directory that implement bridge interaction with representor netdevices and handle required events/notifications, bridge.{c|h} in esw directory that implement all necessary eswitch offloading infrastructure and works on vport/eswitch level. Provide new kconfig MLX5_BRIDGE which is automatically selected when both kernel bridge and mlx5 eswitch configs are enabled.

Provide basic infrastructure for bridge offloads:

- struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure that encapsulates generic bridge-offloads data (notifier blocks, ingress flow table/group, etc.) that is created/deleted on enable/disable eswitch offloads.

- struct mlx5_esw_bridge - per-bridge structure that encapsulates per-bridge data (reference counter, FDB, egress flow table/group, etc.) that is created when first eswitch representor is attached to new bridge and deleted when last representor is removed from the bridge as a result of NETDEV_CHANGEUPPER event.

The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB namespace. The new priority is between tc-miss and slow path priorities. Priority consists of two levels: the ingress table that is global per eswitch and matches incoming packets by src_mac/vid and redirects them to next level (egress table) that is chosen according to ingress port bridge membership and matches on dst_mac/vid in order to redirect packet to vport according to the following diagram:

                +
                |
      +---------v----------+
      |                    |
      |   FDB_TC_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_FT_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |    FDB_TC_MISS     |
      |                    |
      +---------+----------+
                |
+--------------------------------------+
|               |                      |
|        +------+                      |
|        |                             |
| +------v--------+   FDB_BR_OFFLOAD   |
| | INGRESS_TABLE |                    |
| +------+---+----+                    |
|        |   |      match              |
|        |   +---------+               |
|        |             |               |    +-------+
|        |     +-------v-------+ match |    |       |
|        |     | EGRESS_TABLE  +------------> vport |
|        |     +-------+-------+       |    |       |
|        |             |               |    +-------+
|        |       miss  |               |
|        +------+------+               |
|               |                      |
+--------------------------------------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_SLOW_PATH    |
      |                    |
      +---------+----------+
                |
                v

Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
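The two-level lookup in the diagram can be sketched as a small userspace model. Everything here is hypothetical (the toy_ names, the enumerator values, the flat array standing in for flow tables), and the model is simplified to a single bridge, whereas the real egress table is chosen by the ingress port's bridge membership.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Priority order taken from the diagram; the values are illustrative. */
enum toy_fdb_prio {
	TOY_FDB_TC_OFFLOAD,
	TOY_FDB_FT_OFFLOAD,
	TOY_FDB_TC_MISS,
	TOY_FDB_BR_OFFLOAD,
	TOY_FDB_SLOW_PATH,
};

static_assert(TOY_FDB_TC_MISS < TOY_FDB_BR_OFFLOAD &&
	      TOY_FDB_BR_OFFLOAD < TOY_FDB_SLOW_PATH,
	      "bridge offload sits between tc-miss and slow path");

#define TOY_MISS (-1)	/* packet continues to the slow path */

struct toy_fdb_entry {
	uint8_t mac[6];
	uint16_t vid;
	int vport;
};

struct toy_bridge {
	struct toy_fdb_entry fdb[8];
	int nentries;
};

/* Look up a (mac, vid) pair; return its vport or TOY_MISS. */
static int toy_match(const struct toy_bridge *br, const uint8_t mac[6],
		     uint16_t vid)
{
	for (int i = 0; i < br->nentries; i++)
		if (br->fdb[i].vid == vid && !memcmp(br->fdb[i].mac, mac, 6))
			return br->fdb[i].vport;
	return TOY_MISS;
}

/*
 * Ingress level matches on src_mac/vid; only on a hit does the egress
 * level match on dst_mac/vid to pick the destination vport.
 */
static int toy_bridge_forward(const struct toy_bridge *br,
			      const uint8_t src[6], const uint8_t dst[6],
			      uint16_t vid)
{
	if (toy_match(br, src, vid) == TOY_MISS)
		return TOY_MISS;
	return toy_match(br, dst, vid);
}
```

A miss at either level returns TOY_MISS, mirroring the diagram's miss arrows that leave FDB_BR_OFFLOAD and fall through to FDB_SLOW_PATH.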
2021-06-09net/mlx5: Create TC-miss priority and tableVlad Buslov
In order to adhere to the kernel software datapath model, bridge offloads must come after TC and NF FDBs. Following patches in this series add new FDB priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload is implemented with unmanaged tables, its miss path is not automatically connected to next priority and requires the code to manually connect with slow table. To keep bridge offloads encapsulated and not mix it with eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD and FDB_SLOW_PATH:

                +
                |
      +---------v----------+
      |                    |
      |   FDB_TC_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
                |
      +---------v----------+
      |                    |
      |   FDB_FT_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
                |
      +---------v----------+
      |                    |
      |    FDB_TC_MISS     |
      |                    |
      +---------+----------+
                |
                |
                |
      +---------v----------+
      |                    |
      |   FDB_SLOW_PATH    |
      |                    |
      +---------+----------+
                |
                v

Initialize the new priority with single default empty managed table and use the table as TC/NF miss path instead of slow table. This approach allows bridge offloads to be created as new FDB namespace priority between FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any other modules since miss path of managed TC-miss table is automatically wired to next priority. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09net/mlx5: Added new parameters to reformat contextYevgeny Kliteynik
Adding new reformat context type (INSERT_HEADER) requires adding two new parameters to reformat context - reformat_param_0 and reformat_param_1. As defined by HW spec, these parameters have different meanings for different reformat context types. The first parameter (reformat_param_0) is not new to HW spec, but it wasn't used by any of the supported reformats. The second parameter (reformat_param_1) is new to the HW spec - it was added to allow supporting INSERT_HEADER. For INSERT_HEADER, reformat_param_0 indicates the header used to reference the location of the inserted header, and reformat_param_1 indicates the offset of the inserted header from the reference point defined by reformat_param_0. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09net/mlx5: mlx5_ifc support for header insert/removeYevgeny Kliteynik
Add support for HCA caps 2 that contains capabilities for the new insert/remove header actions. Added the required definitions for supporting the new reformat type: added packet reformat parameters, reformat anchors and definitions to allow copy/set into the inserted EMD (Embedded MetaData) tag. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09net/mlx5e: Fix page reclaim for dead peer hairpinDima Chumak
When adding a hairpin flow, a firmware-side send queue is created for the peer net device, which claims some host memory pages for its internal ring buffer. If the peer net device is removed/unbound before the hairpin flow is deleted, then the send queue is not destroyed which leads to a stack trace on pci device remove:

[ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
[ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
[ 748.002171] ------------[ cut here ]------------
[ 748.001177] FW pages counter is 4 after reclaiming all pages
[ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
[ +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
[ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
[ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
[ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
[ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
[ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
[ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
[ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
[ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
[ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
[ 748.001429] FS: 00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
[ 748.001695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
[ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 748.001654] Call Trace:
[ 748.000576] ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
[ 748.001416] ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
[ 748.001354] ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
[ 748.001203] mlx5_function_teardown+0x30/0x60 [mlx5_core]
[ 748.001275] mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
[ 748.001200] remove_one+0x5f/0xc0 [mlx5_core]
[ 748.001075] pci_device_remove+0x9f/0x1d0
[ 748.000833] device_release_driver_internal+0x1e0/0x490
[ 748.001207] unbind_store+0x19f/0x200
[ 748.000942] ? sysfs_file_ops+0x170/0x170
[ 748.001000] kernfs_fop_write_iter+0x2bc/0x450
[ 748.000970] new_sync_write+0x373/0x610
[ 748.001124] ? new_sync_read+0x600/0x600
[ 748.001057] ? lock_acquire+0x4d6/0x700
[ 748.000908] ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 748.001126] ? fd_install+0x1c9/0x4d0
[ 748.000951] vfs_write+0x4d0/0x800
[ 748.000804] ksys_write+0xf9/0x1d0
[ 748.000868] ? __x64_sys_read+0xb0/0xb0
[ 748.000811] ? filp_open+0x50/0x50
[ 748.000919] ? syscall_enter_from_user_mode+0x1d/0x50
[ 748.001223] do_syscall_64+0x3f/0x80
[ 748.000892] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 748.001026] RIP: 0033:0x7f58bcfb22f7
[ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
[ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
[ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
[ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
[ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
[ 748.001564] irq event stamp: 0
[ 748.000787] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
[ 748.001854] softirqs last enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
[ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 748.001492] ---[ end trace a6fabd773d1c51ae ]---

Fix by destroying the send queue of a hairpin peer net device that is being removed/unbound, which returns the allocated ring buffer pages to the host. Fixes: 4d8fcf216c90 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add nfgenmsg field to nfnetlink's struct nfnl_info and use it. 2) Remove nft_ctx_init_from_elemattr() and nft_ctx_init_from_setattr() helper functions. 3) Add the nf_ct_pernet() helper function to fetch the conntrack pernetns data area. 4) Expose TCP and UDP flowtable offload timeouts through sysctl, from Oz Shlomo. 5) Add nfnetlink_hook subsystem to fetch the netfilter hook pipeline configuration, from Florian Westphal. This also includes a new field to annotate the hook type as metadata. 6) Fix unsafe memory access to non-linear skbuff in the new SCTP chunk support for nft_exthdr, from Phil Sutter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-09Merge tag 'compiler-attributes-for-linus-v5.13-rc6' of git://github.com/ojeda/linuxLinus Torvalds
Pull compiler attribute update from Miguel Ojeda: "A trivial update to the compiler attributes: Add 'continue' keyword to documentation in comment (from Wei Ming Chen)" * tag 'compiler-attributes-for-linus-v5.13-rc6' of git://github.com/ojeda/linux: Compiler Attributes: Add continue in comment
2021-06-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including a TLB flush fix that affects processors without nested page tables" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: fix previous commit for 32-bit builds kvm: avoid speculation-based attacks from out-of-range memslot accesses KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU sync KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message selftests: kvm: Add support for customized slot0 memory size KVM: selftests: introduce P47V64 for s390x KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior KVM: X86: MMU: Use the correct inherited permissions to get shadow page KVM: LAPIC: Write 0 to TMICT should also cancel vmx-preemption timer KVM: SVM: Fix SEV SEND_START session length & SEND_UPDATE_DATA query length after commit 238eca821cee
2021-06-09misc: rtsx: separate aspm mode into MODE_REG and MODE_CFGRicky Wu
ASPM (Active State Power Management). rtsx_comm_set_aspm(): the driver uses this function to make sure the device does not enter power saving during init and card detection. ASPM_MODE_CFG (8411, 5209, 5227, 5229, 5249, 5250): change back to the original way of controlling ASPM. ASPM_MODE_REG (5227A, 524A, 5250A, 5260, 5261, 5228): keep the new way of controlling ASPM. Fixes: 121e9c6b5c4c ("misc: rtsx: modify and fix init_hw function") Reported-by: Chris Chiu <chris.chiu@canonical.com> Tested-by: Gordon Lack <gordon.lack@dsl.pipex.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Ricky Wu <ricky_wu@realtek.com> Link: https://lore.kernel.org/r/20210607101634.4948-1-ricky_wu@realtek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
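The split described in this commit message can be pictured as a per-chip mode lookup followed by a dispatch on that mode. The following is only a userspace sketch: every demo_* name is hypothetical, the chip lists are copied from the message above, and what each path writes is an illustrative assumption, not the driver's actual register or config-space access.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the two modes named in the commit message. */
enum demo_aspm_mode {
	DEMO_ASPM_MODE_UNKNOWN,
	DEMO_ASPM_MODE_CFG,	/* original way of controlling ASPM */
	DEMO_ASPM_MODE_REG	/* new way of controlling ASPM */
};

/* Chip lists copied verbatim from the commit message. */
static const char *const demo_cfg_chips[] = {
	"8411", "5209", "5227", "5229", "5249", "5250"
};
static const char *const demo_reg_chips[] = {
	"5227A", "524A", "5250A", "5260", "5261", "5228"
};

/* Return the ASPM control mode for a chip name, or UNKNOWN. */
enum demo_aspm_mode demo_aspm_mode_for(const char *chip)
{
	size_t i;

	for (i = 0; i < sizeof(demo_cfg_chips) / sizeof(demo_cfg_chips[0]); i++)
		if (strcmp(chip, demo_cfg_chips[i]) == 0)
			return DEMO_ASPM_MODE_CFG;
	for (i = 0; i < sizeof(demo_reg_chips) / sizeof(demo_reg_chips[0]); i++)
		if (strcmp(chip, demo_reg_chips[i]) == 0)
			return DEMO_ASPM_MODE_REG;
	return DEMO_ASPM_MODE_UNKNOWN;
}

/* Hypothetical device state: one flag per control path (assumption). */
struct demo_card {
	enum demo_aspm_mode mode;
	unsigned int cfg_aspm;	/* stands in for a config-space setting */
	unsigned int reg_aspm;	/* stands in for a device register */
};

/* Enable or disable ASPM through whichever path the chip's mode selects. */
int demo_set_aspm(struct demo_card *card, int enable)
{
	switch (card->mode) {
	case DEMO_ASPM_MODE_CFG:
		card->cfg_aspm = enable ? 1u : 0u;
		return 0;
	case DEMO_ASPM_MODE_REG:
		card->reg_aspm = enable ? 1u : 0u;
		return 0;
	default:
		return -1;	/* unknown mode: nothing touched */
	}
}
```

Keeping the mode as data on the device, rather than branching on chip IDs at every call site, is one plausible reason for introducing the two mode constants; the message itself does not state the motivation beyond restoring the original behavior for the older chips.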
2021-06-09fpga-mgr: change FPGA indirect article to anTom Rix
Change use of 'a fpga' to 'an fpga' Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20210608212350.3029742-9-trix@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-09fpga: bridge: change FPGA indirect article to anTom Rix
Change use of 'a fpga' to 'an fpga' Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20210608212350.3029742-8-trix@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-09vt: vt_kern.h, remove the repeated declarationShaokun Zhang
Function 'vt_set_led_state' is declared twice, so remove the repeated declaration. Cc: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1623062933-52943-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-09reset: Add compile-test stubsThierry Reding
Add stubs for the reset controller registration functions to allow building reset controller provider drivers with the COMPILE_TEST Kconfig option enabled. Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Suggested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20210609112806.3565057-3-thierry.reding@gmail.com Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
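The compile-test stub pattern mentioned here usually takes the shape below: real functions when the subsystem is configured in, and no-op static inline stand-ins otherwise, so that provider code still builds. This is a minimal userspace sketch; DEMO_RESET_CONTROLLER and all demo_* names are hypothetical, and the stub returning 0 is an assumption rather than a statement about the actual kernel header.

```c
#include <stddef.h>

/* Hypothetical provider-side object. */
struct demo_reset_controller {
	const char *name;
	int registered;
};

/*
 * DEMO_RESET_CONTROLLER stands in for a Kconfig symbol (assumption).
 * Build with -DDEMO_RESET_CONTROLLER to get the "real" implementation.
 */
#ifdef DEMO_RESET_CONTROLLER
static inline int demo_reset_controller_register(struct demo_reset_controller *rc)
{
	rc->registered = 1;
	return 0;
}

static inline void demo_reset_controller_unregister(struct demo_reset_controller *rc)
{
	rc->registered = 0;
}
#else
/* Stubs: same signatures, no behavior, so callers compile unchanged. */
static inline int demo_reset_controller_register(struct demo_reset_controller *rc)
{
	(void)rc;
	return 0;
}

static inline void demo_reset_controller_unregister(struct demo_reset_controller *rc)
{
	(void)rc;
}
#endif

/* A provider's probe path compiles identically in both configurations. */
int demo_provider_probe(struct demo_reset_controller *rc)
{
	rc->name = "demo";
	return demo_reset_controller_register(rc);
}
```

With the symbol undefined, demo_provider_probe() still builds and returns 0, but nothing is actually registered, which is the trade-off of build-coverage stubs: they exercise the compiler, not the runtime path.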
2021-06-09spi: remove spi_set_cs_timing()Greg Kroah-Hartman
No one seems to be using this global and exported function, so remove it as it is no longer needed. Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20210609071918.2852069-1-gregkh@linuxfoundation.org Signed-off-by: Mark Brown <broonie@kernel.org>