summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-07-14ARM: 8586/1: cpuidle: make arm_cpuidle_suspend() a bit more efficientJisheng Zhang
Currently, we check cpuidle_ops.suspend every time when entering a low-power idle state. But this check could be avoided in this hot path by moving it into arm_cpuidle_read_ops() to reduce arm_cpuidle_suspend overhead a bit. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-07-14ARM: 8585/1: cpuidle: fix !cpuidle_ops[cpu].init case during initJisheng Zhang
Let's assume cpuidle_ops exists but it doesn't implement the according init callback, current arm_cpuidle_init() will return success to its caller, but in fact it should return -EOPNOTSUPP. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-07-14ARM: 8561/4: dma-mapping: Fix the coherent case when iommu is usedGregory CLEMENT
When doing dma allocation with IOMMU the __iommu_alloc_atomic() was used even when the system was coherent. However, this function allocates from a non-cacheable pool, which is fine when the device is not cache coherent but won't work as expected in the device is cache coherent. Indeed, the CPU and device must access the memory using the same cacheability attributes. Moreover when the devices are coherent, the mmap call must not change the pg_prot flags in the vma struct. The arm_coherent_iommu_mmap_attrs has been updated in the same way that it was done for the arm_dma_mmap in commit 55af8a91640d ("ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap"). Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-07-14ARM: 8561/3: dma-mapping: Don't use outer_flush_range when the L2C is coherentGregory CLEMENT
When a L2 cache controller is used in a system that provides hardware coherency, the entire outer cache operations are useless, and can be skipped. Moreover, on some systems, it is harmful as it causes deadlocks between the Marvell coherency mechanism, the Marvell PCIe controller and the Cortex-A9. In the current kernel implementation, the outer cache flush range operation is triggered by the dma_alloc function. This operation can be take place during runtime and in some circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x SoCs. This patch extends the __dma_clear_buffer() function to receive a boolean argument related to the coherency of the system. The same things is done for the calling functions. Reported-by: Nadav Haklai <nadavh@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-07-14tools: Make "__always_inline" just "inline" on AndroidArnaldo Carvalho de Melo
As the gcc there is producing tons of: "warning: always_inline function might not be inlinable" At least on android-ndk-r12/platforms/android-24/arch-arm, so, for the time being, use this big hammer. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Chris Phlipot <cphlipot0@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-97l3eg3fnk5shmo4rsyyvj2t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14perf tools: Do not provide dup sched_getcpu() prototype on AndroidArnaldo Carvalho de Melo
The Bionic libc has this definition, so don't duplicate it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Chris Phlipot <cphlipot0@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-rmd19832zkt07e4crdzyen9z@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14libata-eh: decode all taskfile protocolsHannes Reinecke
Some taskfile protocol values where missing in ata_eh_link_report(). Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-14ata: fixup ATA_PROT_NODATAHannes Reinecke
The taskfile protocol is a numeric value, and can not be ORed. Currently this is harmless as the protocol is always zeroed before, but if it ever has a non-zero value the ORing would create incorrect results. Signed-off-by: Hannes Reinecke <hare@suse.de> [hch: updated patch description] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-14libsas: use ata_is_ncq() and ata_has_dma() accessorsHannes Reinecke
Use accessors instead of the raw protocol value. Signed-off-by: Hannes Reinecke <hare@suse.com> [hch: trivial cleanup of the ata_task assignments] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-14libata: use ata_is_ncq() accessorsHannes Reinecke
Use accessor functions instead of the raw value. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-14libata: return boolean values from ata_is_*Christoph Hellwig
This way we don't have to worry about the exact bit postition of the test to leak out and any crazy propagation effects in the callers. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-14tools lib traceevent: Add correct header for ipv6 definitionsArnaldo Carvalho de Melo
We need to include netinet/in.h to get the in6_addr struct definition, needed to build it on the Android NDK: In file included from event-parse.c:36:0: /home/acme/android/android-ndk-r12/platforms/android-24/arch-arm/usr/include/netinet/ip6.h:82:18: error: field 'ip6_src' has incomplete type struct in6_addr ip6_src; /* source address */ And it is the canonical way of getting IPv6 definitions, as described, for instance, in Linux's 'man ipv6' Doing that uncovers another problem: this source file uses PRIu64 but doesn't include it, depending on it being included by chance via the now replaced header (netinet/ip6.h), fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Chris Phlipot <cphlipot0@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-tilr31n3yaba1whsd47qlwa3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-14ARM: 8560/1: errata: Workaround errata A12 825619 / A17 852421Doug Anderson
The workaround for both errata is to set bit 24 in the diagnostic register. There are no known end-user bugs solved by fixing this errata, but the fix is trivial and it seems sane to apply it. The arguments for why this needs to be in the kernel are similar to the arugments made in the patch "Workaround errata A12 818325/852422 A17 852423". Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-07-14ARM: 8559/1: errata: Workaround erratum A12 821420Doug Anderson
This erratum has a very simple workaround (set a bit in a register), so let's apply it. Apparently the workaround's downside is a very slight power impact. Note that applying this errata fixes deadlocks that are easy to reproduce with real world applications. The arguments for why this needs to be in the kernel are similar to the arugments made in the patch "Workaround errata A12 818325/852422 A17 852423". Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-07-14ARM: 8558/1: errata: Workaround errata A12 818325/852422 A17 852423Doug Anderson
There are several similar errata on Cortex A12 and A17 that all have the same workaround: setting bit[12] of the Feature Register. Technically the list of errata are: - A12 818325: Execution of an UNPREDICTABLE STR or STM instruction might deadlock. Fixed in r0p1. - A12 852422: Execution of a sequence of instructions might lead to either a data corruption or a CPU deadlock. Not fixed in any A12s yet. - A17 852423: Execution of a sequence of instructions might lead to either a data corruption or a CPU deadlock. Not fixed in any A17s yet. Since A12 got renamed to A17 it seems likely that there won't be any future Cortex-A12 cores, so we'll enable for all Cortex-A12. For Cortex-A17 I believe that all known revisions are affected and that all knows revisions means <= r1p2. Presumably if a new A17 was released it would have this problem fixed. Note that in <https://patchwork.kernel.org/patch/4735341/> folks previously expressed opposition to this change because: A) It was thought to only apply to r0p0 and there were no known r0p0 boards supported in mainline. B) It was argued that such a workaround beloned in firmware. Now that this same fix solves other errata on real boards (like rk3288) point A) is addressed. Point B) is impossible to address on boards like rk3288. On rk3288 the firmware doesn't stay resident in RAM and isn't involved at all in the suspend/resume process nor in the SMP bringup process. That means that the most the firmware could do would be to set the bit on "core 0" and this bit would be lost at suspend/resume time. It is true that we could write a "generic" solution that saved the boot-time "core 0" value of this register and applied it at SMP bringup / resume time. However, since this register (described as the "Feature Register" in errata) appears to be undocumented (as far as I can tell) and is only modified for these errata, that "generic" solution seems questionably cleaner. The generic solution also won't fix existing users that haven't happened to do a FW update. Note that in ARM64 presumably PSCI will be universal and fixes like this will end up in ATF. Hopefully we are nearing the end of this style of errata workaround. Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Huang Tao <huangtao@rock-chips.com> Signed-off-by: Kever Yang <kever.yang@rock-chips.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-07-14drm/i915: Ignore panel type from OpRegion on SKLVille Syrjälä
Dell XPS 13 9350 apparently doesn't like it when we use the panel type from OpRegion. The OpRegion panel type (0) tells us to use use low vswing for eDP, whereas the VBT panel type (2) tells us to use normal vswing. The problem is that low vswing results in some display flickers. Since no one seems to know how this stuff is supposed to be handled, let's just ignore the OpRegion panel type on SKL for now. v2: Print the panel type correctly in the debug output Reported-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: drm-intel-fixes@lists.freedesktop.org References: https://lists.freedesktop.org/archives/intel-gfx/2016-June/098826.html Fixes: a05628195a0d ("drm/i915: Get panel_type from OpRegion panel details") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468324837-29237-1-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (cherry picked from commit bb10d4ec3be4b069bfb61c60ca4f708f58f440f1) [danvet: Fix up cherry-pick conflict with an s/dev_priv/dev/.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-14drm/i915: Update ifdeffery for mutex->ownerChris Wilson
In commit 7608a43d8f2e ("locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate") the owner field in the mutex was updated from being dependent upon CONFIG_SMP to using optimistic spin. Update our peek function to suite. Fixes:7608a43d8f2e ("locking/mutexes: Use MUTEX_SPIN_ON_OWNER...") Reported-by: Hong Liu <hong.liu@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1468244777-4888-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> (cherry picked from commit 4f074a5393431a7d2cc0de7fcfe2f61d24854628) Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-07-14i2c: core: Add function for finding the bus speed from ACPIJarkko Nikula
ACPI 5 specification doesn't have property for the I2C bus speed but I2cSerialBus resource descriptors which define each controller-slave connection define the maximum speed supported by that connection. Thus finding the maximum safe speed for the bus is to walk all I2cSerialBus resources that are associated to I2C controller and use the speed of slowest connection. Add function i2c_acpi_find_bus_speed() to the i2c-core that adapter drivers can call prior registering itself to core. This implies two-step walk through the I2cSerialBus resources: call to i2c_acpi_find_bus_speed() does the first scan and finds the safe bus speed that adapter drivers can set up. Adapter driver registration does the second scan when i2c-core creates the I2C slaves by calling the i2c_acpi_register_devices(). In that way the bus speed is set in case slave device probe gets called during registration and does communication. Implement this by reusing the existing ACPI I2C walk routines in the i2c-core. Extend them so that slowest connection speed is saved during the walk and I2C slaves are registered only when calling through the i2c_acpi_register_devices() with the i2c_adapter pointer. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: core: Cleanup I2C ACPI namespaceJarkko Nikula
I2C ACPI enumeration was originally implemented in another module under drivers/acpi/ but was later moved into i2c-core with added support for I2C ACPI operation region. Rename these acpi_i2c_ prefixed functions, structures and defines in i2c-core to i2c_acpi_ in order to have more consistent name space. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: use pr_fmt in the coreWolfram Sang
Now that we revisited all error messages, we can use pr_fmt for the remaining pr_* messages to ensure consistent output. Signed-off-by: Wolfram Sang <wsa-dev@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: print more info when acpi_i2c_space_handler() failsWolfram Sang
Use a warning loglevel instead of info and switch to dev_* for device info. Also print which client was accessed. Signed-off-by: Wolfram Sang <wsa-dev@sang-engineering.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: print more info when of_i2c_notify failsWolfram Sang
Use dev_err instead of pr_err for more details. Signed-off-by: Wolfram Sang <wsa-dev@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: add error message when obtaining idr failsWolfram Sang
Fix some whitespace issues while here. Signed-off-by: Wolfram Sang <wsa-dev@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: improve error messages in i2c_register_adapter()Wolfram Sang
Switch to WARN if no adapter name is given, otherwise we won't know who missed to do that. Add error message if device registration fails. Update error message for missing algo to match style of the others. Signed-off-by: Wolfram Sang <wsa-dev@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: cleanup i2c_register_adapter() by refactoring recovery initWolfram Sang
Move recovery init to a seperate function to let have i2c_register_adapter() less lines and to avoid goto and a label. Refactor string handling there for consistency and to save some bytes. Signed-off-by: Wolfram Sang <wsa-dev@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: free idr when sanity checks in i2c_register_adapter() failWolfram Sang
On error, we should give idr back to the pool in any case. Signed-off-by: Wolfram Sang <wsa-dev@sang-engineering.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: versatile: Convert to use resource managed devm_* APIsAxel Lin
Use devm_* APIs to simplify the code a bit. This patch also fixes the memory leak when unload the module. Signed-off-by: Axel Lin <axel.lin@ingics.com> Tested-by: Liviu Dudau <Liviu.Dudau@arm.com> Acked-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14i2c: versatile: Allow compile test buildAxel Lin
There is no build dependency for this driver, so enable COMPILE_TEST to get better build coverage. Signed-off-by: Axel Lin <axel.lin@ingics.com> Tested-by: Liviu Dudau <Liviu.Dudau@arm.com> Acked-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-07-14powerpc: Make ppc_md.{halt, restart} __noreturnDaniel Axtens
powernv marks it's halt and restart calls as __noreturn. However, ppc_md does not have this annotation. Add the annotation to ppc_md, and then to every halt/restart function that is missing it. Additionally, I have verified that all of these functions do not return. Occasionally I have added a spin loop to be sure. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14s390/cio: allow to reset channel measurement blockSebastian Ott
Prior to commit 1bc6664bdfb949bc69a08113801e7d6acbf6bc3f a call to enable_cmf for a device for which channel measurement was already enabled resulted in a reset of the measurement data. What looked like bugs at the time (a 2nd allocation was triggered but failed, reset was called regardless of previous failures, and errors have not been reported to userspace) was actually something at least one userspace tool depended on. Restore that behavior in a sane way. Fixes: 1bc6664bdfb ("s390/cio: use device_lock during cmb activation") Cc: stable@vger.kernel.org #v4.4+ Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-14powerpc/sparse: Pass endianness to sparseDaniel Axtens
Explicitly give sparse an endianness in the Makefile, so that it doesn't get confused. Normally we have #ifdef one and #else the other, so it doesn't usually matter, but we have been bitten by it before, and indeed this patch fixes a number of sparse errors. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/kvm: Clarify __user annotationsDaniel Axtens
kvmppc_h_put_tce_indirect labels a u64 pointer as __user. It also labelled the u64 where get_user puts the result as __user. This isn't a pointer and so doesn't need to be labelled __user. Split the u64 value definition onto a new line to make it clear that it doesn't get the annotation. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/pmac/smp: Add missing FROZEN hotplug notifier transitionsAnna-Maria Gleixner
The FROZEN transitions are used when a CPU suspends/resumes. In case of a suspend/resume, only the up prepare (CPU_UP_PREPARE_FROZEN) is handled. The error handling transition CPU_UP_CANCELED_FROZEN as well as the CPU_ONLINE_FROZEN transition are not handled. Masking the switch case action argument with ~CPU_TASKS_FROZEN, to handle all FROZEN tasks the same way than the corresponding non frozen tasks. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cardsAndrew Donnellan
Add a new API, cxl_check_and_switch_mode() to allow for switching of bi-modal CAPI cards, such as the Mellanox CX-4 network card. When a driver requests to switch a card to CAPI mode, use PCI hotplug infrastructure to remove all PCI devices underneath the slot. We then write an updated mode control register to the CAPI VSEC, hot reset the card, and reprobe the card. As the card may present a different set of PCI devices after the mode switch, use the infrastructure provided by the pnv_php driver and the OPAL PCI slot management facilities to ensure that: * the old devices are removed from both the OPAL and Linux device trees * the new devices are probed by OPAL and added to the OPAL device tree * the new devices are added to the Linux device tree and probed through the regular PCI device probe path As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on the pnv_php driver. Refactor existing code that touches the mode control register in the regular single mode case into a new function, setup_cxl_protocol_area(). Co-authored-by: Ian Munsie <imunsie@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power stateAndrew Donnellan
When calling pnv_php_set_slot_power_state() with state == OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're dealing with OPAL_PCI_SLOT_POWER_OFF. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14PCI/hotplug: pnv_php: export symbols and move struct types needed by cxlAndrew Donnellan
The cxl driver will use infrastructure from pnv_php to handle device tree updates when switching bi-modal CAPI cards into CAPI mode. To enable this, export pnv_php_find_slot() and pnv_php_set_slot_power_state(), and add corresponding declarations, as well as the definition of struct pnv_php_slot, to asm/pnv-pci.h. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Workaround PE=0 hardware limitation in Mellanox CX4Ian Munsie
The CX4 card cannot cope with a context with PE=0 due to a hardware limitation, resulting in: [ 34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939 [ 34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting Since the kernel API allocates a default context very early during device init that will almost certainly get Process Element ID 0 there is no easy way for us to extend the API to allow the Mellanox to inform us of this limitation ahead of time. Instead, work around the issue by extending the XSL structure to include a minimum PE to allocate. Although the bug is not in the XSL, it is the easiest place to work around this limitation given that the CX4 is currently the only card that uses an XSL. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add support for interrupts on the Mellanox CX4Ian Munsie
The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where interrupts are routed from the networking hardware to the XSL using the MSIX table, and from there will be transformed back into an MSIX interrupt using the cxl style interrupts (i.e. using IVTE entries and ranges to map a PE and AFU interrupt number to an MSIX address). We want to hide the implementation details of cxl interrupts as much as possible. To this end, we use a special version of the MSI setup & teardown routines in the PHB while in cxl mode to allocate the cxl interrupts and configure the IVTE entries in the process element. This function does not configure the MSIX table - the CX4 card uses a custom format in that table and it would not be appropriate to fill that out in generic code. The rest of the functionality is similar to the "Full MSI-X mode" described in the CAIA, and this could be easily extended to support other adapters that use that mode in the future. The interrupts will be associated with the default context. If the maximum number of interrupts per context has been limited (e.g. by the mlx5 driver), it will automatically allocate additional kernel contexts to associate extra interrupts as required. These contexts will be started using the same WED that was used to start the default context. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add preliminary workaround for CX4 interrupt limitationIan Munsie
The Mellanox CX4 has a hardware limitation where only 4 bits of the AFU interrupt number can be passed to the XSL when sending an interrupt, limiting it to only 15 interrupts per context (AFU interrupt number 0 is invalid). In order to overcome this, we will allocate additional contexts linked to the default context as extra address space for the extra interrupts - this will be implemented in the next patch. This patch adds the preliminary support to allow this, by way of adding a linked list in the context structure that we use to keep track of the contexts dedicated to interrupts, and an API to simultaneously iterate over the related context structures, AFU interrupt numbers and hardware interrupt numbers. The point of using a single API to iterate these is to hide some of the details of the iteration from external code, and to reduce the number of APIs that need to be exported via base.c to allow built in code to call. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add kernel APIs to get & set the max irqs per contextIan Munsie
These APIs will be used by the Mellanox CX4 support. While they function standalone to configure existing behaviour, their primary purpose is to allow the Mellanox driver to inform the cxl driver of a hardware limitation, which will be used in a future patch. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add support for using the kernel API with a real PHBIan Munsie
This hooks up support for using the kernel API with a real PHB. After the AFU initialisation has completed it calls into the PHB code to pass it the AFU that will be used by other peer physical functions on the adapter. The cxl_pci_to_afu API is extended to work with peer PCI devices, retrieving the peer AFU from the PHB. This API may also now return an error if it is called on a PCI device that is not associated with either a cxl vPHB or a peer PCI device to an AFU, and this error is propagated down. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/powernv: Add support for the cxl kernel api on the real phbIan Munsie
This adds support for the peer model of the cxl kernel api to the PowerNV PHB, in which physical function 0 represents the cxl function on the card (an XSL in the case of the CX4), which other physical functions will use for memory access and interrupt services. It is referred to as the peer model as these functions are peers of one another, as opposed to the Virtual PHB model which forms a hierarchy. This patch exports APIs to enable the peer mode, check if a PCI device is attached to a PHB in this mode, and to set and get the peer AFU for this mode. The cxl driver will enable this mode for supported cards by calling pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note that this mode is enabled, and switch out it's controller_ops for the cxl version. The cxl version of the controller_ops struct implements it's own versions of the enable_device_hook and release_device to handle refcounting on the peer AFU and to allocate a default context for the device. Once enabled, the cxl kernel API may not be disabled on a PHB. Currently there is no safe way to disable cxl mode short of a reboot, so until that changes there is no reason to support the disable path. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Do not create vPHB if there are no AFU configuration recordsIan Munsie
The vPHB model of the cxl kernel API is a hierarchy where the AFU is represented by the vPHB, and it's AFU configuration records are exposed as functions under that vPHB. If there are no AFU configuration records we will create a vPHB with nothing under it, which is a waste of resources and will opt us into EEH handling despite not having anything special to handle. This also does not make sense for cards using the peer model of the cxl kernel API, where the other functions of the device are exposed via additional peer physical functions rather than AFU configuration records. This model will also not work with the existing EEH handling in the cxl driver, as that is designed around the vPHB model. Skip creating the vPHB for AFUs without any AFU configuration records, and opt out of EEH handling for them. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Allow a default context to be associated with an external pci_devIan Munsie
The cxl kernel API has a concept of a default context associated with each PCI device under the virtual PHB. The Mellanox CX4 will also use the cxl kernel API, but it does not use a virtual PHB - rather, the AFU appears as a physical function as a peer to the networking functions. In order to allow the kernel API to work with those networking functions, we will need to associate a default context with them as well. To this end, refactor the corresponding code to do this in vphb.c and export it so that it can be called from the PHB code. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Move cxl_afu_get / cxl_afu_put to baseIan Munsie
The Mellanox CX4 uses a model where the AFU is one physical function of the device, and is used by other peer physical functions of the same device. This will require those other devices to grab a reference on the AFU when they are initialised to make sure that it does not go away during their lifetime. Move the AFU refcount functions to base.c so they can be called from the PHB code. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Enable bus mastering for devices using CAPP DMA modeIan Munsie
Devices that use CAPP DMA mode (such as the Mellanox CX4) require bus master to be enabled in order for the CAPI traffic to flow. This should be harmless to enable for other cxl devices, so unconditionally enable it in the adapter init flow. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add cxl_slot_is_supported APIIan Munsie
This extends the check that the adapter is in a CAPI capable slot so that it may be called by external users in the kernel API. This will be used by the upcoming Mellanox CX4 support, which needs to know ahead of time if the card can be switched to cxl mode so that it can leave it in PCI mode if it is not. This API takes a parameter to check if CAPP DMA mode is supported, which it currently only allows on P8NVL systems, since that mode currently has issues accessing memory < 4GB on P8, and we cannot realistically avoid that. This API does not currently check if a CAPP unit is available (i.e. not already assigned to another PHB) on P8. Doing so would be racy since it is assigned on a first come first serve basis, and so long as CAPP DMA mode is not supported on P8 we don't need this, since the only anticipated user of this API requires CAPP DMA mode. Cc: Philippe Bergheaud <felix@linux.vnet.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/powernv: Split cxl code out into a separate fileIan Munsie
The support for using the Mellanox CX4 in cxl mode will require additions to the PHB code. In preparation for this, move the existing cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things more organised. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Use for_each_compatible_node() macroWei Yongjun
Use for_each_compatible_node() macro instead of open coding it. Generated by Coccinelle. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14selftests/powerpc: Add a test for PROT_SAOMichael Ellerman
PROT_SAO is a powerpc-specific flag to mmap(), and we rely on arch specific logic to allow it to be passed to mmap(). Add a small test to ensure mmap() accepts PROT_SAO. We don't have a good way to test that it actually causes the mapping to be created with the right flags, so for now we just touch the mapping so it's faulted in. In future we might be able to do something better. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>