summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-03-31Merge branch 'tipc-slim-down-name-table'David S. Miller
Jon Maloy says: ==================== tipc: slim down name table We clean up and improve the name binding table: - Replace the memory consuming 'sub_sequence/service range' array with an RB tree. - Introduce support for overlapping service sequences/ranges v2: #1: Fixed a missing initialization reported by David Miller #4: Obsoleted and replaced a few more macros to get a consistent terminology in the API. #5: Added new commit to fix a potential string overflow bug (it is still only in net-next) reported by Arnd Bergmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tipc: avoid possible string overflowJon Maloy
gcc points out that the combined length of the fixed-length inputs to l->name is larger than the destination buffer size: net/tipc/link.c: In function 'tipc_link_create': net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes into a region of size between 26 and 58 [-Werror=format-overflow=] sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str); net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes (assuming 75) into a destination of size 60 sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str); A detailed analysis reveals that the theoretical maximum length of a link name is: max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name = 16 + 1 + 15 + 1 + 16 + 1 + 15 = 65 Since we also need space for a trailing zero we now set MAX_LINK_NAME to 68. Just to be on the safe side we also replace the sprintf() call with snprintf(). Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tipc: tipc: rename address types in user apiJon Maloy
The three address type structs in the user API have names that in reality reflect the specific, non-Linux environment where they were originally created. We now give them more intuitive names, in accordance with how TIPC is described in the current documentation. struct tipc_portid -> struct tipc_socket_addr struct tipc_name -> struct tipc_service_addr struct tipc_name_seq -> struct tipc_service_range To avoid confusion, we also update some commmets and macro names to match the new terminology. For compatibility, we add macros that map all old names to the new ones. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tipc: permit overlapping service ranges in name tableJon Maloy
With the new RB tree structure for service ranges it becomes possible to solve an old problem; - we can now allow overlapping service ranges in the table. When inserting a new service range to the tree, we use 'lower' as primary key, and when necessary 'upper' as secondary key. Since there may now be multiple service ranges matching an indicated 'lower' value, we must also add the 'upper' value to the functions used for removing publications, so that the correct, corresponding range item can be found. These changes guarantee that a well-formed publication/withdrawal item from a peer node never will be rejected, and make it possible to eliminate the problematic backlog functionality we currently have for handling such cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tipc: refactor name table translate functionJon Maloy
The function tipc_nametbl_translate() function is ugly and hard to follow. This can be improved somewhat by introducing a stack variable for holding the publication list to be used and re-ordering the if- clauses for selection of algorithm. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31tipc: replace name table service range array with rb treeJon Maloy
The current design of the binding table has an unnecessary memory consuming and complex data structure. It aggregates the service range items into an array, which is expanded by a factor two every time it becomes too small to hold a new item. Furthermore, the arrays never shrink when the number of ranges diminishes. We now replace this array with an RB tree that is holding the range items as tree nodes, each range directly holding a list of bindings. This, along with a few name changes, improves both readability and volume of the code, as well as reducing memory consumption and hopefully improving cache hit rate. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31Merge branch 'bridge-mtu'David S. Miller
Nikolay Aleksandrov says: ==================== net: bridge: MTU handling changes As previously discussed the recent changes break some setups and could lead to packet drops. Thus the first patch reverts the behaviour for the bridge to follow the minimum MTU but also keeps the ability to set the MTU to the maximum (out of all ports) if vlan filtering is enabled. Patch 02 is the bigger change in behaviour - we've always had trouble when configuring bridges and their MTU which is auto tuning on port events (add/del/changemtu), which means config software needs to chase it and fix it after each such event, after patch 02 we allow the user to configure any MTU (ETH_MIN/MAX limited) but once that is done the bridge stops auto tuning and relies on the user to keep the MTU correct. This should be compatible with cases that don't touch the MTU (or set it to the same value), while allowing to configure the MTU and not worry about it changing afterwards. The patches are intentionally split like this, so that if they get accepted and there are any complaints patch 02 can be reverted. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: bridge: disable bridge MTU auto tuning if it was set manuallyNikolay Aleksandrov
As Roopa noted today the biggest source of problems when configuring bridge and ports is that the bridge MTU keeps changing automatically on port events (add/del/changemtu). That leads to inconsistent behaviour and network config software needs to chase the MTU and fix it on each such event. Let's improve on that situation and allow for the user to set any MTU within ETH_MIN/MAX limits, but once manually configured it is the user's responsibility to keep it correct afterwards. In case the MTU isn't manually set - the behaviour reverts to the previous and the bridge follows the minimum MTU. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: bridge: set min MTU on port events and allow user to set maxNikolay Aleksandrov
Recently the bridge was changed to automatically set maximum MTU on port events (add/del/changemtu) when vlan filtering is enabled, but that actually changes behaviour in a way which breaks some setups and can lead to packet drops. In order to still allow that maximum to be set while being compatible, we add the ability for the user to tune the bridge MTU up to the maximum when vlan filtering is enabled, but that has to be done explicitly and all port events (add/del/changemtu) lead to resetting that MTU to the minimum as before. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31Merge branch 'thunderx-DMAC-filtering'David S. Miller
Vadim Lomovtsev says: ==================== net: thunderx: implement DMAC filtering support By default CN88XX BGX accepts all incoming multicast and broadcast packets and filtering is disabled. The nic driver doesn't provide an ability to change such behaviour. This series is to implement DMAC filtering management for CN88XX nic driver allowing user to enable/disable filtering and configure specific MAC addresses to filter traffic. Changes from v1: build issues: - update code in order to address compiler warnings; checkpatch.pl reported issues: - update code in order to fit 80 symbols length; - update commit descriptions in order to fit 80 symbols length; ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add ndo_set_rx_mode callback implementation for VFVadim Lomovtsev
The ndo_set_rx_mode() is called from atomic context which causes messages response timeouts while VF to PF communication via MSIx. To get rid of that we're copy passed mc list, parse flags and queue handling of kernel request to ordered workqueue. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add workqueue control structures for handle ndo_set_rx_mode ↵Vadim Lomovtsev
request The kernel calls ndo_set_rx_mode() callback from atomic context which causes messaging timeouts between VF and PF (as they’re implemented via MSIx). So in order to handle ndo_set_rx_mode() we need to get rid of it. This commit implements necessary workqueue related structures to let VF queue kernel request processing in non-atomic context later. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add XCAST messages handlers for PFVadim Lomovtsev
This commit is to add message handling for ndo_set_rx_mode() callback at PF side. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add new messages for handle ndo_set_rx_mode callbackVadim Lomovtsev
The kernel calls ndo_set_rx_mode() callback supplying it will all necessary info, such as device state flags, multicast mac addresses list and so on. Since we have only 128 bits to communicate with PF we need to initiate several requests to PF with small/short operation each based on input data. So this commit implements following PF messages codes along with new data structures for them: NIC_MBOX_MSG_RESET_XCAST to flush all filters configured for this particular network interface (VF) NIC_MBOX_MSG_ADD_MCAST to add new MAC address to DMAC filter registers for this particular network interface (VF) NIC_MBOX_MSG_SET_XCAST to apply filtering configuration to filter control register Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add multicast filter management supportVadim Lomovtsev
The ThunderX NIC could be partitioned to up to 128 VFs and thus represented to system. Each VF is mapped to pair BGX:LMAC, and each of VF is configured by kernel individually. Eventually the bunch of VFs could be mapped onto same pair BGX:LMAC and thus could cause several multicast filtering configuration requests to LMAC with the same MAC addresses. This commit is to add ThunderX NIC BGX filtering manipulation routines. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: add MAC address filter tracking for LMACVadim Lomovtsev
The ThunderX NIC has two Ethernet Interfaces (BGX) each of them could has up to four Logical MACs configured. Each of BGX has 32 filters to be configured for filtering ingress packets. The number of filters available to particular LMAC is from 8 (if we have four LMACs configured per BGX) up to 32 (in case of only one LMAC is configured per BGX). At the same time the NIC could present up to 128 VFs to OS as network interfaces, each of them kernel will configure with set of MAC addresses for filtering. So to prevent dupes in BGX filter registers from different network interfaces it is required to cache and track all filter configuration requests prior to applying them onto BGX filter registers. This commit is to update LMAC structures with control fields to allocate/releasing filters tracking list along with implementing dmac array allocate/release per LMAC. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: thunderx: move filter register related macro into proper placeVadim Lomovtsev
The ThunderX NIC has set of registers which allows to configure filter policy for ingress packets. There are three possible regimes of filtering multicasts, broadcasts and unicasts: accept all, reject all and accept filter allowed only. Current implementation has enum with all of them and two generic macro for enabling filtering et all (CAM_ACCEPT) and enabling/disabling broadcast packets, which also should be corrected in order to represent register bits properly. All these values are private for driver and there is no need to ‘publish’ them via header file. This commit is to move filtering register manipulation values from header file into source with explicit assignment of exact register values to them to be used while register configuring. Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31Merge branch 'meson8b'David S. Miller
Martin Blumenstingl says: ==================== Meson8m2 support for dwmac-meson8b The Meson8m2 SoC is an updated version of the Meson8 SoC. Some of the peripherals are shared with Meson8b (for example the watchdog registers and the internal temperature sensor calibration procedure). Meson8m2 also seems to include the same Gigabit MAC register layout as Meson8b. The registers in the Amlogic dwmac "glue" seem identical between Meson8b and Meson8m2. Manual testing seems to confirm this. To be extra-safe a new compatible string is added because there's no (public) documentation on the Meson8m2 SoC. This will allow us to implement any SoC-specific variations later on (if needed). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31net: stmmac: dwmac-meson8b: Add support for the Meson8m2 SoCMartin Blumenstingl
The Meson8m2 SoC uses a similar (potentially even identical) register layout as the Meson8b and GXBB SoCs for the dwmac glue. Add a new compatible string and update the module description to indicate support for these SoCs. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31dt-bindings: net: meson-dwmac: add support for the Meson8m2 SoCMartin Blumenstingl
The Meson8m2 SoC uses a similar (potentially even identical) register layout for the dwmac glue as Meson8b and GXBB. Unfortunately there is no documentation available. Testing shows that both, RMII and RGMII PHYs are working if they are configured as on Meson8b. Add a new compatible string to the documentation so differences (if there are any) between Meson8m2 and the other SoCs can be taken care of within the driver. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31PCI/DPC: Rename from pcie-dpc.c to dpc.cBjorn Helgaas
Rename pcie-dpc.c to dpc.c. The path "drivers/pci/pcie/pcie-dpc.c" has more occurrences of "pci" than necessary. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-03-31PCI/IOV: Add missing prototypes for powerpc pcibios interfacesMathieu Malaterre
Add missing prototypes for: resource_size_t pcibios_default_alignment(void); int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs); int pcibios_sriov_disable(struct pci_dev *pdev); This fixes the following warnings treated as errors when using W=1: arch/powerpc/kernel/pci-common.c:236:17: error: no previous prototype for ‘pcibios_default_alignment’ [-Werror=missing-prototypes] arch/powerpc/kernel/pci-common.c:253:5: error: no previous prototype for ‘pcibios_sriov_enable’ [-Werror=missing-prototypes] arch/powerpc/kernel/pci-common.c:261:5: error: no previous prototype for ‘pcibios_sriov_disable’ [-Werror=missing-prototypes] Also, commit 978d2d683123 ("PCI: Add pcibios_iov_resource_alignment() interface") added a new function but the prototype was located in the main header instead of the CONFIG_PCI_IOV specific section. Move this function next to the newly added ones. Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2018-03-31PCI/IOV: Use VF0 cached config registers for other VFsKarimAllah Ahmed
Cache some config data from VF0 and use it for all other VFs instead of reading it from the config space of each VF. We assume these items are the same across all associated VFs: Revision ID Class Code Subsystem Vendor ID Subsystem ID This is an optimization when enabling SR-IOV on a device with many VFs. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> [bhelgaas: changelog, simplify comments, remove unused "device", test CONFIG_PCI_IOV instead of CONFIG_PCI_ATS, rename functions] Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2018-03-31Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two fixlets" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/hwbp: Simplify the perf-hwbp code, fix documentation perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs
2018-03-31Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two UV platform fixes, and a kbuild fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/UV: Fix critical UV MMR address error x86/platform/uv/BAU: Add APIC idt entry x86/purgatory: Avoid creating stray .<pid>.d files, remove -MD from KBUILD_CFLAGS
2018-03-31Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PTI fixes from Ingo Molnar: "Two fixes: a relatively simple objtool fix that makes Clang built kernels work with ORC debug info, plus an alternatives macro fix" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Fixup alternative_call_2 objtool: Add Clang support
2018-04-01powerpc/64s: Remove POWER4 supportNicholas Piggin
POWER4 has been broken since at least the change 49d09bf2a6 ("powerpc/64s: Optimise MSR handling in exception handling"), which requires mtmsrd L=1 support. This was introduced in ISA v2.01, and POWER4 supports ISA v2.00. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc: Remove unused CPU_FTR_ARCH_201Nicholas Piggin
The last usage was removed in c17b98cf6028 ("KVM: PPC: Book3S HV: Remove code for PPC970 processors") (Dec 2014). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU featuresNicholas Piggin
The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and above (which is what the cputable setup does). Fix DT CPU features quirk setup to match. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Merge with upstream changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Set assembler machine type to POWER4Nicholas Piggin
Rather than override the machine type in .S code (which can hide wrong or ambiguous code generation for the target), set the type to power4 for all assembly. This also means we need to be careful not to build power4-only code when we're not building for Book3S, such as the "power7" versions of copyuser/page/memcpy. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Explicitly add vector features to CPU_FTRS_POSSIBLENicholas Piggin
ALTIVEC and VSX features are not added by to default to the POWERx CPU feature sets because they are intended to be enabled by firmware. Currently they end up in CPU_FTRS_POSSIBLE due to their inclusion in other the set for other CPUs, eg. PPC970. But they should be added individually to the CPU_FTRS_POSSIBLE set, because if we reduce the set of CPUs that are built-for they may disappear from the possible mask. It already contains CPU_FTR_VSX, so add ALTIVEC. The _COMP features should be used because they won't be present if compiled out. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Add all POWER9 features to CPU_FTRS_ALWAYSNicholas Piggin
It's not a bug to have features missing in CPU_FTR_ALWAYS, but it is a missed opportunity for optimisation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/boot: Remove duplicate typedefs from libfdt_env.hMark Greer
When building a uImage or zImage using ppc6xx_defconfig and some other defconfigs, the following error occurs with GCC 4.5.1: /arch/powerpc/boot/libfdt_env.h:10:13: error: redefinition of typedef 'uint32_t' /arch/powerpc/boot/types.h:21:13: note: previous declaration of 'uint32_t' was here /arch/powerpc/boot/libfdt_env.h:11:13: error: redefinition of typedef 'uint64_t' /arch/powerpc/boot/types.h:22:13: note: previous declaration of 'uint64_t' was here The problem is that commit 656ad58ef19e (powerpc/boot: Add OPAL console to epapr wrappers) adds typedefs for uint32_t and uint64_t to type.h but doesn't remove the pre-existing (and now duplicate) typedefs from libfdt_env.h. Fix the error by removing the duplicate typedefs from libfdt_env.h Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s/idle: avoid sync for KVM state when waking from idleNicholas Piggin
When waking from a CPU idle instruction (e.g., nap or stop), the sync for ordering the KVM secondary thread state can be avoided if there wakeup is coming from a kernel context rather than KVM context. This improves performance for ping-pong benchmark with the stop0 idle state by 0.46% for 2 threads in the same core, and 1.02% for different cores. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplugNicholas Piggin
Implement a new function to invoke stop, power9_offline_stop, which is like power9_idle_stop but used by the cpu hotplug code. Move KVM secondary state manipulation code to the offline case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: sreset panic if there is no debugger or crash dump handlersNicholas Piggin
system_reset_exception does most of its own crash handling now, invoking the debugger or crash dumps if they are registered. If not, then it goes through to die() to print stack traces, and then is supposed to panic (according to comments). However after die() prints oopses, it does its own handling which doesn't allow system_reset_exception to panic (e.g., it may just kill the current process). This patch causes sreset exceptions to return from die after it prints messages but before acting. This also stops die from invoking the debugger on 0x100 crashes. system_reset_exception similarly calls the debugger. It had been thought this was harmless (because if the debugger was disabled, neither call would fire, and if it was enabled the first call would return). However in some cases like xmon 'X' command, the debugger returns 0, which currently causes it to be entered again (first in system_reset_exception, then in die), which is confusing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: return more carefully from sreset NMINicholas Piggin
System Reset, being an NMI, must return more carefully than other interrupts. It has traditionally returned via the nromal return from exception path, but that has a number of problems. - r13 does not get restored if returning to kernel. This is for interrupts which may cause a context switch, which sreset will never do. Interrupting OPAL (which uses a different r13) is one place where this causes breakage. - It may cause several other problems returning to kernel with preempt or TIF_EMULATE_STACK_STORE if it hits at the wrong time. It's safer just to have a simple restore and return, like machine check which is the other NMI. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/eeh: Fix race with driver un/bindMichael Neuling
The current EEH callbacks can race with a driver unbind. This can result in a backtraces like this: EEH: Frozen PHB#0-PE#1fc detected EEH: PE location: S000009, PHB location: N/A CPU: 2 PID: 2312 Comm: kworker/u258:3 Not tainted 4.15.6-openpower1 #2 Workqueue: nvme-wq nvme_reset_work [nvme] Call Trace: dump_stack+0x9c/0xd0 (unreliable) eeh_dev_check_failure+0x420/0x470 eeh_check_failure+0xa0/0xa4 nvme_reset_work+0x138/0x1414 [nvme] process_one_work+0x1ec/0x328 worker_thread+0x2e4/0x3a8 kthread+0x14c/0x154 ret_from_kernel_thread+0x5c/0xc8 nvme nvme1: Removing after probe failure status: -19 <snip> cpu 0x23: Vector: 300 (Data Access) at [c000000ff50f3800] pc: c0080000089a0eb0: nvme_error_detected+0x4c/0x90 [nvme] lr: c000000000026564: eeh_report_error+0xe0/0x110 sp: c000000ff50f3a80 msr: 9000000000009033 dar: 400 dsisr: 40000000 current = 0xc000000ff507c000 paca = 0xc00000000fdc9d80 softe: 0 irq_happened: 0x01 pid = 782, comm = eehd Linux version 4.15.6-openpower1 (smc@smc-desktop) (gcc version 6.4.0 (Buildroot 2017.11.2-00008-g4b6188e)) #2 SM P Tue Feb 27 12:33:27 PST 2018 enter ? for help eeh_report_error+0xe0/0x110 eeh_pe_dev_traverse+0xc0/0xdc eeh_handle_normal_event+0x184/0x4c4 eeh_handle_event+0x30/0x288 eeh_event_handler+0x124/0x170 kthread+0x14c/0x154 ret_from_kernel_thread+0x5c/0xc8 The first part is an EEH (on boot), the second half is the resulting crash. nvme probe starts the nvme_reset_work() worker thread. This worker thread starts touching the device which see a device error (EEH) and hence queues up an event in the powerpc EEH worker thread. nvme_reset_work() then continues and runs nvme_remove_dead_ctrl_work() which results in unbinding the driver from the device and hence releases all resources. At the same time, the EEH worker thread starts doing the EEH .error_detected() driver callback, which no longer works since the resources have been freed. This fixes the problem in the same way the generic PCIe AER code (in drivers/pci/pcie/aer/aerdrv_core.c) does. It makes the EEH code hold the device_lock() while performing the driver EEH callbacks and associated code. This ensures either the callbacks are no longer register, or if they are registered the driver will not be removed from underneath us. This has been broken forever. The EEH call backs were first introduced in 2005 (in 77bd7415610) but it's not clear if a lock was needed back then. Fixes: 77bd74156101 ("[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines") Cc: stable@vger.kernel.org # v2.6.16+ Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/kexec_file: Fix error code when trying to load kdump kernelThiago Jung Bauermann
kexec_file_load() on powerpc doesn't support kdump kernels yet, so it returns -ENOTSUPP in that case. I've recently learned that this errno is internal to the kernel and isn't supposed to be exposed to userspace. Therefore, change to -EOPNOTSUPP which is defined in an uapi header. This does indeed make kexec-tools happier. Before the patch, on ppc64le: # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz kexec_file_load failed: Unknown error 524 After the patch: # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz kexec_file_load failed: Operation not supported Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()") Cc: stable@vger.kernel.org # v4.10+ Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Reviewed-by: Simon Horman <horms@verge.net.au> Reviewed-by: Dave Young <dyoung@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/mm/32: Remove the reserved memory hackJonathan Neuschäfer
This hack, introduced in commit c5df7f775148 ("powerpc: allow ioremap within reserved memory regions") is now unnecessary. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/wii: Don't rely on the reserved memory hackJonathan Neuschäfer
Because the two memory blocks (usually called MEM1 and MEM2) are not merged anymore, __request_region in kernel/resource.c will correctly allow reserving regions in the physical address space between MEM1 and MEM2, where many important peripherals are (GPIO, MMC, USB, ...). A previous change to __ioremap_caller in arch/powerpc/mm/pgtable_32.c ensures that multiple memblocks are properly considered in ioremap; this makes it unnecessary to set __allow_ioremap_reserved. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/mm/32: Use page_is_ram to check for RAMJonathan Neuschäfer
On systems where there is MMIO space between different blocks of RAM in the physical address space, __ioremap_caller did not allow mapping these MMIO areas, because they were below the end RAM and thus considered RAM as well. Use the memblock-based page_is_ram function, which returns false for such MMIO holes. v2: Keep the check for p < virt_to_phys(high_memory). On 32-bit systems with high memory (memory above physical address 4GiB), the high memory is expected to be available though ioremap. The high_memory variable marks the end of low memory; comparing against it means that only ioremap requests for low RAM will be denied. Reported by Michael Ellerman. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/mm: Use memblock API for PPC32 page_is_ramJonathan Neuschäfer
To support accurate checking for different blocks of memory on PPC32, use the same memblock-based approach that's already used on PPC64 also on PPC32. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/mm: Simplify page_is_ram by using memblock_is_memoryJonathan Neuschäfer
Instead of open-coding the search in page_is_ram, call memblock_is_memory. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/wii.dts: Add drive slot LEDJonathan Neuschäfer
The Wii has a blue LED in the disk drive slot, which is controlled via a GPIO line. Add this LED to wii.dts, and mark it as a panic-indicator. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/wii.dts: Add GPIO line namesJonathan Neuschäfer
These are the GPIO line names on a Nintendo Wii, as documented in: https://wiibrew.org/wiki/Hardware/Hollywood_GPIOs Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/wii.dts: Add ngpios propertyJonathan Neuschäfer
The Hollywood GPIO controller supports 32 GPIOs, but on the Wii, only 24 are used. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/wii: Explicitly configure GPIO owner for poweroff pinJonathan Neuschäfer
The Hollywood chipset's GPIO controller has two sets of registers: One for access by the PowerPC CPU, and one for access by the ARM coprocessor (but both are accessible from the PPC because the memory firewall (AHBPROT) is usually disabled when booting Linux, today). The wii_power_off function currently assumes that the poweroff GPIO pin is configured for use via the ARM side, but the upcoming GPIO driver configures all pins for use via the PPC side, breaking poweroff. Configure the owner register explicitly in wii_power_off to make wii_power_off work with and without the new GPIO driver. I think the Wii can be switched to the generic gpio-poweroff driver, after the GPIO driver is merged. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/wii: Probe the whole devicetreeJonathan Neuschäfer
Previously, wii_device_probe would only initialize devices under the /hollywood node. After this patch, platform devices placed outside of /hollywood will also be initialized. The intended usecase for this are devices located outside of the Hollywood chip, such as GPIO LEDs and GPIO buttons. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64e: Fix oops due to deferral of paca allocationMichael Ellerman
On 64-bit Book3E systems, in setup_tlb_core_data() we reference other CPUs pacas. But in commit 59f577743d71 ("powerpc/64: Defer paca allocation until memory topology is discovered") the allocation of non-boot-CPU pacas was deferred until later in boot. This leads to an oops: CPU maps initialized for 1 thread per core Unable to handle kernel paging request for data at address 0x8888888888888918 Faulting instruction address: 0xc000000000e2f0d0 Oops: Kernel access of bad area, sig: 11 [#1] NIP .setup_tlb_core_data+0xdc/0x160 Call Trace: .setup_tlb_core_data+0x5c/0x160 (unreliable) .setup_arch+0x80/0x348 .start_kernel+0x7c/0x598 start_here_common+0x1c/0x40 Luckily setup_tlb_core_data() is called immediately prior to smp_setup_pacas(). So simply switching their order is sufficient to fix the oops and seems unlikely to have any other unwanted side effects. Fixes: 59f577743d71 ("powerpc/64: Defer paca allocation until memory topology is discovered") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>