summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-01-23x86/boot: Simplify calculation of output addressArvind Sankar
Condense the calculation of decompressed kernel start a little. Committer notes: before: ebp = ebx - (init_size - _end) after: eax = (ebx + _end) - init_size where in both ebx contains the temporary address the kernel is moved to for in-place decompression. The before and after difference in register state is %eax and %ebp but that is immaterial because the compressed image is not built with -mregparm, i.e., all arguments of the following extract_kernel() call are passed on the stack. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200107194436.2166846-1-nivedita@alum.mit.edu
2020-01-23tun: add mutex_unlock() call and napi.skb clearing in tun_get_user()Eric Dumazet
If both IFF_NAPI_FRAGS mode and XDP are enabled, and the XDP program consumes the skb, we need to clear the napi.skb (or risk a use-after-free) and release the mutex (or risk a deadlock) WARNING: lock held when returning to user space! 5.5.0-rc6-syzkaller #0 Not tainted ------------------------------------------------ syz-executor.0/455 is leaving the kernel with locks still held! 1 lock held by syz-executor.0/455: #0: ffff888098f6e748 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x1604/0x3fc0 drivers/net/tun.c:1835 Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Petar Penkov <ppenkov@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23Merge branch 'net-sched-add-Flow-Queue-PIE-packet-scheduler'David S. Miller
Gautam Ramakrishnan says: ==================== net: sched: add Flow Queue PIE packet scheduler Flow Queue PIE packet scheduler This patch series implements the Flow Queue Proportional Integral controller Enhanced (FQ-PIE) active queue Management algorithm. It is an enhancement over the PIE algorithm. It integrates the PIE aqm with a deficit round robin scheme. FQ-PIE is implemented over the latest version of PIE which uses timestamps to calculate queue delay with an additional option of using average dequeue rate to calculate the queue delay. This patch also adds a memory limit of all the packets across all queues to a default value of 32Mb. - Patch #1 - Creates pie.h and moves all small functions and structures common to PIE and FQ-PIE here. The functions are all made inline. - Patch #2 - #8 - Addresses code formatting, indentation, comment changes and rearrangement of structure members. - Patch #9 - Refactors sch_pie.c by changing arguments to calculate_probability(), [pie_]drop_early() and pie_process_dequeue() to make it generic enough to be used by sch_fq_pie.c. These functions are exported to be used by sch_fq_pie.c. - Patch #10 - Adds the FQ-PIE Qdisc. For more information: https://tools.ietf.org/html/rfc8033 Changes from v6 to v7 - Call tcf_block_put() when destroying the Qdisc as suggested by Jakub Kicinski. Changes from v5 to v6 - Rearranged struct members according to their access pattern and to remove holes. Changes from v4 to v5 - This patch series breaks down patch 1 of v4 into separate logical commits as suggested by David Miller. Changes from v3 to v4 - Used non deprecated version of nla_parse_nested - Used SZ_32M macro - Removed an unused variable - Code cleanup All suggested by Jakub and Toke. Changes from v2 to v3 - Exported drop_early, pie_process_dequeue and calculate_probability functions from sch_pie as suggested by Stephen Hemminger. Changes from v1 ( and RFC patch) to v2 - Added timestamp to calculate queue delay as recommended by Dave Taht - Packet memory limit implemented as recommended by Toke. - Added external classifier as recommended by Toke. - Used NET_XMIT_CN instead of NET_XMIT_DROP as the return value in the fq_pie_qdisc_enqueue function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: add Flow Queue PIE packet schedulerMohit P. Tahiliani
Principles: - Packets are classified on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed to the same slot) - Each flow has a PIE managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered. - Drops during enqueue only. - ECN capability is off by default. - ECN threshold (if ECN is enabled) is at 10% by default. - Uses timestamps to calculate queue delay by default. Usage: tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ] [ target TIME ] [ tupdate TIME ] [ alpha NUMBER ] [ beta NUMBER ] [ quantum BYTES ] [ memory_limit BYTES ] [ ecnprob PERCENTAGE ] [ [no]ecn ] [ [no]bytemode ] [ [no_]dq_rate_estimator ] defaults: limit: 10240 packets, flows: 1024 target: 15 ms, tupdate: 15 ms (in jiffies) alpha: 1/8, beta : 5/4 quantum: device MTU, memory_limit: 32 Mb ecnprob: 10%, ecn: off bytemode: off, dq_rate_estimator: off Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: V. Saicharan <vsaicharan1998@gmail.com> Signed-off-by: Mohit Bhasi <mohitbhasi1998@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: export symbols to be reused by FQ-PIEMohit P. Tahiliani
This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix alignment in struct instancesMohit P. Tahiliani
Make the alignment in the initialization of the struct instances consistent in the file. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: fix commentingMohit P. Tahiliani
Fix punctuation and logical mistakes in the comments. The logical mistake was that "dequeue_rate" is no longer the default way to calculate queuing delay and is not needed. The default way to calculate queue delay was changed in commit cec2975f2b70 ("net: sched: pie: enable timestamp based delay calculation"). Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: improve comments and commenting styleMohit P. Tahiliani
Improve the comments along with the commenting style used to describe the members of the structures and their initial values in the init functions. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: rearrange structure members and their initializationsMohit P. Tahiliani
Rearrange the members of the structure such that closely referenced members appear together and/or fit in the same cacheline. Also, change the order of their initializations to match the order in which they appear in the structure. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: use u8 instead of bool in pie_varsMohit P. Tahiliani
Linux best practice recommends using u8 for true/false values in structures. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: rearrange macros in order of lengthMohit P. Tahiliani
Rearrange macros in order of length and align the values to improve readability. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23pie: use U64_MAX to denote (2^64 - 1)Mohit P. Tahiliani
Use the U64_MAX macro to denote the constant (2^64 - 1). Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23net: sched: pie: move common code to pie.hMohit P. Tahiliani
This patch moves macros, structures and small functions common to PIE and FQ-PIE (to be added in a future commit) from the file net/sched/sch_pie.c to the header file include/net/pie.h. All the moved functions are made inline. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23KVM: arm: Make inject_abt32() inject an external abort insteadJames Morse
KVM's inject_abt64() injects an external-abort into an aarch64 guest. The KVM_CAP_ARM_INJECT_EXT_DABT is intended to do exactly this, but for an aarch32 guest inject_abt32() injects an implementation-defined exception, 'Lockdown fault'. Change this to external abort. For non-LPAE we now get the documented: | Unhandled fault: external abort on non-linefetch (0x008) at 0x9c800f00 and for LPAE: | Unhandled fault: synchronous external abort (0x210) at 0x9c800f00 Fixes: 74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection") Reported-by: Beata Michalska <beata.michalska@linaro.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200121123356.203000-3-james.morse@arm.com
2020-01-23KVM: arm: Fix DFSR setting for non-LPAE aarch32 guestsJames Morse
Beata reports that KVM_SET_VCPU_EVENTS doesn't inject the expected exception to a non-LPAE aarch32 guest. The host intends to inject DFSR.FS=0x14 "IMPLEMENTATION DEFINED fault (Lockdown fault)", but the guest receives DFSR.FS=0x04 "Fault on instruction cache maintenance". This fault is hooked by do_translation_fault() since ARMv6, which goes on to silently 'handle' the exception, and restart the faulting instruction. It turns out, when TTBCR.EAE is clear DFSR is split, and FS[4] has to shuffle up to DFSR[10]. As KVM only does this in one place, fix up the static values. We now get the expected: | Unhandled fault: lock abort (0x404) at 0x9c800f00 Fixes: 74a64a981662a ("KVM: arm/arm64: Unify 32bit fault injection") Reported-by: Beata Michalska <beata.michalska@linaro.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200121123356.203000-2-james.morse@arm.com
2020-01-23KVM: arm/arm64: Fix young bit from mmu notifierGavin Shan
kvm_test_age_hva() is called upon mmu_notifier_test_young(), but wrong address range has been passed to handle_hva_to_gpa(). With the wrong address range, no young bits will be checked in handle_hva_to_gpa(). It means zero is always returned from mmu_notifier_test_young(). This fixes the issue by passing correct address range to the underly function handle_hva_to_gpa(), so that the hardware young (access) bit will be visited. Fixes: 35307b9a5f7e ("arm/arm64: KVM: Implement Stage-2 page aging") Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200121055659.19560-1-gshan@redhat.com
2020-01-23arm64: KVM: Annotate guest entry/exit as a single functionMark Brown
In an effort to clarify and simplify the annotations of assembly functions in the kernel new macros have been introduced replacing ENTRY and ENDPROC. There are separate annotations SYM_FUNC_ for normal C functions and SYM_CODE_ for other code. Currently __guest_enter and __guest_exit are annotated as standard functions but this is not entirely correct as the former doesn't do a normal return and the latter is not entered in a normal fashion. From the point of view of the hypervisor the guest entry/exit may be viewed as a single function which happens to have an eret in the middle of it so let's annotate it as such. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200120124706.8681-1-broonie@kernel.org
2020-01-23arm64: KVM: Add UAPI notes for swapped registersAndrew Jones
Two UAPI system register IDs do not derive their values from the ARM system register encodings. This is because their values were accidentally swapped. As the IDs are API, they cannot be changed. Add WARNING notes to point them out. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Andrew Jones <drjones@redhat.com> [maz: turned XXX into WARNING] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200120130825.28838-1-drjones@redhat.com
2020-01-23KVM: arm/arm64: Cleanup MMIO handlingMarc Zyngier
Our MMIO handling is a bit odd, in the sense that it uses an intermediate per-vcpu structure to store the various decoded information that describe the access. But the same information is readily available in the HSR/ESR_EL2 field, and we actually use this field to populate the structure. Let's simplify the whole thing by getting rid of the superfluous structure and save a (tiny) bit of space in the vcpu structure. [32bit fix courtesy of Olof Johansson <olof@lixom.net>] Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-01-23mlxsw: spectrum_acl: Fix use-after-free during reloadIdo Schimmel
During reload (or module unload), the router block is de-initialized. Among other things, this results in the removal of a default multicast route from each active virtual router (VRF). These default routes are configured during initialization to trap packets to the CPU. In Spectrum-2, unlike Spectrum-1, multicast routes are implemented using ACL rules. Since the router block is de-initialized before the ACL block, it is possible that the ACL rules corresponding to the default routes are deleted while being accessed by the ACL delayed work that queries rules' activity from the device. This can result in a rare use-after-free [1]. Fix this by protecting the rules list accessed by the delayed work with a lock. We cannot use a spinlock as the activity read operation is blocking. [1] [ 123.331662] ================================================================== [ 123.339920] BUG: KASAN: use-after-free in mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0 [ 123.349381] Read of size 8 at addr ffff8881f3bb4520 by task kworker/0:2/78 [ 123.357080] [ 123.358773] CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 5.5.0-rc5-custom-33108-gf5df95d3ef41 #2209 [ 123.368898] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018 [ 123.378456] Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work [ 123.385970] Call Trace: [ 123.388734] dump_stack+0xc6/0x11e [ 123.392568] print_address_description.constprop.4+0x21/0x340 [ 123.403236] __kasan_report.cold.8+0x76/0xb1 [ 123.414884] kasan_report+0xe/0x20 [ 123.418716] mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0 [ 123.444034] process_one_work+0xb06/0x19a0 [ 123.453731] worker_thread+0x91/0xe90 [ 123.467348] kthread+0x348/0x410 [ 123.476847] ret_from_fork+0x24/0x30 [ 123.480863] [ 123.482545] Allocated by task 73: [ 123.486273] save_stack+0x19/0x80 [ 123.490000] __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 123.495379] mlxsw_sp_acl_rule_create+0xa7/0x230 [ 123.500566] mlxsw_sp2_mr_tcam_route_create+0xf6/0x3e0 [ 123.506334] mlxsw_sp_mr_tcam_route_create+0x5b4/0x820 [ 123.512102] mlxsw_sp_mr_table_create+0x3b5/0x690 [ 123.517389] mlxsw_sp_vr_get+0x289/0x4d0 [ 123.521797] mlxsw_sp_fib_node_get+0xa2/0x990 [ 123.526692] mlxsw_sp_router_fib4_event_work+0x54c/0x2d60 [ 123.532752] process_one_work+0xb06/0x19a0 [ 123.537352] worker_thread+0x91/0xe90 [ 123.541471] kthread+0x348/0x410 [ 123.545103] ret_from_fork+0x24/0x30 [ 123.549113] [ 123.550795] Freed by task 518: [ 123.554231] save_stack+0x19/0x80 [ 123.557958] __kasan_slab_free+0x125/0x170 [ 123.562556] kfree+0xd7/0x3a0 [ 123.565895] mlxsw_sp_acl_rule_destroy+0x63/0xd0 [ 123.571081] mlxsw_sp2_mr_tcam_route_destroy+0xd5/0x130 [ 123.576946] mlxsw_sp_mr_tcam_route_destroy+0xba/0x260 [ 123.582714] mlxsw_sp_mr_table_destroy+0x1ab/0x290 [ 123.588091] mlxsw_sp_vr_put+0x1db/0x350 [ 123.592496] mlxsw_sp_fib_node_put+0x298/0x4c0 [ 123.597486] mlxsw_sp_vr_fib_flush+0x15b/0x360 [ 123.602476] mlxsw_sp_router_fib_flush+0xba/0x470 [ 123.607756] mlxsw_sp_vrs_fini+0xaa/0x120 [ 123.612260] mlxsw_sp_router_fini+0x137/0x384 [ 123.617152] mlxsw_sp_fini+0x30a/0x4a0 [ 123.621374] mlxsw_core_bus_device_unregister+0x159/0x600 [ 123.627435] mlxsw_devlink_core_bus_device_reload_down+0x7e/0xb0 [ 123.634176] devlink_reload+0xb4/0x380 [ 123.638391] devlink_nl_cmd_reload+0x610/0x700 [ 123.643382] genl_rcv_msg+0x6a8/0xdc0 [ 123.647497] netlink_rcv_skb+0x134/0x3a0 [ 123.651904] genl_rcv+0x29/0x40 [ 123.655436] netlink_unicast+0x4d4/0x700 [ 123.659843] netlink_sendmsg+0x7c0/0xc70 [ 123.664251] __sys_sendto+0x265/0x3c0 [ 123.668367] __x64_sys_sendto+0xe2/0x1b0 [ 123.672773] do_syscall_64+0xa0/0x530 [ 123.676892] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 123.682552] [ 123.684238] The buggy address belongs to the object at ffff8881f3bb4500 [ 123.684238] which belongs to the cache kmalloc-128 of size 128 [ 123.698261] The buggy address is located 32 bytes inside of [ 123.698261] 128-byte region [ffff8881f3bb4500, ffff8881f3bb4580) [ 123.711303] The buggy address belongs to the page: [ 123.716682] page:ffffea0007ceed00 refcount:1 mapcount:0 mapping:ffff888236403500 index:0x0 [ 123.725958] raw: 0200000000000200 dead000000000100 dead000000000122 ffff888236403500 [ 123.734646] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 [ 123.743315] page dumped because: kasan: bad access detected [ 123.749562] [ 123.751241] Memory state around the buggy address: [ 123.756620] ffff8881f3bb4400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 123.764716] ffff8881f3bb4480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 123.772812] >ffff8881f3bb4500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 123.780904] ^ [ 123.785697] ffff8881f3bb4580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 123.793793] ffff8881f3bb4600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 123.801883] ================================================================== Fixes: cf7221a4f5a5 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-23powerpc/xive: Drop extern qualifiers from header function prototypesGreg Kurz
As reported by ./scripts/checkpatch.pl --strict: CHECK: extern prototypes should be avoided in .h files Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/157384145834.181768.944827793193636924.stgit@bahia.lan
2020-01-23KVM: PPC: Book3S HV: XIVE: Fix typo in commentGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156219139988.578018.1046848908285019838.stgit@bahia.lan
2020-01-23macintosh: Fix Kconfig indentationKrzysztof Kozlowski
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191120134115.14918-1-krzk@kernel.org
2020-01-23MAINTAINERS: Add myself as maintainer of ehv_bytechan tty driverLaurentiu Tudor
Michael Ellerman made a call for volunteers from NXP to maintain this driver and I offered myself. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Acked-by: Timur Tabi <timur@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200114110012.17351-1-laurentiu.tudor@nxp.com
2020-01-23powernv/pci: Move pnv_pci_dma_bus_setup() to pci-ioda.cOliver O'Halloran
This is only used in pci-ioda.c so move it there and rename it to match. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-6-oohall@gmail.com
2020-01-23powernv/pci: Fold pnv_pci_dma_dev_setup() into the pci-ioda.c versionOliver O'Halloran
pnv_pci_dma_dev_setup() does nothing but call the phb->dma_dev_setup() callback, if one exists. That callback is only set for normal PCIe PHBs so we can remove the layer of indirection and use the ioda version in the pci_controller_ops. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-5-oohall@gmail.com
2020-01-23powerpc/iov: Move VF pdev fixup into pcibios_fixup_iov()Oliver O'Halloran
An ioda_pe for each VF is allocated in pnv_pci_sriov_enable() before the pci_dev for the VF is created. We need to set the pe->pdev pointer at some point after the pci_dev is created. Currently we do that in: pcibios_bus_add_device() pnv_pci_dma_dev_setup() (via phb->ops.dma_dev_setup) /* fixup is done here */ pnv_pci_ioda_dma_dev_setup() (via pnv_phb->dma_dev_setup) The fixup needs to be done before setting up DMA for for the VF's PE, but there's no real reason to delay it until this point. Move the fixup into pnv_pci_ioda_fixup_iov() so the ordering is: pcibios_add_device() pnv_pci_ioda_fixup_iov() (via ppc_md.pcibios_fixup_sriov) pcibios_bus_add_device() ... This isn't strictly required, but it's a slightly more logical place to do the fixup and it simplifies pnv_pci_dma_dev_setup(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-4-oohall@gmail.com
2020-01-23powernv/pci: Remove dma_dev_setup() for NPU PHBsOliver O'Halloran
The pnv_pci_dma_dev_setup() only does something when: 1) There PHB contains VFs, or 2) The PHB defines a dma_dev_setup() callback in the pnv_phb structure. Neither is true for NPU PHBs so there's no reason to set the callback. Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-3-oohall@gmail.com
2020-01-23powerpc/pci: Fold pcibios_setup_device() into pcibios_bus_add_device()Oliver O'Halloran
pcibios_bus_add_device() is the only caller of pcibios_setup_device(). Fold them together since there's no real reason to keep them separate. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200110070207.439-2-oohall@gmail.com
2020-01-23powerpc/powernv: Allow manually invoking special rebootsOliver O'Halloran
OPAL provides several different kinds of reboot for the kernel to use, namely forcing a full reboot, platform error reboot and MPIPL. Right now triggering the alternative resets requires some ad-hoc method such as triggering a kernel crash and hoping the stars align. It's sometimes handy to be able to trigger one of these resets directly, so add a way to do that. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191101085522.3055-2-oohall@gmail.com
2020-01-23powerpc/xmon: Allow passing an argument to ppc_md.restart()Oliver O'Halloran
On PowerNV a few different kinds of reboot are supported. We'd like to be able to exercise these from xmon so allow 'zr' to take an argument, and pass that to the ppc_md.restart() function. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191101085522.3055-1-oohall@gmail.com
2020-01-23powerpc/powernv: Use common code for the symbol_map exportOliver O'Halloran
Long before we had a generic way for firmware to export memory ranges of interest we added a special case for the skiboot symbol map. The code is pretty much identical to the generic export so re-use the code. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191101062611.32610-2-oohall@gmail.com
2020-01-23powerpc/powernv: Rework exports to support subnodesOliver O'Halloran
Originally we only had a handful of exported memory ranges, but we'd to export the per-core trace buffers. This results in a lot of files in the exports directory which is a but unfortunate. We can clean things up a bit by turning subnodes into subdirectories of the exports directory. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191101062611.32610-1-oohall@gmail.com
2020-01-23powerpc/eeh: Only dump stack once if an MMIO loop is detectedOliver O'Halloran
Many drivers don't check for errors when they get a 0xFFs response from an MMIO load. As a result after an EEH event occurs a driver can get stuck in a polling loop unless it some kind of internal timeout logic. Currently EEH tries to detect and report stuck drivers by dumping a stack trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump would occur every few seconds if the driver was spinning in a loop. This results in a lot of spurious stack traces in the kernel log. Fix this by limiting it to printing one stack trace for each PE freeze. If the driver is truely stuck the kernel's hung task detector is better suited to reporting the probelm anyway. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
2020-01-23powernv/pci: Add a debugfs entry to dump PHB's IODA PE stateOliver O'Halloran
Add a debugfs entry to dump the state of the active IODA PEs. The IODA PE state reflects how the PHB's internal concept of a PE is configured. This is separate to the EEH PE state and is managed power the PowerNV PCI backend rather than the EEH core. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> [mpe: Use DEFINE_DEBUGFS_ATTRIBUTE] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912052945.12589-3-oohall@gmail.com
2020-01-23powernv/pci: Allow any write trigger the diag dumpOliver O'Halloran
Make the dump trigger off any input rather than just '1'. This allows you to write "echo 1> dump_diag_data" and it'll do what you want rather than erroring out pointlessly. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912052945.12589-2-oohall@gmail.com
2020-01-23powernv/pci: Use pnv_phb as the private data for debugfs entriesOliver O'Halloran
Use the pnv_phb structure as the private data pointer for the debugfs files. This lets us delete some code and an open-coded use of hose->private_data. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190912052945.12589-1-oohall@gmail.com
2020-01-23powerpc/pcidn: Warn when sriov pci_dn management is used incorrectlyOliver O'Halloran
These functions can only be used on a SR-IOV capable physical function and they're only called in pcibios_sriov_enable / disable. Make them emit a warning in the future if they're used incorrectly and remove the dead code that checks if the device is a VF. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-3-oohall@gmail.com
2020-01-23powerpc/pcidn: Make VF pci_dn management CONFIG_PCI_IOV specificOliver O'Halloran
The powerpc PCI code requires that a pci_dn structure exists for all devices in the system. This is fine for real devices since at boot a pci_dn is created for each PCI device in the DT and it's fine for hotplugged devices since the hotplug slot driver will manage the pci_dn's devices in hotplug slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for virtual functions since firmware is unaware of VFs, and they aren't "hot plugged" in the traditional sense. Management of the pci_dn is handled by the, poorly named, functions: add_pci_dev_data() and remove_pci_dev_data(). The entire body of these functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used in any other context, so make them only available when CONFIG_PCI_IOV is selected, and rename them to reflect their actual usage rather than having them masquerade as generic code. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-2-oohall@gmail.com
2020-01-23powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOVOliver O'Halloran
When disabling virtual functions on an SR-IOV adapter we currently do not correctly remove the EEH state for the now-dead virtual functions. When removing the pci_dn that was created for the VF when SR-IOV was enabled we free the corresponding eeh_dev without removing it from the child device list of the eeh_pe that contained it. This can result in crashes due to the use-after-free. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190821062655.19735-1-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Make clearing EEH_DEV_SYSFS sanerOliver O'Halloran
The eeh_sysfs_remove_device() function is supposed to clear the EEH_DEV_SYSFS flag since it indicates the EEH sysfs entries have been added for a pci_dev. When the sysfs files are removed eeh_remove_device() the eeh_dev and the pci_dev have already been de-associated. This then causes the pci_dev_to_eeh_dev() call in eeh_sysfs_remove_device() to return NULL so the flag can't be cleared from the still-live eeh_dev. This problem is worked around in the caller by clearing the flag manually. However, this behaviour doesn't make a whole lot of sense, so this patch fixes it by: a) Re-ordering eeh_remove_device() so that eeh_sysfs_remove_device() is called before de-associating the pci_dev and eeh_dev. b) Making eeh_sysfs_remove_device() emit a warning if there's no corresponding eeh_dev for a pci_dev. The paths where the sysfs files are only reachable if EEH was setup for the device for the device in the first place so hitting this warning indicates a programming error. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-6-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Remove double pci_dn lookup.Oliver O'Halloran
In eeh_notify_resume_show() the pci_dn for the device is looked up once in the declaration block and then once after checking for a NULL eeh_dev. Remove the second lookup since it's pointless. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-5-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: ifdef pseries sr-iov sysfs propertiesOliver O'Halloran
There are several EEH sysfs properties that only exists when the "ibm,is-open-sriov-pf" property appears in the device tree node of the PCI device. This used on pseries to indicate to the guest that the hypervisor allows the guest to configure the SR-IOV capability. Doing this requires some handshaking between the guest, hypervisor and userspace when a VF is EEH frozen which is why these properties exist. This is all dead code on non-pseries platforms so wrap it in an #ifdef CONFIG_PPC_PSERIES to make the dependency clearer. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-4-oohall@gmail.com
2020-01-23powerpc/eeh_sysfs: Fix incorrect commentOliver O'Halloran
The EEH_ATTR_SHOW() helper is used to display fields from struct eeh_dev not struct pci_dn. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-3-oohall@gmail.com
2020-01-23powerpc/eeh_cache: Don't use pci_dn when inserting new rangesOliver O'Halloran
At the point where we start inserting ranges into the EEH address cache the binding between pci_dev and eeh_dev has already been set up. Instead of consulting the pci_dn tree we can retrieve the eeh_dev directly using pci_dev_to_eeh_dev(). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Tested-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190715085612.8802-2-oohall@gmail.com
2020-01-23ocxl: Add PCI hotplug dependency to KconfigFrederic Barrat
The PCI hotplug framework is used to update the devices when a new image is written to the FPGA. Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-12-fbarrat@linux.ibm.com
2020-01-23pci/hotplug/pnv-php: Wrap warnings in macroFrederic Barrat
An opencapi slot doesn't have an associated bridge device. It's not needed for operation, but any warning is displayed through pci_warn() which uses the pci_dev struct of the assocated bridge device. So wrap those warning so that a different trace mechanism can be used if it's an opencapi slot. Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-11-fbarrat@linux.ibm.com
2020-01-23pci/hotplug/pnv-php: Relax check when disabling slotFrederic Barrat
The driver only allows to disable a slot in the POPULATED state. However, if an error occurs while enabling the slot, say because the link couldn't be trained, then the POPULATED state may not be reached, yet the power state of the slot is on. So allow to disable a slot in the REGISTERED state. Removing the devices will do nothing since it's not populated, and we'll set the power state of the slot back to off. Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-10-fbarrat@linux.ibm.com
2020-01-23pci/hotplug/pnv-php: Register opencapi slotsFrederic Barrat
Add the opencapi PHBs to the list of PHBs being scanned to look for slots. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-9-fbarrat@linux.ibm.com
2020-01-23pci/hotplug/pnv-php: Improve error msg on power state change failureFrederic Barrat
When changing the slot state, if opal hits an error and tells as such in the asynchronous reply, the warning "Wrong msg" is logged, which is rather confusing. Instead we can reuse the better message which is already used when we couldn't submit the asynchronous opal request initially. Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191121134918.7155-8-fbarrat@linux.ibm.com