2018-01-27  powerpc/mpc52xx_gpt: make use of raw_spinlock variants  (Julia Cartwright)
The mpc52xx_gpt code currently implements an irq_chip for handling interrupts; due to how irq_chip handling is done, it's necessary for the irq_chip methods to be invoked from hardirq context, even on a real-time kernel. Because the spinlock_t type becomes a "sleeping" spinlock on RT kernels, it is not suitable to be used with irq_chips. A quick audit of the operations under the lock reveals that they do only minimal, bounded work, and are therefore safe to do under a raw spinlock. Signed-off-by: Julia Cartwright <julia@ni.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
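A minimal sketch of the conversion pattern (struct and function names are illustrative, not the actual mpc52xx_gpt code): switch the lock type and keep the irqsave variants so the lock stays a true spinning lock under PREEMPT_RT.
  #include <linux/io.h>
  #include <linux/irq.h>
  #include <linux/spinlock.h>

  struct gpt_priv {                       /* hypothetical driver state */
      raw_spinlock_t lock;                /* was: spinlock_t lock; */
      void __iomem *regs;
  };

  static void gpt_irq_mask(struct irq_data *d)
  {
      struct gpt_priv *p = irq_data_get_irq_chip_data(d);
      unsigned long flags;

      /* a raw lock stays a spinning lock on PREEMPT_RT, so it is
       * safe to take from hardirq context in irq_chip callbacks */
      raw_spin_lock_irqsave(&p->lock, flags);
      /* ... short, bounded register update via p->regs ... */
      raw_spin_unlock_irqrestore(&p->lock, flags);
  }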
2018-01-27  macintosh/adb: Properly mark continued kernel messages  (Andreas Schwab)
Use pr_cont where appropriate, and switch to pr_foo throughout. Additionally, lower messages in adb_probe_task to debug level. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> [mpe: Clean up whitespace slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
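The printk pattern being adopted looks roughly like this (function and message text are made up for illustration):
  #include <linux/printk.h>

  static void report_adb_devices(int nr_devices)
  {
      int i;

      pr_info("ADB devices:");            /* opens the line */
      for (i = 0; i < nr_devices; i++)
          pr_cont(" [%d]", i);            /* appends to the same line */
      pr_cont("\n");                      /* terminates it */

      pr_debug("adb: probe task done\n"); /* noisy probe messages demoted to debug */
  }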
2018-01-27  powerpc/pseries: Fix cpu hotplug crash with memoryless nodes  (Michael Bringmann)
On powerpc systems with shared configurations of CPUs and memory and memoryless nodes at boot, an event ordering problem was observed on SLES12 build platforms with the hot-add of CPUs to the memoryless nodes. * The most common error occurred when the memory SLAB driver attempted to reference the memoryless node to which a CPU was being added before the kernel had finished initializing all of the data structures for the CPU and exited 'device_online' under DLPAR/hot-add. Normally the memoryless node would be initialized through the call path device_online ... arch_update_cpu_topology ... find_cpu_nid ... try_online_node. This patch ensures that the powerpc node will be initialized as early as possible, even if it was memoryless and CPU-less at the point when we are trying to hot-add a new CPU to it. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  powerpc/numa: Ensure nodes initialized for hotplug  (Michael Bringmann)
This patch fixes some problems encountered at runtime with configurations that support memory-less nodes, or that hot-add CPUs into nodes that are memoryless during system execution after boot. The problems of interest include: * Nodes known to powerpc to be memoryless at boot, but to have CPUs in them, are allowed to be 'possible' and 'online'. Memory allocations for those nodes are taken from another node that does have memory until and if memory is hot-added to the node. * Nodes which have no resources assigned at boot, but which may still be referenced subsequently by affinity or associativity attributes, are kept in the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs to the system can reference these nodes and bring them online instead of redirecting the references to one of the set of nodes known to have memory at boot. Note that this software operates under the context of CPU hotplug. We are not doing memory hotplug in this code, but rather updating the kernel's CPU topology (i.e. arch_update_cpu_topology / numa_update_cpu_topology). We are initializing a node that may be used by CPUs or memory before it can be referenced as invalid by a CPU hotplug operation. CPU hotplug operations are protected by a range of APIs including cpu_maps_update_begin/cpu_maps_update_done, cpus_read/write_lock / cpus_read/write_unlock, device locks, and more. Memory hotplug operations, including try_online_node, are protected by mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the case of CPUs being hot-added to a previously memoryless node, the try_online_node operation occurs wholly within the CPU locks with no overlap. Using HMC hot-add/hot-remove operations, we have been able to add and remove CPUs to any possible node without failures. HMC operations involve a degree of self-serialization, though. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes  (Michael Bringmann)
On powerpc systems which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized. These empty nodes may occur when, * Dedicated vs. shared resources. Shared resources require information such as the VPHN hcall for CPU assignment to nodes. Associativity decisions made based on dedicated resource rules, such as associativity properties in the device tree, may vary from decisions made using the values returned by the VPHN hcall. * memoryless nodes at boot. Nodes need to be defined as 'possible' at boot for operation with other code modules. Previously, the powerpc code would limit the set of possible nodes to those which have memory assigned at boot, and were thus online. Subsequent add/remove of CPUs or memory would only work with this subset of possible nodes. * memoryless nodes with CPUs at boot. Due to the previous restriction on nodes, nodes that had CPUs but no memory were being collapsed into other nodes that did have memory at boot. In practice this meant that the node assignment presented by the runtime kernel differed from the affinity and associativity attributes presented by the device tree or VPHN hcalls. Nodes that might be known to the pHyp were not 'possible' in the runtime kernel because they did not have memory at boot. This patch ensures that sufficient nodes are defined to support configuration requirements after boot, as well as at boot. This patch set fixes a couple of problems. * Nodes known to powerpc to be memoryless at boot, but to have CPUs in them are allowed to be 'possible' and 'online'. Memory allocations for those nodes are taken from another node that does have memory until and if memory is hot-added to the node. * Nodes which have no resources assigned at boot, but which may still be referenced subsequently by affinity or associativity attributes, are kept in the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs to the system can reference these nodes and bring them online instead of redirecting to one of the set of nodes that were known to have memory at boot. This patch extracts the value of the lowest domain level (number of allocable resources) from the device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to setup as possibly available in the system. This new setting will override the instruction: nodes_and(node_possible_map, node_possible_map, node_online_map); presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If the "ibm,max-associativity-domains" property is not present at boot, no operation will be performed to define or enable additional nodes, or enable the above 'nodes_and()'. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
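A rough sketch of the discovery step; the property location under /rtas and the index used are assumptions based on the description above, not a copy of the patch:
  #include <linux/of.h>
  #include <linux/nodemask.h>

  static void sketch_find_possible_nodes(u32 depth)
  {
      struct device_node *rtas = of_find_node_by_path("/rtas");
      u32 max_nodes, i;

      if (!rtas)
          return;

      /* entry at 'depth' describes the lowest (allocable) domain level */
      if (!of_property_read_u32_index(rtas, "ibm,max-associativity-domains",
                                      depth, &max_nodes)) {
          for (i = 0; i < max_nodes; i++)
              if (!node_possible(i))
                  node_set(i, node_possible_map);   /* allow later hot-add */
      }

      of_node_put(rtas);
  }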
2018-01-27  powerpc/kernel: Block interrupts when updating TIDR  (Sukadev Bhattiprolu)
clear_thread_tidr() is called in interrupt context as a part of delayed put of the task structure (i.e. as a part of the timer interrupt). To prevent a deadlock, block interrupts when holding vas_thread_id_lock to set/clear TIDR for a task. Fixes: ec233ede4c86 ("powerpc: Add support for setting SPRN_TIDR") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
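The locking change boils down to the following pattern (sketch only; the surrounding TIDR allocator is omitted):
  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(vas_thread_id_lock);

  static void sketch_clear_tidr(int id)
  {
      unsigned long flags;

      /* irqsave variant: safe even when called from the timer interrupt
       * as part of the delayed put of the task structure */
      spin_lock_irqsave(&vas_thread_id_lock, flags);
      /* ... release the TIDR value 'id' ... */
      spin_unlock_irqrestore(&vas_thread_id_lock, flags);
  }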
2018-01-27  powerpc/powernv/ioda: Remove unnecessary pcidev from pci_dn  (Alexey Kardashevskiy)
The pcidev value stored in pci_dn is only used for NPU/NPU2 initialization. We can easily drop the cached pointer and use an ancient helper - pci_get_domain_bus_and_slot() instead in order to reduce complexity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
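The lookup-on-demand pattern that replaces the cached pointer looks roughly like this (the wrapper function is hypothetical):
  #include <linux/pci.h>

  static void sketch_npu_setup(int domain, unsigned int bus, unsigned int devfn)
  {
      struct pci_dev *pdev;

      pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
      if (!pdev)
          return;

      /* ... NPU/NPU2 initialization using pdev ... */

      pci_dev_put(pdev);    /* the helper returns a referenced device */
  }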
2018-01-27  powerpc/mm/nohash: do not flush the entire mm when range is a single page  (Christophe Leroy)
Most of the time, flush_tlb_range() is called on single pages. At present, flush_tlb_range() unconditionally calls flush_tlb_mm(), which flushes at least all the pages of the PID and, on older CPUs like 4xx or 8xx, the entire TLB table. This patch calls flush_tlb_page() instead of flush_tlb_mm() when the range is a single page. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
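Conceptually the change is the following, assuming page-aligned range boundaries (a sketch, not the actual nohash implementation):
  #include <linux/mm.h>
  #include <asm/tlbflush.h>

  static void sketch_flush_tlb_range(struct vm_area_struct *vma,
                                     unsigned long start, unsigned long end)
  {
      if (end - start <= PAGE_SIZE)
          flush_tlb_page(vma, start);     /* single page: targeted flush */
      else
          flush_tlb_mm(vma->vm_mm);       /* larger range: full flush as before */
  }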
2018-01-27  x86: Mark hpa as a "Designated Reviewer" for the time being  (H. Peter Anvin)
Due to some unfortunate events, I have not been directly involved in the x86 kernel patch flow for a while now. I have also not been able to ramp back up by now like I had hoped to, and after reviewing what I will need to work on both internally at Intel and elsewhere in the near term, it is clear that I am not going to be able to ramp back up until late 2018 at the very earliest. It is not acceptable to not recognize that this load is currently taken by Ingo and Thomas without my direct participation, so I mark myself as R: (designated reviewer) rather than M: (maintainer) until further notice. This is in fact recognizing the de facto situation for the past few years. I have obviously no intention of going away, and I will do everything within my power to improve Linux on x86 and x86 for Linux. This, however, puts credit where it is due and reflects a change of focus. This patch also removes stale entries for portions of the x86 architecture which have not been maintained separately from arch/x86 for a long time. If there is a reason to re-introduce them then that can happen later. Signed-off-by: H. Peter Anvin <h.peter.anvin@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bruce Schlobohm <bruce.schlobohm@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-27  powerpc/pseries: Add Initialization of VF Bars  (Bryant G. Ly)
When enabling SR-IOV on the pseries platform, the VF BAR properties for a PF are reported on the device node in the device tree. This patch adds the IOV BAR resources to Linux structures from the device tree for later use when configuring SR-IOV by the PF driver. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV  (Bryant G. Ly)
After initial validation of SR-IOV resources, firmware will associate PEs to the dynamic VFs created within this call. This patch adds the association of PEs to the PF array of PE numbers indexed by VF. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  powerpc/eeh: Add EEH notify resume sysfs  (Bryant G. Ly)
Introduce a method for notify resume to be called from sysfs. With this patch one can now call notify resume from sysfs when it is supported by the platform. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> [mpe: Add NULL check, add empty versions to avoid #ifdefs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  powerpc/eeh: Add EEH operations to notify resume  (Bryant G. Ly)
When pseries SR-IOV is enabled and after a PF driver has resumed from EEH, the platform has to be notified of the event so the child VFs can be allowed to resume their normal recovery path. This patch makes the 'allow unfreeze' EEH operation platform-dependent and adds the call to the pseries EEH code. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  powerpc/pseries: Set eeh_pe of EEH_PE_VF type  (Bryant G. Ly)
To correctly use the EEH code one has to make sure that EEH_PE_VF is set for dynamically created VFs. Therefore this patch allocates an eeh_pe of eeh type EEH_PE_VF and associates the PE with its parent. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  PCI/AER: Add uevents in AER and EEH error/resume  (Bryant G. Ly)
Devices can go offline when errors are reported. This patch adds a change event to the kernel object and lets udev know of the error. When the device resumes, a change event is also emitted reporting the device as online. Therefore, EEH and AER events are better propagated to user space for PCI devices on all arches. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
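A hedged sketch of the reporting idea; the helper name and uevent payload below are assumptions for illustration, not the exact interface added by the patch:
  #include <linux/kobject.h>
  #include <linux/pci.h>

  static void sketch_report_error_state(struct pci_dev *pdev, bool recovered)
  {
      char *envp[] = {
          "ERROR_EVENT=1",
          recovered ? "DEVICE_ONLINE=1" : "DEVICE_ONLINE=0",
          NULL,
      };

      /* emit a 'change' uevent so udev sees the error/resume transition */
      kobject_uevent_env(&pdev->dev.kobj, KOBJ_CHANGE, envp);
  }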
2018-01-27  powerpc/eeh: Update VF config space after EEH  (Bryant G. Ly)
Add EEH platform operations for pseries to update the VF config space. With this change, after EEH the VF will have an updated config space on the pseries platform. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  ocxl: add MAINTAINERS entry  (Frederic Barrat)
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  ocxl: Documentation  (Frederic Barrat)
ocxl.rst gives a quick, high-level view of OpenCAPI. Update ioctl-number.txt to reflect the ioctl numbers being used by the ocxl driver. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> [mpe: Fix up mixed whitespace as spotted by gregkh] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-27  KVM: VMX: introduce alloc_loaded_vmcs  (Paolo Bonzini)
Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also allocate an MSR bitmap there. Cc: stable@vger.kernel.org # prereq for Spectre mitigation Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
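The refactoring is essentially the following grouping (a sketch based on the description; alloc_vmcs(), loaded_vmcs_init() and struct loaded_vmcs are the existing VMX-internal helpers and types, so this only makes sense inside vmx.c):
  static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
  {
      loaded_vmcs->vmcs = alloc_vmcs();
      if (!loaded_vmcs->vmcs)
          return -ENOMEM;

      loaded_vmcs->shadow_vmcs = NULL;
      loaded_vmcs_init(loaded_vmcs);
      /* a later patch can allocate the MSR bitmap here as well */
      return 0;
  }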
2018-01-27  KVM: nVMX: Eliminate vmcs02 pool  (Jim Mattson)
The potential performance advantages of a vmcs02 pool have never been realized. To simplify the code, eliminate the pool. Instead, a single vmcs02 is allocated per VCPU when the VCPU enters VMX operation. Cc: stable@vger.kernel.org # prereq for Spectre mitigation Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-by: Ameya More <ameya.more@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-26  Merge branch 'fix-lpm-map'  (Alexei Starovoitov)
Yonghong Song says: ==================== A kernel page fault which happens in the lpm map trie_get_next_key was reported by syzbot and Eric. The issue was introduced by commit b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map"). Patch #1 fixes the issue in the kernel and patch #2 adds a multithreaded test case in tools/testing/selftests/bpf/test_lpm_map. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  tools/bpf: add a multithreaded stress test in bpf selftests test_lpm_map  (Yonghong Song)
The new test will spawn four threads, doing map update, delete, lookup and get_next_key in parallel. It is able to reproduce the issue in the previous commit found by syzbot and Eric Dumazet. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf: fix kernel page fault in lpm map trie_get_next_key  (Yonghong Song)
Commit b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map") introduces a bug like the one below: if (!rcu_dereference(trie->root)) return -ENOENT; if (!key || key->prefixlen > trie->max_prefixlen) { root = &trie->root; goto find_leftmost; } ...... find_leftmost: for (node = rcu_dereference(*root); node;) { In the code after label find_leftmost, it is assumed that *root should not be NULL, but that is not true as it is possible trie->root is changed to NULL by an asynchronous delete operation. The issue was reported by syzbot and Eric Dumazet with the below error log: ...... kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 8033 Comm: syz-executor3 Not tainted 4.15.0-rc8+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:trie_get_next_key+0x3c2/0xf10 kernel/bpf/lpm_trie.c:682 ...... This patch fixes the issue by using a local rcu_dereference()'d pointer instead of *(&trie->root) later on. Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
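The essence of the fix, with simplified types (a conceptual sketch, not the patch itself): read the RCU-protected root exactly once into a local pointer and never go back through the saved &trie->root.
  #include <linux/rcupdate.h>

  struct sketch_node {
      struct sketch_node __rcu *child[2];
  };

  struct sketch_trie {
      struct sketch_node __rcu *root;
  };

  static struct sketch_node *sketch_leftmost(struct sketch_trie *trie)
  {
      struct sketch_node *node, *next;

      node = rcu_dereference(trie->root);    /* single read of the root */
      if (!node)
          return NULL;    /* a concurrent delete may have emptied the trie */

      while ((next = rcu_dereference(node->child[0])))
          node = next;    /* walk via the stable local pointer only */

      return node;
  }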
2018-01-26  Merge branches 'clk-aspeed', 'clk-lock-UP', 'clk-mediatek' and 'clk-allwinner' into clk-next  (Stephen Boyd)
* clk-aspeed: clk: aspeed: Handle inverse polarity of USB port 1 clock gate clk: aspeed: Fix return value check in aspeed_cc_init() clk: aspeed: Add reset controller clk: aspeed: Register gated clocks clk: aspeed: Add platform driver and register PLLs clk: aspeed: Register core clocks clk: Add clock driver for ASPEED BMC SoCs dt-bindings: clock: Add ASPEED constants * clk-lock-UP: clk: fix reentrancy of clk_enable() on UP systems * clk-mediatek: clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built clk: mediatek: Fix all warnings for missing struct clk_onecell_data clk: mediatek: fixup test-building of MediaTek clock drivers clk: mediatek: group drivers under independent menu * clk-allwinner: clk: sunxi-ng: a83t: Add M divider to TCON1 clock clk: sunxi-ng: fix the A64/H5 clock description of DE2 CCU clk: sunxi-ng: add support for Allwinner H3 DE2 CCU dt-bindings: fix the binding of Allwinner DE2 CCU of A83T and H3 clk: sunxi-ng: sun8i: a83t: Use sigma-delta modulation for audio PLL clk: sunxi-ng: sun8i: a83t: Add /2 fixed post divider to audio PLL clk: sunxi-ng: Support fixed post-dividers on NM style clocks clk: sunxi-ng: sun50i: a64: Add 2x fixed post-divider to MMC module clocks clk: sunxi-ng: Support fixed post-dividers on MP style clocks clk: sunxi: Use PTR_ERR_OR_ZERO()
2018-01-26  Merge branches 'clk-remove-asm-clkdev', 'clk-debugfs-fixes', 'clk-renesas' and 'clk-meson' into clk-next  (Stephen Boyd)
* clk-remove-asm-clkdev: clk: Move __clk_{get,put}() into private clk.h API clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks arch: Remove clkdev.h asm-generic from Kbuild clk: Prepare to remove asm-generic/clkdev.h blackfin: Use generic clkdev.h header * clk-debugfs-fixes: clk: Simplify debugfs registration clk: Fix debugfs_create_*() usage clk: Show symbolic clock flags in debugfs clk: Improve flags doc for of_clk_detect_critical() * clk-renesas: clk: renesas: r8a7796: Add FDP clock clk: renesas: cpg-mssr: Keep wakeup sources active during system suspend clk: renesas: mstp: Keep wakeup sources active during system suspend clk: renesas: r8a77970: Add LVDS clock * clk-meson: clk: meson-axg: fix potential NULL dereference in axg_clkc_probe() clk: meson-axg: make local symbol axg_gp0_params_table static clk: meson-axg: fix return value check in axg_clkc_probe() clk: meson: mpll: use 64-bit maths in params_from_rate clk: meson-axg: add clock controller drivers clk: meson-axg: add clocks dt-bindings required header dt-bindings: clock: add compatible variant for the Meson-AXG clk: meson: make the spinlock naming more specific clk: meson: gxbb: remove IGNORE_UNUSED from mmc clocks clk: meson: gxbb: fix wrong clock for SARADC/SANA
2018-01-26  Merge branch 'clk-divider-container' into clk-next  (Stephen Boyd)
* clk-divider-container: clk: divider: fix incorrect usage of container_of Plus fixup sprd/div.c to pass the width too.
2018-01-26  Merge branch 'bpf-improvements-and-fixes'  (Alexei Starovoitov)
Daniel Borkmann says: ==================== This set contains a small cleanup in cBPF prologue generation and otherwise fixes an outstanding issue related to BPF to BPF calls and exception handling. For details please see the related patches. Last but not least, the BPF selftests are extended with several new test cases. Thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf: add further test cases around div/mod and others  (Daniel Borkmann)
Update selftests to reflect recent changes and add various new test cases. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, arm: remove obsolete exception handling from div/mod  (Daniel Borkmann)
Since we've changed div/mod exception handling for src_reg in the eBPF verifier itself, remove the leftovers from the arm32 JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, mips64: remove unneeded zero check from div/mod with k  (Daniel Borkmann)
The verifier in both cBPF and eBPF rejects div/mod by a 0 imm, so such a program can never be loaded. Remove emitting such a test and reject it from being JITed instead (the latter is not strictly needed either, but given it is the practice in sparc64 and ppc64 today, it doesn't hurt to add it here as well). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Daney <david.daney@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, mips64: remove obsolete exception handling from div/mod  (Daniel Borkmann)
Since we've changed div/mod exception handling for src_reg in the eBPF verifier itself, remove the leftovers from the mips64 JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Daney <david.daney@cavium.com> Reviewed-by: David Daney <david.daney@cavium.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, sparc64: remove obsolete exception handling from div/mod  (Daniel Borkmann)
Since we've changed div/mod exception handling for src_reg in the eBPF verifier itself, remove the leftovers from the sparc64 JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, ppc64: remove obsolete exception handling from div/mod  (Daniel Borkmann)
Since we've changed div/mod exception handling for src_reg in the eBPF verifier itself, remove the leftovers from the ppc64 JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, s390x: remove obsolete exception handling from div/mod  (Daniel Borkmann)
Since we've changed div/mod exception handling for src_reg in the eBPF verifier itself, remove the leftovers from the s390x JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, arm64: remove obsolete exception handling from div/mod  (Daniel Borkmann)
Since we've changed div/mod exception handling for src_reg in the eBPF verifier itself, remove the leftovers from the arm64 JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf, x86_64: remove obsolete exception handling from div/mod  (Daniel Borkmann)
Since we've changed div/mod exception handling for src_reg in the eBPF verifier itself, remove the leftovers from the x86_64 JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  bpf: fix subprog verifier bypass by div/mod by 0 exception  (Daniel Borkmann)
One of the ugly leftovers from the early eBPF days is that div/mod operations based on registers have a hard-coded src_reg == 0 test in the interpreter as well as in JIT code generators that would return from the BPF program with exit code 0. This was basically adopted from cBPF interpreter for historical reasons. There are multiple reasons why this is very suboptimal and prone to bugs. To name one: the return code mapping for such abnormal program exit of 0 does not always match with a suitable program type's exit code mapping. For example, '0' in tc means action 'ok' where the packet gets passed further up the stack, which is just undesirable for such cases (e.g. when implementing policy) and also does not match with other program types. While trying to work out an exception handling scheme, I also noticed that programs crafted like the following will currently pass the verifier: 0: (bf) r6 = r1 1: (85) call pc+8 caller: R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 callee: frame1: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_1 10: (b4) (u32) r2 = (u32) 0 11: (b4) (u32) r3 = (u32) 1 12: (3c) (u32) r3 /= (u32) r2 13: (61) r0 = *(u32 *)(r1 +76) 14: (95) exit returning from callee: frame1: R0_w=pkt(id=0,off=0,r=0,imm=0) R1=ctx(id=0,off=0,imm=0) R2_w=inv0 R3_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0,call_1 to caller at 2: R0_w=pkt(id=0,off=0,r=0,imm=0) R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 from 14 to 2: R0=pkt(id=0,off=0,r=0,imm=0) R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 2: (bf) r1 = r6 3: (61) r1 = *(u32 *)(r1 +80) 4: (bf) r2 = r0 5: (07) r2 += 8 6: (2d) if r2 > r1 goto pc+1 R0=pkt(id=0,off=0,r=8,imm=0) R1=pkt_end(id=0,off=0,imm=0) R2=pkt(id=0,off=8,r=8,imm=0) R6=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 7: (71) r0 = *(u8 *)(r0 +0) 8: (b7) r0 = 1 9: (95) exit from 6 to 8: safe processed 16 insns (limit 131072), stack depth 0+0 Basically what happens is that in the subprog we make use of a div/mod by 0 exception and in the 'normal' subprog's exit path we just return skb->data back to the main prog. This has the implication that the verifier thinks we always get a pkt pointer in R0 while we still have the implicit 'return 0' from the div as an alternative unconditional return path earlier. Thus, R0 then contains 0, meaning back in the parent prog we get the address range of [0x0, skb->data_end] as read and writeable. Similar can be crafted with other pointer register types. Since i) BPF_ABS/IND is not allowed in programs that contain BPF to BPF calls (and generally it's also disadvised to use in native eBPF context), ii) unknown opcodes don't return zero anymore, iii) we don't return an exception code in dead branches, the only last missing case affected and to fix is the div/mod handling. What we would really need is some infrastructure to propagate exceptions all the way to the original prog unwinding the current stack and returning that code to the caller of the BPF program. In user space such exception handling for similar runtimes is typically implemented with setjmp(3) and longjmp(3) as one possibility which is not available in the kernel, though (kgdb used to implement it in kernel long time ago). I implemented a PoC exception handling mechanism into the BPF interpreter with porting setjmp()/longjmp() into x86_64 and adding a new internal BPF_ABRT opcode that can use a program specific exception code for all exception cases we have (e.g. div/mod by 0, unknown opcodes, etc). 
While this seems to work in the constrained BPF environment (meaning, here, we don't need to deal with state e.g. from memory allocations that we would need to undo before going into exception state), it still has various drawbacks: i) we would need to implement the setjmp()/longjmp() for every arch supported in the kernel and for x86_64, arm64, sparc64 JITs currently supporting calls, ii) it has unconditional additional cost on main program entry to store CPU register state in initial setjmp() call, and we would need some way to pass the jmp_buf down into ___bpf_prog_run() for the main prog and all subprogs, but also storing it on the stack is not really nice (other option would be per-cpu storage for this, but it also has the drawback that we need to disable preemption for every BPF program type). All in all this approach would add a lot of complexity. Another poor-man's solution would be to have some sort of additional shared register or scratch buffer to hold state for exceptions, and test that after every call return to chain returns and pass R0 all the way down to the BPF prog caller. This is also problematic in various ways: i) an additional register doesn't map well into JITs, and some other scratch space could only be on per-cpu storage, which, again has the side-effect that this only works when we disable preemption, or somewhere in the input context which is not available everywhere either, and ii) this adds significant runtime overhead by putting conditionals after each and every call, as well as implementation complexity. Yet another option is to teach the verifier that div/mod can return an integer, which however is also complex to implement as the verifier would need to walk such fake 'mov r0,<code>; exit;' sequence and there would still be no guarantee for having propagation of this further down to the BPF caller as proper exception code. For the parent prog, it is also not distinguishable from a normal return of a constant scalar value. The approach taken here is a completely different one with little complexity and no additional overhead involved in that we make use of the fact that a div/mod by 0 is undefined behavior. Instead of bailing out, we adapt the same behavior as on some major archs like ARMv8 [0] into eBPF as well: X div 0 results in 0, and X mod 0 results in X. aarch64 and aarch32 ISA do not generate any traps or otherwise abort program execution for unsigned divides. I verified this also with a test program compiled by gcc and clang, and the behavior matches with the spec. Going forward we adapt the eBPF verifier to emit such rewrites once div/mod by register was seen. cBPF is not touched and will keep existing 'return 0' semantics. Given the options, it seems the most suitable of all of them, also since major archs have similar schemes in place. Given this is all in the realm of undefined behavior, we still have the option to adapt if deemed necessary and this way we would also have the option of more flexibility from the LLVM code generation side (which is then fully visible to the verifier). Thus, this patch i) fixes the panic seen in the above program and ii) doesn't bypass the verifier observations. [0] ARM Architecture Reference Manual, ARMv8 [ARM DDI 0487B.b] http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487b.b/DDI0487B_b_armv8_arm.pdf 1) aarch64 instruction set: section C3.4.7 and C6.2.279 (UDIV) "A division by zero results in a zero being written to the destination register, without any indication that the division by zero occurred."
2) aarch32 instruction set: section F1.4.8 and F5.1.263 (UDIV) "For the SDIV and UDIV instructions, division by zero always returns a zero result." Fixes: f4d7e40a5b71 ("bpf: introduce function calls (verification)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
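The runtime semantics the verifier rewrite enforces for register-based division can be modeled in plain C as follows (an illustrative user-space model mirroring the ARMv8 UDIV behavior quoted above):
  #include <stdint.h>

  static uint64_t bpf_div_model(uint64_t x, uint64_t y)
  {
      return y ? x / y : 0;    /* X div 0 results in 0 */
  }

  static uint64_t bpf_mod_model(uint64_t x, uint64_t y)
  {
      return y ? x % y : x;    /* X mod 0 results in X */
  }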
2018-01-26  bpf: make unknown opcode handling more robust  (Daniel Borkmann)
Recent findings by syzkaller fixed in 7891a87efc71 ("bpf: arsh is not supported in 32 bit alu thus reject it") triggered a warning in the interpreter due to an unknown opcode not being rejected by the verifier. The 'return 0' for an unknown opcode is really not optimal, since with BPF to BPF calls, this would go untracked by the verifier. Do two things here to improve the situation: i) perform a basic insn sanity check early on in the verification phase and reject every non-uapi insn right there. The bpf_opcode_in_insntable() table reuses the same mapping as the jumptable in ___bpf_prog_run() sans the non-public mappings. And ii) in ___bpf_prog_run() we do need to BUG in the case where the verifier would ever create an unknown opcode due to some rewrites. Note that JITs do not have such issues since they would punt to the interpreter in these situations. Moreover, BPF_JIT_ALWAYS_ON would also help to avoid such unknown opcodes in the first place. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
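A rough sketch of the table-based check; the entries are abbreviated, while the real bpf_opcode_in_insntable() covers every uapi opcode and mirrors the interpreter's jump table:
  #include <linux/bpf.h>

  static const bool sketch_public_insn[256] = {
      [BPF_ALU64 | BPF_ADD | BPF_X] = true,
      [BPF_ALU64 | BPF_ADD | BPF_K] = true,
      [BPF_JMP | BPF_EXIT]          = true,
      /* ... one entry per public opcode ... */
  };

  static bool sketch_opcode_in_insntable(u8 code)
  {
      return sketch_public_insn[code];    /* reject anything not marked */
  }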
2018-01-26  bpf: improve dead code sanitizing  (Daniel Borkmann)
Given we recently had c131187db2d3 ("bpf: fix branch pruning logic") and 95a762e2c8c9 ("bpf: fix incorrect sign extension in check_alu_op()") in particular, where before the verifier skipped verification of the wrongly assumed dead branch, we should not just replace the dead code parts with nops (mov r0,r0). If there is a bug such as the one fixed in 95a762e2c8c9 in the future again, where the runtime could execute those insns, then one of the potential issues with the current setting would be that, given the nops would be at the end of the program, we could execute out of bounds at some point. The best in such a case would be to just exit the BPF program altogether and return an exception code. However, given this would require two instructions, and such a dead code gap could just be a single insn long, we would need to place an 'r0 = X; ret' snippet at the very end after the user program or at the start before the program (where we'd skip that region on prog entry), and then place unconditional ja's into the dead code gap. While more complex but possible, there's still another roadblock that currently prevents this, namely BPF to BPF calls. The issue here is that such an exception could be returned from a callee, but the caller would not know that it's an exception that needs to be propagated further down. An alternative that has little complexity is to just use a 'ja -1' code for now, which will trap the execution here instead of silently doing bad things if we ever get there due to bugs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
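A sketch of the sanitizing step described above, using the kernel's bpf_insn helpers; treat it as an approximation of the verifier change, not the exact diff:
  #include <linux/filter.h>

  static void sketch_sanitize_dead_code(struct bpf_insn *insns, u32 start, u32 cnt)
  {
      /* 'ja -1' jumps back onto itself, trapping execution in place
       * instead of letting a nop sled run off the end of the program */
      const struct bpf_insn ja = BPF_JMP_IMM(BPF_JA, 0, 0, -1);
      u32 i;

      for (i = start; i < start + cnt; i++)
          insns[i] = ja;
  }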
2018-01-26  bpf: xor of a/x in cbpf can be done in 32 bit alu  (Daniel Borkmann)
Very minor optimization; saves 1 byte per program in x86_64 JIT in cBPF prologue. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-26  Merge branches 'clk-iproc', 'clk-mvebu' and 'clk-qcom-a53' into clk-next  (Stephen Boyd)
* clk-iproc: clk: iproc: Minor tidy up of iproc pll data structures clk: iproc: Allow plls to do minor rate changes without reset clk: iproc: Fix error in the pll post divider rate calculation clk: iproc: Allow iproc pll to runtime calculate vco parameters * clk-mvebu: clk: mvebu: armada-37xx-periph: Use PTR_ERR_OR_ZERO() * clk-qcom-a53: clk: qcom: Add APCS clock controller support clk: qcom: Add regmap mux-div clocks support clk: qcom: Add A53 PLL support
2018-01-26  Merge branches 'clk-at91', 'clk-imx7ulp', 'clk-axigen', 'clk-si5351' and 'clk-pxa' into clk-next  (Stephen Boyd)
* clk-at91: clk: at91: pmc: Support backup for programmable clocks clk: at91: pmc: Save SCSR during suspend clk: at91: pmc: Wait for clocks when resuming * clk-imx7ulp: clk: Don't touch hardware when reparenting during registration * clk-axigen: clk: axi-clkgen: Round closest in round_rate() and recalc_rate() clk: axi-clkgen: Correctly handle nocount bit in recalc_rate() * clk-si5351: clk: si5351: _si5351_clkout_reset_pll() can be static clk: si5351: Do not enable parent clocks on probe clk: si5351: Rename internal plls to avoid name collisions clk: si5351: Apply PLL soft reset before enabling the outputs clk: si5351: Add DT property to enable PLL reset clk: si5351: implement remove handler * clk-pxa: clk: pxa: unbreak lookup of CLK_POUT
2018-01-26  Merge branches 'clk-spreadtrum', 'clk-mvebu-dvfs', 'clk-qoriq', 'clk-imx' and 'clk-qcom-ipq8074' into clk-next  (Stephen Boyd)
* clk-spreadtrum: clk: sprd: add clocks support for SC9860 clk: sprd: Add dt-bindings include file for SC9860 dt-bindings: Add Spreadtrum clock binding documentation clk: sprd: add adjustable pll support clk: sprd: add composite clock support clk: sprd: add divider clock support clk: sprd: add mux clock support clk: sprd: add gate clock support clk: sprd: Add common infrastructure clk: move clock common macros out from vendor directories * clk-mvebu-dvfs: clk: mvebu: armada-37xx-periph: add DVFS support for cpu clocks clk: mvebu: armada-37xx-periph: prepare cpu clk to be used with DVFS clk: mvebu: armada-37xx-periph: cosmetic changes * clk-qoriq: clk: qoriq: add more divider clocks support * clk-imx: clk: imx51: uart4, uart5 gates only exist on imx50, imx53 * clk-qcom-ipq8074: clk: qcom: ipq8074: add misc resets for PCIE and NSS dt-bindings: clock: qcom: add misc resets for PCIE and NSS clk: qcom: ipq8074: add GP and Crypto clocks clk: qcom: ipq8074: add NSS ethernet port clocks clk: qcom: ipq8074: add NSS clocks clk: qcom: ipq8074: add PCIE, USB and SDCC clocks clk: qcom: ipq8074: add remaining PLL’s dt-bindings: clock: qcom: add remaining clocks for IPQ8074 clk: qcom: ipq8074: fix missing GPLL0 divider width clk: qcom: add parent map for regmap mux clk: qcom: add read-only divider operations
2018-01-26  Merge branches 'clk-qcom-alpha-pll', 'clk-check-ops-ptr', 'clk-protect-rate' and 'clk-omap' into clk-next  (Stephen Boyd)
* clk-qcom-alpha-pll: clk: qcom: add read-only alpha pll post divider operations clk: qcom: support for 2 bit PLL post divider clk: qcom: support Brammo type Alpha PLL clk: qcom: support Huayra type Alpha PLL clk: qcom: support for dynamic updating the PLL clk: qcom: support for alpha mode configuration clk: qcom: flag for 64 bit CONFIG_CTL clk: qcom: fix 16 bit alpha support calculation clk: qcom: support for alpha pll properties * clk-check-ops-ptr: clk: check ops pointer on clock register * clk-protect-rate: clk: fix set_rate_range when current rate is out of range clk: add clk_rate_exclusive api clk: cosmetic changes to clk_summary debugfs entry clk: add clock protection mechanism to clk core clk: use round rate to bail out early in set_rate clk: rework calls to round and determine rate callbacks clk: add clk_core_set_phase_nolock function clk: take the prepare lock out of clk_core_set_parent clk: fix incorrect usage of ENOSYS * clk-omap: clk: ti: Drop legacy clk-3xxx-legacy code
2018-01-26  clk: aspeed: Handle inverse polarity of USB port 1 clock gate  (Benjamin Herrenschmidt)
The USB port 1 clock gate control has an inverted polarity compared to all the other clock gates in the chip. This patch makes the aspeed_clk_{enable,disable} functions honor the CLK_GATE_SET_TO_DISABLE flag and sets that flag appropriately, so it is set for all clocks except USB port 1. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
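Using the generic gate helper for illustration (the aspeed driver has its own gate implementation, but the flag works the same way), the registration logic amounts to:
  #include <linux/clk-provider.h>

  static struct clk_hw *sketch_register_aspeed_gate(struct device *dev,
                                                    const char *name,
                                                    const char *parent,
                                                    void __iomem *reg, u8 bit,
                                                    bool inverted,
                                                    spinlock_t *lock)
  {
      /* most gates: writing 1 stops the clock, so set-to-disable;
       * USB port 1: writing 1 enables it, so leave the flag clear */
      u8 gate_flags = inverted ? 0 : CLK_GATE_SET_TO_DISABLE;

      return clk_hw_register_gate(dev, name, parent, 0, reg, bit,
                                  gate_flags, lock);
  }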
2018-01-26  clk: aspeed: Fix return value check in aspeed_cc_init()  (Wei Yongjun)
In case of error, the function of_iomap() returns a NULL pointer, not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with a NULL test. Fixes: a2e230c7b2ea ("clk: Add clock driver for ASPEED BMC SoCs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
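The corrected check follows the usual of_iomap() convention (sketch):
  #include <linux/of_address.h>

  static void __iomem *sketch_map_cc_regs(struct device_node *np)
  {
      void __iomem *base = of_iomap(np, 0);

      if (!base)          /* of_iomap() signals failure with NULL, not ERR_PTR() */
          return NULL;    /* before the fix this was a (wrong) IS_ERR() test */

      return base;
  }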
2018-01-26  clk: aspeed: Add reset controller  (Joel Stanley)
There are some resets that are not associated with gates. These are represented by a reset controller. Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2018-01-26  clk: aspeed: Register gated clocks  (Joel Stanley)
The majority of the clocks in the system are gates paired with a reset controller that holds the IP in reset. This borrows from clk_hw_register_gate, but registers two 'gates', one to control the clock enable register and the other to control the reset IP. This allows us to enforce the ordering: 1. Place IP in reset 2. Enable clock 3. Delay 4. Release reset There are some gates that do not have an associated reset; these are handled by using -1 as the index for the reset. Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
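The enable ordering can be pictured as follows; the register layout, bit polarity and delay value are illustrative assumptions, not the actual ASPEED register map:
  #include <linux/delay.h>
  #include <linux/io.h>

  struct sketch_aspeed_gate {         /* hypothetical per-gate state */
      void __iomem *clk_stop;         /* clock stop control register */
      void __iomem *reset;            /* reset control register */
      u32 bit;
  };

  static void sketch_aspeed_gate_enable(struct sketch_aspeed_gate *g)
  {
      writel(readl(g->reset) | g->bit, g->reset);           /* 1. place IP in reset */
      writel(readl(g->clk_stop) & ~g->bit, g->clk_stop);    /* 2. enable clock (bit set = stopped) */
      udelay(10);                                           /* 3. delay */
      writel(readl(g->reset) & ~g->bit, g->reset);          /* 4. release reset */
  }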
2018-01-26  clk: aspeed: Add platform driver and register PLLs  (Joel Stanley)
This registers a platform driver to set up all of the non-core clocks. The clocks that have configurable rates are now registered. Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2018-01-26  clk: aspeed: Register core clocks  (Joel Stanley)
This registers the core clocks; those which are required to calculate the rate of the timer peripheral so the system can load a clocksource driver. Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>