path: root/arch
2016-06-13  s390: avoid extable collisions  (Heiko Carstens)
We have some inline assemblies where the extable entry points to a label at the end of an inline assembly which is not followed by an instruction. On the other hand we also have inline assemblies where the extable entry points to the first instruction of an inline assembly. If a first-type inline asm (extable points to an empty label at the end) were directly followed by a second-type inline asm (extable points to the first instruction), we would have two different extable entries that point to the same instruction but have different target addresses. This can lead to quite random behaviour, depending on sorting order. I verified that we currently do not have such collisions within the kernel. However, to avoid such subtle bugs, add a couple of nop instructions to those inline assemblies which contain an extable that points to an empty label. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
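A hedged sketch of the fix, using the s390 EX_TABLE() asm macro; the instruction and operands are illustrative:

    unsigned long val;

    asm volatile(
        "0: lg  %0,0(%1)\n"     /* load that may fault */
        "1: nop\n"              /* added: the extable target is now a real
                                   instruction, not an empty label at the
                                   end of the asm */
        EX_TABLE(0b,1b)
        : "=d" (val) : "a" (ptr));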
2016-06-13  s390/uaccess: use __builtin_expect for get_user/put_user  (Heiko Carstens)
We always expect that get_user and put_user return zero. Give the compiler a hint so it can slightly optimize the code and avoid branches. This is the same as what x86 got with commit a76cf66e948a ("x86/uaccess: Tell the compiler that uaccess is unlikely to fault"). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
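A minimal sketch of the pattern, assuming __put_user_fn() as the internal helper (as on s390); returning the error code through __builtin_expect() tells the compiler the fault path is cold:

    #define put_user(x, ptr)                                        \
    ({                                                              \
        __typeof__(*(ptr)) __x = (x);                               \
        int __pu_err = __put_user_fn(&__x, (ptr), sizeof(*(ptr)));  \
        __builtin_expect(__pu_err, 0);                              \
    })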
2016-06-13  s390/uaccess: fix whitespace damage  (Heiko Carstens)
Fix some whitespace damage that was introduced by me with a query-replace when removing 31 bit support. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/pci: ensure to not cross a dma segment boundary  (Sebastian Ott)
When we use the iommu_area_alloc helper to get dma addresses we specify the boundary_size parameter but not the offset (called shift in this context). As long as the offset (start_dma) is a multiple of the boundary we're ok (on current machines start_dma always seems to be 4GB). Don't leave this to chance and specify the offset for iommu_area_alloc. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
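A hedged sketch of the fixed call, following the iommu_area_alloc() signature in lib/iommu-helper.c (zdev fields as in arch/s390/pci):

    boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1, PAGE_SIZE)
                    >> PAGE_SHIFT;
    return iommu_area_alloc(zdev->iommu_bitmap, zdev->iommu_pages,
                            start, size,
                            zdev->start_dma >> PAGE_SHIFT,  /* offset/shift */
                            boundary_size, 0);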
2016-06-13  s390/pci: ensure page aligned dma start address  (Sebastian Ott)
We don't have an architectural guarantee on the value of the dma offset but rely on it to be at least page aligned. Enforce page alignment of start_dma. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
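The enforcement can be a one-liner (hedged; start_dma as above):

    /* round the reported dma offset up to the next page boundary */
    zdev->start_dma = PAGE_ALIGN(zdev->start_dma);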
2016-06-13  s390: use SPARSE_IRQ  (Sebastian Ott)
Use dynamically allocated irq descriptors on s390 which allows us to get rid of the s390 specific config option PCI_NR_MSI and exploit more MSI interrupts. Also the size of the kernel image is reduced by 131K (using performance_defconfig). Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390: use __section macro everywhere  (Heiko Carstens)
Small cleanup patch to use the shorter __section macro everywhere. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390: add proper __ro_after_init support  (Heiko Carstens)
On s390 __ro_after_init is currently mapped to __read_mostly, which means that data marked as __ro_after_init will not be protected. The reason is that the common code __ro_after_init implementation is x86-centric: the ro_after_init data section was added to rodata, since x86 enables write protection of kernel text and rodata very late. On s390 we have write protection for these sections enabled with the initial page tables, so adding the ro_after_init data section to rodata does not work on s390. In order to make __ro_after_init work properly on s390, move the ro_after_init data right behind rodata. Unlike the rodata section it will be marked read-only later, after all init calls have happened. This s390-specific implementation adds new __start_ro_after_init and __end_ro_after_init labels; everything in between will be marked read-only after the init calls. In addition, move the exception table there as well, since from a practical point of view it fits the __ro_after_init requirements. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
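Usage is the generic one; a hedged example of data that is writable during the initcalls and read-only afterwards:

    static unsigned long feature_table[16] __ro_after_init;

    static int __init init_features(void)
    {
        int i;

        for (i = 0; i < 16; i++)
            feature_table[i] = i;   /* still writable here */
        return 0;
    }
    early_initcall(init_features);
    /* once the init calls are done, s390 write-protects everything
       between __start_ro_after_init and __end_ro_after_init */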
2016-06-13  s390/mm: simplify the TLB flushing code  (Martin Schwidefsky)
ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to decide between a lazy TLB flush and an immediate TLB flush. The field contains two 16-bit counters: the number of CPUs that have the mm attached and can create TLB entries for it, and the number of CPUs in the middle of a page table update. The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions use the attach counter and a mask check with mm_cpumask(mm) to decide between a local flush on the current CPU and a global flush. For all these functions the decision between lazy vs immediate and local vs global TLB flush can be based on CPU masks. There are two masks: mm->context.cpu_attach_mask with the CPUs that are actively using the mm, and mm_cpumask(mm) with the CPUs that have used the mm since the last full flush. The decision between lazy vs immediate flush is based on mm->context.cpu_attach_mask; to decide between local vs global flush, mm_cpumask(mm) is used. With this patch all checks use the CPU masks, and the old counter mm->context.attach_count with its two 16-bit values is turned into a single counter mm->context.flush_count that keeps track of the number of CPUs with incomplete page table updates. The sole user of this counter is finish_arch_post_lock_switch(), which waits for the end of all page table updates. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
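A hedged sketch of the mask-based decision (helper names approximate the s390 tlbflush code; the caller runs with preemption disabled):

    static inline void __tlb_flush_mm(struct mm_struct *mm)
    {
        if (cpumask_equal(mm_cpumask(mm),
                          cpumask_of(smp_processor_id())))
            __tlb_flush_local();    /* no other CPU ever used the mm */
        else
            __tlb_flush_global();   /* other CPUs may hold TLB entries */
    }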
2016-06-13  s390/mm: fix vunmap vs finish_arch_post_lock_switch  (Martin Schwidefsky)
The vunmap_pte_range() function calls ptep_get_and_clear() without any locking. ptep_get_and_clear() uses ptep_xchg_lazy()/ptep_flush_direct() for the page table update. ptep_flush_direct requires that preemption is disabled, but without any locking this is not the case. If the kernel preempts the task while the attach_count is increased, an endless loop in finish_arch_post_lock_switch() will occur the next time the task is scheduled. Add explicit preempt_disable()/preempt_enable() calls to the relevant functions in arch/s390/mm/pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
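A hedged sketch of the shape of the fix in arch/s390/mm/pgtable.c:

    pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
                           pte_t *ptep, pte_t new)
    {
        pte_t old;

        preempt_disable();      /* added: no task switch while the
                                   update is accounted as in progress */
        old = *ptep;
        ptep_flush_direct(mm, addr, ptep);
        *ptep = new;
        preempt_enable();
        return old;
    }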
2016-06-13  s390/time: remove ETR support  (Martin Schwidefsky)
The External-Time-Reference (ETR) clock synchronization interface has been superseded by Server-Time-Protocol (STP). Remove the outdated ETR interface. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/time: add leap seconds to initial system time  (Martin Schwidefsky)
The PTFF instruction can be used to retrieve information about UTC including the current number of leap seconds. Use this value to convert the coordinated server time value of the TOD clock to a proper UTC timestamp to initialize the system time. Without this correction the system time will be off by the number of leap seconds until it has been corrected via NTP. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/time: LPAR offset handling  (Martin Schwidefsky)
It is possible to specify a user offset for the TOD clock, e.g. +2 hours. The TOD clock will carry this offset even if the clock is synchronized with STP. This makes the timestamps acquired with get_sync_clock() useless, as another LPAR might use a different TOD offset. Use the PTFF instruction to get the TOD epoch difference and subtract it from the TOD clock value to get a physical timestamp. As the epoch difference contains the sync check delta as well, the LPAR offset value to the physical clock needs to be refreshed after each clock synchronization. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/time: move PTFF definitions  (Martin Schwidefsky)
The PTFF instruction is not a function of ETR, rename and move the PTFF definitions from etr.h to timex.h. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/time: STP sync clock correction  (Martin Schwidefsky)
The sync clock operation of the channel subsystem call for STP delivers the TOD clock difference as a result. Use this TOD clock difference instead of the difference between the TOD timestamps before and after the sync clock operation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/kexec: fix crash on resize of reserved memory  (Heiko Carstens)
Reducing the size of the reserved memory for the crash kernel will result in an immediate crash on s390. The reason is that we do not create struct pages for memory that is reserved; if that memory is freed, any access to the struct pages which correspond to this memory will result in invalid memory accesses and a kernel panic. Fix this by properly creating struct pages when the system gets initialized. Also change the code to make use of set_memory_ro() and set_memory_rw() so page tables will be split if required. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/kexec: fix update of os_info crash kernel size  (Heiko Carstens)
Implement an s390 version of the weak crash_free_reserved_phys_range function. This allows us to update the size of the reserved crash kernel memory when it is resized. This was previously done with a call to crash_unmap_reserved_pages from crash_shrink_memory, which was removed with commit 7a0058ec7860 ("s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()"). Fixes: 7a0058ec7860 ("s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
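A hedged sketch of such an override; free_reserved_page() and os_info_crashkernel_add() are existing helpers, the exact bookkeeping is assumed:

    void crash_free_reserved_phys_range(unsigned long begin,
                                        unsigned long end)
    {
        unsigned long addr;

        for (addr = begin; addr < end; addr += PAGE_SIZE)
            free_reserved_page(pfn_to_page(addr >> PAGE_SHIFT));
        /* record the shrunken reservation size in os_info */
        os_info_crashkernel_add(crashk_res.start,
                                begin - crashk_res.start);
    }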
2016-06-13  s390/mm: align swapper_pg_dir to 16k  (Heiko Carstens)
The segment/region table that is part of the kernel image must be properly aligned to 16k in order to make the crdte inline assembly work; otherwise it would calculate a wrong segment/region table start address and access incorrect memory locations. Therefore define BSS_FIRST_SECTIONS in order to put the swapper_pg_dir at the beginning of the bss section, and also align the bss section to 16k just like other architectures did. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390: dump_stack: fill in arch description  (Christian Borntraeger)
Let's provide the basic machine information for dump_stack on s390. This enables the "Hardware name:" line and results in output like:

    [...]
    Oops: 0004 ilc:2 [#1] SMP
    Modules linked in:
    CPU: 1 PID: 74 Comm: sh Not tainted 4.5.0+ #205
    Hardware name: IBM 2964 NC9 704 (KVM)
    [...]

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390: use canonical include guard style  (Daniel van Gerpen)
Signed-off-by: Daniel van Gerpen <daniel@vangerpen.de> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/cpuinfo: show dynamic and static cpu mhz  (Heiko Carstens)
Show the dynamic and static cpu mhz of each cpu. Since these values are per cpu, this requires a fundamental extension of the format of /proc/cpuinfo. Historically we had only a single line per cpu and a summary at the top of the file. This format is hardly extensible if we want to add more per-cpu information. Therefore this patch adds per-cpu blocks at the end of /proc/cpuinfo:

    cpu             : 0
    cpu Mhz dynamic : 5504
    cpu Mhz static  : 5504

    cpu             : 1
    cpu Mhz dynamic : 5504
    cpu Mhz static  : 5504

    cpu             : 2
    cpu Mhz dynamic : 5504
    cpu Mhz static  : 5504

    cpu             : 3
    cpu Mhz dynamic : 5504
    cpu Mhz static  : 5504

Right now each block contains only the dynamic and static cpu mhz, but it can be easily extended like on every other architecture. This extension is supposed to be compatible with the old format. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/cpuinfo: print cache info and all single cpu lines on first iteration  (Heiko Carstens)
Change the code to print all the current output during the first iteration. This is a preparation patch for the upcoming per cpu block extension to /proc/cpuinfo. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390: add explicit <linux/stringify.h> for jump label  (Jason Baron)
Ensure that we always have __stringify(). Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
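A small hedged example of what the include provides (MY_MAGIC is illustrative):

    #include <linux/stringify.h>

    #define MY_MAGIC 0x42

    static inline void emit_magic(void)
    {
        /* the template expands to: ".long 0x42" */
        asm volatile(".long " __stringify(MY_MAGIC));
    }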
2016-06-13  s390/pgtable: add mapping statistics  (Heiko Carstens)
Add statistics that show how memory is mapped within the kernel identity mapping. This is more or less the same as git commit ce0c0e50f94e ("x86, generic: CPA add statistics about state of direct mapping v4") for x86. I also intentionally copied the lower case "k" within DirectMap4k vs the upper case "M" and "G" within the two other lines. Let's have consistent inconsistencies across architectures. The output of /proc/meminfo now contains these additional lines:

    DirectMap4k:        2048 kB
    DirectMap1M:     3991552 kB
    DirectMap2G:     4194304 kB

The implementation on s390 is lockless, unlike the x86 version, since I assume changes to the kernel mapping are a very rare event. Therefore it really doesn't matter if these statistics could potentially be inconsistent if read while kernel page tables are being changed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/vmem: simplify vmem code for read-only mappings  (Heiko Carstens)
For the kernel identity mapping, map everything read-writable and subsequently call set_memory_ro() to make the ro section read-only. This simplifies the code a lot. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/pageattr: allow kernel page table splitting  (Heiko Carstens)
set_memory_ro() and set_memory_rw() currently only work on 4k mappings, which is good enough for module code aka the vmalloc area. However we stumbled already twice into the need to make this also work on larger mappings: the ro after init patch set, and the crash kernel resize code. Therefore this patch implements automatic kernel page table splitting if e.g. set_memory_ro() would be called on parts of a 2G mapping. This works quite the same as the x86 code, but is much simpler. In order to make this work and to be architecturally compliant we now always use the csp, cspg or crdte instructions to replace valid page table entries. This means that set_memory_ro() and set_memory_rw() will be much more expensive than before. In order to avoid huge latencies the code contains a couple of cond_resched() calls. The current code only splits page tables, but does not merge them if it would be possible. The reason for this is that currently there is no real life scenario where this would really happen. All current use cases that I know of only change access rights once during the lifetime. If that should change we can still implement kernel page table merging at a later time. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
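A hedged sketch of the segment-table split; vmem_pte_alloc() exists on s390, while pgtable_pmd_replace() stands in for the csp/crdte-based replacement step:

    static int split_pmd_page(pmd_t *pmdp, unsigned long addr)
    {
        unsigned long pte_addr, prot;
        pte_t *pt_dir, *ptep;
        pmd_t new;
        int i, ro;

        pt_dir = vmem_pte_alloc();          /* new 4K page table */
        if (!pt_dir)
            return -ENOMEM;
        pte_addr = pmd_pfn(*pmdp) << PAGE_SHIFT;
        ro = !!(pmd_val(*pmdp) & _SEGMENT_ENTRY_PROTECT);
        prot = pgprot_val(ro ? PAGE_KERNEL_RO : PAGE_KERNEL);
        for (i = 0, ptep = pt_dir; i < PTRS_PER_PTE; i++, ptep++) {
            pte_val(*ptep) = pte_addr | prot;
            pte_addr += PAGE_SIZE;
        }
        pmd_val(new) = __pa(pt_dir) | _SEGMENT_ENTRY;
        pgtable_pmd_replace(pmdp, new, addr);   /* architecturally safe swap */
        return 0;
    }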
2016-06-13  s390/pgtable: make pmd and pud helper functions available  (Heiko Carstens)
Make pmd_wrprotect() and pmd_mkwrite() available independently from CONFIG_TRANSPARENT_HUGEPAGE and CONFIG_HUGETLB_PAGE so these can be used on the kernel mapping. Also introduce a couple of pud helper functions, namely pud_pfn(), pud_wrprotect(), pud_mkwrite(), pud_mkdirty() and pud_mkclean() which only work on the kernel mapping. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
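One of the new helpers could look like this (hedged; bit names from the s390 pgtable.h):

    static inline pud_t pud_wrprotect(pud_t pud)
    {
        pud_val(pud) &= ~_REGION3_ENTRY_WRITE;
        pud_val(pud) |= _REGION_ENTRY_PROTECT;
        return pud;
    }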
2016-06-13  s390/mm: always use PAGE_KERNEL when mapping pages  (Heiko Carstens)
Always use PAGE_KERNEL when re-enabling pages within the kernel mapping due to debug pagealloc. Without using this pgprot value pte_mkwrite() and pte_wrprotect() won't work on such mappings after an unmap -> map cycle anymore. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/vmem: make use of pte_clear()  (Heiko Carstens)
Use pte_clear() instead of open-coding it. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
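The change amounts to (hedged):

    /* before: open-coded invalidation */
    pte_val(*ptep) = _PAGE_INVALID;

    /* after: use the helper */
    pte_clear(&init_mm, addr, ptep);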
2016-06-13  s390/pgtable: get rid of _REGION3_ENTRY_RO  (Heiko Carstens)
_REGION3_ENTRY_RO is a duplicate of _REGION_ENTRY_PROTECT. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/vmem: introduce and use SEGMENT_KERNEL and REGION3_KERNEL  (Heiko Carstens)
Instead of open-coded SEGMENT_KERNEL and REGION3_KERNEL assignments use defines. Also to make e.g. pmd_wrprotect() work on the kernel mapping a couple more flags must be set. Therefore add the missing flags also. In order to make everything symmetrical this patch also adds software dirty, young, read and write bits for region 3 table entries. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/vmem: align segment and region tables to 16k  (Heiko Carstens)
Usually segment and region tables are 16k aligned due to the way the buddy allocator works. This is not true for the vmem code which only asks for a 4k alignment. In order to be consistent enforce a 16k alignment here as well. This alignment will be assumed and therefore is required by the pageattr code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13  s390/pgtable: introduce and use generic csp inline asm  (Heiko Carstens)
We already have two inline assemblies which make use of the csp instruction. Since I need a third instance, let's introduce a generic inline assembly which can be used by everyone. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
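A hedged sketch of the generic helper, close to the s390 version; csp ("compare and swap and purge") swaps a word and purges associated TLB entries:

    static inline void csp(unsigned int *ptr, unsigned int old,
                           unsigned int new)
    {
        register unsigned long reg2 asm("2") = old;
        register unsigned long reg3 asm("3") = new;
        unsigned long address = (unsigned long) ptr | 1;

        asm volatile(
            "   csp %0,%3"
            : "+d" (reg2), "+m" (*ptr)
            : "d" (reg3), "d" (address)
            : "cc");
    }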
2016-06-13  KVM: s390/mm: Fix CMMA reset during reboot  (Christian Borntraeger)
commit 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") factored out the page table handling code from __gmap_zap and __s390_reset_cmma into ptep_zap_unused and added a simple flag that selects which of the two behaviours (reset or not) is wanted. This also changed the behaviour, as it now also zaps unused page table entries on reset. Turns out that this is wrong, as s390_reset_cmma uses the page walker, which DOES NOT take the ptl lock. The simplest fix is to not do the zapping part on reset (which uses the walker). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: 1e133ab296f ("s390/mm: split arch/s390/mm/pgtable.c") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-10  Merge tag 'powerpc-4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)
Pull powerpc fixes from Michael Ellerman:
- ptrace: Fix out of bounds array access warning, from Khem Raj
- pseries: Fix PCI config address for DDW, from Gavin Shan
- pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added, from Michael Ellerman
- of: fix autoloading due to broken modalias with no 'compatible', from Wolfram Sang
- radix: Fix always false comparison against MMU_NO_CONTEXT, from Aneesh Kumar K.V
- hash: Compute the segment size correctly for ISA 3.0, from Aneesh Kumar K.V
- nohash: Fix build break with 64K pages, from Michael Ellerman

* tag 'powerpc-4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/nohash: Fix build break with 64K pages
  powerpc/mm/hash: Compute the segment size correctly for ISA 3.0
  powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT
  of: fix autoloading due to broken modalias with no 'compatible'
  powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
  powerpc/pseries: Fix PCI config address for DDW
  powerpc/ptrace: Fix out of bounds array access warning
2016-06-10  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds)
Pull arm64 fix from Will Deacon: "A fix for an issue that Alex saw whilst swapping with hardware access/dirty bit support enabled in the kernel: fix a failure to fault in old pages on a write when CONFIG_ARM64_HW_AFDBM is enabled."

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: always take dirty state from new pte in ptep_set_access_flags
2016-06-10  Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull x86 fixes from Ingo Molnar: "Misc fixes from all around the map, plus a commit that introduces a new header of Intel model name symbols (unused) that will make the next merge window easier."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
  x86/entry/traps: Don't force in_interrupt() to return true in IST handlers
  x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models
  x86/cpu/intel: Introduce macros for Intel family numbers
  x86, build: copy ldlinux.c32 to image.iso
  x86/msr: Use the proper trace point conditional for writes
2016-06-10  Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull perf fixes from Ingo Molnar: "A handful of tooling fixes, two PMU driver fixes and a cleanup of redundant code that addresses a security analyzer false positive."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Remove a redundant check
  perf/x86/intel/uncore: Remove SBOX support for Broadwell server
  perf ctf: Convert invalid chars in a string before set value
  perf record: Fix crash when kptr is restricted
  perf symbols: Check kptr_restrict for root
  perf/x86/intel/rapl: Fix pmus free during cleanup
2016-06-10  x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()  (Rui Wang)
On a 4-socket Brickland system, hot-removing one ioapic is fine. Hot-removing the 2nd one causes a panic in mp_unregister_ioapic() while calling release_resource(). It is because the iomem_res pointer has already been released when removing the first ioapic. To explain the use of &res[num] here: res is assigned to ioapic_resources, and later in ioapic_insert_resources() we do:

    struct resource *r = ioapic_resources;

    for_each_ioapic(i) {
        insert_resource(&iomem_resource, r);
        r++;
    }

Here 'r' is treated as an array of 'struct resource', and the r++ ensures that each element of the array is inserted separately. Thus we should call release_resource() on each element at &res[num]. Fix it by assigning the correct pointers to ioapics[i].iomem_res in ioapic_setup_resources(). Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: linux-pci@vger.kernel.org Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1465369193-4816-3-git-send-email-rui.y.wang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-10  x86/entry/traps: Don't force in_interrupt() to return true in IST handlers  (Andy Lutomirski)
Forcing in_interrupt() to return true if we're not in a bona fide interrupt confuses the softirq code. This fixes warnings like:

    NOHZ: local_softirq_pending 282

... which can happen when running things like selftests/x86. This will change perf's static percpu buffer usage in IST context. I think this is okay, and it's changing the behavior to match historical (pre-4.0) behavior. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 959274753857 ("x86, traps: Track entry into and exit from IST context") Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-10  powerpc/nohash: Fix build break with 64K pages  (Michael Ellerman)
Commit 74701d5947a6 "powerpc/mm: Rename function to indicate we are allocating fragments" renamed page_table_free() to pte_fragment_free(). One occurrence was mistyped as pte_fragment_fre(). This only breaks the nohash 64K page build, which is not the default or enabled in any defconfig. Fixes: 74701d5947a6 ("powerpc/mm: Rename function to indicate we are allocating fragments") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-09  Merge tag 'arc-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc  (Linus Torvalds)
Pull ARC fixes from Vineet Gupta:
- Revert of ll-sc backoff retry workaround in atomics/spinlocks as hardware is now proven to work just fine
- Typo fixes (Thanks Andrea Gelmini)
- Removal of obsolete DT property (Alexey)
- Other minor fixes

* tag 'arc-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  Revert "ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff"
  Revert "ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle"
  Revert "ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff"
  ARC: don't enable DISCONTIGMEM unconditionally
  ARC: [intc-compact] simplify code for 2 priority levels
  arc: Get rid of root core-frequency property
  Fix typos
2016-06-08  x86/cpu/AMD: Extend X86_FEATURE_TOPOEXT workaround to newer models  (Borislav Petkov)
We need to reenable the topology extensions CPUID leafs on newer models too, if BIOS has disabled them, as we rely on them to get proper compute unit topology. Make the printk a once thing, while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rui Huang <ray.huang@amd.com> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-hwmon@vger.kernel.org Link: http://lkml.kernel.org/r/1464775468-23355-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-08  x86/cpu/intel: Introduce macros for Intel family numbers  (Dave Hansen)
Problem: We have a boatload of open-coded family-6 model numbers. Half of them have these model numbers in hex and the other half in decimal. This makes grepping for them tons of fun, if you were to try. Solution: Consolidate all the magic numbers. Put all the definitions in one header. The names here are closely derived from the comments describing the models from arch/x86/events/intel/core.c. We could easily make them shorter by doing things like s/SANDYBRIDGE/SNB/, but they seemed fine even with the longer versions to me. Do not take any of these names too literally, like "DESKTOP" or "MOBILE". These are all colloquial names and not precise descriptions of everywhere a given model will show up. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com> Cc: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Vishwanath Somayaji <vishwanath.somayaji@intel.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: jacob.jun.pan@intel.com Cc: linux-acpi@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: platform-driver-x86@vger.kernel.org Link: http://lkml.kernel.org/r/20160603001927.F2A7D828@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
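A hedged excerpt in the spirit of the new header (the values are the familiar family-6 model numbers; apply_quirk() is illustrative):

    /* asm/intel-family.h, excerpt */
    #define INTEL_FAM6_SANDYBRIDGE      0x2A
    #define INTEL_FAM6_IVYBRIDGE        0x3A
    #define INTEL_FAM6_HASWELL_CORE     0x3C
    #define INTEL_FAM6_BROADWELL_CORE   0x3D
    #define INTEL_FAM6_SKYLAKE_DESKTOP  0x5E

    /* usage */
    if (c->x86 == 6 && c->x86_model == INTEL_FAM6_SKYLAKE_DESKTOP)
        apply_quirk();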
2016-06-08  arm64: mm: always take dirty state from new pte in ptep_set_access_flags  (Will Deacon)
Commit 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") ensured that pte flags are updated atomically in the face of potential concurrent, hardware-assisted updates. However, Alex reports that:

    | This patch breaks swapping for me.
    | In the broken case, you'll see either systemd cpu time spike
    | (because it's stuck in a page fault loop) or the system hang
    | (because the application owning the screen is stuck in a page
    | fault loop).

It turns out that this is because the 'dirty' argument to ptep_set_access_flags is always 0 for read faults, and so we can't use it to set PTE_RDONLY. The failing sequence is:

    1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
    2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
    3. A read faults due to the missing access flag
    4. ptep_set_access_flags is called with dirty = 0, due to the read fault
    5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
    6. A write faults, but pte_write is true so we get stuck

The solution is to check the new page table entry (as would be done by the generic, non-atomic definition of ptep_set_access_flags that just calls set_pte_at) to establish the dirty state. Cc: <stable@vger.kernel.org> # 4.3+ Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Alexander Graf <agraf@suse.de> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
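A hedged sketch of the heart of the fix in ptep_set_access_flags(): derive the read-only bit from the new entry instead of the 'dirty' argument (arm64 bit and helper names; exact logic approximated):

    /* only preserve the access flags, write permission and dirty state */
    pte_val(entry) &= PTE_AF | PTE_WRITE | PTE_DIRTY;

    /* read-only unless the new pte itself is writable and dirty */
    if (!pte_write(entry) || !pte_sw_dirty(entry))
        pte_val(entry) |= PTE_RDONLY;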
2016-06-08  powerpc/mm/hash: Compute the segment size correctly for ISA 3.0  (Aneesh Kumar K.V)
PowerISA 3.0 encodes the segment size in the second half of hash page table entry. Update hpte_decode() accordingly. Fixes: 50de596de8be ("powerpc/mm/hash: Add support for Power9 Hash") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-08  powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT  (Aneesh Kumar K.V)
In some of the radix TLB flush routines, we use a local to store the mm->context.id, AKA the PID. Currently we use an int, but the PID is unsigned long, so large values of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which means all our comparisons against that value can never be true. This means we'll issue TLB flushes when we shouldn't on radix enabled machines. Fix it by using an unsigned long for the local. Discovered by Coverity. Fixes: 1a472c9dba6b ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> [mpe: Write change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
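A hedged before/after of the local in the radix flush routines:

    /* before: truncates mm->context.id, so the check below can never fire */
    int pid = mm->context.id;

    /* after */
    unsigned long pid = mm->context.id;

    if (unlikely(pid == MMU_NO_CONTEXT))
        return;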
2016-06-07  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds)
Pull vfs fixes from Al Viro: "Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is 4.6, d_walk - 3.2+. The atomic_open() and coredump ones are regressions from this window."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coredump: fix dumping through pipes
  fix a regression in atomic_open()
  fix d_walk()/non-delayed __d_free() race
  autofs braino fix for do_last()
  fix EOPENSTALE bug in do_last()
2016-06-07  coredump: fix dumping through pipes  (Mateusz Guzik)
The offset in the core file used to be tracked with ->written field of the coredump_params structure. The field was retired in favour of file->f_pos. However, ->f_pos is not maintained for pipes which leads to breakage. Restore explicit tracking of the offset in coredump_params. Introduce ->pos field for this purpose since ->written was already reused. Fixes: a00839395103 ("get rid of coredump_params->written"). Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl> Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
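A hedged sketch (fields abridged) of the structure and the bookkeeping described above:

    struct coredump_params {
        /* ... */
        struct file *file;
        unsigned long limit;
        unsigned long mm_flags;
        loff_t pos;     /* explicit offset; ->f_pos is not maintained
                           for pipes */
    };

    /* in dump_emit(), after a successful write of n bytes: */
    cprm->pos += n;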
2016-06-08  powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added  (Michael Ellerman)
The recent commit 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call") added a new PVR mask & value to the start of the ibm_architecture_vec[] array. However it missed the fact that further down in the array, we hard code the offset of one of the fields, and then at boot use that value to patch the value in the array. This means every update to the array must also update the #define, ugh. This means that on pseries machines we will misreport to firmware the number of cores we support, by a factor of threads_per_core. Fix it for now by updating the #define. Fixes: 7cc851039d64 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
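A hedged illustration of the fragile pattern (the offset value and patch site are hypothetical):

    /* must be updated by hand whenever entries are added above it */
    #define IBM_ARCH_VEC_NRCORES_OFFSET 133     /* hypothetical value */

    /* patched at boot with the real supported core count */
    ibm_architecture_vec[IBM_ARCH_VEC_NRCORES_OFFSET] =
            NR_CPUS / threads_per_core;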