summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-07-30perf tools: Fix the build on the alpine:edge distroArnaldo Carvalho de Melo
The UAPI file byteorder/little_endian.h uses the __always_inline define without including the header where it is defined, linux/stddef.h, this ends up working in all the other distros because that file gets included seemingly by luck from one of the files included from little_endian.h. But not on Alpine:edge, that fails for all files where perf_event.h is included but linux/stddef.h isn't include before that. Adding the missing linux/stddef.h file where it breaks on Alpine:edge to fix that, in all other distros, that is just a very small header anyway. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-9r1pifftxvuxms8l7ir73p5l@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-30tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'Arnaldo Carvalho de Melo
To cope with the changes in: 12c89130a56a ("x86/asm/memcpy_mcsafe: Add write-protection-fault handling") 60622d68227d ("x86/asm/memcpy_mcsafe: Return bytes remaining") bd131544aa7e ("x86/asm/memcpy_mcsafe: Add labels for __memcpy_mcsafe() write fault handling") da7bc9c57eb0 ("x86/asm/memcpy_mcsafe: Remove loop unrolling") This needed introducing a file with a copy of the mcsafe_handle_tail() function, that is used in the new memcpy_64.S file, as well as a dummy mcsafe_test.h header. Testing it: $ nm ~/bin/perf | grep mcsafe 0000000000484130 T mcsafe_handle_tail 0000000000484300 T __memcpy_mcsafe $ $ perf bench mem memcpy # Running 'mem/memcpy' benchmark: # function 'default' (Default memcpy() provided by glibc) # Copying 1MB bytes ... 44.389205 GB/sec # function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S) # Copying 1MB bytes ... 22.710756 GB/sec # function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S) # Copying 1MB bytes ... 42.459239 GB/sec # function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S) # Copying 1MB bytes ... 42.459239 GB/sec $ This silences this perf tools build warning: Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-igdpciheradk3gb3qqal52d0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-30tools headers uapi: Refresh linux/bpf.h copyArnaldo Carvalho de Melo
To get the changes in: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status") That do not entail changes in tools/perf/ use of it, elliminating the following perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h' Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-yei494y6b3mn6bjzz9g0ws12@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-30tools headers powerpc: Update asm/unistd.h copy to pick newArnaldo Carvalho de Melo
The new 'io_pgetevents' syscall was wired up in PowerPC in the following cset: b2f82565f2ca ("powerpc: Wire up io_pgetevents") Update tools/arch/powerpc/ copy of the asm/unistd.h file so that 'perf trace' on PowerPC gets it in its syscall table. This elliminated the following perf build warning: Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/unistd.h' differs from latest version at 'arch/powerpc/include/uapi/asm/unistd.h' Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Breno Leitao <leitao@debian.org> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Link: https://lkml.kernel.org/n/tip-9uvu7tz4ud3bxxfyxwryuz47@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-30tools headers uapi: Update tools's copy of linux/perf_event.hArnaldo Carvalho de Melo
To get the changes in: 6cbc304f2f36 ("perf/x86/intel: Fix unwind errors from PEBS entries (mk-II)") That do not imply any changes in the tooling side, the (ab)use of sample_type is entirely done in kernel space, nothing for userspace to witness here. This cures the following warning during perf's build: Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h' Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-o64mjoy35s9gd1gitunw1zg4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-30virtio_balloon: fix another race between migration and ballooningJiang Biao
Kernel panic when with high memory pressure, calltrace looks like, PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java" #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc #6 [ffff881ec7ed7838] __node_set at ffffffff81680300 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8 [exception RIP: _raw_spin_lock_irqsave+47] RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8 RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008 RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098 R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 It happens in the pagefault and results in double pagefault during compacting pages when memory allocation fails. Analysed the vmcore, the page leads to second pagefault is corrupted with _mapcount=-256, but private=0. It's caused by the race between migration and ballooning, and lock missing in virtballoon_migratepage() of virtio_balloon driver. This patch fix the bug. Fixes: e22504296d4f64f ("virtio_balloon: introduce migration primitives to balloon pages") Cc: stable@vger.kernel.org Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Huang Chong <huang.chong@zte.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-07-30media: v4l: vsp1: Fix deadlock in VSPDL DRM pipelinesLaurent Pinchart
The VSP uses a lock to protect the BRU and BRS assignment when configuring pipelines. The lock is taken in vsp1_du_atomic_begin() and released in vsp1_du_atomic_flush(), as well as taken and released in vsp1_du_setup_lif(). This guards against multiple pipelines trying to assign the same BRU and BRS at the same time. The DRM framework calls the .atomic_begin() operations in a loop over all CRTCs included in an atomic commit. On a VSPDL (the only VSP type where this matters), a single VSP instance handles two CRTCs, with a single lock. This results in a deadlock when the .atomic_begin() operation is called on the second CRTC. The DRM framework serializes atomic commits that affect the same CRTCs, but doesn't know about two CRTCs sharing the same VSPDL. Two commits affecting the VSPDL LIF0 and LIF1 respectively can thus race each other, hence the need for a lock. This could be fixed on the DRM side by forcing serialization of commits affecting CRTCs backed by the same VSPDL, but that would negatively affect performances, as the locking is only needed when the BRU and BRS need to be reassigned, which is an uncommon case. The lock protects the whole .atomic_begin() to .atomic_flush() sequence. The only operation that can occur in-between is vsp1_du_atomic_update(), which doesn't touch the BRU and BRS, and thus doesn't need to be protected by the lock. We can thus only take the lock around the pipeline setup calls in vsp1_du_atomic_flush(), which fixes the deadlock. Fixes: f81f9adc4ee1 ("media: v4l: vsp1: Assign BRU and BRS to pipelines dynamically") Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2018-07-30media: rc: read out of bounds if bpf reports high protocol numberSean Young
The repeat period is read from a static array. If a keydown event is reported from bpf with a high protocol number, we read out of bounds. This is unlikely to end up with a reasonable repeat period at the best of times, in which case no timely key up event is generated. Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2018-07-30x86/kexec: Allocate 8k PGDs for PTIJoerg Roedel
Fuzzing the PTI-x86-32 code with trinity showed unhandled kernel paging request oops-messages that looked a lot like silent data corruption. Lot's of debugging and testing lead to the kexec-32bit code, which is still allocating 4k PGDs when PTI is enabled. But since it uses native_set_pud() to build the page-table, it will unevitably call into __pti_set_user_pgtbl(), which writes beyond the allocated 4k page. Use PGD_ALLOCATION_ORDER to allocate PGDs in the kexec code to fix the issue. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1532533683-5988-4-git-send-email-joro@8bytes.org
2018-07-30Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"Joerg Roedel
This reverts commit 77754cfa09a6c528c38cbca9ee4cc4f7cf6ad6f2. The patch was necessary to silence a WARN_ON_ONCE(in_nmi()) that triggered in the vmalloc_fault() function when PTI was enabled on x86-32. Faulting in an NMI handler turned out to be safe and the warning in vmalloc_fault() is gone now. So the above patch can be reverted. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1532533683-5988-3-git-send-email-joro@8bytes.org
2018-07-30x86/mm: Remove in_nmi() warning from vmalloc_fault()Joerg Roedel
It is perfectly okay to take page-faults, especially on the vmalloc area while executing an NMI handler. Remove the warning. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1532533683-5988-2-git-send-email-joro@8bytes.org
2018-07-30Merge branch 'clockevents/4.19' of ↵Thomas Gleixner
git://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clockevent/source changes from Daniel Lezcano: - Add a less accurate but always-on clocksource for the sprd platform (Baoling Wang) - Add the system timer for the new mediatek platforms (Stanley Chu) - Change the cpumask to cpu_possible_mask (Sudeep Holla)
2018-07-30ARM: 8785/1: use compiler built-ins for ffs and flsNicolas Pitre
On ARMv5 and above, it is beneficial to use compiler built-ins such as __builtin_ffs() and __builtin_ctzl() to implement ffs(), __ffs(), fls() and __fls(). The compiler does inline the clz instruction and even the rbit instruction when available, or provide a constant value when possible. On ARMv4 the compiler calls out to helper functions for those built-ins so it is best to keep the open coded versions in that case. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30ARM: 8784/1: NOMMU: Allow enter in Hyp modeVladimir Murzin
ARMv8R adds support for virtualisation extension (with some deviation from v8A). With this patch hyp-unaware boot code can offload to kernel setting up HYP stuff in a sane state. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30ARM: 8783/1: NOMMU: Extend check for VBAR supportVladimir Murzin
ARMv8R adds support for VBAR and updates ID_PFR1 with the new filed Sec_frac (bits [23:20]): Security fractional field. When the Security field is 0000, determines the support for features from the ARMv7 Security Extensions. Permitted values are: 0000 No features from the ARMv7 Security Extensions are implemented. This value is not supported in ARMv8 if ID_PFR1 bits [7:4] are zero. 0001 The implementation includes the VBAR, and the TCR.PD0 and TCR.PD1 bits. 0010 As for 0001, plus the ability to access Secure or Non-secure physical memory is supported. All other values are reserved. This field is only valid when ID_PFR1[7:4] == 0, otherwise it holds the value 0000. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30ARM: 8782/1: vfp: clean up arch/arm/vfp/MakefileMasahiro Yamada
Since commit 799c43415442 ("kbuild: thin archives make default for all archs"), $(AR) is used instead of $(LD) to combine object files. The following code in arch/arm/vfp/Makefile: LDFLAGS +=--no-warn-mismatch ... is no longer used. Also, arch/arm/Makefile already guards arch/arm/vfp/ by a boolean symbol, CONFIG_VFP, like this: core-$(CONFIG_VFP) += arch/arm/vfp/ So, $(CONFIG_VFP) is always evaluated to y in arch/arm/vfp/Makefile. There is no point to use pseudo object, vfp.o, which never becomes a module. Add all objects to obj-y directly. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+Vincent Whitchurch
When building the kernel as Thumb-2 with binutils 2.29 or newer, if the assembler has seen the .type directive (via ENDPROC()) for a symbol, it automatically handles the setting of the lowest bit when the symbol is used with ADR. The badr macro on the other hand handles this lowest bit manually. This leads to a jump to a wrong address in the wrong state in the syscall return path: Internal error: Oops - undefined instruction: 0 [#2] SMP THUMB2 Modules linked in: CPU: 0 PID: 652 Comm: modprobe Tainted: G D 4.18.0-rc3+ #8 PC is at ret_fast_syscall+0x4/0x62 LR is at sys_brk+0x109/0x128 pc : [<80101004>] lr : [<801c8a35>] psr: 60000013 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 50c5387d Table: 9e82006a DAC: 00000051 Process modprobe (pid: 652, stack limit = 0x(ptrval)) 80101000 <ret_fast_syscall>: 80101000: b672 cpsid i 80101002: f8d9 2008 ldr.w r2, [r9, #8] 80101006: f1b2 4ffe cmp.w r2, #2130706432 ; 0x7f000000 80101184 <local_restart>: 80101184: f8d9 a000 ldr.w sl, [r9] 80101188: e92d 0030 stmdb sp!, {r4, r5} 8010118c: f01a 0ff0 tst.w sl, #240 ; 0xf0 80101190: d117 bne.n 801011c2 <__sys_trace> 80101192: 46ba mov sl, r7 80101194: f5ba 7fc8 cmp.w sl, #400 ; 0x190 80101198: bf28 it cs 8010119a: f04f 0a00 movcs.w sl, #0 8010119e: f3af 8014 nop.w {20} 801011a2: f2af 1ea2 subw lr, pc, #418 ; 0x1a2 To fix this, add a new symbol name which doesn't have ENDPROC used on it and use that with badr. We can't remove the badr usage since that would would cause breakage with older binutils. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-07-30KVM: s390: Add skey emulation fault handlingJanosch Frank
When doing skey emulation for huge guests, we now need to fault in pmds, as we don't have PGSTES anymore to store them when we do not have valid table entries. Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-07-30s390/mm: Add huge pmd storage key handlingJanosch Frank
Storage keys for guests with huge page mappings have to be managed in hardware. There are no PGSTEs for PMDs that we could use to retain the guests's logical view of the key. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Clear skeys for newly mapped huge guest pmdsJanosch Frank
Similarly to the pte skey handling, where we set the storage key to the default key for each newly mapped pte, we have to also do that for huge pmds. With the PG_arch_1 flag we keep track if the area has already been cleared of its skeys. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-30s390/mm: Clear huge page storage keys on enable_skeyDominik Dingel
When a guest starts using storage keys, we trap and set a default one for its whole valid address space. With this patch we are now able to do that for large pages. To speed up the storage key insertion, we use __storage_key_init_range, which in-turn will use sske_frame to set multiple storage keys with one instruction. As it has been previously used for debuging we have to get rid of the default key check and make it quiescing. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> [replaced page_set_storage_key loop with __storage_key_init_range] Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Add huge page dirty sync supportJanosch Frank
To do dirty loging with huge pages, we protect huge pmds in the gmap. When they are written to, we unprotect them and mark them dirty. We introduce the function gmap_test_and_clear_dirty_pmd which handles dirty sync for huge pages. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Add gmap pmd invalidation and clearingJanosch Frank
If the host invalidates a pmd, we also have to invalidate the corresponding gmap pmds, as well as flush them from the TLB. This is necessary, as we don't share the pmd tables between host and guest as we do with ptes. The clearing part of these three new functions sets a guest pmd entry to _SEGMENT_ENTRY_EMPTY, so the guest will fault on it and we will re-link it. Flushing the gmap is not necessary in the host's lazy local and csp cases. Both purge the TLB completely. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Add gmap pmd notification bit settingJanosch Frank
Like for ptes, we also need invalidation notification for pmds, to make sure the guest lowcore pages are always accessible and later addition of shadowed pmds. With PMDs we do not have PGSTEs or some other bits we could use in the host PMD. Instead we pick one of the free bits in the gmap PMD. Every time a host pmd will be invalidated, we will check if the respective gmap PMD has the bit set and in that case fire up the notifier. Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-07-30s390/mm: Add gmap pmd linkingJanosch Frank
Let's allow pmds to be linked into gmap for the upcoming s390 KVM huge page support. Before this patch we copied the full userspace pmd entry. This is not correct, as it contains SW defined bits that might be interpreted differently in the GMAP context. Now we only copy over all hardware relevant information leaving out the software bits. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Abstract gmap notify bit settingJanosch Frank
Currently we use the software PGSTE bits PGSTE_IN_BIT and PGSTE_VSIE_BIT to notify before an invalidation occurs on a prefix page or a VSIE page respectively. Both bits are pgste specific, but are used when protecting a memory range. Let's introduce abstract GMAP_NOTIFY_* bits that will be realized into the respective bits when gmap DAT table entries are protected. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-30s390/mm: Make gmap_protect_range more modularJanosch Frank
This patch reworks the gmap_protect_range logic and extracts the pte handling into an own function. Also we do now walk to the pmd and make it accessible in the function for later use. This way we can add huge page handling logic more easily. Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-07-30can: ems_usb: Fix memory leak on ems_usb_disconnect()Anton Vasilyev
ems_usb_probe() allocates memory for dev->tx_msg_buffer, but there is no its deallocation in ems_usb_disconnect(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Cc: <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-07-29Linux 4.18-rc7v4.18-rc7Linus Torvalds
2018-07-29ext4: fix race when setting the bitmap corrupted flagWang Shilong
Whenever we hit block or inode bitmap corruptions we set bit and then reduce this block group free inode/clusters counter to expose right available space. However some of ext4_mark_group_bitmap_corrupted() is called inside group spinlock, some are not, this could make it happen that we double reduce one block group free counters from system. Always hold group spinlock for it could fix it, but it looks a little heavy, we could use test_and_set_bit() to fix race problems here. Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-07-29ext4: reset error code in ext4_find_entry in fallbackEric Sandeen
When ext4_find_entry() falls back to "searching the old fashioned way" due to a corrupt dx dir, it needs to reset the error code to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned to userspace. https://bugzilla.kernel.org/show_bug.cgi?id=199947 Reported-by: Anatoly Trosinenko <anatoly.trosinenko@yandex.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-07-29ext4: handle layout changes to pinned DAX mappingsRoss Zwisler
Follow the lead of xfs_break_dax_layouts() and add synchronization between operations in ext4 which remove blocks from an inode (hole punch, truncate down, etc.) and pages which are pinned due to DAX DMA operations. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
2018-07-29dax: dax_layout_busy_page() warn on !exceptionalRoss Zwisler
Inodes using DAX should only ever have exceptional entries in their page caches. Make this clear by warning if the iteration in dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a comment for the pagevec_release() call which only deals with struct page pointers. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2018-07-29docs: fix up the obviously obsolete bits in the new ext4 documentationTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29openvswitch: meter: Fix setting meter id for new entriesJustin Pettit
The meter code would create an entry for each new meter. However, it would not set the meter id in the new entry, so every meter would appear to have a meter id of zero. This commit properly sets the meter id when adding the entry. Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure") Signed-off-by: Justin Pettit <jpettit@ovn.org> Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29docs: add new ext4 superblock time extension fieldsDarrick J. Wong
The superblock timestamp fields were enlarged by u8 to be 40 bits wide. Update the documentation to reflect this. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29docs: create filesystem internal sectionDarrick J. Wong
Create a new top-level section for documentation of filesystem usage, on-disk format information, and anything else. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Some miscellaneous ext4 fixes for 4.18; one fix is for a regression introduced in 4.18-rc4. Sorry for the late-breaking pull. I was originally going to wait for the next merge window, but Eric Whitney found a regression introduced in 4.18-rc4, so I decided to push out the regression plus the other fixes now. (The other commits have been baking in linux-next since early July)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix check to prevent initializing reserved inodes ext4: check for allocation block validity with block group locked ext4: fix inline data updates with checksums enabled ext4: clear mmp sequence number when remounting read-only ext4: fix false negatives *and* false positives in ext4_check_descriptors()
2018-07-29ext4: use swap macro in mext_page_double_lockGustavo A. R. Silva
Make use of the swap macro and remove unnecessary variable *tmp*. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: check allocation failure when duplicating "data" in ext4_remount()Chengguang Xu
There is no check for allocation failure when duplicating "data" in ext4_remount(). Check for failure and return error -ENOMEM in this case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2018-07-29ext4: fix warning message in ext4_enable_quotas()Junichi Uekawa
Output the warning message before we clobber type and be -1 all the time. The error message would now be [ 1.519791] EXT4-fs warning (device vdb): ext4_enable_quotas:5402: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix. Signed-off-by: Junichi Uekawa <uekawa@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2018-07-29ext4: super: extend timestamps to 40 bitsArnd Bergmann
The inode timestamps use 34 bits in ext4, but the various timestamps in the superblock are limited to 32 bits. If every user accesses these as 'unsigned', then this is good until year 2106, but it seems better to extend this a bit further in the process of removing the deprecated get_seconds() function. This adds another byte for each timestamp in the superblock, making them long enough to store timestamps beyond what is in the inodes, which seems good enough here (in ocfs2, they are already 64-bit wide, which is appropriate for a new layout). I did not modify e2fsprogs, which obviously needs the same change to actually interpret future timestamps correctly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29jbd2: replace current_kernel_time64 with ktime equivalentArnd Bergmann
jbd2 is one of the few callers of current_kernel_time64(), which is a wrapper around ktime_get_coarse_real_ts64(). This calls the latter directly for consistency with the rest of the kernel that is moving to the ktime_get_ family of time accessors. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: use timespec64 for all inode timesArnd Bergmann
This is the last missing piece for the inode times on 32-bit systems: now that VFS interfaces use timespec64, we just need to stop truncating the tv_sec values for y2038 compatibililty. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29netlink: Do not subscribe to non-existent groupsDmitry Safonov
Make ABI more strict about subscribing to group > ngroups. Code doesn't check for that and it looks bogus. (one can subscribe to non-existing group) Still, it's possible to bind() to all possible groups with (-1) Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-29ext4: use ktime_get_real_seconds for i_dtimeArnd Bergmann
We only care about the low 32-bit for i_dtime as explained in commit b5f515735bea ("ext4: avoid Y2038 overflow in recently_deleted()"), so the use of get_seconds() is correct here, but that function is getting removed in the process of the y2038 fixes, so let's use the modern ktime_get_real_seconds() here. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: use 64-bit timestamps for mmp_timeArnd Bergmann
The mmp_time field is 64 bits wide, which is good, but calling get_seconds() results in a 32-bit value on 32-bit architectures. Using ktime_get_real_seconds() instead returns 64 bits everywhere. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: sysfs: print ext4_super_block fields as little-endianArnd Bergmann
While working on extended rand for last_error/first_error timestamps, I noticed that the endianess is wrong; we access the little-endian fields in struct ext4_super_block as native-endian when we print them. This adds a special case in ext4_attr_show() and ext4_attr_store() to byteswap the superblock fields if needed. In older kernels, this code was part of super.c, it got moved to sysfs.c in linux-4.4. Cc: stable@vger.kernel.org Fixes: 52c198c6820f ("ext4: add sysfs entry showing whether the fs contains errors") Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: import extended attributes chapter from wiki pageDarrick J. Wong
Import the chapter about extended attributes from the on-disk format wiki page into the kernel documentation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-29ext4: import directory layout chapter from wiki pageDarrick J. Wong
Import the chapter about directory layout from the on-disk format wiki page into the kernel documentation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>