summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-08-29Merge branch 'for-4.13-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "A late but obvious fix for cgroup. I broke the 'cpuset.memory_pressure' file a long time ago (v4.4) by accidentally deleting its file index, which made it a duplicate of the 'cpuset.memory_migrate' file. Spotted and fixed by Waiman" * 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: Fix incorrect memory_pressure control file mapping
2017-08-29Merge branch 'for-4.13-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "Late fixes for libata. There's a minor platform driver fix but the important one is READ LOG PAGE. This is a new ATA command which is used to test some optional features but it broke probing of some devices - they locked up instead of failing the unknown command. Christoph tried blacklisting, but, after finding out there are multiple devices which fail this way, backed off to testing feature bit in IDENTIFY data first, which is a bit lossy (we can miss features on some devices) but should be a lot safer" * 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: Revert "libata: quirk read log on no-name M.2 SSD" libata: check for trusted computing in IDENTIFY DEVICE data libata: quirk read log on no-name M.2 SSD sata: ahci-da850: Fix some error handling paths in 'ahci_da850_probe()'
2017-08-29ipv6: do not set sk_destruct in IPV6_ADDRFORM sockoptXin Long
ChunYu found a kernel warn_on during syzkaller fuzzing: [40226.038539] WARNING: CPU: 5 PID: 23720 at net/ipv4/af_inet.c:152 inet_sock_destruct+0x78d/0x9a0 [40226.144849] Call Trace: [40226.147590] <IRQ> [40226.149859] dump_stack+0xe2/0x186 [40226.176546] __warn+0x1a4/0x1e0 [40226.180066] warn_slowpath_null+0x31/0x40 [40226.184555] inet_sock_destruct+0x78d/0x9a0 [40226.246355] __sk_destruct+0xfa/0x8c0 [40226.290612] rcu_process_callbacks+0xaa0/0x18a0 [40226.336816] __do_softirq+0x241/0x75e [40226.367758] irq_exit+0x1f6/0x220 [40226.371458] smp_apic_timer_interrupt+0x7b/0xa0 [40226.376507] apic_timer_interrupt+0x93/0xa0 The warn_on happned when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct. As after commit f970bd9e3a06 ("udp: implement memory accounting helpers"), udp has changed to use udp_destruct_sock as sk_destruct where it would udp_rmem_release all rmem. But IPV6_ADDRFORM sockopt sets sk_destruct with inet_sock_destruct after changing family to PF_INET. If rmem is not 0 at that time, and there is no place to release rmem before calling inet_sock_destruct, the warn_on will be triggered. This patch is to fix it by not setting sk_destruct in IPV6_ADDRFORM sockopt any more. As IPV6_ADDRFORM sockopt only works for tcp and udp. TCP sock has already set it's sk_destruct with inet_sock_destruct and UDP has set with udp_destruct_sock since they're created. Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers") Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-08-29 1) Fix dst_entry refcount imbalance when using socket policies. From Lorenzo Colitti. 2) Fix locking when adding the ESP trailers. 3) Fix tailroom calculation for the ESP trailer by using skb_tailroom instead of skb_availroom. 4) Fix some info leaks in xfrm_user. From Mathias Krause. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29Revert "rmap: do not call mmu_notifier_invalidate_page() under ptl"Linus Torvalds
This reverts commit aac2fea94f7a3df8ad1eeb477eb2643f81fd5393. It turns out that that patch was complete and utter garbage, and broke KVM, resulting in odd oopses. Quoting Andrea Arcangeli: "The aforementioned commit has 3 bugs. 1) mmu_notifier_invalidate_range cannot be used in replacement of mmu_notifier_invalidate_range_start/end. For KVM mmu_notifier_invalidate_range is a noop and rightfully so. A MMU notifier implementation has to implement either ->invalidate_range method or the invalidate_range_start/end methods, not both. And if you implement invalidate_range_start/end like KVM is forced to do, calling mmu_notifier_invalidate_range in common code is a noop for KVM. For those MMU notifiers that can get away only implementing ->invalidate_range, the ->invalidate_range is implicitly called by mmu_notifier_invalidate_range_end(). And only those secondary MMUs that share the same pagetable with the primary MMU (like AMD iommuv2) can get away only implementing ->invalidate_range. So all cases (THP on/off) are broken right now. To fix this is enough to replace mmu_notifier_invalidate_range with mmu_notifier_invalidate_range_start;mmu_notifier_invalidate_range_end. Either that or call multiple mmu_notifier_invalidate_page like before. 2) address + (1UL << compound_order(page) is buggy, it should be PAGE_SIZE << compound_order(page), it's bytes not pages, 2M not 512. 3) The whole invalidate_range thing was an attempt to call a single invalidate while walking multiple 4k ptes that maps the same THP (after a pmd virtual split without physical compound page THP split). It's unclear if the rmap_walk will always provide an address that is 2M aligned as parameter to try_to_unmap_one, in presence of THP. I think it needs also an address &= (PAGE_SIZE << compound_order(page)) - 1 to be safe" In general, we should stop making excuses for horrible MMU notifier users. It's much more important that the core VM is sane and safe, than letting MMU notifiers sleep. So if some MMU notifier is sleeping under a spinlock, we need to fix the notifier, not try to make excuses for that garbage in the core VM. Reported-and-tested-by: Bernhard Held <berny156@gmx.de> Reported-and-tested-by: Adam Borowski <kilobyte@angband.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: axie <axie@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-29Revert "libata: quirk read log on no-name M.2 SSD"Tejun Heo
This reverts commit 35f0b6a779b8b7a98faefd7c1c660b4dac9a5c26. We now conditionalize issuing of READ LOG PAGE on the TRUSTED COMPUTING SUPPORTED bit in the identity data and this shouldn't be necessary. Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-29libata: check for trusted computing in IDENTIFY DEVICE dataChristoph Hellwig
ATA-8 and later mirrors the TRUSTED COMPUTING SUPPORTED bit in word 48 of the IDENTIFY DEVICE data. Check this before issuing a READ LOG PAGE command to avoid issues with buggy devices. The only downside is that we can't support Security Send / Receive for a device with an older revision due to the conflicting use of this field in earlier specifications. tj: The reason we need this is because some devices which don't support READ LOG PAGE lock up after getting issued that command. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-29perf symbols: Fix plt entry calculation for ARM and AARCH64Li Bin
On x86, the plt header size is as same as the plt entry size, and can be identified from shdr's sh_entsize of the plt. But we can't assume that the sh_entsize of the plt shdr is always the plt entry size in all architecture, and the plt header size may be not as same as the plt entry size in some architecure. On ARM, the plt header size is 20 bytes and the plt entry size is 12 bytes (don't consider the FOUR_WORD_PLT case) that refer to the binutils implementation. The plt section is as follows: Disassembly of section .plt: 000004a0 <__cxa_finalize@plt-0x14>: 4a0: e52de004 push {lr} ; (str lr, [sp, #-4]!) 4a4: e59fe004 ldr lr, [pc, #4] ; 4b0 <_init+0x1c> 4a8: e08fe00e add lr, pc, lr 4ac: e5bef008 ldr pc, [lr, #8]! 4b0: 00008424 .word 0x00008424 000004b4 <__cxa_finalize@plt>: 4b4: e28fc600 add ip, pc, #0, 12 4b8: e28cca08 add ip, ip, #8, 20 ; 0x8000 4bc: e5bcf424 ldr pc, [ip, #1060]! ; 0x424 000004c0 <printf@plt>: 4c0: e28fc600 add ip, pc, #0, 12 4c4: e28cca08 add ip, ip, #8, 20 ; 0x8000 4c8: e5bcf41c ldr pc, [ip, #1052]! ; 0x41c On AARCH64, the plt header size is 32 bytes and the plt entry size is 16 bytes. The plt section is as follows: Disassembly of section .plt: 0000000000000560 <__cxa_finalize@plt-0x20>: 560: a9bf7bf0 stp x16, x30, [sp,#-16]! 564: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 568: f944be11 ldr x17, [x16,#2424] 56c: 9125e210 add x16, x16, #0x978 570: d61f0220 br x17 574: d503201f nop 578: d503201f nop 57c: d503201f nop 0000000000000580 <__cxa_finalize@plt>: 580: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 584: f944c211 ldr x17, [x16,#2432] 588: 91260210 add x16, x16, #0x980 58c: d61f0220 br x17 0000000000000590 <__gmon_start__@plt>: 590: 90000090 adrp x16, 10000 <__FRAME_END__+0xf8a8> 594: f944c611 ldr x17, [x16,#2440] 598: 91262210 add x16, x16, #0x988 59c: d61f0220 br x17 NOTES: In addition to ARM and AARCH64, other architectures, such as s390/alpha/mips/parisc/poperpc/sh/sparc/xtensa also need to consider this issue. Signed-off-by: Li Bin <huawei.libin@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexis Berlemont <alexis.berlemont@gmail.com> Cc: David Tolnay <dtolnay@gmail.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: zhangmengting@huawei.com Link: http://lkml.kernel.org/r/1496622849-21877-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-29Merge branch 'stable/for-jens-4.13' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus Pull xen-blkback fix from Konrad: "[...] A bug-fix when shutting down xen block backend driver with multiple queues and the driver not clearing all of them."
2017-08-29s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangsChristian Borntraeger
Right now there is a potential hang situation for postcopy migrations, if the guest is enabling storage keys on the target system during the postcopy process. For storage key virtualization, we have to forbid the empty zero page as the storage key is a property of the physical page frame. As we enable storage key handling lazily we then drop all mappings for empty zero pages for lazy refaulting later on. This does not work with the postcopy migration, which relies on the empty zero page never triggering a fault again in the future. The reason is that postcopy migration will simply read a page on the target system if that page is a known zero page to fault in an empty zero page. At the same time postcopy remembers that this page was already transferred - so any future userfault on that page will NOT be retransmitted again to avoid races. If now the guest enters the storage key mode while in postcopy, we will break this assumption of postcopy. The solution is to disable the empty zero page for KVM guests early on and not during storage key enablement. With this change, the postcopy migration process is guaranteed to start after no zero pages are left. As guest pages are very likely not empty zero pages anyway the memory overhead is also pretty small. While at it this also adds proper page table locking to the zero page removal. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-08-29s390/dasd: Add discard support for FBA devicesJan Höppner
The z/VM hypervisor provides virtual disks (VDISK) which are backed by main memory of the hypervisor. Those devices are seen as DASD FBA disks within the Linux guest. Whenever data is written to such a device, memory is allocated on-the-fly by z/VM accordingly. This memory, however, is not being freed if data on the device is deleted by the guest OS. In order to make memory usable after deletion again, add discard support to the FBA discipline. While at it, update comments regarding the DASD_FEATURE_* flags. Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com> Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-08-29s390/zcrypt: make CPRBX constBhumika Goyal
Make this const as it is only used in a copy operation. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-08-29s390/uaccess: avoid mvcos jump labelMartin Schwidefsky
If the kernel is compiled for z10 or later machines the uaccess code inlines the mvcos instruction. The facility bit 27 which indicates the availability of MVCOS has to be set. The have_mvcos jump label will always be true. Make the generation of the have_mvcos jump label conditional on !CONFIG_HAVE_MARCH_Z10_FEATURES. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-08-29s390/mm: use generic mm_hooksMartin Schwidefsky
With git commit 3446c13b268af86391d06611327006b059b8bab1 "s390/mm: four page table levels vs. fork" s390 dropped its architecture specific version of arch_dup_mmap. Now all functions defined by include/asm-generic/mm_hooks.h are identical to the s390 versions. Use the generic header. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-08-29s390/facilities: fix typoHeiko Carstens
There really is no "general extension" facility available. Use the correct name "general instructions extension" facility instead. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-08-29s390/vmcp: simplify vmcp_response_free()Heiko Carstens
Get rid of the goto and "out" label within vmcp_response_free() which I added. This just makes the code harder to read than necessary. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-08-29perf probe: Fix kprobe blacklist checking conditionLi Bin
The commit 9aaf5a5f479b ("perf probe: Check kprobes blacklist when adding new events"), 'perf probe' supports checking the blacklist of the fuctions which can not be probed. But the checking condition is wrong, that the end_addr of the symbol which is the start_addr of the next symbol can't be included. Committer notes: IOW make it match its kernel counterpart in kernel/kprobes.c: bool within_kprobe_blacklist(unsigned long addr) Each entry have as its end address not its end address, but the first address _outside_ that symbol, which for related functions, is the first address of the next symbol, like these from kernel/trace/trace_probe.c: 0xffffffffbd198df0-0xffffffffbd198e40 print_type_u8 0xffffffffbd198e40-0xffffffffbd198e90 print_type_u16 0xffffffffbd198e90-0xffffffffbd198ee0 print_type_u32 0xffffffffbd198ee0-0xffffffffbd198f30 print_type_u64 0xffffffffbd198f30-0xffffffffbd198f80 print_type_s8 0xffffffffbd198f80-0xffffffffbd198fd0 print_type_s16 0xffffffffbd198fd0-0xffffffffbd199020 print_type_s32 0xffffffffbd199020-0xffffffffbd199070 print_type_s64 0xffffffffbd199070-0xffffffffbd1990c0 print_type_x8 0xffffffffbd1990c0-0xffffffffbd199110 print_type_x16 0xffffffffbd199110-0xffffffffbd199160 print_type_x32 0xffffffffbd199160-0xffffffffbd1991b0 print_type_x64 But not always: 0xffffffffbd1997b0-0xffffffffbd1997c0 fetch_kernel_stack_address (kernel/trace/trace_probe.c) 0xffffffffbd1c57f0-0xffffffffbd1c58b0 __context_tracking_enter (kernel/context_tracking.c) Signed-off-by: Li Bin <huawei.libin@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: zhangmengting@huawei.com Fixes: 9aaf5a5f479b ("perf probe: Check kprobes blacklist when adding new events") Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-29MIPS: Remove pt_regs adjustments in indirect syscall handlerJames Cowgill
If a restartable syscall is called using the indirect o32 syscall handler - eg: syscall(__NR_waitid, ...), then it is possible for the incorrect arguments to be passed to the syscall after it has been restarted. This is because the syscall handler tries to shift all the registers down one place in pt_regs so that when the syscall is restarted, the "real" syscall is called instead. Unfortunately it only shifts the arguments passed in registers, not the arguments on the user stack. This causes the 4th argument to be duplicated when the syscall is restarted. Fix by removing all the pt_regs shifting so that the indirect syscall handler is called again when the syscall is restarted. The comment "some syscalls like execve get their arguments from struct pt_regs" is long out of date so this should now be safe. Signed-off-by: James Cowgill <James.Cowgill@imgtec.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> Tested-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/15856/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-08-29MIPS: seccomp: Fix indirect syscall argsJames Hogan
Since commit 669c4092225f ("MIPS: Give __secure_computing() access to syscall arguments."), upon syscall entry when seccomp is enabled, syscall_trace_enter() passes a carefully prepared struct seccomp_data containing syscall arguments to __secure_computing(). Unfortunately it directly uses mips_get_syscall_arg() and fails to take into account the indirect O32 system calls (i.e. syscall(2)) which put the system call number in a0 and have the arguments shifted up by one entry. We can't just revert that commit as samples/bpf/tracex5 would break again, so use syscall_get_arguments() which already takes indirect syscalls into account instead of directly using mips_get_syscall_arg(), similar to what populate_seccomp_data() does. This also removes the redundant error checking of the mips_get_syscall_arg() return value (get_user() already zeroes the result if an argument from the stack can't be loaded). Reported-by: James Cowgill <James.Cowgill@imgtec.com> Fixes: 669c4092225f ("MIPS: Give __secure_computing() access to syscall arguments.") Signed-off-by: James Hogan <james.hogan@imgtec.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: David Daney <david.daney@cavium.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16994/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-08-29clocksource/drivers/bcm2835: Remove message for a memory allocation failureMarkus Elfring
The bcm2835_timer_init() function emits an error message in case of a memory allocation failure. This is pointless as the mm core does that already. Remove this message. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29locking/lockdep/selftests: Fix mixed read-write ABBA testsPeter Zijlstra
Commit: e91498589746 ("locking/lockdep/selftests: Add mixed read-write ABBA tests") adds an explicit FAILURE to the locking selftest but overlooked the fact that this kills lockdep. Fudge the test to avoid this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/20170828124245.xlo2yshxq2btgmuf@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29sched/completion: Avoid unnecessary stack allocation for ↵Boqun Feng
COMPLETION_INITIALIZER_ONSTACK() In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the stack allocation of the caller. However, on some compilers, a temporary structure was allocated for the return value of COMPLETION_INITIALIZER_ONSTACK(). For example in write_journal() with LOCKDEP_COMPLETIONS=y (GCC is 7.1.1): io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2462: e8 00 00 00 00 callq 2467 <write_journal+0x47> 2467: 48 8d 85 80 fd ff ff lea -0x280(%rbp),%rax 246e: 48 c7 c6 00 00 00 00 mov $0x0,%rsi 2475: 48 c7 c2 00 00 00 00 mov $0x0,%rdx x->done = 0; 247c: c7 85 90 fd ff ff 00 movl $0x0,-0x270(%rbp) 2483: 00 00 00 init_waitqueue_head(&x->wait); 2486: 48 8d 78 18 lea 0x18(%rax),%rdi 248a: e8 00 00 00 00 callq 248f <write_journal+0x6f> if (commit_start + commit_sections <= ic->journal_sections) { 248f: 41 8b 87 a8 00 00 00 mov 0xa8(%r15),%eax io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2496: 48 8d bd e8 f9 ff ff lea -0x618(%rbp),%rdi 249d: 48 8d b5 90 fd ff ff lea -0x270(%rbp),%rsi 24a4: b9 17 00 00 00 mov $0x17,%ecx 24a9: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi) if (commit_start + commit_sections <= ic->journal_sections) { 24ac: 41 39 c6 cmp %eax,%r14d io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 24af: 48 8d bd 90 fd ff ff lea -0x270(%rbp),%rdi 24b6: 48 8d b5 e8 f9 ff ff lea -0x618(%rbp),%rsi 24bd: b9 17 00 00 00 mov $0x17,%ecx 24c2: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi) We can obviously see the temporary structure allocated, and the compiler also does two meaningless memcpy with "rep movsq". And according to: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs The return value of a statement expression is returned by value, so the temporary variable is created in COMPLETION_INITIALIZER_ONSTACK(), and that's why the temporary structures are allocted. To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK() return a pointer and dereference it outside the block rather than return the whole structure, in this way, we are able to teach the compiler not to do the unnecessary stack allocation. This could also reduce the stack size even if !LOCKDEP, for example in write_journal(), compiled with gcc 7.1.1, the result of command: objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86 before: 0x0000246a write_journal [dm-integrity.o]: 696 after: 0x00002b7a write_journal [dm-integrity.o]: 296 Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/20170823152542.5150-3-boqun.feng@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuseBoqun Feng
COMPLETION_INITIALIZER_ONSTACK() is supposed to be used as an initializer, in other words, it should only be used in assignment expressions or compound literals. So the usage in drivers/acpi/nfit/core.c: COMPLETION_INITIALIZER_ONSTACK(flush.cmp); ... is inappropriate. Besides, this usage could also break the build for another fix that reduces stack sizes caused by COMPLETION_INITIALIZER_ONSTACK(), because that fix changes COMPLETION_INITIALIZER_ONSTACK() from rvalue to lvalue, and usage as above will report the following error: drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe': include/linux/completion.h:77:3: error: value computed is not used [-Werror=unused-value] (*({ init_completion(&work); &work; })) This patch fixes this by replacing COMPLETION_INITIALIZER_ONSTACK() with init_completion() in acpi_nfit_flush_probe(), which does the same initialization without any other problems. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/20170824142239.15178-1-boqun.feng@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29locking/pvqspinlock: Relax cmpxchg's to improve performance on some ↵Waiman Long
architectures All the locking related cmpxchg's in the following functions are replaced with the _acquire variants: - pv_queued_spin_steal_lock() - trylock_clear_pending() This change should help performance on architectures that use LL/SC. The cmpxchg in pv_kick_node() is replaced with a relaxed version with explicit memory barrier to make sure that it is fully ordered in the writing of next->lock and the reading of pn->state whether the cmpxchg is a success or failure without affecting performance in non-LL/SC architectures. On a 2-socket 12-core 96-thread Power8 system with pvqspinlock explicitly enabled, the performance of a locking microbenchmark with and without this patch on a 4.13-rc4 kernel with Xinhui's PPC qspinlock patch were as follows: # of thread w/o patch with patch % Change ----------- --------- ---------- -------- 8 5054.8 Mop/s 5209.4 Mop/s +3.1% 16 3985.0 Mop/s 4015.0 Mop/s +0.8% 32 2378.2 Mop/s 2396.0 Mop/s +0.7% Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1502741222-24360-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29smp: Avoid using two cache lines for struct call_single_dataYing Huang
struct call_single_data is used in IPIs to transfer information between CPUs. Its size is bigger than sizeof(unsigned long) and less than cache line size. Currently it is not allocated with any explicit alignment requirements. This makes it possible for allocated call_single_data to cross two cache lines, which results in double the number of the cache lines that need to be transferred among CPUs. This can be fixed by requiring call_single_data to be aligned with the size of call_single_data. Currently the size of call_single_data is the power of 2. If we add new fields to call_single_data, we may need to add padding to make sure the size of new definition is the power of 2 as well. Fortunately, this is enforced by GCC, which will report bad sizes. To set alignment requirements of call_single_data to the size of call_single_data, a struct definition and a typedef is used. To test the effect of the patch, I used the vm-scalability multiple thread swap test case (swap-w-seq-mt). The test will create multiple threads and each thread will eat memory until all RAM and part of swap is used, so that huge number of IPIs are triggered when unmapping memory. In the test, the throughput of memory writing improves ~5% compared with misaligned call_single_data, because of faster IPIs. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Huang, Ying <ying.huang@intel.com> [ Add call_single_data_t and align with size of call_single_data. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29locking/lockdep: Untangle xhlock history save/restore from task independencePeter Zijlstra
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to ensure the temporal IRQ events don't interact with task state, the XHLOCK_PROC is a fundament different beast that just happens to share the interface. The purpose of XHLOCK_PROC is to annotate independent execution inside one task. For example workqueues, each work should appear to run in its own 'pristine' 'task'. Remove XHLOCK_PROC in favour of its own interface to avoid confusion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: kernel-team@lge.com Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/x86: Fix caps/ for !IntelPeter Zijlstra
Move the 'max_precise' capability into generic x86 code where it belongs. This fixes a sysfs splat on !Intel systems where we fail to set x86_pmu_caps_group.atts. Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory") Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core, x86: Add PERF_SAMPLE_PHYS_ADDRKan Liang
For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. Add a new sample type for physical address. perf already has a facility to collect data virtual address. This patch introduces a function to convert the virtual address to physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided. - For kernel direct mapping addresses, virt_to_phys is used to convert the virtual addresses to physical address. - For user virtual addresses, __get_user_pages_fast is used to walk the pages tables for user physical address. - This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added. The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or privileged user. Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mpe@ellerman.id.au Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core, pt, bts: Get rid of itrace_startedAlexander Shishkin
I just noticed that hw.itrace_started and hw.config are aliased to the same location. Now, the PT driver happens to use both, which works out fine by sheer luck: - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the program order, although there are no compiler barriers to ensure that, - to the perf_log_itrace_start() hw.itrace_start looks set at the same time as when it is intended to be set because both stores happen in the same path, - hw.config is never reset to zero in the PT driver. Now, the use of hw.config by the PT driver makes more sense (it being a HW PMU) than messing around with itrace_started, which is an awkward API to begin with. This patch replaces hw.itrace_started with an attach_state bit and an API call for the PMU drivers to use to communicate the condition. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29Documentation: Hardware tag matchingArtemy Kovalyov
Add document providing definitions of terms and core explanations for tag matching (TM) protocols, eager and rendezvous, TM application header, tag list manipulations and matching process. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/mlx5: Support IB_SRQT_TMArtemy Kovalyov
Pass to mlx5_core flag to enable rendezvous offload, list_size and CQ when SRQ created with IB_SRQT_TM. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29net/mlx5: Add XRQ supportArtemy Kovalyov
Add support to new XRQ(eXtended shared Receive Queue) hardware object. It supports SRQ semantics with addition of extended receive buffers topologies and offloads. Currently supports tag matching topology and rendezvouz offload. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/mlx5: Fill XRQ capabilitiesArtemy Kovalyov
Provide driver specific values for XRQ capabilities. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Expose XRQ capabilitiesArtemy Kovalyov
Make XRQ capabilities available via ibv_query_device() verb. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Add new SRQ type IB_SRQT_TMArtemy Kovalyov
Add new SRQ type capable of new tag matching feature. When SRQ receives a message it will search through the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. In case the tag matching results in a match, the received message will be placed in the address specified by the receive buffer. In case no match was found the message will be placed in a generic buffer until the corresponding receive buffer will be posted. These messages are called unexpected and their set is called an unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Add XRQ creation parameter to UAPIArtemy Kovalyov
Add tm_list_size parameter to struct ib_uverbs_create_xsrq. If SRQ type is tag-matching this field defines maximum size of tag matching list. Otherwise, it is expected to be zero. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Add new SRQ type IB_SRQT_TMArtemy Kovalyov
This patch adds new SRQ type - IB_SRQT_TM. The new SRQ type supports tag matching and rendezvous offloads for MPI applications. When SRQ receives a message it will search through the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. In case the tag matching results in a match, the received message will be placed in the address specified by the receive buffer. In case no match was found the message will be placed in a generic buffer until the corresponding receive buffer will be posted. These messages are called unexpected and their set is called an unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Separate CQ handle in SRQ contextArtemy Kovalyov
Before this change CQ attached to SRQ was part of XRC specific extension. Moving CQ handle out makes it available to other types extending SRQ functionality. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Add XRQ capabilitiesArtemy Kovalyov
This patch adds following TM XRQ capabilities: * max_rndv_hdr_size - Max size of rendezvous request message * max_num_tags - Max number of entries in tag matching list * max_ops - Max number of outstanding list operations * max_sge - Max number of SGE in tag matching entry * flags - the following flags are currently defined: - IB_TM_CAP_RC - Support tag matching on RC transport Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29net/mlx5: Update HW layout definitionsArtemy Kovalyov
* add offload_type field to mlx5_ifc_qpc_bits * update mlx5_ifc_xrqc_bits layout Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29mux: make device_type constBhumika Goyal
Make this const as it is only stored in the type field of a device structure, which is const. Done using Coccinelle. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-29x86/boot: Prevent faulty bootparams.screeninfo from causing harmJan H. Schönherr
If a zero for the number of lines manages to slip through, scroll() may underflow some offset calculations, causing accesses outside the video memory. Make the check in __putstr() more pessimistic to prevent that. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot: Provide more slack space during decompressionJan H. Schönherr
The current slack space is not enough for LZ4, which has a worst case overhead of 0.4% for data that cannot be further compressed. With an LZ4 compressed kernel with an embedded initrd, the output is likely to overwrite the input. Increase the slack space to avoid that. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/ftrace: Fix double traces of perf on ftrace:functionZhou Chengming
When running perf on the ftrace:function tracepoint, there is a bug which can be reproduced by: perf record -e ftrace:function -a sleep 20 & perf record -e ftrace:function ls perf script ls 10304 [005] 171.853235: ftrace:function: perf_output_begin ls 10304 [005] 171.853237: ftrace:function: perf_output_begin ls 10304 [005] 171.853239: ftrace:function: task_tgid_nr_ns ls 10304 [005] 171.853240: ftrace:function: task_tgid_nr_ns ls 10304 [005] 171.853242: ftrace:function: __task_pid_nr_ns ls 10304 [005] 171.853244: ftrace:function: __task_pid_nr_ns We can see that all the function traces are doubled. The problem is caused by the inconsistency of the register function perf_ftrace_event_register() with the probe function perf_ftrace_function_call(). The former registers one probe for every perf_event. And the latter handles all perf_events on the current cpu. So when two perf_events on the current cpu, the traces of them will be doubled. So this patch adds an extra parameter "event" for perf_tp_event, only send sample data to this event when it's not NULL. Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: huawei.libin@huawei.com Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core: Fix potential double-fetch bugMeng Xu
While examining the kernel source code, I found a dangerous operation that could turn into a double-fetch situation (a race condition bug) where the same userspace memory region are fetched twice into kernel with sanity checks after the first fetch while missing checks after the second fetch. 1. The first fetch happens in line 9573 get_user(size, &uattr->size). 2. Subsequently the 'size' variable undergoes a few sanity checks and transformations (line 9577 to 9584). 3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size) 4. Given that 'uattr' can be fully controlled in userspace, an attacker can race condition to override 'uattr->size' to arbitrary value (say, 0xFFFFFFFF) after the first fetch but before the second fetch. The changed value will be copied to 'attr->size'. 5. There is no further checks on 'attr->size' until the end of this function, and once the function returns, we lose the context to verify that 'attr->size' conforms to the sanity checks performed in step 2 (line 9577 to 9584). 6. My manual analysis shows that 'attr->size' is not used elsewhere later, so, there is no working exploit against it right now. However, this could easily turns to an exploitable one if careless developers start to use 'attr->size' later. To fix this, override 'attr->size' from the second fetch to the one from the first fetch, regardless of what is actually copied in. In this way, it is assured that 'attr->size' is consistent with the checks performed after the first fetch. Signed-off-by: Meng Xu <mengxu.gatech@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: meng.xu@gatech.edu Cc: sanidhya@gatech.edu Cc: taesoo@gatech.edu Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()Jiri Slaby
ALIGN+GLOBAL is effectively what ENTRY() does, so use ENTRY() which is dedicated for exactly this purpose -- global functions. Note that stub32_clone() is a C-like leaf function -- it has a standard call frame -- it only switches one argument and continues by jumping into C. Since each ENTRY() should be balanced by some END*() marker, we add a corresponding ENDPROC() to stub32_clone() too. Besides that, x86's custom GLOBAL macro is going to die very soon. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170824080624.7768-2-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/fpu/math-emu: Add ENDPROC to functionsJiri Slaby
Functions in math-emu are annotated as ENTRY() symbols, but their ends are not annotated at all. But these are standard functions called from C, with proper stack register update etc. Omitting the ends means: * the annotations are not paired and we cannot deal with such functions e.g. in objtool * the symbols are not marked as functions in the object file * there are no sizes of the functions in the object file So fix this by adding ENDPROC() to each such case in math-emu. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170824080624.7768-1-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot/64: Extract efi_pe_entry() from startup_64()Jiri Slaby
Similarly to the 32-bit code, efi_pe_entry body() is somehow squashed into startup_64(). In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry() to start at 0x210. But this requirement was removed long time ago, in: 99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_64() into a separate function, we do so. We also annotate the function appropriatelly by ENTRY+ENDPROC. ABI offsets are preserved: 0000000000000000 T startup_32 0000000000000200 T startup_64 0000000000000390 T efi64_stub_entry On the top-level, it looked like: .org 0x200 ENTRY(startup_64) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr GLOBAL(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq preferred_addr(%rax), %rax jmp *%rax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_64) ENDPROC(startup_64) And it is now converted into: .org 0x200 ENTRY(startup_64) ... ; a lot of assembly (startup_64) ENDPROC(startup_64) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq startup_64(%rax), %rax jmp *%rax ENDPROC(efi_pe_entry) #endif Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot/32: Extract efi_pe_entry() from startup_32()Jiri Slaby
The efi_pe_entry() body is somehow squashed into startup_32(). In the old days, we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start at 0x10. But this requirement was removed long time ago, in: 99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_32() into a separate function, we do so and we separate it to two functions as they are marked already: efi_pe_entry() + efi32_stub_entry(). We also annotate the functions appropriatelly by ENTRY+ENDPROC. ABI offset is preserved: 0000 128 FUNC GLOBAL DEFAULT 6 startup_32 0080 60 FUNC GLOBAL DEFAULT 6 efi_pe_entry 00bc 68 FUNC GLOBAL DEFAULT 6 efi32_stub_entry On the top-level, it looked like this: ENTRY(startup_32) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal preferred_addr(%eax), %eax jmp *%eax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_32) ENDPROC(startup_32) And it is now converted into: ENTRY(startup_32) ... ; a lot of assembly (startup_32) ENDPROC(startup_32) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENDPROC(efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal startup_32(%eax), %eax jmp *%eax ENDPROC(efi32_stub_entry) #endif Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>