summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-11-26io-wq: shrink io_wq_work a bitJens Axboe
Currently we're using 40 bytes for the io_wq_work structure, and 16 of those is the doubly link list node. We don't need doubly linked lists, we always add to tail to keep things ordered, and any other use case is list traversal with deletion. For the deletion case, we can easily support any node deletion by keeping track of the previous entry. This shrinks io_wq_work to 32 bytes, and subsequently io_kiock from io_uring to 216 to 208 bytes. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26io-wq: fix handling of NUMA node IDsJann Horn
There are several things that can go wrong in the current code on NUMA systems, especially if not all nodes are online all the time: - If the identifiers of the online nodes do not form a single contiguous block starting at zero, wq->wqes will be too small, and OOB memory accesses will occur e.g. in the loop in io_wq_create(). - If a node comes online between the call to num_online_nodes() and the for_each_node() loop in io_wq_create(), an OOB write will occur. - If a node comes online between io_wq_create() and io_wq_enqueue(), a lookup is performed for an element that doesn't exist, and an OOB read will probably occur. Fix it by: - using nr_node_ids instead of num_online_nodes() for the allocation size; nr_node_ids is calculated by setup_nr_node_ids() to be bigger than the highest node ID that could possibly come online at some point, even if those nodes' identifiers are not a contiguous block - creating workers for all possible CPUs, not just all online ones This is basically what the normal workqueue code also does, as far as I can tell. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26io_uring: use kzalloc instead of kcalloc for single-element allocationsJann Horn
These allocations are single-element allocations, so don't use the array allocation wrapper for them. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26io_uring: cleanup io_import_fixed()Pavel Begunkov
Clean io_import_fixed() call site and make it return proper type. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26io_uring: inline struct sqe_submitPavel Begunkov
There is no point left in keeping struct sqe_submit. Inline it into struct io_kiocb, so any req->submit.field is now just req->field - moves initialisation of ring_file into io_get_req() - removes duplicated req->sequence. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26io_uring: store timeout's sqe->off in proper placePavel Begunkov
Timeouts' sequence offset (i.e. sqe->off) is stored in req->submit.sequence under a false name. Keep it in timeout.data instead. The unused space for sequence will be reclaimed in the following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26net: disallow ancillary data for __sys_{send,recv}msg_file()Jens Axboe
Only io_uring uses (and added) these, and we want to disallow the use of sendmsg/recvmsg for anything but regular data transfers. Use the newly added prep helper to split the msghdr copy out from the core function, to check for msg_control and msg_controllen settings. If either is set, we return -EINVAL. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26net: separate out the msghdr copy from ___sys_{send,recv}msg()Jens Axboe
This is in preparation for enabling the io_uring helpers for sendmsg and recvmsg to first copy the header for validation before continuing with the operation. There should be no functional changes in this patch. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26net: port < inet_prot_sock(net) --> inet_port_requires_bind_service(net, port)Maciej Żenczykowski
Note that the sysctl write accessor functions guarantee that: net->ipv4.sysctl_ip_prot_sock <= net->ipv4.ip_local_ports.range[0] invariant is maintained, and as such the max() in selinux hooks is actually spurious. ie. even though if (snum < max(inet_prot_sock(sock_net(sk)), low) || snum > high) { per logic is the same as if ((snum < inet_prot_sock(sock_net(sk)) && snum < low) || snum > high) { it is actually functionally equivalent to: if (snum < low || snum > high) { which is equivalent to: if (snum < inet_prot_sock(sock_net(sk)) || snum < low || snum > high) { even though the first clause is spurious. But we want to hold on to it in case we ever want to change what what inet_port_requires_bind_service() means (for example by changing it from a, by default, [0..1024) range to some sort of set). Test: builds, git 'grep inet_prot_sock' finds no other references Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26Merge branch 'ibmvnic-Harden-device-commands-and-queries'David S. Miller
Thomas Falcon says: ==================== ibmvnic: Harden device commands and queries This patch series fixes some shortcomings with the current VNIC device command implementation. The first patch fixes the initialization of driver completion structures used for device commands. Additionally, all waits for device commands are bounded with a timeout in the event that the device does not respond or becomes inoperable. Finally, serialize queries to retain the integrity of device return codes. Changes in v2: - included header comment for ibmvnic_wait_for_completion - removed open-coded loop in patch 3/4, suggested by Jakub - ibmvnic_wait_for_completion accepts timeout value in milliseconds instead of jiffies - timeout calculations cleaned up and completed before wait loop - included missing mutex_destroy calls, suggested by Jakub - included comment before mutex declaration ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26ibmvnic: Serialize device queriesThomas Falcon
Provide some serialization for device CRQ commands and queries to ensure that the shared variable used for storing return codes is properly synchronized. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26ibmvnic: Bound waits for device queriesThomas Falcon
Create a wrapper for wait_for_completion calls with additional driver checks to ensure that the driver does not wait on a disabled device. In those cases or if the device does not respond in an extended amount of time, this will allow the driver an opportunity to recover. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26ibmvnic: Terminate waiting device threads after loss of serviceThomas Falcon
If we receive a notification that the device has been deactivated or removed, force a completion of all waiting threads. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26ibmvnic: Fix completion structure initializationThomas Falcon
Fix multiple calls to init_completion for device completion structures. Instead, initialize them during device probe and reinitialize them later as needed. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26net-sctp: replace some sock_net(sk) with just 'net'Maciej Żenczykowski
It already existed in part of the function, but move it to a higher level and use it consistently throughout. Safe since sk is never written to. Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26net: Fix a documentation bug wrt. ip_unprivileged_port_startMaciej Żenczykowski
It cannot overlap with the local port range - ie. with autobind selectable ports - and not with reserved ports. Indeed 'ip_local_reserved_ports' isn't even a range, it's a (by default empty) set. Fixes: 4548b683b781 ("Introduce a sysctl that modifies the value of PROT_SOCK.") Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26x86/ptrace: Document FSBASE and GSBASE ABI odditiesAndy Lutomirski
Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/ptrace: Remove set_segment_reg() implementations for currentAndy Lutomirski
seg_segment_reg() should be unreachable with task == current. Rather than confusingly trying to make it work, just explicitly disable this case. (regset->get is used for current in the coredump code, but the ->set interface is only used for ptrace, and you can't ptrace yourself.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/traps: die() instead of panicking on a double faultAndy Lutomirski
A double fault has a decent chance of being recoverable by killing the offending thread. Use die() so that we at least try to recover. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bitAndy Lutomirski
The old x86_32 doublefault_fn() was old and crufty, and it did not even try to recover. do_double_fault() is much nicer. Rewrite the 32-bit double fault code to sanitize CPU state and call do_double_fault(). This is mostly an exercise i386 archaeology. With this patch applied, 32-bit double faults get a real stack trace, just like 64-bit double faults. [ mingo: merged the patch to a later kernel base. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/doublefault/32: Move #DF stack and TSS to cpu_entry_areaAndy Lutomirski
There are three problems with the current layout of the doublefault stack and TSS. First, the TSS is only cacheline-aligned, which is not enough -- if the hardware portion of the TSS (struct x86_hw_tss) crosses a page boundary, horrible things happen [0]. Second, the stack and TSS are global, so simultaneous double faults on different CPUs will cause massive corruption. Third, the whole mechanism won't work if user CR3 is loaded, resulting in a triple fault [1]. Let the doublefault stack and TSS share a page (which prevents the TSS from spanning a page boundary), make it percpu, and move it into cpu_entry_area. Teach the stack dump code about the doublefault stack. [0] Real hardware will read past the end of the page onto the next *physical* page if a task switch happens. Virtual machines may have any number of bugs, and I would consider it reasonable for a VM to summarily kill the guest if it tries to task-switch to a page-spanning TSS. [1] Real hardware triple faults. At least some VMs seem to hang. I'm not sure what's going on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/doublefault/32: Rename doublefault.c to doublefault_32.cAndy Lutomirski
doublefault.c now only contains 32-bit code. Rename it to doublefault_32.c. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/traps: Disentangle the 32-bit and 64-bit doublefault codeAndy Lutomirski
The 64-bit doublefault handler is much nicer than the 32-bit one. As a first step toward unifying them, make the 64-bit handler self-contained. This should have no effect no functional effect except in the odd case of x86_64 with CONFIG_DOUBLEFAULT=n in which case it will change the logging a bit. This also gets rid of CONFIG_DOUBLEFAULT configurability on 64-bit kernels. It didn't do anything useful -- CONFIG_DOUBLEFAULT=n didn't actually disable doublefault handling on x86_64. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26lkdtm: Add a DOUBLE_FAULT crash type on x86Andy Lutomirski
The DOUBLE_FAULT crash does INT $8, which is a decent approximation of a double fault. This is useful for testing the double fault handling. Use it like: Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26selftests/x86/single_step_syscall: Check SYSENTER directlyAndy Lutomirski
We used to test SYSENTER only through the vDSO. Test it directly too, just in case. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()Joerg Roedel
The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc() ranges: before such vmap ranges are reused we make sure that they are unmapped from every task's page tables. This is really easy on pagetable setups where the kernel page tables are shared between all tasks - this is the case on 32-bit kernels with SHARED_KERNEL_PMD = 1. But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating over the pgd_list and clearing all pmd entries in the pgds that are cleared in the init_mm.pgd, which is the reference pagetable that the vmalloc() code uses. In that context the current practice of vmalloc_sync_all() iterating until FIX_ADDR_TOP is buggy: for (address = VMALLOC_START & PMD_MASK; address >= TASK_SIZE_MAX && address < FIXADDR_TOP; address += PMD_SIZE) { struct page *page; Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc address ranges: VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIX_ADDR This is mostly harmless for the FIX_ADDR and CPU_ENTRY_AREA ranges that don't clear their pmds, but it's lethal for the LDT range, which relies on having different mappings in different processes, and 'synchronizing' them in the vmalloc sense corrupts those pagetable entries (clearing them). This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD off and makes this the dominant mapping mode on 32-bit. To make LDT working again vmalloc_sync_all() must only iterate over the volatile parts of the kernel address range that are identical between all processes. So the correct check in vmalloc_sync_all() is "address < VMALLOC_END" to make sure the VMALLOC areas are synchronized and the LDT mapping is not falsely overwritten. The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either, but this is not really a proplem since their PMDs get established during bootup and never change. This change fixes the ldt_gdt selftest in my setup. [ mingo: Fixed up the changelog to explain the logic and modified the copying to only happen up until VMALLOC_END. ] Reported-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Fixes: 7757d607c6b3: ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32") Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26x86/iopl: Make 'struct tss_struct' constant size againIngo Molnar
After the following commit: 05b042a19443: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise") 'struct cpu_entry_area' has to be Kconfig invariant, so that we always have a matching CPU_ENTRY_AREA_PAGES size. This commit added a CONFIG_X86_IOPL_IOPERM dependency to tss_struct: 111e7b15cf10: ("x86/ioperm: Extend IOPL config to control ioperm() as well") Which, if CONFIG_X86_IOPL_IOPERM is turned off, reduces the size of cpu_entry_area by two pages, triggering the assert: ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_202’ declared with attribute error: BUILD_BUG_ON failed: (CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE Simplify the Kconfig dependencies and make cpu_entry_area constant size on 32-bit kernels again. Fixes: 05b042a19443: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise") Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26sr_vendor: support Beurer GL50 evo CD-on-a-chip devices.Diego Elio Pettenò
The Beurer GL50 evo uses a Cygnal-manufactured CD-on-a-chip that only accepts a subset of SCSI commands, and supports neither audio commands nor generic packet commands. Actually sending those commands bring the device to an unrecoverable state that causes the device to hang and reset. To: Jens Axboe <axboe@kernel.dk> Cc: linux-kernel@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26cdrom: respect device capabilities during opening actionDiego Elio Pettenò
Reading the TOC only works if the device can play audio, otherwise these commands fail (and possibly bring the device to an unhealthy state.) Similarly, cdrom_mmc3_profile() should only be called if the device supports generic packet commands. To: Jens Axboe <axboe@kernel.dk> Cc: linux-kernel@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26Revert "vfs: properly and reliably lock f_pos in fdget_pos()"Linus Torvalds
This reverts commit 0be0ee71816b2b6725e2b4f32ad6726c9d729777. I was hoping it would be benign to switch over entirely to FMODE_STREAM, and we'd have just a couple of small fixups we'd need, but it looks like we're not quite there yet. While it worked fine on both my desktop and laptop, they are fairly similar in other respects, and run mostly the same loads. Kenneth Crudup reports that it seems to break both his vmware installation and the KDE upower service. In both cases apparently leading to timeouts due to waitinmg for the f_pos lock. There are a number of character devices in particular that definitely want stream-like behavior, but that currently don't get marked as streams, and as a result get the exclusion between concurrent read()/write() on the same file descriptor. Which doesn't work well for them. The most obvious example if this is /dev/console and /dev/tty, which use console_fops and tty_fops respectively (and ptmx_fops for the pty master side). It may be that it's just this that causes problems, but we clearly weren't ready yet. Because there's a number of other likely common cases that don't have llseek implementations and would seem to act as stream devices: /dev/fuse (fuse_dev_operations) /dev/mcelog (mce_chrdev_ops) /dev/mei0 (mei_fops) /dev/net/tun (tun_fops) /dev/nvme0 (nvme_dev_fops) /dev/tpm0 (tpm_fops) /proc/self/ns/mnt (ns_file_operations) /dev/snd/pcm* (snd_pcm_f_ops[]) and while some of these could be trivially automatically detected by the vfs layer when the character device is opened by just noticing that they have no read or write operations either, it often isn't that obvious. Some character devices most definitely do use the file position, even if they don't allow seeking: the firmware update code, for example, uses simple_read_from_buffer() that does use f_pos, but doesn't allow seeking back and forth. We'll revisit this when there's a better way to detect the problem and fix it (possibly with a coccinelle script to do more of the FMODE_STREAM annotations). Reported-by: Kenneth R. Crudup <kenny@panix.com> Cc: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-26Merge branch 'x86-iopl-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 iopl updates from Ingo Molnar: "This implements a nice simplification of the iopl and ioperm code that Thomas Gleixner discovered: we can implement the IO privilege features of the iopl system call by using the IO permission bitmap in permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space if they change the interrupt flag. This implements that feature, with testing facilities and related cleanups" [ "Simplification" may be an over-statement. The main goal is to avoid the cli/sti of iopl by effectively implementing the IO port access parts of iopl in terms of ioperm. This may end up not workign well in case people actually depend on cli/sti being available, or if there are mixed uses of iopl and ioperm. We will see.. - Linus ] * 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86/ioperm: Fix use of deprecated config option x86/entry/32: Clarify register saving in __switch_to_asm() selftests/x86/iopl: Extend test to cover IOPL emulation x86/ioperm: Extend IOPL config to control ioperm() as well x86/iopl: Remove legacy IOPL option x86/iopl: Restrict iopl() permission scope x86/iopl: Fixup misleading comment selftests/x86/ioperm: Extend testing so the shared bitmap is exercised x86/ioperm: Share I/O bitmap if identical x86/ioperm: Remove bitmap if all permissions dropped x86/ioperm: Move TSS bitmap update to exit to user work x86/ioperm: Add bitmap sequence number x86/ioperm: Move iobitmap data into a struct x86/tss: Move I/O bitmap data into a seperate struct x86/io: Speedup schedule out of I/O bitmap user x86/ioperm: Avoid bitmap allocation if no permissions are set x86/ioperm: Simplify first ioperm() invocation logic x86/iopl: Cleanup include maze x86/tss: Fix and move VMX BUILD_BUG_ON() x86/cpu: Unify cpu_init() ...
2019-11-26sysctl: Remove the sysctl system callEric W. Biederman
This system call has been deprecated almost since it was introduced, and in a survey of the linux distributions I can no longer find any of them that enable CONFIG_SYSCTL_SYSCALL. The only indication that I can find that anyone might care is that a few of the defconfigs in the kernel enable CONFIG_SYSCTL_SYSCALL. However this appears in only 31 of 414 defconfigs in the kernel, so I suspect this symbols presence is simply because it is harmless to include rather than because it is necessary. As there appear to be no users of the sysctl system call, remove the code. As this removes one of the few uses of the internal kernel mount of proc I hope this allows for even more simplifications of the proc filesystem. Cc: Alex Smith <alex.smith@imgtec.com> Cc: Anders Berg <anders.berg@lsi.com> Cc: Apelete Seketeli <apelete@seketeli.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Chee Nouk Phoon <cnphoon@altera.com> Cc: Chris Zankel <chris@zankel.net> Cc: Christian Ruppert <christian.ruppert@abilis.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Harvey Hunt <harvey.hunt@imgtec.com> Cc: Helge Deller <deller@gmx.de> Cc: Hongliang Tao <taohl@lemote.com> Cc: Hua Yan <yanh@lemote.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: John Crispin <blogic@openwrt.org> Cc: Jonas Jensen <jonas.jensen@gmail.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Jun Nie <jun.nie@linaro.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Kevin Wells <kevin.wells@nxp.com> Cc: Kumar Gala <galak@codeaurora.org> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Olof Johansson <olof@lixom.net> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Phil Edworthy <phil.edworthy@renesas.com> Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Roland Stigge <stigge@antcom.de> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Scott Telford <stelford@cadence.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: Tanmay Inamdar <tinamdar@apm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Wolfram Sang <w.sang@pengutronix.de> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-11-26Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this cycle were: - Cross-arch changes to move the linker sections for NOTES and EXCEPTION_TABLE into the RO_DATA area, where they belong on most architectures. (Kees Cook) - Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to trap jumps into the middle of those padding areas instead of sliding execution. (Kees Cook) - A thorough cleanup of symbol definitions within x86 assembler code. The rather randomly named macros got streamlined around a (hopefully) straightforward naming scheme: SYM_START(name, linkage, align...) SYM_END(name, sym_type) SYM_FUNC_START(name) SYM_FUNC_END(name) SYM_CODE_START(name) SYM_CODE_END(name) SYM_DATA_START(name) SYM_DATA_END(name) etc - with about three times of these basic primitives with some label, local symbol or attribute variant, expressed via postfixes. No change in functionality intended. (Jiri Slaby) - Misc other changes, cleanups and smaller fixes" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits) x86/entry/64: Remove pointless jump in paranoid_exit x86/entry/32: Remove unused resume_userspace label x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o m68k: Convert missed RODATA to RO_DATA x86/vmlinux: Use INT3 instead of NOP for linker fill bytes x86/mm: Report actual image regions in /proc/iomem x86/mm: Report which part of kernel image is freed x86/mm: Remove redundant address-of operators on addresses xtensa: Move EXCEPTION_TABLE to RO_DATA segment powerpc: Move EXCEPTION_TABLE to RO_DATA segment parisc: Move EXCEPTION_TABLE to RO_DATA segment microblaze: Move EXCEPTION_TABLE to RO_DATA segment ia64: Move EXCEPTION_TABLE to RO_DATA segment h8300: Move EXCEPTION_TABLE to RO_DATA segment c6x: Move EXCEPTION_TABLE to RO_DATA segment arm64: Move EXCEPTION_TABLE to RO_DATA segment alpha: Move EXCEPTION_TABLE to RO_DATA segment x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment x86/vmlinux: Actually use _etext for the end of the text segment vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA ...
2019-11-26Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "These are the fixes left over from the v5.4 cycle: - Various low level 32-bit entry code fixes and improvements by Andy Lutomirski, Peter Zijlstra and Thomas Gleixner. - Fix 32-bit Xen PV breakage, by Jan Beulich" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3 x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel selftests/x86/mov_ss_trap: Fix the SYSENTER test x86/entry/32: Fix NMI vs ESPFIX x86/entry/32: Unwind the ESPFIX stack earlier on exception entry x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL x86/entry/32: Use %ss segment where required x86/entry/32: Fix IRET exception x86/cpu_entry_area: Add guard page for entry stack on 32bit x86/pti/32: Size initial_page_table correctly x86/doublefault/32: Fix stack canaries in the double fault handler x86/xen/32: Simplify ring check in xen_iret_crit_fixup() x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout x86/stackframe/32: Repair 32-bit Xen PV
2019-11-26Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PTI updates from Ingo Molnar: "Fix reporting bugs of the MDS and TAA mitigation status, if one or both are set via a boot option. No change to mitigation behavior intended" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Fix redundant MDS mitigation message x86/speculation: Fix incorrect MDS/TAA mitigation status
2019-11-26tipc: fix link name length checkJohn Rutherford
In commit 4f07b80c9733 ("tipc: check msg->req data len in tipc_nl_compat_bearer_disable") the same patch code was copied into routines: tipc_nl_compat_bearer_disable(), tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats(). The two link routine occurrences should have been modified to check the maximum link name length and not bearer name length. Fixes: 4f07b80c9733 ("tipc: check msg->reg data len in tipc_nl_compat_bearer_disable") Signed-off-by: John Rutherford <john.rutherford@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26clk: aspeed: Add RMII RCLK gates for both AST2500 MACsAndrew Jeffery
RCLK is a fixed 50MHz clock derived from HPLL that is described by a single gate for each MAC. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lkml.kernel.org/r/20191010020655.3776-3-andrew@aj.id.au Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-11-26Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "UV platform updates (with a 'hubless' variant) and Jailhouse updates for better UART support" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/jailhouse: Only enable platform UARTs if available x86/jailhouse: Improve setup data version comparison x86/platform/uv: Account for UV Hubless in is_uvX_hub Ops x86/platform/uv: Check EFI Boot to set reboot type x86/platform/uv: Decode UVsystab Info x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files x86/platform/uv: Setup UV functions for Hubless UV Systems x86/platform/uv: Add return code to UV BIOS Init function x86/platform/uv: Return UV Hubless System Type x86/platform/uv: Save OEM_ID from ACPI MADT probe
2019-11-26Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - A PAT series from Davidlohr Bueso, which simplifies the memtype rbtree by using the interval tree helpers. (There's more cleanups in this area queued up, but they didn't make the merge window.) - Also flip over CONFIG_X86_5LEVEL to default-y. This might draw in a few more testers, as all the major distros are going to have 5-level paging enabled by default in their next iterations. - Misc cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/pat: Rename pat_rbtree.c to pat_interval.c x86/mm/pat: Drop the rbt_ prefix from external memtype calls x86/mm/pat: Do not pass 'rb_root' down the memtype tree helper functions x86/mm/pat: Convert the PAT tree to a generic interval tree x86/mm: Clean up the pmd_read_atomic() comments x86/mm: Fix function name typo in pmd_read_atomic() comment x86/cpu: Clean up intel_tlb_table[] x86/mm: Enable 5-level paging support by default
2019-11-26Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kdump updates from Ingo Molnar: "This solves a kdump artifact where encrypted memory contents are dumped, instead of unencrypted ones. The solution also happens to simplify the kdump code, to everyone's delight" * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/crash: Align function arguments on opening braces x86/kdump: Remove the backup region handling x86/kdump: Always reserve the low 1M when the crashkernel option is specified x86/crash: Add a forward declaration of struct kimage
2019-11-26Merge branch 'x86-hyperv-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hyperv updates from Ingo Molnar: "Misc updates to the hyperv guest code: - Rework clockevents initialization to better support hibernation - Allow guests to enable InvariantTSC - Micro-optimize send_ipi_one" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Initialize clockevents earlier in CPU onlining x86/hyperv: Allow guests to enable InvariantTSC x86/hyperv: Micro-optimize send_ipi_one()
2019-11-26Merge branch 'x86-entry-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 syscall entry updates from Ingo Molnar: "These changes relate to the preparatory cleanup of syscall function type signatures - to fix indirect call mismatches with Control-Flow Integrity (CFI) checking. No change in behavior intended" * 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Use the correct function type for native_set_fixmap() syscalls/x86: Fix function types in COND_SYSCALL syscalls/x86: Use the correct function type for sys_ni_syscall syscalls/x86: Use COMPAT_SYSCALL_DEFINE0 for IA32 (rt_)sigreturn syscalls/x86: Wire up COMPAT_SYSCALL_DEFINE0 syscalls/x86: Use the correct function type in SYSCALL_DEFINE0
2019-11-26Merge branches 'x86-cpu-for-linus' and 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu and fpu updates from Ingo Molnar: - math-emu fixes - CPUID updates - sanity-check RDRAND output to see whether the CPU at least pretends to produce random data - various unaligned-access across cachelines fixes in preparation of hardware level split-lock detection - fix MAXSMP constraints to not allow !CPUMASK_OFFSTACK kernels with larger than 512 NR_CPUS - misc FPU related cleanups * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Align the x86_capability array to size of unsigned long x86/cpu: Align cpu_caps_cleared and cpu_caps_set to unsigned long x86/umip: Make the comments vendor-agnostic x86/Kconfig: Rename UMIP config parameter x86/Kconfig: Enforce limit of 512 CPUs with MAXSMP and no CPUMASK_OFFSTACK x86/cpufeatures: Add feature bit RDPRU on AMD x86/math-emu: Limit MATH_EMULATION to 486SX compatibles x86/math-emu: Check __copy_from_user() result x86/rdrand: Sanity-check RDRAND output * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Use XFEATURE_FP/SSE enum values instead of hardcoded numbers x86/fpu: Shrink space allocated for xstate_comp_offsets x86/fpu: Update stale variable name in comment
2019-11-26Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The main changes were: - Extend the boot protocol to allow future extensions without hitting the setup_header size limit. - Add quirk to devicetree systems to disable the RTC unless it's listed as a supported device. - Fix ld.lld linker pedantry" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Introduce setup_indirect x86/boot: Introduce kernel_info.setup_type_max x86/boot: Introduce kernel_info x86/init: Allow DT configured systems to disable RTC at boot time x86/realmode: Explicitly set entry point via ENTRY in linker script
2019-11-26Merge branches 'core-objtool-for-linus', 'x86-cleanups-for-linus' and ↵Linus Torvalds
'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 objtool, cleanup, and apic updates from Ingo Molnar: "Objtool: - Fix a gawk 5.0 incompatibility in gen-insn-attr-x86.awk. Most distros are still on gawk 4.2.x. Cleanup: - Misc cleanups, plus the removal of obsolete code such as Calgary IOMMU support, which code hasn't seen any real testing in a long time and there's no known users left. apic: - Two changes: a cleanup and a fix for an (old) race for oneshot threaded IRQ handlers" * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Fix awk regexp warnings * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Remove unused asm/rio.h x86: Fix typos in comments x86/pci: Remove #ifdef __KERNEL__ guard from <asm/pci.h> x86/pci: Remove pci_64.h x86: Remove the calgary IOMMU driver x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments x86/kdump: Remove the unused crash_copy_backup_region() x86/nmi: Remove stale EDAC include leftover * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioapic: Rename misnamed functions x86/ioapic: Prevent inconsistent state when moving an interrupt
2019-11-26perf tools: Allow to link with libbpf dynamicalyJiri Olsa
Currently we support only static linking with kernel's libbpf (tools/lib/bpf). This patch adds libbpf package detection and support to link perf with it dynamically. The libbpf package status is displayed with: $ make VF=1 Auto-detecting system features: ... ... libbpf: [ on ] It's not checked by default, because it's quite new. Once it's on most distros we can switch it on. For the same reason it's not added to the test-all check. Perf does not need advanced version of libbpf, so we can check just for the base bpf_object__open function. Adding new compile variable to detect libbpf package and link bpf dynamically: $ make LIBBPF_DYNAMIC=1 ... LINK perf $ ldd perf | grep bpf libbpf.so.0 => /lib64/libbpf.so.0 (0x00007f46818bc000) If libbpf is not installed, build stops with: Makefile.config:486: *** Error: No libbpf devel library found,\ please install libbpf-devel. Stop. Committer testing: $ make LIBBPF_DYNAMIC=1 -C tools/perf O=/tmp/build/perf make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j8' parallel build Makefile.config:493: *** Error: No libbpf devel library found, please install libbpf-devel. Stop. make[1]: *** [Makefile.perf:225: sub-make] Error 2 make: *** [Makefile:70: all] Error 2 make: Leaving directory '/home/acme/git/perf/tools/perf' $ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191126121253.28253-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26perf tests: Rename tests/map_groups.c to tests/maps.cArnaldo Carvalho de Melo
One more step in mergint the maps and map_groups structs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-bw6aagubqxc47m54k2maezfu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26perf tests: Rename thread-mg-share to thread-maps-shareArnaldo Carvalho de Melo
One more step in merging 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-naxsl3g4ou3fyxb8l8e0pn5e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26perf maps: Rename map_groups.h to maps.hArnaldo Carvalho de Melo
One more step in the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-9ibtn3vua76f934t7woyf26w@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26perf maps: Rename 'mg' variables to 'maps'Arnaldo Carvalho de Melo
Continuing the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-z8d14wrw393a0fbvmnk1bqd9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>