summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-12-11seccomp: add a return code to trap to userspaceTycho Andersen
This patch introduces a means for syscalls matched in seccomp to notify some other task that a particular filter has been triggered. The motivation for this is primarily for use with containers. For example, if a container does an init_module(), we obviously don't want to load this untrusted code, which may be compiled for the wrong version of the kernel anyway. Instead, we could parse the module image, figure out which module the container is trying to load and load it on the host. As another example, containers cannot mount() in general since various filesystems assume a trusted image. However, if an orchestrator knows that e.g. a particular block device has not been exposed to a container for writing, it want to allow the container to mount that block device (that is, handle the mount for it). This patch adds functionality that is already possible via at least two other means that I know about, both of which involve ptrace(): first, one could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL. Unfortunately this is slow, so a faster version would be to install a filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP. Since ptrace allows only one tracer, if the container runtime is that tracer, users inside the container (or outside) trying to debug it will not be able to use ptrace, which is annoying. It also means that older distributions based on Upstart cannot boot inside containers using ptrace, since upstart itself uses ptrace to monitor services while starting. The actual implementation of this is fairly small, although getting the synchronization right was/is slightly complex. Finally, it's worth noting that the classic seccomp TOCTOU of reading memory data from the task still applies here, but can be avoided with careful design of the userspace handler: if the userspace handler reads all of the task memory that is necessary before applying its security policy, the tracee's subsequent memory edits will not be read by the tracer. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: "Serge E. Hallyn" <serge@hallyn.com> Acked-by: Serge Hallyn <serge@hallyn.com> CC: Christian Brauner <christian@brauner.io> CC: Tyler Hicks <tyhicks@canonical.com> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-11seccomp: switch system call argument type to void *Tycho Andersen
The const qualifier causes problems for any code that wants to write to the third argument of the seccomp syscall, as we will do in a future patch in this series. The third argument to the seccomp syscall is documented as void *, so rather than just dropping the const, let's switch everything to use void * as well. I believe this is safe because of 1. the documentation above, 2. there's no real type information exported about syscalls anywhere besides the man pages. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: "Serge E. Hallyn" <serge@hallyn.com> Acked-by: Serge Hallyn <serge@hallyn.com> CC: Christian Brauner <christian@brauner.io> CC: Tyler Hicks <tyhicks@canonical.com> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-11-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of x86 fixes: - Cure the LDT remapping to user space on 5 level paging which ended up in the KASLR space - Remove LDT mapping before freeing the LDT pages - Make NFIT MCE handling more robust - Unbreak the VSMP build by removing the dependency on paravirt ops - Support broken PIT emulation on Microsoft hyperV - Don't trace vmware_sched_clock() to avoid tracer recursion - Remove -pipe from KBUILD CFLAGS which breaks clang and is also slower on GCC - Trivial coding style and typo fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/vmware: Do not trace vmware_sched_clock() x86/vsmp: Remove dependency on pv_irq_ops x86/ldt: Remove unused variable in map_ldt_struct() x86/ldt: Unmap PTEs for the slot before freeing LDT pages x86/mm: Move LDT remap out of KASLR region on 5-level paging acpi/nfit, x86/mce: Validate a MCE's address before using it acpi/nfit, x86/mce: Handle only uncorrectable machine checks x86/build: Remove -pipe from KBUILD_CFLAGS x86/hyper-v: Fix indentation in hv_do_fast_hypercall16() Documentation/x86: Fix typo in zero-page.txt x86/hyper-v: Enable PIT shutdown quirk clockevents/drivers/i8253: Add support for PIT shutdown quirk
2018-11-11Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A couple of fixlets for the core: - Kernel doc function documentation fixes - Missing prototypes for weak watchdog functions" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: resource/docs: Complete kernel-doc style function documentation watchdog/core: Add missing prototypes for weak functions resource/docs: Fix new kernel-doc warnings
2018-11-09Merge tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull Ceph fixes from Ilya Dryomov: "Two CephFS fixes (copy_file_range and quota) and a small feature bit cleanup" * tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-client: libceph: assume argonaut on the server side ceph: quota: fix null pointer dereference in quota check ceph: add destination file data sync before doing any remote copy
2018-11-09Merge tag 's390-4.20-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - A fix for the pgtable_bytes misaccounting on s390. The patch changes common code part in regard to page table folding and adds extra checks to mm_[inc|dec]_nr_[pmds|puds]. - Add FORCE for all build targets using if_changed - Use non-loadable phdr for the .vmlinux.info section to avoid a segment overlap that confuses kexec - Cleanup the attribute definition for the diagnostic sampling - Increase stack size for CONFIG_KASAN=y builds - Export __node_distance to fix a build error - Correct return code of a PMU event init function - An update for the default configs * tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/perf: Change CPUM_CF return code in event init function s390: update defconfigs s390/mm: Fix ERROR: "__node_distance" undefined! s390/kasan: increase instrumented stack size to 64k s390/cpum_sf: Rework attribute definition for diagnostic sampling s390/mm: fix mis-accounting of pgtable_bytes mm: add mm_pxd_folded checks to pgtable_bytes accounting functions mm: introduce mm_[p4d|pud|pmd]_folded mm: make the __PAGETABLE_PxD_FOLDED defines non-empty s390: avoid vmlinux segments overlap s390/vdso: add missing FORCE to build targets s390/decompressor: add missing FORCE to build targets
2018-11-08libceph: assume argonaut on the server sideIlya Dryomov
No one is running pre-argonaut. In addition one of the argonaut features (NOSRCADDR) has been required since day one (and a half, 2.6.34 vs 2.6.35) of the kernel client. Allow for the possibility of reusing these feature bits later. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2018-11-08Merge tag 'compiler-attributes-for-linus-v4.20-rc2' of ↵Linus Torvalds
https://github.com/ojeda/linux Pull compiler attribute fixlets from Miguel Ojeda: "Small improvements to Compiler Attributes: - Define asm_volatile_goto for non-gcc compilers (Nick Desaulniers) - Improve the explanation of compiler_attributes.h" * tag 'compiler-attributes-for-linus-v4.20-rc2' of https://github.com/ojeda/linux: Compiler Attributes: improve explanation of header include/linux/compiler*.h: define asm_volatile_goto
2018-11-08Merge tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD fixes from Boris Brezillon: "MTD changes: - Kill a VLA in sa1100 SPI NOR changes: - Make sure ->addr_width is restored when SFDP parsing fails - Propate errors happening in cqspi_direct_read_execute() NAND changes: - Fix kernel-doc mismatch - Fix nanddev_neraseblocks() to return the correct value - Avoid selection of BCH_CONST_PARAMS when some users require dynamic BCH settings" * tag 'mtd/fixes-for-4.20-rc2' of git://git.infradead.org/linux-mtd: mtd: nand: Fix nanddev_pos_next_page() kernel-doc header mtd: sa1100: avoid VLA in sa1100_setup_mtd mtd: spi-nor: Reset nor->addr_width when SFDP parsing failed mtd: spi-nor: cadence-quadspi: Return error code in cqspi_direct_read_execute() mtd: nand: Fix nanddev_neraseblocks() mtd: nand: drop kernel-doc notation for a deleted function parameter mtd: docg3: don't set conflicting BCH_CONST_PARAMS option
2018-11-08Compiler Attributes: improve explanation of headerMiguel Ojeda
Explain better what "optional" attributes are, and avoid calling them so to avoid confusion. Simply retain "Optional" as a word to look for in the comments. Moreover, add a couple sentences to explain a bit more the intention and the documentation links. Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-11-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - hid.git is moving towards group maintainership (where group is myself and Benjamin Tissoires), therefore this pull request updates MAINTAINERS accordingly - fix for hid-asus config dependency from Arnd Bergmann - two device-specific quirks for i2c-hid from Julian Sax and Kai-Heng Feng - other few small assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: fix up .raw_event() documentation HID: asus: fix build warning wiht CONFIG_ASUS_WMI disabled HID: i2c-hid: add Direkt-Tek DTLAPY133-1 to descriptor override HID: moving to group maintainership model HID: alps: allow incoming reports when only the trackstick is opened Revert "HID: add NOGET quirk for Eaton Ellipse MAX UPS" HID: i2c-hid: Add a small delay after sleep command for Raydium touchpanel HID: hiddev: fix potential Spectre v1
2018-11-06watchdog/core: Add missing prototypes for weak functionsMathieu Malaterre
The split out of the hard lockup detector exposed two new weak functions, but no prototypes for them, which triggers the build warning: kernel/watchdog.c:109:12: warning: no previous prototype for ‘watchdog_nmi_enable’ [-Wmissing-prototypes] kernel/watchdog.c:115:13: warning: no previous prototype for ‘watchdog_nmi_disable’ [-Wmissing-prototypes] Add the prototypes. Fixes: 73ce0511c436 ("kernel/watchdog.c: move hardlockup detector to separate file") Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Babu Moger <babu.moger@oracle.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180606194232.17653-1-malat@debian.org
2018-11-06mtd: nand: Fix nanddev_pos_next_page() kernel-doc headerBoris Brezillon
Function name is wrong in the kernel-doc header. Fixes: 9c3736a3de21 ("mtd: nand: Add core infrastructure to deal with NAND devices") Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-11-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle errors mid-stream of an all dump, from Alexey Kodanev. 2) Fix build of openvswitch with certain combinations of netfilter options, from Arnd Bergmann. 3) Fix interactions between GSO and BQL, from Eric Dumazet. 4) Don't put a '/' in RTL8201F's sysfs file name, from Holger Hoffstätte. 5) S390 qeth driver fixes from Julian Wiedmann. 6) Allow ipv6 link local addresses for netconsole when both source and destination are link local, from Matwey V. Kornilov. 7) Fix the BPF program address seen in /proc/kallsyms, from Song Liu. 8) Initialize mutex before use in dsa microchip driver, from Tristram Ha. 9) Out-of-bounds access in hns3, from Yunsheng Lin. 10) Various netfilter fixes from Stefano Brivio, Jozsef Kadlecsik, Jiri Slaby, Florian Westphal, Eric Westbrook, Andrey Ryabinin, and Pablo Neira Ayuso. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (50 commits) net: alx: make alx_drv_name static net: bpfilter: fix iptables failure if bpfilter_umh is disabled sock_diag: fix autoloading of the raw_diag module net: core: netpoll: Enable netconsole IPv6 link local address ipv6: properly check return value in inet6_dump_all() rtnetlink: restore handling of dumpit return value in rtnl_dump_all() net/ipv6: Move anycast init/cleanup functions out of CONFIG_PROC_FS bonding/802.3ad: fix link_failure_count tracking net: phy: realtek: fix RTL8201F sysfs name sctp: define SCTP_SS_DEFAULT for Stream schedulers sctp: fix strchange_flags name for Stream Change Event mlxsw: spectrum: Fix IP2ME CPU policer configuration openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS qed: fix link config error handling net: hns3: Fix for out-of-bounds access when setting pfc back pressure net/mlx4_en: use __netdev_tx_sent_queue() net: do not abort bulk send on BQL status net: bql: add __netdev_tx_sent_queue() s390/qeth: report 25Gbit link speed s390/qeth: sanitize ARP requests ...
2018-11-06include/linux/compiler*.h: define asm_volatile_gotondesaulniers@google.com
asm_volatile_goto should also be defined for other compilers that support asm goto. Fixes commit 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive"). Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-11-06HID: fix up .raw_event() documentationLinus Walleij
The documentation for the .raw_event() callback says that if the driver return 1, there will be no further processing of the event, but this is not true, the actual code in hid-core.c looks like this: if (hdrv && hdrv->raw_event && hid_match_report(hid, report)) { ret = hdrv->raw_event(hid, report, data, size); if (ret < 0) goto unlock; } ret = hid_report_raw_event(hid, type, data, size, interrupt); The only return value that has any effect on the processing is a negative error. Correct this as it seems to confuse people: I found bogus code in the Razer out-of-tree driver attempting to return 1 here. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-11-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains the first batch of Netfilter fixes for your net tree: 1) Fix splat with IPv6 defragmenting locally generated fragments, from Florian Westphal. 2) Fix Incorrect check for missing attribute in nft_osf. 3) Missing INT_MIN & INT_MAX definition for netfilter bridge uapi header, from Jiri Slaby. 4) Revert map lookup in nft_numgen, this is already possible with the existing infrastructure without this extension. 5) Fix wrong listing of set reference counter, make counter synchronous again, from Stefano Brivio. 6) Fix CIDR 0 in hash:net,port,net, from Eric Westbrook. 7) Fix allocation failure with large set, use kvcalloc(). From Andrey Ryabinin. 8) No need to disable BH when fetch ip set comment, patch from Jozsef Kadlecsik. 9) Sanity check for valid sysfs entry in xt_IDLETIMER, from Taehee Yoo. 10) Fix suspicious rcu usage via ip_set() macro at netlink dump, from Jozsef Kadlecsik. 11) Fix setting default timeout via nfnetlink_cttimeout, this comes with preparation patch to add nf_{tcp,udp,...}_pernet() helper. 12) Allow ebtables table nat to be of filter type via nft_compat. From Florian Westphal. 13) Incorrect calculation of next bucket in early_drop, do no bump hash value, update bucket counter instead. From Vasily Khoruzhick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05compiler: remove __no_sanitize_address_or_inline againMartin Schwidefsky
The __no_sanitize_address_or_inline and __no_kasan_or_inline defines are almost identical. The only difference is that __no_kasan_or_inline does not have the 'notrace' attribute. To be able to replace __no_sanitize_address_or_inline with the older definition, add 'notrace' to __no_kasan_or_inline and change to two users of __no_sanitize_address_or_inline in the s390 code. The 'notrace' option is necessary for e.g. the __load_psw_mask function in arch/s390/include/asm/processor.h. Without the option it is possible to trace __load_psw_mask which leads to kernel stack overflow. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Pointed-out-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-05mtd: nand: Fix nanddev_neraseblocks()Boris Brezillon
nanddev_neraseblocks() currently returns the number pages per LUN instead of the total number of eraseblocks. Fixes: 9c3736a3de21 ("mtd: nand: Add core infrastructure to deal with NAND devices") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-11-04Merge tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Bugfix: - Fix build issues on architectures that don't provide 64-bit cmpxchg Cleanups: - Fix a spelling mistake" * tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: fix spelling mistake, EACCESS -> EACCES SUNRPC: Use atomic(64)_t for seq_send(64)
2018-11-04Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more timer updates from Thomas Gleixner: "A set of commits for the new C-SKY architecture timers" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dt-bindings: timer: gx6605s SOC timer clocksource/drivers/c-sky: Add gx6605s SOC system timer dt-bindings: timer: C-SKY Multi-processor timer clocksource/drivers/c-sky: Add C-SKY SMP timer
2018-11-04clockevents/drivers/i8253: Add support for PIT shutdown quirkMichael Kelley
Add support for platforms where pit_shutdown() doesn't work because of a quirk in the PIT emulation. On these platforms setting the counter register to zero causes the PIT to start running again, negating the shutdown. Provide a global variable that controls whether the counter register is zero'ed, which platform specific code can override. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org> Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org> Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org> Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org> Cc: "jgross@suse.com" <jgross@suse.com> Cc: "akataria@vmware.com" <akataria@vmware.com> Cc: "olaf@aepfle.de" <olaf@aepfle.de> Cc: "apw@canonical.com" <apw@canonical.com> Cc: vkuznets <vkuznets@redhat.com> Cc: "jasowang@redhat.com" <jasowang@redhat.com> Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1541303219-11142-2-git-send-email-mikelley@microsoft.com
2018-11-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A number of fixes and some late updates: - make in_compat_syscall() behavior on x86-32 similar to other platforms, this touches a number of generic files but is not intended to impact non-x86 platforms. - objtool fixes - PAT preemption fix - paravirt fixes/cleanups - cpufeatures updates for new instructions - earlyprintk quirk - make microcode version in sysfs world-readable (it is already world-readable in procfs) - minor cleanups and fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: compat: Cleanup in_compat_syscall() callers x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT objtool: Support GCC 9 cold subfunction naming scheme x86/numa_emulation: Fix uniform-split numa emulation x86/paravirt: Remove unused _paravirt_ident_32 x86/mm/pat: Disable preemption around __flush_tlb_all() x86/paravirt: Remove GPL from pv_ops export x86/traps: Use format string with panic() call x86: Clean up 'sizeof x' => 'sizeof(x)' x86/cpufeatures: Enumerate MOVDIR64B instruction x86/cpufeatures: Enumerate MOVDIRI instruction x86/earlyprintk: Add a force option for pciserial device objtool: Support per-function rodata sections x86/microcode: Make revision and processor flags world-readable
2018-11-03Merge branch 'core/urgent' into x86/urgent, to pick up objtool fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03net: bql: add __netdev_tx_sent_queue()Eric Dumazet
When qdisc_run() tries to use BQL budget to bulk-dequeue a batch of packets, GSO can later transform this list in another list of skbs, and each skb is sent to device ndo_start_xmit(), one at a time, with skb->xmit_more being set to one but for last skb. Problem is that very often, BQL limit is hit in the middle of the packet train, forcing dev_hard_start_xmit() to stop the bulk send and requeue the end of the list. BQL role is to avoid head of line blocking, making sure a qdisc can deliver high priority packets before low priority ones. But there is no way requeued packets can be bypassed by fresh packets in the qdisc. Aborting the bulk send increases TX softirqs, and hot cache lines (after skb_segment()) are wasted. Note that for TSO packets, we never split a packet in the middle because of BQL limit being hit. Drivers should be able to update BQL counters without flipping/caring about BQL status, if the current skb has xmit_more set. Upper layers are ultimately responsible to stop sending another packet train when BQL limit is hit. Code template in a driver might look like the following : send_doorbell = __netdev_tx_sent_queue(tx_queue, nr_bytes, skb->xmit_more); Note that __netdev_tx_sent_queue() use is not mandatory, since following patch will change dev_hard_start_xmit() to not care about BQL status. But it is highly recommended so that xmit_more full benefits can be reached (less doorbells sent, and less atomic operations as well) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmaskMichal Hocko
THP allocation mode is quite complex and it depends on the defrag mode. This complexity is hidden in alloc_hugepage_direct_gfpmask from a large part currently. The NUMA special casing (namely __GFP_THISNODE) is however independent and placed in alloc_pages_vma currently. This both adds an unnecessary branch to all vma based page allocation requests and it makes the code more complex unnecessarily as well. Not to mention that e.g. shmem THP used to do the node reclaiming unconditionally regardless of the defrag mode until recently. This was not only unexpected behavior but it was also hardly a good default behavior and I strongly suspect it was just a side effect of the code sharing more than a deliberate decision which suggests that such a layering is wrong. Get rid of the thp special casing from alloc_pages_vma and move the logic to alloc_hugepage_direct_gfpmask. __GFP_THISNODE is applied to the resulting gfp mask only when the direct reclaim is not requested and when there is no explicit numa binding to preserve the current logic. Please note that there's also a slight difference wrt MPOL_BIND now. The previous code would avoid using __GFP_THISNODE if the local node was outside of policy_nodemask(). After this patch __GFP_THISNODE is avoided for all MPOL_BIND policies. So there's a difference that if local node is actually allowed by the bind policy's nodemask, previously __GFP_THISNODE would be added, but now it won't be. From the behavior POV this is still correct because the policy nodemask is used. Link: http://lkml.kernel.org/r/20180925120326.24392-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03include/linux/notifier.h: SRCU: fix ctagsSam Protsenko
ctags indexing ("make tags" command) throws this warning: ctags: Warning: include/linux/notifier.h:125: null expansion of name pattern "\1" This is the result of DEFINE_PER_CPU() macro expansion. Fix that by getting rid of line break. Similar fix was already done in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU expansions"), but this one probably wasn't noticed. Link: http://lkml.kernel.org/r/20181030202808.28027-1-semen.protsenko@linaro.org Fixes: 9c80172b902d ("kernel/SRCU: provide a static initializer") Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03netfilter: ipset: Correct rcu_dereference() call in ip_set_put_comment()Jozsef Kadlecsik
The function is called when rcu_read_lock() is held and not when rcu_read_lock_bh() is held. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-02clocksource/drivers/c-sky: Add C-SKY SMP timerGuo Ren
The driver is for C-SKY SMP timer. It only supports oneshot event and 32bit overflow for clocksource. Per cpu core has one timer and all timers share one clock-counter-input from the same clocksource. This use mfcr&mtcr instructions to access the regs. Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-11-02Merge tag 'for-linus-20181102' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "The biggest part of this pull request is the revert of the blkcg cleanup series. It had one fix earlier for a stacked device issue, but another one was reported. Rather than play whack-a-mole with this, revert the entire series and try again for the next kernel release. Apart from that, only small fixes/changes. Summary: - Indentation fixup for mtip32xx (Colin Ian King) - The blkcg cleanup series revert (Dennis Zhou) - Two NVMe fixes. One fixing a regression in the nvme request initialization in this merge window, causing nvme-fc to not work. The other is a suspend/resume p2p resource issue (James, Keith) - Fix sg discard merge, allowing us to merge in cases where we didn't before (Jianchao Wang) - Call rq_qos_exit() after the queue is frozen, preventing a hang (Ming) - Fix brd queue setup, fixing an oops if we fail setting up all devices (Ming)" * tag 'for-linus-20181102' of git://git.kernel.dk/linux-block: nvme-pci: fix conflicting p2p resource adds nvme-fc: fix request private initialization blkcg: revert blkcg cleanups series block: brd: associate with queue until adding disk block: call rq_qos_exit() after queue is frozen mtip32xx: clean an indentation issue, remove extraneous tabs block: fix the DISCARD request merge
2018-11-02Merge tag 'edac_for_4.20_2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull more EDAC updates from Borislav Petkov: "The second part of the EDAC pile which contains the ADXL user and a build fix which addresses a not-so-sensical .config but fixes randconfig builds people do: - skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo) - ACPI_ADXL build fix" [ I don't think "sensical" is a word, particularly when used in the context of actually meaning "nonsensical", but I like it - Linus ] * tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, skx: Fix randconfig builds EDAC, skx_edac: Add address translation for non-volatile DIMMs
2018-11-02Merge tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull vfs dedup fixes from Dave Chinner: "This reworks the vfs data cloning infrastructure. We discovered many issues with these interfaces late in the 4.19 cycle - the worst of them (data corruption, setuid stripping) were fixed for XFS in 4.19-rc8, but a larger rework of the infrastructure fixing all the problems was needed. That rework is the contents of this pull request. Rework the vfs_clone_file_range and vfs_dedupe_file_range infrastructure to use a common .remap_file_range method and supply generic bounds and sanity checking functions that are shared with the data write path. The current VFS infrastructure has problems with rlimit, LFS file sizes, file time stamps, maximum filesystem file sizes, stripping setuid bits, etc and so they are addressed in these commits. We also introduce the ability for the ->remap_file_range methods to return short clones so that clones for vfs_copy_file_range() don't get rejected if the entire range can't be cloned. It also allows filesystems to sliently skip deduplication of partial EOF blocks if they are not capable of doing so without requiring errors to be thrown to userspace. Existing filesystems are converted to user the new remap_file_range method, and both XFS and ocfs2 are modified to make use of the new generic checking infrastructure" * tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits) xfs: remove [cm]time update from reflink calls xfs: remove xfs_reflink_remap_range xfs: remove redundant remap partial EOF block checks xfs: support returning partial reflink results xfs: clean up xfs_reflink_remap_blocks call site xfs: fix pagecache truncation prior to reflink ocfs2: remove ocfs2_reflink_remap_range ocfs2: support partial clone range and dedupe range ocfs2: fix pagecache truncation prior to reflink ocfs2: truncate page cache for clone destination file before remapping vfs: clean up generic_remap_file_range_prep return value vfs: hide file range comparison function vfs: enable remap callers that can handle short operations vfs: plumb remap flags through the vfs dedupe functions vfs: plumb remap flags through the vfs clone functions vfs: make remap_file_range functions take and return bytes completed vfs: remap helper should update destination inode metadata vfs: pass remap flags to generic_remap_checks vfs: pass remap flags to generic_remap_file_range_prep vfs: combine the clone and dedupe into a single remap_file_range ...
2018-11-02mm: add mm_pxd_folded checks to pgtable_bytes accounting functionsMartin Schwidefsky
The common mm code calls mm_dec_nr_pmds() and mm_dec_nr_puds() in free_pgtables() if the address range spans a full pud or pmd. If mm_dec_nr_puds/mm_dec_nr_pmds are non-empty due to configuration settings they blindly subtract the size of the pmd or pud table from pgtable_bytes even if the pud or pmd page table layer is folded. Add explicit mm_[pmd|pud]_folded checks to the four pgtable_bytes accounting functions mm_inc_nr_puds, mm_inc_nr_pmds, mm_dec_nr_puds and mm_dec_nr_pmds. As the check for folded page tables can be overwritten by the architecture, this allows to keep a correct pgtable_bytes value for platforms that use a dynamic number of page table levels. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-01Merge branch 'work.afs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...
2018-11-01blkcg: revert blkcg cleanups seriesDennis Zhou
This reverts a series committed earlier due to null pointer exception bug report in [1]. It seems there are edge case interactions that I did not consider and will need some time to understand what causes the adverse interactions. The original series can be found in [2] with a follow up series in [3]. [1] https://www.spinics.net/lists/cgroups/msg20719.html [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/ [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/ This reverts the following commits: d459d853c2ed, b2c3fa546705, 101246ec02b5, b3b9f24f5fcc, e2b0989954ae, f0fcb3ec89f3, c839e7a03f92, bdc2491708c4, 74b7c02a9bc1, 5bf9a1f3b4ef, a7b39b4e961c, 07b05bcc3213, 49f4c2dc2b50, 27e6fa996c53 Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-01Merge tag 'compiler-attributes-for-linus-4.20-rc1' of ↵Linus Torvalds
https://github.com/ojeda/linux Pull compiler attribute updates from Miguel Ojeda: "This is an effort to disentangle the include/linux/compiler*.h headers and bring them up to date. The main idea behind the series is to use feature checking macros (i.e. __has_attribute) instead of compiler version checks (e.g. GCC_VERSION), which are compiler-agnostic (so they can be shared, reducing the size of compiler-specific headers) and version-agnostic. Other related improvements have been performed in the headers as well, which on top of the use of __has_attribute it has amounted to a significant simplification of these headers (e.g. GCC_VERSION is now only guarding a few non-attribute macros). This series should also help the efforts to support compiling the kernel with clang and icc. A fair amount of documentation and comments have also been added, clarified or removed; and the headers are now more readable, which should help kernel developers in general. The series was triggered due to the move to gcc >= 4.6. In turn, this series has also triggered Sparse to gain the ability to recognize __has_attribute on its own. Finally, the __nonstring variable attribute series has been also applied on top; plus two related patches from Nick Desaulniers for unreachable() that came a bit afterwards" * tag 'compiler-attributes-for-linus-4.20-rc1' of https://github.com/ojeda/linux: compiler-gcc: remove comment about gcc 4.5 from unreachable() compiler.h: update definition of unreachable() Compiler Attributes: ext4: remove local __nonstring definition Compiler Attributes: auxdisplay: panel: use __nonstring Compiler Attributes: enable -Wstringop-truncation on W=1 (gcc >= 8) Compiler Attributes: add support for __nonstring (gcc >= 8) Compiler Attributes: add MAINTAINERS entry Compiler Attributes: add Doc/process/programming-language.rst Compiler Attributes: remove uses of __attribute__ from compiler.h Compiler Attributes: KENTRY used twice the "used" attribute Compiler Attributes: use feature checks instead of version checks Compiler Attributes: add missing SPDX ID in compiler_types.h Compiler Attributes: remove unneeded sparse (__CHECKER__) tests Compiler Attributes: homogenize __must_be_array Compiler Attributes: remove unneeded tests Compiler Attributes: always use the extra-underscores syntax Compiler Attributes: remove unused attributes
2018-11-01Merge branch 'next-keys2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull keys updates from James Morris: "Provide five new operations in the key_type struct that can be used to provide access to asymmetric key operations. These will be implemented for the asymmetric key type in a later patch and may refer to a key retained in RAM by the kernel or a key retained in crypto hardware. int (*asym_query)(const struct kernel_pkey_params *params, struct kernel_pkey_query *info); int (*asym_eds_op)(struct kernel_pkey_params *params, const void *in, void *out); int (*asym_verify_signature)(struct kernel_pkey_params *params, const void *in, const void *in2); Since encrypt, decrypt and sign are identical in their interfaces, they're rolled together in the asym_eds_op() operation and there's an operation ID in the params argument to distinguish them. Verify is different in that we supply the data and the signature instead and get an error value (or 0) as the only result on the expectation that this may well be how a hardware crypto device may work" * 'next-keys2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (22 commits) KEYS: asym_tpm: Add support for the sign operation [ver #2] KEYS: asym_tpm: Implement tpm_sign [ver #2] KEYS: asym_tpm: Implement signature verification [ver #2] KEYS: asym_tpm: Implement the decrypt operation [ver #2] KEYS: asym_tpm: Implement tpm_unbind [ver #2] KEYS: asym_tpm: Add loadkey2 and flushspecific [ver #2] KEYS: Move trusted.h to include/keys [ver #2] KEYS: trusted: Expose common functionality [ver #2] KEYS: asym_tpm: Implement encryption operation [ver #2] KEYS: asym_tpm: Implement pkey_query [ver #2] KEYS: Add parser for TPM-based keys [ver #2] KEYS: asym_tpm: extract key size & public key [ver #2] KEYS: asym_tpm: add skeleton for asym_tpm [ver #2] crypto: rsa-pkcs1pad: Allow hash to be optional [ver #2] KEYS: Implement PKCS#8 RSA Private Key parser [ver #2] KEYS: Implement encrypt, decrypt and sign for software asymmetric key [ver #2] KEYS: Allow the public_key struct to hold a private key [ver #2] KEYS: Provide software public key query function [ver #2] KEYS: Make the X.509 and PKCS7 parsers supply the sig encoding type [ver #2] KEYS: Provide missing asymmetric key subops for new key type ops [ver #2] ...
2018-11-01Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsAl Viro
backmerge to do fixup of iov_iter_kvec() conflict
2018-11-01Merge tag 'stackleak-v4.20-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull stackleak gcc plugin from Kees Cook: "Please pull this new GCC plugin, stackleak, for v4.20-rc1. This plugin was ported from grsecurity by Alexander Popov. It provides efficient stack content poisoning at syscall exit. This creates a defense against at least two classes of flaws: - Uninitialized stack usage. (We continue to work on improving the compiler to do this in other ways: e.g. unconditional zero init was proposed to GCC and Clang, and more plugin work has started too). - Stack content exposure. By greatly reducing the lifetime of valid stack contents, exposures via either direct read bugs or unknown cache side-channels become much more difficult to exploit. This complements the existing buddy and heap poisoning options, but provides the coverage for stacks. The x86 hooks are included in this series (which have been reviewed by Ingo, Dave Hansen, and Thomas Gleixner). The arm64 hooks have already been merged through the arm64 tree (written by Laura Abbott and reviewed by Mark Rutland and Will Deacon). With VLAs having been removed this release, there is no need for alloca() protection, so it has been removed from the plugin" * tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: Drop unneeded stackleak_check_alloca() stackleak: Allow runtime disabling of kernel stack erasing doc: self-protection: Add information about STACKLEAK feature fs/proc: Show STACKLEAK metrics in the /proc file system lkdtm: Add a test for STACKLEAK gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
2018-11-01SUNRPC: Use atomic(64)_t for seq_send(64)Paul Burton
The seq_send & seq_send64 fields in struct krb5_ctx are used as atomically incrementing counters. This is implemented using cmpxchg() & cmpxchg64() to implement what amount to custom versions of atomic_fetch_inc() & atomic64_fetch_inc(). Besides the duplication, using cmpxchg64() has another major drawback in that some 32 bit architectures don't provide it. As such commit 571ed1fd2390 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme") resulted in build failures for some architectures. Change seq_send to be an atomic_t and seq_send64 to be an atomic64_t, then use atomic(64)_* functions to manipulate the values. The atomic64_t type & associated functions are provided even on architectures which lack real 64 bit atomic memory access via CONFIG_GENERIC_ATOMIC64 which uses spinlocks to serialize access. This fixes the build failures for architectures lacking cmpxchg64(). A potential alternative that was raised would be to provide cmpxchg64() on the 32 bit architectures that currently lack it, using spinlocks. However this would provide a version of cmpxchg64() with semantics a little different to the implementations on architectures with real 64 bit atomics - the spinlock-based implementation would only work if all access to the memory used with cmpxchg64() is *always* performed using cmpxchg64(). That is not currently a requirement for users of cmpxchg64(), and making it one seems questionable. As such avoiding cmpxchg64() outside of architecture-specific code seems best, particularly in cases where atomic64_t seems like a better fit anyway. The CONFIG_GENERIC_ATOMIC64 implementation of atomic64_* functions will use spinlocks & so faces the same issue, but with the key difference that the memory backing an atomic64_t ought to always be accessed via the atomic64_* functions anyway making the issue moot. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 571ed1fd2390 ("SUNRPC: Replace krb5_seq_lock with a lockless scheme") Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: linux-nfs@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) BPF verifier fixes from Daniel Borkmann. 2) HNS driver fixes from Huazhong Tan. 3) FDB only works for ethernet devices, reject attempts to install FDB rules for others. From Ido Schimmel. 4) Fix spectre V1 in vhost, from Jason Wang. 5) Don't pass on-stack object to irq_set_affinity_hint() in mvpp2 driver, from Marc Zyngier. 6) Fix mlx5e checksum handling when RXFCS is enabled, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) openvswitch: Fix push/pop ethernet validation net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules bpf: test make sure to run unpriv test cases in test_verifier bpf: add various test cases to test_verifier bpf: don't set id on after map lookup with ptr_to_map_val return bpf: fix partial copy of map_ptr when dst is scalar libbpf: Fix compile error in libbpf_attach_type_by_name kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists selftests: mlxsw: qos_mc_aware: Add a test for UC awareness selftests: mlxsw: qos_mc_aware: Tweak for min shaper mlxsw: spectrum: Set minimum shaper on MC TCs mlxsw: reg: QEEC: Add minimum shaper fields net: hns3: bugfix for rtnl_lock's range in the hclgevf_reset() net: hns3: bugfix for rtnl_lock's range in the hclge_reset() net: hns3: bugfix for handling mailbox while the command queue reinitialized net: hns3: fix incorrect return value/type of some functions net: hns3: bugfix for hclge_mdio_write and hclge_mdio_read net: hns3: bugfix for is_valid_csq_clean_head() net: hns3: remove unnecessary queue reset in the hns3_uninit_all_ring() net: hns3: bugfix for the initialization of command queue's spin lock ...
2018-11-01Merge tag 'platform-drivers-x86-v4.20-1' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Darren Hart: - Move the Dell dcdbas and dell_rbu drivers into platform/drivers/x86 as they are closely coupled with other drivers in this location. - Improve _init* usage for acerhdf and fix some usage issues with messages and module parameters. - Simplify asus-wmi by calling ACPI/WMI methods directly, eliminating workqueue overhead, eliminate double reporting of keyboard backlight. - Fix wake from USB failure on Bay Trail devices (intel_int0002_vgpio). - Notify intel_telemetry users when IPC1 device is not enabled. - Update various drivers with new laptop model IDs. - Update several intel drivers to use SPDX identifers and order headers alphabetically. * tag 'platform-drivers-x86-v4.20-1' of git://git.infradead.org/linux-platform-drivers-x86: (64 commits) HID: asus: only support backlight when it's not driven by WMI platform/x86: asus-wmi: export function for evaluating WMI methods platform/x86: asus-wmi: Only notify kbd LED hw_change by fn-key pressed platform/x86: wmi: declare device_type structure as constant platform/x86: ideapad: Add Y530-15ICH to no_hw_rfkill platform/x86: Add Intel AtomISP2 dummy / power-management driver platform/x86: touchscreen_dmi: Add min-x and min-y settings for various models platform/x86: touchscreen_dmi: Add info for the Onda V80 Plus v3 tablet platform/x86: touchscreen_dmi: Add info for the Trekstor Primetab T13B tablet platform/x86: intel_telemetry: Get rid of custom macro platform/x86: intel_telemetry: report debugfs failure MAINTAINERS: intel_telemetry: Update maintainers info platform/x86: Add LG Gram laptop special features driver platform/x86: asus-wmi: Simplify the keyboard brightness updating process platform/x86: touchscreen_dmi: Add info for the Trekstor Primebook C11 convertible platform/x86: mlx-platform: Properly use mlxplat_mlxcpld_msn201x_items MAINTAINERS: intel_pmc_core: Update MAINTAINERS firmware: dcdbas: include linux/io.h platform/x86: intel-wmi-thunderbolt: Add dynamic debugging platform/x86: intel-wmi-thunderbolt: Convert to use SPDX identifier ...
2018-11-01x86/compat: Adjust in_compat_syscall() to generic code under !COMPATDmitry Safonov
The result of in_compat_syscall() can be pictured as: x86 platform: --------------------------------------------------- | Arch\syscall | 64-bit | ia32 | x32 | |-------------------------------------------------| | x86_64 | false | true | true | |-------------------------------------------------| | i686 | | <true> | | --------------------------------------------------- Other platforms: ------------------------------------------- | Arch\syscall | 64-bit | compat | |-----------------------------------------| | 64-bit | false | true | |-----------------------------------------| | 32-bit(?) | | <false> | ------------------------------------------- As seen, the result of in_compat_syscall() on generic 32-bit platform differs from i686. There is no reason for in_compat_syscall() == true on native i686. It also easy to misread code if the result on native 32-bit platform differs between arches. Because of that non arch-specific code has many places with: if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall()) in different variations. It looks-like the only non-x86 code which uses in_compat_syscall() not under CONFIG_COMPAT guard is in amd/amdkfd. But according to the commit a18069c132cb ("amdkfd: Disable support for 32-bit user processes"), it actually should be disabled on native i686. Rename in_compat_syscall() to in_32bit_syscall() for x86-specific code and make in_compat_syscall() false under !CONFIG_COMPAT. A follow on patch will clean up generic users which were forced to check IS_ENABLED(CONFIG_COMPAT) with in_compat_syscall(). Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-efi@vger.kernel.org Cc: netdev@vger.kernel.org Link: https://lkml.kernel.org/r/20181012134253.23266-2-dima@arista.com
2018-10-31Merge branch '10GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-10-31 This series contains a various collection of fixes. Miroslav Lichvar from Red Hat or should I say IBM now? Updates the PHC timecounter interval for igb so that it gets updated at least once every 550 seconds. Ngai-Mint provides a fix for fm10k to prevent a soft lockup or system crash by adding a new condition to determine if the SM mailbox is in the correct state before proceeding. Jake provides several fm10k fixes, first one marks complier aborts as non-fatal since on some platforms trigger machine check errors when the compile aborts. Added missing device ids to the in-kernel driver. Due to the recent fixes, bumped the driver version. I (Jeff Kirsher) fixed a XFRM_ALGO dependency for both ixgbe and ixgbevf. This fix was based on the original work from Arnd Bergmann, which only fixed ixgbe. Mitch provides a fix for i40e/avf to update the status codes, which resolves an issue between a mis-match between i40e and the iavf driver, which also supports the ice LAN driver. Radoslaw fixes the ixgbe where the driver is logging a message about spoofed packets detected when the VF is re-started with a different MAC address. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31bpf: fix partial copy of map_ptr when dst is scalarDaniel Borkmann
ALU operations on pointers such as scalar_reg += map_value_ptr are handled in adjust_ptr_min_max_vals(). Problem is however that map_ptr and range in the register state share a union, so transferring state through dst_reg->range = ptr_reg->range is just buggy as any new map_ptr in the dst_reg is then truncated (or null) for subsequent checks. Fix this by adding a raw member and use it for copying state over to dst_reg. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-31Merge tag 'tag-chrome-platform-for-v4.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform Pull chrome-platform updates from Benson Leung: - Move mfd/cros_ec_lpc* includes to drivers/platform from mfd - Adding a new interrupt path for cros_ec_lpc * tag 'tag-chrome-platform-for-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform: platform/chrome: chromeos_tbmc - Remove unneeded const platform/chrome: Add a new interrupt path for cros_ec_lpc mfd: cros_ec: Fix and improve kerneldoc comments. platform/chrome: Move mfd/cros_ec_lpc* includes to drivers/platform.
2018-11-01netfilter: ipset: list:set: Decrease refcount synchronously on deletion and ↵Stefano Brivio
replace Commit 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") postponed decreasing set reference counters to the RCU callback. An 'ipset del' command can terminate before the RCU grace period is elapsed, and if sets are listed before then, the reference counter shown in userspace will be wrong: # ipset create h hash:ip; ipset create l list:set; ipset add l # ipset del l h; ipset list h Name: h Type: hash:ip Revision: 4 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 88 References: 1 Number of entries: 0 Members: # sleep 1; ipset list h Name: h Type: hash:ip Revision: 4 Header: family inet hashsize 1024 maxelem 65536 Size in memory: 88 References: 0 Number of entries: 0 Members: Fix this by making the reference count update synchronous again. As a result, when sets are listed, ip_set_name_byindex() might now fetch a set whose reference count is already zero. Instead of relying on the reference count to protect against concurrent set renaming, grab ip_set_ref_lock as reader and copy the name, while holding the same lock in ip_set_rename() as writer instead. Reported-by: Li Shuang <shuali@redhat.com> Fixes: 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-31Merge tag 'fuse-update-4.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "As well as the usual bug fixes, this adds the following new features: - cached readdir and readlink - max I/O size increased from 128k to 1M - improved performance and scalability of request queues - copy_file_range support The only non-fuse bits are trivial cleanups of macros in <linux/bitops.h>" * tag 'fuse-update-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (31 commits) fuse: enable caching of symlinks fuse: only invalidate atime in direct read fuse: don't need GETATTR after every READ fuse: allow fine grained attr cache invaldation bitops: protect variables in bit_clear_unless() macro bitops: protect variables in set_mask_bits() macro fuse: realloc page array fuse: add max_pages to init_out fuse: allocate page array more efficiently fuse: reduce size of struct fuse_inode fuse: use iversion for readdir cache verification fuse: use mtime for readdir cache verification fuse: add readdir cache version fuse: allow using readdir cache fuse: allow caching readdir fuse: extract fuse_emit() helper fuse: add FOPEN_CACHE_DIR fuse: split out readdir.c fuse: Use hash table to link processing request fuse: kill req->intr_unique ...
2018-10-31Merge tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The highlights are: - a series that fixes some old memory allocation issues in libceph (myself). We no longer allocate memory in places where allocation failures cannot be handled and BUG when the allocation fails. - support for copy_file_range() syscall (Luis Henriques). If size and alignment conditions are met, it leverages RADOS copy-from operation. Otherwise, a local copy is performed. - a patch that reduces memory requirement of ceph_sync_read() from the size of the entire read to the size of one object (Zheng Yan). - fallocate() syscall is now restricted to FALLOC_FL_PUNCH_HOLE (Luis Henriques)" * tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-client: (25 commits) ceph: new mount option to disable usage of copy-from op ceph: support copy_file_range file operation libceph: support the RADOS copy-from operation ceph: add non-blocking parameter to ceph_try_get_caps() libceph: check reply num_data_items in setup_request_data() libceph: preallocate message data items libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls libceph: introduce alloc_watch_request() libceph: assign cookies in linger_submit() libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get() ceph: num_ops is off by one in ceph_aio_retry_work() libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op() ceph: set timeout conditionally in __cap_delay_requeue libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist() libceph: introduce ceph_pagelist_alloc() libceph: osd_req_op_cls_init() doesn't need to take opcode libceph: bump CEPH_MSG_MAX_DATA_LEN ceph: only allow punch hole mode in fallocate ceph: refactor ceph_sync_read() ceph: check if LOOKUPNAME request was aborted when filling trace ...
2018-10-31EDAC, skx: Fix randconfig buildsBorislav Petkov
The driver depends on the ADXL component glue and selects it. However, ADXL itself implicitly depends on ACPI and in nonsensical randconfig builds like this: # CONFIG_ACPI is not set CONFIG_ACPI_ADXL=y where ACPI is not enabled, the build fails with: drivers/edac/skx_edac.o: In function `skx_mce_check_error': skx_edac.c:(.text+0xab): undefined reference to `adxl_decode' drivers/edac/skx_edac.o: In function `skx_init': skx_edac.c:(.init.text+0x8bf): undefined reference to `adxl_get_component_names' make: *** [vmlinux] Error 1 Add stubs for that case so that the build succeeds. CONFIG_ACPI=n doesn't make any sense for real configurations but this fix will at least silence randconfig builds. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org>