summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-04-13Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fix: - Fix a deadlock in close() due to incorrect draining of RDMA queues Bugfixes: - Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping" as it is causing stack overflows - Fix a regression where NFSv4 getacl and fs_locations stopped working - Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. - Fix xfstests failures due to incorrect copy_file_range() return values" * tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping" NFSv4.1 fix incorrect return value in copy_file_range xprtrdma: Fix helper that drains the transport NFS: Fix handling of reply page vector NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
2019-04-13Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Here's more than a handful of clk driver fixes for changes that came in during the merge window: - Fix the AT91 sama5d2 programmable clk prescaler formula - A bunch of Amlogic meson clk driver fixes for the VPU clks - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc clks as critical only when really needed - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate implementation - Use the right structure to test for a frequency table in i.MX's PLL_1416x driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx: Fix PLL_1416X not rounding rates clk: mediatek: fix clk-gate flag setting platform/x86: pmc_atom: Drop __initconst on dmi table clk: x86: Add system specific quirk to mark clocks as critical clk: meson: vid-pll-div: remove warning and return 0 on invalid config clk: meson: pll: fix rounding and setting a rate that matches precisely clk: meson-g12a: fix VPU clock parents clk: meson: g12a: fix VPU clock muxes mask clk: meson-gxbb: round the vdec dividers to closest clk: at91: fix programmable clock for sama5d2
2019-04-13CPER: Remove unnecessary use of user-space typesBjorn Helgaas
"__u32" and similar types are intended for things exported to user-space, including structs used in ioctls; see include/uapi/asm-generic/int-l64.h. They are not needed for the CPER struct definitions, which not exported to user-space and not used in ioctls. Replace them with the typical "u32" and similar types. No functional change intended. The reason for changing this is to remove the question of "why do we use __u32 here instead of u32?" We should use __u32 when there's a reason for it; otherwise, we should prefer u32 for consistency. Reference: Documentation/process/coding-style.rst Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Masahiro Yamada <yamada.masahiro@socionext.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Andrew Morton <akpm@linux-foundation.org>
2019-04-13CPER: Add UEFI spec referencesBjorn Helgaas
Add UEFI spec references for CPER UUIDs and structures, fix a few typos, and remove some useless comments. No functional change intended. Link: http://www.uefi.org/specifications Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-04-12Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion bug" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add rewind_stack_do_exit() to the noreturn list linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
2019-04-12fs: drop unused fput_atomic definitionLukas Bulwahn
commit d7065da03822 ("get rid of the magic around f_count in aio") added fput_atomic to include/linux/fs.h, motivated by its use in __aio_put_req() in fs/aio.c. Later, commit 3ffa3c0e3f6e ("aio: now fput() is OK from interrupt context; get rid of manual delayed __fput()") removed the only use of fput_atomic in __aio_put_req(), but did not remove the since then unused fput_atomic definition in include/linux/fs.h. We curate this now and finally remove the unused definition. This issue was identified during a code review due to a coccinelle warning from the atomic_as_refcounter.cocci rule pointing to the use of atomic_t in fput_atomic. Suggested-by: Krystian Radlak <kradlak@exida.com> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-12rhashtable: use BIT(0) for locking.NeilBrown
As reported by Guenter Roeck, the new bit-locking using BIT(1) doesn't work on the m68k architecture. m68k only requires 2-byte alignment for words and longwords, so there is only one unused bit in pointers to structs - We current use two, one for the NULLS marker at the end of the linked list, and one for the bit-lock in the head of the list. The two uses don't need to conflict as we never need the head of the list to be a NULLS marker - the marker is only needed to check if an object has moved to a different table, and the bucket head cannot move. The NULLS marker is only needed in a ->next pointer. As we already have different types for the bucket head pointer (struct rhash_lock_head) and the ->next pointers (struct rhash_head), it is fairly easy to treat the lsb differently in each. So: Initialize buckets heads to NULL, and use the lsb for locking. When loading the pointer from the bucket head, if it is NULL (ignoring the lock big), report as being the expected NULLS marker. When storing a value into a bucket head, if it is a NULLS marker, store NULL instead. And convert all places that used bit 1 for locking, to use bit 0. Fixes: 8f0db018006a ("rhashtable: use bit_spin_locks to protect hash bucket.") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: replace rht_ptr_locked() with rht_assign_locked()NeilBrown
The only times rht_ptr_locked() is used, it is to store a new value in a bucket-head. This is the only time it makes sense to use it too. So replace it by a function which does the whole task: Sets the lock bit and assigns to a bucket head. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: move dereference inside rht_ptr()NeilBrown
Rather than dereferencing a pointer to a bucket and then passing the result to rht_ptr(), we now pass in the pointer and do the dereference in rht_ptr(). This requires that we pass in the tbl and hash as well to support RCU checks, and means that the various rht_for_each functions can expect a pointer that can be dereferenced without further care. There are two places where we dereference a bucket pointer where there is no testable protection - in each case we know that we much have exclusive access without having taken a lock. The previous code used rht_dereference() to pretend that holding the mutex provided protects, but holding the mutex never provides protection for accessing buckets. So instead introduce rht_ptr_exclusive() that can be used when there is known to be exclusive access without holding any locks. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: reorder some inline functions and macros.NeilBrown
This patch only moves some code around, it doesn't change the code at all. A subsequent patch will benefit from this as it needs to add calls to functions which are now defined before the call-site, but weren't before. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: fix some __rcu annotation errorsNeilBrown
With these annotations, the rhashtable now gets no warnings when compiled with "C=1" for sparse checking. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12scsi: scsi_transport_fc: nvme: display FC-NVMe port rolesHannes Reinecke
Currently the FC-NVMe driver is leverating the SCSI FC transport class to access the remote ports. Which means that all FC-NVMe remote ports will be visible to the fc transport layer, but due to missing definitions the port roles will always be 'unknown'. This patch adds the missing definitions to the fc transport class to that the port roles are correctly displayed. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Giridhar Malavali <gmalavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-12Merge branch 'next-integrity-for-james' of ↵James Morris
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next-integrity From Mimi: "This pull request contains just three patches, the remainder are either included in other pull requests (eg. audit, lockdown) or will be upstreamed via other subsystems (eg. kselftests, Power).  Included in this pull request is one bug fix, one documentation update, and extending the x86 IMA arch policy rules to coordinate the different kernel module signature verification methods."
2019-04-12bpf: Introduce bpf_strtol and bpf_strtoul helpersAndrey Ignatov
Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned long correspondingly. It's similar to user space strtol(3) and strtoul(3) with a few changes to the API: * instead of NUL-terminated C string the helpers expect buffer and buffer length; * resulting long or unsigned long is returned in a separate result-argument; * return value is used to indicate success or failure, on success number of consumed bytes is returned that can be used to identify position to read next if the buffer is expected to contain multiple integers; * instead of *base* argument, *flags* is used that provides base in 5 LSB, other bits are reserved for future use; * number of supported bases is limited. Documentation for the new helpers is provided in bpf.h UAPI. The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to be able to convert string input to e.g. "ulongvec" output. E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be parsed by calling to bpf_strtoul three times. Implementation notes: Implementation includes "../../lib/kstrtox.h" to reuse integer parsing functions. It's done exactly same way as fs/proc/base.c already does. Unfortunately existing kstrtoX function can't be used directly since they fail if any invalid character is present right after integer in the string. Existing simple_strtoX functions can't be used either since they're obsolete and don't handle overflow properly. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce ARG_PTR_TO_{INT,LONG} arg typesAndrey Ignatov
Currently the way to pass result from BPF helper to BPF program is to provide memory area defined by pointer and size: func(void *, size_t). It works great for generic use-case, but for simple types, such as int, it's overkill and consumes two arguments when it could use just one. Introduce new argument types ARG_PTR_TO_INT and ARG_PTR_TO_LONG to be able to pass result from helper to program via pointer to int and long correspondingly: func(int *) or func(long *). New argument types are similar to ARG_PTR_TO_MEM with the following differences: * they don't require corresponding ARG_CONST_SIZE argument, predefined access sizes are used instead (32bit for int, 64bit for long); * it's possible to use more than one such an argument in a helper; * provided pointers have to be aligned. It's easy to introduce similar ARG_PTR_TO_CHAR and ARG_PTR_TO_SHORT argument types. It's not done due to lack of use-case though. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Add file_pos field to bpf_sysctl ctxAndrey Ignatov
Add file_pos field to bpf_sysctl context to read and write sysctl file position at which sysctl is being accessed (read or written). The field can be used to e.g. override whole sysctl value on write to sysctl even when sys_write is called by user space with file_pos > 0. Or BPF program may reject such accesses. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_sysctl_{get,set}_new_value helpersAndrey Ignatov
Add helpers to work with new value being written to sysctl by user space. bpf_sysctl_get_new_value() copies value being written to sysctl into provided buffer. bpf_sysctl_set_new_value() overrides new value being written by user space with a one from provided buffer. Buffer should contain string representation of the value, similar to what can be seen in /proc/sys/. Both helpers can be used only on sysctl write. File position matters and can be managed by an interface that will be introduced separately. E.g. if user space calls sys_write to a file in /proc/sys/ at file position = X, where X > 0, then the value set by bpf_sysctl_set_new_value() will be written starting from X. If program wants to override whole value with specified buffer, file position has to be set to zero. Documentation for the new helpers is provided in bpf.h UAPI. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_sysctl_get_current_value helperAndrey Ignatov
Add bpf_sysctl_get_current_value() helper to copy current sysctl value into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer. It provides same string as user space can see by reading corresponding file in /proc/sys/, including new line, etc. Documentation for the new helper is provided in bpf.h UAPI. Since current value is kept in ctl_table->data in a parsed form, ctl_table->proc_handler() with write=0 is called to read that data and convert it to a string. Such a string can later be parsed by a program using helpers that will be introduced separately. Unfortunately it's not trivial to provide API to access parsed data due to variety of data representations (string, intvec, uintvec, ulongvec, custom structures, even NULL, etc). Instead it's assumed that user know how to handle specific sysctl they're interested in and appropriate helpers can be used. Since ctl_table->proc_handler() expects __user buffer, conversion to __user happens for kernel allocated one where the value is stored. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Sysctl hookAndrey Ignatov
Containerized applications may run as root and it may create problems for whole host. Specifically such applications may change a sysctl and affect applications in other containers. Furthermore in existing infrastructure it may not be possible to just completely disable writing to sysctl, instead such a process should be gradual with ability to log what sysctl are being changed by a container, investigate, limit the set of writable sysctl to currently used ones (so that new ones can not be changed) and eventually reduce this set to zero. The patch introduces new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type BPF_CGROUP_SYSCTL to solve these problems on cgroup basis. New program type has access to following minimal context: struct bpf_sysctl { __u32 write; }; Where @write indicates whether sysctl is being read (= 0) or written (= 1). Helpers to access sysctl name and value will be introduced separately. BPF_CGROUP_SYSCTL attach point is added to sysctl code right before passing control to ctl_table->proc_handler so that BPF program can either allow or deny access to sysctl. Suggested-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12block: disk_events: introduce event flagsMartin Wilck
Currently, an empty disk->events field tells the block layer not to forward media change events to user space. This was done in commit 7c88a168da80 ("block: don't propagate unlisted DISK_EVENTs to userland") in order to avoid events from "fringe" drivers to be forwarded to user space. By doing so, the block layer lost the information which events were supported by a particular block device, and most importantly, whether or not a given device supports media change events at all. Prepare for not interpreting the "events" field this way in the future any more. This is done by adding an additional field "event_flags" to struct gendisk, and two flag bits that can be set to have the device treated like one that had the "events" field set to a non-zero value before. This applies only to the sd and sr drivers, which are changed to set the new flags. The new flags are DISK_EVENT_FLAG_POLL to enforce polling of the device for synchronous events, and DISK_EVENT_FLAG_UEVENT to tell the blocklayer to generate udev events from kernel events. In order to add the event_flags field to struct gendisk, the events field is converted to an "unsigned short"; it doesn't need to hold values bigger than 2 anyway. This patch doesn't change behavior. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12block: genhd: remove async_events fieldMartin Wilck
The async_events field, intended to be used for drivers that support asynchronous notifications about disk events (aka media change events), isn't currently used by any driver, and apparently that has been that way for a long time (if not forever). Remove it. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12iommu: io-pgtable: Add ARM Mali midgard MMU page table formatRob Herring
ARM Mali midgard GPU is similar to standard 64-bit stage 1 page tables, but have a few differences. Add a new format type to represent the format. The input address size is 48-bits and the output address size is 40-bits (and possibly less?). Note that the later bifrost GPUs follow the standard 64-bit stage 1 format. The differences in the format compared to 64-bit stage 1 format are: The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. The access flags are not read-only and unprivileged, but read and write. This is similar to stage 2 entries, but the memory attributes field matches stage 1 being an index. The nG bit is not set by the vendor driver. This one didn't seem to matter, but we'll keep it aligned to the vendor driver. Cc: Will Deacon <will.deacon@arm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: iommu@lists.linux-foundation.org Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190409205427.6943-2-robh@kernel.org
2019-04-12block: only allow contiguous page structs in a bio_vecChristoph Hellwig
We currently have to call nth_page when iterating over pages inside a bio_vec. Jens complained a while ago that this is fairly expensive. To mitigate this we can check that that the actual page structures are contiguous when adding them to the bio, and just do check pointer arithmetics later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12block: change how we get page references in bio_iov_iter_get_pagesChristoph Hellwig
Instead of needing a special macro to iterate over all pages in a bvec just do a second passs over the whole bio. This also matches what we do on the release side. The release side helper is moved up to where we need the get helper to clearly express the symmetry. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12overflow.h: Add comment documenting __ab_c_size()Rasmus Villemoes
__ab_c_size() is a somewhat opaque name. Document its purpose, and while at it, rename the parameters to actually match the abc naming. [ bp: glued a complete patch from chunks on LKML. ] Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Matthew Wilcox <willy@infradead.org> Link: https://lkml.kernel.org/r/20190405045711.30339-1-bp@alien8.de
2019-04-12vfio/mdev: Add iommu related member in mdev_deviceLu Baolu
A parent device might create different types of mediated devices. For example, a mediated device could be created by the parent device with full isolation and protection provided by the IOMMU. One usage case could be found on Intel platforms where a mediated device is an assignable subset of a PCI, the DMA requests on behalf of it are all tagged with a PASID. Since IOMMU supports PASID-granular translations (scalable mode in VT-d 3.0), this mediated device could be individually protected and isolated by an IOMMU. This patch adds a new member in the struct mdev_device to indicate that the mediated device represented by mdev could be isolated and protected by attaching a domain to a device represented by mdev->iommu_device. It also adds a helper to add or set the iommu device. * mdev_device->iommu_device - This, if set, indicates that the mediated device could be fully isolated and protected by IOMMU via attaching an iommu domain to this device. If empty, it indicates using vendor defined isolation, hence bypass IOMMU. * mdev_set/get_iommu_device(dev, iommu_device) - Set or get the iommu device which represents this mdev in IOMMU's device scope. Drivers don't need to set the iommu device if it uses vendor defined isolation. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-12spi: Add spi_is_bpw_supported()Noralf Trønnes
This let SPI clients check if the controller supports a particular word width. drivers/gpu/drm/tinydrm/mipi-dbi.c will use this to determine if the controller supports 16-bit for RGB565 pixels. If it doesn't it will swap the bytes before transfer on little endian machines. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-04-12 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Improve BPF verifier scalability for large programs through two optimizations: i) remove verifier states that are not useful in pruning, ii) stop walking parentage chain once first LIVE_READ is seen. Combined gives approx 20x speedup. Increase limits for accepting large programs under root, and add various stress tests, from Alexei. 2) Implement global data support in BPF. This enables static global variables for .data, .rodata and .bss sections to be properly handled which allows for more natural program development. This also opens up the possibility to optimize program workflow by compiling ELFs only once and later only rewriting section data before reload, from Daniel and with test cases and libbpf refactoring from Joe. 3) Add config option to generate BTF type info for vmlinux as part of the kernel build process. DWARF debug info is converted via pahole to BTF. Latter relies on libbpf and makes use of BTF deduplication algorithm which results in 100x savings compared to DWARF data. Resulting .BTF section is typically about 2MB in size, from Andrii. 4) Add BPF verifier support for stack access with variable offset from helpers and add various test cases along with it, from Andrey. 5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header so that L2 encapsulation can be used for tc tunnels, from Alan. 6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that users can define a subset of allowed __sk_buff fields that get fed into the test program, from Stanislav. 7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up various UBSAN warnings in bpftool, from Yonghong. 8) Generate a pkg-config file for libbpf, from Luca. 9) Dump program's BTF id in bpftool, from Prashant. 10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP program, from Magnus. 11) kallsyms related fixes for the case when symbols are not present in BPF selftests and samples, from Daniel ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12bridge: broute: make broute a real ebtables tableFlorian Westphal
This makes broute a normal ebtables table, hooking at PREROUTING. The broute hook is removed. It uses skb->cb to signal to bridge rx handler that the skb should be routed instead of being bridged. This change is backwards compatible with ebtables as no userspace visible parts are changed. This means we can also remove the !ops test in ebt_register_table, it was only there for broute table sake. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-11PM / Domains: Add genpd governor for CPUsUlf Hansson
After some preceding changes, PM domains managed by genpd may contain CPU devices, so idle state residency values should be taken into account during the state selection process. [The residency value is the minimum amount of time to be spent by a CPU (or a group of CPUs) in an idle state in order to save more energy than could be saved by picking up a shallower idle state.] For this purpose, add a new genpd governor, pm_domain_cpu_gov, to be used for selecting idle states of PM domains with CPU devices attached either directly or through subdomains. The new governor computes the minimum expected idle duration for all online CPUs attached to a PM domain and its subdomains. Next, it finds the deepest idle state whose target residency is within the expected idle duration and selects it as the target idle state of the domain. It should be noted that the minimum expected idle duration computation is based on the closest timer event information stored in the per-CPU variables cpuidle_devices for all of the CPUs in the domain. That needs to be revisited in future, as obviously there are other reasons why a CPU may be woken up from idle. Co-developed-by: Lina Iyer <lina.iyer@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-04-11bpf: fix missing bpf_check_uarg_tail_zero in BPF_PROG_TEST_RUNStanislav Fomichev
Commit b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN") started using bpf_check_uarg_tail_zero in BPF_PROG_TEST_RUN. However, bpf_check_uarg_tail_zero is not defined for !CONFIG_BPF_SYSCALL: net/bpf/test_run.c: In function ‘bpf_ctx_init’: net/bpf/test_run.c:142:9: error: implicit declaration of function ‘bpf_check_uarg_tail_zero’ [-Werror=implicit-function-declaration] err = bpf_check_uarg_tail_zero(data_in, max_size, size); ^~~~~~~~~~~~~~~~~~~~~~~~ Let's not build net/bpf/test_run.c when CONFIG_BPF_SYSCALL is not set. Reported-by: kbuild test robot <lkp@intel.com> Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"Trond Myklebust
This reverts commit 009a82f6437490c262584d65a14094a818bcb747. The ability to optimise here relies on compiler being able to optimise away tail calls to avoid stack overflows. Unfortunately, we are seeing reports of problems, so let's just revert. Reported-by: Daniel Mack <daniel@zonque.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-04-11video: amba-clcd: Decomission Versatile and NomadikLinus Walleij
These board families are now handled in the DRM subsystem where we can have reusable panel drivers and some other stuff. The PL111 there is now the driver used in the defconfig for Versatile and Nomadik so no need to keep this code around. There are a few minor machines in arch/arm/ such as mach-netx still using the old driver, so we need to keep the core fbdev driver around for some time. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2019-04-11nvmet: fix discover log page when offsets are usedKeith Busch
The nvme target hadn't been taking the Get Log Page offset parameter into consideration, and so has been returning corrupted log pages when offsets are used. Since many tools, including nvme-cli, split the log request to 4k, we've been breaking discovery log responses when more than 3 subsystems exist. Fix the returned data by internally generating the entire discovery log page and copying only the requested bytes into the user buffer. The command log page offset type has been modified to a native __le64 to make it easier to extract the value from a command. Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-04-11iommu/vt-d: Aux-domain specific domain attach/detachLu Baolu
When multiple domains per device has been enabled by the device driver, the device will tag the default PASID for the domain to all DMA traffics out of the subset of this device; and the IOMMU should translate the DMA requests in PASID granularity. This adds the intel_iommu_aux_attach/detach_device() ops to support managing PASID granular translation structures when the device driver has enabled multiple domains per device. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-11iommu/vt-d: Add per-device IOMMU feature ops entriesLu Baolu
This adds the iommu ops entries for aux-domain per-device feature query and enable/disable. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-11iommu/vt-d: Make intel_iommu_enable_pasid() more genericLu Baolu
This moves intel_iommu_enable_pasid() out of the scope of CONFIG_INTEL_IOMMU_SVM with more and more features requiring pasid function. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-11iommu: Bind process address spaces to devicesJean-Philippe Brucker
Add bind() and unbind() operations to the IOMMU API. iommu_sva_bind_device() binds a device to an mm, and returns a handle to the bond, which is released by calling iommu_sva_unbind_device(). Each mm bound to devices gets a PASID (by convention, a 20-bit system-wide ID representing the address space), which can be retrieved with iommu_sva_get_pasid(). When programming DMA addresses, device drivers include this PASID in a device-specific manner, to let the device access the given address space. Since the process memory may be paged out, device and IOMMU must support I/O page faults (e.g. PCI PRI). Using iommu_sva_set_ops(), device drivers provide an mm_exit() callback that is called by the IOMMU driver if the process exits before the device driver called unbind(). In mm_exit(), device driver should disable DMA from the given context, so that the core IOMMU can reallocate the PASID. Whether the process exited or nor, the device driver should always release the handle with unbind(). To use these functions, device driver must first enable the IOMMU_DEV_FEAT_SVA device feature with iommu_dev_enable_feature(). Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-11iommu: Add APIs for multiple domains per deviceLu Baolu
Sharing a physical PCI device in a finer-granularity way is becoming a consensus in the industry. IOMMU vendors are also engaging efforts to support such sharing as well as possible. Among the efforts, the capability of support finer-granularity DMA isolation is a common requirement due to the security consideration. With finer-granularity DMA isolation, subsets of a PCI function can be isolated from each others by the IOMMU. As a result, there is a request in software to attach multiple domains to a physical PCI device. One example of such use model is the Intel Scalable IOV [1] [2]. The Intel vt-d 3.0 spec [3] introduces the scalable mode which enables PASID granularity DMA isolation. This adds the APIs to support multiple domains per device. In order to ease the discussions, we call it 'a domain in auxiliary mode' or simply 'auxiliary domain' when multiple domains are attached to a physical device. The APIs include: * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX) - Detect both IOMMU and PCI endpoint devices supporting the feature (aux-domain here) without the host driver dependency. * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX) - Check the enabling status of the feature (aux-domain here). The aux-domain interfaces are available only if this returns true. * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX) - Enable/disable device specific aux-domain feature. * iommu_aux_attach_device(domain, dev) - Attaches @domain to @dev in the auxiliary mode. Multiple domains could be attached to a single device in the auxiliary mode with each domain representing an isolated address space for an assignable subset of the device. * iommu_aux_detach_device(domain, dev) - Detach @domain which has been attached to @dev in the auxiliary mode. * iommu_aux_get_pasid(domain, dev) - Return ID used for finer-granularity DMA translation. For the Intel Scalable IOV usage model, this will be a PASID. The device which supports Scalable IOV needs to write this ID to the device register so that DMA requests could be tagged with a right PASID prefix. This has been updated with the latest proposal from Joerg posted here [5]. Many people involved in discussions of this design. Kevin Tian <kevin.tian@intel.com> Liu Yi L <yi.l.liu@intel.com> Ashok Raj <ashok.raj@intel.com> Sanjay Kumar <sanjay.k.kumar@intel.com> Jacob Pan <jacob.jun.pan@linux.intel.com> Alex Williamson <alex.williamson@redhat.com> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Joerg Roedel <joro@8bytes.org> and some discussions can be found here [4] [5]. [1] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf [3] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification [4] https://lkml.org/lkml/2018/7/26/4 [5] https://www.spinics.net/lists/iommu/msg31874.html Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Suggested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-11iommu/iova: Separate atomic variables to improve performanceJinyu Qi
In struct iova_domain, there are three atomic variables, the former two are about TLB flush counters which use atomic_add operation, anoter is used to flush timer that use cmpxhg operation. These variables are in the same cache line, so it will cause some performance loss under the condition that many cores call queue_iova function, Let's isolate the two type atomic variables to different cache line to reduce cache line conflict. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Jinyu Qi <jinyuqi@huawei.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-11gpio: gpio-omap: configure edge detection for level IRQs for idle wakeupRussell King
The GPIO block can enter idle independently of the CPU power management calls via smart-idle. When the GPIO block enters idle, level detection stops working due to clocks being shut off, and an alternative form of edge detection is used. However, this needs the edge detection registers set to mark the appropriate edges. Arrange to configure the edge detection enables along with the level detection to ensure that any transition to active interrupt state that occurs while the block is idle is detected as a wake-up event. Since we enable the edge detection when configuring the IRQ, both omap2_gpio_enable_level_quirk() nor omap2_gpio_disable_level_quirk() become redundant, which also means OMAP_GPIO_QUIRK_IDLE_REMOVE_TRIGGER can be removed. This can be now done without regressions as patch "gpio: gpio-omap: fix level interrupt idling" allows level interrupts to idle on omap4 without a workaround. Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: Tero Kristo <t-kristo@ti.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> [tony@atomide.com: update description for the fix dependency] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-04-11firmware: imx: enable imx scu general irq functionAnson Huang
The System Controller Firmware (SCFW) controls RTC, thermal and WDOG etc., these resources' interrupt function are managed by SCU. When any IRQ pending, SCU will notify Linux via MU general interrupt channel #3, and Linux kernel needs to call SCU APIs to get IRQ status and notify each module to handle the interrupt. Since there is no data transmission for SCU IRQ notification, so doorbell mode is used for this MU channel, and SCU driver will use notifier mechanism to broadcast to every module which registers the SCU block notifier. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-04-10failover: allow name change on IFF_UP slave interfacesSi-Wei Liu
When a netdev appears through hot plug then gets enslaved by a failover master that is already up and running, the slave will be opened right away after getting enslaved. Today there's a race that userspace (udev) may fail to rename the slave if the kernel (net_failover) opens the slave earlier than when the userspace rename happens. Unlike bond or team, the primary slave of failover can't be renamed by userspace ahead of time, since the kernel initiated auto-enslavement is unable to, or rather, is never meant to be synchronized with the rename request from userspace. As the failover slave interfaces are not designed to be operated directly by userspace apps: IP configuration, filter rules with regard to network traffic passing and etc., should all be done on master interface. In general, userspace apps only care about the name of master interface, while slave names are less important as long as admin users can see reliable names that may carry other information describing the netdev. For e.g., they can infer that "ens3nsby" is a standby slave of "ens3", while for a name like "eth0" they can't tell which master it belongs to. Historically the name of IFF_UP interface can't be changed because there might be admin script or management software that is already relying on such behavior and assumes that the slave name can't be changed once UP. But failover is special: with the in-kernel auto-enslavement mechanism, the userspace expectation for device enumeration and bring-up order is already broken. Previously initramfs and various userspace config tools were modified to bypass failover slaves because of auto-enslavement and duplicate MAC address. Similarly, in case that users care about seeing reliable slave name, the new type of failover slaves needs to be taken care of specifically in userspace anyway. It's less risky to lift up the rename restriction on failover slave which is already UP. Although it's possible this change may potentially break userspace component (most likely configuration scripts or management software) that assumes slave name can't be changed while UP, it's relatively a limited and controllable set among all userspace components, which can be fixed specifically to listen for the rename events on failover slaves. Userspace component interacting with slaves is expected to be changed to operate on failover master interface instead, as the failover slave is dynamic in nature which may come and go at any point. The goal is to make the role of failover slaves less relevant, and userspace components should only deal with failover master in the long run. Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-10Drivers: hv: vmbus: Fix race condition with new ring_buffer_info mutexKimberly Brown
Fix a race condition that can result in a ring buffer pointer being set to null while a "_show" function is reading the ring buffer's data. This problem was discussed here: https://lkml.org/lkml/2018/10/18/779 To fix the race condition, add a new mutex lock to the "hv_ring_buffer_info" struct. Add a new function, "hv_ringbuffer_pre_init()", where a channel's inbound and outbound ring_buffer_info mutex locks are initialized. Acquire/release the locks in the "hv_ringbuffer_cleanup()" function, which is where the ring buffer pointers are set to null. Acquire/release the locks in the four channel-level "_show" functions that access ring buffer data. Remove the "const" qualifier from the "vmbus_channel" parameter and the "rbi" variable of the channel-level "_show" functions so that the locks can be acquired/released in these functions. Acquire/release the locks in hv_ringbuffer_get_debuginfo(). Remove the "const" qualifier from the "hv_ring_buffer_info" parameter so that the locks can be acquired/released in this function. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-10clk: x86: Add system specific quirk to mark clocks as criticalDavid Müller
Since commit 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL"), the pmc_plt_clocks of the Bay Trail SoC are unconditionally gated off. Unfortunately this will break systems where these clocks are used for external purposes beyond the kernel's knowledge. Fix it by implementing a system specific quirk to mark the necessary pmc_plt_clks as critical. Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Signed-off-by: David Müller <dave.mueller@gmx.ch> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-04-10of: use correct function prototype for of_overlay_fdt_apply()Chris Packham
When CONFIG_OF_OVERLAY is not enabled the fallback stub for of_overlay_fdt_apply() does not match the prototype for the case when CONFIG_OF_OVERLAY is enabled. Update the stub to use the correct function prototype. Fixes: 39a751a4cb7e ("of: change overlay apply input data from unflattened to FDT") Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-04-10Merge branch 'mlx5-next' into rdma.git for-nextJason Gunthorpe
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies on the next series * branch 'mlx5-next': net/mlx5: E-Switch, add a new prio to be used by the RDMA side net/mlx5: E-Switch, don't use hardcoded values for FDB prios net/mlx5: Fix false compilation warning net/mlx5: Expose MPEIN (Management PCIE INfo) register layout net/mlx5: Add rate limit print macros net/mlx5: Add explicit bar address field net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info net/mlx5: Use dev->priv.name instead of dev_name net/mlx5: Make mlx5_core messages independent from mdev->pdev net/mlx5: Break load_one into three stages net/mlx5: Function setup/teardown procedures net/mlx5: Move health and page alloc init to mdev_init net/mlx5: Split mdev init and pci init net/mlx5: Remove redundant init functions parameter net/mlx5: Remove spinlock support from mlx5_write64 net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10keys: safe concurrent user->{session,uid}_keyring accessJann Horn
The current code can perform concurrent updates and reads on user->session_keyring and user->uid_keyring. Add a comment to struct user_struct to document the nontrivial locking semantics, and use READ_ONCE() for unlocked readers and smp_store_release() for writers to prevent memory ordering issues. Fixes: 69664cf16af4 ("keys: don't generate user and user session keyrings unless they're accessed") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-10security: don't use RCU accessors for cred->session_keyringJann Horn
sparse complains that a bunch of places in kernel/cred.c access cred->session_keyring without the RCU helpers required by the __rcu annotation. cred->session_keyring is written in the following places: - prepare_kernel_cred() [in a new cred struct] - keyctl_session_to_parent() [in a new cred struct] - prepare_creds [in a new cred struct, via memcpy] - install_session_keyring_to_cred() - from install_session_keyring() on new creds - from join_session_keyring() on new creds [twice] - from umh_keys_init() - from call_usermodehelper_exec_async() on new creds All of these writes are before the creds are committed; therefore, cred->session_keyring doesn't need RCU protection. Remove the __rcu annotation and fix up all existing users that use __rcu. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-10Merge branch 'omap-for-v5.2/am4-ddr3' into omap-for-v5.2/am4-pm-v2Tony Lindgren