summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-10-17bpf: Add support for BTF pointers to x86 JITAlexei Starovoitov
Pointer to BTF object is a pointer to kernel object or NULL. Such pointers can only be used by BPF_LDX instructions. The verifier changed their opcode from LDX|MEM|size to LDX|PROBE_MEM|size to make JITing easier. The number of entries in extable is the number of BPF_LDX insns that access kernel memory via "pointer to BTF type". Only these load instructions can fault. Since x86 extable is relative it has to be allocated in the same memory region as JITed code. Allocate it prior to last pass of JITing and let the last pass populate it. Pointer to extable in bpf_prog_aux is necessary to make page fault handling fast. Page fault handling is done in two steps: 1. bpf_prog_kallsyms_find() finds BPF program that page faulted. It's done by walking rb tree. 2. then extable for given bpf program is binary searched. This process is similar to how page faulting is done for kernel modules. The exception handler skips over faulting x86 instruction and initializes destination register with zero. This mimics exact behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed). JITs for other architectures can add support in similar way. Until then they will reject unknown opcode and fallback to interpreter. Since extable should be aligned and placed near JITed code make bpf_jit_binary_alloc() return 4 byte aligned image offset, so that extable aligning formula in bpf_int_jit_compile() doesn't need to rely on internal implementation of bpf_jit_binary_alloc(). On x86 gcc defaults to 16-byte alignment for regular kernel functions due to better performance. JITed code may be aligned to 16 in the future, but it will use 4 in the meantime. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-10-ast@kernel.org
2019-10-17bpf: Add support for BTF pointers to interpreterAlexei Starovoitov
Pointer to BTF object is a pointer to kernel object or NULL. The memory access in the interpreter has to be done via probe_kernel_read to avoid page faults. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-9-ast@kernel.org
2019-10-17bpf: Implement accurate raw_tp context access via BTFAlexei Starovoitov
libbpf analyzes bpf C program, searches in-kernel BTF for given type name and stores it into expected_attach_type. The kernel verifier expects this btf_id to point to something like: typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc); which represents signature of raw_tracepoint "kfree_skb". Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb' and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint. In first case it passes btf_id of 'struct sk_buff *' back to the verifier core and 'void *' in second case. Then the verifier tracks PTR_TO_BTF_ID as any other pointer type. Like PTR_TO_SOCKET points to 'struct bpf_sock', PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on. PTR_TO_BTF_ID points to in-kernel structs. If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF then PTR_TO_BTF_ID#1234 points to one of in kernel skbs. When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32) the btf_struct_access() checks which field of 'struct sk_buff' is at offset 32. Checks that size of access matches type definition of the field and continues to track the dereferenced type. If that field was a pointer to 'struct net_device' the r2's type will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device' in vmlinux's BTF. Such verifier analysis prevents "cheating" in BPF C program. The program cannot cast arbitrary pointer to 'struct sk_buff *' and access it. C compiler would allow type cast, of course, but the verifier will notice type mismatch based on BPF assembly and in-kernel BTF. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-7-ast@kernel.org
2019-10-17bpf: Add attach_btf_id attribute to program loadAlexei Starovoitov
Add attach_btf_id attribute to prog_load command. It's similar to existing expected_attach_type attribute which is used in several cgroup based program types. Unfortunately expected_attach_type is ignored for tracing programs and cannot be reused for new purpose. Hence introduce attach_btf_id to verify bpf programs against given in-kernel BTF type id at load time. It is strictly checked to be valid for raw_tp programs only. In a later patches it will become: btf_id == 0 semantics of existing raw_tp progs. btd_id > 0 raw_tp with BTF and additional type safety. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-5-ast@kernel.org
2019-10-17bpf: Process in-kernel BTFAlexei Starovoitov
If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux' for further use by the verifier. In-kernel BTF is trusted just like kallsyms and other build artifacts embedded into vmlinux. Yet run this BTF image through BTF verifier to make sure that it is valid and it wasn't mangled during the build. If in-kernel BTF is incorrect it means either gcc or pahole or kernel are buggy. In such case disallow loading BPF programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-4-ast@kernel.org
2019-10-17bpf: Add typecast to bpf helpers to help BTF generationAlexei Starovoitov
When pahole converts dwarf to btf it emits only used types. Wrap existing bpf helper functions into typedef and use it in typecast to make gcc emits this type into dwarf. Then pahole will convert it to btf. The "btf_#name_of_helper" types will be used to figure out types of arguments of bpf helpers. The generated code before and after is the same. Only dwarf and btf sections are different. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-3-ast@kernel.org
2019-10-17ARM: mmp: move cputype.h to include/linux/soc/Lubomir Rintel
Let's move cputype.h away from mach-mmp/ so that the drivers outside that directory are able to tell the precise silicon revision. The MMP3 USB OTG PHY driver needs this. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
2019-10-17printf: add support for printing symbolic error namesRasmus Villemoes
It has been suggested several times to extend vsnprintf() to be able to convert the numeric value of ENOSPC to print "ENOSPC". This implements that as a %p extension: With %pe, one can do if (IS_ERR(foo)) { pr_err("Sorry, can't do that: %pe\n", foo); return PTR_ERR(foo); } instead of what is seen in quite a few places in the kernel: if (IS_ERR(foo)) { pr_err("Sorry, can't do that: %ld\n", PTR_ERR(foo)); return PTR_ERR(foo); } If the value passed to %pe is an ERR_PTR, but the library function errname() added here doesn't know about the value, the value is simply printed in decimal. If the value passed to %pe is not an ERR_PTR, we treat it as an ordinary %p and thus print the hashed value (passing non-ERR_PTR values to %pe indicates a bug in the caller, but we can't do much about that). With my embedded hat on, and because it's not very invasive to do, I've made it possible to remove this. The errname() function and associated lookup tables take up about 3K. For most, that's probably quite acceptable and a price worth paying for more readable dmesg (once this starts getting used), while for those that disable printk() it's of very little use - I don't see a procfs/sysfs/seq_printf() file reasonably making use of this - and they clearly want to squeeze vmlinux as much as possible. Hence the default y if PRINTK. The symbols to include have been found by massaging the output of find arch include -iname 'errno*.h' | xargs grep -E 'define\s*E' In the cases where some common aliasing exists (e.g. EAGAIN=EWOULDBLOCK on all platforms, EDEADLOCK=EDEADLK on most), I've moved the more popular one (in terms of 'git grep -w Efoo | wc) to the bottom so that one takes precedence. Link: http://lkml.kernel.org/r/20191015190706.15989-1-linux@rasmusvillemoes.dk To: "Jonathan Corbet" <corbet@lwn.net> To: linux-kernel@vger.kernel.org Cc: "Andy Shevchenko" <andy.shevchenko@gmail.com> Cc: "Andrew Morton" <akpm@linux-foundation.org> Cc: "Joe Perches" <joe@perches.com> Cc: linux-doc@vger.kernel.org Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org> Reviewed-by: Petr Mladek <pmladek@suse.com> [andy.shevchenko@gmail.com: use abs()] Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-10-17pidfd: check pid has attached task in fdinfoChristian Brauner
Currently, when a task is dead we still print the pid it used to use in the fdinfo files of its pidfds. This doesn't make much sense since the pid may have already been reused. So verify that the task is still alive by introducing the pid_has_task() helper which will be used by other callers in follow-up patches. If the task is not alive anymore, we will print -1. This allows us to differentiate between a task not being present in a given pid namespace - in which case we already print 0 - and a task having been reaped. Note that this uses PIDTYPE_PID for the check. Technically, we could've checked PIDTYPE_TGID since pidfds currently only refer to thread-group leaders but if they won't anymore in the future then this check becomes problematic without it being immediately obvious to non-experts imho. If a thread is created via clone(CLONE_THREAD) than struct pid has a single non-empty list pid->tasks[PIDTYPE_PID] and this pid can't be used as a PIDTYPE_TGID meaning pid->tasks[PIDTYPE_TGID] will return NULL even though the thread-group leader might still be very much alive. So checking PIDTYPE_PID is fine and is easier to maintain should we ever allow pidfds to refer to threads. Cc: Jann Horn <jannh@google.com> Cc: Christian Kellner <christian@kellner.me> Cc: linux-api@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20191017101832.5985-1-christian.brauner@ubuntu.com
2019-10-17driver: core: Improve documentation for fwnode_operations.add_links()Saravana Kannan
The add_links() ops shouldn't return on the first failed device link add. It needs to continue trying to add device links to other suppliers that are available. The documentation didn't explain WHY this behavior is necessary. So, update the documentation with an example that explains why this is necessary. Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20191011191521.179614-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-17netfilter: add and use nf_hook_slow_list()Florian Westphal
At this time, NF_HOOK_LIST() macro will iterate the list and then calls nf_hook() for each individual skb. This makes it so the entire list is passed into the netfilter core. The advantage is that we only need to fetch the rule blob once per list instead of per-skb. NF_HOOK_LIST now only works for ipv4 and ipv6, as those are the only callers. v2: use skb_list_del_init() instead of list_del (Edward Cree) Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-16net: sfp: move fwnode parsing into sfp-bus layerRussell King
Rather than parsing the sfp firmware node in phylink, parse it in the sfp-bus code, so we can re-use this code for PHYs without having to duplicate the parsing. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-16arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabledJulien Thierry
Preempting from IRQ-return means that the task has its PSTATE saved on the stack, which will get restored when the task is resumed and does the actual IRQ return. However, enabling some CPU features requires modifying the PSTATE. This means that, if a task was scheduled out during an IRQ-return before all CPU features are enabled, the task might restore a PSTATE that does not include the feature enablement changes once scheduled back in. * Task 1: PAN == 0 ---| |--------------- | |<- return from IRQ, PSTATE.PAN = 0 | <- IRQ | +--------+ <- preempt() +-- ^ | reschedule Task 1, PSTATE.PAN == 1 * Init: --------------------+------------------------ ^ | enable_cpu_features set PSTATE.PAN on all CPUs Worse than this, since PSTATE is untouched when task switching is done, a task missing the new bits in PSTATE might affect another task, if both do direct calls to schedule() (outside of IRQ/exception contexts). Fix this by preventing preemption on IRQ-return until features are enabled on all CPUs. This way the only PSTATE values that are saved on the stack are from synchronous exceptions. These are expected to be fatal this early, the exception is BRK for WARN_ON(), but as this uses do_debug_exception() which keeps IRQs masked, it shouldn't call schedule(). Signed-off-by: Julien Thierry <julien.thierry@arm.com> [james: Replaced a really cool hack, with an even simpler static key in C. expanded commit message with Julien's cover-letter ascii art] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-10-16debugfs: remove return value of debugfs_create_x64()Greg Kroah-Hartman
No one checks the return value of debugfs_create_x64(), as it's not needed, so make the return value void, so that no one tries to do so in the future. Link: https://lore.kernel.org/r/20191011132931.1186197-8-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-16debugfs: remove return value of debugfs_create_x32()Greg Kroah-Hartman
No one checks the return value of debugfs_create_x32(), as it's not needed, so make the return value void, so that no one tries to do so in the future. Link: https://lore.kernel.org/r/20191011132931.1186197-7-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-16debugfs: remove return value of debugfs_create_x16()Greg Kroah-Hartman
No one checks the return value of debugfs_create_x16(), as it's not needed, so make the return value void, so that no one tries to do so in the future. Link: https://lore.kernel.org/r/20191011132931.1186197-6-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-16soc: xilinx: Set CAP_UNUSABLE requirement for versal while powering down domainTejas Patel
For "0" requirement which is used to inform firmware that device is not required currently by master, Versal PLM (Platform Loader and Manager) which runs on Platform Management Controller and is responsible platform management of devices that disables clock, power it down and reset the device. genpd_power_off() is being called during runtime suspend also. So, if any device goes to runtime suspend state during resumes it needs to be re-initialized again. It is possible that drivers do not reinitialize device upon resume from runtime suspend every time ans so dont want it to be powered down or get reset during runtime suspend. In Versal PLM new PM_CAP_UNUSABLE capability is added, which disables clock only and avoids power down and reset during runtime suspend. Power and reset will be gated with core suspend.So, this patch sets CAPABILITY_UNUSABLE requirement during gpd_power_off() if platform is other than zynqmp. Signed-off-by: Tejas Patel <tejas.patel@xilinx.com> Signed-off-by: Jolly Shah <jolly.shah@xilinx.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2019-10-16rtc: ds1685: add indirect access method and remove plat_read/plat_writeThomas Bogendoerfer
SGI Octane (IP30) doesn't have RTC register directly mapped into CPU address space, but accesses RTC registers with an address and data register. This is now supported by additional access functions, which are selected by a new field in platform data. Removed plat_read/plat_write since there is no user and their usage could introduce lifetime issue, when functions are placed in different modules. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Acked-by: Joshua Kinard <kumba@gentoo.org> Reviewed-by: Joshua Kinard <kumba@gentoo.org> Link: https://lore.kernel.org/r/20191014214621.25257-1-tbogendoerfer@suse.de Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2019-10-15net: phylink: use more linkmode_*Russell King
Use more linkmode_* helpers rather than open-coding the bitmap operations. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actionsDavide Caratti
the following script: # tc qdisc add dev eth0 clsact # tc filter add dev eth0 egress protocol ip matchall \ > action mpls push protocol mpls_uc label 0x355aa bos 1 causes corruption of all IP packets transmitted by eth0. On TC egress, we can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push' operation will result in an overwrite of the first 4 octets in the packet L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same error pattern is present also in the MPLS 'pop' operation. Fix this error in act_mpls data plane, computing 'mac_len' as the difference between the network header and the mac header (when not at TC ingress), and use it in MPLS 'push'/'pop' core functions. v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len' in skb_mpls_pop(), reported by kbuild test robot CC: Lorenzo Bianconi <lorenzo@kernel.org> Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15PCI/ATS: Make pci_restore_pri_state(), pci_restore_pasid_state() privateBjorn Helgaas
These interfaces: void pci_restore_pri_state(struct pci_dev *pdev); void pci_restore_pasid_state(struct pci_dev *pdev); are only used in drivers/pci and do not need to be seen by the rest of the kernel. Most them to drivers/pci/pci.h so they're private to the PCI subsystem. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Joerg Roedel <jroedel@suse.de>
2019-10-15PCI/ATS: Remove unused PRI and PASID stubsBjorn Helgaas
The following functions are only used by amd_iommu.c and intel-iommu.c (when CONFIG_INTEL_IOMMU_SVM is enabled). CONFIG_PCI_PRI and CONFIG_PCI_PASID are always defined in those cases, so there's no need for the stubs. pci_enable_pri() pci_disable_pri() pci_reset_pri() pci_prg_resp_pasid_required() pci_enable_pasid() pci_disable_pasid() Remove the unused stubs. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Joerg Roedel <jroedel@suse.de>
2019-10-15PCI/ATS: Consolidate ATS declarations in linux/pci-ats.hKrzysztof Wilczynski
Move ATS function prototypes from include/linux/pci.h to include/linux/pci-ats.h as the ATS, PRI, and PASID interfaces are related and are used only by the IOMMU drivers. This effectively reverts ff9bee895c4d ("PCI: Move ATS declarations to linux/pci.h so they're all together"). Also, remove surplus forward declaration of struct pci_ats from include/linux/pci.h, as it is no longer needed, since struct pci_ats was embedded directly into struct pci_dev by d544d75ac96a ("PCI: Embed ATS info directly into struct pci_dev"). No functional changes intended. Link: https://lore.kernel.org/r/20190914213032.22314-1-kw@linux.com Signed-off-by: Krzysztof Wilczynski <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-15PCI/ATS: Cache PRI PRG Response PASID Required bitBjorn Helgaas
The PRG Response PASID Required bit in the PRI Capability is read-only. Read it once when we enumerate the device and cache the value so we don't need to read it again. Based-on-patch-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-15PCI/ATS: Cache PASID Capability offsetKuppuswamy Sathyanarayanan
Previously each PASID interface searched for the PASID Capability. Cache the capability offset the first time we use it instead of searching each time. [bhelgaas: commit log, reorder patch to later, call pci_pasid_init() from pci_init_capabilities()] Link: https://lore.kernel.org/r/4957778959fa34eab3e8b3065d1951989c61cb0f.1567029860.git.sathyanarayanan.kuppuswamy@linux.intel.com Link: https://lore.kernel.org/r/20190905193146.90250-6-helgaas@kernel.org Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-15PCI/ATS: Cache PRI Capability offsetKuppuswamy Sathyanarayanan
Previously each PRI interface searched for the PRI Capability. Cache the capability offset the first time we use it instead of searching each time. [bhelgaas: commit log, reorder patch to later, call pci_pri_init() from pci_init_capabilities()] Link: https://lore.kernel.org/r/0c5495d376faf6dbb8eb2165204c474438aaae65.156 7029860.git.sathyanarayanan.kuppuswamy@linux.intel.com Link: https://lore.kernel.org/r/20190905193146.90250-5-helgaas@kernel.org Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-15PCI/ATS: Disable PF/VF ATS service independentlyKuppuswamy Sathyanarayanan
Previously we didn't disable the PF ATS until all associated VFs had disabled it. But per PCIe spec r5.0, sec 9.3.7.8, the ATS Capability in VFs and associated PFs may be enabled independently. Leaving ATS enabled in the PF unnecessarily may have power and performance impacts. Remove this dependency logic in the ATS enable/disable code. [bhelgaas: commit log] Suggested-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/8163ab8fa66afd2cba514ae95d29ab12104781aa.1567029860.git.sathyanarayanan.kuppuswamy@linux.intel.com Link: https://lore.kernel.org/r/20190905193146.90250-4-helgaas@kernel.org Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com>
2019-10-15PCI/ATS: Move pci_prg_resp_pasid_required() to CONFIG_PCI_PRIBjorn Helgaas
pci_prg_resp_pasid_required() returns the value of the "PRG Response PASID Required" bit from the PRI capability, but the interface was previously defined under #ifdef CONFIG_PCI_PASID. Move it from CONFIG_PCI_PASID to CONFIG_PCI_PRI so it's with the other PRI-related things. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Joerg Roedel <jroedel@suse.de>
2019-10-15iio: imu: st_lsm6dsx: add wakeup_source in st_sensors_platform_dataLorenzo Bianconi
Add the possibility to enable/disable wakeup source through st_sensors_platform_data and not only through device tree Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2019-10-15PCI/ASPM: Add L1 PM substate support to pci_disable_link_state()Heiner Kallweit
Add support for disabling states L1.1 and L1.2 to pci_disable_link_state(). Allow separate control of ASPM and PCI PM L1 substates. Link: https://lore.kernel.org/r/d81f8036-c236-6463-48e7-ebcdcda85bba@gmail.com Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-15iomap: Allow forcing of waiting for running DIO in iomap_dio_rw()Jan Kara
Filesystems do not support doing IO as asynchronous in some cases. For example in case of unaligned writes or in case file size needs to be extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO in such cases, add argument to iomap_dio_rw() which makes the function wait for IO completion. This also results in executing iomap_dio_complete() inline in iomap_dio_rw() providing its return value to the caller as for ordinary sync IO. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15iommu: Introduce guest PASID bind functionJacob Pan
Guest shared virtual address (SVA) may require host to shadow guest PASID tables. Guest PASID can also be allocated from the host via enlightened interfaces. In this case, guest needs to bind the guest mm, i.e. cr3 in guest physical address to the actual PASID table in the host IOMMU. Nesting will be turned on such that guest virtual address can go through a two level translation: - 1st level translates GVA to GPA - 2nd level translates GPA to HPA This patch introduces APIs to bind guest PASID data to the assigned device entry in the physical IOMMU. See the diagram below for usage explanation. .-------------. .---------------------------. | vIOMMU | | Guest process mm, FL only | | | '---------------------------' .----------------/ | PASID Entry |--- PASID cache flush - '-------------' | | | V | | GP '-------------' Guest ------| Shadow |----------------------- GP->HP* --------- v v | Host v .-------------. .----------------------. | pIOMMU | | Bind FL for GVA-GPA | | | '----------------------' .----------------/ | | PASID Entry | V (Nested xlate) '----------------\.---------------------. | | |Set SL to GPA-HPA | | | '---------------------' '-------------' Where: - FL = First level/stage one page tables - SL = Second level/stage two page tables - GP = Guest PASID - HP = Host PASID * Conversion needed if non-identity GP-HP mapping option is chosen. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15iommu/ioasid: Add custom allocatorsJacob Pan
IOASID allocation may rely on platform specific methods. One use case is that when running in the guest, in order to obtain system wide global IOASIDs, emulated allocation interface is needed to communicate with the host. Here we call these platform specific allocators custom allocators. Custom IOASID allocators can be registered at runtime and take precedence over the default XArray allocator. They have these attributes: - provides platform specific alloc()/free() functions with private data. - allocation results lookup are not provided by the allocator, lookup request must be done by the IOASID framework by its own XArray. - allocators can be unregistered at runtime, either fallback to the next custom allocator or to the default allocator. - custom allocators can share the same set of alloc()/free() helpers, in this case they also share the same IOASID space, thus the same XArray. - switching between allocators requires all outstanding IOASIDs to be freed unless the two allocators share the same alloc()/free() helpers. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lkml.org/lkml/2019/4/26/462 Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15iommu: Add I/O ASID allocatorJean-Philippe Brucker
Some devices might support multiple DMA address spaces, in particular those that have the PCI PASID feature. PASID (Process Address Space ID) allows to share process address spaces with devices (SVA), partition a device into VM-assignable entities (VFIO mdev) or simply provide multiple DMA address space to kernel drivers. Add a global PASID allocator usable by different drivers at the same time. Name it I/O ASID to avoid confusion with ASIDs allocated by arch code, which are usually a separate ID space. The IOASID space is global. Each device can have its own PASID space, but by convention the IOMMU ended up having a global PASID space, so that with SVA, each mm_struct is associated to a single PASID. The allocator is primarily used by IOMMU subsystem but in rare occasions drivers would like to allocate PASIDs for devices that aren't managed by an IOMMU, using the same ID space as IOMMU. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15iommu: Introduce cache_invalidate APIYi L Liu
In any virtualization use case, when the first translation stage is "owned" by the guest OS, the host IOMMU driver has no knowledge of caching structure updates unless the guest invalidation activities are trapped by the virtualizer and passed down to the host. Since the invalidation data can be obtained from user space and will be written into physical IOMMU, we must allow security check at various layers. Therefore, generic invalidation data format are proposed here, model specific IOMMU drivers need to convert them into their own format. Signed-off-by: Yi L Liu <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clearMarc Zyngier
The GICv3 architecture specification is incredibly misleading when it comes to PMR and the requirement for a DSB. It turns out that this DSB is only required if the CPU interface sends an Upstream Control message to the redistributor in order to update the RD's view of PMR. This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't the case in Linux. It can still be set from EL3, so some special care is required. But the upshot is that in the (hopefuly large) majority of the cases, we can drop the DSB altogether. This relies on a new static key being set if the boot CPU has PMHE set. The drawback is that this static key has to be exported to modules. Cc: Will Deacon <will@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-15spi: spi-fsl-espi: convert transfer delay to `spi_delay` formatAlexandru Ardelean
The way the max delay is computed for this controller, it looks like it is searching for the max delay from an SPI message a using that. No idea if this is valid. But this change should support both `delay_usecs` and the new `delay` data which is of `spi_delay` type. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-17-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15spi: implement SW control for CS timesAlexandru Ardelean
This change implements CS control for setup, hold & inactive delays. The `cs_setup` delay is completely new, and can help with cases where asserting the CS, also brings the device out of power-sleep, where there needs to be a longer (than usual), before transferring data. The `cs_hold` time can overlap with the `delay` (or `delay_usecs`) from an SPI transfer. The main difference is that `cs_hold` implies that CS will be de-asserted. The `cs_inactive` delay does not have a clear use-case yet. It has been implemented mostly because the `spi_set_cs_timing()` function implements it. To some degree, this could overlap or replace `cs_change_delay`, but this will require more consideration/investigation in the future. All these delays have been added to the `spi_controller` struct, as they would typically be configured by calling `spi_set_cs_timing()` after an `spi_setup()` call. Software-mode for CS control, implies that the `set_cs_timing()` hook has not been provided for the `spi_controller` object. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-16-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15spi: tegra114: change format for `spi_set_cs_timing()` functionAlexandru Ardelean
The initial version of `spi_set_cs_timing()` was implemented with consideration only for clock-cycles as delay. For cases like `CS setup` time, it's sometimes needed that micro-seconds (or nano-seconds) are required, or sometimes even longer delays, for cases where the device needs a little longer to start transferring that after CS is asserted. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-15-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15spi: introduce `delay` field for `spi_transfer` + spi_transfer_delay_exec()Alexandru Ardelean
The change introduces the `delay` field to the `spi_transfer` struct as an `struct spi_delay` type. This intends to eventually replace `delay_usecs`. But, since there are many users of `delay_usecs`, this needs some intermediate work. A helper called `spi_transfer_delay_exec()` is also added, which maintains backwards compatibility with `delay_usecs`, by assigning the value to `delay` if non-zero. This should maintain backwards compatibility with current users of `udelay_usecs`. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-9-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15spi: core,atmel: convert `word_delay_usecs` -> `word_delay` for spi_deviceAlexandru Ardelean
This change does a conversion from the `word_delay_usecs` -> `word_delay` for the `spi_device` struct. This allows users to specify inter-word delays in other unit types (nano-seconds or clock cycles), depending on how users want. The Atmel SPI driver is the only current user of the `word_delay_usecs` field (from the `spi_device` struct). So, it needed a slight conversion to use the `word_delay` as an `spi_delay` struct. In SPI core, the only required mechanism is to update the `word_delay` information per `spi_transfer`. This requires a bit more logic than before, because it needs that both delays be converted to a common unit (nano-seconds) for comparison. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-8-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15spi: sprd: convert transfer word delay to spi_delay structAlexandru Ardelean
The Spreadtrum SPI driver is the only user of the `word_delay` field in the `spi_transfer` struct. This change converts the field to use the `spi_delay` struct. This also enforces the users to specify the delay unit to be `SPI_DELAY_UNIT_SCK`. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-5-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15spi: make `cs_change_delay` the first user of the `spi_delay` logicAlexandru Ardelean
Since the logic for `spi_delay` struct + `spi_delay_exec()` has been copied from the `cs_change_delay` logic, it's natural to make this delay, the first user. The `cs_change_delay` logic requires that the default remain 10 uS, in case it is unspecified/unconfigured. So, there is some special handling needed to do that. The ADIS library is one of the few users of the new `cs_change_delay` parameter for an spi_transfer. The introduction of the `spi_delay` struct, requires that the users of of `cs_change_delay` get an update. This change also updates the ADIS library. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-4-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15spi: introduce spi_delay struct as "value + unit" & spi_delay_exec()Alexandru Ardelean
There are plenty of delays that have been introduced in SPI core. Most of them are in micro-seconds, some need to be in nano-seconds, and some in clock-cycles. For some of these delays (related to transfers & CS timing) it may make sense to have a `spi_delay` struct that abstracts these a bit. The important element of these delays [for unification] seems to be the `unit` of the delay. It looks like micro-seconds is good enough for most people, but every-once in a while, some delays seem to require other units of measurement. This change adds the `spi_delay` struct & a `spi_delay_exec()` function that processes a `spi_delay` object/struct to execute the delay. It's a copy of the `cs_change_delay` mechanism, but without the default for 10 uS. The clock-cycle delay unit is a bit special, as it needs to be bound to an `spi_transfer` object to execute. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Link: https://lore.kernel.org/r/20190926105147.7839-3-alexandru.ardelean@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-15ACPI / utils: Introduce acpi_dev_hid_uid_match() helperAndy Shevchenko
There are users outside of ACPI realm which reimplementing the comparator function to check if the given device matches to given HID and UID. For better utilization, introduce a helper for everyone to use. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-10-15iommu: Add gfp parameter to iommu_ops::mapTom Murphy
Add a gfp_t parameter to the iommu_ops::map function. Remove the needless locking in the AMD iommu driver. The iommu_ops::map function (or the iommu_map function which calls it) was always supposed to be sleepable (according to Joerg's comment in this thread: https://lore.kernel.org/patchwork/patch/977520/ ) and so should probably have had a "might_sleep()" since it was written. However currently the dma-iommu api can call iommu_map in an atomic context, which it shouldn't do. This doesn't cause any problems because any iommu driver which uses the dma-iommu api uses gfp_atomic in it's iommu_ops::map function. But doing this wastes the memory allocators atomic pools. Signed-off-by: Tom Murphy <murphyt7@tcd.ie> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15gpiolib: Initialize the hardware with a callbackAndy Shevchenko
After changing the drivers to use GPIO core to add an IRQ chip it appears that some of them requires a hardware initialization before adding the IRQ chip. Add an optional callback ->init_hw() to allow that drivers to initialize hardware if needed. This change is a part of the fix NULL pointer dereference brought to the several drivers recently. Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-14xarray.h: fix kernel-doc warningRandy Dunlap
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>: include/linux/xarray.h:232: WARNING: Unexpected indentation. Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14bitmap.h: fix kernel-doc warning and typoRandy Dunlap
Fix kernel-doc warning in <linux/bitmap.h>: include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal' Also fix small typo (bitnaps). Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org Fixes: b9fa6442f704 ("cpumask: Implement cpumask_or_equal()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm, page_owner: rename flag indicating that page is allocatedVlastimil Babka
Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page is tracked as being allocated. Kirril suggested naming it PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat loaded term for a page". Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>