summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2018-05-24bpf: devmap introduce dev_map_enqueueJesper Dangaard Brouer
Functionality is the same, but the ndo_xdp_xmit call is now simply invoked from inside the devmap.c code. V2: Fix compile issue reported by kbuild test robot <lkp@intel.com> V5: Cleanups requested by Daniel - Newlines before func definition - Use BUILD_BUG_ON checks - Remove unnecessary use return value store in dev_map_enqueue Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24bpf: introduce bpf subcommand BPF_TASK_FD_QUERYYonghong Song
Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks for this approaches. First, bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, if the <pid, fd> is associated with a tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24perf/core: add perf_get_event() to return perf_event given a struct fileYonghong Song
A new extern function, perf_get_event(), is added to return a perf event given a struct file. This function will be used in later patches. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24net/mlx5: PPTB and PBMC register firmware command supportHuy Nguyen
Add firmware command interface to read and write PPTB and PBMC registers. PPTB register enables mappings priority to a specific receive buffer. PBMC registers enables changing the receive buffer's configuration such as buffer size, xon/xoff thresholds, buffer's lossy property and buffer's shared property. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net/mlx5: Add pbmc and pptb in the port_access_reg_cap_maskHuy Nguyen
Add pbmc and pptb in the port_access_reg_cap_mask. These two bits determine if device supports receive buffer configuration. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24net: phy: replace bool members in struct phy_device with bit-fieldsHeiner Kallweit
In struct phy_device we have a number of flags being defined as type bool. Similar to e.g. struct pci_dev we can save some space by using bit-fields. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24hwspinlock/core: Switch to SPDX license identifierSuman Anna
Use the appropriate SPDX license identifier in the Hwspinlock core driver source files and drop the previous boilerplate license text. Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-05-24Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"Joonsoo Kim
This reverts the following commits that change CMA design in MM. 3d2054ad8c2d ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y") 1d47a3ec09b5 ("mm/cma: remove ALLOC_CMA") bad8c6c0b114 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE") Ville reported a following error on i386. Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) microcode: microcode updated early to revision 0x4, date = 2013-06-28 Initializing CPU#0 Initializing HighMem for node 0 (000377fe:00118000) Initializing Movable for node 0 (00000001:00118000) BUG: Bad page state in process swapper pfn:377fe page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0 flags: 0x80000000() raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145 Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013 Call Trace: dump_stack+0x60/0x96 bad_page+0x9a/0x100 free_pages_check_bad+0x3f/0x60 free_pcppages_bulk+0x29d/0x5b0 free_unref_page_commit+0x84/0xb0 free_unref_page+0x3e/0x70 __free_pages+0x1d/0x20 free_highmem_page+0x19/0x40 add_highpages_with_active_regions+0xab/0xeb set_highmem_pages_init+0x66/0x73 mem_init+0x1b/0x1d7 start_kernel+0x17a/0x363 i386_start_kernel+0x95/0x99 startup_32_smp+0x164/0x168 The reason for this error is that the span of MOVABLE_ZONE is extended to whole node span for future CMA initialization, and, normal memory is wrongly freed here. I submitted the fix and it seems to work, but, another problem happened. It's so late time to fix the later problem so I decide to reverting the series. Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-24blk-mq: avoid starving tag allocation after allocating process migratesMing Lei
When the allocation process is scheduled back and the mapped hw queue is changed, fake one extra wake up on previous queue for compensating wake up miss, so other allocations on the previous queue won't be starved. This patch fixes one request allocation hang issue, which can be triggered easily in case of very low nr_request. The race is as follows: 1) 2 hw queues, nr_requests are 2, and wake_batch is one 2) there are 3 waiters on hw queue 0 3) two in-flight requests in hw queue 0 are completed, and only two waiters of 3 are waken up because of wake_batch, but both the two waiters can be scheduled to another CPU and cause to switch to hw queue 1 4) then the 3rd waiter will wait for ever, since no in-flight request is in hw queue 0 any more. 5) this patch fixes it by the fake wakeup when waiter is scheduled to another hw queue Cc: <stable@vger.kernel.org> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Modified commit message to make it clearer, and make it apply on top of the 4.18 branch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24PCI: Add Intel VMD devices to pci idsJon Derrick
Add the Intel VMD device ids to the pci id database and update the VMD driver. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2018-05-24power: supply: Add fwnode pointer to power_supply_config structAdam Thomson
To allow users of the power supply framework to be hw description agnostic, this commit adds the ability to pass a fwnode pointer, via the power_supply_config structure, to the initialisation code of the core, instead of explicitly specifying of_ndoe. If that fwnode pointer is provided then it will automatically resolve down to of_node on platforms which support it, otherwise it will be NULL. In the future, when ACPI support is added, this can be modified to accommodate ACPI without the need to change calling code which already provides the fwnode handle in this manner. Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com> Suggested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-24regulator: tps65090: Pass descriptor instead of GPIO numberLinus Walleij
Instead of passing a global GPIO number for the enable GPIO, pass a descriptor looked up from the device tree node for the regulator. This regulator supports passing platform data, but enable/sleep regulators are looked up from the device tree exclusively, so we can need not touch other files. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-05-24regulator: s5m8767: Pass descriptor instead of GPIO numberLinus Walleij
Instead of passing a global GPIO number for the enable GPIO, pass a descriptor looked up from the device tree node for the regulator. This regulator supports passing platform data, but enable/sleep regulators are looked up from the device tree exclusively, so we can need not touch other files. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-05-24regulator: max8952: Pass descriptor instead of GPIO numberLinus Walleij
Instead of passing a global GPIO number for the enable GPIO, pass a descriptor looked up with the standard devm_gpiod_get_optional() call. All users of this regulator use device tree so the transition is pretty smooth. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-05-24regulator: lp8788-ldo: Pass descriptor instead of GPIO numberLinus Walleij
Instead of passing a global GPIO number, pass a descriptor looked up with the standard devm_gpiod_get_index_optional() call. This driver has supported passing a LDO enable GPIO for years, yet this facility has never been put to use in the upstream kernel. If someone desires to put in place GPIO control for the LDOs, this can be done by adding a GPIO descriptor table in the MFD nexus in drivers/mfd/lp8788.c for the LDO device when spawning the MFD children, or using a board file. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-05-24Merge tag 'usb-for-v4.18' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next usb: changes for v4.18 merge window A total of 98 non-merge commits, the biggest part being in dwc3 this time around with a large refactoring of dwc3's transfer handling code. We also have a new driver for Aspeed virtual hub controller. Apart from that, just a list of miscellaneous fixes all over the place.
2018-05-24Merge tag 'mlx5-updates-2018-05-17' of ↵Jason Gunthorpe
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into for-next mlx5-updates-2018-05-17 mlx5 core dirver updates for both net-next and rdma-next branches. From Christophe JAILLET, first three patche to use kvfree where needed. From: Or Gerlitz <ogerlitz@mellanox.com> Next six patches from Roi and Co adds support for merged sriov e-switch which comes to serve cases where both PFs, VFs set on them and both uplinks are to be used in single v-switch SW model. When merged e-switch is supported, the per-port e-switch is logically merged into one e-switch that spans both physical ports and all the VFs. This model allows to offload TC eswitch rules between VFs belonging to different PFs (and hence have different eswitch affinity), it also sets the some of the foundations needed for uplink LAG support. * tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5e: Explicitly set source e-switch in offloaded TC rules net/mlx5: Add source e-switch owner net/mlx5e: Explicitly set destination e-switch in FDB rules net/mlx5: Add destination e-switch owner net/mlx5: Properly handle a vport destination when setting FTE net/mlx5: Add merged e-switch cap IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()' net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()' net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
2018-05-24bpf: properly enforce index mask to prevent out-of-bounds speculationDaniel Borkmann
While reviewing the verifier code, I recently noticed that the following two program variants in relation to tail calls can be loaded. Variant 1: # bpftool p d x i 15 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:5] 3: (05) goto pc+2 4: (18) r2 = map[id:6] 6: (b7) r3 = 7 7: (35) if r3 >= 0xa0 goto pc+2 8: (54) (u32) r3 &= (u32) 255 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 5 5: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B # bpftool m s i 6 6: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B Variant 2: # bpftool p d x i 20 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:8] 3: (05) goto pc+2 4: (18) r2 = map[id:7] 6: (b7) r3 = 7 7: (35) if r3 >= 0x4 goto pc+2 8: (54) (u32) r3 &= (u32) 3 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 8 8: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B # bpftool m s i 7 7: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B In both cases the index masking inserted by the verifier in order to control out of bounds speculation from a CPU via b2157399cc98 ("bpf: prevent out-of-bounds speculation") seems to be incorrect in what it is enforcing. In the 1st variant, the mask is applied from the map with the significantly larger number of entries where we would allow to a certain degree out of bounds speculation for the smaller map, and in the 2nd variant where the mask is applied from the map with the smaller number of entries, we get buggy behavior since we truncate the index of the larger map. The original intent from commit b2157399cc98 is to reject such occasions where two or more different tail call maps are used in the same tail call helper invocation. However, the check on the BPF_MAP_PTR_POISON is never hit since we never poisoned the saved pointer in the first place! We do this explicitly for map lookups but in case of tail calls we basically used the tail call map in insn_aux_data that was processed in the most recent path which the verifier walked. Thus any prior path that stored a pointer in insn_aux_data at the helper location was always overridden. Fix it by moving the map pointer poison logic into a small helper that covers both BPF helpers with the same logic. After that in fixup_bpf_calls() the poison check is then hit for tail calls and the program rejected. Latter only happens in unprivileged case since this is the *only* occasion where a rewrite needs to happen, and where such rewrite is specific to the map (max_entries, index_mask). In the privileged case the rewrite is generic for the insn->imm / insn->code update so multiple maps from different paths can be handled just fine since all the remaining logic happens in the instruction processing itself. This is similar to the case of map lookups: in case there is a collision of maps in fixup_bpf_calls() we must skip the inlined rewrite since this will turn the generic instruction sequence into a non- generic one. Thus the patch_call_imm will simply update the insn->imm location where the bpf_map_lookup_elem() will later take care of the dispatch. Given we need this 'poison' state as a check, the information of whether a map is an unpriv_array gets lost, so enforcing it prior to that needs an additional state. In general this check is needed since there are some complex and tail call intensive BPF programs out there where LLVM tends to generate such code occasionally. We therefore convert the map_ptr rather into map_state to store all this w/o extra memory overhead, and the bit whether one of the maps involved in the collision was from an unpriv_array thus needs to be retained as well there. Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24ipv6: sr: Add seg6local action End.BPFMathieu Xhonneux
This patch adds the End.BPF action to the LWT seg6local infrastructure. This action works like any other seg6local End action, meaning that an IPv6 header with SRH is needed, whose DA has to be equal to the SID of the action. It will also advance the SRH to the next segment, the BPF program does not have to take care of this. Since the BPF program may not be a source of instability in the kernel, it is important to ensure that the integrity of the packet is maintained before yielding it back to the IPv6 layer. The hook hence keeps track if the SRH has been altered through the helpers, and re-validates its content if needed with seg6_validate_srh. The state kept for validation is stored in a per-CPU buffer. The BPF program is not allowed to directly write into the packet, and only some fields of the SRH can be altered through the helper bpf_lwt_seg6_store_bytes. Performances profiling has shown that the SRH re-validation does not induce a significant overhead. If the altered SRH is deemed as invalid, the packet is dropped. This validation is also done before executing any action through bpf_lwt_seg6_action, and will not be performed again if the SRH is not modified after calling the action. The BPF program may return 3 types of return codes: - BPF_OK: the End.BPF action will look up the next destination through seg6_lookup_nexthop. - BPF_REDIRECT: if an action has been executed through the bpf_lwt_seg6_action helper, the BPF program should return this value, as the skb's destination is already set and the default lookup should not be performed. - BPF_DROP : the packet will be dropped. Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-24bpf: Split lwt inout verifier structuresMathieu Xhonneux
The new bpf_lwt_push_encap helper should only be accessible within the LWT BPF IN hook, and not the OUT one, as this may lead to a skb under panic. At the moment, both LWT BPF IN and OUT share the same list of helpers, whose calls are authorized by the verifier. This patch separates the verifier ops for the IN and OUT hooks, and allows the IN hook to call the bpf_lwt_push_encap helper. This patch is also the occasion to put all lwt_*_func_proto functions together for clarity. At the moment, socks_op_func_proto is in the middle of lwt_inout_func_proto and lwt_xmit_func_proto. Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal. 2) Add support for map lookups to numgen, random and hash expressions, from Laura Garcia. 3) Allow to register nat hooks for iptables and nftables at the same time. Patchset from Florian Westpha. 4) Timeout support for rbtree sets. 5) ip6_rpfilter works needs interface for link-local addresses, from Vincent Bernat. 6) Add nf_ct_hook and nf_nat_hook structures and use them. 7) Do not drop packets on packets raceing to insert conntrack entries into hashes, this is particularly a problem in nfqueue setups. 8) Address fallout from xt_osf separation to nf_osf, patches from Florian Westphal and Fernando Mancera. 9) Remove reference to struct nft_af_info, which doesn't exist anymore. From Taehee Yoo. This batch comes with is a conflict between 25fd386e0bc0 ("netfilter: core: add missing __rcu annotation") in your tree and 2c205dd3981f ("netfilter: add struct nf_nat_hook and use it") coming in this batch. This conflict can be solved by leaving the __rcu tag on __netfilter_net_init() - added by 25fd386e0bc0 - and remove all code related to nf_nat_decode_session_hook - which is gone after 2c205dd3981f, as described by: diff --cc net/netfilter/core.c index e0ae4aae96f5,206fb2c4c319..168af54db975 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo EXPORT_SYMBOL_GPL(nf_ct_zone_dflt); #endif /* CONFIG_NF_CONNTRACK */ - static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max) -#ifdef CONFIG_NF_NAT_NEEDED -void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *); -EXPORT_SYMBOL(nf_nat_decode_session_hook); -#endif - + static void __net_init + __netfilter_net_init(struct nf_hook_entries __rcu **e, int max) { int h; I can also merge your net-next tree into nf-next, solve the conflict and resend the pull request if you prefer so. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23gso: limit udp gso to egress-only virtual devicesWillem de Bruijn
Until the udp receive stack supports large packets (UDP GRO), GSO packets must not loop from the egress to the ingress path. Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual devices through NETIF_F_GSO_ENCAP_ALL as this included devices that may loop packets, such as veth and macvlan. Instead add it to specific devices that forward to another device's egress path, bonding and team. Fixes: 83aa025f535f ("udp: add gso support to virtual devices") CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23net: add skeleton of bpfilter kernel moduleAlexei Starovoitov
bpfilter.ko consists of bpfilter_kern.c (normal kernel module code) and user mode helper code that is embedded into bpfilter.ko The steps to build bpfilter.ko are the following: - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file is converted into bpfilter_umh.o object file with _binary_net_bpfilter_bpfilter_umh_start and _end symbols Example: $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o 0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end 0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size 0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko bpfilter_kern.c is a normal kernel module code that calls the fork_usermode_blob() helper to execute part of its own data as a user mode process. Notice that _binary_net_bpfilter_bpfilter_umh_start - end is placed into .init.rodata section, so it's freed as soon as __init function of bpfilter.ko is finished. As part of __init the bpfilter.ko does first request/reply action via two unix pipe provided by fork_usermode_blob() helper to make sure that umh is healthy. If not it will kill it via pid. Later bpfilter_process_sockopt() will be called from bpfilter hooks in get/setsockopt() to pass iptable commands into umh via bpfilter.ko If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will kill umh as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23umh: introduce fork_usermode_blob() helperAlexei Starovoitov
Introduce helper: int fork_usermode_blob(void *data, size_t len, struct umh_info *info); struct umh_info { struct file *pipe_to_umh; struct file *pipe_from_umh; pid_t pid; }; that GPLed kernel modules (signed or unsigned) can use it to execute part of its own data as swappable user mode process. The kernel will do: - allocate a unique file in tmpfs - populate that file with [data, data + len] bytes - user-mode-helper code will do_execve that file and, before the process starts, the kernel will create two unix pipes for bidirectional communication between kernel module and umh - close tmpfs file, effectively deleting it - the fork_usermode_blob will return zero on success and populate 'struct umh_info' with two unix pipes and the pid of the user process As the first step in the development of the bpfilter project the fork_usermode_blob() helper is introduced to allow user mode code to be invoked from a kernel module. The idea is that user mode code plus normal kernel module code are built as part of the kernel build and installed as traditional kernel module into distro specified location, such that from a distribution point of view, there is no difference between regular kernel modules and kernel modules + umh code. Such modules can be signed, modprobed, rmmod, etc. The use of this new helper by a kernel module doesn't make it any special from kernel and user space tooling point of view. Such approach enables kernel to delegate functionality traditionally done by the kernel modules into the user space processes (either root or !root) and reduces security attack surface of the new code. The buggy umh code would crash the user process, but not the kernel. Another advantage is that umh code of the kernel module can be debugged and tested out of user space (e.g. opening the possibility to run clang sanitizers, fuzzers or user space test suites on the umh code). In case of the bpfilter project such architecture allows complex control plane to be done in the user space while bpf based data plane stays in the kernel. Since umh can crash, can be oom-ed by the kernel, killed by the admin, the kernel module that uses them (like bpfilter) needs to manage life time of umh on its own via two unix pipes and the pid of umh. The exit code of such kernel module should kill the umh it started, so that rmmod of the kernel module will cleanup the corresponding umh. Just like if the kernel module does kmalloc() it should kfree() it in the exit code. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23gpio: Remove VLA from gpiolibLaura Abbott
The new challenge is to remove VLAs from the kernel (see https://lkml.org/lkml/2018/3/7/621) to eventually turn on -Wvla. Using a kmalloc array is the easy way to fix this but kmalloc is still more expensive than stack allocation. Introduce a fast path with a fixed size stack array to cover most chip with gpios below some fixed amount. The slow path dynamically allocates an array to cover those chips with a large number of gpios. Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Phil Reid <preid@electromag.com.au> Reviewed-and-tested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-05-23bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_infoMartin KaFai Lau
In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id" could cause confusion because the "id" of "btf_id" means the BPF obj id given to the BTF object while "btf_key_id" and "btf_value_id" means the BTF type id within that BTF object. To make it clear, btf_key_id and btf_value_id are renamed to btf_key_type_id and btf_value_type_id. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23bpf: Expose check_uarg_tail_zero()Martin KaFai Lau
This patch exposes check_uarg_tail_zero() which will be reused by a later BTF patch. Its name is changed to bpf_check_uarg_tail_zero(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-23cpufreq: Rename cpufreq_can_do_remote_dvfs()Viresh Kumar
This routine checks if the CPU running this code belongs to the policy of the target CPU or if not, can it do remote DVFS for it remotely. But the current name of it implies as if it is only about doing remote updates. Rename it to make it more relevant. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-23netfilter: nfnetlink_queue: resolve clash for unconfirmed conntracksPablo Neira Ayuso
In nfqueue, two consecutive skbuffs may race to create the conntrack entry. Hence, the one that loses the race gets dropped due to clash in the insertion into the hashes from the nf_conntrack_confirm() path. This patch adds a new nf_conntrack_update() function which searches for possible clashes and resolve them. NAT mangling for the packet losing race is corrected by using the conntrack information that won race. In order to avoid direct module dependencies with conntrack and NAT, the nf_ct_hook and nf_nat_hook structures are used for this purpose. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: add struct nf_nat_hook and use itPablo Neira Ayuso
Move decode_session() and parse_nat_setup_hook() indirections to struct nf_nat_hook structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: add struct nf_ct_hook and use itPablo Neira Ayuso
Move the nf_ct_destroy indirection to the struct nf_ct_hook. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23netfilter: lift one-nat-hook-only restrictionFlorian Westphal
This reverts commit f92b40a8b2645 ("netfilter: core: only allow one nat hook per hook point"), this limitation is no longer needed. The nat core now invokes these functions and makes sure that hook evaluation stops after a mapping is created and a null binding is created otherwise. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-22dax: Introduce a ->copy_to_iter dax operationDan Williams
Similar to the ->copy_from_iter() operation, a platform may want to deploy an architecture or device specific routine for handling reads from a dax_device like /dev/pmemX. On x86 this routine will point to a machine check safe version of copy_to_iter(). For now, add the plumbing to device-mapper and the dax core. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-22uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilationDan Williams
Add a common Kconfig CONFIG_ARCH_HAS_UACCESS_MCSAFE that archs can optionally select, and fixup the declaration of _copy_to_iter_mcsafe(). Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-22qed: Add driver infrastucture for handling mfw requests.Sudarsana Reddy Kalluru
MFW requests the TLVs in interrupt context. Extracting of the required data from upper layers and populating of the TLVs require process context. The patch adds work-queues for processing the tlv requests. It also adds the implementation for requesting the tlv values from appropriate protocol driver. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-22qed: Add support for processing iscsi tlv request.Sudarsana Reddy Kalluru
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-22qed: Add support for processing fcoe tlv request.Sudarsana Reddy Kalluru
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-22qed: Add support for tlv request processing.Sudarsana Reddy Kalluru
The patch adds driver support for processing TLV requests/repsonses from the mfw and upper driver layers respectively. The implementation reads the requested TLVs from the shared memory, requests the values from upper layer drivers, populates this info (TLVs) shared memory and notifies MFW about the TLV values. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-22rcu/x86: Provide early rcu_cpu_starting() callbackPeter Zijlstra
The x86/mtrr code does horrific things because hardware. It uses stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper thread on another CPU), which uses RCU, all before the CPU is onlined. RCU complains about this, because wakeups use RCU and RCU does (rightfully) not consider offline CPUs for grace-periods. Fix this by initializing RCU way early in the MTRR case. Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Add !SMP support, per 0day Test Robot report. ]
2018-05-22mm, fs, dax: handle layout changes to pinned dax mappingsDan Williams
Background: get_user_pages() in the filesystem pins file backed memory pages for access by devices performing dma. However, it only pins the memory pages not the page-to-file offset association. If a file is truncated the pages are mapped out of the file and dma may continue indefinitely into a page that is owned by a device driver. This breaks coherency of the file vs dma, but the assumption is that if userspace wants the file-space truncated it does not matter what data is inbound from the device, it is not relevant anymore. The only expectation is that dma can safely continue while the filesystem reallocates the block(s). Problem: This expectation that dma can safely continue while the filesystem changes the block map is broken by dax. With dax the target dma page *is* the filesystem block. The model of leaving the page pinned for dma, but truncating the file block out of the file, means that the filesytem is free to reallocate a block under active dma to another file and now the expected data-incoherency situation has turned into active data-corruption. Solution: Defer all filesystem operations (fallocate(), truncate()) on a dax mode file while any page/block in the file is under active dma. This solution assumes that dma is transient. Cases where dma operations are known to not be transient, like RDMA, have been explicitly disabled via commits like 5f1d43de5416 "IB/core: disable memory registration of filesystem-dax vmas". The dax_layout_busy_page() routine is called by filesystems with a lock held against mm faults (i_mmap_lock) to find pinned / busy dax pages. The process of looking up a busy page invalidates all mappings to trigger any subsequent get_user_pages() to block on i_mmap_lock. The filesystem continues to call dax_layout_busy_page() until it finally returns no more active pages. This approach assumes that the page pinning is transient, if that assumption is violated the system would have likely hung from the uncompleted I/O. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-22mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPSDan Williams
In preparation for fixing dax-dma-vs-unmap issues, filesystems need to be able to rely on the fact that they will get wakeups on dev_pagemap page-idle events. Introduce MEMORY_DEVICE_FS_DAX and generic_dax_page_free() as common indicator / infrastructure for dax filesytems to require. With this change there are no users of the MEMORY_DEVICE_HOST designation, so remove it. The HMM sub-system extended dev_pagemap to arrange a callback when a dev_pagemap managed page is freed. Since a dev_pagemap page is free / idle when its reference count is 1 it requires an additional branch to check the page-type at put_page() time. Given put_page() is a hot-path we do not want to incur that check if HMM is not in use, so a static branch is used to avoid that overhead when not necessary. Now, the FS_DAX implementation wants to reuse this mechanism for receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific static-key into a generic mechanism that either HMM or FS_DAX code paths can enable. For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support, care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure. However, we still need to support FS_DAX in the FS_DAX_LIMITED case implemented by the s390/dcssblk driver. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Thomas Meyer <thomas@m3y3r.de> Reported-by: Dave Jiang <dave.jiang@intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-22i2c: Export of_i2c_get_board_info()Boris Brezillon
I3C busses have to know about all I2C devices connected on the I3C bus to properly initialize the I3C master, and I2C frames can't be sent on the bus until this initialization is done. We can't let the I2C core parse the DT and instantiate I2C devices as part of its i2c_add_adapter() procedure because, when done this way, I2C devices are directly registered to the device-model and might be attached to drivers which could in turn start sending frames on the bus, which won't work since, as said above, the bus is not yet initialized. Export of_i2c_register_device() in order to let the I3C core parse the I2C device nodes by itself and initialize the bus. Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-05-22usb: musb: remove unused members in struct musb_hdrc_configBin Liu
The following members in struct musb_hdrc_config are not used, so remove them. soft_con utm_16 big_endian mult_bulk_tx mult_bulk_rx high_iso_tx high_iso_rx dma dma_channels dyn_fifo_size vendor_ctrl vendor_stat vendor_req dma_req_chan musb_hdrc_eps_bits Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-22tty: add missing const to termios hw-change helperJohan Hovold
Add missing const qualifiers to the parameters of the termios hw-change helper, which is used by a few USB serial drivers. This specifically allows the pl2303 driver to use const arguments in one of its helper as well. Cc: Jiri Slaby <jslaby@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org>
2018-05-22Merge tag 'drm/tegra/for-4.18-rc1' of ↵Dave Airlie
git://anongit.freedesktop.org/tegra/linux into drm-next drm/tegra: Changes for v4.18-rc1 This set enables IOMMU support in the gr2d and gr3d drivers and adds support for the zpos property on older Tegra generations. It also enables scaling filters and incorporates some rework to eliminate a private wrapper around struct drm_framebuffer. The remainder is mostly a random assortment of fixes and cleanups, as well as some preparatory work for destaging the userspace ABI, which is almost ready and is targetted for v4.19-rc1. Signed-off-by: Dave Airlie <airlied@redhat.com> # gpg: Signature made Sat 19 May 2018 08:31:00 AEST # gpg: using RSA key DD23ACD77F3EB3A1 # gpg: Can't check signature: public key not found Link: https://patchwork.freedesktop.org/patch/msgid/20180518224523.30982-1-thierry.reding@gmail.com
2018-05-22Merge branch 'drm-tda998x-devel' of git://git.armlinux.org.uk/~rmk/linux-arm ↵Dave Airlie
into drm-next Please incorporate support for TDA998x I2C driver CEC Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180424095456.GA32460@rmk-PC.armlinux.org.uk
2018-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal. TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part. The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'. Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "Assorted fixes all over the place" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: fix io_destroy(2) vs. lookup_ioctx() race ext2: fix a block leak nfsd: vfs_mkdir() might succeed leaving dentry negative unhashed cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed unfuck sysfs_mount() kernfs: deal with kernfs_fill_super() failures cramfs: Fix IS_ENABLED typo befs_lookup(): use d_splice_alias() affs_lookup: switch to d_splice_alias() affs_lookup(): close a race with affs_remove_link() fix breakage caused by d_find_alias() semantics change fs: don't scan the inode cache before SB_BORN is set do d_instantiate/unlock_new_inode combinations safely iov_iter: fix memory leak in pipe_get_pages_alloc() iov_iter: fix return type of __pipe_get_pages()
2018-05-21Merge branch 'speck-v20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Merge speculative store buffer bypass fixes from Thomas Gleixner: - rework of the SPEC_CTRL MSR management to accomodate the new fancy SSBD (Speculative Store Bypass Disable) bit handling. - the CPU bug and sysfs infrastructure for the exciting new Speculative Store Bypass 'feature'. - support for disabling SSB via LS_CFG MSR on AMD CPUs including Hyperthread synchronization on ZEN. - PRCTL support for dynamic runtime control of SSB - SECCOMP integration to automatically disable SSB for sandboxed processes with a filter flag for opt-out. - KVM integration to allow guests fiddling with SSBD including the new software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on AMD. - BPF protection against SSB .. this is just the core and x86 side, other architecture support will come separately. * 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) bpf: Prevent memory disambiguation attack x86/bugs: Rename SSBD_NO to SSB_NO KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG x86/bugs: Rework spec_ctrl base and mask logic x86/bugs: Remove x86_spec_ctrl_set() x86/bugs: Expose x86_spec_ctrl_base directly x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host} x86/speculation: Rework speculative_store_bypass_update() x86/speculation: Add virtualized speculative store bypass disable support x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL x86/speculation: Handle HT correctly on AMD x86/cpufeatures: Add FEATURE_ZEN x86/cpufeatures: Disentangle SSBD enumeration x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP KVM: SVM: Move spec control call after restore of GS x86/cpu: Make alternative_msr_write work for 32-bit code x86/bugs: Fix the parameters alignment and missing void x86/bugs: Make cpu_show_common() static ...
2018-05-21arm_pmu: simplify arm_pmu::handle_irqMark Rutland
The arm_pmu::handle_irq() callback has the same prototype as a generic IRQ handler, taking the IRQ number and a void pointer argument which it must convert to an arm_pmu pointer. This means that all arm_pmu::handle_irq() take an IRQ number they never use, and all must explicitly cast the void pointer to an arm_pmu pointer. Instead, let's change arm_pmu::handle_irq to take an arm_pmu pointer, allowing these casts to be removed. The redundant IRQ number parameter is also removed. Suggested-by: Hoeun Ryu <hoeun.ryu@lge.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>