summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2017-04-28Merge tag 'mac80211-next-for-davem-2017-04-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Another set of patches for -next: * API support for concurrent scheduled scan requests * API changes for roaming reporting * BSS max idle support in mac80211 * API changes for TX status reporting in mac80211 * API changes for RX rate reporting in mac80211 * rewrite monitor logic to prepare for BPF filters * bugfix for rare devices without 2.4 GHz support * a bugfix for recent DFS changes * some further cleanups The API changes are actually at a nice time, since it's typically quiet just before the merge window, and trees can be synchronized easily during it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-28RDMA/qedr: properly check atomic capabilitiesAmrani, Ram
After checking the path upwards towards root complex, actualy check root complex atomic_req capability, and not our own NIC. Verify that the PCIe device control register's atomic egress block is cleared in the path. Verify that the PCIe version is at least 2. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28Merge branch 'pci/resource' into nextBjorn Helgaas
* pci/resource: PCI: Don't resize resources when realigning all devices in system PCI: Don't reassign resources that are already aligned PCI: Factor pci_reassigndev_resource_alignment() powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned PCI: Add pcibios_default_alignment() for arch-specific alignment control PCI: Fix calculation of bridge window's size and alignment PCI: Ignore requested alignment for IOV BARs PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant
2017-04-28Merge branch 'pci/switchtec' into nextBjorn Helgaas
* pci/switchtec: switchtec: Add IOCTLs to the Switchtec driver switchtec: Add sysfs attributes to the Switchtec driver switchtec: Add user interface documentation MicroSemi Switchtec management interface driver Conflicts: drivers/pci/Kconfig
2017-04-28misc: Add host side PCI driver for PCI test function deviceKishon Vijay Abraham I
Add PCI endpoint test driver that can verify base address register, legacy interrupt/MSI interrupt and read/write/copy buffers between host and device. The corresponding pci-epf-test function driver should be used on the EP side. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-27bridge: add per-port broadcast flood flagMike Manning
Support for l2 multicast flood control was added in commit b6cb5ac8331b ("net: bridge: add per-port multicast flood flag"). It allows broadcast as it was introduced specifically for unknown multicast flood control. But as broadcast is a special case of multicast, this may also need to be disabled. For this purpose, introduce a flag to disable the flooding of received l2 broadcasts. This approach is backwards compatible and provides flexibility in filtering for the desired packet types. Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Mike Manning <mmanning@brocade.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - fix orangefs handling of faults on write() - I'd missed that one back when orangefs was going through review. - readdir counterpart of "9p: cope with bogus responses from server in p9_client_{read,write}" - server might be lying or broken, and we'd better not overrun the kmalloc'ed buffer we are copying the results into. - NFS O_DIRECT read/write can leave iov_iter advanced by too much; that's what had been causing iov_iter_pipe() warnings davej had been seeing. - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should go in before 4.11. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: uapi: change the type of struct statx_timestamp.tv_nsec to unsigned fix nfs O_DIRECT advancing iov_iter too much p9_client_readdir() fix orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
2017-04-27Merge tag 'kvm-arm-for-v4.12' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.12. Changes include: - Using the common sysreg definitions between KVM and arm64 - Improved hyp-stub implementation with support for kexec and kdump on the 32-bit side - Proper PMU exception handling - Performance improvements of our GIC handling - Support for irqchip in userspace with in-kernel arch-timers and PMU support - A fix for a race condition in our PSCI code Conflicts: Documentation/virtual/kvm/api.txt include/uapi/linux/kvm.h
2017-04-26uapi: change the type of struct statx_timestamp.tv_nsec to unsignedDmitry V. Levin
The comment asserting that the value of struct statx_timestamp.tv_nsec must be negative when statx_timestamp.tv_sec is negative, is wrong, as could be seen from the following example: #define _FILE_OFFSET_BITS 64 #include <assert.h> #include <fcntl.h> #include <stdio.h> #include <sys/stat.h> #include <unistd.h> #include <asm/unistd.h> #include <linux/stat.h> int main(void) { static const struct timespec ts[2] = { { .tv_nsec = UTIME_OMIT }, { .tv_sec = -2, .tv_nsec = 42 } }; assert(utimensat(AT_FDCWD, ".", ts, 0) == 0); struct stat st; assert(stat(".", &st) == 0); printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n", (long long) st.st_mtim.tv_sec, (unsigned long) st.st_mtim.tv_nsec); struct statx stx; assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0); printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n", (long long) stx.stx_mtime.tv_sec, (unsigned long) stx.stx_mtime.tv_nsec); return 0; } It expectedly prints: st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42 stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42 The more generic comment asserting that the value of struct statx_timestamp.tv_nsec might be negative is confusing to say the least. It contradicts both the struct stat.st_[acm]time_nsec tradition and struct timespec.tv_nsec requirements in utimensat syscall. If statx syscall ever returns a stx_[acm]time containing a negative tv_nsec that cannot be passed unmodified to utimensat syscall, it will cause an immense confusion. Fix this source of confusion by changing the type of struct statx_timestamp.tv_nsec from __s32 to __u32. Fixes: a528d35e8bfc ("statx: Add a system call to make enhanced file info available") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-api@vger.kernel.org cc: mtk.manpages@gmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26nl80211: add support for BSSIDs in scheduled scan matchsetsArend Van Spriel
This patch allows for the scheduled scan request to specify matchsets for specific BSSIDs. Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> [docs, netlink policy fix] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-26nl80211: allow multiple active scheduled scan requestsArend Van Spriel
This patch implements the idea to have multiple scheduled scan requests running concurrently. It mainly illustrates how to deal with the incoming request from user-space in terms of backward compatibility. In order to use multiple scheduled scans user-space needs to provide a flag attribute NL80211_ATTR_SCHED_SCAN_MULTI to indicate support. If not the request is treated as a legacy scan. Drivers currently supporting scheduled scan are now indicating they support a single scheduled scan request. This obsoletes WIPHY_FLAG_SUPPORTS_SCHED_SCAN. Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> [clean up netlink destroy path to avoid allocations, code cleanups] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-04-25uapi: fix linux/nfsd/cld.h userspace compilation errorsDmitry V. Levin
Include <linux/types.h> and consistently use types it provides to fix the following linux/nfsd/cld.h userspace compilation errors: /usr/include/linux/nfsd/cld.h:40:2: error: unknown type name 'uint16_t' uint16_t cn_len; /* length of cm_id */ /usr/include/linux/nfsd/cld.h:46:2: error: unknown type name 'uint8_t' uint8_t cm_vers; /* upcall version */ /usr/include/linux/nfsd/cld.h:47:2: error: unknown type name 'uint8_t' uint8_t cm_cmd; /* upcall command */ /usr/include/linux/nfsd/cld.h:48:2: error: unknown type name 'int16_t' int16_t cm_status; /* return code */ /usr/include/linux/nfsd/cld.h:49:2: error: unknown type name 'uint32_t' uint32_t cm_xid; /* transaction id */ /usr/include/linux/nfsd/cld.h:51:3: error: unknown type name 'int64_t' int64_t cm_gracetime; /* grace period start time */ Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25net: Generic XDPDavid S. Miller
This provides a generic SKB based non-optimized XDP path which is used if either the driver lacks a specific XDP implementation, or the user requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE. It is arguable that perhaps I should have required something like this as part of the initial XDP feature merge. I believe this is critical for two reasons: 1) Accessibility. More people can play with XDP with less dependencies. Yes I know we have XDP support in virtio_net, but that just creates another depedency for learning how to use this facility. I wrote this to make life easier for the XDP newbies. 2) As a model for what the expected semantics are. If there is a pure generic core implementation, it serves as a semantic example for driver folks adding XDP support. One thing I have not tried to address here is the issue of XDP_PACKET_HEADROOM, thanks to Daniel for spotting that. It seems incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or whatever even if the XDP program doesn't try to push headers at all. I think we really need the verifier to somehow propagate whether certain XDP helpers are used or not. v5: - Handle both negative and positive offset after running prog - Fix mac length in XDP_TX case (Alexei) - Use rcu_dereference_protected() in free_netdev (kbuild test robot) v4: - Fix MAC header adjustmnet before calling prog (David Ahern) - Disable LRO when generic XDP is installed (Michael Chan) - Bypass qdisc et al. on XDP_TX and record the event (Alexei) - Do not perform generic XDP on reinjected packets (DaveM) v3: - Make sure XDP program sees packet at MAC header, push back MAC header if we do XDP_TX. (Alexei) - Elide GRO when generic XDP is in use. (Alexei) - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic XDP even if the driver has an XDP implementation. (Alexei) - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS attribute. (Daniel) v2: - Add some "fall through" comments in switch statements based upon feedback from Andrew Lunn - Use RCU for generic xdp_prog, thanks to Johannes Berg. Tested-by: Andy Gospodarek <andy@greyhouse.net> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25can: add Virtual CAN Tunnel driver (vxcan)Oliver Hartkopp
Similar to the virtual ethernet driver veth, vxcan implements a local CAN traffic tunnel between two virtual CAN network devices. See Kconfig entry for details. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-04-24cls_flower: add support for matching MPLS fields (v2)Benjamin LaHaise
Add support to the tc flower classifier to match based on fields in MPLS labels (TTL, Bottom of Stack, TC field, Label). Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24net/tcp_fastopen: Add snmp counter for blackhole detectionWei Wang
This counter records the number of times the firewall blackhole issue is detected and active TFO is disabled. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24Merge tag 'mlx5-updates-2017-04-22' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-04-22 Sparse and compiler warnings fixes from Stephen Hemminger. From Roi Dayan and Or Gerlitz, Add devlink and mlx5 support for controlling E-Switch encapsulation mode, this knob will enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24netfilter: ctnetlink: make it safer when updating ct->statusLiping Zhang
After converting to use rcu for conntrack hash, one CPU may update the ct->status via ctnetlink, while another CPU may process the packets and update the ct->status. So the non-atomic operation "ct->status |= status;" via ctnetlink becomes unsafe, and this may clear the IPS_DYING_BIT bit set by another CPU unexpectedly. For example: CPU0 CPU1 ctnetlink_change_status __nf_conntrack_find_get old = ct->status nf_ct_gc_expired - nf_ct_kill - test_and_set_bit(IPS_DYING_BIT new = old | status; - ct->status = new; <-- oops, _DYING_ is cleared! Now using a series of atomic bit operation to solve the above issue. Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly, so make these two bits be unchangable too. If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free, but actually it is alloced by nf_conntrack_alloc. If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer deference, as the nfct_seqadj(ct) maybe NULL. Last, add some comments to describe the logic change due to the commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS processing"), which makes me feel a little confusing. Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-24openvswitch: Add eventmask support to CT action.Jarno Rajahalme
Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK, which can be used in conjunction with the commit flag (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which conntrack events (IPCT_*) should be delivered via the Netfilter netlink multicast groups. Default behavior depends on the system configuration, but typically a lot of events are delivered. This can be very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some types of events are of interest. Netfilter core init_conntrack() adds the event cache extension, so we only need to set the ctmask value. However, if the system is configured without support for events, the setting will be skipped due to extension not being found. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.Mike Maloney
Fanout uses a per net global namespace. A process that intends to create a new fanout group can accidentally join an existing group. It is not possible to detect this. Add socket option PACKET_FANOUT_FLAG_UNIQUEID. When specified the supplied fanout group id must be set to 0, and the kernel chooses an id that is not already in use. This is an ephemeral flag so that other sockets can be added to this group using setsockopt, but NOT specifying this flag. The current getsockopt(..., PACKET_FANOUT, ...) can be used to retrieve the new group id. We assume that there are not a lot of fanout groups and that this is not a high frequency call. The method assigns ids starting at zero and increases until it finds an unused id. It keeps track of the last assigned id, and uses it as a starting point to find new ids. Signed-off-by: Mike Maloney <maloney@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24VSOCK: Add vsockmon deviceGerard Garcia
Add vsockmon virtual network device that receives packets from the vsock transports and exposes them to user space. Based on the nlmon device. Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24VSOCK: Add vsockmon tap functionsGerard Garcia
Add tap functions that can be used by the vsock transports to deliver packets to vsockmon virtual network devices. Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-22net/devlink: Add E-Switch encapsulation controlRoi Dayan
This is an e-switch global knob to enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. The actual encap/decap is carried out (along with the matching and other actions) per offloaded e-switch rules, e.g as done when offloading the TC tunnel key action. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Both conflict were simple overlapping changes. In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the conversion of the driver in net-next to use in-netdev stats. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net: Remove NET_CORE_BUDGET_USECS from sysctl binary interface.David S. Miller
We are not supposed to add new entries to this thing any more. Thanks to Eric Dumazet for noticing this. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-04-20 This adds the basic infrastructure for IPsec hardware offloading, it creates a configuration API and adjusts the packet path. 1) Add the needed netdev features to configure IPsec offloads. 2) Add the IPsec hardware offloading API. 3) Prepare the ESP packet path for hardware offloading. 4) Add gso handlers for esp4 and esp6, this implements the software fallback for GSO packets. 5) Add xfrm replay handler functions for offloading. 6) Change ESP to use a synchronous crypto algorithm on offloading, we don't have the option for asynchronous returns when we handle IPsec at layer2. 7) Add a xfrm validate function to validate_xmit_skb. This implements the software fallback for non GSO packets. 8) Set the inner_network and inner_transport members of the SKB, as well as encapsulation, to reflect the actual positions of these headers, and removes them only once encryption is done on the payload. From Ilan Tayari. 9) Prepare the ESP GRO codepath for hardware offloading. 10) Fix incorrect null pointer check in esp6. From Colin Ian King. 11) Fix for the GSO software fallback path to detect the fallback correctly. From Ilan Tayari. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21net: ipv6: RTF_PCPU should not be settable from userspaceDavid Ahern
Andrey reported a fault in the IPv6 route code: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880069809600 task.stack: ffff880062dc8000 RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975 RSP: 0018:ffff880062dced30 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006 RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018 RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0 Call Trace: ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128 ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212 ... Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit set. Flags passed to the kernel are blindly copied to the allocated rt6_info by ip6_route_info_create making a newly inserted route appear as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set and expects rt->dst.from to be set - which it is not since it is not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then generates the fault. Fix by checking for the flag and failing with EINVAL. Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21bpf: add napi_id read access to __sk_buffDaniel Borkmann
Add napi_id access to __sk_buff for socket filter program types, tc program types and other bpf_convert_ctx_access() users. Having access to skb->napi_id is useful for per RX queue listener siloing, f.e. in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is used, meaning SO_REUSEPORT enabled listeners can then select the corresponding socket at SYN time already [1]. The skb is marked via skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()). Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b35 ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up the NAPI ID associated with the queue for steering, which requires a prior sk_mark_napi_id() after the socket was looked up. Semantics for the __sk_buff napi_id access are similar, meaning if skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu), then an invalid napi_id of 0 is returned to the program, otherwise a valid non-zero napi_id. [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuningMatthew Whitehead
Constants used for tuning are generally a bad idea, especially as hardware changes over time. Replace the constant 2 jiffies with sysctl variable netdev_budget_usecs to enable sysadmins to tune the softirq processing. Also document the variable. For example, a very fast machine might tune this to 1000 microseconds, while my regression testing 486DX-25 needs it to be 4000 microseconds on a nearly idle network to prevent time_squeeze from being incremented. Version 2: changed jiffies to microseconds for predictable units. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21ip6_tunnel: Allow policy-based routing through tunnelsCraig Gallek
This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21kvm: better MWAIT emulation for guestsMichael S. Tsirkin
Guests that are heavy on futexes end up IPI'ing each other a lot. That can lead to significant slowdowns and latency increase for those guests when running within KVM. If only a single guest is needed on a host, we have a lot of spare host CPU time we can throw at the problem. Modern CPUs implement a feature called "MWAIT" which allows guests to wake up sleeping remote CPUs without an IPI - thus without an exit - at the expense of never going out of guest context. The decision whether this is something sensible to use should be up to the VM admin, so to user space. We can however allow MWAIT execution on systems that support it properly hardware wise. This patch adds a CAP to user space and a KVM cpuid leaf to indicate availability of native MWAIT execution. With that enabled, the worst a guest can do is waste as many cycles as a "jmp ." would do, so it's not a privilege problem. We consciously do *not* expose the feature in our CPUID bitmap, as most people will want to benefit from sleeping vCPUs to allow for over commit. Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [agraf: fix amd, change commit message] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-04-21s390/gs: add regset for the guarded storage broadcast control blockMartin Schwidefsky
The guarded storage interface allows to register a control block for each thread that is activated with the guarded storage broadcast event. To retrieve the complete state of a process from the kernel a register set for the stored broadcast control block is required. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-04-20Merge tag 'mac80211-next-for-davem-2017-04-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== My last pull request has been a while, we now have: * connection quality monitoring with multiple thresholds * support for FILS shared key authentication offload * pre-CAC regulatory compliance - only ETSI allows this * sanity check for some rate confusion that hit ChromeOS (but nobody else uses it, evidently) * some documentation updates * lots of cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20uapi: fix linux/raid/md_p.h userspace compilation errorArtur Paszkiewicz
Use __le32 and __le64 instead of u32 and u64. This fixes klibc build error: In file included from /klibc/usr/klibc/../include/sys/md.h:30:0, from /klibc/usr/kinit/do_mounts_md.c:19: /linux-next/usr/include/linux/raid/md_p.h:414:51: error: 'u32' undeclared here (not in a function) (PPL_HEADER_SIZE - PPL_HDR_RESERVED - 4 * sizeof(u32) - sizeof(u64)) Reported-by: Greg Thelen <gthelen@google.com> Reported-by: Nigel Croxon <ncroxon@redhat.com> Tested-by: Greg Thelen <gthelen@google.com> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2017-04-20nubus: Add MVC and VSC video card definitionsFinn Thain
Also move the NUBUS_DRHW_APPLE_JET definition, for numerical order. Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2017-04-20KVM: PPC: VFIO: Add in-kernel acceleration for VFIOAlexey Kardashevskiy
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO without passing them to user space which saves time on switching to user space and back. This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM. KVM tries to handle a TCE request in the real mode, if failed it passes the request to the virtual mode to complete the operation. If it a virtual mode handler fails, the request is passed to the user space; this is not expected to happen though. To avoid dealing with page use counters (which is tricky in real mode), this only accelerates SPAPR TCE IOMMU v2 clients which are required to pre-register the userspace memory. The very first TCE request will be handled in the VFIO SPAPR TCE driver anyway as the userspace view of the TCE table (iommu_table::it_userspace) is not allocated till the very first mapping happens and we cannot call vmalloc in real mode. If we fail to update a hardware IOMMU table unexpected reason, we just clear it and move on as there is nothing really we can do about it - for example, if we hot plug a VFIO device to a guest, existing TCE tables will be mirrored automatically to the hardware and there is no interface to report to the guest about possible failures. This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd and associates a physical IOMMU table with the SPAPR TCE table (which is a guest view of the hardware IOMMU table). The iommu_table object is cached and referenced so we do not have to look up for it in real mode. This does not implement the UNSET counterpart as there is no use for it - once the acceleration is enabled, the existing userspace won't disable it unless a VFIO container is destroyed; this adds necessary cleanup to the KVM_DEV_VFIO_GROUP_DEL handler. This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. This adds real mode version of WARN_ON_ONCE() as the generic version causes problems with rcu_sched. Since we testing what vmalloc_to_phys() returns in the code, this also adds a check for already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect(). This finally makes use of vfio_external_user_iommu_id() which was introduced quite some time ago and was considered for removal. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability numberAlexey Kardashevskiy
This adds a capability number for in-kernel support for VFIO on SPAPR platform. The capability will tell the user space whether in-kernel handlers of H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space must not attempt allocating a TCE table in the host kernel via the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests will not be passed to the user space which is desired action in the situation like that. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-19nefilter: eache: reduce struct size from 32 to 24 byteFlorian Westphal
Only "cache" needs to use ulong (its used with set_bit()), missed can use u16. Also add build-time assertion to ensure event bits fit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-19netfilter: nft_ct: allow to set ctnetlink event types of a connectionFlorian Westphal
By default the kernel emits all ctnetlink events for a connection. This allows to select the types of events to generate. This can be used to e.g. only send DESTROY events but no NEW/UPDATE ones and will work even if sysctl net.netfilter.nf_conntrack_events is set to 0. This was already possible via iptables' CT target, but the nft version has the advantage that it can also be used with already-established conntracks. The added nf_ct_is_template() check isn't a bug fix as we only support mark and labels (and unlike ecache the conntrack core doesn't copy those). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-19powerpc/perf: Define big-endian version of perf_mem_data_srcSukadev Bhattiprolu
perf_mem_data_src is a union that is initialized in the kernel via the ->val field and accessed by userspace via the mem_xxx bitfields. For this to work correctly on big endian platforms, we need a big-endian definition for the bitfields. Currently on a big endian system, if a user requests PERF_SAMPLE_DATA_SRC (perf report -d), they will get the default value from perf_sample_data_init(), which is PERF_MEM_NA. The value for PERF_MEM_NA is constructed using shifts: /* TLB access */ #define PERF_MEM_TLB_NA 0x01 /* not available */ ... #define PERF_MEM_TLB_SHIFT 26 #define PERF_MEM_S(a, s) \ (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT) #define PERF_MEM_NA (PERF_MEM_S(OP, NA) |\ PERF_MEM_S(LVL, NA) |\ PERF_MEM_S(SNOOP, NA) |\ PERF_MEM_S(LOCK, NA) |\ PERF_MEM_S(TLB, NA)) Which works out as: ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26)) Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021 in CPU endian. But then in the perf tool, the code uses the bitfields to inspect the value, and currently the bitfields are defined using little endian ordering. So eg. in perf_mem__tlb_scnprintf() we see: data_src->val = 0x5080021 op = 0x0 lvl = 0x0 snoop = 0x0 lock = 0x0 dtlb = 0x0 rsvd = 0x5080021 Because of the way the perf tool code is written this is still displayed to the user as "N/A", so there is no bug visible at the UI level. Currently there are no big endian architectures which export a meaningful value (ie. other than PERF_MEM_NA), so the extent of the bug on big endian platforms is that the PERF_MEM_NA value is exported incorrectly as described above. Subsequent patches will add support on big endian powerpc for populating the data source value. This patch does a minimal fix of adding big endian definition of the bitfields to match the values that are already exported by the kernel on big endian. And it makes no change on little endian. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-19Merge tag 'v4.11-rc7' into drm-nextDave Airlie
Backmerge Linux 4.11-rc7 from Linus tree, to fix some conflicts that were causing problems with the rerere cache in drm-tip.
2017-04-18PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constantMatthias Kaehlcke
A 64-bit value is not needed since a PCI ROM address consists in 32 bits. This fixes a clang warning about "implicit conversion from 'unsigned long' to 'u32'". Also remove now unnecessary casts to u32 from __pci_read_base() and pci_std_update_resource(). Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-18usb: fix some references for /proc/bus/usbMauro Carvalho Chehab
Since when we got rid of usbfs, the /proc/bus/usb is now elsewhere. Fix references for it. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-18Btrfs: consistent usage of types in balance_argsHans van Kranenburg
The btrfs_balance_args are only used for the balance ioctl, so use __u instead of __le here for consistency. The __le usage was introduced in bc3094673f22d and dee32d0ac3719 and was probably a result of copy/pasting when the code was written. The usage of __le did not break anything, but it's unnecessary. Also, this change makes the code less confusing for the careful reader. Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-04-17nbd: add a flag to destroy an nbd device on disconnectJosef Bacik
For ease of management it would be nice for users to specify that the device node for a nbd device is destroyed once it is disconnected and there are no more users. Add a client flag and enable this operation to happen. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a status netlink commandJosef Bacik
Allow users to query the status of existing nbd devices. Right now this only returns whether or not the device is connected, but could be extended in the future to include more information. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: handle dead connectionsJosef Bacik
Sometimes we like to upgrade our server without making all of our clients freak out and reconnect. This patch provides a way to specify a dead connection timeout to allow us to pause all requests and wait for new connections to be opened. With this in place I can take down the nbd server for less than the dead connection timeout time and bring it back up and everything resumes gracefully. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: multicast dead link notificationsJosef Bacik
Provide a mechanism to notify userspace that there's been a link problem on a NBD device. This will allow userspace to re-establish a connection and provide the new socket to the device without disrupting the device. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a reconfigure netlink commandJosef Bacik
We want to be able to reconnect dead connections to existing block devices, so add a reconfigure netlink command. We will also allow users to change their timeout on the fly, but everything else will require a disconnect and reconnect. You won't be able to add more connections either, simply replace dead connections with new more lively connections. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-17nbd: add a basic netlink interfaceJosef Bacik
The existing ioctl interface for configuring NBD devices is a bit cumbersome and hard to extend. The other problem is we leave a userspace app sitting in it's syscall until the device disconnects, which is less than ideal. This patch introduces a netlink interface for adding and disconnecting nbd devices. This has the benefits of being easily extendable without breaking older userspace applications, and allows us to configure a nbd device without leaving a userspace app sitting waiting for the device to disconnect. With this interface we also gain the ability to configure more devices than are preallocated at insmod time. We also have gained the ability to not specify a particular device and be provided one for us so that userspace doesn't need to find a free device to configure. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>