summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-05-01net: ll_temac: Allow configuration of IRQ coalescingEsben Haabendal
This allows custom setup of IRQ coalescing for platforms using legacy platform_device. The irq timeout and count parameters can be used for tuning cpu load vs. latency. I have maintained the 0x00000400 bit in TX_CHNL_CTRL. It is specified as unused in the documentation I have available. It does not make any difference in the hardware I have available, so it is left in to not risk breaking other platforms where it might be used. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01net: ll_temac: Support indirect_mutex share within TEMAC IPEsben Haabendal
Indirect register access goes through a DCR bus bridge, which allows only one outstanding transaction. And to make matters worse, each TEMAC IP block contains two Ethernet interfaces, and although they seem to have separate registers for indirect access, they actually share the registers. Or to be more specific, MSW, LSW and CTL registers are physically shared between Ethernet interfaces in same TEMAC IP, with RDY register being (almost) specificic to the Ethernet interface. The 0x10000 bit in RDY reflects combined bus ready state though. So we need to take care to synchronize not only within a single device, but also between devices in same TEMAC IP. This commit allows to do that with legacy platform devices. For OF devices, the xlnx,compound parent of the temac node should be used to find siblings, and setup a shared indirect_mutex between them. I will leave this work to somebody else, as I don't have hardware to test that. No regression is introduced by that, as before this commit using two Ethernet interfaces in same TEMAC block is simply broken. Signed-off-by: Esben Haabendal <esben@geanix.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01net: ll_temac: Add support for non-native register endiannessEsben Haabendal
Replace the powerpc specific MMIO register access functions with the generic big-endian mmio access functions, and add support for little-endian access depending on configuration. Big-endian access is maintained as the default, but little-endian can be configured in device-tree binding or in platform data. The temac_ior()/temac_iow() functions are replaced with macro wrappers to avoid modifying existing code more than necessary. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01net: ll_temac: Extend support to non-device-tree platformsEsben Haabendal
Support initialization with platdata, so the driver can be used on non-device-tree platforms. For currently supported device-tree platforms, the driver should behave as before. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-04-28 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Introduce BPF socket local storage map so that BPF programs can store private data they associate with a socket (instead of e.g. separate hash table), from Martin. 2) Add support for bpftool to dump BTF types. This is done through a new `bpftool btf dump` sub-command, from Andrii. 3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which was currently not supported since skb was used to lookup netns, from Stanislav. 4) Add an opt-in interface for tracepoints to expose a writable context for attached BPF programs, used here for NBD sockets, from Matt. 5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel. 6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to support tunnels such as sit. Add selftests as well, from Willem. 7) Various smaller misc fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27ipset: drop ipset_nest_start() and ipset_nest_end()Michal Kubecek
After the previous commit, both ipset_nest_start() and ipset_nest_end() are just aliases for nla_nest_start() and nla_nest_end() so that there is no need to keep them. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27net/tls: move definition of tls ops into net/tls.hJakub Kicinski
There seems to be no reason for tls_ops to be defined in netdevice.h which is included in a lot of places. Don't wrap the struct/enum declaration in ifdefs, it trickles down unnecessary ifdefs into driver code. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27bpf: Introduce bpf sk local storageMartin KaFai Lau
After allowing a bpf prog to - directly read the skb->sk ptr - get the fullsock bpf_sock by "bpf_sk_fullsock()" - get the bpf_tcp_sock by "bpf_tcp_sock()" - get the listener sock by "bpf_get_listener_sock()" - avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock" into different bpf running context. this patch is another effort to make bpf's network programming more intuitive to do (together with memory and performance benefit). When bpf prog needs to store data for a sk, the current practice is to define a map with the usual 4-tuples (src/dst ip/port) as the key. If multiple bpf progs require to store different sk data, multiple maps have to be defined. Hence, wasting memory to store the duplicated keys (i.e. 4 tuples here) in each of the bpf map. [ The smallest key could be the sk pointer itself which requires some enhancement in the verifier and it is a separate topic. ] Also, the bpf prog needs to clean up the elem when sk is freed. Otherwise, the bpf map will become full and un-usable quickly. The sk-free tracking currently could be done during sk state transition (e.g. BPF_SOCK_OPS_STATE_CB). The size of the map needs to be predefined which then usually ended-up with an over-provisioned map in production. Even the map was re-sizable, while the sk naturally come and go away already, this potential re-size operation is arguably redundant if the data can be directly connected to the sk itself instead of proxy-ing through a bpf map. This patch introduces sk->sk_bpf_storage to provide local storage space at sk for bpf prog to use. The space will be allocated when the first bpf prog has created data for this particular sk. The design optimizes the bpf prog's lookup (and then optionally followed by an inline update). bpf_spin_lock should be used if the inline update needs to be protected. BPF_MAP_TYPE_SK_STORAGE: ----------------------- To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can be created to fit different bpf progs' needs. The map enforces BTF to allow printing the sk-local-storage during a system-wise sk dump (e.g. "ss -ta") in the future. The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete a "sk-local-storage" data from a particular sk. Think of the map as a meta-data (or "type") of a "sk-local-storage". This particular "type" of "sk-local-storage" data can then be stored in any sk. The main purposes of this map are mostly: 1. Define the size of a "sk-local-storage" type. 2. Provide a similar syscall userspace API as the map (e.g. lookup/update, map-id, map-btf...etc.) 3. Keep track of all sk's storages of this "type" and clean them up when the map is freed. sk->sk_bpf_storage: ------------------ The main lookup/update/delete is done on sk->sk_bpf_storage (which is a "struct bpf_sk_storage"). When doing a lookup, the "map" pointer is now used as the "key" to search on the sk_storage->list. The "map" pointer is actually serving as the "type" of the "sk-local-storage" that is being requested. To allow very fast lookup, it should be as fast as looking up an array at a stable-offset. At the same time, it is not ideal to set a hard limit on the number of sk-local-storage "type" that the system can have. Hence, this patch takes a cache approach. The last search result from sk_storage->list is cached in sk_storage->cache[] which is a stable sized array. Each "sk-local-storage" type has a stable offset to the cache[] array. In the future, a map's flag could be introduced to do cache opt-out/enforcement if it became necessary. The cache size is 16 (i.e. 16 types of "sk-local-storage"). Programs can share map. On the program side, having a few bpf_progs running in the networking hotpath is already a lot. The bpf_prog should have already consolidated the existing sock-key-ed map usage to minimize the map lookup penalty. 16 has enough runway to grow. All sk-local-storage data will be removed from sk->sk_bpf_storage during sk destruction. bpf_sk_storage_get() and bpf_sk_storage_delete(): ------------------------------------------------ Instead of using bpf_map_(lookup|update|delete)_elem(), the bpf prog needs to use the new helper bpf_sk_storage_get() and bpf_sk_storage_delete(). The verifier can then enforce the ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to "create" new elem if one does not exist in the sk. It is done by the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE. The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together, it has eliminated the potential use cases for an equivalent bpf_map_update_elem() API (for bpf_prog) in this patch. Misc notes: ---------- 1. map_get_next_key is not supported. From the userspace syscall perspective, the map has the socket fd as the key while the map can be shared by pinned-file or map-id. Since btf is enforced, the existing "ss" could be enhanced to pretty print the local-storage. Supporting a kernel defined btf with 4 tuples as the return key could be explored later also. 2. The sk->sk_lock cannot be acquired. Atomic operations is used instead. e.g. cmpxchg is done on the sk->sk_bpf_storage ptr. Please refer to the source code comments for the details in synchronization cases and considerations. 3. The mem is charged to the sk->sk_omem_alloc as the sk filter does. Benchmark: --------- Here is the benchmark data collected by turning on the "kernel.bpf_stats_enabled" sysctl. Two bpf progs are tested: One bpf prog with the usual bpf hashmap (max_entries = 8192) with the sk ptr as the key. (verifier is modified to support sk ptr as the key That should have shortened the key lookup time.) Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE. Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for each egress skb and then bump the cnt. netperf is used to drive data with 4096 connected UDP sockets. BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run) 27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633 loaded_at 2019-04-15T13:46:39-0700 uid 0 xlated 344B jited 258B memlock 4096B map_ids 16 btf_id 5 BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run) 30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739 loaded_at 2019-04-15T13:47:54-0700 uid 0 xlated 168B jited 156B memlock 4096B map_ids 17 btf_id 6 Here is a high-level picture on how are the objects organized: sk ┌──────┐ │ │ │ │ │ │ │*sk_bpf_storage─────▶ bpf_sk_storage └──────┘ ┌───────┐ ┌───────────┤ list │ │ │ │ │ │ │ │ │ │ │ └───────┘ │ │ elem │ ┌────────┐ ├─▶│ snode │ │ ├────────┤ │ │ data │ bpf_map │ ├────────┤ ┌─────────┐ │ │map_node│◀─┬─────┤ list │ │ └────────┘ │ │ │ │ │ │ │ │ elem │ │ │ │ ┌────────┐ │ └─────────┘ └─▶│ snode │ │ ├────────┤ │ bpf_map │ data │ │ ┌─────────┐ ├────────┤ │ │ list ├───────▶│map_node│ │ │ │ └────────┘ │ │ │ │ │ │ elem │ └─────────┘ ┌────────┐ │ ┌─▶│ snode │ │ │ ├────────┤ │ │ │ data │ │ │ ├────────┤ │ │ │map_node│◀─┘ │ └────────┘ │ │ │ ┌───────┐ sk └──────────│ list │ ┌──────┐ │ │ │ │ │ │ │ │ │ │ │ │ └───────┘ │*sk_bpf_storage───────▶bpf_sk_storage └──────┘ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26bpf: add writable context for raw tracepointsMatt Mullins
This is an opt-in interface that allows a tracepoint to provide a safe buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program. The size of the buffer must be a compile-time constant, and is checked before allowing a BPF program to attach to a tracepoint that uses this feature. The pointer to this buffer will be the first argument of tracepoints that opt in; the pointer is valid and can be bpf_probe_read() by both BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that attach to such a tracepoint, but the buffer to which it points may only be written by the latter. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-26Merge tag 'mac80211-next-for-davem-2019-04-26' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Various updates, notably: * extended key ID support (from 802.11-2016) * per-STA TX power control support * mac80211 TX performance improvements * HE (802.11ax) updates * mesh link probing support * enhancements of multi-BSSID support (also related to HE) * OWE userspace processing support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26ieee80211: update HE IEs to D4.0 specLiad Kaufman
Update the out-dated comments as well, and have them point to the correct sections in the D4.0 spec. Signed-off-by: Liad Kaufman <liad.kaufman@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26cfg80211: support non-inheritance elementSara Sharon
Subelement profile may specify element IDs it doesn't inherit from the management frame. Support it. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-04-26Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2019-04-25 Here's the main bluetooth-next pull request for the 5.2 kernel. - Added support for Mediatek SDIO controllers - Added support for Broadcom BCM2076B1 UART controller - Added support for Marvel SD8987 chipset - Fix buffer overflow bug in hidp protocol - Various other smaller fixes & improvements Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25bpf: support BPF_PROG_QUERY for BPF_FLOW_DISSECTOR attach_typeStanislav Fomichev
target_fd is target namespace. If there is a flow dissector BPF program attached to that namespace, its (single) id is returned. v5: * drop net ref right after rcu unlock (Daniel Borkmann) v4: * add missing put_net (Jann Horn) v3: * add missing inline to skb_flow_dissector_prog_query static def (kbuild test robot <lkp@intel.com>) v2: * don't sleep in rcu critical section (Jakub Kicinski) * check input prog_cnt (exit early) Cc: Jann Horn <jannh@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Just the usual assortment of small'ish fixes: 1) Conntrack timeout is sometimes not initialized properly, from Alexander Potapenko. 2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid undefined behavior. From ZhangXiaoxu. 3) des1 field of descriptor in stmmac driver is initialized with the wrong variable. From Yue Haibing. 4) Increase mlxsw pci sw reset timeout a little bit more, from Ido Schimmel. 5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng. 6) Fallback refcount fix in TLS code, from Jakub Kicinski. 7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy. 8) Fix recursive locking in team driver, from Hangbin Liu. 9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski. 10) Don't use napi_alloc_frag() outside of softiq context of socionext driver, from Ilias Apalodimas. 11) MAC address increment overflow in ncsi, from Tao Ren. 12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun. 13) ipv4_link_failure has to validate the headers that are actually there because RAW sockets can pass in arbitrary garbage, from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) ipv4: add sanity checks in ipv4_link_failure() net/rose: fix unbound loop in rose_loopback_timer() rxrpc: fix race condition in rxrpc_input_packet() net: rds: exchange of 8K and 1M pool net: vrf: Fix operation not supported when set vrf mac net/ncsi: handle overflow when incrementing mac address net: socionext: replace napi_alloc_frag with the netdev variant on init net: atheros: fix spelling mistake "underun" -> "underrun" spi: ST ST95HF NFC: declare missing of table spi: Micrel eth switch: declare missing of table net: stmmac: move stmmac_check_ether_addr() to driver probe netfilter: fix nf_l4proto_log_invalid to log invalid packets netfilter: never get/set skb->tstamp netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK dt-bindings: add an explanation for internal phy-mode net/tls: don't leak IV and record seq when offload fails net/tls: avoid potential deadlock in tls_set_device_offload_rx() selftests/net: correct the return value for run_afpackettests team: fix possible recursive locking when add slaves ...
2019-04-23net/ncsi: handle overflow when incrementing mac addressTao Ren
Previously BMC's MAC address is calculated by simply adding 1 to the last byte of network controller's MAC address, and it produces incorrect result when network controller's MAC address ends with 0xFF. The problem can be fixed by calling eth_addr_inc() function to increment MAC address; besides, the MAC address is also validated before assigning to BMC. Fixes: cb10c7c0dfd9 ("net/ncsi: Add NCSI Broadcom OEM command") Signed-off-by: Tao Ren <taoren@fb.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23Merge tag 'mlx5-updates-2019-04-22' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-04-22 This series includes updates to mlx5e driver RX data path and some significant XDP RX/TX improvements to overcome/mitigate HW and PCIE bottlenecks. From Tariq: 1) Some Enhancements in rq->flags 2) Stabilize RX packet rate (on Striding RQ) with multiple outstanding UMR posts In this patch, we add support for multiple outstanding UMR posts, to allow faster gap closure between consuming MPWQEs and reposting them back into the WQ. Performance test: As expected, huge improvement in large-scale (48 cores). xdp_redirect_map, 64B UDP multi-stream. Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. Before: Unstable, 7 to 30 Mpps After: Stable, at 70.5 Mpps From Shay: 3) XDP, Inline small packets into the TX MPWQE in XDP xmit flow Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: | | before | after | | | 24 rings | 51Mpps | 116Mpps | +126% | | 1 ring | 12Mpps | 12Mpps | same | XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. | | before | after | | | 32 rings | 64Mpps | 92Mpps | +43% | | 1 ring | 6.4Mpps | 6.4Mpps | same | As we can see, feature significantly improves scaling, without hurting single ring performance. From Maxim: 4) Some trivial refactoring and code improvements prior to a larger series to support AF_XDP. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flowShay Agroskin
Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. This is added to better utilize the HW resources (which now makes one less packet data prefetch) and allow better scalability, on the account of CPU usage (which now 'memcpy's the packet into the WQE). To load balance between HW and CPU and get max packet rate, we use watermarks to detect how much the HW is congested and move the work loads back and forth between HW and CPU. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: | | before | after | | | 24 rings | 51Mpps | 116Mpps | +126% | | 1 ring | 12Mpps | 12Mpps | same | XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. | | before | after | | | 32 rings | 64Mpps | 92Mpps | +43% | | 1 ring | 6.4Mpps | 6.4Mpps | same | As we can see, feature significantly improves scaling, without hurting single ring performance. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-23Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
2019-04-23net: phy: marvell: add new default led configure for m88e151xJian Shen
The default m88e151x LED configuration is 0x1177, used LED[0] for 1000M link, LED[1] for 100M link, and LED[2] for active. But for some boards, which use LED[0] for link, and LED[1] for active, prefer to be 0x1040. To be compatible with this case, this patch defines a new dev_flag, and set it before connect phy in HNS3 driver. When phy initializing, using the new LED configuration if this dev_flag is set. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23net: pass net_device argument to the eth_get_headlenStanislav Fomichev
Update all users of eth_get_headlen to pass network device, fetch network namespace from it and pass it down to the flow dissector. This commit is a noop until administrator inserts BPF flow dissector program. Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Cc: Salil Mehta <salil.mehta@huawei.com> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-23flow_dissector: handle no-skb use caseStanislav Fomichev
When called without skb, gather all required data from the __skb_flow_dissect's arguments and use recently introduces no-skb mode of bpf flow dissector. Note: WARN_ON_ONCE(!net) will now trigger for eth_get_headlen users. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-23net: plumb network namespace into __skb_flow_dissectStanislav Fomichev
This new argument will be used in the next patches for the eth_get_headlen use case. eth_get_headlen calls flow dissector with only data (without skb) so there is currently no way to pull attached BPF flow dissector program. With this new argument, we can amend the callers to explicitly pass network namespace so we can use attached BPF program. Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-23flow_dissector: switch kernel context to struct bpf_flow_dissectorStanislav Fomichev
struct bpf_flow_dissector has a small subset of sk_buff fields that flow dissector BPF program is allowed to access and an optional pointer to real skb. Real skb is used only in bpf_skb_load_bytes helper to read non-linear data. The real motivation for this is to be able to call flow dissector from eth_get_headlen context where we don't have an skb and need to dissect raw bytes. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-23mmc: sdio: Add helper macro for sdio_driver boilerplateSean Wang
This patch introduces the module_sdio_driver macro which is a convenience macro for SDIO driver modules similar to module_usb_driver. It is intended to be used by drivers which init/exit section does nothing but register/ unregister the SDIO driver. By using this macro it is possible to eliminate a few lines of boilerplate code per SDIO driver. Suggested-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-04-23mmc: add SDIO identifiers for MediaTek Bluetooth devicesSean Wang
The SDIO identifier for MediaTek Bluetooth devices were defined in the MediaTek Bluetooth driver. Moving the definitions in MMC header file seems common sense. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-04-22 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) allow stack/queue helpers from more bpf program types, from Alban. 2) allow parallel verification of root bpf programs, from Alexei. 3) introduce bpf sysctl hook for trusted root cases, from Andrey. 4) recognize var/datasec in btf deduplication, from Andrii. 5) cpumap performance optimizations, from Jesper. 6) verifier prep for alu32 optimization, from Jiong. 7) libbpf xsk cleanup, from Magnus. 8) other various fixes and cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-23bpf: remove global variablesAlexei Starovoitov
Move three global variables protected by bpf_verifier_lock into 'struct bpf_verifier_env' to allow parallel verification. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-22Merge tag 'v5.1-rc1' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next Linux 5.1-rc1 We forgot to reset the branch last merge window thus mlx5-next is outdated and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next branch with 5.1-rc1. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-20Merge tag 'for-linus-20190420' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A set of small fixes that should go into this series. This contains: - Removal of unused queue member (Hou) - Overflow bvec fix (Ming) - Various little io_uring tweaks (me) - kthread parking - Only call cpu_possible() for verified CPU - Drop unused 'file' argument to io_file_put() - io_uring_enter vs io_uring_register deadlock fix - CQ overflow fix - BFQ internal depth update fix (me)" * tag 'for-linus-20190420' of git://git.kernel.dk/linux-block: block: make sure that bvec length can't be overflow block: kill all_q_node in request_queue io_uring: fix CQ overflow condition io_uring: fix possible deadlock between io_uring_{enter,register} io_uring: drop io_file_put() 'file' argument bfq: update internal depth state when queue depth changes io_uring: only test SQPOLL cpu after we've verified it io_uring: park SQPOLL thread if it's percpu
2019-04-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: - various tooling fixes - kretprobe fixes - kprobes annotation fixes - kprobes error checking fix - fix the default events for AMD Family 17h CPUs - PEBS fix - AUX record fix - address filtering fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Avoid kretprobe recursion bug kprobes: Mark ftrace mcount handler functions nokprobe x86/kprobes: Verify stack frame on kretprobe perf/x86/amd: Add event map for AMD Family 17h perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf() perf tools: Fix map reference counting perf evlist: Fix side band thread draining perf tools: Check maps for bpf programs perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info() tools include uapi: Sync sound/asound.h copy perf top: Always sample time to satisfy needs of use of ordered queuing perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user) tools lib traceevent: Fix missing equality check for strcmp perf stat: Disable DIR_FORMAT feature for 'perf stat record' perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view perf header: Fix lock/unlock imbalances when processing BPF/BTF info perf/x86: Fix incorrect PEBS_REGS perf/ring_buffer: Fix AUX record suppression perf/core: Fix the address filtering fix kprobes: Fix error check when reusing optimized probes
2019-04-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes all over the place: a console spam fix, section attributes fixes, a KASLR fix, a TLB stack-variable alignment fix, a reboot quirk, boot options related warnings fix, an LTO fix, a deadlock fix and an RDT fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority x86/cpu/bugs: Use __initconst for 'const' init data x86/mm/KASLR: Fix the size of the direct mapping section x86/Kconfig: Fix spelling mistake "effectivness" -> "effectiveness" x86/mm/tlb: Revert "x86/mm: Align TLB invalidation info" x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51T x86/mm: Prevent bogus warnings with "noexec=off" x86/build/lto: Fix truncated .bss with -fdata-sections x86/speculation: Prevent deadlock on ssb_state::lock x86/resctrl: Do not repeat rdtgroup mode initialization
2019-04-19net: rework SIOCGSTAMP ioctl handlingArnd Bergmann
The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many socket protocol handlers, and all of those end up calling the same sock_get_timestamp()/sock_get_timestampns() helper functions, which results in a lot of duplicate code. With the introduction of 64-bit time_t on 32-bit architectures, this gets worse, as we then need four different ioctl commands in each socket protocol implementation. To simplify that, let's add a new .gettstamp() operation in struct proto_ops, and move ioctl implementation into the common sock_ioctl()/compat_sock_ioctl_trans() functions that these all go through. We can reuse the sock_get_timestamp() implementation, but generalize it so it can deal with both native and compat mode, as well as timeval and timespec structures. Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19block: make sure that bvec length can't be overflowMing Lei
bvec->bv_offset may be bigger than PAGE_SIZE sometimes, such as, when one bio is splitted in the middle of one bvec via bio_split(), and bi_iter.bi_bvec_done is used to build offset of the 1st bvec of remained bio. And the remained bio's bvec may be re-submitted to fs layer via ITER_IBVEC, such as loop and nvme-loop. So we have to make sure that every bvec's offset is less than PAGE_SIZE from bio_for_each_segment_all() because some drivers(loop, nvme-loop) passes the splitted bvec to fs layer via ITER_BVEC. This patch fixes this issue reported by Zhang Yi When running nvme/011. Cc: Christoph Hellwig <hch@lst.de> Cc: Yi Zhang <yi.zhang@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 6dc4f100c175 ("block: allow bio_for_each_segment_all() to iterate over multi-page bvec") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-19block: kill all_q_node in request_queueHou Tao
all_q_node has not been used since commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU"), so remove it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-19coredump: fix race condition between mmget_not_zero()/get_task_mm() and core ↵Andrea Arcangeli
dumping The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19mm: swapoff: shmem_unuse() stop eviction without igrab()Hugh Dickins
The igrab() in shmem_unuse() looks good, but we forgot that it gives no protection against concurrent unmounting: a point made by Konstantin Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893ae78 ("tmpfs: fix race between umount and swapoff"). The current 5.1-rc swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day..." followed by GPF. Once again, give up on using igrab(); but don't go back to making such heavy-handed use of shmem_swaplist_mutex as last time: that would spoil the new design, and I expect could deadlock inside shmem_swapin_page(). Instead, shmem_unuse() just raise a "stop_eviction" count in the shmem- specific inode, and shmem_evict_inode() wait for that to go down to 0. Call it "stop_eviction" rather than "swapoff_busy" because it can be put to use for others later (huge tmpfs patches expect to use it). That simplifies shmem_unuse(), protecting it from both unlink and unmount; and in practice lets it locate all the swap in its first try. But do not rely on that: there's still a theoretical case, when shmem_writepage() might have been preempted after its get_swap_page(), before making the swap entry visible to swapoff. [hughd@google.com: remove incorrect list_del()] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904091133570.1898@eggly.anvils Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1904081259400.1523@eggly.anvils Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca> Cc: Huang Ying <ying.huang@intel.com> Cc: Kelley Nielsen <kelleynnn@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Rik van Riel <riel@surriel.com> Cc: Vineeth Pillai <vpillai@digitalocean.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19x86/kprobes: Verify stack frame on kretprobeMasami Hiramatsu
Verify the stack frame pointer on kretprobe trampoline handler, If the stack frame pointer does not match, it skips the wrong entry and tries to find correct one. This can happen if user puts the kretprobe on the function which can be used in the path of ftrace user-function call. Such functions should not be probed, so this adds a warning message that reports which function should be blacklisted. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/155094059185.6137.15527904013362842072.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-18net: mdio: rename mdio_device reset to reset_gpioDavid Bauer
This renames the GPIO reset of mdio devices from 'reset' to 'reset_gpio' to better differentiate between GPIO and reset-controller driven reset line. Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18net: phy: add support for reset-controllerDavid Bauer
This commit adds support for PHY reset pins handled by a reset controller. Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18net: skb: remove unused assertsJakub Kicinski
We are discouraging the use of BUG() these days, remove the unused ASSERT macros from skbuff.h. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17net: core: introduce build_skb_aroundJesper Dangaard Brouer
The function build_skb() also have the responsibility to allocate and clear the SKB structure. Introduce a new function build_skb_around(), that moves the responsibility of allocation and clearing to the caller. This allows caller to use kmem_cache (slab/slub) bulk allocation API. Next patch use this function combined with kmem_cache_alloc_bulk. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Handle init flow failures properly in iwlwifi driver, from Shahar S Matityahu. 2) mac80211 TXQs need to be unscheduled on powersave start, from Felix Fietkau. 3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau. 4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed. 5) Avoid checksum complete with XDP in mlx5, also from Saeed. 6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon. 7) Partial sent TLS record leak fix from Jakub Kicinski. 8) Reject zero size iova range in vhost, from Jason Wang. 9) Allow pending work to complete before clcsock release from Karsten Graul. 10) Fix XDP handling max MTU in thunderx, from Matteo Croce. 11) A lot of protocols look at the sa_family field of a sockaddr before validating it's length is large enough, from Tetsuo Handa. 12) Don't write to free'd pointer in qede ptp error path, from Colin Ian King. 13) Have to recompile IP options in ipv4_link_failure because it can be invoked from ARP, from Stephen Suryaputra. 14) Doorbell handling fixes in qed from Denis Bolotin. 15) Revert net-sysfs kobject register leak fix, it causes new problems. From Wang Hai. 16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva. 17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay Aleksandrov. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits) socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW tcp: tcp_grow_window() needs to respect tcp_space() ocelot: Clean up stats update deferred work ocelot: Don't sleep in atomic context (irqs_disabled()) net: bridge: fix netlink export of vlan_stats_per_port option qed: fix spelling mistake "faspath" -> "fastpath" tipc: set sysctl_tipc_rmem and named_timeout right range tipc: fix link established but not in session net: Fix missing meta data in skb with vlan packet net: atm: Fix potential Spectre v1 vulnerabilities net/core: work around section mismatch warning for ptp_classifier net: bridge: fix per-port af_packet sockets bnx2x: fix spelling mistake "dicline" -> "decline" route: Avoid crash from dereferencing NULL rt->from MAINTAINERS: normalize Woojung Huh's email address bonding: fix event handling for stacked bonds Revert "net-sysfs: Fix memory leak in netdev_register_kobject" rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check qed: Fix the DORQ's attentions handling qed: Fix missing DORQ attentions ...
2019-04-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "5.1 keeps its reputation as a big bugfix release for KVM x86. - Fix for a memory leak introduced during the merge window - Fixes for nested VMX with ept=0 - Fixes for AMD (APIC virtualization, NMI injection) - Fixes for Hyper-V under KVM and KVM under Hyper-V - Fixes for 32-bit SMM and tests for SMM virtualization - More array_index_nospec peppering" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing KVM: fix spectrev1 gadgets KVM: x86: fix warning Using plain integer as NULL pointer selftests: kvm: add a selftest for SMM selftests: kvm: fix for compilers that do not support -no-pie selftests: kvm/evmcs_test: complete I/O before migrating guest state KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU KVM: x86: clear SMM flags before loading state while leaving SMM KVM: x86: Open code kvm_set_hflags KVM: x86: Load SMRAM in a single shot when leaving SMM KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU KVM: x86: Raise #GP when guest vCPU do not support PMU x86/kvm: move kvm_load/put_guest_xcr0 into atomic context KVM: x86: svm: make sure NMI is injected after nmi_singlestep svm/avic: Fix invalidate logical APIC id entry Revert "svm: Fix AVIC incomplete IPI emulation" kvm: mmu: Fix overflow on kvm mmu page limit calculation KVM: nVMX: always use early vmcs check when EPT is disabled KVM: nVMX: allow tests to use bad virtual-APIC page address ...
2019-04-16KVM: fix spectrev1 gadgetsPaolo Bonzini
These were found with smatch, and then generalized when applicable. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-16x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51TJian-Hong Pan
Upon reboot, the Acer TravelMate X514-51T laptop appears to complete the shutdown process, but then it hangs in BIOS POST with a black screen. The problem is intermittent - at some points it has appeared related to Secure Boot settings or different kernel builds, but ultimately we have not been able to identify the exact conditions that trigger the issue to come and go. Besides, the EFI mode cannot be disabled in the BIOS of this model. However, after extensive testing, we observe that using the EFI reboot method reliably avoids the issue in all cases. So add a boot time quirk to use EFI reboot on such systems. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203119 Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: linux@endlessm.com Link: http://lkml.kernel.org/r/20190412080152.3718-1-jian-hong@endlessm.com [ Fix !CONFIG_EFI build failure, clarify the code and the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for net-next: 1) Remove the broute pseudo hook, implement this from the bridge prerouting hook instead. Now broute becomes real table in ebtables, from Florian Westphal. This also includes a size reduction patch for the bridge control buffer area via squashing boolean into bitfields and a selftest. 2) Add OS passive fingerprint version matching, from Fernando Fernandez. 3) Support for gue encapsulation for IPVS, from Jacky Hu. 4) Add support for NAT to the inet family, from Florian Westphal. This includes support for masquerade, redirect and nat extensions. 5) Skip interface lookup in flowtable, use device in the dst object. 6) Add jiffies64_to_msecs() and use it, from Li RongQing. 7) Remove unused parameter in nf_tables_set_desc_parse(), from Colin Ian King. 8) Statify several functions, patches from YueHaibing and Florian Westphal. 9) Add an optimized version of nf_inet_addr_cmp(), from Li RongQing. 10) Merge route extension to core, also from Florian. 11) Use IS_ENABLED(CONFIG_NF_NAT) instead of NF_NAT_NEEDED, from Florian. 12) Merge ip/ip6 masquerade extensions, from Florian. This includes netdevice notifier unification. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>