summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2018-08-09Merge branch 'regmap-4.19' into regmap-nextMark Brown
2018-08-09Merge tag 'regmap-noinc-read' into regmap-4.19Mark Brown
regmap: Support non-incrementing registers Some devices have individual registers that don't autoincrement the register address during bulk reads but instead repeatedly read the same value, for example for monitoring GPIOs or ADCs. Add support for these.
2018-08-09regmap: Add regmap_noinc_read APICrestez Dan Leonard
The regmap API usually assumes that bulk read operations will read a range of registers but some I2C/SPI devices have certain registers for which a such a read operation will return data from an internal FIFO instead. Add an explicit API to support bulk read without range semantics. Some linux drivers use regmap_bulk_read or regmap_raw_read for such registers, for example mpu6050 or bmi150 from IIO. This only happens to work because when caching is disabled a single regmap read op will map to a single bus read op (as desired). This breaks if caching is enabled and reg+1 happens to be a cacheable register. Without regmap support refactoring a driver to enable regmap caching requires separate I2C and SPI paths. This is exactly what regmap is supposed to help avoid. Suggested-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Crestez Dan Leonard <leonard.crestez@intel.com> Signed-off-by: Stefan Popa <stefan.popa@analog.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-08-09arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()Dongjiu Geng
In order to remove the additional check before calling the ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA. After this cleanup, we can simply call the ghes_notify_sea() to let APEI driver handle the SEA notification. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-08net/mlx5: Remove unused mlx5_query_vport_admin_stateEli Cohen
mlx5_query_vport_admin_state() is not used anywhere. Remove it. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5: Rename modify/query_vport state related enumsEran Ben Elisha
Modify and query vport state commands share the same admin_state and op_mod values, rename the enums to fit them both. In addition, remove the esw prefix from the admin state enum as this also applied for vnic. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08net/mlx5: Use max_num_eqs for calculation of required MSIX vectorsDenis Drozdov
New firmware has defined new HCA capability field called "max_num_eqs", that is the number of available EQs after subtracting reserved FW EQs. Before this capability the FW reported the EQ number in "log_max_eqs", the reported value also contained FW reserved EQs, but the driver might be failing to load on 320 cpus systems due to the fact that FW reserved EQs were not available to the driver. Now the driver has to obtain max_num_eqs value from new FW to get real number of EQs available. Signed-off-by: Denis Drozdov <denisd@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08i2c: core: Parse SDA hold time from firmwareAndy Shevchenko
There are two drivers already using the SDA hold time setting. It might be more in the future, thus, make I2C core to parse the setting for us if provided by firmware. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-08netfilter: nfnetlink_osf: add missing enum in nfnetlink_osf uapi headerFernando Fernandez Mancera
xt_osf_window_size_options was originally part of include/uapi/linux/netfilter/xt_osf.h, restore it. Fixes: bfb15f2a95cb ("netfilter: extract Passive OS fingerprint infrastructure from xt_osf") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-08overflow.h: Add arithmetic shift helperJason Gunthorpe
Add shift_overflow() helper to assist driver authors in ensuring that shift operations don't cause overflows or other odd conditions. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> [kees: tweaked comments and commit log, dropped unneeded assignment] Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-08Merge branches 'arm/shmobile', 'arm/renesas', 'arm/msm', 'arm/smmu', ↵Joerg Roedel
'arm/omap', 'x86/amd', 'x86/vt-d' and 'core' into next
2018-08-08nvme.h: add support for ns write protect definitionsChaitanya Kulkarni
Add various definitions from NVMe 1.3 TP 4005. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-08nvme.h: fixup ANA group descriptor formatHannes Reinecke
ANA Phase 3 draft had the 'reserved' field in the group descriptor format set to '23:17' (so that the first namespace identifier started at byte 24), but that got move with the approved TP to '31:17' (so that the first namespace identifier started at byte 32). Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-08-08regulator: maxim: Add SPDX license identifiersKrzysztof Kozlowski
Replace GPL v2.0 and v2.0+ license statements with SPDX license identifiers. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-08-08iommu: Remove the ->map_sg indirectionChristoph Hellwig
All iommu drivers use the default_iommu_map_sg implementation, and there is no good reason to ever override it. Just expose it as iommu_map_sg directly and remove the indirection, specially in our post-spectre world where indirect calls are horribly expensive. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-08-07llc: use refcount_inc_not_zero() for llc_sap_find()Cong Wang
llc_sap_put() decreases the refcnt before deleting sap from the global list. Therefore, there is a chance llc_sap_find() could find a sap with zero refcnt in this global list. Close this race condition by checking if refcnt is zero or not in llc_sap_find(), if it is zero then it is being removed so we can just treat it as gone. Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07net: phy: Add support for Broadcom Omega internal Combo GPHYArun Parameswaran
Add support for the Broadcom Omega SoC internal Combo Ethernet GPHY to the bcm7xxx phy driver. Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08Merge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-next Fixes for 4.19: - Fix UVD 7.2 instance handling - Fix UVD 7.2 harvesting - GPU scheduler fix for when a process is killed - TTM cleanups - amdgpu CS bo_list fixes - Powerplay fixes for polaris12 and CZ/ST - DC fixes for link training certain HMDs - DC fix for vega10 blank screen in certain cases From: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180801222906.1016-1-alexander.deucher@amd.com
2018-08-07vsock: split dwork to avoid reinitializationsCong Wang
syzbot reported that we reinitialize an active delayed work in vsock_stream_connect(): ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414 WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326 The pattern is apparently wrong, we should only initialize the dealyed work once and could repeatly schedule it. So we have to move out the initializations to allocation side. And to avoid confusion, we can split the shared dwork into two, instead of re-using the same one. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com> Cc: Andy king <acking@vmware.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07net/sched: allow flower to match tunnel optionsPieter Jansen van Vuuren
Allow matching on options in Geneve tunnel headers. This makes use of existing tunnel metadata support. The options can be described in the form CLASS:TYPE:DATA/CLASS_MASK:TYPE_MASK:DATA_MASK, where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit hexadecimal value and DATA as a variable length hexadecimal value. e.g. # ip link add name geneve0 type geneve dstport 0 external # tc qdisc add dev geneve0 ingress # tc filter add dev geneve0 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ geneve_opts 0102:80:1122334421314151/ffff:ff:ffffffffffffffff \ ip_proto udp \ action mirred egress redirect dev eth1 This patch adds support for matching Geneve options in the order supplied by the user. This leads to an efficient implementation in the software datapath (and in our opinion hardware datapaths that offload this feature). It is also compatible with Geneve options matching provided by the Open vSwitch kernel datapath which is relevant here as the Flower classifier may be used as a mechanism to program flows into hardware as a form of Open vSwitch datapath offload (sometimes referred to as OVS-TC). The netlink Kernel/Userspace API may be extended, for example by adding a flag, if other matching options are desired, for example matching given options in any order. This would require an implementation in the TC software datapath. And be done in a way that drivers that facilitate offload of the Flower classifier can reject or accept such flows based on hardware datapath capabilities. This approach was discussed and agreed on at Netconf 2017 in Seoul. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07flow_dissector: allow dissection of tunnel options from metadataSimon Horman
Allow the existing 'dissection' of tunnel metadata to 'dissect' options already present in tunnel metadata. This dissection is controlled by a new dissector key, FLOW_DISSECTOR_KEY_ENC_OPTS. This dissection only occurs when skb_flow_dissect_tunnel_info() is called, currently only the Flower classifier makes that call. So there should be no impact on other users of the flow dissector. This is in preparation for allowing the flower classifier to match on Geneve options. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07ethtool: Add WAKE_FILTER and RX_CLS_FLOW_WAKEFlorian Fainelli
Add the ability to specify through ethtool::rxnfc that a rule location is special and will be used to participate in Wake-on-LAN, by e.g: having a specific pattern be matched. When this is the case, fs->ring_cookie must be set to the special value RX_CLS_FLOW_WAKE. We also define an additional ethtool::wolinfo flag: WAKE_FILTER which can be used to configure an Ethernet adapter to allow Wake-on-LAN using previously programmed filters. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-08-07 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add cgroup local storage for BPF programs, which provides a fast accessible memory for storing various per-cgroup data like number of transmitted packets, etc, from Roman. 2) Support bpf_get_socket_cookie() BPF helper in several more program types that have a full socket available, from Andrey. 3) Significantly improve the performance of perf events which are reported from BPF offload. Also convert a couple of BPF AF_XDP samples overto use libbpf, both from Jakub. 4) seg6local LWT provides the End.DT6 action, which allows to decapsulate an outer IPv6 header containing a Segment Routing Header. Adds this action now to the seg6local BPF interface, from Mathieu. 5) Do not mark dst register as unbounded in MOV64 instruction when both src and dst register are the same, from Arthur. 6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier instructions on arm64 for the AF_XDP sample code, from Brian. 7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts over from Python 2 to Python 3, from Jeremy. 8) Enable BTF build flags to the BPF sample code Makefile, from Taeung. 9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee. 10) Several improvements to the README.rst from the BPF documentation to make it more consistent with RST format, from Tobin. 11) Replace all occurrences of strerror() by calls to strerror_r() in libbpf and fix a FORTIFY_SOURCE build error along with it, from Thomas. 12) Fix a bug in bpftool's get_btf() function to correctly propagate an error via PTR_ERR(), from Yue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07netfilter: nft_ct: add ct timeout supportHarsha Sharma
This patch allows to add, list and delete connection tracking timeout policies via nft objref infrastructure and assigning these timeout via nft rule. %./libnftnl/examples/nft-ct-timeout-add ip raw cttime tcp Ruleset: table ip raw { ct timeout cttime { protocol tcp; policy = {established: 111, close: 13 } } chain output { type filter hook output priority -300; policy accept; ct timeout set "cttime" } } %./libnftnl/examples/nft-rule-ct-timeout-add ip raw output cttime %conntrack -E [NEW] tcp 6 111 ESTABLISHED src=172.16.19.128 dst=172.16.19.1 sport=22 dport=41360 [UNREPLIED] src=172.16.19.1 dst=172.16.19.128 sport=41360 dport=22 %nft delete rule ip raw output handle <handle> %./libnftnl/examples/nft-ct-timeout-del ip raw cttime Joint work with Pablo Neira. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: remove ifdef around cttimeout in struct nf_conntrack_l4protoPablo Neira Ayuso
Simplify this, include it inconditionally in this structure layout as we do with ctnetlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: cttimeout: decouple timeout policy from nfnetlink_cttimeout objectPablo Neira Ayuso
The timeout policy is currently embedded into the nfnetlink_cttimeout object, move the policy into an independent object. This allows us to reuse part of the existing conntrack timeout extension from nf_tables without adding dependencies with the nfnetlink_cttimeout object layout. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: cttimeout: move ctnl_untimeout to nf_conntrackHarsha Sharma
As, ctnl_untimeout is required by nft_ct, so move ctnl_timeout from nfnetlink_cttimeout to nf_conntrack_timeout and rename as nf_ct_timeout. Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-07netfilter: nft_osf: use NFT_OSF_MAXGENRELEN instead of IFNAMSIZFernando Fernandez Mancera
As no "genre" on pf.os exceed 16 bytes of length, we reduce NFT_OSF_MAXGENRELEN parameter to 16 bytes and use it instead of IFNAMSIZ. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-08powerpc/64s: Fix page table fragment refcount race vs speculative referencesNicholas Piggin
The page table fragment allocator uses the main page refcount racily with respect to speculative references. A customer observed a BUG due to page table page refcount underflow in the fragment allocator. This can be caused by the fragment allocator set_page_count stomping on a speculative reference, and then the speculative failure handler decrements the new reference, and the underflow eventually pops when the page tables are freed. Fix this by using a dedicated field in the struct page for the page table fragment allocator. Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage") Cc: stable@vger.kernel.org # v3.10+ Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07cpu/hotplug: Fix SMT supported evaluationThomas Gleixner
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied on the kernel command line as it cannot differentiate between SMT disabled by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and makes the sysfs interface unusable. Rework this so that during bringup of the non boot CPUs the availability of SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a 'primary' thread then set the local cpu_smt_available marker and evaluate this explicitely right after the initial SMP bringup has finished. SMT evaulation on x86 is a trainwreck as the firmware has all the information _before_ booting the kernel, but there is no interface to query it. Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-06Merge branch 'ieee802154-for-davem-2018-08-06' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2018-08-06 An update from ieee802154 for *net-next* Romuald added a socket option to get the LQI value of the received datagram. Alexander added a new hardware simulation driver modelled after hwsim of the wireless people. It allows runtime configuration for new nodes and edges over a netlink interface (a config utlity is making its way into wpan-tools). We also have three fixes in here. One from Colin which is more of a cleanup and two from Alex fixing tailroom and single frame space problems. I would normally put the last two into my fixes tree, but given we are already in -rc8 I simply put them here and added a cc: stable to them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06Merge branch 'x86/pti-urgent' into x86/ptiThomas Gleixner
Integrate the PTI Global bit fixes which conflict with the 32bit PTI support. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-06Bluetooth: btqca: Introduce HCI_EV_VENDOR and use itMarcel Holtmann
Using HCI_VENDOR_PKT for vendor specific events does work since it has also the value 0xff, but it is actually the packet type indicator constant and not the event constant. So introduce HCI_EV_VENDOR and use it. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2018-08-06vhost: switch to use new message formatJason Wang
We use to have message like: struct vhost_msg { int type; union { struct vhost_iotlb_msg iotlb; __u8 padding[64]; }; }; Unfortunately, there will be a hole of 32bit in 64bit machine because of the alignment. This leads a different formats between 32bit API and 64bit API. What's more it will break 32bit program running on 64bit machine. So fixing this by introducing a new message type with an explicit 32bit reserved field after type like: struct vhost_msg_v2 { __u32 type; __u32 reserved; union { struct vhost_iotlb_msg iotlb; __u8 padding[64]; }; }; We will have a consistent ABI after switching to use this. To enable this capability, introduce a new ioctl (VHOST_SET_BAKCEND_FEATURE) for userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2). Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06Revert "ata: ahci_platform: allow disabling of hotplug to save power"Tejun Heo
This reverts commit aece27a2f01be4bb7683790f69cd1bed3a0929a2. Causes boot failure on some devices. http://lore.kernel.org/r/CA+G9fYuKW_jCFZPqG4tz=QY9ROfHO38KiCp9XTA+KaDOFVtcqQ@mail.gmail.com Signed-off-by: Tejun Heo <tj@kernel.org>
2018-08-06locks: add tracepoint in flock codepathJeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2018-08-06KVM: X86: Implement PV IPIs in linux guestWanpeng Li
Implement paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI" hypercall is available. The hypercall lets a guest send IPIs, with at most 128 destinations per hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: X86: Implement "send IPI" hypercallWanpeng Li
Using hypercall to send IPIs by one vmexit instead of one by one for xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster mode. Intel guest can enter x2apic cluster mode when interrupt remmaping is enabled in qemu, however, latest AMD EPYC still just supports xapic mode which can get great improvement by Exit-less IPIs. This patchset lets a guest send multicast IPIs, with at most 128 destinations per hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode. Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141): x2apic cluster mode, vanilla Dry-run: 0, 2392199 ns Self-IPI: 6907514, 15027589 ns Normal IPI: 223910476, 251301666 ns Broadcast IPI: 0, 9282161150 ns Broadcast lock: 0, 8812934104 ns x2apic cluster mode, pv-ipi Dry-run: 0, 2449341 ns Self-IPI: 6720360, 15028732 ns Normal IPI: 228643307, 255708477 ns Broadcast IPI: 0, 7572293590 ns => 22% performance boost Broadcast lock: 0, 8316124651 ns x2apic physical mode, vanilla Dry-run: 0, 3135933 ns Self-IPI: 8572670, 17901757 ns Normal IPI: 226444334, 255421709 ns Broadcast IPI: 0, 19845070887 ns Broadcast lock: 0, 19827383656 ns x2apic physical mode, pv-ipi Dry-run: 0, 2446381 ns Self-IPI: 6788217, 15021056 ns Normal IPI: 219454441, 249583458 ns Broadcast IPI: 0, 7806540019 ns => 154% performance boost Broadcast lock: 0, 9143618799 ns Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: x86: Add tlb remote flush callback in kvm_x86_ops.Tianyu Lan
This patch is to provide a way for platforms to register hv tlb remote flush callback and this helps to optimize operation of tlb flush among vcpus for nested virtualization case. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: nVMX: Introduce KVM_CAP_NESTED_STATEJim Mattson
For nested virtualization L0 KVM is managing a bit of state for L2 guests, this state can not be captured through the currently available IOCTLs. In fact the state captured through all of these IOCTLs is usually a mix of L1 and L2 state. It is also dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state. With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM that is in VMX operation. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> [karahmed@ - rename structs and functions and make them ready for AMD and address previous comments. - handle nested.smm state. - rebase & a bit of refactoring. - Merge 7/8 and 8/8 into one patch. ] Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: Switch 'requests' to be 64-bit (explicitly)KarimAllah Ahmed
Switch 'requests' to be explicitly 64-bit and update BUILD_BUG_ON check to use the size of "requests" instead of the hard-coded '32'. That gives us a bit more room again for arch-specific requests as we already ran out of space for x86 due to the hard-coded check. The only exception here is ARM32 as it is still 32-bits. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06Merge tag 'v4.18-rc6' into HEADPaolo Bonzini
Pull bug fixes into the KVM development tree to avoid nasty conflicts.
2018-08-06btrfs: Get rid of the confusing btrfs_file_extent_inline_lenQu Wenruo
We used to call btrfs_file_extent_inline_len() to get the uncompressed data size of an inlined extent. However this function is hiding evil, for compressed extent, it has no choice but to directly read out ram_bytes from btrfs_file_extent_item. While for uncompressed extent, it uses item size to calculate the real data size, and ignoring ram_bytes completely. In fact, for corrupted ram_bytes, due to above behavior kernel btrfs_print_leaf() can't even print correct ram_bytes to expose the bug. Since we have the tree-checker to verify all EXTENT_DATA, such mismatch can be detected pretty easily, thus we can trust ram_bytes without the evil btrfs_file_extent_inline_len(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: remove the logged extents infrastructureJosef Bacik
This is no longer used anywhere, remove all of it. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06PM / reboot: Eliminate race between reboot and suspendPingfan Liu
At present, "systemctl suspend" and "shutdown" can run in parrallel. A system can suspend after devices_shutdown(), and resume. Then the shutdown task goes on to power off. This causes many devices are not really shut off. Hence replacing reboot_mutex with system_transition_mutex (renamed from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as system_transition_mutex can be better to reflect the purpose of the mutex. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-06aio: implement IOCB_CMD_POLLChristoph Hellwig
Simple one-shot poll through the io_submit() interface. To poll for a file descriptor the application should submit an iocb of type IOCB_CMD_POLL. It will poll the fd for the events specified in the the first 32 bits of the aio_buf field of the iocb. Unlike poll or epoll without EPOLLONESHOT this interface always works in one shot mode, that is once the iocb is completed, it will have to be resubmitted. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Avi Kivity <avi@scylladb.com>
2018-08-06Merge back cpufreq changes for 4.19.Rafael J. Wysocki
2018-08-06Merge remote-tracking branch 'net-next/master'Stefan Schmidt
2018-08-05Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block2Jens Axboe
Pull NVMe changes from Christoph: "This contains the support for TP4004, Asymmetric Namespace Access, which makes NVMe multipathing usable in practice." * 'nvme-4.19' of git://git.infradead.org/nvme: nvmet: use Retain Async Event bit to clear AEN nvmet: support configuring ANA groups nvmet: add minimal ANA support nvmet: track and limit the number of namespaces per subsystem nvmet: keep a port pointer in nvmet_ctrl nvme: add ANA support nvme: remove nvme_req_needs_failover nvme: simplify the API for getting log pages nvme.h: add ANA definitions nvme.h: add support for the log specific field Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-05Merge tag 'v4.18-rc6' into for-4.19/block2Jens Axboe
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>