path: root/include/linux
Age | Commit message | Author
2019-05-02 | spi: ep93xx: Convert to use CS GPIO descriptors | Linus Walleij
This converts the EP93xx SPI master driver to use GPIO descriptors for chip select handling. EP93xx was using platform data to pass in GPIO lines; by converting all board files to use GPIO descriptor tables, the core will look up the GPIO lines from the SPI device in the same manner as for device tree. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
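For reference, a hedged sketch of what such a board-file GPIO descriptor table looks like; the chip label "A", the pin numbers and the "spi0" device name are placeholders, not values taken from the EP93xx boards.

    #include <linux/gpio/machine.h>
    #include <linux/init.h>

    /*
     * Placeholder values: "A" is the gpiochip label, 5 and 6 are pin numbers
     * on that chip, and .dev_id must match the SPI controller device name so
     * the core can look up chip selects 0 and 1 for its children.
     */
    static struct gpiod_lookup_table example_spi_cs_gpio_table = {
            .dev_id = "spi0",
            .table = {
                    GPIO_LOOKUP_IDX("A", 5, "cs", 0, GPIO_ACTIVE_LOW),
                    GPIO_LOOKUP_IDX("A", 6, "cs", 1, GPIO_ACTIVE_LOW),
                    { },
            },
    };

    static void __init example_register_spi_cs_gpios(void)
    {
            gpiod_add_lookup_table(&example_spi_cs_gpio_table);
    }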
2019-05-02 | spi: expand mode support | Sowjanya Komatineni
This patch changes mode and mode_bits from u16 to u32 to allow more mode configurations. Signed-off-by: Sowjanya Komatineni <skomatineni@nvidia.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-05-02 | spi: spi-mem: Make spi_mem_default_supports_op() static inline | YueHaibing
The stub helper spi_mem_default_supports_op() should be marked static inline. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-05-02 | spi: spi-mem: Fix build error without CONFIG_SPI_MEM | YueHaibing
When building with CONFIG_SPI_MEM not set, gcc warns:

drivers/spi/spi-zynq-qspi.o: In function `zynq_qspi_supports_op':
spi-zynq-qspi.c:(.text+0x1da): undefined reference to `spi_mem_default_supports_op'

Fixes: 67dca5e580f1 ("spi: spi-mem: Add support for Zynq QSPI controller") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-05-02 | power: supply: core: Add CHARGE_CONTROL_{START_THRESHOLD,END_THRESHOLD} properties | Nick Crews
Add POWER_SUPPLY_PROP_CHARGE_CONTROL_START_THRESHOLD and POWER_SUPPLY_PROP_CHARGE_CONTROL_END_THRESHOLD properties, to expand the existing CHARGE_CONTROL_* properties. I am adding them in order to support a new Chrome OS device, but these properties should be general enough that they can be used on other devices. When the charge_type is "Custom", the charge controller uses the POWER_SUPPLY_PROP_CHARGE_CONTROL_* properties as configuration for some other algorithm. For example, in the use case that I am supporting, this means the battery begins charging when the percentage level drops below POWER_SUPPLY_PROP_CHARGE_CONTROL_START_THRESHOLD and charging ceases when the percentage level goes above POWER_SUPPLY_PROP_CHARGE_CONTROL_END_THRESHOLD.

v5 changes:
- Add the other missing CHARGE_CONTROL_* properties documentation in a separate commit
- Split up adding the charge types and adding the POWER_SUPPLY_PROP_CHARGE_CONTROL_START_THRESHOLD and POWER_SUPPLY_PROP_CHARGE_CONTROL_END_THRESHOLD properties into two different commits

v4 changes:
- Add documentation for the new properties, and add documentation for the previously missing charge_control_limit and charge_control_limit_max properties

Signed-off-by: Nick Crews <ncrews@chromium.org> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
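As an illustration only, a minimal sketch of how a battery driver might report the new thresholds; the example_charger structure and its fields are hypothetical, not taken from this commit.

    #include <linux/power_supply.h>

    /* Hypothetical driver state holding the configured thresholds. */
    struct example_charger {
            int start_threshold;    /* percent at which charging resumes */
            int end_threshold;      /* percent at which charging stops */
    };

    static int example_charger_get_property(struct power_supply *psy,
                                            enum power_supply_property psp,
                                            union power_supply_propval *val)
    {
            struct example_charger *chg = power_supply_get_drvdata(psy);

            switch (psp) {
            case POWER_SUPPLY_PROP_CHARGE_CONTROL_START_THRESHOLD:
                    val->intval = chg->start_threshold;
                    return 0;
            case POWER_SUPPLY_PROP_CHARGE_CONTROL_END_THRESHOLD:
                    val->intval = chg->end_threshold;
                    return 0;
            default:
                    return -EINVAL;
            }
    }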
2019-05-02 | power: supply: core: Add Standard, Adaptive, and Custom charge types | Nick Crews
Add "Standard", "Adaptive", and "Custom" modes to the charge_type property, to expand the existing "Trickle" and "Fast" modes. I am adding them in order to support a new Chrome OS device, but these properties should be general enough that they can be used on other devices. The meaning of "Standard" is obvious, but "Adaptive" and "Custom" are more tricky: "Adaptive" means that the charge controller uses some custom algorithm to change the charge type automatically, with no configuration needed. "Custom" means that the charge controller uses the POWER_SUPPLY_PROP_CHARGE_CONTROL_* properties as configuration for some other algorithm.

v5 changes:
- Split up adding the charge types and adding the POWER_SUPPLY_PROP_CHARGE_CONTROL_START_THRESHOLD and POWER_SUPPLY_PROP_CHARGE_CONTROL_END_THRESHOLD properties into two different commits

v4 changes:
- Add documentation for the new properties, and add documentation for the previously missing charge_control_limit and charge_control_limit_max properties

Signed-off-by: Nick Crews <ncrews@chromium.org> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2019-05-01 | net/mlx5: E-Switch, Use atomic rep state to serialize state change | Bodong Wang
When the state of the rep was introduced, it was also designed to prevent duplicate unloading of the same rep. Consider the following two flows when an eswitch manager is in switchdev mode with n VF reps loaded.

+------------------------------------+------------------------------+
| cpu-0                              | cpu-1                        |
| --------                           | --------                     |
| mlx5_ib_remove                     | mlx5_eswitch_disable_sriov   |
| mlx5_ib_unregister_vport_reps      | esw_offloads_cleanup         |
| mlx5_eswitch_unregister_vport_reps | esw_offloads_unload_all_reps |
| __unload_reps_all_vport            | __unload_reps_all_vport      |
+------------------------------------+------------------------------+

These two flows will try to unload the same rep. Per the original design, once one flow unloads the rep, the state moves to REGISTERED and the second flow no longer needs to do the unload and bails out. However, as reads and writes of the state are not atomic, while the first flow is doing the unload the state is still LOADED, so the second flow is able to do the same unload action and a kernel crash will happen. To solve this, the driver should do an atomic test-and-set on the state, so that only one flow can change the rep state from LOADED to REGISTERED and proceed to do the actual unloading. Since the state is changing to an atomic type, all other reads and writes should be atomic as well. Fixes: f121e0ea9586 (net/mlx5: E-Switch, Add state to eswitch vport representors) Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
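A minimal sketch of the test-and-set idea described above, assuming an atomic state with LOADED and REGISTERED values; the type and constant names are illustrative, not lifted from the mlx5 sources.

    #include <linux/atomic.h>

    #define REP_REGISTERED  1       /* illustrative values */
    #define REP_LOADED      2

    struct vport_rep {
            atomic_t state;
    };

    static void unload_rep(struct vport_rep *rep)
    {
            /*
             * Only the flow that wins the LOADED -> REGISTERED transition
             * performs the actual unload; the losing flow bails out.
             */
            if (atomic_cmpxchg(&rep->state, REP_LOADED, REP_REGISTERED) != REP_LOADED)
                    return;

            /* ... do the actual unloading here ... */
    }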
2019-05-01 | net/mlx5: Remove unused mlx5_query_nic_vport_vlans | Bodong Wang
mlx5_query_nic_vport_vlans() is not used anymore. Hence remove it. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01 | net/mlx5e: ethtool, Add support for EEPROM high pages query | Erez Alfasi
Add support to read additional EEPROM information from high pages. Information for modules such as SFF-8436 and SFF-8636:
1) Application select table
2) User writable EEPROM
3) Thresholds and alarms
Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01 | Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux | Saeed Mahameed
This merge commit includes some misc shared code updates from the mlx5-next branch needed for net-next.
1) From Aya: Enable general events on all physical link types and restrict general event handling of subtype DELAY_DROP_TIMEOUT in the mlx5 rdma driver to ethernet links only, as it was intended.
2) From Eli: Introduce low level bits for prio tag mode.
3) From Maor: Low level steering updates to support RDMA RX flow steering and enable RoCE loopback traffic when switchdev is enabled.
4) From Vu and Parav: Two small mlx5 core cleanups.
5) From Yevgeny: Add HW definitions of geneve offloads.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01 | net/mlx5: Fix broken hca cap offset | Saeed Mahameed
The cited commit broke the offsets of the hca cap struct; fix it. While at it, clean up a whitespace issue introduced by the same commit. Fixes: b169e64a2444 ("net/mlx5: Geneve, Add flow table capabilities for Geneve decap with TLV options") Reported-by: Qian Cai <cai@lca.pw> Cc: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01 | of: fix clang -Wunsequenced for be32_to_cpu() | Phong Tran
Make the loop explicit to avoid a clang warning:

./include/linux/of.h:238:37: warning: multiple unsequenced modifications to 'cell' [-Wunsequenced]
        r = (r << 32) | be32_to_cpu(*(cell++));
./include/linux/byteorder/generic.h:95:21: note: expanded from macro 'be32_to_cpu'
./include/uapi/linux/byteorder/little_endian.h:40:59: note: expanded from macro '__be32_to_cpu'
./include/uapi/linux/swab.h:118:21: note: expanded from macro '__swab32'
        ___constant_swab32(x) : \
./include/uapi/linux/swab.h:18:12: note: expanded from macro '___constant_swab32'
        (((__u32)(x) & (__u32)0x000000ffUL) << 24) | \

Signed-off-by: Phong Tran <tranmanphong@gmail.com> Reported-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/460 Suggested-by: David Laight <David.Laight@ACULAB.COM> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: stable@vger.kernel.org [robh: fix up whitespace] Signed-off-by: Rob Herring <robh@kernel.org>
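The warning points at the accumulation loop in include/linux/of.h (of_read_number()); a sketch of the explicit-loop fix it describes, reconstructed from the message rather than copied from the patch:

    /*
     * Before, 'cell' was incremented inside the macro argument:
     *
     *         r = (r << 32) | be32_to_cpu(*(cell++));
     *
     * be32_to_cpu() may expand its argument more than once, so clang warns
     * about unsequenced modifications.  Advancing the pointer in the loop
     * header keeps the increment in its own sequence point.
     */
    static inline u64 of_read_number(const __be32 *cell, int size)
    {
            u64 r = 0;

            for (; size--; cell++)
                    r = (r << 32) | be32_to_cpu(*cell);
            return r;
    }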
2019-05-01 | net: ll_temac: Allow configuration of IRQ coalescing | Esben Haabendal
This allows custom setup of IRQ coalescing for platforms using legacy platform_device. The irq timeout and count parameters can be used for tuning cpu load vs. latency. I have maintained the 0x00000400 bit in TX_CHNL_CTRL. It is specified as unused in the documentation I have available. It does not make any difference in the hardware I have available, so it is left in to not risk breaking other platforms where it might be used. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | net: ll_temac: Support indirect_mutex share within TEMAC IP | Esben Haabendal
Indirect register access goes through a DCR bus bridge, which allows only one outstanding transaction. And to make matters worse, each TEMAC IP block contains two Ethernet interfaces, and although they seem to have separate registers for indirect access, they actually share the registers. Or to be more specific, the MSW, LSW and CTL registers are physically shared between the Ethernet interfaces in the same TEMAC IP, with the RDY register being (almost) specific to each Ethernet interface. The 0x10000 bit in RDY reflects the combined bus ready state though. So we need to take care to synchronize not only within a single device, but also between devices in the same TEMAC IP. This commit allows that to be done with legacy platform devices. For OF devices, the xlnx,compound parent of the temac node should be used to find siblings and set up a shared indirect_mutex between them. I will leave this work to somebody else, as I don't have hardware to test that. No regression is introduced by that, as before this commit using two Ethernet interfaces in the same TEMAC block is simply broken. Signed-off-by: Esben Haabendal <esben@geanix.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | net: ll_temac: Add support for non-native register endianness | Esben Haabendal
Replace the powerpc specific MMIO register access functions with the generic big-endian mmio access functions, and add support for little-endian access depending on configuration. Big-endian access is maintained as the default, but little-endian can be configured in device-tree binding or in platform data. The temac_ior()/temac_iow() functions are replaced with macro wrappers to avoid modifying existing code more than necessary. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
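A rough sketch of what endianness-aware accessors behind such macro wrappers could look like; the reg_little_endian field and function names here are assumptions for illustration, not the driver's actual implementation.

    #include <linux/io.h>

    struct temac_local_example {
            void __iomem *regs;
            bool reg_little_endian;         /* from DT binding or platform data */
    };

    static u32 example_temac_ior(struct temac_local_example *lp, int offset)
    {
            if (lp->reg_little_endian)
                    return ioread32(lp->regs + offset);
            return ioread32be(lp->regs + offset);   /* big-endian default */
    }

    static void example_temac_iow(struct temac_local_example *lp, int offset, u32 value)
    {
            if (lp->reg_little_endian)
                    iowrite32(value, lp->regs + offset);
            else
                    iowrite32be(value, lp->regs + offset);
    }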
2019-05-01 | net: ll_temac: Extend support to non-device-tree platforms | Esben Haabendal
Support initialization with platdata, so the driver can be used on non-device-tree platforms. For currently supported device-tree platforms, the driver should behave as before. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01 | RDMA/core: Introduce RDMA subsystem ibdev_* print functions | Gal Pressman
Similarly to dev/netdev/etc printk helpers, add standard printk helpers for the RDMA subsystem. Example output:

efa 0000:00:06.0 efa_0: Hello World!
efa_0: Hello World! (no parent device set)
(NULL ib_device): Hello World! (ibdev is NULL)

Cc: Jason Baron <jbaron@akamai.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
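A hedged usage sketch: assuming the helpers mirror the dev_*()/netdev_*() pattern (ibdev_info(), ibdev_warn(), ...), a driver that owns a struct ib_device would call them like this.

    #include <rdma/ib_verbs.h>

    /* Hypothetical message from a driver probe path. */
    static void example_report_ready(struct ib_device *ibdev, int vectors)
    {
            ibdev_info(ibdev, "device ready, %d completion vectors\n", vectors);

            if (!vectors)
                    ibdev_warn(ibdev, "no completion vectors available\n");
    }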
2019-05-01 | iomap: Add a page_prepare callback | Andreas Gruenbacher
Move the page_done callback into a separate iomap_page_ops structure and add a page_prepare callback to be called before the next page is written to. In gfs2, we'll want to start a transaction in page_prepare and end it in page_done. Other filesystems that implement data journaling will require the same kind of mechanism. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
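A rough sketch of the resulting interface; the exact argument lists are assumptions based on the description above, not copied from include/linux/iomap.h.

    /* Sketch only: argument lists are assumed. */
    struct iomap_page_ops {
            /* Called before data is copied into the next page of a write;
             * gfs2 would start its transaction here. */
            int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
                                struct iomap *iomap);
            /* Called after the copy; gfs2 would end the transaction here. */
            void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
                              struct page *page, struct iomap *iomap);
    };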
2019-05-01 | iov_iter: fix iov_iter_type | Ming Lei
Commit 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag") introduces one extra flag, ITER_BVEC_FLAG_NO_REF, and this flag is stored into iter->type. However, iov_iter_type() doesn't consider the newly added flag; fix it by masking this flag out in iov_iter_type(). Fixes: 875f1d0769cd ("iov_iter: add ITER_BVEC_FLAG_NO_REF flag") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
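The helper in include/linux/uio.h would then mask the flag out along these lines (a sketch; the exact set of masked bits follows the description above):

    /* Sketch: strip bookkeeping bits so only the iterator type remains. */
    static inline unsigned int iov_iter_type(const struct iov_iter *i)
    {
            return i->type & ~(READ | WRITE | ITER_BVEC_FLAG_NO_REF);
    }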
2019-05-01 | soc: ti: Add MSI domain bus support for Interrupt Aggregator | Lokesh Vutla
With the system coprocessor managing the range allocation of the inputs to the Interrupt Aggregator, it is difficult to represent the device IRQs from DT. The suggestion is to use MSI in such cases, where devices want to allocate and group interrupts dynamically. Create an MSI domain bus layer that allocates and frees MSIs for a device. The APIs that are implemented:
- ti_sci_inta_msi_create_irq_domain(), which creates an MSI domain
- ti_sci_inta_msi_domain_alloc_irqs(), which creates MSIs for the specified device and resource
- ti_sci_inta_msi_domain_free_irqs(), which frees the irqs attached to the device
- ti_sci_inta_msi_get_virq(), for getting the virq attached to a specific event
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-01 | genirq: Introduce irq_chip_{request,release}_resource_parent() apis | Lokesh Vutla
Introduce irq_chip_{request,release}_resource_parent() apis so that these can be used in hierarchical irqchips. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-01 | firmware: ti_sci: Add helper apis to manage resources | Lokesh Vutla
Each resource within the device can be uniquely identified as defined by TISCI. Since this is generic across the devices, resource allocation can also be made generic instead of each client driver handling the resource. So add helper APIs to manage the resources. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-01 | firmware: ti_sci: Add support for IRQ management | Lokesh Vutla
TISCI abstracts the handling of IRQ routes where interrupt sources are not directly connected to host interrupt controller. Add support for the set of TISCI commands for requesting and releasing IRQs. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-01 | firmware: ti_sci: Add support for RM core ops | Lokesh Vutla
TISCI provides support for getting the resources (IRQ, RING etc.) assigned to a specific device. These resources can be handled by the client, which in turn sends TISCI commands to configure them. It is very important that the client keeps track of the usage of these resources. Add support for TISCI commands to get resource ranges. Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-01 | firmware: ti_sci: Add support to get TISCI handle using of_phandle | Grygorii Strashko
TISCI has been updated to support resource management (interrupts etc.), and there can be multiple device instances of a resource type in a SoC. So every driver corresponding to a resource type should get a TISCI handle so that it can make TISCI calls. And each DT node corresponding to a device should exist under its corresponding bus node as per the SoC architecture, but the existing APIs in the TISCI library assume that all TISCI users are child nodes of TISCI, which is not true in the above case. So introduce (devm_)ti_sci_get_by_phandle() APIs that can be used by TISCI users to get a TISCI handle using an of phandle property. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
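A hedged usage sketch: assuming the managed variant takes the consumer device and the name of the phandle property, a client driver might look like this (the "ti,sci" property name is an example, not necessarily what the binding defines).

    #include <linux/err.h>
    #include <linux/platform_device.h>
    #include <linux/soc/ti/ti_sci_protocol.h>

    static int example_probe(struct platform_device *pdev)
    {
            const struct ti_sci_handle *handle;

            handle = devm_ti_sci_get_by_phandle(&pdev->dev, "ti,sci");
            if (IS_ERR(handle))
                    return PTR_ERR(handle);

            /* handle->ops can now be used to issue TISCI requests. */
            return 0;
    }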
2019-04-30 | block: remove the unused blk_queue_dma_pad function | Christoph Hellwig
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30 | block: add a SPDX tag to blk-mq-rdma.h | Christoph Hellwig
This file has no copyright notice, but was added as part of a commit adding another file using the default kernel GPLv2 license. Add a matching SPDX tag. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30 | block: switch all files clearly marked as GPLv2 or later to SPDX tags | Christoph Hellwig
All these files have some form of the usual GPLv2 or later boilerplate. Switch them to use SPDX tags instead. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30 | block: switch all files clearly marked as GPLv2 to SPDX tags | Christoph Hellwig
All these files have some form of the usual GPLv2 boilerplate. Switch them to use SPDX tags instead. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30 | KVM: Introduce a new guest mapping API | KarimAllah Ahmed
In KVM, especially for nested guests, there is a dominant pattern of:

  map guest memory -> do something -> unmap guest memory

In addition to all the unnecessary noise in the code due to boilerplate, most of the time the mapping function does not properly handle memory that is not backed by "struct page". This new guest mapping API encapsulates most of this boilerplate code and also handles guest memory that is not backed by "struct page". The current implementation of this API uses memremap for memory that is not backed by a "struct page", which would lead to a huge slow-down if it was used for high-frequency mapping operations. The API does not have any effect on current setups where guest memory is backed by a "struct page". Further patches are going to also introduce a pfn-cache which would significantly improve the performance of the memremap case. Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
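A hedged sketch of the map/do-something/unmap pattern with the new API; the kvm_vcpu_map()/kvm_vcpu_unmap() names and argument lists are assumed from the series description, not quoted from kvm_host.h.

    #include <linux/kvm_host.h>

    /* Sketch: map one guest page, write a byte, unmap and mark it dirty. */
    static int example_poke_guest_page(struct kvm_vcpu *vcpu, gpa_t gpa, u8 value)
    {
            struct kvm_host_map map;

            if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map))
                    return -EFAULT;

            ((u8 *)map.hva)[offset_in_page(gpa)] = value;

            kvm_vcpu_unmap(vcpu, &map, true);       /* true: page was written */
            return 0;
    }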
2019-04-30 | KVM: x86: Inject PMI for KVM guest | Luwei Kang
Inject a PMI for the KVM guest when Intel PT is working in Host-Guest mode and the guest's ToPA entry memory buffer is completely filled. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-30 | Merge tag 'kvm-s390-next-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD | Paolo Bonzini
KVM: s390: Features and fixes for 5.2
- VSIE crypto fixes
- new guest features for gen15
- disable halt polling for nested virtualization with overcommit
2019-04-30 | Merge tag 'usb-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb | Linus Torvalds
Pull USB fixes from Greg KH:
"Here are some small USB fixes for a bunch of warnings/errors that the syzbot has been finding with its new-found ability to stress-test the USB layer. All of these are tiny, but fix real issues, and are marked for stable as well. All of these have had lots of testing in linux-next as well"

* tag 'usb-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: w1 ds2490: Fix bug caused by improper use of altsetting array
  USB: yurex: Fix protection fault after device removal
  usb: usbip: fix isoc packet num validation in get_pipe
  USB: core: Fix bug caused by duplicate interface PM usage counter
  USB: dummy-hcd: Fix failure to give back unlinked URBs
  USB: core: Fix unterminated string returned by usb_string()
2019-04-30 | block: remove the __bio_add_pc_page export | Christoph Hellwig
The same page optimization is a rather odd corner case, which is not used outside bio.c and which really should not be used outside of bio.c either - we have better high-level helpers like the rq/bio mapping helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30 | block: remove the i argument to bio_for_each_segment_all | Christoph Hellwig
We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
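A hedged illustration of the calling-convention change; the surrounding code is made up and only the macro usage is the point.

    #include <linux/bio.h>

    /*
     * Before (sketch), the macro carried an integer index:
     *
     *         bio_for_each_segment_all(bvec, bio, i, iter_all)
     *
     * After, callers that still want an index keep it themselves:
     */
    static void example_dump_segments(struct bio *bio)
    {
            struct bio_vec *bvec;
            struct bvec_iter_all iter_all;
            int i = 0;

            bio_for_each_segment_all(bvec, bio, iter_all)
                    pr_debug("segment %d: %u bytes\n", i++, bvec->bv_len);
    }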
2019-04-30 | bpf: Use vmalloc special flag | Rick Edgecombe
Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special permissioned memory in vmalloc and remove places where memory was set RW before freeing which is no longer needed. Don't track if the memory is RO anymore because it is now tracked in vmalloc. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-19-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30 | mm/vmalloc: Add flag for freeing of special permissions | Rick Edgecombe
Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to immediately clear executable TLB entries before freeing pages, and handle resetting permissions on the directmap. This flag is useful for any kind of memory with elevated permissions, or where there can be related permissions changes on the directmap. Today this is RO+X and RO memory. Although this enables directly vfreeing non-writeable memory now, non-writable memory cannot be freed in an interrupt because the allocation itself is used as a node on deferred free list. So when RO memory needs to be freed in an interrupt the code doing the vfree needs to have its own work queue, as was the case before the deferred vfree list was added to vmalloc. For architectures with set_direct_map_ implementations this whole operation can be done with one TLB flush when centralized like this. For others with directmap permissions, currently only arm64, a backup method using set_memory functions is used to reset the directmap. When arm64 adds set_direct_map_ functions, this backup can be removed. When the TLB is flushed to both remove TLB entries for the vmalloc range mapping and the direct map permissions, the lazy purge operation could be done to try to save a TLB flush later. However today vm_unmap_aliases could flush a TLB range that does not include the directmap. So a helper is added with extra parameters that can allow both the vmalloc address and the direct mapping to be flushed during this operation. The behavior of the normal vm_unmap_aliases function is unchanged. Suggested-by: Dave Hansen <dave.hansen@intel.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-17-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
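A hedged sketch of the intended usage pattern; the set_vm_flush_reset_perms() helper name and the exact ordering are assumptions drawn from the description above.

    #include <linux/mm.h>
    #include <linux/vmalloc.h>
    #include <linux/set_memory.h>

    /* Sketch: allocate, mark RO+X, and rely on the flag so that vfree()
     * flushes the TLB and resets the direct map before reusing the pages. */
    static void *example_alloc_exec(size_t size)
    {
            int npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
            void *p = __vmalloc(size, GFP_KERNEL, PAGE_KERNEL);

            if (!p)
                    return NULL;

            set_vm_flush_reset_perms(p);    /* assumed helper setting VM_FLUSH_RESET_PERMS */
            /* ... fill in code/data while the mapping is still writable ... */
            set_memory_ro((unsigned long)p, npages);
            set_memory_x((unsigned long)p, npages);
            return p;
    }

    /* Freeing is then just vfree(); no need to set the memory RW first. */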
2019-04-30 | mm/hibernation: Make hibernation handle unmapped pages | Rick Edgecombe
Make hibernate handle unmapped pages on the direct map when CONFIG_ARCH_HAS_SET_ALIAS=y is set. These functions allow for setting pages to invalid configurations, so now hibernate should check if the pages have valid mappings and handle if they are unmapped when doing a hibernate save operation. Previously this checking was already done when CONFIG_DEBUG_PAGEALLOC=y was configured. It does not appear to have a big hibernating performance impact. The speed of the saving operation before this change was measured as 819.02 MB/s, and after was measured at 813.32 MB/s.

Before: [ 4.670938] PM: Wrote 171996 kbytes in 0.21 seconds (819.02 MB/s)
After:  [ 4.504714] PM: Wrote 178932 kbytes in 0.22 seconds (813.32 MB/s)

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-16-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30 | x86/mm/cpa: Add set_direct_map_*() functions | Rick Edgecombe
Add two new functions set_direct_map_default_noflush() and set_direct_map_invalid_noflush() for setting the direct map alias for the page to its default valid permissions and to an invalid state that cannot be cached in a TLB, respectively. These functions do not flush the TLB. Note, __kernel_map_pages() does something similar but flushes the TLB and doesn't reset the permission bits to default on all architectures. Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether these have an actual implementation or a default empty one. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-15-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
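The interface presumably reduces to two per-page calls plus empty fallbacks when the config is off; a sketch of the declarations (placement and fallback return values are assumptions):

    #ifdef CONFIG_ARCH_HAS_SET_DIRECT_MAP
    /* Neither call flushes the TLB; callers batch the flush themselves. */
    int set_direct_map_default_noflush(struct page *page);
    int set_direct_map_invalid_noflush(struct page *page);
    #else
    static inline int set_direct_map_default_noflush(struct page *page)
    {
            return 0;
    }
    static inline int set_direct_map_invalid_noflush(struct page *page)
    {
            return 0;
    }
    #endif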
2019-04-30 | x86/modules: Avoid breaking W^X while loading modules | Nadav Amit
When modules and BPF filters are loaded, there is a time window in which some memory is both writable and executable. An attacker that has already found another vulnerability (e.g., a dangling pointer) might be able to exploit this behavior to overwrite kernel code. Prevent having writable executable PTEs in this stage. In addition, avoiding having W+X mappings can also slightly simplify the patching of modules code on initialization (e.g., by alternatives and static-key), as would be done in the next patch. This was actually the main motivation for this patch. To avoid having W+X mappings, set them initially as RW (NX) and after they are set as RO set them as X as well. Setting them as executable is done as a separate step to avoid one core in which the old PTE is cached (hence writable), and another which sees the updated PTE (executable), which would break the W^X protection. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30 | fork: Provide a function for copying init_mm | Nadav Amit
Provide a function for copying init_mm. This function will be later used for setting a temporary mm. Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-6-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
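Presumably this is a small entry point next to the existing fork machinery; a sketch of the declaration and the intended use for a text-poking mm (the copy_init_mm() name and everything around it are assumptions):

    #include <linux/mm_types.h>
    #include <linux/errno.h>
    #include <linux/init.h>

    /* Assumed declaration; the commit only promises "a function for copying init_mm". */
    struct mm_struct *copy_init_mm(void);

    static struct mm_struct *example_poking_mm;

    static int __init example_poking_init(void)
    {
            example_poking_mm = copy_init_mm();
            return example_poking_mm ? 0 : -ENOMEM;
    }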
2019-04-30 | uprobes: Initialize uprobes earlier | Nadav Amit
In order to have a separate address space for text poking, we need to duplicate init_mm early during start_kernel(). This, however, introduces a problem since uprobes functions are called from dup_mmap(), but uprobes is still not initialized in this early stage. Since uprobes initialization is necessary for fork, and since all the dependent initialization has been done when fork is initialized (percpu and vmalloc), move uprobes initialization to fork_init(). It does not seem uprobes introduces any security problem for the poking_mm. Crash and burn if uprobes initialization fails, similarly to other early initializations. Change the init_probes() name to probes_init() to match the naming convention of other early initialization functions. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: ard.biesheuvel@linaro.org Cc: deneen.t.dock@intel.com Cc: kernel-hardening@lists.openwall.com Cc: kristen@linux.intel.com Cc: linux_dti@icloud.com Cc: will.deacon@arm.com Link: https://lkml.kernel.org/r/20190426232303.28381-6-nadav.amit@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30 | KVM: Introduce a 'release' method for KVM devices | Cédric Le Goater
When a P9 sPAPR VM boots, the CAS negotiation process determines which interrupt mode to use (XICS legacy or XIVE native) and invokes a machine reset to activate the chosen mode. To be able to switch from one interrupt mode to another, we introduce the capability to release a KVM device without destroying the VM. The KVM device interface is extended with a new 'release' method which is called when the file descriptor of the device is closed. Once 'release' is called, the 'destroy' method will not be called anymore as the device is removed from the device list of the VM. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-04-30 | KVM: Introduce a 'mmap' method for KVM devices | Cédric Le Goater
Some KVM devices will want to handle special mappings related to the underlying HW. For instance, the XIVE interrupt controller of the POWER9 processor has MMIO pages for thread interrupt management and for interrupt source control that need to be exposed to the guest when the OS has the required support. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
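Taken together with the 'release' commit above, the device-ops table presumably grows along these lines (a sketch; argument lists are assumed, not quoted from kvm_host.h):

    /* Sketch of the relevant kvm_device_ops hooks. */
    struct kvm_device_ops {
            const char *name;
            int (*create)(struct kvm_device *dev, u32 type);
            /* Called when the VM is destroyed; frees the device. */
            void (*destroy)(struct kvm_device *dev);
            /* New: called when the device fd is closed while the VM lives on. */
            void (*release)(struct kvm_device *dev);
            /* New: lets the device expose HW pages (e.g. XIVE MMIO) to userspace. */
            int (*mmap)(struct kvm_device *dev, struct vm_area_struct *vma);
            /* set_attr/get_attr/has_attr/ioctl hooks omitted from this sketch */
    };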
2019-04-30 | Merge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp | Rafael J. Wysocki
Pull operating performance points (OPP) framework changes for v5.2 from Viresh Kumar:
"This pull request contains:
- New helper in OPP core to find best matching frequency for a voltage value."

* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  OPP: Introduce dev_pm_opp_find_freq_ceil_by_volt()
2019-04-30 | USB: serial: drop unused iflag macro | Johan Hovold
Drop the RELEVANT_IFLAG() macro which essentially hasn't been used for over a decade except in some remnant debug printks that were recently removed. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org>
2019-04-30 | USB: serial: clean up throttle handling | Johan Hovold
Clean up the throttle implementation by dropping the redundant throttle_req flag which was a remnant from back when there was only a single read URB. Also convert the throttled flag to an atomic bit flag. Signed-off-by: Johan Hovold <johan@kernel.org>
2019-04-29 | net/mlx5: Geneve, Add flow table capabilities for Geneve decap with TLV options | Yevgeny Kliteynik
Introduce specification for Geneve decap flow with encapsulation options and allow creation of rules that are matching on Geneve TLV options. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-29 | net/mlx5: Geneve, Add basic Geneve encap/decap flow table capabilities | Yevgeny Kliteynik
Introduce support for Geneve flow specification and allow the creation of rules that are matching on basic Geneve protocol fields: VNI, OAM bit, protocol type, options length. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-29 | net/mlx5: Eswitch, enable RoCE loopback traffic | Maor Gottlieb
When in switchdev mode, we would like to treat loopback RoCE traffic (on the eswitch manager) as RDMA and not as regular Ethernet traffic. In order to enable this, we add a flow steering rule that forwards RoCE loopback traffic to the HW RoCE filter (by adding an allow rule). In addition, we add a RoCE address at GID index 0, which will be set in the RoCE loopback packet. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>