summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-06-23net: ena: update ena driver to version 1.2.0Netanel Belgazal
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: update driver's rx drop statisticsNetanel Belgazal
rx drop counter is reported by the device in the keep-alive event. update the driver's counter with the device counter. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: use lower_32_bits()/upper_32_bits() to split dma addressNetanel Belgazal
In ena_com_mem_addr_set(), use the above functions to split dma address to the lower 32 bits and the higher 16 bits. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: separate skb allocation to dedicated functionNetanel Belgazal
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: use napi_schedule_irqoff when possibleNetanel Belgazal
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: allow the driver to work with small number of msix vectorsNetanel Belgazal
Current driver tries to allocate msix vectors as the number of the negotiated io queues. (with another msix vector for management). If pci_alloc_irq_vectors() fails, the driver aborts the probe and the ENA network device is never brought up. With this patch, the driver's logic will reduce the number of IO queues to the number of allocated msix vectors (minus one for management) instead of failing probe(). Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: add support for out of order rx buffers refillNetanel Belgazal
ENA driver post Rx buffers through the Rx submission queue for the ENA device to fill them with receive packets. Each Rx buffer is marked with req_id in the Rx descriptor. Newer ENA devices could consume the posted Rx buffer in out of order, and as result the corresponding Rx completion queue will have Rx completion descriptors with non contiguous req_id(s) In this change the driver holds two rings. The first ring (called free_rx_ids) is a mapping ring. It holds all the unused request ids. The values in this ring are from 0 to ring_size -1. When the driver wants to allocate a new Rx buffer it uses the head of free_rx_ids and uses it's value as the index for rx_buffer_info ring. The req_id is also written to the Rx descriptor Upon Rx completion, The driver took the req_id from the completion descriptor and uses it as index in rx_buffer_info. The req_id is then return to the free_rx_ids ring. This patch also adds statistics to inform when the driver receive out of range or unused req_id. Note: free_rx_ids is only accessible from the napi handler, so no locking is required Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: add reset reason for each device FLRNetanel Belgazal
For each device reset, log to the device what is the cause the reset occur. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: change sizeof() argument to be the type pointerNetanel Belgazal
Instead of using: memset(ptr, 0x0, sizeof(struct ...)) use: memset(ptr, 0x0, sizeor(*ptr)) Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: add hardware hints capability to the driverNetanel Belgazal
With this patch, ENA device can update the ena driver about the desired timeout values: These values are part of the "hardware hints" which are transmitted to the driver as Asynchronous event through ENA async event notification queue. In case the ENA device does not support this capability, the driver will use its own default values. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: change return value for unsupported features unsupported return valueNetanel Belgazal
return -EOPNOTSUPP instead of -EPERM. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2017-06-23 1) Fix xfrm garbage collecting when unregistering a netdevice. From Hangbin Liu. 2) Fix NULL pointer derefernce when exiting a network namespace. From Hangbin Liu. 3) Fix some error codes in pfkey to prevent a NULL pointer derefernce. From Dan Carpenter. 4) Fix NULL pointer derefernce on allocation failure in pfkey. From Dan Carpenter. 5) Adjust IPv6 payload_len to include extension headers. Otherwise we corrupt the packets when doing ESP GRO on transport mode. From Yossi Kuperman. 6) Set nhoff to the proper offset of the IPv6 nexthdr when doing ESP GRO. From Yossi Kuperman. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23tcp: fix out-of-bounds access in ULP sysctlJakub Kicinski
KASAN reports out-of-bound access in proc_dostring() coming from proc_tcp_available_ulp() because in case TCP ULP list is empty the buffer allocated for the response will not have anything printed into it. Set the first byte to zero to avoid strlen() going out-of-bounds. Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23sit: use __GFP_NOWARN for user controlled allocationWANG Cong
The memory allocation size is controlled by user-space, if it is too large just fail silently and return NULL, not to mention there is a fallback allocation later. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23bpf: possibly avoid extra masking for narrower load in verifierYonghong Song
Commit 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") permits narrower load for certain ctx fields. The commit however will already generate a masking even if the prog-specific ctx conversion produces the result with narrower size. For example, for __sk_buff->protocol, the ctx conversion loads the data into register with 2-byte load. A narrower 2-byte load should not generate masking. For __sk_buff->vlan_present, the conversion function set the result as either 0 or 1, essentially a byte. The narrower 2-byte or 1-byte load should not generate masking. To avoid unnecessary masking, prog-specific *_is_valid_access now passes converted_op_size back to verifier, which indicates the valid data width after perceived future conversion. Based on this information, verifier is able to avoid unnecessary marking. Since we want more information back from prog-specific *_is_valid_access checking, all of them are packed into one data structure for more clarity. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: stmmac: make some functions staticColin Ian King
The functions dwmac4_dma_init_rx_chan, dwmac4_dma_init_tx_chan and dwmac4_dma_init_channel do not need to be in global scope, so them static. Cleans up sparse warnings: "symbol 'dwmac4_dma_init_rx_chan' was not declared. Should it be static?" "symbol 'dwmac4_dma_init_tx_chan' was not declared. Should it be static?" "symbol 'dwmac4_dma_init_channel' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23mtd: partitions: fixup some allocate_partition() whitespaceBrian Norris
Some recent patches caused churn around this area, and checkpatch noticed the existing issues. Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2017-06-23mtd: parsers: trx: fix pr_err format for printing offsetRafał Miłecki
This fixes following warning: include/linux/kern_levels.h:4:18: warning: format '%X' expects argument of type 'unsigned int', but argument 2 has type 'size_t {aka long unsigned int}' [-Wformat=] Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2017-06-23Merge branch 'xdp-offload-mode'David S. Miller
Jakub Kicinski says: ==================== xdp: offload mode While we discuss the representors.. :) This set adds XDP flag for forcing offload and a attachment mode for reporting to user space that program has been offloaded. The nfp driver is modified to make use of the new flags, but also to adhere to the DRV_MODE flag which should disable the HW offload. The intended driver behaviour is: DRV mode offload no flags yes attempted DRV_MODE yes no HW_MODE no yes Where 'yes' means required, and error will be returned if setup fails. 'Attempted' means the offload will only happen automatically if HW is capable and offloading the program will cause no change in system behaviour (e.g. maps don't have to bound). Thanks to loading the program both to the driver and HW by default we can fallback to the driver mode without disruption in case user replaces the program with one which cannot be offloaded later. Note that the NFP driver currently claims XDP offload support but lacks most basic features like direct packet access. Only change compared to the RFC is fixing the double bpf_prog_put() which Daniel has spotted (patch 5). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23nfp: xdp: report if program is offloadedJakub Kicinski
Make use of just added XDP_ATTACHED_HW. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23xdp: add reporting of offload modeJakub Kicinski
Extend the XDP_ATTACHED_* values to include offloaded mode. Let drivers report whether program is installed in the driver or the HW by changing the prog_attached field from bool to u8 (type of the netlink attribute). Exploit the fact that the value of XDP_ATTACHED_DRV is 1, therefore since all drivers currently assign the mode with double negation: mode = !!xdp_prog; no drivers have to be modified. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23nfp: bpf: add support for XDP_FLAGS_HW_MODEJakub Kicinski
Respect the XDP_FLAGS_HW_MODE. When it's set install the program on the NIC and skip enabling XDP in the driver. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23nfp: bpf: release the reference on offloaded programsJakub Kicinski
The xdp_prog member of the adapter's data path structure is used for XDP in driver mode. In case a XDP program is loaded with in HW-only mode, we need to store it somewhere else. Add a new XDP prog pointer in the main structure and use that when we need to know whether any XDP program is loaded, not only a driver mode one. Only release our reference on adapter free instead of immediately after netdev unregister to allow offload to be disabled first. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23nfp: bpf: don't offload XDP programs in DRV_MODEJakub Kicinski
DRV_MODE means that user space wants the program to be run in the driver. Do not try to offload. Only offload if no mode flags have been specified. Remember what the mode is when the program is installed and refuse new setup requests if there is already a program loaded in a different mode. This should leave it open for us to implement simultaneous loading of two programs - one in the drv path and another to the NIC later. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23nfp: xdp: move driver XDP setup into a separate functionJakub Kicinski
In preparation of XDP offload flags move the driver setup into a function. Otherwise the number of conditions in one function would make it slightly hard to follow. The offload handler may now be called with NULL prog, even if no offload is currently active, but that's fine, offload code can handle that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23xdp: add HW offload mode flag for installing programsJakub Kicinski
Add an installation-time flag for requesting that the program be installed only if it can be offloaded to HW. Internally new command for ndo_xdp is added, this way we avoid putting checks into drivers since they all return -EINVAL on an unknown command. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23xdp: pass XDP flags into install handlersJakub Kicinski
Pass XDP flags to the xdp ndo. This will allow drivers to look at the mode flags and make decisions about offload. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23staging: wlan-ng: Fix struct definition's and variable typeSuniel Mahesh
le16_to_cpu() accepts argument of type __le16 and cpu_to_le16() returns an argument of type __le16. This patch fixes: (a) the type of the variable that end's up getting return from cpu_to_le16(). (b) the member types of struct hfa384x_host_scan_request_data, struct hfa384x_bytestr32 and struct hfa384x_hscan_result_sub. The following type mismatch warnings reported by sparse have been fixed: warning: incorrect type in assignment (different base types) warning: cast to restricted __le16 Signed-off-by: Suniel Mahesh <sunil.m@techveda.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-23staging: rtl8723bs: Remove unnecessary cast in kfreeAmitoj Kaur Chawla
Remove unnecassary casts in the argument to kfree. Found using Coccinelle. The semantic patch used to find this is as follows: //<smpl> @@ type T; expression *f; @@ - kfree((T *)(f)); + kfree(f); //</smpl> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-23staging: rtl8723bs: hal: Use (true/false) in assignment to boolsimran singhal
This patch assigns (true/false) to boolean EDCCA_State instead of (1/0). And, there is no need of comparing EDCCA_State explicitly with constant 1. Signed-off-by: simran singhal <singhalsimran0@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-23staging: sm750fb: change default screen resolutionSudip Mukherjee
The previous patch which updated screen resolution was tested under wrong environment. sm750 driver does not support 24bpp. It only supports 8bpp, 16bpp and 32bpp. Lets update the default screen resolution to use 32bpp for a better screen performance. Fixes: ac669251087d ("staging: sm750fb: change default screen resolution") Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-23staging: fb_xgi: vb_table: Remove white space after tabstopMatthew Reed
Remove white space after tabstop on closing bracket as suggested by checkpatch.pl. Additional white space removed for page consistency. Signed-off-by: Matthew Reed <4d5452@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-23net: account for current skb length when deciding about UFOMichal Kubeček
Our customer encountered stuck NFS writes for blocks starting at specific offsets w.r.t. page boundary caused by networking stack sending packets via UFO enabled device with wrong checksum. The problem can be reproduced by composing a long UDP datagram from multiple parts using MSG_MORE flag: sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 3000, 0, ...); Assume this packet is to be routed via a device with MTU 1500 and NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(), this condition is tested (among others) to decide whether to call ip_ufo_append_data(): ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb)) At the moment, we already have skb with 1028 bytes of data which is not marked for GSO so that the test is false (fragheaderlen is usually 20). Thus we append second 1000 bytes to this skb without invoking UFO. Third sendto(), however, has sufficient length to trigger the UFO path so that we end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb() uses udp_csum() to calculate the checksum but that assumes all fragments have correct checksum in skb->csum which is not true for UFO fragments. When checking against MTU, we need to add skb->len to length of new segment if we already have a partially filled skb and fragheaderlen only if there isn't one. In the IPv6 case, skb can only be null if this is the first segment so that we have to use headersize (length of the first IPv6 header) rather than fragheaderlen (length of IPv6 header of further fragments) for skb == NULL. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Fixes: e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernelsMark Rutland
When a kernel is built without CONFIG_ARM64_MODULE_PLTS, we don't generate the expected branch instruction in ftrace_make_nop(). This means we pass zero (rather than a valid branch) to ftrace_modify_code() as the expected instruction to validate. This causes us to return -EINVAL to the core ftrace code for a valid case, resulting in a splat at boot time. This was an unintended effect of commit: 687644209a6e9557 ("arm64: ftrace: fix building without CONFIG_MODULES") ... which incorrectly moved the generation of the branch instruction into the ifdef for CONFIG_ARM64_MODULE_PLTS. This patch fixes the issue by moving the ifdef inside of the relevant if-else case, and always checking that the branch is in range, regardless of CONFIG_ARM64_MODULE_PLTS. This ensures that we generate the expected branch instruction, and also improves our sanity checks. For consistency, both ftrace_make_nop() and ftrace_make_call() are updated with this pattern. Fixes: 687644209a6e9557 ("arm64: ftrace: fix building without CONFIG_MODULES") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23arm64: signal: Allow expansion of the signal frameDave Martin
This patch defines an extra_context signal frame record that can be used to describe an expanded signal frame, and modifies the context block allocator and signal frame setup and parsing code to create, populate, parse and decode this block as necessary. To avoid abuse by userspace, parse_user_sigframe() attempts to ensure that: * no more than one extra_context is accepted; * the extra context data is a sensible size, and properly placed and aligned. The extra_context data is required to start at the first 16-byte aligned address immediately after the dummy terminator record following extra_context in rt_sigframe.__reserved[] (as ensured during signal delivery). This serves as a sanity-check that the signal frame has not been moved or copied without taking the extra data into account. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [will: add __force annotation when casting extra_datap to __user pointer] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126Geetha Sowjanya
Cavium ThunderX2 SMMU doesn't support MSI and also doesn't have unique irq lines for gerror, eventq and cmdq-sync. New named irq "combined" is set as a errata workaround, which allows to share the irq line by register single irq handler for all the interrupts. Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Geetha sowjanya <gakula@caviumnetworks.com> [will: reworked irq equality checking and added SPI check] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum ↵shameer
161010701) HiSilicon SMMUv3 on Hip06/Hip07 platforms doesn't support CMD_PREFETCH command. The dt based support for this quirk is already present in the driver(hisilicon,broken-prefetch-cmd). This adds ACPI support for the quirk using the IORT smmu model number. Signed-off-by: shameer <shameerali.kolothum.thodi@huawei.com> Signed-off-by: hanjun <guohanjun@huawei.com> [will: rewrote patch] Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74Linu Cherian
Cavium ThunderX2 SMMU implementation doesn't support page 1 register space and PAGE0_REGS_ONLY option is enabled as an errata workaround. This option when turned on, replaces all page 1 offsets used for EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets. SMMU resource size checks are now based on SMMU option PAGE0_REGS_ONLY, since resource size can be either 64k/128k. For this, arm_smmu_device_dt_probe/acpi_probe has been moved before platform_get_resource call, so that SMMU options are set beforehand. Signed-off-by: Linu Cherian <linu.cherian@cavium.com> Signed-off-by: Geetha Sowjanya <geethasowjanya.akula@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 modelLinu Cherian
Cavium ThunderX2 implementation doesn't support second page in SMMU register space. Hence, resource size is set as 64k for this model. Signed-off-by: Linu Cherian <linu.cherian@cavium.com> Signed-off-by: Geetha Sowjanya <geethasowjanya.akula@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number ↵Robert Richter
definitions The model number is already defined in acpica and we are actually waiting for the acpi maintainers to include it: https://github.com/acpica/acpica/commit/d00a4eb86e64 Adding those temporary definitions until the change makes it into include/acpi/actbl2.h. Once that is done this patch can be reverted. Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing tableWill Deacon
When writing a new table entry, we must ensure that the contents of the table is made visible to the SMMU page table walker before the updated table entry itself. This is currently achieved using wmb(), which expands to an expensive and unnecessary DSB instruction. Ideally, we'd just use cmpxchg64_release when writing the table entry, but this doesn't have memory ordering semantics on !SMP systems. Instead, use dma_wmb(), which emits DMB OSHST. Strictly speaking, this does more than we require (since it targets the outer-shareable domain), but it's likely to be significantly faster than the DSB approach. Reported-by: Linu Cherian <linu.cherian@cavium.com> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAEWill Deacon
The LPAE/ARMv8 page table format relies on the ability to read and write 64-bit page table entries in an atomic fashion. With the move to a lockless implementation, we also need support for cmpxchg64 to resolve races when installing table entries concurrently. Unfortunately, not all architectures support cmpxchg64, so the code can fail to compiler when building for these architectures using COMPILE_TEST. Rather than disable COMPILE_TEST altogether, instead check that GENERIC_ATOMIC64 is not selected, which is a reasonable indication that the architecture has support for 64-bit cmpxchg. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/arm-smmu-v3: Remove io-pgtable spinlockRobin Murphy
As for SMMUv2, take advantage of io-pgtable's newfound tolerance for concurrency. Unfortunately in this case the command queue lock remains a point of serialisation for the unmap path, but there may be a little more we can do to ameliorate that in future. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/arm-smmu: Remove io-pgtable spinlockRobin Murphy
With the io-pgtable code now robust against (valid) races, we no longer need to serialise all operations with a lock. This might make broken callers who issue concurrent operations on overlapping addresses go even more wrong than before, but hey, they already had little hope of useful or deterministic results. We do however still have to keep a lock around to serialise the ATS1* translation ops, as parallel iova_to_phys() calls could lead to unpredictable hardware behaviour otherwise. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable-arm-v7s: Support lockless operationRobin Murphy
Mirroring the LPAE implementation, rework the v7s code to be robust against concurrent operations. The same two potential races exist, and are solved in the same manner, with the fixed 2-level structure making life ever so slightly simpler. What complicates matters compared to LPAE, however, is large page entries, since we can't update a block of 16 PTEs atomically, nor assume available software bits to do clever things with. As most users are never likely to do partial unmaps anyway (due to DMA API rules), it doesn't seem unreasonable for this case to remain behind a serialising lock; we just pull said lock down into the bowels of the implementation so it's well out of the way of the normal call paths. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable-arm: Support lockless operationRobin Murphy
For parallel I/O with multiple concurrent threads servicing the same device (or devices, if several share a domain), serialising page table updates becomes a massive bottleneck. On reflection, though, we don't strictly need to do that - for valid IOMMU API usage, there are in fact only two races that we need to guard against: multiple map requests for different blocks within the same region, when the intermediate-level table for that region does not yet exist; and multiple unmaps of different parts of the same block entry. Both of those are fairly easily solved by using a cmpxchg to install the new table, such that if we then find that someone else's table got there first, we can simply free ours and continue. Make the requisite changes such that we can withstand being called without the caller maintaining a lock. In theory, this opens up a few corners in which wildly misbehaving callers making nonsensical overlapping requests might lead to crashes instead of just unpredictable results, but correct code really does not deserve to pay a significant performance cost for the sake of masking bugs in theoretical broken code. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable: Introduce explicit coherencyRobin Murphy
Once we remove the serialising spinlock, a potential race opens up for non-coherent IOMMUs whereby a caller of .map() can be sure that cache maintenance has been performed on their new PTE, but will have no guarantee that such maintenance for table entries above it has actually completed (e.g. if another CPU took an interrupt immediately after writing the table entry, but before initiating the DMA sync). Handling this race safely will add some potentially non-trivial overhead to installing a table entry, which we would much rather avoid on coherent systems where it will be unnecessary, and where we are stirivng to minimise latency by removing the locking in the first place. To that end, let's introduce an explicit notion of cache-coherency to io-pgtable, such that we will be able to avoid penalising IOMMUs which know enough to know when they are coherent. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable-arm-v7s: Refactor split_blk_unmapRobin Murphy
Whilst the short-descriptor format's split_blk_unmap implementation has no need to be recursive, it followed the pattern of the LPAE version anyway for the sake of consistency. With the latter now reworked for both efficiency and future scalability improvements, tweak the former similarly, not least to make it less obtuse. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable-arm: Improve split_blk_unmapRobin Murphy
The current split_blk_unmap implementation suffers from some inscrutable pointer trickery for creating the tables to replace the block entry, but more than that it also suffers from hideous inefficiency. For example, the most pathological case of unmapping a level 3 page from a level 1 block will allocate 513 lower-level tables to remap the entire block at page granularity, when only 2 are actually needed (the rest can be covered by level 2 block entries). Also, we would like to be able to relax the spinlock requirement in future, for which the roll-back-and-try-again logic for race resolution would be pretty hideous under the current paradigm. Both issues can be resolved most neatly by turning things sideways: instead of repeatedly recursing into __arm_lpae_map() map to build up an entire new sub-table depth-first, we can directly replace the block entry with a next-level table of block/page entries, then repeat by unmapping at the next level if necessary. With a little refactoring of some helper functions, the code ends up not much bigger than before, but considerably easier to follow and to adapt in future. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23iommu/io-pgtable-arm-v7s: Check table PTEs more preciselyRobin Murphy
Whilst we don't support the PXN bit at all, so should never encounter a level 1 section or supersection PTE with it set, it would still be wise to check both table type bits to resolve any theoretical ambiguity. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>