linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message	Author
2019-02-28	s390/qeth: unconditionally clear MAC_REGISTERED flag	Julian Wiedmann
	In its attempt to run only the minimal amount of tear down steps, qeth_l2_stop_card() fails to reset the "is dev_addr registered?" flag in some rare scenarios. But a future change to the tear down sequence would cause us to _always_ hit this issue, so patch it up before that code lands. Fix it by unconditionally clearing the flag bit. This also allows us to remove the additional cleanup step in qeth_dev_layer2_store(). Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	s390/qeth: enable/disable the HW trap a little earlier	Julian Wiedmann
	When setting a L2 qeth device online, enable the HW trap as soon as the control plane is available. This allows us to catch any error that occurs during the very first commands. In the same spirit, the offline code should disable the HW trap as the very first step of its processing. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	s390/qeth: remove RECOVER state	Julian Wiedmann
	The offline code uses a specific RECOVER state to indicate that the interface should be brought up when a qeth device is set online again. Rather than having a specific card-state for this, just put it in an internal flag bit and set the state to DOWN. When working with the card's state transitions, this reduces the complexity quite a bit. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	net: dsa: mv88e6xxx: Fix u64 statistics	Andrew Lunn
	The switch maintains u64 counters for the number of octets sent and received. These are kept as two u32's which need to be combined. Fix the combining, which wrongly worked on u16's. Fixes: 80c4627b2719 ("dsa: mv88x6xxx: Refactor getting a single statistic") Reported-by: Chris Healy <Chris.Healy@zii.aero> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
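A minimal standalone sketch of the bug class this message describes, not the driver code itself. The function names are hypothetical, and the 16-bit shift in the second helper is an assumed illustration of what "worked on u16's" can look like.

```c
#include <stdint.h>

/* Combine two 32-bit counter halves into one 64-bit value. */
uint64_t combine_u32_halves(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}

/*
 * Assumed illustration of the mistake: shifting by 16 treats the halves
 * as u16 values, so the upper bits of 'low' overlap with 'high'.
 */
uint64_t combine_as_if_u16(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 16) | low;
}
```

With `low = 1` and `high = 1`, the first helper yields `0x100000001` while the second yields `0x10001`, which is far too small for an octet counter that has wrapped its low word once.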
2019-02-28	xen-netback: don't populate the hash cache on XenBus disconnect	Igor Druzhinin
	Occasionally, during the disconnection procedure on XenBus which includes hash cache deinitialization there might be some packets still in-flight on other processors. Handling of these packets includes hashing and hash cache population that finally results in hash cache data structure corruption. In order to avoid this we prevent hashing of those packets if there are no queues initialized. In that case RCU protection of queues guards the hash cache as well. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	net/smc: allow pnetid-less configuration	Ursula Braun
	Without hardware pnetid support there must currently be a pnet table configured to determine the IB device port to be used for SMC RDMA traffic. This patch enables a setup without pnet table, if the used handshake interface belongs already to a RoCE port. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	block: optimize bvec iteration in bvec_iter_advance	Christoph Hellwig
	There is no need to only iterate in chunks of PAGE_SIZE or less in bvec_iter_advance, given that the callers pass in the chunk length that they are operating on - either that already is less than PAGE_SIZE because they do classic page-based iteration, or it is larger because the caller operates on multi-page bvecs. This should help shaving off a few cycles of the I/O hot path. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28	PCI/PME: Fix hotplug/sysfs remove deadlock in pcie_pme_remove()	Rafael J. Wysocki
	Dongdong reported a deadlock triggered by a hotplug event during a sysfs "remove" operation:

	  pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up
	  # echo 1 > 0000:00:0c.0/remove

	PME and hotplug share an MSI/MSI-X vector. The sysfs "remove" side is:

	  remove_store
	    pci_stop_and_remove_bus_device_locked
	      pci_lock_rescan_remove
	      pci_stop_and_remove_bus_device
	        ...
	          pcie_pme_remove
	            pcie_pme_suspend
	              synchronize_irq # wait for hotplug IRQ handler
	      pci_unlock_rescan_remove

	The hotplug side is:

	  pciehp_ist
	    pciehp_handle_presence_or_link_change
	      pciehp_configure_device
	        pci_lock_rescan_remove # wait for pci_unlock_rescan_remove()

	  INFO: task bash:10913 blocked for more than 120 seconds.

	  # ps -ax |grep D
	    PID TTY      STAT TIME COMMAND
	  10913 ttyAMA0  Ds+  0:00 -bash
	  14022 ?        D    0:00 [irq/745-pciehp]

	  # cat /proc/14022/stack
	  __switch_to+0x94/0xd8
	  pci_lock_rescan_remove+0x20/0x28
	  pciehp_configure_device+0x30/0x140
	  pciehp_handle_presence_or_link_change+0x324/0x458
	  pciehp_ist+0x1dc/0x1e0

	  # cat /proc/10913/stack
	  __switch_to+0x94/0xd8
	  synchronize_irq+0x8c/0xc0
	  pcie_pme_suspend+0xa4/0x118
	  pcie_pme_remove+0x20/0x40
	  pcie_port_remove_service+0x3c/0x58
	  ...
	  pcie_port_device_remove+0x2c/0x48
	  pcie_portdrv_remove+0x68/0x78
	  pci_device_remove+0x48/0x120
	  ...
	  pci_stop_bus_device+0x84/0xc0
	  pci_stop_and_remove_bus_device_locked+0x24/0x40
	  remove_store+0xa4/0xb8
	  dev_attr_store+0x44/0x60
	  sysfs_kf_write+0x58/0x80

	It is incorrect to call pcie_pme_suspend() from pcie_pme_remove() for two reasons. First, pcie_pme_suspend() calls synchronize_irq(), which will wait for the native hotplug interrupt handler as well as for the PME one, because they share one IRQ (as per the spec). That may deadlock if hotplug is signaled while pcie_pme_remove() is running and the latter calls pci_lock_rescan_remove() before the former.
Second, if pcie_pme_suspend() figures out that wakeup needs to be enabled for the port, it will return without disabling the interrupt as expected by pcie_pme_remove() which was overlooked by commit c7b5a4e6e8fb ("PCI / PM: Fix native PME handling during system suspend/resume"). To fix that, rework pcie_pme_remove() to disable the PME interrupt, clear its status and prevent the PME worker function from re-enabling it before calling free_irq() on it, which should be sufficient. Fixes: c7b5a4e6e8fb ("PCI / PM: Fix native PME handling during system suspend/resume") Link: https://lore.kernel.org/linux-pci/c7697e7c-e1af-13e4-8491-0a3996e6ab5d@huawei.com Reported-by: Dongdong Liu <liudongdong3@huawei.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [bhelgaas: add URL and deadlock details from Dongdong] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-02-28	tools lib traceevent: Fix buffer overflow in arg_eval	Tony Jones
	Fix buffer overflow observed when running perf test. The overflow is when trying to evaluate "1ULL << (64 - 1)" which is resulting in -9223372036854775808 which overflows the 20 character buffer. It is possible this bug has been reported before, but I still don't see any fix checked in. See: https://www.spinics.net/lists/linux-perf-users/msg07714.html Reported-by: Michael Sartain <mikesart@fastmail.com> Reported-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Tony Jones <tonyj@suse.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Fixes: f7d82350e597 ("tools/events: Add files to create libtraceevent.a") Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
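The arithmetic behind the overflow can be checked with a small userspace sketch. It is not the libtraceevent code; the helper names are hypothetical. The value -9223372036854775808 prints as 20 characters, so together with the terminating NUL it needs 21 bytes.

```c
#include <stdio.h>

/* Number of characters "%lld" produces for val, excluding the NUL. */
int lld_print_len(long long val)
{
	return snprintf(NULL, 0, "%lld", val);
}

/* Return 1 if val and its terminating NUL fit in buf_size bytes. */
int lld_fits(long long val, size_t buf_size)
{
	int len = lld_print_len(val);

	return len >= 0 && (size_t)len + 1 <= buf_size;
}
```

Sizing the buffer from the widest possible value, or always formatting with `snprintf()` and a real buffer length, avoids this class of overflow.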
2019-02-28	device-dax: "Hotplug" persistent memory for use like normal RAM	Dave Hansen
	This is intended for use with NVDIMMs that are physically persistent (physically like flash) so that they can be used as a cost-effective RAM replacement. Intel Optane DC persistent memory is one implementation of this kind of NVDIMM.

	Currently, a persistent memory region is "owned" by a device driver, either the "Direct DAX" or "Filesystem DAX" drivers. These drivers allow applications to explicitly use persistent memory, generally by being modified to use special, new libraries. (DIMM-based persistent memory hardware/software is described in great detail here: Documentation/nvdimm/nvdimm.txt).

	However, this limits persistent memory use to applications which have been modified. To make it more broadly usable, this driver "hotplugs" memory into the kernel, to be managed and used just like normal RAM would be.

	To make this work, management software must remove the device from being controlled by the "Device DAX" infrastructure:

	  echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind

	and then tell the new driver that it can bind to the device:

	  echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id

	After this, there will be a number of new memory sections visible in sysfs that can be onlined, or that may get onlined by existing udev-initiated memory hotplug rules.

	This rebinding procedure is currently a one-way trip. Once memory is bound to "kmem", it's there permanently and can not be unbound and assigned back to device_dax.

	The kmem driver will never bind to a dax device unless the device is explicitly bound to the driver. There are two reasons for this: One, since it is a one-way trip, it can not be undone if bound incorrectly. Two, the kmem driver destroys data on the device. Think of if you had good data on a pmem device. It would be catastrophic if you compile-in "kmem", but leave out the "device_dax" driver. kmem would take over the device and write volatile data all over your good data.
This inherits any existing NUMA information for the newly-added memory from the persistent memory device that came from the firmware. On Intel platforms, the firmware has guarantees that require each socket's persistent memory to be in a separate memory-only NUMA node. That means that this patch is not expected to create NUMA nodes, but will simply hotplug memory into existing nodes. Because NUMA nodes are created, the existing NUMA APIs and tools are sufficient to create policies for applications or memory areas to have affinity for or an aversion to using this memory. There is currently some metadata at the beginning of pmem regions. The section-size memory hotplug restrictions, plus this small reserved area can cause the "loss" of a section or two of capacity. This should be fixable in follow-on patches. But, as a first step, losing 256MB of memory (worst case) out of hundreds of gigabytes is a good tradeoff vs. the required code to fix this up precisely. This calculation is also the reason we export memory_block_size_bytes(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-nvdimm@lists.01.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: Huang Ying <ying.huang@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
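The capacity "loss" mentioned above comes from trimming a region to whole memory blocks. The following sketch shows that arithmetic under stated assumptions: the struct and function names are hypothetical, the block size is passed in by the caller (the message only says it is derived from memory_block_size_bytes()), and the block size must be a power of two.

```c
#include <stdint.h>

/* Usable range after trimming to whole memory blocks. */
struct block_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Round the start up and the end down to multiples of block_size.
 * Returns a zero-sized range if no whole block fits.
 */
struct block_range trim_to_blocks(uint64_t start, uint64_t size,
				  uint64_t block_size)
{
	struct block_range r = { 0, 0 };
	uint64_t end = start + size;
	uint64_t aligned_start = (start + block_size - 1) & ~(block_size - 1);
	uint64_t aligned_end = end & ~(block_size - 1);

	if (aligned_end > aligned_start) {
		r.start = aligned_start;
		r.size = aligned_end - aligned_start;
	}
	return r;
}
```

For example, with an assumed 128 MiB block size and a region that starts 2 MiB past a block boundary because of metadata, the first partial block is skipped and the usable range begins at the next boundary.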
2019-02-28	mm/resource: Let walk_system_ram_range() search child resources	Dave Hansen
	In the process of onlining memory, we use walk_system_ram_range() to find the actual RAM areas inside of the area being onlined. However, it currently only finds memory resources which are "top-level" iomem_resources. Children are not currently searched which causes it to skip System RAM in areas like this (in the format of /proc/iomem):

	  a0000000-bfffffff : Persistent Memory (legacy)
	    a0000000-afffffff : System RAM

	Changing the true->false here allows children to be searched as well. We need this because we add a new "System RAM" resource underneath the "persistent memory" resource when we use persistent memory in a volatile mode.

	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-nvdimm@lists.01.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: Huang Ying <ying.huang@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
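The difference between a top-level-only search and one that descends into children can be sketched in userspace. This is a simplified, hypothetical stand-in for the kernel's resource tree, not the code in the resource walker; `struct res`, `count_named()` and `demo_count()` are illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a resource tree node. */
struct res {
	unsigned long start, end;
	const char *name;
	struct res *child;	/* first child */
	struct res *sibling;	/* next node at the same level */
};

/* Count nodes called 'name'; descend into children only if 'recurse'. */
int count_named(const struct res *r, const char *name, int recurse)
{
	int n = 0;

	for (; r; r = r->sibling) {
		if (strcmp(r->name, name) == 0)
			n++;
		if (recurse && r->child)
			n += count_named(r->child, name, recurse);
	}
	return n;
}

/* Build the two-entry layout quoted in the message and count "System RAM". */
int demo_count(int recurse)
{
	struct res ram = { 0xa0000000UL, 0xafffffffUL, "System RAM",
			   NULL, NULL };
	struct res pmem = { 0xa0000000UL, 0xbfffffffUL,
			    "Persistent Memory (legacy)", &ram, NULL };

	return count_named(&pmem, "System RAM", recurse);
}
```

Calling `demo_count(0)` finds nothing because the "System RAM" entry is a child, while `demo_count(1)` finds it, which mirrors the skipped-RAM problem the message describes.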
2019-02-28	mm/memory-hotplug: Allow memory resources to be children	Dave Hansen
	The mm/resource.c code is used to manage the physical address space. The current resource configuration can be viewed in /proc/iomem. An example of this is at the bottom of this description. The nvdimm subsystem "owns" the physical address resources which map to persistent memory and has resources inserted for them as "Persistent Memory". The best way to repurpose this for volatile use is to leave the existing resource in place, but add a "System RAM" resource underneath it. This clearly communicates the ownership relationship of this memory. The request_resource_conflict() API only deals with the top-level resources. Replace it with __request_region() which will search for !IORESOURCE_BUSY areas lower in the resource tree than the top level. We could also simply truncate the existing top-level "Persistent Memory" resource and take over the released address space. But, this means that if we ever decide to hot-unplug the "RAM" and give it back, we need to recreate the original setup, which may mean going back to the BIOS tables. This should have no real effect on the existing collision detection because the areas that truly conflict should be marked IORESOURCE_BUSY. 
	00000000-00000fff : Reserved
	00001000-0009fbff : System RAM
	0009fc00-0009ffff : Reserved
	000a0000-000bffff : PCI Bus 0000:00
	000c0000-000c97ff : Video ROM
	000c9800-000ca5ff : Adapter ROM
	000f0000-000fffff : Reserved
	  000f0000-000fffff : System ROM
	00100000-9fffffff : System RAM
	  01000000-01e071d0 : Kernel code
	  01e071d1-027dfdff : Kernel data
	  02dc6000-0305dfff : Kernel bss
	a0000000-afffffff : Persistent Memory (legacy)
	  a0000000-a7ffffff : System RAM
	b0000000-bffdffff : System RAM
	bffe0000-bfffffff : Reserved
	c0000000-febfffff : PCI Bus 0000:00

	Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-nvdimm@lists.01.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: Huang Ying <ying.huang@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-28	mm/resource: Move HMM pr_debug() deeper into resource code	Dave Hansen
	HMM consumes physical address space for its own use, even though nothing is mapped or accessible there. It uses a special resource description (IORES_DESC_DEVICE_PRIVATE_MEMORY) to uniquely identify these areas. When HMM consumes address space, it makes a best guess about what to consume. However, it is possible that a future memory or device hotplug can collide with the reserved area. In the case of these conflicts, there is an error message in register_memory_resource(). Later patches in this series move register_memory_resource() from using request_resource_conflict() to __request_region(). Unfortunately, __request_region() does not return the conflict like the previous function did, which makes it impossible to check for IORES_DESC_DEVICE_PRIVATE_MEMORY in a conflicting resource. Instead of warning in register_memory_resource(), move the check into the core resource code itself (__request_region()) where the conflicting resource _is_ available. This has the added bonus of producing a warning in case of HMM conflicts with devices or RAM address space, as opposed to the RAM-only warnings that were there previously. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-nvdimm@lists.01.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: Huang Ying <ying.huang@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-28	mm/resource: Return real error codes from walk failures	Dave Hansen
	walk_system_ram_range() can return an error code either because it failed, or because the 'func' that it calls returned an error. The memory hotplug does the following: ret = walk_system_ram_range(..., func); if (ret) return ret; and 'ret' makes it out to userspace, eventually. The problem is, walk_system_ram_range() failures that result from it failing (as opposed to 'func') return -1. That leads to a very odd -EPERM (-1) return code out to userspace. Make walk_system_ram_range() return -EINVAL for internal failures to keep userspace less confused. This return code is compatible with all the callers that I audited. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-nvdimm@lists.01.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: Huang Ying <ying.huang@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
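A bare -1 is indistinguishable from -EPERM because EPERM has the value 1. The sketch below shows the pattern in userspace with a hypothetical single-range walker; `walk_one_range()`, `range_fn` and `reject_all()` are illustrative names, and treating an empty range as the walker's own failure is an assumption made for the example.

```c
#include <errno.h>

typedef int (*range_fn)(unsigned long start_pfn, unsigned long nr_pages);

/* Example callback that rejects every range with -EBUSY. */
int reject_all(unsigned long start_pfn, unsigned long nr_pages)
{
	(void)start_pfn;
	(void)nr_pages;
	return -EBUSY;
}

/*
 * Hypothetical walker over a single range. Its own failure is reported
 * as -EINVAL, while the callback's result is passed through unchanged.
 */
int walk_one_range(unsigned long start_pfn, unsigned long nr_pages,
		   range_fn func)
{
	if (nr_pages == 0)
		return -EINVAL;	/* a real errno instead of a bare -1 */

	return func(start_pfn, nr_pages);
}
```

A caller that forwards the return value to userspace then reports "invalid argument" for the walker's own failure rather than a misleading "operation not permitted".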
2019-02-28	xen-netback: fix occasional leak of grant ref mappings under memory pressure	Igor Druzhinin
	Zero-copy callback flag is not yet set on frag list skb at the moment xenvif_handle_frag_list() returns -ENOMEM. This eventually results in leaking grant ref mappings since xenvif_zerocopy_callback() is never called for these fragments. Those eventually build up and cause Xen to kill Dom0 as the slots get reused for new mappings: "d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005" That behavior is observed under certain workloads where sudden spikes of page cache writes coexist with active atomic skb allocations from network traffic. Additionally, rework the logic to deal with frag_list deallocation in a single place. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	net: sched: pie: avoid slow division in drop probability decay	Leslie Monis
	As per RFC 8033, it is sufficient for the drop probability decay factor to have a value of (1 - 1/64) instead of 98%. This avoids the need to do slow division. Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
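The two decay factors can be compared with a small sketch. It is not the sch_pie code; the function names are hypothetical and a 32-bit probability value is assumed. Multiplying by (1 - 1/64) reduces to a subtraction and a shift.

```c
#include <stdint.h>

/* Decay to 98% of the value using a multiply and a divide. */
uint32_t decay_98_percent(uint32_t prob)
{
	return (uint32_t)(((uint64_t)prob * 98) / 100);
}

/* Decay by a factor of (1 - 1/64) using only a shift and a subtract. */
uint32_t decay_one_minus_64th(uint32_t prob)
{
	return prob - (prob >> 6);
}
```

The factor (1 - 1/64) equals 0.984375, so the shift-based version decays slightly more slowly than 98%; for an input of 6400 it gives 6300 where the divide-based version gives 6272.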
2019-02-28	sctp: chunk.c: correct format string for size_t in printk	Matthias Maennich
	According to Documentation/core-api/printk-formats.rst, size_t should be printed with %zu, rather than %Zu. In addition, using %Zu triggers a warning on clang (-Wformat-extra-args):

	  net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args]
	      __func__, asoc, max_data);
	      ~~~~~~~~~~~~~~~~^~~~~~~~~
	  ./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited'
	      printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
	      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
	  ./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited'
	      printk(fmt, ##__VA_ARGS__); \
	      ~~~ ^

	Fixes: 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Matthias Maennich <maennich@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
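A userspace sketch of the recommended specifier, with standard `snprintf()` standing in for printk. The function name and the "max_data=" label are hypothetical; only the use of `%zu` for a `size_t` argument is the point.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t with %zu. Returns the number of characters the full
 * output needs, excluding the NUL (buf may be NULL when buf_len is 0).
 */
int format_max_data(char *buf, size_t buf_len, size_t max_data)
{
	return snprintf(buf, buf_len, "max_data=%zu", max_data);
}
```

Because `%zu` is the standard C99 conversion for `size_t`, compilers can type-check the argument against the format string.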
2019-02-28	net: netem: fix skb length BUG_ON in __skb_to_sgvec	Sheng Lan
	It can be reproduced by following steps:

	  1. virtio_net NIC is configured with gso/tso on
	  2. configure nginx as http server with an index file bigger than 1M bytes
	  3. use tc netem to produce duplicate packets and delay:
	     tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
	  4. continually curl the nginx http server to get index file on client
	  5. BUG_ON is seen quickly

	  [10258690.371129] kernel BUG at net/core/skbuff.c:4028!
	  [10258690.371748] invalid opcode: 0000 [#1] SMP PTI
	  [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2
	  [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
	  [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
	  [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
	  [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
	  [10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
	  [10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	  [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
	  [10258690.372094] Call Trace:
	  [10258690.372094] <IRQ>
	  [10258690.372094] skb_to_sgvec+0x11/0x40
	  [10258690.372094] start_xmit+0x38c/0x520 [virtio_net]
	  [10258690.372094] dev_hard_start_xmit+0x9b/0x200
	  [10258690.372094] sch_direct_xmit+0xff/0x260
	  [10258690.372094] __qdisc_run+0x15e/0x4e0
	  [10258690.372094] net_tx_action+0x137/0x210
	  [10258690.372094] __do_softirq+0xd6/0x2a9
	  [10258690.372094] irq_exit+0xde/0xf0
	  [10258690.372094] smp_apic_timer_interrupt+0x74/0x140
	  [10258690.372094] apic_timer_interrupt+0xf/0x20
	  [10258690.372094] </IRQ>

	In __skb_to_sgvec(), skb->len is not equal to the sum of the skb's linear data size and nonlinear data size, so the BUG_ON is triggered. This happens because the skb is cloned and a part of its nonlinear data is split off. The duplicate packet is cloned in netem_enqueue() and may be delayed for some time in the qdisc.

	When the qdisc length reaches the limit and NET_XMIT_DROP is returned, the skb will be retransmitted later from the write queue. The skb will then be fragmented by tso_fragment(): the size limit, which depends on cwnd and mss, decreases, and the skb's nonlinear data is split off. The length of the skb cloned by netem is not updated. When we use a virtio_net NIC and invoke skb_to_sgvec(), the BUG_ON triggers. To fix it, netem returns NET_XMIT_SUCCESS to the upper stack when it clones a duplicate packet. Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()") Signed-off-by: Sheng Lan <lansheng@huawei.com> Reported-by: Qin Ji <jiqin.ji@huawei.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	clk: imx8mq: add GPIO clocks to clock tree	Anson Huang
	i.MX8MQ has a clock gate for each GPIO bank; add them to the clock tree for the GPIO driver to manage. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-02-28	cxgb4vf: Enter debugging mode if FW is inaccessible	Arjun Vynipadath
	If we are not able to reach firmware, enter debugging mode that will help us to get adapter logs. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	cxgb4: Enable outer UDP checksum offload for T6	Arjun Vynipadath
	T6 adapters support outer UDP checksum offload for encapsulated packets, hence enabling netdev feature flag NETIF_F_GSO_UDP_TUNNEL_CSUM. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	cxgb4/cxgb4vf: Fix up netdev->hw_features	Arjun Vynipadath
	GRO is done by cxgb4/cxgb4vf. Hence set NETIF_F_GRO flag for both cxgb4/cxgb4vf. Cleaned up VLAN netdev features in cxgb4vf. Also fixed NETIF_F_HIGHDMA being set unconditionally for vlan netdev features. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28	arm64: avoid clang warning about self-assignment	Arnd Bergmann
	Building a preprocessed source file for arm64 now always produces a warning with clang because of the page_to_virt() macro assigning a variable to itself. Adding a new temporary variable avoids this issue. Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc") Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-28	staging: ks7010: removed custom Michael MIC implementation.	Jeremy Sowden
	Changed the driver to use the kernel's own implementation. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-28	staging: rtl8192e: Fix space and suspect issue	Oscar Gomez Fuente
	These changes fixed a checkpatch error for space required before the open brace '{' as well as a warning for suspect code indent for conditional statements. Signed-off-by: Oscar Gomez Fuente <oscargomezf@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-28	arm64: Kconfig.platforms: fix warning unmet direct dependencies	Anders Roxell
	When ARCH_MXC gets enabled, ARM64_ERRATUM_845719 will be selected and this warning will happen when COMPAT isn't set:

	  WARNING: unmet direct dependencies detected for ARM64_ERRATUM_845719
	    Depends on [n]: COMPAT [=n]
	    Selected by [y]:
	    - ARCH_MXC [=y]

	Rework to add 'if COMPAT' before ARM64_ERRATUM_845719 gets selected, since ARM64_ERRATUM_845719 depends on COMPAT. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-28	libnvdimm/btt: Fix LBA masking during 'free list' population	Vishal Verma
	The Linux BTT implementation assumes that log entries will never have the 'zero' flag set, and indeed it never sets that flag for log entries itself. However, the UEFI spec is ambiguous on the exact format of the LBA field of a log entry, specifically as to whether it should include the additional flag bits or not. While a zero bit doesn't make sense in the context of a log entry, other BTT implementations might still have it set. If an implementation does happen to have it set, we would happily read it in as the next block to write to for writes. Since a high bit is set, it pushes the block number out of the range of an 'arena', and we fail such a write with an EIO. Follow the robustness principle, and tolerate such implementations by stripping out the zero flag when populating the free list during initialization. Additionally, use the same stripped out entries for detection of incomplete writes and map restoration that happens at this stage. Add a sysfs file 'log_zero_flags' that indicates the ability to accept such a layout to userspace applications. This enables 'ndctl check-namespace' to recognize whether the kernel is able to handle zero flags, or whether it should attempt a fix-up under the --repair option. Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Dexuan Cui <decui@microsoft.com> Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
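The effect of a stray flag bit on a block number can be sketched as follows. This is not the libnvdimm code: the function names are hypothetical, and placing the 'zero' flag at bit 31 of a 32-bit LBA field is an assumption made for the example.

```c
#include <stdint.h>

/* Assumed position of the 'zero' flag within a 32-bit LBA field. */
#define LOG_FLAG_ZERO	(UINT32_C(1) << 31)

/* Drop the 'zero' flag so only the block number remains. */
uint32_t strip_zero_flag(uint32_t raw_lba)
{
	return raw_lba & (uint32_t)~LOG_FLAG_ZERO;
}

/* Return 1 if the block number lies inside an arena of arena_blocks blocks. */
int lba_in_arena(uint32_t lba, uint32_t arena_blocks)
{
	return lba < arena_blocks;
}
```

With the flag set, block 5 reads as `0x80000005` and falls outside any realistic arena; after stripping, it is back in range, which is the tolerance the message describes.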
2019-02-28	of: unittest: unflatten device tree on UML when testing	Brendan Higgins
	UML supports enabling OF, and is useful for running the device tree tests, so add support for unflattening device tree blobs so we can actually use it. Signed-off-by: Brendan Higgins <brendanhiggins@google.com> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-02-28	lib/raid6: arm: optimize away a mask operation in NEON recovery routine	Ard Biesheuvel
	The NEON recovery code was modeled after the x86 SIMD code, and for some reason, that code uses a 16 bit wide signed shift and a mask to perform what amounts to a 8 bit unsigned shift. So fold the ops together. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
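The equivalence the message relies on can be checked in scalar C on a single 16-bit lane holding two bytes. This is an illustration only, not the NEON or x86 code; the function names are hypothetical and the shift amount of 4 is an assumption.

```c
#include <stdint.h>

/* 16-bit arithmetic (signed) shift right by 4, then mask each byte. */
uint16_t shift16_then_mask(uint16_t lane)
{
	uint16_t shifted = (uint16_t)(lane >> 4);

	if (lane & 0x8000)
		shifted |= 0xf000;	/* replicate the sign bit portably */
	return shifted & 0x0f0f;
}

/* Unsigned 8-bit shift right by 4 applied to each byte separately. */
uint16_t shift8_per_byte(uint16_t lane)
{
	uint8_t lo = (uint8_t)(lane & 0xff);
	uint8_t hi = (uint8_t)(lane >> 8);

	return (uint16_t)(((hi >> 4) << 8) | (lo >> 4));
}

/* Return 1 if both forms agree for every possible 16-bit lane value. */
int lanes_agree(void)
{
	uint32_t v;

	for (v = 0; v <= 0xffff; v++)
		if (shift16_then_mask((uint16_t)v) != shift8_per_byte((uint16_t)v))
			return 0;
	return 1;
}
```

The mask discards both the bits that leak from the high byte into the low byte and the replicated sign bits, so a single per-byte unsigned shift produces the same result in one operation.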
2019-02-28	lib/raid6: use vdupq_n_u8 to avoid endianness warnings	ndesaulniers@google.com
	Clang warns: vector initializers are not compatible with NEON intrinsics in big endian mode [-Wnonportable-vector-initialization] While this is usually the case, it's not an issue for this case since we're initializing the uint8x16_t (16x uint8_t's) with the same value. Instead, use vdupq_n_u8 which both compilers lower into a single movi instruction: https://godbolt.org/z/vBrgzt This avoids the static storage for a constant value. Link: https://github.com/ClangBuiltLinux/linux/issues/214 Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-28	libnvdimm/btt: Remove unnecessary code in btt_freelist_init	Vishal Verma
	We call btt_log_read() twice, once to get the 'old' log entry, and again to get the 'new' entry. However, we have no use for the 'old' entry, so remove it. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-28	dt-bindings: Add vendor prefix for feiyang	Jagan Teki
	Add vendor prefix for feiyang, known as Shenzhen Fly Young Technology Co.,LTD., a known producer of LCD modules. Signed-off-by: Jagan Teki <jagan@amarulasolutions.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-02-28	dt-bindings: Add vendor prefix for techstar	Jagan Teki
	Add vendor prefix for techstar, known as Shenzhen Techstar Electronics Co., Ltd., a known producer of LCD modules. Signed-off-by: Jagan Teki <jagan@amarulasolutions.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-02-28	dt-bindings: display: add missing semicolon in example	Akinobu Mita
	Add missing semicolon in example for Sitronix ST7735R display panels. Cc: Rob Herring <robh@kernel.org> Cc: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Rob Herring <robh@kernel.org>
2019-02-28	of: mark early_init_dt_alloc_reserved_memory_arch static	Christoph Hellwig
	This function is only used in of_reserved_mem.c, and never overridden despite the __weak marker. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rob Herring <robh@kernel.org>
2019-02-28	of: add dtc annotations functionality to dtx_diff	Frank Rowand
	Add -T and --annotations command line arguments to dtx_diff. These arguments will be passed through to dtc. dtc will then add source location annotations to its output. Signed-off-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-02-28	dt-bindings: hwmon: Add missing documentation for lm75	Jagan Teki
	Add missing dt-binding documentation for lm75 hwmon sensor. Signed-off-by: Jagan Teki <jagan@amarulasolutions.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-02-28	arm64: io: Hook up __io_par() for inX() ordering	Will Deacon
	Ensure that inX() provides the same ordering guarantees as readX() by hooking up __io_par() so that it maps directly to __iormb(). Reported-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-28	riscv: io: Update __io_[p]ar() macros to take an argument	Will Deacon
	The definitions of the __io_[p]ar() macros in asm-generic/io.h take the value returned by the preceding I/O read as an argument so that architectures can use this to create order with a subsequent delayX() routine using a dependency. Update the riscv barrier definitions to match, although the argument is currently unused. Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-28	asm-generic/io: Pass result of I/O accessor to __io_[p]ar()	Will Deacon
	The inX() and readX() I/O accessors must enforce ordering against subsequent calls to the delay() routines, so that a read-back from a device can be used to postpone a subsequent write to the same device. On some architectures, including arm64, this ordering can only be achieved by creating a dependency on the value returned by the I/O accessor operation, so we need to pass the value we read to the __io_par() and __io_ar() macros in these cases. Acked-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-28	perf probe: Clarify error message about not finding kernel modules debuginfo	Arnaldo Carvalho de Melo
	'perf probe' supports using just the kernel module name, but that will work only when the module is loaded, or using the full pathname to the file with the DWARF debug info, but the warning was cryptic: Before: # perf probe -m cls_flower -L fl_change Failed to find the path for cls_flower: No such file or directory Error: Failed to show lines. # After: # perf probe -m cls_flower -L fl_change Module cls_flower is not loaded, please specify its full path name. Error: Failed to show lines. # perf probe -m /lib/modules/5.0.0-rc7+/kernel/net/sched/cls_flower.ko -L fl_change | head -7 <fl_change@/home/acme/git/linux/net/sched/cls_flower.c:0> 0 static int fl_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, void **arg, bool ovr, struct netlink_ext_ack *extack) 4 { 5 struct cls_fl_head *head = rtnl_dereference(tp->root); # The behaviour doesn't change when the module is loaded: # modprobe cls_flower # perf probe -m cls_flower -L fl_change | head -7 <fl_change@/home/acme/git/linux/net/sched/cls_flower.c:0> 0 static int fl_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, void **arg, bool ovr, struct netlink_ext_ack *extack) 4 { 5 struct cls_fl_head *head = rtnl_dereference(tp->root); # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-q4njvk9mshra00jacqjbzfn5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-28	perf, bpf: Consider events with attr.bpf_event as side-band events	Song Liu
	Events with attr.bpf_event set should be considered as side-band events, as they carry information about BPF programs. Signed-off-by: Song Liu <songliubraving@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-team@fb.com Cc: netdev@vger.kernel.org Fixes: 6ee52e2a3fe4 ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT") Link: http://lkml.kernel.org/r/20190226002019.3748539-2-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-28	Merge tag 'mmc-v5.0-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix NULL ptr crash for a special test case - Align max segment size with logical block size to prevent bugs in v5.1-rc1. MMC host: - cqhci: Minor fixes - tmio: Prevent interrupt storm - tmio: Fixup SD/MMC card initialization - spi: Allow card to be detected during probe - sdhci-esdhc-imx: Fixup fix for ERR004536" * tag 'mmc-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-esdhc-imx: correct the fix of ERR004536 mmc: core: align max segment size with logical block size mmc: cqhci: Fix a tiny potential memory leak on error condition mmc: cqhci: fix space allocated for transfer descriptor mmc: core: Fix NULL ptr crash from mmc_should_fail_request mmc: tmio: fix access width of Block Count Register mmc: tmio_mmc_core: don't claim spurious interrupts mmc: spi: Fix card detection during probe
2019-02-28	Merge branch 'linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a compiler warning introduced by a previous fix, as well as two crash bugs on ARM" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: sha512/arm - fix crash bug in Thumb2 build crypto: sha256/arm - fix crash bug in Thumb2 build crypto: ccree - add missing inline qualifier
2019-02-28	kvm: properly check debugfs dentry before using it	Greg Kroah-Hartman
	debugfs can now report an error code if something went wrong instead of just NULL. So if the return value is to be used as a "real" dentry, it needs to be checked if it is an error before dereferencing it. This is now happening because of ff9fb72bc077 ("debugfs: return error values, not NULL"). syzbot has found a way to trigger multiple debugfs files attempting to be created, which fails, and then the error code gets passed to dentry_path_raw() which obviously does not like it. Reported-by: Eric Biggers <ebiggers@kernel.org> Reported-and-tested-by: syzbot+7857962b4d45e602b8ad@syzkaller.appspotmail.com Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
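The check described in the commit above can be sketched in userspace. This is a minimal illustration, not the KVM patch itself: `ERR_PTR`, `PTR_ERR` and `IS_ERR` are re-created locally on the assumption that they behave like the kernel helpers of the same name, and `fake_dentry`, `fake_create_dir` and `fake_entry_name` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers (assumption:
 * same encoding as the kernel, where the top 4095 addresses carry -errno). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a debugfs directory entry. */
struct fake_dentry {
	const char *name;
};

/* Hypothetical creator: returns an encoded -EEXIST instead of NULL when the
 * entry already exists, which is the behaviour the commit has to cope with. */
static struct fake_dentry *fake_create_dir(struct fake_dentry *slot,
					   const char *name, int already_exists)
{
	if (already_exists)
		return ERR_PTR(-EEXIST);
	slot->name = name;
	return slot;
}

/* Dereference the entry only after ruling out an encoded error value. */
static const char *fake_entry_name(struct fake_dentry *d)
{
	if (IS_ERR(d))
		return NULL;
	return d->name;
}
```

A plain NULL test would let the encoded error through, because `ERR_PTR(-EEXIST)` is a non-NULL pointer value; that is why the test has to be `IS_ERR()`.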
2019-02-28	arm64: Add workaround for Fujitsu A64FX erratum 010001	Zhang Lei
	On the Fujitsu-A64FX cores ver(1.0, 1.1), memory access may cause an undefined fault (Data abort, DFSC=0b111111). This fault occurs under a specific hardware condition when a load/store instruction performs an address translation. Any load/store instruction, except non-fault access including Armv8 and SVE might cause this undefined fault. The TCR_ELx.NFD1 bit is used by the kernel when CONFIG_RANDOMIZE_BASE is enabled to mitigate timing attacks against KASLR where the kernel address space could be probed using the FFR and suppressed fault on SVE loads. Since this erratum causes spurious exceptions, which may corrupt the exception registers, we clear the TCR_ELx.NFDx=1 bits when booting on an affected CPU. Signed-off-by: Zhang Lei <zhang.lei@jp.fujitsu.com> [Generated MIDR value/mask for __cpu_setup(), removed spurious-fault handler and always disabled the NFDx bits on affected CPUs] Signed-off-by: James Morse <james.morse@arm.com> Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-28	io_uring: add io_kiocb ref count	Jens Axboe
	We'll use this for the POLL implementation. Regular requests will NOT be using references, so initialize it to 0. Any real use of the io_kiocb ref will initialize it to at least 2. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-28	io_uring: add submission polling	Jens Axboe
	This enables an application to do IO, without ever entering the kernel. By using the SQ ring to fill in new sqes and watching for completions on the CQ ring, we can submit and reap IOs without doing a single system call. The kernel side thread will poll for new submissions, and in case of HIPRI/polled IO, it'll also poll for completions. By default, we allow 1 second of active spinning. This can be changed by passing in a different grace period at io_uring_register(2) time. If the thread exceeds this idle time without having any work to do, it will set: sq_ring->flags |= IORING_SQ_NEED_WAKEUP. The application will have to call io_uring_enter() to start things back up again. If IO is kept busy, that will never be needed. Basically an application that has this feature enabled will guard its io_uring_enter(2) call with: read_barrier(); if (*sq_ring->flags & IORING_SQ_NEED_WAKEUP) io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP); instead of calling it unconditionally. It's mandatory to use fixed files with this feature. Failure to do so will result in the application getting an -EBADF CQ entry when submitting IO. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
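The guard from the commit message above can be written out as a small standalone sketch. Several things here are assumptions made for illustration: the two flag constants are defined locally and are assumed to match the uapi header, the system call is replaced by a hypothetical `enter_fn` callback so the code builds without kernel headers, and a C11 acquire fence stands in for `read_barrier()`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumption: values mirror the io_uring uapi header; defined locally so the
 * sketch does not depend on kernel headers being installed. */
#define IORING_SQ_NEED_WAKEUP   (1U << 0)
#define IORING_ENTER_SQ_WAKEUP  (1U << 1)

/* Hypothetical callback standing in for the io_uring_enter(2) system call. */
typedef int (*enter_fn)(int fd, unsigned to_submit, unsigned min_complete,
			unsigned flags);

/* Returns 1 if a wakeup was requested, 0 if the kernel thread is still polling. */
static int wake_sq_thread_if_needed(_Atomic unsigned *sq_flags, int fd,
				    enter_fn enter)
{
	unsigned flags;

	/* Stand-in for read_barrier(): order earlier ring reads before this one. */
	atomic_thread_fence(memory_order_acquire);
	flags = atomic_load_explicit(sq_flags, memory_order_relaxed);

	if (!(flags & IORING_SQ_NEED_WAKEUP))
		return 0;

	enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
	return 1;
}

/* Demonstration stub: records the flags instead of entering the kernel. */
static unsigned last_enter_flags;

static int stub_enter(int fd, unsigned to_submit, unsigned min_complete,
		      unsigned flags)
{
	(void)fd;
	(void)to_submit;
	(void)min_complete;
	last_enter_flags = flags;
	return 0;
}
```

While the poller is busy the function returns without calling `enter` at all, which is the point of the feature: the submission path stays free of system calls until the idle period expires.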
2019-02-28	io_uring: add file set registration	Jens Axboe
	We normally have to fget/fput for each IO we do on a file. Even with the batching we do, the cost of the atomic inc/dec of the file usage count adds up. This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes for the io_uring_register(2) system call. The arguments passed in must be an array of __s32 holding file descriptors, and nr_args should hold the number of file descriptors the application wishes to pin for the duration of the io_uring instance (or until IORING_UNREGISTER_FILES is called). When used, the application must set IOSQE_FIXED_FILE in the sqe->flags member. Then, instead of setting sqe->fd to the real fd, it sets sqe->fd to the index in the array passed in to IORING_REGISTER_FILES. Files are automatically unregistered when the io_uring instance is torn down. An application need only unregister if it wishes to register a new set of fds. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
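How `sqe->fd` changes meaning under `IOSQE_FIXED_FILE` can be shown with a simulated lookup. This is a sketch of the indexing rule only, not the kernel's implementation: `mini_sqe`, `fixed_files` and `resolve_fd` are hypothetical names, the flag value is defined locally on the assumption that it matches the uapi header, and returning -EBADF for an out-of-range index is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: mirrors the uapi flag value; defined locally for a standalone build. */
#define IOSQE_FIXED_FILE (1U << 0)

/* Hypothetical, trimmed-down view of a submission queue entry. */
struct mini_sqe {
	uint8_t flags;
	int32_t fd;
};

/* Hypothetical registered set: the __s32 array handed to IORING_REGISTER_FILES. */
struct fixed_files {
	const int32_t *fds;
	unsigned nr;
};

/* Returns the descriptor the request refers to, or -EBADF for a bad index. */
static int resolve_fd(const struct fixed_files *set, const struct mini_sqe *sqe)
{
	/* Without the flag, sqe->fd is an ordinary file descriptor. */
	if (!(sqe->flags & IOSQE_FIXED_FILE))
		return sqe->fd;

	/* With the flag, sqe->fd is an index into the registered array. */
	if (sqe->fd < 0 || (unsigned)sqe->fd >= set->nr)
		return -EBADF;
	return set->fds[sqe->fd];
}
```

With a registered array of `{ 7, 9, 12 }`, an sqe carrying `IOSQE_FIXED_FILE` and `fd = 1` refers to descriptor 9, the same file an unflagged sqe would name with `fd = 9`.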
2019-02-28	net: split out functions related to registering inflight socket files	Jens Axboe
	We need this functionality for the io_uring file registration, but we cannot rely on it since CONFIG_UNIX can be modular. Move the helpers to a separate file, that's always builtin to the kernel if CONFIG_UNIX is m/y. No functional changes in this patch, just moving code around. Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>