linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-11-26	Merge branch 'core-rcu-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Dynamic tick (nohz) updates, perhaps most notably changes to force the tick on when needed due to lengthy in-kernel execution on CPUs on which RCU is waiting. - Linux-kernel memory consistency model updates. - Replace rcu_swap_protected() with rcu_prepace_pointer(). - Torture-test updates. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer() net/sched: Replace rcu_swap_protected() with rcu_replace_pointer() net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer() net/core: Replace rcu_swap_protected() with rcu_replace_pointer() bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer() fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer() drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer() drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer() x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer() rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() rcu: Suppress levelspread uninitialized messages rcu: Fix uninitialized variable in nocb_gp_wait() rcu: Update descriptions for rcu_future_grace_period tracepoint rcu: Update descriptions for rcu_nocb_wake tracepoint rcu: Remove obsolete descriptions for rcu_barrier tracepoint rcu: Ensure that ->rcu_urgent_qs is set before resched IPI workqueue: Convert for_each_wq to use built-in list check rcu: Several rcu_segcblist functions can be static rcu: Remove unused function hlist_bl_del_init_rcu() Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch() ...
2019-11-26	Merge branch 'sched-core-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The biggest changes in this cycle were: - Make kcpustat vtime aware (Frederic Weisbecker) - Rework the CFS load_balance() logic (Vincent Guittot) - Misc cleanups, smaller enhancements, fixes. The load-balancing rework is the most intrusive change: it replaces the old heuristics that have become less meaningful after the introduction of the PELT metrics, with a grounds-up load-balancing algorithm. As such it's not really an iterative series, but replaces the old load-balancing logic with the new one. We hope there are no performance regressions left - but statistically it's highly probable that there is going to be some workload that is hurting from these chnages. If so then we'd prefer to have a look at that workload and fix its scheduling, instead of reverting the changes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) rackmeter: Use vtime aware kcpustat accessor leds: Use all-in-one vtime aware kcpustat accessor cpufreq: Use vtime aware kcpustat accessors for user time procfs: Use all-in-one vtime aware kcpustat accessor sched/vtime: Bring up complete kcpustat accessor sched/cputime: Support other fields on kcpustat_field() sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util() sched/fair: Add comments for group_type and balancing at SD_NUMA level sched/fair: Fix rework of find_idlest_group() sched/uclamp: Fix overzealous type replacement sched/Kconfig: Fix spelling mistake in user-visible help text sched/core: Further clarify sched_class::set_next_task() sched/fair: Use mul_u32_u32() sched/core: Simplify sched_class::pick_next_task() sched/core: Optimize pick_next_task() sched/core: Make pick_next_task_idle() more consistent sched/fair: Better document newidle_balance() leds: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM cpufreq: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM procfs: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM ...
2019-11-26	Merge branch 'efi-core-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Wire up the EFI RNG code for x86. This enables an additional source of entropy during early boot. - Enable the TPM event log code on ARM platforms. - Update Ard's email address" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: libstub/tpm: enable tpm eventlog function for ARM platforms x86: efi/random: Invoke EFI_RNG_PROTOCOL to seed the UEFI RNG table efi/random: use arch-independent efi_call_proto() MAINTAINERS: update Ard's email address to @kernel.org
2019-11-26	net: phy: dp83869: Fix return paths to return proper values	Dan Murphy
	Fix the return paths for all I/O operations to ensure that the I/O completed successfully. Then pass the return to the caller for further processing Fixes: 01db923e8377 ("net: phy: dp83869: Add TI dp83869 phy") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26	net: usbnet: Fix -Wcast-function-type	Phong Tran
	correct usage prototype of callback in tasklet_init(). Report by https://github.com/KSPP/linux/issues/20 Signed-off-by: Phong Tran <tranmanphong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26	net: hso: Fix -Wcast-function-type	Phong Tran
	correct usage prototype of callback in tasklet_init(). Report by https://github.com/KSPP/linux/issues/20 Signed-off-by: Phong Tran <tranmanphong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26	ibmvnic: Serialize device queries	Thomas Falcon
	Provide some serialization for device CRQ commands and queries to ensure that the shared variable used for storing return codes is properly synchronized. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26	ibmvnic: Bound waits for device queries	Thomas Falcon
	Create a wrapper for wait_for_completion calls with additional driver checks to ensure that the driver does not wait on a disabled device. In those cases or if the device does not respond in an extended amount of time, this will allow the driver an opportunity to recover. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26	ibmvnic: Terminate waiting device threads after loss of service	Thomas Falcon
	If we receive a notification that the device has been deactivated or removed, force a completion of all waiting threads. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26	ibmvnic: Fix completion structure initialization	Thomas Falcon
	Fix multiple calls to init_completion for device completion structures. Instead, initialize them during device probe and reinitialize them later as needed. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26	lkdtm: Add a DOUBLE_FAULT crash type on x86	Andy Lutomirski
	The DOUBLE_FAULT crash does INT $8, which is a decent approximation of a double fault. This is useful for testing the double fault handling. Use it like: Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26	sr_vendor: support Beurer GL50 evo CD-on-a-chip devices.	Diego Elio Pettenò
	The Beurer GL50 evo uses a Cygnal-manufactured CD-on-a-chip that only accepts a subset of SCSI commands, and supports neither audio commands nor generic packet commands. Actually sending those commands bring the device to an unrecoverable state that causes the device to hang and reset. To: Jens Axboe <axboe@kernel.dk> Cc: linux-kernel@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26	cdrom: respect device capabilities during opening action	Diego Elio Pettenò
	Reading the TOC only works if the device can play audio, otherwise these commands fail (and possibly bring the device to an unhealthy state.) Similarly, cdrom_mmc3_profile() should only be called if the device supports generic packet commands. To: Jens Axboe <axboe@kernel.dk> Cc: linux-kernel@vger.kernel.org Cc: linux-scsi@vger.kernel.org Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26	of: unittest: fix memory leak in attach_node_and_children	Erhard Furtner
	In attach_node_and_children memory is allocated for full_name via kasprintf. If the condition of the 1st if is not met the function returns early without freeing the memory. Add a kfree() to fix that. This has been detected with kmemleak: Link: https://bugzilla.kernel.org/show_bug.cgi?id=205327 It looks like the leak was introduced by this commit: Fixes: 5babefb7f7ab ("of: unittest: allow base devicetree to have symbol metadata") Signed-off-by: Erhard Furtner <erhard_f@mailbox.org> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-11-26	of: overlay: add_changeset_property() memory leak	Frank Rowand
	No changeset entries are created for #address-cells and #size-cells properties, but the duplicated properties are never freed. This results in a memory leak which is detected by kmemleak: unreferenced object 0x85887180 (size 64): backtrace: kmem_cache_alloc_trace+0x1fb/0x1fc __of_prop_dup+0x25/0x7c add_changeset_property+0x17f/0x370 build_changeset_next_level+0x29/0x20c of_overlay_fdt_apply+0x32b/0x6b4 ... Fixes: 6f75118800ac ("of: overlay: validate overlay properties #address-cells and #size-cells") Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Frank Rowand <frank.rowand@sony.com> Tested-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-11-26	PCI: Remove PCI_MSI_IRQ_DOMAIN architecture whitelist	Palmer Dabbelt
	The only apparent reason for the PCI_MSI_IRQ_DOMAIN architecture whitelist was that it requires msi.h. Now that msi.h is mandatory in asm-generic/Kbuild, every arch should have at least the default version, so remove the whitelist. Built for all the architectures that play nice with make.cross, but not boot tested anywhere. Link: https://lore.kernel.org/r/514e7b040be8ccd69088193aba260da1b89e919c.1571983829.git.michal.simek@xilinx.com Signed-off-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Waiman Long <longman@redhat.com>
2019-11-26	Revert "nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"	Jian-Hong Pan
	Since e045fa29e893 ("PCI/MSI: Fix incorrect MSI-X masking on resume") is merged, we can revert the previous quirk now. This reverts commit 19ea025e1d28c629b369c3532a85b3df478cc5c6. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Fixes: 19ea025e1d28 ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T") Link: https://lore.kernel.org/r/20191031093408.9322-1-jian-hong@endlessm.com Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org
2019-11-26	PCI/MSI: Fix incorrect MSI-X masking on resume	Jian-Hong Pan
	When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector Control register for each vector and saves it in desc->masked. Each register is 32 bits and bit 0 is the actual Mask bit. When we restored these registers during resume, we previously set the Mask bit if any bit in desc->masked was set instead of when the Mask bit itself was set: pci_restore_state pci_restore_msi_state __pci_restore_msix_state for_each_pci_msi_entry msix_mask_irq(entry, entry->masked) <-- entire u32 word __pci_msix_desc_mask_irq(desc, flag) mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT if (flag) <-- testing entire u32, not just bit 0 mask_bits \|= PCI_MSIX_ENTRY_CTRL_MASKBIT writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL) This means that after resume, MSI-X vectors were masked when they shouldn't be, which leads to timeouts like this: nvme nvme0: I/O 978 QID 3 timeout, completion polled On resume, set the Mask bit only when the saved Mask bit from suspend was set. This should remove the need for 19ea025e1d28 ("nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T"). [bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()] Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887 Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com Fixes: f2440d9acbe8 ("PCI MSI: Refactor interrupt masking code") Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org
2019-11-26	PCI/MSI: Move power state check out of pci_msi_supported()	Bjorn Helgaas
	27e20603c54b ("PCI/MSI: Move D0 check into pci_msi_check_device()") moved the power state check into pci_msi_check_device(), which was subsequently renamed to pci_msi_supported(). This didn't change the behavior, since both callers checked the power state. However, it doesn't fit the current "pci_msi_supported()" name, which should return what the device is capable of, independent of the power state. Move the power state check back into the callers for readability. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-26	PCI/MSI: Remove unused pci_irq_get_node()	Greg Kroah-Hartman
	The function pci_irq_get_node() is not used by anyone in the tree, so just delete it. Link: https://lore.kernel.org/r/20191014100452.GA6699@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com>
2019-11-26	clk: aspeed: Add RMII RCLK gates for both AST2500 MACs	Andrew Jeffery
	RCLK is a fixed 50MHz clock derived from HPLL that is described by a single gate for each MAC. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lkml.kernel.org/r/20191010020655.3776-3-andrew@aj.id.au Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-11-27	nvme-fc: fix double-free scenarios on hw queues	James Smart
	If an error occurs on one of the ios used for creating an association, the creating routine has error paths that are invoked by the command failure and the error paths will free up the controller resources created to that point. But... the io was ultimately determined by an asynchronous completion routine that detected the error and which unconditionally invokes the error_recovery path which calls delete_association. Delete association deletes all outstanding io then tears down the controller resources. So the create_association thread can be running in parallel with the error_recovery thread. What was seen was the LLDD received a call to delete a queue, causing the LLDD to do a free of a resource, then the transport called the delete queue again causing the driver to repeat the free call. The second free routine corrupted the allocator. The transport shouldn't be making the duplicate call, and the delete queue is just one of the resources being freed. To fix, it is realized that the create_association path is completely serialized with one command at a time. So the failed io completion will always be seen by the create_association path and as of the failure, there are no ios to terminate and there is no reason to be manipulating queue freeze states, etc. The serialized condition stays true until the controller is transitioned to the LIVE state. Thus the fix is to change the error recovery path to check the controller state and only invoke the teardown path if not already in the CONNECTING state. Reviewed-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-27	nvme: else following return is not needed	Edmund Nadolski
	Remove unnecessary keyword in nvme_create_queue(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-27	nvme: add error message on mismatching controller ids	James Smart
	We've seen a few devices that return different controller id's to the Fabric Connect command vs the Identify(controller) command. It's currently hard to identify this failure by existing error messages. It comes across as a (re)connect attempt in the transport that fails with a -22 (-EINVAL) status. The issue is compounded by older kernels not having the controller id check or had the identify command overwrite the fabrics controller id value before it checked. Both resulted in cases where the devices appeared fine until more recent kernels. Clarify the reject by adding an error message on controller id mismatches. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-27	nvme_fc: add module to ops template to allow module references	James Smart
	In nvme-fc: it's possible to have connected active controllers and as no references are taken on the LLDD, the LLDD can be unloaded. The controller would enter a reconnect state and as long as the LLDD resumed within the reconnect timeout, the controller would resume. But if a namespace on the controller is the root device, allowing the driver to unload can be problematic. To reload the driver, it may require new io to the boot device, and as it's no longer connected we get into a catch-22 that eventually fails, and the system locks up. Fix this issue by taking a module reference for every connected controller (which is what the core layer did to the transport module). Reference is cleared when the controller is removed. Acked-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-26	Merge branch 'x86-hyperv-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hyperv updates from Ingo Molnar: "Misc updates to the hyperv guest code: - Rework clockevents initialization to better support hibernation - Allow guests to enable InvariantTSC - Micro-optimize send_ipi_one" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Initialize clockevents earlier in CPU onlining x86/hyperv: Allow guests to enable InvariantTSC x86/hyperv: Micro-optimize send_ipi_one()
2019-11-26	amdgpu: Enable KFD on POWER systems	Timothy Pearson
	KFD has been verified to function on POWER systems (Talos II / Vega 64). It should be available as a kernel configuration option on these systems. Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-26	drm/amdgpu: Optimize KFD page table reservation	Felix Kuehling
	Be less pessimistic about estimated page table use for KFD. Most allocations use 2MB pages and therefore need less VRAM for page tables. This allows more VRAM to be used for applications especially on large systems with many GPUs and hundreds of GB of system memory. Example: 8 GPUs with 32GB VRAM each + 256GB system memory = 512GB Old page table reservation per GPU: 1GB New page table reservation per GPU: 32MB Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-26	drm/amdgpu: flag vram lost on baco reset for VI/CIK	Alex Deucher
	VI/CIK BACO was inflight when this fix landed for SOC15/NV. Add the fix to VI/CIK as well. Acked-by: Evan Quan <evan.quan@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-26	drm/amdgpu: Resolved offchip EEPROM I/O issue	John Clements
	Updated target I2C address Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-26	drm/amd/display: add default clocks if not able to fetch them	Alex Deucher
	dm_pp_get_clock_levels_by_type needs to add the default clocks to the powerplay case as well. This was accidently dropped. Fixes: b3ea88fef321de ("drm/amd/powerplay: add get_clock_by_type interface for display") Bug: https://gitlab.freedesktop.org/drm/amd/issues/906 Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-11-27	nvmet-loop: Avoid preallocating big SGL for data	Israel Rukshin
	nvme_loop_create_io_queues() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments so the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvmet-loop, nr_hw_queues can be 128 and each queue's depth 128. This means the resulting preallocation for the data SGL is 1281284K = 64MB per controller. Switch to runtime allocation for SGL for lists longer than 2 entries. This is the approach used by NVMe PCI so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path so this is nothing new. Tested-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-27	nvme-fc: Avoid preallocating big SGL for data	Israel Rukshin
	nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments so the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be 128 and each queue's depth 128. This means the resulting preallocation for the data SGL is 1281284K = 64MB per controller. Switch to runtime allocation for SGL for lists longer than 2 entries. This is the approach used by NVMe PCI so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path so this is nothing new. Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-27	nvme-rdma: Avoid preallocating big SGL for data	Israel Rukshin
	nvme_rdma_alloc_tagset() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments so the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvme-rdma, nr_hw_queues can be 128 and each queue's depth 128. This means the resulting preallocation for the data SGL is 1281284K = 64MB per controller. Switch to runtime allocation for SGL for lists longer than 2 entries. This is the approach used by NVMe PCI so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path so this is nothing new. The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't support SG_CHAIN, use only runtime allocation for the SGL. We didn't notice of a performance degradation, since for small IOs we'll use the inline SG and for the bigger IOs the allocation of a bigger SGL from slab is fast enough. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-11-26	dm mpath: remove harmful bio-based optimization	Mike Snitzer
	Removes the branching for edge-case where no SCSI device handler exists. The __map_bio_fast() method was far too limited, by only selecting a new pathgroup or path IFF there was a path failure, fix this be eliminating it in favor of __map_bio(). __map_bio()'s extra SCSI device handler specific MPATHF_PG_INIT_REQUIRED test is not in the fast path anyway. This change restores full path selector functionality for bio-based configurations that don't haave a SCSI device handler. But it should be noted that the path selectors do have an impact on performance for certain networks that are extremely fast (and don't require frequent switching). Fixes: 8d47e65948dd ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks") Cc: stable@vger.kernel.org Reported-by: Drew Hastings <dhastings@crucialwebhost.com> Suggested-by: Martin Wilck <mwilck@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-26	Merge branch 'patchwork' into v4l_for_linus	Mauro Carvalho Chehab
	* patchwork: (360 commits) media: Revert "media: mtk-vcodec: Remove extra area allocation in an input buffer on encoding" media: hantro: Set H264 FIELDPIC_FLAG_E flag correctly media: hantro: Remove now unused H264 pic_size media: hantro: Use output buffer width and height for H264 decoding media: hantro: Reduce H264 extra space for motion vectors media: hantro: Fix H264 motion vector buffer offset media: ti-vpe: vpe: fix compatible to match bindings media: dt-bindings: media: ti-vpe: Document VPE driver media: zr364xx: remove redundant assigmnent to idx, clean up code media: Documentation: media: *_DEFAULT targets for subdevs media: hantro: Fix s_fmt for dynamic resolution changes media: i2c: Use the correct style for SPDX License Identifier media: siano: Use the correct style for SPDX License Identifier media: vicodec: media_device_cleanup was called too early media: vim2m: media_device_cleanup was called too early media: cedrus: Increase maximum supported size media: cedrus: Fix H264 4k support media: cedrus: Properly signal size in mode register media: v4l2-ctrl: Lock main_hdl on operations of requests_queued. media: si470x-i2c: add missed operations in remove ...
2019-11-26	firmware: arm_scmi: Avoid double free in error flow	Wen Yang
	If device_register() fails, both put_device() and kfree() are called, ending with a double free of the scmi_dev. Calling kfree() is needed only when a failure happens between the allocation of the scmi_dev and its registration, so move it to there and remove it from the error flow. Fixes: 46edb8d1322c ("firmware: arm_scmi: provide the mandatory device release callback") Signed-off-by: Wen Yang <wenyang@linux.alibaba.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2019-11-26	PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer	Dexuan Cui
	With the recent 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), kzalloc() is able to allocate a 4KB buffer that is guaranteed to be 4KB-aligned. Here the size and alignment of hbus is important because hbus's field retarget_msi_interrupt_params must not cross a 4KB page boundary. Here we prefer kzalloc to get_zeroed_page(), because a buffer allocated by the latter is not tracked and scanned by kmemleak, and hence kmemleak reports the pointer contained in the hbus buffer (i.e. the hpdev struct, which is created in new_pcichild_device() and is tracked by hbus->children) as memory leak (false positive). If the kernel doesn't have 59bb47985c1d, get_zeroed_page() must be used to allocate the hbus buffer and we can avoid the kmemleak false positive by using kmemleak_alloc() and kmemleak_free() to ask kmemleak to track and scan the hbus buffer. Reported-by: Lili Deng <v-lide@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
2019-11-26	PCI: hv: Change pci_protocol_version to per-hbus	Dexuan Cui
	A VM can have multiple Hyper-V hbus. It's incorrect to set the global variable 'pci_protocol_version' when every hbus is initialized in hv_pci_protocol_negotiation(). This is not an issue in practice since every hbus should have the same value of hbus->protocol_version, but we should make the variable per-hbus, so in case we have busses with different protocol versions, the driver can still work correctly. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
2019-11-26	PCI: hv: Add hibernation support	Dexuan Cui
	Add suspend() and resume() functions so that Hyper-V virtual PCI devices are handled properly when the VM hibernates and resumes from hibernation. Note that the suspend() function must make sure there are no pending work items before calling vmbus_close(), since it runs in a process context as a callback in dpm_suspend(). When it starts to run, the channel callback hv_pci_onchannelcallback(), which runs in a tasklet context, can be still running concurrently and scheduling new work items onto hbus->wq in hv_pci_devices_present() and hv_pci_eject_device(), and the work item handlers can access the vmbus channel, which can be being closed by hv_pci_suspend(), e.g. the work item handler pci_devices_present_work() -> new_pcichild_device() writes to the vmbus channel. To eliminate the race, hv_pci_suspend() disables the channel callback tasklet, sets hbus->state to hv_pcibus_removing, and re-enables the tasklet. This way, when hv_pci_suspend() proceeds, it knows that no new work item can be scheduled, and then it flushes hbus->wq and safely closes the vmbus channel. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
2019-11-26	PCI: hv: Reorganize the code in preparation of hibernation	Dexuan Cui
	There is no functional change. This is just preparatory for a later patch which adds the hibernation support for the pci-hyperv driver. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
2019-11-26	Merge branch 'acpi-mm'	Rafael J. Wysocki
	* acpi-mm: ACPI: HMAT: use %u instead of %d to print u32 values ACPI: NUMA: HMAT: fix a section mismatch ACPI: HMAT: don't mix pxm and nid when setting memory target processor_pxm ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device ACPI: NUMA: HMAT: Register HMAT at device_initcall level device-dax: Add a driver for "hmem" devices dax: Fix alloc_dax_region() compile warning lib: Uplevel the pmem "region" ida to a global allocator x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP arm/efi: EFI soft reservation to memblock x86/efi: EFI soft reservation to E820 enumeration efi: Common enable/disable infrastructure for EFI soft reservation x86/efi: Push EFI_MEMMAP check into leaf routines efi: Enumerate EFI_MEMORY_SP ACPI: NUMA: Establish a new drivers/acpi/numa/ directory
2019-11-26	Merge branches 'acpi-utils', 'acpi-platform', 'acpi-video' and 'acpi-doc'	Rafael J. Wysocki
	* acpi-utils: iommu/amd: Switch to use acpi_dev_hid_uid_match() mmc: sdhci-acpi: Switch to use acpi_dev_hid_uid_match() ACPI / LPSS: Switch to use acpi_dev_hid_uid_match() ACPI / utils: Introduce acpi_dev_hid_uid_match() helper ACPI / utils: Move acpi_dev_get_first_match_dev() under CONFIG_ACPI ACPI / utils: Describe function parameters in kernel-doc * acpi-platform: ACPI: platform: Unregister stale platform devices ACPI: Always build evged in * acpi-video: ACPI: video: update doc for acpi_video_bus_DOS() * acpi-doc: ACPI: Documentation: Minor spelling fix in namespace.rst
2019-11-26	Merge branches 'acpi-ec', 'acpi-soc', 'acpi-pmic' and 'acpi-button'	Rafael J. Wysocki
	* acpi-ec: ACPI: EC: add support for hardware-reduced systems ACPI: EC: tweak naming in preparation for GpioInt support * acpi-soc: ACPI: LPSS: Add dmi quirk for skipping _DEP check for some device-links ACPI: LPSS: Add LNXVIDEO -> BYT I2C1 to lpss_device_links ACPI: LPSS: Add LNXVIDEO -> BYT I2C7 to lpss_device_links * acpi-pmic: ACPI / PMIC: Add Cherry Trail Crystal Cove PMIC OpRegion driver ACPI / PMIC: Add byt prefix to Crystal Cove PMIC OpRegion driver ACPI / PMIC: Do not register handlers for unhandled OpRegions * acpi-button: ACPI: button: Remove unused acpi_lid_notifier_[un]register() functions ACPI: button: Add DMI quirk for Asus T200TA ACPI: button: Add DMI quirk for Medion Akoya E2215T ACPI: button: Turn lid_blacklst DMI table into a generic quirk table ACPI: button: Allow disabling LID support with the lid_init_state module option ACPI: button: Refactor lid_init_state module parsing code
2019-11-26	Merge branch 'acpica'	Rafael J. Wysocki
	* acpica: ACPICA: Update version to 20191018 ACPICA: debugger: remove leading whitespaces when converting a string to a buffer ACPICA: acpiexec: initialize all simple types and field units from user input ACPICA: debugger: add field unit support for acpi_db_get_next_token ACPICA: debugger: surround field unit output with braces '{' ACPICA: debugger: add command to dump all fields of particular subtype ACPICA: utilities: add flag to only display data when dumping buffers ACPICA: make acpi_load_table() return table index ACPICA: Add new external interface, acpi_unload_table() ACPICA: More Clang changes ACPICA: Win OSL: Replace get_tick_count with get_tick_count64 ACPICA: Results from Clang
2019-11-26	Merge branches 'pm-avs', 'pm-docs' and 'pm-tools'	Rafael J. Wysocki
	* pm-avs: ARM: OMAP2+: SmartReflex: add omap_sr_pdata definition power: avs: smartreflex: Remove superfluous cast in debugfs_create_file() call * pm-docs: PM: Wrap documentation to fit in 80 columns * pm-tools: cpupower: ToDo: Update ToDo with ideas for per_cpu_schedule handling cpupower: mperf_monitor: Update cpupower to use the RDPRU instruction cpupower: mperf_monitor: Introduce per_cpu_schedule flag cpupower: Move needs_root variable into a sub-struct cpupower : Handle set and info subcommands correctly pm-graph info added to MAINTAINERS tools/power/cpupower: Fix initializer override in hsw_ext_cstates
2019-11-26	Merge branches 'pm-sleep', 'pm-domains', 'pm-opp' and 'powercap'	Rafael J. Wysocki
	* pm-sleep: PM / wakeirq: remove unnecessary parentheses PM / core: Clean up some function headers in power.h PM / hibernate: memory_bm_find_bit(): Tighten node optimisation * pm-domains: PM / Domains: Convert to dev_to_genpd_safe() in genpd_syscore_switch() mmc: tmio: Avoid boilerplate code in ->runtime_suspend() PM / Domains: Implement the ->start() callback for genpd PM / Domains: Introduce dev_pm_domain_start() * pm-opp: PM / OPP: Support adjusting OPP voltages at runtime * powercap: powercap/intel_rapl: add support for Cometlake desktop powercap/intel_rapl: add support for CometLake Mobile
2019-11-26	Merge branch 'pm-devfreq'	Rafael J. Wysocki
	* pm-devfreq: (26 commits) PM / devfreq: tegra30: Tune up MCCPU boost-down coefficient PM / devfreq: tegra30: Support variable polling interval PM / devfreq: Add new interrupt_driven flag for governors PM / devfreq: tegra30: Use kHz units for dependency threshold PM / devfreq: tegra30: Disable consecutive interrupts when appropriate PM / devfreq: tegra30: Don't enable already enabled consecutive interrupts PM / devfreq: tegra30: Include appropriate header PM / devfreq: tegra30: Constify structs PM / devfreq: tegra30: Don't enable consecutive-down interrupt on startup PM / devfreq: tegra30: Reset boosting on startup PM / devfreq: tegra30: Move clk-notifier's registration to governor's start PM / devfreq: tegra30: Use CPUFreq notifier PM / devfreq: tegra30: Use kHz units uniformly in the code PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out PM / devfreq: tegra30: Drop write-barrier PM / devfreq: tegra30: Handle possible round-rate error PM / devfreq: tegra30: Keep interrupt disabled while governor is stopped PM / devfreq: tegra30: Change irq type to unsigned int PM / devfreq: exynos-ppmu: remove useless assignment PM / devfreq: Lock devfreq in trans_stat_show ...
2019-11-26	Merge branch 'pm-cpufreq'	Rafael J. Wysocki
	* pm-cpufreq: (23 commits) cpufreq: Register drivers only after CPU devices have been registered cpufreq: Add NULL checks to show() and store() methods of cpufreq cpufreq: intel_pstate: Fix plain int as pointer warning from sparse cpufreq: sun50i: Fix CPU speed bin detection cpufreq: powernv: fix stack bloat and hard limit on number of CPUs cpufreq: Clarify the comment in cpufreq_set_policy() cpufreq: vexpress-spc: find and skip duplicates when merging frequencies cpufreq: vexpress-spc: use macros instead of hardcoded values for cluster ids cpufreq: s3c64xx: Remove pointless NULL check in s3c64xx_cpufreq_driver_init cpufreq: imx-cpufreq-dt: Correct i.MX8MN's default speed grade value cpufreq: vexpress-spc: fix some coding style issues cpufreq: vexpress-spc: remove lots of debug messages cpufreq: vexpress-spc: drop unnessary cpufreq_arm_bL_ops abstraction cpufreq: merge arm_big_little and vexpress-spc cpufreq: scpi: remove stale/outdated comment about the driver ARM: dts: Add OPP-V2 table for AM3517 cpufreq: ti-cpufreq: Add support for AM3517 ARM: dts: omap36xx: using OPP1G needs to control the abb_ldo cpufreq: ti-cpufreq: omap36xx use "cpu0","vbb" if run in multi_regulator mode ARM: dts: omap3: bulk convert compatible to be explicitly ti,omap3430 or ti,omap3630 or ti,am3517 ...
2019-11-26	Merge branch 'pm-cpuidle'	Rafael J. Wysocki
	* pm-cpuidle: cpuidle: Pass exit latency limit to cpuidle_use_deepest_state() cpuidle: Allow idle injection to apply exit latency limit cpuidle: Introduce cpuidle_driver_state_disabled() for driver quirks cpuidle: teo: Avoid code duplication in conditionals cpuidle: teo: Avoid using "early hits" incorrectly cpuidle: teo: Exclude cpuidle overhead from computations cpuidle: Use nanoseconds as the unit of time cpuidle: Consolidate disabled state checks ACPI: processor_idle: Skip dummy wait if kernel is in guest cpuidle: Do not unset the driver if it is there already cpuidle: teo: Fix "early hits" handling for disabled idle states cpuidle: teo: Consider hits and misses metrics of disabled states cpuidle: teo: Rename local variable in teo_select() cpuidle: teo: Ignore disabled idle states that are too deep