linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2019-11-24	powerpc: Add const qual to local_read() parameter	Eric Dumazet
	A patch in net-next triggered a compile error on powerpc: include/linux/u64_stats_sync.h: In function 'u64_stats_read': include/asm-generic/local64.h:30:37: warning: passing argument 1 of 'local_read' discards 'const' qualifier from pointer target type This seems reasonable to relax powerpc local_read() requirements. Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build only Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	Merge branch 'bnxt_en-Updates'	Jakub Kicinski
	Michael Chan says: ==================== bnxt_en: Updates. This patchset contains these main features: 1. Add the proper logic to support suspend/resume on the new 57500 chips. 2. Allow Phy configurations from user on a Multihost function if supported by fw. 3. devlink NVRAM flashing support. 4. Add a couple of chip IDs, PHY loopback enhancement, and provide more RSS contexts to VFs. v2: Dropped the devlink info patches to address some feedback and resubmit for the 5.6 kernel. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Add support for flashing the device via devlink	Vasundhara Volam
	Use the same bnxt_flash_package_from_file() function to support devlink flash operation. Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Allow PHY settings on multi-function or NPAR PFs if allowed by FW.	Michael Chan
	Currently, the driver does not allow PHY settings on a multi-function or NPAR NIC whose port is shared by more than one function. Newer firmware now allows PHY settings on some of these NICs. Check for this new firmware setting and allow the user to set the PHY settings accordingly. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Add async. event logic for PHY configuration changes.	Michael Chan
	If the link settings have been changed by another function sharing the port, firmware will send us an async. message. In response, we will call the new bnxt_init_ethtool_link_settings() function to update the current settings. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Refactor the initialization of the ethtool link settings.	Michael Chan
	Refactor this logic in bnxt_probe_phy() into a separate function bnxt_init_ethtool_link_settings(). It used to be that the settable link settings will never be changed without going through ethtool. So we only needed to do this once in bnxt_probe_phy(). Now, another function sharing the port may change it and we may need to re-initialize the ethtool settings again in run-time. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Skip disabling autoneg before PHY loopback when appropriate.	Michael Chan
	New firmware allows PHY loopback to be set without disabling autoneg first. Check this capability and skip disabling autoneg when it is supported by firmware. Using this scheme, loopback will always work even if the PHY only supports autoneg. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Assign more RSS context resources to the VFs.	Michael Chan
	The driver currently only assignes 1 RSS context to each VF. This works for the Linux VF driver. But other drivers, such as DPDK, can make use of additional RSS contexts. Modify the code to divide up and assign RSS contexts to VFs just like other resources. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Initialize context memory to the value specified by firmware.	Michael Chan
	Some chips that need host context memory as a backing store requires the memory to be initialized to a non-zero value. Query the value from firmware and initialize the context memory accordingly. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Fix suspend/resume path on 57500 chips	Vasundhara Volam
	Driver calls HWRM_FUNC_RESET firmware call while resuming the device which clears the context memory backing store. Because of which allocating firmware resources would eventually fail. Fix it by freeing all context memory during suspend and reallocate the memory during resume. Call bnxt_hwrm_queue_qportcfg() in resume path. This firmware call is needed on the 57500 chips so that firmware will set up the proper queue mapping in relation to the context memory. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Send FUNC_RESOURCE_QCAPS command in bnxt_resume()	Vasundhara Volam
	After driver unregister, firmware is erasing the information that driver supports new resource management. Send FUNC_RESOURCE_QCAPS command to inform the firmware that driver supports new resource management while resuming from hibernation. Otherwise, we fallback to the older resource allocation scheme. Also, move driver register after sending FUNC_RESOURCE_QCAPS command to be consistent with the normal initialization sequence. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Combine 2 functions calling the same HWRM_DRV_RGTR fw command.	Vasundhara Volam
	Everytime driver registers with firmware, driver is required to register for async event notifications as well. These 2 calls are done using the same firmware command and can be combined. We are also missing the 2nd step to register for async events in the suspend/resume path and this will fix it. Prior to this, we were getting only default notifications. ULP can register for additional async events for the RDMA driver, so we add a parameter to the new function to only do step 2 when it is called from ULP. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Do driver unregister cleanup in bnxt_init_one() failure path.	Vasundhara Volam
	In the bnxt_init_one() failure path, if the driver has already called firmware to register the driver, it is not undoing the driver registration. Add this missing step to unregister for correctness, so that the firmware knows that the driver has unloaded. Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Disable/enable Bus master during suspend/resume.	Michael Chan
	Disable Bus master during suspend to prevent DMAs after the device goes into D3hot state. The new 57500 devices may continue to DMA from context memory after the system goes into D3hot state. This may cause some PCIe errors on some system. Re-enable it during resume. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	bnxt_en: Add chip IDs for 57452 and 57454 chips.	Michael Chan
	Fix BNXT_CHIP_NUM_5645X() to include 57452 and 56454 chip IDs, so that these chips will be properly classified as P4 chips to take advantage of the P4 fixes and features. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	sfc: fix build without CONFIG_RFS_ACCEL	Jakub Kicinski
	The rfs members of struct efx_channel are under CONFIG_RFS_ACCEL. Ethtool stats which access those need to be as well. Reported-by: kbuild test robot <lkp@intel.com> Fixes: ca70bd423f10 ("sfc: add statistics for ARFS") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-24	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull cramfs fix from Al Viro: "Regression fix, fallen through the cracks" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cramfs: fix usage on non-MTD device
2019-11-24	xen: Fix Kconfig indentation	Krzysztof Kozlowski
	Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-11-24	ALSA: aloop: Fix dependency on timer API	Takashi Iwai
	An explicit Kconfig dependency is missing for the recent addition of the timer support. CONFIG_SND_TIMER isn't always selected by SND_PCM. Fixes: 26c53379f98d ("ALSA: aloop: Support selection of snd_timer instead of jiffies") Reported-by: kbuild test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20191124083924.14049-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-11-24	erofs: remove unnecessary output in erofs_show_options()	Chengguang Xu
	We have already handled cache_strategy option carefully, so incorrect setting could not pass option parsing. Meanwhile, print 'cache_strategy=(unknown)' can cause failure on remount. Link: https://lore.kernel.org/r/20191119115049.3401-1-cgxu519@mykernel.net Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2019-11-24	erofs: drop all vle annotations for runtime names	Gao Xiang
	VLE was an old informal name of fixed-sized output compression which came from published ATC'19 paper [1]. Drop those old annotations since erofs can handle all encoded clusters in block-aligned basis, which is wider than fixed-sized output compression after larger clustersize feature is fully implemented. Unaligned encoding won't be considered in EROFS since it's not friendly to inplace I/O and perhaps decompression inplace. a) Fixed-sized output compression with 16KB pcluster: ___________________________________ \|xxxxxxxx\|xxxxxxxx\|xxxxxxxx\|xxxxxxxx\| \|___ 0___\|___ 1___\|___ 2___\|___ 3___\| physical blocks b) Block-aligned fixed-sized input compression with 16KB pcluster: ___________________________________ \|xxxxxxxx\|xxxxxxxx\|xxxxxxxx\|xxx00000\| \|___ 0___\|___ 1___\|___ 2___\|___ 3___\| physical blocks c) Block-unaligned fixed-sized input compression with 16KB compression unit: ____________________________________________ \|..xxxxxx\|xxxxxxxx\|xxxxxxxx\|xxxxxxxx\|x.......\| \|___ 0___\|___ 1___\|___ 2___\|___ 3___\|___ 4___\| physical blocks Refine better names for those as well. [1] https://www.usenix.org/conference/atc19/presentation/gao Link: https://lore.kernel.org/r/20191108033733.63919-1-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2019-11-24	erofs: support superblock checksum	Pratik Shinde
	Introduce superblock checksum feature in order to check at mounting time. Note that the first 1024 bytes are ignore for x86 boot sectors and other oddities. Link: https://lore.kernel.org/r/20191104024937.113939-1-gaoxiang25@huawei.com Signed-off-by: Pratik Shinde <pratikshinde320@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2019-11-24	erofs: set iowait for sync decompression	Gao Xiang
	For those tasks waiting I/O for sync decompression, they should be better marked as IO wait state. Link: https://lore.kernel.org/r/20191008125616.183715-5-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2019-11-24	erofs: clean up decompress queue stuffs	Gao Xiang
	Previously, both z_erofs_unzip_io and z_erofs_unzip_io_sb record decompress queues for backend to use. The only difference is that z_erofs_unzip_io is used for on-stack sync decompression so that it doesn't have a super block field (since the caller can pass it in its context), but it increases complexity with only a pointer saving. Rename z_erofs_unzip_io to z_erofs_decompressqueue with a fixed super_block member and kill the other entirely, and it can fallback to sync decompression if memory allocation failure. Link: https://lore.kernel.org/r/20191008125616.183715-4-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2019-11-24	erofs: get rid of __stagingpage_alloc helper	Gao Xiang
	Now open code is much cleaner due to iterative development. Link: https://lore.kernel.org/r/20191124025217.12345-1-hsiangkao@aol.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
2019-11-23	cramfs: fix usage on non-MTD device	Maxime Bizon
	When both CONFIG_CRAMFS_MTD and CONFIG_CRAMFS_BLOCKDEV are enabled, if we fail to mount on MTD, we don't try on block device. Note: this relies upon cramfs_mtd_fill_super() leaving no side effects on fc state in case of failure; in general, failing get_tree_...() does not mean "fine to try again"; e.g. parsed options might've been consumed by fill_super callback and freed on failure. Fixes: 74f78fc5ef43 ("vfs: Convert cramfs to use the new mount API") Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-11-23	hv_netvsc: make recording RSS hash depend on feature flag	Stephen Hemminger
	The recording of RSS hash should be controlled by NETIF_F_RXHASH. Fixes: 1fac7ca4e63b ("hv_netvsc: record hardware hash in skb") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23	sctp: cache netns in sctp_ep_common	Xin Long
	This patch is to fix a data-race reported by syzbot: BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1: sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091 sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465 sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916 inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734 __sys_accept4+0x224/0x430 net/socket.c:1754 __do_sys_accept net/socket.c:1795 [inline] __se_sys_accept net/socket.c:1792 [inline] __x64_sys_accept+0x4e/0x60 net/socket.c:1792 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0: sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894 rht_key_get_hash include/linux/rhashtable.h:133 [inline] rht_key_hashfn include/linux/rhashtable.h:159 [inline] rht_head_hashfn include/linux/rhashtable.h:174 [inline] head_hashfn lib/rhashtable.c:41 [inline] rhashtable_rehash_one lib/rhashtable.c:245 [inline] rhashtable_rehash_chain lib/rhashtable.c:276 [inline] rhashtable_rehash_table lib/rhashtable.c:316 [inline] rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 It was caused by rhashtable access asoc->base.sk when sctp_assoc_migrate is changing its value. However, what rhashtable wants is netns from asoc base.sk, and for an asoc, its netns won't change once set. So we can simply fix it by caching netns since created. Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable") Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23	sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook	Navid Emamdoost
	In the implementation of sctp_sf_do_5_2_4_dupcook() the allocated new_asoc is leaked if security_sctp_assoc_request() fails. Release it via sctp_association_free(). Fixes: 2277c7cd75e3 ("sctp: Add LSM hooks") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23	net: gro: use vlan API instead of accessing directly	Tonghao Zhang
	Use vlan common api to access the vlan_tag info. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23	Merge tag 'mlx5-updates-2019-11-22' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-11-22 1) Misc Cleanups 2) Software steering support for Geneve ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23	net: phylink: rename mac_link_state() op to mac_pcs_get_state()	Russell King
	Rename the mac_link_state() method to mac_pcs_get_state() to make it clear that it should be returning the MACs PCS current state, which is used for inband negotiation rather than just reading back what the MAC has been configured for. Update the documentation to explicitly mention that this is for inband. We drop the return value as well; most of phylink doesn't check the return value and it is not clear what it should do on error - instead arrange for state->link to be false. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-23	mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap	Christoph Hellwig
	These two functions have never been used since they were added. Link: https://lore.kernel.org/r/20191113134528.21187-1-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	mm/hmm: make full use of walk_page_range()	Ralph Campbell
	hmm_range_fault() calls find_vma() and walk_page_range() in a loop. This is unnecessary duplication since walk_page_range() calls find_vma() in a loop already. Simplify hmm_range_fault() by defining a walk_test() callback function to filter unhandled vmas. This also fixes a bug where hmm_range_fault() was not checking start >= vma->vm_start before checking vma->vm_flags so hmm_range_fault() could return an error based on the wrong vma for the requested range. It also fixes a bug when the vma has no read access and the caller did not request a fault, there shouldn't be any error return code. Link: https://lore.kernel.org/r/20191104222141.5173-2-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	xen/gntdev: use mmu_interval_notifier_insert	Jason Gunthorpe
	gntdev simply wants to monitor a specific VMA for any notifier events, this can be done straightforwardly using mmu_interval_notifier_insert() over the VMA's VA range. The notifier should be attached until the original VMA is destroyed. It is unclear if any of this is even sane, but at least a lot of duplicate code is removed. Link: https://lore.kernel.org/r/20191112202231.3856-15-jgg@ziepe.ca Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	mm/hmm: remove hmm_mirror and related	Jason Gunthorpe
	The only two users of this are now converted to use mmu_interval_notifier, delete all the code and update hmm.rst. Link: https://lore.kernel.org/r/20191112202231.3856-14-jgg@ziepe.ca Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror	Jason Gunthorpe
	Convert the collision-retry lock around hmm_range_fault to use the one now provided by the mmu_interval notifier. Although this driver does not seem to use the collision retry lock that hmm provides correctly, it can still be converted over to use the mmu_interval_notifier api instead of hmm_mirror without too much trouble. This also deletes another place where a driver is associating additional data (struct amdgpu_mn) with a mmu_struct. Link: https://lore.kernel.org/r/20191112202231.3856-13-jgg@ziepe.ca Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror	Jason Gunthorpe
	Remove the interval tree in the driver and rely on the tree maintained by the mmu_notifier for delivering mmu_notifier invalidation callbacks. For some reason amdgpu has a very complicated arrangement where it tries to prevent duplicate entries in the interval_tree, this is not necessary, each amdgpu_bo can be its own stand alone entry. interval_tree already allows duplicates and overlaps in the tree. Also, there is no need to remove entries upon a release callback, the mmu_interval API safely allows objects to remain registered beyond the lifetime of the mm. The driver only has to stop touching the pages during release. Link: https://lore.kernel.org/r/20191112202231.3856-12-jgg@ziepe.ca Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	drm/amdgpu: Call find_vma under mmap_sem	Jason Gunthorpe
	find_vma() must be called under the mmap_sem, reorganize this code to do the vma check after entering the lock. Further, fix the unlocked use of struct task_struct's mm, instead use the mm from hmm_mirror which has an active mm_grab. Also the mm_grab must be converted to a mm_get before acquiring mmap_sem or calling find_vma(). Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers") Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads") Link: https://lore.kernel.org/r/20191112202231.3856-11-jgg@ziepe.ca Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	nouveau: use mmu_interval_notifier instead of hmm_mirror	Jason Gunthorpe
	Remove the hmm_mirror object and use the mmu_interval_notifier API instead for the range, and use the normal mmu_notifier API for the general invalidation callback. While here re-organize the pagefault path so the locking pattern is clear. nouveau is the only driver that uses a temporary range object and instead forwards nearly every invalidation range directly to the HW. While this is not how the mmu_interval_notifier was intended to be used, the overheads on the pagefaulting path are similar to the existing hmm_mirror version. Particularly since the interval tree will be small. Link: https://lore.kernel.org/r/20191112202231.3856-10-jgg@ziepe.ca Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	nouveau: use mmu_notifier directly for invalidate_range_start	Jason Gunthorpe
	There is no reason to get the invalidate_range_start() callback via an indirection through hmm_mirror, just register a normal notifier directly. Link: https://lore.kernel.org/r/20191112202231.3856-9-jgg@ziepe.ca Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	drm/radeon: use mmu_interval_notifier_insert	Jason Gunthorpe
	The new API is an exact match for the needs of radeon. For some reason radeon tries to remove overlapping ranges from the interval tree, but interval trees (and mmu_interval_notifier_insert()) support overlapping ranges directly. Simply delete all this code. Since this driver is missing a invalidate_range_end callback, but still calls get_user_pages(), it cannot be correct against all races. Link: https://lore.kernel.org/r/20191112202231.3856-8-jgg@ziepe.ca Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv	Jason Gunthorpe
	This converts one of the two users of mmu_notifiers to use the new API. The conversion is fairly straightforward, however the existing use of notifiers here seems to be racey. Link: https://lore.kernel.org/r/20191112202231.3856-7-jgg@ziepe.ca Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	RDMA/odp: Use mmu_interval_notifier_insert()	Jason Gunthorpe
	Replace the internal interval tree based mmu notifier with the new common mmu_interval_notifier_insert() API. This removes a lot of code and fixes a deadlock that can be triggered in ODP: zap_page_range() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) unmap_single_vma() [..] __split_huge_page_pmd() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) // DEADLOCK mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) The umem_rwsem is held across the range_start/end as the ODP algorithm for invalidate_range_end cannot tolerate changes to the interval tree. However, due to the nested invalidation regions the second down_read() can deadlock if there are competing writers. The new core code provides an alternative scheme to solve this problem. Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count") Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca Tested-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	mm/hmm: define the pre-processor related parts of hmm.h even if disabled	Jason Gunthorpe
	Only the function calls are stubbed out with static inlines that always fail. This is the standard way to write a header for an optional component and makes it easier for drivers that only optionally need HMM_MIRROR. Link: https://lore.kernel.org/r/20191112202231.3856-5-jgg@ziepe.ca Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror	Jason Gunthorpe
	hmm_mirror's handling of ranges does not use a sequence count which results in this bug: CPU0 CPU1 hmm_range_wait_until_valid(range) valid == true hmm_range_fault(range) hmm_invalidate_range_start() range->valid = false hmm_invalidate_range_end() range->valid = true hmm_range_valid(range) valid == true Where the hmm_range_valid() should not have succeeded. Adding the required sequence count would make it nearly identical to the new mmu_interval_notifier. Instead replace the hmm_mirror stuff with mmu_interval_notifier. Co-existence of the two APIs is the first step. Link: https://lore.kernel.org/r/20191112202231.3856-4-jgg@ziepe.ca Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	mm/mmu_notifier: add an interval tree notifier	Jason Gunthorpe
	Of the 13 users of mmu_notifiers, 8 of them use only invalidate_range_start/end() and immediately intersect the mmu_notifier_range with some kind of internal list of VAs. 4 use an interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list of some kind (scif_dma, vhost, gntdev, hmm) And the remaining 5 either don't use invalidate_range_start() or do some special thing with it. It turns out that building a correct scheme with an interval tree is pretty complicated, particularly if the use case is synchronizing against another thread doing get_user_pages(). Many of these implementations have various subtle and difficult to fix races. This approach puts the interval tree as common code at the top of the mmu notifier call tree and implements a shareable locking scheme. It includes: - An interval tree tracking VA ranges, with per-range callbacks - A read/write locking scheme for the interval tree that avoids sleeping in the notifier path (for OOM killer) - A sequence counter based collision-retry locking scheme to tell device page fault that a VA range is being concurrently invalidated. This is based on various ideas: - hmm accumulates invalidated VA ranges and releases them when all invalidates are done, via active_invalidate_ranges count. This approach avoids having to intersect the interval tree twice (as umem_odp does) at the potential cost of a longer device page fault. - kvm/umem_odp use a sequence counter to drive the collision retry, via invalidate_seq - a deferred work todo list on unlock scheme like RTNL, via deferred_list. This makes adding/removing interval tree members more deterministic - seqlock, except this version makes the seqlock idea multi-holder on the write side by protecting it with active_invalidate_ranges and a spinlock To minimize MM overhead when only the interval tree is being used, the entire SRCU and hlist overheads are dropped using some simple branches. Similarly the interval tree overhead is dropped when in hlist mode. The overhead from the mandatory spinlock is broadly the same as most of existing users which already had a lock (or two) of some sort on the invalidation path. Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca Acked-by: Christian König <christian.koenig@amd.com> Tested-by: Philip Yang <Philip.Yang@amd.com> Tested-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23	MIPS: SGI-IP27: Enable ethernet phy on second Origin 200 module	Thomas Bogendoerfer
	PROM only enables ethernet PHY on first Origin 200 module, so we must do it ourselves for the second module. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: David S. Miller <davem@davemloft.net> Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: linux-serial@vger.kernel.org
2019-11-23	MIPS: PCI: Fix fake subdevice ID for IOC3	Thomas Bogendoerfer
	Generation of fake subdevice ID had vendor and device ID swapped. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: David S. Miller <davem@davemloft.net> Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-rtc@vger.kernel.org Cc: linux-serial@vger.kernel.org
2019-11-23	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull last minute virtio bugfixes from Michael Tsirkin: "Minor bugfixes all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: fix shrinker count virtio_balloon: fix shrinker scan number of pages virtio_console: allocate inbufs in add_port() only if it is needed virtio_ring: fix return code on DMA mapping fails