|
A patch in net-next triggered a compile error on powerpc:
include/linux/u64_stats_sync.h: In function 'u64_stats_read':
include/asm-generic/local64.h:30:37: warning: passing argument 1 of 'local_read' discards 'const' qualifier from pointer target type
It seems reasonable to relax powerpc's local_read() requirements.
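A minimal sketch of the relaxation, assuming powerpc's local_t wraps a plain
long counter (the 'v' field name below is an assumption for illustration):
  static __inline__ long local_read(const local_t *l)
  {
          /* Read-only accessor; a const argument lets const-qualified
           * callers such as u64_stats_read() compile without warnings. */
          return READ_ONCE(l->v);
  }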
Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build only
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Michael Chan says:
====================
bnxt_en: Updates.
This patchset contains these main features:
1. Add the proper logic to support suspend/resume on the new 57500
chips.
2. Allow PHY configuration by the user on a multi-host function if
supported by firmware.
3. devlink NVRAM flashing support.
4. Add a couple of chip IDs, a PHY loopback enhancement, and provide
more RSS contexts to VFs.
v2: Dropped the devlink info patches to address some feedback
and resubmit for the 5.6 kernel.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Use the same bnxt_flash_package_from_file() function to support the
devlink flash operation.
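A rough sketch of the wiring, hedged: the flash_update prototype of this era
and the helper's argument list are reproduced from memory and may differ in
detail.
  /* Sketch only: route devlink's flash_update request into the same helper
   * the ethtool flashing path already uses.  Names are assumptions. */
  static int bnxt_dl_flash_update(struct devlink *dl, const char *filename,
                                  const char *component,
                                  struct netlink_ext_ack *extack)
  {
          struct bnxt *bp = bnxt_get_bp_from_dl(dl);

          if (component)
                  return -EOPNOTSUPP;     /* whole-package flashing only */

          return bnxt_flash_package_from_file(bp->dev, filename, 0);
  }

  static const struct devlink_ops bnxt_dl_ops = {
          .flash_update   = bnxt_dl_flash_update,
  };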
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Currently, the driver does not allow PHY settings on a multi-function or
NPAR NIC whose port is shared by more than one function. Newer
firmware now allows PHY settings on some of these NICs. Check for
this new firmware setting and allow the user to set the PHY settings
accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
If the link settings have been changed by another function sharing the
port, firmware will send us an async message. In response, we will
call the new bnxt_init_ethtool_link_settings() function to update
the current settings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Refactor this logic in bnxt_probe_phy() into a separate function
bnxt_init_ethtool_link_settings(). It used to be that the settable
link settings would never be changed without going through ethtool,
so we only needed to do this once in bnxt_probe_phy(). Now, another
function sharing the port may change them, and we may need to re-initialize
the ethtool settings again at run-time.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
New firmware allows PHY loopback to be set without disabling autoneg
first. Check this capability and skip disabling autoneg when
it is supported by firmware. Using this scheme, loopback will
always work even if the PHY only supports autoneg.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
The driver currently only assigns 1 RSS context to each VF. This works
for the Linux VF driver. But other drivers, such as DPDK, can make use
of additional RSS contexts. Modify the code to divide up and assign
RSS contexts to VFs just like other resources.
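As a purely illustrative sketch of the "divide like other resources" approach
(the function name and parameters are hypothetical, not the driver's):
  /* Hypothetical: hand each VF an equal share of the spare RSS contexts,
   * just as rings and other resources are divided. */
  static u16 bnxt_rss_ctxs_per_vf(u16 free_rss_ctxs, u16 num_vfs)
  {
          return num_vfs ? free_rss_ctxs / num_vfs : 0;
  }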
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Some chips that need host context memory as a backing store require
the memory to be initialized to a non-zero value. Query the
value from firmware and initialize the context memory accordingly.
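A hedged sketch of the idea; the helper and the firmware-reported fill byte
are illustrative names, not the driver's:
  /* Sketch: fill freshly-allocated context-memory pages with the byte
   * value reported by firmware instead of assuming zero is acceptable. */
  static void bnxt_init_ctx_page(void *vaddr, size_t len, u8 init_val)
  {
          if (init_val)
                  memset(vaddr, init_val, len);
  }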
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
The driver makes the HWRM_FUNC_RESET firmware call while resuming the device,
which clears the context memory backing store and eventually causes the
allocation of firmware resources to fail. Fix it by freeing all context
memory during suspend and reallocating the memory during resume.
Call bnxt_hwrm_queue_qportcfg() in resume path. This firmware call
is needed on the 57500 chips so that firmware will set up the proper
queue mapping in relation to the context memory.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
After driver unregister, firmware erases the information that the
driver supports the new resource management. Send the FUNC_RESOURCE_QCAPS
command to inform the firmware that the driver supports the new resource
management while resuming from hibernation. Otherwise, we fall back
to the older resource allocation scheme.
Also, move driver register after sending FUNC_RESOURCE_QCAPS command
to be consistent with the normal initialization sequence.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Every time the driver registers with firmware, it is required to
register for async event notifications as well. These 2 calls
are done using the same firmware command and can be combined.
We are also missing the 2nd step to register for async events
in the suspend/resume path, and this will fix it. Prior to this,
we were getting only default notifications.
ULP can register for additional async events for the RDMA driver,
so we add a parameter to the new function to only do step 2 when
it is called from ULP.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
In the bnxt_init_one() failure path, if the driver has already called
firmware to register the driver, it is not undoing the driver
registration. Add this missing step to unregister for correctness,
so that the firmware knows that the driver has unloaded.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Disable Bus master during suspend to prevent DMAs after the device
goes into D3hot state. The new 57500 devices may continue to DMA
from context memory after the system goes into D3hot state. This
may cause PCIe errors on some systems. Re-enable it during resume.
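A reduced sketch of the ordering in the dev_pm_ops handlers (driver-specific
quiesce and re-init elided; the helper names here are illustrative):
  static int bnxt_suspend_sketch(struct device *dev)
  {
          struct pci_dev *pdev = to_pci_dev(dev);

          /* ... stop the netdev and quiesce firmware first ... */
          pci_clear_master(pdev);         /* no DMA once we enter D3hot */
          return 0;
  }

  static int bnxt_resume_sketch(struct device *dev)
  {
          struct pci_dev *pdev = to_pci_dev(dev);

          pci_set_master(pdev);           /* re-enable bus mastering */
          /* ... re-initialize firmware resources afterwards ... */
          return 0;
  }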
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Fix BNXT_CHIP_NUM_5645X() to include 57452 and 56454 chip IDs, so
that these chips will be properly classified as P4 chips to take
advantage of the P4 fixes and features.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
The rfs members of struct efx_channel are under CONFIG_RFS_ACCEL.
Ethtool stats which access those need to be as well.
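Illustration of the required guard; the rfs field name below is an assumption,
not necessarily sfc's exact one:
  #ifdef CONFIG_RFS_ACCEL
  static u32 efx_channel_rfs_stat(const struct efx_channel *channel)
  {
          return channel->rfs_filter_count;       /* field name assumed */
  }
  #else
  static u32 efx_channel_rfs_stat(const struct efx_channel *channel)
  {
          return 0;       /* no ARFS fields without CONFIG_RFS_ACCEL */
  }
  #endif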
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: ca70bd423f10 ("sfc: add statistics for ARFS")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Pull cramfs fix from Al Viro:
"Regression fix, fallen through the cracks"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cramfs: fix usage on non-MTD device
|
|
Adjust indentation from spaces to tabs (plus an optional two spaces) as per
the coding style, with a command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
An explicit Kconfig dependency is missing for the recent addition of
the timer support. CONFIG_SND_TIMER isn't always selected by SND_PCM.
Fixes: 26c53379f98d ("ALSA: aloop: Support selection of snd_timer instead of jiffies")
Reported-by: kbuild test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20191124083924.14049-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
We have already handled the cache_strategy option carefully,
so an incorrect setting cannot pass option parsing.
Meanwhile, printing 'cache_strategy=(unknown)' can cause a
failure on remount.
Link: https://lore.kernel.org/r/20191119115049.3401-1-cgxu519@mykernel.net
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
|
|
VLE was an old informal name for fixed-sized output
compression, which came from the published ATC '19 paper [1].
Drop those old annotations since erofs can handle
all encoded clusters on a block-aligned basis, which
is wider than fixed-sized output compression once the
larger clustersize feature is fully implemented.
Unaligned encoding won't be considered in EROFS
since it's not friendly to in-place I/O, and perhaps
not to in-place decompression either.
a) Fixed-sized output compression with 16KB pcluster:
___________________________________
|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|
|___ 0___|___ 1___|___ 2___|___ 3___| physical blocks
b) Block-aligned fixed-sized input compression with
16KB pcluster:
___________________________________
|xxxxxxxx|xxxxxxxx|xxxxxxxx|xxx00000|
|___ 0___|___ 1___|___ 2___|___ 3___| physical blocks
c) Block-unaligned fixed-sized input compression with
16KB compression unit:
____________________________________________
|..xxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx|x.......|
|___ 0___|___ 1___|___ 2___|___ 3___|___ 4___| physical blocks
Refine the names for these as well.
[1] https://www.usenix.org/conference/atc19/presentation/gao
Link: https://lore.kernel.org/r/20191108033733.63919-1-gaoxiang25@huawei.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
|
|
Introduce a superblock checksum feature in order to
verify the superblock at mount time.
Note that the first 1024 bytes are ignored, reserved for x86
boot sectors and other oddities.
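A hedged sketch of a mount-time verification; the layout, offsets, and helper
names are illustrative, not erofs's exact on-disk format:
  /* Sketch: crc32c the first block past the 1024-byte reserved area,
   * with the stored checksum field zeroed before recomputing. */
  static bool sb_checksum_ok(void *blk, unsigned int blksz,
                             u32 stored, unsigned int csum_off)
  {
          memset(blk + csum_off, 0, sizeof(stored));
          return crc32c(~0, blk + 1024, blksz - 1024) == stored;
  }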
Link: https://lore.kernel.org/r/20191104024937.113939-1-gaoxiang25@huawei.com
Signed-off-by: Pratik Shinde <pratikshinde320@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
|
|
Tasks waiting on I/O for sync decompression should be
marked as being in the I/O wait state.
Link: https://lore.kernel.org/r/20191008125616.183715-5-gaoxiang25@huawei.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
|
|
Previously, both z_erofs_unzip_io and z_erofs_unzip_io_sb
recorded decompress queues for the backend to use.
The only difference is that z_erofs_unzip_io is used for
on-stack sync decompression, so it doesn't have a super_block
field (since the caller can pass it in its context), but it
increases complexity for only a pointer's worth of savings.
Rename z_erofs_unzip_io to z_erofs_decompressqueue with
a fixed super_block member, kill the other entirely,
and fall back to sync decompression on memory
allocation failure.
Link: https://lore.kernel.org/r/20191008125616.183715-4-gaoxiang25@huawei.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
|
|
The open-coded version is now much cleaner thanks to iterative development.
Link: https://lore.kernel.org/r/20191124025217.12345-1-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
|
|
When both CONFIG_CRAMFS_MTD and CONFIG_CRAMFS_BLOCKDEV are enabled, if
we fail to mount on MTD, we don't try the block device.
Note: this relies upon cramfs_mtd_fill_super() leaving no side
effects on fc state in case of failure; in general, failing
get_tree_...() does *not* mean "fine to try again"; e.g. parsed
options might've been consumed by fill_super callback and freed
on failure.
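A sketch of the fallback in cramfs_get_tree(); the fill_super callback names
are hedged from memory:
  /* Sketch: try MTD first, then fall back to the block device path. */
  static int cramfs_get_tree(struct fs_context *fc)
  {
          int ret = -ENOPROTOOPT;

          if (IS_ENABLED(CONFIG_CRAMFS_MTD)) {
                  ret = get_tree_mtd(fc, cramfs_mtd_fill_super);
                  if (!ret)
                          return 0;       /* mounted on MTD */
          }
          if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
                  ret = get_tree_bdev(fc, cramfs_blkdev_fill_super);
          return ret;
  }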
Fixes: 74f78fc5ef43 ("vfs: Convert cramfs to use the new mount API")
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The recording of RSS hash should be controlled by NETIF_F_RXHASH.
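A hedged sketch of the gating; the helper name is illustrative, the APIs are
the generic ones:
  static void netvsc_record_hash(struct sk_buff *skb,
                                 const struct net_device *net,
                                 const u32 *hash_info)
  {
          /* Only record the hardware-provided hash when RXHASH is enabled. */
          if (hash_info && (net->features & NETIF_F_RXHASH))
                  skb_set_hash(skb, *hash_info, PKT_HASH_TYPE_L4);
  }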
Fixes: 1fac7ca4e63b ("hv_netvsc: record hardware hash in skb")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
This patch is to fix a data-race reported by syzbot:
BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj
write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1:
sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091
sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465
sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916
inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734
__sys_accept4+0x224/0x430 net/socket.c:1754
__do_sys_accept net/socket.c:1795 [inline]
__se_sys_accept net/socket.c:1792 [inline]
__x64_sys_accept+0x4e/0x60 net/socket.c:1792
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0:
sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894
rht_key_get_hash include/linux/rhashtable.h:133 [inline]
rht_key_hashfn include/linux/rhashtable.h:159 [inline]
rht_head_hashfn include/linux/rhashtable.h:174 [inline]
head_hashfn lib/rhashtable.c:41 [inline]
rhashtable_rehash_one lib/rhashtable.c:245 [inline]
rhashtable_rehash_chain lib/rhashtable.c:276 [inline]
rhashtable_rehash_table lib/rhashtable.c:316 [inline]
rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
It was caused by rhashtable accessing asoc->base.sk while sctp_assoc_migrate()
is changing its value. However, what rhashtable wants is the netns from
asoc->base.sk, and for an asoc, its netns won't change once set. So we can
simply fix it by caching the netns when the asoc is created.
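In sketch form (the cached member's name is an assumption):
  /* Before (racy): the rhashtable key path chased base.sk, which
   * sctp_assoc_migrate() rewrites concurrently. */
  net = sock_net(asoc->base.sk);

  /* After (sketch): read a netns pointer cached when the association was
   * created; it never changes for the lifetime of the asoc. */
  net = asoc->base.net;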
Fixes: d6c0256a60e6 ("sctp: add the rhashtable apis for sctp global transport hashtable")
Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
In the implementation of sctp_sf_do_5_2_4_dupcook() the allocated
new_asoc is leaked if security_sctp_assoc_request() fails. Release it
via sctp_association_free().
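A minimal sketch of the fix inside sctp_sf_do_5_2_4_dupcook(); the surrounding
error unwinding is an assumption, not the exact code:
  if (security_sctp_assoc_request(ep, chunk->skb)) {
          /* new_asoc was leaked here before the fix */
          sctp_association_free(new_asoc);
          goto nomem;     /* unwind path is a sketch */
  }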
Fixes: 2277c7cd75e3 ("sctp: Add LSM hooks")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Use the common VLAN API to access the vlan_tag info.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-11-22
1) Misc Cleanups
2) Software steering support for Geneve
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
Rename the mac_link_state() method to mac_pcs_get_state() to make it
clear that it should return the MAC PCS's current state, which
is used for in-band negotiation, rather than just reading back what the
MAC has been configured for. Update the documentation to explicitly
mention that this is for in-band.
We drop the return value as well; most of phylink doesn't check the
return value, and it is not clear what it should do on error. Instead,
arrange for state->link to be false.
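For reference, a sketch of the method as renamed (parameter types follow the
existing mac_link_state() hook; treat this as a sketch):
  /* Renamed hook: report the PCS's current, inband-negotiated state.
   * No return value; on error simply leave state->link set to false. */
  void (*mac_pcs_get_state)(struct phylink_config *config,
                            struct phylink_link_state *state);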
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
|
|
These two functions have never been used since they were added.
Link: https://lore.kernel.org/r/20191113134528.21187-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
hmm_range_fault() calls find_vma() and walk_page_range() in a loop. This
is unnecessary duplication since walk_page_range() calls find_vma() in a
loop already.
Simplify hmm_range_fault() by defining a walk_test() callback function to
filter unhandled vmas.
This also fixes a bug where hmm_range_fault() was not checking start >=
vma->vm_start before checking vma->vm_flags so hmm_range_fault() could
return an error based on the wrong vma for the requested range.
It also fixes a bug where, when the vma has no read access and the caller did
not request a fault, an error was returned even though there shouldn't be any
error return code in that case.
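A hedged illustration of such a filter using pagewalk's test_walk hook (the
body is simplified relative to the real hmm logic):
  static int hmm_vma_walk_test(unsigned long start, unsigned long end,
                               struct mm_walk *walk)
  {
          struct vm_area_struct *vma = walk->vma;

          /* Simplified: the real code also honours whether the caller
           * actually requested a fault before treating this as an error. */
          if (!(vma->vm_flags & VM_READ))
                  return -EPERM;
          return 0;       /* 0 lets the walk descend into this vma */
  }

  static const struct mm_walk_ops hmm_walk_ops = {
          .test_walk      = hmm_vma_walk_test,    /* filter unhandled vmas */
  };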
Link: https://lore.kernel.org/r/20191104222141.5173-2-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
gntdev simply wants to monitor a specific VMA for any notifier events;
this can be done straightforwardly using mmu_interval_notifier_insert()
over the VMA's VA range.
The notifier should be attached until the original VMA is destroyed.
It is unclear if any of this is even sane, but at least a lot of duplicate
code is removed.
Link: https://lore.kernel.org/r/20191112202231.3856-15-jgg@ziepe.ca
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The only two users of this are now converted to use mmu_interval_notifier,
so delete all the code and update hmm.rst.
Link: https://lore.kernel.org/r/20191112202231.3856-14-jgg@ziepe.ca
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Convert the collision-retry lock around hmm_range_fault to use the one now
provided by the mmu_interval notifier.
Although this driver does not seem to use the collision retry lock that
hmm provides correctly, it can still be converted over to use the
mmu_interval_notifier api instead of hmm_mirror without too much trouble.
This also deletes another place where a driver is associating additional
data (struct amdgpu_mn) with an mm_struct.
Link: https://lore.kernel.org/r/20191112202231.3856-13-jgg@ziepe.ca
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Remove the interval tree in the driver and rely on the tree maintained by
the mmu_notifier for delivering mmu_notifier invalidation callbacks.
For some reason amdgpu has a very complicated arrangement where it tries
to prevent duplicate entries in the interval_tree; this is not necessary, as
each amdgpu_bo can be its own stand-alone entry. interval_tree already
allows duplicates and overlaps in the tree.
Also, there is no need to remove entries upon a release callback, the
mmu_interval API safely allows objects to remain registered beyond the
lifetime of the mm. The driver only has to stop touching the pages during
release.
Link: https://lore.kernel.org/r/20191112202231.3856-12-jgg@ziepe.ca
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
find_vma() must be called under the mmap_sem; reorganize this code to
do the vma check after entering the lock.
Further, fix the unlocked use of struct task_struct's mm; instead, use
the mm from hmm_mirror, which has an active mm_grab. Also, the mm_grab
must be converted to a mm_get before acquiring mmap_sem or calling
find_vma().
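A sketch of the corrected ordering as a generic pattern (mmap_sem-era naming;
amdgpu specifics elided, helper name hypothetical):
  static int sketch_check_range(struct mm_struct *mm, unsigned long start)
  {
          struct vm_area_struct *vma;
          int r = 0;

          if (!mmget_not_zero(mm))        /* an mmgrab ref alone is not enough */
                  return -ESRCH;

          down_read(&mm->mmap_sem);       /* find_vma() needs mmap_sem held */
          vma = find_vma(mm, start);
          if (!vma || start < vma->vm_start)
                  r = -EFAULT;
          /* ... get_user_pages()/HMM work happens here, under the lock ... */
          up_read(&mm->mmap_sem);
          mmput(mm);
          return r;
  }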
Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads")
Link: https://lore.kernel.org/r/20191112202231.3856-11-jgg@ziepe.ca
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Remove the hmm_mirror object and use the mmu_interval_notifier API instead
for the range, and use the normal mmu_notifier API for the general
invalidation callback.
While here re-organize the pagefault path so the locking pattern is clear.
nouveau is the only driver that uses a temporary range object and instead
forwards nearly every invalidation range directly to the HW. While this is
not how the mmu_interval_notifier was intended to be used, the overheads on
the pagefaulting path are similar to the existing hmm_mirror version.
Particularly since the interval tree will be small.
Link: https://lore.kernel.org/r/20191112202231.3856-10-jgg@ziepe.ca
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
There is no reason to get the invalidate_range_start() callback via an
indirection through hmm_mirror; just register a normal notifier directly.
Link: https://lore.kernel.org/r/20191112202231.3856-9-jgg@ziepe.ca
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The new API is an exact match for the needs of radeon.
For some reason radeon tries to remove overlapping ranges from the
interval tree, but interval trees (and mmu_interval_notifier_insert())
support overlapping ranges directly. Simply delete all this code.
Since this driver is missing an invalidate_range_end callback, but
still calls get_user_pages(), it cannot be correct against all races.
Link: https://lore.kernel.org/r/20191112202231.3856-8-jgg@ziepe.ca
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This converts one of the two users of mmu_notifiers to use the new API.
The conversion is fairly straightforward; however, the existing use of
notifiers here seems to be racy.
Link: https://lore.kernel.org/r/20191112202231.3856-7-jgg@ziepe.ca
Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Replace the internal interval tree based mmu notifier with the new common
mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
deadlock that can be triggered in ODP:
zap_page_range()
mmu_notifier_invalidate_range_start()
[..]
ib_umem_notifier_invalidate_range_start()
down_read(&per_mm->umem_rwsem)
unmap_single_vma()
[..]
__split_huge_page_pmd()
mmu_notifier_invalidate_range_start()
[..]
ib_umem_notifier_invalidate_range_start()
down_read(&per_mm->umem_rwsem) // DEADLOCK
mmu_notifier_invalidate_range_end()
up_read(&per_mm->umem_rwsem)
mmu_notifier_invalidate_range_end()
up_read(&per_mm->umem_rwsem)
The umem_rwsem is held across the range_start/end as the ODP algorithm for
invalidate_range_end cannot tolerate changes to the interval
tree. However, due to the nested invalidation regions the second
down_read() can deadlock if there are competing writers. The new core code
provides an alternative scheme to solve this problem.
Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count")
Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Only the function calls are stubbed out with static inlines that always
fail. This is the standard way to write a header for an optional component
and makes it easier for drivers that only optionally need HMM_MIRROR.
Link: https://lore.kernel.org/r/20191112202231.3856-5-jgg@ziepe.ca
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
hmm_mirror's handling of ranges does not use a sequence count which
results in this bug:
   CPU0                                  CPU1
                                         hmm_range_wait_until_valid(range)
                                             valid == true
                                         hmm_range_fault(range)
   hmm_invalidate_range_start()
       range->valid = false
   hmm_invalidate_range_end()
       range->valid = true
                                         hmm_range_valid(range)
                                              valid == true
Where the hmm_range_valid() should not have succeeded.
Adding the required sequence count would make it nearly identical to the
new mmu_interval_notifier. Instead replace the hmm_mirror stuff with
mmu_interval_notifier.
Co-existence of the two APIs is the first step.
Link: https://lore.kernel.org/r/20191112202231.3856-4-jgg@ziepe.ca
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Of the 13 users of mmu_notifiers, 8 of them use only
invalidate_range_start/end() and immediately intersect the
mmu_notifier_range with some kind of internal list of VAs. 4 use an
interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
of some kind (scif_dma, vhost, gntdev, hmm).
The remaining 5 either don't use invalidate_range_start() or do some
special thing with it.
It turns out that building a correct scheme with an interval tree is
pretty complicated, particularly if the use case is synchronizing against
another thread doing get_user_pages(). Many of these implementations have
various subtle and difficult to fix races.
This approach puts the interval tree as common code at the top of the mmu
notifier call tree and implements a shareable locking scheme.
It includes:
- An interval tree tracking VA ranges, with per-range callbacks
- A read/write locking scheme for the interval tree that avoids
sleeping in the notifier path (for OOM killer)
- A sequence counter based collision-retry locking scheme to tell
device page fault that a VA range is being concurrently invalidated.
This is based on various ideas:
- hmm accumulates invalidated VA ranges and releases them when all
invalidates are done, via active_invalidate_ranges count.
This approach avoids having to intersect the interval tree twice (as
umem_odp does) at the potential cost of a longer device page fault.
- kvm/umem_odp use a sequence counter to drive the collision retry,
via invalidate_seq
- a deferred work todo list on unlock scheme like RTNL, via deferred_list.
This makes adding/removing interval tree members more deterministic
- seqlock, except this version makes the seqlock idea multi-holder on the
write side by protecting it with active_invalidate_ranges and a spinlock
To minimize MM overhead when only the interval tree is being used, the
entire SRCU and hlist overheads are dropped using some simple
branches. Similarly the interval tree overhead is dropped when in hlist
mode.
The overhead from the mandatory spinlock is broadly the same as for most
existing users, which already had a lock (or two) of some sort on the
invalidation path.
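For orientation, a hedged sketch of how a driver consumes the API described
above (driver names are placeholders; locking around the device update is
reduced to comments):
  /* Invalidation side: bump our stale marker and tear down device mappings. */
  static bool drv_mni_invalidate(struct mmu_interval_notifier *mni,
                                 const struct mmu_notifier_range *range,
                                 unsigned long cur_seq)
  {
          mmu_interval_set_seq(mni, cur_seq);
          /* ... shoot down device mappings covering range->start/end ... */
          return true;    /* simplified; non-blockable handling omitted */
  }

  static const struct mmu_interval_notifier_ops drv_mni_ops = {
          .invalidate = drv_mni_invalidate,
  };

  /* Fault side (collision-retry skeleton):
   *      seq = mmu_interval_read_begin(&mni);
   *      ... fault/pin pages and build the device mapping ...
   *      take the device page-table lock;
   *      if (mmu_interval_read_retry(&mni, seq))
   *              unlock, unwind and retry from the top;
   *      commit the mapping and unlock.
   * Registration uses mmu_interval_notifier_insert(&mni, mm, start, length,
   * &drv_mni_ops).
   */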
Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
PROM only enables the ethernet PHY on the first Origin 200 module, so we must
do it ourselves for the second module.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-rtc@vger.kernel.org
Cc: linux-serial@vger.kernel.org
|
|
Generation of the fake subdevice ID had the vendor and device IDs swapped.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-rtc@vger.kernel.org
Cc: linux-serial@vger.kernel.org
|
|
Pull last minute virtio bugfixes from Michael Tsirkin:
"Minor bugfixes all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: fix shrinker count
virtio_balloon: fix shrinker scan number of pages
virtio_console: allocate inbufs in add_port() only if it is needed
virtio_ring: fix return code on DMA mapping fails
|