Age | Commit message (Collapse) | Author |
|
I've found that on occasion, "rmmod <dev>" will hang while if an NFS
is under load.
Ensure that ri_remove_done is initialized only just before the
transport is woken up to force a close. This avoids the completion
possibly getting initialized again while the CM event handler is
waiting for a wake-up.
Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from under an NFS mount")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
On device re-insertion, the RDMA device driver crashes trying to set
up a new QP:
Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G W 5.4.0 #852
Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
Nov 27 16:32:06 manet kernel: FS: 0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
Nov 27 16:32:06 manet kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
Nov 27 16:32:06 manet kernel: Call Trace:
Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]
The fix is to copy the qp_init_attr struct that was just created by
rpcrdma_ep_create() instead of using the one from the previous
connection instance.
Fixes: 98ef77d1aaa7 ("xprtrdma: Send Queue size grows after a reconnect")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"A boot crash fix by Mike Rapoport and a printk fix by Krzysztof
Kozlowski"
* 'parisc-5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix map_pages() to actually populate upper directory
parisc: Use proper printk format for resource_size_t
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
Pull asm-generic fixes from Arnd Bergmann:
"Here are two bugfixes from Mike Rapoport, both fixing compile-time
errors for the nds32 architecture that were recently introduced"
* tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
nds32: fix build failure caused by page table folding updates
asm-generic/nds32: don't redefine cacheflush primitives
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two simple fixes in the upper drivers (so both fairly core), one in
enclosures, which fixes replugging a device into an enclosure slot and
one in the disk driver which fixes revalidating a drive with
protection information (PI) to make it a non-PI drive ... previously
we were still remembering the old PI state.
Both fixed issues are quite rare in the field"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: enclosure: Fix stale device oops with hot replug
scsi: sd: Clear sdkp->protection_type if disk is reformatted without PI
|
|
Merge misc fixes from David Howells.
Two afs fixes and a key refcounting fix.
* dhowells:
afs: Fix afs_lookup() to not clobber the version on a new dentry
afs: Fix use-after-loss-of-ref
keys: Fix request_key() cache
|
|
Fix afs_lookup() to not clobber the version set on a new dentry by
afs_do_lookup() - especially as it's using the wrong version of the
version (we need to use the one given to us by whatever op the dir
contents correspond to rather than what's in the afs_vnode).
Fixes: 9dd0b82ef530 ("afs: Fix missing dentry data version updating")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
afs_lookup() has a tracepoint to indicate the outcome of
d_splice_alias(), passing it the inode to retrieve the fid from.
However, the function gave up its ref on that inode when it called
d_splice_alias(), which may have failed and dropped the inode.
Fix this by caching the fid.
Fixes: 80548b03991f ("afs: Add more tracepoints")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the key cached by request_key() and co. is cleaned up on exit(),
the code looks in the wrong task_struct, and so clears the wrong cache.
This leads to anomalies in key refcounting when doing, say, a kernel
build on an afs volume, that then trigger kasan to report a
use-after-free when the key is viewed in /proc/keys.
Fix this by making exit_creds() look in the passed-in task_struct rather
than in current (the task_struct cleanup code is deferred by RCU and
potentially run in another task).
Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Merge misc fixes from Andrew Morton:
"11 mm fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE
mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid
mm/page-writeback.c: improve arithmetic divisions
mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide
mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio()
mm, debug_pagealloc: don't rely on static keys too early
mm: memcg/slab: fix percpu slab vmstats flushing
mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment
mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment
mm/memory_hotplug: don't free usage map when removing a re-added early section
mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
|
|
If driver debugfs files are accessed, power up the GPU
when necessary.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If power management sysfs or debugfs files are accessed,
power up the GPU when necessary.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes: 4dee6e4ca50a ("drm/amdgpu: use linux size macro to simplify ONE_Kib & One_Mib")
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The conversion to bool is not needed, remove it.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
read_current_link_settings_on_detect() on eDP 1.4+ may use the
edp_supported_link_rates table which is set up by
detect_edp_sink_caps(), so that function needs to be called first.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Martin Leung <martin.leung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
So that it gets included in the initrd. At the moment
this is optional firmware that contains support for HDCP.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On Arcturus, data fabric hashing is set by the VBIOS, and
affects which addresses map to which memory channels. The
gfx core's caches also need to know this mapping, but the
hash settings for these these caches is set by the driver.
This change queries the DF to understand how the VBIOS
configured DF, then matches the TC hash configuration bits
to do the same thing.
v2: squash in warning fix
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On Arcturus, we need TC channel hashing, which is set by the
driver, to match DF hashing, which is set by VBIOS. To match
these, we plan to query the DF information and then properly
set the TC configuration bits to match them.
This patch adds the required fields to register definitions
in preparation for a future patch which will use them.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The only data fabric information the adev struct currently
contains is a function pointer table. In the near future,
we will be adding some cached DF information into adev. As
such, this patch creates a new amdgpu_df struct for adev.
Right now, it only containst the old function pointer table,
but new stuff will be added soon.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
Only a single voltage level should be available to Pollock (min level)
Pollock & Dali get misidentified as Renoir, use wrong clk mgr constructor
[HOW]
Add provided Pollock IDs to ASIC Rev. ID list.
Create new Pollock ASIC RID check, fix RV2 & Dali ASIC checks.
Check RID and set max voltage level to 0 if Pollock is detected.
Work around broken ASICREV_IS_RENOIR, IS_RAVEN2, etc. checks by
performing Dali/Pollock checks before they can be misidentified as RN.
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
between UMC RAS err register access restore previous RSMU UMC index mode state
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
in event of GPU reset, XGMI TA unload causes unrecoverable GPU hang
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update mmSDMA0_UTCL1_WATERMK golden setting for renoir.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch updates SMU12_DRIVER_IF_VERSION to 11.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We don't need to store the pre-OS console memory after
the driver has loaded so free it.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Leftover from bring up. We look up the actual pre-OS memory usage
value later in the same function.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It won't get used unless the driver allows the gtt domain for
display buffers which is controlled elsewhere.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It should work on all Raven variants, but some users have
reported issues with original Raven with IOMMU enabled.
So far there have been no issues observed with PCO or RV2.
v2: split out the dm init and domain changes into separate
patches.
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It won't get used unless the driver allows the gtt domain for
display buffers which is controlled elsewhere.
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
All of the sdma stuff these were used for moves to
the sdma code, so remove them.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
(v2): Fix preprocessor tag
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
sdma ras funcs are not supported by ASIC prior
to vega20
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Hardcoded offset is not friendly. And another benifit of this
patch is to keep read and write access to this register be
consistent with other similar UMC regsiters in this file.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Both are needed on vega20 and arcturus chip.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Cast to make min() happy. The values are well within
range.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
change the sw ctf setting to smu_v11_0_set_thermal_range()
since software_shutdown_temp shares the same definition and
name in all the smu11 project.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
guest vm gets 0xffffffff when reading RCC_DEV0_EPF0_STRAP0,
as a consequence, the rev_id and external_rev_id are wrong.
workaround it by hardcoding the rev_id to 0, which is the default value.
v2. add comment in the code
Signed-off-by: Tiecheng Zhou <Tiecheng.Zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
There is a use case that link loss happens accidentally,
and we need to recover that link loss as soon as possible.
Under this circumstance, we will perform link training,
and try to recover the link that's just lost.
However, if link PHY is disabled before link training
happens, then DP display will never come back again.
Also, please note that dropping this disable_phy function
call won't break USB-C hotplug functionality.
(This line of code was firstly introduced associated with
a patch to fix USB-C hotplug issue)
[How]
Don't disable DP transmitter and its encoder before link
training happens, even if link loss is detected.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA edc counter registers were added in gfx edc counters
array. When querying gfx error counter in that array, there
is no way to differentiate sdma instance number for different
asic and then results to NULL pointer access when trying to
read sdma register base address for instances greater
than 2 on Vega20.
In addition, this also results to wrong gfx error counters
since it actually added sdma edc counters.
Therefore, sdma edc counter registers should be separated
from gfx edc counter regsiter array and only get initialized
when driver tries to enable sdma ras.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
move ras_late_init and ras_fini to sdma_ras_funcs table
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
invoke sdma query_ras_error_count to get sdma single
bit error count
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
query_ras_error_count function will be invoked to query
single bit error count detected in sdma ip block
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
With default PSP FW loading
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
ucodes for instances are from different location
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Do not ignore amdgpu_irq_add_id return value while registering
VMC page fault interrupt.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This can save users much troubles. As they do not
actually need to care whether swSMU or traditional
powerplay routine should be used.
V2: apply the fixes to vi.c and cik.c also
V3: squash in oops fix
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The SOC15_REG_OFFSET() macro needs to dereference adev->reg_offset[IP]
pointer, which is sometimes NULL when there are fewer than 8 sdma engines.
Avoid that by not initializing the array regardless.
v2: squash in warning fixes
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[why]
Fix compilation warnings on i386 architecture:
undefined reference to `__udivdi3'
[how]
Switch DIV_ROUND_UP to DIV64_U64_ROUND_UP
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/da6467d2649339b42339124fd19a8a2f91cc00dd.camel@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When building ARCH=um with CONFIG_UML_X86=y and CONFIG_64BIT=y we get
the build errors:
drivers/misc/lkdtm/bugs.c: In function ‘lkdtm_UNSET_SMEP’:
drivers/misc/lkdtm/bugs.c:288:8: error: implicit declaration of function ‘native_read_cr4’ [-Werror=implicit-function-declaration]
cr4 = native_read_cr4();
^~~~~~~~~~~~~~~
drivers/misc/lkdtm/bugs.c:290:13: error: ‘X86_CR4_SMEP’ undeclared (first use in this function); did you mean ‘X86_FEATURE_SMEP’?
if ((cr4 & X86_CR4_SMEP) != X86_CR4_SMEP) {
^~~~~~~~~~~~
X86_FEATURE_SMEP
drivers/misc/lkdtm/bugs.c:290:13: note: each undeclared identifier is reported only once for each function it appears in
drivers/misc/lkdtm/bugs.c:297:2: error: implicit declaration of function ‘native_write_cr4’; did you mean ‘direct_write_cr4’? [-Werror=implicit-function-declaration]
native_write_cr4(cr4);
^~~~~~~~~~~~~~~~
direct_write_cr4
So specify that this block of code should only build when
CONFIG_X86_64=y *AND* CONFIG_UML is unset.
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20191213003522.66450-1-brendanhiggins@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|