Age | Commit message (Collapse) | Author |
|
Make sure we hold the rcu lock as we acquire the rcu protected reference
of the object when looking it up from the associated mmap vma.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1083
Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130143931.1906301-1-chris@chris-wilson.co.uk
(cherry picked from commit 280d14a69da2e71f43408537c008f2775d5e5360)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Only the first and the last nodes were being added to
ref->preallocated_barriers.
Renaming variables to make it more easy to read.
Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com
(cherry picked from commit d4c3c0b8221a72107eaf35c80c40716b81ca463e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Similar to commit ac0e331a628b ("drm/i915: Tighten atomicity of
i915_active_acquire vs i915_active_release") we have the same race of
trying to pin the context underneath a mutex while allowing the
decrement to be atomic outside of that mutex. This leads to the problem
where two threads may simultaneously try to pin the context and the
second not notice that they needed to repin the context.
<2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
<4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G U W 5.5.0-rc6-CI-CI_DRM_7755+ #1
<4> [198.669723] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
<4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
<4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
<4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
<4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
<4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
<4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
<4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
<4> [198.669849] FS: 00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
<4> [198.669857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
<4> [198.669872] Call Trace:
<4> [198.669924] intel_timeline_get_seqno+0x12/0x40 [i915]
<4> [198.669977] __i915_request_create+0x76/0x5a0 [i915]
<4> [198.670024] i915_request_create+0x86/0x1c0 [i915]
<4> [198.670068] i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
<4> [198.670082] ? __lock_acquire+0x460/0x15d0
<4> [198.670128] i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
<4> [198.670171] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
<4> [198.670181] drm_ioctl_kernel+0xa7/0xf0
<4> [198.670188] drm_ioctl+0x2e1/0x390
<4> [198.670233] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
References: ac0e331a628b ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
(cherry picked from commit e5429340bfa2dc43a07c3329e0c30cdae4cc0b35)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
As we use a mutex to serialise the first acquire (as it may be a lengthy
operation), but only an atomic decrement for the release, we have to
be careful in case a second thread races and completes both
acquire/release as the first finishes its acquire.
Thread A Thread B
i915_active_acquire i915_active_acquire
atomic_read() == 0 atomic_read() == 0
mutex_lock() mutex_lock()
atomic_read() == 0
ref->active();
atomic_inc()
mutex_unlock()
atomic_read() == 1
i915_active_release
atomic_dec_and_test() -> 0
ref->retire()
atomic_inc() -> 1
mutex_unlock()
So thread A has acquired the ref->active_count but since the ref was
still active at the time, it did not initialise it. By switching the
check inside the mutex to an atomic increment only if already active, we
close the race.
Fixes: c9ad602feabe ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk
(cherry picked from commit ac0e331a628b5ded087eab09fad2ffb082ac61ba)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
i915_gpu_coreddump_put is currently only defined if
CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise.
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mike Lothian <mike@fireburn.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk
(cherry picked from commit 7e36505d0cf82f2920f2fd22ebb14a8b540396a3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
comp_ctx can be NULL in a very rare case when an admin command is executed
during the execution of ena_remove().
The bug scenario is as follows:
* ena_destroy_device() sets the comp_ctx to be NULL
* An admin command is executed before executing unregister_netdev(),
this can still happen because our device can still receive callbacks
from the netdev infrastructure such as ethtool commands.
* When attempting to access the comp_ctx, the bug occurs since it's set
to NULL
Fix:
Added a check that comp_ctx is not NULL
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Up till kernel 4.11 there was no enum defined for crc32 hash in ethtool,
thus the xor enum was used for supporting crc32.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As the name suggests ETH_RSS_HASH_NO_CHANGE is received upon changing
the key or indirection table using ethtool while keeping the same hash
function.
Also add a function for retrieving the current hash function from
the ena-com layer.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Saeed Bshara <saeedb@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function ena_com_ind_tbl_convert_from_device() has an overflow
bug as explained below. Either way, this function is not needed at
all since we don't retrieve the indirection table from the device
at any point which means that this conversion is not needed.
The bug:
The for loop iterates over all io_sq_queues, when passing the actual
number of used queues the io_sq_queues[i].idx equals 0 since they are
uninitialized which results in the following code to be executed till
the end of the loop:
dev_idx_to_host_tbl[0] = i;
This results dev_idx_to_host_tbl[0] in being equal to
ENA_TOTAL_NUM_QUEUES - 1.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
table
The indirection table has the indices of the Rx queues. When we store it
during set indirection operation, we convert the indices to our internal
representation of the indices.
Our internal representation of the indices is: even indices for Tx and
uneven indices for Rx, where every Tx/Rx pair are in a consecutive order
starting from 0. For example if the driver has 3 queues (3 for Tx and 3
for Rx) then the indices are as follows:
0 1 2 3 4 5
Tx Rx Tx Rx Tx Rx
The BUG:
The issue is that when we satisfy a get request for the indirection
table, we don't convert the indices back to the original representation.
The FIX:
Simply apply the inverse function for the indices of the indirection
table after we set it.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The device receives, stores and retrieves the hash function value as bits
and not as their enum value.
The bug:
* In ena_com_set_hash_function() we set
cmd.u.flow_hash_func.selected_func to the bit value of rss->hash_func.
(1 << rss->hash_func)
* In ena_com_get_hash_function() we retrieve the hash function and store
it's bit value in rss->hash_func. (Now the bit value of rss->hash_func
is stored in rss->hash_func instead of it's enum value)
The fix:
This commit fixes the issue by converting the retrieved hash function
values from the device to the matching enum value of the set bit using
ffs(). ffs() finds the first set bit's index in a word. Since the function
returns 1 for the LSB's index, we need to subtract 1 from the returned
value (note that BIT(0) is 1).
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On old hardware, getting / setting the hash function is not supported while
gettting / setting the indirection table is.
This commit enables us to still show the indirection table on older
hardwares by setting the hash function and key to NULL.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we allocate the key whether the device supports setting the
key or not. This commit adds a check to the allocation function and
handles the error accordingly.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Bug description:
When running "ethtool -x <if_name>" the key shows up as all zeros.
When we use "ethtool -X <if_name> hfunc toeplitz hkey <some:random:key>" to
set the key and then try to retrieve it using "ethtool -x <if_name>" then
we return the correct key because we return the one we saved.
Bug cause:
We don't fetch the key from the device but instead return the key
that we have saved internally which is by default set to zero upon
allocation.
Fix:
This commit fixes the issue by initializing the key to a random value
using netdev_rss_key_fill().
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current implementation of the driver calls skb_tx_timestamp()to add a
software tx timestamp to the skb, however the software-transmit capability
is not reported in ethtool -T.
This commit updates the ethtool structure to report the software-transmit
capability in ethtool -T using the standard ethtool_op_get_ts_info().
This function reports all software timestamping capabilities (tx and rx),
as well as setting phc_index = -1. phc_index is the index of the PTP
hardware clock device that will be used for hardware timestamps. Since we
don't have such a device in ENA, using the default -1 value is the correct
setting.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
>From the documentation of round_jiffies():
"Rounds a time delta in the future (in jiffies) up or down to
(approximately) full seconds. This is useful for timers for which
the exact time they fire does not matter too much, as long as
they fire approximately every X seconds.
By rounding these timers to whole seconds, all such timers will fire
at the same time, rather than at various times spread out. The goal
of this is to have the CPU wake up less, which saves power."
There are 2 parts to this patch:
================================
Part 1:
-------
In our case we need timer_service to be called approximately every
X=1 seconds, and the exact time does not matter, so using round_jiffies()
is the right way to go.
Therefore we add round_jiffies() to the mod_timer() in ena_timer_service().
Part 2:
-------
round_jiffies() is used in check_for_missing_keep_alive() when
getting the jiffies of the expiration of the keep_alive timeout. Here it
is actually a mistake to use round_jiffies() because we want the exact
time when keep_alive should expire and not an approximate rounded time,
which can cause early, false positive, timeouts.
Therefore we remove round_jiffies() in the calculation of
keep_alive_expired() in check_for_missing_keep_alive().
Fixes: 82ef30f13be0 ("net: ena: add hardware hints capability to the driver")
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ethtool -X is called without an hkey, ena_com_fill_hash_function()
is called with key=NULL, which is passed to memcpy causing a crash.
This commit fixes this issue by checking key is not NULL.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap
validation") introduced a necessary change for verifying how queue
bitmaps from the iavf driver get validated. Unfortunately, the
conditional was reversed. Fix this.
Fixes: d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap validation")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fixes from Dan Williams:
"A fix for an xfstest failure and some and an update that removes an
fsdax dependency on block devices.
Summary:
- Fix RWF_NOWAIT writes to properly return -EAGAIN
- Clean up an unused helper
- Update dax_writeback_mapping_range to not need a block_device
argument"
* tag 'dax-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: pass NOWAIT flag to iomap_apply
dax: Get rid of fs_dax_get_by_host() helper
dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()
|
|
If only Tegra194 support is enabled, the tegra30_fuse_read() and
tegra30_fuse_init() function are not declared and cause a build failure.
Add Tegra194 to the preprocessor guard to make sure these functions are
available for Tegra194-only builds as well.
Link: https://lore.kernel.org/r/20200203143114.3967295-1-thierry.reding@gmail.com
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
If the platform triggers a spurious SCI even though the status bit
is not set for any GPE when the system is suspended to idle, it will
be treated as a genuine wakeup, so avoid that by checking if any GPEs
are active at all before returning 'true' from acpi_s2idle_wake().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206413
Fixes: 56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow")
Reported-by: Tsuchiya Yuto <kitakar@gmail.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Introduce a new helper function, acpi_any_gpe_status_set(), for
checking the status bits of all enabled GPEs in one go.
It is needed to distinguish spurious SCIs from genuine ones when
deciding whether or not to wake up the system from suspend-to-idle.
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Rather than the FEATURE_ID flags. Avoids a possible reading past
the end of the array.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reported-by: Aleksandr Mezin <mezin.alexander@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x
|
|
Update to the latest changes.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.5.x
|
|
Former comment looks to be one intended behavior in code,
actually it's not. So correct it.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
Programming DPPCLK to the same value currently set may cause
underflow while playing video in certain conditions.
[HOW]
Only program DPPCLK if clock is not the same as the
previous value programmed.
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
In DCN hardware sequencer we do actually call ATOM_INIT correctly per
pipe. The workaround is not necessary for command table offloading.
[How]
Drop the workaround since it's not needed.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix warning during switching to dpg pause mode for
VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 16
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
GDS clear workaround will cause gfx failure in suspend/resume case.
[ 98.679559] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <gfx_v9_0> failed -110
[ 98.679561] PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 98.679562] PM: Device 0000:03:00.0 failed to resume async: error -110
As this workaround is specific to the HW bug of GDS's ECC error
existing in cold boot up, so bypass this workaround in suspend/
resume case after booting up.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
hwc->conf was designated specifically for AMD APU IOMMU purposes. This
could cause problems in performance and/or function since APU IOMMU
implementation is elsewhere. Also hwc->conf and hwc->config are
different members of an anonymous union so hwc->conf aliases as
hw->last_tag.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On PowerPC, the compiler will tag object files with whether they
use hard or soft float FP ABI and whether they use 64 or 128-bit
long double ABI. On systems with 64-bit long double ABI, a tag
will get emitted whenever a double is used, as on those systems
a long double is the same as a double. This will prevent linkage
as other files are being compiled with hard-float.
On ppc64, this code will never actually get used for the time
being, as the only currently existing hardware using it are the
Renoir APUs. Therefore, until this is testable and can be fixed
properly, at least make sure the build will not fail.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Support pause_state for multiple instance, and it will fix vcn2.5 DPG mode
power off issue on instance 1.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Starting from 14nm, the PLL is built into the PHY and the PLL is mapped
to PHY on 1 to 1 basis. In the code, the DP port is mapped to a PLL that was not
initialized. This causes DP to HDMI dongle to not light up the display.
[How]
Initializations added for PLL2 when creating resources.
Signed-off-by: Isabel Zhang <isabel.zhang@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Underflow is observed when plug in a 4K@60 monitor with
1366x768 eDP due to DPPCLK is too low.
[How]
Limit minimum DPPCLK to 100MHz.
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Engine can be NULL in some cases, so we must not acquire it.
[How]
Check for NULL engine before acquiring.
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into HEAD
fix style of SPDX License Identifier
* tag 'vfio-ccw-20200206' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw:
vfio-ccw: Use the correct style for SPDX License Identifier
Link: https://lkml.kernel.org/r/20200206170331.1032-1-cohuck@redhat.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The only way to reach this allocation is via
qdio_establish()
qdio_detect_hsicq()
qdio_enable_async_operation()
and since qdio_establish() uses wait_event_*() just a few lines ealier,
we can trust that it certainly is never called from atomic context.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Warnings like below can fill up the dmesg while disconnecting RDMA
connections.
Hence, remove the unwanted WARN_ON.
WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw]
RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw]
Call Trace:
<IRQ>
tcp_data_queue+0x226/0xb40
tcp_rcv_established+0x220/0x620
tcp_v4_do_rcv+0x12a/0x1e0
tcp_v4_rcv+0xb05/0xc00
ip_local_deliver_finish+0x69/0x210
ip_local_deliver+0x6b/0xe0
ip_rcv+0x273/0x362
__netif_receive_skb_core+0xb35/0xc30
netif_receive_skb_internal+0x3d/0xb0
napi_gro_frags+0x13b/0x200
t4_ethrx_handler+0x433/0x7d0 [cxgb4]
process_responses+0x318/0x580 [cxgb4]
napi_rx_handler+0x14/0x100 [cxgb4]
net_rx_action+0x149/0x3b0
__do_softirq+0xe3/0x30a
irq_exit+0x100/0x110
do_IRQ+0x7f/0xe0
common_interrupt+0xf/0xf
</IRQ>
Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE
when entering into TERM state.
In c4iw_modify_qp(), disconnect operation should only be performed when
the modify_qp call is invoked from ib_core. And all other internal
modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should
call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks
like below can occur:
Call Trace:
schedule+0x2f/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.5+0x2d0/0x4a0
c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again
c4iw_modify_qp+0x468/0x10d0
rx_data+0x218/0x570 => acquires ep lock
process_work+0x5f/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
kthread+0x112/0x130
ret_from_fork+0x35/0x40
Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state")
Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
When binding a QP with a counter and the QP state is not RESET, return
failure if the rts2rts_qp_counters_set_id is not supported by the
device.
This is to prevent cases like manual bind for Connect-IB devices from
returning success when the feature is not supported.
Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter")
Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Add a check that the size specified in the flow spec header doesn't cause
an overflow when calculating the filter size, and thus prevent access to
invalid memory. The following crash from syzkaller revealed it.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:memchr_inv+0xd3/0x330
Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85
ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01
00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01
RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04
RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820
RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009
R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004
R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040
FS: 00007f9ee0e51700(0000) GS:ffff88811b100000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
spec_filter_size.part.16+0x34/0x50
ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770
ib_uverbs_ex_create_flow+0x9ea/0x1b40
ib_uverbs_write+0xaa5/0xdf0
__vfs_write+0x7c/0x100
vfs_write+0x168/0x4a0
ksys_write+0xc8/0x200
do_syscall_64+0x9c/0x390
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465b49
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49
RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc
R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
Fixes: 94e03f11ad1f ("IB/uverbs: Add support for flow tag")
Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
[WHY & HOW]
Previously drain clk was unconstrained and fill clk was constrained on fclk.
We want to change it to fill clk unconstrained and drain clock constrained
to dcfclk.
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
The optimized_require flag is needed to set watermarks and clocks lower
in certain conditions. This flag is set to true and then set to false
while programming front end in dcn20.
[HOW]
Do not set the flag to false while disabling plane.
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Driver crash with psr feature enabled due to divide-by-zero error.
This is a regression after rework to calculate static screen frame
number entry time.
[How]
Correct order of operations to avoid divide-by-zero.
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Zhan Liu <Zhan.Liu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When the hfi1 device is shut down during a system reboot, it is possible
that some QPs might have not not freed by ULPs. More requests could be
post sent and a lingering timer could be triggered to schedule more packet
sends, leading to a crash:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0
PGD 0
Oops: 0000 1 SMP
Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018
task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000
RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP: 0018:ffff88105df43d48 EFLAGS: 00010046
RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000
RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f
RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000
R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0
R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000
FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
Call Trace:
IRQ
[ffffffff810a6b85] queue_work_on+0x45/0x50
[ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1]
[ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt]
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffff81097316] call_timer_fn+0x36/0x110
[ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt]
[ffffffff8109982d] run_timer_softirq+0x22d/0x310
[ffffffff81090b3f] __do_softirq+0xef/0x280
[ffffffff816b6a5c] call_softirq+0x1c/0x30
[ffffffff8102d3c5] do_softirq+0x65/0xa0
[ffffffff81090ec5] irq_exit+0x105/0x110
[ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50
[ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80
EOI
[ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0
[ffffffff81527b48] cpuidle_idle_call+0xd8/0x210
[ffffffff81034fee] arch_cpu_idle+0xe/0x30
[ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0
[ffffffff81051af6] start_secondary+0x1b6/0x230
Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0
RSP ffff88105df43d48
CR2: 0000000000000102
The solution is to reset the QPs before the device resources are freed.
This reset will change the QP state to prevent post sends and delete
timers to prevent callbacks.
Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Cleaning up a pq can result in the following warning and panic:
WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100)
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
Call Trace:
[<ffffffff90365ac0>] dump_stack+0x19/0x1b
[<ffffffff8fc98b78>] __warn+0xd8/0x100
[<ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80
[<ffffffff8ff970c3>] __list_del_entry+0x63/0xd0
[<ffffffff8ff9713d>] list_del+0xd/0x30
[<ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110
[<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
[<ffffffff8fe4519c>] __fput+0xec/0x260
[<ffffffff8fe453fe>] ____fput+0xe/0x10
[<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
[<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
[<ffffffff90379134>] int_signal+0x12/0x17
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
PGD 2cdab19067 PUD 2f7bfdb067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1
Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000
RIP: 0010:[<ffffffff8fe1f93e>] [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
RSP: 0018:ffff88b5393abd60 EFLAGS: 00010287
RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003
RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800
RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19
R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000
R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000
FS: 00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
[<ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80
[<ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110
[<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
[<ffffffff8fe4519c>] __fput+0xec/0x260
[<ffffffff8fe453fe>] ____fput+0xe/0x10
[<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
[<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
[<ffffffff90379134>] int_signal+0x12/0x17
Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39
RIP [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
RSP <ffff88b5393abd60>
CR2: 0000000000000010
The panic is the result of slab entries being freed during the destruction
of the pq slab.
The code attempts to quiesce the pq, but looking for n_req == 0 doesn't
account for new requests.
Fix the issue by using SRCU to get a pq pointer and adjust the pq free
logic to NULL the fd pq pointer prior to the quiesce.
Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized")
Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Each user context is allocated a certain number of RcvArray (TID)
entries and these entries are managed through TID groups. These groups
are put into one of three lists in each user context: tid_group_list,
tid_used_list, and tid_full_list, depending on the number of used TID
entries within each group. When TID packets are expected, one or more
TID groups will be allocated. After the packets are received, the TID
groups will be freed. Since multiple user threads may access the TID
groups simultaneously, a mutex exp_mutex is used to synchronize the
access. However, when the user file is closed, it tries to release
all TID groups without acquiring the mutex first, which risks a race
condition with another thread that may be releasing its TID groups,
leading to data corruption.
This patch addresses the issue by acquiring the mutex first before
releasing the TID groups when the file is closed.
Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Commit e812744c5f95 ("drm: msm: a6xx: Add support for A618") missed
updating the VBIF flush in a6xx_gmu_shutdown and instead
inserted the new sequence into a6xx_pm_suspend along with a redundant
GMU idle.
Move a6xx_bus_clear_pending_transactions to a6xx_gmu.c and use it in
the appropriate place in the shutdown routine and remove the redundant
idle call.
v2: Remove newly unused variable that was triggering a warning
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Fixes: e812744c5f95 ("drm: msm: a6xx: Add support for A618")
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Fixup the GMU bus table values for the sc7180 target.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Fixes: e812744c5f95 ("drm: msm: a6xx: Add support for A618")
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Commit e812744c5f95 ("drm: msm: a6xx: Add support for A618") added a
universal GBIF un-halt into a6xx_start(). This can cause problems for
a630 targets which do not use GBIF and might have access protection
enabled on the region now occupied by the GBIF registers.
But it turns out that we didn't need to unhalt the GBIF in this path
since the stop function already takes care of that after executing a flush
but before turning off the headswitch. We should be confident that the
GBIF is open for business when we restart the hardware.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Fixes: e812744c5f95 ("drm: msm: a6xx: Add support for A618")
Signed-off-by: Rob Clark <robdclark@chromium.org>
|