Age | Commit message (Collapse) | Author |
|
Simply return the condition. No functional changes.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-4-jani.nikula@intel.com
|
|
No need to use a compound statement enclosed in parenthesis where a C99
compound literal will do. No functional changes.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-3-jani.nikula@intel.com
|
|
While at it, conform to kernel spacing (i.e. no space) after cast. No
functional changes.
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-2-jani.nikula@intel.com
|
|
Reduce bloat in one of the bigger header files. Fix some indentation
while at it. No functional changes.
v2: Add include guards (Joonas)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-1-jani.nikula@intel.com
|
|
Under moderate amounts of GPU stress, we can observe on Bearlake and
Pineview (later gen3 models) that we execute the following batch buffer
before the write into the batch is coherent. Adding extra (tested with
upto 32x) MI_FLUSH to either the invalidation, flush or both phases does
not solve the incoherency issue with the relocations, but emitting the
MI_STORE_DWORD_IMM twice does. So be it.
Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_tiled_fence_blits # blb/pnv
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk
|
|
kbuild test robot reports:
>> ERROR: "i2c_mux_add_adapter" [drivers/gpu/drm/bridge/sii902x.ko] undefined!
>> ERROR: "i2c_mux_alloc" [drivers/gpu/drm/bridge/sii902x.ko] undefined!
>> ERROR: "i2c_mux_del_adapters" [drivers/gpu/drm/bridge/sii902x.ko] undefined!
Quite obviously the driver depends on I2C_MUX, but adding a "depends on"
introduces a recursive dependency, therefore this patch selects I2C_MUX
instead.
Fixes: 21d808405fe4 ("drm/bridge/sii902x: Fix EDID readback")
Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Link: https://lists.01.org/pipermail/kbuild-all/2018-November/054924.html
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1542633978-22064-1-git-send-email-fabrizio.castro@bp.renesas.com
|
|
Since capturing the error state requires fiddling around with the GGTT
to read arbitrary buffers and is itself run under stop_machine(), it
deadlocks the machine (effectively a hard hang) when run in conjunction
with Broxton's VTd workaround to serialize GGTT access.
v2: Store the ERR_PTR in first_error so that the error can be reported
to the user via sysfs.
v3: Mention the quirk in dmesg (using info as per usual)
Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: John Harrison <john.C.Harrison@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181102161232.17742-5-chris@chris-wilson.co.uk
(cherry picked from commit fb6f0b64e455b207a636346588e65bf9598d30eb)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-11-19
The following fixes are for mlx5 core and netdev driver.
For -stable v4.16
bc7fda7d4637 ('net/mlx5e: IPoIB, Reset QP after channels are closed')
For -stable v4.17
36917a270395 ('net/mlx5: IPSec, Fix the SA context hash key')
For -stable v4.18
6492a432be3a ('net/mlx5e: Always use the match level enum when parsing TC rule match')
c3f81be236b1 ('net/mlx5e: Removed unnecessary warnings in FEC caps query')
c5ce2e736b64 ('net/mlx5e: Fix selftest for small MTUs')
For -stable v4.19
effcd896b25e ('net/mlx5e: Adjust to max number of channles when re-attaching')
394cbc5acd68 ('net/mlx5e: RX, verify received packet size in Linear Striding RQ')
447cbb3613c8 ('net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded')
c223c1574612 ('net/mlx5e: Claim TC hw offloads support only under a proper build config')
Please pull and let me know if there's any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch changes to use rtnl_lock only during a reset to avoid
deadlock that could occur when a thread operating close is holding
rtnl_lock and waiting for reset_lock acquired by another thread,
which is waiting for rtnl_lock in order to set the number of tx/rx
queues during a reset.
Also, we now setting the number of tx/rx queues during a soft reset
for failover or LPM events.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Denis Bolotin says:
====================
qed: Fix Queue Manager getters
This patch series fixes various queue manager getter functions. It is
important to make sure the getter's caller will receive a valid queue even
in error case to prevent more serious bugs.
Please consider applying to net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The getter callers doesn't know the valid Physical Queues (PQ) values.
This patch makes sure that a valid PQ will always be returned.
The patch consists of 3 fixes:
- When qed_init_qm_get_idx_from_flags() receives a disabled flag, it
returned PQ 0, which can potentially be another function's pq. Verify
that flag is enabled, otherwise return default start_pq.
- When qed_init_qm_get_idx_from_flags() receives an unknown flag, it
returned NULL and could lead to a segmentation fault. Return default
start_pq instead.
- A modulo operation was added to MCOS/VFS PQ getters to make sure the
PQ returned is in range of the required flag.
Fixes: b5a9ee7cf3be ("qed: Revise QM cofiguration")
Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the condition which verifies that only one flag is set. The API
bitmap_weight() should receive size in bits instead of bytes.
Fixes: b5a9ee7cf3be ("qed: Revise QM cofiguration")
Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a deadlock whereby the NFSv4 state manager can get stuck in the
delegation return code, waiting for a layout return to complete in
another thread. If the server reboots before that other thread
completes, then we need to be able to start a second state
manager thread in order to perform recovery.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If FEC caps query fails when executing 'ethtool <interface>'
the whole callback fails unnecessarily, fixed that by replacing the
error return code with debug logging only.
Fixes: 6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Querying interface FEC caps with 'ethtool [int]' after link reset
throws warning regading link speed.
This warning is not needed as there is already an indication in
user space that the link is not up.
Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This bug would result in reading wrong FEC capabilities for 10G/40G.
Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Some speeds don't support turning FEC policy off. In case a requested
FEC policy is not supported for a speed (including current speed), its new
FEC policy would be:
no FEC - if disabling FEC is supported for that speed
unchanged - else
Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Arthur Kiyanovski says:
====================
net: ena: hibernation and rmmod bug fixes
This patchset includes 2 bug fixes:
1. A fix to a crash during resume from hibernation.
2. A fix to an illegal memory access during driver removal (e.g. during rmmod)
which might cause a crash in certain systems.
The subminor number in the driver version is also promoted to indicate driver
was changed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update driver version due to critical bug fixes.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In ena_remove() we have the following stack call:
ena_remove()
unregister_netdev()
ena_destroy_device()
netif_carrier_off()
Calling netif_carrier_off() causes linkwatch to try to handle the
link change event on the already unregistered netdev, which leads
to a read from an unreadable memory address.
This patch switches the order of the two functions, so that
netif_carrier_off() is called on a regiestered netdev.
To accomplish this fix we also had to:
1. Remove the set bit ENA_FLAG_TRIGGER_RESET
2. Add a sanitiy check in ena_close()
both to prevent double device reset (when calling unregister_netdev()
ena_close is called, but the device was already deleted in
ena_destroy_device()).
3. Set the admin_queue running state to false to avoid using it after
device was reset (for example when calling ena_destroy_all_io_queues()
right after ena_com_dev_reset() in ena_down)
Fixes: 944b28aa2982 ("net: ena: fix missing lock during device destruction")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During resume from hibernation if ena_restore_device fails,
ena_com_dev_reset() is called, and uses the readless read mechanism,
which was already destroyed by the call to
ena_com_mmio_reg_read_request_destroy(). This causes a NULL pointer
reference.
In this commit we switch the call order of the above two functions
to avoid this crash.
Fixes: d7703ddbd7c9 ("net: ena: fix rare bug when failed restart/resume is followed by driver removal")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Different from processing the addstrm_out request, The receiver handles
an addstrm_in request by sending back an addstrm_out request to the
sender who will increase its stream's in and incnt later.
Now stream->incnt has been increased since it sent out the addstrm_in
request in sctp_send_add_streams(), with the wrong stream->incnt will
even cause crash when copying stream info from the old stream's in to
the new one's in sctp_process_strreset_addstrm_out().
This patch is to fix it by simply removing the stream->incnt change
from sctp_send_add_streams().
Fixes: 242bd2d519d7 ("sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Loopback test had fixed packet size, which can be bigger than configured
MTU. Shorten the loopback packet size to be bigger than minimal MTU
allowed by the device. Text field removed from struct 'mlx5ehdr'
as redundant to allow send small packets as minimal allowed MTU.
Fixes: d605d66 ("net/mlx5e: Add support for ethtool self diagnostics test")
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In case of striding RQ, we use MPWRQ (Multi Packet WQE RQ), which means
that WQE (RX descriptor) can be used for many packets and so the WQE is
much bigger than MTU. In virtualization setups where the port mtu can
be larger than the vf mtu, if received packet is bigger than MTU, it
won't be dropped by HW on too small receive WQE. If we use linear SKB in
striding RQ, since each stride has room for mtu size payload and skb
info, an oversized packet can lead to crash for crossing allocated page
boundary upon the call to build_skb. So driver needs to check packet
size and drop it.
Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the
number of packets dropped by the driver for being too large.
As a new field is added to the RQ struct, re-open the channels whenever
this field is being used in datapath (i.e., in the case of linear
Striding RQ).
Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The mirror and not the output count is the one denoting a split.
Fix to condition the offload attempt on the mirror count being > 0
along the firmware to have the related capability.
Fixes: 592d36515969 ("net/mlx5e: Parse mirroring action for offloaded TC eswitch flows")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Yossi Kuperman <yossiku@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When core driver enters deattach/attach flow after pci reset,
Number of logical CPUs may have changed.
As a result we need to update the cpu affiliated resource tables.
1. indirect rqt list
2. eq table
Reproduction (PowerPC):
echo 1000 > /sys/kernel/debug/powerpc/eeh_max_freezes
ppc64_cpu --smt=on
# Restart driver
modprobe -r ... ; modprobe ...
# Link up
ifconfig ...
# Only physical CPUs
ppc64_cpu --smt=off
# Inject PCI errors so PCI will reset - calling the pci error handler
echo 0x8000000000000000 > /sys/kernel/debug/powerpc/<PCI BUS>/err_injct_inboundA
Call trace when trying to add non-existing rqs to an indirect rqt:
mlx5e_redirect_rqt+0x84/0x260 [mlx5_core] (unreliable)
mlx5e_redirect_rqts+0x188/0x190 [mlx5_core]
mlx5e_activate_priv_channels+0x488/0x570 [mlx5_core]
mlx5e_open_locked+0xbc/0x140 [mlx5_core]
mlx5e_open+0x50/0x130 [mlx5_core]
mlx5e_nic_enable+0x174/0x1b0 [mlx5_core]
mlx5e_attach_netdev+0x154/0x290 [mlx5_core]
mlx5e_attach+0x88/0xd0 [mlx5_core]
mlx5_attach_device+0x168/0x1e0 [mlx5_core]
mlx5_load_one+0x1140/0x1210 [mlx5_core]
mlx5_pci_resume+0x6c/0xf0 [mlx5_core]
Create cq will fail when trying to use non-existing EQ.
Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
We get the match level (none, l2, l3, l4) while going over the match
dissectors of an offloaded tc rule. When doing this, the match level
enum and the not min inline enum values should be used, fix that.
This worked accidentally b/c both enums have the same numerical values.
Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently, we are only supporting tc hw offloads when the eswitch
support is compiled in, but we are not gating the adevertizment
of the NETIF_F_HW_TC feature on this config being set.
Fix it, and while doing that, also avoid dealing with the feature
on ethtool when the config is not set.
Fixes: e8f887ac6a45 ('net/mlx5e: Introduce tc offload support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
For the "all" ethertype we should not care whether the packet has
vlans. Besides being wrong, the way we did it caused FW error
for rules such as:
tc filter add dev eth0 protocol all parent ffff: \
prio 1 flower skip_sw action drop
b/c the matching meta-data (outer headers bit in struct mlx5_flow_spec)
wasn't set. Fix that by matching on vlan non-existence only if we were
also told to match on the ethertype.
Fixes: cee26487620b ('net/mlx5e: Set vlan masks for all offloaded TC rules')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The mlx5e channels should be closed before mlx5i_uninit_underlay_qp
puts the QP into RST (reset) state during mlx5i_close. Currently QP
state incorrectly set to RST before channels got deactivated and closed,
since mlx5_post_send request expects QP in RTS (Ready To Send) state.
The fix is to keep QP in RTS state until mlx5e channels get closed
and to reset QP afterwards.
Also this fix is simply correct in order to keep the open/close flow
symmetric, i.e mlx5i_init_underlay_qp() is called first thing at open,
the correct thing to do is to call mlx5i_uninit_underlay_qp() last thing
at close, which is exactly what this patch is doing.
Fixes: dae37456c8ac ("net/mlx5: Support for attaching multiple underlay QPs to root flow table")
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The commit "net/mlx5: Refactor accel IPSec code" introduced a
bug where asynchronous short time change in hash key value
by create/release SA context might happen during an asynchronous
hash resize operation this could cause a subsequent remove SA
context operation to fail as the key value used during resize is
not the same key value used when remove SA context operation is
invoked.
This commit fixes the bug by defining the SA context hash key
such that it includes only fields that never change during the
lifetime of the SA context object.
Fixes: d6c4f0298cec ("net/mlx5: Refactor accel IPSec code")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Remove gca/gfx_8_0_enum.h which is included more than once
Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the define rather than hardcoded value.
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
A bad job is the one triggered TDR(In the current amdgpu's
implementation, actually all the jobs in the current joq-queue will
be treated as bad jobs). In the recovery process, its fence
will be fake signaled and as a result, the work behind will be scheduled
to delete it from the mirror list, but if the TDR process is invoked
before the work's execution, then this bad job might be processed again
and the call dma_fence_set_error to its fence in TDR process will lead to
kernel warning trace:
[ 143.033605] WARNING: CPU: 2 PID: 53 at ./include/linux/dma-fence.h:437 amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
kernel: [ 143.033606] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 snd_hda_codec_generic crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev snd_seq_device snd_timer snd soundcore binfmt_misc input_leds mac_hid serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 8139too floppy psmouse 8139cp mii i2c_piix4 pata_acpi
[ 143.033649] CPU: 2 PID: 53 Comm: kworker/2:1 Tainted: G OE 4.15.0-20-generic #21-Ubuntu
[ 143.033650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 143.033653] Workqueue: events drm_sched_job_timedout [amd_sched]
[ 143.033656] RIP: 0010:amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
[ 143.033657] RSP: 0018:ffffa9f880fe7d48 EFLAGS: 00010202
[ 143.033659] RAX: 0000000000000007 RBX: ffff9b98f2b24c00 RCX: ffff9b98efef4f08
[ 143.033660] RDX: ffff9b98f2b27400 RSI: ffff9b98f2b24c50 RDI: ffff9b98efef4f18
[ 143.033660] RBP: ffffa9f880fe7d98 R08: 0000000000000001 R09: 00000000000002b6
[ 143.033661] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b98efef3430
[ 143.033662] R13: ffff9b98efef4d80 R14: ffff9b98efef4e98 R15: ffff9b98eaf91c00
[ 143.033663] FS: 0000000000000000(0000) GS:ffff9b98ffd00000(0000) knlGS:0000000000000000
[ 143.033664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 143.033665] CR2: 00007fc49c96d470 CR3: 000000001400a005 CR4: 00000000003606e0
[ 143.033669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 143.033669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 143.033670] Call Trace:
[ 143.033744] amdgpu_device_gpu_recover+0x144/0x820 [amdgpu]
[ 143.033788] amdgpu_job_timedout+0x9b/0xa0 [amdgpu]
[ 143.033791] drm_sched_job_timedout+0xcc/0x150 [amd_sched]
[ 143.033795] process_one_work+0x1de/0x410
[ 143.033797] worker_thread+0x32/0x410
[ 143.033799] kthread+0x121/0x140
[ 143.033801] ? process_one_work+0x410/0x410
[ 143.033803] ? kthread_create_worker_on_cpu+0x70/0x70
[ 143.033806] ret_from_fork+0x35/0x40
So just delete the bad job from mirror list directly
Changes in v3:
- Add a helper function to delete the bad jobs from mirror list and call
it directly *before* the job's fence is signaled
Changes in v2:
- delete the useless list node check
- also delete bad jobs in drm_sched_main because:
kthread_unpark(ring->sched.thread) will be invoked very early before
amdgpu_device_gpu_recover's return, then drm_sched_main will have
chance to pick up a new job from the job queue. This new job will be
added into the mirror list and processed by amdgpu_job_run, but may
not be deleted from the mirror list on time due to the same reason.
And finally re-processed by drm_sched_job_recovery
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Christian König <chrstian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Looks like a copy paste typo.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This is a known gfx9 HW issue, and this change can perfectly workaround
the issue.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This makes debug message get printed even when there is early return.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add Vega12 and Polaris12 device info and device IDs to KFD.
Signed-off-by: Gang Ba <gaba@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This will make reading code much easier. This fixes a few spots missed in a
previous commit with the same title.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
xfs_file_remap_range() is only used in fs/xfs/xfs_file.c, so make it
static.
This addresses a gcc warning when -Wmissing-prototypes is enabled.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Page writeback indirectly handles shared extents via the existence
of overlapping COW fork blocks. If COW fork blocks exist, writeback
always performs the associated copy-on-write regardless if the
underlying blocks are actually shared. If the blocks are shared,
then overlapping COW fork blocks must always exist.
fstests shared/010 reproduces a case where a buffered write occurs
over a shared block without performing the requisite COW fork
reservation. This ultimately causes writeback to the shared extent
and data corruption that is detected across md5 checks of the
filesystem across a mount cycle.
The problem occurs when a buffered write lands over a shared extent
that crosses an extent size hint boundary and that also happens to
have a partial COW reservation that doesn't cover the start and end
blocks of the data fork extent.
For example, a buffered write occurs across the file offset (in FSB
units) range of [29, 57]. A shared extent exists at blocks [29, 35]
and COW reservation already exists at blocks [32, 34]. After
accommodating a COW extent size hint of 32 blocks and the existing
reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
blocks of reservation at offset 0 and returns with COW reservation
across the range of [0, 34]. The associated data fork extent is
still [29, 35], however, which isn't fully covered by the COW
reservation.
This leads to a buffered write at file offset 35 over a shared
extent without associated COW reservation. Writeback eventually
kicks in, performs an overwrite of the underlying shared block and
causes the associated data corruption.
Update xfs_reflink_reserve_cow() to accommodate the fact that a
delalloc allocation request may not fully cover the extent in the
data fork. Trim the data fork extent appropriately, just as is done
for shared extent boundaries and/or existing COW reservations that
happen to overlap the start of the data fork extent. This prevents
shared/010 failures due to data corruption on reflink enabled
filesystems.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
The current Cadence QSPI driver caused a kernel panic sporadically
when writing to QSPI. The problem was caused by writing more bytes
than needed because the QSPI operated on 4 bytes at a time.
<snip>
[ 11.202044] Unable to handle kernel paging request at virtual address bffd3000
[ 11.209254] pgd = e463054d
[ 11.211948] [bffd3000] *pgd=2fffb811, *pte=00000000, *ppte=00000000
[ 11.218202] Internal error: Oops: 7 [#1] SMP ARM
[ 11.222797] Modules linked in:
[ 11.225844] CPU: 1 PID: 1317 Comm: systemd-hwdb Not tainted 4.17.7-d0c45cd44a8f
[ 11.235796] Hardware name: Altera SOCFPGA Arria10
[ 11.240487] PC is at __raw_writesl+0x70/0xd4
[ 11.244741] LR is at cqspi_write+0x1a0/0x2cc
</snip>
On a page boundary limit the number of bytes copied from the tx buffer
to remain within the page.
This patch uses a temporary buffer to hold the 4 bytes to write and then
copies only the bytes required from the tx buffer.
Reported-by: Adrian Amborzewicz <adrian.ambrozewicz@intel.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
|
|
Reading the sysfs files pp_sclk_od and pp_mclk_od return the
percentage difference between the VBIOS-provided default
frequency and the current (possibly user-set) frequency in
the highest SCLK and MCLK DPM states, respectively.
Writing to these files provides an easy mechanism for
setting a higher-than-default maximum frequency. We
normally only allow values >= 0 to be written here.
However, with the addition of pp_od_clk_voltage, we now
allow users to set custom DPM tables. If they then set
the maximum DPM state to something less than the default,
later reads of pp_*_od should return a negative value.
The highest DPM state is now less than the VBIOS-provided
default, so the percentage is negative.
The math to calculate this was originally performed with
unsigned values, meaning reads that should return negative
values returned meaningless data. This patch corrects that
issue and normalizes how all of the calculations are done
across the various hwmgr types.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Due to lack of MODULE_FIRMWARE() with hainan_mc.bin, the driver
doesn't work properly in initrd. Let's add it.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1116239
Fixes: 8eaf2b1faaf4 ("drm/amdgpu: switch firmware path for SI parts")
Cc: <stable@vger.kernel.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Disable these features on Vega20 for now.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Feifei Xu<Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On Vega20 and other pre-production GPUs, powerplay is not enabled yet.
Check for NULL pointers before calling pp_funcs function pointers.
Also affects Kaveri.
CC: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
This reverts commit 22d7be267eaa8114dcc28d66c1c347f667d7878a.
The dst's mtu in transport can be updated by a non sctp place like
in xfrm where the MTU information didn't get synced between asoc,
transport and dst, so it is still needed to do the pmtu check
in sctp_packet_config.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As rfc7496#section4.5 says about SCTP_PR_SUPPORTED:
This socket option allows the enabling or disabling of the
negotiation of PR-SCTP support for future associations. For existing
associations, it allows one to query whether or not PR-SCTP support
was negotiated on a particular association.
It means only sctp sock's prsctp_enable can be set.
Note that for the limitation of SCTP_{CURRENT|ALL}_ASSOC, we will
add it when introducing SCTP_{FUTURE|CURRENT|ALL}_ASSOC for linux
sctp in another patchset.
v1->v2:
- drop the params.assoc_id check as Neil suggested.
Fixes: 28aa4c26fce2 ("sctp: add SCTP_PR_SUPPORTED on sctp sockopt")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now sctp increases sk_wmem_alloc by 1 when doing set_owner_w for the
skb allocked in sctp_packet_transmit and decreases by 1 when freeing
this skb.
But when this skb goes through networking stack, some subcomponents
might change skb->truesize and add the same amount on sk_wmem_alloc.
However sctp doesn't know the amount to decrease by, it would cause
a leak on sk->sk_wmem_alloc and the sock can never be freed.
Xiumei found this issue when it hit esp_output_head() by using sctp
over ipsec, where skb->truesize is added and so is sk->sk_wmem_alloc.
Since sctp has used sk_wmem_queued to count for writable space since
Commit cd305c74b0f8 ("sctp: use sk_wmem_queued to check for writable
space"), it's ok to fix it by counting sk_wmem_alloc by skb truesize
in sctp_packet_transmit.
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With GFXOFF enabled, this patch will cause PCO amdgpu_test failed,
but GFXOFF is necessary for PCO, so revert the patch.
This reverts commit b83761bb0b09ec11c924afe9d88e458cb16a0372.
v2: add a comment for future reference (Alex)
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|