linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-11-20	drm/i915/fixed: simplify is_fixed16_zero()	Jani Nikula
	Simply return the condition. No functional changes. Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-4-jani.nikula@intel.com
2018-11-20	drm/i915/fixed: simplify FP_16_16_MAX definition	Jani Nikula
	No need to use a compound statement enclosed in parenthesis where a C99 compound literal will do. No functional changes. Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-3-jani.nikula@intel.com
2018-11-20	drm/i915/fixed: prefer kernel types over stdint types	Jani Nikula
	While at it, conform to kernel spacing (i.e. no space) after cast. No functional changes. Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-2-jani.nikula@intel.com
2018-11-20	drm/i915: extract fixed point math to i915_fixed.h	Jani Nikula
	Reduce bloat in one of the bigger header files. Fix some indentation while at it. No functional changes. v2: Add include guards (Joonas) Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181116120729.7580-1-jani.nikula@intel.com
2018-11-20	drm/i915: Write GPU relocs harder with gen3	Chris Wilson
	Under moderate amounts of GPU stress, we can observe on Bearlake and Pineview (later gen3 models) that we execute the following batch buffer before the write into the batch is coherent. Adding extra (tested with upto 32x) MI_FLUSH to either the invalidation, flush or both phases does not solve the incoherency issue with the relocations, but emitting the MI_STORE_DWORD_IMM twice does. So be it. Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk
2018-11-20	drm/bridge/sii902x: Add missing dependency on I2C_MUX	Fabrizio Castro
	kbuild test robot reports: >> ERROR: "i2c_mux_add_adapter" [drivers/gpu/drm/bridge/sii902x.ko] undefined! >> ERROR: "i2c_mux_alloc" [drivers/gpu/drm/bridge/sii902x.ko] undefined! >> ERROR: "i2c_mux_del_adapters" [drivers/gpu/drm/bridge/sii902x.ko] undefined! Quite obviously the driver depends on I2C_MUX, but adding a "depends on" introduces a recursive dependency, therefore this patch selects I2C_MUX instead. Fixes: 21d808405fe4 ("drm/bridge/sii902x: Fix EDID readback") Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Link: https://lists.01.org/pipermail/kbuild-all/2018-November/054924.html Acked-by: Peter Rosin <peda@axentia.se> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/1542633978-22064-1-git-send-email-fabrizio.castro@bp.renesas.com
2018-11-20	drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture	Chris Wilson
	Since capturing the error state requires fiddling around with the GGTT to read arbitrary buffers and is itself run under stop_machine(), it deadlocks the machine (effectively a hard hang) when run in conjunction with Broxton's VTd workaround to serialize GGTT access. v2: Store the ERR_PTR in first_error so that the error can be reported to the user via sysfs. v3: Mention the quirk in dmesg (using info as per usual) Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: John Harrison <john.C.Harrison@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181102161232.17742-5-chris@chris-wilson.co.uk (cherry picked from commit fb6f0b64e455b207a636346588e65bf9598d30eb) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-11-19	Merge tag 'mlx5-fixes-2018-11-19' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-11-19 The following fixes are for mlx5 core and netdev driver. For -stable v4.16 bc7fda7d4637 ('net/mlx5e: IPoIB, Reset QP after channels are closed') For -stable v4.17 36917a270395 ('net/mlx5: IPSec, Fix the SA context hash key') For -stable v4.18 6492a432be3a ('net/mlx5e: Always use the match level enum when parsing TC rule match') c3f81be236b1 ('net/mlx5e: Removed unnecessary warnings in FEC caps query') c5ce2e736b64 ('net/mlx5e: Fix selftest for small MTUs') For -stable v4.19 effcd896b25e ('net/mlx5e: Adjust to max number of channles when re-attaching') 394cbc5acd68 ('net/mlx5e: RX, verify received packet size in Linear Striding RQ') 447cbb3613c8 ('net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded') c223c1574612 ('net/mlx5e: Claim TC hw offloads support only under a proper build config') Please pull and let me know if there's any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19	net/ibmnvic: Fix deadlock problem in reset	Juliet Kim
	This patch changes to use rtnl_lock only during a reset to avoid deadlock that could occur when a thread operating close is holding rtnl_lock and waiting for reset_lock acquired by another thread, which is waiting for rtnl_lock in order to set the number of tx/rx queues during a reset. Also, we now setting the number of tx/rx queues during a soft reset for failover or LPM events. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19	qed: Fix QM getters to always return a valid pq	Denis Bolotin
	The getter callers doesn't know the valid Physical Queues (PQ) values. This patch makes sure that a valid PQ will always be returned. The patch consists of 3 fixes: - When qed_init_qm_get_idx_from_flags() receives a disabled flag, it returned PQ 0, which can potentially be another function's pq. Verify that flag is enabled, otherwise return default start_pq. - When qed_init_qm_get_idx_from_flags() receives an unknown flag, it returned NULL and could lead to a segmentation fault. Return default start_pq instead. - A modulo operation was added to MCOS/VFS PQ getters to make sure the PQ returned is in range of the required flag. Fixes: b5a9ee7cf3be ("qed: Revise QM cofiguration") Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19	qed: Fix bitmap_weight() check	Denis Bolotin
	Fix the condition which verifies that only one flag is set. The API bitmap_weight() should receive size in bits instead of bytes. Fixes: b5a9ee7cf3be ("qed: Revise QM cofiguration") Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19	net/mlx5e: Fix failing ethtool query on FEC query error	Shay Agroskin
	If FEC caps query fails when executing 'ethtool <interface>' the whole callback fails unnecessarily, fixed that by replacing the error return code with debug logging only. Fixes: 6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Removed unnecessary warnings in FEC caps query	Shay Agroskin
	Querying interface FEC caps with 'ethtool [int]' after link reset throws warning regading link speed. This warning is not needed as there is already an indication in user space that the link is not up. Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Fix wrong field name in FEC related functions	Shay Agroskin
	This bug would result in reading wrong FEC capabilities for 10G/40G. Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Fix a bug in turning off FEC policy in unsupported speeds	Shay Agroskin
	Some speeds don't support turning FEC policy off. In case a requested FEC policy is not supported for a speed (including current speed), its new FEC policy would be: no FEC - if disabling FEC is supported for that speed unchanged - else Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net: ena: update driver version from 2.0.1 to 2.0.2	Arthur Kiyanovski
	Update driver version due to critical bug fixes. Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19	net: ena: fix crash during ena_remove()	Arthur Kiyanovski
	In ena_remove() we have the following stack call: ena_remove() unregister_netdev() ena_destroy_device() netif_carrier_off() Calling netif_carrier_off() causes linkwatch to try to handle the link change event on the already unregistered netdev, which leads to a read from an unreadable memory address. This patch switches the order of the two functions, so that netif_carrier_off() is called on a regiestered netdev. To accomplish this fix we also had to: 1. Remove the set bit ENA_FLAG_TRIGGER_RESET 2. Add a sanitiy check in ena_close() both to prevent double device reset (when calling unregister_netdev() ena_close is called, but the device was already deleted in ena_destroy_device()). 3. Set the admin_queue running state to false to avoid using it after device was reset (for example when calling ena_destroy_all_io_queues() right after ena_com_dev_reset() in ena_down) Fixes: 944b28aa2982 ("net: ena: fix missing lock during device destruction") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19	net: ena: fix crash during failed resume from hibernation	Arthur Kiyanovski
	During resume from hibernation if ena_restore_device fails, ena_com_dev_reset() is called, and uses the readless read mechanism, which was already destroyed by the call to ena_com_mmio_reg_read_request_destroy(). This causes a NULL pointer reference. In this commit we switch the call order of the above two functions to avoid this crash. Fixes: d7703ddbd7c9 ("net: ena: fix rare bug when failed restart/resume is followed by driver removal") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19	net/mlx5e: Fix selftest for small MTUs	Valentine Fatiev
	Loopback test had fixed packet size, which can be bigger than configured MTU. Shorten the loopback packet size to be bigger than minimal MTU allowed by the device. Text field removed from struct 'mlx5ehdr' as redundant to allow send small packets as minimal allowed MTU. Fixes: d605d66 ("net/mlx5e: Add support for ethtool self diagnostics test") Signed-off-by: Valentine Fatiev <valentinef@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: RX, verify received packet size in Linear Striding RQ	Moshe Shemesh
	In case of striding RQ, we use MPWRQ (Multi Packet WQE RQ), which means that WQE (RX descriptor) can be used for many packets and so the WQE is much bigger than MTU. In virtualization setups where the port mtu can be larger than the vf mtu, if received packet is bigger than MTU, it won't be dropped by HW on too small receive WQE. If we use linear SKB in striding RQ, since each stride has room for mtu size payload and skb info, an oversized packet can lead to crash for crossing allocated page boundary upon the call to build_skb. So driver needs to check packet size and drop it. Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the number of packets dropped by the driver for being too large. As a new field is added to the RQ struct, re-open the channels whenever this field is being used in datapath (i.e., in the case of linear Striding RQ). Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Apply the correct check for supporting TC esw rules split	Roi Dayan
	The mirror and not the output count is the one denoting a split. Fix to condition the offload attempt on the mirror count being > 0 along the firmware to have the related capability. Fixes: 592d36515969 ("net/mlx5e: Parse mirroring action for offloaded TC eswitch flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Yossi Kuperman <yossiku@mellanox.com> Reviewed-by: Chris Mi <chrism@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Adjust to max number of channles when re-attaching	Yuval Avnery
	When core driver enters deattach/attach flow after pci reset, Number of logical CPUs may have changed. As a result we need to update the cpu affiliated resource tables. 1. indirect rqt list 2. eq table Reproduction (PowerPC): echo 1000 > /sys/kernel/debug/powerpc/eeh_max_freezes ppc64_cpu --smt=on # Restart driver modprobe -r ... ; modprobe ... # Link up ifconfig ... # Only physical CPUs ppc64_cpu --smt=off # Inject PCI errors so PCI will reset - calling the pci error handler echo 0x8000000000000000 > /sys/kernel/debug/powerpc/<PCI BUS>/err_injct_inboundA Call trace when trying to add non-existing rqs to an indirect rqt: mlx5e_redirect_rqt+0x84/0x260 [mlx5_core] (unreliable) mlx5e_redirect_rqts+0x188/0x190 [mlx5_core] mlx5e_activate_priv_channels+0x488/0x570 [mlx5_core] mlx5e_open_locked+0xbc/0x140 [mlx5_core] mlx5e_open+0x50/0x130 [mlx5_core] mlx5e_nic_enable+0x174/0x1b0 [mlx5_core] mlx5e_attach_netdev+0x154/0x290 [mlx5_core] mlx5e_attach+0x88/0xd0 [mlx5_core] mlx5_attach_device+0x168/0x1e0 [mlx5_core] mlx5_load_one+0x1140/0x1210 [mlx5_core] mlx5_pci_resume+0x6c/0xf0 [mlx5_core] Create cq will fail when trying to use non-existing EQ. Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Always use the match level enum when parsing TC rule match	Or Gerlitz
	We get the match level (none, l2, l3, l4) while going over the match dissectors of an offloaded tc rule. When doing this, the match level enum and the not min inline enum values should be used, fix that. This worked accidentally b/c both enums have the same numerical values. Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Claim TC hw offloads support only under a proper build config	Or Gerlitz
	Currently, we are only supporting tc hw offloads when the eswitch support is compiled in, but we are not gating the adevertizment of the NETIF_F_HW_TC feature on this config being set. Fix it, and while doing that, also avoid dealing with the feature on ethtool when the config is not set. Fixes: e8f887ac6a45 ('net/mlx5e: Introduce tc offload support') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded	Or Gerlitz
	For the "all" ethertype we should not care whether the packet has vlans. Besides being wrong, the way we did it caused FW error for rules such as: tc filter add dev eth0 protocol all parent ffff: \ prio 1 flower skip_sw action drop b/c the matching meta-data (outer headers bit in struct mlx5_flow_spec) wasn't set. Fix that by matching on vlan non-existence only if we were also told to match on the ethertype. Fixes: cee26487620b ('net/mlx5e: Set vlan masks for all offloaded TC rules') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5e: IPoIB, Reset QP after channels are closed	Denis Drozdov
	The mlx5e channels should be closed before mlx5i_uninit_underlay_qp puts the QP into RST (reset) state during mlx5i_close. Currently QP state incorrectly set to RST before channels got deactivated and closed, since mlx5_post_send request expects QP in RTS (Ready To Send) state. The fix is to keep QP in RTS state until mlx5e channels get closed and to reset QP afterwards. Also this fix is simply correct in order to keep the open/close flow symmetric, i.e mlx5i_init_underlay_qp() is called first thing at open, the correct thing to do is to call mlx5i_uninit_underlay_qp() last thing at close, which is exactly what this patch is doing. Fixes: dae37456c8ac ("net/mlx5: Support for attaching multiple underlay QPs to root flow table") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	net/mlx5: IPSec, Fix the SA context hash key	Raed Salem
	The commit "net/mlx5: Refactor accel IPSec code" introduced a bug where asynchronous short time change in hash key value by create/release SA context might happen during an asynchronous hash resize operation this could cause a subsequent remove SA context operation to fail as the key value used during resize is not the same key value used when remove SA context operation is invoked. This commit fixes the bug by defining the SA context hash key such that it includes only fields that never change during the lifetime of the SA context object. Fixes: d6c4f0298cec ("net/mlx5: Refactor accel IPSec code") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19	drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c: Remove duplicate header	Brajeswar Ghosh
	Remove gca/gfx_8_0_enum.h which is included more than once Signed-off-by: Brajeswar Ghosh <brajeswar.linux@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdgpu/psp: use define rather than magic number for mode1 reset	Alex Deucher
	Use the define rather than hardcoded value. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/scheduler: Fix bad job be re-processed in TDR	Trigger Huang
	A bad job is the one triggered TDR(In the current amdgpu's implementation, actually all the jobs in the current joq-queue will be treated as bad jobs). In the recovery process, its fence will be fake signaled and as a result, the work behind will be scheduled to delete it from the mirror list, but if the TDR process is invoked before the work's execution, then this bad job might be processed again and the call dma_fence_set_error to its fence in TDR process will lead to kernel warning trace: [ 143.033605] WARNING: CPU: 2 PID: 53 at ./include/linux/dma-fence.h:437 amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched] kernel: [ 143.033606] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 snd_hda_codec_generic crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev snd_seq_device snd_timer snd soundcore binfmt_misc input_leds mac_hid serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 8139too floppy psmouse 8139cp mii i2c_piix4 pata_acpi [ 143.033649] CPU: 2 PID: 53 Comm: kworker/2:1 Tainted: G OE 4.15.0-20-generic #21-Ubuntu [ 143.033650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 143.033653] Workqueue: events drm_sched_job_timedout [amd_sched] [ 143.033656] RIP: 0010:amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched] [ 143.033657] RSP: 0018:ffffa9f880fe7d48 EFLAGS: 00010202 [ 143.033659] RAX: 0000000000000007 RBX: ffff9b98f2b24c00 RCX: ffff9b98efef4f08 [ 143.033660] RDX: ffff9b98f2b27400 RSI: ffff9b98f2b24c50 RDI: ffff9b98efef4f18 [ 143.033660] RBP: ffffa9f880fe7d98 R08: 0000000000000001 R09: 00000000000002b6 [ 143.033661] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b98efef3430 [ 143.033662] R13: ffff9b98efef4d80 R14: ffff9b98efef4e98 R15: ffff9b98eaf91c00 [ 143.033663] FS: 0000000000000000(0000) GS:ffff9b98ffd00000(0000) knlGS:0000000000000000 [ 143.033664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 143.033665] CR2: 00007fc49c96d470 CR3: 000000001400a005 CR4: 00000000003606e0 [ 143.033669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 143.033669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 143.033670] Call Trace: [ 143.033744] amdgpu_device_gpu_recover+0x144/0x820 [amdgpu] [ 143.033788] amdgpu_job_timedout+0x9b/0xa0 [amdgpu] [ 143.033791] drm_sched_job_timedout+0xcc/0x150 [amd_sched] [ 143.033795] process_one_work+0x1de/0x410 [ 143.033797] worker_thread+0x32/0x410 [ 143.033799] kthread+0x121/0x140 [ 143.033801] ? process_one_work+0x410/0x410 [ 143.033803] ? kthread_create_worker_on_cpu+0x70/0x70 [ 143.033806] ret_from_fork+0x35/0x40 So just delete the bad job from mirror list directly Changes in v3: - Add a helper function to delete the bad jobs from mirror list and call it directly before the job's fence is signaled Changes in v2: - delete the useless list node check - also delete bad jobs in drm_sched_main because: kthread_unpark(ring->sched.thread) will be invoked very early before amdgpu_device_gpu_recover's return, then drm_sched_main will have chance to pick up a new job from the job queue. This new job will be added into the mirror list and processed by amdgpu_job_run, but may not be deleted from the mirror list on time due to the same reason. And finally re-processed by drm_sched_job_recovery Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Reviewed-by: Christian König <chrstian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdgpu/gfx: use proper offset define for MEC doorbells	Alex Deucher
	Looks like a copy paste typo. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdkfd: Workaround PASID missing in gfx9 interrupt payload under non HWS	Yong Zhao
	This is a known gfx9 HW issue, and this change can perfectly workaround the issue. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdkfd: Adjust the debug message in KFD ISR	Yong Zhao
	This makes debug message get printed even when there is early return. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdkfd: Added Vega12 and Polaris12 for KFD.	Gang Ba
	Add Vega12 and Polaris12 device info and device IDs to KFD. Signed-off-by: Gang Ba <gaba@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdkfd: Replace mqd with mqd_mgr as the variable name for mqd_manager	Yong Zhao
	This will make reading code much easier. This fixes a few spots missed in a previous commit with the same title. Signed-off-by: Yong Zhao <yong.zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	mtd: spi-nor: Fix Cadence QSPI page fault kernel panic	Thor Thayer
	The current Cadence QSPI driver caused a kernel panic sporadically when writing to QSPI. The problem was caused by writing more bytes than needed because the QSPI operated on 4 bytes at a time. <snip> [ 11.202044] Unable to handle kernel paging request at virtual address bffd3000 [ 11.209254] pgd = e463054d [ 11.211948] [bffd3000] pgd=2fffb811, pte=00000000, *ppte=00000000 [ 11.218202] Internal error: Oops: 7 [#1] SMP ARM [ 11.222797] Modules linked in: [ 11.225844] CPU: 1 PID: 1317 Comm: systemd-hwdb Not tainted 4.17.7-d0c45cd44a8f [ 11.235796] Hardware name: Altera SOCFPGA Arria10 [ 11.240487] PC is at __raw_writesl+0x70/0xd4 [ 11.244741] LR is at cqspi_write+0x1a0/0x2cc </snip> On a page boundary limit the number of bytes copied from the tx buffer to remain within the page. This patch uses a temporary buffer to hold the 4 bytes to write and then copies only the bytes required from the tx buffer. Reported-by: Adrian Amborzewicz <adrian.ambrozewicz@intel.com> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-11-19	drm/amd/pp: handle negative values when reading OD	Greathouse, Joseph
	Reading the sysfs files pp_sclk_od and pp_mclk_od return the percentage difference between the VBIOS-provided default frequency and the current (possibly user-set) frequency in the highest SCLK and MCLK DPM states, respectively. Writing to these files provides an easy mechanism for setting a higher-than-default maximum frequency. We normally only allow values >= 0 to be written here. However, with the addition of pp_od_clk_voltage, we now allow users to set custom DPM tables. If they then set the maximum DPM state to something less than the default, later reads of pp_*_od should return a negative value. The highest DPM state is now less than the VBIOS-provided default, so the percentage is negative. The math to calculate this was originally performed with unsigned values, meaning reads that should return negative values returned meaningless data. This patch corrects that issue and normalizes how all of the calculations are done across the various hwmgr types. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdgpu: Add missing firmware entry for HAINAN	Takashi Iwai
	Due to lack of MODULE_FIRMWARE() with hainan_mc.bin, the driver doesn't work properly in initrd. Let's add it. Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1116239 Fixes: 8eaf2b1faaf4 ("drm/amdgpu: switch firmware path for SI parts") Cc: <stable@vger.kernel.org> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/powerplay: disable Vega20 DS related features	Evan Quan
	Disable these features on Vega20 for now. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Feifei Xu<Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset	Felix Kuehling
	On Vega20 and other pre-production GPUs, powerplay is not enabled yet. Check for NULL pointers before calling pp_funcs function pointers. Also affects Kaveri. CC: Joerg Roedel <jroedel@suse.de> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2018-11-19	Revert "drm/amdgpu: use GMC v9 KIQ workaround only for the GFXHUB" (v2)	Chengming Gui
	With GFXOFF enabled, this patch will cause PCO amdgpu_test failed, but GFXOFF is necessary for PCO, so revert the patch. This reverts commit b83761bb0b09ec11c924afe9d88e458cb16a0372. v2: add a comment for future reference (Alex) Reviewed-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jack Gui <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/powerplay: Ratelimit all "was not implemented" messages	Joerg Roedel
	Running kfdtest on an AMD Carizzo flooded the kernel log with thousands of these "was not implemented" messages, making it impossible to see other messages there. Ratelimit the messages to prevent user-space from flooding the kernel log. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/display: Clean up DCN1 clock requests	David Francis
	[Why] There was a full clock request struct of which only one value was being used. [How] Replace the struct with a uint32_t Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/display: Support amdgpu "max bpc" connector property (v2)	Nicholas Kazlauskas
	[Why] Many panels support more than 8bpc but some modes are unavailable while running at greater than 8bpc due to DP/HDMI bandwidth constraints. Support for more than 8bpc was added recently in the driver but it defaults to the maximum supported bpc - locking out these modes. This should be a user configurable option such that the user can select what bpc configuration they would like. [How] This patch adds support for getting and setting the amdgpu driver specific "max bpc" property on the connector. It also adds support for limiting the output bpc based on the property value. The default limitation is the lowest value in the range, 8bpc. This was the old value before the range was uncapped. This patch should be updated/replaced later once common drm support for max bpc lands. Bugzilla: https://bugs.freedesktop.org/108542 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201585 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200645 Fixes: e03fd3f300f6 ("drm/amd/display: Do not limit color depth to 8bpc") v2: rebase on upstream (Alex) Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdgpu: Add amdgpu "max bpc" connector property (v2)	Nicholas Kazlauskas
	[Why] Many panels support more than 8bpc but some modes are unavailable while running at greater than 8bpc due to DP/HDMI bandwidth constraints. Support for more than 8bpc was added recently in the driver but it defaults to the maximum supported bpc - locking out these modes. This should be a user configurable option such that the user can select what bpc configuration they would like. [How] This patch introduces the "max bpc" amdgpu driver specific connector property so the user can limit the maximum bpc. It ranges from 8 to 16. This doesn't directly set the preferred bpc for the panel since it follows Intel's existing driver conventions. This proprety should be removed once common drm support for max bpc lands. v2: rebase on upstream (Alex) Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amdgpu: remove set but not used variable 'ring'	YueHaibing
	Fixes gcc '-Wunused-but-set-variable' warning: drivers/gpu/drm/amd/amdgpu/psp_v10_0.c: In function 'psp_v10_0_ring_stop': drivers/gpu/drm/amd/amdgpu/psp_v10_0.c:230:19: warning: variable 'ring' set but not used [-Wunused-but-set-variable] drivers/gpu/drm/amd/amdgpu/psp_v3_1.c: In function 'psp_v3_1_ring_stop': drivers/gpu/drm/amd/amdgpu/psp_v3_1.c:359:19: warning: variable ‘ring’ set but not used [-Wunused-but-set-variable] It not used since commit 4ef72453311a ("drm/amdgpu: added api for stopping psp ring (v2)") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/amdgpu/sriov: Aligned the definition with libgv	Emily Deng
	Aligned the amd_sriov_msg_pf2vf_info_header and amd_sriov_msg_pf2vf_info_header's definition with libgv. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/display: Get backlight controller id from link	David Francis
	[Why] dc_link_set_backlight_level can be called from a context where the stream is unknown. In this case, we can still find which controller is driving this particular backlight [How] Compare links for equality instead of streams Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/display: expose surface confirm color function	Charlene Liu
	expose dcn10_get_surface_visual_confirm_color() to be used in the future Signed-off-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-19	drm/amd/display: fix pipe interdependent hubp programming	Dmytro Laktyushkin
	A number of registers need to be updated for all active pipes wherever any pipe causes a change in watermarks. This change separates programming of these registers into a separate function call that is called for all active pipes during a bw update. Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Reviewed-by: Tony Cheng <Tony.Cheng@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>