git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-12-20	pxa168fb: trivial typo fix	Lubomir Rintel
	A missing space in an error message. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	fbdev: fsl-diu: remove redundant null check on cmap	Wen Yang
	The null check on &info->cmap is redundant since cmap is a struct inside fb_info and can never be null, so the check is always true. We may remove it. Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Acked-by: Timur Tabi <timur@kernel.org> Cc: zhong.weidong@zte.com.cn Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	fbdev: omap2: omapfb: convert to DEFINE_SHOW_ATTRIBUTE	Yangtao Li
	Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	fbdev: uvesafb: fix spelling mistake "memoery" -> "memory"	Colin Ian King
	There is a spelling mistake in the module parameter description, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Michal Januszewski <spock@gentoo.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	fbdev: fbmem: add config option to center the bootup logo	Peter Rosin
	If there are extra logos (CONFIG_FB_LOGO_EXTRA) the heights of these extra logos are not considered when centering the first logo vertically. Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	fbdev: fbmem: make fb_show_logo_line return the end instead of the height	Peter Rosin
	In preparation for allowing centering of the bootup logo, make fb_show_logo_line return where the next free framebuffer line is, instead of returning the height of the shown logo. Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	video: fbdev: pxafb: Fix "WARNING: invalid free of devm_ allocated data"	YueHaibing
	'info->modes' got allocated with devm_kcalloc in of_get_pxafb_display. This gives this error message: ./drivers/video/fbdev/pxafb.c:2238:2-7: WARNING: invalid free of devm_ allocated data Fixes: c8f96304ec8b4 ("video: fbdev: pxafb: switch to devm_* API") Cc: stable@kernel.org [v4.19+] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Daniel Mack <daniel@zonque.org> Cc: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	fbdev: fbmem: behave better with small rotated displays and many CPUs	Peter Rosin
	Blitting an image with "negative" offsets is not working since there is no clipping. It hopefully just crashes. For the bootup logo, there is protection so that blitting does not happen as the image is drawn further and further to the right (ROTATE_UR) or further and further down (ROTATE_CW). There is however no protection when drawing in the opposite directions (ROTATE_UD and ROTATE_CCW). Add back this protection. The regression is 20-odd years old but the mindless warning-killing mentality displayed in commit 34bdb666f4b2 ("fbdev: fbmem: remove positive test on unsigned values") is also to blame, methinks. Fixes: 448d479747b8 ("fbdev: fb_do_show_logo() updates") Signed-off-by: Peter Rosin <peda@axentia.se> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Fabian Frederick <ffrederick@users.sourceforge.net> Cc: Geert Uytterhoeven <geert+renesas@glider.be> cc: Geoff Levand <geoff@infradead.org> Cc: James Simmons <jsimmons@users.sf.net> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	video: clps711x-fb: release disp device node in probe()	Alexey Khoroshilov
	clps711x_fb_probe() increments refcnt of disp device node by of_parse_phandle() and leaves it undecremented on both successful and error paths. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Alexander Shiyan <shc_work@mail.ru> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	fbdev: make FB_BACKLIGHT a tristate	Rob Clark
	BACKLIGHT_CLASS_DEVICE is already tristate, but a dependency FB_BACKLIGHT prevents it from being built as a module. There doesn't seem to be any particularly good reason for this, so switch FB_BACKLIGHT over to tristate. Signed-off-by: Rob Clark <robdclark@gmail.com> Tested-by: Arnd Bergmann <arnd@arndb.de> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ulf Magnusson <ulfalizer@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Hans de Goede <j.w.r.degoede@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	udlfb: fix some inconsistent NULL checking	Dan Carpenter
	In the current kernel, then kzalloc() can't fail for small allocations, but if it did fail then we would have a NULL dereference in the error handling. Also in dlfb_usb_disconnect() if "info" were NULL then it would cause an Oops inside the unregister_framebuffer() function but it can't be NULL so let's remove that check. Fixes: 68a958a915ca ("udlfb: handle unplug properly") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Bernie Thompson <bernie@plugable.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Colin Ian King <colin.king@canonical.com> Cc: Wen Yang <wen.yang99@zte.com.cn> [b.zolnierkie: added "Fixes:" tag] Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
2018-12-20	Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git	Kalle Valo
	ath.git patches for 4.21. Major changes: ath10k * add amsdu support for QCA6174 monitor mode * report tx rate using the new ieee80211_tx_rate_update() API * wcn3990 support is not experimental anymore
2018-12-20	drm/amd/display: Fix MST dp_blank REG_WAIT timeout	Jerry (Fangzhi) Zuo
	Need to blank stream before deallocate MST payload. [drm:generic_reg_wait [amdgpu]] ERROR REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:944 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2201 at /var/lib/dkms/amdgpu/18.50-690240/build/amd/amdgpu/../display/dc/dc_helper.c:249 generic_reg_wait+0xe7/0x160 [amdgpu] Call Trace: dce110_stream_encoder_dp_blank+0x11c/0x180 [amdgpu] core_link_disable_stream+0x40/0x230 [amdgpu] ? generic_reg_update_ex+0xdb/0x130 [amdgpu] dce110_reset_hw_ctx_wrap+0xb7/0x1f0 [amdgpu] dce110_apply_ctx_to_hw+0x30/0x430 [amdgpu] ? dce110_apply_ctx_for_surface+0x206/0x260 [amdgpu] dc_commit_state+0x2ba/0x4d0 [amdgpu] amdgpu_dm_atomic_commit_tail+0x297/0xd70 [amdgpu] ? amdgpu_bo_pin_restricted+0x58/0x260 [amdgpu] ? wait_for_completion_timeout+0x1f/0x120 ? wait_for_completion_interruptible+0x1c/0x160 commit_tail+0x3d/0x60 [drm_kms_helper] drm_atomic_helper_commit+0xf6/0x100 [drm_kms_helper] drm_atomic_connector_commit_dpms+0xe5/0xf0 [drm] drm_mode_obj_set_property_ioctl+0x14f/0x250 [drm] drm_mode_connector_property_set_ioctl+0x2e/0x40 [drm] drm_ioctl+0x1e0/0x430 [drm] ? drm_mode_connector_set_obj_prop+0x70/0x70 [drm] ? ep_read_events_proc+0xb0/0xb0 ? ep_scan_ready_list.constprop.18+0x1e6/0x1f0 ? timerqueue_add+0x52/0x80 amdgpu_drm_ioctl+0x49/0x80 [amdgpu] do_vfs_ioctl+0x90/0x5f0 SyS_ioctl+0x74/0x80 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 ---[ end trace 3ed7b77a97d60f72 ]--- Signed-off-by: Jerry (Fangzhi) Zuo <Jerry.Zuo@amd.com> Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2018-12-20	drm/amd/display: validate extended dongle caps	Wenjing Liu
	[why] Some dongle doesn't have a valid extended dongle caps, but we still set the extended dongle caps to be valid. This causes validation fails for all timing. [how] If no dp_hdmi_max_pixel_clk is provided, don't use extended dongle caps. Signed-off-by: Wenjing Liu <Wenjing.Liu@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Abdoulaye Berthe <Abdoulaye.Berthe@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-20	drm/amd/display: Use div_u64 for flip timestamp ns to ms	Nicholas Kazlauskas
	Resolves __udivdi3 missing errors when building for i386. Fixes: 6378ef012ddc ("drm/amd/display: Add below the range support for FreeSync") Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-20	drm/amdgpu/uvd:Change uvd ring name convention	James Zhu
	Since umr tool can't handle bracket, change uvd ring name convention. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-20	drm/amd/powerplay: add Vega20 LCLK DPM level setting support	Evan Quan
	Support manual LCLK DPM level switch on Vega20. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-20	drm/amdgpu: print process info when job timeout	Trigger Huang
	When a job is timeout, try to print the related process information for debugging Signed-off-by: Trigger Huang <Trigger.Huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-20	drm/amdgpu/nbio7.4: add hw bug workaround for vega20	Alex Deucher
	Configure PCIE_CI_CNTL to work around a hw bug that affects some multi-GPU compute workloads. Acked-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-20	drm/amdgpu/nbio6.1: add hw bug workaround for vega10/12	Alex Deucher
	Configure PCIE_CI_CNTL to work around a hw bug that affects some multi-GPU compute workloads. Acked-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-20	staging: mt7621-mmc: Correct spelling mistakes in comments	Jona Crasselt
	Changed "avaiable" to "available" and "interupt" to "interrupt". Signed-off-by: Jona Crasselt <jona.crasselt@fau.de> Signed-off-by: Felix Windsheimer <felix.windsheimer@fau.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-20	ath10k: add support to configure BB timing over wmi	Bhagavathi Perumal S
	Add wmi configuration cmd to configure base band(BB) power amplifier(PA) off timing values in hardware. The default PA off timings were fine tuned to make proper DFS radar detection in QCA reference design. If ODM uses different PA in their design, then the same default PA off timing values cannot be used, it requires different settling time to detect radar pulses very sooner and avoid radar detection problems. In that case it provides provision to select proper PA off timing values based on the PA hardware used. The PA component is part of FEM hardware and new device tree entry "ext-fem-name" is used to indentify the FEM hardware. And this wmi configuration cmd is enabled via wmi service flag "WMI_SERVICE_BB_TIMING_CONFIG_SUPPORT". Other way is to apply these values through calibration data, but recalibration of all boards out there might not be feasible. This change tested on firmware ver 10.2.4-1.0-00042 in QCA988X chipset. Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: fix tx_stats memory leak	Zhi Chen
	Memory of tx_stats was allocated when a STA was added. But it's not freed if the STA failed to be added to driver. This issue could be seen in MDK3 attack case when STA number reached the limit. Tested: QCA9984 with firmware ver 10.4-3.9.0.1-00005 Signed-off-by: Zhi Chen <zhichen@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: fix peer stats null pointer dereference	Zhi Chen
	There was a race condition in SMP that an ath10k_peer was created but its member sta was null. Following are procedures of ath10k_peer creation and member sta access in peer statistics path. 1. Peer creation: ath10k_peer_create() =>ath10k_wmi_peer_create() =>ath10k_wait_for_peer_created() ... # another kernel path, RX from firmware ath10k_htt_t2h_msg_handler() =>ath10k_peer_map_event() =>wake_up() # ar->peer_map[id] = peer //add peer to map #wake up original path from waiting ... # peer->sta = sta //sta assignment 2. RX path of statistics ath10k_htt_t2h_msg_handler() =>ath10k_update_per_peer_tx_stats() =>ath10k_htt_fetch_peer_stats() # peer->sta //sta accessing Any access of peer->sta after peer was added to peer_map but before sta was assigned could cause a null pointer issue. And because these two steps are asynchronous, no proper lock can protect them. So both peer and sta need to be checked before access. Tested: QCA9984 with firmware ver 10.4-3.9.0.1-00005 Signed-off-by: Zhi Chen <zhichen@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: remove an unnecessary NULL check	Dan Carpenter
	The "survey" pointer is the address of an array element. We know that it can't be NULL so this check can be removed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: move non-fatal warn logs to dbg level	Govind Singh
	During driver load below warn logs are printed in the console. Since driver may not implement all wmi events sent by fw and all of them are non-fatal, move this log to debug level to remove un-necessary warn message on console. [ 361.887230] ath10k_snoc a000000.wifi: Unknown eventid: 16393 [ 361.907037] ath10k_snoc a000000.wifi: Unknown eventid: 237569 Signed-off-by: Govind Singh <govinds@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: fix a NULL vs IS_ERR() check	Dan Carpenter
	The devm_memremap() function doesn't return NULLs, it returns error pointers. Fixes: ba94c753ccb4 ("ath10k: add QMI message handshake for wcn3990 client") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: remove work in progress logs from snoc driver	Govind Singh
	All the necessary patches to make wifi running (over SNOC) are merged and tested on SDM845/QCS404 platform with WCN3990 wifi module, hence remove work in progress debug from snoc driver and Kconfig. Signed-off-by: Govind Singh <govinds@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: fix warning due to msdu limit error	Bhagavathi Perumal S
	Some hardwares variants (QCA99x0) are limiting msdu deaggregation with some threshold value(default limit in QCA99x0 is 64 msdus), it was introduced to avoid excessive MSDU-deaggregation in error cases. When number of sub frames exceeds the limit, target hardware will send all msdus starting from present msdu in RAW format as a single msdu packet and it will be indicated with error status bit "RX_MSDU_END_INFO0_MSDU_LIMIT_ERR" set in rx descriptor. This msdu frame is a partial raw MSDU and does't have first msdu and ieee80211 header. It caused below warning message. [ 320.151332] ------------[ cut here ]------------ [ 320.155006] WARNING: CPU: 0 PID: 3 at drivers/net/wireless/ath/ath10k/htt_rx.c:1188 In our issue case, MSDU limit error happened due to FCS error and generated this warning message. This fixes the warning by handling the MSDU limit error. If msdu limit error happens, driver adds first MSDU's ieee80211 header and sets A-MSDU present bit in QOS header so that upper layer processes this frame if it is valid or drop it if FCS error set. And removed the warning message, hence partial msdus without first msdu is expected in msdu limit error cases. Tested on QCA9984, Firmware 10.4-3.6-00104 Signed-off-by: Bhagavathi Perumal S <bperumal@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: disable 4addr source port learning in 10.4 FW by default	Sathishkumar Muruganandam
	Currently in 10.4 FW, all the received 4addr frames are processed for source port learning which is enabled by default. This learning can't be disabled by default in FW since it breaks backward compatibility. Since ath10k uses mac80211 based 4addr mode, source port learning done in 10.4 FW is redundant and also causes issues when 3addr frames are transmitted/received for a 4addr station. One such visible functional impact is when GTK rekey frame from hostapd based AP to 4addr STA is dropped in AP's 10.4 FW. This is since GTK rekey EAPOL frame is 3addr frame on AP interface and STA enabled with 4addr is already allowed for receiving 3addr EAPOL frames. Source port learning implementation in 10.4 FW drops this 3addr GTK rekey frame in AP destinated for 4addr STA causing disassociation and re-association for every GTK rekey session. GTK rekey issue is not seen when learning is disabled in FW. To prevent such issues without breaking backward compatibility, FW advertises new service bit making the source port learning configurable and this learning is being currently disabled during ath10k vdev creation. * Tested HW: QCA9984 * Tested FW: 10.4-3.6.0.1-00004 Signed-off-by: Sathishkumar Muruganandam <murugana@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	ath10k: report tx rate using ieee80211_tx_rate_update()	Anilkumar Kolli
	Mesh path metric needs tx rate information from ieee80211_tx_status() call but in ath10k there is no mechanism to report tx rate information via ieee80211_tx_status(), the tx rate is only accessible via sta_statiscs() op. Per peer tx stats has tx rate info available, Tx rate is available to ath10k driver after every 4 PPDU sent in the air. For each PPDU, ath10k driver updates rate informattion to mac80211 using ieee80211_tx_rate_update(). Per peer txrate information is updated through per peer statistics and is available for QCA9888/QCA9984/QCA4019/QCA998X only Tested on QCA9984 with firmware-5.bin_10.4-3.5.3-00053 Tested on QCA998X with firmware-5.bin_10.2.4-1.0-00036 Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	md: fix raid10 hang issue caused by barrier	Guoqing Jiang
	When both regular IO and resync IO happen at the same time, and if we also need to split regular. Then we can see tasks hang due to barrier. 1. resync thread [ 1463.757205] INFO: task md1_resync:5215 blocked for more than 480 seconds. [ 1463.757207] Not tainted 4.19.5-1-default #1 [ 1463.757209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1463.757212] md1_resync D 0 5215 2 0x80000000 [ 1463.757216] Call Trace: [ 1463.757223] ? __schedule+0x29a/0x880 [ 1463.757231] ? raise_barrier+0x8d/0x140 [raid10] [ 1463.757236] schedule+0x78/0x110 [ 1463.757243] raise_barrier+0x8d/0x140 [raid10] [ 1463.757248] ? wait_woken+0x80/0x80 [ 1463.757257] raid10_sync_request+0x1f6/0x1e30 [raid10] [ 1463.757265] ? _raw_spin_unlock_irq+0x22/0x40 [ 1463.757284] ? is_mddev_idle+0x125/0x137 [md_mod] [ 1463.757302] md_do_sync.cold.78+0x404/0x969 [md_mod] [ 1463.757311] ? wait_woken+0x80/0x80 [ 1463.757336] ? md_rdev_init+0xb0/0xb0 [md_mod] [ 1463.757351] md_thread+0xe9/0x140 [md_mod] [ 1463.757358] ? _raw_spin_unlock_irqrestore+0x2e/0x60 [ 1463.757364] ? __kthread_parkme+0x4c/0x70 [ 1463.757369] kthread+0x112/0x130 [ 1463.757374] ? kthread_create_worker_on_cpu+0x40/0x40 [ 1463.757380] ret_from_fork+0x3a/0x50 2. regular IO [ 1463.760679] INFO: task kworker/0:8:5367 blocked for more than 480 seconds. [ 1463.760683] Not tainted 4.19.5-1-default #1 [ 1463.760684] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1463.760687] kworker/0:8 D 0 5367 2 0x80000000 [ 1463.760718] Workqueue: md submit_flushes [md_mod] [ 1463.760721] Call Trace: [ 1463.760731] ? __schedule+0x29a/0x880 [ 1463.760741] ? wait_barrier+0xdd/0x170 [raid10] [ 1463.760746] schedule+0x78/0x110 [ 1463.760753] wait_barrier+0xdd/0x170 [raid10] [ 1463.760761] ? wait_woken+0x80/0x80 [ 1463.760768] raid10_write_request+0xf2/0x900 [raid10] [ 1463.760774] ? wait_woken+0x80/0x80 [ 1463.760778] ? mempool_alloc+0x55/0x160 [ 1463.760795] ? md_write_start+0xa9/0x270 [md_mod] [ 1463.760801] ? try_to_wake_up+0x44/0x470 [ 1463.760810] raid10_make_request+0xc1/0x120 [raid10] [ 1463.760816] ? wait_woken+0x80/0x80 [ 1463.760831] md_handle_request+0x121/0x190 [md_mod] [ 1463.760851] md_make_request+0x78/0x190 [md_mod] [ 1463.760860] generic_make_request+0x1c6/0x470 [ 1463.760870] raid10_write_request+0x77a/0x900 [raid10] [ 1463.760875] ? wait_woken+0x80/0x80 [ 1463.760879] ? mempool_alloc+0x55/0x160 [ 1463.760895] ? md_write_start+0xa9/0x270 [md_mod] [ 1463.760904] raid10_make_request+0xc1/0x120 [raid10] [ 1463.760910] ? wait_woken+0x80/0x80 [ 1463.760926] md_handle_request+0x121/0x190 [md_mod] [ 1463.760931] ? _raw_spin_unlock_irq+0x22/0x40 [ 1463.760936] ? finish_task_switch+0x74/0x260 [ 1463.760954] submit_flushes+0x21/0x40 [md_mod] So resync io is waiting for regular write io to complete to decrease nr_pending (conf->barrier++ is called before waiting). The regular write io splits another bio after call wait_barrier which call nr_pending++, then the splitted bio would continue with raid10_write_request -> wait_barrier, so the splitted bio has to wait for barrier to be zero, then deadlock happens as follows. resync io regular io raise_barrier wait_barrier generic_make_request wait_barrier To resolve the issue, we need to call allow_barrier to decrease nr_pending before generic_make_request since regular IO is not issued to underlying devices, and wait_barrier is called again to ensure no internal IO happening. Fixes: fc9977dd069e ("md/raid10: simplify the splitting of requests.") Reported-and-tested-by: Siniša Bandin <sinisa@4net.rs> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20	raid10: refactor common wait code from regular read/write request	Guoqing Jiang
	Both raid10_read_request and raid10_write_request share the same code at the beginning of them, so introduce regular_request_wait to clean up code, and call it in both request functions. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20	md: remvoe redundant condition check	Chengguang Xu
	mempool_destroy() can handle NULL pointer correctly, so there is no need to check NULL pointer before calling mempool_destroy(). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20	md: remove set but not used variable 'bi_rdev'	Yue Haibing
	Fixes gcc '-Wunused-but-set-variable' warning: drivers/md/md.c: In function 'md_integrity_add_rdev': drivers/md/md.c:2149:24: warning: variable 'bi_rdev' set but not used [-Wunused-but-set-variable] It not used any more after commit 1501efadc524 ("md/raid: only permit hot-add of compatible integrity profiles") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-12-20	ath10k: add amsdu support for monitor mode	Yu Wang
	When processing HTT_T2H_MSG_TYPE_RX_IN_ORD_PADDR_IND, if the length of a msdu is larger than the tailroom of the rx skb, skb_over_panic issue will happen when calling skb_put. In monitor mode, amsdu will be handled in this path, and msdu_len of the first msdu_desc is the length of the entire amsdu, which might be larger than the maximum length of a skb, in such case, it will hit the issue upon. To fix this issue, process msdu list separately for monitor mode. Successfully tested with: QCA6174 (FW version: RM.4.4.1.c2-00057-QCARMSWP-1). Signed-off-by: Yu Wang <yyuwang@codeaurora.org> [kvalo@codeaurora.org: cosmetic cleanup] Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-12-20	drbd: Change drbd_request_detach_interruptible's return type to int	Nathan Chancellor
	Clang warns when an implicit conversion is done between enumerated types: drivers/block/drbd/drbd_state.c:708:8: warning: implicit conversion from enumeration type 'enum drbd_ret_code' to different enumeration type 'enum drbd_state_rv' [-Wenum-conversion] rv = ERR_INTR; ~ ^~~~~~~~ drbd_request_detach_interruptible's only call site is in the return statement of adm_detach, which returns an int. Change the return type of drbd_request_detach_interruptible to match, silencing Clang's warning. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")	Lars Ellenberg
	And also re-enable partial-zero-out + discard aligned. With the introduction of REQ_OP_WRITE_ZEROES, we started to use that for both WRITE_ZEROES and DISCARDS, hoping that WRITE_ZEROES would "do what we want", UNMAP if possible, zero-out the rest. The example scenario is some LVM "thin" backend. While an un-allocated block on dm-thin reads as zeroes, on a dm-thin with "skip_block_zeroing=true", after a partial block write allocated that block, that same block may well map "undefined old garbage" from the backends on LBAs that have not yet been written to. If we cannot distinguish between zero-out and discard on the receiving side, to avoid "undefined old garbage" to pop up randomly at later times on supposedly zero-initialized blocks, we'd need to map all discards to zero-out on the receiving side. But that would potentially do a full alloc on thinly provisioned backends, even when the expectation was to unmap/trim/discard/de-allocate. We need to distinguish on the protocol level, whether we need to guarantee zeroes (and thus use zero-out, potentially doing the mentioned full-alloc), or if we want to put the emphasis on discard, and only do a "best effort zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing only potential unaligned head and tail clippings to at least try to avoid "false positives" in an online-verify later), hoping that someone set skip_block_zeroing=false. For some discussion regarding this on dm-devel, see also https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html For backward compatibility, P_TRIM means zero-out, unless the DRBD_FF_WZEROES feature flag is agreed upon during handshake. To have upper layers even try to submit WRITE ZEROES requests, we need to announce "efficient zeroout" independently. We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits(): if we can handle "zeroes" efficiently on the protocol, we want to do that, even if our backend does not announce max_write_zeroes_sectors itself. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: skip spurious timeout (ping-timeo) when failing promote	Lars Ellenberg
	If you try to promote a Secondary while connected to a Primary and allow-two-primaries is NOT set, we will wait for "ping-timeout" to give this node a chance to detect a dead primary, in case the cluster manager noticed faster than we did. But if we then are still connected to a Primary, we fail (after an additional timeout of ping-timout). This change skips the spurious second timeout. Most people won't notice really, since "ping-timeout" by default is half a second. But in some installations, ping-timeout may be 10 or 20 seconds or more, and spuriously delaying the error return becomes annoying. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: don't retry connection if peers do not agree on "authentication" settings	Lars Ellenberg
	emma: "Unexpected data packet AuthChallenge (0x0010)" ava: "expected AuthChallenge packet, received: ReportProtocol (0x000b)" "Authentication of peer failed, trying again." Pattern repeats. There is no point in retrying the handshake, if we expect to receive an AuthChallenge, but the peer is not even configured to expect or use a shared secret. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: fix print_st_err()'s prototype to match the definition	Luc Van Oostenryck
	print_st_err() is defined with its 4th argument taking an 'enum drbd_state_rv' but its prototype use an int for it. Fix this by using 'enum drbd_state_rv' in the prototype too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: avoid spurious self-outdating with concurrent disconnect / down	Lars Ellenberg
	If peers are "simultaneously" told to disconnect from each other, either explicitly, or implicitly by taking down the resource, with bad timing, one side may see its disconnect "fail" with a result of "state change failed by peer", and interpret this as "please oudate yourself". Try to catch this by checking for current connection status, and possibly retry as local-only state change instead. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: do not block when adjusting "disk-options" while IO is frozen	Lars Ellenberg
	"suspending" IO is overloaded. It can mean "do not allow new requests" (obviously), but it also may mean "must not complete pending IO", for example while the fencing handlers do their arbitration. When adjusting disk options, we suspend io (disallow new requests), then wait for the activity-log to become unused (drain all IO completions), and possibly replace it with a new activity log of different size. If the other "suspend IO" aspect is active, pending IO completions won't happen, and we would block forever (unkillable drbdsetup process). Fix this by skipping the activity log adjustment if the "al-extents" setting did not change. Also, in case it did change, fail early without blocking if it looks like we would block forever. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: fix comment typos	Lars Ellenberg
	Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: reject attach of unsuitable uuids even if connected	Lars Ellenberg
	Multiple failure scenario: a) all good Connected Primary/Secondary UpToDate/UpToDate b) lose disk on Primary, Connected Primary/Secondary Diskless/UpToDate c) continue to write to the device, changes only make it to the Secondary storage. d) lose disk on Secondary, Connected Primary/Secondary Diskless/Diskless e) now try to re-attach on Primary This would have succeeded before, even though that is clearly the wrong data set to attach to (missing the modifications from c). Because we only compared our "effective" and the "to-be-attached" data generation uuid tags if (device->state.conn < C_CONNECTED). Fix: change that constraint to (device->state.pdsk != D_UP_TO_DATE) compare the uuids, and reject the attach. This patch also tries to improve the reverse scenario: first lose Secondary, then Primary disk, then try to attach the disk on Secondary. Before this patch, the attach on the Secondary succeeds, but since commit drbd: disconnect, if the wrong UUIDs are attached on a connected peer the Primary will notice unsuitable data, and drop the connection hard. Though unfortunately at a point in time during the handshake where we cannot easily abort the attach on the peer without more refactoring of the handshake. We now reject any attach to "unsuitable" uuids, as long as we can see a Primary role, unless we already have access to "good" data. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: attach on connected diskless peer must not shrink a consistent device	Lars Ellenberg
	If we would reject a new handshake, if the peer had attached first, and then connected, we should force disconnect if the peer first connects, and only then attaches. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: fix confusing error message during attach	Lars Ellenberg
	If we attach a (consistent) backing device, which knows about a last-agreed effective size, and that effective size is larger than the currently requested size, we refused to attach with ERR_DISK_TOO_SMALL Failure: (111) Low.dev. smaller than requested DRBD-dev. size. which is confusing to say the least. This patch changes the error code in that case to ERR_IMPLICIT_SHRINK Failure: (170) Implicit device shrinking not allowed. See kernel log. additional info from kernel: To-be-attached device has last effective > current size, and is consistent (9999 > 7777 sectors). Refusing to attach. It also allows to attach with an explicit size. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: disconnect, if the wrong UUIDs are attached on a connected peer	Lars Ellenberg
	With "on-no-data-accessible suspend-io", DRBD requires the next attach or connect to be to the very same data generation uuid tag it lost last. If we first lost connection to the peer, then later lost connection to our own disk, we would usually refuse to re-connect to the peer, because it presents the wrong data set. However, if the peer first connects without a disk, and then attached its disk, we accepted that same wrong data set, which would be "unexpected" by any user of that DRBD and cause "undefined results" (read: very likely data corruption). The fix is to forcefully disconnect as soon as we notice that the peer attached to the "wrong" dataset. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: ignore "all zero" peer volume sizes in handshake	Lars Ellenberg
	During handshake, if we are diskless ourselves, we used to accept any size presented by the peer. Which could be zero if that peer was just brought up and connected to us without having a disk attached first, in which case both peers would just "flip" their volume sizes. Now, even a diskless node will ignore "zero" sizes presented by a diskless peer. Also a currently Diskless Primary will refuse to shrink during handshake: it may be frozen, and waiting for a "suitable" local disk or peer to re-appear (on-no-data-accessible suspend-io). If the peer is smaller than what we used to be, it is not suitable. The logic for a diskless node during handshake is now supposed to be: believe the peer, if - I don't have a current size myself - we agree on the size anyways - I do have a current size, am Secondary, and he has the only disk - I do have a current size, am Primary, and he has the only disk, which is larger than my current size Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-20	drbd: centralize printk reporting of new size into drbd_set_my_capacity()	Lars Ellenberg
	Previously, some implicit resizes that happend during handshake have not been reported as prominently as explicit resize. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>