linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-05-14	Merge branch 'lockless-qdisc-packet-stuck'	David S. Miller
	Yunsheng Lin says: ==================== ix packet stuck problem for lockless qdisc This patchset fixes the packet stuck problem mentioned in [1]. Patch 1: Add STATE_MISSED flag to fix packet stuck problem. Patch 2: Fix a tx_action rescheduling problem after STATE_MISSED flag is added in patch 1. Patch 3: Fix the significantly higher CPU consumption problem when multiple threads are competing on a saturated outgoing device. V8: Change function name as suggested by Jakub and fix some typo in patch 3, adjust commit log in patch 2, and add Acked-by from Jakub. V7: Fix netif_tx_wake_queue() data race noted by Jakub. V6: Some performance optimization in patch 1 suggested by Jakub and drop NET_XMIT_DROP checking in patch 3. V5: add patch 3 to fix the problem reported by Michal Kubecek. V4: Change STATE_NEED_RESCHEDULE to STATE_MISSED and add patch 2. [1]. https://lkml.org/lkml/2019/10/9/42 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14	net: sched: fix tx action reschedule issue with stopped queue	Yunsheng Lin
	The netdev qeueue might be stopped when byte queue limit has reached or tx hw ring is full, net_tx_action() may still be rescheduled if STATE_MISSED is set, which consumes unnecessary cpu without dequeuing and transmiting any skb because the netdev queue is stopped, see qdisc_run_end(). This patch fixes it by checking the netdev queue state before calling qdisc_run() and clearing STATE_MISSED if netdev queue is stopped during qdisc_run(), the net_tx_action() is rescheduled again when netdev qeueue is restarted, see netif_tx_wake_queue(). As there is time window between netif_xmit_frozen_or_stopped() checking and STATE_MISSED clearing, between which STATE_MISSED may set by net_tx_action() scheduled by netif_tx_wake_queue(), so set the STATE_MISSED again if netdev queue is restarted. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Reported-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14	net: sched: fix tx action rescheduling issue during deactivation	Yunsheng Lin
	Currently qdisc_run() checks the STATE_DEACTIVATED of lockless qdisc before calling __qdisc_run(), which ultimately clear the STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED is set before clearing STATE_MISSED, there may be rescheduling of net_tx_action() at the end of qdisc_run_end(), see below: CPU0(net_tx_atcion) CPU1(__dev_xmit_skb) CPU2(dev_deactivate) . . . . set STATE_MISSED . . __netif_schedule() . . . set STATE_DEACTIVATED . . qdisc_reset() . . . .<--------------- . synchronize_net() clear __QDISC_STATE_SCHED \| . . . \| . . . \| . some_qdisc_is_busy() . \| . return false . \| . . test STATE_DEACTIVATED \| . . __qdisc_run() not called \| . . . \| . . test STATE_MISS \| . . __netif_schedule()--------\| . . . . . . . . __qdisc_run() is not called by net_tx_atcion() in CPU0 because CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule is called at the end of qdisc_run_end(), causing tx action rescheduling problem. qdisc_run() called by net_tx_action() runs in the softirq context, which should has the same semantic as the qdisc_run() called by __dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a synchronize_net() between STATE_DEACTIVATED flag being set and qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely bail out for the deactived lockless qdisc in net_tx_action(), and qdisc_reset() will reset all skb not dequeued yet. So add the rcu_read_lock() explicitly to protect the qdisc_run() and do the STATE_DEACTIVATED checking in net_tx_action() before calling qdisc_run_begin(). Another option is to do the checking in the qdisc_run_end(), but it will add unnecessary overhead for non-tx_action case, because __dev_queue_xmit() will not see qdisc with STATE_DEACTIVATED after synchronize_net(), the qdisc with STATE_DEACTIVATED can only be seen by net_tx_action() because of __netif_schedule(). The STATE_DEACTIVATED checking in qdisc_run() is to avoid race between net_tx_action() and qdisc_reset(), see: commit d518d2ed8640 ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc"). As the bailout added above for deactived lockless qdisc in net_tx_action() provides better protection for the race without calling qdisc_run() at all, so remove the STATE_DEACTIVATED checking in qdisc_run(). After qdisc_reset(), there is no skb in qdisc to be dequeued, so clear the STATE_MISSED in dev_reset_queue() too. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> V8: Clearing STATE_MISSED before calling __netif_schedule() has avoid the endless rescheduling problem, but there may still be a unnecessary rescheduling, so adjust the commit log. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14	net: sched: fix packet stuck problem for lockless qdisc	Yunsheng Lin
	Lockless qdisc has below concurrent problem: cpu0 cpu1 . . q->enqueue . . . qdisc_run_begin() . . . dequeue_skb() . . . sch_direct_xmit() . . . . q->enqueue . qdisc_run_begin() . return and do nothing . . qdisc_run_end() . cpu1 enqueue a skb without calling __qdisc_run() because cpu0 has not released the lock yet and spin_trylock() return false for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb enqueued by cpu1 when calling dequeue_skb() because cpu1 may enqueue the skb after cpu0 calling dequeue_skb() and before cpu0 calling qdisc_run_end(). Lockless qdisc has below another concurrent problem when tx_action is involved: cpu0(serving tx_action) cpu1 cpu2 . . . . q->enqueue . . qdisc_run_begin() . . dequeue_skb() . . . q->enqueue . . . . sch_direct_xmit() . . . qdisc_run_begin() . . return and do nothing . . . clear __QDISC_STATE_SCHED . . qdisc_run_begin() . . return and do nothing . . . . . . qdisc_run_end() . This patch fixes the above data race by: 1. If the first spin_trylock() return false and STATE_MISSED is not set, set STATE_MISSED and retry another spin_trylock() in case other CPU may not see STATE_MISSED after it releases the lock. 2. reschedule if STATE_MISSED is set after the lock is released at the end of qdisc_run_end(). For tx_action case, STATE_MISSED is also set when cpu1 is at the end if qdisc_run_end(), so tx_action will be rescheduled again to dequeue the skb enqueued by cpu2. Clear STATE_MISSED before retrying a dequeuing when dequeuing returns NULL in order to reduce the overhead of the second spin_trylock() and __netif_schedule() calling. Also clear the STATE_MISSED before calling __netif_schedule() at the end of qdisc_run_end() to avoid doing another round of dequeuing in the pfifo_fast_dequeue(). The performance impact of this patch, tested using pktgen and dummy netdev with pfifo_fast qdisc attached: threads without+this_patch with+this_patch delta 1 2.61Mpps 2.60Mpps -0.3% 2 3.97Mpps 3.82Mpps -3.7% 4 5.62Mpps 5.59Mpps -0.5% 8 2.78Mpps 2.77Mpps -0.3% 16 2.22Mpps 2.22Mpps -0.0% Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14	tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT	Jim Ma
	In tls_sw_splice_read, checkout MSG_* is inappropriate, should use SPLICE_*, update tls_wait_data to accept nonblock arguments instead of flags for recvmsg and splice. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Jim Ma <majinjing3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14	Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv"	Hoang Le
	This reverts commit 6bf24dc0cc0cc43b29ba344b66d78590e687e046. Above fix is not correct and caused memory leak issue. Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv") Acked-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-15	Merge tag 'drm-msm-fixes-2021-05-09' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/msm into drm-fixes - dsi regression fix - dma-buf pinning fix - displayport fixes - llc fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGuqLZDAEJwUFKb6m+h3kyxgjDEKa3DPA1fHA69vxbXH=g@mail.gmail.com
2021-05-14	Merge tag 'trace-v5.13-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix trace_check_vprintf() for %.s The sanity check of all strings being read from the ring buffer to make sure they are in safe memory space did not account for the %.s notation having another parameter to process (the length). Add that to the check" * tag 'trace-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Handle %.*s in trace_check_vprintf()
2021-05-15	Merge tag 'drm-intel-fixes-2021-05-14' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.13-rc2: - Fix active callback alignment annotations and subsequent crashes - Retract link training strategy to slow and wide, again - Avoid division by zero on gen2 - Use correct width reads for C0DRB3/C1DRB3 registers - Fix double free in pdp allocation failure path - Fix HDMI 2.1 PCON downstream caps check Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87a6oxu9ao.fsf@intel.com
2021-05-14	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: "Fixes and cpucaps.h automatic generation: - Generate cpucaps.h at build time rather than carrying lots of #defines. Merged at -rc1 to avoid some conflicts during the merge window. - Initialise RGSR_EL1.SEED in __cpu_setup() as it may be left as 0 out of reset and the IRG instruction would not function as expected if only the architected pseudorandom number generator is implemented. - Fix potential race condition in __sync_icache_dcache() where the PG_dcache_clean page flag is set before the actual cache maintenance. - Fix header include in BTI kselftests" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache() arm64: tools: Add __ASM_CPUCAPS_H to the endif in cpucaps.h arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup kselftest/arm64: Add missing stddef.h include to BTI tests arm64: Generate cpucaps.h
2021-05-14	Merge tag 'f2fs-5.13-rc1-fix' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "This fixes some critical bugs such as memory leak in compression flows, kernel panic when handling errors, and swapon failure due to newly added condition check" * tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: return EINVAL for hole cases in swap file f2fs: avoid swapon failure by giving a warning first f2fs: compress: fix to assign cc.cluster_idx correctly f2fs: compress: fix race condition of overwrite vs truncate f2fs: compress: fix to free compress page correctly f2fs: support iflag change given the mask f2fs: avoid null pointer access when handling IPU error
2021-05-14	arm64: dts: ti: k3*: Introduce reg definition for interrupt routers	Nishanth Menon
	Interrupt routers are memory mapped peripherals, that are organized in our dts bus hierarchy to closely represents the actual hardware behavior. However, without explicitly calling out the reg property, using 2021.03+ dt-schema package, this exposes the following problem with dtbs_check: /arch/arm64/boot/dts/ti/k3-am654-base-board.dt.yaml: bus@100000: interrupt-controller0: {'type': 'object'} is not allowed for {'compatible': ['ti,sci-intr'], ..... Even though we don't use interrupt router directly via memory mapped registers and have to use it via the system controller, the hardware block is memory mapped, so describe the base address in device tree. This is a valid, comprehensive description of hardware and permitted by the existing ti,sci-intr schema. Reviewed-by: Tero Kristo <kristo@kernel.org> Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/r/20210511194821.13919-1-nm@ti.com
2021-05-14	arm64: dts: ti: k3-am65\|j721e\|am64: Map the dma / navigator subsystem via ↵	Nishanth Menon
	explicit ranges Instead of using empty ranges property, lets map explicitly the address range that is mapped onto the dma / navigator subsystems (navss/dmss). This is also exposed via the dtbs_check with dt-schema newer than 2021.03 version by throwing out following: arch/arm64/boot/dts/ti/k3-am654-base-board.dt.yaml: bus@100000: main-navss: {'type': 'object'} is not allowed for {'compatible': ['simple-mfd'], '#address-cells': [[2]], ..... This has already been correctly done for J7200, however was missed for other k3 SoCs. Fix that oversight. Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Tero Kristo <kristo@kernel.org> Acked-by: Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20210510145429.8752-1-nm@ti.com
2021-05-14	arm64: dts: ti: k3-*: Rename the TI-SCI node	Nishanth Menon
	Lets rename the node name of TI-SCI node to be system-controller as it is a better standardized name for the function that TI-SCI plays in the SoC. Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Tero Kristo <kristo@kernel.org> Link: https://lore.kernel.org/r/20210510145033.7426-5-nm@ti.com
2021-05-14	arm64: dts: ti: k3-am65-wakeup: Drop un-necessary properties from dmsc node	Nishanth Menon
	The DMSC node does'nt require any of "#address-cells", "#size-cells" or "ranges" property as the child nodes are representations of SoC's system controller itself, so align it with the bindings. Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Tero Kristo <kristo@kernel.org> Link: https://lore.kernel.org/r/20210510145033.7426-4-nm@ti.com
2021-05-14	arm64: dts: ti: k3-am65-wakeup: Add debug region to TI-SCI node	Nishanth Menon
	Lets add the TISCI debug region to TI-SCI region in line with TI-SCI documentation[1]. While at it, lets rename the node to indicate the address usage. [1] http://downloads.ti.com/tisci/esd/latest/4_trace/trace.html Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Tero Kristo <kristo@kernel.org> Link: https://lore.kernel.org/r/20210510145033.7426-3-nm@ti.com
2021-05-14	arm64: dts: ti: k3-*: Rename the TI-SCI clocks node name	Nishanth Menon
	We currently use clocks as the node name for the node representing TI-SCI clock nodes. This is better renamed to being clock-controller as that is a better representative of the system controller function as a clock controller for the SoC. Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Tero Kristo <kristo@kernel.org> Link: https://lore.kernel.org/r/20210510145033.7426-2-nm@ti.com
2021-05-14	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	David S. Miller
	Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Remove the flowtable hardware refresh state, fall back to the existing hardware pending state instead, from Roi Dayan. 2) Fix crash in pipapo avx2 lookup when FPU is in used from user context, from Stefano Brivio. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-14	arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent	Vignesh Raghavendra
	Traffic through main NAVSS interconnect is coherent wrt ARM caches on J7200 SoC. Add missing dma-coherent property to main_navss node. Also add dma-ranges to be consistent with mcu_navss node and with AM65/J721e main_navss and mcu_navss nodes. Fixes: d361ed88455fe ("arm64: dts: ti: Add support for J7200 SoC") Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/r/20210510180601.19458-1-vigneshr@ti.com
2021-05-14	arm64: dts: ti: k3-am654-base-board: remove ov5640	Tomi Valkeinen
	AM654 EVM boards are not shipped with OV5640 sensor module, it is a separate purchase. OV5640 module is also just one of the possible sensors or capture boards you can connect. However, for some reason, OV5640 has been added to the board dts file, making it cumbersome to use other sensors. Remove the OV5640 from the dts file so that it is easy to use other sensors via DT overlays. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Acked-by: Pratyush Yadav <p.yadav@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/r/20210423083120.73476-1-tomi.valkeinen@ideasonboard.com
2021-05-14	Merge tag 'drm-fixes-2021-05-14' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Not much here, mostly amdgpu fixes, with a couple of radeon, and a cosmetic vc4. Two MAINTAINERS file updates also. amdgpu: - Fixes for flexible array conversions - Fix sysfs attribute init - Harvesting fixes - VCN CG/PG fixes for Picasso radeon: - Fixes for flexible array conversions - Fix for flickering on Oland with multiple 4K displays vc4: - drop unused function" * tag 'drm-fixes-2021-05-14' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: update vcn1.0 Non-DPG suspend sequence drm/amdgpu: set vcn mgcg flag for picasso drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected drm/amdgpu: update the method for harvest IP for specific SKU drm/amdgpu: add judgement when add ip blocks (v2) drm/amd/display: Initialize attribute for hdcp_srm sysfs file drm/amd/pm: Fix out-of-bounds bug drm/radeon/si_dpm: Fix SMU power state load drm/radeon/ni_dpm: Fix booting bug MAINTAINERS: Update address for Emma Anholt MAINTAINERS: Update my e-mail drm/vc4: remove unused function drm/ttm: Do not add non-system domain BO into swap list
2021-05-14	arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()	Catalin Marinas
	To ensure that instructions are observable in a new mapping, the arm64 set_pte_at() implementation cleans the D-cache and invalidates the I-cache to the PoU. As an optimisation, this is only done on executable mappings and the PG_dcache_clean page flag is set to avoid future cache maintenance on the same page. When two different processes map the same page (e.g. private executable file or shared mapping) there's a potential race on checking and setting PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the fault paths the page is locked (PG_locked), mprotect() does not take the page lock. The result is that one process may see the PG_dcache_clean flag set but the I/D cache maintenance not yet performed. Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit() and set_bit(). In the rare event of a race, the cache maintenance is done twice. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210514095001.13236-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-05-14	block/partitions/efi.c: Fix the efi_partition() kernel-doc header	Bart Van Assche
	Fix the following kernel-doc warning: block/partitions/efi.c:685: warning: wrong kernel-doc identifier on line: * efi_partition(struct parsed_partitions *state) Cc: Alexander Viro <viro@math.psu.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210513171708.8391-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	blk-mq: Swap two calls in blk_mq_exit_queue()	Bart Van Assche
	If a tag set is shared across request queues (e.g. SCSI LUNs) then the block layer core keeps track of the number of active request queues in tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is cleared by blk_mq_del_queue_tag_set(). Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Fixes: 0d2602ca30e4 ("blk-mq: improve support for shared tags maps") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210513171529.7977-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	blk-mq: plug request for shared sbitmap	Ming Lei
	In case of shared sbitmap, request won't be held in plug list any more sine commit 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset"), this way makes request merge from flush plug list & batching submission not possible, so cause performance regression. Yanhui reports performance regression when running sequential IO test(libaio, 16 jobs, 8 depth for each job) in VM, and the VM disk is emulated with image stored on xfs/megaraid_sas. Fix the issue by recovering original behavior to allow to hold request in plug list. Cc: Yanhui Ma <yama@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: kashyap.desai@broadcom.com Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514022052.1047665-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	xen/swiotlb: check if the swiotlb has already been initialized	Stefano Stabellini
	xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with -ENOMEM if the swiotlb has already been initialized. Add an explicit check io_tlb_default_mem != NULL at the beginning of xen_swiotlb_init. If the swiotlb is already initialized print a warning and return -EEXIST. On x86, the error propagates. On ARM, we don't actually need a special swiotlb buffer (yet), any buffer would do. So ignore the error and continue. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Boris Ostrovsky <boris.ostrvsky@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210512201823.1963-3-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14	arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required	Christoph Hellwig
	Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init, today dma_direct_map_page returns error if SWIOTLB_NO_FORCE. For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it is going to be required later (e.g. Xen requires it). CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: catalin.marinas@arm.com CC: will@kernel.org CC: linux-arm-kernel@lists.infradead.org Fixes: 2726bf3ff252 ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14	xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h	Stefano Stabellini
	Move xen_swiotlb_detect to a static inline function to make it available to !CONFIG_XEN builds. CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20210512201823.1963-1-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-14	clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86	Vitaly Kuznetsov
	Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029) the commit e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") broke vDSO on x86. The problem appears to be that VDSO_CLOCKMODE_HVCLOCK is an enum value in 'enum vdso_clock_mode' and '#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is not a define). Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead. Fixes: e4ab4658f1cf ("clocksource/drivers/hyper-v: Handle vDSO differences inline") Reported-by: Mohammed Gamal <mgamal@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
2021-05-14	spi: Don't have controller clean up spi device before driver unbind	Saravana Kannan
	When a spi device is unregistered and triggers a driver unbind, the driver might need to access the spi device. So, don't have the controller clean up the spi device before the driver is unbound. Clean up the spi device after the driver is unbound. Fixes: c7299fea6769 ("spi: Fix spi device unregister flow") Reported-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20210505164734.175546-1-saravanak@google.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-05-14	io_uring: increase max number of reg buffers	Pavel Begunkov
	Since recent changes instead of storing a large array of struct io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and we should not to so worry about restricting max number of registererd buffer slots, increase the limit 4 times. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	io_uring: further remove sqpoll limits on opcodes	Pavel Begunkov
	There are three types of requests that left disabled for sqpoll, namely epoll ctx, statx, and resources update. Since SQPOLL task is now closely mimics a userspace thread, remove the restrictions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	io_uring: fix ltout double free on completion race	Pavel Begunkov
	Always remove linked timeout on io_link_timeout_fn() from the master request link list, otherwise we may get use-after-free when first io_link_timeout_fn() puts linked timeout in the fail path, and then will be found and put on master's free. Cc: stable@vger.kernel.org # 5.10+ Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly") Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14	misc: eeprom: at24: check suspend status before disable regulator	Hsin-Yi Wang
	cd5676db0574 ("misc: eeprom: at24: support pm_runtime control") disables regulator in runtime suspend. If runtime suspend is called before regulator disable, it will results in regulator unbalanced disabling. Fixes: cd5676db0574 ("misc: eeprom: at24: support pm_runtime control") Cc: stable <stable@vger.kernel.org> Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Link: https://lore.kernel.org/r/20210420133050.377209-1-hsinyi@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14	uio_hv_generic: Fix another memory leak in error handling paths	Christophe JAILLET
	Memory allocated by 'vmbus_alloc_ring()' at the beginning of the probe function is never freed in the error handling path. Add the missing 'vmbus_free_ring()' call. Note that it is already freed in the .remove function. Fixes: cdfa835c6e5e ("uio_hv_generic: defer opening vmbus until first use") Cc: stable <stable@vger.kernel.org> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0d86027b8eeed8e6360bc3d52bcdb328ff9bdca1.1620544055.git.christophe.jaillet@wanadoo.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14	uio_hv_generic: Fix a memory leak in error handling paths	Christophe JAILLET
	If 'vmbus_establish_gpadl()' fails, the (recv\|send)_gpadl will not be updated and 'hv_uio_cleanup()' in the error handling path will not be able to free the corresponding buffer. In such a case, we need to free the buffer explicitly. Fixes: cdfa835c6e5e ("uio_hv_generic: defer opening vmbus until first use") Cc: stable <stable@vger.kernel.org> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/4fdaff557deef6f0475d02ba7922ddbaa1ab08a6.1620544055.git.christophe.jaillet@wanadoo.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14	uio/uio_pci_generic: fix return value changed in refactoring	Martin Ågren
	Commit ef84928cff58 ("uio/uio_pci_generic: use device-managed function equivalents") was able to simplify various error paths thanks to no longer having to clean up on the way out. Some error paths were dropped, others were simplified. In one of those simplifications, the return value was accidentally changed from -ENODEV to -ENOMEM. Restore the old return value. Fixes: ef84928cff58 ("uio/uio_pci_generic: use device-managed function equivalents") Cc: stable <stable@vger.kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Link: https://lore.kernel.org/r/20210422192240.1136373-1-martin.agren@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14	ALSA: hda/realtek: Add some CLOVE SSIDs of ALC293	PeiSen Hou
	Fix "use as headset mic, without its own jack detect" problen. Signed-off-by: PeiSen Hou <pshou@realtek.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/d0746eaf29f248a5acc30313e3ba4f99@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-05-14	ALSA: firewire-lib: fix amdtp_packet tracepoints event for packet_index field	Takashi Sakamoto
	The snd_firewire_lib:amdtp_packet tracepoints event includes index of packet processed in a context handling. However in IR context, it is not calculated as expected. Cc: <stable@vger.kernel.org> Fixes: 753e717986c2 ("ALSA: firewire-lib: use packet descriptor for IR context") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20210513125652.110249-6-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-05-14	ALSA: firewire-lib: fix calculation for size of IR context payload	Takashi Sakamoto
	The quadlets for CIP header is handled as a part of IR context header, thus it doesn't join in IR context payload. However current calculation includes the quadlets in IR context payload. Cc: <stable@vger.kernel.org> Fixes: f11453c7cc01 ("ALSA: firewire-lib: use 16 bytes IR context header to separate CIP header") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20210513125652.110249-5-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-05-14	ALSA: firewire-lib: fix check for the size of isochronous packet payload	Takashi Sakamoto
	The check for size of isochronous packet payload just cares of the size of IR context payload without the size of CIP header. Cc: <stable@vger.kernel.org> Fixes: f11453c7cc01 ("ALSA: firewire-lib: use 16 bytes IR context header to separate CIP header") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20210513125652.110249-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-05-14	ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro	Takashi Sakamoto
	Mackie d.2 has an extension card for IEEE 1394 communication, which uses BridgeCo DM1000 ASIC. On the other hand, Mackie d.4 Pro has built-in function for IEEE 1394 communication by Oxford Semiconductor OXFW971, according to schematic diagram available in Mackie website. Although I misunderstood that Mackie d.2 Pro would be also a model with OXFW971, it's wrong. Mackie d.2 Pro is a model which includes the extension card as factory settings. This commit fixes entries in Kconfig and comment in ALSA OXFW driver. Cc: <stable@vger.kernel.org> Fixes: fd6f4b0dc167 ("ALSA: bebob: Add skelton for BeBoB based devices") Fixes: ec4dba5053e1 ("ALSA: oxfw: Add support for Behringer/Mackie devices") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20210513125652.110249-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-05-14	ALSA: dice: fix stream format at middle sampling rate for Alesis iO 26	Takashi Sakamoto
	Alesis iO 26 FireWire has two pairs of digital optical interface. It delivers PCM frames from the interfaces by second isochronous packet streaming. Although both of the interfaces are available at 44.1/48.0 kHz, first one of them is only available at 88.2/96.0 kHz. It reduces the number of PCM samples to 4 in Multi Bit Linear Audio data channel of data blocks on the second isochronous packet streaming. This commit fixes hardcoded stream formats. Cc: <stable@vger.kernel.org> Fixes: 28b208f600a3 ("ALSA: dice: add parameters of stream formats for models produced by Alesis") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20210513125652.110249-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-05-14	thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID	Srinivas Pandruvada
	After commit 81ad4276b505 ("Thermal: Ignore invalid trip points") all user_space governor notifications via RW trip point is broken in intel thermal drivers. This commits marks trip_points with value of 0 during call to thermal_zone_device_register() as invalid. RW trip points can be 0 as user space will set the correct trip temperature later. During driver init, x86_package_temp and all int340x drivers sets RW trip temperature as 0. This results in all these trips marked as invalid by the thermal core. To fix this initialize RW trips to THERMAL_TEMP_INVALID instead of 0. Cc: <stable@vger.kernel.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210430122343.1789899-1-srinivas.pandruvada@linux.intel.com
2021-05-14	powerpc/64e/interrupt: Fix nvgprs being clobbered	Nicholas Piggin
	Some interrupt handlers have an "extra" that saves 1 or 2 registers (r14, r15) in the paca save area and makes them available to use by the handler. The change to always save nvgprs in exception handlers lead to some interrupt handlers saving those scratch r14 / r15 registers into the interrupt frame's GPR saves, which get restored on interrupt exit. Fix this by always reloading those scratch registers from paca before the EXCEPTION_COMMON that saves nvgprs. Fixes: 4228b2c3d20e ("powerpc/64e/interrupt: always save nvgprs on interrupt") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
2021-05-14	powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled	Nicholas Piggin
	scv support introduced the notion of code that implicitly soft-masks irqs due to the instruction addresses. This is required because scv enters the kernel with MSR[EE]=1. If a NMI (including soft-NMI) interrupt hits when we are implicitly soft-masked then its regs->softe does not reflect this because it is derived from the explicit soft mask state (paca->irq_soft_mask). This makes arch_irq_disabled_regs(regs) return false. This can trigger a warning in the soft-NMI watchdog code (shown below). Fix it by having NMI interrupts set regs->softe to disabled in case of interrupting an implicit soft-masked region. ------------[ cut here ]------------ WARNING: CPU: 41 PID: 1103 at arch/powerpc/kernel/watchdog.c:259 soft_nmi_interrupt+0x3e4/0x5f0 CPU: 41 PID: 1103 Comm: (spawn) Not tainted NIP: c000000000039534 LR: c000000000039234 CTR: c000000000009a00 REGS: c000007fffbcf940 TRAP: 0700 Not tainted MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 22042482 XER: 200400ad CFAR: c000000000039260 IRQMASK: 3 GPR00: c000000000039204 c000007fffbcfbe0 c000000001d6c300 0000000000000003 GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020 GPR08: 0000007ffd4e0000 0000000000000000 c000007ffffceb00 7265677368657265 GPR12: 9000000000009033 c000007ffffceb00 00000f7075bf4480 000000000000002a GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000 GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040 GPR24: 0000000000000000 0000000000000029 c000000001dae058 0000000000000029 GPR28: 0000000000000000 0000000000000800 0000000000000009 c000007fffbcfd60 NIP [c000000000039534] soft_nmi_interrupt+0x3e4/0x5f0 LR [c000000000039234] soft_nmi_interrupt+0xe4/0x5f0 Call Trace: [c000007fffbcfbe0] [c000000000039204] soft_nmi_interrupt+0xb4/0x5f0 (unreliable) [c000007fffbcfcf0] [c00000000000c0e8] soft_nmi_common+0x138/0x1c4 --- interrupt: 900 at end_real_trampolines+0x0/0x1000 NIP: c000000000003000 LR: 00007ca426adb03c CTR: 900000000280f033 REGS: c000007fffbcfd60 TRAP: 0900 MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44042482 XER: 200400ad CFAR: 00007ca426946020 IRQMASK: 0 GPR00: 00000000000000ad 00007ffffa45d050 00007ca426b07f00 0000000000000035 GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020 GPR08: 0000000000000000 0000000000100000 0000000010000000 00007ffffa45d110 GPR12: 0000000000000001 00007ca426d4e680 00000f7075bf4480 000000000000002a GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000 GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040 GPR24: 0000000000000000 00000f7057473f68 0000000000000003 000000000000041b GPR28: 00007ffffa45d4c4 0000000000000035 0000000000000000 00000f7057473f68 NIP [c000000000003000] end_real_trampolines+0x0/0x1000 LR [00007ca426adb03c] 0x7ca426adb03c --- interrupt: 900 Instruction dump: 60000000 60000000 60420000 38600001 482b3ae5 60000000 e93f0138 a36d0008 7daa6b78 71290001 7f7907b4 4082fd34 <0fe00000> 4bfffd2c 60420000 ea6100a8 ---[ end trace dc75f67d819779da ]--- Fixes: 118178e62e2e ("powerpc: move NMI entry/exit code into wrapper") Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503111708.758261-1-npiggin@gmail.com
2021-05-14	powerpc/64s: Fix stf mitigation patching w/strict RWX & hash	Michael Ellerman
	The stf entry barrier fallback is unsafe to execute in a semi-patched state, which can happen when enabling/disabling the mitigation with strict kernel RWX enabled and using the hash MMU. See the previous commit for more details. Fix it by changing the order in which we patch the instructions. Note the stf barrier fallback is only used on Power6 or earlier. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210513140800.1391706-2-mpe@ellerman.id.au
2021-05-14	powerpc/64s: Fix entry flush patching w/strict RWX & hash	Michael Ellerman
	The entry flush mitigation can be enabled/disabled at runtime. When this happens it results in the kernel patching its own instructions to enable/disable the mitigation sequence. With strict kernel RWX enabled instruction patching happens via a secondary mapping of the kernel text, so that we don't have to make the primary mapping writable. With the hash MMU this leads to a hash fault, which causes us to execute the exception entry which contains the entry flush mitigation. This means we end up executing the entry flush in a semi-patched state, ie. after we have patched the first instruction but before we patch the second or third instruction of the sequence. On machines with updated firmware the entry flush is a series of special nops, and it's safe to to execute in a semi-patched state. However when using the fallback flush the sequence is mflr/branch/mtlr, and so it's not safe to execute if we have patched out the mflr but not the other two instructions. Doing so leads to us corrputing LR, leading to an oops, for example: # echo 0 > /sys/kernel/debug/powerpc/entry_flush kernel tried to execute exec-protected page (c000000002971000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xc000000002971000 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 0 PID: 2215 Comm: bash Not tainted 5.13.0-rc1-00010-gda3bb206c9ce #1 NIP: c000000002971000 LR: c000000002971000 CTR: c000000000120c40 REGS: c000000013243840 TRAP: 0400 Not tainted (5.13.0-rc1-00010-gda3bb206c9ce) MSR: 8000000010009033 <SF,EE,ME,IR,DR,RI,LE> CR: 48428482 XER: 00000000 ... NIP 0xc000000002971000 LR 0xc000000002971000 Call Trace: do_patch_instruction+0xc4/0x340 (unreliable) do_entry_flush_fixups+0x100/0x3b0 entry_flush_set+0x50/0xe0 simple_attr_write+0x160/0x1a0 full_proxy_write+0x8c/0x110 vfs_write+0xf0/0x340 ksys_write+0x84/0x140 system_call_exception+0x164/0x2d0 system_call_common+0xec/0x278 The simplest fix is to change the order in which we patch the instructions, so that the sequence is always safe to execute. For the non-fallback flushes it doesn't matter what order we patch in. Fixes: bd573a81312f ("powerpc/mm/64s: Allow STRICT_KERNEL_RWX again") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210513140800.1391706-1-mpe@ellerman.id.au
2021-05-14	powerpc/64s: Fix crashes when toggling entry flush barrier	Michael Ellerman
	The entry flush mitigation can be enabled/disabled at runtime via a debugfs file (entry_flush), which causes the kernel to patch itself to enable/disable the relevant mitigations. However depending on which mitigation we're using, it may not be safe to do that patching while other CPUs are active. For example the following crash: sleeper[15639]: segfault (11) at c000000000004c20 nip c000000000004c20 lr c000000000004c20 Shows that we returned to userspace with a corrupted LR that points into the kernel, due to executing the partially patched call to the fallback entry flush (ie. we missed the LR restore). Fix it by doing the patching under stop machine. The CPUs that aren't doing the patching will be spinning in the core of the stop machine logic. That is currently sufficient for our purposes, because none of the patching we do is to that code or anywhere in the vicinity. Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210506044959.1298123-2-mpe@ellerman.id.au
2021-05-14	powerpc/64s: Fix crashes when toggling stf barrier	Michael Ellerman
	The STF (store-to-load forwarding) barrier mitigation can be enabled/disabled at runtime via a debugfs file (stf_barrier), which causes the kernel to patch itself to enable/disable the relevant mitigations. However depending on which mitigation we're using, it may not be safe to do that patching while other CPUs are active. For example the following crash: User access of kernel address (c00000003fff5af0) - exploit attempt? (uid: 0) segfault (11) at c00000003fff5af0 nip 7fff8ad12198 lr 7fff8ad121f8 code 1 code: 40820128 e93c00d0 e9290058 7c292840 40810058 38600000 4bfd9a81 e8410018 code: 2c030006 41810154 3860ffb6 e9210098 <e94d8ff0> 7d295279 39400000 40820a3c Shows that we returned to userspace without restoring the user r13 value, due to executing the partially patched STF exit code. Fix it by doing the patching under stop machine. The CPUs that aren't doing the patching will be spinning in the core of the stop machine logic. That is currently sufficient for our purposes, because none of the patching we do is to that code or anywhere in the vicinity. Fixes: a048a07d7f45 ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210506044959.1298123-1-mpe@ellerman.id.au