linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-08-21	crypto: qat - move IO virtualization functions	Giovanni Cabiddu
	Move IOV functions at the end of hw_data so that PFVF functions related functions are group together. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - fix naming for init/shutdown VF to PF notifications	Marco Chiappero
	At start and shutdown, VFs notify the PF about their state. These notifications are carried out through a message exchange using the PFVF protocol. Function names lead to believe they do perform init or shutdown logic. This is to fix the naming to better reflect their purpose. Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - protect interrupt mask CSRs with a spinlock	Kanchana Velusamy
	In the PF interrupt handler, the interrupt is disabled for a set of VFs by writing to the interrupt source mask register, ERRMSK. The interrupt is re-enabled in the bottom half handler by writing to the same CSR. This is done through the functions enable_vf2pf_interrupts() and disable_vf2pf_interrupts() which perform a read-modify-write operation on the ERRMSK registers to mask and unmask the source of interrupt. There can be a race condition where the top half handler for one VF interrupt runs just as the bottom half for another VF is about to re-enable the interrupt. Depending on whether the top or bottom half updates the CSR first, this would result either in a spurious interrupt or in the interrupt not being re-enabled. This patch protects the access of ERRMSK with a spinlock. The functions adf_enable_vf2pf_interrupts() and adf_disable_vf2pf_interrupts() have been changed to acquire a spin lock before accessing and modifying the ERRMSK registers. These functions use spin_lock_irqsave() to disable IRQs and avoid potential deadlocks. In addition, the function adf_disable_vf2pf_interrupts_irq() has been added. This uses spin_lock() and it is meant to be used in the top half only. Signed-off-by: Kanchana Velusamy <kanchanax.velusamy@intel.com> Co-developed-by: Marco Chiappero <marco.chiappero@intel.com> Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - move pf2vf interrupt [en\|dis]able to adf_vf_isr.c	Marco Chiappero
	Interrupt code to enable interrupts from PF does not belong to the protocol code, so move it to the interrupt handling specific file for better code organization. Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - fix reuse of completion variable	Marco Chiappero
	Use reinit_completion() to set to a clean state a completion variable, used to coordinate the VF to PF request-response flow, before every new VF request. Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - remove intermediate tasklet for vf2pf	Svyatoslav Pankratov
	The PF driver uses the tasklet vf2pf_bh_tasklet to schedule a workqueue to handle the vf2vf protocol (pf2vf_resp_wq). Since the tasklet is only used to schedule the workqueue, this patch removes it and schedules the pf2vf_resp_wq workqueue directly for the top half. Signed-off-by: Svyatoslav Pankratov <svyatoslav.pankratov@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - rename compatibility version definition	Marco Chiappero
	Rename ADF_PFVF_COMPATIBILITY_VERSION in ADF_PFVF_COMPAT_THIS_VERSION since it is used to indicate the current version of the PFVF protocol. Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - prevent spurious MSI interrupt in PF	Marco Chiappero
	There is a chance that the PFVF handler, adf_vf2pf_req_hndl(), runs twice for the same request when multiple interrupts come simultaneously from different VFs. Since the source VF is identified by a positional bit set in the ERRSOU registers and that is not cleared until the bottom half completes, new top halves from other VFs may reschedule a second bottom half for previous interrupts. This patch solves the problem in the ISR handler by not considering sources with already disabled interrupts (and processing pending), as set in the ERRMSK registers. Also, move some definitions where actually needed. Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - prevent spurious MSI interrupt in VF	Giovanni Cabiddu
	QAT GEN2 devices suffer from a defect where the MSI interrupt can be sent multiple times. If the second (spurious) interrupt is handled before the bottom half handler runs, then the extra interrupt is effectively ignored because the bottom half is only scheduled once. However, if the top half runs again after the bottom half runs, this will appear as a spurious PF to VF interrupt. This can be avoided by checking the interrupt mask register in addition to the interrupt source register in the interrupt handler. This patch is based on earlier work done by Conor McLoughlin. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Co-developed-by: Marco Chiappero <marco.chiappero@intel.com> Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - handle both source of interrupt in VF ISR	Giovanni Cabiddu
	The top half of the VF drivers handled only a source at the time. If an interrupt for PF2VF and bundle occurred at the same time, the ISR scheduled only the bottom half for PF2VF. This patch fixes the VF top half so that if both sources of interrupt trigger at the same time, both bottom halves are scheduled. This patch is based on earlier work done by Conor McLoughlin. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - do not ignore errors from enable_vf2pf_comms()	Giovanni Cabiddu
	The function adf_dev_init() ignores the error code reported by enable_vf2pf_comms(). If the latter fails, e.g. the VF is not compatible with the pf, then the load of the VF driver progresses. This patch changes adf_dev_init() so that the error code from enable_vf2pf_comms() is returned to the caller. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - enable interrupts only after ISR allocation	Marco Chiappero
	Enable device interrupts after the setup of the interrupt handlers. Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - remove empty sriov_configure()	Marco Chiappero
	Remove the empty implementation of sriov_configure() and set the sriov_configure member of the pci_driver structure to NULL. This way, if a user tries to enable VFs on a device, when kernel and driver are built with CONFIG_PCI_IOV=n, the kernel reports an error message saying that the driver does not support SRIOV configuration via sysfs. Signed-off-by: Marco Chiappero <marco.chiappero@intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - use proper type for vf_mask	Giovanni Cabiddu
	Replace vf_mask type with unsigned long to avoid a stack-out-of-bound. This is to fix the following warning reported by KASAN the first time adf_msix_isr_ae() gets called. [ 692.091987] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x28/0x50 [ 692.092017] Read of size 8 at addr ffff88afdf789e60 by task swapper/32/0 [ 692.092076] Call Trace: [ 692.092089] <IRQ> [ 692.092101] dump_stack+0x9c/0xcf [ 692.092132] print_address_description.constprop.0+0x18/0x130 [ 692.092164] ? find_first_bit+0x28/0x50 [ 692.092185] kasan_report.cold+0x7f/0x111 [ 692.092213] ? static_obj+0x10/0x80 [ 692.092234] ? find_first_bit+0x28/0x50 [ 692.092262] find_first_bit+0x28/0x50 [ 692.092288] adf_msix_isr_ae+0x16e/0x230 [intel_qat] Fixes: ed8ccaef52fa ("crypto: qat - Add support for SRIOV") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Marco Chiappero <marco.chiappero@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - fix a typo in a comment	Christophe JAILLET
	s/Enable/Disable/ when describing 'adf_disable_aer()' Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - disable AER if an error occurs in probe functions	Christophe JAILLET
	If an error occurs after a 'adf_enable_aer()' call, it must be undone by a corresponding 'adf_disable_aer()' call, as already done in the remove function. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - set DMA mask to 48 bits for Gen2	Giovanni Cabiddu
	Change the DMA mask from 64 to 48 for Gen2 devices as they cannot handle addresses greater than 48 bits. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: qat - simplify code and axe the use of a deprecated API	Christophe JAILLET
	The wrappers in include/linux/pci-dma-compat.h should go away. Replace 'pci_set_dma_mask/pci_set_consistent_dma_mask' by an equivalent and less verbose 'dma_set_mask_and_coherent()' call. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Co-developed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: omap - Fix inconsistent locking of device lists	Ben Hutchings
	lockdep complains that in omap-aes, the list_lock is taken both with softirqs enabled at probe time, and also in softirq context, which could lead to a deadlock: ================================ WARNING: inconsistent lock state 5.14.0-rc1-00035-gc836005b01c5-dirty #69 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/0/7 [HC0[0]:SC1[3]:HE1:SE0] takes: bf00e014 (list_lock){+.?.}-{2:2}, at: omap_aes_find_dev+0x18/0x54 [omap_aes_driver] {SOFTIRQ-ON-W} state was registered at: _raw_spin_lock+0x40/0x50 omap_aes_probe+0x1d4/0x664 [omap_aes_driver] platform_probe+0x58/0xb8 really_probe+0xbc/0x314 __driver_probe_device+0x80/0xe4 driver_probe_device+0x30/0xc8 __driver_attach+0x70/0xf4 bus_for_each_dev+0x70/0xb4 bus_add_driver+0xf0/0x1d4 driver_register+0x74/0x108 do_one_initcall+0x84/0x2e4 do_init_module+0x5c/0x240 load_module+0x221c/0x2584 sys_finit_module+0xb0/0xec ret_fast_syscall+0x0/0x2c 0xbed90b30 irq event stamp: 111800 hardirqs last enabled at (111800): [<c02a21e4>] __kmalloc+0x484/0x5ec hardirqs last disabled at (111799): [<c02a21f0>] __kmalloc+0x490/0x5ec softirqs last enabled at (111776): [<c01015f0>] __do_softirq+0x2b8/0x4d0 softirqs last disabled at (111781): [<c0135948>] run_ksoftirqd+0x34/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(list_lock); <Interrupt> lock(list_lock); * DEADLOCK * 2 locks held by ksoftirqd/0/7: #0: c0f5e8c8 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x6c/0x260 #1: c0f5e8c8 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x2c/0xdc stack backtrace: CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.14.0-rc1-00035-gc836005b01c5-dirty #69 Hardware name: Generic AM43 (Flattened Device Tree) [<c010e6e0>] (unwind_backtrace) from [<c010b9d0>] (show_stack+0x10/0x14) [<c010b9d0>] (show_stack) from [<c017c640>] (mark_lock.part.17+0x5bc/0xd04) [<c017c640>] (mark_lock.part.17) from [<c017d9e4>] (__lock_acquire+0x960/0x2fa4) [<c017d9e4>] (__lock_acquire) from [<c0180980>] (lock_acquire+0x10c/0x358) [<c0180980>] (lock_acquire) from [<c093d324>] (_raw_spin_lock_bh+0x44/0x58) [<c093d324>] (_raw_spin_lock_bh) from [<bf00b258>] (omap_aes_find_dev+0x18/0x54 [omap_aes_driver]) [<bf00b258>] (omap_aes_find_dev [omap_aes_driver]) from [<bf00b328>] (omap_aes_crypt+0x94/0xd4 [omap_aes_driver]) [<bf00b328>] (omap_aes_crypt [omap_aes_driver]) from [<c08ac6d0>] (esp_input+0x1b0/0x2c8) [<c08ac6d0>] (esp_input) from [<c08c9e90>] (xfrm_input+0x410/0x1290) [<c08c9e90>] (xfrm_input) from [<c08b6374>] (xfrm4_esp_rcv+0x54/0x11c) [<c08b6374>] (xfrm4_esp_rcv) from [<c0838840>] (ip_protocol_deliver_rcu+0x48/0x3bc) [<c0838840>] (ip_protocol_deliver_rcu) from [<c0838c50>] (ip_local_deliver_finish+0x9c/0xdc) [<c0838c50>] (ip_local_deliver_finish) from [<c0838dd8>] (ip_local_deliver+0x148/0x1b0) [<c0838dd8>] (ip_local_deliver) from [<c0838f5c>] (ip_rcv+0x11c/0x180) [<c0838f5c>] (ip_rcv) from [<c077e3a4>] (__netif_receive_skb_one_core+0x54/0x74) [<c077e3a4>] (__netif_receive_skb_one_core) from [<c077e588>] (netif_receive_skb+0xa8/0x260) [<c077e588>] (netif_receive_skb) from [<c068d6d4>] (cpsw_rx_handler+0x224/0x2fc) [<c068d6d4>] (cpsw_rx_handler) from [<c0688ccc>] (__cpdma_chan_process+0xf4/0x188) [<c0688ccc>] (__cpdma_chan_process) from [<c068a0c0>] (cpdma_chan_process+0x3c/0x5c) [<c068a0c0>] (cpdma_chan_process) from [<c0690e14>] (cpsw_rx_mq_poll+0x44/0x98) [<c0690e14>] (cpsw_rx_mq_poll) from [<c0780810>] (__napi_poll+0x28/0x268) [<c0780810>] (__napi_poll) from [<c0780c64>] (net_rx_action+0xcc/0x204) [<c0780c64>] (net_rx_action) from [<c0101478>] (__do_softirq+0x140/0x4d0) [<c0101478>] (__do_softirq) from [<c0135948>] (run_ksoftirqd+0x34/0x50) [<c0135948>] (run_ksoftirqd) from [<c01583b8>] (smpboot_thread_fn+0xf4/0x1d8) [<c01583b8>] (smpboot_thread_fn) from [<c01546dc>] (kthread+0x14c/0x174) [<c01546dc>] (kthread) from [<c010013c>] (ret_from_fork+0x14/0x38) ... The omap-des and omap-sham drivers appear to have a similar issue. Fix this by using spin_{,un}lock_bh() around device list access in all the probe and remove functions. Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	crypto: omap - Avoid redundant copy when using truncated sg list	Ben Hutchings
	omap_crypto_cleanup() currently copies data from sg to orig if either copy flag is set. However OMAP_CRYPTO_SG_COPIED means that sg refers to the same pages as orig, truncated to len bytes. There is no need to copy in this case. Only copy data if the OMAP_CRYPTO_DATA_COPIED flag is set. Fixes: 74ed87e7e7f7 ("crypto: omap - add base support library for common ...") Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-08-21	usb: chipidea: host: fix port index underflow and UBSAN complains	Li Jun
	If wIndex is 0 (and it often is), these calculations underflow and UBSAN complains, here resolve this by not decrementing the index when it is equal to 0, this copies the solution from commit 85e3990bea49 ("USB: EHCI: avoid undefined pointer arithmetic and placate UBSAN") Reported-by: Zhipeng Wang <zhipeng.wang_1@nxp.com> Signed-off-by: Li Jun <jun.li@nxp.com> Link: https://lore.kernel.org/r/1624004938-2399-1-git-send-email-jun.li@nxp.com Signed-off-by: Peter Chen <peter.chen@kernel.org>
2021-08-20	block: add back the bd_holder_dir reference in bd_link_disk_holder	Christoph Hellwig
	This essentially reverts "block: remove the extra kobject reference in bd_link_disk_holder". That commit dropped the extra reference because the condition in the comment can't be true. But it turns out that comment did not actually describe the problematic situation, so add back the extra reference and document it properly. Fixes: fbd9a39542ec ("block: remove the extra kobject reference in bd_link_disk_holder") Reported-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-20	stmmac: Revert "stmmac: align RX buffers"	Marc Zyngier
	This reverts commit a955318fe67e ("stmmac: align RX buffers"), which breaks at least one platform (Nvidia Jetson-X1), causing packet corruption. This is 100% reproducible, and reverting the patch results in a working system again. Given that it is "only" a performance optimisation, let's return to a known working configuration until we can have a good understanding of what is happening here. Fixes: a955318fe67e ("stmmac: align RX buffers") Cc: Matteo Croce <mcroce@linux.microsoft.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Link: https://lore.kernel.org/netdev/871r71azjw.wl-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210820183002.457226-1-maz@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-20	io_uring: fix xa_alloc_cycle() error return value check	Jens Axboe
	We currently check for ret != 0 to indicate error, but '1' is a valid return and just indicates that the allocation succeeded with a wrap. Correct the check to be for < 0, like it was before the xarray conversion. Cc: stable@vger.kernel.org Fixes: 61cf93700fe6 ("io_uring: Convert personality_idr to XArray") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-20	Merge tag 'acpi-5.14-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix two mistakes in new code. Specifics: - Prevent confusing messages from being printed if the PRMT table is not present or there are no PRM modules (Aubrey Li). - Fix the handling of suspend-to-idle entry and exit in the case when the Microsoft UUID is used with the Low-Power S0 Idle _DSM interface (Mario Limonciello)" * tag 'acpi-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PM: s2idle: Invert Microsoft UUID entry and exit ACPI: PRM: Deal with table not present or no module found
2021-08-20	Merge tag 'pm-5.14-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix some issues in the ARM cpufreq drivers and in the operating performance points (OPP) framework. Specifics: - Fix useless WARN() in the OPP core and prevent a noisy warning from being printed by OPP _put functions (Dmitry Osipenko). - Fix error path when allocation failed in the arm_scmi cpufreq driver (Lukasz Luba). - Blacklist Qualcomm sc8180x and Qualcomm sm8150 in cpufreq-dt-platdev (Bjorn Andersson, Thara Gopinath). - Forbid cpufreq for 1.2 GHz variant in the armada-37xx cpufreq driver (Marek Behún)" * tag 'pm-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: opp: Drop empty-table checks from _put functions cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev cpufreq: arm_scmi: Fix error path when allocation failed opp: remove WARN when no valid OPPs remain cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
2021-08-20	dm crypt: use in_hardirq() instead of deprecated in_irq()	Changbin Du
	Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20	xfs: fix perag structure refcounting error when scrub fails	Darrick J. Wong
	The kernel test robot found the following bug when running xfs/355 to scrub a bmap btree: XFS: Assertion failed: !sa->pag, file: fs/xfs/scrub/common.c, line: 412 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:110! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 1415 Comm: xfs_scrub Not tainted 5.14.0-rc4-00021-g48c6615cc557 #1 Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 RIP: 0010:assfail+0x23/0x28 [xfs] RSP: 0018:ffffc9000aacb890 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffffc9000aacbcc8 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffc09e7dcd RBP: ffffc9000aacbc80 R08: ffff8881fdf17d50 R09: 0000000000000000 R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000 R13: ffff88820c7ed000 R14: 0000000000000001 R15: ffffc9000aacb980 FS: 00007f185b955700(0000) GS:ffff8881fdf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7f6ef43000 CR3: 000000020de38002 CR4: 00000000001706e0 Call Trace: xchk_ag_read_headers+0xda/0x100 [xfs] xchk_ag_init+0x15/0x40 [xfs] xchk_btree_check_block_owner+0x76/0x180 [xfs] xchk_btree_get_block+0xd0/0x140 [xfs] xchk_btree+0x32e/0x440 [xfs] xchk_bmap_btree+0xd4/0x140 [xfs] xchk_bmap+0x1eb/0x3c0 [xfs] xfs_scrub_metadata+0x227/0x4c0 [xfs] xfs_ioc_scrub_metadata+0x50/0xc0 [xfs] xfs_file_ioctl+0x90c/0xc40 [xfs] __x64_sys_ioctl+0x83/0xc0 do_syscall_64+0x3b/0xc0 The unusual handling of errors while initializing struct xchk_ag is the root cause here. Since the beginning of xfs_scrub, the goal of xchk_ag_read_headers has been to read all three AG header buffers and attach them both to the xchk_ag structure and the scrub transaction. Corruption errors on any of the three headers doesn't necessarily trigger an immediate return to userspace, because xfs_scrub can also tell us to /fix/ the problem. In other words, it's possible for the xchk_ag init functions to return an error code and a partially filled out structure so that scrub can use however much information it managed to pull. Before 5.15, it was sufficient to cancel (or commit) the scrub transaction on the way out of the scrub code to release the buffers. Ccommit 48c6615cc557 added a reference to the perag structure to struct xchk_ag. Since perag structures are not attached to transactions like buffers are, this adds the requirement that the perag ref be released explicitly. The scrub teardown function xchk_teardown was amended to do this for the xchk_ag embedded in struct xfs_scrub. Unfortunately, I forgot that certain parts of the scrub code probe multiple AGs and therefore handle the initialization and cleanup on their own. Specifically, the bmbt scrubber will initialize it long enough to cross-reference AG metadata for btree blocks and for the extent mappings in the bmbt. If one of the AG headers is corrupt, the init function returns with a live perag structure reference and some of the AG header buffers. If an error occurs, the cross referencing will be noted as XCORRUPTion and skipped, but the main scrub process will move on to the next record. It is now necessary to release the perag reference before we try to analyze something from a different AG, or else we'll trip over the assertion noted above. Fixes: 48c6615cc557 ("xfs: grab active perag ref when reading AG headers") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-20	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge misc fixes from Andrew Morton: "10 patches. Subsystems affected by this patch series: MAINTAINERS and mm (shmem, pagealloc, tracing, memcg, memory-failure, vmscan, kfence, and hugetlb)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: hugetlb: don't pass page cache pages to restore_reserve_on_error kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE mm: vmscan: fix missing psi annotation for node_reclaim() mm/hwpoison: retry with shake_page() for unhandlable pages mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim MAINTAINERS: update ClangBuiltLinux IRC chat mmflags.h: add missing __GFP_ZEROTAGS and __GFP_SKIP_KASAN_POISON names mm/page_alloc: don't corrupt pcppage_migratetype Revert "mm: swap: check if swap backing device is congested or not" Revert "mm/shmem: fix shmem_swapin() race with swapoff"
2021-08-20	dm ima: update dm documentation for ima measurement support	Tushar Sugandhi
	The ima documentation for measuring DM targets (dm-ima.rst) is missing the attribute information for the targets - 'cache', 'integrity', 'multipath', and 'snapshot'. It is also missing the grammar for various DM events and targets, which can help the attestation servers to determine what data to expect for a given DM device. Further, the documentation needs to be updated to incorporate code changes made to DM ima events and targets as part of this patch series. For instance, prefixing the event names with "dm_", adding the DM version to events, prefixing the table hashes in the ima log with the hash algorithm etc. There are warnings reported by 'make htmldocs' on dm-ima.rst, which need to be fixed. And lastly, the expected behavior needs to be documented when the configuration CONFIG_IMA_DISABLE_HTABLE is disabled. Update the documentation to add examples for 'cache', 'integrity', 'multipath', and 'snapshot' targets. Add the grammar for various DM events and targets in Backus Naur form, so that the attestation servers can interpret and act on the ima measurements for DM target. Fix htmldocs warnings in dm-ima.rst. Update the documentation to be consistent with the code changes that are part of this patch series. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20	dm ima: update dm target attributes for ima measurements	Tushar Sugandhi
	Certain DM targets ('integrity', 'multipath', 'verity') need to update the way their attributes are recorded in the ima log, so that the attestation servers can interpret the data correctly and decide if the devices meet the attestation requirements. For instance, the "mode=%c" attribute in the 'integrity' target is measured twice, the 'verity' target is missing the attribute "root_hash_sig_key_desc=%s", and the 'multipath' target needs to index the attributes properly. Update 'integrity' target to remove the duplicate measurement of the attribute "mode=%c". Add "root_hash_sig_key_desc=%s" attribute for the 'verity' target. Index various attributes in 'multipath' target. Also, add "nr_priority_groups=%u" attribute to 'multipath' target to record the number of priority groups. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Suggested-by: Thore Sommer <public@thson.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20	dm ima: add a warning in dm_init if duplicate ima events are not measured	Tushar Sugandhi
	The end-users of DM devices/targets may remove and re-create the same device multiple times. IMA does not measure such duplicate events if the configuration CONFIG_IMA_DISABLE_HTABLE is set to 'n'. To avoid confusion, the end-users need some indication on the client if that configuration option is disabled. Add a one-time warning during dm_init() if CONFIG_IMA_DISABLE_HTABLE is set to 'n', to notify the end-users that duplicate events will not be measured in the ima log. Also cleanup some whitespace in dm_init(). Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20	dm ima: prefix ima event name related to device mapper with dm_	Tushar Sugandhi
	The event names for the DM events recorded in the ima log do not contain any information to indicate the events are part of the DM devices/targets. Prefix the event names for DM events with "dm_" to indicate that they are part of device-mapper. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Suggested-by: Thore Sommer <public@thson.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20	Merge tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Regularly scheduled fixes. The ttm one solves a problem of GPU drivers failing to load if debugfs is off in Kconfig, otherwise the i915 and mediatek, and amdgpu fixes all fairly normal. Nouveau has a couple of display fixes, but it has a fix for a longstanding race condition in it's memory manager code, and the fix mostly removes some code that wasn't working properly and has no userspace users. This fix makes the diffstat kinda larger but in a good (negative line-count) way. core: - fix drm_wait_vblank uapi copying bug ttm: - fix debugfs init when debugfs is off amdgpu: - vega10 SMU workload fix - DCN VM fix - DCN 3.01 watermark fix amdkfd: - SVM fix nouveau: - ampere display fixes - remove MM misfeature to fix a longstanding race condition i915: - tweaked display workaround for all PCHs - eDP MSO pipe sanity for ADL-P fix - remove unused symbol export mediatek: - AAL output size setting - Delete component in remove function" * tag 'drm-fixes-2021-08-20-3' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Use DCN30 watermark calc for DCN301 drm/i915/dp: remove superfluous EXPORT_SYMBOL() drm/i915/edp: fix eDP MSO pipe sanity checks for ADL-P drm/i915: Tweaked Wa_14010685332 for all PCHs drm/nouveau: rip out nvkm_client.super drm/nouveau: block a bunch of classes from userspace drm/nouveau/fifo/nv50-: rip out dma channels drm/nouveau/kms/nv50: workaround EFI GOP window channel format differences drm/nouveau/disp: power down unused DP links during init drm/nouveau: recognise GA107 drm: Copy drm_wait_vblank to user before returning drm/amd/display: Ensure DCN save after VM setup drm/amdkfd: fix random KFDSVMRangeTest.SetGetAttributesTest test failure drm/amd/pm: change the workload type for some cards Revert "drm/amd/pm: fix workload mismatch on vega10" drm: ttm: Don't bail from ttm_global_init if debugfs_create_dir fails drm/mediatek: Add component_del in OVL and COLOR remove function drm/mediatek: Add AAL output size configuration
2021-08-20	dm ima: add version info to dm related events in ima log	Tushar Sugandhi
	The DM events present in the ima log contain various attributes in the key=value format. The attributes' names/values may change in future, and new attributes may also get added. The attestation server needs some versioning to determine which attributes are supported and are expected in the ima log. Add version information to the DM events present in the ima log to help attestation servers to correctly process the attributes across different versions. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20	dm ima: prefix dm table hashes in ima log with hash algorithm	Tushar Sugandhi
	The active/inactive table hashes measured in the ima log do not contain the information about hash algorithm. This information is useful for the attestation servers to recreate the hashes and compare them with the ones present in the ima log to verify the table contents. Prefix the table hashes in various DM events in ima log with the hash algorithm used to compute those hashes. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-08-20	Merge tag 'pci-v5.14-fixes-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer (Rahul Tanwar) - Add Jim Quinlan et al as Broadcom STB PCIe maintainers (Jim Quinlan) - Increase D3hot-to-D0 delay for AMD Renoir/Cezanne XHCI (Marcin Bachry) - Correct iomem_get_mapping() usage for legacy_mem sysfs (Krzysztof Wilczyński) * tag 'pci-v5.14-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/sysfs: Use correct variable for the legacy_mem sysfs object PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI MAINTAINERS: Add Jim Quinlan et al as Broadcom STB PCIe maintainers MAINTAINERS: Add Rahul Tanwar as Intel LGM Gateway PCIe maintainer
2021-08-20	Merge tag 'mmc-v5.14-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC host fixes from Ulf Hansson: - dw_mmc: Fix hang on data CRC error - mmci: Fix voltage switch procedure for the stm32 variant - sdhci-iproc: Fix some clock issues for BCM2711 - sdhci-msm: Fixup software timeout value * tag 'mmc-v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711 mmc: sdhci-iproc: Cap min clock frequency on BCM2711 mmc: sdhci-msm: Update the software timeout value for sdhc mmc: mmci: stm32: Check when the voltage switch procedure should be done mmc: dw_mmc: Fix hang on data CRC error
2021-08-20	Merge tag 'sound-5.14-rc7-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull more sound fixes from Takashi Iwai: "This is a quick follow up for 5.14: a fix for a very recently introduced regression on ASoC Intel Atom driver, and another trivial HD-audio quirk for HP laptops" * tag 'sound-5.14-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: intel: atom: Fix breakage for PCM buffer address setup ALSA: hda/realtek: Limit mic boost on HP ProBook 445 G8
2021-08-20	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix cleaning of vDSO directories - Ensure CNTHCTL_EL2 is fully initialised when booting at EL2 * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: initialize all of CNTHCTL_EL2 arm64: clean vdso & vdso32 files
2021-08-20	Merge branch 'acpi-pm'	Rafael J. Wysocki
	* acpi-pm: ACPI: PM: s2idle: Invert Microsoft UUID entry and exit
2021-08-20	Merge tag 'iommu-fixes-v5.14-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Fix for a potential NULL-ptr dereference in IOMMU core code - Two resource leak fixes - Cache flush fix in the Intel VT-d driver * tag 'iommu-fixes-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix incomplete cache flush in intel_pasid_tear_down_entry() iommu/vt-d: Fix PASID reference leak iommu: Check if group is NULL before remove device iommu/dma: Fix leak in non-contiguous API
2021-08-20	Merge branch 'pm-opp'	Rafael J. Wysocki
	* pm-opp: opp: Drop empty-table checks from _put functions opp: remove WARN when no valid OPPs remain
2021-08-20	arm64: replace in_irq() with in_hardirq()	Changbin Du
	Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Signed-off-by: Changbin Du <changbin.du@gmail.com> Link: https://lore.kernel.org/r/20210814005405.2658-1-changbin.du@gmail.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-20	RDMA/rxe: Zero out index member of struct rxe_queue	Xiao Yang
	1) New index member of struct rxe_queue was introduced but not zeroed so the initial value of index may be random. 2) The current index is not masked off to index_mask. In this case producer_addr() and consumer_addr() will get an invalid address by the random index and then accessing the invalid address triggers the following panic: "BUG: unable to handle page fault for address: ffff9ae2c07a1414" Fix the issue by using kzalloc() to zero out index member. Fixes: 5bcf5a59c41e ("RDMA/rxe: Protext kernel index from user space") Link: https://lore.kernel.org/r/20210820111509.172500-1-yangx.jy@fujitsu.com Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-20	hugetlb: don't pass page cache pages to restore_reserve_on_error	Mike Kravetz
	syzbot hit kernel BUG at fs/hugetlbfs/inode.c:532 as described in [1]. This BUG triggers if the HPageRestoreReserve flag is set on a page in the page cache. It should never be set, as the routine huge_add_to_page_cache explicitly clears the flag after adding a page to the cache. The only code other than huge page allocation which sets the flag is restore_reserve_on_error. It will potentially set the flag in rare out of memory conditions. syzbot was injecting errors to cause memory allocation errors which exercised this specific path. The code in restore_reserve_on_error is doing the right thing. However, there are instances where pages in the page cache were being passed to restore_reserve_on_error. This is incorrect, as once a page goes into the cache reservation information will not be modified for the page until it is removed from the cache. Error paths do not remove pages from the cache, so even in the case of error, the page will remain in the cache and no reservation adjustment is needed. Modify routines that potentially call restore_reserve_on_error with a page cache page to no longer do so. Note on fixes tag: Prior to commit 846be08578ed ("mm/hugetlb: expand restore_reserve_on_error functionality") the routine would not process page cache pages because the HPageRestoreReserve flag is not set on such pages. Therefore, this issue could not be trigggered. The code added by commit 846be08578ed ("mm/hugetlb: expand restore_reserve_on_error functionality") is needed and correct. It exposed incorrect calls to restore_reserve_on_error which is the root cause addressed by this commit. [1] https://lore.kernel.org/linux-mm/00000000000050776d05c9b7c7f0@google.com/ Link: https://lkml.kernel.org/r/20210818213304.37038-1-mike.kravetz@oracle.com Fixes: 846be08578ed ("mm/hugetlb: expand restore_reserve_on_error functionality") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reported-by: <syzbot+67654e51e54455f1c585@syzkaller.appspotmail.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-20	kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE	Marco Elver
	Originally the addr != NULL check was meant to take care of the case where __kfence_pool == NULL (KFENCE is disabled). However, this does not work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE. This can be the case on NULL-deref where addr > 0 && addr < PAGE_SIZE or any other faulting access with addr < KFENCE_POOL_SIZE. While the kernel would likely crash, the stack traces and report might be confusing due to double faults upon KFENCE's attempt to unprotect such an address. Fix it by just checking that __kfence_pool != NULL instead. Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> [5.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-20	mm: vmscan: fix missing psi annotation for node_reclaim()	Johannes Weiner
	In a debugging session the other day, Rik noticed that node_reclaim() was missing memstall annotations. This means we'll miss pressure and lost productivity resulting from reclaim on an overloaded local NUMA node when vm.zone_reclaim_mode is enabled. There haven't been any reports, but that's likely because vm.zone_reclaim_mode hasn't been a commonly used feature recently, and the intersection between such setups and psi users is probably nil. But secondary memory such as CXL-connected DIMMS, persistent memory etc, and the page demotion patches that handle them (https://lore.kernel.org/lkml/20210401183216.443C4443@viggo.jf.intel.com/) could soon make this a more common codepath again. Link: https://lkml.kernel.org/r/20210818152457.35846-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Rik van Riel <riel@surriel.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-20	mm/hwpoison: retry with shake_page() for unhandlable pages	Naoya Horiguchi
	HWPoisonHandlable() sometimes returns false for typical user pages due to races with average memory events like transfers over LRU lists. This causes failures in hwpoison handling. There's retry code for such a case but does not work because the retry loop reaches the retry limit too quickly before the page settles down to handlable state. Let get_any_page() call shake_page() to fix it. [naoya.horiguchi@nec.com: get_any_page(): return -EIO when retry limit reached] Link: https://lkml.kernel.org/r/20210819001958.2365157-1-naoya.horiguchi@linux.dev Link: https://lkml.kernel.org/r/20210817053703.2267588-1-naoya.horiguchi@linux.dev Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [5.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-20	mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim	Johannes Weiner
	We've noticed occasional OOM killing when memory.low settings are in effect for cgroups. This is unexpected and undesirable as memory.low is supposed to express non-OOMing memory priorities between cgroups. The reason for this is proportional memory.low reclaim. When cgroups are below their memory.low threshold, reclaim passes them over in the first round, and then retries if it couldn't find pages anywhere else. But when cgroups are slightly above their memory.low setting, page scan force is scaled down and diminished in proportion to the overage, to the point where it can cause reclaim to fail as well - only in that case we currently don't retry, and instead trigger OOM. To fix this, hook proportional reclaim into the same retry logic we have in place for when cgroups are skipped entirely. This way if reclaim fails and some cgroups were scanned with diminished pressure, we'll try another full-force cycle before giving up and OOMing. [akpm@linux-foundation.org: coding-style fixes] Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Leon Yang <lnyng@fb.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>