linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-10-26	spi: spl022: fix Microwire full duplex mode	Thomas Perrot
	There are missing braces in the function that verify controller parameters, then an error is always returned when the parameter to select Microwire frames operation is used on devices allowing it. Signed-off-by: Thomas Perrot <thomas.perrot@bootlin.com> Link: https://lore.kernel.org/r/20211022142104.1386379-1-thomas.perrot@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-26	irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()	Marc Zyngier
	The MIPS GIC driver uses irq_cpu_online() to go and program the per-CPU interrupts. However, this method iterates over all IRQs in the system, despite only 3 per-CPU interrupts being of interest. Let's be terribly bold and do the iteration ourselves. To ensure mutual exclusion, hold the gic_lock spinlock that is otherwise taken while dealing with these interrupts. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20211021170414.3341522-3-maz@kernel.org
2021-10-26	irq: remove handle_domain_{irq,nmi}()	Mark Rutland
	Now that entry code handles IRQ entry (including setting the IRQ regs) before calling irqchip code, irqchip code can safely call generic_handle_domain_irq(), and there's no functional reason for it to call handle_domain_irq(). Let's cement this split of responsibility and remove handle_domain_irq() entirely, updating irqchip drivers to call generic_handle_domain_irq(). For consistency, handle_domain_nmi() is similarly removed and replaced with a generic_handle_domain_nmi() function which also does not perform any entry logic. Previously handle_domain_{irq,nmi}() had a WARN_ON() which would fire when they were called in an inappropriate context. So that we can identify similar issues going forward, similar WARN_ON_ONCE() logic is added to the generic_handle_*() functions, and comments are updated for clarity and consistency. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-26	nvme-tcp: fix H2CData PDU send accounting (again)	Sagi Grimberg
	We should not access request members after the last send, even to determine if indeed it was the last data payload send. The reason is that a completion could have arrived and trigger a new execution of the request which overridden these members. This was fixed by commit 825619b09ad3 ("nvme-tcp: fix possible use-after-completion"). Commit e371af033c56 broke that assumption again to address cases where multiple r2t pdus are sent per request. To fix it, we need to record the request data_sent and data_len and after the payload network send we reference these counters to determine weather we should advance the request iterator. Fixes: e371af033c56 ("nvme-tcp: fix incorrect h2cdata pdu offset accounting") Reported-by: Keith Busch <kbusch@kernel.org> Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-26	nvmet-tcp: fix a memory leak when releasing a queue	Maurizio Lombardi
	page_frag_free() won't completely release the memory allocated for the commands, the cache page must be explicitly freed by calling __page_frag_cache_drain(). This bug can be easily reproduced by repeatedly executing the following command on the initiator: $echo 1 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-26	drm/i915/dp: Skip the HW readout of DPCD on disabled encoders	Imre Deak
	Reading out the DP encoders' DPCD during booting or resume is only required for enabled encoders: such encoders may be modesetted during the initial commit and the link training this involves depends on an initialized DPCD. For DDI encoders reading out the DPCD is skipped, do the same on pre-DDI platforms. Atm, the first DPCD readout without a sink connected - which is a likely scneario if the encoder is disabled - leaves intel_dp->num_common_rates at 0, which resulted in intel_dp_sync_state()->intel_dp_max_common_rate() in a intel_dp->common_rates[-1] access. This by definition results in an undefined behaviour, though to my best knowledge in all HW/compiler configurations it actually results in accessing the array item type value preceding the array. In this case the preceding value happens to be intel_dp->num_common_rates, which is 0, so this issue - by luck - didn't cause a user visible problem. Nevertheless it's still an undefined behaviour and in CONFIG_UBSAN builds leads to a kernel BUG() (which revealed this problem for us), hence CC:stable. A related problem in case the encoder is enabled but the sink is not connected or the DPCD readout fails is fixed by the next patch. v2: Amend the commit message describing the root cause of the CONFIG_UBSAN BUG(). Fixes: a532cde31de3 ("drm/i915/tc: Fix TypeC port init/resume time sanitization") References: https://gitlab.freedesktop.org/drm/intel/-/issues/4297 Reported-and-tested-by: Mat Jonczyk <mat.jonczyk@o2.pl> Cc: Mat Jonczyk <mat.jonczyk@o2.pl> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018094154.1407705-2-imre.deak@intel.com (cherry picked from commit 4ec5ffc341cecbea060739aea1d53398ac2ec3f8) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-10-26	drm/i915: Catch yet another unconditioal clflush	Ville Syrjälä
	Replace the unconditional clflush() with drm_clflush_virt_range() which does the wbinvd() fallback when clflush is not available. This time no justification is given for the clflush in the offending commit. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 2c8ab3339e39 ("drm/i915: Pin timeline map after first timeline pin, v4.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-4-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit 9ced12182d0d8401d821e9602e56e276459900fc) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-10-26	drm/i915: Convert unconditional clflush to drm_clflush_virt_range()	Ville Syrjälä
	This one is apparently a "clflush for good measure", so bit more justification (if you can call it that) than some of the others. Convert to drm_clflush_virt_range() again so that machines without clflush will survive the ordeal. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Fixes: 12ca695d2c1e ("drm/i915: Do not share hwsp across contexts any more, v8.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-3-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com> (cherry picked from commit af7b6d234eefa30c461cc16912bafb32b9e6141c) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-10-26	tpm_tis_spi: Add missing SPI ID	Mark Brown
	In commit c46ed2281bbe ("tpm_tis_spi: add missing SPI device ID entries") we added SPI IDs for all the DT aliases to handle the fact that we always use SPI modaliases to load modules even when probed via DT however the mentioned commit missed that the SPI and OF device ID entries did not match and were different and so DT nodes with compatible "tcg,tpm_tis-spi" will not match. Add an extra ID for tpm_tis-spi rather than just fix the existing one since what's currently there is going to be better for anyone actually using SPI IDs to instantiate. Fixes: c46ed2281bbe ("tpm_tis_spi: add missing SPI device ID entries") Fixes: 96c8395e2166 ("spi: Revert modalias changes") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-10-26	tpm: fix Atmel TPM crash caused by too frequent queries	Hao Wu
	The Atmel TPM 1.2 chips crash with error `tpm_try_transmit: send(): error -62` since kernel 4.14. It is observed from the kernel log after running `tpm_sealdata -z`. The error thrown from the command is as follows ``` $ tpm_sealdata -z Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error ``` The issue was reproduced with the following Atmel TPM chip: ``` $ tpm_version T0 TPM 1.2 Version Info: Chip Version: 1.2.66.1 Spec Level: 2 Errata Revision: 3 TPM Vendor ID: ATML TPM Version: 01010000 Manufacturer Info: 41544d4c ``` The root cause of the issue is due to the TPM calls to msleep() were replaced with usleep_range() [1], which reduces the actual timeout. Via experiments, it is observed that the original msleep(5) actually sleeps for 15ms. Because of a known timeout issue in Atmel TPM 1.2 chip, the shorter timeout than 15ms can cause the error described above. A few further changes in kernel 4.16 [2] and 4.18 [3, 4] further reduced the timeout to less than 1ms. With experiments, the problematic timeout in the latest kernel is the one for `wait_for_tpm_stat`. To fix it, the patch reverts the timeout of `wait_for_tpm_stat` to 15ms for all Atmel TPM 1.2 chips, but leave it untouched for Ateml TPM 2.0 chip, and chips from other vendors. As explained above, the chosen 15ms timeout is the actual timeout before this issue introduced, thus the old value is used here. Particularly, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 15000us according to the existing TPM_TIMEOUT_RANGE_US (300us). The fixed has been tested in the system with the affected Atmel chip with no issues observed after boot up. References: [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit() [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer granularity Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers") Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/ Signed-off-by: Hao Wu <hao.wu@rubrik.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-10-26	tpm: Check for integer overflow in tpm2_map_response_body()	Dan Carpenter
	The "4 * be32_to_cpu(data->count)" multiplication can potentially overflow which would lead to memory corruption. Add a check for that. Cc: stable@vger.kernel.org Fixes: 745b361e989a ("tpm: infrastructure for TPM spaces") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-10-26	tpm: tis: Kconfig: Add helper dependency on COMPILE_TEST	Cai Huoqing
	COMPILE_TEST is helpful to find compilation errors in other platform(e.g.X86). In this case, the support of COMPILE_TEST is added, so this module could be compiled in other platform(e.g.X86), without ARCH_SYNQUACER configuration. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2021-10-25	mlxsw: pci: Recycle received packet upon allocation failure	Ido Schimmel
	When the driver fails to allocate a new Rx buffer, it passes an empty Rx descriptor (contains zero address and size) to the device and marks it as invalid by setting the skb pointer in the descriptor's metadata to NULL. After processing enough Rx descriptors, the driver will try to process the invalid descriptor, but will return immediately seeing that the skb pointer is NULL. Since the driver no longer passes new Rx descriptors to the device, the Rx queue will eventually become full and the device will start to drop packets. Fix this by recycling the received packet if allocation of the new packet failed. This means that allocation is no longer performed at the end of the Rx routine, but at the start, before tearing down the DMA mapping of the received packet. Remove the comment about the descriptor being zeroed as it is no longer correct. This is OK because we either use the descriptor as-is (when recycling) or overwrite its address and size fields with that of the newly allocated Rx buffer. The issue was discovered when a process ("perf") consumed too much memory and put the system under memory pressure. It can be reproduced by injecting slab allocation failures [1]. After the fix, the Rx queue no longer comes to a halt. [1] # echo 10 > /sys/kernel/debug/failslab/times # echo 1000 > /sys/kernel/debug/failslab/interval # echo 100 > /sys/kernel/debug/failslab/probability FAULT_INJECTION: forcing a failure. name failslab, interval 1000, probability 100, space 0, times 8 [...] Call Trace: <IRQ> dump_stack_lvl+0x34/0x44 should_fail.cold+0x32/0x37 should_failslab+0x5/0x10 kmem_cache_alloc_node+0x23/0x190 __alloc_skb+0x1f9/0x280 __netdev_alloc_skb+0x3a/0x150 mlxsw_pci_rdq_skb_alloc+0x24/0x90 mlxsw_pci_cq_tasklet+0x3dc/0x1200 tasklet_action_common.constprop.0+0x9f/0x100 __do_softirq+0xb5/0x252 irq_exit_rcu+0x7a/0xa0 common_interrupt+0x83/0xa0 </IRQ> asm_common_interrupt+0x1e/0x40 RIP: 0010:cpuidle_enter_state+0xc8/0x340 [...] mlxsw_spectrum2 0000:06:00.0: Failed to alloc skb for RDQ Fixes: eda6500a987a ("mlxsw: Add PCI bus implementation") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20211024064014.1060919-1-idosch@idosch.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25	nvdimm/pmem: stop using q_usage_count as external pgmap refcount	Christoph Hellwig
	Originally all DAX access when through block_device operations and thus needed a queue reference. But since commit cccbce671582 ("filesystem-dax: convert to dax_direct_access()") all this happens at the DAX device level which uses its own refcounting. Having the external refcount thus wasn't needed but has otherwise been harmless for long time. But now that "block: drain file system I/O on del_gendisk" waits for q_usage_count to reach 0 in del_gendisk this whole scheme can't work anymore (and pmem is the only driver abusing q_usage_count like that). So switch to the internal reference and remove the unbalanced blk_freeze_queue_start that is taken care of by del_gendisk. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019073641.2323410-2-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-10-25	ice: check whether PTP is initialized in ice_ptp_release()	Yongxin Liu
	PTP is currently only supported on E810 devices, it is checked in ice_ptp_init(). However, there is no check in ice_ptp_release(). For other E800 series devices, ice_ptp_release() will be wrongly executed. Fix the following calltrace. INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. Workqueue: ice ice_service_task [ice] Call Trace: dump_stack_lvl+0x5b/0x82 dump_stack+0x10/0x12 register_lock_class+0x495/0x4a0 ? find_held_lock+0x3c/0xb0 __lock_acquire+0x71/0x1830 lock_acquire+0x1e6/0x330 ? ice_ptp_release+0x3c/0x1e0 [ice] ? _raw_spin_lock+0x19/0x70 ? ice_ptp_release+0x3c/0x1e0 [ice] _raw_spin_lock+0x38/0x70 ? ice_ptp_release+0x3c/0x1e0 [ice] ice_ptp_release+0x3c/0x1e0 [ice] ice_prepare_for_reset+0xcb/0xe0 [ice] ice_do_reset+0x38/0x110 [ice] ice_service_task+0x138/0xf10 [ice] ? __this_cpu_preempt_check+0x13/0x20 process_one_work+0x26a/0x650 worker_thread+0x3f/0x3b0 ? __kthread_parkme+0x51/0xb0 ? process_one_work+0x650/0x650 kthread+0x161/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-25	ice: Respond to a NETDEV_UNREGISTER event for LAG	Dave Ertman
	When the PF is a member of a link aggregate, and the driver is removed, the process will hang unless we respond to the NETDEV_UNREGISTER event that is sent to the event_handler for LAG. Add a case statement for the ice_lag_event_handler to unlink the PF from the link aggregate. Also remove code that was incorrectly applying a dev_hold to peer_netdevs that were associated with the ice driver. Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-25	spi: Fix tegra20 build with CONFIG_PM=n once again	Linus Torvalds
	Commit efafec27c565 ("spi: Fix tegra20 build with CONFIG_PM=n") already fixed the build without PM support once. There was an alternative fix by Guenter in commit 2bab94090b01 ("spi: tegra20-slink: Declare runtime suspend and resume functions conditionally"), and Mark then merged the two correctly in ffb1e76f4f32 ("Merge tag 'v5.15-rc2' into spi-5.15"). But for some inexplicable reason, Mark then merged things _again_ in commit 59c4e190b10c ("Merge tag 'v5.15-rc3' into spi-5.15"), and screwed things up at that point, and the __maybe_unused attribute on tegra_slink_runtime_resume() went missing. Reinstate it, so that alpha (and other architectures without PM support) builds cleanly again. Btw, this is another prime example of how random back-merges are not good. Just don't do them. Subsystem developers should not merge my tree in any normal circumstances. Both of those merge commits pointed to above are bad: even the one that got the merge result right doesn't even mention _why_ it was done, and the one that got it wrong is obviously broken. Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-25	Merge tag 'libata-5.15-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull libata fix from Damien Le Moal: "A single fix in this pull request addressing an invalid error code return in the sata_mv driver (from Zheyu)" * tag 'libata-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: sata_mv: Fix the error handling of mv_chip_id()
2021-10-25	Merge tag 'pinctrl-v5.15-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some late pin control fixes, the most generally annoying will probably be the AMD IRQ storm fix affecting the Microsoft surface. Summary: - Three fixes pertaining to Broadcom DT bindings. Some stuff didn't work out as inteded, we need to back out - A resume bug fix in the STM32 driver - Disable and mask the interrupts on probe in the AMD pinctrl driver, affecting Microsoft surface" * tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: disable and mask interrupts on probe pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume() Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode" dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
2021-10-25	fs: get rid of the res2 iocb->ki_complete argument	Jens Axboe
	The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25	usb: remove res2 argument from gadget code completions	Jens Axboe
	The USB gadget code is the only code that every tried to utilize the 2nd argument of the aio completions, but there are strong suspicions that it was never actually used by anything on the userspace side. Out of the 3 cases that touch it, two of them just pass in the same as res, and the last one passes in error/transfer in res like any other normal use case. Remove the 2nd argument, pass 0 like the rest of the in-kernel users of kiocb based IO. Link: https://lore.kernel.org/linux-block/20211021174021.273c82b1.john@metanate.com/ Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25	xen/netfront: stop tx queues during live migration	Dongli Zhang
	The tx queues are not stopped during the live migration. As a result, the ndo_start_xmit() may access netfront_info->queues which is freed by talk_to_netback()->xennet_destroy_queues(). This patch is to netif_device_detach() at the beginning of xen-netfront resuming, and netif_device_attach() at the end of resuming. CPU A CPU B talk_to_netback() -> if (info->queues) xennet_destroy_queues(info); to free netfront_info->queues xennet_start_xmit() to access netfront_info->queues -> err = xennet_create_queues(info, &num_queues); The idea is borrowed from virtio-net. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25	RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a string	Mark Zhang
	When copying the device name, the length of the data memcpy copied exceeds the length of the source buffer, which cause the KASAN issue below. Use strscpy_pad() instead. BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] Read of size 64 at addr ffff88811a10f5e0 by task rping/140263 CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1d/0xa0 kasan_report+0xcb/0x110 kasan_check_range+0x13d/0x180 memcpy+0x20/0x60 ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] ib_nl_make_request+0x1c6/0x380 [ib_core] send_mad+0x20a/0x220 [ib_core] ib_sa_path_rec_get+0x3e3/0x800 [ib_core] cma_query_ib_route+0x29b/0x390 [rdma_cm] rdma_resolve_route+0x308/0x3e0 [rdma_cm] ucma_resolve_route+0xe1/0x150 [rdma_ucm] ucma_write+0x17b/0x1f0 [rdma_ucm] vfs_write+0x142/0x4d0 ksys_write+0x133/0x160 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f26499aa90f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48 RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003 RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00 R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810 Allocated by task 131419: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 proc_self_get_link+0x8b/0x100 pick_link+0x4f1/0x5c0 step_into+0x2eb/0x3d0 walk_component+0xc8/0x2c0 link_path_walk+0x3b8/0x580 path_openat+0x101/0x230 do_filp_open+0x12e/0x240 do_sys_openat2+0x115/0x280 __x64_sys_openat+0xce/0x140 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-25	net: nxp: lpc_eth.c: avoid hang when bringing interface down	Trevor Woerner
	A hard hang is observed whenever the ethernet interface is brought down. If the PHY is stopped before the LPC core block is reset, the SoC will hang. Comparing lpc_eth_close() and lpc_eth_open() I re-arranged the ordering of the functions calls in lpc_eth_close() to reset the hardware before stopping the PHY. Fixes: b7370112f519 ("lpc32xx: Added ethernet driver") Signed-off-by: Trevor Woerner <twoerner@gmail.com> Acked-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25	block: ataflop: more blk-mq refactoring fixes	Michael Schmitz
	As it turns out, my earlier patch in commit 86d46fdaa12a (block: ataflop: fix breakage introduced at blk-mq refactoring) was incomplete. This patch fixes any remaining issues found during more testing and code review. Requests exceeding 4 k are handled in 4k segments but __blk_mq_end_request() is never called on these (still sectors outstanding on the request). With redo_fd_request() removed, there is no provision to kick off processing of the next segment, causing requests exceeding 4k to hang. (By setting /sys/block/fd0/queue/max_sectors_k <= 4 as workaround, this behaviour can be avoided). Instead of reintroducing redo_fd_request(), requeue the remainder of the request by calling blk_mq_requeue_request() on incomplete requests (i.e. when blk_update_request() still returns true), and rely on the block layer to queue the residual as new request. Both error handling and formatting needs to release the ST-DMA lock, so call finish_fdc() on these (this was previously handled by redo_fd_request()). finish_fdc() may be called legitimately without the ST-DMA lock held - make sure we only release the lock if we actually held it. In a similar way, early exit due to errors in ataflop_queue_rq() must release the lock. After minor errors, fd_error sets up to recalibrate the drive but never re-runs the current operation (another task handled by redo_fd_request() before). Call do_fd_action() to get the next steps (seek, retry read/write) underway. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Fixes: 6ec3938cff95f (ataflop: convert to blk-mq) CC: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20211024002013.9332-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25	phy: phy_ethtool_ksettings_set: Lock the PHY while changing settings	Andrew Lunn
	There is a race condition where the PHY state machine can change members of the phydev structure at the same time userspace requests a change via ethtool. To prevent this, have phy_ethtool_ksettings_set take the PHY lock. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Reported-by: Walter Stoll <Walter.Stoll@duagon.com> Suggested-by: Walter Stoll <Walter.Stoll@duagon.com> Tested-by: Walter Stoll <Walter.Stoll@duagon.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25	phy: phy_start_aneg: Add an unlocked version	Andrew Lunn
	Split phy_start_aneg into a wrapper which takes the PHY lock, and a helper doing the real work. This will be needed when phy_ethtook_ksettings_set takes the lock. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25	phy: phy_ethtool_ksettings_set: Move after phy_start_aneg	Andrew Lunn
	This allows it to make use of a helper which assume the PHY is already locked. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25	phy: phy_ethtool_ksettings_get: Lock the phy for consistency	Andrew Lunn
	The PHY structure should be locked while copying information out if it, otherwise there is no guarantee of self consistency. Without the lock the PHY state machine could be updating the structure. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25	irq: arm: perform irqentry in entry code	Mark Rutland
	In preparation for removing HANDLE_DOMAIN_IRQ_IRQENTRY, have arch/arm perform all the irqentry accounting in its entry code. For configurations with CONFIG_GENERIC_IRQ_MULTI_HANDLER, we can use generic_handle_arch_irq(). Other than asm_do_IRQ(), all C calls to handle_IRQ() are from irqchip handlers which will be called from generic_handle_arch_irq(), so to avoid double accounting IRQ entry, the entry logic is moved from handle_IRQ() into asm_do_IRQ(). For ARMv7M the entry assembly is tightly coupled with the NVIC irqchip, and while the entry code should logically live under arch/arm/, moving the entry logic there makes things more convoluted. So for now, place the entry logic in the NVIC irqchip, but separated into a separate function to make the split of responsibility clear. For all other configurations without CONFIG_GENERIC_IRQ_MULTI_HANDLER, IRQ entry is already handled in arch code, and requires no changes. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25	irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ	Mark Rutland
	In preparation for removing HANDLE_DOMAIN_IRQ, have arch/nds32 perform all the necessary IRQ entry accounting in its entry code. Currently arch/nds32 is tightly coupled with the ativic32 irqchip, and while the entry code should logically live under arch/nds32/, moving the entry logic there makes things more convoluted. So for now, place the entry logic in the ativic32 irqchip, but separated into a separate function to make the split of responsibility clear. In future this should probably use GENERIC_IRQ_MULTI_HANDLER to cleanly decouple this. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Chen <deanbo422@gmail.com>
2021-10-25	irq: mips: simplify bcm6345_l1_irq_handle()	Mark Rutland
	As bcm6345_l1_irq_handle() only needs to know /whether/ an IRQ was resolved, and doesn't need to know the specific IRQ, it's simpler for it to call generic_handle_domain_irq() directly and check the return code, so let's do that. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25	irq: mips: avoid nested irq_enter()	Mark Rutland
	As bcm6345_l1_irq_handle() is a chained irqchip handler, it will be invoked within the context of the root irqchip handler, which must have entered IRQ context already. When bcm6345_l1_irq_handle() calls arch/mips's do_IRQ() , this will nest another call to irq_enter(), and the resulting nested increment to `rcu_data.dynticks_nmi_nesting` will cause rcu_is_cpu_rrupt_from_idle() to fail to identify wakeups from idle, resulting in failure to preempt, and RCU stalls. Chained irqchip handlers must invoke IRQ handlers by way of thee core irqchip code, i.e. generic_handle_irq() or generic_handle_domain_irq() and should not call do_IRQ(), which is intended only for root irqchip handlers. Fix bcm6345_l1_irq_handle() by calling generic_handle_irq() directly. Fixes: c7c42ec2baa1de7a ("irqchips/bmips: Add bcm6345-l1 interrupt controller") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25	gpio: mlxbf2.c: Add check for bgpio_init failure	Asmaa Mnebhi
	Add a check if bgpio_init fails. Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-10-25	gpio: xgs-iproc: fix parsing of ngpios property	Jonas Gorski
	of_property_read_u32 returns 0 on success, not true, so we need to invert the check to actually take over the provided ngpio value. Fixes: 6a41b6c5fc20 ("gpio: Add xgs-iproc driver") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2021-10-25	Merge branch irq/mchp-eic into irq/irqchip-next	Marc Zyngier
	* irq/mchp-eic: : . : New irqchip driver for the Microchip EIC block : . irqchip/mchp-eic: Fix return value check in mchp_eic_init() irqchip/mchp-eic: Add support for the Microchip EIC dt-bindings: microchip,eic: Add bindings for the Microchip EIC Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-25	Merge branch irq/modular-irqchips into irq/irqchip-next	Marc Zyngier
	* irq/modular-irqchips: : . : Update a set of irqchip drivers to be build as modules. : : This includes an Amlogic and multiple Broadcom drivers, triggering : a cascade of other changes (MIPS arch code, symbols being exported, : config changes) : . irqchip: Fix kernel-doc parameter typo for IRQCHIP_DECLARE ARM: bcm: Removed forced select of interrupt controllers arm64: broadcom: Removed forced select of interrupt controllers irqchip/irq-bcm7120-l2: Switch to IRQCHIP_PLATFORM_DRIVER genirq: Export irq_gc_noop() irqchip/irq-brcmstb-l2: Switch to IRQCHIP_PLATFORM_DRIVER genirq: Export irq_gc_{unmask_enable,mask_disable}_reg irqchip/irq-bcm7038-l1: Switch to IRQCHIP_PLATFORM_DRIVER irqchip/irq-bcm7038-l1: Restrict affinity setting to MIPS irqchip/irq-bcm7038-l1: Gate use of CPU logical map to MIPS irqchip/irq-bcm7038-l1: Use irq_get_irq_data() irqchip/irq-bcm7038-l1: Remove .irq_cpu_offline() MIPS: BMIPS: Remove use of irq_cpu_offline arm64: meson: remove MESON_IRQ_GPIO selection irqchip/meson-gpio: Make it possible to build as a module irqchip: Provide stronger type checking for IRQCHIP_MATCH/IRQCHIP_DECLARE Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-25	irqchip/mchp-eic: Fix return value check in mchp_eic_init()	Yang Yingliang
	In case of error, the function of_iomap() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211025050055.1129845-1-yangyingliang@huawei.com
2021-10-25	ata: sata_mv: Fix the error handling of mv_chip_id()	Zheyu Ma
	mv_init_host() propagates the value returned by mv_chip_id() which in turn gets propagated by mv_pci_init_one() and hits local_pci_probe(). During the process of driver probing, the probe function should return < 0 for failure, otherwise, the kernel will treat value > 0 as success. Since this is a bug rather than a recoverable runtime error we should use dev_alert() instead of dev_err(). Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2021-10-24	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Ten fixes, seven of which are in drivers. The core fixes are one to fix a potential crash on resume, one to sort out our reference count releases to avoid releasing in-use modules and one to adjust the cmd per lun calculation to avoid an overflow in hyper-v" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: ufs-pci: Force a full restore after suspend-to-disk scsi: qla2xxx: Fix unmap of already freed sgl scsi: qla2xxx: Fix a memory leak in an error path of qla2x00_process_els() scsi: qla2xxx: Return -ENOMEM if kzalloc() fails scsi: sd: Fix crashes in sd_resume_runtime() scsi: mpi3mr: Fix duplicate device entries when scanning through sysfs scsi: core: Put LLD module refcnt after SCSI device is released scsi: storvsc: Fix validation for unsolicited incoming packets scsi: iscsi: Fix set_param() handling scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
2021-10-24	net: ethernet: microchip: lan743x: Fix dma allocation failure by using ↵	Yuiko Oshino
	dma_set_mask_and_coherent The dma failure was reported in the raspberry pi github (issue #4117). https://github.com/raspberrypi/linux/issues/4117 The use of dma_set_mask_and_coherent fixes the issue. Tested on 32/64-bit raspberry pi CM4 and 64-bit ubuntu x86 PC with EVB-LAN7430. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24	net: ethernet: microchip: lan743x: Fix driver crash when lan743x_pm_resume fails	Yuiko Oshino
	The driver needs to clean up and return when the initialization fails on resume. Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver") Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-22	Merge tag 'hyperv-fixes-signed-20211022' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyper-v fix from Wei Liu: - Fix vmbus ARM64 build (Arnd Bergmann) * tag 'hyperv-fixes-signed-20211022' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: hyperv/vmbus: include linux/bitops.h
2021-10-22	hyperv/vmbus: include linux/bitops.h	Arnd Bergmann
	On arm64 randconfig builds, hyperv sometimes fails with this error: In file included from drivers/hv/hv_trace.c:3: In file included from drivers/hv/hyperv_vmbus.h:16: In file included from arch/arm64/include/asm/sync_bitops.h:5: arch/arm64/include/asm/bitops.h:11:2: error: only <linux/bitops.h> can be included directly In file included from include/asm-generic/bitops/hweight.h:5: include/asm-generic/bitops/arch_hweight.h:9:9: error: implicit declaration of function '__sw_hweight32' [-Werror,-Wimplicit-function-declaration] include/asm-generic/bitops/atomic.h:17:7: error: implicit declaration of function 'BIT_WORD' [-Werror,-Wimplicit-function-declaration] Include the correct header first. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20211018131929.2260087-1-arnd@kernel.org Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-22	Merge tag 'acpi-5.15-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix two regressions, one related to ACPI power resources management and one that broke ACPI tools compilation. Specifics: - Stop turning off unused ACPI power resources in an unknown state to address a regression introduced during the 5.14 cycle (Rafael Wysocki). - Fix an ACPI tools build issue introduced recently when the minimal stdarg.h was added (Miguel Bernal Marin)" * tag 'acpi-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PM: Do not turn off power resources in unknown state ACPI: tools: fix compilation error
2021-10-22	xen-blkback: use sync_blockdev	Christoph Hellwig
	Use sync_blockdev instead of opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211019062530.2174626-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22	block: remove support for cryptoloop and the xor transfer	Christoph Hellwig
	Support for cyrptoloop has been officially marked broken and deprecated in favor of dm-crypt (which supports the same broken algorithms if needed) in Linux 2.6.4 (released in March 2004), and support for it has been entirely removed from losetup in util-linux 2.23 (released in April 2013). The XOR transfer has never been more than a toy to demonstrate the transfer in the bad old times of crypto export restrictions. Remove them as they have some nasty interactions with loop device life times due to the iteration over all loop devices in loop_unregister_transfer. Suggested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019075639.2333969-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22	block: remove QUEUE_FLAG_SCSI_PASSTHROUGH	Christoph Hellwig
	Export scsi_device_from_queue for use with pktcdvd and use that instead of the otherwise unused QUEUE_FLAG_SCSI_PASSTHROUGH queue flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22	scsi: add a scsi_alloc_request helper	Christoph Hellwig
	Add a new helper that calls blk_get_request and initializes the scsi_request to avoid the indirect call through ->.initialize_rq_fn. Note that this makes the pktcdvd driver depend on the SCSI core, but given that only SCSI devices support SCSI passthrough requests that is not a functional change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22	sd: implement ->get_unique_id	Christoph Hellwig
	Add the method to query for a uniqueue ID of a given type by looking it up in the cached device identification VPD page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>