summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-07-28ARM: dts: aspeed: rainier: Add I2C buses for NVMe useJet Li
Adding pca9552 exposes the presence detect lines for the cards and tca9554 exposes the presence details for the cards. Signed-off-by: Jet Li <Jet.Li@ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2020-07-28ARM: dts: aspeed: Initial device tree for AMD EthanolXSupreeth Venkatesh
Initial introduction of AMD EthanolX platform equipped with an Aspeed ast2500 BMC manufactured by AMD. AMD EthanolX platform is an AMD customer reference board with an Aspeed ast2500 BMC manufactured by AMD. This adds AMD EthanolX device tree file including the flash layout used by EthanolX BMC machines. Signed-off-by: Supreeth Venkatesh <supreeth.venkatesh@amd.com> Signed-off-by: Joel Stanley <joel@jms.id.au>
2020-07-28ARM: dts: rainier: Describe GPIO mux on I2C3Andrew Jeffery
We have a 4-bus mux whose output is selected by two GPIO inputs. Wire it up in the devicetree and ensure the output is enabled by hogging the appropriate line. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au>
2020-07-27sched: Fix a typo in a comment王文虎
Change the comment typo: "direcly" -> "directly". Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/AAcAXwBTDSpsKN-5iyIOtaqk.1.1595857191899.Hmail.wenhu.wang@vivo.com
2020-07-27arm64: dts: qcom: sc7180: Add iommus property to ETRSai Prakash Ranjan
Define iommus property for Coresight ETR component in SC7180 SoC with the SID and mask to enable SMMU translation for this master. Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Link: https://lore.kernel.org/r/2312c9a10e7251d69e31e4f51c0f1d70e6f2f2f5.1591708204.git.saiprakash.ranjan@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-07-27arm64: dts: qcom: sc7180: Add support to skip powering up of ETMSai Prakash Ranjan
Add "qcom,skip-power-up" property to skip powering up ETM on SC7180 SoC to workaround a hardware errata where CPU watchdog counter is stopped when ETM power up bit is set (i.e., when TRCPDCR.PU = 1). Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Link: https://lore.kernel.org/r/8c5ff297d8c89d9d451352f189baf26c8938842a.1591708204.git.saiprakash.ranjan@codeaurora.org Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-07-27arm64: dts: qcom: sc7180: Add properties to qfprom for fuse blowingRavi Kumar Bokka
This patch adds properties to the qfprom node to enable fuse blowing. Signed-off-by: Ravi Kumar Bokka <rbokka@codeaurora.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200710073439.v5.4.I70c17309f8b433e900656d7c53a2e6b61888bb68@changeid Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-07-27fix a braino in cmsghdr_from_user_compat_to_kern()Al Viro
commit 547ce4cfb34c ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") missed one of the places where ucmlen should've been replaced with cmsg.cmsg_len, now that we are fetching the entire struct rather than doing it field-by-field. As the result, compat sendmsg() with several different-sized cmsg attached started to fail with EINVAL. Trivial to fix, fortunately. Fixes: 547ce4cfb34c ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") Reported-by: Nick Bowler <nbowler@draconx.ca> Tested-by: Nick Bowler <nbowler@draconx.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27sh: Fix validation of system call numberMichael Karcher
The slow path for traced system call entries accessed a wrong memory location to get the number of the maximum allowed system call number. Renumber the numbered "local" label for the correct location to avoid collisions with actual local labels. Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Fixes: f3a8308864f920d2 ("sh: Add a few missing irqflags tracing markers.") Signed-off-by: Rich Felker <dalias@libc.org>
2020-07-27sh/tlb: Fix PGTABLE_LEVELS > 2Peter Zijlstra
Geert reported that his SH7722-based Migo-R board failed to boot after commit: c5b27a889da9 ("sh/tlb: Convert SH to generic mmu_gather") That commit fell victim to copying the wrong pattern -- __pmd_free_tlb() used to be implemented with pmd_free(). Fixes: c5b27a889da9 ("sh/tlb: Convert SH to generic mmu_gather") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rich Felker <dalias@libc.org>
2020-07-27drm: hold gem reference until object is no longer accessedSteve Cohen
A use-after-free in drm_gem_open_ioctl can happen if the GEM object handle is closed between the idr lookup and retrieving the size from said object since a local reference is not being held at that point. Hold the local reference while the object can still be accessed to fix this and plug the potential security hole. Signed-off-by: Steve Cohen <cohens@codeaurora.org> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/1595284250-31580-1-git-send-email-cohens@codeaurora.org
2020-07-27Merge branch 'selftests-net-Fix-clang-warnings-on-powerpc'David S. Miller
Tanner Love says: ==================== selftests/net: Fix clang warnings on powerpc This is essentially a v2 of http://patchwork.ozlabs.org/project/netdev/patch/20200724181757.2331172-1-tannerlove.kernel@gmail.com/, but it has been split up in order to have only one "Fixes" tag per patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27selftests/net: tcp_mmap: fix clang warning for target arch PowerPCTanner Love
When size_t maps to unsigned int (e.g. on 32-bit powerpc), then the comparison with 1<<35 is always true. Clang 9 threw: warning: result of comparison of constant 34359738368 with \ expression of type 'size_t' (aka 'unsigned int') is always true \ [-Wtautological-constant-out-of-range-compare] while (total < FILE_SZ) { Tested: make -C tools/testing/selftests TARGETS="net" run_tests Fixes: 192dc405f308 ("selftests: net: add tcp_mmap program") Signed-off-by: Tanner Love <tannerlove@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27selftests/net: so_txtime: fix clang issues for target arch PowerPCTanner Love
On powerpcle, int64_t maps to long long. Clang 9 threw: warning: absolute value function 'labs' given an argument of type \ 'long long' but has parameter of type 'long' which may cause \ truncation of value [-Wabsolute-value] if (labs(tstop - texpect) > cfg_variance_us) Tested: make -C tools/testing/selftests TARGETS="net" run_tests Fixes: af5136f95045 ("selftests/net: SO_TXTIME with ETF and FQ") Signed-off-by: Tanner Love <tannerlove@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27selftests/net: psock_fanout: fix clang issues for target arch PowerPCTanner Love
Clang 9 threw: warning: format specifies type 'unsigned short' but the argument has \ type 'int' [-Wformat] typeflags, PORT_BASE, PORT_BASE + port_off); Tested: make -C tools/testing/selftests TARGETS="net" run_tests Fixes: 77f65ebdca50 ("packet: packet fanout rollover during socket overload") Signed-off-by: Tanner Love <tannerlove@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27selftests/net: rxtimestamp: fix clang issues for target arch PowerPCTanner Love
The signedness of char is implementation-dependent. Some systems (including PowerPC and ARM) use unsigned char. Clang 9 threw: warning: result of comparison of constant -1 with expression of type \ 'char' is always true [-Wtautological-constant-out-of-range-compare] &arg_index)) != -1) { Tested: make -C tools/testing/selftests TARGETS="net" run_tests Fixes: 16e781224198 ("selftests/net: Add a test to validate behavior of rx timestamps") Signed-off-by: Tanner Love <tannerlove@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27net: mscc: ocelot: fix hardware timestamp dequeue logiclaurent brando
The next hw timestamp should be snapshoot to the read registers only once the current timestamp has been read. If none of the pending skbs matches the current HW timestamp just gracefully flush the available timestamp by reading it. Signed-off-by: laurent brando <laurent.brando@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27arm64: dts: meson: fix mmc0 tuning error on Khadas VIM3Christian Hewitt
Similar to other G12B devices using the W400 dtsi, I see reports of mmc0 tuning errors on VIM3 after a few hours uptime: [12483.917391] mmc0: tuning execution failed: -5 [30535.551221] mmc0: tuning execution failed: -5 [35359.953671] mmc0: tuning execution failed: -5 [35561.875332] mmc0: tuning execution failed: -5 [61733.348709] mmc0: tuning execution failed: -5 I do not see the same on VIM3L, so remove sd-uhs-sdr50 from the common dtsi to silence the error, then (re)add it to the VIM3L dts. Fixes: 4f26cc1c96c9 ("arm64: dts: khadas-vim3: move common nodes into meson-khadas-vim3.dtsi") Fixes: 700ab8d83927 ("arm64: dts: khadas-vim3: add support for the SM1 based VIM3L") Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Kevin Hilman <khilman@baylibre.com> Link: https://lore.kernel.org/r/20200721015950.11816-1-christianshewitt@gmail.com
2020-07-27arm64: dts: meson: misc fixups for w400 dtsiChristian Hewitt
Current devices using the W400 dtsi show mmc tuning errors: [12483.917391] mmc0: tuning execution failed: -5 [30535.551221] mmc0: tuning execution failed: -5 [35359.953671] mmc0: tuning execution failed: -5 [35561.875332] mmc0: tuning execution failed: -5 [61733.348709] mmc0: tuning execution failed: -5 Removing "sd-uhs-sdr50" from the SDIO node prevents this. We also add keep-power-in-suspend to the SDIO node and fix an indentation. Fixes: 3cb74db9b256 ("arm64: dts: meson: convert ugoos-am6 to common w400 dtsi") Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Kevin Hilman <khilman@baylibre.com> Link: https://lore.kernel.org/r/20200721013952.11635-1-christianshewitt@gmail.com
2020-07-27mptcp: fix joined subflows with unblocking skMatthieu Baerts
Unblocking sockets used for outgoing connections were not containing inet info about the initial connection due to a typo there: the value of "err" variable is negative in the kernelspace. This fixes the creation of additional subflows where the remote port has to be reused if the other host didn't announce another one. This also fixes inet_diag showing blank info about MPTCP sockets from unblocking sockets doing a connect(). Fixes: 41be81a8d3d0 ("mptcp: fix unblocking connect()") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27drm/dbi: Fix SPI Type 1 (9-bit) transferPaul Cercueil
The function mipi_dbi_spi1_transfer() will transfer its payload as 9-bit data, the 9th (MSB) bit being the data/command bit. In order to do that, it unpacks the 8-bit values into 16-bit values, then sets the 9th bit if the byte corresponds to data, clears it otherwise. The 7 MSB are padding. The array of now 16-bit values is then passed to the SPI core for transfer. This function was broken since its introduction, as the length of the SPI transfer was set to the payload size before its conversion, but the payload doubled in size due to the 8-bit -> 16-bit conversion. Fixes: 02dd95fe3169 ("drm/tinydrm: Add MIPI DBI support") Cc: <stable@vger.kernel.org> # 5.4+ Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20200703141341.1266263-1-paul@crapouillou.net
2020-07-27MAINTAINERS: Update GENI I2C maintainers listAkash Asthana
Alok Chauhan has moved out of GENI team, he no longer supports GENI I2C driver, remove him from maintainer list. Add Akash Asthana & Mukesh Savaliya as maintainers for GENI I2C drivers. Signed-off-by: Akash Asthana <akashast@codeaurora.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-07-27i2c: also convert placeholder function to return errnoWolfram Sang
All i2c_new_device-alike functions return ERR_PTR these days, but this fallback function was missed. Fixes: 2dea645ffc21 ("i2c: acpi: Return error pointers from i2c_acpi_new_device()") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [wsa: changed from 'ENOSYS' to 'ENODEV'] Signed-off-by: Wolfram Sang <wsa@kernel.org>
2020-07-27ALSA: hda: Workaround for spurious wakeups on some Intel platformsTakashi Iwai
We've received a regression report on Intel HD-audio controller that wakes up immediately after S3 suspend. The bisection leads to the commit c4c8dd6ef807 ("ALSA: hda: Skip controller resume if not needed"). This commit replaces the system-suspend to use pm_runtime_force_suspend() instead of the direct call of __azx_runtime_suspend(). However, by some really mysterious reason, pm_runtime_force_suspend() causes a spurious wakeup (although it calls the same __azx_runtime_suspend() internally). As an ugly workaround for now, revert the behavior to call __azx_runtime_suspend() and __azx_runtime_resume() for those old Intel platforms that may exhibit such a problem, while keeping the new standard pm_runtime_force_suspend() and pm_runtime_force_resume() pair for the remaining chips. Fixes: c4c8dd6ef807 ("ALSA: hda: Skip controller resume if not needed") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208649 Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200727164443.4233-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-07-27fscrypt: document inline encryption supportSatya Tangirala
Update the fscrypt documentation file for inline encryption support. Signed-off-by: Satya Tangirala <satyat@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20200724184501.1651378-7-satyat@google.com Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-27Merge tag 'at91-dt-5.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/dt AT91 DT for 5.9 - ClassD pull down fixes - Enable RTT as RTC on sam9x60ek - Fix phy-mode for sama5d3_xplained * tag 'at91-dt-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: dts: at91: sama5d3_xplained: change phy-mode ARM: dts: at91: sama5d2_xplained: Remove pdmic node ARM: dts: sam9x60: add rtt dt-bindings: rtc: add microchip,sam9x60-rtt ARM: dts: at91: sam9x60ek: classd: pull-down the L1 and L3 lines ARM: dts: at91: sama5d2_xplained: classd: pull-down the R1 and R3 lines Link: https://lore.kernel.org/r/20200726193207.GA182066@piout.net Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-07-27Merge tag 'at91-soc-5.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/soc AT91 SoC for 5.9 - Two small fixes * tag 'at91-soc-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: at91: Replace HTTP links with HTTPS ones ARM: at91: pm: add missing put_device() call in at91_pm_sram_init() Link: https://lore.kernel.org/r/20200726193335.GA182444@piout.net Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-07-27RDMA/mlx5: Fix prefetch memory leak if get_prefetchable_mr failsJason Gunthorpe
destroy_prefetch_work() must always be called if the work is not going to be queued. The num_sge also should have been set to i, not i-1 which avoids the condition where it shouldn't have been called in the first place. Cc: stable@vger.kernel.org Fixes: fb985e278a30 ("RDMA/mlx5: Use SRCU properly in ODP prefetch") Link: https://lore.kernel.org/r/20200727095712.495652-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27RDMA/cm: Add min length checks to user structure copiesJason Gunthorpe
These are missing throughout ucma, it harmlessly copies garbage from userspace, but in this new code which uses min to compute the copy length it can result in uninitialized stack memory. Check for minimum length at the very start. BUG: KMSAN: uninit-value in ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 CPU: 0 PID: 8457 Comm: syz-executor069 Not tainted 5.8.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1df/0x240 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 ucma_write+0x5c5/0x630 drivers/infiniband/core/ucma.c:1764 do_loop_readv_writev fs/read_write.c:737 [inline] do_iter_write+0x710/0xdc0 fs/read_write.c:1020 vfs_writev fs/read_write.c:1091 [inline] do_writev+0x42d/0x8f0 fs/read_write.c:1134 __do_sys_writev fs/read_write.c:1207 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1204 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1204 do_syscall_64+0xb0/0x150 arch/x86/entry/common.c:386 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 34e2ab57a911 ("RDMA/ucma: Extend ucma_connect to receive ECE parameters") Fixes: 0cb15372a615 ("RDMA/cma: Connect ECE to rdma_accept") Link: https://lore.kernel.org/r/0-v1-d5b86dab17dc+28c25-ucma_syz_min_jgg@nvidia.com Reported-by: syzbot+086ab5ca9eafd2379aa6@syzkaller.appspotmail.com Reported-by: syzbot+7446526858b83c8828b2@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27drm/drm_fb_helper: fix fbdev with sparc64Sam Ravnborg
Recent kernels have been reported to panic using the bochs_drm framebuffer under qemu-system-sparc64 which was bisected to commit 7a0483ac4ffc ("drm/bochs: switch to generic drm fbdev emulation"). The backtrace indicates that the shadow framebuffer copy in drm_fb_helper_dirty_blit_real() is trying to access the real framebuffer using a virtual address rather than use an IO access typically implemented using a physical (ASI_PHYS) access on SPARC. The fix is to replace the memcpy with memcpy_toio() from io.h. memcpy_toio() uses writeb() where the original fbdev code used sbus_memcpy_toio(). The latter uses sbus_writeb(). The difference between writeb() and sbus_memcpy_toio() is that writeb() writes bytes in little-endian, where sbus_writeb() writes bytes in big-endian. As endian does not matter for byte writes they are the same. So we can safely use memcpy_toio() here. Note that this only fixes bochs, in general fbdev helpers still have issues with mixing up system memory and __iomem space. Fixing that will require a lot more work. v3: - Improved changelog (Daniel) - Added FIXME to fbdev_use_iomem (Daniel) v2: - Added missing __iomem cast (kernel test robot) - Made changelog readable and fix typos (Mark) - Add flag to select iomem - and set it in the bochs driver Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reported-by: kernel test robot <lkp@intel.com> Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20200709193016.291267-1-sam@ravnborg.org Link: https://patchwork.freedesktop.org/patch/msgid/20200725191012.GA434957@ravnborg.org
2020-07-27genirq/debugfs: Add missing irqchip flagsMarc Zyngier
Recently introduced irqchip flags lack the corresponding printouts in debugfs. Add them. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/874kpvydxc.wl-maz@kernel.org
2020-07-27genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner
John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de
2020-07-27platform/x86: thinkpad_acpi: use standard charge control attribute namesThomas Weißschuh
The standard attributes were only introduced after the ones from thinkpad_acpi in commit 813cab8f3994 ("power: supply: core: Add CHARGE_CONTROL_{START_THRESHOLD,END_THRESHOLD} properties"). The new standard attributes are aliased to their previous names, preserving backwards compatibility. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-07-27platform/x86: thinkpad_acpi: remove unused definesThomas Weißschuh
They were never used. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-07-27platform/x86: ISST: drop a duplicated word in isst_if.hRandy Dunlap
Drop the repeated word "for" in a comment. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: platform-driver-x86@vger.kernel.org Cc: Darren Hart <dvhart@infradead.org> Cc: Andy Shevchenko <andy@infradead.org> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-07-27locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIspeterz@infradead.org
Prior to commit: 859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking") IRQ state tracking was disabled in NMIs due to nmi_enter() doing lockdep_off() -- with the obvious requirement that NMI entry call nmi_enter() before trace_hardirqs_off(). [ AFAICT, PowerPC and SH violate this order on their NMI entry ] However, that commit explicitly changed lockdep_hardirqs_*() to ignore lockdep_off() and breaks every architecture that has irq-tracing in it's NMI entry that hasn't been fixed up (x86 being the only fixed one at this point). The reason for this change is that by ignoring lockdep_off() we can: - get rid of 'current->lockdep_recursion' in lockdep_assert_irqs*() which was going to to give header-recursion issues with the seqlock rework. - allow these lockdep_assert_*() macros to function in NMI context. Restore the previous state of things and allow an architecture to opt-in to the NMI IRQ tracking support, however instead of relying on lockdep_off(), rely on in_nmi(), both are part of nmi_enter() and so over-all entry ordering doesn't need to change. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200727124852.GK119549@hirez.programming.kicks-ass.net
2020-07-27KVM: nVMX: check for invalid hdr.vmx.flagsPaolo Bonzini
hdr.vmx.flags is meant for future extensions to the ABI, rejecting invalid flags is necessary to avoid broken half-loads of the nVMX state. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27KVM: nVMX: check for required but missing VMCS12 in KVM_SET_NESTED_STATEPaolo Bonzini
A missing VMCS12 was not causing -EINVAL (it was just read with copy_from_user, so it is not a security issue, but it is still wrong). Test for VMCS12 validity and reject the nested state if a VMCS12 is required but not present. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27selftests: kvm: do not set guest mode flagPaolo Bonzini
Setting KVM_STATE_NESTED_GUEST_MODE enables various consistency checks on VMCS12 and therefore causes KVM_SET_NESTED_STATE to fail spuriously with -EINVAL. Do not set the flag so that we're sure to cover the conditions included by the test, and cover the case where VMCS12 is set and KVM_SET_NESTED_STATE is called with invalid VMCS12 contents. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27Merge tag 'at91-defconfig-5.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/defconfig AT91 defconfig for 5.9 - Add ClassD, KSZ ethernet switches, brdige, vlan to sama5_defconfig - Reenable CAN support in sama5_defconfig * tag 'at91-defconfig-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: configs: at91: sama5: enable CAN PLATFORM driver ARM: configs: at91: sama5: enable bridge and VLAN filtering ARM: configs: at91: sama5: add support for KSZ ethernet switches ARM: configs: at91: sama5: Enable CLASSD Link: https://lore.kernel.org/r/20200726192810.GA181818@piout.net Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-07-27Merge tag 'drivers_soc_for_5.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into arm/drivers SOC: TI Keystone driver update for v5.9 - TI K3 Ring Accelerator updates - Few non critical warining fixes * tag 'drivers_soc_for_5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone: soc: TI knav_qmss: make symbol 'knav_acc_range_ops' static firmware: ti_sci: Replace HTTP links with HTTPS ones soc: ti/ti_sci_protocol.h: drop a duplicated word + clarify soc: ti: k3: fix semicolon.cocci warnings soc: ti: k3-ringacc: fix: warn: variable dereferenced before check 'ring' dmaengine: ti: k3-udma: Switch to k3_ringacc_request_rings_pair soc: ti: k3-ringacc: separate soc specific initialization soc: ti: k3-ringacc: add request pair of rings api. soc: ti: k3-ringacc: add ring's flags to dump soc: ti: k3-ringacc: Move state tracking variables under a struct dt-bindings: soc: ti: k3-ringacc: convert bindings to json-schema Link: https://lore.kernel.org/r/1595711814-7015-1-git-send-email-santosh.shilimkar@oracle.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-07-27ALSA: hda/realtek: Fix add a "ultra_low_power" function for intel reference ↵PeiSen Hou
board (alc256) Intel requires to enable power saving mode for intel reference board (alc256) Signed-off-by: PeiSen Hou <pshou@realtek.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200727115647.10967-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-07-27btrfs: do not set the full sync flag on the inode during page releaseFilipe Manana
When removing an extent map at try_release_extent_mapping(), called through the page release callback (btrfs_releasepage()), we always set the full sync flag on the inode, which forces the next fsync to use a slower code path. This hurts performance for workloads that dirty an amount of data that exceeds or is very close to the system's RAM memory and do frequent fsync operations (like database servers can for example). In particular if there are concurrent fsyncs against different files, by falling back to a full fsync we do a lot more checksum lookups in the checksums btree, as we do it for all the extents created in the current transaction, instead of only the new ones since the last fsync. These checksums lookups not only take some time but, more importantly, they also cause contention on the checksums btree locks due to the concurrency with checksum insertions in the btree by ordered extents from other inodes. We actually don't need to set the full sync flag on the inode, because we only remove extent maps that are in the list of modified extents if they were created in a past transaction, in which case an fsync skips them as it's pointless to log them. So stop setting the full fsync flag on the inode whenever we remove an extent map. This patch is part of a patchset that consists of 3 patches, which have the following subjects: 1/3 btrfs: fix race between page release and a fast fsync 2/3 btrfs: release old extent maps during page release 3/3 btrfs: do not set the full sync flag on the inode during page release Performance tests were ran against a branch (misc-next) containing the whole patchset. The test exercises a workload where there are multiple processes writing to files and fsyncing them (each writing and fsyncing its own file), and in total the amount of data dirtied ranges from 2x to 4x the system's RAM memory (16GiB), so that the page release callback is invoked frequently. The following script, using fio, was used to perform the tests: $ cat test-fsync.sh #!/bin/bash DEV=/dev/sdk MNT=/mnt/sdk MOUNT_OPTIONS="-o ssd" MKFS_OPTIONS="-d single -m single" if [ $# -ne 3 ]; then echo "Use $0 NUM_JOBS FILE_SIZE FSYNC_FREQ" exit 1 fi NUM_JOBS=$1 FILE_SIZE=$2 FSYNC_FREQ=$3 cat <<EOF > /tmp/fio-job.ini [writers] rw=write fsync=$FSYNC_FREQ fallocate=none group_reporting=1 direct=0 bs=64k ioengine=sync size=$FILE_SIZE directory=$MNT numjobs=$NUM_JOBS thread EOF echo "Using config:" echo cat /tmp/fio-job.ini echo mkfs.btrfs -f $MKFS_OPTIONS $DEV &> /dev/null mount $MOUNT_OPTIONS $DEV $MNT fio /tmp/fio-job.ini umount $MNT The tests were performed for different numbers of jobs, file sizes and fsync frequency. A qemu VM using kvm was used, with 8 cores (the host has 12 cores, with cpu governance set to performance mode on all cores), 16GiB of ram (the host has 64GiB) and using a NVMe device directly (without an intermediary filesystem in the host). While running the tests, the host was not used for anything else, to avoid disturbing the tests. The obtained results were the following, and the last line printed by fio is pasted (includes aggregated throughput and test run time). ***************************************************** **** 1 job, 32GiB file, fsync frequency 1 **** ***************************************************** Before patchset: WRITE: bw=29.1MiB/s (30.5MB/s), 29.1MiB/s-29.1MiB/s (30.5MB/s-30.5MB/s), io=32.0GiB (34.4GB), run=1127557-1127557msec After patchset: WRITE: bw=29.3MiB/s (30.7MB/s), 29.3MiB/s-29.3MiB/s (30.7MB/s-30.7MB/s), io=32.0GiB (34.4GB), run=1119042-1119042msec (+0.7% throughput, -0.8% run time) ***************************************************** **** 2 jobs, 16GiB files, fsync frequency 1 **** ***************************************************** Before patchset: WRITE: bw=33.5MiB/s (35.1MB/s), 33.5MiB/s-33.5MiB/s (35.1MB/s-35.1MB/s), io=32.0GiB (34.4GB), run=979000-979000msec After patchset: WRITE: bw=39.9MiB/s (41.8MB/s), 39.9MiB/s-39.9MiB/s (41.8MB/s-41.8MB/s), io=32.0GiB (34.4GB), run=821283-821283msec (+19.1% throughput, -16.1% runtime) ***************************************************** **** 4 jobs, 8GiB files, fsync frequency 1 **** ***************************************************** Before patchset: WRITE: bw=52.1MiB/s (54.6MB/s), 52.1MiB/s-52.1MiB/s (54.6MB/s-54.6MB/s), io=32.0GiB (34.4GB), run=629130-629130msec After patchset: WRITE: bw=71.8MiB/s (75.3MB/s), 71.8MiB/s-71.8MiB/s (75.3MB/s-75.3MB/s), io=32.0GiB (34.4GB), run=456357-456357msec (+37.8% throughput, -27.5% runtime) ***************************************************** **** 8 jobs, 4GiB files, fsync frequency 1 **** ***************************************************** Before patchset: WRITE: bw=76.1MiB/s (79.8MB/s), 76.1MiB/s-76.1MiB/s (79.8MB/s-79.8MB/s), io=32.0GiB (34.4GB), run=430708-430708msec After patchset: WRITE: bw=133MiB/s (140MB/s), 133MiB/s-133MiB/s (140MB/s-140MB/s), io=32.0GiB (34.4GB), run=245458-245458msec (+74.7% throughput, -43.0% run time) ***************************************************** **** 16 jobs, 2GiB files, fsync frequency 1 **** ***************************************************** Before patchset: WRITE: bw=74.7MiB/s (78.3MB/s), 74.7MiB/s-74.7MiB/s (78.3MB/s-78.3MB/s), io=32.0GiB (34.4GB), run=438625-438625msec After patchset: WRITE: bw=184MiB/s (193MB/s), 184MiB/s-184MiB/s (193MB/s-193MB/s), io=32.0GiB (34.4GB), run=177864-177864msec (+146.3% throughput, -59.5% run time) ***************************************************** **** 32 jobs, 2GiB files, fsync frequency 1 **** ***************************************************** Before patchset: WRITE: bw=72.6MiB/s (76.1MB/s), 72.6MiB/s-72.6MiB/s (76.1MB/s-76.1MB/s), io=64.0GiB (68.7GB), run=902615-902615msec After patchset: WRITE: bw=227MiB/s (238MB/s), 227MiB/s-227MiB/s (238MB/s-238MB/s), io=64.0GiB (68.7GB), run=288936-288936msec (+212.7% throughput, -68.0% run time) ***************************************************** **** 64 jobs, 1GiB files, fsync frequency 1 **** ***************************************************** Before patchset: WRITE: bw=98.8MiB/s (104MB/s), 98.8MiB/s-98.8MiB/s (104MB/s-104MB/s), io=64.0GiB (68.7GB), run=663126-663126msec After patchset: WRITE: bw=294MiB/s (308MB/s), 294MiB/s-294MiB/s (308MB/s-308MB/s), io=64.0GiB (68.7GB), run=222940-222940msec (+197.6% throughput, -66.4% run time) Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: release old extent maps during page releaseFilipe Manana
When removing an extent map at try_release_extent_mapping(), called through the page release callback (btrfs_releasepage()), we never release an extent map that is in the list of modified extents. This is to prevent races with a concurrent fsync using the fast path, which could lead to not logging an extent created in the current transaction. However we can safely remove an extent map created in a past transaction that is still in the list of modified extents (because no one fsynced yet the inode after that transaction got commited), because such extents are skipped during an fsync as it is pointless to log them. This change does that. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: fix race between page release and a fast fsyncFilipe Manana
When releasing an extent map, done through the page release callback, we can race with an ongoing fast fsync and cause the fsync to miss a new extent and not log it. The steps for this to happen are the following: 1) A page is dirtied for some inode I; 2) Writeback for that page is triggered by a path other than fsync, for example by the system due to memory pressure; 3) When the ordered extent for the extent (a single 4K page) finishes, we unpin the corresponding extent map and set its generation to N, the current transaction's generation; 4) The btrfs_releasepage() callback is invoked by the system due to memory pressure for that no longer dirty page of inode I; 5) At the same time, some task calls fsync on inode I, joins transaction N, and at btrfs_log_inode() it sees that the inode does not have the full sync flag set, so we proceed with a fast fsync. But before we get into btrfs_log_changed_extents() and lock the inode's extent map tree: 6) Through btrfs_releasepage() we end up at try_release_extent_mapping() and we remove the extent map for the new 4Kb extent, because it is neither pinned anymore nor locked. By calling remove_extent_mapping(), we remove the extent map from the list of modified extents, since the extent map does not have the logging flag set. We unlock the inode's extent map tree; 7) The task doing the fast fsync now enters btrfs_log_changed_extents(), locks the inode's extent map tree and iterates its list of modified extents, which no longer has the 4Kb extent in it, so it does not log the extent; 8) The fsync finishes; 9) Before transaction N is committed, a power failure happens. After replaying the log, the 4K extent of inode I will be missing, since it was not logged due to the race with try_release_extent_mapping(). So fix this by teaching try_release_extent_mapping() to not remove an extent map if it's still in the list of modified extents. Fixes: ff44c6e36dc9dc ("Btrfs: do not hold the write_lock on the extent tree while logging") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: open-code remount flag setting in btrfs_remountJohannes Thumshirn
When we're (re)mounting a btrfs filesystem we set the BTRFS_FS_STATE_REMOUNTING state in fs_info to serialize against async reclaim or defrags. This flag is set in btrfs_remount_prepare() called by btrfs_remount(). As btrfs_remount_prepare() does nothing but setting this flag and doesn't have a second caller, we can just open-code the flag setting in btrfs_remount(). Similarly do for so clearing of the flag by moving it out of btrfs_remount_cleanup() into btrfs_remount() to be symmetrical. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: if we're restriping, use the target restripe profileJosef Bacik
Previously we depended on some weird behavior in our chunk allocator to force the allocation of new stripes, so by the time we got to doing the reduce we would usually already have a chunk with the proper target. However that behavior causes other problems and needs to be removed. First however we need to remove this check to only restripe if we already have those available profiles, because if we're allocating our first chunk it obviously will not be available. Simply use the target as specified, and if that fails it'll be because we're out of space. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: don't adjust bg flags and use default allocation profilesJosef Bacik
btrfs/061 has been failing consistently for me recently with a transaction abort. We run out of space in the system chunk array, which means we've allocated way too many system chunks than we need. Chris added this a long time ago for balance as a poor mans restriping. If you had a single disk and then added another disk and then did a balance, update_block_group_flags would then figure out which RAID level you needed. Fast forward to today and we have restriping behavior, so we can explicitly tell the fs that we're trying to change the raid level. This is accomplished through the normal get_alloc_profile path. Furthermore this code actually causes btrfs/061 to fail, because we do things like mkfs -m dup -d single with multiple devices. This trips this check alloc_flags = update_block_group_flags(fs_info, cache->flags); if (alloc_flags != cache->flags) { ret = btrfs_chunk_alloc(trans, alloc_flags, CHUNK_ALLOC_FORCE); in btrfs_inc_block_group_ro. Because we're balancing and scrubbing, but not actually restriping, we keep forcing chunk allocation of RAID1 chunks. This eventually causes us to run out of system space and the file system aborts and flips read only. We don't need this poor mans restriping any more, simply use the normal get_alloc_profile helper, which will get the correct alloc_flags and thus make the right decision for chunk allocation. This keeps us from allocating a billion system chunks and falling over. Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: fix lockdep splat from btrfs_dump_space_infoJosef Bacik
When running with -o enospc_debug you can get the following splat if one of the dump_space_info's trip ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ #20 Tainted: G OE ------------------------------------------------------ dd/563090 is trying to acquire lock: ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs] but task is already holding lock: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&cache->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs] find_free_extent+0x7ef/0x13b0 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x340 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x530 [btrfs] btrfs_cow_block+0x106/0x210 [btrfs] commit_cowonly_roots+0x55/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #2 (&space_info->lock){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs] btrfs_inode_rsv_release+0x4f/0x170 [btrfs] btrfs_clear_delalloc_extent+0x155/0x480 [btrfs] clear_state_bit+0x81/0x1a0 [btrfs] __clear_extent_bit+0x25c/0x5d0 [btrfs] clear_extent_bit+0x15/0x20 [btrfs] btrfs_invalidatepage+0x2b7/0x3c0 [btrfs] truncate_cleanup_page+0x47/0xe0 truncate_inode_pages_range+0x238/0x840 truncate_pagecache+0x44/0x60 btrfs_setattr+0x202/0x5e0 [btrfs] notify_change+0x33b/0x490 do_truncate+0x76/0xd0 path_openat+0x687/0xa10 do_filp_open+0x91/0x100 do_sys_openat2+0x215/0x2d0 do_sys_open+0x44/0x80 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&tree->lock#2){+.+.}-{2:2}: _raw_spin_lock+0x25/0x30 find_first_extent_bit+0x32/0x150 [btrfs] write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs] __btrfs_write_out_cache+0x172/0x480 [btrfs] btrfs_write_out_cache+0x7a/0xf0 [btrfs] btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs] commit_cowonly_roots+0x245/0x300 [btrfs] btrfs_commit_transaction+0x4ed/0xac0 [btrfs] close_ctree+0xf9/0x2f5 [btrfs] generic_shutdown_super+0x6c/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x36/0x70 cleanup_mnt+0x104/0x160 task_work_run+0x5f/0x90 __prepare_exit_to_usermode+0x1bd/0x1c0 do_syscall_64+0x5e/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&ctl->tree_lock){+.+.}-{2:2}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 _raw_spin_lock+0x25/0x30 btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &ctl->tree_lock --> &space_info->lock --> &cache->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&cache->lock); lock(&space_info->lock); lock(&cache->lock); lock(&ctl->tree_lock); *** DEADLOCK *** 6 locks held by dd/563090: #0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200 #1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs] #2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs] #3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs] #4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs] #5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs] stack backtrace: CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ #20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? wake_up_klogd.part.0+0x30/0x40 lock_acquire+0xab/0x360 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] _raw_spin_lock+0x25/0x30 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_free_space+0x2b/0xa0 [btrfs] btrfs_dump_space_info+0xf4/0x120 [btrfs] btrfs_reserve_extent+0x176/0x180 [btrfs] __btrfs_prealloc_file_range+0x145/0x550 [btrfs] ? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs] cache_save_setup+0x28d/0x3b0 [btrfs] btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs] btrfs_commit_transaction+0xcc/0xac0 [btrfs] ? start_transaction+0xe0/0x5b0 [btrfs] btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs] ? ktime_get_coarse_real_ts64+0xa8/0xd0 ? trace_hardirqs_on+0x1c/0xe0 btrfs_file_write_iter+0x3cf/0x610 [btrfs] new_sync_write+0x11e/0x1b0 vfs_write+0x1c9/0x200 ksys_write+0x68/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because we're holding the block_group->lock while trying to dump the free space cache. However we don't need this lock, we just need it to read the values for the printk, so move the free space cache dumping outside of the block group lock. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27btrfs: move the chunk_mutex in btrfs_read_chunk_treeJosef Bacik
We are currently getting this lockdep splat in btrfs/161: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc5+ #20 Tainted: G E ------------------------------------------------------ mount/678048 is trying to acquire lock: ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs] but task is already holding lock: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __mutex_lock+0x8b/0x8f0 btrfs_init_new_device+0x2d2/0x1240 [btrfs] btrfs_ioctl+0x1de/0x2d20 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __lock_acquire+0x1240/0x2460 lock_acquire+0xab/0x360 __mutex_lock+0x8b/0x8f0 clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] btrfs_mount_root.cold+0x13/0xfa [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); lock(&fs_info->chunk_mutex); lock(&fs_devs->device_list_mutex); *** DEADLOCK *** 3 locks held by mount/678048: #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380 #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs] #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs] stack backtrace: CPU: 2 PID: 678048 Comm: mount Tainted: G E 5.8.0-rc5+ #20 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011 Call Trace: dump_stack+0x96/0xd0 check_noncircular+0x162/0x180 __lock_acquire+0x1240/0x2460 ? asm_sysvec_apic_timer_interrupt+0x12/0x20 lock_acquire+0xab/0x360 ? clone_fs_devices+0x4d/0x170 [btrfs] __mutex_lock+0x8b/0x8f0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? rcu_read_lock_sched_held+0x52/0x60 ? cpumask_next+0x16/0x20 ? module_assert_mutex_or_preempt+0x14/0x40 ? __module_address+0x28/0xf0 ? clone_fs_devices+0x4d/0x170 [btrfs] ? static_obj+0x4f/0x60 ? lockdep_init_map_waits+0x43/0x200 ? clone_fs_devices+0x4d/0x170 [btrfs] clone_fs_devices+0x4d/0x170 [btrfs] btrfs_read_chunk_tree+0x330/0x800 [btrfs] open_ctree+0xb7c/0x18ce [btrfs] ? super_setup_bdi_name+0x79/0xd0 btrfs_mount_root.cold+0x13/0xfa [btrfs] ? vfs_parse_fs_string+0x84/0xb0 ? rcu_read_lock_sched_held+0x52/0x60 ? kfree+0x2b5/0x310 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 fc_mount+0xe/0x40 vfs_kern_mount.part.0+0x71/0x90 btrfs_mount+0x13b/0x3e0 [btrfs] ? cred_has_capability+0x7c/0x120 ? rcu_read_lock_sched_held+0x52/0x60 ? legacy_get_tree+0x30/0x50 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 do_mount+0x7de/0xb30 ? memdup_user+0x4e/0x90 __x64_sys_mount+0x8e/0xd0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is because btrfs_read_chunk_tree() can come upon DEV_EXTENT's and then read the device, which takes the device_list_mutex. The device_list_mutex needs to be taken before the chunk_mutex, so this is a problem. We only really need the chunk mutex around adding the chunk, so move the mutex around read_one_chunk. An argument could be made that we don't even need the chunk_mutex here as it's during mount, and we are protected by various other locks. However we already have special rules for ->device_list_mutex, and I'd rather not have another special case for ->chunk_mutex. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>