summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-04-19perf/x86: Hybrid PMU support for intel_ctrlKan Liang
The intel_ctrl is the counter mask of a PMU. The PMU counter information may be different among hybrid PMUs, each hybrid PMU should use its own intel_ctrl to check and access the counters. When handling a certain hybrid PMU, apply the intel_ctrl from the corresponding hybrid PMU. When checking the HW existence, apply the PMU and number of counters from the corresponding hybrid PMU as well. Perf will check the HW existence for each Hybrid PMU before registration. Expose the check_hw_exists() for a later patch. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/1618237865-33448-6-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86/intel: Hybrid PMU support for perf capabilitiesKan Liang
Some platforms, e.g. Alder Lake, have hybrid architecture. Although most PMU capabilities are the same, there are still some unique PMU capabilities for different hybrid PMUs. Perf should register a dedicated pmu for each hybrid PMU. Add a new struct x86_hybrid_pmu, which saves the dedicated pmu and capabilities for each hybrid PMU. The architecture MSR, MSR_IA32_PERF_CAPABILITIES, only indicates the architecture features which are available on all hybrid PMUs. The architecture features are stored in the global x86_pmu.intel_cap. For Alder Lake, the model-specific features are perf metrics and PEBS-via-PT. The corresponding bits of the global x86_pmu.intel_cap should be 0 for these two features. Perf should not use the global intel_cap to check the features on a hybrid system. Add a dedicated intel_cap in the x86_hybrid_pmu to store the model-specific capabilities. Use the dedicated intel_cap to replace the global intel_cap for thse two features. The dedicated intel_cap will be set in the following "Add Alder Lake Hybrid support" patch. Add is_hybrid() to distinguish a hybrid system. ADL may have an alternative configuration. With that configuration, the X86_FEATURE_HYBRID_CPU is not set. Perf cannot rely on the feature bit. Add a new static_key_false, perf_is_hybrid, to indicate a hybrid system. It will be assigned in the following "Add Alder Lake Hybrid support" patch as well. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-5-git-send-email-kan.liang@linux.intel.com
2021-04-19perf/x86: Track pmu in per-CPU cpu_hw_eventsKan Liang
Some platforms, e.g. Alder Lake, have hybrid architecture. In the same package, there may be more than one type of CPU. The PMU capabilities are different among different types of CPU. Perf will register a dedicated PMU for each type of CPU. Add a 'pmu' variable in the struct cpu_hw_events to track the dedicated PMU of the current CPU. Current x86_get_pmu() use the global 'pmu', which will be broken on a hybrid platform. Modify it to apply the 'pmu' of the specific CPU. Initialize the per-CPU 'pmu' variable with the global 'pmu'. There is nothing changed for the non-hybrid platforms. The is_x86_event() will be updated in the later patch ("perf/x86: Register hybrid PMUs") for hybrid platforms. For the non-hybrid platforms, nothing is changed here. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1618237865-33448-4-git-send-email-kan.liang@linux.intel.com
2021-04-19x86/cpu: Add helper function to get the type of the current hybrid CPURicardo Neri
On processors with Intel Hybrid Technology (i.e., one having more than one type of CPU in the same package), all CPUs support the same instruction set and enumerate the same features on CPUID. Thus, all software can run on any CPU without restrictions. However, there may be model-specific differences among types of CPUs. For instance, each type of CPU may support a different number of performance counters. Also, machine check error banks may be wired differently. Even though most software will not care about these differences, kernel subsystems dealing with these differences must know. Add and expose a new helper function get_this_hybrid_cpu_type() to query the type of the current hybrid CPU. The function will be used later in the perf subsystem. The Intel Software Developer's Manual defines the CPU type as 8-bit identifier. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1618237865-33448-3-git-send-email-kan.liang@linux.intel.com
2021-04-19x86/cpufeatures: Enumerate Intel Hybrid Technology feature bitRicardo Neri
Add feature enumeration to identify a processor with Intel Hybrid Technology: one in which CPUs of more than one type are the same package. On a hybrid processor, all CPUs support the same homogeneous (i.e., symmetric) instruction set. All CPUs enumerate the same features in CPUID. Thus, software (user space and kernel) can run and migrate to any CPU in the system as well as utilize any of the enumerated features without any change or special provisions. The main difference among CPUs in a hybrid processor are power and performance properties. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1618237865-33448-2-git-send-email-kan.liang@linux.intel.com
2021-04-19preempt/dynamic: Fix typo in macro conditional statementZhouyi Zhou
Commit 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call") tried to provide irqentry_exit_cond_resched() static call in irqentry_exit, but has a typo in macro conditional statement. Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call") Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210410073523.5493-1-zhouzhouyi@gmail.com
2021-04-19x86/crypto: Enable objtool in crypto codeJosh Poimboeuf
Now that all the stack alignment prologues have been cleaned up in the crypto code, enable objtool. Among other benefits, this will allow ORC unwinding to work. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/fc2a1918c50e33e46ef0e9a5de02743f2f6e3639.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha512-ssse3: Standardize stack alignment prologueJosh Poimboeuf
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/6ecaaac9f3828fbb903513bf90c34a08380a8e35.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha512-avx2: Standardize stack alignment prologueJosh Poimboeuf
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/b1a7b29fcfc65d60a3b6e77ef75f4762a5b8488d.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha512-avx: Standardize stack alignment prologueJosh Poimboeuf
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/d36e9ea1c819d87fa89b3df3fa83e2a1ede18146.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha256-avx2: Standardize stack alignment prologueJosh Poimboeuf
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/8048e7444c49a8137f05265262b83dc50f8fb7f3.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha1_avx2: Standardize stack alignment prologueJosh Poimboeuf
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/fdaaf8670ed1f52f55ba9a6bbac98c1afddc1af6.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/sha_ni: Standardize stack alignment prologueJosh Poimboeuf
Use a more standard prologue for saving the stack pointer before realigning the stack. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/5033e1a79867dff1b18e1b4d0783c38897d3f223.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/crc32c-pcl-intel: Standardize jump tableJosh Poimboeuf
Simplify the jump table code so that it resembles a compiler-generated table. This enables ORC unwinding by allowing objtool to follow all the potential code paths. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/5357a039def90b8ef6b5874ef12cda008ecf18ba.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/camellia-aesni-avx2: Unconditionally allocate stack bufferJosh Poimboeuf
A conditional stack allocation violates traditional unwinding requirements when a single instruction can have differing stack layouts. There's no benefit in allocating the stack buffer conditionally. Just do it unconditionally. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/85ac96613ee5784b6239c18d3f68b1f3c509caa3.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/aesni-intel_avx: Standardize stack alignment prologueJosh Poimboeuf
Use RBP instead of R14 for saving the old stack pointer before realignment. This resembles what compilers normally do. This enables ORC unwinding by allowing objtool to understand the stack realignment. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/02d00a0903a0959f4787e186e2a07d271e1f63d4.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/aesni-intel_avx: Fix register usage commentsJosh Poimboeuf
Fix register usage comments to match reality. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/8655d4513a0ed1eddec609165064153973010aa2.1614182415.git.jpoimboe@redhat.com
2021-04-19x86/crypto/aesni-intel_avx: Remove unused macrosJosh Poimboeuf
These macros are no longer used; remove them. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/53f7136ea93ebdbca399959e6d2991ecb46e733e.1614182415.git.jpoimboe@redhat.com
2021-04-19objtool: Support asm jump tablesJosh Poimboeuf
Objtool detection of asm jump tables would normally just work, except for the fact that asm retpolines use alternatives. Objtool thinks the alternative code path (a jump to the retpoline) is a sibling call. Don't treat alternative indirect branches as sibling calls when the original instruction has a jump table. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/460cf4dc675d64e1124146562cabd2c05aa322e8.1614182415.git.jpoimboe@redhat.com
2021-04-19io_uring: fix shared sqpoll cancellation hangsPavel Begunkov
[ 736.982891] INFO: task iou-sqp-4294:4295 blocked for more than 122 seconds. [ 736.982897] Call Trace: [ 736.982901] schedule+0x68/0xe0 [ 736.982903] io_uring_cancel_sqpoll+0xdb/0x110 [ 736.982908] io_sqpoll_cancel_cb+0x24/0x30 [ 736.982911] io_run_task_work_head+0x28/0x50 [ 736.982913] io_sq_thread+0x4e3/0x720 We call io_uring_cancel_sqpoll() one by one for each ctx either in sq_thread() itself or via task works, and it's intended to cancel all requests of a specified context. However the function uses per-task counters to track the number of inflight requests, so it counts more requests than available via currect io_uring ctx and goes to sleep for them to appear (e.g. from IRQ), that will never happen. Cancel a bit more than before, i.e. all ctxs that share sqpoll and continue to use shared counters. Don't forget that we should not remove ctx from the list before running that task_work sqpoll-cancel, otherwise the function wouldn't be able to find the context and will hang. Reported-by: Joakim Hassila <joj@mac.com> Reported-by: Jens Axboe <axboe@kernel.dk> Fixes: 37d1e2e3642e2 ("io_uring: move SQPOLL thread io-wq forked worker") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1bded7e6c6b32e0bae25fce36be2868e46b116a0.1618752958.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-19io_uring: remove extra sqpoll submission haltingPavel Begunkov
SQPOLL task won't submit requests for a context that is currently dying, so no need to remove ctx from sqd_list prior the main loop of io_ring_exit_work(). Kill it, will be removed by io_sq_thread_finish() and only brings confusion and lockups. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f220c2b786ba0f9499bebc9f3cd9714d29efb6a5.1618752958.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-19Revert "mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND"Fabio Estevam
This reverts commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51. Since commit bd9c9fe2ad04 ("mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND") the bad block table cannot be found on a imx27-phytec-phycard-s-rdk board: Bad block table not found for chip 0 Bad block table not found for chip 0 Revert it for now, until a better solution can be found. Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210419140350.809853-1-festevam@gmail.com
2021-04-19Merge tag 'qcom-defconfig-for-5.13' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/defconfig Qualcomm ARM defconfig updates for 5.13 This enables all the hardware support currently available for the Qualcomm SDX55 platform in the qcom_defconfig. Due to (current) size limitations these changes are not done in the multi-platform config. * tag 'qcom-defconfig-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: ARM: configs: qcom_defconfig: Reduce CMA size to 64MB ARM: configs: qcom_defconfig: Enable GLINK SMEM driver ARM: configs: qcom_defconfig: Enable SDX55 interconnect driver ARM: configs: qcom_defconfig: Enable Q6V5_PAS remoteproc driver ARM: configs: qcom_defconfig: Enable CPUFreq support ARM: configs: qcom_defconfig: Enable SDX55 A7 PLL and APCS clock driver ARM: configs: qcom_defconfig: Enable APCS IPC mailbox driver Link: https://lore.kernel.org/r/20210419152143.861934-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19Merge tag 'qcom-arm64-for-5.13-3' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/dt Even more Qualcomm ARM64 updates for v5.13 This contains three audio related fixes for the sc7180 Trogdor devices. * tag 'qcom-arm64-for-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: sc7180: Update iommu property for simultaneous playback arm64: dts: qcom: sc7180: pompom: Add "dmic_clk_en" + sound model arm64: dts: qcom: sc7180: coachz: Add "dmic_clk_en" Link: https://lore.kernel.org/r/20210419151637.861409-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-04-19btrfs: zoned: fail mount if the device does not support zone appendJohannes Thumshirn
For zoned btrfs, zone append is mandatory to write to a sequential write only zone, otherwise parallel writes to the same zone could result in unaligned write errors. If a zoned block device does not support zone append (e.g. a dm-crypt zoned device using a non-NULL IV cypher), fail to mount. CC: stable@vger.kernel.org # 5.12 Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: fix race between transaction aborts and fsyncs leading to use-after-freeFilipe Manana
There is a race between a task aborting a transaction during a commit, a task doing an fsync and the transaction kthread, which leads to an use-after-free of the log root tree. When this happens, it results in a stack trace like the following: BTRFS info (device dm-0): forced readonly BTRFS warning (device dm-0): Skipping commit of aborted transaction. BTRFS: error (device dm-0) in cleanup_transaction:1958: errno=-5 IO failure BTRFS warning (device dm-0): lost page write due to IO error on /dev/mapper/error-test (-5) BTRFS warning (device dm-0): Skipping commit of aborted transaction. BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0xa4e8 len 4096 err no 10 BTRFS error (device dm-0): error writing primary super block to device 1 BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e000 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e008 len 4096 err no 10 BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e010 len 4096 err no 10 BTRFS: error (device dm-0) in write_all_supers:4110: errno=-5 IO failure (1 errors while writing supers) BTRFS: error (device dm-0) in btrfs_sync_log:3308: errno=-5 IO failure general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b68: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 2458471 Comm: fsstress Not tainted 5.12.0-rc5-btrfs-next-84 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x139/0xa40 Code: c0 74 19 (...) RSP: 0018:ffff9f18830d7b00 EFLAGS: 00010202 RAX: 6b6b6b6b6b6b6b68 RBX: 0000000000000001 RCX: 0000000000000002 RDX: ffffffffb9c54d13 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff9f18830d7bc0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff9f18830d7be0 R11: 0000000000000001 R12: ffff8c6cd199c040 R13: ffff8c6c95821358 R14: 00000000fffffffb R15: ffff8c6cbcf01358 FS: 00007fa9140c2b80(0000) GS:ffff8c6fac600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa913d52000 CR3: 000000013d2b4003 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? __btrfs_handle_fs_error+0xde/0x146 [btrfs] ? btrfs_sync_log+0x7c1/0xf20 [btrfs] ? btrfs_sync_log+0x7c1/0xf20 [btrfs] btrfs_sync_log+0x7c1/0xf20 [btrfs] btrfs_sync_file+0x40c/0x580 [btrfs] do_fsync+0x38/0x70 __x64_sys_fsync+0x10/0x20 do_syscall_64+0x33/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fa9142a55c3 Code: 8b 15 09 (...) RSP: 002b:00007fff26278d48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 0000563c83cb4560 RCX: 00007fa9142a55c3 RDX: 00007fff26278cb0 RSI: 00007fff26278cb0 RDI: 0000000000000005 RBP: 0000000000000005 R08: 0000000000000001 R09: 00007fff26278d5c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000340 R13: 00007fff26278de0 R14: 00007fff26278d96 R15: 0000563c83ca57c0 Modules linked in: btrfs dm_zero dm_snapshot dm_thin_pool (...) ---[ end trace ee2f1b19327d791d ]--- The steps that lead to this crash are the following: 1) We are at transaction N; 2) We have two tasks with a transaction handle attached to transaction N. Task A and Task B. Task B is doing an fsync; 3) Task B is at btrfs_sync_log(), and has saved fs_info->log_root_tree into a local variable named 'log_root_tree' at the top of btrfs_sync_log(). Task B is about to call write_all_supers(), but before that... 4) Task A calls btrfs_commit_transaction(), and after it sets the transaction state to TRANS_STATE_COMMIT_START, an error happens before it waits for the transaction's 'num_writers' counter to reach a value of 1 (no one else attached to the transaction), so it jumps to the label "cleanup_transaction"; 5) Task A then calls cleanup_transaction(), where it aborts the transaction, setting BTRFS_FS_STATE_TRANS_ABORTED on fs_info->fs_state, setting the ->aborted field of the transaction and the handle to an errno value and also setting BTRFS_FS_STATE_ERROR on fs_info->fs_state. After that, at cleanup_transaction(), it deletes the transaction from the list of transactions (fs_info->trans_list), sets the transaction to the state TRANS_STATE_COMMIT_DOING and then waits for the number of writers to go down to 1, as it's currently 2 (1 for task A and 1 for task B); 6) The transaction kthread is running and sees that BTRFS_FS_STATE_ERROR is set in fs_info->fs_state, so it calls btrfs_cleanup_transaction(). There it sees the list fs_info->trans_list is empty, and then proceeds into calling btrfs_drop_all_logs(), which frees the log root tree with a call to btrfs_free_log_root_tree(); 7) Task B calls write_all_supers() and, shortly after, under the label 'out_wake_log_root', it deferences the pointer stored in 'log_root_tree', which was already freed in the previous step by the transaction kthread. This results in a use-after-free leading to a crash. Fix this by deleting the transaction from the list of transactions at cleanup_transaction() only after setting the transaction state to TRANS_STATE_COMMIT_DOING and waiting for all existing tasks that are attached to the transaction to release their transaction handles. This makes the transaction kthread wait for all the tasks attached to the transaction to be done with the transaction before dropping the log roots and doing other cleanups. Fixes: ef67963dac255b ("btrfs: drop logs when we've aborted a transaction") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: introduce submit_eb_subpage() to submit a subpage metadata pageQu Wenruo
The new function, submit_eb_subpage(), will submit all the dirty extent buffers in the page. The major difference between submit_eb_page() and submit_eb_subpage() is: - How to grab extent buffer Now we use find_extent_buffer_nospinlock() other than using page::private. All other different handling is already done in functions like lock_extent_buffer_for_io() and write_one_eb(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: make lock_extent_buffer_for_io() to be subpage compatibleQu Wenruo
For subpage metadata, we don't use page locking at all. So just skip the page locking part for subpage. The rest of the function can be reused. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: introduce write_one_subpage_eb() functionQu Wenruo
The new function, write_one_subpage_eb(), as a subroutine for subpage metadata write, will handle the extent buffer bio submission. The major differences between the new write_one_subpage_eb() and write_one_eb() is: - No page locking When entering write_one_subpage_eb() the page is no longer locked. We only lock the page for its status update, and unlock immediately. Now we completely rely on extent io tree locking. - Extra bitmap update along with page status update Now page dirty and writeback is controlled by btrfs_subpage::dirty_bitmap and btrfs_subpage::writeback_bitmap. They both follow the schema that any sector is dirty/writeback, then the full page gets dirty/writeback. - When to update the nr_written number Now we take a shortcut, if we have cleared the last dirty bit of the page, we update nr_written. This is not completely perfect, but should emulate the old behavior well enough. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: introduce end_bio_subpage_eb_writepage() functionQu Wenruo
The new function, end_bio_subpage_eb_writepage(), will handle the metadata writeback endio. The major differences involved are: - How to grab extent buffer Now page::private is a pointer to btrfs_subpage, we can no longer grab extent buffer directly. Thus we need to use the bv_offset to locate the extent buffer manually and iterate through the whole range. - Use btrfs_subpage_end_writeback() caller This helper will handle the subpage writeback for us. Since this function is executed under endio context, when grabbing extent buffers it can't grab eb->refs_lock as that lock is not designed to be grabbed under hardirq context. So here introduce a helper, find_extent_buffer_nolock(), for such situation, and convert find_extent_buffer() to use that helper. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: check return value of btrfs_commit_transaction in relocationJosef Bacik
There are a few places where we don't check the return value of btrfs_commit_transaction in relocation.c. Thankfully all these places have straightforward error handling, so simply change all of the sites at once. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: do proper error handling in merge_reloc_rootsJosef Bacik
We have a BUG_ON() if we get an error back from btrfs_get_fs_root(). This honestly should never fail, as at this point we have a solid coordination of fs root to reloc root, and these roots will all be in memory. But in the name of killing BUG_ON()'s remove these and handle the error condition properly, ASSERT()'ing for developers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle extent corruption with select_one_root properlyJosef Bacik
In corruption cases we could have paths from a block up to no root at all, and thus we'll BUG_ON(!root) in select_one_root. Handle this by adding an ASSERT() for developers, and returning an error for normal users. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: cleanup error handling in prepare_to_mergeJosef Bacik
This probably can't happen even with a corrupt file system, because we would have failed much earlier on than here. However there's no reason we can't just check and bail out as appropriate, so do that and convert the correctness BUG_ON() to an ASSERT(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: do not panic in __add_reloc_rootJosef Bacik
If we have a duplicate entry for a reloc root then we could have fs corruption that resulted in a double allocation. Since this shouldn't happen unless there is corruption, add an ASSERT(ret != -EEXIST) to all of the callers of __add_reloc_root() to catch any logic mistakes for developers, otherwise normal error handling will happen for normal users. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle __add_reloc_root failures in btrfs_recover_relocationJosef Bacik
We can already handle errors appropriately from this function, deal with an error coming from __add_reloc_root appropriately. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment ] Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: do proper error handling in create_reloc_inodeJosef Bacik
We already handle some errors in this function, and the callers do the correct error handling, so clean up the rest of the function to do the appropriate error handling. There's a little extra work that needs to be done here, as we create the inode item before we create the orphan item. We could potentially add the orphan item, but if we failed to create the inode item we would have to abort the transaction. Instead add a helper to delete the inode item we created in the case that we're unable to look up the inode (this would likely be caused by an ENOMEM), which if it succeeds means we can avoid a transaction abort in this particular error case. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: remove the extent item sanity checks in relocate_block_groupJosef Bacik
These checks are all taken care of for us by the tree checker code: - the flags don't change or are updated consistently - the v0 extent item format is invalid and caught in many other places too Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: tree-checker: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set ↵Josef Bacik
improperly We need to validate that a data extent item does not have the FULL_BACKREF flag set on its flags. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle extent reference errors in do_relocationJosef Bacik
We can already deal with errors appropriately from do_relocation, simply handle any errors that come from changing the refs at this point cleanly. We have to abort the transaction if we fail here as we've modified metadata at this point. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle errors in reference count manipulation in replace_pathJosef Bacik
If any of the reference count manipulation stuff fails in replace_path we need to abort the transaction, as we've modified the blocks already. We can simply break at this point and everything will be cleaned up. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle btrfs_search_slot failure in replace_pathJosef Bacik
The search can fail for various reasons, in case of errors there's no cleanup to be done so we can pass the error to the caller, adjusting for the case where the key is not found and search slot returns 1. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle btrfs_cow_block errors in replace_pathJosef Bacik
If we error out COWing the root node when doing a replace_path then we simply unlock and free the buffer and return the error. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'sJosef Bacik
A few BUG_ON()'s in replace_path are purely to keep us from making logical mistakes, so replace them with ASSERT()'s. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: do proper error handling in btrfs_update_reloc_rootJosef Bacik
We call btrfs_update_root in btrfs_update_reloc_root, which can fail for all sorts of reasons, including IO errors. Instead of panicing the box lets return the error, now that all callers properly handle those errors. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle btrfs_update_reloc_root failure in prepare_to_mergeJosef Bacik
btrfs_update_reloc_root will will return errors in the future, so handle an error properly in prepare_to_merge. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvolJosef Bacik
btrfs_update_reloc_root will will return errors in the future, so handle the error properly in insert_dirty_subvol. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: change insert_dirty_subvol to return errorsJosef Bacik
This will be able to return errors in the future, so change it to return an error and handle the errors appropriately. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: handle btrfs_update_reloc_root failure in commit_fs_rootsJosef Bacik
btrfs_update_reloc_root will will return errors in the future, so handle the error properly in commit_fs_roots. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19btrfs: validate root::reloc_root after recording root in transJosef Bacik
If we fail to setup a root->reloc_root in a different thread that path will error out, however it still leaves root->reloc_root NULL but would still appear set up in the transaction. Subsequent calls to btrfs_record_root_in_transaction would succeed without attempting to create the reloc root, as the transid has already been updated. Handle this case by making sure we have a root->reloc_root set after a btrfs_record_root_in_transaction call so we don't end up dereferencing a NULL pointer. Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>