summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-02staging: greybus: gpio: Use generic_handle_irq_safe().Sebastian Andrzej Siewior
Instead of manually disabling interrupts before invoking use generic_handle_irq_safe() which can be invoked with enabled and disabled interrupts. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Johan Hovold <johan@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: Rui Miguel Silva <rmfrfs@gmail.com> Link: https://lore.kernel.org/r/20220211181500.1856198-8-bigeasy@linutronix.de
2022-03-02net: usb: lan78xx: Use generic_handle_irq_safe().Sebastian Andrzej Siewior
Instead of manually disabling interrupts before invoking use generic_handle_irq_safe() which can be invoked with enabled and disabled interrupts. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: UNGLinuxDriver@microchip.com Link: https://lore.kernel.org/r/20220211181500.1856198-7-bigeasy@linutronix.de
2022-03-02mfd: ezx-pcap: Use generic_handle_irq_safe().Sebastian Andrzej Siewior
Instead of manually disabling interrupts before invoking use generic_handle_irq_safe() which can be invoked with enabled and disabled interrupts. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20220211181500.1856198-6-bigeasy@linutronix.de
2022-03-02misc: hi6421-spmi-pmic: Use generic_handle_irq_safe().Sebastian Andrzej Siewior
generic_handle_irq() is invoked from a regular interrupt service routine. This handler will become a forced-threaded handler on PREEMPT_RT and will be invoked with enabled interrupts. The generic_handle_irq() must be invoked with disabled interrupts in order to avoid deadlocks. Instead of manually disabling interrupts before invoking use generic_handle_irq_safe() which can be invoked with enabled and disabled interrupts. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220211181500.1856198-5-bigeasy@linutronix.de
2022-03-02Merge tag 'erofs-for-5.17-rc7-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: "A one-line patch to fix the new ztailpacking feature on > 4GiB filesystems because z_idataoff can get trimmed improperly. ztailpacking is still a brand new EXPERIMENTAL feature, but it'd be better to fix the issue as soon as possible to avoid unnecessary backporting. Summary: - Fix ztailpacking z_idataoff getting trimmed on > 4GiB filesystems" * tag 'erofs-for-5.17-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix ztailpacking on > 4GiB filesystems
2022-03-02Merge tag 'ntb-5.17-bugfixes' of git://github.com/jonmason/ntbLinus Torvalds
Pull NTB fixes from Jon Mason: "Bug fixes for sparse warning, intel port config offset, and a new mailing list" * tag 'ntb-5.17-bugfixes' of git://github.com/jonmason/ntb: MAINTAINERS: update mailing list address for NTB subsystem ntb: intel: fix port config status offset for SPR NTB/msi: Use struct_size() helper in devm_kzalloc()
2022-03-02ACPI: bus: Introduce acpi_bus_for_each_dev()Rafael J. Wysocki
In order to avoid exposing acpi_bus_type to modules, introduce an acpi_bus_for_each_dev() helper for iterating over all ACPI device objects and make typec_link_ports() use it instead of the raw bus_for_each_dev() along with acpi_bus_type. Having done that, drop the acpi_bus_type export. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
2022-03-02dt-bindings: hwmon: Add sample averaging properties for ADM1275Potin Lai
Add documentation of new properties for sample averaging in PMON_CONFIG register. New properties: - adi,volt-curr-sample-average - adi,power-sample-average Signed-off-by: Potin Lai <potin.lai@quantatw.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220302123817.27025-3-potin.lai@quantatw.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-03-02hwmon: (adm1275) Allow setting sample averagingPotin Lai
Current driver assume PWR_AVG and VI_AVG as 1 by default, and user needs to set sample averaging via sysfs manually. This patch parses the properties "adi,power-sample-average" and "adi,volt-curr-sample-average" from device tree, and setting sample averaging during probe. Input value must be one of value in the list [1, 2, 4, 8, 16, 32, 64, 128]. Signed-off-by: Potin Lai <potin.lai@quantatw.com> Link: https://lore.kernel.org/r/20220302123817.27025-2-potin.lai@quantatw.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-03-02ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustmentsJonathan Lemon
In ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime."), the ns adjustment was written to the FPGA register, so the clock could accurately perform adjustments. However, the adjtime() call passes in a s64, while the clock adjustment registers use a s32. When trying to perform adjustments with a large value (37 sec), things fail. Examine the incoming delta, and if larger than 1 sec, use the original (coarse) adjustment method. If smaller than 1 sec, then allow the FPGA to fold in the changes over a 1 second window. Fixes: 6d59d4fa1789 ("ptp: ocp: Have FPGA fold in ns adjustment for adjtime.") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20220228203957.367371-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02hwmon: (xdpe12284) Add regulator supportMarcello Sylvester Bauer
Add simple on/off regulator support for xdpe12284 and other pmbus parts supported by the xdpe12284 driver. Signed-off-by: Marcello Sylvester Bauer <sylv@sylv.io> Link: https://lore.kernel.org/r/f69b8e7fa32cd2bad9516d8fa590abb87c7e4869.1646214248.git.sylv@sylv.io Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-03-02hwmon: (xdpe12284) Add support for xdpe11280Marcello Sylvester Bauer
Add support for another Infineon Multi-phase controller chip. The xdpe11280 uses linear instead of vid data format for VOUT. Detect VOUT_MODE format during identification and skip the xdpe122 related adaptions in case it is linear. Signed-off-by: Marcello Sylvester Bauer <sylv@sylv.io> Link: https://lore.kernel.org/r/fa6a4b636a05ecb337d132824efca2545188a2a2.1646214248.git.sylv@sylv.io Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-03-02dt-bindings: trivial-devices: Add xdpe11280Marcello Sylvester Bauer
Add Infineon Multi-phase Digital Controller XDPE11280. Signed-off-by: Marcello Sylvester Bauer <sylv@sylv.io> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/81c0523ba790965a371d267bb1944652d80f5f72.1646214248.git.sylv@sylv.io Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-03-02selinux: shorten the policy capability enum namesPaul Moore
The SELinux policy capability enum names are rather long and follow the "POLICYDB_CAPABILITY_XXX format". While the "POLICYDB_" prefix is helpful in tying the enums to other SELinux policy constants, macros, etc. there is no reason why we need to spell out "CAPABILITY" completely. Shorten "CAPABILITY" to "CAP" in order to make things a bit shorter and cleaner. Moving forward, the SELinux policy capability enum names should follow the "POLICYDB_CAP_XXX" format. Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-03-02KVM: x86: pull kvm->srcu read-side to kvm_arch_vcpu_ioctl_runPaolo Bonzini
kvm_arch_vcpu_ioctl_run is already doing srcu_read_lock/unlock in two places, namely vcpu_run and post_kvm_run_save, and a third is actually needed around the call to vcpu->arch.complete_userspace_io to avoid the following splat: WARNING: suspicious RCU usage arch/x86/kvm/pmu.c:190 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by CPU 28/KVM/370841: #0: ff11004089f280b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x87/0x730 [kvm] Call Trace: <TASK> dump_stack_lvl+0x59/0x73 reprogram_fixed_counter+0x15d/0x1a0 [kvm] kvm_pmu_trigger_event+0x1a3/0x260 [kvm] ? free_moved_vector+0x1b4/0x1e0 complete_fast_pio_in+0x8a/0xd0 [kvm] This splat is not at all unexpected, since complete_userspace_io callbacks can execute similar code to vmexits. For example, SVM with nrips=false will call into the emulator from svm_skip_emulated_instruction(). While it's tempting to never acquire kvm->srcu for an uninitialized vCPU, practically speaking there's no penalty to acquiring kvm->srcu "early" as the KVM_MP_STATE_UNINITIALIZED path is a one-time thing per vCPU. On the other hand, seemingly innocuous helpers like kvm_apic_accept_events() and sync_regs() can theoretically reach code that might access SRCU-protected data structures, e.g. sync_regs() can trigger forced existing of nested mode via kvm_vcpu_ioctl_x86_set_vcpu_events(). Reported-by: Like Xu <likexu@tencent.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-02KVM: x86/mmu: Passing up the error state of mmu_alloc_shadow_roots()Like Xu
Just like on the optional mmu_alloc_direct_roots() path, once shadow path reaches "r = -EIO" somewhere, the caller needs to know the actual state in order to enter error handling and avoid something worse. Fixes: 4a38162ee9f1 ("KVM: MMU: load PDPTRs outside mmu_lock") Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220301124941.48412-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-02btrfs: add missing run of delayed items after unlink during log replayFilipe Manana
During log replay, whenever we need to check if a name (dentry) exists in a directory we do searches on the subvolume tree for inode references or or directory entries (BTRFS_DIR_INDEX_KEY keys, and BTRFS_DIR_ITEM_KEY keys as well, before kernel 5.17). However when during log replay we unlink a name, through btrfs_unlink_inode(), we may not delete inode references and dir index keys from a subvolume tree and instead just add the deletions to the delayed inode's delayed items, which will only be run when we commit the transaction used for log replay. This means that after an unlink operation during log replay, if we attempt to search for the same name during log replay, we will not see that the name was already deleted, since the deletion is recorded only on the delayed items. We run delayed items after every unlink operation during log replay, except at unlink_old_inode_refs() and at add_inode_ref(). This was due to an overlook, as delayed items should be run after evert unlink, for the reasons stated above. So fix those two cases. Fixes: 0d836392cadd5 ("Btrfs: fix mount failure after fsync due to hard link recreation") Fixes: 1f250e929a9c9 ("Btrfs: fix log replay failure after unlink and link combination") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: qgroup: fix deadlock between rescan worker and remove qgroupSidong Yang
The commit e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") by Kawasaki resolves deadlock between quota disable and qgroup rescan worker. But also there is a deadlock case like it. It's about enabling or disabling quota and creating or removing qgroup. It can be reproduced in simple script below. for i in {1..100} do btrfs quota enable /mnt & btrfs qgroup create 1/0 /mnt & btrfs qgroup destroy 1/0 /mnt & btrfs quota disable /mnt & done Here's why the deadlock happens: 1) The quota rescan task is running. 2) Task A calls btrfs_quota_disable(), locks the qgroup_ioctl_lock mutex, and then calls btrfs_qgroup_wait_for_completion(), to wait for the quota rescan task to complete. 3) Task B calls btrfs_remove_qgroup() and it blocks when trying to lock the qgroup_ioctl_lock mutex, because it's being held by task A. At that point task B is holding a transaction handle for the current transaction. 4) The quota rescan task calls btrfs_commit_transaction(). This results in it waiting for all other tasks to release their handles on the transaction, but task B is blocked on the qgroup_ioctl_lock mutex while holding a handle on the transaction, and that mutex is being held by task A, which is waiting for the quota rescan task to complete, resulting in a deadlock between these 3 tasks. To resolve this issue, the thread disabling quota should unlock qgroup_ioctl_lock before waiting rescan completion. Move btrfs_qgroup_wait_for_completion() after unlock of qgroup_ioctl_lock. Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: fix relocation crash due to premature return from ↵Omar Sandoval
btrfs_commit_transaction() We are seeing crashes similar to the following trace: [38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs] [38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54 [38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs] [38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206 [38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14 [38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360 [38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000 [38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800 [38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360 [38.987146] FS: 00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000 [38.988662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0 [38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [38.992528] Call Trace: [38.992854] <TASK> [38.993148] btrfs_relocate_chunk+0x27/0xe0 [btrfs] [38.993941] btrfs_balance+0x78e/0xea0 [btrfs] [38.994801] ? vsnprintf+0x33c/0x520 [38.995368] ? __kmalloc_track_caller+0x351/0x440 [38.996198] btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs] [38.997084] btrfs_ioctl+0x11b0/0x2da0 [btrfs] [38.997867] ? mod_objcg_state+0xee/0x340 [38.998552] ? seq_release+0x24/0x30 [38.999184] ? proc_nr_files+0x30/0x30 [38.999654] ? call_rcu+0xc8/0x2f0 [39.000228] ? __x64_sys_ioctl+0x84/0xc0 [39.000872] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [39.001973] __x64_sys_ioctl+0x84/0xc0 [39.002566] do_syscall_64+0x3a/0x80 [39.003011] entry_SYSCALL_64_after_hwframe+0x44/0xae [39.003735] RIP: 0033:0x7f11c166959b [39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b [39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003 [39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0 [39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3 [39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001 [39.015040] </TASK> [39.015418] ---[ end trace 0000000000000000 ]--- [43.131559] ------------[ cut here ]------------ [43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717! [43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI [43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G W 5.17.0-rc4 #54 [43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs] [43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246 [43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001 [43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff [43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50 [43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000 [43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000 [43.145686] FS: 00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000 [43.146808] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0 [43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [43.150559] Call Trace: [43.150904] <TASK> [43.151253] btrfs_finish_extent_commit+0x88/0x290 [btrfs] [43.152127] btrfs_commit_transaction+0x74f/0xaa0 [btrfs] [43.152932] ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs] [43.153786] btrfs_ioctl+0x1edc/0x2da0 [btrfs] [43.154475] ? __check_object_size+0x150/0x170 [43.155170] ? preempt_count_add+0x49/0xa0 [43.155753] ? __x64_sys_ioctl+0x84/0xc0 [43.156437] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [43.157456] __x64_sys_ioctl+0x84/0xc0 [43.157980] do_syscall_64+0x3a/0x80 [43.158543] entry_SYSCALL_64_after_hwframe+0x44/0xae [43.159231] RIP: 0033:0x7f7657f1e59b [43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b [43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 [43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 [43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800 [43.166863] </TASK> [43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common [43.169552] ---[ end trace 0000000000000000 ]--- [43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs] [43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246 [43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001 [43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff [43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50 [43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000 [43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000 [43.180071] FS: 00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000 [43.181073] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0 [43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 We first hit the WARN_ON(rc->block_group->pinned > 0) in btrfs_relocate_block_group() and then the BUG_ON(!cache) in unpin_extent_range(). This tells us that we are exiting relocation and removing the block group with bytes still pinned for that block group. This is supposed to be impossible: the last thing relocate_block_group() does is commit the transaction to get rid of pinned extents. Commit d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") introduced an optimization so that commits from fsync don't have to wait for the previous commit to unpin extents. This was only intended to affect fsync, but it inadvertently made it possible for any commit to skip waiting for the previous commit to unpin. This is because if a call to btrfs_commit_transaction() finds that another thread is already committing the transaction, it waits for the other thread to complete the commit and then returns. If that other thread was in fsync, then it completes the commit without completing the previous commit. This makes the following sequence of events possible: Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________ btrfs_commit_transaction(N) | | btrfs_run_delayed_refs | | pin extents | | ... | | state = UNBLOCKED |btrfs_sync_file | | btrfs_start_transaction(N + 1) |relocate_block_group | | btrfs_join_transaction(N + 1) | btrfs_commit_transaction(N + 1) | ... | trans->state = COMMIT_START | | | btrfs_commit_transaction(N + 1) | | wait_for_commit(N + 1, COMPLETED) | wait_for_commit(N, SUPER_COMMITTED)| state = SUPER_COMMITTED | ... | btrfs_finish_extent_commit| | unpin_extent_range() | trans->state = COMPLETED | | | return | | ... | |Thread 1 isn't done, so pinned > 0 | |and we WARN | | | |btrfs_remove_block_group unpin_extent_range() | | Thread 3 removed the | | block group, so we BUG| | There are other sequences involving SUPER_COMMITTED transactions that can cause a similar outcome. We could fix this by making relocation explicitly wait for unpinning, but there may be other cases that need it. Josef mentioned ENOSPC flushing and the free space cache inode as other potential victims. Rather than playing whack-a-mole, this fix is conservative and makes all commits not in fsync wait for all previous transactions, which is what the optimization intended. Fixes: d0c2f4fa555e ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: do not start relocation until in progress drops are doneJosef Bacik
We hit a bug with a recovering relocation on mount for one of our file systems in production. I reproduced this locally by injecting errors into snapshot delete with balance running at the same time. This presented as an error while looking up an extent item WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680 CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8 RIP: 0010:lookup_inline_extent_backref+0x647/0x680 RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000 RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001 R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000 R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000 FS: 0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0 Call Trace: <TASK> insert_inline_extent_backref+0x46/0xd0 __btrfs_inc_extent_ref.isra.0+0x5f/0x200 ? btrfs_merge_delayed_refs+0x164/0x190 __btrfs_run_delayed_refs+0x561/0xfa0 ? btrfs_search_slot+0x7b4/0xb30 ? btrfs_update_root+0x1a9/0x2c0 btrfs_run_delayed_refs+0x73/0x1f0 ? btrfs_update_root+0x1a9/0x2c0 btrfs_commit_transaction+0x50/0xa50 ? btrfs_update_reloc_root+0x122/0x220 prepare_to_merge+0x29f/0x320 relocate_block_group+0x2b8/0x550 btrfs_relocate_block_group+0x1a6/0x350 btrfs_relocate_chunk+0x27/0xe0 btrfs_balance+0x777/0xe60 balance_kthread+0x35/0x50 ? btrfs_balance+0xe60/0xe60 kthread+0x16b/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Normally snapshot deletion and relocation are excluded from running at the same time by the fs_info->cleaner_mutex. However if we had a pending balance waiting to get the ->cleaner_mutex, and a snapshot deletion was running, and then the box crashed, we would come up in a state where we have a half deleted snapshot. Again, in the normal case the snapshot deletion needs to complete before relocation can start, but in this case relocation could very well start before the snapshot deletion completes, as we simply add the root to the dead roots list and wait for the next time the cleaner runs to clean up the snapshot. Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that had a pending drop_progress key. If they do then we know we were in the middle of the drop operation and set a flag on the fs_info. Then balance can wait until this flag is cleared to start up again. If there are DEAD_ROOT's that don't have a drop_progress set then we're safe to start balance right away as we'll be properly protected by the cleaner_mutex. CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: tree-checker: use u64 for item data end to avoid overflowSu Yue
User reported there is an array-index-out-of-bounds access while mounting the crafted image: [350.411942 ] loop0: detected capacity change from 0 to 262144 [350.427058 ] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 8 /dev/loop0 scanned by systemd-udevd (1044) [350.428564 ] BTRFS info (device loop0): disk space caching is enabled [350.428568 ] BTRFS info (device loop0): has skinny extents [350.429589 ] [350.429619 ] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1 [350.429636 ] index 1048096 is out of range for type 'page *[16]' [350.429650 ] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 [350.429652 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [350.429653 ] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs] [350.429772 ] Call Trace: [350.429774 ] <TASK> [350.429776 ] dump_stack_lvl+0x47/0x5c [350.429780 ] ubsan_epilogue+0x5/0x50 [350.429786 ] __ubsan_handle_out_of_bounds+0x66/0x70 [350.429791 ] btrfs_get_16+0xfd/0x120 [btrfs] [350.429832 ] check_leaf+0x754/0x1a40 [btrfs] [350.429874 ] ? filemap_read+0x34a/0x390 [350.429878 ] ? load_balance+0x175/0xfc0 [350.429881 ] validate_extent_buffer+0x244/0x310 [btrfs] [350.429911 ] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs] [350.429935 ] end_bio_extent_readpage+0x3af/0x850 [btrfs] [350.429969 ] ? newidle_balance+0x259/0x480 [350.429972 ] end_workqueue_fn+0x29/0x40 [btrfs] [350.429995 ] btrfs_work_helper+0x71/0x330 [btrfs] [350.430030 ] ? __schedule+0x2fb/0xa40 [350.430033 ] process_one_work+0x1f6/0x400 [350.430035 ] ? process_one_work+0x400/0x400 [350.430036 ] worker_thread+0x2d/0x3d0 [350.430037 ] ? process_one_work+0x400/0x400 [350.430038 ] kthread+0x165/0x190 [350.430041 ] ? set_kthread_struct+0x40/0x40 [350.430043 ] ret_from_fork+0x1f/0x30 [350.430047 ] </TASK> [350.430047 ] [350.430077 ] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2 btrfs check reports: corrupt leaf: root=3 block=20975616 physical=20975616 slot=1, unexpected item end, have 4294971193 expect 3897 The first slot item offset is 4293005033 and the size is 1966160. In check_leaf, we use btrfs_item_end() to check item boundary versus extent_buffer data size. However, return type of btrfs_item_end() is u32. (u32)(4293005033 + 1966160) == 3897, overflow happens and the result 3897 equals to leaf data size reasonably. Fix it by use u64 variable to store item data end in check_leaf() to avoid u32 overflow. This commit does solve the invalid memory access showed by the stack trace. However, its metadata profile is DUP and another copy of the leaf is fine. So the image can be mounted successfully. But when umount is called, the ASSERT btrfs_mark_buffer_dirty() will be triggered because the only node in extent tree has 0 item and invalid owner. It's solved by another commit "btrfs: check extent buffer owner against the owner rootid". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299 Reported-by: Wenqing Liu <wenqingliu0120@gmail.com> CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: do not WARN_ON() if we have PageError setJosef Bacik
Whenever we do any extent buffer operations we call assert_eb_page_uptodate() to complain loudly if we're operating on an non-uptodate page. Our overnight tests caught this warning earlier this week WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50 CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G W 5.17.0-rc3+ #564 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: btrfs-cache btrfs_work_helper RIP: 0010:assert_eb_page_uptodate+0x3f/0x50 RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246 RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000 RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0 RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000 R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1 R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000 FS: 0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0 Call Trace: extent_buffer_test_bit+0x3f/0x70 free_space_test_bit+0xa6/0xc0 load_free_space_tree+0x1f6/0x470 caching_thread+0x454/0x630 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? lock_release+0x1f0/0x2d0 btrfs_work_helper+0xf2/0x3e0 ? lock_release+0x1f0/0x2d0 ? finish_task_switch.isra.0+0xf9/0x3a0 process_one_work+0x26d/0x580 ? process_one_work+0x580/0x580 worker_thread+0x55/0x3b0 ? process_one_work+0x580/0x580 kthread+0xf0/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 This was partially fixed by c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it"), however all that fix did was keep us from finding extent buffers after a failed writeout. It didn't keep us from continuing to use a buffer that we already had found. In this case we're searching the commit root to cache the block group, so we can start committing the transaction and switch the commit root and then start writing. After the switch we can look up an extent buffer that hasn't been written yet and start processing that block group. Then we fail to write that block out and clear Uptodate on the page, and then we start spewing these errors. Normally we're protected by the tree lock to a certain degree here. If we read a block we have that block read locked, and we block the writer from locking the block before we submit it for the write. However this isn't necessarily fool proof because the read could happen before we do the submit_bio and after we locked and unlocked the extent buffer. Also in this particular case we have path->skip_locking set, so that won't save us here. We'll simply get a block that was valid when we read it, but became invalid while we were using it. What we really want is to catch the case where we've "read" a block but it's not marked Uptodate. On read we ClearPageError(), so if we're !Uptodate and !Error we know we didn't do the right thing for reading the page. Fix this by checking !Uptodate && !Error, this way we will not complain if our buffer gets invalidated while we're using it, and we'll maintain the spirit of the check which is to make sure we have a fully in-cache block while we're messing with it. CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: fix lost prealloc extents beyond eof after full fsyncFilipe Manana
When doing a full fsync, if we have prealloc extents beyond (or at) eof, and the leaves that contain them were not modified in the current transaction, we end up not logging them. This results in losing those extents when we replay the log after a power failure, since the inode is truncated to the current value of the logged i_size. Just like for the fast fsync path, we need to always log all prealloc extents starting at or beyond i_size. The fast fsync case was fixed in commit 471d557afed155 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay") but it missed the full fsync path. The problem exists since the very early days, when the log tree was added by commit e02119d5a7b439 ("Btrfs: Add a write ahead tree log to optimize synchronous operations"). Example reproducer: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt # Create our test file with many file extent items, so that they span # several leaves of metadata, even if the node/page size is 64K. Use # direct IO and not fsync/O_SYNC because it's both faster and it avoids # clearing the full sync flag from the inode - we want the fsync below # to trigger the slow full sync code path. $ xfs_io -f -d -c "pwrite -b 4K 0 16M" /mnt/foo # Now add two preallocated extents to our file without extending the # file's size. One right at i_size, and another further beyond, leaving # a gap between the two prealloc extents. $ xfs_io -c "falloc -k 16M 1M" /mnt/foo $ xfs_io -c "falloc -k 20M 1M" /mnt/foo # Make sure everything is durably persisted and the transaction is # committed. This makes all created extents to have a generation lower # than the generation of the transaction used by the next write and # fsync. sync # Now overwrite only the first extent, which will result in modifying # only the first leaf of metadata for our inode. Then fsync it. This # fsync will use the slow code path (inode full sync bit is set) because # it's the first fsync since the inode was created/loaded. $ xfs_io -c "pwrite 0 4K" -c "fsync" /mnt/foo # Extent list before power failure. $ xfs_io -c "fiemap -v" /mnt/foo /mnt/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 2178048..2178055 8 0x0 1: [8..16383]: 26632..43007 16376 0x0 2: [16384..32767]: 2156544..2172927 16384 0x0 3: [32768..34815]: 2172928..2174975 2048 0x800 4: [34816..40959]: hole 6144 5: [40960..43007]: 2174976..2177023 2048 0x801 <power fail> # Mount fs again, trigger log replay. $ mount /dev/sdc /mnt # Extent list after power failure and log replay. $ xfs_io -c "fiemap -v" /mnt/foo /mnt/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 2178048..2178055 8 0x0 1: [8..16383]: 26632..43007 16376 0x0 2: [16384..32767]: 2156544..2172927 16384 0x1 # The prealloc extents at file offsets 16M and 20M are missing. So fix this by calling btrfs_log_prealloc_extents() when we are doing a full fsync, so that we always log all prealloc extents beyond eof. A test case for fstests will follow soon. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02btrfs: subpage: fix a wrong check on subpage->writersQu Wenruo
[BUG] When looping btrfs/074 with 64K page size and 4K sectorsize, there is a low chance (1/50~1/100) to crash with the following ASSERT() triggered in btrfs_subpage_start_writer(): ret = atomic_add_return(nbits, &subpage->writers); ASSERT(ret == nbits); <<< This one <<< [CAUSE] With more debugging output on the parameters of btrfs_subpage_start_writer(), it shows a very concerning error: ret=29 nbits=13 start=393216 len=53248 For @nbits it's correct, but @ret which is the returned value from atomic_add_return(), it's not only larger than nbits, but also larger than max sectors per page value (for 64K page size and 4K sector size, it's 16). This indicates that some call sites are not properly decreasing the value. And that's exactly the case, in btrfs_page_unlock_writer(), due to the fact that we can have page locked either by lock_page() or process_one_page(), we have to check if the subpage has any writer. If no writers, it's locked by lock_page() and we only need to unlock it. But unfortunately the check for the writers are completely opposite: if (atomic_read(&subpage->writers)) /* No writers, locked by plain lock_page() */ return unlock_page(page); We directly unlock the page if it has writers, which is the completely opposite what we want. Thankfully the affected call site is only limited to extent_write_locked_range(), so it's mostly affecting compressed write. [FIX] Just fix the wrong check condition to fix the bug. Fixes: e55a0de18572 ("btrfs: rework page locking in __extent_writepage()") CC: stable@vger.kernel.org # 5.16 Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-02staging: rtl8723bs: Improve the comment explaining the locking rulesHans de Goede
rtw_mlme.h has a comment which briefly describes the locking rules for the rtl8723bs driver, improve this to also mention the locking order of xmit_priv.lock vs the lock(s) embedded in the various queues. Cc: Fabio Aiuto <fabioaiuto83@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220302101637.26542-2-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02staging: rtl8723bs: Fix access-point mode deadlockHans de Goede
Commit 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock into 2 locks in attempt to fix a lockdep reported issue with the locking order of the sta_hash_lock vs pxmitpriv->lock. But in the end this turned out to not fully solve the sta_hash_lock issue so commit a7ac783c338b ("staging: rtl8723bs: remove a second possible deadlock") was added to fix this in another way. The original fix was kept as it was still seen as a good thing to have, but now it turns out that it creates a deadlock in access-point mode: [Feb20 23:47] ====================================================== [ +0.074085] WARNING: possible circular locking dependency detected [ +0.074077] 5.16.0-1-amd64 #1 Tainted: G C E [ +0.064710] ------------------------------------------------------ [ +0.074075] ksoftirqd/3/29 is trying to acquire lock: [ +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs] [ +0.114921] but task is already holding lock: [ +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs] [ +0.116976] which lock already depends on the new lock. [ +0.098037] the existing dependency chain (in reverse order) is: [ +0.089704] -> #1 (&psta->sleep_q.lock){+.-.}-{2:2}: [ +0.077232] _raw_spin_lock_bh+0x34/0x40 [ +0.053261] xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs] [ +0.082572] rtw_xmit+0x58b/0x940 [r8723bs] [ +0.056528] _rtw_xmit_entry+0xba/0x350 [r8723bs] [ +0.062755] dev_hard_start_xmit+0xf1/0x320 [ +0.056381] sch_direct_xmit+0x9e/0x360 [ +0.052212] __dev_queue_xmit+0xce4/0x1080 [ +0.055334] ip6_finish_output2+0x18f/0x6e0 [ +0.056378] ndisc_send_skb+0x2c8/0x870 [ +0.052209] ndisc_send_ns+0xd3/0x210 [ +0.050130] addrconf_dad_work+0x3df/0x5a0 [ +0.055338] process_one_work+0x274/0x5a0 [ +0.054296] worker_thread+0x52/0x3b0 [ +0.050124] kthread+0x16c/0x1a0 [ +0.044925] ret_from_fork+0x1f/0x30 [ +0.049092] -> #0 (&pxmitpriv->lock){+.-.}-{2:2}: [ +0.074101] __lock_acquire+0x10f5/0x1d80 [ +0.054298] lock_acquire+0xd7/0x300 [ +0.049088] _raw_spin_lock_bh+0x34/0x40 [ +0.053248] rtw_xmit_classifier+0x8a/0x140 [r8723bs] [ +0.066949] rtw_xmitframe_enqueue+0xa/0x20 [r8723bs] [ +0.066946] rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs] [ +0.078386] wakeup_sta_to_xmit+0xa6/0x300 [r8723bs] [ +0.065903] rtw_recv_entry+0xe36/0x1160 [r8723bs] [ +0.063809] rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs] [ +0.071093] tasklet_action_common.constprop.0+0xe5/0x110 [ +0.070966] __do_softirq+0x16f/0x50a [ +0.050134] __irq_exit_rcu+0xeb/0x140 [ +0.051172] irq_exit_rcu+0xa/0x20 [ +0.047006] common_interrupt+0xb8/0xd0 [ +0.052214] asm_common_interrupt+0x1e/0x40 [ +0.056381] finish_task_switch.isra.0+0x100/0x3a0 [ +0.063670] __schedule+0x3ad/0xd20 [ +0.048047] schedule+0x4e/0xc0 [ +0.043880] smpboot_thread_fn+0xc4/0x220 [ +0.054298] kthread+0x16c/0x1a0 [ +0.044922] ret_from_fork+0x1f/0x30 [ +0.049088] other info that might help us debug this: [ +0.095950] Possible unsafe locking scenario: [ +0.070952] CPU0 CPU1 [ +0.054282] ---- ---- [ +0.054285] lock(&psta->sleep_q.lock); [ +0.047004] lock(&pxmitpriv->lock); [ +0.074082] lock(&psta->sleep_q.lock); [ +0.077209] lock(&pxmitpriv->lock); [ +0.043873] *** DEADLOCK *** [ +0.070950] 1 lock held by ksoftirqd/3/29: [ +0.049082] #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs] Analysis shows that in hindsight the splitting of the lock was not a good idea, so revert this to fix the access-point mode deadlock. Note this is a straight-forward revert done with git revert, the commented out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the code before the reverted changes. Fixes: 54659ca026e5 ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)") Cc: stable <stable@vger.kernel.org> Cc: Fabio Aiuto <fabioaiuto83@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542 Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02erofs: fix ztailpacking on > 4GiB filesystemsGao Xiang
z_idataoff here is an absolute physical offset, so it should use erofs_off_t (64 bits at least). Otherwise, it'll get trimmed and cause the decompresion failure. Link: https://lore.kernel.org/r/20220222033118.20540-1-hsiangkao@linux.alibaba.com Fixes: ab92184ff8f1 ("erofs: add on-disk compressed tail-packing inline support") Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-03-02spi: spi-zynqmp-gqspi: Handle error for dma_set_maskJiasheng Jiang
As the potential failure of the dma_set_mask(), it should be better to check it and return error if fails. Fixes: 126bdb606fd2 ("spi: spi-zynqmp-gqspi: return -ENOMEM if dma_map_single fails") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Link: https://lore.kernel.org/r/20220302092051.121343-1-jiasheng@iscas.ac.cn Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-02spi: dt-bindings: mediatek: Set min size for 'mediatek,pad-select'Rob Herring
The minimum array length defaults to the same size as the maximum. For 'mediatek,pad-select', the example has a length of 2 and in-tree .dts files have a length of 1, but the schema says the length must be 4. There's currently no warning in the example because the schema fixups are not handling this case correctly. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220301212358.1887668-1-robh@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-02spi: dt-bindings: renesas,rspi: Drop comment for generic compatible stringLad Prabhakar
Drop comment "# RZ/A and RZ/G2{L,LC}" for generic compatible string "renesas,rspi-rz" as this will avoid changing the line for every new SoC addition. Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20220301134244.20174-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-02regulator: virtual: add devicetree supportVincent Whitchurch
The reg-virt-consumer is very useful for development and testing of regulator drivers since it allows voltages and modes to be set from userspace. However, it currently requires platform data so it cannot be used without patching the kernel. Add support for probing it from the devicetree to remedy this. Since this driver is only meant for testing and is a purely software construct, no binding documentation is added. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Link: https://lore.kernel.org/r/20220301111831.3742383-4-vincent.whitchurch@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-02regulator: virtual: warn against production useVincent Whitchurch
This driver is only meant for debugging and testing. Currently, it's not possible to use it without patching the kernel since it requires platform data, but we'll be adding devicetree support, so add a loud warning to make it clear that it's still only meant for debugging and testing. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Link: https://lore.kernel.org/r/20220301111831.3742383-3-vincent.whitchurch@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-02regulator: virtual: use dev_err_probe()Vincent Whitchurch
Use dev_err_probe() to avoid printing spurious warnings on probe deferral. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Link: https://lore.kernel.org/r/20220301111831.3742383-2-vincent.whitchurch@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-03-02irqchip/sifive-plic: Disable S-mode IRQs if running in M-modeNiklas Cassel
When detecting a context for a privilege mode different from the current running privilege mode, we simply skip to the next context register. This means that we never clear the S-mode enable bits when running in M-mode. On canaan k210, a bunch of S-mode interrupts are enabled by the bootrom. These S-mode specific interrupts should never trigger, since we never set the mie.SEIE bit in the parent interrupt controller (riscv-intc). However, we will be able to see the mip.SEIE bit set as pending. This isn't a good default when CONFIG_RISCV_M_MODE is set, since in that case we will never enter a lower privilege mode (e.g. S-mode). Let's clear the S-mode enable bits when running the kernel in M-mode, such that we won't have a interrupt pending bit set, which we will never clear. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220302131544.3166154-3-Niklas.Cassel@wdc.com
2022-03-02irqchip/sifive-plic: Improve naming scheme for per context offsetsNiklas Cassel
The PLIC supports a fixed number of contexts (15872). Each context has fixed register offsets in PLIC. The number of contexts that we need to initialize depends on the privilege modes supported by each hart. Therefore, this mapping between PLIC context registers to hart privilege modes is platform specific, and is currently supplied via device tree. For example, canaan,k210 has the following mapping: Context0: hart0 M-mode Context1: hart0 S-mode Context2: hart1 M-mode Context3: hart1 S-mode While sifive,fu540 has the following mapping: Context0: hart0 M-mode Context1: hart1 M-mode Context2: hart1 S-mode Because the number of contexts per hart is not fixed, the names ENABLE_PER_HART and CONTEXT_PER_HART for the register offsets are quite confusing and might mislead the reader to think that these are fixed register offsets per hart. Rename the offsets to more clearly highlight that these are per PLIC context and not per hart. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220302131544.3166154-2-Niklas.Cassel@wdc.com
2022-03-02ALSA: intel_hdmi: Fix reference to PCM buffer addressZhen Ni
PCM buffers might be allocated dynamically when the buffer preallocation failed or a larger buffer is requested, and it's not guaranteed that substream->dma_buffer points to the actually used buffer. The driver needs to refer to substream->runtime->dma_addr instead for the buffer address. Signed-off-by: Zhen Ni <nizhen@uniontech.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220302074241.30469-1-nizhen@uniontech.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-03-02batman-adv: Don't expect inter-netns unique iflink indicesSven Eckelmann
The ifindex doesn't have to be unique for multiple network namespaces on the same machine. $ ip netns add test1 $ ip -net test1 link add dummy1 type dummy $ ip netns add test2 $ ip -net test2 link add dummy2 type dummy $ ip -net test1 link show dev dummy1 6: dummy1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 96:81:55:1e:dd:85 brd ff:ff:ff:ff:ff:ff $ ip -net test2 link show dev dummy2 6: dummy2: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 5a:3c:af:35:07:c3 brd ff:ff:ff:ff:ff:ff But the batman-adv code to walk through the various layers of virtual interfaces uses this assumption because dev_get_iflink handles it internally and doesn't return the actual netns of the iflink. And dev_get_iflink only documents the situation where ifindex == iflink for physical devices. But only checking for dev->netdev_ops->ndo_get_iflink is also not an option because ipoib_get_iflink implements it even when it sometimes returns an iflink != ifindex and sometimes iflink == ifindex. The caller must therefore make sure itself to check both netns and iflink + ifindex for equality. Only when they are equal, a "physical" interface was detected which should stop the traversal. On the other hand, vxcan_get_iflink can also return 0 in case there was currently no valid peer. In this case, it is still necessary to stop. Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface") Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02batman-adv: Request iflink once in batadv_get_real_netdeviceSven Eckelmann
There is no need to call dev_get_iflink multiple times for the same net_device in batadv_get_real_netdevice. And since some of the ndo_get_iflink callbacks are dynamic (for example via RCUs like in vxcan_get_iflink), it could easily happen that the returned values are not stable. The pre-checks before __dev_get_by_index are then of course bogus. Fixes: 5ed4a460a1d3 ("batman-adv: additional checks for virtual interfaces on top of WiFi") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-02batman-adv: Request iflink once in batadv-on-batadv checkSven Eckelmann
There is no need to call dev_get_iflink multiple times for the same net_device in batadv_is_on_batman_iface. And since some of the .ndo_get_iflink callbacks are dynamic (for example via RCUs like in vxcan_get_iflink), it could easily happen that the returned values are not stable. The pre-checks before __dev_get_by_index are then of course bogus. Fixes: b7eddd0b3950 ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2022-03-01Input: elan_i2c - fix regulator enable count imbalance after suspend/resumeHans de Goede
Before these changes elan_suspend() would only disable the regulator when device_may_wakeup() returns false; whereas elan_resume() would unconditionally enable it, leading to an enable count imbalance when device_may_wakeup() returns true. This triggers the "WARN_ON(regulator->enable_count)" in regulator_put() when the elan_i2c driver gets unbound, this happens e.g. with the hot-plugable dock with Elan I2C touchpad for the Asus TF103C 2-in-1. Fix this by making the regulator_enable() call also be conditional on device_may_wakeup() returning false. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220131135436.29638-2-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-03-01Input: elan_i2c - move regulator_[en|dis]able() out of elan_[en|dis]able_power()Hans de Goede
elan_disable_power() is called conditionally on suspend, where as elan_enable_power() is always called on resume. This leads to an imbalance in the regulator's enable count. Move the regulator_[en|dis]able() calls out of elan_[en|dis]able_power() in preparation of fixing this. No functional changes intended. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220131135436.29638-1-hdegoede@redhat.com [dtor: consolidate elan_[en|dis]able() into elan_set_power()] Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-03-01tracing/histogram: Fix sorting on old "cpu" valueSteven Rostedt (Google)
When trying to add a histogram against an event with the "cpu" field, it was impossible due to "cpu" being a keyword to key off of the running CPU. So to fix this, it was changed to "common_cpu" to match the other generic fields (like "common_pid"). But since some scripts used "cpu" for keying off of the CPU (for events that did not have "cpu" as a field, which is most of them), a backward compatibility trick was added such that if "cpu" was used as a key, and the event did not have "cpu" as a field name, then it would fallback and switch over to "common_cpu". This fix has a couple of subtle bugs. One was that when switching over to "common_cpu", it did not change the field name, it just set a flag. But the code still found a "cpu" field. The "cpu" field is used for filtering and is returned when the event does not have a "cpu" field. This was found by: # cd /sys/kernel/tracing # echo hist:key=cpu,pid:sort=cpu > events/sched/sched_wakeup/trigger # cat events/sched/sched_wakeup/hist Which showed the histogram unsorted: { cpu: 19, pid: 1175 } hitcount: 1 { cpu: 6, pid: 239 } hitcount: 2 { cpu: 23, pid: 1186 } hitcount: 14 { cpu: 12, pid: 249 } hitcount: 2 { cpu: 3, pid: 994 } hitcount: 5 Instead of hard coding the "cpu" checks, take advantage of the fact that trace_event_field_field() returns a special field for "cpu" and "CPU" if the event does not have "cpu" as a field. This special field has the "filter_type" of "FILTER_CPU". Check that to test if the returned field is of the CPU type instead of doing the string compare. Also, fix the sorting bug by testing for the hist_field flag of HIST_FIELD_FL_CPU when setting up the sort routine. Otherwise it will use the special CPU field to know what compare routine to use, and since that special field does not have a size, it returns tracing_map_cmp_none. Cc: stable@vger.kernel.org Fixes: 1e3bac71c505 ("tracing/histogram: Rename "cpu" to "common_cpu"") Reported-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-01net: dsa: restore error path of dsa_tree_change_tag_protoVladimir Oltean
When the DSA_NOTIFIER_TAG_PROTO returns an error, the user space process which initiated the protocol change exits the kernel processing while still holding the rtnl_mutex. So any other process attempting to lock the rtnl_mutex would deadlock after such event. The error handling of DSA_NOTIFIER_TAG_PROTO was inadvertently changed by the blamed commit, introducing this regression. We must still call rtnl_unlock(), and we must still call DSA_NOTIFIER_TAG_PROTO for the old protocol. The latter is due to the limiting design of notifier chains for cross-chip operations, which don't have a built-in error recovery mechanism - we should look into using notifier_call_chain_robust for that. Fixes: dc452a471dba ("net: dsa: introduce tagger-owned storage for private and shared data") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220228141715.146485-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01Merge tag 'for-net-2022-03-01' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix regression with scanning not working in some systems. * tag 'for-net-2022-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: Fix not checking MGMT cmd pending queue ==================== Link: https://lore.kernel.org/r/20220302004330.125536-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01ELF: Properly redefine PT_GNU_* in terms of PT_LOOSKees Cook
The PT_GNU_* program header types are actually offsets from PT_LOOS, so redefine them as such, reorder them, and add the missing PT_GNU_RELRO. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2022-03-01MAINTAINERS: Update execve entry with more detailsKees Cook
The UAPI elf.h header was missed in the original MAINTAINER entry. Add it. Include linux-mm mailing list since that's where execve has traditionally been discussed. Note that this area is Supported, and aim at the git tree. Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2022-03-01exec: cleanup commentsTom Rix
Remove the second 'from'. Replace 'backwords' with 'backwards'. Replace 'visibile' with 'visible'. Signed-off-by: Tom Rix <trix@redhat.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220211160940.2516243-1-trix@redhat.com
2022-03-01fs/binfmt_elf: Refactor load_elf_binary functionAkira Kawata
I delete load_addr because it is not used anymore. And I rename load_addr_set to first_pt_load because it is used only to capture the first iteration of the loop. Signed-off-by: Akira Kawata <akirakawata1@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127124014.338760-3-akirakawata1@gmail.com
2022-03-01fs/binfmt_elf: Fix AT_PHDR for unusual ELF filesAkira Kawata
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=197921 As pointed out in the discussion of buglink, we cannot calculate AT_PHDR as the sum of load_addr and exec->e_phoff. : The AT_PHDR of ELF auxiliary vectors should point to the memory address : of program header. But binfmt_elf.c calculates this address as follows: : : NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff); : : which is wrong since e_phoff is the file offset of program header and : load_addr is the memory base address from PT_LOAD entry. : : The ld.so uses AT_PHDR as the memory address of program header. In normal : case, since the e_phoff is usually 64 and in the first PT_LOAD region, it : is the correct program header address. : : But if the address of program header isn't equal to the first PT_LOAD : address + e_phoff (e.g. Put the program header in other non-consecutive : PT_LOAD region), ld.so will try to read program header from wrong address : then crash or use incorrect program header. This is because exec->e_phoff is the offset of PHDRs in the file and the address of PHDRs in the memory may differ from it. This patch fixes the bug by calculating the address of program headers from PT_LOADs directly. Signed-off-by: Akira Kawata <akirakawata1@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220127124014.338760-2-akirakawata1@gmail.com
2022-03-01binfmt: move more stuff undef CONFIG_COREDUMPAlexey Dobriyan
struct linux_binfmt::core_dump and struct min_coredump::min_coredump are used under CONFIG_COREDUMP only. Shrink those embedded configs a bit. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/YglbIFyN+OtwVyjW@localhost.localdomain