linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-10-07	dt-bindings: Add missing 'unevaluatedProperties'	Rob Herring
	This doesn't yet do anything in the tools, but make it explicit so we can check either 'unevaluatedProperties' or 'additionalProperties' is present in schemas. 'unevaluatedProperties' is appropriate when including another schema (via '$ref') and all possible properties and/or child nodes are not explicitly listed in the schema with the '$ref'. This is in preparation to add a meta-schema to check for missing 'unevaluatedProperties' or 'additionalProperties'. This has been a constant source of review issues. Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Wolfram Sang <wsa@kernel.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-By: Vinod Koul <vkoul@kernel.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://lore.kernel.org/r/20201005183830.486085-2-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2020-10-07	locking/atomics: Check atomic-arch-fallback.h too	Paul Bolle
	The sha1sum of include/linux/atomic-arch-fallback.h isn't checked by check-atomics.sh. It's not clear why it's skipped so let's check it too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201001202028.1048418-1-pebolle@tiscali.nl
2020-10-07	locking/seqlock: Tweak DEFINE_SEQLOCK() kernel doc	Sebastian Andrzej Siewior
	ctags creates a warning: \|ctags: Warning: include/linux/seqlock.h:738: null expansion of name pattern "\2" The DEFINE_SEQLOCK() macro is passed to ctags and being told to expect an argument. Add a dummy argument to keep ctags quiet. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/20200924154851.skmswuyj322yuz4g@linutronix.de
2020-10-07	Docs: Fixing spelling errors in Documentation/devicetree/bindings/	Marlon Rac Cambasis
	Revised patch fixing six spelling errors within Documentation/devicetree/bindings/. "specfied" replaced with "specified" in all three files modified. "atleast" seperated into "at least" three times in samsung-pinctrl.txt. This should remove any confusion that a reader might have. Signed-off-by: Marlon Rac Cambasis <marlonrc08@gmail.com> Link: https://lore.kernel.org/r/20201007071705.GA11381@marlonpc-debian Signed-off-by: Rob Herring <robh@kernel.org>
2020-10-07	x86/asm: Add an enqcmds() wrapper for the ENQCMDS instruction	Dave Jiang
	Currently, the MOVDIR64B instruction is used to atomically submit 64-byte work descriptors to devices. Although it can encounter errors like device queue full, command not accepted, device not ready, etc when writing to a device MMIO, MOVDIR64B can not report back on errors from the device itself. This means that MOVDIR64B users need to separately interact with a device to see if a descriptor was successfully queued, which slows down device interactions. ENQCMD and ENQCMDS also atomically submit 64-byte work descriptors to devices. But, they can report back errors directly from the device, such as if the device was busy, or device not enabled or does not support the command. This immediate feedback from the submission instruction itself reduces the number of interactions with the device and can greatly increase efficiency. ENQCMD can be used at any privilege level, but can effectively only submit work on behalf of the current process. ENQCMDS is a ring0-only instruction and can explicitly specify a process context instead of being tied to the current process or needing to reprogram the IA32_PASID MSR. Use ENQCMDS for work submission within the kernel because a Process Address ID (PASID) is setup to translate the kernel virtual address space. This PASID is provided to ENQCMDS from the descriptor structure submitted to the device and not retrieved from IA32_PASID MSR, which is setup for the current user address space. See Intel Software Developer’s Manual for more information on the instructions. [ bp: - Make operand constraints like movdir64b() because both insns are basically doing the same thing, more or less. - Fixup comments and cleanup. ] Link: https://lkml.kernel.org/r/20200924180041.34056-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20201005151126.657029-3-dave.jiang@intel.com
2020-10-07	x86/asm: Carve out a generic movdir64b() helper for general usage	Dave Jiang
	Carve out the MOVDIR64B inline asm primitive into a generic helper so that it can be used by other functions. Move it to special_insns.h and have iosubmit_cmds512() call it. [ bp: Massage commit message. ] Suggested-by: Michael Matz <matz@suse.de> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201005151126.657029-2-dave.jiang@intel.com
2020-10-07	ALSA: bebob: potential info leak in hwdep_read()	Dan Carpenter
	The "count" variable needs to be capped on every path so that we don't copy too much information to the user. Fixes: 618eabeae711 ("ALSA: bebob: Add hwdep interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201007074928.GA2529578@mwanda Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-10-07	ALSA: hda/realtek: Enable audio jacks of ASUS D700SA with ALC887	Jian-Hong Pan
	The ASUS D700SA desktop's audio (1043:2390) with ALC887 cannot detect the headset microphone and another headphone jack until ALC887_FIXUP_ASUS_HMIC and ALC887_FIXUP_ASUS_AUDIO quirks are applied. The NID 0x15 maps as the headset microphone and NID 0x19 maps as another headphone jack. Also need the function like alc887_fixup_asus_jack to enable the audio jacks. Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> Signed-off-by: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20201007052224.22611-1-jhp@endlessos.org Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-10-07	cpufreq: stats: Add memory barrier to store_reset()	Rafael J. Wysocki
	There is nothing to prevent the CPU or the compiler from reordering the writes to stats->reset_time and stats->reset_pending in store_reset(), in which case the readers of stats->reset_time may see a stale value. Moreover, on 32-bit arches the write to reset_time cannot be completed in one go, so the readers of it may see a partially updated value in that case. To prevent that from happening, add a write memory barrier between the writes to stats->reset_time and stats->reset_pending in store_reset() and corresponding read memory barrier in the readers of stats->reset_time. Fixes: 40c3bd4cfa6f ("cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-07	cpufreq: schedutil: Simplify sugov_fast_switch()	Rafael J. Wysocki
	Drop a redundant local variable definition from sugov_fast_switch() and rearrange the code in there to avoid the redundant logical negation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-10-07	Merge tag 'nvme-5.9-2020-10-07' of git://git.infradead.org/nvme into block-5.9	Jens Axboe
	Pull NVMe fix from Christoph: "nvme fix for 5.9: - fix a recently introduced controller leak (Logan Gunthorpe)" * tag 'nvme-5.9-2020-10-07' of git://git.infradead.org/nvme: nvme-core: put ctrl ref when module ref get fail
2020-10-07	block: soft limit zone-append sectors as well	Johannes Thumshirn
	Martin rightfully noted that for normal filesystem IO we have soft limits in place, to prevent them from getting too big and not lead to unpredictable latencies. For zone append we only have the hardware limit in place. Cap the max sectors we submit via zone-append to the maximal number of sectors if the second limit is lower. Reported-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/linux-btrfs/yq1k0w8g3rw.fsf@ca-mkp.ca.oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-07	platform/x86: hp-wmi: add support for thermal policy	Elia Devito
	HP Spectre notebooks (and probably other model as well) support up to 4 thermal policy: - HP Recommended - Performance - Cool - Quiet at least on HP Spectre x360 Convertible 15-df0xxx the firmware sets the thermal policy to default but hardcode the odvp0 variable to 1, this causes thermald to choose the wrong DPTF profile witch result in low performance when notebook is on AC, calling thermal policy write command allow firmware to correctly set the odvp0 variable. Signed-off-by: Elia Devito <eliadevito@gmail.com> Link: https://lore.kernel.org/r/20201004211305.11628-1-eliadevito@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2020-10-07	partitions/ibm: fix non-DASD devices	Christoph Hellwig
	Don't error out if the dasd_biodasdinfo symbol is not available. Cc: stable@vger.kernel.org Fixes: 26d7e28e3820 ("s390/dasd: remove ioctl_by_bdev calls") Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-07	samples: configfs: prefer pr_err() over bare printk(KERN_ERR	Bartosz Golaszewski
	pr_*() printing helpers are preferred over using bare printk(). Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	samples: configfs: don't use spaces before tabs	Bartosz Golaszewski
	The copyright notice alarms checkpatch.pl of usin spaces before tabs. Fix this. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	samples: configfs: consolidate local variables of the same type	Bartosz Golaszewski
	Move local variables of the same type into a single line for better readability. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	samples: configfs: don't reinitialize variables which are already zeroed	Bartosz Golaszewski
	The structure containing the storeme field is allocated using kzalloc(). There's no need to set it to 0 again. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	samples: configfs: replace simple_strtoul() with kstrtoint()	Bartosz Golaszewski
	simple_strtoul() is deprecated. Use kstrtoint(). Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	samples: configfs: fix alignment in item struct	Bartosz Golaszewski
	Aling the assignment of a static structure's field to be consistent with all other instances. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	samples: configfs: drop unnecessary ternary operators	Bartosz Golaszewski
	Checking pointers for NULL value before passing them to container_of() is pointless because even if we return NULL from the ternary operator, none of the users checks the returned value but they instead dereference it unconditionally. AFAICT this cannot really happen either. Simplify the code by removing the ternary operators from to_childless() et al. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	samples: configfs: remove redundant newlines	Bartosz Golaszewski
	There's no need for suplemental newlines in the source file - especially since the examples are well divided with comments already. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	MAINTAINERS: add the sample directory to the configfs entry	Bartosz Golaszewski
	Code samples for configfs don't have an explicit maintainer. Add the samples directory to the existing configfs entry in MAINTAINERS. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-07	pinctrl: amd: Add missing pins to the pin group list	Shyam Sundar S K
	Some of the pins were not exposed in the initial driver or kept as reserved. Exposing all of them now. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20201007111220.744348-1-Shyam-sundar.S-k@amd.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2020-10-07	Merge branch 'for-next/late-arrivals' into for-next/core	Will Deacon
	Late patches for 5.10: MTE selftests, minor KCSAN preparation and removal of some unused prototypes. (Amit Daniel Kachhap and others) * for-next/late-arrivals: arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory
2020-10-07	arm64: random: Remove no longer needed prototypes	Andre Przywara
	Commit 9bceb80b3cc4 ("arm64: kaslr: Use standard early random function") removed the direct calls of the __arm64_rndr() and __early_cpu_has_rndr() functions, but left the dummy prototypes in the #else branch of the #ifdef CONFIG_ARCH_RANDOM guard. Remove the redundant prototypes, as they have no users outside of this header file. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20201006194453.36519-1-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-07	ASoC: mchp-spdifrx: fix spelling mistake "overrrun" -> "overrun"	Colin Ian King
	There is a spelling mistake in a dev_warn message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Link: https://lore.kernel.org/r/20201006152024.542418-1-colin.king@canonical.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-07	dt-bindings: staging: wfx: silabs,wfx yaml conversion	Jérôme Pouiller
	The device can be connected on SPI or on SDIO. The original file described the two options separately. So, most of the file had to be rewritten in order to match with the Yaml requirements. Some device requirements are still written in the comments since they cannot been expressed with the current scheme (e.g. reg must be set to 1 with SDIO, interrupt is mandatory with SPI, reset-gpio in SPI is replaced by mmc-pwrseq in SDIO, etc...). The examples provided have also been reworked in order to make dt_binding_check happy. Finally, also fix typo in the name of the file (siliabs instead of silabs) Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20201007101943.749898-7-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07	staging: wfx: update copyrights dates	Jérôme Pouiller
	Most of the files have been modified in 2020, so update the copyright notices. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20201007101943.749898-6-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07	staging: wfx: fix QoS priority for slow buses	Jérôme Pouiller
	The device is in charge of respecting the QoS constraints. The driver have to ensure that all the queues contain data and the device choose the right queue to send. The things starts to be more difficult when the bandwidth of the bus is lower than the bandwidth of the WiFi. The device quickly sends the frames of the highest priority queue. Then, it starts to send frames from a lower priority queue. Though, there are still some high priority frames waiting in the driver. To work around this problem, this patch add some priorities to each queue. The weigh of the queue was (roughly) calculated experimentally by checking the speed ratio of each queue when the bus does not limit the traffic: - Be/Bk -> 20Mbps/10Mbps - Vi/Be -> 36Mbps/180Kbps - Vo/Be -> 35Mbps/600Kbps - Vi/Vo -> 24Mbps/12Mbps So, if we fix the weigh of the Background to 1, the weight of Best Effort should be 2. The weight of Video should be 116. However, since there is only 32 queues, it make no sense to use a value greater than 64[1]. And finally, the weight of the Voice is set to 128. [1] Because of this approximation, with very slow bus, we can still observe frame starvation when we measure the speed ratio of Vi/Be. It is around 35Mbps/1Mbps (instead of 36Mbps/180Kbps). However, it is still in accepted error range. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20201007101943.749898-5-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07	staging: wfx: fix BA sessions for older firmwares	Jérôme Pouiller
	Firmwares with API < 3.6 do not forward DELBA requests. Thus, when a Block Ack session is restarted, the reordering buffer is not flushed and the received sequence number is not contiguous. Therefore, mac80211 starts to wait some missing frames that it will never receive. This patch disables the reordering buffer for old firmware. It is harmless when the network is unencrypted. When the network is encrypted, the non-contiguous frames will be thrown away by the TKIP/CCMP replay protection. So, the user will observe some packet loss with UDP and performance drop with TCP. Fixes: e5da5fbd7741 ("staging: wfx: fix CCMP/TKIP replay protection") Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20201007101943.749898-4-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07	staging: wfx: remove remaining code of 'secure link' feature	Jérôme Pouiller
	Commit e8d607ce0c81 ("staging: wfx: drop 'secure link' feature") had removed the 'secure link' feature. However, a few lines of codes were yet here. Fixes: e8d607ce0c81 ("staging: wfx: drop 'secure link' feature") Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20201007101943.749898-3-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07	staging: wfx: fix handling of MMIC error	Jérôme Pouiller
	As expected, when the device detect a MMIC error, it returns a specific status. However, it also strip IV from the frame (don't ask me why). So, with the current code, mac80211 detects a corrupted frame and it drops it before it handle the MMIC error. The expected behavior would be to detect MMIC error then to renegotiate the EAP session. So, this patch correctly informs mac80211 that IV is not available. So, mac80211 correctly takes into account the MMIC error. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20201007101943.749898-2-Jerome.Pouiller@silabs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-07	btrfs: rename BTRFS_INODE_ORDERED_DATA_CLOSE flag	Nikolay Borisov
	Commit 8d875f95da43 ("btrfs: disable strict file flushes for renames and truncates") eliminated the notion of ordered operations and instead BTRFS_INODE_ORDERED_DATA_CLOSE only remained as a flag indicating that a file's content should be synced to disk in case a file is truncated and any writes happen to it concurrently. In fact this intendend behavior was broken until it was fixed in f6dc45c7a93a ("Btrfs: fix filemap_flush call in btrfs_file_release"). All things considered let's give the flag a more descriptive name. Also slightly reword comments. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: annotate device name rcu_string with __rcu	Madhuparna Bhowmik
	This patch fixes the following sparse errors in fs/btrfs/super.c in function btrfs_show_devname() fs/btrfs/super.c: error: incompatible types in comparison expression (different address spaces): fs/btrfs/super.c: struct rcu_string [noderef] <asn:4> * fs/btrfs/super.c: struct rcu_string * The error was because of the following line in function btrfs_show_devname(): if (first_dev) seq_escape(m, rcu_str_deref(first_dev->name), " \t\n\\"); Annotating the btrfs_device::name member with __rcu fixes the sparse error. Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik04@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: skip devices without magic signature when mounting	Anand Jain
	Many things can happen after the device is scanned and before the device is mounted. One such thing is losing the BTRFS_MAGIC on the device. If it happens we still won't free that device from the memory and cause the userland confusion. For example: As the BTRFS_IOC_DEV_INFO still carries the device path which does not have the BTRFS_MAGIC, 'btrfs fi show' still lists device which does not belong to the filesystem anymore: $ mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb $ wipefs -a /dev/sdb # /dev/sdb does not contain magic signature $ mount -o degraded /dev/sda /btrfs $ btrfs fi show -m Label: none uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd Total devices 2 FS bytes used 128.00KiB devid 1 size 3.00GiB used 571.19MiB path /dev/sda devid 2 size 3.00GiB used 571.19MiB path /dev/sdb We need to distinguish the missing signature and invalid superblock, so add a specific error code ENODATA for that. This also fixes failure of fstest btrfs/198. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: cleanup cow block on error	Josef Bacik
	In fstest btrfs/064 a transaction abort in __btrfs_cow_block could lead to a system lockup. It gets stuck trying to write back inodes, and the write back thread was trying to lock an extent buffer: $ cat /proc/2143497/stack [<0>] __btrfs_tree_lock+0x108/0x250 [<0>] lock_extent_buffer_for_io+0x35e/0x3a0 [<0>] btree_write_cache_pages+0x15a/0x3b0 [<0>] do_writepages+0x28/0xb0 [<0>] __writeback_single_inode+0x54/0x5c0 [<0>] writeback_sb_inodes+0x1e8/0x510 [<0>] wb_writeback+0xcc/0x440 [<0>] wb_workfn+0xd7/0x650 [<0>] process_one_work+0x236/0x560 [<0>] worker_thread+0x55/0x3c0 [<0>] kthread+0x13a/0x150 [<0>] ret_from_fork+0x1f/0x30 This is because we got an error while COWing a block, specifically here if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) { ret = btrfs_reloc_cow_block(trans, root, buf, cow); if (ret) { btrfs_abort_transaction(trans, ret); return ret; } } [16402.241552] BTRFS: Transaction aborted (error -2) [16402.242362] WARNING: CPU: 1 PID: 2563188 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x376/0x540 [16402.249469] CPU: 1 PID: 2563188 Comm: fsstress Not tainted 5.9.0-rc6+ #8 [16402.249936] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [16402.250525] RIP: 0010:__btrfs_cow_block+0x376/0x540 [16402.252417] RSP: 0018:ffff9cca40e578b0 EFLAGS: 00010282 [16402.252787] RAX: 0000000000000025 RBX: 0000000000000002 RCX: ffff9132bbd19388 [16402.253278] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9132bbd19380 [16402.254063] RBP: ffff9132b41a49c0 R08: 0000000000000000 R09: 0000000000000000 [16402.254887] R10: 0000000000000000 R11: ffff91324758b080 R12: ffff91326ef17ce0 [16402.255694] R13: ffff91325fc0f000 R14: ffff91326ef176b0 R15: ffff9132815e2000 [16402.256321] FS: 00007f542c6d7b80(0000) GS:ffff9132bbd00000(0000) knlGS:0000000000000000 [16402.256973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [16402.257374] CR2: 00007f127b83f250 CR3: 0000000133480002 CR4: 0000000000370ee0 [16402.257867] Call Trace: [16402.258072] btrfs_cow_block+0x109/0x230 [16402.258356] btrfs_search_slot+0x530/0x9d0 [16402.258655] btrfs_lookup_file_extent+0x37/0x40 [16402.259155] __btrfs_drop_extents+0x13c/0xd60 [16402.259628] ? btrfs_block_rsv_migrate+0x4f/0xb0 [16402.259949] btrfs_replace_file_extents+0x190/0x820 [16402.260873] btrfs_clone+0x9ae/0xc00 [16402.261139] btrfs_extent_same_range+0x66/0x90 [16402.261771] btrfs_remap_file_range+0x353/0x3b1 [16402.262333] vfs_dedupe_file_range_one.part.0+0xd5/0x140 [16402.262821] vfs_dedupe_file_range+0x189/0x220 [16402.263150] do_vfs_ioctl+0x552/0x700 [16402.263662] __x64_sys_ioctl+0x62/0xb0 [16402.264023] do_syscall_64+0x33/0x40 [16402.264364] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [16402.264862] RIP: 0033:0x7f542c7d15cb [16402.266901] RSP: 002b:00007ffd35944ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [16402.267627] RAX: ffffffffffffffda RBX: 00000000009d1968 RCX: 00007f542c7d15cb [16402.268298] RDX: 00000000009d2490 RSI: 00000000c0189436 RDI: 0000000000000003 [16402.268958] RBP: 00000000009d2520 R08: 0000000000000036 R09: 00000000009d2e64 [16402.269726] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 [16402.270659] R13: 000000000001f000 R14: 00000000009d1970 R15: 00000000009d2e80 [16402.271498] irq event stamp: 0 [16402.271846] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [16402.272497] hardirqs last disabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0 [16402.273343] softirqs last enabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0 [16402.273905] softirqs last disabled at (0): [<0000000000000000>] 0x0 [16402.274338] ---[ end trace 737874a5a41a8236 ]--- [16402.274669] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry [16402.276179] BTRFS info (device dm-9): forced readonly [16402.277046] BTRFS: error (device dm-9) in btrfs_replace_file_extents:2723: errno=-2 No such entry [16402.278744] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry [16402.279968] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry [16402.280582] BTRFS info (device dm-9): balance: ended with status: -30 The problem here is that as soon as we allocate the new block it is locked and marked dirty in the btree inode. This means that we could attempt to writeback this block and need to lock the extent buffer. However we're not unlocking it here and thus we deadlock. Fix this by unlocking the cow block if we have any errors inside of __btrfs_cow_block, and also free it so we do not leak it. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK	Goldwyn Rodrigues
	Since we now perform direct reads using i_rwsem, we can remove this inode flag used to co-ordinate unlocked reads. The truncate call takes i_rwsem. This means it is correctly synchronized with concurrent direct reads. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	fs: remove no longer used dio_end_io()	Goldwyn Rodrigues
	Since we removed the last user of dio_end_io() when btrfs got converted to iomap infrastructure ("btrfs: switch to iomap for direct IO"), remove the helper function dio_end_io(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: return error if we're unable to read device stats	Josef Bacik
	I noticed when fixing device stats for seed devices that we simply threw away the return value from btrfs_search_slot(). This is because we may not have stat items, but we could very well get an error, and thus miss reporting the error up the chain. Fix this by returning ret if it's an actual error, and then stop trying to init the rest of the devices stats and return the error up the chain. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: init device stats for seed devices	Josef Bacik
	We recently started recording device stats across the fleet, and noticed a large increase in messages such as this BTRFS warning (device dm-0): get dev_stats failed, not yet valid on our tiers that use seed devices for their root devices. This is because we do not initialize the device stats for any seed devices if we have a sprout device and mount using that sprout device. The basic steps for reproducing are: $ mkfs seed device $ mount seed device # fill seed device $ umount seed device $ btrfstune -S 1 seed device $ mount seed device $ btrfs device add -f sprout device /mnt/wherever $ umount /mnt/wherever $ mount sprout device /mnt/wherever $ btrfs device stats /mnt/wherever This will fail with the above message in dmesg. Fix this by iterating over the fs_devices->seed if they exist in btrfs_init_dev_stats. This fixed the problem and properly reports the stats for both devices. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ rename to btrfs_device_init_dev_stats ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: remove struct extent_io_ops	Nikolay Borisov
	It's no longer used just remove the function and any related code which was initialising it for inodes. No functional changes. Removing 8 bytes from extent_io_tree in turn reduces size of other structures where it is embedded, notably btrfs_inode where it reduces size by 24 bytes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: call submit_bio_hook directly for metadata pages	Nikolay Borisov
	No need to go through a function pointer indirection simply call submit_bio_hook directly by exporting and renaming the helper to btrfs_submit_metadata_bio. This makes the code more readable and should result in somewhat faster code due to no longer paying the price for specualtive attack mitigations that come with indirect function calls. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: stop calling submit_bio_hook for data inodes	Nikolay Borisov
	Instead export and rename the function to btrfs_submit_data_bio and call it directly in submit_one_bio. This avoids paying the cost for speculative attacks mitigations and improves code readability. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: don't opencode is_data_inode in end_bio_extent_readpage	Nikolay Borisov
	Use the is_data_inode helper. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: call submit_bio_hook directly in submit_one_bio	Nikolay Borisov
	BTRFS has 2 inode types (for the purposes of the code in submit_one_bio) - ordinary data inodes (including the freespace inode) and the btree inode. Both of these implement submit_bio_hook so btrfsic_submit_bio can never be called from submit_one_bio so just remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: remove extent_io_ops::readpage_end_io_hook	Nikolay Borisov
	It's no longer used so let's remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: replace readpage_end_io_hook with direct calls	Nikolay Borisov
	Don't call readpage_end_io_hook for the btree inode. Instead of relying on indirect calls to implement metadata buffer validation simply check if the inode whose page we are processing equals the btree inode. If it does call the necessary function. This is an improvement in 2 directions: 1. We aren't paying the penalty of indirect calls in a post-speculation attacks world. 2. The function is now named more explicitly so it's obvious what's going on This is in preparation to removing struct extent_io_ops altogether. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: send, recompute reference path after orphanization of a directory	Filipe Manana
	During an incremental send, when an inode has multiple new references we might end up emitting rename operations for orphanizations that have a source path that is no longer valid due to a previous orphanization of some directory inode. This causes the receiver to fail since it tries to rename a path that does not exists. Example reproducer: $ cat reproducer.sh #!/bin/bash mkfs.btrfs -f /dev/sdi >/dev/null mount /dev/sdi /mnt/sdi touch /mnt/sdi/f1 touch /mnt/sdi/f2 mkdir /mnt/sdi/d1 mkdir /mnt/sdi/d1/d2 # Filesystem looks like: # # . (ino 256) # \|----- f1 (ino 257) # \|----- f2 (ino 258) # \|----- d1/ (ino 259) # \|----- d2/ (ino 260) btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1 btrfs send -f /tmp/snap1.send /mnt/sdi/snap1 # Now do a series of changes such that: # # ) inode 258 has one new hardlink and the previous name changed # # ) both names conflict with the old names of two other inodes: # # 1) the new name "d1" conflicts with the old name of inode 259, # under directory inode 256 (root) # # 2) the new name "d2" conflicts with the old name of inode 260 # under directory inode 259 # # ) inodes 259 and 260 now have the old names of inode 258 # # ) inode 257 is now located under inode 260 - an inode with a number # smaller than the inode (258) for which we created a second hard # link and swapped its names with inodes 259 and 260 # ln /mnt/sdi/f2 /mnt/sdi/d1/f2_link mv /mnt/sdi/f1 /mnt/sdi/d1/d2/f1 # Swap d1 and f2. mv /mnt/sdi/d1 /mnt/sdi/tmp mv /mnt/sdi/f2 /mnt/sdi/d1 mv /mnt/sdi/tmp /mnt/sdi/f2 # Swap d2 and f2_link mv /mnt/sdi/f2/d2 /mnt/sdi/tmp mv /mnt/sdi/f2/f2_link /mnt/sdi/f2/d2 mv /mnt/sdi/tmp /mnt/sdi/f2/f2_link # Filesystem now looks like: # # . (ino 256) # \|----- d1 (ino 258) # \|----- f2/ (ino 259) # \|----- f2_link/ (ino 260) # \| \|----- f1 (ino 257) # \| # \|----- d2 (ino 258) btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2 btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2 mkfs.btrfs -f /dev/sdj >/dev/null mount /dev/sdj /mnt/sdj btrfs receive -f /tmp/snap1.send /mnt/sdj btrfs receive -f /tmp/snap2.send /mnt/sdj umount /mnt/sdi umount /mnt/sdj When executed the receive of the incremental stream fails: $ ./reproducer.sh Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1' At subvol /mnt/sdi/snap1 Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2' At subvol /mnt/sdi/snap2 At subvol snap1 At snapshot snap2 ERROR: rename d1/d2 -> o260-6-0 failed: No such file or directory This happens because: 1) When processing inode 257 we end up computing the name for inode 259 because it is an ancestor in the send snapshot, and at that point it still has its old name, "d1", from the parent snapshot because inode 259 was not yet processed. We then cache that name, which is valid until we start processing inode 259 (or set the progress to 260 after processing its references); 2) Later we start processing inode 258 and collecting all its new references into the list sctx->new_refs. The first reference in the list happens to be the reference for name "d1" while the reference for name "d2" is next (the last element of the list). We compute the full path "d1/d2" for this second reference and store it in the reference (its ->full_path member). The path used for the new parent directory was "d1" and not "f2" because inode 259, the new parent, was not yet processed; 3) When we start processing the new references at process_recorded_refs() we start with the first reference in the list, for the new name "d1". Because there is a conflicting inode that was not yet processed, which is directory inode 259, we orphanize it, renaming it from "d1" to "o259-6-0"; 4) Then we start processing the new reference for name "d2", and we realize it conflicts with the reference of inode 260 in the parent snapshot. So we issue an orphanization operation for inode 260 by emitting a rename operation with a destination path of "o260-6-0" and a source path of "d1/d2" - this source path is the value we stored in the reference earlier at step 2), corresponding to the ->full_path member of the reference, however that path is no longer valid due to the orphanization of the directory inode 259 in step 3). This makes the receiver fail since the path does not exists, it should have been "o259-6-0/d2". Fix this by recomputing the full path of a reference before emitting an orphanization if we previously orphanized any directory, since that directory could be a parent in the new path. This is a rare scenario so keeping it simple and not checking if that previously orphanized directory is in fact an ancestor of the inode we are trying to orphanize. A test case for fstests follows soon. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07	btrfs: send, orphanize first all conflicting inodes when processing references	Filipe Manana
	When doing an incremental send it is possible that when processing the new references for an inode we end up issuing rename or link operations that have an invalid path, which contains the orphanized name of a directory before we actually orphanized it, causing the receiver to fail. The following reproducer triggers such scenario: $ cat reproducer.sh #!/bin/bash mkfs.btrfs -f /dev/sdi >/dev/null mount /dev/sdi /mnt/sdi touch /mnt/sdi/a touch /mnt/sdi/b mkdir /mnt/sdi/testdir # We want "a" to have a lower inode number then "testdir" (257 vs 259). mv /mnt/sdi/a /mnt/sdi/testdir/a # Filesystem looks like: # # . (ino 256) # \|----- testdir/ (ino 259) # \| \|----- a (ino 257) # \| # \|----- b (ino 258) btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1 btrfs send -f /tmp/snap1.send /mnt/sdi/snap1 # Now rename 259 to "testdir_2", then change the name of 257 to # "testdir" and make it a direct descendant of the root inode (256). # Also create a new link for inode 257 with the old name of inode 258. # By swapping the names and location of several inodes and create a # nasty dependency chain of rename and link operations. mv /mnt/sdi/testdir/a /mnt/sdi/a2 touch /mnt/sdi/testdir/a mv /mnt/sdi/b /mnt/sdi/b2 ln /mnt/sdi/a2 /mnt/sdi/b mv /mnt/sdi/testdir /mnt/sdi/testdir_2 mv /mnt/sdi/a2 /mnt/sdi/testdir # Filesystem now looks like: # # . (ino 256) # \|----- testdir_2/ (ino 259) # \| \|----- a (ino 260) # \| # \|----- testdir (ino 257) # \|----- b (ino 257) # \|----- b2 (ino 258) btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2 btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2 mkfs.btrfs -f /dev/sdj >/dev/null mount /dev/sdj /mnt/sdj btrfs receive -f /tmp/snap1.send /mnt/sdj btrfs receive -f /tmp/snap2.send /mnt/sdj umount /mnt/sdi umount /mnt/sdj When running the reproducer, the receive of the incremental send stream fails: $ ./reproducer.sh Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1' At subvol /mnt/sdi/snap1 Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2' At subvol /mnt/sdi/snap2 At subvol snap1 At snapshot snap2 ERROR: link b -> o259-6-0/a failed: No such file or directory The problem happens because of the following: 1) Before we start iterating the list of new references for inode 257, we generate its current path and store it at @valid_path, done at the very beginning of process_recorded_refs(). The generated path is "o259-6-0/a", containing the orphanized name for inode 259; 2) Then we iterate over the list of new references, which has the references "b" and "testdir" in that specific order; 3) We process reference "b" first, because it is in the list before reference "testdir". We then issue a link operation to create the new reference "b" using a target path corresponding to the content at @valid_path, which corresponds to "o259-6-0/a". However we haven't yet orphanized inode 259, its name is still "testdir", and not "o259-6-0". The orphanization of 259 did not happen yet because we will process the reference named "testdir" for inode 257 only in the next iteration of the loop that goes over the list of new references. Fix the issue by having a preliminar iteration over all the new references at process_recorded_refs(). This iteration is responsible only for doing the orphanization of other inodes that have and old reference that conflicts with one of the new references of the inode we are currently processing. The emission of rename and link operations happen now in the next iteration of the new references. A test case for fstests will follow soon. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>