linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2021-12-21	uml/i386: missing include in barrier.h	Al Viro
	we need cpufeatures.h there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: stop polluting the namespace with registers.h contents	Al Viro
	Only one extern in there is needed in processor-generic.h, and it's not needed anywhere else. So move it over there and get rid of the include in processor-generic.h, adding includes of registers.h to the few files that need the declarations in it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	logic_io instance of iounmap() needs volatile on argument	Al Viro
	... same as the rest of implementations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: move amd64 variant of mmap(2) to arch/x86/um/syscalls_64.c	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	uml: trim unused junk from arch/x86/um/sys_call_table_*.c	Al Viro
	a bunch of detritus there - definitions that are never expanded or checked. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: virtio_uml: Fix time-travel external time propagation	Johannes Berg
	When creating an external event, the current time needs to be propagated to other participants of a simulation. This is done in the places here where we kick a virtq etc. However, it must be done for _all_ external events, and that includes making the initial socket connection and later closing it. Call time_travel_propagate_time() to do this before making or closing the socket connection. Apparently, at least for the initial connection creation, due to the remote side in my use cases using microseconds (rather than nanoseconds), this wasn't a problem yet; only started failing between 5.14-rc1 and 5.15-rc1 (didn't test others much), or possibly depending on the configuration, where more delays happen before the virtio devices are initialized. Fixes: 88ce64249233 ("um: Implement time-travel=ext") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	lib/logic_iomem: Fix operation on 32-bit	Johannes Berg
	On 32-bit, the first entry might be at 0/NULL, but that's strange and leads to issues, e.g. where we check "if (ret)". Use a IOREMAP_BIAS/IOREMAP_MASK of 0x80000000UL to avoid this. This then requires reducing the number of areas (via MAX_AREAS), but we still have 128 areas, which is enough. Fixes: ca2e334232b6 ("lib: add iomem emulation (logic_iomem)") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	lib/logic_iomem: Fix 32-bit build	Johannes Berg
	On a 32-bit build, the (unsigned long long) casts throw warnings (or errors) due to being to a different integer size. Cast to uintptr_t first (with the __force for sparse) and then further to get the consistent print on 32 and 64-bit. Fixes: ca2e334232b6 ("lib: add iomem emulation (logic_iomem)") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: virt-pci: Fix 32-bit compile	Johannes Berg
	There were a few 32-bit compile warnings that of course turned into errors with -Werror, fix the 32-bit build. Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: - Fix for compilation of selftests on non-x86 architectures - Fix for kvm_run->if_flag on SEV-ES - Fix for page table use-after-free if yielding during exit_mm() - Improve behavior when userspace starts a nested guest with invalid state - Fix missed wakeup with assigned devices but no VT-d posted interrupts - Do not tell userspace to save/restore an unsupported PMU MSR * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU KVM: selftests: Add test to verify TRIPLE_FAULT on invalid L2 guest state KVM: VMX: Fix stale docs for kvm-intel.emulate_invalid_guest_state KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if emulation is required KVM: VMX: Always clear vmx->fail on emulation_required selftests: KVM: Fix non-x86 compiling KVM: x86: Always set kvm_run->if_flag KVM: x86/mmu: Don't advance iterator after restart due to yielding KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all
2021-12-21	um: gitignore: Add kernel/capflags.c	Johannes Berg
	This file is generated, we should ignore it. Fixes: d8fb32f4790f ("um: Add support for host CPU flags and alignment") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: anton.ivanov@cambridgegreys.com Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: registers: Rename function names to avoid conflicts and build problems	Randy Dunlap
	The function names init_registers() and restore_registers() are used in several net/ethernet/ and gpu/drm/ drivers for other purposes (not calls to UML functions), so rename them. This fixes multiple build errors. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: linux-um@lists.infradead.org Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: Replace if (cond) BUG() with BUG_ON()	Changcheng Deng
	Fix the following coccinelle reports: ./arch/um/kernel/mem.c:89:2-5: WARNING: Use BUG_ON instead of if condition followed by BUG. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	parisc: Fix mask used to select futex spinlock	John David Anglin
	The address bits used to select the futex spinlock need to match those used in the LWS code in syscall.S. The mask 0x3f8 only selects 7 bits. It should select 8 bits. This change fixes the glibc nptl/tst-cond24 and nptl/tst-cond25 tests. Signed-off-by: John David Anglin <dave.anglin@bell.net> Fixes: 53a42b6324b8 ("parisc: Switch to more fine grained lws locks") Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Helge Deller <deller@gmx.de>
2021-12-21	selinux: minor tweaks to selinux_add_opt()	Paul Moore
	Two minor edits to selinux_add_opt(): use "sizeof(*ptr)" instead of "sizeof(type)" in the kzalloc() call, and rename the "Einval" jump target to "err" for the sake of consistency. Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-21	uml: x86: add FORCE to user_constants.h	Johannes Berg
	The build system has started warning when filechk is called without FORCE: arch/x86/um/Makefile:44: FORCE prerequisite is missing Add FORCE to make sure the file is checked/rebuilt when necessary (and to quiet up the warning.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: rename set_signals() to um_set_signals()	Johannes Berg
	Rename set_signals() as there's at least one driver that uses the same name and can now be built on UM due to PCI support, and thus we can get symbol conflicts. Also rename set_signals_trace() to be consistent. Reported-by: kernel test robot <lkp@intel.com> Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	um: fix ndelay/udelay defines	Johannes Berg
	Many places in the kernel use 'udelay' as an identifier, and are broken with the current "#define udelay um_udelay". Fix this by adding an argument to the macro, and do the same to 'ndelay' as well, just in case. Fixes: 0bc8fb4dda2b ("um: Implement ndelay/udelay in time-travel mode") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-12-21	parisc: Correct completer in lws start	John David Anglin
	The completer in the "or,ev %r1,%r30,%r30" instruction is reversed, so we are not clipping the LWS number when we are called from a 32-bit process (W=0). We need to nulify the following depdi instruction when the least-significant bit of %r30 is 1. If the %r20 register is not clipped, a user process could perform a LWS call that would branch to an undefined location in the kernel and potentially crash the machine. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Helge Deller <deller@gmx.de>
2021-12-21	Merge tag 'nfsd-5.16-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: "Address a buffer overrun reported by Anatoly Trosinenko" * tag 'nfsd-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Fix READDIR buffer overflow
2021-12-21	dt-bindings: arm,cci-400: Drop the PL330 from example	Rob Herring
	The PL330 was commented out because its binding wasn't converted to a schema. With the binding converted, the example now needs several updates. However, while it's possible that the PL330 has a 'cci-control-port', there aren't any platforms upstream which do. So rather than allowing 'cci-control-port' in the PL330 binding, let's just drop the example. Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>
2021-12-21	dt-bindings: arm: ux500: Document missing compatibles	Stanislav Jakubek
	These compatibles are used in Ux500 device trees, but were not documented so far. Add them to the schema to document them. Signed-off-by: Stanislav Jakubek <stano.jakubek@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211218144927.GA6388@standask-GA-A55M-S2HP
2021-12-21	dt-bindings: power: reset: gpio-restart: Convert to json-schema	Thierry Reding
	Convert the GPIO restart bindings from the free-form text format to json-schema. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211217170042.2740058-1-thierry.reding@gmail.com
2021-12-21	selinux: fix potential memleak in selinux_add_opt()	Bernard Zhao
	This patch try to fix potential memleak in error branch. Fixes: ba6418623385 ("selinux: new helper - selinux_add_opt()") Signed-off-by: Bernard Zhao <bernard@vivo.com> [PM: tweak the subject line, add Fixes tag] Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-21	drm/i915/guc: Request RP0 before loading firmware	Vinay Belgaumkar
	By default, GT (and GuC) run at RPn. Requesting for RP0 before firmware load can speed up DMA and HuC auth as well. In addition to writing to 0xA008, we also need to enable swreq in 0xA024 so that Punit will pay heed to our request. SLPC will restore the frequency back to RPn after initialization, but we need to manually do that for the non-SLPC path. We don't need a manual override in the SLPC disabled case, just use the intel_rps_set function to ensure consistent RPS state. Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211216233022.21351-1-vinay.belgaumkar@intel.com
2021-12-21	iomap: Inline __iomap_zero_iter into its caller	Matthew Wilcox (Oracle)
	To make the merge easier, replicate the inlining of __iomap_zero_iter() into iomap_zero_iter() that is currently in the nvdimm tree. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-12-21	platform/x86: asus-wmi: Reshuffle headers for better maintenance	Andy Shevchenko
	Reshuffle headers in alphabetical order for better maintenance. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211210163009.19894-3-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/x86: asus-wmi: Split MODULE_AUTHOR() on per author basis	Andy Shevchenko
	There are as many as needed MODULE_AUTHOR() macro entries allowed in the single driver. Split author list to a few macro entries. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211210163009.19894-2-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/x86: asus-wmi: Join string literals back	Andy Shevchenko
	For easy grepping on debug purposes join string literals back in the messages. No functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20211210163009.19894-1-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/x86: apple-gmux: use resource_size() with res	Wang Qing
	This should be (res->end - res->start + 1) here actually, use resource_size() derectly. Signed-off-by: Wang Qing <wangqing@vivo.com> Link: https://lore.kernel.org/r/1639484316-75873-1-git-send-email-wangqing@vivo.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/x86: amd-pmc: only use callbacks for suspend	Mario Limonciello
	This driver is intended to be used exclusively for suspend to idle so callbacks to send OS_HINT during hibernate and S5 will set OS_HINT at the wrong time leading to an undefined behavior. Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20211210143529.10594-1-mario.limonciello@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/mellanox: mlxbf-pmc: Fix an IS_ERR() vs NULL bug in ↵	Miaoqian Lin
	mlxbf_pmc_map_counters The devm_ioremap() function returns NULL on error, it doesn't return error pointers. Also according to doc of device_property_read_u64_array, values in info array are properties of device or NULL. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20211210070753.10761-1-linmq006@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	rtw88: support SAR via kernel common API	Zong-Zhe Yang
	Register cfg80211_sar_capa with type NL80211_SAR_TYPE_POWER and four frequency ranges for configurations in unit of 0.25 dBm. And handle callback set_sar_specs. Originally, TX power has three main parameters, i.e. power base, power by rate, and power limit. The formula can be simply considered as TX power = power base + min(power by rate, power limit). With the support of SAR which can be treated as another power limit, there is one more parameter for TX power. And the formula will evolve into TX power = power base + min(power by rate, power limit, power sar). Besides, debugfs tx_pwr_tbl is also refined to show SAR information. The following is an example for the difference. Before supporting SAR, ----------------------------------- ... path rate pwr base (byr lmt ) rem A CCK_1M 66(0x42) 78 -12 ( 12 -12) 0 A CCK_2M 66(0x42) 78 -12 ( 8 -12) 0 ... ----------------------------------- After supporting SAR and making some configurations, ----------------------------------- ... path rate pwr base (byr lmt sar ) rem A CCK_1M 62(0x3e) 78 -16 ( 12 -12 -16) 0 A CCK_2M 62(0x3e) 78 -16 ( 8 -12 -16) 0 ... ----------------------------------- Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211220093656.65312-1-pkshih@realtek.com
2021-12-21	rtw88: 8822c: add ieee80211_ops::hw_scan	Po-Hao Huang
	Declare this function allows us to use customized scanning policy. By doing so we can be more time efficient on each scan. In order to make existing coex mechanism work as usual, firmware notifies driver on each channel switch event, then decide antenna ownership based on the current channel/band. Do note that this new mechanism affects throughput more than the sw_scan we used to have, but the overall average throughput is not affected since each scan take less time. Since the firmware size is limited, we only support probe requests with custom IEs length under 128 bytes for now, if any user space tools requires more than that, we'll introduce related changes afterwards. For backward compatibility, we fallback to sw_scan when using older firmware that does not support this feature. Signed-off-by: Po-Hao Huang <phhuang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211221085010.39421-1-pkshih@realtek.com
2021-12-21	Merge tag 'iwlwifi-next-for-kalle-2021-12-21-v2' of ↵	Kalle Valo
	git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next wlwifi patches for v5.17 v2 * Support for Time-Aware-SAR (TAS) as read from the BIOS; * Fix scan timeout issue when 6GHz is enabled; * Work continues for new HW family Bz; * Support for Optimized Connectivity Experience (OCE) scan; * A bunch of FW debugging improvements and fixes; * Fix one 32-bit compilation issue; * Some RX changes for new HW family * Some fixes for 6 GHz scan; * Fix SAR table fixes with newer platforms; * Fix early restart crash; * Small fix in the debugging code; * Add new Killer device IDs; * Datapath updates for Bz family continues; * A couple of important fixes in iwlmei; * Some other small fixes, clean-ups and improvements.
2021-12-21	platform/x86: think-lmi: Prevent underflow in index_store()	Dan Carpenter
	There needs to be a check to prevent negative offsets for setting->index. I have reviewed this code and I think that the "if (block->instance_count <= instance)" check in __query_block() will prevent this from resulting in an out of bounds access. But it's still worth fixing. Fixes: 640a5fa50a42 ("platform/x86: think-lmi: Opcode support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211217071209.GF26548@kili Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/x86: apple-gmux: use resource_size() with res	Wang Qing
	This should be (res->end - res->start + 1) here actually, use resource_size() derectly. Signed-off-by: Wang Qing <wangqing@vivo.com> Link: https://lore.kernel.org/r/1639484316-75873-1-git-send-email-wangqing@vivo.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/x86: amd-pmc: only use callbacks for suspend	Mario Limonciello
	This driver is intended to be used exclusively for suspend to idle so callbacks to send OS_HINT during hibernate and S5 will set OS_HINT at the wrong time leading to an undefined behavior. Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20211210143529.10594-1-mario.limonciello@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/mellanox: mlxbf-pmc: Fix an IS_ERR() vs NULL bug in ↵	Miaoqian Lin
	mlxbf_pmc_map_counters The devm_ioremap() function returns NULL on error, it doesn't return error pointers. Also according to doc of device_property_read_u64_array, values in info array are properties of device or NULL. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20211210070753.10761-1-linmq006@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list()	Dan Carpenter
	The "bufsize" comes from the root user. If "bufsize" is negative then, because of type promotion, neither of the validation checks at the start of the function are able to catch it: if (bufsize < sizeof(struct xfs_attrlist) \|\| bufsize > XFS_XATTR_LIST_MAX) return -EINVAL; This means "bufsize" will trigger (WARN_ON_ONCE(size > INT_MAX)) in kvmalloc_node(). Fix this by changing the type from int to size_t. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-21	xfs: Fix comments mentioning xfs_ialloc	Yang Xu
	Since kernel commit 1abcf261016e ("xfs: move on-disk inode allocation out of xfs_ialloc()"), xfs_ialloc has been renamed to xfs_init_new_inode. So update this in comments. Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-21	xfs: check sb_meta_uuid for dabuf buffer recovery	Dave Chinner
	Got a report that a repeated crash test of a container host would eventually fail with a log recovery error preventing the system from mounting the root filesystem. It manifested as a directory leaf node corruption on writeback like so: XFS (loop0): Mounting V5 Filesystem XFS (loop0): Starting recovery (logdev: internal) XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=....... 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...).... 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.# 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C...... 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................ 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A...... 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A.... XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed Tracing indicated that we were recovering changes from a transaction at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57. That is, log recovery was overwriting a buffer with newer changes on disk than was in the transaction. Tracing indicated that we were hitting the "recovery immediately" case in xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the buffer. The code was extracting the LSN correctly, then ignoring it because the UUID in the buffer did not match the superblock UUID. The problem arises because the UUID check uses the wrong UUID - it should be checking the sb_meta_uuid, not sb_uuid. This filesystem has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the correct matching sb_meta_uuid in it, it's just the code checked it against the wrong superblock uuid. The is no corruption in the filesystem, and failing to recover the buffer due to a write verifier failure means the recovery bug did not propagate the corruption to disk. Hence there is no corruption before or after this bug has manifested, the impact is limited simply to an unmountable filesystem.... This was missed back in 2015 during an audit of incorrect sb_uuid usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs to validate against sb_meta_uuid") that fixed the magic32 buffers to validate against sb_meta_uuid instead of sb_uuid. It missed the magicda buffers.... Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-12-21	xfs: fix a bug in the online fsck directory leaf1 bestcount check	Darrick J. Wong
	When xfs_scrub encounters a directory with a leaf1 block, it tries to validate that the leaf1 block's bestcount (aka the best free count of each directory data block) is the correct size. Previously, this author believed that comparing bestcount to the directory isize (since directory data blocks are under isize, and leaf/bestfree blocks are above it) was sufficient. Unfortunately during testing of online repair, it was discovered that it is possible to create a directory with a hole between the last directory block and isize. The directory code seems to handle this situation just fine and xfs_repair doesn't complain, which effectively makes this quirk part of the disk format. Fix the check to work properly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-12-21	xfs: only run COW extent recovery when there are no live extents	Darrick J. Wong
	As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount <filesystem> # (0) fsstress <run only readonly ops> & # (1) while true; do fsstress <run all ops> mount -o remount,ro # (2) fsstress <run only readonly ops> mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. In the first patch, we fixed the race condition in (2) so that (A) will always flush the COW fork. In this second patch, we move the _recover_cow call to the initial mount call in (0) for safety. As mentioned previously, xfs_reflink_recover_cow walks the refcount btree looking for COW staging extents, and frees them. This was intended to be run at mount time (when we know there are no live inodes) to clean up any leftover staging events that may have been left behind during an unclean shutdown. As a time "optimization" for readonly mounts, we deferred this to the ro->rw transition, not realizing that any failure to clean all COW forks during a rw->ro transition would result in catastrophic corruption. Therefore, remove this optimization and only run the recovery routine when we're guaranteed not to have any COW staging extents anywhere, which means we always run this at mount time. While we're at it, move the callsite to xfs_log_mount_finish because any refcount btree expansion (however unlikely given that we're removing records from the right side of the index) must be fed by a per-AG reservation, which doesn't exist in its current location. Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-12-21	xfs: don't expose internal symlink metadata buffers to the vfs	Darrick J. Wong
	Ian Kent reported that for inline symlinks, it's possible for vfs_readlink to hang on to the target buffer returned by _vn_get_link_inline long after it's been freed by xfs inode reclaim. This is a layering violation -- we should never expose XFS internals to the VFS. When the symlink has a remote target, we allocate a separate buffer, copy the internal information, and let the VFS manage the new buffer's lifetime. Let's adapt the inline code paths to do this too. It's less efficient, but fixes the layering violation and avoids the need to adapt the if_data lifetime to rcu rules. Clearly I don't care about readlink benchmarks. As a side note, this fixes the minor locking violation where we can access the inode data fork without taking any locks; proper locking (and eliminating the possibility of having to switch inode_operations on a live inode) is essential to online repair coordinating repairs correctly. Reported-by: Ian Kent <raven@themaw.net> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-12-21	xfs: fix quotaoff mutex usage now that we don't support disabling it	Darrick J. Wong
	Prior to commit 40b52225e58c ("xfs: remove support for disabling quota accounting on a mounted file system"), we used the quotaoff mutex to protect dquot operations against quotaoff trying to pull down dquots as part of disabling quota. Now that we only support turning off quota enforcement, the quotaoff mutex only protects changes in m_qflags/sb_qflags. We don't need it to protect dquots, which means we can remove it from setqlimits and the dquot scrub code. While we're at it, fix the function that forces quotacheck, since it should have been taking the quotaoff mutex. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-12-21	xfs: shut down filesystem if we xfs_trans_cancel with deferred work items	Darrick J. Wong
	While debugging some very strange rmap corruption reports in connection with the online directory repair code. I root-caused the error to the following incorrect sequence: <start repair transaction> <expand directory, causing a deferred rmap to be queued> <roll transaction> <cancel transaction> Obviously, we should have committed the transaction instead of cancelling it. Thinking more broadly, however, xfs_trans_cancel should have warned us that we were throwing away work item that we already committed to performing. This is not correct, and we need to shut down the filesystem. Change xfs_trans_cancel to complain in the loudest manner if we're cancelling any transaction with deferred work items attached. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-12-21	platform/x86: amd-pmc: Add support for AMD Smart Trace Buffer	Sanket Goswami
	STB (Smart Trace Buffer), is a debug trace buffer that isolates the failures by analyzing the last running feature of a system. This non-intrusive way always runs in the background and stores the trace into the SoC. This patch enables the STB feature by passing module param "enable_stb=1" while loading the driver and provides mechanism to access the STB buffer using the read and write routines. Co-developed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Sanket Goswami <Sanket.Goswami@amd.com> Link: https://lore.kernel.org/r/20211130112318.92850-3-Sanket.Goswami@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	platform/x86: amd-pmc: Simplify error handling and store the pci_dev in ↵	Sanket Goswami
	amd_pmc_dev structure Handle error-exits in the amd_pmc_probe() to avoid duplication and store the root port information in amd_pmc_probe() so that the information can be used across multiple routines. Suggested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sanket Goswami <Sanket.Goswami@amd.com> Link: https://lore.kernel.org/r/20211130112318.92850-2-Sanket.Goswami@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-12-21	KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU	Sean Christopherson
	Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() \| \|-> eventfd_signal() \| \|-> ... \| \|-> irqfd_wakeup() \| \|->kvm_arch_set_irq_inatomic() \| \|-> kvm_irq_delivery_to_apic_fast() \| \|-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reported-by: Longpeng (Mike) <longpeng2@huawei.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>