linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-07-23	netfilter: nfnl_hook: fix unused variable warning	Arnd Bergmann
	The only user of this variable is in an #ifdef: net/netfilter/nfnetlink_hook.c: In function 'nfnl_hook_entries_head': net/netfilter/nfnetlink_hook.c:177:28: error: unused variable 'netdev' [-Werror=unused-variable] Fixes: e2cf17d3774c ("netfilter: add new hook nfnl subsystem") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23	tracing/histogram: Rename "cpu" to "common_cpu"	Steven Rostedt (VMware)
	Currently the histogram logic allows the user to write "cpu" in as an event field, and it will record the CPU that the event happened on. The problem with this is that there's a lot of events that have "cpu" as a real field, and using "cpu" as the CPU it ran on, makes it impossible to run histograms on the "cpu" field of events. For example, if I want to have a histogram on the count of the workqueue_queue_work event on its cpu field, running: ># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger Gives a misleading and wrong result. Change the command to "common_cpu" as no event should have "common_*" fields as that's a reserved name for fields used by all events. And this makes sense here as common_cpu would be a field used by all events. Now we can even do: ># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger ># cat events/workqueue/workqueue_queue_work/hist # event histogram # # trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active] # { common_cpu: 0, cpu: 2 } hitcount: 1 { common_cpu: 0, cpu: 4 } hitcount: 1 { common_cpu: 7, cpu: 7 } hitcount: 1 { common_cpu: 0, cpu: 7 } hitcount: 1 { common_cpu: 0, cpu: 1 } hitcount: 1 { common_cpu: 0, cpu: 6 } hitcount: 2 { common_cpu: 0, cpu: 5 } hitcount: 2 { common_cpu: 1, cpu: 1 } hitcount: 4 { common_cpu: 6, cpu: 6 } hitcount: 4 { common_cpu: 5, cpu: 5 } hitcount: 14 { common_cpu: 4, cpu: 4 } hitcount: 26 { common_cpu: 0, cpu: 0 } hitcount: 39 { common_cpu: 2, cpu: 2 } hitcount: 184 Now for backward compatibility, I added a trick. If "cpu" is used, and the field is not found, it will fall back to "common_cpu" and work as it did before. This way, it will still work for old programs that use "cpu" to get the actual CPU, but if the event has a "cpu" as a field, it will get that event's "cpu" field, which is probably what it wants anyway. I updated the tracefs/README to include documentation about both the common_timestamp and the common_cpu. This way, if that text is present in the README, then an application can know that common_cpu is supported over just plain "cpu". Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 8b7622bf94a44 ("tracing: Add cpu field for hist triggers") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-23	tracing: Synthetic event field_pos is an index not a boolean	Steven Rostedt (VMware)
	Performing the following: ># echo 'wakeup_lat s32 pid; u64 delta; char wake_comm[]' > synthetic_events ># echo 'hist:keys=pid:__arg__1=common_timestamp.usecs' > events/sched/sched_waking/trigger ># echo 'hist:keys=next_pid:pid=next_pid,delta=common_timestamp.usecs-$__arg__1:onmatch(sched.sched_waking).trace(wakeup_lat,$pid,$delta,prev_comm)'\ > events/sched/sched_switch/trigger ># echo 1 > events/synthetic/enable Crashed the kernel: BUG: kernel NULL pointer dereference, address: 000000000000001b #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.13.0-rc5-test+ #104 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:strlen+0x0/0x20 Code: f6 82 80 2b 0b bc 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2b 0b bc 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 9 f8 c3 31 RSP: 0018:ffffaa75000d79d0 EFLAGS: 00010046 RAX: 0000000000000002 RBX: ffff9cdb55575270 RCX: 0000000000000000 RDX: ffff9cdb58c7a320 RSI: ffffaa75000d7b40 RDI: 000000000000001b RBP: ffffaa75000d7b40 R08: ffff9cdb40a4f010 R09: ffffaa75000d7ab8 R10: ffff9cdb4398c700 R11: 0000000000000008 R12: ffff9cdb58c7a320 R13: ffff9cdb55575270 R14: ffff9cdb58c7a000 R15: 0000000000000018 FS: 0000000000000000(0000) GS:ffff9cdb5aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001b CR3: 00000000c0612006 CR4: 00000000001706e0 Call Trace: trace_event_raw_event_synth+0x90/0x1d0 action_trace+0x5b/0x70 event_hist_trigger+0x4bd/0x4e0 ? cpumask_next_and+0x20/0x30 ? update_sd_lb_stats.constprop.0+0xf6/0x840 ? __lock_acquire.constprop.0+0x125/0x550 ? find_held_lock+0x32/0x90 ? sched_clock_cpu+0xe/0xd0 ? lock_release+0x155/0x440 ? update_load_avg+0x8c/0x6f0 ? enqueue_entity+0x18a/0x920 ? __rb_reserve_next+0xe5/0x460 ? ring_buffer_lock_reserve+0x12a/0x3f0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x1ae/0x240 trace_event_raw_event_sched_switch+0x114/0x170 __traceiter_sched_switch+0x39/0x50 __schedule+0x431/0xb00 schedule_idle+0x28/0x40 do_idle+0x198/0x2e0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb The reason is that the dynamic events array keeps track of the field position of the fields array, via the field_pos variable in the synth_field structure. Unfortunately, that field is a boolean for some reason, which means any field_pos greater than 1 will be a bug (in this case it was 2). Link: https://lkml.kernel.org/r/20210721191008.638bce34@oasis.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-23	arm64: dts: armada-3720-turris-mox: remove mrvl,i2c-fast-mode	Pali Rohár
	Some SFP modules are not detected when i2c-fast-mode is enabled even when clock-frequency is already set to 100000. The I2C bus violates the timing specifications when run in fast mode. So disable fast mode on Turris Mox. Same change was already applied for uDPU (also Armada 3720 board with SFP) in commit fe3ec631a77d ("arm64: dts: uDPU: remove i2c-fast-mode"). Fixes: 7109d817db2e ("arm64: dts: marvell: add DTS for Turris Mox") Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: Marek Behún <kabel@kernel.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2021-07-23	netfilter: nft_nat: allow to specify layer 4 protocol NAT only	Pablo Neira Ayuso
	nft_nat reports a bogus EAFNOSUPPORT if no layer 3 information is specified. Fixes: d07db9884a5f ("netfilter: nf_tables: introduce nft_validate_register_load()") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23	netfilter: conntrack: adjust stop timestamp to real expiry value	Florian Westphal
	In case the entry is evicted via garbage collection there is delay between the timeout value and the eviction event. This adjusts the stop value based on how much time has passed. Fixes: b87a2f9199ea82 ("netfilter: conntrack: add gc worker to remove timed-out entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23	netfilter: nft_last: avoid possible false sharing	Pablo Neira Ayuso
	Use the idiom described in: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance Moreover, prevent a compiler optimization. Fixes: 836382dc2471 ("netfilter: nf_tables: add last expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23	netfilter: flowtable: avoid possible false sharing	Pablo Neira Ayuso
	The flowtable follows the same timeout approach as conntrack, use the same idiom as in cc16921351d8 ("netfilter: conntrack: avoid same-timeout update") but also include the fix provided by e37542ba111f ("netfilter: conntrack: avoid possible false sharing"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-23	ext4: remove conflicting comment from __ext4_forget	Guoqing Jiang
	We do a bforget and return for no journal case, so let's remove this conflict comment. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Link: https://lore.kernel.org/r/20210714055940.1553705-1-guoqing.jiang@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-23	ext4: fix potential uninitialized access to retval in kmmpd	Ye Bin
	if (!ext4_has_feature_mmp(sb)) then retval can be unitialized before we jump to the wait_to_exit label. Fixes: 61bb4a1c417e ("ext4: fix possible UAF when remounting r/o a mmp-protected file system") Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20210713022728.2533770-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-07-23	arm64: dts: armada-3720-turris-mox: fixed indices for the SDHC controllers	Vladimir Oltean
	Since drivers/mmc/host/sdhci-xenon.c declares the PROBE_PREFER_ASYNCHRONOUS probe type, it is not guaranteed whether /dev/mmcblk0 will belong to sdhci0 or sdhci1. In turn, this will break booting by: root=/dev/mmcblk0p1 Fix the issue by adding aliases so that the old MMC controller indices are preserved. Fixes: 7320915c8861 ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.14") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2021-07-23	cfg80211: Fix possible memory leak in function cfg80211_bss_update	Nguyen Dinh Phi
	When we exceed the limit of BSS entries, this function will free the new entry, however, at this time, it is the last door to access the inputed ies, so these ies will be unreferenced objects and cause memory leak. Therefore we should free its ies before deallocating the new entry, beside of dropping it from hidden_list. Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Link: https://lore.kernel.org/r/20210628132334.851095-1-phind.uet@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23	nl80211: limit band information in non-split data	Johannes Berg
	In non-split data, we shouldn't be adding S1G and 6 GHz data (or future bands) since we're really close to the 4k message size limit. Remove those bands, any modern userspace that can use S1G or 6 GHz should already be using split dumps, and if not then it needs to update. Link: https://lore.kernel.org/r/20210712215329.31444162a2c2.I5555312e4a074c84f8b4e7ad79dc4d1fbfc5126c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23	virt_wifi: fix error on connect	Matteo Croce
	When connecting without first doing a scan, the BSS list is empty and __cfg80211_connect_result() generates this warning: $ iw dev wlan0 connect -w VirtWifi [ 15.371989] ------------[ cut here ]------------ [ 15.372179] WARNING: CPU: 0 PID: 92 at net/wireless/sme.c:756 __cfg80211_connect_result+0x402/0x440 [ 15.372383] CPU: 0 PID: 92 Comm: kworker/u2:2 Not tainted 5.13.0-kvm #444 [ 15.372512] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-3.fc34 04/01/2014 [ 15.372597] Workqueue: cfg80211 cfg80211_event_work [ 15.372756] RIP: 0010:__cfg80211_connect_result+0x402/0x440 [ 15.372818] Code: 48 2b 04 25 28 00 00 00 75 59 48 8b 3b 48 8b 76 10 48 8d 65 e0 5b 41 5c 41 5d 41 5e 5d 49 8d 65 f0 41 5d e9 d0 d4 fd ff 0f 0b <0f> 0b e9 f6 fd ff ff e8 f2 4a b4 ff e9 ec fd ff ff 0f 0b e9 19 fd [ 15.372966] RSP: 0018:ffffc900005cbdc0 EFLAGS: 00010246 [ 15.373022] RAX: 0000000000000000 RBX: ffff8880028e2400 RCX: ffff8880028e2472 [ 15.373088] RDX: 0000000000000002 RSI: 00000000fffffe01 RDI: ffffffff815335ba [ 15.373149] RBP: ffffc900005cbe00 R08: 0000000000000008 R09: ffff888002bdf8b8 [ 15.373209] R10: ffff88803ec208f0 R11: ffffffffffffe9ae R12: ffff88801d687d98 [ 15.373280] R13: ffff88801b5fe000 R14: ffffc900005cbdc0 R15: dead000000000100 [ 15.373330] FS: 0000000000000000(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 15.373382] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.373425] CR2: 000056421c468958 CR3: 000000001b458001 CR4: 0000000000170eb0 [ 15.373478] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 15.373529] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 15.373580] Call Trace: [ 15.373611] ? cfg80211_process_wdev_events+0x10e/0x170 [ 15.373743] cfg80211_process_wdev_events+0x10e/0x170 [ 15.373783] cfg80211_process_rdev_events+0x21/0x40 [ 15.373846] cfg80211_event_work+0x20/0x30 [ 15.373892] process_one_work+0x1e9/0x340 [ 15.373956] worker_thread+0x4b/0x3f0 [ 15.374017] ? process_one_work+0x340/0x340 [ 15.374053] kthread+0x11f/0x140 [ 15.374089] ? set_kthread_struct+0x30/0x30 [ 15.374153] ret_from_fork+0x1f/0x30 [ 15.374187] ---[ end trace 321ef0cb7e9c0be1 ]--- wlan0 (phy #0): connected to 00:00:00:00:00:00 Add the fake bss just before the connect so that cfg80211_get_bss() finds the virtual network. As some code was duplicated, move it in a common function. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Link: https://lore.kernel.org/r/20210706154423.11065-1-mcroce@linux.microsoft.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23	mac80211: fix enabling 4-address mode on a sta vif after assoc	Felix Fietkau
	Notify the driver about the 4-address mode change and also send a nulldata packet to the AP to notify it about the change Fixes: 1ff4e8f2dec8 ("mac80211: notify the driver when a sta uses 4-address mode") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210702050111.47546-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23	mac80211: fix starting aggregation sessions on mesh interfaces	Felix Fietkau
	The logic for starting aggregation sessions was recently moved from minstrel_ht to mac80211, into the subif tx handler just after the sta lookup. Unfortunately this didn't work for mesh interfaces, since the sta lookup is deferred until a much later point in time on those. Fix this by also calling the aggregation check right after the deferred sta lookup. Fixes: 08a46c642001 ("mac80211: move A-MPDU session check from minstrel_ht to mac80211") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210629112853.29785-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23	mac80211: Do not strip skb headroom on monitor frames	Johan Almbladh
	When a monitor interface is present together with other interfaces, a received skb is copied and received on the monitor netdev. Before, the copied skb was allocated with exactly the amount of space needed for the radiotap header, resulting in an skb without any headroom at all being received on the monitor netdev. With the introduction of eBPF and XDP in the kernel, skbs may be processed by custom eBPF programs. However, since the skb cannot be reallocated in the eBPF program, no more data or headers can be pushed. The old code made sure the final headroom was zero regardless of the value of NET_SKB_PAD, so increasing that constant would have no effect. Now we allocate monitor skb copies with a headroom of NET_SKB_PAD bytes before the radiotap header. Monitor interfaces now behave in the same way as other netdev interfaces that honor the NET_SKB_PAD constant. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Link: https://lore.kernel.org/r/20210628123713.2070753-1-johan.almbladh@anyfinetworks.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-07-23	ARM: dts: imx: Swap M53Menlo pinctrl_power_button/pinctrl_power_out pins	Marek Vasut
	The pinctrl_power_button/pinctrl_power_out each define single GPIO pinmux, except it is exactly the other one than the matching gpio-keys and gpio-poweroff DT nodes use for that functionality. Swap the two GPIOs to correct this error. Fixes: 50d29fdb765d ("ARM: dts: imx53: Add power GPIOs on M53Menlo") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Fabio Estevam <festevam@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-07-23	ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init	Colin Ian King
	The function imx_mmdc_perf_init recently had a 3rd argument added to it but the equivalent macro was not updated and is still the older 2 argument version. Fix this by adding in the missing 3rd argumement mmdc_ipg_clk. Fixes: f07ec8536580 ("ARM: imx: add missing clk_disable_unprepare()") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-07-23	KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state	Nicholas Piggin
	The H_ENTER_NESTED hypercall is handled by the L0, and it is a request by the L1 to switch the context of the vCPU over to that of its L2 guest, and return with an interrupt indication. The L1 is responsible for switching some registers to guest context, and the L0 switches others (including all the hypervisor privileged state). If the L2 MSR has TM active, then the L1 is responsible for recheckpointing the L2 TM state. Then the L1 exits to L0 via the H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit, and then it recheckpoints the TM state as part of the nested entry and finally HRFIDs into the L2 with TM active MSR. Not efficient, but about the simplest approach for something that's horrendously complicated. Problems arise if the L1 exits to the L0 with a TM state which does not match the L2 TM state being requested. For example if the L1 is transactional but the L2 MSR is non-transactional, or vice versa. The L0's HRFID can take a TM Bad Thing interrupt and crash. Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then ensuring that if the L1 is suspended then the L2 must have TM active, and if the L1 is not suspended then the L2 must not have TM active. Fixes: 360cae313702 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-07-23	KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow	Nicholas Piggin
	The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on the rtas_args.nargs that was provided by the guest. That guest nargs value is not range checked, so the guest can cause the host rets pointer to be pointed outside the args array. The individual rtas function handlers check the nargs and nrets values to ensure they are correct, but if they are not, the handlers store a -3 (0xfffffffd) failure indication in rets[0] which corrupts host memory. Fix this by testing up front whether the guest supplied nargs and nret would exceed the array size, and fail the hcall directly without storing a failure indication to rets[0]. Also expand on a comment about why we kill the guest and try not to return errors directly if we have a valid rets[0] pointer. Fixes: 8e591cb72047 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls") Cc: stable@vger.kernel.org # v3.10+ Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-07-23	pcmcia: i82092: fix a null pointer dereference bug	Zheyu Ma
	During the driver loading process, the 'dev' field was not assigned, but the 'dev' field was referenced in the subsequent 'i82092aa_set_mem_map' function. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> CC: <stable@vger.kernel.org> [linux@dominikbrodowski.net: shorten commit message, add Cc to stable] Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2021-07-22	riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE	Alexandre Ghiti
	The check that is done in setup_bootmem currently only works for 32-bit kernel since the kernel mapping has been moved outside of the linear mapping for 64-bit kernel. So make sure that for 64-bit kernel, the kernel mapping does not overlap with the last 4K of the addressable memory. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-07-22	riscv: Make sure the linear mapping does not use the kernel mapping	Alexandre Ghiti
	For 64-bit kernel, the end of the address space is occupied by the kernel mapping and currently, the functions to populate the kernel page tables (i.e. create_p*d_mapping) do not override existing mapping so we must make sure the linear mapping does not map memory in the kernel mapping by clipping the memory above the memory limit. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Fixes: c9811e379b21 ("riscv: Add mem kernel parameter support") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-07-22	Merge tag 'drm-fixes-2021-07-23' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Regular fixes - a bunch of amdgpu fixes are the main thing mostly for the new gpus. There is also some i915 reverts for older changes that were having some unwanted side effects. One nouveau fix for a report regressions, and otherwise just some misc fixes. core: - fix for non-drm ioctls on drm fd panel: - avoid double free ttm: - refcounting fix - NULL checks amdgpu: - Yellow Carp updates - Add some Yellow Carp DIDs - Beige Goby updates - CIK 10bit 4K regression fix - GFX10 golden settings updates - eDP panel regression fix - Misc display fixes - Aldebaran fix - fix COW checks nouveau: - init BO GEM fields i915: - revert async command parsing - revert fence error propogation - GVT fix for shadow ppgtt vc4: - fix interrupt handling" * tag 'drm-fixes-2021-07-23' of git://anongit.freedesktop.org/drm/drm: (34 commits) drm/panel: raspberrypi-touchscreen: Prevent double-free drm/amdgpu - Corrected the video codecs array name for yellow carp drm/amd/display: Fix ASSR regression on embedded panels drm/amdgpu: add yellow carp pci id (v2) drm/amdgpu: update yellow carp external rev_id handling drm/amd/pm: Support board calibration on aldebaran drm/amd/display: change zstate allow msg condition drm/amd/display: Populate dtbclk entries for dcn3.02/3.03 drm/amd/display: Line Buffer changes drm/amd/display: Remove MALL function from DCN3.1 drm/amd/display: Only set default brightness for OLED drm/amd/display: Update bounding box for DCN3.1 drm/amd/display: Query VCO frequency from register for DCN3.1 drm/amd/display: Populate socclk entries for dcn3.02/3.03 drm/amd/display: Fix max vstartup calculation for modes with borders drm/amd/display: implement workaround for riommu related hang drm/amd/display: Fix comparison error in dcn21 DML drm/i915: Correct the docs for intel_engine_cmd_parser drm/ttm: add missing NULL checks drm/ttm: Force re-init if ttm_global_init() fails ...
2021-07-22	riscv: Fix memory_limit for 64-bit kernel	Alexandre Ghiti
	As described in Documentation/riscv/vm-layout.rst, the end of the virtual address space for 64-bit kernel is occupied by the modules/BPF/ kernel mappings so this actually reduces the amount of memory we are able to map and then use in the linear mapping. So make sure this limit is correctly set. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-07-23	ARM: dts: colibri-imx6ull: limit SDIO clock to 25MHz	Oleksandr Suvorov
	NXP and AzureWave don't recommend using SDIO bus mode 3.3V@50MHz due to noise affecting the wireless throughput. Colibri iMX6ULL uses only 3.3V signaling for Wi-Fi module AW-CM276NF. Limit the SDIO Clock on Colibri iMX6ULL to 25MHz. Fixes: c2e4987e0e02 ("ARM: dts: imx6ull: add Toradex Colibri iMX6ULL support") Signed-off-by: Oleksandr Suvorov <oleksandr.suvorov@toradex.com> Reviewed-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-07-23	arm64: dts: ls1028: sl28: fix networking for variant 2	Michael Walle
	The PHY configuration for the variant 2 is still missing the flag for in-band signalling between PHY and MAC. Both sides - MAC and PHY - have to match the setting. For now, Linux only supports setting the MAC side and thus it has to match the setting the bootloader is configuring. Enable in-band signalling to make ethernet work. Fixes: ab43f0307449 ("arm64: dts: ls1028a: sl28: add support for variant 2") Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2021-07-22	cifs: fix fallocate when trying to allocate a hole.	Ronnie Sahlberg
	Remove the conditional checking for out_data_len and skipping the fallocate if it is 0. This is wrong will actually change any legitimate the fallocate where the entire region is unallocated into a no-op. Additionally, before allocating the range, if FALLOC_FL_KEEP_SIZE is set then we need to clamp the length of the fallocate region as to not extend the size of the file. Fixes: 966a3cb7c7db ("cifs: improve fallocate emulation") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-22	Merge tag 'fallthrough-fixes-clang-5.14-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fix from Gustavo Silva: "Fix a fall-through warning when building with -Wimplicit-fallthrough on PowerPC" * tag 'fallthrough-fixes-clang-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: powerpc/pasemi: Fix fall-through warning for Clang
2021-07-23	Merge tag 'drm-misc-fixes-2021-07-22' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * Return -ENOTTY for non-DRM ioctls * amdgpu: Fix COW checks * nouveau: init BO GME fields * panel: Avoid double free * ttm: Fix refcounting in ttm_global_init(); NULL checks * vc4: Fix interrupt handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/YPlbkmH6S4VAHP9j@linux-uq9g.fritz.box
2021-07-23	Merge tag 'drm-intel-fixes-2021-07-22' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Couple reverts from Jason getting rid of asynchronous command parsing and fence error propagation and a GVT fix of shadow ppgtt invalidation with proper D3 state tracking from Colin. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YPl1sIyruD0U5Orl@intel.com
2021-07-22	io_uring: fix early fdput() of file	Jens Axboe
	A previous commit shuffled some code around, and inadvertently used struct file after fdput() had been called on it. As we can't touch the file post fdput() dropping our reference, move the fdput() to after that has been done. Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/io-uring/YPnqM0fY3nM5RdRI@zeniv-ca.linux.org.uk/ Fixes: f2a48dd09b8e ("io_uring: refactor io_sq_offload_create()") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-22	Merge tag 'array-bounds-fixes-5.14-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull array bounds warning fix from Gustavo Silva: "Fix a couple of out-of-bounds warnings in the media subsystem. This is part of the ongoing efforts to globally enable -Warray-bounds" * tag 'array-bounds-fixes-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
2021-07-22	Merge tag 'nvme-5.14-2021-07-22' of git://git.infradead.org/nvme into block-5.14	Jens Axboe
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.14: - tracing fix (Keith Busch) - fix multipath head refcounting (Hannes Reinecke) - Write Zeroes vs PI fix (me) - drop a bogus WARN_ON (Zhihao Cheng)" * tag 'nvme-5.14-2021-07-22' of git://git.infradead.org/nvme: nvme: set the PRACT bit when using Write Zeroes with T10 PI nvme: fix nvme_setup_command metadata trace event nvme: fix refcounting imbalance when all paths are down nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING
2021-07-22	CIFS: Clarify SMB1 code for POSIX delete file	Steve French
	Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for SMB1 CIFSPOSIXDelFile. This changeset doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711519 ("Out of bounds write") Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-22	Merge tag 'usb-serial-5.14-rc3' of ↵	Greg Kroah-Hartman
	https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial fixes for 5.14-rc3 Here are some new device ids and a device-id comment fix. All have been in linux-next with no reported issues. * tag 'usb-serial-5.14-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick USB: serial: cp210x: fix comments for GE CS1000 USB: serial: option: add support for u-blox LARA-R6 family
2021-07-22	CIFS: Clarify SMB1 code for POSIX Create	Steve French
	Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for SMB1 CIFSPOSIXCreate. This changeset doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711518 ("Out of bounds write") Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-22	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A pair of arm64 fixes for -rc3. The straightforward one is a fix to our firmware calling stub, which accidentally started corrupting the link register on machines with SVE. Since these machines don't really exist yet, it wasn't spotted in -next. The other fix is a revert-and-a-bit of a patch originally intended to allow PTE-level huge mappings for the VMAP area on 32-bit PPC 8xx. A side-effect of this change was that our pXd_set_huge() implementations could be replaced with generic dummy functions depending on the levels of page-table being used, which in turn broke the boot if we fail to create the linear mapping as a result of using these functions to operate on the pgd. Huge thanks to Michael Ellerman for modifying the revert so as not to regress PPC 8xx in terms of functionality. Anyway, that's the background and it's also available in the commit message along with Link tags pointing at all of the fun. Summary: - Fix hang when issuing SMC on SVE-capable system due to clobbered LR - Fix boot failure due to missing block mappings with folded page-table" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: Revert "mm/pgtable: add stubs for {pmd/pub}_{set/clear}_huge" arm64: smccc: Save lr before calling __arm_smccc_sve_check()
2021-07-22	Merge tag 'hyperv-fixes-signed-20210722' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - bug fix from Haiyang for vmbus CPU assignment - revert of a bogus patch that went into 5.14-rc1 * tag 'hyperv-fixes-signed-20210722' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Revert "x86/hyperv: fix logical processor creation" Drivers: hv: vmbus: Fix duplicate CPU assignments within a device
2021-07-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Fix type of bind option flag in af_xdp, from Baruch Siach. 2) Fix use after free in bpf_xdp_link_release(), from Xuan Zhao. 3) PM refcnt imbakance in r8152, from Takashi Iwai. 4) Sign extension ug in liquidio, from Colin Ian King. 5) Mising range check in s390 bpf jit, from Colin Ian King. 6) Uninit value in caif_seqpkt_sendmsg(), from Ziyong Xuan. 7) Fix skb page recycling race, from Ilias Apalodimas. 8) Fix memory leak in tcindex_partial_destroy_work, from Pave Skripkin. 9) netrom timer sk refcnt issues, from Nguyen Dinh Phi. 10) Fix data races aroun tcp's tfo_active_disable_stamp, from Eric Dumazet. 11) act_skbmod should only operate on ethernet packets, from Peilin Ye. 12) Fix slab out-of-bpunds in fib6_nh_flush_exceptions(),, from Psolo Abeni. 13) Fix sparx5 dependencies, from Yajun Deng. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits) dpaa2-switch: seed the buffer pool after allocating the swp net: sched: cls_api: Fix the the wrong parameter net: sparx5: fix unmet dependencies warning net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum net: dsa: ensure linearized SKBs in case of tail taggers ravb: Remove extra TAB ravb: Fix a typo in comment net: dsa: sja1105: make VID 4095 a bridge VLAN too tcp: disable TFO blackhole logic by default sctp: do not update transport pathmtu if SPP_PMTUD_ENABLE is not set net: ixp46x: fix ptp build failure ibmvnic: Remove the proper scrq flush selftests: net: add ESP-in-UDP PMTU test udp: check encap socket in __udp_lib_err sctp: update active_key for asoc when old key is being replaced r8169: Avoid duplicate sysfs entry creation error ixgbe: Fix packet corruption due to missing DMA sync Revert "qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()" ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions fsl/fman: Add fibre support ...
2021-07-22	Merge tag 'mmc-v5.14-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - Use kref to fix KASAN splats triggered during card removal - Don't allocate IDA for OF aliases * tag 'mmc-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Don't allocate IDA for OF aliases mmc: core: Use kref in place of struct mmc_blk_data::usage
2021-07-22	cifs: support share failover when remounting	Paulo Alcantara
	When remouting a DFS share, force a new DFS referral of the path and if the currently cached targets do not match any of the new targets or there was no cached targets, then mark it for reconnect. For example: $ mount //dom/dfs/link /mnt -o username=foo,password=bar $ ls /mnt oldfile.txt change target share of 'link' in server settings $ mount /mnt -o remount,username=foo,password=bar $ ls /mnt newfile.txt Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-22	cifs: only write 64kb at a time when fallocating a small region of a file	Ronnie Sahlberg
	We only allow sending single credit writes through the SMB2_write() synchronous api so split this into smaller chunks. Fixes: 966a3cb7c7db ("cifs: improve fallocate emulation") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reported-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-22	tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.	Haoran Luo
	The "rb_per_cpu_empty()" misinterpret the condition (as not-empty) when "head_page" and "commit_page" of "struct ring_buffer_per_cpu" points to the same buffer page, whose "buffer_data_page" is empty and "read" field is non-zero. An error scenario could be constructed as followed (kernel perspective): 1. All pages in the buffer has been accessed by reader(s) so that all of them will have non-zero "read" field. 2. Read and clear all buffer pages so that "rb_num_of_entries()" will return 0 rendering there's no more data to read. It is also required that the "read_page", "commit_page" and "tail_page" points to the same page, while "head_page" is the next page of them. 3. Invoke "ring_buffer_lock_reserve()" with large enough "length" so that it shot pass the end of current tail buffer page. Now the "head_page", "commit_page" and "tail_page" points to the same page. 4. Discard current event with "ring_buffer_discard_commit()", so that "head_page", "commit_page" and "tail_page" points to a page whose buffer data page is now empty. When the error scenario has been constructed, "tracing_read_pipe" will be trapped inside a deadloop: "trace_empty()" returns 0 since "rb_per_cpu_empty()" returns 0 when it hits the CPU containing such constructed ring buffer. Then "trace_find_next_entry_inc()" always return NULL since "rb_num_of_entries()" reports there's no more entry to read. Finally "trace_seq_to_user()" returns "-EBUSY" spanking "tracing_read_pipe" back to the start of the "waitagain" loop. I've also written a proof-of-concept script to construct the scenario and trigger the bug automatically, you can use it to trace and validate my reasoning above: https://github.com/aegistudio/RingBufferDetonator.git Tests has been carried out on linux kernel 5.14-rc2 (2734d6c1b1a089fb593ef6a23d4b70903526fe0c), my fixed version of kernel (for testing whether my update fixes the bug) and some older kernels (for range of affected kernels). Test result is also attached to the proof-of-concept repository. Link: https://lore.kernel.org/linux-trace-devel/YPaNxsIlb2yjSi5Y@aegistudio/ Link: https://lore.kernel.org/linux-trace-devel/YPgrN85WL9VyrZ55@aegistudio Cc: stable@vger.kernel.org Fixes: bf41a158cacba ("ring-buffer: make reentrant") Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Haoran Luo <www@aegistudio.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-22	spi: update modalias_show after of_device_uevent_modalias support	Andreas Schwab
	Commit 3ce6c9e2617e ("spi: add of_device_uevent_modalias support") is incomplete, as it didn't update the modalias_show function to generate the of: modalias string if available. Fixes: 3ce6c9e2617e ("spi: add of_device_uevent_modalias support") Signed-off-by: Andreas Schwab <schwab@suse.de> Link: https://lore.kernel.org/r/mvmwnpi4fya.fsf@suse.de Signed-off-by: Mark Brown <broonie@kernel.org>
2021-07-22	spi: meson-spicc: fix memory leak in meson_spicc_remove	Dongliang Mu
	In meson_spicc_probe, the error handling code needs to clean up master by calling spi_master_put, but the remove function does not have this function call. This will lead to memory leak of spicc->master. Reported-by: Dongliang Mu <mudongliangabcd@gmail.com> Fixes: 454fa271bc4e("spi: Add Meson SPICC driver") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Link: https://lore.kernel.org/r/20210720100116.1438974-1-mudongliangabcd@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-07-22	btrfs: store a block_device in struct btrfs_ordered_extent	Christoph Hellwig
	Store the block device instead of the gendisk in the btrfs_ordered_extent structure instead of acquiring a reference to it later. Note: this is from series removing bdgrab/bdput, btrfs is one of the last users. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-22	btrfs: fix lock inversion problem when doing qgroup extent tracing	Filipe Manana
	At btrfs_qgroup_trace_extent_post() we call btrfs_find_all_roots() with a NULL value as the transaction handle argument, which makes that function take the commit_root_sem semaphore, which is necessary when we don't hold a transaction handle or any other mechanism to prevent a transaction commit from wiping out commit roots. However btrfs_qgroup_trace_extent_post() can be called in a context where we are holding a write lock on an extent buffer from a subvolume tree, namely from btrfs_truncate_inode_items(), called either during truncate or unlink operations. In this case we end up with a lock inversion problem because the commit_root_sem is a higher level lock, always supposed to be acquired before locking any extent buffer. Lockdep detects this lock inversion problem since we switched the extent buffer locks from custom locks to semaphores, and when running btrfs/158 from fstests, it reported the following trace: [ 9057.626435] ====================================================== [ 9057.627541] WARNING: possible circular locking dependency detected [ 9057.628334] 5.14.0-rc2-btrfs-next-93 #1 Not tainted [ 9057.628961] ------------------------------------------------------ [ 9057.629867] kworker/u16:4/30781 is trying to acquire lock: [ 9057.630824] ffff8e2590f58760 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 9057.632542] but task is already holding lock: [ 9057.633551] ffff8e25582d4b70 (&fs_info->commit_root_sem){++++}-{3:3}, at: iterate_extent_inodes+0x10b/0x280 [btrfs] [ 9057.635255] which lock already depends on the new lock. [ 9057.636292] the existing dependency chain (in reverse order) is: [ 9057.637240] -> #1 (&fs_info->commit_root_sem){++++}-{3:3}: [ 9057.638138] down_read+0x46/0x140 [ 9057.638648] btrfs_find_all_roots+0x41/0x80 [btrfs] [ 9057.639398] btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs] [ 9057.640283] btrfs_add_delayed_data_ref+0x418/0x490 [btrfs] [ 9057.641114] btrfs_free_extent+0x35/0xb0 [btrfs] [ 9057.641819] btrfs_truncate_inode_items+0x424/0xf70 [btrfs] [ 9057.642643] btrfs_evict_inode+0x454/0x4f0 [btrfs] [ 9057.643418] evict+0xcf/0x1d0 [ 9057.643895] do_unlinkat+0x1e9/0x300 [ 9057.644525] do_syscall_64+0x3b/0xc0 [ 9057.645110] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 9057.645835] -> #0 (btrfs-tree-00){++++}-{3:3}: [ 9057.646600] __lock_acquire+0x130e/0x2210 [ 9057.647248] lock_acquire+0xd7/0x310 [ 9057.647773] down_read_nested+0x4b/0x140 [ 9057.648350] __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 9057.649175] btrfs_read_lock_root_node+0x31/0x40 [btrfs] [ 9057.650010] btrfs_search_slot+0x537/0xc00 [btrfs] [ 9057.650849] scrub_print_warning_inode+0x89/0x370 [btrfs] [ 9057.651733] iterate_extent_inodes+0x1e3/0x280 [btrfs] [ 9057.652501] scrub_print_warning+0x15d/0x2f0 [btrfs] [ 9057.653264] scrub_handle_errored_block.isra.0+0x135f/0x1640 [btrfs] [ 9057.654295] scrub_bio_end_io_worker+0x101/0x2e0 [btrfs] [ 9057.655111] btrfs_work_helper+0xf8/0x400 [btrfs] [ 9057.655831] process_one_work+0x247/0x5a0 [ 9057.656425] worker_thread+0x55/0x3c0 [ 9057.656993] kthread+0x155/0x180 [ 9057.657494] ret_from_fork+0x22/0x30 [ 9057.658030] other info that might help us debug this: [ 9057.659064] Possible unsafe locking scenario: [ 9057.659824] CPU0 CPU1 [ 9057.660402] ---- ---- [ 9057.660988] lock(&fs_info->commit_root_sem); [ 9057.661581] lock(btrfs-tree-00); [ 9057.662348] lock(&fs_info->commit_root_sem); [ 9057.663254] lock(btrfs-tree-00); [ 9057.663690] * DEADLOCK * [ 9057.664437] 4 locks held by kworker/u16:4/30781: [ 9057.665023] #0: ffff8e25922a1148 ((wq_completion)btrfs-scrub){+.+.}-{0:0}, at: process_one_work+0x1c7/0x5a0 [ 9057.666260] #1: ffffabb3451ffe70 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x1c7/0x5a0 [ 9057.667639] #2: ffff8e25922da198 (&ret->mutex){+.+.}-{3:3}, at: scrub_handle_errored_block.isra.0+0x5d2/0x1640 [btrfs] [ 9057.669017] #3: ffff8e25582d4b70 (&fs_info->commit_root_sem){++++}-{3:3}, at: iterate_extent_inodes+0x10b/0x280 [btrfs] [ 9057.670408] stack backtrace: [ 9057.670976] CPU: 7 PID: 30781 Comm: kworker/u16:4 Not tainted 5.14.0-rc2-btrfs-next-93 #1 [ 9057.672030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 9057.673492] Workqueue: btrfs-scrub btrfs_work_helper [btrfs] [ 9057.674258] Call Trace: [ 9057.674588] dump_stack_lvl+0x57/0x72 [ 9057.675083] check_noncircular+0xf3/0x110 [ 9057.675611] __lock_acquire+0x130e/0x2210 [ 9057.676132] lock_acquire+0xd7/0x310 [ 9057.676605] ? __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 9057.677313] ? lock_is_held_type+0xe8/0x140 [ 9057.677849] down_read_nested+0x4b/0x140 [ 9057.678349] ? __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 9057.679068] __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 9057.679760] btrfs_read_lock_root_node+0x31/0x40 [btrfs] [ 9057.680458] btrfs_search_slot+0x537/0xc00 [btrfs] [ 9057.681083] ? _raw_spin_unlock+0x29/0x40 [ 9057.681594] ? btrfs_find_all_roots_safe+0x11f/0x140 [btrfs] [ 9057.682336] scrub_print_warning_inode+0x89/0x370 [btrfs] [ 9057.683058] ? btrfs_find_all_roots_safe+0x11f/0x140 [btrfs] [ 9057.683834] ? scrub_write_block_to_dev_replace+0xb0/0xb0 [btrfs] [ 9057.684632] iterate_extent_inodes+0x1e3/0x280 [btrfs] [ 9057.685316] scrub_print_warning+0x15d/0x2f0 [btrfs] [ 9057.685977] ? ___ratelimit+0xa4/0x110 [ 9057.686460] scrub_handle_errored_block.isra.0+0x135f/0x1640 [btrfs] [ 9057.687316] scrub_bio_end_io_worker+0x101/0x2e0 [btrfs] [ 9057.688021] btrfs_work_helper+0xf8/0x400 [btrfs] [ 9057.688649] ? lock_is_held_type+0xe8/0x140 [ 9057.689180] process_one_work+0x247/0x5a0 [ 9057.689696] worker_thread+0x55/0x3c0 [ 9057.690175] ? process_one_work+0x5a0/0x5a0 [ 9057.690731] kthread+0x155/0x180 [ 9057.691158] ? set_kthread_struct+0x40/0x40 [ 9057.691697] ret_from_fork+0x22/0x30 Fix this by making btrfs_find_all_roots() never attempt to lock the commit_root_sem when it is called from btrfs_qgroup_trace_extent_post(). We can't just pass a non-NULL transaction handle to btrfs_find_all_roots() from btrfs_qgroup_trace_extent_post(), because that would make backref lookup not use commit roots and acquire read locks on extent buffers, and therefore could deadlock when btrfs_qgroup_trace_extent_post() is called from the btrfs_truncate_inode_items() code path which has acquired a write lock on an extent buffer of the subvolume btree. CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-22	btrfs: check for missing device in btrfs_trim_fs	Anand Jain
	A fstrim on a degraded raid1 can trigger the following null pointer dereference: BTRFS info (device loop0): allowing degraded mounts BTRFS info (device loop0): disk space caching is enabled BTRFS info (device loop0): has skinny extents BTRFS warning (device loop0): devid 2 uuid 97ac16f7-e14d-4db1-95bc-3d489b424adb is missing BTRFS warning (device loop0): devid 2 uuid 97ac16f7-e14d-4db1-95bc-3d489b424adb is missing BTRFS info (device loop0): enabling ssd optimizations BUG: kernel NULL pointer dereference, address: 0000000000000620 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 0 PID: 4574 Comm: fstrim Not tainted 5.13.0-rc7+ #31 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:btrfs_trim_fs+0x199/0x4a0 [btrfs] RSP: 0018:ffff959541797d28 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff946f84eca508 RCX: a7a67937adff8608 RDX: ffff946e8122d000 RSI: 0000000000000000 RDI: ffffffffc02fdbf0 RBP: ffff946ea4615000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: ffff946e8122d960 R12: 0000000000000000 R13: ffff959541797db8 R14: ffff946e8122d000 R15: ffff959541797db8 FS: 00007f55917a5080(0000) GS:ffff946f9bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000620 CR3: 000000002d2c8001 CR4: 00000000000706f0 Call Trace: btrfs_ioctl_fitrim+0x167/0x260 [btrfs] btrfs_ioctl+0x1c00/0x2fe0 [btrfs] ? selinux_file_ioctl+0x140/0x240 ? syscall_trace_enter.constprop.0+0x188/0x240 ? __x64_sys_ioctl+0x83/0xb0 __x64_sys_ioctl+0x83/0xb0 Reproducer: $ mkfs.btrfs -fq -d raid1 -m raid1 /dev/loop0 /dev/loop1 $ mount /dev/loop0 /btrfs $ umount /btrfs $ btrfs dev scan --forget $ mount -o degraded /dev/loop0 /btrfs $ fstrim /btrfs The reason is we call btrfs_trim_free_extents() for the missing device, which uses device->bdev (NULL for missing device) to find if the device supports discard. Fix is to check if the device is missing before calling btrfs_trim_free_extents(). CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>