git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message	Author
2020-05-25	mailmap: change email for Ricardo Ribalda	Ricardo Ribalda Delgado
	Modify emails to ribalda@kernel.org and unify my surname in all the files. Signed-off-by: Ricardo Ribalda <ribalda@kernel.org> Link: https://lore.kernel.org/r/20200430135224.362700-1-ricardo@ribalda.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-05-25	docs: sysctl/kernel: document unaligned controls	Stephen Kitt
	This documents ignore-unaligned-usertrap, unaligned-dump-stack, and unaligned-trap, based on arch/arc/kernel/unaligned.c, arch/ia64/kernel/unaligned.c, and arch/parisc/kernel/unaligned.c. While we're at it, integrate unaligned-memory-access.txt into the docs tree. Signed-off-by: Stephen Kitt <steve@sk2.org> Link: https://lore.kernel.org/r/20200515212443.5012-1-steve@sk2.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-05-25	Documentation: admin-guide: update bug-hunting.rst	Randy Dunlap
	Update Documentation/admin-guide/bug-hunting.rst: - add a small section on "Modules linked in" and their possible flags; - delete all references to ksymoops since it is no longer applicable; - fix spello, grammar, and punctuation; - note that get_maintainers.pl only provides recent patchers if it is run inside a git tree; - add mention of scripts/decode_stacktrace.sh; Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: greg@wind.rmcc.com Link: https://lore.kernel.org/r/c629a9ef-3867-c3d1-f6c9-2c3b0e4ac68a@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-05-25	docs: sysctl/kernel: document ngroups_max	Stephen Kitt
	This is a read-only export of NGROUPS_MAX. Signed-off-by: Stephen Kitt <steve@sk2.org> Link: https://lore.kernel.org/r/20200518145836.15816-1-steve@sk2.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-05-25	dpaa_eth: fix usage as DSA master, try 3	Vladimir Oltean
	The dpaa-eth driver probes on compatible string for the MAC node, and the fman/mac.c driver allocates a dpaa-ethernet platform device that triggers the probing of the dpaa-eth net device driver. All of this is fine, but the problem is that the struct device of the dpaa_eth net_device is 2 parents away from the MAC which can be referenced via of_node. So of_find_net_device_by_node can't find it, and DSA switches won't be able to probe on top of FMan ports. It would be a bit silly to modify a core function (of_find_net_device_by_node) to look for dev->parent->parent->of_node just for one driver. We're just 1 step away from implementing full recursion. Actually there have already been at least 2 previous attempts to make this work: - Commit a1a50c8e4c24 ("fsl/man: Inherit parent device and of_node") - One or more of the patches in "[v3,0/6] adapt DPAA drivers for DSA": https://patchwork.ozlabs.org/project/netdev/cover/1508178970-28945-1-git-send-email-madalin.bucur@nxp.com/ (I couldn't really figure out which one was supposed to solve the problem and how). Point being, it looks like this is still pretty much a problem today. On T1040, the /sys/class/net/eth0 symlink currently points to ../../devices/platform/ffe000000.soc/ffe400000.fman/ffe4e6000.ethernet/dpaa-ethernet.0/net/eth0 which pretty much illustrates the problem. The closest of_node we've got is the "fsl,fman-memac" at /soc@ffe000000/fman@400000/ethernet@e6000, which is what we'd like to be able to reference from DSA as host port. For of_find_net_device_by_node to find the eth0 port, we would need the parent of the eth0 net_device to not be the "dpaa-ethernet" platform device, but to point 1 level higher, aka the "fsl,fman-memac" node directly. The new sysfs path would look like this: ../../devices/platform/ffe000000.soc/ffe400000.fman/ffe4e6000.ethernet/net/eth0 And this is exactly what SET_NETDEV_DEV does. It sets the parent of the net_device. 
The new parent has an of_node associated with it, and of_dev_node_match already checks for the of_node of the device or of its parent. Fixes: a1a50c8e4c24 ("fsl/man: Inherit parent device and of_node") Fixes: c6e26ea8c893 ("dpaa_eth: change device used") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
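The lookup described above can be illustrated with a small user-space sketch. The `struct dev` type and `node_match()` helper are hypothetical stand-ins, not the kernel's `struct device` or `of_dev_node_match()`; the sketch only assumes what the message states, namely that the match checks the device's own of_node or that of its direct parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device with an optional of_node. */
struct dev {
    struct dev *parent;
    const void *of_node;
};

/* Match if the device itself, or its direct parent, carries the node. */
static bool node_match(const struct dev *d, const void *node)
{
    if (d->parent && d->parent->of_node == node)
        return true;
    return d->of_node == node;
}
```

With the MAC device two parents away the match fails; re-parenting the net device directly to the MAC device (the effect the message attributes to SET_NETDEV_DEV) makes it succeed.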
2020-05-25	net/tls: fix race condition causing kernel panic	Vinay Kumar Yadav
	tls_sw_recvmsg() and tls_decrypt_done() can be run concurrently. // tls_sw_recvmsg() if (atomic_read(&ctx->decrypt_pending)) crypto_wait_req(-EINPROGRESS, &ctx->async_wait); else reinit_completion(&ctx->async_wait.completion); // tls_decrypt_done() pending = atomic_dec_return(&ctx->decrypt_pending); if (!pending && READ_ONCE(ctx->async_notify)) complete(&ctx->async_wait.completion); Consider the scenario where tls_decrypt_done() is about to run complete() if (!pending && READ_ONCE(ctx->async_notify)) and tls_sw_recvmsg() reads decrypt_pending == 0, does reinit_completion(), then tls_decrypt_done() runs complete(). This sequence of execution results in a wrong completion. Consequently, the next decrypt request will not wait for completion; eventually, on connection close, the crypto resources are freed and there is no way to handle the pending decrypt response. This race condition can be avoided by making atomic_read() mutually exclusive with atomic_dec_return() and complete(). Introduced a spin lock to ensure the mutual exclusion. Addressed a similar problem in the tx direction. v1->v2: - More readable commit message. - Corrected the lock to fix new race scenario. - Removed barrier which is not needed now. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
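The locking idea in this message can be sketched in user space. This is not the kernel patch: `struct rx_ctx`, the `atomic_flag` spinlock, and the `completed` flag are hypothetical stand-ins for the TLS context, the kernel spin lock, and the completion object. Setting `async_notify` inside the locked region is an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the TLS receive context. */
struct rx_ctx {
    atomic_flag lock;     /* tiny spinlock guarding the fields below */
    int decrypt_pending;  /* async decrypt requests still in flight */
    bool async_notify;    /* receiver wants to be signalled */
    bool completed;       /* stand-in for the completion object */
};

static void ctx_lock(struct rx_ctx *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void ctx_unlock(struct rx_ctx *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Completion side: the decrement and the signal form one critical section. */
static void decrypt_done(struct rx_ctx *c)
{
    ctx_lock(c);
    if (--c->decrypt_pending == 0 && c->async_notify)
        c->completed = true;
    ctx_unlock(c);
}

/* Receive side: read the pending count while no completer can be between
 * its decrement and its signal. The caller waits if the result is non-zero. */
static int recv_read_pending(struct rx_ctx *c)
{
    int pending;

    ctx_lock(c);
    c->async_notify = true;
    pending = c->decrypt_pending;
    ctx_unlock(c);
    return pending;
}
```

Because both sides take the same lock, the receiver can no longer observe a zero count while the completer is still about to signal.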
2020-05-25	Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm	Linus Torvalds
	Pull ARM fixes from Russell King: - correct value of decompressor tag size in header - fix DACR value when we have nested exceptions - fix a missing newline on a kernel message - fix mask for ptrace thumb breakpoint hook * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8977/1: ptrace: Fix mask for thumb breakpoint hook ARM: 8973/1: Add missing newline terminator to kernel message ARM: uaccess: fix DACR mismatch with nested exceptions ARM: uaccess: integrate uaccess_save and uaccess_restore ARM: uaccess: consolidate uaccess asm to asm/uaccess-asm.h ARM: 8970/1: decompressor: increase tag size
2020-05-26	Merge tag 'omap-for-v5.7/cpsw-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes	Arnd Bergmann
	Few cpsw related dts fixes for omaps. Recent cpsw driver changes exposed a few regressions in the cpsw related dts configuration that would be good to fix: - A few more boards still need to be updated to use rgmii-rxid phy, fallout from commit bcf3440c6dd7 ("net: phy: micrel: add phy-mode support for the KSZ9031 PHY"), as the rx delay is now disabled unless we use rgmii-rxid. - On dm814x we have been using a wrong clock for mdio that now can produce external abort on some boards * tag 'omap-for-v5.7/cpsw-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: Fix wrong mdio clock for dm814x ARM: dts: am437x: fix networking on boards with ksz9031 phy ARM: dts: am57xx: fix networking on boards with ksz9031 phy Link: https://lore.kernel.org/r/pull-1589472123-367692@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-05-26	xsk: Add overflow check for u64 division, stored into u32	Björn Töpel
	The npgs member of struct xdp_umem is a u32 entity, and stores the number of pages the UMEM consumes. The calculation of npgs npgs = size / PAGE_SIZE can overflow. To avoid overflow scenarios, the division is now first stored in a u64, and the result is verified to fit into 32 bits. An alternative would be storing the npgs as a u64, however, this wastes memory and is an unrealistically large packet area. Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt") Reported-by: "Minh Bùi Quang" <minhquangbui99@gmail.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/CACtPs=GGvV-_Yj6rbpzTVnopgi5nhMoCcTkSkYrJHGQHJWFZMQ@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20200525080400.13195-1-bjorn.topel@gmail.com
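A minimal sketch of the check described above, in user-space C. The function name `umem_npgs()`, the `SKETCH_PAGE_SIZE` constant, and the `-EINVAL` return value are assumptions for illustration. Only the technique comes from the message: divide into a 64-bit value first, then verify the result fits in 32 bits.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size for the sketch */

/* Compute the page count in 64 bits, then refuse values that would be
 * truncated when stored in a 32-bit field. */
static int umem_npgs(uint64_t size, uint32_t *npgs)
{
    uint64_t npgs_long = size / SKETCH_PAGE_SIZE;

    if (npgs_long > UINT32_MAX)
        return -EINVAL;

    *npgs = (uint32_t)npgs_long;
    return 0;
}
```

A size of 2^44 bytes gives 2^32 pages, one more than a u32 can hold, so the sketch rejects it instead of silently storing 0.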
2020-05-25	netfilter: nfnetlink_cthelper: unbreak userspace helper support	Pablo Neira Ayuso
	Restore helper data size initialization and fix memcopy of the helper data size. Fixes: 157ffffeb5dc ("netfilter: nfnetlink_cthelper: reject too large userspace allocation requests") Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-25	netfilter: conntrack: make conntrack userspace helpers work again	Pablo Neira Ayuso
	Florian Westphal says: "Problem is that after the helper hook was merged back into the confirm one, the queueing itself occurs from the confirm hook, i.e. we queue from the last netfilter callback in the hook-list. Therefore, on return, the packet bypasses the confirm action and the connection is never committed to the main conntrack table. To fix this there are several ways: 1. revert the 'Fixes' commit and have an extra helper hook again. Works, but has the drawback of adding another indirect call for everyone. 2. Special case this: split the hooks only when userspace helper gets added, so queueing occurs at a lower priority again, and normal enqueue reinject would eventually call the last hook. 3. Extend the existing nf_queue ct update hook to allow a forced confirmation (plus run the seqadj code). This goes for 3)." Fixes: 827318feb69cb ("netfilter: conntrack: remove helper hook again") Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-25	netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code	Pablo Neira Ayuso
	Dan Carpenter says: "Smatch complains that the value for "cmd" comes from the network and can't be trusted." Add pptp_msg_name() helper function that checks for the array boundary. Fixes: f09943fefe6b ("[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
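The bounds-checked lookup mentioned above might look roughly like the following. The table contents and the name `msg_name()` are hypothetical, and falling back to entry 0 for out-of-range values is an assumption of the sketch. The only part taken from the message is that the helper checks the array boundary before indexing with a value that came from the network.

```c
#include <stddef.h>

/* Hypothetical message-name table; entry 0 doubles as the fallback. */
static const char *const msg_names[] = {
    "UNKNOWN_MESSAGE",
    "START_SESSION_REQUEST",
    "START_SESSION_REPLY",
};

#define MSG_NAMES_COUNT (sizeof(msg_names) / sizeof(msg_names[0]))

/* Never index the table with an untrusted value that is out of range. */
static const char *msg_name(unsigned int cmd)
{
    if (cmd >= MSG_NAMES_COUNT)
        return msg_names[0];
    return msg_names[cmd];
}
```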
2020-05-25	netfilter: ipset: Fix subcounter update skip	Phil Sutter
	If IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE is set, user requested to not update counters in sub sets. Therefore IPSET_FLAG_SKIP_COUNTER_UPDATE must be set, not unset. Fixes: 6e01781d1c80e ("netfilter: ipset: set match: add support to match the counters") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
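The flag logic can be shown with a short sketch. The `SKETCH_` constants and their bit positions are illustrative and are not taken from the ipset headers. The point is only the direction of the operation: the skip flag is ORed in rather than masked out.

```c
#include <stdint.h>

/* Illustrative flag bits; the real values live in the ipset headers. */
#define SKETCH_SKIP_COUNTER_UPDATE    (1u << 3)
#define SKETCH_SKIP_SUBCOUNTER_UPDATE (1u << 4)

/* Flags to use when descending into sub sets: if the user asked to skip
 * sub-set counters, the counter-skip flag must be set, not cleared. */
static uint32_t subset_cmdflags(uint32_t cmdflags)
{
    if (cmdflags & SKETCH_SKIP_SUBCOUNTER_UPDATE)
        cmdflags |= SKETCH_SKIP_COUNTER_UPDATE;
    return cmdflags;
}
```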
2020-05-25	netfilter: nft_reject_bridge: enable reject with bridge vlan	Michael Braun
	Currently, using the bridge reject target with tagged packets results in untagged packets being sent back. Fix this by mirroring the vlan id as well. Fixes: 85f5b3086a04 ("netfilter: bridge: add reject support") Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-25	RDMA/pvrdma: Fix missing pci disable in pvrdma_pci_probe()	Qiushi Wu
	In function pvrdma_pci_probe(), pdev was not disabled in one error path. Thus replace the jump target "err_free_device" by "err_disable_pdev". Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Link: https://lore.kernel.org/r/20200523030457.16160-1-wu000273@umn.edu Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-25	nvdimm: fixes to maintainer-entry-profile	Randy Dunlap
	Fix punctuation and wording in a few places. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/103a0e71-28b5-e4c2-fdf2-80d2dd005b44@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-05-25	Documentation/features: Correct RISC-V kprobes support entry	Björn Töpel
	The Documentation/features/debug/kprobes/arch-support.txt incorrectly states that RISC-V has kprobes support. This is not the case. Note that entries that have been incorrectly marked with 'ok' will not be changed back to 'TODO' by the features-refresh.sh script. Fixes: 7156fc292850 ("Documentation/features: Refresh the arch support status files in place") Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/r/20200523210005.59140-1-bjorn.topel@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-05-25	Documentation/features: Refresh the arch support status files	Björn Töpel
	I was manually editing the arch-support.txt for eBPF-JIT, when I realized the refresh script [1] has not been run for a while. Let's fix that, so that the entries are more up-to-date. [1] Documentation/features/scripts/features-refresh.sh Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Link: https://lore.kernel.org/r/20200523191135.21889-1-bjorn.topel@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-05-25	spi: flags 'SPI_CONTROLLER_MUST_RX' and 'SPI_CONTROLLER_MUST_TX' can't coexist with 'SPI_3WIRE' mode	dillon min
	Since the chip SPI driver needs to get the transfer direction from 'tx_buf' and 'rx_buf' of 'struct spi_transfer' in 'SPI_3WIRE' mode, we need to bypass the 'SPI_CONTROLLER_MUST_RX' and 'SPI_CONTROLLER_MUST_TX' features in 'SPI_3WIRE' mode. Signed-off-by: dillon min <dillon.minfei@gmail.com> Link: https://lore.kernel.org/r/1590378348-8115-9-git-send-email-dillon.minfei@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-05-25	spi: stm32: Add 'SPI_SIMPLEX_RX', 'SPI_3WIRE_RX' support for stm32f4	dillon min
	During l3gd20 driver startup, the stm32 SPI driver returns a setup failure: " [ 2.687630] st-gyro-spi spi0.0: supply vdd not found, using dummy regulator [ 2.696869] st-gyro-spi spi0.0: supply vddio not found, using dummy regulator [ 2.706707] spi_stm32 40015000.spi: SPI transfer setup failed [ 2.713741] st-gyro-spi spi0.0: SPI transfer failed: -22 [ 2.721096] spi_master spi0: failed to transfer one message from queue [ 2.729268] iio iio:device0: failed to read Who-Am-I register. [ 2.737504] st-gyro-spi: probe of spi0.0 failed with error -22 " Debugging the spi-stm32 driver shows that st-gyro-spi reads the l3gd20 ID in two steps. First: send the read-ID command to the l3gd20 in tx_buf, with rx_buf NULL. Second: read the ID with tx_buf NULL and rx_buf non-NULL. For the second step, the stm32 driver recognises the transfer as 'SPI_SIMPLEX_RX' in stm32_spi_communication_type(), but stm32f4_spi_set_mode() has no handling for this type, so stm32_spi_transfer_one_setup() returns an error. There are two ways to fix this bug. 1, use the stm32 SPI's "unidirectional receive-only mode (BIDIMODE=0 and RXONLY=1)". But as our code runs from SDRAM, the read latency is too large and the interrupt handler gets many receive overrun errors. 2, use the stm32 SPI's "full-duplex (BIDIMODE=0 and RXONLY=0)" mode; as tx_buf is NULL, add the flag 'SPI_MASTER_MUST_TX' to the SPI master. Change since V4: 1 remove dummy data sent out by stm32 spi driver 2 add flag 'SPI_MASTER_MUST_TX' to spi master Signed-off-by: dillon min <dillon.minfei@gmail.com> Link: https://lore.kernel.org/r/1590378348-8115-8-git-send-email-dillon.minfei@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-05-25	xfrm: fix a warning in xfrm_policy_insert_list	Xin Long
	This warning can be triggered simply by: # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 1 mark 0 mask 0x10 #[1] # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 2 mark 0 mask 0x1 #[2] # ip xfrm policy update src 192.168.1.1/24 dst 192.168.1.2/24 dir in \ priority 2 mark 0 mask 0x10 #[3] Then dmesg shows: [ ] WARNING: CPU: 1 PID: 7265 at net/xfrm/xfrm_policy.c:1548 [ ] RIP: 0010:xfrm_policy_insert_list+0x2f2/0x1030 [ ] Call Trace: [ ] xfrm_policy_inexact_insert+0x85/0xe50 [ ] xfrm_policy_insert+0x4ba/0x680 [ ] xfrm_add_policy+0x246/0x4d0 [ ] xfrm_user_rcv_msg+0x331/0x5c0 [ ] netlink_rcv_skb+0x121/0x350 [ ] xfrm_netlink_rcv+0x66/0x80 [ ] netlink_unicast+0x439/0x630 [ ] netlink_sendmsg+0x714/0xbf0 [ ] sock_sendmsg+0xe2/0x110 The issue was introduced by Commit 7cb8a93968e3 ("xfrm: Allow inserting policies with matching mark and different priorities"). After that, the policies [1] and [2] would be able to be added with different priorities. However, policy [3] will actually match both [1] and [2]. Policy [1] was matched due to the 1st 'return true' in xfrm_policy_mark_match(), and policy [2] was matched due to the 2nd 'return true' in there. It caused WARN_ON() in xfrm_policy_insert_list(). This patch fixes it by treating only policies with the same mark value and priority as the same policy in xfrm_policy_mark_match(). Thanks to Yuehaibing, we could make this fix better. v1->v2: - check policy->mark.v == pol->mark.v only without mask. Fixes: 7cb8a93968e3 ("xfrm: Allow inserting policies with matching mark and different priorities") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
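The matching rule described in this message (same mark value and same priority, mask ignored) can be sketched as follows. The `struct sk_mark` and `struct sk_policy` types and the function name are hypothetical simplifications, not the kernel's xfrm structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced policy representation for illustration. */
struct sk_mark {
    uint32_t v; /* mark value */
    uint32_t m; /* mark mask (deliberately not compared) */
};

struct sk_policy {
    struct sk_mark mark;
    uint32_t priority;
};

/* Two policies count as "the same" only when both the mark value and the
 * priority agree. */
static bool policy_mark_match(const struct sk_policy *policy,
                              const struct sk_policy *pol)
{
    return policy->mark.v == pol->mark.v &&
           policy->priority == pol->priority;
}
```

Applied to the three commands above, policy [3] then matches only [2] (value 0, priority 2) and no longer matches [1] as well.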
2020-05-25	Merge tag 'efi-changes-for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core	Ingo Molnar
	More EFI changes for v5.8: - Rename pr_efi/pr_efi_err to efi_info/efi_err, and use them consistently - Simplify and unify initrd loading - Parse the builtin command line on x86 (if provided) - Implement printk() support, including support for wide character strings - Some fixes for issues introduced by the first batch of v5.8 changes - Fix a missing prototypes warning - Simplify GDT handling in early mixed mode thunking code - Some other minor fixes and cleanups Conflicts: drivers/firmware/efi/libstub/efistub.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-25	Merge tag 'v5.7-rc7' into efi/core, to refresh the branch and pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-25	spi: mux: repair mux usage	Peter Rosin
	It is not valid to cache/short out selection of the mux. mux_control_select() only locks the mux until mux_control_deselect() is called. mux_control_deselect() may put the mux in some low power state or some other user of the mux might select it for other purposes. These things are probably not happening in the original setting where this driver was developed, but it is said to be a generic SPI mux. Also, the mux framework will short out the actual low level muxing operation when/if that is possible. Fixes: e9e40543ad5b ("spi: Add generic SPI multiplexer") Signed-off-by: Peter Rosin <peda@axentia.se> Link: https://lore.kernel.org/r/20200525104352.26807-1-peda@axentia.se Signed-off-by: Mark Brown <broonie@kernel.org>
2020-05-25	iomap: remove lockdep_assert_held()	Goldwyn Rodrigues
	Filesystems such as btrfs can perform direct I/O without holding the inode->i_rwsem in some of the cases like writing within i_size. So, remove the check for lockdep_assert_held() in iomap_dio_rw(). Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	iomap: add a filesystem hook for direct I/O bio submission	Goldwyn Rodrigues
	This helps filesystems to perform tasks on the bio while submitting for I/O. This could be post-write operations such as data CRC or data replication for fs-handled RAID. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	fs: export generic_file_buffered_read()	Goldwyn Rodrigues
	Export generic_file_buffered_read() to be used to supplement incomplete direct reads. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	cfg80211: fix debugfs rename crash	Johannes Berg
	Removing the "if (IS_ERR(dir)) dir = NULL;" check only works if we adjust the remaining code to not rely on it being NULL. Check IS_ERR_OR_NULL() before attempting to dereference it. I'm not actually entirely sure this fixes the syzbot crash as the kernel config indicates that they do have DEBUG_FS in the kernel, but this is what I found when looking there. Cc: stable@vger.kernel.org Fixes: d82574a8e5a4 ("cfg80211: no need to check return value of debugfs_create functions") Reported-by: syzbot+fd5332e429401bf42d18@syzkaller.appspotmail.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20200525113816.fc4da3ec3d4b.Ica63a110679819eaa9fb3bc1b7437d96b1fd187d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
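For readers unfamiliar with the error-pointer convention behind IS_ERR_OR_NULL(), here is a user-space re-creation. All `sketch_` names are hypothetical. The sketch assumes the usual encoding in which small negative errno values are stored in the top 4095 addresses of the pointer range.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(intptr_t err)
{
    return (void *)err;
}

/* True if the pointer lies in the range reserved for encoded errors. */
static bool sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* True for NULL as well as for encoded errors: neither may be dereferenced. */
static bool sketch_is_err_or_null(const void *p)
{
    return !p || sketch_is_err(p);
}
```

A caller that used to rely on the pointer being NULL on failure needs the combined check, since an encoded error is non-NULL but still not a valid object.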
2020-05-25	ACPI: EC: PM: s2idle: Extend GPE dispatching debug message	Rafael J. Wysocki
	Add the "ACPI" string to the "EC GPE dispatched" message as it is ACPI-related. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-25	ACPI: PM: s2idle: Print type of wakeup debug messages	Rafael J. Wysocki
	Since acpi_s2idle_wake() knows the category of wakeup causing the system to resume from suspend-to-idle, make it print a unique message for each of them to help diagnose wakeup issues. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-25	ACPI: DPTF: Add battery participant driver	Srinivas Pandruvada
	This driver adds support for Dynamic Platform and Thermal Framework battery participant device support. These attributes are presented via sysfs interface under the platform device for the battery participant: $ls /sys/bus/platform/devices/INT3532:00/dptf_battery current_discharge_capbility_ma max_platform_power_mw no_load_voltage_mv high_freq_impedance_mohm max_steady_state_power_mw Refer to the documentation at Documentation/ABI/testing/sysfs-platform-dptf for details. Here the implementation reuses existing dptf-power.c as the motivation and processing is same. It also shares one ACPI method. Here this change is using participant type, "PTYP" method to identify and do different processing. By using participant type, create/delete either "dptf_power" or "dptf_battery" attribute group and send notifications. The particpant type for for the battery participant is 0x0C. ACPI methods description: PMAX (Intel(R) Dynamic Tuning Platform Max Power Supplied by Battery): This object evaluates to the maximum platform power that can be supported by the battery in milli watts. PBSS (Intel(R) Dynamic Tuning Power Battery Steady State): This object returns the max sustained power for battery in milli watts. RBHF (Intel(R) Dynamic Tuning High Frequency Impedance): This object returns high frequency impedance value that can be obtained from battery fuel gauge. VBNL (Intel(R) Dynamic Tuning No-Load Voltage) This object returns battery instantaneous no-load voltage that can be obtained from battery fuel gauge in milli volts CMPP (Intel(R) Dynamic Tuning Current Discharge Capability) This object returns battery discharge current capability obtained from battery fuel gauge milli amps. Notifications: 0x80: PMAX change. Used to notify Intel(R)Dynamic Tuning Battery participant driver when the PMAX has changed by 250mw. 0x83: PBSS change. Used to notify Intel(R) Dynamic Tuning Battery participant driver when the power source has changed. 0x85: RBHF change. 
Used to notify Intel(R)Dynamic Tuning Battery participant driver when the RBHF has changed over a threshold by 5mOhm. 0x86: Battery Capability change. Used to notify Intel(R)Dynamic Tuning Battery participant driver when the battery capability has changed. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-25	ACPI: DPTF: Additional sysfs attributes for power participant driver	Srinivas Pandruvada
	Add two additional attributes to the existing power participant driver: rest_of_platform_power_mw: (RO) Shows the rest of worst case platform power in mW outside of S0C. This will help in power distribution to SoC and rest of the system. For example on a test system, this value is 2.5W with a 15W TDP SoC. Based on the adapter rating (adapter_rating_mw), user space software can decide on proper power allocation to SoC to improve short term performance via powercap/RAPL interface. prochot_confirm: (WO) Confirm EC about a prochot notification. Also userspace is notified via sysfs_notify(), whenever power source or rest of the platform power is changed. So user space can use poll() system call on those attributes. The ACPI methods used in this patch are as follows: PROP This object evaluates to the rest of worst case platform power in mW. Bits: 23:0 Worst case rest of platform power in mW. PBOK PBOK is a method designed to provide a mechanism for OSPM to change power setting before EC can de-assert a PROCHOT from a device. The EC may receive several PROCHOTs, so it has a sequence number attached to PSRC (read via existing attribute "platform_power_source"). Once OSPM takes action for a PSRC change notification, it can call PBOK method to confirm with the sequence number. Bits: 3:0 Power Delivery State Change Sequence number 30 Reserved 31 0 – Not OK to de-assert PROCHOT 1 – OK to de-assert PROCHOT PSRC (Platform Power Source): Not new in this patch but for documentation for new bits This object evaluates to an integer that represents the system power source as well as the power delivery state change sequence number. Bits: 3:0 The current power source as an integer for AC, DC, USB, Wireless. 0 = DC, 1 = AC, 2 = USB, 3 = Wireless Charging 7:4 Power Delivery State Change Sequence Number. Default value is 0 Notifications: 0x81: (Power State Change) Used to notify when the power source has changed. 
0x84: (PROP change) Used to notify when the platform rest of power has changed. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject, minor ABI documentation edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
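The PSRC and PBOK bit layouts listed above can be decoded and encoded as in the sketch below. The type and function names are hypothetical. The field positions follow the message: PSRC bits 3:0 carry the power source and bits 7:4 the sequence number, while PBOK bits 3:0 carry the sequence number and bit 31 the OK flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded PSRC value (hypothetical helper type). */
struct psrc_info {
    unsigned int source; /* 0 = DC, 1 = AC, 2 = USB, 3 = Wireless Charging */
    unsigned int seq;    /* power delivery state change sequence number */
};

static struct psrc_info decode_psrc(uint32_t raw)
{
    struct psrc_info info;

    info.source = raw & 0xFu;     /* bits 3:0 */
    info.seq = (raw >> 4) & 0xFu; /* bits 7:4 */
    return info;
}

/* Build a PBOK argument: sequence number in bits 3:0, OK flag in bit 31. */
static uint32_t encode_pbok(unsigned int seq, bool ok_to_deassert)
{
    return (seq & 0xFu) | (ok_to_deassert ? (1u << 31) : 0u);
}
```

For example, a raw PSRC value of 0x21 decodes to source AC with sequence number 2, and confirming that sequence number yields 0x80000002.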
2020-05-25	ACPI: video: Use native backlight on Acer TravelMate 5735Z	Paul Menzel
	Currently, changing the brightness of the internal display of the Acer TravelMate 5735Z does not work. Pressing the function keys or changing the slider, GNOME Shell 3.36.2 displays the OSD (five steps), but the brightness does not change. The Acer TravelMate 5735Z shipped with Windows 7 and as such does not trigger our "win8 ready" heuristic for preferring the native backlight interface. Still ACPI backlight control doesn't work on this model, whereas the native (intel_video) backlight interface does work by adding `acpi_backlight=native` or `acpi_backlight=none` to the Linux command line. So, add a quirk to force using native backlight control on this model. Link: https://bugzilla.kernel.org/show_bug.cgi?id=207835 Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-05-25	irqchip/sifive-plic: Improve boot prints for multiple PLIC instances	Anup Patel
	We improve PLIC banner to help distinguish multiple PLIC instances in boot time prints. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Link: https://lore.kernel.org/r/20200518091441.94843-4-anup.patel@wdc.com
2020-05-25	irqchip/sifive-plic: Setup cpuhp once after boot CPU handler is present	Anup Patel
	For multiple PLIC instances, the plic_init() is called once for each PLIC instance. Due to this we have two issues: 1. cpuhp_setup_state() is called multiple times 2. plic_starting_cpu() can crash for boot CPU if cpuhp_setup_state() is called before boot CPU PLIC handler is available. Address both issues by only initializing the HP notifiers when the boot CPU setup is complete. Fixes: f1ad1133b18f ("irqchip/sifive-plic: Add support for multiple PLICs") Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200518091441.94843-3-anup.patel@wdc.com
2020-05-25	irqchip/sifive-plic: Set default irq affinity in plic_irqdomain_map()	Anup Patel
	For multiple PLIC instances, each PLIC can only target a subset of CPUs which is represented by "lmask" in the "struct plic_priv". Currently, the default irq affinity for each PLIC interrupt is all online CPUs, which is an illegal value for default irq affinity when we have multiple PLIC instances. To fix this, we now set "lmask" as the default irq affinity for each interrupt in plic_irqdomain_map(). Fixes: f1ad1133b18f ("irqchip/sifive-plic: Add support for multiple PLICs") Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200518091441.94843-2-anup.patel@wdc.com
2020-05-25	irqchip/gic-v2, v3: Drop extra IRQ_NOAUTOEN setting for (E)PPIs	Valentin Schneider
	(E)PPIs are per-CPU interrupts, so we want each CPU to go and enable them via enable_percpu_irq(); this also means we want IRQ_NOAUTOEN for them as the autoenable would lead to calling irq_enable() instead of the more appropriate irq_percpu_enable(). Calling irq_set_percpu_devid() is enough to get just that since it trickles down to irq_set_percpu_devid_flags(), which gives us IRQ_NOAUTOEN (and a few others). Setting IRQ_NOAUTOEN again right after this call is just redundant, so don't do it. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200521223500.834-1-valentin.schneider@arm.com
2020-05-25	btrfs: turn space cache writeout failure messages into debug messages	Filipe Manana
	Since commit 1afb648e945428 ("btrfs: use standard debug config option to enable free-space-cache debug prints"), we started to log error messages that were never logged before since there was no DEBUG macro defined anywhere. This started to make test case btrfs/187 to fail very often, as it greps for any btrfs error messages in dmesg/syslog and fails if any is found: (...) btrfs/186 1s ... 2s btrfs/187 - output mismatch (see .../results//btrfs/187.out.bad) \--- tests/btrfs/187.out 2019-05-17 12:48:32.537340749 +0100 \+++ /home/fdmanana/git/hub/xfstests/results//btrfs/187.out.bad ... \@@ -1,3 +1,8 @@ QA output created by 187 Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/snap1' Create a readonly snapshot of 'SCRATCH_MNT' in 'SCRATCH_MNT/snap2' +[268364.139958] BTRFS error (device sdc): failed to write free space cache for block group 30408704 +[268380.156503] BTRFS error (device sdc): failed to write free space cache for block group 30408704 +[268380.161703] BTRFS error (device sdc): failed to write free space cache for block group 30408704 +[268380.253180] BTRFS error (device sdc): failed to write free space cache for block group 30408704 ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/187.out ... btrfs/188 4s ... 2s (...) The space cache write failures happen due to ENOSPC when attempting to update the free space cache items in the root tree. This happens because when starting or joining a transaction we don't know how many block groups we will end up changing (due to extent allocation or release) and therefore never reserve space for updating free space cache items. More often than not, the free space cache writeout succeeds since the metadata space info is not yet full nor very close to being full, but when it is, the space cache writeout fails with ENOSPC. 
Occasional failures to write space caches are not considered critical since they can be rebuilt when mounting the filesystem or the next attempt to write a free space cache in the next transaction commit might succeed, so we used to hide those error messages with a preprocessor check for the existence of the DEBUG macro that was never enabled anywhere. A few other generic test cases also trigger the error messages due to ENOSPC failure when writing free space caches as well, however they don't fail since they don't grep dmesg/syslog for any btrfs specific error messages. So change the messages from 'error' level to 'debug' level, as it doesn't make much sense to have error messages triggered only if the debug macro is enabled plus, more importantly, the error is not serious nor highly unexpected. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: include error on messages about failure to write space/inode caches	Filipe Manana
	Currently the error messages logged when we fail to write a free space cache or an inode cache are not very useful, as they don't mention what the error was. So include the error number in the messages. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()	Filipe Manana
	The label 'fail_unlock' is pointless: all it does is jump to the label 'out', so just remove it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums	Filipe Manana
	We are currently treating any non-zero return value from btrfs_next_leaf() the same way, by going to the code that inserts a new checksum item in the tree. However if btrfs_next_leaf() returns an error (a value < 0), we should just stop and return the error, and not behave as if nothing has happened, since in that case we do not have a way to know if there is a next leaf or we are currently at the last leaf already. So fix that by returning the error from btrfs_next_leaf(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
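The three-way handling of the return value described above can be sketched as follows. This is a minimal illustration, not the actual patch: the `csum_action` enum and `classify_next_leaf()` are hypothetical names, and the only thing assumed about `btrfs_next_leaf()` is the convention the message relies on (negative on error, zero when a next leaf was found, positive when there is no next leaf).

```c
#include <errno.h>

/* Hypothetical outcomes of the checksum insertion path, for illustration only. */
enum csum_action {
	CSUM_USE_NEXT_LEAF,	/* a next leaf exists: keep going with it */
	CSUM_INSERT_NEW_ITEM,	/* already at the last leaf: insert a new item */
	CSUM_RETURN_ERROR	/* stop and propagate the error to the caller */
};

/*
 * Classify the return value of a btrfs_next_leaf()-style call:
 *   ret <  0  an error occurred (for example -EIO),
 *   ret == 0  the path now points to the next leaf,
 *   ret >  0  there is no next leaf.
 * On error, *err receives the error code; otherwise it is set to 0.
 */
static enum csum_action classify_next_leaf(int ret, int *err)
{
	*err = 0;
	if (ret < 0) {
		*err = ret;
		return CSUM_RETURN_ERROR;
	}
	if (ret > 0)
		return CSUM_INSERT_NEW_ITEM;
	return CSUM_USE_NEXT_LEAF;
}
```

The behaviour being fixed corresponds to collapsing the first two branches into one, which hides the error from the caller.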
2020-05-25	btrfs: make checksum item extension more efficient	Filipe Manana
	When we want to add checksums into the checksums tree, or a log tree, we try whenever possible to extend existing checksum items, as this helps reduce the amount of metadata space used, since adding a new item uses extra metadata space for a btrfs_item structure (25 bytes). However we have two inefficiencies in the current approach: 1) After finding a checksum item that covers a range with an end offset that matches the start offset of the checksum range we want to insert, we release the search path populated by btrfs_lookup_csum() and then do another COW search on the tree with the goal of getting additional space for at least one checksum. Doing this path release and then searching again is a waste of time because very often the leaf already has enough free space for at least one more checksum; 2) After the COW search that guarantees we get free space in the leaf for at least one more checksum, we end up not doing the extension of the previous checksum item, and fall back to insertion of a new checksum item, if the leaf doesn't have an amount of free space larger than the space required for 2 checksums plus one btrfs_item structure - this is pointless for two reasons: a) We want to extend an existing item, so we don't need to account for a btrfs_item structure (25 bytes); b) We made the COW search with an insertion size for 1 single checksum, so if the leaf ends up with a free space amount smaller than 2 checksums plus the size of a btrfs_item structure, we give up on the extension of the existing item and jump to the 'insert' label, where we end up releasing the path and then doing yet another search to insert a new checksum item for a single checksum. 
Fix these inefficiencies by doing the following: - For case 1), before releasing the path just check if the leaf already has enough space for at least 1 more checksum, and if it does, jump directly to the item extension code, without releasing our current path, which was already COWed by btrfs_lookup_csum(); - For case 2), fix the logic so that for item extension we require only that the leaf has enough free space for 1 checksum, and not a minimum of 2 checksums plus space for a btrfs_item structure. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
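The difference between the old and new free-space requirement for case 2) comes down to simple arithmetic, sketched below. The helper names `can_extend_old()` and `can_extend_new()` are hypothetical and only model the two conditions as the message describes them; the 25-byte item header size is the figure quoted in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Size of a btrfs_item structure as stated in the commit message. */
#define ITEM_HEADER_SIZE 25u

/*
 * Old condition (as described above): extension was attempted only if
 * the leaf had room for 2 checksums plus one btrfs_item structure.
 */
static bool can_extend_old(size_t leaf_free, size_t csum_size)
{
	return leaf_free >= 2 * csum_size + ITEM_HEADER_SIZE;
}

/*
 * New condition: extending an existing item adds no item header, so
 * room for a single checksum is enough.
 */
static bool can_extend_new(size_t leaf_free, size_t csum_size)
{
	return leaf_free >= csum_size;
}
```

With a 4-byte checksum, a leaf with 20 free bytes fails the old condition (which needs 33 bytes) but satisfies the new one, so the existing item can be extended instead of falling back to inserting a new item.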
2020-05-25	btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents	Filipe Manana
	When we have extents shared amongst different inodes in the same subvolume, if we fsync them in parallel we can end up with checksum items in the log tree that represent ranges which overlap. For example, consider we have inodes A and B, both sharing an extent that covers the logical range from X to X + 64KiB: 1) Task A starts an fsync on inode A; 2) Task B starts an fsync on inode B; 3) Task A calls btrfs_csum_file_blocks(), and the first search in the log tree, through btrfs_lookup_csum(), returns -EFBIG because it finds an existing checksum item that covers the range from X - 64KiB to X; 4) Task A checks that the checksum item has not reached the maximum possible size (MAX_CSUM_ITEMS) and then releases the search path before it does another path search for insertion (through a direct call to btrfs_search_slot()); 5) As soon as task A releases the path and before it does the search for insertion, task B calls btrfs_csum_file_blocks() and gets -EFBIG too, because there is an existing checksum item that has an end offset that matches the start offset (X) of the checksum range we want to log; 6) Task B releases the path; 7) Task A does the path search for insertion (through btrfs_search_slot()) and then verifies that the checksum item that ends at offset X still exists and extends its size to insert the checksums for the range from X to X + 64KiB; 8) Task A releases the path and returns from btrfs_csum_file_blocks(), having inserted the checksums into an existing checksum item that got its size extended. At this point we have one checksum item in the log tree that covers the logical range from X - 64KiB to X + 64KiB; 9) Task B now does a search for insertion using btrfs_search_slot() too, but it finds that the previous checksum item no longer ends at the offset X, it now ends at offset X + 64KiB, so it leaves that item untouched. 
Then it releases the path and calls btrfs_insert_empty_item() that inserts a checksum item with a key offset corresponding to X and a size for inserting a single checksum (4 bytes in case of crc32c). Subsequent iterations end up extending this new checksum item so that it contains the checksums for the range from X to X + 64KiB. So after task B returns from btrfs_csum_file_blocks() we end up with two checksum items in the log tree that have overlapping ranges, one for the range from X - 64KiB to X + 64KiB, and another for the range from X to X + 64KiB. Having checksum items that represent ranges which overlap, regardless of being in the log tree or in the checksums tree, can lead to problems where checksums for a file range end up not being found. This type of problem has happened a few times in the past and the following commits fixed them and explain in detail why having checksum items with overlapping ranges is problematic: 27b9a8122ff71a "Btrfs: fix csum tree corruption, duplicate and outdated checksums" b84b8390d6009c "Btrfs: fix file read corruption after extent cloning and fsync" 40e046acbd2f36 "Btrfs: fix missing data checksums after replaying a log tree" Since this specific instance of the problem can only happen when logging inodes, because it is the only case where concurrent attempts to insert checksums for the same range can happen, fix the issue by using an extent io tree as a range lock to serialize checksum insertion during inode logging. This issue could often be reproduced by the test case generic/457 from fstests. 
When it happens it produces the following trace: BTRFS critical (device dm-0): corrupt leaf: root=18446744073709551610 block=30625792 slot=42, csum end range (15020032) goes beyond the start range (15015936) of the next csum item BTRFS info (device dm-0): leaf 30625792 gen 7 total ptrs 49 free space 2402 owner 18446744073709551610 BTRFS info (device dm-0): refs 1 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 15884 item 0 key (18446744073709551606 128 13979648) itemoff 3991 itemsize 4 item 1 key (18446744073709551606 128 13983744) itemoff 3987 itemsize 4 item 2 key (18446744073709551606 128 13987840) itemoff 3983 itemsize 4 item 3 key (18446744073709551606 128 13991936) itemoff 3979 itemsize 4 item 4 key (18446744073709551606 128 13996032) itemoff 3975 itemsize 4 item 5 key (18446744073709551606 128 14000128) itemoff 3971 itemsize 4 (...) BTRFS error (device dm-0): block=30625792 write time tree block corruption detected ------------[ cut here ]------------ WARNING: CPU: 1 PID: 15884 at fs/btrfs/disk-io.c:539 btree_csum_one_bio+0x268/0x2d0 [btrfs] Modules linked in: btrfs dm_thin_pool ... CPU: 1 PID: 15884 Comm: fsx Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btree_csum_one_bio+0x268/0x2d0 [btrfs] Code: c7 c7 ... 
RSP: 0018:ffffbb0109e6f8e0 EFLAGS: 00010296 RAX: 0000000000000000 RBX: ffffe1c0847b6080 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffffaa963988 RDI: 0000000000000001 RBP: ffff956a4f4d2000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000526 R11: 0000000000000000 R12: ffff956a5cd28bb0 R13: 0000000000000000 R14: ffff956a649c9388 R15: 000000011ed82000 FS: 00007fb419959e80(0000) GS:ffff956a7aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000fe6d54 CR3: 0000000138696005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btree_submit_bio_hook+0x67/0xc0 [btrfs] submit_one_bio+0x31/0x50 [btrfs] btree_write_cache_pages+0x2db/0x4b0 [btrfs] ? __filemap_fdatawrite_range+0xb1/0x110 do_writepages+0x23/0x80 __filemap_fdatawrite_range+0xd2/0x110 btrfs_write_marked_extents+0x15e/0x180 [btrfs] btrfs_sync_log+0x206/0x10a0 [btrfs] ? kmem_cache_free+0x315/0x3b0 ? btrfs_log_inode+0x1e8/0xf90 [btrfs] ? __mutex_unlock_slowpath+0x45/0x2a0 ? lockref_put_or_lock+0x9/0x30 ? dput+0x2d/0x580 ? dput+0xb5/0x580 ? btrfs_sync_file+0x464/0x4d0 [btrfs] btrfs_sync_file+0x464/0x4d0 [btrfs] do_fsync+0x38/0x60 __x64_sys_fsync+0x10/0x20 do_syscall_64+0x5c/0x280 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fb41953a6d0 Code: 48 3d ... 
RSP: 002b:00007ffcc86bd218 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fb41953a6d0 RDX: 0000000000000009 RSI: 0000000000040000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000001 R09: 0000000000000009 R10: 0000000000000064 R11: 0000000000000246 R12: 0000556cf4b2c060 R13: 0000000000000100 R14: 0000000000000000 R15: 0000556cf322b420 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffffa96bdedf>] copy_process+0x74f/0x2020 softirqs last enabled at (0): [<ffffffffa96bdedf>] copy_process+0x74f/0x2020 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace d543fc76f5ad7fd8 ]--- In that trace the tree checker detected the overlapping checksum items at the time when we triggered writeback for the log tree when syncing the log. Another trace that can happen is due to BUG_ON() when deleting checksum items while logging an inode: BTRFS critical (device dm-0): slot 81 key (18446744073709551606 128 13635584) new key (18446744073709551606 128 13635584) BTRFS info (device dm-0): leaf 30949376 gen 7 total ptrs 98 free space 8527 owner 18446744073709551610 BTRFS info (device dm-0): refs 4 lock (w:1 r:0 bw:0 br:0 sw:1 sr:0) lock_owner 13473 current 13473 item 0 key (257 1 0) itemoff 16123 itemsize 160 inode generation 7 size 262144 mode 100600 item 1 key (257 12 256) itemoff 16103 itemsize 20 item 2 key (257 108 0) itemoff 16050 itemsize 53 extent data disk bytenr 13631488 nr 4096 extent data offset 0 nr 131072 ram 131072 (...) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:3153! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 1 PID: 13473 Comm: fsx Not tainted 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x1ea/0x270 [btrfs] Code: 0f b6 ... 
RSP: 0018:ffff95e3889179d0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000051 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffffb7763988 RDI: 0000000000000001 RBP: fffffffffffffff6 R08: 0000000000000000 R09: 0000000000000001 R10: 00000000000009ef R11: 0000000000000000 R12: ffff8912a8ba5a08 R13: ffff95e388917a06 R14: ffff89138dcf68c8 R15: ffff95e388917ace FS: 00007fe587084e80(0000) GS:ffff8913baa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe587091000 CR3: 0000000126dac005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_del_csums+0x2f4/0x540 [btrfs] copy_items+0x4b5/0x560 [btrfs] btrfs_log_inode+0x910/0xf90 [btrfs] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs] ? dget_parent+0x5/0x370 btrfs_log_dentry_safe+0x4a/0x70 [btrfs] btrfs_sync_file+0x42b/0x4d0 [btrfs] __x64_sys_msync+0x199/0x200 do_syscall_64+0x5c/0x280 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fe586c65760 Code: 00 f7 ... RSP: 002b:00007ffe250f98b8 EFLAGS: 00000246 ORIG_RAX: 000000000000001a RAX: ffffffffffffffda RBX: 00000000000040e1 RCX: 00007fe586c65760 RDX: 0000000000000004 RSI: 0000000000006b51 RDI: 00007fe58708b000 RBP: 0000000000006a70 R08: 0000000000000003 R09: 00007fe58700cb61 R10: 0000000000000100 R11: 0000000000000246 R12: 00000000000000e1 R13: 00007fe58708b000 R14: 0000000000006b51 R15: 0000558de021a420 Modules linked in: dm_log_writes ... ---[ end trace c92a7f447a8515f5 ]--- CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
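The overlap condition reported in the first trace ("csum end range ... goes beyond the start range ... of the next csum item") can be illustrated with a small sketch. It is not the tree checker's code: `struct csum_item`, `csum_item_end()` and `csum_items_overlap()` are hypothetical names, and the model assumes each checksum covers one sector, with a 4096-byte sector and 4-byte crc32c checksums as in the traces above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a checksum item (hypothetical, for illustration). */
struct csum_item {
	uint64_t start;		/* key offset: first logical byte covered */
	uint32_t item_size;	/* bytes of checksum data stored in the item */
};

/* Exclusive end of the logical range covered by a checksum item. */
static uint64_t csum_item_end(const struct csum_item *it,
			      uint32_t sector_size, uint32_t csum_size)
{
	return it->start + (uint64_t)(it->item_size / csum_size) * sector_size;
}

/*
 * Two adjacent items, sorted by key offset, overlap when the first one
 * ends past the start of the second one.
 */
static bool csum_items_overlap(const struct csum_item *prev,
			       const struct csum_item *next,
			       uint32_t sector_size, uint32_t csum_size)
{
	return csum_item_end(prev, sector_size, csum_size) > next->start;
}
```

In the scenario from the message, with X = 1 MiB, task A leaves an item starting at X - 64KiB holding 32 checksums (128 bytes), so it ends at X + 64KiB, while task B inserts an item starting at X; the check above reports these two as overlapping.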
2020-05-25	btrfs: unexport btrfs_compress_set_level()	Anand Jain
	btrfs_compress_set_level() can be a static function in the file compression.c. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: simplify iget helpers	David Sterba
	The inode lookup starting at btrfs_iget takes the full location key, while only the objectid is used to match the inode, because the lookup happens inside the given root and thus the inode number is unique. The entire location key is properly set up in btrfs_init_locked_inode. Simplify the helpers and pass only the inode number, renaming it to 'ino' instead of 'objectid'. This allows removing the temporary key variables, saving some stack space. Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: open code read_fs_root	David Sterba
	After the update to btrfs_get_fs_root, read_fs_root has become a trivial wrapper that can be open coded. Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: simplify root lookup by id	David Sterba
	The main function to look up a root by its id, btrfs_get_fs_root, takes the whole key, while only using the objectid. The value of offset is preset to (u64)-1 but not actually used until btrfs_find_root, which does the actual search. Switch btrfs_get_fs_root to use only objectid and remove all local variables that existed just for the lookup. The actual key for search is set up in btrfs_get_fs_root, reusing another key variable. Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: reloc: clear DEAD_RELOC_TREE bit for orphan roots to prevent runaway balance	Qu Wenruo
	[BUG] There are several reports of runaway balance, where balance floods the log with "found X extents" messages in which X never changes. [CAUSE] Commit d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") introduced the BTRFS_ROOT_DEAD_RELOC_TREE bit to indicate that one subvolume has finished its tree blocks swap with its reloc tree. However if balance is canceled or hits ENOSPC halfway, we didn't clear the BTRFS_ROOT_DEAD_RELOC_TREE bit, leaving that bit hanging forever until unmount. Any subvolume root with that bit set would cause the backref cache to skip this tree block, as the root is considered to have finished its tree block swap. This would cause all tree blocks of that root to be ignored by balance, leading to runaway balance. [FIX] Fix the problem by also clearing the BTRFS_ROOT_DEAD_RELOC_TREE bit for the original subvolume of an orphan reloc root. Also add a check at unmount time for the stale bit still being set. Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: reloc: fix reloc root leak and NULL pointer dereference	Qu Wenruo
	[BUG] When balance is canceled, there is a pretty high chance that unmounting the fs can lead to a NULL pointer dereference: BTRFS warning (device dm-3): page private not zero on page 223158272 ... BTRFS warning (device dm-3): page private not zero on page 223162368 BTRFS error (device dm-3): leaked root 18446744073709551608-304 refcount 1 BUG: kernel NULL pointer dereference, address: 0000000000000168 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 5793 Comm: umount Tainted: G O 5.7.0-rc5-custom+ #53 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:__lock_acquire+0x5dc/0x24c0 Call Trace: lock_acquire+0xab/0x390 _raw_spin_lock+0x39/0x80 btrfs_release_extent_buffer_pages+0xd7/0x200 [btrfs] release_extent_buffer+0xb2/0x170 [btrfs] free_extent_buffer+0x66/0xb0 [btrfs] btrfs_put_root+0x8e/0x130 [btrfs] btrfs_check_leaked_roots.cold+0x5/0x5d [btrfs] btrfs_free_fs_info+0xe5/0x120 [btrfs] btrfs_kill_super+0x1f/0x30 [btrfs] deactivate_locked_super+0x3b/0x80 deactivate_super+0x3e/0x50 cleanup_mnt+0x109/0x160 __cleanup_mnt+0x12/0x20 task_work_run+0x67/0xa0 exit_to_usermode_loop+0xc5/0xd0 syscall_return_slowpath+0x205/0x360 do_syscall_64+0x6e/0xb0 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x7fd028ef740b [CAUSE] When balance is canceled, all reloc roots are marked as orphan, and orphan reloc roots are going to be cleaned up. 
However, for orphan reloc roots and merged reloc roots, their lifespans are quite different:

  Merged reloc roots                 | Orphan reloc roots by cancel
  --------------------------------------------------------------------
  create_reloc_root()                | create_reloc_root()
  |- refs == 1                       | |- refs == 1
                                     |
  btrfs_grab_root(reloc_root);       | btrfs_grab_root(reloc_root);
  |- refs == 2                       | |- refs == 2
                                     |
  root->reloc_root = reloc_root;     | root->reloc_root = reloc_root;
                  >>> No difference so far <<<
                                     |
  prepare_to_merge()                 | prepare_to_merge()
  |- btrfs_set_root_refs(item, 1);   | |- if (!err) (err == -EINTR)
                                     |
  merge_reloc_roots()                | merge_reloc_roots()
  |- merge_reloc_root()              | |- Doing nothing to put reloc root
     |- insert_dirty_subvol()        | |- refs == 2
        |- __del_reloc_root()        |
           |- btrfs_put_root()       |
              |- refs == 1           |
          >>> Now orphan reloc roots still have refs 2 <<<
                                     |
  clean_dirty_subvols()              | clean_dirty_subvols()
  |- btrfs_drop_snapshot()           | |- btrfs_drop_snapshot()
     |- reloc_root gets freed        |    |- reloc_root still has refs 2
                                     |       related ebs get freed, but
                                     |       reloc_root still recorded in
                                     |       allocated_roots
  btrfs_check_leaked_roots()         | btrfs_check_leaked_roots()
  |- No leaked roots                 | |- Leaked reloc_roots detected
                                     |    |- btrfs_put_root()
                                     |       |- free_extent_buffer(root->node);
                                     |          |- eb already freed, caused NULL
                                     |             pointer dereference

[FIX] The fix is to clear fs_root->reloc_root and put it at merge_reloc_roots() time, so that we won't leak reloc roots. Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") CC: stable@vger.kernel.org # 5.1+ Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25	btrfs: reduce lock contention when creating snapshot	Robbie Ko
	When creating a snapshot, ordered extents need to be flushed and this can take a long time. In create_snapshot there are two locks held when this happens: 1. Destination directory inode lock 2. Global subvolume semaphore This will unnecessarily block other operations like subvolume destroy, create, or setflag until the snapshot is created. We can fix that by moving the flush outside the locked section as this does not depend on the aforementioned locks. The code factors out the snapshot related work from create_snapshot to btrfs_mksnapshot. __btrfs_ioctl_snap_create btrfs_mksubvol create_subvol btrfs_mksnapshot <flush> btrfs_mksubvol create_snapshot Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>