summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-10-15hwmon: (pmbus) Fix page count auto-detection.Dmitry Bazhenov
Devices with compatible="pmbus" field have zero initial page count, and pmbus_clear_faults() being called before the page count auto- detection does not actually clear faults because it depends on the page count. Non-cleared faults in its turn may fail the subsequent page count auto-detection. This patch fixes this problem by calling pmbus_clear_fault_page() for currently set page and calling pmbus_clear_faults() after the page count was detected. Cc: stable@vger.kernel.org Signed-off-by: Dmitry Bazhenov <bazhenov.dn@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-10-15Merge branch 'sockmap_and_ktls'Alexei Starovoitov
Daniel Borkmann says: ==================== This work adds a generic sk_msg layer and converts both sockmap and later ktls over to make use of it as a common data structure for application data (similarly as sk_buff for network packets). With that in place the sk_msg framework spans accross ULP layer in the kernel and allows for introspection or filtering of L7 data with the help of BPF programs operating on a common input context. In a second step, we enable the latter for ktls which was previously not possible, meaning, ktls and sk_msg verdict programs were mutually exclusive in the ULP layer which created challenges for the orchestrator when trying to apply TCP based policy, for example. Leveraging the prior consolidation we can finally overcome this limitation. Note, there's no change in behavior when ktls is not used in combination with BPF, and also no change in behavior for stand alone sockmap. The kselftest suites for ktls, sockmap and ktls with sockmap combined also runs through successfully. For further details please see individual patches. Thanks! v1 -> v2: - Removed leftover comment spotted by Alexei - Improved commit messages, rebase ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15bpf, doc: add maintainers entry to related filesDaniel Borkmann
Add a MAINTAINERS entry to the skmsg and related files such that patches, features, bug reports land with the right Cc. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15bpf: add tls support for testing in test_sockmapJohn Fastabend
This adds a --ktls option to test_sockmap in order to enable the combination of ktls and sockmap to run, which makes for another batch of 648 test cases for both in combination. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: add bpf support to sk_msg handlingJohn Fastabend
This work adds BPF sk_msg verdict program support to kTLS allowing BPF and kTLS to be combined together. Previously kTLS and sk_msg verdict programs were mutually exclusive in the ULP layer which created challenges for the orchestrator when trying to apply TCP based policy, for example. To resolve this, leveraging the work from previous patches that consolidates the use of sk_msg, we can finally enable BPF sk_msg verdict programs so they continue to run after the kTLS socket is created. No change in behavior when kTLS is not used in combination with BPF, the kselftest suite for kTLS also runs successfully. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: replace poll implementation with read hookJohn Fastabend
Instead of re-implementing poll routine use the poll callback to trigger read from kTLS, we reuse the stream_memory_read callback which is simpler and achieves the same. This helps to align sockmap and kTLS so we can more easily embed BPF in kTLS. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: convert to generic sk_msg interfaceDaniel Borkmann
Convert kTLS over to make use of sk_msg interface for plaintext and encrypted scattergather data, so it reuses all the sk_msg helpers and data structure which later on in a second step enables to glue this to BPF. This also allows to remove quite a bit of open coded helpers which are covered by the sk_msg API. Recent changes in kTLs 80ece6a03aaf ("tls: Remove redundant vars from tls record structure") and 4e6d47206c32 ("tls: Add support for inplace records encryption") changed the data path handling a bit; while we've kept the latter optimization intact, we had to undo the former change to better fit the sk_msg model, hence the sg_aead_in and sg_aead_out have been brought back and are linked into the sk_msg sgs. Now the kTLS record contains a msg_plaintext and msg_encrypted sk_msg each. In the original code, the zerocopy_from_iter() has been used out of TX but also RX path. For the strparser skb-based RX path, we've left the zerocopy_from_iter() in decrypt_internal() mostly untouched, meaning it has been moved into tls_setup_from_iter() with charging logic removed (as not used from RX). Given RX path is not based on sk_msg objects, we haven't pursued setting up a dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it could be an option to prusue in a later step. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15bpf, sockmap: convert to generic sk_msg interfaceDaniel Borkmann
Add a generic sk_msg layer, and convert current sockmap and later kTLS over to make use of it. While sk_buff handles network packet representation from netdevice up to socket, sk_msg handles data representation from application to socket layer. This means that sk_msg framework spans across ULP users in the kernel, and enables features such as introspection or filtering of data with the help of BPF programs that operate on this data structure. Latter becomes in particular useful for kTLS where data encryption is deferred into the kernel, and as such enabling the kernel to perform L7 introspection and policy based on BPF for TLS connections where the record is being encrypted after BPF has run and came to a verdict. In order to get there, first step is to transform open coding of scatter-gather list handling into a common core framework that subsystems can use. The code itself has been split and refactored into three bigger pieces: i) the generic sk_msg API which deals with managing the scatter gather ring, providing helpers for walking and mangling, transferring application data from user space into it, and preparing it for BPF pre/post-processing, ii) the plain sock map itself where sockets can be attached to or detached from; these bits are independent of i) which can now be used also without sock map, and iii) the integration with plain TCP as one protocol to be used for processing L7 application data (later this could e.g. also be extended to other protocols like UDP). The semantics are the same with the old sock map code and therefore no change of user facing behavior or APIs. While pursuing this work it also helped finding a number of bugs in the old sockmap code that we've fixed already in earlier commits. The test_sockmap kselftest suite passes through fine as well. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tcp, ulp: remove ulp bits from sockmapDaniel Borkmann
In order to prepare sockmap logic to be used in combination with kTLS we need to detangle it from ULP, and further split it in later commits into a generic API. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanupDaniel Borkmann
Whenever the ULP data on the socket is mangled, enforce that the caller has the socket lock held as otherwise things may race with initialization and cleanup callbacks from ulp ops as both would mangle internal socket state. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15hv_balloon: Replace spin_is_locked() with lockdepLance Roy
lockdep_assert_held() is better suited to checking locking requirements, since it won't get confused when someone else holds the lock. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15sgi-xp: Replace spin_is_locked() with lockdepLance Roy
lockdep_assert_held() is better suited to checking locking requirements, since it won't get confused when someone else holds the lock. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Cliff Whickman <cpw@sgi.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Robin Holt <robinmholt@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15eeprom: New ee1004 driver for DDR4 memoryJean Delvare
The EEPROMs which hold the SPD data on DDR4 memory modules are no longer standard AT24C02-compatible EEPROMs. They are 512-byte EEPROMs which use only 1 I2C address for data access. You need to switch between the lower page and the upper page of data by sending commands on the SMBus. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15eeprom: at25: remove unneeded 'at25_remove'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/misc/eeprom/at25.c: In function 'at25_remove': drivers/misc/eeprom/at25.c:384:20: warning: variable 'at25' set but not used [-Wunused-but-set-variable] Since commit 96d08fb43e30 ("eeprom: at25: use devm_nvmem_register()"), at25_remove is do nothing, so can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for ↵Julien Folly
unsigned, count for max size). IAD Register is yet readable trough the "iad" sys file. A write to the "iad" sys file enables or disables the current measurement, but it was not possible to get the measured value by reading it. Fix: %u in snprintf for unsigned values (vdd and vad) Fix: Avoid possibles overflows (Usage of the 'count' variables) Signed-off-by: Julien Folly <julien.folly@gmail.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15misc: mic: scif: remove set but not used variables 'src_dma_addr, dst_dma_addr'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/misc/mic/scif/scif_dma.c: In function 'scif_rma_list_dma_copy_wrapper': drivers/misc/mic/scif/scif_dma.c:1558:27: warning: variable 'dst_dma_addr' set but not used [-Wunused-but-set-variable] drivers/misc/mic/scif/scif_dma.c:1558:13: warning: variable 'src_dma_addr' set but not used [-Wunused-but-set-variable] They never used since introduction in commit 7cc31cd27752 ("misc: mic: SCIF DMA and CPU copy interface") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15misc: mic: fix a DMA pool free failureWenwen Wang
In _scif_prog_signal(), the boolean variable 'x100' is used to indicate whether the MIC Coprocessor is X100. If 'x100' is true, the status descriptor will be used to write the value to the destination. Otherwise, a DMA pool will be allocated for this purpose. Specifically, if the DMA pool is allocated successfully, two memory addresses will be returned. One is for the CPU and the other is for the device to access the DMA pool. The former is stored to the variable 'status' and the latter is stored to the variable 'src'. After the allocation, the address in 'src' is saved to 'status->src_dma_addr', which is actually in the DMA pool, and 'src' is then modified. Later on, if an error occurs, the execution flow will transfer to the label 'dma_fail', which will check 'x100' and free up the allocated DMA pool if 'x100' is false. The point here is that 'status->src_dma_addr' is used for freeing up the DMA pool. As mentioned before, 'status->src_dma_addr' is in the DMA pool. And thus, the device is able to modify this data. This can potentially cause failures when freeing up the DMA pool because of the modified device address. This patch avoids the above issue by using the variable 'src' (with necessary calculation) to free up the DMA pool. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15platform: goldfish: pipe: Add a blank line to separate varibles and codeRoman Kiryanov
checkpacth: Missing a blank line after declarations Signed-off-by: Roman Kiryanov <rkir@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15platform: goldfish: pipe: Remove redundant castingRoman Kiryanov
This casting is not required. Signed-off-by: Roman Kiryanov <rkir@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15platform: goldfish: pipe: Call misc_deregister if init failsRoman Kiryanov
Undo effects of misc_register if driver's init fails after misc_register. Signed-off-by: Roman Kiryanov <rkir@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15platform: goldfish: pipe: Move the file-scope goldfish_pipe_dev variable ↵Roman Kiryanov
into the driver state This is the last patch in the series of patches to move file-scope variables into the driver state. This change will help to introduce another version of the pipe driver (with different state) for the older host interface or having several instances of this device. Signed-off-by: Roman Kiryanov <rkir@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15platform: goldfish: pipe: Move the file-scope goldfish_pipe_miscdev variable ↵Roman Kiryanov
into the driver state This is a series of patches to move mutable file-scope variables into the driver state. This change will help to introduce another version of the pipe driver (with different state) for the older host interface or having several instances of this device. Signed-off-by: Roman Kiryanov <rkir@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15platform: goldfish: pipe: Move the file-scope goldfish_interrupt_tasklet ↵Roman Kiryanov
variable into the driver state This is a series of patches to move mutable file-scope variables into the driver state. This change will help to introduce another version of the pipe driver (with different state) for the older host interface or having several instances of this device. Signed-off-by: Roman Kiryanov <rkir@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15gsmi: Add GSMI commands to log S0ix infoFurquan Shaikh
Add new GSMI commands (GSMI_CMD_LOG_S0IX_SUSPEND = 0xa, GSMI_CMD_LOG_S0IX_RESUME = 0xb) that allow firmware to log any information during S0ix suspend/resume paths. Traditional ACPI suspend S3 involves BIOS both during the suspend and the resume paths. However, modern suspend type like S0ix does not involve firmware on either of the paths. This command gives the firmware an opportunity to log any required information about the suspend and resume operations e.g. wake sources. Additionally, this change adds a module parameter to allow platforms to specifically enable S0ix logging if required. This prevents any other platforms from unnecessarily making a GSMI call which could have any side-effects. Tested by verifying that wake sources are correctly logged in eventlog. Signed-off-by: Furquan Shaikh <furquan@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> Reviewed-by: Rajat Jain <rajatja@chromium.org> Signed-off-by: Furquan Shaikh <furquan@google.com> Tested-by: Furquan Shaikh <furquan@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> [zwisler: update changelog for upstream] Signed-off-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15gsmi: Remove autoselected dependency on EFI and EFI_VARSDuncan Laurie
Instead of selecting EFI and EFI_VARS automatically when GSMI is enabled let that portion of the driver be conditionally compiled if EFI and EFI_VARS are enabled. This allows the rest of the driver (specifically event log) to be used if EFI_VARS is not enabled. To test: 1) verify that EFI_VARS is not automatically selected when CONFIG_GOOGLE_GSMI is enabled 2) verify that the kernel boots on Link and that GSMI event log is still available and functional 3) specifically boot the kernel on Alex to ensure it does not try to load efivars and that gsmi also does not load because it is not in the supported DMI table Signed-off-by: Duncan Laurie <dlaurie@chromium.org> Reviewed-by: Olof Johansson <olofj@chromium.org> Signed-off-by: Benson Leung <bleung@chromium.org> Signed-off-by: Ben Zhang <benzh@chromium.org> Signed-off-by: Filipe Brandenburger <filbranden@chromium.org> Signed-off-by: Furquan Shaikh <furquan@google.com> Tested-by: Furquan Shaikh <furquan@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> [zwisler: update changelog for upstream] Signed-off-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15gsmi: Add coreboot to list of matching BIOS vendorsDuncan Laurie
In order to use this coreboot needs board support for: CONFIG_ELOG=y CONFIG_ELOG_GSMI=y And the kernel driver needs enabled: CONFIG_GOOGLE_GSMI=y To test, verify that clean shutdown event is added to the log: > mosys eventlog list | grep 'Clean Shutdown' 11 | 2012-06-25 09:49:24 | Kernl Event | Clean Shutdown Signed-off-by: Duncan Laurie <dlaurie@chromium.org> Reviewed-by: Vadim Bendebury <vbendeb@chromium.org> Reviewed-by: Stefan Reinauer <reinauer@chromium.org> Signed-off-by: Furquan Shaikh <furquan@google.com> Tested-by: Furquan Shaikh <furquan@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> Reviewed-by: Justin TerAvest <teravest@chromium.org> [zwisler: update changelog for upstream] Signed-off-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15gsmi: Fix bug in append_to_eventlog sysfs handlerDuncan Laurie
The sysfs handler should return the number of bytes consumed, which in the case of a successful write is the entire buffer. Also fix a bug where param.data_len was being set to (count - (2 * sizeof(u32))) instead of just (count - sizeof(u32)). The latter is correct because we skip over the leading u32 which is our param.type, but we were also incorrectly subtracting sizeof(u32) on the line where we were actually setting param.data_len: param.data_len = count - sizeof(u32); This meant that for our example event.kernel_software_watchdog with total length 10 bytes, param.data_len was just 2 prior to this change. To test, successfully append an event to the log with gsmi sysfs. This sample event is for a "Kernel Software Watchdog" > xxd -g 1 event.kernel_software_watchdog 0000000: 01 00 00 00 ad de 06 00 00 00 > cat event.kernel_software_watchdog > /sys/firmware/gsmi/append_to_eventlog > mosys eventlog list | tail -1 14 | 2012-06-25 10:14:14 | Kernl Event | Software Watchdog Signed-off-by: Duncan Laurie <dlaurie@chromium.org> Reviewed-by: Vadim Bendebury <vbendeb@chromium.org> Reviewed-by: Stefan Reinauer <reinauer@chromium.org> Signed-off-by: Furquan Shaikh <furquan@google.com> Tested-by: Furquan Shaikh <furquan@chromium.org> Reviewed-by: Aaron Durbin <adurbin@chromium.org> Reviewed-by: Justin TerAvest <teravest@chromium.org> [zwisler: updated changelog for 2nd bug fix and upstream] Signed-off-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15MAINTAINERS: Add me to Android driversJoel Fernandes (Google)
I am one of the main engineers working on ashmem. I have been fixing bugs in the driver and have been involved in the memfd conversion discussions and sending patches about that. I also have an understanding of the binder driver and was involved with some of the development on finer grained locking. So I would like to be added to the MAINTAINERS file for android drivers for review and maintenance of ashmem and other Android drivers. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Todd Kjos <tkjos@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15Input: elan_i2c - add ACPI ID for Lenovo IdeaPad 330-15IGMMikhail Nikiforov
Add ELAN061C to the ACPI table to support Elan touchpad found in Lenovo IdeaPad 330-15IGM. Signed-off-by: Mikhail Nikiforov <jackxviichaos@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-10-15gfs2: write revokes should traverse sd_ail1_list in reverseBob Peterson
All the other functions that deal with the sd_ail_list run the list from the tail back to the head, iow, in reverse. We should do the same while writing revokes, otherwise we might miss removing entries properly from the list when we hit the limit of how many revokes we can write at one time (based on block size, which determines how many block pointers will fit in the revoke block). Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-15dmaengine: owl: Fix warnings generated during buildManivannan Sadhasivam
Following warnings are generated when compiled with W=1, drivers/dma/owl-dma.c:170: warning: Function parameter or member 'cyclic' not described in 'owl_dma_txd' drivers/dma/owl-dma.c:198: warning: Function parameter or member 'cfg' not described in 'owl_dma_vchan' drivers/dma/owl-dma.c:198: warning: Function parameter or member 'drq' not described in 'owl_dma_vchan' drivers/dma/owl-dma.c:225: warning: Function parameter or member 'irq' not described in 'owl_dma' Fix this by adding comments for relevant struct members to appear in kernel-doc. Fixes: d64e1b3f5cce ("dmaengine: owl: Add Slave and Cyclic mode support for Actions Semi Owl S900 SoC") Reported-by: Vinod Koul <vinod.koul@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Vinod Koul <vkoul@kernel.org>
2018-10-15btrfs: switch return_bigger to bool in find_ref_headLu Fengqi
Using bool is more suitable than int here, and add the comment about the return_bigger. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: remove fs_info from btrfs_should_throttle_delayed_refsLu Fengqi
The avg_delayed_ref_runtime can be referenced from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: remove fs_info from btrfs_check_space_for_delayed_refsLu Fengqi
It can be referenced from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lockLu Fengqi
Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to btrfs_delayed_ref_lock(). No functional change. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: delayed-ref: pass delayed_refs directly to btrfs_select_ref_headLu Fengqi
Since trans is only used for referring to delayed_refs, there is no need to pass it instead of delayed_refs to btrfs_select_ref_head(). No functional change. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: qgroup: move the qgroup->members check out from (!qgroup)'s else branchLu Fengqi
There is no reason to put this check in (!qgroup)'s else branch because if qgroup is null, it will goto out directly. So move it out to reduce indentation level. No functional change. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: relocation: Remove redundant tree level checkQu Wenruo
Commit 581c1760415c ("btrfs: Validate child tree block's level and first key") has made tree block level check mandatory. So if tree block level doesn't match, we won't get a valid extent buffer. The extra WARN_ON() check can be removed completely. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: relocation: Cleanup while loop using rbtree_postorder_for_each_entry_safeQu Wenruo
And add one line comment explaining what we're doing for each loop. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: qgroup: Avoid calling qgroup functions if qgroup is not enabledQu Wenruo
Some qgroup trace events like btrfs_qgroup_release_data() and btrfs_qgroup_free_delayed_ref() can still be triggered even if qgroup is not enabled. This is caused by the lack of qgroup status check before calling some qgroup functions. Thankfully the functions can handle quota disabled case well and just do nothing for qgroup disabled case. This patch will do earlier check before triggering related trace events. And for enabled <-> disabled race case: 1) For enabled->disabled case Disable will wipe out all qgroups data including reservation and excl/rfer. Even if we leak some reservation or numbers, it will still be cleared, so nothing will go wrong. 2) For disabled -> enabled case Current btrfs_qgroup_release_data() will use extent_io tree to ensure we won't underflow reservation. And for delayed_ref we use head->qgroup_reserved to record the reserved space, so in that case head->qgroup_reserved should be 0 and we won't underflow. CC: stable@vger.kernel.org # 4.14+ Reported-by: Chris Murphy <lists@colorremedies.com> Link: https://lore.kernel.org/linux-btrfs/CAJCQCtQau7DtuUUeycCkZ36qjbKuxNzsgqJ7+sJ6W0dK_NLE3w@mail.gmail.com/ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15Btrfs: fix wrong dentries after fsync of file that got its parent replacedFilipe Manana
In a scenario like the following: mkdir /mnt/A # inode 258 mkdir /mnt/B # inode 259 touch /mnt/B/bar # inode 260 sync mv /mnt/B/bar /mnt/A/bar mv -T /mnt/A /mnt/B fsync /mnt/B/bar <power fail> After replaying the log we end up with file bar having 2 hard links, both with the name 'bar' and one in the directory with inode number 258 and the other in the directory with inode number 259. Also, we end up with the directory inode 259 still existing and with the directory inode 258 still named as 'A', instead of 'B'. In this scenario, file 'bar' should only have one hard link, located at directory inode 258, the directory inode 259 should not exist anymore and the name for directory inode 258 should be 'B'. This incorrect behaviour happens because when attempting to log the old parents of an inode, we skip any parents that no longer exist. Fix this by forcing a full commit if an old parent no longer exists. A test case for fstests follows soon. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15Btrfs: fix warning when replaying log after fsync of a tmpfileFilipe Manana
When replaying a log which contains a tmpfile (which necessarily has a link count of 0) we end up calling inc_nlink(), at fs/btrfs/tree-log.c:replay_one_buffer(), which produces a warning like the following: [195191.943673] WARNING: CPU: 0 PID: 6924 at fs/inode.c:342 inc_nlink+0x33/0x40 [195191.943723] CPU: 0 PID: 6924 Comm: mount Not tainted 4.19.0-rc6-btrfs-next-38 #1 [195191.943724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [195191.943726] RIP: 0010:inc_nlink+0x33/0x40 [195191.943728] RSP: 0018:ffffb96e425e3870 EFLAGS: 00010246 [195191.943730] RAX: 0000000000000000 RBX: ffff8c0d1e6af4f0 RCX: 0000000000000006 [195191.943731] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8c0d1e6af4f0 [195191.943731] RBP: 0000000000000097 R08: 0000000000000001 R09: 0000000000000000 [195191.943732] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb96e425e3a60 [195191.943733] R13: ffff8c0d10cff0c8 R14: ffff8c0d0d515348 R15: ffff8c0d78a1b3f8 [195191.943735] FS: 00007f570ee24480(0000) GS:ffff8c0dfb200000(0000) knlGS:0000000000000000 [195191.943736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [195191.943737] CR2: 00005593286277c8 CR3: 00000000bb8f2006 CR4: 00000000003606f0 [195191.943739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [195191.943740] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [195191.943741] Call Trace: [195191.943778] replay_one_buffer+0x797/0x7d0 [btrfs] [195191.943802] walk_up_log_tree+0x1c1/0x250 [btrfs] [195191.943809] ? rcu_read_lock_sched_held+0x3f/0x70 [195191.943825] walk_log_tree+0xae/0x1d0 [btrfs] [195191.943840] btrfs_recover_log_trees+0x1d7/0x4d0 [btrfs] [195191.943856] ? replay_dir_deletes+0x280/0x280 [btrfs] [195191.943870] open_ctree+0x1c3b/0x22a0 [btrfs] [195191.943887] btrfs_mount_root+0x6b4/0x800 [btrfs] [195191.943894] ? rcu_read_lock_sched_held+0x3f/0x70 [195191.943899] ? pcpu_alloc+0x55b/0x7c0 [195191.943906] ? mount_fs+0x3b/0x140 [195191.943908] mount_fs+0x3b/0x140 [195191.943912] ? __init_waitqueue_head+0x36/0x50 [195191.943916] vfs_kern_mount+0x62/0x160 [195191.943927] btrfs_mount+0x134/0x890 [btrfs] [195191.943936] ? rcu_read_lock_sched_held+0x3f/0x70 [195191.943938] ? pcpu_alloc+0x55b/0x7c0 [195191.943943] ? mount_fs+0x3b/0x140 [195191.943952] ? btrfs_remount+0x570/0x570 [btrfs] [195191.943954] mount_fs+0x3b/0x140 [195191.943956] ? __init_waitqueue_head+0x36/0x50 [195191.943960] vfs_kern_mount+0x62/0x160 [195191.943963] do_mount+0x1f9/0xd40 [195191.943967] ? memdup_user+0x4b/0x70 [195191.943971] ksys_mount+0x7e/0xd0 [195191.943974] __x64_sys_mount+0x21/0x30 [195191.943977] do_syscall_64+0x60/0x1b0 [195191.943980] entry_SYSCALL_64_after_hwframe+0x49/0xbe [195191.943983] RIP: 0033:0x7f570e4e524a [195191.943986] RSP: 002b:00007ffd83589478 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [195191.943989] RAX: ffffffffffffffda RBX: 0000563f335b2060 RCX: 00007f570e4e524a [195191.943990] RDX: 0000563f335b2240 RSI: 0000563f335b2280 RDI: 0000563f335b2260 [195191.943992] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020 [195191.943993] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000563f335b2260 [195191.943994] R13: 0000563f335b2240 R14: 0000000000000000 R15: 00000000ffffffff [195191.944002] irq event stamp: 8688 [195191.944010] hardirqs last enabled at (8687): [<ffffffff9cb004c3>] console_unlock+0x503/0x640 [195191.944012] hardirqs last disabled at (8688): [<ffffffff9ca037dd>] trace_hardirqs_off_thunk+0x1a/0x1c [195191.944018] softirqs last enabled at (8638): [<ffffffff9cc0a5d1>] __set_page_dirty_nobuffers+0x101/0x150 [195191.944020] softirqs last disabled at (8634): [<ffffffff9cc26bbe>] wb_wakeup_delayed+0x2e/0x60 [195191.944022] ---[ end trace 5d6e873a9a0b811a ]--- This happens because the inode does not have the flag I_LINKABLE set, which is a runtime only flag, not meant to be persisted, set when the inode is created through open(2) if the flag O_EXCL is not passed to it. Except for the warning, there are no other consequences (like corruptions or metadata inconsistencies). Since it's pointless to replay a tmpfile as it would be deleted in a later phase of the log replay procedure (it has a link count of 0), fix this by not logging tmpfiles and if a tmpfile is found in a log (created by a kernel without this change), skip the replay of the inode. A test case for fstests follows soon. Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay") CC: stable@vger.kernel.org # 4.18+ Reported-by: Martin Steigerwald <martin@lichtvoll.de> Link: https://lore.kernel.org/linux-btrfs/3666619.NTnn27ZJZE@merkaba/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: drop min_size from evict_refill_and_joinJosef Bacik
We don't need it, rsv->size is set once and never changes throughout its lifetime, so just use that for the reserve size. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: assert on non-empty delayed iputsJosef Bacik
I ran into an issue where there was some reference being held on an inode that I couldn't track. This assert wasn't triggered, but it at least rules out we're doing something stupid. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: make sure we create all new block groupsJosef Bacik
Allocating new chunks modifies both the extent and chunk tree, which can trigger new chunk allocations. So instead of doing list_for_each_safe, just do while (!list_empty()) so we make sure we don't exit with other pending bg's still on our list. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: reset max_extent_size on clear in a bitmapJosef Bacik
We need to clear the max_extent_size when we clear bits from a bitmap since it could have been from the range that contains the max_extent_size. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: protect space cache inode alloc with GFP_NOFSJosef Bacik
If we're allocating a new space cache inode it's likely going to be under a transaction handle, so we need to use memalloc_nofs_save() in order to avoid deadlocks, and more importantly lockdep messages that make xfstests fail. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: release metadata before running delayed refsJosef Bacik
We want to release the unused reservation we have since it refills the delayed refs reserve, which will make everything go smoother when running the delayed refs if we're short on our reservation. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15Btrfs: kill btrfs_clear_path_blockingLiu Bo
Btrfs's btree locking has two modes, spinning mode and blocking mode, while searching btree, locking is always acquired in spinning mode and then converted to blocking mode if necessary, and in some hot paths we may switch the locking back to spinning mode by btrfs_clear_path_blocking(). When acquiring locks, both of reader and writer need to wait for blocking readers and writers to complete before doing read_lock()/write_lock(). The problem is that btrfs_clear_path_blocking() needs to switch nodes in the path to blocking mode at first (by btrfs_set_path_blocking) to make lockdep happy before doing its actual clearing blocking job. When switching to blocking mode from spinning mode, it consists of step 1) bumping up blocking readers counter and step 2) read_unlock()/write_unlock(), this has caused serious ping-pong effect if there're a great amount of concurrent readers/writers, as waiters will be woken up and go to sleep immediately. 1) Killing this kind of ping-pong results in a big improvement in my 1600k files creation script, MNT=/mnt/btrfs mkfs.btrfs -f /dev/sdf mount /dev/def $MNT time fsmark -D 10000 -S0 -n 100000 -s 0 -L 1 -l /tmp/fs_log.txt \ -d $MNT/0 -d $MNT/1 \ -d $MNT/2 -d $MNT/3 \ -d $MNT/4 -d $MNT/5 \ -d $MNT/6 -d $MNT/7 \ -d $MNT/8 -d $MNT/9 \ -d $MNT/10 -d $MNT/11 \ -d $MNT/12 -d $MNT/13 \ -d $MNT/14 -d $MNT/15 w/o patch: real 2m27.307s user 0m12.839s sys 13m42.831s w/ patch: real 1m2.273s user 0m15.802s sys 8m16.495s 1.1) latency histogram from funclatency[1] Overall with the patch, there're ~50% less write lock acquisition and the 95% max latency that write lock takes also reduces to ~100ms from >500ms. -------------------------------------------- w/o patch: -------------------------------------------- Function = btrfs_tree_lock msecs : count distribution 0 -> 1 : 2385222 |****************************************| 2 -> 3 : 37147 | | 4 -> 7 : 20452 | | 8 -> 15 : 13131 | | 16 -> 31 : 3877 | | 32 -> 63 : 3900 | | 64 -> 127 : 2612 | | 128 -> 255 : 974 | | 256 -> 511 : 165 | | 512 -> 1023 : 13 | | Function = btrfs_tree_read_lock msecs : count distribution 0 -> 1 : 6743860 |****************************************| 2 -> 3 : 2146 | | 4 -> 7 : 190 | | 8 -> 15 : 38 | | 16 -> 31 : 4 | | -------------------------------------------- w/ patch: -------------------------------------------- Function = btrfs_tree_lock msecs : count distribution 0 -> 1 : 1318454 |****************************************| 2 -> 3 : 6800 | | 4 -> 7 : 3664 | | 8 -> 15 : 2145 | | 16 -> 31 : 809 | | 32 -> 63 : 219 | | 64 -> 127 : 10 | | Function = btrfs_tree_read_lock msecs : count distribution 0 -> 1 : 6854317 |****************************************| 2 -> 3 : 2383 | | 4 -> 7 : 601 | | 8 -> 15 : 92 | | 2) dbench also proves the improvement, dbench -t 120 -D /mnt/btrfs 16 w/o patch: Throughput 158.363 MB/sec w/ patch: Throughput 449.52 MB/sec 3) xfstests didn't show any additional failures. One thing to note is that callers may set path->leave_spinning to have all nodes in the path stay in spinning mode, which means callers are ready to not sleep before releasing the path, but it won't cause problems if they don't want to sleep in blocking mode. [1]: https://github.com/iovisor/bcc/blob/master/tools/funclatency.py Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15btrfs: dev-replace: remove pointless assert in write unlockDavid Sterba
The value of blocking_readers is increased only when the lock is taken for read, no way we can fail the condition with the write lock. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>