linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-12-17	kprobes/x86: Remove unneeded arch_within_kprobe_blacklist from x86	Masami Hiramatsu
	Remove x86 specific arch_within_kprobe_blacklist(). Since we have already added all blacklisted symbols to the kprobe blacklist by arch_populate_kprobe_blacklist(), we don't need arch_within_kprobe_blacklist() on x86 anymore. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503491354.26176.13903264647254766066.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17	kprobes/x86: Show x86-64 specific blacklisted symbols correctly	Masami Hiramatsu
	Show x86-64 specific blacklisted symbols in debugfs. Since x86-64 prohibits probing on symbols which are in entry text, those should be shown. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503488425.26176.17136784384033608516.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17	kprobes: Blacklist symbols in arch-defined prohibited area	Masami Hiramatsu
	Blacklist symbols in arch-defined probe-prohibited areas. With this change, user can see all symbols which are prohibited to probe in debugfs. All archtectures which have custom prohibit areas should define its own arch_populate_kprobe_blacklist() function, but unless that, all symbols marked __kprobes are blacklisted. Reported-by: Andrea Righi <righi.andrea@gmail.com> Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503485491.26176.15823229545155174796.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17	Merge tag 'v4.20-rc7' into perf/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17	posix-timers: Fix division by zero bug	Thomas Gleixner
	The signal delivery path of posix-timers can try to rearm the timer even if the interval is zero. That's handled for the common case (hrtimer) but not for alarm timers. In that case the forwarding function raises a division by zero exception. The handling for hrtimer based posix timers is wrong because it marks the timer as active despite the fact that it is stopped. Move the check from common_hrtimer_rearm() to posixtimer_rearm() to cure both issues. Reported-by: syzbot+9d38bedac9cc77b8ad5e@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: sboyd@kernel.org Cc: stable@vger.kernel.org Cc: syzkaller-bugs@googlegroups.com Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1812171328050.1880@nanos.tec.linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-17	selftests: Fix test errors related to lib.mk khdr target	Shuah Khan
	Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk") added khdr target to run headers_install target from the main Makefile. The logic uses KSFT_KHDR_INSTALL and top_srcdir as controls to initialize variables and include files to run headers_install from the top level Makefile. There are a few problems with this logic. 1. Exposes top_srcdir to all tests 2. Common logic impacts all tests 3. Uses KSFT_KHDR_INSTALL, top_srcdir, and khdr in an adhoc way. Tests add "khdr" dependency in their Makefiles to TEST_PROGS_EXTENDED in some cases, and STATIC_LIBS in other cases. This makes this framework confusing to use. The common logic that runs for all tests even when KSFT_KHDR_INSTALL isn't defined by the test. top_srcdir is initialized to a default value when test doesn't initialize it. It works for all tests without a sub-dir structure and tests with sub-dir structure fail to build. e.g: make -C sparc64/drivers/ or make -C drivers/dma-buf ../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory make: *** No rule to make target '../../../../scripts/subarch.include'. Stop. There is no reason to require all tests to define top_srcdir and there is no need to require tests to add khdr dependency using adhoc changes to TEST_* and other variables. Fix it with a consistent use of KSFT_KHDR_INSTALL and top_srcdir from tests that have the dependency on headers_install. Change common logic to include khdr target define and "all" target with dependency on khdr when KSFT_KHDR_INSTALL is defined. Only tests that have dependency on headers_install have to define just the KSFT_KHDR_INSTALL, and top_srcdir variables and there is no need to specify khdr dependency in the test Makefiles. Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk") Cc: stable@vger.kernel.org Signed-off-by: Shuah Khan <shuah@kernel.org> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Shuah Khan <shuah@kernel.org>
2018-12-17	media: docs: fix some GPL licensing ambiguity at the text	Mauro Carvalho Chehab
	Those files are meant to be dual GPL 2.0 and GFDL without implicit sections. However, by a wrong cut-and-paste, I ended by applying a GPL 2+ license text to it, while still using the GPL 2.0 SPDX tag, with would cause an ambiguity about the licensing model. Solve this by explicitly mentioning that the dual licensing is between GPL 2.0 and GFDL and correcting the text. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2018-12-17	Merge tag 'v4.20-rc7' into patchwork	Mauro Carvalho Chehab
	Linux 4.20-rc7 * tag 'v4.20-rc7': (403 commits) Linux 4.20-rc7 scripts/spdxcheck.py: always open files in binary mode checkstack.pl: fix for aarch64 userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered fs/iomap.c: get/put the page in iomap_page_create/release() hugetlbfs: call VM_BUG_ON_PAGE earlier in free_huge_page() memblock: annotate memblock_is_reserved() with __init_memblock psi: fix reference to kernel commandline enable arch/sh/include/asm/io.h: provide prototypes for PCI I/O mapping in asm/io.h mm/sparse: add common helper to mark all memblocks present mm: introduce common STRUCT_PAGE_MAX_SHIFT define alpha: fix hang caused by the bootmem removal XArray: Fix xa_alloc when id exceeds max drm/vmwgfx: Protect from excessive execbuf kernel memory allocations v3 MAINTAINERS: Daniel for drm co-maintainer drm/amdgpu: drop fclk/gfxclk ratio setting IB/core: Fix oops in netdev_next_upper_dev_rcu() dm thin: bump target version drm/vmwgfx: remove redundant return ret statement drm/i915: Flush GPU relocs harder for gen3 ...
2018-12-17	xen/pciback: Check dev_data before using it	Ross Lagerwall
	If pcistub_init_device fails, the release function will be called with dev_data set to NULL. Check it before using it to avoid a NULL pointer dereference. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-17	kprobes/x86/xen: blacklist non-attachable xen interrupt functions	Andrea Righi
	Blacklist symbols in Xen probe-prohibited areas, so that user can see these prohibited symbols in debugfs. See also: a50480cb6d61. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-17	Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again"	Paul Burton
	Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again") makes a change to FIFO clearing code which its commit message suggests was intended to be specific to use with RS485 mode, however: 1) The change made does not just affect __do_stop_tx_rs485(), it also affects other uses of serial8250_clear_fifos() including paths for starting up, shutting down or auto-configuring a port regardless of whether it's an RS485 port or not. 2) It makes the assumption that resetting the FIFOs is a no-op when FIFOs are disabled, and as such it checks for this case & explicitly avoids setting the FIFO reset bits when the FIFO enable bit is clear. A reading of the PC16550D manual would suggest that this is OK since the FIFO should automatically be reset if it is later enabled, but we support many 16550-compatible devices and have never required this auto-reset behaviour for at least the whole git era. Starting to rely on it now seems risky, offers no benefit, and indeed breaks at least the Ingenic JZ4780's UARTs which reads garbage when the RX FIFO is enabled if we don't explicitly reset it. 3) By only resetting the FIFOs if they're enabled, the behaviour of serial8250_do_startup() during boot now depends on what the value of FCR is before the 8250 driver is probed. This in itself seems questionable and leaves us with FCR=0 & no FIFO reset if the UART was used by 8250_early, otherwise it depends upon what the bootloader left behind. 4) Although the naming of serial8250_clear_fifos() may be unclear, it is clear that callers of it expect that it will disable FIFOs. Both serial8250_do_startup() & serial8250_do_shutdown() contain comments to that effect, and other callers explicitly re-enable the FIFOs after calling serial8250_clear_fifos(). The premise of that patch that disabling the FIFOs is incorrect therefore seems wrong. For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again"). Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again"). Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Daniel Jedrychowski <avistel@gmail.com> Cc: Marek Vasut <marex@denx.de> Cc: linux-mips@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: stable <stable@vger.kernel.org> # 4.10+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	tty: serial: samsung: Increase maximum baudrate	Seung-Woo Kim
	This driver can be used to communicate with Bluetooth chip in high-speed UART mode, so increase the maximum baudrate to 3Mbps. Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com> [mszyprow: rephrased commit message] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	tty: serial: samsung: Properly set flags in autoCTS mode	Beomho Seo
	Commit 391f93f2ec9f ("serial: core: Rework hw-assited flow control support") has changed the way the autoCTS mode is handled. According to that change, serial drivers which enable H/W autoCTS mode must set UPSTAT_AUTOCTS to prevent the serial core from inadvertently disabling TX. This patch adds proper handling of UPSTAT_AUTOCTS flag. Signed-off-by: Beomho Seo <beomho.seo@samsung.com> [mszyprow: rephrased commit message] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	tty: Use of_node_name_{eq,prefix} for node name comparisons	Rob Herring
	Convert string compares of DT node names to use of_node_name_eq helper instead. This removes direct access to the node name pointer. For hvc, the code can also be simplified by using of_stdout pointer instead of searching again for the stdout node. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-serial@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	tty/serial: do not free trasnmit buffer page under port lock	Sergey Senozhatsky
	LKP has hit yet another circular locking dependency between uart console drivers and debugobjects [1]: CPU0 CPU1 rhltable_init() __init_work() debug_object_init uart_shutdown() /* db->lock / / uart_port->lock / debug_print_object() free_page() printk() call_console_drivers() debug_check_no_obj_freed() / uart_port->lock / / db->lock */ debug_print_object() So there are two dependency chains: uart_port->lock -> db->lock And db->lock -> uart_port->lock This particular circular locking dependency can be addressed in several ways: a) One way would be to move debug_print_object() out of db->lock scope and, thus, break the db->lock -> uart_port->lock chain. b) Another one would be to free() transmit buffer page out of db->lock in UART code; which is what this patch does. It makes sense to apply a) and b) independently: there are too many things going on behind free(), none of which depend on uart_port->lock. The patch fixes transmit buffer page free() in uart_shutdown() and, additionally, in uart_port_startup() (as was suggested by Dmitry Safonov). [1] https://lore.kernel.org/lkml/20181211091154.GL23332@shao2-debian/T/#u Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Waiman Long <longman@redhat.com> Cc: Dmitry Safonov <dima@arista.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	staging: most: Documentation: add information to driver_usage file	Christian Gromm
	This patch updates driver_usage.txt file to reflect the latest changes that this patch set introduces. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	staging: most: sound: remove channel number from ALSA card's long name	Christian Gromm
	Adding the channel number to the name of the sound card is wrong, as the card does not represent a single streaming channel of the MOST device. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	staging: most: sound: use static name for ALSA card	Christian Gromm
	This patch uses a static name for the sound card's short name and long name. Having the card names configurable doesn't make sense anymore, as the card represents the same physical hardware. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	staging: most: sound: rename variable	Christian Gromm
	Since the channels of a MOST device are now being represented as individual PCM devices of one sound card, the variable card_name is not suitable anymore to describe them. Therefore, this patch renames the variable to device_name. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	staging: most: sound: correct label name	Christian Gromm
	This patch fixes the lable name that is used to jump to error handling section of function audio_probe_channel() in case something went wrong. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	staging: most: sound: create one sound card w/ multiple PCM devices per MOST ↵	Christian Gromm
	device This patch avoids that a sound card is created and registered with ALSA every time a channel is being linked. Instead the channels are hooked on the same card, which is registered not until the final link has been added to the component. The string provided by user space that used to be the card name becomes the PCM device name. The user space API to add a link is being expanded by a "create" flag to trigger the registration. Signed-off-by: Christian Gromm <christian.gromm@microchip.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	serial: 8250: Rate limit serial port rx interrupts during input overruns	Darwin Dingel
	When a serial port gets faulty or gets flooded with inputs, its interrupt handler starts to work double time to get the characters to the workqueue for the tty layer to handle them. When this busy time on the serial/tty subsystem happens during boot, where it is also busy on the userspace trying to initialise, some processes can continuously get preempted and will be on hold until the interrupts subside. The fix is to backoff on processing received characters for a specified amount of time when an input overrun is seen (received a new character before the previous one is processed). This only stops receive and will continue to transmit characters to serial port. After the backoff period is done, it receive will be re-enabled. This is optional and will only be enabled by setting 'overrun-throttle-ms' in the dts. Signed-off-by: Darwin Dingel <darwin.dingel@alliedtelesis.co.nz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	dt-bindings: serial: 8250: Add rate limit for serial port input overruns	Darwin Dingel
	When a serial port continuously experiences input overrun from (1) continuous receive characters from remote and or (2) hardware issues, its interrupt handler can preempt other tasks especially when the system is busy (ie. boot up period). This can cause other tasks to get starved of processing time from the CPU. When this dts binding is enabled and input overrun on the serial port is detected, serial port receive will be throttled to give some breathing room for processing other tasks. Value provided will be in milliseconds. &serial0{ overrun-throttle-ms = <500>; }; Signed-off-by: Darwin Dingel <darwin.dingel@alliedtelesis.co.nz> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	USB: xhci: fix 'broken_suspend' placement in struct xchi_hcd	Nicolas Saenz Julienne
	As commented in the struct's definition there shouldn't be anything underneath its 'priv[0]' member as it would break some macros. The patch converts the broken_suspend into a bit-field and relocates it next to to the rest of bit-fields. Fixes: a7d57abcc8a5 ("xhci: workaround CSS timeout on AMD SNPS 3.0 xHC") Reported-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	gpio: pca953x: Add regmap dependency for PCA953x driver	Marek Vasut
	Select REGMAP_I2C in Kconfig, since the driver now depends on regmap and this was missing, thus breaking build on various systems. Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-12-17	gpio: raspberrypi-exp: decrease refcount on firmware dt node	Nicolas Saenz Julienne
	We're getting a reference RPi's firmware node in order to be able to communicate with it's driver. We should decrease the reference count on the dt node after being done with it. Fixes: a98d90e7d588 ("gpio: raspberrypi-exp: Driver for RPi3 GPIO expander via mailbox service") Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-12-17	tty: serial: qcom_geni_serial: Remove interrupt storm	Ryan Case
	Disable M_TX_FIFO_WATERMARK_EN after we've sent all data for a given transaction so we don't continue to receive a flurry of free space interrupts while waiting for the M_CMD_DONE notification. Re-enable the watermark when establishing the next transaction. Also clear the watermark interrupt after filling the FIFO so we do not receive notification again prior to actually having free space. Signed-off-by: Ryan Case <ryandcase@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	HID: intel-ish-hid: fixes incorrect error handling	Pan Bian
	The memory chunk allocated by hid_allocate_device() should be released by hid_destroy_device(), not kfree(). Fixes: 0b28cb4bcb1("HID: intel-ish-hid: ISH HID client driver") Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-12-17	serial: sh-sci: Resume PIO in sci_rx_interrupt() on DMA failure	Geert Uytterhoeven
	On (H)SCIF, sci_submit_rx() is called in the receive interrupt handler. Hence if DMA submission fails, the interrupt handler should resume handling reception using PIO, else no more data is received. Make sci_submit_rx() return an error indicator, so the receive interrupt handler can act appropriately. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback	Geert Uytterhoeven
	When falling back to PIO, active_rx must be set to a different value than cookie_rx[i], else sci_dma_rx_find_active() will incorrectly find a match, leading to a NULL pointer dereference in rx_timer_fn() later. Use zero instead, which is the same value as after driver initialization. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	serial: sh-sci: Fix locking in sci_submit_rx()	Geert Uytterhoeven
	Some callers of sci_submit_rx() hold the port spinlock, others don't. During fallback to PIO, the driver needs to obtain the port spinlock. If the lock was already held, spinlock recursion is detected, causing a deadlock: BUG: spinlock recursion on CPU#0. Fix this by adding a flag parameter to sci_submit_rx() for the caller to indicate the port spinlock is already held, so spinlock recursion can be avoided. Move the spin_lock_irqsave() up, so all DMA disable steps are protected, which is safe as the recently introduced dmaengine_terminate_async() can be called in atomic context. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17	btrfs: Fix typos in comments and strings	Andrea Gelmini
	The typos accumulate over time so once in a while time they get fixed in a large patch. Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: improve error handling of btrfs_add_link	Johannes Thumshirn
	In the error handling block, err holds the return value of either btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked since it's introduction with commit fe66a05a0679 (Btrfs: improve error handling for btrfs_insert_dir_item callers) in 2012. If the error handling in the error handling fails, there's not much left to do and the abort either happened earlier in the callees or is necessary here. So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort the transaction, but still return the original code of the failure stored in 'ret' as this will be reported to the user. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	Btrfs: use generic_remap_file_range_prep() for cloning and deduplication	Filipe Manana
	Since cloning and deduplication are no longer Btrfs specific operations, we now have generic code to handle parameter validation, compare file ranges used for deduplication, clear capabilities when cloning, etc. This change makes Btrfs use it, eliminating a lot of code in Btrfs and also fixing a few bugs, such as: 1) When cloning, the destination file's capabilities were not dropped (the fstest generic/513 tests this); 2) We were not checking if the destination file is immutable; 3) Not checking if either the source or destination files are swap files (swap file support is coming soon for Btrfs); 4) System limits were not checked (resource limits and O_LARGEFILE). Note that the generic helper generic_remap_file_range_prep() does start and waits for writeback by calling filemap_write_and_wait_range(), however that is not enough for Btrfs for two reasons: 1) With compression, we need to start writeback twice in order to get the pages marked for writeback and ordered extents created; 2) filemap_write_and_wait_range() (and all its other variants) only waits for the IO to complete, but we need to wait for the ordered extents to finish, so that when we do the actual reflinking operations the file extent items are in the fs tree. This is also important due to the fact that the generic helper, for the deduplication case, compares the contents of the pages in the requested range, which might require reading extents from disk in the very unlikely case that pages get invalidated after writeback finishes (so the file extent items must be up to date in the fs tree). Since these reasons are specific to Btrfs we have to do it in the Btrfs code before calling generic_remap_file_range_prep(). This also results in a simpler way of dealing with existing delalloc in the source/target ranges, specially for the deduplication case where we used to lock all the pages first and then if we found any dealloc for the range, or ordered extent, we would unlock the pages trigger writeback and wait for ordered extents to complete, then lock all the pages again and check if deduplication can be done. So now we get a simpler approach: lock the inodes, then trigger writeback and then wait for ordered extents to complete. So make btrfs use generic_remap_file_range_prep() (XFS and OCFS2 use it) to eliminate duplicated code, fix a few bugs and benefit from future bug fixes done there - for example the recent clone and dedupe bugs involving reflinking a partial EOF block got a counterpart fix in the generic helper, since it affected all filesystems supporting these operations, so we no longer need special checks in Btrfs for them. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: Refactor main loop in extent_readpages	Nikolay Borisov
	extent_readpages processes all pages in the readlist in batches of 16, this is implemented by a single for loop but thanks to an if condition the loop does 2 things based on whether we've filled the batch or not. Additionally due to the structure of the code there is an additional check which deals with partial batches. Streamline all of this by explicitly using two loops. The outter one is used to process all pages while the inner one just fills in the batch of 16 (currently). Due to this new structure the code guarantees that all pages are processed in the loop hence the code to deal with any leftovers is eliminated. This also enable the compiler to inline __extent_readpages: ./scripts/bloat-o-meter fs/btrfs/extent_io.o extent_io.for add/remove: 0/1 grow/shrink: 1/0 up/down: 660/-820 (-160) Function old new delta extent_readpages 476 1136 +660 __extent_readpages 820 - -820 Total: Before=44315, After=44155, chg -0.36% Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: Remove 1st shrink/grow phase from balance	Nikolay Borisov
	The first step of the rebalance process ensures there is 1MiB free on each device. This number seems rather small. And in fact when talking to the original authors their opinions were: "man that's a little bonkers" "i don't think we even need that code anymore" "I think it was there to make sure we had room for the blank 1M at the beginning. I bet it goes all the way back to v0" "we just don't need any of that tho, i say we just delete it" Clearly, this piece of code has lost its original intent throughout the years. It doesn't really bring any real practical benefits to the relocation process. Additionally, this patch makes the balance process more lightweight by removing a pair of shrink/grow operations which are rather expensive for heavily populated filesystems. This is mainly due to shrink requiring relocating block groups, involving heavy use of the btree. The intermediate shrink/grow can fail and leave the filesystem in a middle state that would need to be changed back by the user. Suggested-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	Btrfs: send, fix race with transaction commits that create snapshots	Filipe Manana
	If we create a snapshot of a snapshot currently being used by a send operation, we can end up with send failing unexpectedly (returning -ENOENT error to user space for example). The following diagram shows how this happens. CPU 1 CPU2 CPU3 btrfs_ioctl_send() (...) create_snapshot() -> creates snapshot of a root used by the send task btrfs_commit_transaction() create_pending_snapshot() __get_inode_info() btrfs_search_slot() btrfs_search_slot_get_root() down_read commit_root_sem get reference on eb of the commit root -> eb with bytenr == X up_read commit_root_sem btrfs_cow_block(root node) btrfs_free_tree_block() -> creates delayed ref to free the extent btrfs_run_delayed_refs() -> runs the delayed ref, adds extent to fs_info->pinned_extents btrfs_finish_extent_commit() unpin_extent_range() -> marks extent as free in the free space cache transaction commit finishes btrfs_start_transaction() (...) btrfs_cow_block() btrfs_alloc_tree_block() btrfs_reserve_extent() -> allocates extent at bytenr == X btrfs_init_new_buffer(bytenr X) btrfs_find_create_tree_block() alloc_extent_buffer(bytenr X) find_extent_buffer(bytenr X) -> returns existing eb, which the send task got (...) -> modifies content of the eb with bytenr == X -> uses an eb that now belongs to some other tree and no more matches the commit root of the snapshot, resuts will be unpredictable The consequences of this race can be various, and can lead to searches in the commit root performed by the send task failing unexpectedly (unable to find inode items, returning -ENOENT to user space, for example) or not failing because an inode item with the same number was added to the tree that reused the metadata extent, in which case send can behave incorrectly in the worst case or just fail later for some reason. Fix this by performing a copy of the commit root's extent buffer when doing a search in the context of a send operation. CC: stable@vger.kernel.org # 4.4.x: 1fc28d8e2e9: Btrfs: move get root out of btrfs_search_slot to a helper CC: stable@vger.kernel.org # 4.4.x: f9ddfd0592a: Btrfs: remove unused check of skip_locking CC: stable@vger.kernel.org # 4.4.x Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	Btrfs: use nofs context when initializing security xattrs to avoid deadlock	Filipe Manana
	When initializing the security xattrs, we are holding a transaction handle therefore we need to use a GFP_NOFS context in order to avoid a deadlock with reclaim in case it's triggered. Fixes: 39a27ec1004e8 ("btrfs: use GFP_KERNEL for xattr and acl allocations") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: run delayed items before dropping the snapshot	Josef Bacik
	With my delayed refs patches in place we started seeing a large amount of aborts in __btrfs_free_extent: BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0 Call Trace: ? btrfs_merge_delayed_refs+0xaf/0x340 __btrfs_run_delayed_refs+0x6ea/0xfc0 ? btrfs_set_path_blocking+0x31/0x60 btrfs_run_delayed_refs+0xeb/0x180 btrfs_commit_transaction+0x179/0x7f0 ? btrfs_check_space_for_delayed_refs+0x30/0x50 ? should_end_transaction.isra.19+0xe/0x40 btrfs_drop_snapshot+0x41c/0x7c0 btrfs_clean_one_deleted_snapshot+0xb5/0xd0 cleaner_kthread+0xf6/0x120 kthread+0xf8/0x130 ? btree_invalidatepage+0x90/0x90 ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 This was because btrfs_drop_snapshot depends on the root not being modified while it's dropping the snapshot. It will unlock the root node (and really every node) as it walks down the tree, only to re-lock it when it needs to do something. This is a problem because if we modify the tree we could cow a block in our path, which frees our reference to that block. Then once we get back to that shared block we'll free our reference to it again, and get ENOENT when trying to lookup our extent reference to that block in __btrfs_free_extent. This is ultimately happening because we have delayed items left to be processed for our deleted snapshot _after_ all of the inodes are closed for the snapshot. We only run the delayed inode item if we're deleting the inode, and even then we do not run the delayed insertions or delayed removals. These can be run at any point after our final inode does its last iput, which is what triggers the snapshot deletion. We can end up with the snapshot deletion happening and then have the delayed items run on that file system, resulting in the above problem. This problem has existed forever, however my patches made it much easier to hit as I wake up the cleaner much more often to deal with delayed iputs, which made us more likely to start the snapshot dropping work before the transaction commits, which is when the delayed items would generally be run. Before, generally speaking, we would run the delayed items, commit the transaction, and wakeup the cleaner thread to start deleting snapshots, which means we were less likely to hit this problem. You could still hit it if you had multiple snapshots to be deleted and ended up with lots of delayed items, but it was definitely harder. Fix for now by simply running all the delayed items before starting to drop the snapshot. We could make this smarter in the future by making the delayed items per-root, and then simply drop any delayed items for roots that we are going to delete. But for now just a quick and easy solution is the safest. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: catch cow on deleting snapshots	Josef Bacik
	When debugging some weird extent reference bug I suspected that we were changing a snapshot while we were deleting it, which could explain my bug. This was indeed what was happening, and this patch helped me verify my theory. It is never correct to modify the snapshot once it's being deleted, so mark the root when we are deleting it and make sure we complain about it when it happens. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: extent-tree: cleanup one-shot usage of @blocksize in do_walk_down	Qu Wenruo
	@blocksize variable in do_walk_down() is only used once, really no need to declare it. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	Btrfs: scrub, move setup of nofs contexts higher in the stack	Filipe Manana
	Since scrub workers only do memory allocation with GFP_KERNEL when they need to perform repair, we can move the recent setup of the nofs context up to scrub_handle_errored_block() instead of setting it up down the call chain at insert_full_stripe_lock() and scrub_add_page_to_wr_bio(), removing some duplicate code and comment. So the only paths for which a scrub worker can do memory allocations using GFP_KERNEL are the following: scrub_bio_end_io_worker() scrub_block_complete() scrub_handle_errored_block() lock_full_stripe() insert_full_stripe_lock() -> kmalloc with GFP_KERNEL scrub_bio_end_io_worker() scrub_block_complete() scrub_handle_errored_block() scrub_write_page_to_dev_replace() scrub_add_page_to_wr_bio() -> kzalloc with GFP_KERNEL Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex	David Sterba
	The scrub context is allocated with GFP_KERNEL and called from btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe regarding reclaim that could try to flush filesystem data in order to get the memory. And the device_list_mutex is held during superblock commit, so this would cause a lockup. Move the alocation and initialization before any changes that require the mutex. Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: scrub: pass fs_info to scrub_setup_ctx	David Sterba
	We can pass fs_info directly as this is the only member of btrfs_device that's bing used inside scrub_setup_ctx. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: fix truncate throttling	Josef Bacik
	We have a bunch of magic to make sure we're throttling delayed refs when truncating a file. Now that we have a delayed refs rsv and a mechanism for refilling that reserve simply use that instead of all of this magic. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: don't run delayed refs in the end transaction logic	Josef Bacik
	Over the years we have built up a lot of infrastructure to keep delayed refs in check, mostly by running them at btrfs_end_transaction() time. We have a lot of different maths we do to figure out how much, if we should do it inline or async, etc. This existed because we had no feedback mechanism to force the flushing of delayed refs when they became a problem. However with the enospc flushing infrastructure in place for flushing delayed refs when they put too much pressure on the enospc system we have this problem solved. Rip out all of this code as it is no longer needed. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: rework btrfs_check_space_for_delayed_refs	Josef Bacik
	Now with the delayed_refs_rsv we can now know exactly how much pending delayed refs space we need. This means we can drastically simplify btrfs_check_space_for_delayed_refs by simply checking how much space we have reserved for the global rsv (which acts as a spill over buffer) and the delayed refs rsv. If our total size is beyond that amount then we know it's time to commit the transaction and stop any more delayed refs from being generated. With the introduction of dealyed_refs_rsv infrastructure, namely btrfs_update_delayed_refs_rsv we now know exactly how much pending delayed refs space is required. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: add new flushing states for the delayed refs rsv	Josef Bacik
	A nice thing we gain with the delayed refs rsv is the ability to flush the delayed refs on demand to deal with enospc pressure. Add states to flush delayed refs on demand, and this will allow us to remove a lot of ad-hoc work around checking to see if we should commit the transaction to run our delayed refs. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: update may_commit_transaction to use the delayed refs rsv	Josef Bacik
	Any space used in the delayed_refs_rsv will be freed up by a transaction commit, so instead of just counting the pinned space we also need to account for any space in the delayed_refs_rsv when deciding if it will make a different to commit the transaction to satisfy our space reservation. If we have enough bytes to satisfy our reservation ticket then we are good to go, otherwise subtract out what space we would gain back by committing the transaction and compare that against the pinned space to make our decision. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17	btrfs: introduce delayed_refs_rsv	Josef Bacik
	Traditionally we've had voodoo in btrfs to account for the space that delayed refs may take up by having a global_block_rsv. This works most of the time, except when it doesn't. We've had issues reported and seen in production where sometimes the global reserve is exhausted during transaction commit before we can run all of our delayed refs, resulting in an aborted transaction. Because of this voodoo we have equally dubious flushing semantics around throttling delayed refs which we often get wrong. So instead give them their own block_rsv. This way we can always know exactly how much outstanding space we need for delayed refs. This allows us to make sure we are constantly filling that reservation up with space, and allows us to put more precise pressure on the enospc system. Instead of doing math to see if its a good time to throttle, the normal enospc code will be invoked if we have a lot of delayed refs pending, and they will be run via the normal flushing mechanism. For now the delayed_refs_rsv will hold the reservations for the delayed refs, the block group updates, and deleting csums. We could have a separate rsv for the block group updates, but the csum deletion stuff is still handled via the delayed_refs so that will stay there. Historical background: The global reserve has grown to cover everything we don't reserve space explicitly for, and we've grown a lot of weird ad-hoc heuristics to know if we're running short on space and when it's time to force a commit. A failure rate of 20-40 file systems when we run hundreds of thousands of them isn't super high, but cleaning up this code will make things less ugly and more predictible. Thus the delayed refs rsv. We always know how many delayed refs we have outstanding, and although running them generates more we can use the global reserve for that spill over, which fits better into it's desired use than a full blown reservation. This first approach is to simply take how many times we're reserving space for and multiply that by 2 in order to save enough space for the delayed refs that could be generated. This is a niave approach and will probably evolve, but for now it works. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> # high-level review [ added background notes from the cover letter ] Signed-off-by: David Sterba <dsterba@suse.com>