linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2017-04-26	ibmvnic: Split initialization of scrqs to its own routine	Nathan Fontenot
	Split the sending of capability request crqs and the initialization of sub crqs into their own routines. This is a first step to moving the allocation of sub-crqs out of interrupt context. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	net: core: Prevent from dereferencing null pointer when releasing SKB	Myungho Jung
	Added NULL check to make __dev_kfree_skb_irq consistent with kfree family of functions. Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289 Signed-off-by: Myungho Jung <mhjungk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	dt-bindings: mdio: Clarify binding document	Florian Fainelli
	The described GPIO reset property is applicable to all child PHYs. If we have one reset line per PHY present on the MDIO bus, these automatically become properties of the child PHY nodes. Finally, indicate how the RESET pulse width must be defined, which is the maximum value of all individual PHYs RESET pulse widths determined by reading their datasheets. Fixes: 69226896ad63 ("mdio_bus: Issue GPIO RESET to PHYs.") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Roger Quadros <rogerq@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	Merge branch 'tcp-do-not-use-tcp_time_stamp-for-rcv-autotuning'	David S. Miller
	Eric Dumazet says: ==================== tcp: do not use tcp_time_stamp for rcv autotuning Some devices or linux distributions use HZ=100 or HZ=250 TCP receive buffer autotuning has poor behavior caused by this choice. Since autotuning happens after 4 ms or 10 ms, short distance flows get their receive buffer tuned to a very high value, but after an initial period where it was frozen to (too small) initial value. With BBR (or other CC allowing to increase BDP), we are willing to increase tcp_rmem[2], but this receive autotuning defect is a blocker for hosts dealing with gazillions of TCP flows in the data centers, since many of them have inflated RCVBUF. Risk of OOM is too high. Note that TSO autodefer, tcp cubic, and TCP TS options (RFC 7323) also suffer from our dependency to jiffies (via tcp_time_stamp). We have ongoing efforts to improve all that in the future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps	Eric Dumazet
	Some devices or distributions use HZ=100 or HZ=250 TCP receive buffer autotuning has poor behavior caused by this choice. Since autotuning happens after 4 ms or 10 ms, short distance flows get their receive buffer tuned to a very high value, but after an initial period where it was frozen to (too small) initial value. With tp->tcp_mstamp introduction, we can switch to high resolution timestamps almost for free (at the expense of 8 additional bytes per TCP structure) Note that some TCP stacks use usec TCP timestamps where this patch makes even more sense : Many TCP flows have < 500 usec RTT. Hopefully this finer TS option can be standardized soon. Tested: HZ=100 kernel ./netperf -H lpaa24 -t TCP_RR -l 1000 -- -r 10000,10000 & Peer without patch : lpaa24:~# ss -tmi dst lpaa23 ... skmem:(r0,rb8388608,...) rcv_rtt:10 rcv_space:3210000 minrtt:0.017 Peer with the patch : lpaa23:~# ss -tmi dst lpaa24 ... skmem:(r0,rb428800,...) rcv_rtt:0.069 rcv_space:30000 minrtt:0.017 We can see saner RCVBUF, and more precise rcv_rtt information. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: remove ack_time from struct tcp_sacktag_state	Eric Dumazet
	It is no longer needed, everything uses tp->tcp_mstamp instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: use tp->tcp_mstamp in tcp_clean_rtx_queue()	Eric Dumazet
	Following patch will remove ack_time from struct tcp_sacktag_state Same info is now found in tp->tcp_mstamp Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: do not pass timestamp to tcp_rack_advance()	Eric Dumazet
	No longer needed, since tp->tcp_mstamp holds the information. This is needed to remove sack_state.ack_time in a following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: do not pass timestamp to tcp_rate_gen()	Eric Dumazet
	No longer needed, since tp->tcp_mstamp holds the information. This is needed to remove sack_state.ack_time in a following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: do not pass timestamp to tcp_fastretrans_alert()	Eric Dumazet
	Not used anymore now tp->tcp_mstamp holds the information. This is needed to remove sack_state.ack_time in a following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: do not pass timestamp to tcp_rack_identify_loss()	Eric Dumazet
	Not used anymore now tp->tcp_mstamp holds the information. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: do not pass timestamp to tcp_rack_mark_lost()	Eric Dumazet
	This is no longer used, since tcp_rack_detect_loss() takes the timestamp from tp->tcp_mstamp Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: do not pass timestamp to tcp_rack_detect_loss()	Eric Dumazet
	We can use tp->tcp_mstamp as it contains a recent timestamp. This removes a call to skb_mstamp_get() from tcp_rack_reo_timeout() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	tcp: add tp->tcp_mstamp field	Eric Dumazet
	We want to use precise timestamps in TCP stack, but we do not want to call possibly expensive kernel time services too often. tp->tcp_mstamp is guaranteed to be updated once per incoming packet. We will use it in the following patches, removing specific skb_mstamp_get() calls, and removing ack_time from struct tcp_sacktag_state. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	macsec: dynamically allocate space for sglist	Jason A. Donenfeld
	We call skb_cow_data, which is good anyway to ensure we can actually modify the skb as such (another error from prior). Now that we have the number of fragments required, we can safely allocate exactly that amount of memory. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	perf units: Move parse_tag_value() to units.[ch]	Arnaldo Carvalho de Melo
	Its basically to do units handling, so move to a more appropriately named object. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-90ob9vfepui24l8l2makhd9u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-26	rhashtable: remove insecure_max_entries param	Florian Westphal
	no users in the tree, insecure_max_entries is always set to ht->p.max_size * 2 in rhtashtable_init(). Replace only spot that uses it with a ht->p.max_size check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	Revert "phy: micrel: Disable auto negotiation on startup"	David S. Miller
	This reverts commit 99f81afc139c6edd14d77a91ee91685a414a1c66. It was papering over the real problem, which is fixed by commit f555f34fdc58 ("net: phy: fix auto-negotiation stall due to unavailable interrupt") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	orangefs: handle zero size write in debugfs	Dan Carpenter
	If we write zero bytes to this debugfs file, then it will cause an underflow when we do copy_from_user(buf, ubuf, count - 1). Debugfs can normally only be written to by root so the impact of this is low. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: do not wait for timeout if umounting	Martin Brandenburg
	When the computer is turned off, all the processes are killed and then all the filesystems are umounted. OrangeFS should not wait for the userspace daemon to come back in that case. This only works for plain umount(2). To actually take advantage of this interactively, `umount -f' is needed; otherwise umount will issue a statfs first, which will wait for the userspace daemon to come back. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: return from orangefs_devreq_read quickly if possible	Martin Brandenburg
	It is not necessary to take the lock and search through the request list if the list is empty. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: ensure the userspace component is unmounted if mount fails	Martin Brandenburg
	If the mount is aborted after userspace has been asked to mount, userspace must be told to unmount. Ordinarily orangefs_kill_sb does the unmount. However it cannot be called if the superblock has not been set up. This is a very narrow window. The NULL fs_id is not unmounted. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: do not check possibly stale size on truncate	Martin Brandenburg
	Let the server figure this out because our size might be out of date or not present. The bug was that xfs_io -f -t -c "pread -v 0 100" /mnt/foo echo "Test" > /mnt/foo xfs_io -f -t -c "pread -v 0 100" /mnt/foo fails because the second truncate did not happen if nothing had requested the size after the write in echo. Thus i_size was zero (not present) and the orangefs_setattr though i_size was zero and there was nothing to do. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: implement statx	Martin Brandenburg
	Fortunately OrangeFS has had a getattr request mask for a long time. The server basically has two difficulty levels for attributes. Fetching any attribute except size requires communicating with the metadata server for that handle. Since all the attributes are right there, it makes sense to return them all. Fetching the size requires communicating with every I/O server (that the file is distributed across). Therefore if asked for anything except size, get everything except size, and if asked for size, get everything. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: remove ORANGEFS_READDIR macros	Martin Brandenburg
	They are clones of the ORANGEFS_ITERATE macros in use elsewhere. Delete ORANGEFS_ITERATE_NEXT which is a hack previously used by readdir. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: support very large directories	Martin Brandenburg
	This works by maintaining a linked list of pages which the directory has been read into rather than one giant fixed-size buffer. This replaces code which limits the total directory size to the total amount that could be returned in one server request. Since filenames are usually considerably shorter than the maximum, the old code could usually handle several server requests before running out of space. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: support llseek on directories	Martin Brandenburg
	This and the previous commit fix xfstests generic/257. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: rewrite readdir to fix several bugs	Martin Brandenburg
	In the past, readdir assumed that the user buffer will be large enough that all entries from the server will fit. If this was not true, entries would be skipped. Since it works now, request 512 entries rather than 96 per server operation. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: do not set getattr_time on orangefs_lookup	Martin Brandenburg
	Since orangefs_lookup calls orangefs_iget which calls orangefs_inode_getattr, getattr_time will get set. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: clean up oversize xattr validation	Martin Brandenburg
	Also don't check flags as this has been validated by the VFS already. Fix an off-by-one error in the max size checking. Stop logging just because userspace wants to write attributes which do not fit. This and the previous commit fix xfstests generic/020. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: fix bounds check for listxattr	Martin Brandenburg
	Signed-off-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	orangefs: remove unused get_fsid_from_ino	Martin Brandenburg
	Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26	net: phy: fix auto-negotiation stall due to unavailable interrupt	Alexander Kochetkov
	The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet cable was plugged before the Ethernet interface was brought up. The patch trigger PHY state machine to update link state if PHY was requested to do auto-negotiation and auto-negotiation complete flag already set. During power-up cycle the PHY do auto-negotiation, generate interrupt and set auto-negotiation complete flag. Interrupt is handled by PHY state machine but doesn't update link state because PHY is in PHY_READY state. After some time MAC bring up, start and request PHY to do auto-negotiation. If there are no new settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation. PHY continue to stay in auto-negotiation complete state and doesn't fire interrupt. At the same time PHY state machine expect that PHY started auto-negotiation and is waiting for interrupt from PHY and it won't get it. Fixes: 321beec5047a ("net: phy: Use interrupts when available in NOLINK state") Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Cc: stable <stable@vger.kernel.org> # v4.9+ Tested-by: Roger Quadros <rogerq@ti.com> Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26	perf ui gtk: Move gtk .so name to the only place where it is used	Arnaldo Carvalho de Melo
	No need to pollute util.h with this. Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/n/tip-kec0chbdtgrd71o3oi2kz2zt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-26	perf tools: Move HAS_BOOL define to where perl headers are used	Arnaldo Carvalho de Melo
	This is a perl specific hack, so move it from util.h to where perl headers are used. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-4igctbinuom2sr6g4b03jqht@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-26	srcu: Make rcutorture writer stalls print SRCU GP state	Paul E. McKenney
	In the past, SRCU was simple enough that there was little point in making the rcutorture writer stall messages print the SRCU grace-period number state. With the advent of Tree SRCU, this has changed. This commit therefore makes Classic, Tiny, and Tree SRCU report this state to rcutorture as needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26	srcu: Exact tracking of srcu_data structures containing callbacks	Paul E. McKenney
	The current Tree SRCU implementation schedules a workqueue for every srcu_data covered by a given leaf srcu_node structure having callbacks, even if only one of those srcu_data structures actually contains callbacks. This is clearly inefficient for workloads that don't feature callbacks everywhere all the time. This commit therefore adds an array of masks that are used by the leaf srcu_node structures to track exactly which srcu_data structures contain callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26	NFSv4: Don't special case "launder"	Trond Myklebust
	If the client receives a fatal server error from nfs_pageio_add_request(), then we should always truncate the page on which the error occurred. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-26	NFS: Add a few more fatal I/O errors to nfs_error_is_fatal()	Trond Myklebust
	EACCES, EDQUOT, EFBIG and ESTALE are all fatal errors as far as NFS I/O is concerned. They need to be reported back to the application. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-04-26	drm: mali-dp: use div_u64 for expensive 64-bit divisions	Arnd Bergmann
	On 32-bit machines, we can't divide 64-bit integers: drivers/gpu/drm/arm/malidp_crtc.o: In function `malidp_crtc_atomic_check': malidp_crtc.c:(.text.malidp_crtc_atomic_check+0x3c0): undefined reference to `__aeabi_uldivmod' malidp_crtc.c:(.text.malidp_crtc_atomic_check+0x3dc): undefined reference to `__aeabi_uldivmod' This calls the div_u64 function explicitly instead. Fixes: 4cea4e9f6690 ("drm: mali-dp: Add plane upscaling support") Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-04-26	Merge tag 'sound-4.11' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Since we got a bonus week, let me try to screw a few pending fixes. A slightly large fix is the locking fix in ASoC STI driver, but it's pretty board-specific, and the risk is fairly low. All the rest are small / trivial fixes, mostly marked as stable, for ALSA sequencer core, ASoC topology, ASoC Intel bytcr and Firewire drivers" * tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: intel: Fix PM and non-atomic crash in bytcr drivers ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type ALSA: seq: Don't break snd_use_lock_sync() loop by timeout ASoC: topology: Fix to store enum text values ASoC: STI: Fix null ptr deference in IRQ handler ALSA: oxfw: fix regression to handle Stanton SCS.1m/1d
2017-04-26	HAVE_ARCH_HARDENED_USERCOPY is unconditional now	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26	CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now	Al Viro
	all architectures converted Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26	Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', ↵	Al Viro
	'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess
2017-04-26	m32r: switch to RAW_COPY_USER	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26	ASoC: stm32: add SAI driver	olivier moysan
	This patch implements SAI ASoC driver for STM32. Signed-off-by: olivier moysan <olivier.moysan@st.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-04-26	ASoC: stm32: add bindings for SAI	olivier moysan
	This patch adds documentation of device tree bindings for the STM32 SAI ASoC driver. Signed-off-by: olivier moysan <olivier.moysan@st.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2017-04-26	Btrfs: fix reported number of inode blocks	Filipe Manana
	Currently when there are buffered writes that were not yet flushed and they fall within allocated ranges of the file (that is, not in holes or beyond eof assuming there are no prealloc extents beyond eof), btrfs simply reports an incorrect number of used blocks through the stat(2) system call (or any of its variants), regardless of mount options or inode flags (compress, compress-force, nodatacow). This is because the number of blocks used that is reported is based on the current number of bytes in the vfs inode plus the number of dealloc bytes in the btrfs inode. The later covers bytes that both fall within allocated regions of the file and holes. Example scenarios where the number of reported blocks is wrong while the buffered writes are not flushed: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -c "pwrite -S 0xaa 0 64K" /mnt/sdc/foo1 wrote 65536/65536 bytes at offset 0 64 KiB, 16 ops; 0.0000 sec (259.336 MiB/sec and 66390.0415 ops/sec) $ sync $ xfs_io -c "pwrite -S 0xbb 0 64K" /mnt/sdc/foo1 wrote 65536/65536 bytes at offset 0 64 KiB, 16 ops; 0.0000 sec (192.308 MiB/sec and 49230.7692 ops/sec) # The following should have reported 64K... $ du -h /mnt/sdc/foo1 128K /mnt/sdc/foo1 $ sync # After flushing the buffered write, it now reports the correct value. $ du -h /mnt/sdc/foo1 64K /mnt/sdc/foo1 $ xfs_io -f -c "falloc -k 0 128K" -c "pwrite -S 0xaa 0 64K" /mnt/sdc/foo2 wrote 65536/65536 bytes at offset 0 64 KiB, 16 ops; 0.0000 sec (520.833 MiB/sec and 133333.3333 ops/sec) $ sync $ xfs_io -c "pwrite -S 0xbb 64K 64K" /mnt/sdc/foo2 wrote 65536/65536 bytes at offset 65536 64 KiB, 16 ops; 0.0000 sec (260.417 MiB/sec and 66666.6667 ops/sec) # The following should have reported 128K... $ du -h /mnt/sdc/foo2 192K /mnt/sdc/foo2 $ sync # After flushing the buffered write, it now reports the correct value. $ du -h /mnt/sdc/foo2 128K /mnt/sdc/foo2 So the number of used file blocks is simply incorrect, unlike in other filesystems such as ext4 and xfs for example, but only while the buffered writes are not flushed. Fix this by tracking the number of delalloc bytes that fall within holes and beyond eof of a file, and use instead this new counter when reporting the number of used blocks for an inode. Another different problem that exists is that the delalloc bytes counter is reset when writeback starts (by clearing the EXTENT_DEALLOC flag from the respective range in the inode's iotree) and the vfs inode's bytes counter is only incremented when writeback finishes (through insert_reserved_file_extent()). Therefore while writeback is ongoing we simply report a wrong number of blocks used by an inode if the write operation covers a range previously unallocated. While this change does not fix this problem, it does minimizes it a lot by shortening that time window, as the new dealloc bytes counter (new_delalloc_bytes) is only decremented when writeback finishes right before updating the vfs inode's bytes counter. Fully fixing this second problem is not trivial and will be addressed later by a different patch. Signed-off-by: Filipe Manana <fdmanana@suse.com>
2017-04-26	Btrfs: send, fix file hole not being preserved due to inline extent	Filipe Manana
	Normally we don't have inline extents followed by regular extents, but there's currently at least one harmless case where this happens. For example, when the page size is 4Kb and compression is enabled: $ mkfs.btrfs -f /dev/sdb $ mount -o compress /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xaa 0 4K" -c "fsync" /mnt/foobar $ xfs_io -c "pwrite -S 0xbb 8K 4K" -c "fsync" /mnt/foobar In this case we get a compressed inline extent, representing 4Kb of data, followed by a hole extent and then a regular data extent. The inline extent was not expanded/converted to a regular extent exactly because it represents 4Kb of data. This does not cause any apparent problem (such as the issue solved by commit e1699d2d7bf6 ("btrfs: add missing memset while reading compressed inline extents")) except trigger an unexpected case in the incremental send code path that makes us issue an operation to write a hole when it's not needed, resulting in more writes at the receiver and wasting space at the receiver. So teach the incremental send code to deal with this particular case. The issue can be currently triggered by running fstests btrfs/137 with compression enabled (MOUNT_OPTIONS="-o compress" ./check btrfs/137). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2017-04-26	Btrfs: fix extent map leak during fallocate error path	Filipe Manana
	If the call to btrfs_qgroup_reserve_data() failed, we were leaking an extent map structure. The failure can happen either due to an -ENOMEM condition or, when quotas are enabled, due to -EDQUOT for example. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com>