Age | Commit message (Collapse) | Author |
|
Pull XFS fixes from Darrick Wong:
"Here are three more fixes for 4.11.
The first one reworks the inline directory verifier to check the
working copy of the directory metadata and to avoid triggering a
periodic crash in xfs/348. The second patch fixes a regression in hole
punching at EOF that corrupts files; and the third patch closes a
kernel memory disclosure bug.
Summary:
- rework the inline directory verifier to avoid crashes on disk
corruption
- don't change file size when punching holes w/ KEEP_SIZE
- close a kernel memory exposure bug"
* tag 'xfs-4.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix kernel memory exposure problems
xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files
xfs: rework the inline directory verifiers
|
|
Fix a memory exposure problems in inumbers where we allocate an array of
structures with holes, fail to zero the holes, then blindly copy the
kernel memory contents (junk and all) into userspace.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will
round the file size up to the nearest multiple of PAGE_SIZE:
calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1
calvinow@vm-disks/generic-xfs-1 ~$ stat test
Size: 2048 Blocks: 8 IO Block: 4096 regular file
calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test
calvinow@vm-disks/generic-xfs-1 ~$ stat test
Size: 4096 Blocks: 8 IO Block: 4096 regular file
Commit 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") replaced
xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers
don't enforce that [pos,offset) lies strictly on [0,i_size) when being
called from xfs_free_file_space(), so by "leaking" these ranges into
xfs_zero_range() we get this buggy behavior.
Fix this by reintroducing the checks xfs_zero_remaining_bytes() did
against i_size at the bottom of xfs_free_file_space().
Reported-by: Aaron Gao <gzh@fb.com>
Fixes: 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Brian Foster <bfoster@redhat.com>
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
The inline directory verifiers should be called on the inode fork data,
which means after iformat_local on the read side, and prior to
ifork_flush on the write side. This makes the fork verifier more
consistent with the way buffer verifiers work -- i.e. they will operate
on the memory buffer that the code will be reading and writing directly.
Furthermore, revise the verifier function to return -EFSCORRUPTED so
that we don't flood the logs with corruption messages and assert
notices. This has been a particular problem with xfs/348, which
triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
kernel when CONFIG_XFS_DEBUG=y. Disk corruption isn't supposed to do
that, at least not in a verifier.
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Include a mask in struct stat to indicate which bits of stx_attributes the
filesystem actually supports.
This would also be useful if we add another system call that allows you to
do a 'bulk attribute set' and pass in a statx struct with the masks
appropriately set to say what you want to set.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reserve the top bit of the mask for future expansion of the statx struct
and give an error if statx() sees it set. All the other bits are ignored
if we see them set but don't support the bit; we just clear the bit in the
returned mask.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
statx has the ability to report inode creation times and inode flags, so
hook up di_crtime and di_flags to that functionality.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Return enhanced file attributes from the Ext4 filesystem. This includes
the following:
(1) The inode creation time (i_crtime) as stx_btime, setting STATX_BTIME.
(2) Certain FS_xxx_FL flags are mapped to stx_attribute flags.
This requires that all ext4 inodes have a getattr call, not just some of
them, so to this end, split the ext4_getattr() function and only call part
of it where appropriate.
Example output:
[root@andromeda ~]# touch foo
[root@andromeda ~]# chattr +ai foo
[root@andromeda ~]# /tmp/test-statx foo
statx(foo) = 0
results=fff
Size: 0 Blocks: 0 IO Block: 4096 regular file
Device: 08:12 Inode: 2101950 Links: 1
Access: (0644/-rw-r--r--) Uid: 0 Gid: 0
Access: 2016-02-11 17:08:29.031795451+0000
Modify: 2016-02-11 17:08:29.031795451+0000
Change: 2016-02-11 17:11:11.987790114+0000
Birth: 2016-02-11 17:08:29.031795451+0000
Attributes: 0000000000000030 (-------- -------- -------- -------- -------- -------- -------- --ai----)
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
I found that statx() was significantly slower than stat(). As a
microbenchmark, I compared 10,000,000 invocations of fstat() on a tmpfs
file to the same with statx() passed a NULL path:
$ time ./stat_benchmark
real 0m1.464s
user 0m0.275s
sys 0m1.187s
$ time ./statx_benchmark
real 0m5.530s
user 0m0.281s
sys 0m5.247s
statx is expected to be a little slower than stat because struct statx
is larger than struct stat, but not by *that* much. It turns out that
most of the overhead was in copying struct statx to userspace, mostly in
all the stac/clac instructions that got generated for each __put_user()
call. (This was on x86_64, but some other architectures, e.g. arm64,
have something similar now too.)
stat() instead initializes its struct on the stack and copies it to
userspace with a single call to copy_to_user(). This turns out to be
much faster, and changing statx to do this makes it almost as fast as
stat:
$ time ./statx_benchmark
real 0m1.624s
user 0m0.270s
sys 0m1.354s
For zeroing the reserved fields, start by zeroing the full struct with
memset. This makes it clear that every byte copied to userspace is
initialized, even implicit padding bytes (though there are none
currently). In the scenarios I tested, it also performed the same as a
designated initializer. Manually initializing each field was still
slightly faster, but would have been more error-prone and less
verifiable.
Also rename statx_set_result() to cp_statx() for consistency with
cp_old_stat() et al., and make it noinline so that struct statx doesn't
add to the stack usage during the main portion of the syscall execution.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
request_mask and query_flags are function arguments, not passed in
struct kstat. So remove the part of the comment which claims otherwise.
This was apparently left over from an earlier version of the statx
patch.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The statx() system call currently accepts unknown flags when called with
a NULL path to operate on a file descriptor. Left unchanged, this could
make it hard to introduce new query flags in the future, since
applications may not be able to tell whether a given flag is supported.
Fix this by failing the system call with EINVAL if any flags other than
KSTAT_QUERY_FLAGS are specified in combination with a NULL path.
Arguably, we could still permit known lookup-related flags such as
AT_SYMLINK_NOFOLLOW. However, that would be inconsistent with how
sys_utimensat() behaves when passed a NULL path, which seems to be the
closest precedent. And given that the NULL path case is (I believe)
mainly intended to be used to implement a wrapper function like fstatx()
that doesn't have a path argument, I think rejecting lookup-related
flags too is probably the best choice.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: do not sanitize kexec purgatory
drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
mm/hugetlb.c: don't call region_abort if region_chg fails
kasan: report only the first error by default
hugetlbfs: initialize shared policy as part of inode allocation
mm: fix section name for .data..ro_after_init
mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
mm: workingset: fix premature shadow node shrinking with cgroups
mm: rmap: fix huge file mmap accounting in the memcg stats
mm: move mm_percpu_wq initialization earlier
mm: migrate: fix remove_migration_pte() for ksm pages
|
|
Pull nfsd fixes from Bruce Fields:
"The restriction of NFSv4 to TCP went overboard and also broke the
backchannel; fix.
Also some minor refinements to the nfsd version-setting interface that
we'd like to get fixed before release"
* tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux:
svcrdma: set XPT_CONG_CTRL flag for bc xprt
NFSD: fix nfsd_reset_versions for NFSv4.
NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
NFSD: further refinement of content of /proc/fs/nfsd/versions
nfsd: map the ENOKEY to nfserr_perm for avoiding warning
SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"We have three small fixes queued up in my for-linus-4.11 branch"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix an integer overflow check
btrfs: Change qgroup_meta_rsv to 64bit
Btrfs: bring back repair during read
|
|
Any time after inode allocation, destroy_inode can be called. The
hugetlbfs inode contains a shared_policy structure, and
mpol_free_shared_policy is unconditionally called as part of
hugetlbfs_destroy_inode. Initialize the policy as part of inode
allocation so that any quick (error path) calls to destroy_inode will be
handed an initialized policy.
syzkaller fuzzer found this bug, that resulted in the following:
BUG: KASAN: user-memory-access in atomic_inc
include/asm-generic/atomic-instrumented.h:87 [inline] at addr
000000131730bd7a
BUG: KASAN: user-memory-access in __lock_acquire+0x21a/0x3a80
kernel/locking/lockdep.c:3239 at addr 000000131730bd7a
Write of size 4 by task syz-executor6/14086
CPU: 3 PID: 14086 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #364
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
atomic_inc include/asm-generic/atomic-instrumented.h:87 [inline]
__lock_acquire+0x21a/0x3a80 kernel/locking/lockdep.c:3239
lock_acquire+0x1ee/0x590 kernel/locking/lockdep.c:3762
__raw_write_lock include/linux/rwlock_api_smp.h:210 [inline]
_raw_write_lock+0x33/0x50 kernel/locking/spinlock.c:295
mpol_free_shared_policy+0x43/0xb0 mm/mempolicy.c:2536
hugetlbfs_destroy_inode+0xca/0x120 fs/hugetlbfs/inode.c:952
alloc_inode+0x10d/0x180 fs/inode.c:216
new_inode_pseudo+0x69/0x190 fs/inode.c:889
new_inode+0x1c/0x40 fs/inode.c:918
hugetlbfs_get_inode+0x40/0x420 fs/hugetlbfs/inode.c:734
hugetlb_file_setup+0x329/0x9f0 fs/hugetlbfs/inode.c:1282
newseg+0x422/0xd30 ipc/shm.c:575
ipcget_new ipc/util.c:285 [inline]
ipcget+0x21e/0x580 ipc/util.c:639
SYSC_shmget ipc/shm.c:673 [inline]
SyS_shmget+0x158/0x230 ipc/shm.c:657
entry_SYSCALL_64_fastpath+0x1f/0xc2
Analysis provided by Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: http://lkml.kernel.org/r/1490477850-7944-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull NFS client fixes from Anna Schumaker:
"Here are a few more bugfixes that came in over the last couple of
weeks. Most of these fix various hangs and loops that people found,
but we also had a few error handling fixes.
Stable Bugfixes:
- fix infinite loop on BAD_STATEID error
Other Bugfixes:
- fix old dentry rehash after move
- fix pnfs GETDEVINFO hangs
- fix pnfs fallback to MDS on commit errors
- fix flexfiles kernel oops"
* tag 'nfs-for-4.11-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: flexfiles: fix kernel OOPS if MDS returns unsupported DS type
NFSv4.1 fix infinite loop on IO BAD_STATEID error
PNFS fix fallback to MDS if got error on commit to DS
NFS filelayout:call GETDEVICEINFO after pnfs_layout_process completes
NFS store nfs4_deviceid in struct nfs4_filelayout_segment
NFS cleanup struct nfs4_filelayout_segment
NFS: Fix old dentry rehash after move
|
|
this fix aims to fix dereferencing of a mirror in an error state when MDS
returns unsupported DS type (IOW, not v3), which causes the following oops:
[ 220.370709] BUG: unable to handle kernel NULL pointer dereference at 0000000000000065
[ 220.370842] IP: ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles]
[ 220.370920] PGD 0
[ 220.370972] Oops: 0000 [#1] SMP
[ 220.371013] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btrfs kvm arc4 snd_hda_codec_hdmi iwldvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate mac80211 xor uvcvideo
[ 220.371814] videobuf2_vmalloc videobuf2_memops snd_hda_codec_idt mei_wdt videobuf2_v4l2 snd_hda_codec_generic iTCO_wdt ppdev videobuf2_core iTCO_vendor_support dell_rbtn dell_wmi iwlwifi sparse_keymap dell_laptop dell_smbios snd_hda_intel dcdbas videodev snd_hda_codec dell_smm_hwmon snd_hda_core media cfg80211 intel_uncore snd_hwdep raid6_pq snd_seq intel_rapl_perf snd_seq_device joydev i2c_i801 rfkill lpc_ich snd_pcm parport_pc mei_me parport snd_timer dell_smo8800 mei snd shpchp soundcore tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915 nouveau mxm_wmi ttm i2c_algo_bit drm_kms_helper crc32c_intel e1000e drm sdhci_pci firewire_ohci sdhci serio_raw mmc_core firewire_core ptp crc_itu_t pps_core wmi fjes video
[ 220.372568] CPU: 7 PID: 4988 Comm: cat Not tainted 4.10.5-200.fc25.x86_64 #1
[ 220.372647] Hardware name: Dell Inc. Latitude E6520/0J4TFW, BIOS A06 07/11/2011
[ 220.372729] task: ffff94791f6ea580 task.stack: ffffb72b88c0c000
[ 220.372802] RIP: 0010:ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles]
[ 220.372883] RSP: 0018:ffffb72b88c0f970 EFLAGS: 00010246
[ 220.372945] RAX: 0000000000000000 RBX: ffff9479015ca600 RCX: ffffffffffffffed
[ 220.373025] RDX: ffffffffffffffed RSI: ffff9479753dc980 RDI: 0000000000000000
[ 220.373104] RBP: ffffb72b88c0f988 R08: 000000000001c980 R09: ffffffffc0ea6112
[ 220.373184] R10: ffffef17477d9640 R11: ffff9479753dd6c0 R12: ffff9479211c7440
[ 220.373264] R13: ffff9478f45b7790 R14: 0000000000000001 R15: ffff9479015ca600
[ 220.373345] FS: 00007f555fa3e700(0000) GS:ffff9479753c0000(0000) knlGS:0000000000000000
[ 220.373435] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 220.373506] CR2: 0000000000000065 CR3: 0000000196044000 CR4: 00000000000406e0
[ 220.373586] Call Trace:
[ 220.373627] nfs4_ff_layout_prepare_ds+0x5e/0x200 [nfs_layout_flexfiles]
[ 220.373708] ff_layout_pg_init_read+0x81/0x160 [nfs_layout_flexfiles]
[ 220.373806] __nfs_pageio_add_request+0x11f/0x4a0 [nfs]
[ 220.373886] ? nfs_create_request.part.14+0x37/0x330 [nfs]
[ 220.373967] nfs_pageio_add_request+0xb2/0x260 [nfs]
[ 220.374042] readpage_async_filler+0xaf/0x280 [nfs]
[ 220.374103] read_cache_pages+0xef/0x1b0
[ 220.374166] ? nfs_read_completion+0x210/0x210 [nfs]
[ 220.374239] nfs_readpages+0x129/0x200 [nfs]
[ 220.374293] __do_page_cache_readahead+0x1d0/0x2f0
[ 220.374352] ondemand_readahead+0x17d/0x2a0
[ 220.374403] page_cache_sync_readahead+0x2e/0x50
[ 220.374460] generic_file_read_iter+0x6c8/0x950
[ 220.374532] ? nfs_mapping_need_revalidate_inode+0x17/0x40 [nfs]
[ 220.374617] nfs_file_read+0x6e/0xc0 [nfs]
[ 220.374670] __vfs_read+0xe2/0x150
[ 220.374715] vfs_read+0x96/0x130
[ 220.374758] SyS_read+0x55/0xc0
[ 220.374801] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 220.374856] RIP: 0033:0x7f555f570bd0
[ 220.374900] RSP: 002b:00007ffeb73e1b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 220.374986] RAX: ffffffffffffffda RBX: 00007f555f839ae0 RCX: 00007f555f570bd0
[ 220.375066] RDX: 0000000000020000 RSI: 00007f555fa41000 RDI: 0000000000000003
[ 220.375145] RBP: 0000000000021010 R08: ffffffffffffffff R09: 0000000000000000
[ 220.375226] R10: 00007f555fa40010 R11: 0000000000000246 R12: 0000000000022000
[ 220.375305] R13: 0000000000021010 R14: 0000000000001000 R15: 0000000000002710
[ 220.375386] Code: 66 66 90 55 48 89 e5 41 54 53 49 89 fc 48 83 ec 08 48 85 f6 74 2e 48 8b 4e 30 48 89 f3 48 81 f9 00 f0 ff ff 77 1e 48 85 c9 74 15 <48> 83 79 78 00 b8 01 00 00 00 74 2c 48 83 c4 08 5b 41 5c 5d c3
[ 220.375653] RIP: ff_layout_mirror_valid+0x2d/0x110 [nfs_layout_flexfiles] RSP: ffffb72b88c0f970
[ 220.375748] CR2: 0000000000000065
[ 220.403538] ---[ end trace bcdca752211b7da9 ]---
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Commit 63d63cbf5e03 "NFSv4.1: Don't recheck delegations that
have already been checked" introduced a regression where when a
client received BAD_STATEID error it would not send any TEST_STATEID
and instead go into an infinite loop of resending the IO that caused
the BAD_STATEID.
Fixes: 63d63cbf5e03 ("NFSv4.1: Don't recheck delegations that have already been checked")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Upong receiving some errors (EACCES) on commit to the DS the code
doesn't fallback to MDS and intead retrieds to the same DS again.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Remove faulty leftover check in do_rename(), apparently introduced in a
merge that combined whiteout support changes with commit f03b8ad8d386
("fs: support RENAME_NOREPLACE for local filesystems")
Fixes: f03b8ad8d386 ("fs: support RENAME_NOREPLACE for local
filesystems")
Fixes: 9e0a1fff8db5 ("ubifs: Implement RENAME_WHITEOUT")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
instead of filenames, print inode numbers, file types, and length.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
if a character is not printable, print '?' instead of that.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
if filename is encrypted, filename could have no printable characters.
so remove it.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
When fscrypt_setup_filename() fails we have to free dev.
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This isn't super serious because you need CAP_ADMIN to run this code.
I added this integer overflow check last year but apparently I am
rubbish at writing integer overflow checks... There are two issues.
First, access_ok() works on unsigned long type and not u64 so on 32 bit
systems the access_ok() could be checking a truncated size. The other
issue is that we should be using a stricter limit so we don't overflow
the kzalloc() setting ctx->clone_roots later in the function after the
access_ok():
alloc_size = sizeof(struct clone_root) * (arg->clone_sources_count + 1);
sctx->clone_roots = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);
Fixes: f5ecec3ce21f ("btrfs: send: silence an integer overflow warning")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ added comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Using an int value is causing qg->reserved to become negative and
exclusive -EDQUOT to be reached prematurely.
This affects exclusive qgroups only.
TEST CASE:
DEVICE=/dev/vdb
MOUNTPOINT=/mnt
SUBVOL=$MOUNTPOINT/tmp
umount $SUBVOL
umount $MOUNTPOINT
mkfs.btrfs -f $DEVICE
mount /dev/vdb $MOUNTPOINT
btrfs quota enable $MOUNTPOINT
btrfs subvol create $SUBVOL
umount $MOUNTPOINT
mount /dev/vdb $MOUNTPOINT
mount -o subvol=tmp $DEVICE $SUBVOL
btrfs qgroup limit -e 3G $SUBVOL
btrfs quota rescan /mnt -w
for i in `seq 1 44000`; do
dd if=/dev/zero of=/mnt/tmp/test_$i bs=10k count=1
if [[ $? > 0 ]]; then
btrfs qgroup show -pcref $SUBVOL
exit 1
fi
done
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[ add reproducer to changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed
and drop checks") made a cleanup around readpage_io_failed_hook, and
it was supposed to keep the original sematics, but it also
unexpectedly disabled repair during read for dup, raid1 and raid10.
This fixes the problem by letting data's inode call the generic
readpage_io_failed callback by returning -EAGAIN from its
readpage_io_failed_hook in order to notify end_bio_extent_readpage to
do the rest. We don't call it directly because the generic one takes
an offset from end_bio_extent_readpage() to calculate the index in the
checksum array and inode's readpage_io_failed_hook doesn't offer that
offset.
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ keep the const function attribute ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Fix a filelayout GETDEVICEINFO call hang triggered from the LAYOUTGET
pnfs_layout_process where the GETDEVICEINFO call is waiting for a
session slot, and the LAYOUGET call is waiting for pnfs_layout_process
to complete before freeing the slot GETDEVICEINFO is waiting for..
This occurs in testing against the pynfs pNFS server where the
the on-wire reply highest_slotid and slot id are zero, and the
target high slot id is 8 (negotiated in CREATE_SESSION).
The internal fore channel slot table max_slotid, the maximum allowed
table slotid value, has been reduced via nfs41_set_max_slotid_locked
from 8 to 1. Thus there is one slot (slotid 0) available for use but
it has not been freed by LAYOUTGET proir to the GETDEVICEINFO request.
In order to ensure that layoutrecall callbacks are processed in the
correct order, nfs4_proc_layoutget processing needs to be finished
e.g. pnfs_layout_process) before giving up the slot that identifies
the layoutget (see referring_call_exists).
Move the filelayout_check_layout nfs4_find_get_device call outside of
the pnfs_layout_process call tree.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
In preparation for moving the filelayout getdeviceinfo call from
filelayout_alloc_lseg called by pnfs_process_layout
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Now that nfs_rename()'s d_move has moved within the RPC task's rpc_call_done
callback, rehashing new_dentry will actually rehash the old dentry's name
in nfs_rename(). d_move() is going to rehash the new dentry for us anyway,
so doing it again here is unnecessary.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 920b4530fb80 ("NFS: nfs_rename() handle -ERESTARTSYS dentry left behind")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single kernfs fix for 4.11-rc4 that resolves a reported
issue.
It has been in linux-next with no reported issues"
* tag 'driver-core-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a memory leak on an error path, and two races when modifying
inodes relating to the inline_data and metadata checksum features"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix two spelling nits
ext4: lock the xattr block before checksuming it
jbd2: don't leak memory if setting up journal fails
ext4: mark inode dirty after converting inline directory
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypto fixes from Ted Ts'o:
"A code cleanup and bugfix for fs/crypto"
* tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: eliminate ->prepare_context() operation
fscrypt: remove broken support for detecting keyring key revocation
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.
https://bugzilla.kernel.org/show_bug.cgi?id=193661
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Zygo tracked down a very old bug with inline compressed extents.
I didn't tag this one for stable because I want to do individual
tested backports. It's a little tricky and I'd rather do some extra
testing on it along the way"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: add missing memset while reading compressed inline extents
Btrfs: fix regression in lock_delalloc_pages
btrfs: remove btrfs_err_str function from uapi/linux/btrfs.h
|
|
Both nat_bits cache and free_nid_bitmap cache provide same functionality
as a intermediate cache between free nid cache and disk, but with
different granularity of indicating free nid range, and different
persistence policy. nat_bits cache provides better persistence ability,
and free_nid_bitmap provides better granularity.
In this patch we combine advantage of both caches, so finally policy of
the intermediate cache would be:
- init: load free nid status from nat_bits into free_nid_bitmap
- lookup: scan free_nid_bitmap before load NAT blocks
- update: update free_nid_bitmap in real-time
- persistence: udpate and persist nat_bits in checkpoint
This patch also resolves performance regression reported by lkp-robot.
commit:
4ac912427c4214d8031d9ad6fbc3bc75e71512df ("f2fs: introduce free nid bitmap")
d00030cf9cd0bb96fdccc41e33d3c91dcbb672ba ("f2fs: use __set{__clear}_bit_le")
1382c0f3f9d3f936c8bc42ed1591cf7a593ef9f7 ("f2fs: combine nat_bits and free_nid_bitmap cache")
4ac912427c4214d8 d00030cf9cd0bb96fdccc41e33 1382c0f3f9d3f936c8bc42ed15
---------------- -------------------------- --------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
77863 ± 0% +2.1% 79485 ± 1% +50.8% 117404 ± 0% aim7.jobs-per-min
231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time
231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time.max
896604 ± 0% -0.8% 889221 ± 3% -20.2% 715260 ± 1% aim7.time.involuntary_context_switches
2394 ± 1% +4.6% 2503 ± 1% +3.7% 2481 ± 2% aim7.time.maximum_resident_set_size
6240 ± 0% -1.5% 6145 ± 1% -14.1% 5360 ± 1% aim7.time.system_time
1111357 ± 3% +1.9% 1132509 ± 2% -6.2% 1041932 ± 2% aim7.time.voluntary_context_switches
...
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to account free nids for each NAT blocks, and while
scanning all free nid bitmap, do check count and skip lookuping in
full NAT block.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch uses __set{__clear}_bit_le for highter speed.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This is to avoid build warning reported by kbuild test robot.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixes that SSR can overwrite previous warm node block consisting of
a node chain since the last checkpoint.
Fixes: 5b6c6be2d878 ("f2fs: use SSR for warm node as well")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Pull NFS client fixes from Anna Schumaker:
"We have a handful of stable fixes to fix kernel warnings and other
bugs that have been around for a while. We've also found a few other
reference counting bugs and memory leaks since the initial 4.11 pull.
Stable Bugfixes:
- Fix decrementing nrequests in NFS v4.2 COPY to fix kernel warnings
- Prevent a double free in async nfs4_exchange_id()
- Squelch a kbuild sparse complaint for xprtrdma
Other Bugfixes:
- Fix a typo (NFS_ATTR_FATTR_GROUP_NAME) that causes a memory leak
- Fix a reference leak that causes kernel warnings
- Make nfs4_cb_sv_ops static to fix a sparse warning
- Respect a server's max size in CREATE_SESSION
- Handle errors from nfs4_pnfs_ds_connect
- Flexfiles layout shouldn't mark devices as unavailable"
* tag 'nfs-for-4.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
pNFS/flexfiles: never nfs4_mark_deviceid_unavailable
pNFS: return status from nfs4_pnfs_ds_connect
NFSv4.1 respect server's max size in CREATE_SESSION
NFS prevent double free in async nfs4_exchange_id
nfs: make nfs4_cb_sv_ops static
xprtrdma: Squelch kbuild sparse complaint
NFS: fix the fault nrequests decreasing for nfs_inode COPY
NFSv4: fix a reference leak caused WARNING messages
nfs4: fix a typo of NFS_ATTR_FATTR_GROUP_NAME
|
|
This is a story about 4 distinct (and very old) btrfs bugs.
Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).
Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1: uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read. The fix
was to add a memset in btrfs_get_extent to zero out the hole.
Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2: compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases. This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.
There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record. This memset doesn't cover the region between the end of the
decompressed data and the end of the page. It has also moved around a
few times over the years, so there's no single patch to refer to.
This patch fixes bug #3: compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same: zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.
The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern. Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code. In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.
A fast reproducer for bug #0 is presented below. A few offending extents
are also created in the wild during large rsync transfers with the -S
flag. A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well. Once an offending
file is created, it can present different content to userspace each
time it is read.
Bug #0 is at least 4 and possibly 8 years old. I verified every vX.Y
kernel back to v3.5 has this behavior. There are fossil records of this
bug's effects in commits all the way back to v2.6.32. I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.
It is not clear whether bug #0 is worth fixing. A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.
Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak. So enough about
bug #0, let's get back to bug #3 (this patch).
An example of on-disk structure leading to data corruption found in
the wild:
item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
inode generation 50 transid 50 size 47424 nbytes 49141
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
inode ref index 3 namelen 10 name: DB_File.so
item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
inline extent data size 1341 ram 4085 compress(zlib)
item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
extent data disk byte 5367308288 nr 20480
extent data offset 0 nr 45056 ram 45056
extent compression(zlib)
Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096. The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.
Here is a reproducer from Liu Bo, which demonstrates another method
of creating the same inline extent and hole pattern:
Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:
# touch foo
# chattr +c foo
# xfs_io -f -c "pwrite -W 0 1000" foo
# xfs_io -f -c "falloc 4 8188" foo
# od -x foo
# echo 3 >/proc/sys/vm/drop_caches
# od -x foo
This produce the following on my box:
Correct output: file contains 1000 data bytes followed
by zeros:
0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000
0001760 0000 0000 0000 0000 0000 0000 0000 0000
*
0020000
Actual output: the data after the first 1000 bytes
will be different each run:
0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
(...)
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The bug is a regression after commit
(da2c7009f6ca "btrfs: teach __process_pages_contig about PAGE_LOCK operation")
and commit
(76c0021db8fd "Btrfs: use helper to simplify lock/unlock pages").
So if the dirty pages which are under writeback got truncated partially
before we lock the dirty pages, we couldn't find all pages mapping to the
delalloc range, and the bug didn't return an error so it kept going on and
found that the delalloc range got truncated and got to unlock the dirty
pages, and then the ASSERT could caught the error, and showed
-----------------------------------------------------------------------------
assertion failed: page_ops & PAGE_LOCK, file: fs/btrfs/extent_io.c, line: 1716
-----------------------------------------------------------------------------
This fixes the bug by returning the proper -EAGAIN.
Cc: David Sterba <dsterba@suse.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The flexfiles layout should never mark a device unavailable.
Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call
directly from files layout where it's still needed.
The flexfiles driver still handles marked devices in error paths, but will
now print a rate limited warning.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it
can wait on another context to reach the same failure.
This checks that the rpc_create succeeded and returns the error to the
caller.
When an error is returned, both the files and flexfiles layouts will return
NULL from _prepare_ds(). The flexfiles layout will also return the layout
with the error NFS4ERR_NXIO.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Currently client doesn't respect max sizes server returns in CREATE_SESSION.
nfs4_session_set_rwsize() gets called and server->rsize, server->wsize are 0
so they never get set to the sizes returned by the server.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Since rpc_task is async, the release function should be called which
will free the impl_id, scope, and owner.
Trond pointed at 2 more problems:
-- use of client pointer after free in the nfs4_exchangeid_release() function
-- cl_count mismatch if rpc_run_task() isn't run
Fixes: 8d89bd70bc9 ("NFS setup async exchange_id")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Fixes the following sparse warning:
fs/nfs/callback.c:235:21: warning: symbol 'nfs4_cb_sv_ops' was not
declared. Should it be static?
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|