summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-04-28btrfs: create a helper function to read the disk superAnand Jain
A part of code from btrfs_scan_one_device() is moved to a new function btrfs_read_disk_super(), so that former function looks cleaner. (In this process it also moves the code which ensures null terminating label). So this creates easy opportunity to merge various duplicate codes on read disk super. Earlier attempt to merge duplicate codes highlighted that there were some issues for which there are duplicate codes (to read disk super), however it was not clear what was the issue. So until we figure that out, its better to keep them in a separate functions. Signed-off-by: Anand Jain <anand.jain@oracle.com> [ use GFP_KERNEL, PAGE_CACHE_ removal related fixups ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28udf: Export superblock magic to userspaceJan Kara
Currently UDF superblock magic doesn't appear in any userspace header files and thus userspace apps have hard time checking for this fs. Let's export the magic to userspace as with any other filesystem. Signed-off-by: Jan Kara <jack@suse.cz>
2016-04-28Btrfs: do not create empty block group if we have allocated dataLiu Bo
Now we force to create empty block group to keep data profile alive, however, in the below example, we eventually get an empty block group while we're trying to get more space for other types (metadata/system), - Before, block group "A": size=2G, used=1.2G block group "B": size=2G, used=512M - After "btrfs balance start -dusage=50 mount_point", block group "A": size=2G, used=(1.2+0.5)G block group "C": size=2G, used=0 Since there is no data in block group C, it won't be deleted automatically and we have to get the unused 2G until the next mount. Balance itself just moves data and doesn't remove data, so it's safe to not create such a empty block group if we already have data allocated in other block groups. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28Btrfs: __btrfs_buffered_write: Pass valid file offset when releasing ↵Chandan Rajendra
delalloc space The delalloc reserved space is calculated in terms of number of bytes used by an integral number of blocks. This is done by rounding down the value of 'pos' to the nearest multiple of sectorsize. The file offset value held by 'pos' variable may not be aligned to sectorsize and hence when passing it as an argument to btrfs_delalloc_release_space(), we may end up releasing larger delalloc space than we originally had reserved. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28Btrfs: cleanup error handling in extent_write_cached_pagesLiu Bo
Now that we bail out immediately if ->writepage() returns an error, we don't need an extra error to retain the error code. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28Btrfs: make mapping->writeback_index point to the last written pageLiu Bo
If sequential writer is writing in the middle of the page and it just redirties the last written page by continuing from it. In the above case this can end up with seeking back to that firstly redirtied page after writing all the pages at the end of file because btrfs updates mapping->writeback_index to 1 past the current one. For non-cow filesystems, the cost is only about extra seek, while for cow filesystems such as btrfs, it means unnecessary fragments. To avoid it, we just need to continue writeback from the last written page. This also updates btrfs to behave like what write_cache_pages() does, ie, bail out immediately if there is an error in writepage(). <Ref: https://www.spinics.net/lists/linux-btrfs/msg52628.html> Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctlLuke Dashjr
32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr fail. Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org> Cc: stable@vger.kernel.org Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: fix typos in commentsLuis de Bethencourt
Correct a typo in the chunk_mutex name to make it grepable. Since it is better to fix several typos at once, fixing the 2 more in the same file. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28Btrfs: Refactor btrfs_lock_cluster() to kill compiler warningGeert Uytterhoeven
fs/btrfs/extent-tree.c: In function ‘btrfs_lock_cluster’: fs/btrfs/extent-tree.c:6399: warning: ‘used_bg’ may be used uninitialized in this function - Replace "again: ... goto again;" by standard C "while (1) { ... }", - Move block not processed during the first iteration of the loop to the end of the loop, which allows to kill the "locked" variable, Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-and-Tested-by: Miao Xie <miaox@cn.fujitsu.com> [ the compilation warning has been fixed by other patch, now we want to clean up the function ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: remove save_error_info()Anand Jain
Actually save_error_info() sets the FS state to error and nothing else. Further the word save doesn't induce caffeine when compared to the word set in what actually it does. So to make it better understandable move save_error_info() code to its only consumer itself. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: Simplify conditions about compress while mapping btrfs flags to inode ↵Satoru Takeuchi
flags Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: move error handling code together in ctree.hAnand Jain
Looks like we added the incompatible defines in between the error handling defines in the file ctree.h. Now group them back. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: remove unused function btrfs_assert()Anand Jain
Apparently looks like ASSERT does the same intended job, as intended btrfs_assert(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-28btrfs: rename btrfs_std_error to btrfs_handle_fs_errorAnand Jain
btrfs_std_error() handles errors, puts FS into readonly mode (as of now). So its good idea to rename it to btrfs_handle_fs_error(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ edit changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-27f2fs: fix to return 0 if err == -ENOENT in f2fs_readdirYunlong Song
Commit 57b62d29ad5b384775974973087d47755a8c6fcc ("f2fs: fix to report error in f2fs_readdir") causes f2fs_readdir to return -ENOENT when get_lock_data_page returns -ENOENT. However, the original logic is to continue when get_lock_data_page returns -ENOENT, but it forgets to reset err to 0. This will cause getdents64 incorretly return -ENOENT when lastdirent is NULL in getdents64. This will lead to a wrong return value for syscall caller. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-27f2fs: move node pages only in victim section during GCChao Yu
For foreground GC, we cache node blocks in victim section and set them dirty, then we call sync_node_pages to flush these node pages, but meanwhile, those node pages which does not locate in victim section will be flushed together, so more bandwidth and continuous free space would be occupied. So for this condition, it's better to leave those unrelated node page in cache for further write hit, and let CP or VM to flush them afterward. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-27f2fs: be aware of invalid filename lengthChao Yu
The filename length in dirent of may become zero-sized after random junk data injection, once encounter such dirent, find_target_dentry or f2fs_add_inline_entries will run into an infinite loop. So let f2fs being aware of that to avoid deadloop. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor overlapping changes in the conflicts. In the macsec case, the change of the default ID macro name overlapped with the 64-bit netlink attribute alignment fixes in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27ext4: remove trailing \n from ext4_warning/ext4_error callsJakub Wilk
Messages passed to ext4_warning() or ext4_error() don't need trailing newlines, because these function add the newlines themselves. Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
2016-04-26devpts: more pty driver interface cleanupsLinus Torvalds
This is more prep-work for the upcoming pty changes. Still just code cleanup with no actual semantic changes. This removes a bunch pointless complexity by just having the slave pty side remember the dentry associated with the devpts slave rather than the inode. That allows us to remove all the "look up the dentry" code for when we want to remove it again. Together with moving the tty pointer from "inode->i_private" to "dentry->d_fsdata" and getting rid of pointless inode locking, this removes about 30 lines of code. Not only is the end result smaller, it's simpler and easier to understand. The old code, for example, depended on the d_find_alias() to not just find the dentry, but also to check that it is still hashed, which in turn validated the tty pointer in the inode. That is a _very_ roundabout way to say "invalidate the cached tty pointer when the dentry is removed". The new code just does dentry->d_fsdata = NULL; in devpts_pty_kill() instead, invalidating the tty pointer rather more directly and obviously. Don't do something complex and subtle when the obvious straightforward approach will do. The rest of the patch (ie apart from code deletion and the above tty pointer clearing) is just switching the calling convention to pass the dentry or file pointer around instead of the inode. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-26f2fs: issue cache flush on direct IOJaegeuk Kim
Under direct IO path with O_(D)SYNC, it needs to set proper APPEND or UPDATE flags, so taht f2fs_sync_file can make its data safe. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: set fsync mark only for the last dnodeJaegeuk Kim
In order to give atomic writes, we should consider power failure during sync_node_pages in fsync. So, this patch marks fsync flag only in the last dnode block. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: report unwritten status in fsync_node_pagesJaegeuk Kim
The fsync_node_pages should return pass or failure so that user could know fsync is completed or not. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: split sync_node_pages with fsync_node_pagesJaegeuk Kim
This patch splits the existing sync_node_pages into (f)sync_node_pages. The fsync_node_pages is used for f2fs_sync_file only. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: avoid writing 0'th page in volatile writesJaegeuk Kim
The first page of volatile writes usually contains a sort of header information which will be used for recovery. (e.g., journal header of sqlite) If this is written without other journal data, user needs to handle the stale journal information. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26f2fs: avoid needless lock for node pages when fsyncing a fileJaegeuk Kim
When fsync is called, sync_node_pages finds a proper direct node pages to flush. But, it locks unrelated direct node pages together unnecessarily. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-04-26fs/quota: use nla_put_u64_64bit()Nicolas Dichtel
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26udf: Prevent stack overflow on corrupted filesystem mountAlden Tondettar
Presently, a corrupted or malicious UDF filesystem containing a very large number (or cycle) of Logical Volume Integrity Descriptor extent indirections may trigger a stack overflow and kernel panic in udf_load_logicalvolint() on mount. Replace the unnecessary recursion in udf_load_logicalvolint() with simple iteration. Set an arbitrary limit of 1000 indirections (which would have almost certainly overflowed the stack without this fix), and treat such cases as if there were no LVID. Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-04-25ext4: fix races between changing inode journal mode and ext4_writepagesDaeho Jeong
In ext4, there is a race condition between changing inode journal mode and ext4_writepages(). While ext4_writepages() is executed on a non-journalled mode inode, the inode's journal mode could be enabled by ioctl() and then, some pages dirtied after switching the journal mode will be still exposed to ext4_writepages() in non-journaled mode. To resolve this problem, we use fs-wide per-cpu rw semaphore by Jan Kara's suggestion because we don't want to waste ext4_inode_info's space for this extra rare case. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-04-25ext4: handle unwritten or delalloc buffers before enabling data journalingDaeho Jeong
We already allocate delalloc blocks before changing the inode mode into "per-file data journal" mode to prevent delalloc blocks from remaining not allocated, but another issue concerned with "BH_Unwritten" status still exists. For example, by fallocate(), several buffers' status change into "BH_Unwritten", but these buffers cannot be processed by ext4_alloc_da_blocks(). So, they still remain in unwritten status after per-file data journaling is enabled and they cannot be changed into written status any more and, if they are journaled and eventually checkpointed, these unwritten buffer will cause a kernel panic by the below BUG_ON() function of submit_bh_wbc() when they are submitted during checkpointing. static int submit_bh_wbc(int rw, struct buffer_head *bh,... { ... BUG_ON(buffer_unwritten(bh)); Moreover, when "dioread_nolock" option is enabled, the status of a buffer is changed into "BH_Unwritten" after write_begin() completes and the "BH_Unwritten" status will be cleared after I/O is done. Therefore, if a buffer's status is changed into unwrutten but the buffer's I/O is not submitted and completed, it can cause the same problem after enabling per-file data journaling. You can easily generate this bug by executing the following command. ./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269 To resolve these problems and define a boundary between the previous mode and per-file data journaling mode, we need to flush and wait all the I/O of buffers of a file before enabling per-file data journaling of the file. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-04-25ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()Theodore Ts'o
The function jbd2_journal_extend() takes as its argument the number of new credits to be added to the handle. We weren't taking into account the currently unused handle credits; worse, we would try to extend the handle by N credits when it had N credits available. In the case where jbd2_journal_extend() fails because the transaction is too large, when jbd2_journal_restart() gets called, the N credits owned by the handle gets returned to the transaction, and the transaction commit is asynchronously requested, and then start_this_handle() will be able to successfully attach the handle to the current transaction since the required credits are now available. This is mostly harmless, but since ext4_ext_truncate_extend_restart() returns EAGAIN, the truncate machinery will once again try to call ext4_ext_truncate_extend_restart(), which will do the above sequence over and over again until the transaction has committed. This was found while I was debugging a lockup in caused by running xfstests generic/074 in the data=journal case. I'm still not sure why we ended up looping forever, which suggests there may still be another bug hiding in the transaction accounting machinery, but this commit prevents us from looping in the first place. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-25libceph: make authorizer destruction independent of ceph_auth_clientIlya Dryomov
Starting the kernel client with cephx disabled and then enabling cephx and restarting userspace daemons can result in a crash: [262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000 [262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130 [262671.584334] PGD 0 [262671.635847] Oops: 0000 [#1] SMP [262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu [262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014 [262672.268937] Workqueue: ceph-msgr con_work [libceph] [262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000 [262672.428330] RIP: 0010:[<ffffffff811cd04a>] [<ffffffff811cd04a>] kfree+0x5a/0x130 [262672.535880] RSP: 0018:ffff880149aeba58 EFLAGS: 00010286 [262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018 [262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012 [262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000 [262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40 [262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840 [262673.131722] FS: 0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000 [262673.245377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0 [262673.417556] Stack: [262673.472943] ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04 [262673.583767] ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00 [262673.694546] ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800 [262673.805230] Call Trace: [262673.859116] [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph] [262673.968705] [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph] [262674.078852] [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph] [262674.134249] [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph] [262674.189124] [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph] [262674.243749] [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph] [262674.297485] [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph] [262674.350813] [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph] [262674.403312] [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph] [262674.454712] [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690 [262674.505096] [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph] [262674.555104] [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0 [262674.604072] [<ffffffff810901ea>] worker_thread+0x11a/0x470 [262674.652187] [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310 [262674.699022] [<ffffffff810957a2>] kthread+0xd2/0xf0 [262674.744494] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 [262674.789543] [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70 [262674.834094] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 What happens is the following: (1) new MON session is established (2) old "none" ac is destroyed (3) new "cephx" ac is constructed ... (4) old OSD session (w/ "none" authorizer) is put ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer) osd->o_auth.authorizer in the "none" case is just a bare pointer into ac, which contains a single static copy for all services. By the time we get to (4), "none" ac, freed in (2), is long gone. On top of that, a new vtable installed in (3) points us at ceph_x_destroy_authorizer(), so we end up trying to destroy a "none" authorizer with a "cephx" destructor operating on invalid memory! To fix this, decouple authorizer destruction from ac and do away with a single static "none" authorizer by making a copy for each OSD or MDS session. Authorizers themselves are independent of ac and so there is no reason for destroy_authorizer() to be an ac op. Make it an op on the authorizer itself by turning ceph_authorizer into a real struct. Fixes: http://tracker.ceph.com/issues/15447 Reported-by: Alan Zhang <alan.zhang@linux.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-04-25udf: Fix conversion of 'dstring' fields to UTF8Andrew Gabbasov
Commit 9293fcfbc1812a22ad5ce1b542eb90c1bbe01be1 ("udf: Remove struct ustr as non-needed intermediate storage"), while getting rid of 'struct ustr', does not take any special care of 'dstring' fields and effectively use fixed field length instead of actual string length, encoded in the last byte of the field. Also, commit 484a10f49387e4386bf2708532e75bf78ffea2cb ("udf: Merge linux specific translation into CS0 conversion function") introduced checking of the length of the string being converted, requiring proper alignment to number of bytes constituing each character. The UDF volume identifier is represented as a 32-bytes 'dstring', and needs to be converted from CS0 to UTF8, while mounting UDF filesystem. The changes in mentioned commits can in some cases lead to incorrect handling of volume identifier: - if the actual string in 'dstring' is of maximal length and does not have zero bytes separating it from dstring encoded length in last byte, that last byte may be included in conversion, thus making incorrect resulting string; - if the identifier is encoded with 2-bytes characters (compression code is 16), the length of 31 bytes (32 bytes of field length minus 1 byte of compression code), taken as the string length, is reported as an incorrect (unaligned) length, and the conversion fails, which in its turn leads to volume mounting failure. This patch introduces handling of 'dstring' encoded length field in udf_CS0toUTF8 function, that is used in all and only cases when 'dstring' fields are converted. Currently these cases are processing of Volume Identifier and Volume Set Identifier fields. The function is also renamed to udf_dstrCS0toUTF8 to distinctly indicate that it handles 'dstring' input. Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Signed-off-by: Jan Kara <jack@suse.cz>
2016-04-25fuse: Fix return value from fuse_get_user_pages()Ashish Samant
fuse_get_user_pages() should return error or 0. Otherwise fuse_direct_io read will not return 0 to indicate that read has completed. Fixes: 742f992708df ("fuse: return patrial success from fuse_direct_io()") Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-04-24ext4: do not ask jbd2 to write data for delalloc buffersJan Kara
Currently we ask jbd2 to write all dirty allocated buffers before committing a transaction when doing writeback of delay allocated blocks. However this is unnecessary since we move all pages to writeback state before dropping a transaction handle and then submit all the necessary IO. We still need the transaction commit to wait for all the outstanding writeback before flushing disk caches during transaction commit to avoid data exposure issues though. Use the new jbd2 capability and ask it to only wait for outstanding writeback during transaction commit when writing back data in ext4_writepages(). Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-24jbd2: add support for avoiding data writes during transaction commitsJan Kara
Currently when filesystem needs to make sure data is on permanent storage before committing a transaction it adds inode to transaction's inode list. During transaction commit, jbd2 writes back all dirty buffers that have allocated underlying blocks and waits for the IO to finish. However when doing writeback for delayed allocated data, we allocate blocks and immediately submit the data. Thus asking jbd2 to write dirty pages just unnecessarily adds more work to jbd2 possibly writing back other redirtied blocks. Add support to jbd2 to allow filesystem to ask jbd2 to only wait for outstanding data writes before committing a transaction and thus avoid unnecessary writes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-24ext4: remove EXT4_STATE_ORDERED_MODEJan Kara
This flag is just duplicating what ext4_should_order_data() tells you and is used in a single place. Furthermore it doesn't reflect changes to inode data journalling flag so it may be possibly misleading. Just remove it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-24ext4: fix data exposure after a crashJan Kara
Huang has reported that in his powerfail testing he is seeing stale block contents in some of recently allocated blocks although he mounts ext4 in data=ordered mode. After some investigation I have found out that indeed when delayed allocation is used, we don't add inode to transaction's list of inodes needing flushing before commit. Originally we were doing that but commit f3b59291a69d removed the logic with a flawed argument that it is not needed. The problem is that although for delayed allocated blocks we write their contents immediately after allocating them, there is no guarantee that the IO scheduler or device doesn't reorder things and thus transaction allocating blocks and attaching them to inode can reach stable storage before actual block contents. Actually whenever we attach freshly allocated blocks to inode using a written extent, we should add inode to transaction's ordered inode list to make sure we properly wait for block contents to be written before committing the transaction. So that is what we do in this patch. This also handles other cases where stale data exposure was possible - like filling hole via mmap in data=ordered,nodelalloc mode. The only exception to the above rule are extending direct IO writes where blkdev_direct_IO() waits for IO to complete before increasing i_size and thus stale data exposure is not possible. For now we don't complicate the code with optimizing this special case since the overhead is pretty low. In case this is observed to be a performance problem we can always handle it using a special flag to ext4_map_blocks(). CC: stable@vger.kernel.org Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-23ext4: allow readdir()'s of large empty directories to be interruptedTheodore Ts'o
If a directory has a large number of empty blocks, iterating over all of them can take a long time, leading to scheduler warnings and users getting irritated when they can't kill a process in the middle of one of these long-running readdir operations. Fix this by adding checks to ext4_readdir() and ext4_htree_fill_tree(). This was reverted earlier due to a typo in the original commit where I experimented with using signal_pending() instead of fatal_signal_pending(). The test was in the wrong place if we were going to return signal_pending() since we would end up returning duplicant entries. See 9f2394c9be47 for a more detailed explanation. Added fix as suggested by Linus to check for signal_pending() in in the filldir() functions. Reported-by: Benjamin LaHaise <bcrl@kvack.org> Google-Bug-Id: 27880676 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23ceph: kill __ceph_removexattr()Yan, Zheng
when removing a xattr, generic_removexattr() calls __ceph_setxattr() with NULL value and XATTR_REPLACE flag. __ceph_removexattr() is not used any more. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23ceph: Switch to generic xattr handlersAndreas Gruenbacher
Add a catch-all xattr handler at the end of ceph_xattr_handlers. Check for valid attribute names there, and remove those checks from __ceph_{get,set,remove}xattr instead. No "system.*" xattrs need to be handled by the catch-all handler anymore. The set xattr handler is called with a NULL value to indicate that the attribute should be removed; __ceph_setxattr already handles that case correctly (ceph_set_acl could already calling __ceph_setxattr with a NULL value). Move the check for snapshots from ceph_{set,remove}xattr into __ceph_{set,remove}xattr. With that, ceph_{get,set,remove}xattr can be replaced with the generic iops. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23ceph: Get rid of d_find_alias in ceph_set_aclAndreas Gruenbacher
Create a variant of ceph_setattr that takes an inode instead of a dentry. Change __ceph_setxattr (and also __ceph_removexattr) to take an inode instead of a dentry. Use those in ceph_set_acl so that we no longer need a dentry there. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Switch to generic xattr handlersAndreas Gruenbacher
Use xattr handlers for resolving attribute names. The amount of setup code required on cifs is nontrivial, so use the same get and set functions for all handlers, with switch statements for the different types of attributes in them. The set_EA handler can handle NULL values, so we don't need a separate removexattr function anymore. Remove the cifs_dbg statements related to xattr name resolution; they don't add much. Don't build xattr.o when CONFIG_CIFS_XATTR is not defined. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Fix removexattr for os2.* xattrsAndreas Gruenbacher
If cifs_removexattr finds a "user." or "os2." xattr name prefix, it skips 5 bytes, one byte too many for "os2.". Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULTAndreas Gruenbacher
The two values ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT are meant to be enumerations, not bits in a bit mask. Use '==' instead of '&' to check for these values. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-23cifs: Fix xattr name checksAndreas Gruenbacher
Use strcmp(str, name) instead of strncmp(str, name, strlen(name)) for checking if str and name are the same (as opposed to name being a prefix of str) in the gexattr and setxattr inode operations. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-20eCryptfs: Do not allocate hash tfm in NORECLAIM contextHerbert Xu
You cannot allocate crypto tfm objects in NORECLAIM or NOFS contexts. The ecryptfs code currently does exactly that for the MD5 tfm. This patch fixes it by preallocating the MD5 tfm in a safe context. The MD5 tfm is also reentrant so this patch removes the superfluous cs_hash_tfm_mutex. Reported-by: Nicolas Boichat <drinkcat@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-04-19GFS2: Add calls to gfs2_holder_uninit in two error handlersDaniel DeFreez
This patch fixes two locations that do not call gfs2_holder_uninit if gfs2_glock_nq returns an error. Signed-off-by: Daniel DeFreez <dcdefreez@ucdavis.edu> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-19Merge branch 'ptmx-cleanup'Linus Torvalds
Merge the ptmx internal interface cleanup branch. This doesn't change semantics, but it should be a sane basis for eventually getting the multi-instance devpts code into some sane shape where we can get rid of the kernel config option. Which we can hopefully get done next merge window.. * ptmx-cleanup: devpts: clean up interface to pty drivers