summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-09-03fanotify: add API to attach/detach super block markAmir Goldstein
Add another mark type flag FAN_MARK_FILESYSTEM for add/remove/flush of super block mark type. A super block watch gets all events on the filesystem, regardless of the mount from which the mark was added, unless an ignore mask exists on either the inode or the mount where the event was generated. Only one of FAN_MARK_MOUNT and FAN_MARK_FILESYSTEM mark type flags may be provided to fanotify_mark() or no mark type flag for inode mark. Cc: <linux-api@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03fsnotify: send path type events to group with super block marksAmir Goldstein
Send events to group if super block mark mask matches the event and unless the same group has an ignore mask on the vfsmount or the inode on which the event occurred. Soon, fanotify backend is going to support super block marks and fanotify backend only supports path type events. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03fsnotify: add super block object typeAmir Goldstein
Add the infrastructure to attach a mark to a super_block struct and detach all attached marks when super block is destroyed. This is going to be used by fanotify backend to setup super block marks. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03x86/fault: BUG() when uaccess helpers fault on kernel addressesJann Horn
There have been multiple kernel vulnerabilities that permitted userspace to pass completely unchecked pointers through to userspace accessors: - the waitid() bug - commit 96ca579a1ecc ("waitid(): Add missing access_ok() checks") - the sg/bsg read/write APIs - the infiniband read/write APIs These don't happen all that often, but when they do happen, it is hard to test for them properly; and it is probably also hard to discover them with fuzzing. Even when an unmapped kernel address is supplied to such buggy code, it just returns -EFAULT instead of doing a proper BUG() or at least WARN(). Try to make such misbehaving code a bit more visible by refusing to do a fixup in the pagefault handler code when a userspace accessor causes a #PF on a kernel address and the current context isn't whitelisted. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: dvyukov@google.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180828201421.157735-7-jannh@google.com
2018-09-03fsnotify: fix ignore mask logic in fsnotify()Amir Goldstein
Commit 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group()") acknoledges the use case of ignoring an event on an inode mark, because of an ignore mask on a mount mark of the same group (i.e. I want to get all events on this file, except for the events that came from that mount). This change depends on correctly merging the inode marks and mount marks group lists, so that the mount mark ignore mask would be tested in send_to_group(). Alas, the merging of the lists did not take into account the case where event in question is not in the mask of any of the mount marks. To fix this, completely remove the tests for inode and mount event masks from the lists merging code. Fixes: 92183a42898d ("fsnotify: fix ignore mask logic in send_to_group") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03ext2: cache NULL when both default_acl and acl are NULLChengguang Xu
default_acl and acl of newly created inode will be initiated as ACL_NOT_CACHED in vfs function inode_init_always() and later will be updated by calling xxx_init_acl() in specific filesystems. However, when default_acl and acl are NULL, then they keep the value of ACL_NOT_CACHED. This patch changes the code to cache NULL for acl / default_acl in this case to save unnecessary ACL lookup in future. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03udf: remove unused variables group_start and nr_groupsColin Ian King
Variables group_start and nr_groups are being assigned but are never used hence they are redundant and can be removed. Cleans up clang warning: variable 'group_start' set but not used [-Wunused-but-set-variable] variable 'nr_groups' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>
2018-09-03ovl: add ovl_fadvise()Amir Goldstein
Implement stacked fadvise to fix syscalls readahead(2) and fadvise64(2) on an overlayfs file. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: d1d04ef8572b ("ovl: stack file ops") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-02cifs: connect to servername instead of IP for IPC$ shareThomas Werschlein
This patch is required allows access to a Microsoft fileserver failover cluster behind a 1:1 NAT firewall. The change also provides stronger context for authentication and share connection (see MS-SMB2 3.3.5.7 and MS-SRVS 3.1.6.8) as noted by Tom Talpey, and addresses comments about the buffer size for the UNC made by Aurélien Aptel. Signed-off-by: Thomas Werschlein <thomas.werschlein@geo.uzh.ch> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Tom Talpey <ttalpey@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org>
2018-09-02smb3: check for and properly advertise directory lease supportSteve French
Although servers will typically ignore unsupported features, we should advertise the support for directory leases (as Windows e.g. does) in the negotiate protocol capabilities we pass to the server, and should check for the server capability (CAP_DIRECTORY_LEASING) before sending a lease request for an open of a directory. This will prevent us from accidentally sending directory leases to SMB2.1 or SMB2 server for example. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-09-02smb3: minor debugging clarifications in rfc1001 len processingSteve French
I ran into some cases where server was returning the wrong length on frames but I couldn't easily match them to the command in the network trace (or server logs) since I need the command and/or multiplex id to find the offending SMB2/SMB3 command. Add these two fields to the log message. In the case of padding too much it may not be a problem in all cases but might have correlated to a network disconnect case in some problems we have been looking at. In the case of frame too short is even more important. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-09-02SMB3: Backup intent flag missing for directory opens with backupuid mountsSteve French
When "backup intent" is requested on the mount (e.g. backupuid or backupgid mount options), the corresponding flag needs to be set on opens of directories (and files) but was missing in some places causing access denied trying to enumerate and backup servers. Fixes kernel bugzilla #200953 https://bugzilla.kernel.org/show_bug.cgi?id=200953 Reported-and-tested-by: <whh@rubrik.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-09-02fs/cifs: don't translate SFM_SLASH (U+F026) to backslashJon Kuhn
When a Mac client saves an item containing a backslash to a file server the backslash is represented in the CIFS/SMB protocol as as U+F026. Before this change, listing a directory containing an item with a backslash in its name will return that item with the backslash represented with a true backslash character (U+005C) because convert_sfm_character mapped U+F026 to U+005C when interpretting the CIFS/SMB protocol response. However, attempting to open or stat the path using a true backslash will result in an error because convert_to_sfm_char does not map U+005C back to U+F026 causing the CIFS/SMB request to be made with the backslash represented as U+005C. This change simply prevents the U+F026 to U+005C conversion from happenning. This is analogous to how the code does not do any translation of UNI_SLASH (U+F000). Signed-off-by: Jon Kuhn <jkuhn@barracuda.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-09-02Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Thomas Gleixner: "A small set of updates for core code: - Prevent tracing in functions which are called from trace patching via stop_machine() to prevent executing half patched function trace entries. - Remove old GCC workarounds - Remove pointless includes of notifier.h" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Remove workaround for unreachable warnings from old GCC notifier: Remove notifier header file wherever not used watchdog: Mark watchdog touch functions as notrace
2018-09-01ext4: recalucate superblock checksum after updating free blocks/inodesTheodore Ts'o
When mounting the superblock, ext4_fill_super() calculates the free blocks and free inodes and stores them in the superblock. It's not strictly necessary, since we don't use them any more, but it's nice to keep them roughly aligned to reality. Since it's not critical for file system correctness, the code doesn't call ext4_commit_super(). The problem is that it's in ext4_commit_super() that we recalculate the superblock checksum. So if we're not going to call ext4_commit_super(), we need to call ext4_superblock_csum_set() to make sure the superblock checksum is consistent. Most of the time, this doesn't matter, since we end up calling ext4_commit_super() very soon thereafter, and definitely by the time the file system is unmounted. However, it doesn't work in this sequence: mke2fs -Fq -t ext4 /dev/vdc 128M mount /dev/vdc /vdc cp xfstests/git-versions /vdc godown /vdc umount /vdc mount /dev/vdc tune2fs -l /dev/vdc With this commit, the "tune2fs -l" no longer fails. Reported-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2018-09-01ext4: avoid arithemetic overflow that can trigger a BUGTheodore Ts'o
A maliciously crafted file system can cause an overflow when the results of a 64-bit calculation is stored into a 32-bit length parameter. https://bugzilla.kernel.org/show_bug.cgi?id=200623 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
2018-08-30ovl: fix GPF in swapfile_activate of file from overlayfs over xfsAmir Goldstein
Since overlayfs implements stacked file operations, the underlying filesystems are not supposed to be exposed to the overlayfs file, whose f_inode is an overlayfs inode. Assigning an overlayfs file to swap_file results in an attempt of xfs code to dereference an xfs_inode struct from an ovl_inode pointer: CPU: 0 PID: 2462 Comm: swapon Not tainted 4.18.0-xfstests-12721-g33e17876ea4e #3402 RIP: 0010:xfs_find_bdev_for_inode+0x23/0x2f Call Trace: xfs_iomap_swapfile_activate+0x1f/0x43 __se_sys_swapon+0xb1a/0xee9 Fix this by not assigning the real inode mapping to f_mapping, which will cause swapon() to return an error (-EINVAL). Although it makes sense not to allow setting swpafile on an overlayfs file, some users may depend on it, so we may need to fix this up in the future. Keeping f_mapping pointing to overlay inode mapping will cause O_DIRECT open to fail. Fix this by installing ovl_aops with noop_direct_IO in overlay inode mapping. Keeping f_mapping pointing to overlay inode mapping will cause other a_ops related operations to fail (e.g. readahead()). Those will be fixed by follow up patches. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: f7c72396d0de ("ovl: add O_DIRECT support") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-08-30ovl: respect FIEMAP_FLAG_SYNC flagAmir Goldstein
Stacked overlayfs fiemap operation broke xfstests that test delayed allocation (with "_test_generic_punch -d"), because ovl_fiemap() failed to write dirty pages when requested. Fixes: 9e142c4102db ("ovl: add ovl_fiemap()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-08-30notifier: Remove notifier header file wherever not usedMukesh Ojha
The conversion of the hotplug notifiers to a state machine left the notifier.h includes around in some places. Remove them. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1535114033-4605-1-git-send-email-mojha@codeaurora.org
2018-08-29Merge tag 'for_v4.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull misc fs fixes from Jan Kara: - make UDF to properly mount media created by Win7 - make isofs to properly refuse devices with large physical block size - fix a Spectre gadget in quotactl(2) - fix a warning in fsnotify code hit by syzkaller * tag 'for_v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix mounting of Win7 created UDF filesystems udf: Remove dead code from udf_find_fileset() fs/quota: Fix spectre gadget in do_quotactl fs/quota: Replace XQM_MAXQUOTAS usage with MAXQUOTAS isofs: reject hardware sector size > 2048 bytes fsnotify: fix false positive warning on inode delete
2018-08-29y2038: utimes: Rework #ifdef guards for compat syscallsArnd Bergmann
After changing over to 64-bit time_t syscalls, many architectures will want compat_sys_utimensat() but not respective handlers for utime(), utimes() and futimesat(). This adds a new __ARCH_WANT_SYS_UTIME32 to complement __ARCH_WANT_SYS_UTIME. For now, all 64-bit architectures that support CONFIG_COMPAT set it, but future 64-bit architectures will not (tile would not have needed it either, but got removed). As older 32-bit architectures get converted to using CONFIG_64BIT_TIME, they will have to use __ARCH_WANT_SYS_UTIME32 instead of __ARCH_WANT_SYS_UTIME. Architectures using the generic syscall ABI don't need either of them as they never had a utime syscall. Since the compat_utimbuf structure is now required outside of CONFIG_COMPAT, I'm moving it into compat_time.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- changed from last version: - renamed __ARCH_WANT_COMPAT_SYS_UTIME to __ARCH_WANT_SYS_UTIME32
2018-08-29y2038: Compile utimes()/futimesat() conditionallyArnd Bergmann
There are four generations of utimes() syscalls: utime(), utimes(), futimesat() and utimensat(), each one being a superset of the previous one. For y2038 support, we have to add another one, which is the same as the existing utimensat() but always passes 64-bit times_t based timespec values. There are currently 10 architectures that only use utimensat(), two that use utimes(), futimesat() and utimensat() but not utime(), and 11 architectures that have all four, and those define __ARCH_WANT_SYS_UTIME in order to get a sys_utime implementation. Since all the new architectures only want utimensat(), moving all the legacy entry points into a common __ARCH_WANT_SYS_UTIME guard simplifies the logic. Only alpha and ia64 grow a tiny bit as they now also get an unused sys_utime(), but it didn't seem worth the extra complexity of adding yet another ifdef for those. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29y2038: Change sys_utimensat() to use __kernel_timespecArnd Bergmann
When 32-bit architectures get changed to support 64-bit time_t, utimensat() needs to use the new __kernel_timespec structure as its argument. The older utime(), utimes() and futimesat() system calls don't need a corresponding change as they are no longer used on C libraries that have 64-bit time support. As we do for the other syscalls that have timespec arguments, we reuse the 'compat' syscall entry points to implement the traditional four interfaces, and only leave the new utimensat() as a native handler, so that the same code gets used on both 32-bit and 64-bit kernels on each syscall. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macroArnd Bergmann
The sys_llseek sytem call is needed on all 32-bit architectures and none of the 64-bit ones, so we can remove the __ARCH_WANT_SYS_LLSEEK guard and simplify the include/asm-generic/unistd.h header further. Since 32-bit tasks can run either natively or in compat mode on 64-bit architectures, we have to check for both !CONFIG_64BIT and CONFIG_COMPAT. There are a few 64-bit architectures that also reference sys_llseek in their 64-bit ABI (e.g. sparc), but I verified that those all select CONFIG_COMPAT, so the #if check is still correct here. It's a bit odd to include it in the syscall table though, as it's the same as sys_lseek() on 64-bit, but with strange calling conventions. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29y2038: Remove newstat family from default syscall setArnd Bergmann
We have four generations of stat() syscalls: - the oldstat syscalls that are only used on the older architectures - the newstat family that is used on all 64-bit architectures but lacked support for large files on 32-bit architectures. - the stat64 family that is used mostly on 32-bit architectures to replace newstat - statx() to replace all of the above, adding 64-bit timestamps among other things. We already compile stat64 only on those architectures that need it, but newstat is always built, including on those that don't reference it. This adds a new __ARCH_WANT_NEW_STAT symbol along the lines of __ARCH_WANT_OLD_STAT and __ARCH_WANT_STAT64 to control compilation of newstat. All architectures that need it use an explict define, the others now get a little bit smaller, and future architecture (including 64-bit targets) won't ever see it. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29v9fs_dir_readdir: fix double-free on p9stat_read errorDominique Martinet
p9stat_read will call p9stat_free on error, we should only free the struct content on success. There also is no need to "p9stat_init" st as the read function will zero the whole struct for us anyway, so clean up the code a bit while we are here. Link: http://lkml.kernel.org/r/1535410108-20650-1-git-send-email-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Reported-by: syzbot+d4252148d198410b864f@syzkaller.appspotmail.com
2018-08-28gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updatedBob Peterson
The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when a rgrp buffer is valid. It's cleared when the glock is invalidated, signifying that the buffer data is now invalid. But before this patch, function update_rgrp_lvb was setting the flag when it determined it had a valid lvb. But that's an invalid assumption: just because you have a valid lvb doesn't mean you have valid buffers. After all, another node may have made the lvb valid, and this node just fetched it from the glock via dlm. Consider this scenario: 1. The file system is mounted with RGRPLVB option. 2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks to GL_SKIP, it skips the gfs2_rgrp_bh_get. 3. Since loops == 0 and the allocation target (ap->target) is bigger than the largest known chunk of blocks in the rgrp (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses the call to gfs2_rgrp_bh_get there as well. 4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses gfs2_rgrp_bh_get, but it still sets sets GFS2_RDF_UPTODATE due to this invalid assumption. 5. The next time update_rgrp_lvb is called, it sees the bit is set and just returns 0, assuming both the lvb and rgrp are both uptodate. But since this is a smaller allocation, or space has been freed by another node, thus adjusting the lvb values, it decides to use the rgrp for allocations, with invalid rd_free due to the fact it was never updated. This patch changes update_rgrp_lvb so it doesn't set the UPTODATE flag anymore. That way, it has no choice but to fetch the latest values. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-08-28gfs2: improve debug information when lvb mismatches are foundBob Peterson
Before this patch, gfs2_rgrp_bh_get would check for lvb mismatches, but it wouldn't tell you what was actually wrong. This patch adds more information to help us debug it. It also makes rgrp consistency checks dump any bad rgrps, and the rgrp dump code dump any lvbs as well as the rgrp itself. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2018-08-27ext4: avoid divide by zero fault when deleting corrupted inline directoriesTheodore Ts'o
A specially crafted file system can trick empty_inline_dir() into reading past the last valid entry in a inline directory, and then run into the end of xattr marker. This will trigger a divide by zero fault. Fix this by using the size of the inline directory instead of dir->i_size. Also clean up error reporting in __ext4_check_dir_entry so that the message is clearer and more understandable --- and avoids the division by zero trap if the size passed in is zero. (I'm not sure why we coded it that way in the first place; printing offset % size is actually more confusing and less useful.) https://bugzilla.kernel.org/show_bug.cgi?id=200933 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
2018-08-27y2038: globally rename compat_time to old_time32Arnd Bergmann
Christoph Hellwig suggested a slightly different path for handling backwards compatibility with the 32-bit time_t based system calls: Rather than simply reusing the compat_sys_* entry points on 32-bit architectures unchanged, we get rid of those entry points and the compat_time types by renaming them to something that makes more sense on 32-bit architectures (which don't have a compat mode otherwise), and then share the entry points under the new name with the 64-bit architectures that use them for implementing the compatibility. The following types and interfaces are renamed here, and moved from linux/compat_time.h to linux/time32.h: old new --- --- compat_time_t old_time32_t struct compat_timeval struct old_timeval32 struct compat_timespec struct old_timespec32 struct compat_itimerspec struct old_itimerspec32 ns_to_compat_timeval() ns_to_old_timeval32() get_compat_itimerspec64() get_old_itimerspec32() put_compat_itimerspec64() put_old_itimerspec32() compat_get_timespec64() get_old_timespec32() compat_put_timespec64() put_old_timespec32() As we already have aliases in place, this patch addresses only the instances that are relevant to the system call interface in particular, not those that occur in device drivers and other modules. Those will get handled separately, while providing the 64-bit version of the respective interfaces. I'm not renaming the timex, rusage and itimerval structures, as we are still debating what the new interface will look like, and whether we will need a replacement at all. This also doesn't change the names of the syscall entry points, which can be done more easily when we actually switch over the 32-bit architectures to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-27ext4: check to make sure the rename(2)'s destination is not freedTheodore Ts'o
If the destination of the rename(2) system call exists, the inode's link count (i_nlinks) must be non-zero. If it is, the inode can end up on the orphan list prematurely, leading to all sorts of hilarity, including a use-after-free. https://bugzilla.kernel.org/show_bug.cgi?id=200931 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Wen Xu <wen.xu@gatech.edu> Cc: stable@vger.kernel.org
2018-08-27ext4: add nonstring annotations to ext4.hTheodore Ts'o
This suppresses some false positives in gcc 8's -Wstringop-truncation Suggested by Miguel Ojeda (hopefully the __nonstring definition will eventually get accepted in the compiler-gcc.h header file). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2018-08-26Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds
Pull IDA updates from Matthew Wilcox: "A better IDA API: id = ida_alloc(ida, GFP_xxx); ida_free(ida, id); rather than the cumbersome ida_simple_get(), ida_simple_remove(). The new IDA API is similar to ida_simple_get() but better named. The internal restructuring of the IDA code removes the bitmap preallocation nonsense. I hope the net -200 lines of code is convincing" * 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits) ida: Change ida_get_new_above to return the id ida: Remove old API test_ida: check_ida_destroy and check_ida_alloc test_ida: Convert check_ida_conv to new API test_ida: Move ida_check_max test_ida: Move ida_check_leaf idr-test: Convert ida_check_nomem to new API ida: Start new test_ida module target/iscsi: Allocate session IDs from an IDA iscsi target: fix session creation failure handling drm/vmwgfx: Convert to new IDA API dmaengine: Convert to new IDA API ppc: Convert vas ID allocation to new IDA API media: Convert entity ID allocation to new IDA API ppc: Convert mmu context allocation to new IDA API Convert net_namespace to new IDA API cb710: Convert to new IDA API rsxx: Convert to new IDA API osd: Convert to new IDA API sd: Convert to new IDA API ...
2018-08-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Kernel: - Improve kallsyms coverage - Add x86 entry trampolines to kcore - Fix ARM SPE handling - Correct PPC event post processing Tools: - Make the build system more robust - Small fixes and enhancements all over the place - Update kernel ABI header copies - Preparatory work for converting libtraceevnt to a shared library - License cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy' tools arch x86: Update tools's copy of cpufeatures.h perf python: Fix pyrf_evlist__read_on_cpu() interface perf mmap: Store real cpu number in 'struct perf_mmap' perf tools: Remove ext from struct kmod_path perf tools: Add gzip_is_compressed function perf tools: Add lzma_is_compressed function perf tools: Add is_compressed callback to compressions array perf tools: Move the temp file processing into decompress_kmodule perf tools: Use compression id in decompress_kmodule() perf tools: Store compression id into struct dso perf tools: Add compression id into 'struct kmod_path' perf tools: Make is_supported_compression() static perf tools: Make decompress_to_file() function static perf tools: Get rid of dso__needs_decompress() call in __open_dso() perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble() perf tools: Get rid of dso__needs_decompress() call in read_object_code() tools lib traceevent: Change to SPDX License format perf llvm: Allow passing options to llc in addition to clang perf parser: Improve error message for PMU address filters ...
2018-08-25Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm memory-failure update from Dave Jiang: "As it stands, memory_failure() gets thoroughly confused by dev_pagemap backed mappings. The recovery code has specific enabling for several possible page states and needs new enabling to handle poison in dax mappings. In order to support reliable reverse mapping of user space addresses: 1/ Add new locking in the memory_failure() rmap path to prevent races that would typically be handled by the page lock. 2/ Since dev_pagemap pages are hidden from the page allocator and the "compound page" accounting machinery, add a mechanism to determine the size of the mapping that encompasses a given poisoned pfn. 3/ Given pmem errors can be repaired, change the speculatively accessed poison protection, mce_unmap_kpfn(), to be reversible and otherwise allow ongoing access from the kernel. A side effect of this enabling is that MADV_HWPOISON becomes usable for dax mappings, however the primary motivation is to allow the system to survive userspace consumption of hardware-poison via dax. Specifically the current behavior is: mce: Uncorrected hardware memory error in user-access at af34214200 {1}[Hardware Error]: It has been corrected by h/w and requires no further action mce: [Hardware Error]: Machine check events logged {1}[Hardware Error]: event severity: corrected Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users [..] Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed mce: Memory error not recovered <reboot> ...and with these changes: Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000 Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption Memory failure: 0x20cb00: recovery action for dax page: Recovered Given all the cross dependencies I propose taking this through nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax folks" * tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm, pmem: Restore page attributes when clearing errors x86/memory_failure: Introduce {set, clear}_mce_nospec() x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses mm, memory_failure: Teach memory_failure() about dev_pagemap pages filesystem-dax: Introduce dax_lock_mapping_entry() mm, memory_failure: Collect mapping size in collect_procs() mm, madvise_inject_error: Let memory_failure() optionally take a page reference mm, dev_pagemap: Do not clear ->mapping on final put mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages filesystem-dax: Set page->index device-dax: Set page->index device-dax: Enable page_mapping() device-dax: Convert to vmf_insert_mixed and vm_fault_t
2018-08-25Merge tag 'libnvdimm-for-4.19_misc' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dave Jiang: "Collection of misc libnvdimm patches for 4.19 submission: - Adding support to read locked nvdimm capacity. - Change test code to make DSM failure code injection an override. - Add support for calculate maximum contiguous area for namespace. - Add support for queueing a short ARS when there is on going ARS for nvdimm. - Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. - Improve smart injection support for nvdimm emulation testing. - Fix test code that supports for emulating controller temperature. - Fix hang on error before devm_memremap_pages() - Fix a bug that causes user memory corruption when data returned to user for ars_status. - Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax" * tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm: fix ars_status output length calculation device-dax: avoid hang on error before devm_memremap_pages() tools/testing/nvdimm: improve emulation of smart injection filesystem-dax: Do not request kaddr and pfn when not required md/dm-writecache: Don't request pointer dummy_addr when not required dax/super: Do not request a pointer kaddr when not required tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access() s390, dcssblk: kaddr and pfn can be NULL to ->direct_access() libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access() acpi/nfit: queue issuing of ars when an uc error notification comes in libnvdimm: Export max available extent libnvdimm: Use max contiguous area for namespace size MAINTAINERS: Add Jan Kara for filesystem DAX MAINTAINERS: update Ross Zwisler's email address tools/testing/nvdimm: Fix support for emulating controller temperature tools/testing/nvdimm: Make DSM failure code injection an override acpi, nfit: Prefer _DSM over _LSR for namespace label reads libnvdimm: Introduce locked DIMM capacity support
2018-08-25Merge tag 'upstream-4.19-rc1-fix' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBIFS fix from Richard Weinberger: "Remove an empty file from UBIFS source" * tag 'upstream-4.19-rc1-fix' of git://git.infradead.org/linux-ubifs: ubifs: Remove empty file.h
2018-08-25Merge tag '4.19-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three small SMB3 fixes, one for stable" * tag '4.19-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number for cifs.ko to 2.12 cifs: check kmalloc before use cifs: check if SMB2 PDU size has been padded and suppress the warning cifs: create a define for how many iovs we need for an SMB2_open()
2018-08-25hpfs: remove unnecessary checks on the value of r when assigning error codeColin Ian King
At the point where r is being checked for different values, r is always going to be equal to 2 as the previous if statements jump to end or end1 if r is not 2. Hence the assignment to err can be simplified to just err an assignment without any checks on the value or r. Detected by CoverityScan, CID#1226737 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-24Merge branch 'userns-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace fixes from Eric Biederman: "This is a set of four fairly obvious bug fixes: - a switch from d_find_alias to d_find_any_alias because the xattr code perversely takes a dentry - two mutex vs copy_to_user fixes from Jann Horn - a fix to use a sanitized size not the size userspace passed in from Christian Brauner" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: getxattr: use correct xattr length sys: don't hold uts_sem while accessing userspace memory userns: move user access out of the mutex cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
2018-08-24btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcuMisono Tomohiro
Commit 672d599041c8 ("btrfs: Use wrapper macro for rcu string to remove duplicate code") replaces some open coded RCU string handling with macro. It turns out that btrfs_debug_in_rcu() is used for the first time and the macro lacks lock/unlock of RCU string for non-debug case (i.e. when the message is not printed), leading to suspicious RCU usage warning when CONFIG_PROVE_RCU is on. Fix this by adding a wrapper to call lock/unlock for the non-debug case too. Fixes: 672d599041c8 ("btrfs: Use wrapper macro for rcu string to remove duplicate code") Reported-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-24ubifs: Remove empty file.hRichard Weinberger
This empty file sneaked into the tree by mistake. Remove it. Fixes: 6eb61d587f45 ("ubifs: Pass struct ubifs_info to ubifs_assert()") Signed-off-by: Richard Weinberger <richard@nod.at>
2018-08-24udf: Fix mounting of Win7 created UDF filesystemsJan Kara
Win7 is creating UDF filesystems with single partition with number 8192. Current partition descriptor scanning code does not handle this well as it incorrectly assumes that partition numbers will form mostly contiguous space of small numbers. This results in unmountable media due to errors like: UDF-fs: error (device dm-1): udf_read_tagged: tag version 0x0000 != 0x0002 || 0x0003, block 0 UDF-fs: warning (device dm-1): udf_fill_super: No fileset found Fix the problem by handling partition descriptors in a way that sparse partition numbering does not matter. Reported-and-tested-by: jean-luc malet <jeanluc.malet@gmail.com> CC: stable@vger.kernel.org Fixes: 7b78fd02fb19530fd101ae137a1f46aa466d9bb6 Signed-off-by: Jan Kara <jack@suse.cz>
2018-08-24udf: Remove dead code from udf_find_fileset()Jan Kara
Remove dead code and slightly simplify code in udf_find_fileset(). Signed-off-by: Jan Kara <jack@suse.cz>
2018-08-23Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: - the rest of MM - various misc fixes and tweaks * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) mm: Change return type int to vm_fault_t for fault handlers lib/fonts: convert comments to utf-8 s390: ebcdic: convert comments to UTF-8 treewide: convert ISO_8859-1 text comments to utf-8 drivers/gpu/drm/gma500/: change return type to vm_fault_t docs/core-api: mm-api: add section about GFP flags docs/mm: make GFP flags descriptions usable as kernel-doc docs/core-api: split memory management API to a separate file docs/core-api: move *{str,mem}dup* to "String Manipulation" docs/core-api: kill trailing whitespace in kernel-api.rst mm/util: add kernel-doc for kvfree mm/util: make strndup_user description a kernel-doc comment fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu builds treewide: correct "differenciate" and "instanciate" typos fs/afs: use new return type vm_fault_t drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t mm: soft-offline: close the race against page allocation mm: fix race on soft-offlining free huge pages namei: allow restricted O_CREAT of FIFOs and regular files hfs: prevent crash on exit from failed search ...
2018-08-23mm: Change return type int to vm_fault_t for fault handlersSouptick Joarder
Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t") The aim is to change the return type of finish_fault() and handle_mm_fault() to vm_fault_t type. As part of that clean up return type of all other recursively called functions have been changed to vm_fault_t type. The places from where handle_mm_fault() is getting invoked will be change to vm_fault_t type but in a separate patch. vmf_error() is the newly introduce inline function in 4.17-rc6. [akpm@linux-foundation.org: don't shadow outer local `ret' in __do_huge_pmd_anonymous_page()] Link: http://lkml.kernel.org/r/20180604171727.GA20279@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu buildsArnd Bergmann
Without CONFIG_MMU, we get a build warning: fs/proc/vmcore.c:228:12: error: 'vmcoredd_mmap_dumps' defined but not used [-Werror=unused-function] static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst, The function is only referenced from an #ifdef'ed caller, so this uses the same #ifdef around it. Link: http://lkml.kernel.org/r/20180525213526.2117790-1-arnd@arndb.de Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Ganesh Goudar <ganeshgr@chelsio.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23fs/afs: use new return type vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See 1c8f422059ae ("mm: change return type to vm_fault_t") for reference. Link: http://lkml.kernel.org/r/20180702152017.GA3780@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23namei: allow restricted O_CREAT of FIFOs and regular filesSalvatore Mesoraca
Disallows open of FIFOs or regular files not owned by the user in world writable sticky directories, unless the owner is the same as that of the directory or the file is opened without the O_CREAT flag. The purpose is to make data spoofing attacks harder. This protection can be turned on and off separately for FIFOs and regular files via sysctl, just like the symlinks/hardlinks protection. This patch is based on Openwall's "HARDEN_FIFO" feature by Solar Designer. This is a brief list of old vulnerabilities that could have been prevented by this feature, some of them even allow for privilege escalation: CVE-2000-1134 CVE-2007-3852 CVE-2008-0525 CVE-2009-0416 CVE-2011-4834 CVE-2015-1838 CVE-2015-7442 CVE-2016-7489 This list is not meant to be complete. It's difficult to track down all vulnerabilities of this kind because they were often reported without any mention of this particular attack vector. In fact, before hardlinks/symlinks restrictions, fifos/regular files weren't the favorite vehicle to exploit them. [s.mesoraca16@gmail.com: fix bug reported by Dan Carpenter] Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda Link: http://lkml.kernel.org/r/1524829819-11275-1-git-send-email-s.mesoraca16@gmail.com [keescook@chromium.org: drop pr_warn_ratelimited() in favor of audit changes in the future] [keescook@chromium.org: adjust commit subjet] Link: http://lkml.kernel.org/r/20180416175918.GA13494@beast Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Suggested-by: Solar Designer <solar@openwall.com> Suggested-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23hfs: prevent crash on exit from failed searchErnesto A. Fernández
hfs_find_exit() expects fd->bnode to be NULL after a search has failed. hfs_brec_insert() may instead set it to an error-valued pointer. Fix this to prevent a crash. Link: http://lkml.kernel.org/r/53d9749a029c41b4016c495fc5838c9dba3afc52.1530294815.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>