summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-04-27inotify: Fix error return code assignment flow.youngjun
If error code is initialized -EINVAL, there is no need to assign -EINVAL. Link: https://lore.kernel.org/r/20200426143316.29877-1-her0gyugyu@gmail.com Signed-off-by: youngjun <her0gyugyu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-04-27Merge 5.7-rc3 into driver-core-nextGreg Kroah-Hartman
We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-27configfs: fix config_item refcnt leak in configfs_rmdir()Xiyu Yang
configfs_rmdir() invokes configfs_get_config_item(), which returns a reference of the specified config_item object to "parent_item" with increased refcnt. When configfs_rmdir() returns, local variable "parent_item" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of configfs_rmdir(). When down_write_killable() fails, the function forgets to decrease the refcnt increased by configfs_get_config_item(), causing a refcnt leak. Fix this issue by calling config_item_put() when down_write_killable() fails. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-04-27sysctl: pass kernel pointers to ->proc_handlerChristoph Hellwig
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27zonefs: Replace uuid_copy() with import_uuid()Andy Shevchenko
There is a specific API to treat raw data as UUID, i.e. import_uuid(). Use it instead of uuid_copy() with explicit casting. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
2020-04-26Merge tag '5.7-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Five cifs/smb3 fixes:two for DFS reconnect failover, one lease fix for stable and the others to fix a missing spinlock during reconnect" * tag '5.7-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix uninitialised lease_key in open_shroot() cifs: ensure correct super block for DFS reconnect cifs: do not share tcons with DFS cifs: minor update to comments around the cifs_tcp_ses_lock mutex cifs: protect updating server->dstaddr with a spinlock
2020-04-26Merge tag 'driver-core-5.7-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some small firmware/driver core/debugfs fixes for 5.7-rc3. The debugfs change is now possible as now the last users of debugfs_create_u32() have been fixed up in the different trees that got merged into 5.7-rc1, and I don't want it creeping back in. The firmware changes did cause a regression in linux-next, so the final patch here reverts part of that, re-exporting the symbol to resolve that issue. All of these patches, with the exception of the final one, have been in linux-next with only that one reported issue" * tag 'driver-core-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: firmware_loader: revert removal of the fw_fallback_config export debugfs: remove return value of debugfs_create_u32() firmware_loader: remove unused exports firmware: imx: fix compile-testing
2020-04-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull pid leak fix from Eric Biederman: "Oleg noticed that put_pid(thread_pid) was not getting called when proc was not compiled in. Let's get that fixed before 5.7 is released and causes problems for anyone" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Put thread_pid in release_task not proc_flush_pid
2020-04-25NFSv4: Remove unreachable error condition due to rpc_run_task()Xiyu Yang
nfs4_proc_layoutget() invokes rpc_run_task(), which return the value to "task". Since rpc_run_task() is impossible to return an ERR pointer, there is no need to add the IS_ERR() condition on "task" here. So we need to remove it. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-24proc: Use PIDTYPE_TGID in next_tgidEric W. Biederman
Combine the pid_task and thes test has_group_leader_pid into a single dereference by using pid_task(PIDTYPE_TGID). This makes the code simpler and proof against needing to even think about any shenanigans that de_thread might get up to. Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-24proc: modernize proc to support multiple private instancesEric W. Biederman
Alexey Gladkov <gladkov.alexey@gmail.com> writes: Procfs modernization: --------------------- Historically procfs was always tied to pid namespaces, during pid namespace creation we internally create a procfs mount for it. However, this has the effect that all new procfs mounts are just a mirror of the internal one, any change, any mount option update, any new future introduction will propagate to all other procfs mounts that are in the same pid namespace. This may have solved several use cases in that time. However today we face new requirements, and making procfs able to support new private instances inside same pid namespace seems a major point. If we want to to introduce new features and security mechanisms we have to make sure first that we do not break existing usecases. Supporting private procfs instances will allow to support new features and behaviour without propagating it to all other procfs mounts. Today procfs is more of a burden especially to some Embedded, IoT, sandbox, container use cases. In user space we are over-mounting null or inaccessible files on top to hide files and information. If we want to hide pids we have to create PID namespaces otherwise mount options propagate to all other proc mounts, changing a mount option value in one mount will propagate to all other proc mounts. If we want to introduce new features, then they will propagate to all other mounts too, resulting either maybe new useful functionality or maybe breaking stuff. We have also to note that userspace should not workaround procfs, the kernel should just provide a sane simple interface. In this regard several developers and maintainers pointed out that there are problems with procfs and it has to be modernized: "Here's another one: split up and modernize /proc." by Andy Lutomirski [1] Discussion about kernel pointer leaks: "And yes, as Kees and Daniel mentioned, it's definitely not just dmesg. In fact, the primary things tend to be /proc and /sys, not dmesg itself." By Linus Torvalds [2] Lot of other areas in the kernel and filesystems have been updated to be able to support private instances, devpts is one major example [3]. Which will be used for: 1) Embedded systems and IoT: usually we have one supervisor for apps, we have some lightweight sandbox support, however if we create pid namespaces we have to manage all the processes inside too, where our goal is to be able to run a bunch of apps each one inside its own mount namespace, maybe use network namespaces for vlans setups, but right now we only want mount namespaces, without all the other complexity. We want procfs to behave more like a real file system, and block access to inodes that belong to other users. The 'hidepid=' will not work since it is a shared mount option. 2) Containers, sandboxes and Private instances of file systems - devpts case Historically, lot of file systems inside Linux kernel view when instantiated were just a mirror of an already created and mounted filesystem. This was the case of devpts filesystem, it seems at that time the requirements were to optimize things and reuse the same memory, etc. This design used to work but not anymore with today's containers, IoT, hostile environments and all the privacy challenges that Linux faces. In that regards, devpts was updated so that each new mounts is a total independent file system by the following patches: "devpts: Make each mount of devpts an independent filesystem" by Eric W. Biederman [3] [4] 3) Linux Security Modules have multiple ptrace paths inside some subsystems, however inside procfs, the implementation does not guarantee that the ptrace() check which triggers the security_ptrace_check() hook will always run. We have the 'hidepid' mount option that can be used to force the ptrace_may_access() check inside has_pid_permissions() to run. The problem is that 'hidepid' is per pid namespace and not attached to the mount point, any remount or modification of 'hidepid' will propagate to all other procfs mounts. This also does not allow to support Yama LSM easily in desktop and user sessions. Yama ptrace scope which restricts ptrace and some other syscalls to be allowed only on inferiors, can be updated to have a per-task context, where the context will be inherited during fork(), clone() and preserved across execve(). If we support multiple private procfs instances, then we may force the ptrace_may_access() on /proc/<pids>/ to always run inside that new procfs instances. This will allow to specifiy on user sessions if we should populate procfs with pids that the user can ptrace or not. By using Yama ptrace scope, some restricted users will only be able to see inferiors inside /proc, they won't even be able to see their other processes. Some software like Chromium, Firefox's crash handler, Wine and others are already using Yama to restrict which processes can be ptracable. With this change this will give the possibility to restrict /proc/<pids>/ but more importantly this will give desktop users a generic and usuable way to specifiy which users should see all processes and which user can not. Side notes: * This covers the lack of seccomp where it is not able to parse arguments, it is easy to install a seccomp filter on direct syscalls that operate on pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls. With this change all LSMs should be able to analyze open/read/write/close... on /proc/<pid>/ 4) This will allow to implement new features either in kernel or userspace without having to worry about procfs. In containers, sandboxes, etc we have workarounds to hide some /proc inodes, this should be supported natively without doing extra complex work, the kernel should be able to support sane options that work with today and future Linux use cases. 5) Creation of new superblock with all procfs options for each procfs mount will fix the ignoring of mount options. The problem is that the second mount of procfs in the same pid namespace ignores the mount options. The mount options are ignored without error until procfs is remounted. Before: proc /proc proc rw,relatime,hidepid=2 0 0 mount("proc", "/tmp/proc", "proc", 0, "hidepid=1") = 0 +++ exited with 0 +++ proc /proc proc rw,relatime,hidepid=2 0 0 proc /tmp/proc proc rw,relatime,hidepid=2 0 0 proc /proc proc rw,relatime,hidepid=1 0 0 proc /tmp/proc proc rw,relatime,hidepid=1 0 0 After: proc /proc proc rw,relatime,hidepid=ptraceable 0 0 proc /proc proc rw,relatime,hidepid=ptraceable 0 0 proc /tmp/proc proc rw,relatime,hidepid=invisible 0 0 Introduced changes: ------------------- Each mount of procfs creates a separate procfs instance with its own mount options. This series adds few new mount options: * New 'hidepid=ptraceable' or 'hidepid=4' mount option to show only ptraceable processes in the procfs. This allows to support lightweight sandboxes in Embedded Linux, also solves the case for LSM where now with this mount option, we make sure that they have a ptrace path in procfs. * 'subset=pid' that allows to hide non-pid inodes from procfs. It can be used in containers and sandboxes, as these are already trying to hide and block access to procfs inodes anyway. ChangeLog: ---------- * Rebase on top of v5.7-rc1. * Fix a resource leak if proc is not mounted or if proc is simply reconfigured. * Add few selftests. * After a discussion with Eric W. Biederman, the numerical values for hidepid parameter have been removed from uapi. * Remove proc_self and proc_thread_self from the pid_namespace struct. * I took into account the comment of Kees Cook. * Update Reviewed-by tags. * 'subset=pidfs' renamed to 'subset=pid' as suggested by Alexey Dobriyan. * Include Reviewed-by tags. * Rebase on top of Eric W. Biederman's procfs changes. * Add human readable values of 'hidepid' as suggested by Andy Lutomirski. * Started using RCU lock to clean dcache entries as suggested by Linus Torvalds. * 'pidonly=1' renamed to 'subset=pidfs' as suggested by Alexey Dobriyan. * HIDEPID_* moved to uapi/ as they are user interface to mount(). Suggested-by Alexey Dobriyan <adobriyan@gmail.com> * 'hidepid=' and 'gid=' mount options are moved from pid namespace to superblock. * 'newinstance' mount option removed as suggested by Eric W. Biederman. Mount of procfs always creates a new instance. * 'limit_pids' renamed to 'hidepid=3'. * I took into account the comment of Linus Torvalds [7]. * Documentation added. * Fixed a bug that caused a problem with the Fedora boot. * The 'pidonly' option is visible among the mount options. * Renamed mount options to 'newinstance' and 'pids=' Suggested-by: Andy Lutomirski <luto@kernel.org> * Fixed order of commit, Suggested-by: Andy Lutomirski <luto@kernel.org> * Many bug fixes. * Removed 'unshared' mount option and replaced it with 'limit_pids' which is attached to the current procfs mount. Suggested-by Andy Lutomirski <luto@kernel.org> * Do not fill dcache with pid entries that we can not ptrace. * Many bug fixes. References: ----------- [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-January/004215.html [2] http://www.openwall.com/lists/kernel-hardening/2017/10/05/5 [3] https://lwn.net/Articles/689539/ [4] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14 [5] https://lkml.org/lkml/2017/5/2/407 [6] https://lkml.org/lkml/2017/5/3/357 [7] https://lkml.org/lkml/2018/5/11/505 Alexey Gladkov (7): proc: rename struct proc_fs_info to proc_fs_opts proc: allow to mount many instances of proc in one pid namespace proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option proc: add option to mount only a pids subset docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior proc: use human-readable values for hidepid proc: use named enums for better readability Documentation/filesystems/proc.rst | 92 +++++++++--- fs/proc/base.c | 48 +++++-- fs/proc/generic.c | 9 ++ fs/proc/inode.c | 30 +++- fs/proc/root.c | 131 +++++++++++++----- fs/proc/self.c | 6 +- fs/proc/thread_self.c | 6 +- fs/proc_namespace.c | 14 +- include/linux/pid_namespace.h | 12 -- include/linux/proc_fs.h | 30 +++- tools/testing/selftests/proc/.gitignore | 2 + tools/testing/selftests/proc/Makefile | 2 + .../selftests/proc/proc-fsconfig-hidepid.c | 50 +++++++ .../selftests/proc/proc-multiple-procfs.c | 48 +++++++ 14 files changed, 384 insertions(+), 96 deletions(-) create mode 100644 tools/testing/selftests/proc/proc-fsconfig-hidepid.c create mode 100644 tools/testing/selftests/proc/proc-multiple-procfs.c Link: https://lore.kernel.org/lkml/20200419141057.621356-1-gladkov.alexey@gmail.com/ Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-24Use proc_pid_ns() to get pid_namespace from the proc superblockAlexey Gladkov
To get pid_namespace from the procfs superblock should be used a special helper. This will avoid errors when s_fs_info will change the type. Link: https://lore.kernel.org/lkml/20200423200316.164518-3-gladkov.alexey@gmail.com/ Link: https://lore.kernel.org/lkml/20200423112858.95820-1-gladkov.alexey@gmail.com/ Link: https://lore.kernel.org/lkml/06B50A1C-406F-4057-BFA8-3A7729EA7469@lca.pw/ Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-24proc: Put thread_pid in release_task not proc_flush_pidEric W. Biederman
Oleg pointed out that in the unlikely event the kernel is compiled with CONFIG_PROC_FS unset that release_task will now leak the pid. Move the put_pid out of proc_flush_pid into release_task to fix this and to guarantee I don't make that mistake again. When possible it makes sense to keep get and put in the same function so it can easily been seen how they pair up. Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc") Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-24Merge tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "Single fixup for a change that went into -rc2" * tag 'io_uring-5.7-2020-04-24' of git://git.kernel.dk/linux-block: io_uring: only restore req->work for req that needs do completion
2020-04-24Merge tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes/changes that should go into this release: - null_blk zoned fixes (Damien) - blkdev_close() sync improvement (Douglas) - Fix regression in blk-iocost that impacted (at least) systemtap (Waiman) - Comment fix, header removal (Zhiqiang, Jianpeng)" * tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block: null_blk: Cleanup zoned device initialization null_blk: Fix zoned command handling block: remove unused header blk-iocost: Fix error on iocost_ioc_vrate_adj bdev: Reduce time holding bd_mutex in sync in blkdev_close() buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
2020-04-24Merge tag 'afs-fixes-20200424' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull misc AFS fixes from David Howells: "Three miscellaneous fixes to the afs filesystem: - Remove some struct members that aren't used, aren't set or aren't read, plus a wake up that nothing ever waits for. - Actually set the AFS_SERVER_FL_HAVE_EPOCH flag so that the code that depends on it can work. - Make a couple of waits uninterruptible if they're done for an operation that isn't supposed to be interruptible" * tag 'afs-fixes-20200424' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCH afs: Remove some unused bits
2020-04-24afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriateDavid Howells
When an operation is meant to be done uninterruptibly (such as FS.StoreData), we should not be allowing volume and server record checking to be interrupted. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCHDavid Howells
AFS keeps track of the epoch value from the rxrpc protocol to note (a) when a fileserver appears to have restarted and (b) when different endpoints of a fileserver do not appear to be associated with the same fileserver (ie. all probes back from a fileserver from all of its interfaces should carry the same epoch). However, the AFS_SERVER_FL_HAVE_EPOCH flag that indicates that we've received the server's epoch is never set, though it is used. Fix this to set the flag when we first receive an epoch value from a probe sent to the filesystem client from the fileserver. Fixes: 3bf0fb6f33dd ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24afs: Remove some unused bitsDavid Howells
Remove three bits: (1) afs_server::no_epoch is neither set nor used. (2) afs_server::have_result is set and a wakeup is applied to it, but nothing looks at it or waits on it. (3) afs_vl_dump_edestaddrreq() prints afs_addr_list::probed, but nothing sets it for VL servers. Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24block: unexport bdev_read_page and bdev_write_pageChristoph Hellwig
Each one just has two callers, both in always built-in code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-23f2fs: fix quota_sync failure due to f2fs_lock_opJaegeuk Kim
f2fs_quota_sync() uses f2fs_lock_op() before flushing dirty pages, but f2fs_write_data_page() returns EAGAIN. Likewise dentry blocks, we can just bypass getting the lock, since quota blocks are also maintained by checkpoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-23dlmfs: convert dlmfs_file_read() to copy_to_user()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-23dlmfs_file_write(): fix the bogosity in handling non-zero *pposAl Viro
'count' is how much you want written, not the final position. Moreover, it can legitimately be less than the current position... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-23Merge tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6Linus Torvalds
Pull nfsd fixes from Chuck Lever: "The first set of 5.7-rc fixes for NFS server issues. These were all unresolved at the time the 5.7 window opened, and needed some additional time to ensure they were correctly addressed. They are ready now. At the moment I know of one more urgent issue regarding the NFS server. A fix has been tested and is under review. I expect to send one more pull request, containing this fix (which now consists of 3 patches). Fixes: - Address several use-after-free and memory leak bugs - Prevent a backchannel livelock" * tag 'nfsd-5.7-rc-1' of git://git.linux-nfs.org/projects/cel/cel-2.6: svcrdma: Fix leak of svc_rdma_recv_ctxt objects svcrdma: Fix trace point use-after-free race SUNRPC: Fix backchannel RPC soft lockups SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge nfsd: memory corruption in nfsd4_lock()
2020-04-23Merge tag 'for-5.7-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - several bug fixes(broken mount discard option, remount failure, memory leak) - add missing MODULE_ALIAS_FS for automatically loading exfat module. - set s_time_gran and truncate atime with exfat timestamp granularity. * tag 'for-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: truncate atimes to 2s granularity exfat: properly set s_time_gran exfat: remove 'bps' mount-option exfat: Unify access to the boot sector exfat: add missing MODULE_ALIAS_FS() exfat: Fix discard support
2020-04-23btrfs: fix transaction leak in btrfs_recover_relocationXiyu Yang
btrfs_recover_relocation() invokes btrfs_join_transaction(), which joins a btrfs_trans_handle object into transactions and returns a reference of it with increased refcount to "trans". When btrfs_recover_relocation() returns, "trans" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of btrfs_recover_relocation(). When read_fs_root() failed, the refcnt increased by btrfs_join_transaction() is not decreased, causing a refcnt leak. Fix this issue by calling btrfs_end_transaction() on this error path when read_fs_root() failed. Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23btrfs: fix block group leak when removing failsXiyu Yang
btrfs_remove_block_group() invokes btrfs_lookup_block_group(), which returns a local reference of the block group that contains the given bytenr to "block_group" with increased refcount. When btrfs_remove_block_group() returns, "block_group" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in several exception handling paths of btrfs_remove_block_group(). When those error scenarios occur such as btrfs_alloc_path() returns NULL, the function forgets to decrease its refcnt increased by btrfs_lookup_block_group() and will cause a refcnt leak. Fix this issue by jumping to "out_put_group" label and calling btrfs_put_block_group() when those error scenarios occur. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23btrfs: drop logs when we've aborted a transactionJosef Bacik
Dave reported a problem where we were panicing with generic/475 with misc-5.7. This is because we were doing IO after we had stopped all of the worker threads, because we do the log tree cleanup on roots at drop time. Cleaning up the log tree will always need to do reads if we happened to have evicted the blocks from memory. Because of this simply add a helper to btrfs_cleanup_transaction() that will go through and drop all of the log roots. This gets run before we do the close_ctree() work, and thus we are allowed to do any reads that we would need. I ran this through many iterations of generic/475 with constrained memory and I did not see the issue. general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 2 PID: 12359 Comm: umount Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_queue_work+0x33/0x1c0 [btrfs] RSP: 0018:ffff9cfb015937d8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8eb5e339ed80 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8eb5eb33b770 RDI: ffff8eb5e37a0460 RBP: ffff8eb5eb33b770 R08: 000000000000020c R09: ffffffff9fc09ac0 R10: 0000000000000007 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b R13: ffff9cfb00229040 R14: 0000000000000008 R15: ffff8eb5d3868000 FS: 00007f167ea022c0(0000) GS:ffff8eb5fae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f167e5e0cb1 CR3: 0000000138c18004 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_end_bio+0x81/0x130 [btrfs] __split_and_process_bio+0xaf/0x4e0 [dm_mod] ? percpu_counter_add_batch+0xa3/0x120 dm_process_bio+0x98/0x290 [dm_mod] ? generic_make_request+0xfb/0x410 dm_make_request+0x4d/0x120 [dm_mod] ? generic_make_request+0xfb/0x410 generic_make_request+0x12a/0x410 ? submit_bio+0x38/0x160 submit_bio+0x38/0x160 ? percpu_counter_add_batch+0xa3/0x120 btrfs_map_bio+0x289/0x570 [btrfs] ? kmem_cache_alloc+0x24d/0x300 btree_submit_bio_hook+0x79/0xc0 [btrfs] submit_one_bio+0x31/0x50 [btrfs] read_extent_buffer_pages+0x2fe/0x450 [btrfs] btree_read_extent_buffer_pages+0x7e/0x170 [btrfs] walk_down_log_tree+0x343/0x690 [btrfs] ? walk_log_tree+0x3d/0x380 [btrfs] walk_log_tree+0xf7/0x380 [btrfs] ? plist_requeue+0xf0/0xf0 ? delete_node+0x4b/0x230 free_log_tree+0x4c/0x130 [btrfs] ? wait_log_commit+0x140/0x140 [btrfs] btrfs_free_log+0x17/0x30 [btrfs] btrfs_drop_and_free_fs_root+0xb0/0xd0 [btrfs] btrfs_free_fs_roots+0x10c/0x190 [btrfs] ? do_raw_spin_unlock+0x49/0xc0 ? _raw_spin_unlock+0x29/0x40 ? release_extent_buffer+0x121/0x170 [btrfs] close_ctree+0x289/0x2e6 [btrfs] generic_shutdown_super+0x6c/0x110 kill_anon_super+0xe/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x3a/0x70 Reported-by: David Sterba <dsterba@suse.com> Fixes: 8c38938c7bb096 ("btrfs: move the root freeing stuff into btrfs_put_root") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23btrfs: fix memory leak of transaction when deleting unused block groupFilipe Manana
When cleaning pinned extents right before deleting an unused block group, we check if there's still a previous transaction running and if so we increment its reference count before using it for cleaning pinned ranges in its pinned extents iotree. However we ended up never decrementing the reference count after using the transaction, resulting in a memory leak. Fix it by decrementing the reference count. Fixes: fe119a6eeb6705 ("btrfs: switch to per-transaction pinned extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-04-23debugfs: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in header file related to debugfs File System support. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Link: https://lore.kernel.org/r/20200419144852.GA9206@nishad Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23kernfs: Change kernfs_node lockdep name to "kn->active"Waiman Long
The kernfs_node lockdep tracking is being done on kn->active, the active reference count. The other reference count (kn->count) is not tracked by lockdep. So change the lockdep name to reflect what it is tracking. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20200402171056.27871-1-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23pstore: switch to copy_from_user()Al Viro
don't bother trying to do bulk access_ok() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-22cifs: fix uninitialised lease_key in open_shroot()Paulo Alcantara
SMB2_open_init() expects a pre-initialised lease_key when opening a file with a lease, so set pfid->lease_key prior to calling it in open_shroot(). This issue was observed when performing some DFS failover tests and the lease key was never randomly generated. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org>
2020-04-22cifs: ensure correct super block for DFS reconnectPaulo Alcantara
This patch is basically fixing the lookup of tcons (DFS specific) during reconnect (smb2pdu.c:__smb2_reconnect) to update their prefix paths. Previously, we relied on the TCP_Server_Info pointer (misc.c:tcp_super_cb) to determine which tcon to update the prefix path We could not rely on TCP server pointer to determine which super block to update the prefix path when reconnecting tcons since it might map to different tcons that share same TCP connection. Instead, walk through all cifs super blocks and compare their DFS full paths with the tcon being updated to. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2020-04-22cifs: do not share tcons with DFSPaulo Alcantara
This disables tcon re-use for DFS shares. tcon->dfs_path stores the path that the tcon should connect to when doing failing over. If that tcon is used multiple times e.g. 2 mounts using it with different prefixpath, each will need a different dfs_path but there is only one tcon. The other solution would be to split the tcon in 2 tcons during failover but that is much harder. tcons could not be shared with DFS in cifs.ko because in a DFS namespace like: //domain/dfsroot -> /serverA/dfsroot, /serverB/dfsroot //serverA/dfsroot/link -> /serverA/target1/aa/bb //serverA/dfsroot/link2 -> /serverA/target1/cc/dd you can see that link and link2 are two DFS links that both resolve to the same target share (/serverA/target1), so cifs.ko will only contain a single tcon for both link and link2. The problem with that is, if we (auto)mount "link" and "link2", cifs.ko will only contain a single tcon for both DFS links so we couldn't perform failover or refresh the DFS cache for both links because tcon->dfs_path was set to either "link" or "link2", but not both -- which is wrong. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-22mm: Remove MPX leftoversJimmy Assarsson
Remove MPX leftovers in generic code. Fixes: 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86") Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20200402172507.2786-1-jimmyassarsson@gmail.com
2020-04-22proc: use named enums for better readabilityAlexey Gladkov
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22proc: use human-readable values for hidepidAlexey Gladkov
The hidepid parameter values are becoming more and more and it becomes difficult to remember what each new magic number means. Backward compatibility is preserved since it is possible to specify numerical value for the hidepid parameter. This does not break the fsconfig since it is not possible to specify a numerical value through it. All numeric values are converted to a string. The type FSCONFIG_SET_BINARY cannot be used to indicate a numerical value. Selftest has been added to verify this behavior. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22proc: add option to mount only a pids subsetAlexey Gladkov
This allows to hide all files and directories in the procfs that are not related to tasks. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22proc: instantiate only pids that we can ptrace on 'hidepid=4' mount optionAlexey Gladkov
If "hidepid=4" mount option is set then do not instantiate pids that we can not ptrace. "hidepid=4" means that procfs should only contain pids that the caller can ptrace. Signed-off-by: Djalal Harouni <tixxdz@gmail.com> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22proc: allow to mount many instances of proc in one pid namespaceAlexey Gladkov
This patch allows to have multiple procfs instances inside the same pid namespace. The aim here is lightweight sandboxes, and to allow that we have to modernize procfs internals. 1) The main aim of this work is to have on embedded systems one supervisor for apps. Right now we have some lightweight sandbox support, however if we create pid namespacess we have to manages all the processes inside too, where our goal is to be able to run a bunch of apps each one inside its own mount namespace without being able to notice each other. We only want to use mount namespaces, and we want procfs to behave more like a real mount point. 2) Linux Security Modules have multiple ptrace paths inside some subsystems, however inside procfs, the implementation does not guarantee that the ptrace() check which triggers the security_ptrace_check() hook will always run. We have the 'hidepid' mount option that can be used to force the ptrace_may_access() check inside has_pid_permissions() to run. The problem is that 'hidepid' is per pid namespace and not attached to the mount point, any remount or modification of 'hidepid' will propagate to all other procfs mounts. This also does not allow to support Yama LSM easily in desktop and user sessions. Yama ptrace scope which restricts ptrace and some other syscalls to be allowed only on inferiors, can be updated to have a per-task context, where the context will be inherited during fork(), clone() and preserved across execve(). If we support multiple private procfs instances, then we may force the ptrace_may_access() on /proc/<pids>/ to always run inside that new procfs instances. This will allow to specifiy on user sessions if we should populate procfs with pids that the user can ptrace or not. By using Yama ptrace scope, some restricted users will only be able to see inferiors inside /proc, they won't even be able to see their other processes. Some software like Chromium, Firefox's crash handler, Wine and others are already using Yama to restrict which processes can be ptracable. With this change this will give the possibility to restrict /proc/<pids>/ but more importantly this will give desktop users a generic and usuable way to specifiy which users should see all processes and which users can not. Side notes: * This covers the lack of seccomp where it is not able to parse arguments, it is easy to install a seccomp filter on direct syscalls that operate on pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls. With this change LSMs should be able to analyze open/read/write/close... In the new patch set version I removed the 'newinstance' option as suggested by Eric W. Biederman. Selftest has been added to verify new behavior. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22proc: rename struct proc_fs_info to proc_fs_optsAlexey Gladkov
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22exfat: truncate atimes to 2s granularityEric Sandeen
The timestamp for access_time has double seconds granularity(There is no 10msIncrement field for access_time unlike create/modify_time). exfat's atimes are restricted to only 2s granularity so after we set an atime, round it down to the nearest 2s and set the sub-second component of the timestamp to 0. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: properly set s_time_granEric Sandeen
The s_time_gran superblock field indicates the on-disk nanosecond granularity of timestamps, and for exfat that seems to be 10ms, so set s_time_gran to 10000000ns. Without this, in-memory timestamps change when they get re-read from disk. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: remove 'bps' mount-optionTetsuhiro Kohada
remount fails because exfat_show_options() returns unsupported option 'bps'. > # mount -o ro,remount > exfat: Unknown parameter 'bps' To fix the problem, just remove 'bps' option from exfat_show_options(). Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: Unify access to the boot sectorTetsuhiro Kohada
Unify access to boot sector via 'sbi->pbr_bh'. This fixes vol_flags inconsistency at read failed in fs_set_vol_flags(), and buffer_head leak in __exfat_fill_super(). Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: add missing MODULE_ALIAS_FS()Thomas Backlund
This adds the necessary MODULE_ALIAS_FS() to exfat so the module gets automatically loaded when an exfat filesystem is mounted. Signed-off-by: Thomas Backlund <tmb@mageia.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-22exfat: Fix discard supportPali Rohár
Discard support was always unconditionally disabled. Now it is disabled only in the case when blk_queue_discard() returns false. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2020-04-21cifs: minor update to comments around the cifs_tcp_ses_lock mutexSteve French
Update comment to note that it protects server->dstaddr Signed-off-by: Steve French <stfrench@microsoft.com>
2020-04-21coredump: fix null pointer dereference on coredumpSudip Mukherjee
If the core_pattern is set to "|" and any process segfaults then we get a null pointer derefernce while trying to coredump. The call stack shows: RIP: do_coredump+0x628/0x11c0 When the core_pattern has only "|" there is no use of trying the coredump and we can check that while formating the corename and exit with an error. After this change I get: format_corename failed Aborting core Fixes: 315c69261dd3 ("coredump: split pipe command whitespace before expanding template") Reported-by: Matthew Ruffell <matthew.ruffell@canonical.com> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Wise <pabs3@bonedaddy.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416194612.21418-1-sudipm.mukherjee@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>