summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-12-06Merge tag 'drm-intel-next-2018-12-04' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Final drm/i915 changes for v4.21: - ICL DSI video mode enabling (Madhav, Vandita, Jani, Imre) - eDP sink count fix (José) - PSR fixes (José) - DRM DP helper and i915 DSC enabling (Manasi, Gaurav, Anusha) - DP FEC enabling (Anusha) - SKL+ watermark/ddb programming improvements (Ville) - Pixel format fixes (Ville) - Selftest updates (Chris, Tvrtko) - GT and engine workaround improvements (Tvrtko) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87va496uoe.fsf@intel.com
2018-12-05Merge tag 'for-4.20-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A patch in 4.19 introduced a sanity check that was too strict and a filesystem cannot be mounted. This happens for filesystems with more than 10 devices and has been reported by a few users so we need the fix to propagate to stable" * tag 'for-4.20-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: tree-checker: Don't check max block group size as current max chunk size limit is unreliable
2018-12-05fanotify: Make sure to check event_len when copyingKees Cook
As a precaution, make sure we check event_len when copying to userspace. Based on old feedback: https://lkml.kernel.org/r/542D9FE5.3010009@gmx.de Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jan Kara <jack@suse.cz>
2018-12-04dax: Fix unlock mismatch with updated APIMatthew Wilcox
Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to store a replacement entry in the Xarray at the given xas-index with the DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked value of the entry relative to the current Xarray state to be specified. In most contexts dax_unlock_entry() is operating in the same scope as the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry() case the implementation needs to recall the original entry. In the case where the original entry is a 'pmd' entry it is possible that the pfn performed to do the lookup is misaligned to the value retrieved in the Xarray. Change the api to return the unlock cookie from dax_lock_page() and pass it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was assuming that the page was PMD-aligned if the entry was a PMD entry with signatures like: WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0 RIP: 0010:dax_insert_entry+0x2b2/0x2d0 [..] Call Trace: dax_iomap_pte_fault.isra.41+0x791/0xde0 ext4_dax_huge_fault+0x16f/0x1f0 ? up_read+0x1c/0xa0 __do_fault+0x1f/0x160 __handle_mm_fault+0x1033/0x1490 handle_mm_fault+0x18b/0x3d0 Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-04nfsd: Return EPERM, not EACCES, in some SETATTR caseszhengbin
As the man(2) page for utime/utimes states, EPERM is returned when the second parameter of utime or utimes is not NULL, the caller's effective UID does not match the owner of the file, and the caller is not privileged. However, in a NFS directory mounted from knfsd, it will return EACCES (from nfsd_setattr-> fh_verify->nfsd_permission). This patch fixes that. Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-04iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)Darrick J. Wong
In commit 4721a601099, we tried to fix a problem wherein directio reads into a splice pipe will bounce EFAULT/EAGAIN all the way out to userspace by simulating a zero-byte short read. This happens because some directio read implementations (xfs) will call bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous reads, but as soon as we run out of pipe buffers that _get_pages call returns EFAULT, which the splice code translates to EAGAIN and bounces out to userspace. In that commit, the iomap code catches the EFAULT and simulates a zero-byte read, but that causes assertion errors on regular splice reads because xfs doesn't allow short directio reads. This causes infinite splice() loops and assertion failures on generic/095 on overlayfs because xfs only permit total success or total failure of a directio operation. The underlying issue in the pipe splice code has now been fixed by changing the pipe splice loop to avoid avoid reading more data than there is space in the pipe. Therefore, it's no longer necessary to simulate the short directio, so remove the hack from iomap. Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Ranted-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04splice: don't read more than available pipe spaceDarrick J. Wong
In commit 4721a601099, we tried to fix a problem wherein directio reads into a splice pipe will bounce EFAULT/EAGAIN all the way out to userspace by simulating a zero-byte short read. This happens because some directio read implementations (xfs) will call bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous reads, but as soon as we run out of pipe buffers that _get_pages call returns EFAULT, which the splice code translates to EAGAIN and bounces out to userspace. In that commit, the iomap code catches the EFAULT and simulates a zero-byte read, but that causes assertion errors on regular splice reads because xfs doesn't allow short directio reads. The brokenness is compounded by splice_direct_to_actor immediately bailing on do_splice_to returning <= 0 without ever calling ->actor (which empties out the pipe), so if userspace calls back we'll EFAULT again on the full pipe, and nothing ever gets copied. Therefore, teach splice_direct_to_actor to clamp its requests to the amount of free space in the pipe and remove the simulated short read from the iomap directio code. Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Ranted-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04vfs: allow some remap flags to be passed to vfs_clone_file_rangeDarrick J. Wong
In overlayfs, ovl_remap_file_range calls vfs_clone_file_range on the lower filesystem's inode, passing through whatever remap flags it got from its caller. Since vfs_copy_file_range first tries a filesystem's remap function with REMAP_FILE_CAN_SHORTEN, this can get passed through to the second vfs_copy_file_range call, and this isn't an issue. Change the WARN_ON to look only for the DEDUP flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-12-04xfs: fix inverted return from xfs_btree_sblock_verify_crcEric Sandeen
xfs_btree_sblock_verify_crc is a bool so should not be returning a failaddr_t; worse, if xfs_log_check_lsn fails it returns __this_address which looks like a boolean true (i.e. success) to the caller. (interestingly xfs_btree_lblock_verify_crc doesn't have the issue) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-04xfs: fix PAGE_MASK usage in xfs_free_file_spaceDarrick J. Wong
In commit e53c4b598, I *tried* to teach xfs to force writeback when we fzero/fpunch right up to EOF so that if EOF is in the middle of a page, the post-EOF part of the page gets zeroed before we return to userspace. Unfortunately, I missed the part where PAGE_MASK is ~(PAGE_SIZE - 1), which means that we totally fail to zero if we're fpunching and EOF is within the first page. Worse yet, the same PAGE_MASK thinko plagues the filemap_write_and_wait_range call, so we'd initiate writeback of the entire file, which (mostly) masked the thinko. Drop the tricky PAGE_MASK and replace it with correct usage of PAGE_SIZE and the proper rounding macros. Fixes: e53c4b598 ("xfs: ensure post-EOF zeroing happens after zeroing part of a file") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-12-04aio: clear IOCB_HIPRIChristoph Hellwig
No one is going to poll for aio (yet), so we must clear the HIPRI flag, as we would otherwise send it down the poll queues, where no one will be polling for completions. Signed-off-by: Christoph Hellwig <hch@lst.de> IOCB_HIPRI, not RWF_HIPRI. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04Merge tag 'v4.20-rc5' into for-4.21/blockJens Axboe
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and also getting the merge fix that went into mainline that users are hitting testing for-4.21/block and/or for-next. * tag 'v4.20-rc5': (664 commits) Linux 4.20-rc5 PCI: Fix incorrect value returned from pcie_get_speed_cap() MAINTAINERS: Update linux-mips mailing list address ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set ...
2018-12-04Revert "exec: make de_thread() freezable"Rafael J. Wysocki
Revert commit c22397888f1e "exec: make de_thread() freezable" as requested by Ingo Molnar: "So there's a new regression in v4.20-rc4, my desktop produces this lockdep splat: [ 1772.588771] WARNING: pkexec/4633 still has locks held! [ 1772.588773] 4.20.0-rc4-custom-00213-g93a49841322b #1 Not tainted [ 1772.588775] ------------------------------------ [ 1772.588776] 1 lock held by pkexec/4633: [ 1772.588778] #0: 00000000ed85fbf8 (&sig->cred_guard_mutex){+.+.}, at: prepare_bprm_creds+0x2a/0x70 [ 1772.588786] stack backtrace: [ 1772.588789] CPU: 7 PID: 4633 Comm: pkexec Not tainted 4.20.0-rc4-custom-00213-g93a49841322b #1 [ 1772.588792] Call Trace: [ 1772.588800] dump_stack+0x85/0xcb [ 1772.588803] flush_old_exec+0x116/0x890 [ 1772.588807] ? load_elf_phdrs+0x72/0xb0 [ 1772.588809] load_elf_binary+0x291/0x1620 [ 1772.588815] ? sched_clock+0x5/0x10 [ 1772.588817] ? search_binary_handler+0x6d/0x240 [ 1772.588820] search_binary_handler+0x80/0x240 [ 1772.588823] load_script+0x201/0x220 [ 1772.588825] search_binary_handler+0x80/0x240 [ 1772.588828] __do_execve_file.isra.32+0x7d2/0xa60 [ 1772.588832] ? strncpy_from_user+0x40/0x180 [ 1772.588835] __x64_sys_execve+0x34/0x40 [ 1772.588838] do_syscall_64+0x60/0x1c0 The warning gets triggered by an ancient lockdep check in the freezer: (gdb) list *0xffffffff812ece06 0xffffffff812ece06 is in flush_old_exec (./include/linux/freezer.h:57). 52 * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION 53 * If try_to_freeze causes a lockdep warning it means the caller may deadlock 54 */ 55 static inline bool try_to_freeze_unsafe(void) 56 { 57 might_sleep(); 58 if (likely(!freezing(current))) 59 return false; 60 return __refrigerator(false); 61 } I reviewed the ->cred_guard_mutex code, and the mutex is held across all of exec() - and we always did this. But there's this recent -rc4 commit: > Chanho Min (1): > exec: make de_thread() freezable c22397888f1e: exec: make de_thread() freezable I believe this commit is bogus, you cannot call try_to_freeze() from de_thread(), because it's holding the ->cred_guard_mutex." Reported-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-04btrfs: tree-checker: Don't check max block group size as current max chunk ↵Qu Wenruo
size limit is unreliable [BUG] A completely valid btrfs will refuse to mount, with error message like: BTRFS critical (device sdb2): corrupt leaf: root=2 block=239681536 slot=172 \ bg_start=12018974720 bg_len=10888413184, invalid block group size, \ have 10888413184 expect (0, 10737418240] This has been reported several times as the 4.19 kernel is now being used. The filesystem refuses to mount, but is otherwise ok and booting 4.18 is a workaround. Btrfs check returns no error, and all kernels used on this fs is later than 2011, which should all have the 10G size limit commit. [CAUSE] For a 12 devices btrfs, we could allocate a chunk larger than 10G due to stripe stripe bump up. __btrfs_alloc_chunk() |- max_stripe_size = 1G |- max_chunk_size = 10G |- data_stripe = 11 |- if (1G * 11 > 10G) { stripe_size = 976128930; stripe_size = round_up(976128930, SZ_16M) = 989855744 However the final stripe_size (989855744) * 11 = 10888413184, which is still larger than 10G. [FIX] For the comprehensive check, we need to do the full check at chunk read time, and rely on bg <-> chunk mapping to do the check. We could just skip the length check for now. Fixes: fce466eab7ac ("btrfs: tree-checker: Verify block_group_item") Cc: stable@vger.kernel.org # v4.19+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-04Revert "ovl: relax permission checking on underlying layers"Miklos Szeredi
This reverts commit 007ea44892e6fa963a0876a979e34890325c64eb. The commit broke some selinux-testsuite cases, and it looks like there's no straightforward fix keeping the direction of this patch, so revert for now. The original patch was trying to fix the consistency of permission checks, and not an observed bug. So reverting should be safe. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-04Merge branch 'for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU changes from Paul E. McKenney: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04ext4: fix EXT4_IOC_GROUP_ADD ioctlruippan (潘睿)
Commit e2b911c53584 ("ext4: clean up feature test macros with predicate functions") broke the EXT4_IOC_GROUP_ADD ioctl. This was not noticed since only very old versions of resize2fs (before e2fsprogs 1.42) use this ioctl. However, using a new kernel with an enterprise Linux userspace will cause attempts to use online resize to fail with "No reserved GDT blocks". Fixes: e2b911c53584 ("ext4: clean up feature test macros with predicate...") Cc: stable@kernel.org # v4.4 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: ruippan (潘睿) <ruippan@tencent.com>
2018-12-04ext4: hard fail dax mount on unsupported devicesEric Sandeen
As dax inches closer to production use, an administrator should not be surprised by silently disabling the feature they asked for. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04ext4: remove redundant condition checkChengguang Xu
ext4_xattr_destroy_cache() can handle NULL pointer correctly, so there is no need to check NULL pointer before calling ext4_xattr_destroy_cache(). Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04jbd2: clean up indentation issue, replace spaces with tabColin Ian King
There is a statement that is indented with spaces, replace it with a tab. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04ext4: clean up indentation issues, remove extraneous tabsColin Ian King
There are several lines that are indented too far, clean these up by removing the tabs. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-04ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()Maurizio Lombardi
In case of error, ext4_try_to_write_inline_data() should unlock and release the page it holds. Fixes: f19d5870cbf7 ("ext4: add normal write support for inline data") Cc: stable@kernel.org # 3.8 Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-03ext4: fix possible use after free in ext4_quota_enablePan Bian
The function frees qf_inode via iput but then pass qf_inode to lockdep_set_quota_inode on the failure path. This may result in a use-after-free bug. The patch frees df_inode only when it is never used. Fixes: daf647d2dd5 ("ext4: add lockdep annotations for i_data_sem") Cc: stable@kernel.org # 4.6 Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-03jbd2: avoid long hold times of j_state_lock while committing a transactionJan Kara
We can hold j_state_lock for writing at the beginning of jbd2_journal_commit_transaction() for a rather long time (reportedly for 30 ms) due cleaning revoke bits of all revoked buffers under it. The handling of revoke tables as well as cleaning of t_reserved_list, and checkpoint lists does not need j_state_lock for anything. It is only needed to prevent new handles from joining the transaction. Generally T_LOCKED transaction state prevents new handles from joining the transaction - except for reserved handles which have to allowed to join while we wait for other handles to complete. To prevent reserved handles from joining the transaction while cleaning up lists, add new transaction state T_SWITCH and watch for it when starting reserved handles. With this we can just drop the lock for operations that don't need it. Reported-and-tested-by: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: "Theodore Y. Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-12-03pstore/ram: Avoid NULL deref in ftrace merging failure pathKees Cook
Given corruption in the ftrace records, it might be possible to allocate tmp_prz without assigning prz to it, but still marking it as needing to be freed, which would cause at least a NULL dereference. smatch warnings: fs/pstore/ram.c:340 ramoops_pstore_read() error: we previously assumed 'prz' could be null (see line 255) https://lists.01.org/pipermail/kbuild-all/2018-December/055528.html Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 2fbea82bbb89 ("pstore: Merge per-CPU ftrace records into one") Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Convert buf_lock to semaphoreKees Cook
Instead of running with interrupts disabled, use a semaphore. This should make it easier for backends that may need to sleep (e.g. EFI) when performing a write: |BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum |Preemption disabled at: |[<ffffffff99d60512>] pstore_dump+0x72/0x330 |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45 |Call Trace: | dump_stack+0x4f/0x6a | ___might_sleep.cold.91+0xd3/0xe4 | __might_sleep+0x50/0x90 | wait_for_completion+0x32/0x130 | virt_efi_query_variable_info+0x14e/0x160 | efi_query_variable_store+0x51/0x1a0 | efivar_entry_set_safe+0xa3/0x1b0 | efi_pstore_write+0x109/0x140 | pstore_dump+0x11c/0x330 | kmsg_dump+0xa4/0xd0 | oops_exit+0x22/0x30 ... Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Fixes: 21b3ddd39fee ("efi: Don't use spinlocks for efi vars") Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Fix bool initialization/comparisonThomas Meyer
Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Do not treat empty buffers as validJoel Fernandes (Google)
The ramoops backend currently calls persistent_ram_save_old() even if a buffer is empty. While this appears to work, it is does not seem like the right thing to do and could lead to future bugs so lets avoid that. It also prevents misleading prints in the logs which claim the buffer is valid. I got something like: found existing buffer, size 0, start 0 When I was expecting: no valid data in buffer (sig = ...) This bails out early (and reports with pr_debug()), since it's an acceptable state. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Simplify ramoops_get_next_prz() argumentsJoel Fernandes (Google)
(1) remove type argument from ramoops_get_next_prz() Since we store the type of the prz when we initialize it, we no longer need to pass it again in ramoops_get_next_prz() since we can just use that to setup the pstore record. So lets remove it from the argument list. (2) remove max argument from ramoops_get_next_prz() Looking at the code flow, the 'max' checks are already being done on the prz passed to ramoops_get_next_prz(). Lets remove it to simplify this function and reduce its arguments. (3) further reduce ramoops_get_next_prz() arguments by passing record Both the id and type fields of a pstore_record are set by ramoops_get_next_prz(). So we can just pass a pointer to the pstore_record instead of passing individual elements. This results in cleaner more readable code and fewer lines. In addition lets also remove the 'update' argument since we can detect that. Changes are squashed into a single patch to reduce fixup conflicts. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Map PSTORE_TYPE_* to stringsJoel Fernandes (Google)
In later patches we will need to map types to names, so create a constant table for that which can also be used in different parts of old and new code. This saves the type in the PRZ which will be useful in later patches. Instead of having an explicit PSTORE_TYPE_UNKNOWN, just use ..._MAX. This includes removing the now redundant filename templates which can use a single format string. Also, there's no reason to limit the "is it still compressed?" test to only PSTORE_TYPE_DMESG when building the pstorefs filename. Records are zero-initialized, so a backend would need to have explicitly set compressed=1. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Improve and update some comments and status outputKees Cook
This improves and updates some comments: - dump handler comment out of sync from calling convention - fix kern-doc typo and improves status output: - reminder that only kernel crash dumps are compressed - do not be silent about ECC infrastructure failures Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Add kern-doc for struct persistent_ram_zoneKees Cook
The struct persistent_ram_zone wasn't well documented. This adds kern-doc for it. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Report backend assignments with finer granularityKees Cook
In order to more easily perform automated regression testing, this adds pr_debug() calls to report each prz allocation which can then be verified against persistent storage. Specifically, seeing the dividing line between header, data, any ECC bytes. (And the general assignment output is updated to remove the bogus ECC blocksize which isn't actually recorded outside the prz instance.) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore/ram: Standardize module name in ramoopsKees Cook
With both ram.c and ram_core.c built into ramoops.ko, it doesn't make sense to have differing pr_fmt prefixes. This fixes ram_core.c to use the module name (as ram.c already does). Additionally improves region reservation error to include the region name. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Avoid duplicate call of persistent_ram_zap()Peng Wang
When initialing a prz, if invalid data is found (no PERSISTENT_RAM_SIG), the function call path looks like this: ramoops_init_prz -> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap persistent_ram_zap As we can see, persistent_ram_zap() is called twice. We can avoid this by adding an option to persistent_ram_new(), and only call persistent_ram_zap() when it is needed. Signed-off-by: Peng Wang <wangpeng15@xiaomi.com> [kees: minor tweak to exit path and commit log] Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03pstore: Remove needless lock during console writesKees Cook
Since the console writer does not use the preallocated crash dump buffer any more, there is no reason to perform locking around it. Fixes: 70ad35db3321 ("pstore: Convert console write to use ->write_buf") Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
2018-12-03pstore: Do not use crash buffer for decompressionKees Cook
The pre-allocated compression buffer used for crash dumping was also being used for decompression. This isn't technically safe, since it's possible the kernel may attempt a crashdump while pstore is populating the pstore filesystem (and performing decompression). Instead, just allocate a separate buffer for decompression. Correctness is preferred over performance here. Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03Merge branch 'for-linus/pstore' into for-next/pstoreKees Cook
2018-12-03dlm: fix invalid cluster name warningDavid Teigland
The warning added in commit 3b0e761ba83 "dlm: print log message when cluster name is not set" did not account for the fact that lockspaces created from userland do not supply a cluster name, so bogus warnings are printed every time a userland lockspace is created. Signed-off-by: David Teigland <teigland@redhat.com>
2018-12-03sysfs: constify sysfs create/remove files harderJani Nikula
Let the passed in array be const (and thus placed in rodata) instead of a mutable array of const pointers. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181004143750.30880-1-jani.nikula@intel.com
2018-12-03dlm: NULL check before some freeing functions is not neededThomas Meyer
NULL check before some freeing functions is not needed. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: David Teigland <teigland@redhat.com>
2018-12-03fuse: fix revalidation of attributes for permission checkMiklos Szeredi
fuse_invalidate_attr() now sets fi->inval_mask instead of fi->i_time, hence we need to check the inval mask in fuse_permission() as well. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 2f1e81965fd0 ("fuse: allow fine grained attr cache invaldation")
2018-12-03fuse: fix fsync on directoryMiklos Szeredi
Commit ab2257e9941b ("fuse: reduce size of struct fuse_inode") moved parts of fields related to writeback on regular file and to directory caching into a union. However fuse_fsync_common() called from fuse_dir_fsync() touches some writeback related fields, resulting in a crash. Move writeback related parts from fuse_fsync_common() to fuse_fysnc(). Reported-by: Brett Girton <btgirton@gmail.com> Tested-by: Brett Girton <btgirton@gmail.com> Fixes: ab2257e9941b ("fuse: reduce size of struct fuse_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-12-03Merge 4.20-rc5 into driver-core-nextGreg Kroah-Hartman
We need the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-02nfs: don't dirty kernel pages read by direct-ioDave Kleikamp
When we use direct_IO with an NFS backing store, we can trigger a WARNING in __set_page_dirty(), as below, since we're dirtying the page unnecessarily in nfs_direct_read_completion(). To fix, replicate the logic in commit 53cbf3b157a0 ("fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read"). Other filesystems that implement direct_IO handle this; most use blockdev_direct_IO(). ceph and cifs have similar logic. mount 127.0.0.1:/export /nfs dd if=/dev/zero of=/nfs/image bs=1M count=200 losetup --direct-io=on -f /nfs/image mkfs.btrfs /dev/loop0 mount -t btrfs /dev/loop0 /mnt/ kernel: WARNING: CPU: 0 PID: 8067 at fs/buffer.c:580 __set_page_dirty+0xaf/0xd0 kernel: Modules linked in: loop(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) fuse(E) tun(E) ip6t_rpfilter(E) ipt_REJECT(E) nf_ kernel: snd_seq(E) snd_seq_device(E) snd_pcm(E) video(E) snd_timer(E) snd(E) soundcore(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) cdrom(E) ata_generic(E) pata_acpi(E) crc32c_intel(E) ahci(E) li kernel: CPU: 0 PID: 8067 Comm: kworker/0:2 Tainted: G E 4.20.0-rc1.master.20181111.ol7.x86_64 #1 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 kernel: Workqueue: nfsiod rpc_async_release [sunrpc] kernel: RIP: 0010:__set_page_dirty+0xaf/0xd0 kernel: Code: c3 48 8b 02 f6 c4 04 74 d4 48 89 df e8 ba 05 f7 ff 48 89 c6 eb cb 48 8b 43 08 a8 01 75 1f 48 89 d8 48 8b 00 a8 04 74 02 eb 87 <0f> 0b eb 83 48 83 e8 01 eb 9f 48 83 ea 01 0f 1f 00 eb 8b 48 83 e8 kernel: RSP: 0000:ffffc1c8825b7d78 EFLAGS: 00013046 kernel: RAX: 000fffffc0020089 RBX: fffff2b603308b80 RCX: 0000000000000001 kernel: RDX: 0000000000000001 RSI: ffff9d11478115c8 RDI: ffff9d11478115d0 kernel: RBP: ffffc1c8825b7da0 R08: 0000646f6973666e R09: 8080808080808080 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9d11478115d0 kernel: R13: ffff9d11478115c8 R14: 0000000000003246 R15: 0000000000000001 kernel: FS: 0000000000000000(0000) GS:ffff9d115ba00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007f408686f640 CR3: 0000000104d8e004 CR4: 00000000000606f0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: __set_page_dirty_buffers+0xb6/0x110 kernel: set_page_dirty+0x52/0xb0 kernel: nfs_direct_read_completion+0xc4/0x120 [nfs] kernel: nfs_pgio_release+0x10/0x20 [nfs] kernel: rpc_free_task+0x30/0x70 [sunrpc] kernel: rpc_async_release+0x12/0x20 [sunrpc] kernel: process_one_work+0x174/0x390 kernel: worker_thread+0x4f/0x3e0 kernel: kthread+0x102/0x140 kernel: ? drain_workqueue+0x130/0x130 kernel: ? kthread_stop+0x110/0x110 kernel: ret_from_fork+0x35/0x40 kernel: ---[ end trace 01341980905412c9 ]--- Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [forward-ported to v4.20] Signed-off-by: Calum Mackay <calum.mackay@oracle.com> Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02flexfiles: enforce per-mirror stateid only for v4 DSesTigran Mkrtchyan
Since commit bb21ce0ad227 we always enforce per-mirror stateid. However, this makes sense only for v4+ servers. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-12-02jffs2: Fix use of uninitialized delayed_work, lockdep breakageDaniel Santos
jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this does is not the case when the mtd device has no out-of-band buffer: int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ... The resulting call to cancel_delayed_work_sync passing a uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled. [ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... [ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58 Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Reviewed-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-12-01Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', ↵Paul E. McKenney
'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD bug.2018.11.12a: Get rid of BUG_ON() and friends consolidate.2018.12.01a: Continued RCU flavor-consolidation cleanup doc.2018.11.12a: Documentation updates fixes.2018.11.12a: Miscellaneous fixes initrd.2018.11.08b: Automate creation of rcutorture initrd sil.2018.11.12a: Remove more spin_unlock_wait() calls
2018-12-01Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: - Single range elevator discard merge fix, that caused crashes (Ming) - Fix for a regression in O_DIRECT, where we could potentially lose the error value (Maximilian Heyne) - NVMe pull request from Christoph, with little fixes all over the map for NVMe. * tag 'for-linus-20181201' of git://git.kernel.dk/linux-block: block: fix single range discard merge nvme-rdma: fix double freeing of async event data nvme: flush namespace scanning work just before removing namespaces nvme: warn when finding multi-port subsystems without multipathing enabled fs: fix lost error code in dio_complete nvme-pci: fix surprise removal nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() nvme: Free ctrl device name on init failure
2018-11-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "31 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits) ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set userfaultfd: shmem: add i_size checks userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem ...