summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-12-26ceph: update wanted caps after resuming stale sessionYan, Zheng
mds contains an optimization, it does not re-issue stale caps if client does not want any cap. A special case of the optimization is that client wants some caps, but skipped updating 'wanted'. For this case, client needs to update 'wanted' when stale session get renewed. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-12-26ceph: skip updating 'wanted' caps if caps are already issuedYan, Zheng
When reading cached inode that already has Fscr caps, this can avoid two cap messages (one updats 'wanted' caps, one clears 'wanted' caps). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-12-26ceph: don't request excl caps when mount is readonlyYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-12-26ceph: don't update importing cap's mseq when handing cap exportYan, Zheng
Updating mseq makes client think importer mds has accepted all prior cap messages and importer mds knows what caps client wants. Actually some cap messages may have been dropped because of mseq mismatch. If mseq is left untouched, importing cap's mds_wanted later will get reset by cap import message. Cc: stable@vger.kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-12-26ceph: remove redundant assignmentChengguang Xu
There is redundant assighment of variable i in ceph_mdsmap_get_random_mds(), just remvoe it. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-12-26ceph: cleanup splice_dentry()Yan, Zheng
splice_dentry() may drop the original dentry and return other dentry. It relies on its caller to update pointer that points to the dropped dentry. This is error-prone. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-12-25Merge tag 'mtd/for-4.21' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull mtd updates from Boris Brezillon: "SPI NOR Core changes: - Parse the 4BAIT SFDP section - Add a bunch of SPI NOR entries to the flash_info table - Add the concept of SFDP fixups and use it to fix a bug on MX25L25635F - A bunch of minor cleanups/comestic changes NAND core changes: - kernel-doc miscellaneous fixes. - Third batch of fixes/cleanup to the raw NAND core impacting various controller drivers (ams-delta, marvell, fsmc, denali, tegra, vf610): * Stop to pass mtd_info objects to internal functions * Reorganize code to avoid forward declarations * Drop useless test in nand_legacy_set_defaults() * Move nand_exec_op() to internal.h * Add nand_[de]select_target() helpers * Pass the CS line to be selected in struct nand_operation * Make ->select_chip() optional when ->exec_op() is implemented * Deprecate the ->select_chip() hook * Move the ->exec_op() method to nand_controller_ops * Move ->setup_data_interface() to nand_controller_ops * Deprecate the dummy_controller field * Fix JEDEC detection * Provide a helper for polling GPIO R/B pin Raw NAND chip drivers changes: - Macronix: * Flag 1.8V AC chips with a broken GET_FEATURES(TIMINGS) Raw NAND controllers drivers changes: - Ams-delta: * Fix the error path * SPDX tag added * May be compiled with COMPILE_TEST=y * Conversion to ->exec_op() interface * Drop .IOADDR_R/W use * Use GPIO API for data I/O - Denali: * Remove denali_reset_banks() * Remove ->dev_ready() hook * Include <linux/bits.h> instead of <linux/bitops.h> * Changes to comply with the above fixes/cleanup done in the core. - FSMC: * Add an SPDX tag to replace the license text * Make conversion from chip to fsmc consistent * Fix unchecked return value in fsmc_read_page_hwecc * Changes to comply with the above fixes/cleanup done in the core. - Marvell: * Prevent timeouts on a loaded machine (fix) * Changes to comply with the above fixes/cleanup done in the core. - OMAP2: * Pass the parent of pdev to dma_request_chan() (fix) - R852: * Use generic DMA API - sh_flctl: * Convert to SPDX identifiers - Sunxi: * Write pageprog related opcodes to the right register: WCMD_SET (fix) - Tegra: * Stop implementing ->select_chip() - VF610: * Add an SPDX tag to replace the license text * Changes to comply with the above fixes/cleanup done in the core. - Various trivial/spelling/coding style fixes. SPI-NAND drivers changes: - Remove the depreacated mt29f_spinand driver from staging. - Add support for: * Toshiba TC58CVG2S0H * GigaDevice GD5FxGQ4xA * Winbond W25N01GV JFFS2 changes: - Fix a lockdep issue MTD changes: - Rework the physmap driver to merge gpio-addr-flash and physmap_of in it - Add a new compatible for RedBoot partitions - Make sub-partitions RW if the parent partition was RO because of a mis-alignment - Add pinctrl support to the - Addition of /* fall-through */ comments where appropriate - Various minor fixes and cleanups Other changes: - Update my email address" * tag 'mtd/for-4.21' of git://git.infradead.org/linux-mtd: (108 commits) mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET MAINTAINERS: Update my email address mtd: rawnand: marvell: prevent timeouts on a loaded machine mtd: rawnand: omap2: Pass the parent of pdev to dma_request_chan() mtd: rawnand: Fix JEDEC detection mtd: spi-nor: Add support for is25lp016d mtd: spi-nor: parse SFDP 4-byte Address Instruction Table mtd: spi-nor: Add 4B_OPCODES flag to is25lp256 mtd: spi-nor: Add an SPDX tag to spi-nor.{c,h} mtd: spi-nor: Make the enable argument passed to set_byte() a bool mtd: spi-nor: Stop passing flash_info around mtd: spi-nor: Avoid forward declaration of internal functions mtd: spi-nor: Drop inline on all internal helpers mtd: spi-nor: Add a post BFPT fixup for MX25L25635E mtd: spi-nor: Add a post BFPT parsing fixup hook mtd: spi-nor: Add the SNOR_F_4B_OPCODES flag mtd: spi-nor: cast to u64 to avoid uint overflows mtd: spi-nor: Add support for IS25LP032/064 mtd: spi-nor: add entry for mt35xu512aba flash mtd: spi-nor: add macros related to MICRON flash ...
2018-12-25Merge tag 'drm-next-2018-12-14' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "Core: - shared fencing staging removal - drop transactional atomic helpers and move helpers to new location - DP/MST atomic cleanup - Leasing cleanups and drop EXPORT_SYMBOL - Convert drivers to atomic helpers and generic fbdev. - removed deprecated obj_ref/unref in favour of get/put - Improve dumb callback documentation - MODESET_LOCK_BEGIN/END helpers panels: - CDTech panels, Banana Pi Panel, DLC1010GIG, - Olimex LCD-O-LinuXino, Samsung S6D16D0, Truly NT35597 WQXGA, - Himax HX8357D, simulated RTSM AEMv8. - GPD Win2 panel - AUO G101EVN010 vgem: - render node support ttm: - move global init out of drivers - fix LRU handling for ghost objects - Support for simultaneous submissions to multiple engines scheduler: - timeout/fault handling changes to help GPU recovery - helpers for hw with preemption support i915: - Scaler/Watermark fixes - DP MST + powerwell fixes - PSR fixes - Break long get/put shmemfs pages - Icelake fixes - Icelake DSI video mode enablement - Engine workaround improvements amdgpu: - freesync support - GPU reset enabled on CI, VI, SOC15 dGPUs - ABM support in DC - KFD support for vega12/polaris12 - SDMA paging queue on vega - More amdkfd code sharing - DCC scanout on GFX9 - DC kerneldoc - Updated SMU firmware for GFX8 chips - XGMI PSP + hive reset support - GPU reset - DC trace support - Powerplay updates for newer Polaris - Cursor plane update fast path - kfd dma-buf support virtio-gpu: - add EDID support vmwgfx: - pageflip with damage support nouveau: - Initial Turing TU104/TU106 modesetting support msm: - a2xx gpu support for apq8060 and imx5 - a2xx gpummu support - mdp4 display support for apq8060 - DPU fixes and cleanups - enhanced profiling support - debug object naming interface - get_iova/page pinning decoupling tegra: - Tegra194 host1x, VIC and display support enabled - Audio over HDMI for Tegra186 and Tegra194 exynos: - DMA/IOMMU refactoring - plane alpha + blend mode support - Color format fixes for mixer driver rcar-du: - R8A7744 and R8A77470 support - R8A77965 LVDS support imx: - fbdev emulation fix - multi-tiled scalling fixes - SPDX identifiers rockchip - dw_hdmi support - dw-mipi-dsi + dual dsi support - mailbox read size fix qxl: - fix cursor pinning vc4: - YUV support (scaling + cursor) v3d: - enable TFU (Texture Formatting Unit) mali-dp: - add support for linear tiled formats sun4i: - Display Engine 3 support - H6 DE3 mixer 0 support - H6 display engine support - dw-hdmi support - H6 HDMI phy support - implicit fence waiting - BGRX8888 support meson: - Overlay plane support - implicit fence waiting - HDMI 1.4 4k modes bridge: - i2c fixes for sii902x" * tag 'drm-next-2018-12-14' of git://anongit.freedesktop.org/drm/drm: (1403 commits) drm/amd/display: Add fast path for cursor plane updates drm/amdgpu: Enable GPU recovery by default for CI drm/amd/display: Fix duplicating scaling/underscan connector state drm/amd/display: Fix unintialized max_bpc state values Revert "drm/amd/display: Set RMX_ASPECT as default" drm/amdgpu: Fix stub function name drm/msm/dpu: Fix clock issue after bind failure drm/msm/dpu: Clean up dpu_media_info.h static inline functions drm/msm/dpu: Further cleanups for static inline functions drm/msm/dpu: Cleanup the debugfs functions drm/msm/dpu: Remove dpu_irq and unused functions drm/msm: Make irq_postinstall optional drm/msm/dpu: Cleanup callers of dpu_hw_blk_init drm/msm/dpu: Remove unused functions drm/msm/dpu: Remove dpu_crtc_is_enabled() drm/msm/dpu: Remove dpu_crtc_get_mixer_height drm/msm/dpu: Remove dpu_dbg drm/msm: dpu: Remove crtc_lock drm/msm: dpu: Remove vblank_requested flag from dpu_crtc drm/msm: dpu: Separate crtc assignment from vblank enable ...
2018-12-25ext4: fix a potential fiemap/page fault deadlock w/ inline_dataTheodore Ts'o
The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent() while still holding the xattr semaphore. This is not necessary and it triggers a circular lockdep warning. This is because fiemap_fill_next_extent() could trigger a page fault when it writes into page which triggers a page fault. If that page is mmaped from the inline file in question, this could very well result in a deadlock. This problem can be reproduced using generic/519 with a file system configuration which has the inline_data feature enabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2018-12-24ext4: make sure enough credits are reserved for dioread_nolock writesTheodore Ts'o
There are enough credits reserved for most dioread_nolock writes; however, if the extent tree is sufficiently deep, and/or quota is enabled, the code was not allowing for all eventualities when reserving journal credits for the unwritten extent conversion. This problem can be seen using xfstests ext4/034: WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180 Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180 ... EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28 EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2018-12-23cifs: Save TTL value when parsing DFS referralsPaulo Alcantara
This will be needed by DFS cache. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: auto disable 'serverino' in dfs mountsAurelien Aptel
Different servers have different set of file ids. After failover, unique IDs will be different so we can't validate them. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara <palcantara@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: Make devname param optional in cifs_compose_mount_options()Paulo Alcantara
If we only want to get the mount options strings, do not return the devname. For DFS failover, we'll be passing the DFS full path down to cifs_mount() rather than the devname. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: Skip any trailing backslashes from UNCPaulo Alcantara
When extracting hostname from UNC, check for leading backslashes before trying to remove them. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: Refactor out cifs_mount()Paulo Alcantara
* Split and refactor the very large function cifs_mount() in multiple functions: - tcp, ses and tcon setup to mount_get_conns() - tcp, ses and tcon cleanup in mount_put_conns() - tcon tlink setup to mount_setup_tlink() - remote path checking to is_path_remote() * Implement 2 version of cifs_mount() for DFS-enabled builds and non-DFS-enabled builds (CONFIG_CIFS_DFS_UPCALL). In preparation for DFS failover support. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problemGeorgy A Bystrenin
While resolving a bug with locks on samba shares found a strange behavior. When a file locked by one node and we trying to lock it from another node it fail with errno 5 (EIO) but in that case errno must be set to (EACCES | EAGAIN). This isn't happening when we try to lock file second time on same node. In this case it returns EACCES as expected. Also this issue not reproduces when we use SMB1 protocol (vers=1.0 in mount options). Further investigation showed that the mapping from status_to_posix_error is different for SMB1 and SMB2+ implementations. For SMB1 mapping is [NT_STATUS_LOCK_NOT_GRANTED to ERRlock] (See fs/cifs/netmisc.c line 66) but for SMB2+ mapping is [STATUS_LOCK_NOT_GRANTED to -EIO] (see fs/cifs/smb2maperror.c line 383) Quick changes in SMB2+ mapping from EIO to EACCES has fixed issue. BUG: https://bugzilla.kernel.org/show_bug.cgi?id=201971 Signed-off-by: Georgy A Bystrenin <gkot@altlinux.org> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23CIFS: return correct errors when pinning memory failed for direct I/OLong Li
When pinning memory failed, we should return the correct error code and rewind the SMB credits. Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Cc: Murphy Zhou <jencce.kernel@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23CIFS: use the correct length when pinning memory for direct I/O for writeLong Li
The current code attempts to pin memory using the largest possible wsize based on the currect SMB credits. This doesn't cause kernel oops but this is not optimal as we may pin more pages then actually needed. Fix this by only pinning what are needed for doing this write I/O. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
2018-12-23cifs: check ntwrk_buf_start for NULL before dereferencing itRonnie Sahlberg
RHBZ: 1021460 There is an issue where when multiple threads open/close the same directory ntwrk_buf_start might end up being NULL, causing the call to smbCalcSize later to oops with a NULL deref. The real bug is why this happens and why this can become NULL for an open cfile, which should not be allowed. This patch tries to avoid a oops until the time when we fix the underlying issue. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: remove coverity warning in calc_lanman_hashRonnie Sahlberg
password_with_pad is a fixed size buffer of 16 bytes, it contains a password string, to be padded with \0 if shorter than 16 bytes but is just truncated if longer. It is not, and we do not depend on it to be, nul terminated. As such, do not use strncpy() to populate this buffer since the str* prefix suggests that this is a string, which it is not, and it also confuses coverity causing a false warning. Detected by CoverityScan CID#113743 ("Buffer not null terminated") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: remove set but not used variable 'smb_buf'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: fs/cifs/sess.c: In function '_sess_auth_rawntlmssp_assemble_req': fs/cifs/sess.c:1157:18: warning: variable 'smb_buf' set but not used [-Wunused-but-set-variable] It never used since commit cc87c47d9d7a ("cifs: Separate rawntlmssp auth from CIFS_SessSetup()") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: suppress some implicit-fallthrough warningsGustavo A. R. Silva
To avoid the warning: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: change smb2_query_eas to use the compound query-info helperRonnie Sahlberg
Reducing the number of network roundtrips improves the performance of query xattrs Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23Add vers=3.0.2 as a valid option for SMBv3.0.2Kenneth D'souza
Technically 3.02 is not the dialect name although that is more familiar to many, so we should also accept the official dialect name (3.0.2 vs. 3.02) in vers= Signed-off-by: Kenneth D'souza <kdsouza@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: create a helper function for compound query_infoRonnie Sahlberg
and convert statfs to use it. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: address trivial coverity warningSteve French
This is not actually a bug but as Coverity points out we shouldn't be doing an "|=" on a value which hasn't been set (although technically it was memset to zero so isn't a bug) and so might as well change "|=" to "=" in this line Detected by CoverityScan, CID#728535 ("Unitialized scalar variable") Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-12-23cifs: smb2 commands can not be negative, remove confusing checkSteve French
As Coverity points out le16_to_cpu(midEntry->Command) can not be less than zero. Detected by CoverityScan, CID#1438650 ("Macro compares unsigned to 0") Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-12-23cifs: use a compound for setting an xattrRonnie Sahlberg
Improve performance by reducing number of network round trips for set xattr. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23cifs: clean up indentation, replace spaces with tabColin Ian King
Trivial fix to clean up indentation, replace spaces with tab Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-12-23Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "A couple of fixes - no common topic ;-)" [ The aio spectre patch also came in from Jens, so now we have that doubly fixed .. ] * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: proc/sysctl: don't return ENOMEM on lookup when a table is unregistering aio: fix spectre gadget in lookup_ioctx
2018-12-22Revert "vfs: Allow userns root to call mknod on owned filesystems."Christian Brauner
This reverts commit 55956b59df336f6738da916dbb520b6e37df9fbd. commit 55956b59df33 ("vfs: Allow userns root to call mknod on owned filesystems.") enabled mknod() in user namespaces for userns root if CAP_MKNOD is available. However, these device nodes are useless since any filesystem mounted from a non-initial user namespace will set the SB_I_NODEV flag on the filesystem. Now, when a device node s created in a non-initial user namespace a call to open() on said device node will fail due to: bool may_open_dev(const struct path *path) { return !(path->mnt->mnt_flags & MNT_NODEV) && !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV); } The problem with this is that as of the aforementioned commit mknod() creates partially functional device nodes in non-initial user namespaces. In particular, it has the consequence that as of the aforementioned commit open() will be more privileged with respect to device nodes than mknod(). Before it was the other way around. Specifically, if mknod() succeeded then it was transparent for any userspace application that a fatal error must have occured when open() failed. All of this breaks multiple userspace workloads and a widespread assumption about how to handle mknod(). Basically, all container runtimes and systemd live by the slogan "ask for forgiveness not permission" when running user namespace workloads. For mknod() the assumption is that if the syscall succeeds the device nodes are useable irrespective of whether it succeeds in a non-initial user namespace or not. This logic was chosen explicitly to allow for the glorious day when mknod() will actually be able to create fully functional device nodes in user namespaces. A specific problem people are already running into when running 4.18 rc kernels are failing systemd services. For any distro that is run in a container systemd services started with the PrivateDevices= property set will fail to start since the device nodes in question cannot be opened (cf. the arguments in [1]). Full disclosure, Seth made the very sound argument that it is already possible to end up with partially functional device nodes. Any filesystem mounted with MS_NODEV set will allow mknod() to succeed but will not allow open() to succeed. The difference to the case here is that the MS_NODEV case is transparent to userspace since it is an explicitly set mount option while the SB_I_NODEV case is an implicit property enforced by the kernel and hence opaque to userspace. [1]: https://github.com/systemd/systemd/pull/9483 Signed-off-by: Christian Brauner <christian@brauner.io> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-21xfs: reallocate realtime summary cache on growfsOmar Sandoval
At mount time, we allocate m_rsum_cache with the number of realtime bitmap blocks. However, xfs_growfs_rt() can increase the number of realtime bitmap blocks. Using the cache after this happens may access out of the bounds of the cache. Fix it by reallocating the cache in this case. Fixes: 355e3532132b ("xfs: cache minimum realtime summary level") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-21dax: Use non-exclusive wait in wait_entry_unlocked()Dan Williams
get_unlocked_entry() uses an exclusive wait because it is guaranteed to eventually obtain the lock and follow on with an unlock+wakeup cycle. The wait_entry_unlocked() path does not have the same guarantee. Rather than open-code an extra wakeup, just switch to a non-exclusive wait. Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21NFS: nfs_compare_mount_options always compare auth flavors.Chris Perl
This patch removes the check from nfs_compare_mount_options to see if a `sec' option was passed for the current mount before comparing auth flavors and instead just always compares auth flavors. Consider the following scenario: You have a server with the address 192.168.1.1 and two exports /export/a and /export/b. The first export supports `sys' and `krb5' security, the second just `sys'. Assume you start with no mounts from the server. The following results in EIOs being returned as the kernel nfs client incorrectly thinks it can share the underlying `struct nfs_server's: $ mkdir /tmp/{a,b} $ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a $ sudo mount -t nfs -o vers=3 192.168.1.1:/export/b /tmp/b $ df >/dev/null df: ‘/tmp/b’: Input/output error Signed-off-by: Chris Perl <cperl@janestreet.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-12-21Merge tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb3 fix from Steve French: "An important smb3 fix for an regression to some servers introduced by compounding optimization to rmdir. This fix has been tested by multiple developers (including me) with the usual private xfstesting, but also by the new cifs/smb3 "buildbot" xfstest VMs (thank you Ronnie and Aurelien for good work on this automation). The automated testing has been updated so that it will catch problems like this in the future. Note that Pavel discovered (very recently) some unrelated but extremely important bugs in credit handling (smb3 flow control problem that can lead to disconnects/reconnects) when compounding, that I would have liked to send in ASAP but the complete testing of those two fixes may not be done in time and have to wait for 4.21" * tag '4.20-rc7-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: Fix rmdir compounding regression to strict servers
2018-12-21mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNTAl Viro
Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21LSM: new method: ->sb_add_mnt_opt()Al Viro
Adding options to growing mnt_opts. NFS kludge with passing context= down into non-text-options mount switched to it, and with that the last use of ->sb_parse_opts_str() is gone. Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21LSM: hide struct security_mnt_opts from any generic codeAl Viro
Keep void * instead, allocate on demand (in parse_str_opts, at the moment). Eventually both selinux and smack will be better off with private structures with several strings in those, rather than this "counter and two pointers to dynamically allocated arrays" ugliness. This commit allows to do that at leisure, without disrupting anything outside of given module. Changes: * instead of struct security_mnt_opt use an opaque pointer initialized to NULL. * security_sb_eat_lsm_opts(), security_sb_parse_opts_str() and security_free_mnt_opts() take it as var argument (i.e. as void **); call sites are unchanged. * security_sb_set_mnt_opts() and security_sb_remount() take it by value (i.e. as void *). * new method: ->sb_free_mnt_opts(). Takes void *, does whatever freeing that needs to be done. * ->sb_set_mnt_opts() and ->sb_remount() might get NULL as mnt_opts argument, meaning "empty". Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21nfs_remount(): don't leak, don't ignore LSM options quietlyAl Viro
* if mount(2) passes something like "context=foo" with MS_REMOUNT in flags (/sbin/mount.nfs will _not_ do that - you need to issue the syscall manually), you'll get leaked copies for LSM options. The reason is that instead of nfs_{alloc,free}_parsed_mount_data() nfs_remount() uses kzalloc/kfree, which lacks the needed cleanup. * selinux options are not changed on remount (as for any other fs), but in case of NFS the failure is quiet - they are not compared to what we used to have, with complaint in case of attempted changes. Trivially fixed by converting to use of security_sb_remount(). Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21btrfs: sanitize security_mnt_opts useAl Viro
1) keeping a copy in btrfs_fs_info is completely pointless - we never use it for anything. Getting rid of that allows for simpler calling conventions for setup_security_options() (caller is responsible for freeing mnt_opts in all cases). 2) on remount we want to use ->sb_remount(), not ->sb_set_mnt_opts(), same as we would if not for FS_BINARY_MOUNTDATA. Behaviours *are* close (in fact, selinux sb_set_mnt_opts() ought to punt to sb_remount() in "already initialized" case), but let's handle that uniformly. And the only reason why the original btrfs changes didn't go for security_sb_remount() in btrfs_remount() case is that it hadn't been exported. Let's export it for a while - it'll be going away soon anyway. Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()Al Viro
... leaving the "is it kernel-internal" logics in the caller. Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21new helper: security_sb_eat_lsm_opts()Al Viro
combination of alloc_secdata(), security_sb_copy_data(), security_sb_parse_opt_str() and free_secdata(). Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21LSM: lift extracting and parsing LSM options into the caller of ->sb_remount()Al Viro
This paves the way for retaining the LSM options from a common filesystem mount context during a mount parameter parsing phase to be instituted prior to actual mount/reconfiguration actions. Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21LSM: lift parsing LSM options into the caller of ->sb_kern_mount()Al Viro
This paves the way for retaining the LSM options from a common filesystem mount context during a mount parameter parsing phase to be instituted prior to actual mount/reconfiguration actions. Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-12-21iomap: don't search past page end in iomap_is_partially_uptodateEric Sandeen
iomap_is_partially_uptodate() is intended to check wither blocks within the selected range of a not-uptodate page are uptodate; if the range we care about is up to date, it's an optimization. However, the iomap implementation continues to check all blocks up to from+count, which is beyond the page, and can even be well beyond the iop->uptodate bitmap. I think the worst that will happen is that we may eventually find a zero bit and return "not partially uptodate" when it would have otherwise returned true, and skip the optimization. Still, it's clearly an invalid memory access that must be fixed. So: fix this by limiting the search to within the page as is done in the non-iomap variant, block_is_partially_uptodate(). Zorro noticed thiswhen KASAN went off for 512 byte blocks on a 64k page system: BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0 Read of size 8 at addr ffff800120c3a318 by task fsstress/22337 Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-12-20Merge tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS fixes from Richard Weinberger: - Kconfig dependency fixes for our new auth feature - Fix for selecting the right compressor when creating a fs - Bugfix for a bug in UBIFS's O_TMPFILE implementation - Refcounting fixes for UBI * tag 'upstream-4.20-rc7' of git://git.infradead.org/linux-ubifs: ubifs: Handle re-linking of inodes correctly while recovery ubi: Do not drop UBI device reference before using ubi: Put MTD device after it is not used ubifs: Fix default compression selection in ubifs ubifs: Fix memory leak on error condition ubifs: auth: Add CONFIG_KEYS dependency ubifs: CONFIG_UBIFS_FS_AUTHENTICATION should depend on UBIFS_FS ubifs: replay: Fix high stack usage
2018-12-20vfs: Separate changing mount flags full remountDavid Howells
Separate just the changing of mount flags (MS_REMOUNT|MS_BIND) from full remount because the mount data will get parsed with the new fs_context stuff prior to doing a remount - and this causes the syscall to fail under some circumstances. To quote Eric's explanation: [...] mount(..., MS_REMOUNT|MS_BIND, ...) now validates the mount options string, which breaks systemd unit files with ProtectControlGroups=yes (e.g. systemd-networkd.service) when systemd does the following to change a cgroup (v1) mount to read-only: mount(NULL, "/run/systemd/unit-root/sys/fs/cgroup/systemd", NULL, MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_REMOUNT|MS_BIND, NULL) ... when the kernel has CONFIG_CGROUPS=y but no cgroup subsystems enabled, since in that case the error "cgroup1: Need name or subsystem set" is hit when the mount options string is empty. Probably it doesn't make sense to validate the mount options string at all in the MS_REMOUNT|MS_BIND case, though maybe you had something else in mind. This is also worthwhile doing because we will need to add a mount_setattr() syscall to take over the remount-bind function. Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com>
2018-12-20vfs: Suppress MS_* flag defs within the kernel unless explicitly enabledDavid Howells
Only the mount namespace code that implements mount(2) should be using the MS_* flags. Suppress them inside the kernel unless uapi/linux/mount.h is included. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com>
2018-12-20iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"Dave Chinner
This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3. The reverted commit added page reference counting to iomap page structures that are used to track block size < page size state. This was supposed to align the code with page migration page accounting assumptions, but what it has done instead is break XFS filesystems. Every fstests run I've done on sub-page block size XFS filesystems has since picking up this commit 2 days ago has failed with bad page state errors such as: # ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038" .... SECTION -- xfs FSTYP -- xfs (debug) PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+ MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc MOUNT_OPTIONS -- /dev/sdc /mnt/scratch generic/038 454s ... run fstests generic/038 at 2018-12-20 18:43:05 XFS (sdc): Unmounting Filesystem XFS (sdc): Mounting V5 Filesystem XFS (sdc): Ending clean mount BUG: Bad page state in process kswapd0 pfn:3a7fa page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1 flags: 0xfffffc0000000() raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360 raw: 0000000000000001 0000000000000000 00000000ffffffff page dumped because: non-NULL mapping CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 Call Trace: dump_stack+0x67/0x90 bad_page.cold.116+0x8a/0xbd free_pcppages_bulk+0x4bf/0x6a0 free_unref_page_list+0x10f/0x1f0 shrink_page_list+0x49d/0xf50 shrink_inactive_list+0x19d/0x3b0 shrink_node_memcg.constprop.77+0x398/0x690 ? shrink_slab.constprop.81+0x278/0x3f0 shrink_node+0x7a/0x2f0 kswapd+0x34b/0x6d0 ? node_reclaim+0x240/0x240 kthread+0x11f/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x24/0x30 Disabling lock debugging due to kernel taint .... The failures are from anyway that frees pages and empties the per-cpu page magazines, so it's not a predictable failure or an easy to debug failure. generic/038 is a reliable reproducer of this problem - it has a 9 in 10 failure rate on one of my test machines. Failure on other machines have been at random points in fstests runs but every run has ended up tripping this problem. Hence generic/038 was used to bisect the failure because it was the most reliable failure. It is too close to the 4.20 release (not to mention holidays) to try to diagnose, fix and test the underlying cause of the problem, so reverting the commit is the only option we have right now. The revert has been tested against a current tot 4.20-rc7+ kernel across multiple machines running sub-page block size XFs filesystems and none of the bad page state failures have been seen. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-19xfs: stringify scrub types in ftrace outputDarrick J. Wong
Use __print_symbolic to print the scrub type in ftrace output. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>