summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
AgeCommit message (Collapse)Author
2016-03-26Merge tag 'ofs-pull-tag-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs filesystem from Mike Marshall. This finally merges the long-pending orangefs filesystem, which has been much cleaned up with input from Al Viro over the last six months. From the documentation file: "OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal for large storage problems faced by HPC, BigData, Streaming Video, Genomics, Bioinformatics. Orangefs, originally called PVFS, was first developed in 1993 by Walt Ligon and Eric Blumer as a parallel file system for Parallel Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns of parallel programs. Orangefs features include: - Distributes file data among multiple file servers - Supports simultaneous access by multiple clients - Stores file data and metadata on servers using local file system and access methods - Userspace implementation is easy to install and maintain - Direct MPI support - Stateless" see Documentation/filesystems/orangefs.txt for more in-depth details. * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits) orangefs: fix orangefs_superblock locking orangefs: fix do_readv_writev() handling of error halfway through orangefs: have ->kill_sb() evict the VFS side of things first orangefs: sanitize ->llseek() orangefs-bufmap.h: trim unused junk orangefs: saner calling conventions for getting a slot orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer orangefs: get rid of readdir_handle_s ornagefs: ensure that truncate has an up to date inode size orangefs: move code which sets i_link to orangefs_inode_getattr orangefs: remove needless wrapper around GFP_KERNEL orangefs: remove wrapper around mutex_lock(&inode->i_mutex) orangefs: refactor inode type or link_target change detection orangefs: use new getattr for revalidate and remove old getattr orangefs: use new getattr in inode getattr and permission orangefs: use new orangefs_inode_getattr to get size in write and llseek orangefs: use new orangefs_inode_getattr to create new inodes orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr orangefs: remove inode->i_lock wrapper orangefs: put register_chrdev immediately before register_filesystem ...
2016-03-24Merge tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull more nfsd updates from Bruce Fields: "Apologies for the previous request, which omitted the top 8 commits from my for-next branch (including the SCSI layout commits). Thanks to Trond for spotting my error!" This actually includes the new layout types, so here's that part of the pull message repeated: "Support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. Note this pull request also includes the client side of SCSI layout, with Trond's permission" * tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux: nfsd: use short read as well as i_size to set eof nfsd: better layoutupdate bounds-checking nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK nfsd: add SCSI layout support nfsd: move some blocklayout code nfsd: add a new config option for the block layout driver nfs/blocklayout: add SCSI layout support nfs4.h: add SCSI layout definitions
2016-03-22fat: add config option to set UTF-8 mount option by defaultMaciej S. Szmigiero
FAT has long supported its own default file name encoding config setting, separate from CONFIG_NLS_DEFAULT. However, if UTF-8 encoded file names are desired FAT character set should not be set to utf8 since this would make file names case sensitive even if case insensitive matching is requested. Instead, "utf8" mount options should be provided to enable UTF-8 file names in FAT file system. Unfortunately, there was no possibility to set the default value of this option so on UTF-8 system "utf8" mount option had to be added manually to most FAT mounts. This patch adds config option to set such default value. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22ocfs2: add feature document for online file checkGang He
This document will describe OCFS2 online file check feature. OCFS2 is often used in high-availaibility systems. However, OCFS2 usually converts the filesystem to read-only when encounters an error. This may not be necessary, since turning the filesystem read-only would affect other running processes as well, decreasing availability. Then, a mount option (errors=continue) is introduced, which would return the -EIO errno to the calling process and terminate furhter processing so that the filesystem is not corrupted further. The filesystem is not converted to read-only, and the problematic file's inode number is reported in the kernel log. The user can try to check/fix this file via online filecheck feature. Signed-off-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-21Merge branch 'for-linus-4.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "We have a good sized cleanup of our internal read ahead code, and the first series of commits from Chandan to enable PAGE_SIZE > sectorsize Otherwise, it's a normal series of cleanups and fixes, with many thanks to Dave Sterba for doing most of the patch wrangling this time" * 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (82 commits) btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums btrfs: Fix misspellings in comments. btrfs: Print Warning only if ENOSPC_DEBUG is enabled btrfs: scrub: silence an uninitialized variable warning btrfs: move btrfs_compression_type to compression.h btrfs: rename btrfs_print_info to btrfs_print_mod_info Btrfs: Show a warning message if one of objectid reaches its highest value Documentation: btrfs: remove usage specific information btrfs: use kbasename in btrfsic_mount Btrfs: do not collect ordered extents when logging that inode exists Btrfs: fix race when checking if we can skip fsync'ing an inode Btrfs: fix listxattrs not listing all xattrs packed in the same item Btrfs: fix deadlock between direct IO reads and buffered writes Btrfs: fix extent_same allowing destination offset beyond i_size Btrfs: fix file loss on log replay after renaming a file and fsync Btrfs: fix unreplayable log after snapshot delete + parent dir fsync Btrfs: fix lockdep deadlock warning due to dev_replace btrfs: drop unused argument in btrfs_ioctl_get_supported_features btrfs: add GET_SUPPORTED_FEATURES to the control device ioctls btrfs: change max_inline default to 2048 ...
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-18nfsd: add SCSI layout supportChristoph Hellwig
This is a simple extension to the block layout driver to use SCSI persistent reservations for access control and fencing, as well as SCSI VPD pages for device identification. For this we need to pass the nfs4_client to the proc_getdeviceinfo method to generate the reservation key, and add a new fence_client method to allow for fence actions in the layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-17Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfsLinus Torvalds
Pull configfs updates from Christoph Hellwig: - A large patch from me to simplify setting up the list of default groups by actually implementing it as a list instead of an array. - a small Y2083 prep patch from Deepa Dinamani. Probably doesn't matter on it's own, but it seems like he is trying to get rid of all CURRENT_TIME uses in file systems, which is a worthwhile goal. * tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs: configfs: switch ->default groups to a linked list configfs: Replace CURRENT_TIME by current_fs_time()
2016-03-17proc: add /proc/<pid>/timerslack_ns interfaceJohn Stultz
This patch provides a proc/PID/timerslack_ns interface which exposes a task's timerslack value in nanoseconds and allows it to be changed. This allows power/performance management software to set timer slack for other threads according to its policy for the thread (such as when the thread is designated foreground vs. background activity) If the value written is non-zero, slack is set to that value. Otherwise sets it to the default for the thread. This interface checks that the calling task has permissions to to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure arbitrary apps do not change the timer slack for other apps. Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17Merge tag 'tty-4.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here's the big tty/serial driver pull request for 4.6-rc1. Lots of changes in here, Peter has been on a tear again, with lots of refactoring and bugs fixes, many thanks to the great work he has been doing. Lots of driver updates and fixes as well, full details in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'tty-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (220 commits) serial: 8250: describe CONFIG_SERIAL_8250_RSA serial: samsung: optimize UART rx fifo access routine serial: pl011: add mark/space parity support serial: sa1100: make sa1100_register_uart_fns a function tty: serial: 8250: add MOXA Smartio MUE boards support serial: 8250: convert drivers to use up_to_u8250p() serial: 8250/mediatek: fix building with SERIAL_8250=m serial: 8250/ingenic: fix building with SERIAL_8250=m serial: 8250/uniphier: fix modular build Revert "drivers/tty/serial: make 8250/8250_ingenic.c explicitly non-modular" Revert "drivers/tty/serial: make 8250/8250_mtk.c explicitly non-modular" serial: mvebu-uart: initial support for Armada-3700 serial port serial: mctrl_gpio: Add missing module license serial: ifx6x60: avoid uninitialized variable use tty/serial: at91: fix bad offset for UART timeout register tty/serial: at91: restore dynamic driver binding serial: 8250: Add hardware dependency to RT288X option TTY, devpts: document pty count limiting tty: goldfish: support platform_device with id -1 drivers: tty: goldfish: Add device tree bindings ...
2016-03-17Merge tag 'docs-for-linus' of git://git.lwn.net/linuxLinus Torvalds
Pul documentation update from Jon Corbet: "Another relatively boring cycle for the docs tree: typo fixes, translation updates, etc" * tag 'docs-for-linus' of git://git.lwn.net/linux: modsign: Fix documentation on module signing enforcement parameter. Doc: nfs: Fix typos in Documentation/filesystems/nfs Documentation: kselftest: Remove duplicate word doc: fix grammar Documentation: Howto: Fixed subtitles style Doc: ARM: Fix a typo in clksrc-change-registers.awk Documentation/ko_KR: update maintainer information Documentation: Fix int/unsigned int comparison Documentation: Chinese translation of arm64/silicon-errata.txt Documentation:Update Documentation/zh_CN/arm64/booting.txt Documentation: HOWTO: remove obsolete info about regression postings Doc: ja_JP: Fix a typo in HOWTO Doc: i2c: Fix typo in Documentation/i2c Doc: DocBook: Fix a typo in device-drivers.tmpl Remove "arch" usage in Documentation/features/list-arch.sh README: cosmetic fixes Documentation/CodingStyle: add space before parenthesis in example macro SubmittingPatches: fix spelling of "git send-email"
2016-03-14Orangefs: merge to v4.5Mike Marshall
Merge tag 'v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into current Linux 4.5
2016-03-11Documentation: btrfs: remove usage specific informationDavid Sterba
The document in the kernel sources is yet another palce where the documentation would need to be updated, while it is not the primary source. We actively maintain the wiki pages. Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-09Doc: nfs: Fix typos in Documentation/filesystems/nfsMasanari Iida
This patch fix spelling typos found in Documentation/filesystems/nfs Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-03-09doc: fix grammarJavi Merino
Some minor typos: - make is unbindable -> make it unbindable - a underlying -> an underlying - different version -> different versions Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-03-07TTY, devpts: document pty count limitingKonstantin Khlebnikov
Logic has been changed in kernel 3.4 by commit e9aba5158a80 ("tty: rework pty count limiting") but still not documented. Sysctl kernel.pty.max works as global limit, kernel.pty.reserve ptys are reserved for initial devpts instance (mounted without "newinstance"). Per-instance limit also could be set by mount option "max=%d". Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-06configfs: switch ->default groups to a linked listChristoph Hellwig
Replace the current NULL-terminated array of default groups with a linked list. This gets rid of lots of nasty code to size and/or dynamically allocate the array. While we're at it also provide a conveniant helper to remove the default groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Felipe Balbi <balbi@kernel.org> [drivers/usb/gadget] Acked-by: Joel Becker <jlbec@evilplan.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
2016-03-01Merge tag 'for-chris' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.6 Btrfs patchsets for 4.6
2016-02-26Orangefs: update orangefs.txtMike Marshall
Al Viro has cleaned up the way ops are processed and waited for, now orangefs.txt has an overview of how it works. Several recent related commits have added to the comments in the code as well. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-02-12btrfs: Introduce new mount option to disable tree log replayQu Wenruo
Introduce a new mount option "nologreplay" to co-operate with "ro" mount option to get real readonly mount, like "norecovery" in ext* and xfs. Since the new parse_options() need to check new flags at remount time, so add a new parameter for parse_options(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-12btrfs: Introduce new mount option usebackuproot to replace recoveryQu Wenruo
Current "recovery" mount option will only try to use backup root. However the word "recovery" is too generic and may be confusing for some users. Here introduce a new and more specific mount option, "usebackuproot" to replace "recovery" mount option. "Recovery" will be kept for compatibility reason, but will be deprecated. Also, since "usebackuproot" will only affect mount behavior and after open_ctree() it has nothing to do with the filesystem, so clear the flag after mount succeeded. This provides the basis for later unified "norecovery" mount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ dropped usebackuproot from show_mount, added note about 'recovery' to docs ] Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-10efi: Make efivarfs entries immutable by defaultPeter Jones
"rm -rf" is bricking some peoples' laptops because of variables being used to store non-reinitializable firmware driver data that's required to POST the hardware. These are 100% bugs, and they need to be fixed, but in the mean time it shouldn't be easy to *accidentally* brick machines. We have to have delete working, and picking which variables do and don't work for deletion is quite intractable, so instead make everything immutable by default (except for a whitelist), and make tools that aren't quite so broad-spectrum unset the immutable flag. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
2016-02-03mm: polish virtual memory accountingKonstantin Khlebnikov
* add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture * always account VMAs with flag VM_STACK as stack (as it was before) * cleanup classifying helpers * update comments and documentation Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03proc: revert /proc/<pid>/maps [stack:TID] annotationJohannes Weiner
Commit b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps. Finding the task of a stack VMA requires walking the entire thread list, turning this into quadratic behavior: a thousand threads means a thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a million combinations. The cost is not in proportion to the usefulness as described in the patch. Drop the [stack:TID] annotation to make /proc/<pid>/maps (and /proc/<pid>/numa_maps) usable again for higher thread counts. The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as identifying the stack VMA there is an O(1) operation. Siddesh said: "The end users needed a way to identify thread stacks programmatically and there wasn't a way to do that. I'm afraid I no longer remember (or have access to the resources that would aid my memory since I changed employers) the details of their requirement. However, I did do this on my own time because I thought it was an interesting project for me and nobody really gave any feedback then as to its utility, so as far as I am concerned you could roll back the main thread maps information since the information is available in the thread-specific files" Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20Documentation/filesystems/vfat.txt: update the limitation for fat fallocateNamjae Jeon
Update the limitation for fat fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-17Merge tag 'docs-4.5' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jon Corbet: "A relatively boring cycle in the docs tree. There's a few kernel-doc fixes and various document tweaks. One patch reaches out of the documentation subtree to fix a comment in init/do_mounts_rd.c. There didn't seem to be anybody more appropriate to take that one, so I accepted it" * tag 'docs-4.5' of git://git.lwn.net/linux: (29 commits) thermal: add description for integral_cutoff unit Documentation: update libhugetlbfs site url Documentation: Explain pci=conf1,conf2 more verbosely DMA-API: fix confusing sentence in Documentation/DMA-API.txt Documentation: translations: update linux cross reference link Documentation: fix typo in CodingStyle init, Documentation: Remove ramdisk_blocksize mentions Documentation-getdelays: Apply a recommendation from "checkpatch.pl" in main() Documentation: HOWTO: update versions from 3.x to 4.x Documentation: remove outdated references from translations Doc: treewide: Fix grammar "a" to "an" Documentation: cpu-hotplug: Fix sysfs mount instructions can-doc: Add hint about getting timestamps Fix CFQ I/O scheduler parameter name in documentation Documentation: arm: remove dead links from Marvell Berlin docs Documentation: HOWTO: update code cross reference link Doc: Docbook/iio: Fix typo in iio.tmpl DocBook: make index.html generation less verbose by default DocBook: Cleanup: remove an unused $(call) line DocBook: Add a help message for DOCBOOKS env var ...
2016-01-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - A few hotfixes which missed 4.4 becasue I was asleep. cc'ed to -stable - A few misc fixes - OCFS2 updates - Part of MM. Including pretty large changes to page-flags handling and to thp management which have been buffered up for 2-3 cycles now. I have a lot of MM material this time. [ It turns out the THP part wasn't quite ready, so that got dropped from this series - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) zsmalloc: reorganize struct size_class to pack 4 bytes hole mm/zbud.c: use list_last_entry() instead of list_tail_entry() zram/zcomp: do not zero out zcomp private pages zram: pass gfp from zcomp frontend to backend zram: try vmalloc() after kmalloc() zram/zcomp: use GFP_NOIO to allocate streams mm: add tracepoint for scanning pages drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 mm/page_isolation: use macro to judge the alignment mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE() mm: rework virtual memory accounting include/linux/memblock.h: fix ordering of 'flags' argument in comments mm: move lru_to_page to mm_inline.h Documentation/filesystems: describe the shared memory usage/accounting memory-hotplug: don't BUG() in register_memory_resource() hugetlb: make mm and fs code explicitly non-modular mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd() mm: make sure isolate_lru_page() is never called for tail page vmstat: make vmstat_updater deferrable again and shut down on idle ...
2016-01-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fix from Al Viro: "Don't put symlink bodies in pagecache into highmem" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Make sure that highmem pages are not added to symlink page cache
2016-01-14Documentation/filesystems: describe the shared memory usage/accountingRodrigo Freire
The Shared Memory accounting support is present in Kernel since commit 4b02108ac1b3 ("mm: oom analysis: add shmem vmstat") and in userland free(1) since 2014. This patch updates the Documentation to reflect this change. Signed-off-by: Rodrigo Freire <rfreire@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm, procfs: breakdown RSS for anon, shmem and file in /proc/pid/statusJerome Marchand
There are several shortcomings with the accounting of shared memory (SysV shm, shared anonymous mapping, mapping of a tmpfs file). The values in /proc/<pid>/status and <...>/statm don't allow to distinguish between shmem memory and a shared mapping to a regular file, even though theirs implication on memory usage are quite different: during reclaim, file mapping can be dropped or written back on disk, while shmem needs a place in swap. Also, to distinguish the memory occupied by anonymous and file mappings, one has to read the /proc/pid/statm file, which has a field for the file mappings (again, including shmem) and total memory occupied by these mappings (i.e. equivalent to VmRSS in the <...>/status file. Getting the value for anonymous mappings only is thus not exactly user-friendly (the statm file is intended to be rather efficiently machine-readable). To address both of these shortcomings, this patch adds a breakdown of VmRSS in /proc/<pid>/status via new fields RssAnon, RssFile and RssShmem, making use of the previous preparatory patch. These fields tell the user the memory occupied by private anonymous pages, mapped regular files and shmem, respectively. Other existing fields in /status and /statm files are left without change. The /statm file can be extended in the future, if there's a need for that. Example (part of) /proc/pid/status output including the new Rss* fields: VmPeak: 2001008 kB VmSize: 2001004 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 5108 kB VmRSS: 5108 kB RssAnon: 92 kB RssFile: 1324 kB RssShmem: 3692 kB VmData: 192 kB VmStk: 136 kB VmExe: 4 kB VmLib: 1784 kB VmPTE: 3928 kB VmPMD: 20 kB VmSwap: 0 kB HugetlbPages: 0 kB [vbabka@suse.cz: forward-porting, tweak changelog] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm, proc: account for shmem swap in /proc/pid/smapsVlastimil Babka
Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed mappings, even if the mapped portion does contain pages that were swapped out. This is because unlike private anonymous mappings, shmem does not change pte to swap entry, but pte_none when swapping the page out. In the smaps page walk, such page thus looks like it was never faulted in. This patch changes smaps_pte_entry() to determine the swap status for such pte_none entries for shmem mappings, similarly to how mincore_page() does it. Swapped out shmem pages are thus accounted for. For private mappings of tmpfs files that COWed some of the pages, swaped out status of the original shmem pages is naturally ignored. If some of the private copies was also swapped out, they are accounted via their page table swap entries, so the resulting reported swap usage is then a sum of both swapped out private copies, and swapped out shmem pages that were not COWed. No double accounting can thus happen. The accounting is arguably still not as precise as for private anonymous mappings, since now we will count also pages that the process in question never accessed, but another process populated them and then let them become swapped out. I believe it is still less confusing and subtle than not showing any swap usage by shmem mappings at all. Swapped out counter might of interest of users who would like to prevent from future swapins during performance critical operation and pre-fault them at their convenience. Especially for larger swapped out regions the cost of swapin is much higher than a fresh page allocation. So a differentiation between pte_none vs. swapped out is important for those usecases. One downside of this patch is that it makes /proc/pid/smaps more expensive for shmem mappings, as we consult the radix tree for each pte_none entry, so the overal complexity is O(n*log(n)). I have measured this on a process that creates a 2GB mapping and dirties single pages with a stride of 2MB, and time how long does it take to cat /proc/pid/smaps of this process 100 times. Private anonymous mapping: real 0m0.949s user 0m0.116s sys 0m0.348s Mapping of a /dev/shm/file: real 0m3.831s user 0m0.180s sys 0m3.212s The difference is rather substantial, so the next patch will reduce the cost for shared or read-only mappings. In a less controlled experiment, I've gathered pids of processes on my desktop that have either '/dev/shm/*' or 'SYSV*' in smaps. This included the Chrome browser and some KDE processes. Again, I've run cat /proc/pid/smaps on each 100 times. Before this patch: real 0m9.050s user 0m0.518s sys 0m8.066s After this patch: real 0m9.221s user 0m0.541s sys 0m8.187s This suggests low impact on average systems. Note that this patch doesn't attempt to adjust the SwapPss field for shmem mappings, which would need extra work to determine who else could have the pages mapped. Thus the value stays zero except for COWed swapped out pages in a shmem mapping, which are accounted as usual. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm, documentation: clarify /proc/pid/status VmSwap limitations for shmemVlastimil Babka
This series is based on Jerome Marchand's [1] so let me quote the first paragraph from there: There are several shortcomings with the accounting of shared memory (sysV shm, shared anonymous mapping, mapping to a tmpfs file). The values in /proc/<pid>/status and statm don't allow to distinguish between shmem memory and a shared mapping to a regular file, even though their implications on memory usage are quite different: at reclaim, file mapping can be dropped or written back on disk while shmem needs a place in swap. As for shmem pages that are swapped-out or in swap cache, they aren't accounted at all. The original motivation for myself is that a customer found (IMHO rightfully) confusing that e.g. top output for process swap usage is unreliable with respect to swapped out shmem pages, which are not accounted for. The fundamental difference between private anonymous and shmem pages is that the latter has PTE's converted to pte_none, and not swapents. As such, they are not accounted to the number of swapents visible e.g. in /proc/pid/status VmSwap row. It might be theoretically possible to use swapents when swapping out shmem (without extra cost, as one has to change all mappers anyway), and on swap in only convert the swapent for the faulting process, leaving swapents in other processes until they also fault (so again no extra cost). But I don't know how many assumptions this would break, and it would be too disruptive change for a relatively small benefit. Instead, my approach is to document the limitation of VmSwap, and provide means to determine the swap usage for shmem areas for those who are interested and willing to pay the price, using /proc/pid/smaps. Because outside of ipcs, I don't think it's possible to currently to determine the usage at all. The previous patchset [1] did introduce new shmem-specific fields into smaps output, and functions to determine the values. I take a simpler approach, noting that smaps output already has a "Swap: X kB" line, where currently X == 0 always for shmem areas. I think we can just consider this a bug and provide the proper value by consulting the radix tree, as e.g. mincore_page() does. In the patch changelog I explain why this is also not perfect (and cannot be without swapents), but still arguably much better than showing a 0. The last two patches are adapted from Jerome's patchset and provide a VmRSS breakdown to RssAnon, RssFile and RssShm in /proc/pid/status. Hugh noted that this is a welcome addition, and I agree that it might help e.g. debugging process memory usage at albeit non-zero, but still rather low cost of extra per-mm counter and some page flag checks. [1] http://lwn.net/Articles/611966/ This patch (of 6): The documentation for /proc/pid/status does not mention that the value of VmSwap counts only swapped out anonymous private pages, and not swapped out pages of the underlying shmem objects (for shmem mappings). This is not obvious, so document this limitation. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14Make sure that highmem pages are not added to symlink page cacheAl Viro
inode_nohighmem() is sufficient to make sure that page_get_link() won't try to allocate a highmem page. Moreover, it is sufficient to make sure that page_symlink/__page_symlink won't do the same thing. However, any filesystem that manually preseeds the symlink's page cache upon symlink(2) needs to make sure that the page it inserts there won't be a highmem one. Fortunately, only nfs and shmem have run afoul of that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-14Documentation: update libhugetlbfs site urlSeongJae Park
The site for libhugetlbfs has moved from sourceforge to github. This commit updates the old url. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-01-13Merge tag 'for-f2fs-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series adds two ioctls to control cached data and fragmented files. Most of the rest fixes missing error cases and bugs that we have not covered so far. Summary: Enhancements: - support an ioctl to execute online file defragmentation - support an ioctl to flush cached data - speed up shrinking of extent_cache entries - handle broken superblock - refector dirty inode management infra - revisit f2fs_map_blocks to handle more cases - reduce global lock coverage - add detecting user's idle time Major bug fixes: - fix data race condition on cached nat entries - fix error cases of volatile and atomic writes" * tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (87 commits) f2fs: should unset atomic flag after successful commit f2fs: fix wrong memory condition check f2fs: monitor the number of background checkpoint f2fs: detect idle time depending on user behavior f2fs: introduce time and interval facility f2fs: skip releasing nodes in chindless extent tree f2fs: use atomic type for node count in extent tree f2fs: recognize encrypted data in f2fs_fiemap f2fs: clean up f2fs_balance_fs f2fs: remove redundant calls f2fs: avoid unnecessary f2fs_balance_fs calls f2fs: check the page status filled from disk f2fs: introduce __get_node_page to reuse common code f2fs: check node id earily when readaheading node page f2fs: read isize while holding i_mutex in fiemap Revert "f2fs: check the node block address of newly allocated nid" f2fs: cover more area with nat_tree_lock f2fs: introduce max_file_blocks in sbi f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink f2fs: introduce zombie list for fast shrinking extent trees ...
2016-01-13Orangefs: add protocol information to Documentation/filesystems/orangefs.txtMike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-01-12Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfsLinus Torvalds
Pull configfs updates from Christoph Hellwig: "I'm assisting Joel as co-maintainer and patch monkey now, and you will see pull reuquests from me for a while. Besides the MAINTAINERS update there is just a single change, which adds support for binary attributes to configfs, which are very similar to the sysfs binary attributes. Thanks to Pantelis Antoniou! You will see another actually bigger set of configfs changes in the SCSI target pull from Nic - those were merged before this new tree even existed" * tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs: configfs: add myself as co-maintainer, updated git tree configfs: implement binary attributes
2016-01-04configfs: implement binary attributesPantelis Antoniou
ConfigFS lacked binary attributes up until now. This patch introduces support for binary attributes in a somewhat similar manner of sysfs binary attributes albeit with changes that fit the configfs usage model. Problems that configfs binary attributes fix are everything that requires a binary blob as part of the configuration of a resource, such as bitstream loading for FPGAs, DTBs for dynamically created devices etc. Look at Documentation/filesystems/configfs/configfs.txt for internals and howto use them. This patch is against linux-next as of today that contains Christoph's configfs rework. Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com> [hch: folded a fix from Geert Uytterhoeven <geert+renesas@glider.be>] [hch: a few tiny updates based on review feedback] Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-12-30switch ->get_link() to delayed_call, kill ->put_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-16f2fs: introduce new option for controlling data flushChao Yu
Add a new option 'data_flush' to enable data flush functionality. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-10Doc: treewide: Fix grammar "a" to "an"Masanari Iida
This patch fix some grammar mistake. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-12-08replace ->follow_link() with new method that could stay in RCU modeAl Viro
new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08don't put symlink bodies in pagecache into highmemAl Viro
kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-04Doc: f2fs: Fix typos in Documentation/filesystems/f2fs.txtMasanari Iida
This patch fix some typos in Documentation/filesystems/f2fs.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-11-16Orangefs: Merge tag 'v4.4-rc1' into for-nextMike Marshall
Linux 4.4-rc1
2015-11-13Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "This series contains HCH's changes to absorb configfs attribute ->show() + ->store() function pointer usage from it's original tree-wide consumers, into common configfs code. It includes usb-gadget, target w/ drivers, netconsole and ocfs2 changes to realize the improved simplicity, that now renders the original include/target/configfs_macros.h CPP magic for fabric drivers and others, unnecessary and obsolete. And with common code in place, new configfs attributes can be added easier than ever before. Note, there are further improvements in-flight from other folks for v4.5 code in configfs land, plus number of target fixes for post -rc1 code" In the meantime, a new user of the now-removed old configfs API came in through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices"). This merge resolution comes from Alexander Shishkin, who updated his stm class tracing abstraction to account for the removal of the old show_attribute and store_attribute methods in commit 517982229f78 ("configfs: remove old API") from this pull. As Alexander says about that patch: "There's no need to keep an extra wrapper structure per item and the awkward show_attribute/store_attribute item ops are no longer needed. This patch converts policy code to the new api, all the while making the code quite a bit smaller and easier on the eyes. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>" That patch was folded into the merge so that the tree should be fully bisectable. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits) configfs: remove old API ocfs2/cluster: use per-attribute show and store methods ocfs2/cluster: move locking into attribute store methods netconsole: use per-attribute show and store methods target: use per-attribute show and store methods spear13xx_pcie_gadget: use per-attribute show and store methods dlm: use per-attribute show and store methods usb-gadget/f_serial: use per-attribute show and store methods usb-gadget/f_phonet: use per-attribute show and store methods usb-gadget/f_obex: use per-attribute show and store methods usb-gadget/f_uac2: use per-attribute show and store methods usb-gadget/f_uac1: use per-attribute show and store methods usb-gadget/f_mass_storage: use per-attribute show and store methods usb-gadget/f_sourcesink: use per-attribute show and store methods usb-gadget/f_printer: use per-attribute show and store methods usb-gadget/f_midi: use per-attribute show and store methods usb-gadget/f_loopback: use per-attribute show and store methods usb-gadget/ether: use per-attribute show and store methods usb-gadget/f_acm: use per-attribute show and store methods usb-gadget/f_hid: use per-attribute show and store methods ...
2015-11-13Merge tag '4.4-additional' of git://git.lwn.net/linuxLinus Torvalds
Pull more documentation updates from Jon Corbet: "A few more documentation patches that wandered in and have no reason to wait; these include some improvements to the suggestions for email clients and patch submission" * tag '4.4-additional' of git://git.lwn.net/linux: Documentation: Add minimal Mutt config for using Gmail Documentation: Add note on sending files directly with Mutt Documentation: dontdiff: remove media from dontdiff Documentation/SubmittingPatches: discuss In-Reply-To Remove email address from Documentation/filesystems/overlayfs.txt can-doc: Add missing semicolon to example
2015-11-11Remove email address from Documentation/filesystems/overlayfs.txtNeilBrown
I'm getting a surprising large number of questions about overlayfs sent to me personally, rather than to a relevant mailing list. So remove my email address from the documentation, and add a note about looking in the MAINTAINERS file. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-11-10Merge tag 'libnvdimm-for-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Outside of the new ACPI-NFIT hot-add support this pull request is more notable for what it does not contain, than what it does. There were a handful of development topics this cycle, dax get_user_pages, dax fsync, and raw block dax, that need more more iteration and will wait for 4.5. The patches to make devm and the pmem driver NUMA aware have been in -next for several weeks. The hot-add support has not, but is contained to the NFIT driver and is passing unit tests. The coredump support is straightforward and was looked over by Jeff. All of it has received a 0day build success notification across 107 configs. Summary: - Add support for the ACPI 6.0 NFIT hot add mechanism to process updates of the NFIT at runtime. - Teach the coredump implementation how to filter out DAX mappings. - Introduce NUMA hints for allocations made by the pmem driver, and as a side effect all devm allocations now hint their NUMA node by default" * tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: coredump: add DAX filtering for FDPIC ELF coredumps coredump: add DAX filtering for ELF coredumps acpi: nfit: Add support for hot-add nfit: in acpi_nfit_init, break on a 0-length table pmem, memremap: convert to numa aware allocations devm_memremap_pages: use numa_mem_id devm: make allocations numa aware by default devm_memremap: convert to return ERR_PTR devm_memunmap: use devres_release() pmem: kill memremap_pmem() x86, mm: quiet arch_add_memory()
2015-11-09Merge tag 'gfs2-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "Here is a list of patches we've accumulated for GFS2 for the current upstream merge window. There are only six patches this time: 1. A cleanup patch from Andreas to remove the gl_spin #define in favor of its value for the sake of clarity. 2. A fix from Andy Price to mark the inode dirty during fallocate. 3. A fix from Andy Price to set s_mode on mount failures to prevent a stack trace. 4 A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data due to uninitialized storage. 5. A patch from me to protecting our freeing of the in-core directory hash table to prevent double-free. 6. A fix for a page/block rounding problem that resulted in a metadata coherency problem when the block size != page size" I've got a lot more patches in various stages of review and testing, but I'm afraid they'll have to wait until the next merge window. So next time we're likely to have a lot more" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Fix rgrp end rounding problem for bsize < page size GFS2: Protect freeing directory hash table with i_lock spin_lock gfs2: Remove gl_spin define gfs2: Add missing else in trans_add_meta/data GFS2: Set s_mode before parsing mount options GFS2: fallocate: do not rely on file_update_time to mark the inode dirty