summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
AgeCommit message (Collapse)Author
2016-01-20Documentation/filesystems/vfat.txt: update the limitation for fat fallocateNamjae Jeon
Update the limitation for fat fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-17Merge tag 'docs-4.5' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jon Corbet: "A relatively boring cycle in the docs tree. There's a few kernel-doc fixes and various document tweaks. One patch reaches out of the documentation subtree to fix a comment in init/do_mounts_rd.c. There didn't seem to be anybody more appropriate to take that one, so I accepted it" * tag 'docs-4.5' of git://git.lwn.net/linux: (29 commits) thermal: add description for integral_cutoff unit Documentation: update libhugetlbfs site url Documentation: Explain pci=conf1,conf2 more verbosely DMA-API: fix confusing sentence in Documentation/DMA-API.txt Documentation: translations: update linux cross reference link Documentation: fix typo in CodingStyle init, Documentation: Remove ramdisk_blocksize mentions Documentation-getdelays: Apply a recommendation from "checkpatch.pl" in main() Documentation: HOWTO: update versions from 3.x to 4.x Documentation: remove outdated references from translations Doc: treewide: Fix grammar "a" to "an" Documentation: cpu-hotplug: Fix sysfs mount instructions can-doc: Add hint about getting timestamps Fix CFQ I/O scheduler parameter name in documentation Documentation: arm: remove dead links from Marvell Berlin docs Documentation: HOWTO: update code cross reference link Doc: Docbook/iio: Fix typo in iio.tmpl DocBook: make index.html generation less verbose by default DocBook: Cleanup: remove an unused $(call) line DocBook: Add a help message for DOCBOOKS env var ...
2016-01-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - A few hotfixes which missed 4.4 becasue I was asleep. cc'ed to -stable - A few misc fixes - OCFS2 updates - Part of MM. Including pretty large changes to page-flags handling and to thp management which have been buffered up for 2-3 cycles now. I have a lot of MM material this time. [ It turns out the THP part wasn't quite ready, so that got dropped from this series - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) zsmalloc: reorganize struct size_class to pack 4 bytes hole mm/zbud.c: use list_last_entry() instead of list_tail_entry() zram/zcomp: do not zero out zcomp private pages zram: pass gfp from zcomp frontend to backend zram: try vmalloc() after kmalloc() zram/zcomp: use GFP_NOIO to allocate streams mm: add tracepoint for scanning pages drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 mm/page_isolation: use macro to judge the alignment mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE() mm: rework virtual memory accounting include/linux/memblock.h: fix ordering of 'flags' argument in comments mm: move lru_to_page to mm_inline.h Documentation/filesystems: describe the shared memory usage/accounting memory-hotplug: don't BUG() in register_memory_resource() hugetlb: make mm and fs code explicitly non-modular mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd() mm: make sure isolate_lru_page() is never called for tail page vmstat: make vmstat_updater deferrable again and shut down on idle ...
2016-01-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fix from Al Viro: "Don't put symlink bodies in pagecache into highmem" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Make sure that highmem pages are not added to symlink page cache
2016-01-14Documentation/filesystems: describe the shared memory usage/accountingRodrigo Freire
The Shared Memory accounting support is present in Kernel since commit 4b02108ac1b3 ("mm: oom analysis: add shmem vmstat") and in userland free(1) since 2014. This patch updates the Documentation to reflect this change. Signed-off-by: Rodrigo Freire <rfreire@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm, procfs: breakdown RSS for anon, shmem and file in /proc/pid/statusJerome Marchand
There are several shortcomings with the accounting of shared memory (SysV shm, shared anonymous mapping, mapping of a tmpfs file). The values in /proc/<pid>/status and <...>/statm don't allow to distinguish between shmem memory and a shared mapping to a regular file, even though theirs implication on memory usage are quite different: during reclaim, file mapping can be dropped or written back on disk, while shmem needs a place in swap. Also, to distinguish the memory occupied by anonymous and file mappings, one has to read the /proc/pid/statm file, which has a field for the file mappings (again, including shmem) and total memory occupied by these mappings (i.e. equivalent to VmRSS in the <...>/status file. Getting the value for anonymous mappings only is thus not exactly user-friendly (the statm file is intended to be rather efficiently machine-readable). To address both of these shortcomings, this patch adds a breakdown of VmRSS in /proc/<pid>/status via new fields RssAnon, RssFile and RssShmem, making use of the previous preparatory patch. These fields tell the user the memory occupied by private anonymous pages, mapped regular files and shmem, respectively. Other existing fields in /status and /statm files are left without change. The /statm file can be extended in the future, if there's a need for that. Example (part of) /proc/pid/status output including the new Rss* fields: VmPeak: 2001008 kB VmSize: 2001004 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 5108 kB VmRSS: 5108 kB RssAnon: 92 kB RssFile: 1324 kB RssShmem: 3692 kB VmData: 192 kB VmStk: 136 kB VmExe: 4 kB VmLib: 1784 kB VmPTE: 3928 kB VmPMD: 20 kB VmSwap: 0 kB HugetlbPages: 0 kB [vbabka@suse.cz: forward-porting, tweak changelog] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm, proc: account for shmem swap in /proc/pid/smapsVlastimil Babka
Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed mappings, even if the mapped portion does contain pages that were swapped out. This is because unlike private anonymous mappings, shmem does not change pte to swap entry, but pte_none when swapping the page out. In the smaps page walk, such page thus looks like it was never faulted in. This patch changes smaps_pte_entry() to determine the swap status for such pte_none entries for shmem mappings, similarly to how mincore_page() does it. Swapped out shmem pages are thus accounted for. For private mappings of tmpfs files that COWed some of the pages, swaped out status of the original shmem pages is naturally ignored. If some of the private copies was also swapped out, they are accounted via their page table swap entries, so the resulting reported swap usage is then a sum of both swapped out private copies, and swapped out shmem pages that were not COWed. No double accounting can thus happen. The accounting is arguably still not as precise as for private anonymous mappings, since now we will count also pages that the process in question never accessed, but another process populated them and then let them become swapped out. I believe it is still less confusing and subtle than not showing any swap usage by shmem mappings at all. Swapped out counter might of interest of users who would like to prevent from future swapins during performance critical operation and pre-fault them at their convenience. Especially for larger swapped out regions the cost of swapin is much higher than a fresh page allocation. So a differentiation between pte_none vs. swapped out is important for those usecases. One downside of this patch is that it makes /proc/pid/smaps more expensive for shmem mappings, as we consult the radix tree for each pte_none entry, so the overal complexity is O(n*log(n)). I have measured this on a process that creates a 2GB mapping and dirties single pages with a stride of 2MB, and time how long does it take to cat /proc/pid/smaps of this process 100 times. Private anonymous mapping: real 0m0.949s user 0m0.116s sys 0m0.348s Mapping of a /dev/shm/file: real 0m3.831s user 0m0.180s sys 0m3.212s The difference is rather substantial, so the next patch will reduce the cost for shared or read-only mappings. In a less controlled experiment, I've gathered pids of processes on my desktop that have either '/dev/shm/*' or 'SYSV*' in smaps. This included the Chrome browser and some KDE processes. Again, I've run cat /proc/pid/smaps on each 100 times. Before this patch: real 0m9.050s user 0m0.518s sys 0m8.066s After this patch: real 0m9.221s user 0m0.541s sys 0m8.187s This suggests low impact on average systems. Note that this patch doesn't attempt to adjust the SwapPss field for shmem mappings, which would need extra work to determine who else could have the pages mapped. Thus the value stays zero except for COWed swapped out pages in a shmem mapping, which are accounted as usual. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mm, documentation: clarify /proc/pid/status VmSwap limitations for shmemVlastimil Babka
This series is based on Jerome Marchand's [1] so let me quote the first paragraph from there: There are several shortcomings with the accounting of shared memory (sysV shm, shared anonymous mapping, mapping to a tmpfs file). The values in /proc/<pid>/status and statm don't allow to distinguish between shmem memory and a shared mapping to a regular file, even though their implications on memory usage are quite different: at reclaim, file mapping can be dropped or written back on disk while shmem needs a place in swap. As for shmem pages that are swapped-out or in swap cache, they aren't accounted at all. The original motivation for myself is that a customer found (IMHO rightfully) confusing that e.g. top output for process swap usage is unreliable with respect to swapped out shmem pages, which are not accounted for. The fundamental difference between private anonymous and shmem pages is that the latter has PTE's converted to pte_none, and not swapents. As such, they are not accounted to the number of swapents visible e.g. in /proc/pid/status VmSwap row. It might be theoretically possible to use swapents when swapping out shmem (without extra cost, as one has to change all mappers anyway), and on swap in only convert the swapent for the faulting process, leaving swapents in other processes until they also fault (so again no extra cost). But I don't know how many assumptions this would break, and it would be too disruptive change for a relatively small benefit. Instead, my approach is to document the limitation of VmSwap, and provide means to determine the swap usage for shmem areas for those who are interested and willing to pay the price, using /proc/pid/smaps. Because outside of ipcs, I don't think it's possible to currently to determine the usage at all. The previous patchset [1] did introduce new shmem-specific fields into smaps output, and functions to determine the values. I take a simpler approach, noting that smaps output already has a "Swap: X kB" line, where currently X == 0 always for shmem areas. I think we can just consider this a bug and provide the proper value by consulting the radix tree, as e.g. mincore_page() does. In the patch changelog I explain why this is also not perfect (and cannot be without swapents), but still arguably much better than showing a 0. The last two patches are adapted from Jerome's patchset and provide a VmRSS breakdown to RssAnon, RssFile and RssShm in /proc/pid/status. Hugh noted that this is a welcome addition, and I agree that it might help e.g. debugging process memory usage at albeit non-zero, but still rather low cost of extra per-mm counter and some page flag checks. [1] http://lwn.net/Articles/611966/ This patch (of 6): The documentation for /proc/pid/status does not mention that the value of VmSwap counts only swapped out anonymous private pages, and not swapped out pages of the underlying shmem objects (for shmem mappings). This is not obvious, so document this limitation. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14Make sure that highmem pages are not added to symlink page cacheAl Viro
inode_nohighmem() is sufficient to make sure that page_get_link() won't try to allocate a highmem page. Moreover, it is sufficient to make sure that page_symlink/__page_symlink won't do the same thing. However, any filesystem that manually preseeds the symlink's page cache upon symlink(2) needs to make sure that the page it inserts there won't be a highmem one. Fortunately, only nfs and shmem have run afoul of that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-14Documentation: update libhugetlbfs site urlSeongJae Park
The site for libhugetlbfs has moved from sourceforge to github. This commit updates the old url. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-01-13Merge tag 'for-f2fs-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series adds two ioctls to control cached data and fragmented files. Most of the rest fixes missing error cases and bugs that we have not covered so far. Summary: Enhancements: - support an ioctl to execute online file defragmentation - support an ioctl to flush cached data - speed up shrinking of extent_cache entries - handle broken superblock - refector dirty inode management infra - revisit f2fs_map_blocks to handle more cases - reduce global lock coverage - add detecting user's idle time Major bug fixes: - fix data race condition on cached nat entries - fix error cases of volatile and atomic writes" * tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (87 commits) f2fs: should unset atomic flag after successful commit f2fs: fix wrong memory condition check f2fs: monitor the number of background checkpoint f2fs: detect idle time depending on user behavior f2fs: introduce time and interval facility f2fs: skip releasing nodes in chindless extent tree f2fs: use atomic type for node count in extent tree f2fs: recognize encrypted data in f2fs_fiemap f2fs: clean up f2fs_balance_fs f2fs: remove redundant calls f2fs: avoid unnecessary f2fs_balance_fs calls f2fs: check the page status filled from disk f2fs: introduce __get_node_page to reuse common code f2fs: check node id earily when readaheading node page f2fs: read isize while holding i_mutex in fiemap Revert "f2fs: check the node block address of newly allocated nid" f2fs: cover more area with nat_tree_lock f2fs: introduce max_file_blocks in sbi f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink f2fs: introduce zombie list for fast shrinking extent trees ...
2016-01-12Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfsLinus Torvalds
Pull configfs updates from Christoph Hellwig: "I'm assisting Joel as co-maintainer and patch monkey now, and you will see pull reuquests from me for a while. Besides the MAINTAINERS update there is just a single change, which adds support for binary attributes to configfs, which are very similar to the sysfs binary attributes. Thanks to Pantelis Antoniou! You will see another actually bigger set of configfs changes in the SCSI target pull from Nic - those were merged before this new tree even existed" * tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs: configfs: add myself as co-maintainer, updated git tree configfs: implement binary attributes
2016-01-04configfs: implement binary attributesPantelis Antoniou
ConfigFS lacked binary attributes up until now. This patch introduces support for binary attributes in a somewhat similar manner of sysfs binary attributes albeit with changes that fit the configfs usage model. Problems that configfs binary attributes fix are everything that requires a binary blob as part of the configuration of a resource, such as bitstream loading for FPGAs, DTBs for dynamically created devices etc. Look at Documentation/filesystems/configfs/configfs.txt for internals and howto use them. This patch is against linux-next as of today that contains Christoph's configfs rework. Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com> [hch: folded a fix from Geert Uytterhoeven <geert+renesas@glider.be>] [hch: a few tiny updates based on review feedback] Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-12-30switch ->get_link() to delayed_call, kill ->put_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-16f2fs: introduce new option for controlling data flushChao Yu
Add a new option 'data_flush' to enable data flush functionality. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-10Doc: treewide: Fix grammar "a" to "an"Masanari Iida
This patch fix some grammar mistake. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-12-08replace ->follow_link() with new method that could stay in RCU modeAl Viro
new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08don't put symlink bodies in pagecache into highmemAl Viro
kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-04Doc: f2fs: Fix typos in Documentation/filesystems/f2fs.txtMasanari Iida
This patch fix some typos in Documentation/filesystems/f2fs.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-11-13Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "This series contains HCH's changes to absorb configfs attribute ->show() + ->store() function pointer usage from it's original tree-wide consumers, into common configfs code. It includes usb-gadget, target w/ drivers, netconsole and ocfs2 changes to realize the improved simplicity, that now renders the original include/target/configfs_macros.h CPP magic for fabric drivers and others, unnecessary and obsolete. And with common code in place, new configfs attributes can be added easier than ever before. Note, there are further improvements in-flight from other folks for v4.5 code in configfs land, plus number of target fixes for post -rc1 code" In the meantime, a new user of the now-removed old configfs API came in through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices"). This merge resolution comes from Alexander Shishkin, who updated his stm class tracing abstraction to account for the removal of the old show_attribute and store_attribute methods in commit 517982229f78 ("configfs: remove old API") from this pull. As Alexander says about that patch: "There's no need to keep an extra wrapper structure per item and the awkward show_attribute/store_attribute item ops are no longer needed. This patch converts policy code to the new api, all the while making the code quite a bit smaller and easier on the eyes. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>" That patch was folded into the merge so that the tree should be fully bisectable. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits) configfs: remove old API ocfs2/cluster: use per-attribute show and store methods ocfs2/cluster: move locking into attribute store methods netconsole: use per-attribute show and store methods target: use per-attribute show and store methods spear13xx_pcie_gadget: use per-attribute show and store methods dlm: use per-attribute show and store methods usb-gadget/f_serial: use per-attribute show and store methods usb-gadget/f_phonet: use per-attribute show and store methods usb-gadget/f_obex: use per-attribute show and store methods usb-gadget/f_uac2: use per-attribute show and store methods usb-gadget/f_uac1: use per-attribute show and store methods usb-gadget/f_mass_storage: use per-attribute show and store methods usb-gadget/f_sourcesink: use per-attribute show and store methods usb-gadget/f_printer: use per-attribute show and store methods usb-gadget/f_midi: use per-attribute show and store methods usb-gadget/f_loopback: use per-attribute show and store methods usb-gadget/ether: use per-attribute show and store methods usb-gadget/f_acm: use per-attribute show and store methods usb-gadget/f_hid: use per-attribute show and store methods ...
2015-11-13Merge tag '4.4-additional' of git://git.lwn.net/linuxLinus Torvalds
Pull more documentation updates from Jon Corbet: "A few more documentation patches that wandered in and have no reason to wait; these include some improvements to the suggestions for email clients and patch submission" * tag '4.4-additional' of git://git.lwn.net/linux: Documentation: Add minimal Mutt config for using Gmail Documentation: Add note on sending files directly with Mutt Documentation: dontdiff: remove media from dontdiff Documentation/SubmittingPatches: discuss In-Reply-To Remove email address from Documentation/filesystems/overlayfs.txt can-doc: Add missing semicolon to example
2015-11-11Remove email address from Documentation/filesystems/overlayfs.txtNeilBrown
I'm getting a surprising large number of questions about overlayfs sent to me personally, rather than to a relevant mailing list. So remove my email address from the documentation, and add a note about looking in the MAINTAINERS file. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-11-10Merge tag 'libnvdimm-for-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Outside of the new ACPI-NFIT hot-add support this pull request is more notable for what it does not contain, than what it does. There were a handful of development topics this cycle, dax get_user_pages, dax fsync, and raw block dax, that need more more iteration and will wait for 4.5. The patches to make devm and the pmem driver NUMA aware have been in -next for several weeks. The hot-add support has not, but is contained to the NFIT driver and is passing unit tests. The coredump support is straightforward and was looked over by Jeff. All of it has received a 0day build success notification across 107 configs. Summary: - Add support for the ACPI 6.0 NFIT hot add mechanism to process updates of the NFIT at runtime. - Teach the coredump implementation how to filter out DAX mappings. - Introduce NUMA hints for allocations made by the pmem driver, and as a side effect all devm allocations now hint their NUMA node by default" * tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: coredump: add DAX filtering for FDPIC ELF coredumps coredump: add DAX filtering for ELF coredumps acpi: nfit: Add support for hot-add nfit: in acpi_nfit_init, break on a 0-length table pmem, memremap: convert to numa aware allocations devm_memremap_pages: use numa_mem_id devm: make allocations numa aware by default devm_memremap: convert to return ERR_PTR devm_memunmap: use devres_release() pmem: kill memremap_pmem() x86, mm: quiet arch_add_memory()
2015-11-09Merge tag 'gfs2-merge-window' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "Here is a list of patches we've accumulated for GFS2 for the current upstream merge window. There are only six patches this time: 1. A cleanup patch from Andreas to remove the gl_spin #define in favor of its value for the sake of clarity. 2. A fix from Andy Price to mark the inode dirty during fallocate. 3. A fix from Andy Price to set s_mode on mount failures to prevent a stack trace. 4 A patch from me to prevent a kernel BUG() in trans_add_meta/trans_add_data due to uninitialized storage. 5. A patch from me to protecting our freeing of the in-core directory hash table to prevent double-free. 6. A fix for a page/block rounding problem that resulted in a metadata coherency problem when the block size != page size" I've got a lot more patches in various stages of review and testing, but I'm afraid they'll have to wait until the next merge window. So next time we're likely to have a lot more" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Fix rgrp end rounding problem for bsize < page size GFS2: Protect freeing directory hash table with i_lock spin_lock gfs2: Remove gl_spin define gfs2: Add missing else in trans_add_meta/data GFS2: Set s_mode before parsing mount options GFS2: fallocate: do not rely on file_update_time to mark the inode dirty
2015-11-09coredump: add DAX filtering for ELF coredumpsRoss Zwisler
Add two new flags to the existing coredump mechanism for ELF files to allow us to explicitly filter DAX mappings. This is desirable because DAX mappings, like hugetlb mappings, have the potential to be very large. Update the coredump_filter documentation in Documentation/filesystems/proc.txt so that it addresses the new DAX coredump flags. Also update the documented default value of coredump_filter to be consistent with the core(5) man page. The documentation being updated talks about bit 4, Dump ELF headers, which is enabled if CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is turned on in the kernel config. This kernel config option defaults to "y" if both ELF binaries and coredump are enabled. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge patch-bomb from Andrew Morton: - inotify tweaks - some ocfs2 updates (many more are awaiting review) - various misc bits - kernel/watchdog.c updates - Some of mm. I have a huge number of MM patches this time and quite a lot of it is quite difficult and much will be held over to next time. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) selftests: vm: add tests for lock on fault mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage mm: introduce VM_LOCKONFAULT mm: mlock: add new mlock system call mm: mlock: refactor mlock, munlock, and munlockall code kasan: always taint kernel on report mm, slub, kasan: enable user tracking by default with KASAN=y kasan: use IS_ALIGNED in memory_is_poisoned_8() kasan: Fix a type conversion error lib: test_kasan: add some testcases kasan: update reference to kasan prototype repo kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile kasan: various fixes in documentation kasan: update log messages kasan: accurately determine the type of the bad access kasan: update reported bug types for kernel memory accesses kasan: update reported bug types for not user nor kernel memory accesses mm/kasan: prevent deadlock in kasan reporting mm/kasan: don't use kasan shadow pointer in generic functions mm/kasan: MODULE_VADDR is not available on all archs ...
2015-11-05Documentation/filesystems/proc.txt: a little tidyingHugh Dickins
There's an odd line about "Locked" at the head of the description of /proc/meminfo: it seems to have strayed from /proc/PID/smaps, so lead it back there. Move "Swap" and "SwapPss" descriptions down above it, to match the order in the file (though "PageSize"s still undescribed). The example of "Locked: 374 kB" (the same as Pss, neither Rss nor Size) is so unlikely as to be misleading: just make it 0, this is /bin/bash text; which would be "dw" (disabled write) not "de" (do not expand). Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm Documentation: undoc non-linear vmasHugh Dickins
While updating some mm Documentation, I came across a few straggling references to the non-linear vmas which were happily removed in v4.0. Delete them. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm: hugetlb: proc: add HugetlbPages field to /proc/PID/statusNaoya Horiguchi
Currently there's no easy way to get per-process usage of hugetlb pages, which is inconvenient because userspace applications which use hugetlb typically want to control their processes on the basis of how much memory (including hugetlb) they use. So this patch simply provides easy access to the info via /proc/PID/status. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Joern Engel <joern@logfs.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smapsNaoya Horiguchi
Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB), which is inconvenient when we want to know per-task or per-vma base hugetlb usage. To solve this, this patch adds new fields for hugetlb usage like below: Size: 20480 kB Rss: 0 kB Pss: 0 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 0 kB Anonymous: 0 kB AnonHugePages: 0 kB Shared_Hugetlb: 18432 kB Private_Hugetlb: 2048 kB Swap: 0 kB KernelPageSize: 2048 kB MMUPageSize: 2048 kB Locked: 0 kB VmFlags: rd wr mr mw me de ht [hughd@google.com: fix Private_Hugetlb alignment ] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Joern Engel <joern@logfs.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05Merge tag 'docs-for-linus' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation update from Jon Corbet: "There is a nice new document from Neil on how pathname lookups work and some new CAN driver documentation. Beyond that, we have kernel-doc fixes, a bit more work to support reproducible builds, and the usual collection of small fixes" * tag 'docs-for-linus' of git://git.lwn.net/linux: (34 commits) Documentation: add new description of path-name lookup. Documentation/vm/slub.txt: document slabinfo-gnuplot.sh Doc: ABI/stable: Fix typo in ABI/stable doc: Clarify that nmi_watchdog param is for hardlockups Typo correction for description in gpio document. DocBook: Fix kernel-doc to be case-insensitive for private: kernel-docs.txt: update kernelnewbies reference Doc:kvm: Fix typo in Doc/virtual/kvm Documentation/Changes: Add bc in "Current Minimal Requirements" section Documentation/email-clients.txt: remove trailing whitespace DocBook: Use a fixed encoding for output MAINTAINERS: The docs tree has moved Docs/kernel-parameters: Add earlycon devicetree usage SubmittingPatches: make Subject examples match the de facto standard Documentation: gpio: mention that <function>-gpio has been deprecated Documentation: cgroups: just fix a few typos Documentation: Update kselftest.txt Documentation: DMA API: Be more explicit that nents is always the same Documentation: Update the default value of crashkernel low zram: update documentation ...
2015-11-05Merge tag 'for-f2fs-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "Most part of the patches include enhancing the stability and performance of in-memory extent caches feature. In addition, it introduces several new features and configurable points: - F2FS_GOING_DOWN_METAFLUSH ioctl to test power failures - F2FS_IOC_WRITE_CHECKPOINT ioctl to trigger checkpoint by users - background_gc=sync mount option to do gc synchronously - periodic checkpoints - sysfs entry to control readahead blocks for free nids And the following bug fixes have been merged. - fix SSA corruption by collapse/insert_range - correct a couple of gc behaviors - fix the results of f2fs_map_blocks - fix error case handling of volatile/atomic writes" * tag 'for-f2fs-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (54 commits) f2fs: fix to skip shrinking extent nodes f2fs: fix error path of ->symlink f2fs: fix to clear GCed flag for atomic written page f2fs: don't need to submit bio on error case f2fs: fix leakage of inmemory atomic pages f2fs: refactor __find_rev_next_{zero}_bit f2fs: support fiemap for inline_data f2fs: flush dirty data for bmap f2fs: relocate the tracepoint for background_gc f2fs crypto: fix racing of accessing encrypted page among f2fs: export ra_nid_pages to sysfs f2fs: readahead for free nids building f2fs: support lower priority asynchronous readahead in ra_meta_pages f2fs: don't tag REQ_META for temporary non-meta pages f2fs: add a tracepoint for f2fs_read_data_pages f2fs: set GFP_NOFS for grab_cache_page f2fs: fix SSA updates resulting in corruption Revert "f2fs: do not skip dentry block writes" f2fs: add F2FS_GOING_DOWN_METAFLUSH to test power-failure f2fs: merge meta writes as many possible ...
2015-11-04Merge tag 'driver-core-4.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch of debugfs updates, with a smattering of minor driver core fixes and updates as well. All have been in linux-next for a long time" * tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: debugfs: Add debugfs_create_ulong() of: to support binding numa node to specified device in devicetree debugfs: Add read-only/write-only bool file ops debugfs: Add read-only/write-only size_t file ops debugfs: Add read-only/write-only x64 file ops debugfs: Consolidate file mode checks in debugfs_create_*() Revert "mm: Check if section present during memory block (un)registering" driver-core: platform: Provide helpers for multi-driver modules mm: Check if section present during memory block (un)registering devres: fix a for loop bounds check CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit base/platform: assert that dev_pm_domain callbacks are called unconditionally sysfs: correctly handle short reads on PREALLOC attrs. base: soc: siplify ida usage kobject: move EXPORT_SYMBOL() macros next to corresponding definitions kobject: explain what kobject's sd field is debugfs: document that debugfs_remove*() accepts NULL and error values debugfs: Pass bool pointer to debugfs_create_bool() ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
2015-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: Changes of note: 1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell. 2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from David Ahern. 3) Allow the user to ask for the statistics to be filtered out of ipv4/ipv6 address netlink dumps. From Sowmini Varadhan. 4) More work to pass the network namespace context around deep into various packet path APIs, starting with the netfilter hooks. From Eric W Biederman. 5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas Richter. 6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng. 7) Support Very High Throughput in wireless MESH code, from Bob Copeland. 8) Allow setting the ageing_time in switchdev/rocker. From Scott Feldman. 9) Properly autoload L2TP type modules, from Stephen Hemminger. 10) Fix and enable offload features by default in 8139cp driver, from David Woodhouse. 11) Support both ipv4 and ipv6 sockets in a single vxlan device, from Jiri Benc. 12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning Opstad. 13) Fix IPSEC flowcache overflows on large systems, from Steffen Klassert. 14) Convert bridging to track VLANs using rhashtable entries rather than a bitmap. From Nikolay Aleksandrov. 15) Make TCP listener handling completely lockless, this is a major accomplishment. Incoming request sockets now live in the established hash table just like any other socket too. From Eric Dumazet. 15) Provide more bridging attributes to netlink, from Nikolay Aleksandrov. 16) Use hash based algorithm for ipv4 multipath routing, this was very long overdue. From Peter Nørlund. 17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann. 18) Allow non-root execution of EBPF programs, from Alexei Starovoitov. 19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This influences the port binding selection logic used by SO_REUSEPORT. 20) Add ipv6 support to VRF, from David Ahern. 21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko. 22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen. 23) Implement RACK loss recovery in TCP, from Yuchung Cheng. 24) Support multipath routes in MPLS, from Roopa Prabhu. 25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric Dumazet. 26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and Sudarsana Kalluru. 27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic Sowa. 28) Support ipv6 geneve tunnels, from John W Linville. 29) Add flood control support to switchdev layer, from Ido Schimmel. 30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from Hannes Frederic Sowa. 31) Support persistent maps and progs in bpf, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits) sh_eth: use DMA barriers switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion net: sched: kill dead code in sch_choke.c irda: Delete an unnecessary check before the function call "irlmp_unregister_service" net: dsa: mv88e6xxx: include DSA ports in VLANs net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports net/core: fix for_each_netdev_feature vlan: Invoke driver vlan hooks only if device is present arcnet/com20020: add LEDS_CLASS dependency bpf, verifier: annotate verbose printer with __printf dp83640: Only wait for timestamps for packets with timestamping enabled. ptp: Change ptp_class to a proper bitmask dp83640: Prune rx timestamp list before reading from it dp83640: Delay scheduled work. dp83640: Include hash in timestamp/packet matching ipv6: fix tunnel error handling net/mlx5e: Fix LSO vlan insertion net/mlx5e: Re-eanble client vlan TX acceleration net/mlx5e: Return error in case mlx5e_set_features() fails net/mlx5e: Don't allow more than max supported channels ...
2015-11-02Documentation: add new description of path-name lookup.Neil Brown
This document is based on three recent lwn.net articles. Some of the introductory material and linkage between articles has been removed, and some time-based descriptions have been revised. Also all links to code have been removed as the code is very close by. Contains corrections and improvements from Randy Dunlap <rdunlap@infradead.org>. Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-10-29gfs2: Remove gl_spin defineAndreas Gruenbacher
Commit e66cf161 replaced the gl_spin spinlock in struct gfs2_glock with a gl_lockref lockref and defined gl_spin as gl_lockref.lock (the spinlock in gl_lockref). Remove that define to make the references to gl_lockref.lock more obvious. Signed-off-by: Andreas Gruenbacher <andreas.gruenbacher@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-10-18ipconfig: send Client-identifier in DHCP requestsLi RongQing
A dhcp server may provide parameters to a client from a pool of IP addresses and using a shared rootfs, or provide a specific set of parameters for a specific client, usually using the MAC address to identify each client individually. The dhcp protocol also specifies a client-id field which can be used to determine the correct parameters to supply when no MAC address is available. There is currently no way to tell the kernel to supply a specific client-id, only the userspace dhcp clients support this feature, but this can not be used when the network is needed before userspace is available such as when the root filesystem is on NFS. This patch is to be able to do something like "ip=dhcp,client_id_type, client_id_value", as a kernel parameter to enable the kernel to identify itself to the server. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13configfs: remove old APIChristoph Hellwig
Remove the old show_attribute and store_attribute methods and update the documentation. Also replace the two C samples with a single new one in the proper samples directory where people expect to find it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-10-09f2fs: introduce background_gc=sync mount optionJaegeuk Kim
This patch introduce background_gc=sync enabling synchronous cleaning in background. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-04debugfs: Pass bool pointer to debugfs_create_bool()Viresh Kumar
Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument, when all it needs is a boolean pointer. It would be better to update this API to make it accept 'bool *' instead, as that will make it more consistent and often more convenient. Over that bool takes just a byte. That required updates to all user sites as well, in the same commit updating the API. regmap core was also using debugfs_{read|write}_file_bool(), directly and variable types were updated for that to be bool as well. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01fs/proc, core/debug: Don't expose absolute kernel addresses via wchanIngo Molnar
So the /proc/PID/stat 'wchan' field (the 30th field, which contains the absolute kernel address of the kernel function a task is blocked in) leaks absolute kernel addresses to unprivileged user-space: seq_put_decimal_ull(m, ' ', wchan); The absolute address might also leak via /proc/PID/wchan as well, if KALLSYMS is turned off or if the symbol lookup fails for some reason: static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { unsigned long wchan; char symname[KSYM_NAME_LEN]; wchan = get_wchan(task); if (lookup_symbol_name(wchan, symname) < 0) { if (!ptrace_may_access(task, PTRACE_MODE_READ)) return 0; seq_printf(m, "%lu", wchan); } else { seq_printf(m, "%s", symname); } return 0; } This isn't ideal, because for example it trivially leaks the KASLR offset to any local attacker: fomalhaut:~> printf "%016lx\n" $(cat /proc/$$/stat | cut -d' ' -f35) ffffffff8123b380 Most real-life uses of wchan are symbolic: ps -eo pid:10,tid:10,wchan:30,comm and procps uses /proc/PID/wchan, not the absolute address in /proc/PID/stat: triton:~/tip> strace -f ps -eo pid:10,tid:10,wchan:30,comm 2>&1 | grep wchan | tail -1 open("/proc/30833/wchan", O_RDONLY) = 6 There's one compatibility quirk here: procps relies on whether the absolute value is non-zero - and we can provide that functionality by outputing "0" or "1" depending on whether the task is blocked (whether there's a wchan address). These days there appears to be very little legitimate reason user-space would be interested in the absolute address. The absolute address is mostly historic: from the days when we didn't have kallsyms and user-space procps had to do the decoding itself via the System.map. So this patch sets all numeric output to "0" or "1" and keeps only symbolic output, in /proc/PID/wchan. ( The absolute sleep address can generally still be profiled via perf, by tasks with sufficient privileges. ) Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20150930135917.GA3285@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13sysfs.txt: mention that store method buffers are null-terminatedUlf Magnusson
Without knowing this, the use of sysfs_streq() becomes puzzling. The termination happens in kernfs_fop_write(). Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> [jc: moved the new text to a different paragraph] Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-09-13sysfs-tagging.txt: fix pre-kernfs referencesUlf Magnusson
- sysfs_dirent is now kernfs_node - see commit 324a56e16e44 ("kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly") - sysfs_super_info is now kernfs_super_info - see commit c525aaddc366 ("kernfs: s/sysfs/kernfs/ in various data structures") - the 's_' prefix was dropped from various fields - see commit adc5e8b58f48 ("kernfs: drop s_ prefix from kernfs_node members") Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-09-13sysfs.txt: fix pre-kernfs sysfs_dirent referenceUlf Magnusson
sysfs_dirent went away when kernfs was extracted from sysfs. The reference to the kobject now lives in a kernfs_node (in the 'priv' member). See commit 324a56e16e44 ("kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly"). Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-09-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: "Almost all of the rest of MM. There was an unusually large amount of MM material this time" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits) zpool: remove no-op module init/exit mm: zbud: constify the zbud_ops mm: zpool: constify the zpool_ops mm: swap: zswap: maybe_preload & refactoring zram: unify error reporting zsmalloc: remove null check from destroy_handle_cache() zsmalloc: do not take class lock in zs_shrinker_count() zsmalloc: use class->pages_per_zspage zsmalloc: consider ZS_ALMOST_FULL as migrate source zsmalloc: partial page ordering within a fullness_list zsmalloc: use shrinker to trigger auto-compaction zsmalloc: account the number of compacted pages zsmalloc/zram: introduce zs_pool_stats api zsmalloc: cosmetic compaction code adjustments zsmalloc: introduce zs_can_compact() function zsmalloc: always keep per-class stats zsmalloc: drop unused variable `nr_to_migrate' mm/memblock.c: fix comment in __next_mem_range() mm/page_alloc.c: fix type information of memoryless node memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node() ...
2015-09-08mm: /proc/pid/smaps:: show proportional swap share of the mappingMinchan Kim
We want to know per-process workingset size for smart memory management on userland and we use swap(ex, zram) heavily to maximize memory efficiency so workingset includes swap as well as RSS. On such system, if there are lots of shared anonymous pages, it's really hard to figure out exactly how many each process consumes memory(ie, rss + wap) if the system has lots of shared anonymous memory(e.g, android). This patch introduces SwapPss field on /proc/<pid>/smaps so we can get more exact workingset size per process. Bongkyu tested it. Result is below. 1. 50M used swap SwapTotal: 461976 kB SwapFree: 411192 kB $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}'; 48236 $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}'; 141184 2. 240M used swap SwapTotal: 461976 kB SwapFree: 216808 kB $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}'; 230315 $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}'; 1387744 [akpm@linux-foundation.org: simplify kunmap_atomic() call] Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Bongkyu Kim <bongkyu.kim@lge.com> Tested-by: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Hugh Dickins <hughd@google.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08dax: add huge page fault supportMatthew Wilcox
This is the support code for DAX-enabled filesystems to allow them to provide huge pages in response to faults. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08Merge tag 'libnvdimm-for-4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This update has successfully completed a 0day-kbuild run and has appeared in a linux-next release. The changes outside of the typical drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the introduction of ZONE_DEVICE + devm_memremap_pages(). Summary: - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. - Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. - Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. - Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes" * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits) libnvdimm, pmem: direct map legacy pmem by default libnvdimm, pmem: 'struct page' for pmem libnvdimm, pfn: 'struct page' provider infrastructure x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB add devm_memremap_pages mm: ZONE_DEVICE for "device memory" mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h dax: drop size parameter to ->direct_access() nd_blk: change aperture mapping from WC to WB nvdimm: change to use generic kvfree() pmem, dax: have direct_access use __pmem annotation dax: update I/O path to do proper PMEM flushing pmem: add copy_from_iter_pmem() and clear_pmem() pmem, x86: clean up conditional pmem includes pmem: remove layer when calling arch_has_wmb_pmem() pmem, x86: move x86 PMEM API to new pmem.h header libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option pmem: switch to devm_ allocations devres: add devm_memremap libnvdimm, btt: write and validate parent_uuid ...
2015-09-05Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Nothing major, but: - Add Jeff Layton as an nfsd co-maintainer: no change to existing practice, just an acknowledgement of the status quo. - Two patches ("nfsd: ensure that...") for a race overlooked by the state locking rewrite, causing a crash noticed by multiple users. - Lots of smaller bugfixes all over from Kinglong Mee. - From Jeff, some cleanup of server rpc code in preparation for possible shift of nfsd threads to workqueues" * tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits) nfsd: deal with DELEGRETURN racing with CB_RECALL nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case nfsd: ensure that delegation stateid hash references are only put once nfsd: ensure that the ol stateid hash reference is only put once net: sunrpc: fix tracepoint Warning: unknown op '->' nfsd: allow more than one laundry job to run at a time nfsd: don't WARN/backtrace for invalid container deployment. fs: fix fs/locks.c kernel-doc warning nfsd: Add Jeff Layton as co-maintainer NFSD: Return word2 bitmask if setting security label in OPEN/CREATE NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1 nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL. nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug NFSD: Store parent's stat in a separate value nfsd: Fix two typos in comments lockd: NLM grace period shouldn't block NFSv4 opens nfsd: include linux/nfs4.h in export.h sunrpc: Switch to using hash list instead single list sunrpc/nfsd: Remove redundant code by exports seq_operations functions sunrpc: Store cache_detail in seq_file's private directly ...
2015-09-03Merge tag 'for-f2fs-4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The major work includes fixing and enhancing the existing extent_cache feature, which has been well settling down so far and now it becomes a default mount option accordingly. Also, this version newly registers a f2fs memory shrinker to reclaim several objects consumed by a couple of data structures in order to avoid memory pressures. Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which triggers a cleaning job explicitly by users. Most of the other patches are to fix bugs occurred in the corner cases across the whole code area" * tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits) f2fs: upset segment_info repair f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent f2fs: update extent tree in batches f2fs: fix to release inode correctly f2fs: handle f2fs_truncate error correctly f2fs: avoid unneeded initializing when converting inline dentry f2fs: atomically set inode->i_flags f2fs: fix wrong pointer access during try_to_free_nids f2fs: use __GFP_NOFAIL to avoid infinite loop f2fs: lookup neighbor extent nodes for merging later f2fs: split __insert_extent_tree_ret for readability f2fs: kill dead code in __insert_extent_tree f2fs: adjust showing of extent cache stat f2fs: add largest/cached stat in extent cache f2fs: fix incorrect mapping for bmap f2fs: add annotation for space utilization of regular/inline dentry f2fs: fix to update cached_en of extent tree properly f2fs: fix typo f2fs: check the node block address of newly allocated nid f2fs: go out for insert_inode_locked failure ...