From 89474d50a0d6b9a06e043351b4e658f4f0b6ee97 Mon Sep 17 00:00:00 2001 From: Eric Engestrom Date: Fri, 20 May 2016 16:58:07 -0700 Subject: Documentation: vm: fix spelling mistakes Signed-off-by: Eric Engestrom Cc: Jonathan Corbet Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/transhuge.txt | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt index fb0e1f2a19cc..7c871d6beb63 100644 --- a/Documentation/vm/transhuge.txt +++ b/Documentation/vm/transhuge.txt @@ -340,7 +340,7 @@ unaffected. libhugetlbfs will also work fine as usual. == Graceful fallback == -Code walking pagetables but unware about huge pmds can simply call +Code walking pagetables but unaware about huge pmds can simply call split_huge_pmd(vma, pmd, addr) where the pmd is the one returned by pmd_offset. It's trivial to make the code transparent hugepage aware by just grepping for "pmd_offset" and adding split_huge_pmd where @@ -414,7 +414,7 @@ tracking. The alternative is alter ->_mapcount in all subpages on each map/unmap of the whole compound page. We set PG_double_map when a PMD of the page got split for the first time, -but still have PMD mapping. The addtional references go away with last +but still have PMD mapping. The additional references go away with last compound_mapcount. split_huge_page internally has to distribute the refcounts in the head @@ -432,10 +432,10 @@ page->_mapcount. We safe against physical memory scanners too: the only legitimate way scanner can get reference to a page is get_page_unless_zero(). -All tail pages has zero ->_refcount until atomic_add(). It prevent scanner -from geting reference to tail page up to the point. After the atomic_add() -we don't care about ->_refcount value. We already known how many references -with should uncharge from head page. +All tail pages have zero ->_refcount until atomic_add(). This prevents the +scanner from getting a reference to the tail page up to that point. After the +atomic_add() we don't care about the ->_refcount value. We already known how +many references should be uncharged from the head page. For head page get_page_unless_zero() will succeed and we don't mind. It's clear where reference should go after split: it will stay on head page. -- cgit From 9a001fc19cccdeb9be4c3b89ad089d92df303c44 Mon Sep 17 00:00:00 2001 From: Vitaly Wool Date: Fri, 20 May 2016 16:58:30 -0700 Subject: z3fold: the 3-fold allocator for compressed pages This patch introduces z3fold, a special purpose allocator for storing compressed pages. It is designed to store up to three compressed pages per physical page. It is a ZBUD derivative which allows for higher compression ratio keeping the simplicity and determinism of its predecessor. This patch comes as a follow-up to the discussions at the Embedded Linux Conference in San-Diego related to the talk [1]. The outcome of these discussions was that it would be good to have a compressed page allocator as stable and deterministic as zbud with with higher compression ratio. To keep the determinism and simplicity, z3fold, just like zbud, always stores an integral number of compressed pages per page, but it can store up to 3 pages unlike zbud which can store at most 2. Therefore the compression ratio goes to around 2.6x while zbud's one is around 1.7x. The patch is based on the latest linux.git tree. This version has been updated after testing on various simulators (e.g. ARM Versatile Express, MIPS Malta, x86_64/Haswell) and basing on comments from Dan Streetman [3]. [1] https://openiotelc2016.sched.org/event/6DAC/swapping-and-embedded-compression-relieves-the-pressure-vitaly-wool-softprise-consulting-ou [2] https://lkml.org/lkml/2016/4/21/799 [3] https://lkml.org/lkml/2016/5/4/852 Link: http://lkml.kernel.org/r/20160509151753.ec3f9fda3c9898d31ff52a32@gmail.com Signed-off-by: Vitaly Wool Cc: Seth Jennings Cc: Dan Streetman Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/z3fold.txt | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 Documentation/vm/z3fold.txt (limited to 'Documentation') diff --git a/Documentation/vm/z3fold.txt b/Documentation/vm/z3fold.txt new file mode 100644 index 000000000000..38e4dac810b6 --- /dev/null +++ b/Documentation/vm/z3fold.txt @@ -0,0 +1,26 @@ +z3fold +------ + +z3fold is a special purpose allocator for storing compressed pages. +It is designed to store up to three compressed pages per physical page. +It is a zbud derivative which allows for higher compression +ratio keeping the simplicity and determinism of its predecessor. + +The main differences between z3fold and zbud are: +* unlike zbud, z3fold allows for up to PAGE_SIZE allocations +* z3fold can hold up to 3 compressed pages in its page +* z3fold doesn't export any API itself and is thus intended to be used + via the zpool API. + +To keep the determinism and simplicity, z3fold, just like zbud, always +stores an integral number of compressed pages per page, but it can store +up to 3 pages unlike zbud which can store at most 2. Therefore the +compression ratio goes to around 2.7x while zbud's one is around 1.7x. + +Unlike zbud (but like zsmalloc for that matter) z3fold_alloc() does not +return a dereferenceable pointer. Instead, it returns an unsigned long +handle which encodes actual location of the allocated object. + +Keeping effective compression ratio close to zsmalloc's, z3fold doesn't +depend on MMU enabled and provides more predictable reclaim behavior +which makes it a better fit for small and response-critical systems. -- cgit From 43209ea2d17aae1540d4e28274e36404f72702f2 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 20 May 2016 16:59:59 -0700 Subject: zram: remove max_comp_streams internals Remove the internal part of max_comp_streams interface, since we switched to per-cpu streams. We will keep RW max_comp_streams attr around, because: a) we may (silently) switch back to idle compression streams list and don't want to disturb user space b) max_comp_streams attr must wait for the next 'lay off cycle'; we give user space 2 years to adjust before we remove/downgrade the attr, and there are already several attrs scheduled for removal in 4.11, so it's too late for max_comp_streams. This slightly change a user visible behaviour: - First, reading from max_comp_stream file now will always return the number of online CPUs. - Second, writing to max_comp_stream will not take any effect. Link: http://lkml.kernel.org/r/20160503165546.25201-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky Cc: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/blockdev/zram.txt | 27 ++++++++------------------- 1 file changed, 8 insertions(+), 19 deletions(-) (limited to 'Documentation') diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 5bda5031c83d..d88f0c70cd7f 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -59,27 +59,16 @@ num_devices parameter is optional and tells zram how many devices should be pre-created. Default: 1. 2) Set max number of compression streams - Compression backend may use up to max_comp_streams compression streams, - thus allowing up to max_comp_streams concurrent compression operations. - By default, compression backend uses single compression stream. - - Examples: - #show max compression streams number + Regardless the value passed to this attribute, ZRAM will always + allocate multiple compression streams - one per online CPUs - thus + allowing several concurrent compression operations. The number of + allocated compression streams goes down when some of the CPUs + become offline. There is no single-compression-stream mode anymore, + unless you are running a UP system or has only 1 CPU online. + + To find out how many streams are currently available: cat /sys/block/zram0/max_comp_streams - #set max compression streams number to 3 - echo 3 > /sys/block/zram0/max_comp_streams - -Note: -In order to enable compression backend's multi stream support max_comp_streams -must be initially set to desired concurrency level before ZRAM device -initialisation. Once the device initialised as a single stream compression -backend (max_comp_streams equals to 1), you will see error if you try to change -the value of max_comp_streams because single stream compression backend -implemented as a special case by lock overhead issue and does not support -dynamic max_comp_streams. Only multi stream backend supports dynamic -max_comp_streams adjustment. - 3) Select compression algorithm Using comp_algorithm device attribute one can see available and currently selected (shown in square brackets) compression algorithms, -- cgit From 623e47fc64f8de480b322b7ed68855f97137e2a5 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Fri, 20 May 2016 17:00:02 -0700 Subject: zram: introduce per-device debug_stat sysfs node debug_stat sysfs is read-only and represents various debugging data that zram developers may need. This file is not meant to be used by anyone else: its content is not documented and will change any time w/o any notice. Therefore, the output of debug_stat file contains a version string. To avoid any confusion, we will increase the version number every time we modify the output. At the moment this file exports only one value -- the number of re-compressions, IOW, the number of times compression fast path has failed. This stat is temporary any will be useful in case if any per-cpu compression streams regressions will be reported. Link: http://lkml.kernel.org/r/20160513230834.GB26763@bbox Link: http://lkml.kernel.org/r/20160511134553.12655-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky Signed-off-by: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/ABI/testing/sysfs-block-zram | 9 +++++++++ Documentation/blockdev/zram.txt | 1 + 2 files changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram index 2e69e83bf510..4518d30b8c2e 100644 --- a/Documentation/ABI/testing/sysfs-block-zram +++ b/Documentation/ABI/testing/sysfs-block-zram @@ -166,3 +166,12 @@ Description: The mm_stat file is read-only and represents device's mm statistics (orig_data_size, compr_data_size, etc.) in a format similar to block layer statistics file format. + +What: /sys/block/zram/debug_stat +Date: July 2016 +Contact: Sergey Senozhatsky +Description: + The debug_stat file is read-only and represents various + device's debugging info useful for kernel developers. Its + format is not documented intentionally and may change + anytime without any notice. diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index d88f0c70cd7f..13100fb3c26d 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -172,6 +172,7 @@ mem_limit RW the maximum amount of memory ZRAM can use to store pages_compacted RO the number of pages freed during compaction (available only via zram/mm_stat node) compact WO trigger memory compaction +debug_stat RO this file is used for zram debugging purposes WARNING ======= -- cgit From 3e42979e65dace1f9268dd5440e5ab096b8dee59 Mon Sep 17 00:00:00 2001 From: "Richard W.M. Jones" Date: Fri, 20 May 2016 17:00:05 -0700 Subject: procfs: expose umask in /proc//status It's not possible to read the process umask without also modifying it, which is what umask(2) does. A library cannot read umask safely, especially if the main program might be multithreaded. Add a new status line ("Umask") in /proc//status. It contains the file mode creation mask (umask) in octal. It is only shown for tasks which have task->fs. This patch is adapted from one originally written by Pierre Carrier. The use case is that we have endless trouble with people setting weird umask() values (usually on the grounds of "security"), and then everything breaking. I'm on the hook to fix these. We'd like to add debugging to our program so we can dump out the umask in debug reports. Previous versions of the patch used a syscall so you could only read your own umask. That's all I need. However there was quite a lot of push-back from those, so this new version exports it in /proc. See: https://lkml.org/lkml/2016/4/13/704 [umask2] https://lkml.org/lkml/2016/4/13/487 [getumask] Signed-off-by: Richard W.M. Jones Acked-by: Konstantin Khlebnikov Acked-by: Jerome Marchand Acked-by: Kees Cook Cc: "Theodore Ts'o" Cc: Michal Hocko Cc: Pierre Carrier Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 7f5607a089b4..e8d00759bfa5 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -225,6 +225,7 @@ Table 1-2: Contents of the status files (as of 4.1) TracerPid PID of process tracing this process (0 if not) Uid Real, effective, saved set, and file system UIDs Gid Real, effective, saved set, and file system GIDs + Umask file mode creation mask FDSize number of file descriptor slots currently allocated Groups supplementary group list NStgid descendant namespace thread group ID hierarchy -- cgit