summaryrefslogtreecommitdiff
path: root/fs/proc
AgeCommit message (Collapse)Author
2018-10-09proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted()Borislav Petkov
Lianbo reported a build error with a particular 32-bit config, see Link below for details. Provide a weak copy_oldmem_page_encrypted() function which architectures can override, in the same manner other functionality in that file is supplied. Reported-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: x86@kernel.org Link: http://lkml.kernel.org/r/710b9d95-2f70-eadf-c4a1-c3dc80ee4ebb@redhat.com
2018-10-06kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabledLianbo Jiang
In the kdump kernel, the memory of the first kernel needs to be dumped into the vmcore file. If SME is enabled in the first kernel, the old memory has to be remapped with the memory encryption mask in order to access it properly. Split copy_oldmem_page() functionality to handle encrypted memory properly. [ bp: Heavily massage everything. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/be7b47f9-6be6-e0d1-2c2a-9125bc74b818@redhat.com
2018-10-05proc: restrict kernel stack dumps to rootJann Horn
Currently, you can use /proc/self/task/*/stack to cause a stack walk on a task you control while it is running on another CPU. That means that the stack can change under the stack walker. The stack walker does have guards against going completely off the rails and into random kernel memory, but it can interpret random data from your kernel stack as instruction pointers and stack pointers. This can cause exposure of kernel stack contents to userspace. Restrict the ability to inspect kernel stacks of arbitrary tasks to root in order to prevent a local attacker from exploiting racy stack unwinding to leak kernel task stack contents. See the added comment for a longer rationale. There don't seem to be any users of this userspace API that can't gracefully bail out if reading from the file fails. Therefore, I believe that this change is unlikely to break things. In the case that this patch does end up needing a revert, the next-best solution might be to fake a single-entry stack based on wchan. Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com Fixes: 2ec220e27f50 ("proc: add /proc/*/stack") Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ken Chen <kenchen@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-29xarray: Replace exceptional entriesMatthew Wilcox
Introduce xarray value entries and tagged pointers to replace radix tree exceptional entries. This is a slight change in encoding to allow the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a value entry). It is also a change in emphasis; exceptional entries are intimidating and different. As the comment explains, you can choose to store values or pointers in the xarray and they are both first-class citizens. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-20fs/proc/kcore.c: fix invalid memory access in multi-page read optimizationDominique Martinet
The 'm' kcore_list item could point to kclist_head, and it is incorrect to look at m->addr / m->size in this case. There is no choice but to run through the list of entries for every address if we did not find any entry in the previous iteration Reset 'm' to NULL in that case at Omar Sandoval's suggestion. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org Fixes: bf991c2231117 ("proc/kcore: optimize multiple page reads") Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@osandov.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-04fs/proc: Show STACKLEAK metrics in the /proc file systemAlexander Popov
Introduce CONFIG_STACKLEAK_METRICS providing STACKLEAK information about tasks via the /proc file system. In particular, /proc/<pid>/stack_depth shows the maximum kernel stack consumption for the current and previous syscalls. Although this information is not precise, it can be useful for estimating the STACKLEAK performance impact for your workloads. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alexander Popov <alex.popov@linux.com> Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-08-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Kernel: - Improve kallsyms coverage - Add x86 entry trampolines to kcore - Fix ARM SPE handling - Correct PPC event post processing Tools: - Make the build system more robust - Small fixes and enhancements all over the place - Update kernel ABI header copies - Preparatory work for converting libtraceevnt to a shared library - License cleanups" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy' tools arch x86: Update tools's copy of cpufeatures.h perf python: Fix pyrf_evlist__read_on_cpu() interface perf mmap: Store real cpu number in 'struct perf_mmap' perf tools: Remove ext from struct kmod_path perf tools: Add gzip_is_compressed function perf tools: Add lzma_is_compressed function perf tools: Add is_compressed callback to compressions array perf tools: Move the temp file processing into decompress_kmodule perf tools: Use compression id in decompress_kmodule() perf tools: Store compression id into struct dso perf tools: Add compression id into 'struct kmod_path' perf tools: Make is_supported_compression() static perf tools: Make decompress_to_file() function static perf tools: Get rid of dso__needs_decompress() call in __open_dso() perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble() perf tools: Get rid of dso__needs_decompress() call in read_object_code() tools lib traceevent: Change to SPDX License format perf llvm: Allow passing options to llc in addition to clang perf parser: Improve error message for PMU address filters ...
2018-08-23fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu buildsArnd Bergmann
Without CONFIG_MMU, we get a build warning: fs/proc/vmcore.c:228:12: error: 'vmcoredd_mmap_dumps' defined but not used [-Werror=unused-function] static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst, The function is only referenced from an #ifdef'ed caller, so this uses the same #ifdef around it. Link: http://lkml.kernel.org/r/20180525213526.2117790-1-arnd@arndb.de Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Ganesh Goudar <ganeshgr@chelsio.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: add vmcoreinfo note to /proc/kcoreOmar Sandoval
The vmcoreinfo information is useful for runtime debugging tools, not just for crash dumps. A lot of this information can be determined by other means, but this is much more convenient, and it only adds a page at most to the file. Link: http://lkml.kernel.org/r/fddbcd08eed76344863303878b12de1c1e2a04b6.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: optimize multiple page readsOmar Sandoval
The current code does a full search of the segment list every time for every page. This is wasteful, since it's almost certain that the next page will be in the same segment. Instead, check if the previous segment covers the current page before doing the list search. Link: http://lkml.kernel.org/r/fd346c11090cf93d867e01b8d73a6567c5ac6361.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: clean up ELF header generationOmar Sandoval
Currently, the ELF file header, program headers, and note segment are allocated all at once, in some icky code dating back to 2.3. Programs tend to read the file header, then the program headers, then the note segment, all separately, so this is a waste of effort. It's cleaner and more efficient to handle the three separately. Link: http://lkml.kernel.org/r/19c92cbad0e11f6103ff3274b2e7a7e51a1eb74b.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: hold lock during readOmar Sandoval
Now that we're using an rwsem, we can hold it during the entirety of read_kcore() and have a common return path. This is preparation for the next change. [akpm@linux-foundation.org: fix locking bug reported by Tetsuo Handa] Link: http://lkml.kernel.org/r/d7cfbc1e8a76616f3b699eaff9df0a2730380534.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: fix memory hotplug vs multiple opens raceOmar Sandoval
There's a theoretical race condition that will cause /proc/kcore to miss a memory hotplug event: CPU0 CPU1 // hotplug event 1 kcore_need_update = 1 open_kcore() open_kcore() kcore_update_ram() kcore_update_ram() // Walk RAM // Walk RAM __kcore_update_ram() __kcore_update_ram() kcore_need_update = 0 // hotplug event 2 kcore_need_update = 1 kcore_need_update = 0 Note that CPU1 set up the RAM kcore entries with the state after hotplug event 1 but cleared the flag for hotplug event 2. The RAM entries will therefore be stale until there is another hotplug event. This is an extremely unlikely sequence of events, but the fix makes the synchronization saner, anyways: we serialize the entire update sequence, which means that whoever clears the flag will always succeed in replacing the kcore list. Link: http://lkml.kernel.org/r/6106c509998779730c12400c1b996425df7d7089.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: replace kclist_lock rwlock with rwsemOmar Sandoval
Now we only need kclist_lock from user context and at fs init time, and the following changes need to sleep while holding the kclist_lock. Link: http://lkml.kernel.org/r/521ba449ebe921d905177410fee9222d07882f0d.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: don't grab lock for memory hotplug notifierOmar Sandoval
The memory hotplug notifier kcore_callback() only needs kclist_lock to prevent races with __kcore_update_ram(), but we can easily eliminate that race by using an atomic xchg() in __kcore_update_ram(). This is preparation for converting kclist_lock to an rwsem. Link: http://lkml.kernel.org/r/0a4bc89f4dbde8b5b2ea309f7b4fb6a85fe29df2.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc/kcore: don't grab lock for kclist_add()Omar Sandoval
Patch series "/proc/kcore improvements", v4. This series makes a few improvements to /proc/kcore. It fixes a couple of small issues in v3 but is otherwise the same. Patches 1, 2, and 3 are prep patches. Patch 4 is a fix/cleanup. Patch 5 is another prep patch. Patches 6 and 7 are optimizations to ->read(). Patch 8 makes it possible to enable CRASH_CORE on any architecture, which is needed for patch 9. Patch 9 adds vmcoreinfo to /proc/kcore. This patch (of 9): kclist_add() is only called at init time, so there's no point in grabbing any locks. We're also going to replace the rwlock with a rwsem, which we don't want to try grabbing during early boot. While we're here, mark kclist_add() with __init so that we'll get a warning if it's called from non-init code. Link: http://lkml.kernel.org/r/98208db1faf167aa8b08eebfa968d95c70527739.1531953780.git.osandov@fb.com Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22fs/proc/kcore.c: use __pa_symbol() for KCORE_TEXT list entriesJames Morse
elf_kcore_store_hdr() uses __pa() to find the physical address of KCORE_RAM or KCORE_TEXT entries exported as program headers. This trips CONFIG_DEBUG_VIRTUAL's checks, as the KCORE_TEXT entries are not in the linear map. Handle these two cases separately, using __pa_symbol() for the KCORE_TEXT entries. Link: http://lkml.kernel.org/r/20180711131944.15252-1-james.morse@arm.com Signed-off-by: James Morse <james.morse@arm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22fs/proc/vmcore.c: use new typedef vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handler in struct vm_operations_struct. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See 1c8f422059ae ("mm: change return type to vm_fault_t") for reference. Link: http://lkml.kernel.org/r/20180702153325.GA3875@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ganesh Goudar <ganeshgr@chelsio.com> Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: use "unsigned int" in /proc/stat hookAlexey Dobriyan
Number of CPUs is never high enough to force 64-bit arithmetic. Save couple of bytes on x86_64. Link: http://lkml.kernel.org/r/20180627200710.GC18434@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: spread "const" a bitAlexey Dobriyan
Link: http://lkml.kernel.org/r/20180627200614.GB18434@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: use macro in /proc/latency hookAlexey Dobriyan
->latency_record is defined as struct latency_record[LT_SAVECOUNT]; so use the same macro whie iterating. Link: http://lkml.kernel.org/r/20180627200534.GA18434@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: save 2 atomic ops on write to "/proc/*/attr/*"Alexey Dobriyan
Code checks if write is done by current to its own attributes. For that get/put pair is unnecessary as it can be done under RCU. Note: rcu_read_unlock() can be done even earlier since pointer to a task is not dereferenced. It depends if /proc code should look scary or not: rcu_read_lock(); task = pid_task(...); rcu_read_unlock(); if (!task) return -ESRCH; if (task != current) return -EACCESS: P.S.: rename "length" variable. Code like this length = -EINVAL; should not exist. Link: http://lkml.kernel.org/r/20180627200218.GF18113@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: put task earlier in /proc/*/fail-nthAlexey Dobriyan
Link: http://lkml.kernel.org/r/20180627195427.GE18113@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: smaller readlock section in readdir("/proc")Alexey Dobriyan
Readdir context is thread local, so ->pos is thread local, move it out of readlock. Link: http://lkml.kernel.org/r/20180627195339.GD18113@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22fs/proc/uptime.c: use ktime_get_boottime_ts64Arnd Bergmann
get_monotonic_boottime() is deprecated and uses the old timespec type. Let's convert /proc/uptime to use ktime_get_boottime_ts64(). Link: http://lkml.kernel.org/r/20180620081746.282742-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22proc: fixup PDE allocation bloatAlexey Dobriyan
24074a35c5c975 ("proc: Make inline name size calculation automatic") started to put PDE allocations into kmalloc-256 which is unnecessary as ~40 character names are very rare. Put allocation back into kmalloc-192 cache for 64-bit non-debug builds. Put BUILD_BUG_ON to know when PDE size has gotten out of control. [adobriyan@gmail.com: fix BUILD_BUG_ON breakage on powerpc64] Link: http://lkml.kernel.org/r/20180703191602.GA25521@avx2 Link: http://lkml.kernel.org/r/20180617215732.GA24688@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22/proc/meminfo: add percpu populated pages countDennis Zhou (Facebook)
Currently, percpu memory only exposes allocation and utilization information via debugfs. This more or less is only really useful for understanding the fragmentation and allocation information at a per-chunk level with a few global counters. This is also gated behind a config. BPF and cgroup, for example, have seen an increase in use causing increased use of percpu memory. Let's make it easier for someone to identify how much memory is being used. This patch adds the "Percpu" stat to meminfo to more easily look up how much percpu memory is in use. This number includes the cost for all allocated backing pages and not just insight at the per a unit, per chunk level. Metadata is excluded. I think excluding metadata is fair because the backing memory scales with the numbere of cpus and can quickly outweigh the metadata. It also makes this calculation light. Link: http://lkml.kernel.org/r/20180807184723.74919-1-dennisszhou@gmail.com Signed-off-by: Dennis Zhou <dennisszhou@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: /proc/pid/smaps_rollup: convert to single value seq_fileVlastimil Babka
The /proc/pid/smaps_rollup file is currently implemented via the m_start/m_next/m_stop seq_file iterators shared with the other maps files, that iterate over vma's. However, the rollup file doesn't print anything for each vma, only accumulate the stats. There are some issues with the current code as reported in [1] - the accumulated stats can get skewed if seq_file start()/stop() op is called multiple times, if show() is called multiple times, and after seeks to non-zero position. Patch [1] fixed those within existing design, but I believe it is fundamentally wrong to expose the vma iterators to the seq_file mechanism when smaps_rollup shows logically a single set of values for the whole address space. This patch thus refactors the code to provide a single "value" at offset 0, with vma iteration to gather the stats done internally. This fixes the situations where results are skewed, and simplifies the code, especially in show_smap(), at the expense of somewhat less code reuse. [1] https://marc.info/?l=linux-mm&m=151927723128134&w=2 [vbabka@suse.c: use seq_file infrastructure] Link: http://lkml.kernel.org/r/bf4525b0-fd5b-4c4c-2cb3-adee3dd95a48@suse.cz Link: http://lkml.kernel.org/r/20180723111933.15443-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Daniel Colascione <dancol@google.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: /proc/pid/smaps: factor out common stats printingVlastimil Babka
To prepare for handling /proc/pid/smaps_rollup differently from /proc/pid/smaps factor out from show_smap() printing the parts of output that are common for both variants, which is the bulk of the gathered memory stats. [vbabka@suse.cz: add const, per Alexey] Link: http://lkml.kernel.org/r/b45f319f-cd04-337b-37f8-77f99786aa8a@suse.cz Link: http://lkml.kernel.org/r/20180723111933.15443-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: /proc/pid/smaps: factor out mem stats gatheringVlastimil Babka
To prepare for handling /proc/pid/smaps_rollup differently from /proc/pid/smaps factor out vma mem stats gathering from show_smap() - it will be used by both. Link: http://lkml.kernel.org/r/20180723111933.15443-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22mm: /proc/pid/*maps remove is_pid and related wrappersVlastimil Babka
Patch series "cleanups and refactor of /proc/pid/smaps*". The recent regression in /proc/pid/smaps made me look more into the code. Especially the issues with smaps_rollup reported in [1] as explained in Patch 4, which fixes them by refactoring the code. Patches 2 and 3 are preparations for that. Patch 1 is me realizing that there's a lot of boilerplate left from times where we tried (unsuccessfuly) to mark thread stacks in the output. Originally I had also plans to rework the translation from /proc/pid/*maps* file offsets to the internal structures. Now the offset means "vma number", which is not really stable (vma's can come and go between read() calls) and there's an extra caching of last vma's address. My idea was that offsets would be interpreted directly as addresses, which would also allow meaningful seeks (see the ugly seek_to_smaps_entry() in tools/testing/selftests/vm/mlock2.h). However loff_t is (signed) long long so that might be insufficient somewhere for the unsigned long addresses. So the result is fixed issues with skewed /proc/pid/smaps_rollup results, simpler smaps code, and a lot of unused code removed. [1] https://marc.info/?l=linux-mm&m=151927723128134&w=2 This patch (of 4): Commit b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps") introduced differences between /proc/PID/maps and /proc/PID/task/TID/maps to mark thread stacks properly, and this was also done for smaps and numa_maps. However it didn't work properly and was ultimately removed by commit b18cb64ead40 ("fs/proc: Stop trying to report thread stacks"). Now the is_pid parameter for the related show_*() functions is unused and we can remove it together with wrapper functions and ops structures that differ for PID and TID cases only in this parameter. Link: http://lkml.kernel.org/r/20180723111933.15443-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-14x86: Add entry trampolines to kcoreAdrian Hunter
Without program headers for PTI entry trampoline pages, the trampoline virtual addresses do not map to anything. Example before: sudo gdb --quiet vmlinux /proc/kcore Reading symbols from vmlinux...done. [New process 1] Core was generated by `BOOT_IMAGE=/boot/vmlinuz-4.16.0 root=UUID=a6096b83-b763-4101-807e-f33daff63233'. #0 0x0000000000000000 in irq_stack_union () (gdb) x /21ib 0xfffffe0000006000 0xfffffe0000006000: Cannot access memory at address 0xfffffe0000006000 (gdb) quit After: sudo gdb --quiet vmlinux /proc/kcore [sudo] password for ahunter: Reading symbols from vmlinux...done. [New process 1] Core was generated by `BOOT_IMAGE=/boot/vmlinuz-4.16.0-fix-4-00005-gd6e65a8b4072 root=UUID=a6096b83-b7'. #0 0x0000000000000000 in irq_stack_union () (gdb) x /21ib 0xfffffe0000006000 0xfffffe0000006000: swapgs 0xfffffe0000006003: mov %rsp,-0x3e12(%rip) # 0xfffffe00000021f8 0xfffffe000000600a: xchg %ax,%ax 0xfffffe000000600c: mov %cr3,%rsp 0xfffffe000000600f: bts $0x3f,%rsp 0xfffffe0000006014: and $0xffffffffffffe7ff,%rsp 0xfffffe000000601b: mov %rsp,%cr3 0xfffffe000000601e: mov -0x3019(%rip),%rsp # 0xfffffe000000300c 0xfffffe0000006025: pushq $0x2b 0xfffffe0000006027: pushq -0x3e35(%rip) # 0xfffffe00000021f8 0xfffffe000000602d: push %r11 0xfffffe000000602f: pushq $0x33 0xfffffe0000006031: push %rcx 0xfffffe0000006032: push %rdi 0xfffffe0000006033: mov $0xffffffff91a00010,%rdi 0xfffffe000000603a: callq 0xfffffe0000006046 0xfffffe000000603f: pause 0xfffffe0000006041: lfence 0xfffffe0000006044: jmp 0xfffffe000000603f 0xfffffe0000006046: mov %rdi,(%rsp) 0xfffffe000000604a: retq (gdb) quit In addition, entry trampolines all map to the same page. Represent that by giving the corresponding program headers in kcore the same offset. This has the benefit that, when perf tools uses /proc/kcore as a source for kernel object code, samples from different CPU trampolines are aggregated together. Note, such aggregation is normal for profiling i.e. people want to profile the object code, not every different virtual address the object code might be mapped to (across different processes for example). Notes by PeterZ: This also adds the KCORE_REMAP functionality. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1528289651-4113-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-14fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*Vlastimil Babka
Thomas reports: "While looking around in /proc on my v4.14.52 system I noticed that all processes got a lot of "Locked" memory in /proc/*/smaps. A lot more memory than a regular user can usually lock with mlock(). Commit 493b0e9d945f (in v4.14-rc1) seems to have changed the behavior of "Locked". Before that commit the code was like this. Notice the VM_LOCKED check. (vma->vm_flags & VM_LOCKED) ? (unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0); After that commit Locked is now the same as Pss: (unsigned long)(mss->pss >> (10 + PSS_SHIFT))); This looks like a mistake." Indeed, the commit has added mss->pss_locked with the correct value that depends on VM_LOCKED, but forgot to actually use it. Fix it. Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-01Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fix from Al Viro: "Followup to procfs-seq_file series this window" This fixes a memory leak by making sure that proc seq files release any private data on close. The 'proc_seq_open' has to be properly paired with 'proc_seq_release' that releases the extra private data. * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: proc: add proc_seq_release
2018-06-27proc: add proc_seq_releaseChunyu Hu
kmemleak reported some memory leak on reading proc files. After adding some debug lines, find that proc_seq_fops is using seq_release as release handler, which won't handle the free of 'private' field of seq_file, while in fact the open handler proc_seq_open could create the private data with __seq_open_private when state_size is greater than zero. So after reading files created with proc_create_seq_private, such as /proc/timer_list and /proc/vmallocinfo, the private mem of a seq_file is not freed. Fix it by adding the paired proc_seq_release as the default release handler of proc_seq_ops instead of seq_release. Fixes: 44414d82cfe0 ("proc: introduce proc_create_seq_private") Reviewed-by: Christoph Hellwig <hch@lst.de> CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-20proc: fix missing final NUL in get_mm_cmdline() rewriteLinus Torvalds
The rewrite of the cmdline fetching missed the fact that we used to also return the final terminating NUL character of the last argument. I hadn't noticed, and none of the tools I tested cared, but something obviously must care, because Michal Kubecek noticed the change in behavior. Tweak the "find the end" logic to actually include the NUL character, and once past the eend of argv, always start the strnlen() at the expected (original) argument end. This whole "allow people to rewrite their arguments in place" is a nasty hack and requires that odd slop handling at the end of the argv array, but it's our traditional model, so we continue to support it. Repored-and-bisected-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-and-tested-by: Michal Kubecek <mkubecek@suse.cz> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-16Merge branch 'afs-proc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "Assorted AFS stuff - ended up in vfs.git since most of that consists of David's AFS-related followups to Christoph's procfs series" * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Optimise callback breaking by not repeating volume lookup afs: Display manually added cells in dynamic root mount afs: Enable IPv6 DNS lookups afs: Show all of a server's addresses in /proc/fs/afs/servers afs: Handle CONFIG_PROC_FS=n proc: Make inline name size calculation automatic afs: Implement network namespacing afs: Mark afs_net::ws_cell as __rcu and set using rcu functions afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus() proc: Add a way to make network proc files writable afs: Rearrange fs/afs/proc.c to remove remaining predeclarations. afs: Rearrange fs/afs/proc.c to move the show routines up afs: Rearrange fs/afs/proc.c by moving fops and open functions down afs: Move /proc management functions to the end of the file
2018-06-15proc: Make inline name size calculation automaticDavid Howells
Make calculation of the size of the inline name in struct proc_dir_entry automatic, rather than having to manually encode the numbers and failing to allow for lockdep. Require a minimum inline name size of 33+1 to allow for names that look like two hex numbers with a dash between. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - MM remainders - various misc things - kcov updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits) lib/test_printf.c: call wait_for_random_bytes() before plain %p tests hexagon: drop the unused variable zero_page_mask hexagon: fix printk format warning in setup.c mm: fix oom_kill event handling treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX mm: use octal not symbolic permissions ipc: use new return type vm_fault_t sysvipc/sem: mitigate semnum index against spectre v1 fault-injection: reorder config entries arm: port KCOV to arm sched/core / kcov: avoid kcov_area during task switch kcov: prefault the kcov_area kcov: ensure irq code sees a valid area kernel/relay.c: change return type to vm_fault_t exofs: avoid VLA in structures coredump: fix spam with zero VMA process fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block() proc: skip branch in /proc/*/* lookup mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns mm/memblock: add missing include <linux/bootmem.h> ...
2018-06-15proc: skip branch in /proc/*/* lookupAlexey Dobriyan
Code is structured like this: for ( ... p < last; p++) { if (memcmp == 0) break; } if (p >= last) ERROR OK gcc doesn't see that if if lookup succeeds than post loop branch will never be taken and skip it. [akpm@linux-foundation.org: proc_pident_instantiate() no longer takes an inode*] Link: http://lkml.kernel.org/r/20180423213954.GD9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-14Merge branch 'vfs_timespec64' of https://github.com/deepa-hub/vfs into ↵Arnd Bergmann
vfs-timespec64 Pull the timespec64 conversion from Deepa Dinamani: "The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The flag patch applies cleanly. I've not seen the timestamps update logic change often. The series applies cleanly on 4.17-rc6 and linux-next tip (top commit: next-20180517). I'm not sure how to merge this kind of a series with a flag patch. We are targeting 4.18 for this. Let me know if you have other suggestions. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. I've tried to keep the conversions with the script simple, to aid in the reviews. I've kept all the internal filesystem data structures and function signatures the same. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions." I've pulled it into a branch based on top of the NFS changes that are now in mainline, so I could resolve the non-obvious conflict between the two while merging. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-09Merge branch 'proc-cmdline'Linus Torvalds
Merge proc_cmdline simplifications. This re-writes the get_mm_cmdline() logic to be rather simpler than it used to be, and makes the semantics for "cmdline goes past the end of the original area" more natural. You _can_ use prctl(PR_SET_MM) to just point your command line somewhere else entirely, but the traditional model is to just edit things in place and that still needs to continue to work. At least this way the code makes some sense. * proc-cmdline: fs/proc: simplify and clarify get_mm_cmdline() function fs/proc: re-factor proc_pid_cmdline_read() a bit
2018-06-08Merge branch 'work.lookup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull proc_fill_cache regression fix from Al Viro: "Regression fix for proc_fill_cache() braino introduced when switching instantiate() callback to d_splice_alias()" * 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix proc_fill_cache() in case of d_alloc_parallel() failure
2018-06-08fix proc_fill_cache() in case of d_alloc_parallel() failureAl Viro
If d_alloc_parallel() returns ERR_PTR(...), we don't want to dput() that. Small reorganization allows to have all error-in-lookup cases rejoin the main codepath after dput(child), avoiding the entire problem. Spotted-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: 0168b9e38c42 "procfs: switch instantiate_t to d_splice_alias()" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-07proc: use "unsigned int" for /proc/*/stackAlexey Dobriyan
struct stack_trace::nr_entries is defined as "unsigned int" (YAY!) so the iterator should be unsigned as well. It saves 1 byte of code or something like that. Link: http://lkml.kernel.org/r/20180423215248.GG9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07proc: use "unsigned int" for sigqueue lengthAlexey Dobriyan
It's defined as atomic_t and really long signal queues are unheard of. Link: http://lkml.kernel.org/r/20180423215119.GF9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07proc: use "unsigned int" in proc_fill_cache()Alexey Dobriyan
All those lengths are unsigned as they should be. Link: http://lkml.kernel.org/r/20180423213751.GC9043@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>