Age | Commit message (Collapse) | Author |
|
smc_poll() returns with mask bit EPOLLPRI if the connection urg_state
is SMC_URG_VALID. Since SMC_URG_VALID is zero, smc_poll signals
EPOLLPRI errorneously if called in state SMC_INIT before the connection
is created, for instance in a non-blocking connect scenario.
This patch switches to non-zero values for the urg states.
Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
Fixes: de8474eb9d50 ("net/smc: urgent data support")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Paolo Abeni says:
====================
ipv6: route: enforce RCU protection for fib6_info->from
This series addresses a couple of RCU left-over dating back to rt6_info->from
conversion to RCU
v1 -> v2:
- fix a possible race in patch 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need a RCU critical section around rt6_info->from deference, and
proper annotation.
Fixes: 4ed591c8ab44 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We must access rt6_info->from under RCU read lock: move the
dereference under such lock, with proper annotation.
v1 -> v2:
- avoid using multiple, racy, fetch operations for rt->from
Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull ceph fixes from Ilya Dryomov:
"Two bug fixes for old issues, both marked for stable"
* tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client:
ceph: avoid repeatedly adding inode to mdsc->snap_flush_list
libceph: handle an empty authorize reply
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss into char-misc-next
Johan writes:
GNSS updates for 5.1-rc1
Here are the GNSS updates for 5.1-rc1, including:
- a new driver for Mediatek-based receivers
- support for SiRF receivers without a wakeup signal
- support for a separate LNA supply for SiRF receivers
Included are also various clean ups and minor fixes.
All have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'gnss-5.1-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss:
gnss: add driver for mediatek receivers
gnss: add mtk receiver type support
dt-bindings: gnss: add mediatek binding
dt-bindings: Add vendor prefix for "GlobalTop Technology, Inc."
dt-bindings: gnss: add lna-supply property
gnss: sirf: add a separate supply for a lna
dt-bindings: gnss: add w2sg0004 compatible string
gnss: sirf: add support for configurations without wakeup signal
gnss: sirf: write data to gnss only when the gnss device is open
gnss: sirf: drop redundant double negation
gnss: sirf: force hibernate mode on probe
gnss: sirf: fix premature wakeup interrupt enable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull late arm64 fixes from Will Deacon:
"Three small arm64 fixes for 5.0.
They fix a build breakage with clang introduced in 4.20, an oversight
in our sigframe restoration relating to the SSBS bit and a boot fix
for systems with newer revisions of our interrupt controller.
Summary:
- Fix handling of PSTATE.SSBS bit in sigreturn()
- Fix version checking of the GIC during early boot
- Fix clang builds failing due to use of NEON in the crypto code"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Relax GIC version check during early boot
arm64/neon: Disable -Wincompatible-pointer-types when building with Clang
arm64: fix SSBS sanitization
|
|
Merge misc fixes from Andrew Morton:
"23 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (23 commits)
mm, memory_hotplug: fix off-by-one in is_pageblock_removable
mm: don't let userspace spam allocations warnings
slub: fix a crash with SLUB_DEBUG + KASAN_SW_TAGS
kasan, slab: remove redundant kasan_slab_alloc hooks
kasan, slab: make freelist stored without tags
kasan, slab: fix conflicts with CONFIG_HARDENED_USERCOPY
kasan: prevent tracing of tags.c
kasan: fix random seed generation for tag-based mode
tmpfs: fix link accounting when a tmpfile is linked in
psi: avoid divide-by-zero crash inside virtual machines
mm: handle lru_add_drain_all for UP properly
mm, page_alloc: fix a division by zero error when boosting watermarks v2
mm/debug.c: fix __dump_page() for poisoned pages
proc, oom: do not report alien mms when setting oom_score_adj
slub: fix SLAB_CONSISTENCY_CHECKS + KASAN_SW_TAGS
kasan, slub: fix more conflicts with CONFIG_SLAB_FREELIST_HARDENED
kasan, slub: fix conflicts with CONFIG_SLAB_FREELIST_HARDENED
kasan, slub: move kasan_poison_slab hook before page_address
kmemleak: account for tagged pointers when calculating pointer range
kasan, kmemleak: pass tagged pointers to kmemleak
...
|
|
Rong Chen has reported the following boot crash:
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-gefad4e4 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
RIP: 0010:page_mapping+0x12/0x80
Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48
RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202
RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 000000000000000a
RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff88801e5c0000
RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 0000000001b64d13
R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff88801e640000
R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 0000000000000001
FS: 00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000000006a0
Call Trace:
__dump_page+0x14/0x2c0
is_mem_section_removable+0x24c/0x2c0
removable_show+0x87/0xa0
dev_attr_show+0x25/0x60
sysfs_kf_seq_show+0xba/0x110
seq_read+0x196/0x3f0
__vfs_read+0x34/0x180
vfs_read+0xa0/0x150
ksys_read+0x44/0xb0
do_syscall_64+0x5e/0x4a0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
and bisected it down to commit efad4e475c31 ("mm, memory_hotplug:
is_mem_section_removable do not pass the end of a zone").
The reason for the crash is that the mapping is garbage for poisoned
(uninitialized) page. This shouldn't happen as all pages in the zone's
boundary should be initialized.
Later debugging revealed that the actual problem is an off-by-one when
evaluating the end_page. 'start_pfn + nr_pages' resp 'zone_end_pfn'
refers to a pfn after the range and as such it might belong to a
differen memory section.
This along with CONFIG_SPARSEMEM then makes the loop condition
completely bogus because a pointer arithmetic doesn't work for pages
from two different sections in that memory model.
Fix the issue by reworking is_pageblock_removable to be pfn based and
only use struct page where necessary. This makes the code slightly
easier to follow and we will remove the problematic pointer arithmetic
completely.
Link: http://lkml.kernel.org/r/20190218181544.14616-1-mhocko@kernel.org
Fixes: efad4e475c31 ("mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: <rong.a.chen@intel.com>
Tested-by: <rong.a.chen@intel.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
memdump_user usually gets fed unchecked userspace input. Blasting a
full backtrace into dmesg every time is a bit excessive - I'm not sure
on the kernel rule in general, but at least in drm we're trying not to
let unpriviledge userspace spam the logs freely. Definitely not entire
warning backtraces.
It also means more filtering for our CI, because our testsuite exercises
these corner cases and so hits these a lot.
Link: http://lkml.kernel.org/r/20190220204058.11676-1-daniel.vetter@ffwll.ch
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In process_slab(), "p = get_freepointer()" could return a tagged
pointer, but "addr = page_address()" always return a native pointer. As
the result, slab_index() is messed up here,
return (p - addr) / s->size;
All other callers of slab_index() have the same situation where "addr"
is from page_address(), so just need to untag "p".
# cat /sys/kernel/slab/hugetlbfs_inode_cache/alloc_calls
Unable to handle kernel paging request at virtual address 2bff808aa4856d48
Mem abort info:
ESR = 0x96000007
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000007
CM = 0, WnR = 0
swapper pgtable: 64k pages, 48-bit VAs, pgdp = 0000000002498338
[2bff808aa4856d48] pgd=00000097fcfd0003, pud=00000097fcfd0003, pmd=00000097fca30003, pte=00e8008b24850712
Internal error: Oops: 96000007 [#1] SMP
CPU: 3 PID: 79210 Comm: read_all Tainted: G L 5.0.0-rc7+ #84
Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.0.6 07/10/2018
pstate: 00400089 (nzcv daIf +PAN -UAO)
pc : get_map+0x78/0xec
lr : get_map+0xa0/0xec
sp : aeff808989e3f8e0
x29: aeff808989e3f940 x28: ffff800826200000
x27: ffff100012d47000 x26: 9700000000002500
x25: 0000000000000001 x24: 52ff8008200131f8
x23: 52ff8008200130a0 x22: 52ff800820013098
x21: ffff800826200000 x20: ffff100013172ba0
x19: 2bff808a8971bc00 x18: ffff1000148f5538
x17: 000000000000001b x16: 00000000000000ff
x15: ffff1000148f5000 x14: 00000000000000d2
x13: 0000000000000001 x12: 0000000000000000
x11: 0000000020000002 x10: 2bff808aa4856d48
x9 : 0000020000000000 x8 : 68ff80082620ebb0
x7 : 0000000000000000 x6 : ffff1000105da1dc
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 0000000000000010 x2 : 2bff808a8971bc00
x1 : ffff7fe002098800 x0 : ffff80082620ceb0
Process read_all (pid: 79210, stack limit = 0x00000000f65b9361)
Call trace:
get_map+0x78/0xec
process_slab+0x7c/0x47c
list_locations+0xb0/0x3c8
alloc_calls_show+0x34/0x40
slab_attr_show+0x34/0x48
sysfs_kf_seq_show+0x2e4/0x570
kernfs_seq_show+0x12c/0x1a0
seq_read+0x48c/0xf84
kernfs_fop_read+0xd4/0x448
__vfs_read+0x94/0x5d4
vfs_read+0xcc/0x194
ksys_read+0x6c/0xe8
__arm64_sys_read+0x68/0xb0
el0_svc_handler+0x230/0x3bc
el0_svc+0x8/0xc
Code: d3467d2a 9ac92329 8b0a0e6a f9800151 (c85f7d4b)
---[ end trace a383a9a44ff13176 ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 1-7,32,40,127
Kernel Offset: disabled
CPU features: 0x002,20000c18
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/20190220020251.82039-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kasan_slab_alloc() calls in kmem_cache_alloc() and kmem_cache_alloc_node()
are redundant as they are already called via slab_alloc/slab_alloc_node()->
slab_post_alloc_hook()->kasan_slab_alloc(). Remove them.
Link: http://lkml.kernel.org/r/4ca1655cdcfc4379c49c50f7bf80f81c4ad01485.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Similarly to "kasan, slub: move kasan_poison_slab hook before
page_address", move kasan_poison_slab() before alloc_slabmgmt(), which
calls page_address(), to make page_address() return value to be
non-tagged. This, combined with calling kasan_reset_tag() for off-slab
slab management object, leads to freelist being stored non-tagged.
Link: http://lkml.kernel.org/r/dfb53b44a4d00de3879a05a9f04c1f55e584f7a1.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Similarly to commit 96fedce27e13 ("kasan: make tag based mode work with
CONFIG_HARDENED_USERCOPY"), we need to reset pointer tags in
__check_heap_object() in mm/slab.c before doing any pointer math.
Link: http://lkml.kernel.org/r/9a5c0f958db10e69df5ff9f2b997866b56b7effc.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Similarly to commit 0d0c8de8788b ("kasan: mark file common so ftrace
doesn't trace it") add the -pg flag to mm/kasan/tags.c to prevent
conflicts with tracing.
Link: http://lkml.kernel.org/r/9c4c3ce5ccfb894c7fe66d91de7c1da2787b4da4.1550602886.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are two issues with assigning random percpu seeds right now:
1. We use for_each_possible_cpu() to iterate over cpus, but cpumask is
not set up yet at the moment of kasan_init(), and thus we only set
the seed for cpu #0.
2. A call to get_random_u32() always returns the same number and produces
a message in dmesg, since the random subsystem is not yet initialized.
Fix 1 by calling kasan_init_tags() after cpumask is set up.
Fix 2 by using get_cycles() instead of get_random_u32(). This gives us
lower quality random numbers, but it's good enough, as KASAN is meant to
be used as a debugging tool and not a mitigation.
Link: http://lkml.kernel.org/r/1f815cc914b61f3516ed4cc9bfd9eeca9bd5d9de.1550677973.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
tmpfs has a peculiarity of accounting hard links as if they were
separate inodes: so that when the number of inodes is limited, as it is
by default, a user cannot soak up an unlimited amount of unreclaimable
dcache memory just by repeatedly linking a file.
But when v3.11 added O_TMPFILE, and the ability to use linkat() on the
fd, we missed accommodating this new case in tmpfs: "df -i" shows that
an extra "inode" remains accounted after the file is unlinked and the fd
closed and the actual inode evicted. If a user repeatedly links
tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they
are deleted.
Just skip the extra reservation from shmem_link() in this case: there's
a sense in which this first link of a tmpfile is then cheaper than a
hard link of another file, but the accounting works out, and there's
still good limiting, so no need to do anything more complicated.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils
Fixes: f4e0c30c191 ("allow the temp files created by open() to be linked to")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Matej Kupljen <matej.kupljen@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We've been seeing hard-to-trigger psi crashes when running inside VM
instances:
divide error: 0000 [#1] SMP PTI
Modules linked in: [...]
CPU: 0 PID: 212 Comm: kworker/0:2 Not tainted 4.16.18-119_fbk9_3817_gfe944c98d695 #119
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: events psi_clock
RIP: 0010:psi_update_stats+0x270/0x490
RSP: 0018:ffffc90001117e10 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800a35a13f8
RDX: 0000000000000000 RSI: ffff8800a35a1340 RDI: 0000000000000000
RBP: 0000000000000658 R08: ffff8800a35a1470 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000f8502
FS: 0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbe370fa000 CR3: 00000000b1e3a000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
psi_clock+0x12/0x50
process_one_work+0x1e0/0x390
worker_thread+0x2b/0x3c0
? rescuer_thread+0x330/0x330
kthread+0x113/0x130
? kthread_create_worker_on_cpu+0x40/0x40
? SyS_exit_group+0x10/0x10
ret_from_fork+0x35/0x40
Code: 48 0f 47 c7 48 01 c2 45 85 e4 48 89 16 0f 85 e6 00 00 00 4c 8b 49 10 4c 8b 51 08 49 69 d9 f2 07 00 00 48 6b c0 64 4c 8b 29 31 d2 <48> f7 f7 49 69 d5 8d 06 00 00 48 89 c5 4c 69 f0 00 98 0b 00 48
The Code-line points to `period` being 0 inside update_stats(), and we
divide by that when calculating that period's pressure percentage.
The elapsed period should never be 0. The reason this can happen is due
to an off-by-one in the idle time / missing period calculation combined
with a coarse sched_clock() in the virtual machine.
The target time for aggregation is advanced into the future on a fixed
grid to prevent clock drift. So when an aggregation runs after some idle
period, we can not just set it to "now + psi_period", but have to
calculate the downtime and advance the target time relative to itself.
However, if the aggregator was disabled exactly one psi_period (ns), we
drop one idle period in the calculation due to a > when we should do >=.
In that case, next_update will be advanced from 'now - psi_period' to
'now' when it should be moved to 'now + psi_period'. The run finishes
with last_update == next_update == sched_clock().
With hardware clocks, this exact nanosecond match isn't likely in the
first place; but if it does happen, the clock will still have moved on and
the period non-zero by the time the worker runs. A pointlessly short
period, but besides the extra work, no harm no foul. However, a slow
sched_clock() like we have on VMs might not have advanced either by the
time the worker runs again. And when we calculate the elapsed period, the
result, our pressure divisor, will be 0. Ouch.
Fix this by correctly handling the situation when the elapsed time between
aggregation runs is precisely two periods, and advance the expiration
timestamp correctly to period into the future.
Link: http://lkml.kernel.org/r/20190214193157.15788-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Łukasz Siudut <lsiudut@fb.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since for_each_cpu(cpu, mask) added by commit 2d3854a37e8b767a
("cpumask: introduce new API, without changing anything") did not
evaluate the mask argument if NR_CPUS == 1 due to CONFIG_SMP=n,
lru_add_drain_all() is hitting WARN_ON() at __flush_work() added by
commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without
INIT_WORK().") by unconditionally calling flush_work() [1].
Workaround this issue by using CONFIG_SMP=n specific lru_add_drain_all
implementation. There is no real need to defer the implementation to
the workqueue as the draining is going to happen on the local cpu. So
alias lru_add_drain_all to lru_add_drain which does all the necessary
work.
[akpm@linux-foundation.org: fix various build warnings]
[1] https://lkml.kernel.org/r/18a30387-6aa5-6123-e67c-57579ecc3f38@roeck-us.net
Link: http://lkml.kernel.org/r/20190213124334.GH4525@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Debugged-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Yury Norov reported that an arm64 KVM instance could not boot since
after v5.0-rc1 and could addressed by reverting the patches
1c30844d2dfe272d58c ("mm: reclaim small amounts of memory when an external
73444bc4d8f92e46a20 ("mm, page_alloc: do not wake kswapd with zone lock held")
The problem is that a division by zero error is possible if boosting
occurs very early in boot if the system has very little memory. This
patch avoids the division by zero error.
Link: http://lkml.kernel.org/r/20190213143012.GT9565@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Evaluating page_mapping() on a poisoned page ends up dereferencing junk
and making PF_POISONED_CHECK() considerably crashier than intended:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
Mem abort info:
ESR = 0x96000005
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
user pgtable: 4k pages, 39-bit VAs, pgdp = 00000000c2f6ac38
[0000000000000006] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 491 Comm: bash Not tainted 5.0.0-rc1+ #1
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Dec 17 2018
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : page_mapping+0x18/0x118
lr : __dump_page+0x1c/0x398
Process bash (pid: 491, stack limit = 0x000000004ebd4ecd)
Call trace:
page_mapping+0x18/0x118
__dump_page+0x1c/0x398
dump_page+0xc/0x18
remove_store+0xbc/0x120
dev_attr_store+0x18/0x28
sysfs_kf_write+0x40/0x50
kernfs_fop_write+0x130/0x1d8
__vfs_write+0x30/0x180
vfs_write+0xb4/0x1a0
ksys_write+0x60/0xd0
__arm64_sys_write+0x18/0x20
el0_svc_common+0x94/0xf8
el0_svc_handler+0x68/0x70
el0_svc+0x8/0xc
Code: f9400401 d1000422 f240003f 9a801040 (f9400402)
---[ end trace cdb5eb5bf435cecb ]---
Fix that by not inspecting the mapping until we've determined that it's
likely to be valid. Now the above condition still ends up stopping the
kernel, but in the correct manner:
page:ffffffbf20000000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
------------[ cut here ]------------
kernel BUG at ./include/linux/mm.h:1006!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 483 Comm: bash Not tainted 5.0.0-rc1+ #3
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Dec 17 2018
pstate: 40000005 (nZcv daif -PAN -UAO)
pc : remove_store+0xbc/0x120
lr : remove_store+0xbc/0x120
...
Link: http://lkml.kernel.org/r/03b53ee9d7e76cda4b9b5e1e31eea080db033396.1550071778.git.robin.murphy@arm.com
Fixes: 1c6fb1d89e73 ("mm: print more information about mapping in __dump_page")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Tetsuo has reported that creating a thousands of processes sharing MM
without SIGHAND (aka alien threads) and setting
/proc/<pid>/oom_score_adj will swamp the kernel log and takes ages [1]
to finish. This is especially worrisome that all that printing is done
under RCU lock and this can potentially trigger RCU stall or softlockup
detector.
The primary reason for the printk was to catch potential users who might
depend on the behavior prior to 44a70adec910 ("mm, oom_adj: make sure
processes sharing mm have same view of oom_score_adj") but after more
than 2 years without a single report I guess it is safe to simply remove
the printk altogether.
The next step should be moving oom_score_adj over to the mm struct and
remove all the tasks crawling as suggested by [2]
[1] http://lkml.kernel.org/r/97fce864-6f75-bca5-14bc-12c9f890e740@i-love.sakura.ne.jp
[2] http://lkml.kernel.org/r/20190117155159.GA4087@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20190212102129.26288-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Enabling SLUB_DEBUG's SLAB_CONSISTENCY_CHECKS with KASAN_SW_TAGS
triggers endless false positives during boot below due to
check_valid_pointer() checks tagged pointers which have no addresses
that is valid within slab pages:
BUG radix_tree_node (Tainted: G B ): Freelist Pointer check fails
-----------------------------------------------------------------------------
INFO: Slab objects=69 used=69 fp=0x (null) flags=0x7ffffffc000200
INFO: Object @offset=15060037153926966016 fp=0x
Redzone: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 18 6b 06 00 08 80 ff d0 .........k......
Object : 18 6b 06 00 08 80 ff d0 00 00 00 00 00 00 00 00 .k..............
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Object : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Redzone: bb bb bb bb bb bb bb bb ........
Padding: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B 5.0.0-rc5+ #18
Call trace:
dump_backtrace+0x0/0x450
show_stack+0x20/0x2c
__dump_stack+0x20/0x28
dump_stack+0xa0/0xfc
print_trailer+0x1bc/0x1d0
object_err+0x40/0x50
alloc_debug_processing+0xf0/0x19c
___slab_alloc+0x554/0x704
kmem_cache_alloc+0x2f8/0x440
radix_tree_node_alloc+0x90/0x2fc
idr_get_free+0x1e8/0x6d0
idr_alloc_u32+0x11c/0x2a4
idr_alloc+0x74/0xe0
worker_pool_assign_id+0x5c/0xbc
workqueue_init_early+0x49c/0xd50
start_kernel+0x52c/0xac4
FIX radix_tree_node: Marking all objects used
Link: http://lkml.kernel.org/r/20190209044128.3290-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When CONFIG_KASAN_SW_TAGS is enabled, ptr_addr might be tagged. Normally,
this doesn't cause any issues, as both set_freepointer() and
get_freepointer() are called with a pointer with the same tag. However,
there are some issues with CONFIG_SLUB_DEBUG code. For example, when
__free_slub() iterates over objects in a cache, it passes untagged
pointers to check_object(). check_object() in turns calls
get_freepointer() with an untagged pointer, which causes the freepointer
to be restored incorrectly.
Add kasan_reset_tag to freelist_ptr(). Also add a detailed comment.
Link: http://lkml.kernel.org/r/bf858f26ef32eb7bd24c665755b3aee4bc58d0e4.1550103861.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
CONFIG_SLAB_FREELIST_HARDENED hashes freelist pointer with the address of
the object where the pointer gets stored. With tag based KASAN we don't
account for that when building freelist, as we call set_freepointer() with
the first argument untagged. This patch changes the code to properly
propagate tags throughout the loop.
Link: http://lkml.kernel.org/r/3df171559c52201376f246bf7ce3184fe21c1dc7.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With tag based KASAN page_address() looks at the page flags to see whether
the resulting pointer needs to have a tag set. Since we don't want to set
a tag when page_address() is called on SLAB pages, we call
page_kasan_tag_reset() in kasan_poison_slab(). However in allocate_slab()
page_address() is called before kasan_poison_slab(). Fix it by changing
the order.
[andreyknvl@google.com: fix compilation error when CONFIG_SLUB_DEBUG=n]
Link: http://lkml.kernel.org/r/ac27cc0bbaeb414ed77bcd6671a877cf3546d56e.1550066133.git.andreyknvl@google.com
Link: http://lkml.kernel.org/r/cd895d627465a3f1c712647072d17f10883be2a1.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kmemleak keeps two global variables, min_addr and max_addr, which store
the range of valid (encountered by kmemleak) pointer values, which it
later uses to speed up pointer lookup when scanning blocks.
With tagged pointers this range will get bigger than it needs to be. This
patch makes kmemleak untag pointers before saving them to min_addr and
max_addr and when performing a lookup.
Link: http://lkml.kernel.org/r/16e887d442986ab87fe87a755815ad92fa431a5f.1550066133.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Qian Cai <cai@lca.pw>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Right now we call kmemleak hooks before assigning tags to pointers in
KASAN hooks. As a result, when an objects gets allocated, kmemleak sees a
differently tagged pointer, compared to the one it sees when the object
gets freed. Fix it by calling KASAN hooks before kmemleak's ones.
Link: http://lkml.kernel.org/r/cd825aa4897b0fc37d3316838993881daccbe9f5.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Qian Cai <cai@lca.pw>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When an object is kmalloc()'ed, two hooks are called: kasan_slab_alloc()
and kasan_kmalloc(). Right now we assign a tag twice, once in each of the
hooks. Fix it by assigning a tag only in the former hook.
Link: http://lkml.kernel.org/r/ce8c6431da735aa7ec051fd6497153df690eb021.1549921721.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgeniy Stepanov <eugenis@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The system call, get_mempolicy() [1], passes an unsigned long *nodemask
pointer and an unsigned long maxnode argument which specifies the length
of the user's nodemask array in bits (which is rounded up). The manual
page says that if the maxnode value is too small, get_mempolicy will
return EINVAL but there is no system call to return this minimum value.
To determine this value, some programs search /proc/<pid>/status for a
line starting with "Mems_allowed:" and use the number of digits in the
mask to determine the minimum value. A recent change to the way this line
is formatted [2] causes these programs to compute a value less than
MAX_NUMNODES so get_mempolicy() returns EINVAL.
Change get_mempolicy(), the older compat version of get_mempolicy(), and
the copy_nodes_to_user() function to use nr_node_ids instead of
MAX_NUMNODES, thus preserving the defacto method of computing the minimum
size for the nodemask array and the maxnode argument.
[1] http://man7.org/linux/man-pages/man2/get_mempolicy.2.html
[2] https://lore.kernel.org/lkml/1545405631-6808-1-git-send-email-longman@redhat.com
Link: http://lkml.kernel.org/r/20190211180245.22295-1-rcampbell@nvidia.com
Fixes: 4fb8e5b89bcbbbb ("include/linux/nodemask.h: use nr_node_ids (not MAX_NUMNODES) in __nodemask_pr_numnodes()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Waiman Long <longman@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Revert ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs").
Andy reports
: This breaks my setup where I have U-boot provided more size of initramfs
: than needed. This allows a bit of flexibility to increase or decrease
: initramfs compressed image without taking care of bootloader. The proper
: solution is to do this if we sure that we didn't get enough memory,
: otherwise I can't consider the error fatal to clean up rootfs.
Fixes: ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs")
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: David Engraf <david.engraf@sysgo.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next
Johan writes:
USB-serial fixes for 5.1-rc1
Here are a couple of new device ids for 5.1-rc1.
All have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.1-rc1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
USB: serial: cp210x: add ID for Ingenico 3070
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next
Johan writes:
USB-serial updates for 5.1-rc1
Here are the USB-serial updates for 5.1-rc1, including:
- support for the last three GPIOs on cp2102n devices
- gpio support for cp2104
- support for using cp210x gpios with autosuspend
- proper error handling for unsupported cp2105 line speeds
Included is also a new modem device id.
All have been in linux-next with no reported issues.
Signed-off-by: Johan Hovold <johan@kernel.org>
* tag 'usb-serial-5.1-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit ME910 ECM composition
USB: serial: cp210x: fix GPIO in autosuspend
USB: serial: cp210x: add minimum baud rate for CP2105 SCI
USB: serial: cp210x: add GPIO support for CP2104
USB: serial: cp210x: support all gpios on CP2102N QFN28 package
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm into char-misc-next
Alexander writes:
stm class/intel_th: Updates for v5.1
These are:
* 2 bugfixes in stm class
* one bugfix in intel_th
* a few minor cleanups
* tag 'intel_th-stm-for-greg-20190221' of git://git.kernel.org/pub/scm/linux/kernel/git/ash/stm:
stm class: Prevent division by zero
stm class: Fix an endless loop in channel allocation
intel_th: Don't reference unassigned outputs
intel_th: pti: Use sysfs_match_string() helper
intel_th: Only create useful device nodes
intel_th: Mark expected switch fall-throughs
intel_th: Update ABI documentation
|
|
qeueue/queue
Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
A previous commit removed the initialization of variable 'error' to zero,
and can cause a bogus error return. This occurs when error contains a
non-zero garbage value and the call to xchk_should_terminate detects a
pending fatal signal and checks for a zero error before setting it
to -EAGAIN. Fix the issue by initializing error to zero.
Fixes: b9454fe056bd ("xfs: clean up the inode cluster checking in the inobt scrub")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Add a mode where XFS never overwrites existing blocks in place. This
is to aid debugging our COW code, and also put infatructure in place
for things like possible future support for zoned block devices, which
can't support overwrites.
This mode is enabled globally by doing a:
echo 1 > /sys/fs/xfs/debug/always_cow
Note that the parameter is global to allow running all tests in xfstests
easily in this mode, which would not easily be possible with a per-fs
sysfs file.
In always_cow mode persistent preallocations are disabled, and fallocate
will fail when called with a 0 mode (with our without
FALLOC_FL_KEEP_SIZE), and not create unwritten extent for zeroed space
when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE.
There are a few interesting xfstests failures when run in always_cow
mode:
- generic/392 fails because the bytes used in the file used to test
hole punch recovery are less after the log replay. This is
because the blocks written and then punched out are only freed
with a delay due to the logging mechanism.
- xfs/170 will fail as the already fragile file streams mechanism
doesn't seem to interact well with the COW allocator
- xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim
the file system is badly fragmented, but there is not much we
can do to avoid that when always writing out of place
- xfs/205 fails because overwriting a file in always_cow mode
will require new space allocation and the assumption in the
test thus don't work anymore.
- xfs/326 fails to modify the file at all in always_cow mode after
injecting the refcount error, leading to an unexpected md5sum
after the remount, but that again is expected
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
No user of it in the iomap code at the moment, but we should not
actively report wrong information if we can trivially get it right.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
If we have racing buffered and direct I/O COW fork extents under
writeback can have been moved to the data fork by the time we call
xfs_reflink_convert_cow from xfs_submit_ioend. This would be mostly
harmless as the block numbers don't change by this move, except for
the fact that xfs_bmapi_write will crash or trigger asserts when
not finding existing extents, even despite trying to paper over this
with the XFS_BMAPI_CONVERT_ONLY flag.
Instead of special casing non-transaction conversions in the already
way too complicated xfs_bmapi_write just add a new helper for the much
simpler non-transactional COW fork case, which simplify ignores not
found extents.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Besides simplifying the code a bit this allows to actually implement
the behavior of using COW preallocation for non-COW data mentioned
in the current comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
This only matters if we want to write data through the COW fork that is
not actually an overwrite of existing data. Reasons for that are
speculative COW fork allocations using the cowextsize, or a mode where
we always write through the COW fork. Currently both can't actually
happen, but I plan to enable them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
While using delalloc for extsize hints is generally a good idea, the
current code that does so only for COW doesn't help us much and creates
a lot of special cases. Switch it to use real allocations like we
do for direct I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
We speculatively allocate extents in the COW fork to reduce
fragmentation. But when we write data into such COW fork blocks that
do now shadow an allocation in the data fork SEEK_DATA will not
correctly report it, as it only looks at the data fork extents.
The only reason why that hasn't been an issue so far is because
we even use these speculative COW fork preallocations over holes in
the data fork at all for buffered writes, and blocks in the COW
fork that are written by direct writes are moved into the data
fork immediately at I/O completion time.
Add a new set of iomap_ops for SEEK_HOLE/SEEK_DATA which looks into
both the COW and data fork, and reports all COW extents as unwritten
to the iomap layer. While this isn't strictly true for COW fork
extents that were already converted to real extents, the practical
semantics that you can't read data from them until they are moved
into the data fork are very similar, and this will force the iomap
layer into probing the extents for actually present data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Move checking for invalid zero blocks and setting of various iomap flags
into this helper. Also make it deal with "raw" delalloc extents to
avoid clutter in the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Using STP_POLICY_ID_SET ioctl command with dummy_stm device, or any STM
device that supplies zero mmio channel size, will trigger a division by
zero bug in the kernel.
Prevent this by disallowing channel widths other than 1 for such devices.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices")
CC: stable@vger.kernel.org # v4.4+
|
|
There is a bug in the channel allocation logic that leads to an endless
loop when looking for a contiguous range of channels in a range with a
mixture of free and occupied channels. For example, opening three
consequtive channels, closing the first two and requesting 4 channels in
a row will trigger this soft lockup. The bug is that the search loop
forgets to skip over the range once it detects that one channel in that
range is occupied.
Restore the original intent to the logic by fixing the omission.
Signed-off-by: Zhi Jin <zhi.jin@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices")
CC: stable@vger.kernel.org # v4.4+
|
|
When an output port driver is removed, also remove references to it from
any masters. Failing to do this causes a NULL ptr dereference when
configuring another output port:
> BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
> RIP: 0010:master_attr_store+0x9d/0x160 [intel_th_gth]
> Call Trace:
> dev_attr_store+0x1b/0x30
> sysfs_kf_write+0x3c/0x50
> kernfs_fop_write+0x125/0x1a0
> __vfs_write+0x3a/0x190
> ? __vfs_write+0x5/0x190
> ? _cond_resched+0x1a/0x50
> ? rcu_all_qs+0x5/0xb0
> ? __vfs_write+0x5/0x190
> vfs_write+0xb8/0x1b0
> ksys_write+0x55/0xc0
> __x64_sys_write+0x1a/0x20
> do_syscall_64+0x5a/0x140
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Fixes: b27a6a3f97b9 ("intel_th: Add Global Trace Hub driver")
CC: stable@vger.kernel.org # v4.4+
Reported-by: Ammy Yi <ammy.yi@intel.com>
|
|
Deprecate cpsw-phy-sel driver as it's been replaced with new
TI phy-gmii-sel PHY driver.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
This reverts commit e2ce3674883ecba2605370404208c9d4a07ae1c3.
It turns out that the sock destructor xsk_destruct was needed after
all. The cleanup simplification broke the skb transmit cleanup path,
due to that the umem was prematurely destroyed.
The umem cannot be destroyed until all outstanding skbs are freed,
which means that we cannot remove the umem until the sk_destruct has
been called.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The DCSS range is currently printed with %p, which results in hashed values
instead of the actual addresses.
Use %px instead, the DCSS ranges do not reveal any kernel symbol addresses.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|