summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-25block: move the part_stat* helpers from genhd.h to a new headerChristoph Hellwig
These macros are just used by a few files. Move them out of genhd.h, which is included everywhere into a new standalone header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: move block layer internals out of include/linux/genhd.hChristoph Hellwig
None of this needs to be exposed to drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: move guard_bio_eod to bio.cChristoph Hellwig
This is bio layer functionality and not related to buffer heads. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: unexport get_gendiskChristoph Hellwig
get_gendisk is not used by any modular code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: unexport disk_map_sector_rcuChristoph Hellwig
disk_map_sector_rcu is not used by any modular code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: unexport disk_get_partChristoph Hellwig
disk_get_part is not used by any modular code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: mark part_in_flight and part_in_flight_rw staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: mark block_depr staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: factor out requeue handling from dispatch codeJohannes Thumshirn
Factor out the requeue handling from the dispatch code, this will make subsequent addition of different requeueing schemes easier. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25btrfs: fix missing semaphore unlock in btrfs_sync_fileRobbie Ko
Ordered ops are started twice in sync file, once outside of inode mutex and once inside, taking the dio semaphore. There was one error path missing the semaphore unlock. Fixes: aab15e8ec2576 ("Btrfs: fix rare chances for data loss when doing a fast fsync") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [ add changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-25btrfs: use nofs allocations for running delayed itemsJosef Bacik
Zygo reported the following lockdep splat while testing the balance patches ====================================================== WARNING: possible circular locking dependency detected 5.6.0-c6f0579d496a+ #53 Not tainted ------------------------------------------------------ kswapd0/1133 is trying to acquire lock: ffff888092f622c0 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node+0x7c/0x5b0 but task is already holding lock: ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}: fs_reclaim_acquire.part.91+0x29/0x30 fs_reclaim_acquire+0x19/0x20 kmem_cache_alloc_trace+0x32/0x740 add_block_entry+0x45/0x260 btrfs_ref_tree_mod+0x6e2/0x8b0 btrfs_alloc_tree_block+0x789/0x880 alloc_tree_block_no_bg_flush+0xc6/0xf0 __btrfs_cow_block+0x270/0x940 btrfs_cow_block+0x1ba/0x3a0 btrfs_search_slot+0x999/0x1030 btrfs_insert_empty_items+0x81/0xe0 btrfs_insert_delayed_items+0x128/0x7d0 __btrfs_run_delayed_items+0xf4/0x2a0 btrfs_run_delayed_items+0x13/0x20 btrfs_commit_transaction+0x5cc/0x1390 insert_balance_item.isra.39+0x6b2/0x6e0 btrfs_balance+0x72d/0x18d0 btrfs_ioctl_balance+0x3de/0x4c0 btrfs_ioctl+0x30ab/0x44a0 ksys_ioctl+0xa1/0xe0 __x64_sys_ioctl+0x43/0x50 do_syscall_64+0x77/0x2c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&delayed_node->mutex){+.+.}: __lock_acquire+0x197e/0x2550 lock_acquire+0x103/0x220 __mutex_lock+0x13d/0xce0 mutex_lock_nested+0x1b/0x20 __btrfs_release_delayed_node+0x7c/0x5b0 btrfs_remove_delayed_node+0x49/0x50 btrfs_evict_inode+0x6fc/0x900 evict+0x19a/0x2c0 dispose_list+0xa0/0xe0 prune_icache_sb+0xbd/0xf0 super_cache_scan+0x1b5/0x250 do_shrink_slab+0x1f6/0x530 shrink_slab+0x32e/0x410 shrink_node+0x2a5/0xba0 balance_pgdat+0x4bd/0x8a0 kswapd+0x35a/0x800 kthread+0x1e9/0x210 ret_from_fork+0x3a/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(&delayed_node->mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/1133: #0: ffffffff8fc5f860 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffff8fc380d8 (shrinker_rwsem){++++}, at: shrink_slab+0x1e8/0x410 #2: ffff8881e0e6c0e8 (&type->s_umount_key#42){++++}, at: trylock_super+0x1b/0x70 stack backtrace: CPU: 2 PID: 1133 Comm: kswapd0 Not tainted 5.6.0-c6f0579d496a+ #53 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0xc1/0x11a print_circular_bug.isra.38.cold.57+0x145/0x14a check_noncircular+0x2a9/0x2f0 ? print_circular_bug.isra.38+0x130/0x130 ? stack_trace_consume_entry+0x90/0x90 ? save_trace+0x3cc/0x420 __lock_acquire+0x197e/0x2550 ? btrfs_inode_clear_file_extent_range+0x9b/0xb0 ? register_lock_class+0x960/0x960 lock_acquire+0x103/0x220 ? __btrfs_release_delayed_node+0x7c/0x5b0 __mutex_lock+0x13d/0xce0 ? __btrfs_release_delayed_node+0x7c/0x5b0 ? __asan_loadN+0xf/0x20 ? pvclock_clocksource_read+0xeb/0x190 ? __btrfs_release_delayed_node+0x7c/0x5b0 ? mutex_lock_io_nested+0xc20/0xc20 ? __kasan_check_read+0x11/0x20 ? check_chain_key+0x1e6/0x2e0 mutex_lock_nested+0x1b/0x20 ? mutex_lock_nested+0x1b/0x20 __btrfs_release_delayed_node+0x7c/0x5b0 btrfs_remove_delayed_node+0x49/0x50 btrfs_evict_inode+0x6fc/0x900 ? btrfs_setattr+0x840/0x840 ? do_raw_spin_unlock+0xa8/0x140 evict+0x19a/0x2c0 dispose_list+0xa0/0xe0 prune_icache_sb+0xbd/0xf0 ? invalidate_inodes+0x310/0x310 super_cache_scan+0x1b5/0x250 do_shrink_slab+0x1f6/0x530 shrink_slab+0x32e/0x410 ? do_shrink_slab+0x530/0x530 ? do_shrink_slab+0x530/0x530 ? __kasan_check_read+0x11/0x20 ? mem_cgroup_protected+0x13d/0x260 shrink_node+0x2a5/0xba0 balance_pgdat+0x4bd/0x8a0 ? mem_cgroup_shrink_node+0x490/0x490 ? _raw_spin_unlock_irq+0x27/0x40 ? finish_task_switch+0xce/0x390 ? rcu_read_lock_bh_held+0xb0/0xb0 kswapd+0x35a/0x800 ? _raw_spin_unlock_irqrestore+0x4c/0x60 ? balance_pgdat+0x8a0/0x8a0 ? finish_wait+0x110/0x110 ? __kasan_check_read+0x11/0x20 ? __kthread_parkme+0xc6/0xe0 ? balance_pgdat+0x8a0/0x8a0 kthread+0x1e9/0x210 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 This is because we hold that delayed node's mutex while doing tree operations. Fix this by just wrapping the searches in nofs. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-25MIPS: Exclude more dsemul code when CONFIG_MIPS_FP_SUPPORT=nYousong Zhou
This furthers what commit 42b10815d559 ("MIPS: Don't compile math-emu when CONFIG_MIPS_FP_SUPPORT=n") has done Signed-off-by: Yousong Zhou <yszhou4tech@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25MIPS/tlbex: Fix LDDIR usage in setup_pw() for Loongson-3Huacai Chen
LDDIR/LDPTE is Loongson-3's acceleration for Page Table Walking. If BD (Base Directory, the 4th page directory) is not enabled, then GDOffset is biased by BadVAddr[63:62]. So, if GDOffset (aka. BadVAddr[47:36] for Loongson-3) is big enough, "0b11(BadVAddr[63:62])|BadVAddr[47:36]|...." can far beyond pg_swapper_dir. This means the pg_swapper_dir may NOT be accessed by LDDIR correctly, so fix it by set PWDirExt in CP0_PWCtl. Cc: <stable@vger.kernel.org> Signed-off-by: Pei Huang <huangpei@loongson.cn> Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25MIPS: do not compile generic functions for CONFIG_CAVIUM_OCTEON_SOCMasahiro Yamada
MIPS provides multiple definitions for the following functions: fw_init_cmdline __delay __udelay __ndelay memmove __rmemcpy memcpy __copy_user The generic ones are defined in lib-y objects, which are overridden by the Octeon ones when CONFIG_CAVIUM_OCTEON_SOC is enabled. The use of EXPORT_SYMBOL in static libraries potentially causes a problem for the llvm linker [1]. So, I want to forcibly link lib-y objects to vmlinux when CONFIG_MODULES=y. As a groundwork, we must fix multiple definitions that have previously been hidden by lib-y. If you look at lib/string.c, arch can define __HAVE_ARCH_* to opt out the generic implementation. Similarly, this commit adds CONFIG_HAVE_PLAT_* to allow a platform to opt out the MIPS generic code. [1]: https://github.com/ClangBuiltLinux/linux/issues/515 Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25MAINTAINERS: Update Loongson64 entryJiaxun Yang
To include newly added irqchip drivers. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25MIPS: Loongson64: Load built-in dtbsJiaxun Yang
Load proper dtb according to firmware passed parameters and CPU PRID. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Co-developed-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25MIPS: Loongson64: Add generic dtsJiaxun Yang
Add generic device dts for Loongson-3 devices. They are currently almost identical but will be different later. Some PCH devices like PCI Host Bridge is still enabled by platform code for now. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Co-developed-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25dt-bindings: mips: Add loongson boardsJiaxun Yang
Prepare for later dts. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Rob Herring <robh@kernel.org> Co-developed-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25MIPS: Loongson64: Drop legacy IRQ codeJiaxun Yang
We've made generic irqchip drivers for Loongson-3 platform, it's time to say goodbye to these legacy code. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Co-developed-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-03-25Merged 'Infrastructure to allow fixing exec deadlocks' from Bernd EdlingerEric W. Biederman
This is an infrastructure change that makes way for fixing this issue. Each patch was already posted previously so this is just a cleanup of the original mailing list thread(s) which got out of control by now. Everything started here: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ I added reviewed-by tags from the mailing list threads, except when withdrawn. It took a lot longer than expected to collect everything from the mailinglist threads, since several commit messages have been infected with typos, and they got fixed without a new patch version. - Correct the point of no return. - Add two new mutexes to replace cred_guard_mutex. - Fix each use of cred_guard_mutex. - Update documentation. - Add a test case. -- EWB Removed the last 2 patches they need more work Bernd Edlinger (9): exec: Fix a deadlock in strace selftests/ptrace: add test cases for dead-locks mm: docs: Fix a comment in process_vm_rw_core kernel: doc: remove outdated comment cred.c kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve proc: Use new infrastructure to fix deadlocks in execve proc: io_accounting: Use new infrastructure to fix deadlocks in execve perf: Use new infrastructure to fix deadlocks in execve pidfd: Use new infrastructure to fix deadlocks in execve Eric W. Biederman (5): exec: Only compute current once in flush_old_exec exec: Factor unshare_sighand out of de_thread and call it separately exec: Move cleanup of posix timers on exec out of de_thread exec: Move exec_mmap right after de_thread in flush_old_exec exec: Add exec_update_mutex to replace cred_guard_mutex fs/exec.c | 78 +++++++++++++++++++--------- fs/proc/base.c | 10 ++-- include/linux/binfmts.h | 8 ++- include/linux/sched/signal.h | 9 +++- init/init_task.c | 1 + kernel/cred.c | 2 - kernel/events/core.c | 12 ++--- kernel/fork.c | 5 +- kernel/kcmp.c | 8 +-- kernel/pid.c | 4 +- mm/process_vm_access.c | 2 +- tools/testing/selftests/ptrace/Makefile | 4 +- tools/testing/selftests/ptrace/vmaccess.c | 86 +++++++++++++++++++++++++++++++ 13 files changed, 179 insertions(+), 50 deletions(-) Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-03-25pidfd: Use new infrastructure to fix deadlocks in execveBernd Edlinger
This changes __pidfd_fget to use the new exec_update_mutex instead of cred_guard_mutex. This should be safe, as the credentials do not change before exec_update_mutex is locked. Therefore whatever file access is possible with holding the cred_guard_mutex here is also possbile with the exec_update_mutex. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25perf: Use new infrastructure to fix deadlocks in execveBernd Edlinger
This changes perf_event_set_clock to use the new exec_update_mutex instead of cred_guard_mutex. This should be safe, as the credentials are only used for reading. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25proc: io_accounting: Use new infrastructure to fix deadlocks in execveBernd Edlinger
This changes do_io_accounting to use the new exec_update_mutex instead of cred_guard_mutex. This fixes possible deadlocks when the trace is accessing /proc/$pid/io for instance. This should be safe, as the credentials are only used for reading. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25proc: Use new infrastructure to fix deadlocks in execveBernd Edlinger
This changes lock_trace to use the new exec_update_mutex instead of cred_guard_mutex. This fixes possible deadlocks when the trace is accessing /proc/$pid/stack for instance. This should be safe, as the credentials are only used for reading, and task->mm is updated on execve under the new exec_update_mutex. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25kernel/kcmp.c: Use new infrastructure to fix deadlocks in execveBernd Edlinger
This changes kcmp_epoll_target to use the new exec_update_mutex instead of cred_guard_mutex. This should be safe, as the credentials are only used for reading, and furthermore ->mm and ->sighand are updated on execve, but only under the new exec_update_mutex. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25kernel: doc: remove outdated comment cred.cBernd Edlinger
This removes an outdated comment in prepare_kernel_cred. There is no "cred_replace_mutex" any more, so the comment must go away. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25mm: docs: Fix a comment in process_vm_rw_coreBernd Edlinger
This removes a duplicate "a" in the comment in process_vm_rw_core. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25selftests/ptrace: add test cases for dead-locksBernd Edlinger
This adds test cases for ptrace deadlocks. Additionally fixes a compile problem in get_syscall_info.c, observed with gcc-4.8.4: get_syscall_info.c: In function 'get_syscall_info': get_syscall_info.c:93:3: error: 'for' loop initial declarations are only allowed in C99 mode for (unsigned int i = 0; i < ARRAY_SIZE(args); ++i) { ^ get_syscall_info.c:93:3: note: use option -std=c99 or -std=gnu99 to compile your code Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25exec: Fix a deadlock in straceBernd Edlinger
This fixes a deadlock in the tracer when tracing a multi-threaded application that calls execve while more than one thread are running. I observed that when running strace on the gcc test suite, it always blocks after a while, when expect calls execve, because other threads have to be terminated. They send ptrace events, but the strace is no longer able to respond, since it is blocked in vm_access. The deadlock is always happening when strace needs to access the tracees process mmap, while another thread in the tracee starts to execve a child process, but that cannot continue until the PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received: strace D 0 30614 30584 0x00000000 Call Trace: __schedule+0x3ce/0x6e0 schedule+0x5c/0xd0 schedule_preempt_disabled+0x15/0x20 __mutex_lock.isra.13+0x1ec/0x520 __mutex_lock_killable_slowpath+0x13/0x20 mutex_lock_killable+0x28/0x30 mm_access+0x27/0xa0 process_vm_rw_core.isra.3+0xff/0x550 process_vm_rw+0xdd/0xf0 __x64_sys_process_vm_readv+0x31/0x40 do_syscall_64+0x64/0x220 entry_SYSCALL_64_after_hwframe+0x44/0xa9 expect D 0 31933 30876 0x80004003 Call Trace: __schedule+0x3ce/0x6e0 schedule+0x5c/0xd0 flush_old_exec+0xc4/0x770 load_elf_binary+0x35a/0x16c0 search_binary_handler+0x97/0x1d0 __do_execve_file.isra.40+0x5d4/0x8a0 __x64_sys_execve+0x49/0x60 do_syscall_64+0x64/0x220 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This changes mm_access to use the new exec_update_mutex instead of cred_guard_mutex. This patch is based on the following patch by Eric W. Biederman: "[PATCH 0/5] Infrastructure to allow fixing exec deadlocks" Link: https://lore.kernel.org/lkml/87v9ne5y4y.fsf_-_@x220.int.ebiederm.org/ Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25exec: Add exec_update_mutex to replace cred_guard_mutexEric W. Biederman
The cred_guard_mutex is problematic as it is held over possibly indefinite waits for userspace. The possible indefinite waits for userspace that I have identified are: The cred_guard_mutex is held in PTRACE_EVENT_EXIT waiting for the tracer. The cred_guard_mutex is held over "put_user(0, tsk->clear_child_tid)" in exit_mm(). The cred_guard_mutex is held over "get_user(futex_offset, ...") in exit_robust_list. The cred_guard_mutex held over copy_strings. The functions get_user and put_user can trigger a page fault which can potentially wait indefinitely in the case of userfaultfd or if userspace implements part of the page fault path. In any of those cases the userspace process that the kernel is waiting for might make a different system call that winds up taking the cred_guard_mutex and result in deadlock. Holding a mutex over any of those possibly indefinite waits for userspace does not appear necessary. Add exec_update_mutex that will just cover updating the process during exec where the permissions and the objects pointed to by the task struct may be out of sync. The plan is to switch the users of cred_guard_mutex to exec_update_mutex one by one. This lets us move forward while still being careful and not introducing any regressions. Link: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ Link: https://lore.kernel.org/lkml/AM6PR03MB5170B06F3A2B75EFB98D071AE4E60@AM6PR03MB5170.eurprd03.prod.outlook.com/ Link: https://lore.kernel.org/linux-fsdevel/20161102181806.GB1112@redhat.com/ Link: https://lore.kernel.org/lkml/20160923095031.GA14923@redhat.com/ Link: https://lore.kernel.org/lkml/20170213141452.GA30203@redhat.com/ Ref: 45c1a159b85b ("Add PTRACE_O_TRACEVFORKDONE and PTRACE_O_TRACEEXIT facilities.") Ref: 456f17cd1a28 ("[PATCH] user-vm-unlock-2.5.31-A2") Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25exec: Move exec_mmap right after de_thread in flush_old_execEric W. Biederman
I have read through the code in exec_mmap and I do not see anything that depends on sighand or the sighand lock, or on signals in anyway so this should be safe. This rearrangement of code has two significant benefits. It makes the determination of passing the point of no return by testing bprm->mm accurate. All failures prior to that point in flush_old_exec are either truly recoverable or they are fatal. Further this consolidates all of the possible indefinite waits for userspace together at the top of flush_old_exec. The possible wait for a ptracer on PTRACE_EVENT_EXIT, the possible wait for a page fault to be resolved in clear_child_tid, and the possible wait for a page fault in exit_robust_list. This consolidation allows the creation of a mutex to replace cred_guard_mutex that is not held over possible indefinite userspace waits. Which will allow removing deadlock scenarios from the kernel. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25exec: Move cleanup of posix timers on exec out of de_threadEric W. Biederman
These functions have very little to do with de_thread move them out of de_thread an into flush_old_exec proper so it can be more clearly seen what flush_old_exec is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25exec: Factor unshare_sighand out of de_thread and call it separatelyEric W. Biederman
This makes the code clearer and makes it easier to implement a mutex that is not taken over any locations that may block indefinitely waiting for userspace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25exec: Only compute current once in flush_old_execEric W. Biederman
Make it clear that current only needs to be computed once in flush_old_exec. This may have some efficiency improvements and it makes the code easier to change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-25Bluetooth: don't assume key size is 16 when the command failsAlain Michaud
With this change, the encryption key size is not assumed to be 16 if the read_encryption_key_size command fails for any reason. This ensures that if the controller fails the command for any reason that the encryption key size isn't implicitely set to 16 and instead take a more concervative posture to assume it is 0. Signed-off-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-03-25block/diskstats: replace time_in_queue with sum of request timesKonstantin Khlebnikov
Column "time_in_queue" in diskstats is supposed to show total waiting time of all requests. I.e. value should be equal to the sum of times from other columns. But this is not true, because column "time_in_queue" is counted separately in jiffies rather than in nanoseconds as other times. This patch removes redundant counter for "time_in_queue" and shows total time of read, write, discard and flush requests. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block/diskstats: accumulate all per-cpu counters in one passKonstantin Khlebnikov
Reading /proc/diskstats iterates over all cpus for summing each field. It's faster to sum all fields in one pass. Hammering /proc/diskstats with fio shows 2x performance improvement: fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \ --size=1k --bs=1k --fallocate=none --create_on_open=1 \ --time_based=1 --runtime=10 --invalidate=0 --group_report JOBS=1 JOBS=10 Before: 7k iops 64k iops After: 18k iops 120k iops Also this way code is more compact: add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346) Function old new delta part_stat_read_all - 194 +194 diskstats_show 1344 631 -713 part_stat_show 1219 392 -827 Total: Before=14966947, After=14965601, chg -0.01% Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block/diskstats: more accurate approximation of io_ticks for slow disksKonstantin Khlebnikov
Currently io_ticks is approximated by adding one at each start and end of requests if jiffies counter has changed. This works perfectly for requests shorter than a jiffy or if one of requests starts/ends at each jiffy. If disk executes just one request at a time and they are longer than two jiffies then only first and last jiffies will be accounted. Fix is simple: at the end of request add up into io_ticks jiffies passed since last update rather than just one jiffy. Example: common HDD executes random read 4k requests around 12ms. fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 & iostat -x 10 sdb Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch: Before: Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43 After: Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99 Now io_ticks does not loose time between start and end of requests, but for queue-depth > 1 some I/O time between adjacent starts might be lost. For load estimation "%util" is not as useful as average queue length, but it clearly shows how often disk queue is completely empty. Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25Merge branch 'x86/cpu' into perf/core, to resolve conflictIngo Molnar
Conflicts: arch/x86/events/intel/uncore.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-25soc: samsung: chipid: Fix return value on non-Exynos platformsMarek Szyprowski
Correct the probe return value to -ENODEV on non-Exynos platforms. Link: https://lore.kernel.org/r/20200316175652.5604-4-krzk@kernel.org Fixes: 02fb29882d5c ("soc: samsung: chipid: Drop "syscon" compatible requirement") Cc: <stable@vger.kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25Merge tag 'tee-amdtee-fix2-for-5.6' of ↵Arnd Bergmann
https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes tee: amdtee: out of bounds read in find_session() * tag 'tee-amdtee-fix2-for-5.6' of https://git.linaro.org/people/jens.wiklander/linux-tee: tee: amdtee: out of bounds read in find_session() Link: https://lore.kernel.org/r/20200320063446.GA9892@jade Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25Merge tag 'oxnas-arm-soc-dt-fixes-for-5.6' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/narmstrong/linux-oxnas into arm/fixes - interrupt controller mask init fix to avoid spurious irq after soft reset * tag 'oxnas-arm-soc-dt-fixes-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/narmstrong/linux-oxnas: ARM: dts: oxnas: Fix clear-mask property Link: https://lore.kernel.org/r/1cc83d6b-78de-9160-59bf-77cfe726a365@baylibre.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25Merge tag 'arm-soc/for-5.6/devicetree-fixes-part2' of ↵Arnd Bergmann
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoCs Device Tree fixes for 5.6, please pull the following: - Nick fixes the missing pinctrl-names property for the Raspberry Pi Zero Wireless DTS - Nicolas fixes the VC4 firmware node dma-range property which does not have the limitations of the soc's bus node * tag 'arm-soc/for-5.6/devicetree-fixes-part2' of https://github.com/Broadcom/stblinux: ARM: dts: bcm283x: Fix vc4's firmware bus DMA limitations ARM: bcm2835-rpi-zero-w: Add missing pinctrl name Link: https://lore.kernel.org/r/20200323025246.22713-1-f.fainelli@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25arm64: dts: Fix leftover entry-methods for PSCILinus Walleij
These two device trees were either missed or added after the commit correcting the "entry-method" from "arm,psci" to just "psci" as per the binding. Link: https://lore.kernel.org/r/20200322115846.16265-1-linus.walleij@linaro.org Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Fabio Estevam <festevam@gmail.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Chunyan Zhang <chunyan.zhang@unisoc.com> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25Merge tag 'omap-for-v5.6/fixes-rc6-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Few more fixes for omaps Just few dts fixes: - A fix droid4 touchscreen stopping working with lost gpio interrupts - Also limit omap5 dma range similar to what we've recently done for dra7 * tag 'omap-for-v5.6/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: omap5: Add bus_dma_limit for L3 bus ARM: dts: omap4-droid4: Fix lost touchscreen interrupts ARM: dts: dra7: Add bus_dma_limit for L3 bus ARM: dts: N900: fix onenand timings ARM: dts: Fix dm814x Ethernet by changing to use rgmii-id mode Link: https://lore.kernel.org/r/pull-1584575254-461940@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25ARM: dts: exynos: Fix regulator node aliasing on Midas-based boardsMarek Szyprowski
Commit d4ec0cb05064 ("ARM: dts: exynos: Add support for the touch-sensitive buttons on Midas family") added a new fixed regulator ("voltage-regulator-6") to base "midas" .dtsi, but it didn't update the clients of that .dtsi, which define their own fixed regulators starting from the "voltage-regulator-6". This results in aliasing of the regulator dt nodes and breaks operation of OLED panel due to lack of power supply. Fix this by increasing the numbers in the fixed regulator names for those boards. Link: https://lore.kernel.org/r/20200316173710.3144-1-krzk@kernel.org Fixes: d4ec0cb05064 ("ARM: dts: exynos: Add support for the touch-sensitive buttons on Midas family") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Denis 'GNUtoo' Carikli <GNUtoo@cyberdimension.org> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25Merge tag 'imx-fixes-5.6-2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.6, round 2: - Fix minimum voltage setting of vdd_arm and vdd_soc on i.MX6 phycore-som board. * tag 'imx-fixes-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6: phycore-som: fix arm and soc minimum voltage Link: https://lore.kernel.org/r/20200316032555.GD17221@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-03-25RDMA/mlx5: Block delay drop to unprivileged usersMaor Gottlieb
It has been discovered that this feature can globally block the RX port, so it should be allowed for highly privileged users only. Fixes: 03404e8ae652("IB/mlx5: Add support to dropless RQ") Link: https://lore.kernel.org/r/20200322124906.1173790-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-25staging: hp100: Add spaces in if statement.Soumyajit Deb
Add space between if and open parenthesis to improve code readability and to adhere to the standard linux kernel coding style. Also, shift the next line to the right by a single space as it is the continuation of the above if statement. Signed-off-by: Soumyajit Deb <debsoumyajit100@gmail.com> Link: https://lore.kernel.org/r/20200325122752.38600-1-debsoumyajit100@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25staging: hp100: Add space between while keyword and open parenthesisSoumyajit Deb
Add space between while keyword and open parenthesis "(" in the do while statement to improve code readability and to adhere to the Linux Kernel coding style. Reported by checkpatch.pl Signed-off-by: Soumyajit Deb <debsoumyajit100@gmail.com> Link: https://lore.kernel.org/r/20200325115956.37126-1-debsoumyajit100@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>