summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-31devpts: fix error handling in devpts_mntget()Eric Biggers
If devpts_ptmx_path() returns an error code, then devpts_mntget() dereferences an ERR_PTR(): BUG: unable to handle kernel paging request at fffffffffffffff5 IP: devpts_mntget+0x13f/0x280 fs/devpts/inode.c:173 Fix it by returning early in the error paths. Reproducer: #define _GNU_SOURCE #include <fcntl.h> #include <sched.h> #include <sys/ioctl.h> #define TIOCGPTPEER _IO('T', 0x41) int main() { for (;;) { int fd = open("/dev/ptmx", 0); unshare(CLONE_NEWNS); ioctl(fd, TIOCGPTPEER, 0); } } Fixes: 311fc65c9fb9 ("pty: Repair TIOCGPTPEER") Reported-by: syzbot <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # v4.13+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31iversion: make inode_cmp_iversion{+raw} return bool instead of s64Jeff Layton
As Linus points out: The inode_cmp_iversion{+raw}() functions are pure and utter crap. Why? You say that they return 0/negative/positive, but they do so in a completely broken manner. They return that ternary value as the sequence number difference in a 's64', which means that if you actually care about that ternary value, and do the *sane* thing that the kernel-doc of the function implies is the right thing, you would do int cmp = inode_cmp_iversion(inode, old); if (cmp < 0 ... and as a result you get code that looks sane, but that doesn't actually *WORK* right. Since none of the callers actually care about the ternary value here, convert the inode_cmp_iversion{+raw} functions to just return a boolean value (false for matching, true for non-matching). This matches the existing use of these functions just fine, and makes it simple to convert them to return a ternary value in the future if we grow callers that need it. With this change we can also reimplement inode_cmp_iversion in a simpler way using inode_peek_iversion. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31netfilter: on sockopt() acquire sock lock only in the required scopePaolo Abeni
Syzbot reported several deadlocks in the netfilter area caused by rtnl lock and socket lock being acquired with a different order on different code paths, leading to backtraces like the following one: ====================================================== WARNING: possible circular locking dependency detected 4.15.0-rc9+ #212 Not tainted ------------------------------------------------------ syzkaller041579/3682 is trying to acquire lock: (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock include/net/sock.h:1463 [inline] (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 but task is already holding lock: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607 tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106 xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845 check_target net/ipv6/netfilter/ip6_tables.c:538 [inline] find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:580 translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749 do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline] do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 -> #0 (sk_lock-AF_INET6){+.+.}: lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914 lock_sock_nested+0xc2/0x110 net/core/sock.c:2780 lock_sock include/net/sock.h:1463 [inline] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(sk_lock-AF_INET6); lock(rtnl_mutex); lock(sk_lock-AF_INET6); *** DEADLOCK *** 1 lock held by syzkaller041579/3682: #0: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 The problem, as Florian noted, is that nf_setsockopt() is always called with the socket held, even if the lock itself is required only for very tight scopes and only for some operation. This patch addresses the issues moving the lock_sock() call only where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt() does not need anymore to acquire both locks. Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers") Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31Merge branch 'for-4.16/remove-immediate' into for-linusJiri Kosina
Pull 'immediate' feature removal from Miroslav Benes.
2018-01-31tls: Add support for encryption using async offload acceleratorVakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading GCM operation. If they are enabled, crypto_aead_encrypt() return error code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a completion till the time the response for crypto offload request is received. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31ip6mr: fix stale iteratorNikolay Aleksandrov
When we dump the ip6mr mfc entries via proc, we initialize an iterator with the table to dump but we don't clear the cache pointer which might be initialized from a prior read on the same descriptor that ended. This can result in lock imbalance (an unnecessary unlock) leading to other crashes and hangs. Clear the cache pointer like ipmr does to fix the issue. Thanks for the reliable reproducer. Here's syzbot's trace: WARNING: bad unlock balance detected! 4.15.0-rc3+ #128 Not tainted syzkaller971460/3195 is trying to release lock (mrt_lock) at: [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller971460/3195: #0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0 fs/seq_file.c:165 stack backtrace: CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561 __lock_release kernel/locking/lockdep.c:3775 [inline] lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023 __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255 ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 traverse+0x3bc/0xa00 fs/seq_file.c:135 seq_read+0x96a/0x13d0 fs/seq_file.c:189 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 BUG: sleeping function called from invalid context at lib/usercopy.c:25 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460 INFO: lockdep is turned off. CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060 __might_sleep+0x95/0x190 kernel/sched/core.c:6013 __might_fault+0xab/0x1d0 mm/memory.c:4525 _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 copy_to_user include/linux/uaccess.h:155 [inline] seq_read+0xcb4/0x13d0 fs/seq_file.c:279 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0 lib/usercopy.c:26 Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31net/sched: kconfig: Remove blank help textsUlf Magnusson
Blank help texts are probably either a typo, a Kconfig misunderstanding, or some kind of half-committing to adding a help text (in which case a TODO comment would be clearer, if the help text really can't be added right away). Best to remove them, IMO. Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31openvswitch: meter: Use 64-bit arithmetic instead of 32-bitGustavo A. R. Silva
Add suffix LL to constant 1000 in order to give the compiler complete information about the proper arithmetic to use. Notice that this constant is used in a context that expects an expression of type long long int (64 bits, signed). The expression (band->burst_size + band->rate) * 1000 is currently being evaluated using 32-bit arithmetic. Addresses-Coverity-ID: 1461563 ("Unintentional integer overflow") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31tcp_nv: fix potential integer overflow in tcpnv_ackedGustavo A. R. Silva
Add suffix ULL to constant 80000 in order to avoid a potential integer overflow and give the compiler complete information about the proper arithmetic to use. Notice that this constant is used in a context that expects an expression of type u64. The current cast to u64 effectively applies to the whole expression as an argument of type u64 to be passed to div64_u64, but it does not prevent it from being evaluated using 32-bit arithmetic instead of 64-bit arithmetic. Also, once the expression is properly evaluated using 64-bit arithmentic, there is no need for the parentheses and the external cast to u64. Addresses-Coverity-ID: 1357588 ("Unintentional integer overflow") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31r8169: fix RTL8168EP take too long to complete driver initialization.Chunhao Lin
Driver check the wrong register bit in rtl_ocp_tx_cond() that keep driver waiting until timeout. Fix this by waiting for the right register bit. Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31qmi_wwan: Add support for Quectel EP06Kristian Evensen
The Quectel EP06 is a Cat. 6 LTE modem. It uses the same interface as the EC20/EC25 for QMI, and requires the same "set DTR"-quirk to work. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINKChristian Brauner
- Backwards Compatibility: If userspace wants to determine whether RTM_NEWLINK supports the IFLA_IF_NETNSID property they should first send an RTM_GETLINK request with IFLA_IF_NETNSID on lo. If either EACCESS is returned or the reply does not include IFLA_IF_NETNSID userspace should assume that IFLA_IF_NETNSID is not supported on this kernel. If the reply does contain an IFLA_IF_NETNSID property userspace can send an RTM_NEWLINK with a IFLA_IF_NETNSID property. If they receive EOPNOTSUPP then the kernel does not support the IFLA_IF_NETNSID property with RTM_NEWLINK. Userpace should then fallback to other means. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31Merge branches 'for-4.16/upstream' and 'for-4.15/upstream-fixes' into for-linusJiri Kosina
Pull assorted small fixes queued for merge window.
2018-01-31Merge branch 'for-4.16/i2c-hid' into for-linusJiri Kosina
2018-01-31Merge branch 'for-4.16/wacom' into for-linusJiri Kosina
Pull Wacom device driver updates. These don't have to go on top of the hid_have_special_driver[] revamp, as the whole group is assumed to have a special driver based on VID.
2018-01-31Merge branch 'for-4.16/elo' into for-linusJiri Kosina
Pull hid-elo device detection fix
2018-01-31Merge branches 'for-4.16/hid-quirks-cleanup/asus', ↵Jiri Kosina
'for-4.16/hid-quirks-cleanup/elecom', 'for-4.16/hid-quirks-cleanup/ish', 'for-4.16/hid-quirks-cleanup/multitouch', 'for-4.16/hid-quirks-cleanup/pixart', 'for-4.16/hid-quirks-cleanup/rmi', 'for-4.16/hid-quirks-cleanup/sony' and 'for-4.16/hid-quirks-cleanup/toshiba' into for-linus Pull assorted device driver fixes (ASUS, Elecom, Intel-ISH, Multitouch, PixArt, RMI, Sony and Toshiba) based on top the hid-quirks revamp.
2018-01-31Merge branch 'for-4.16/hid-quirks-cleanup/_base' into for-linusJiri Kosina
This series from Benjamin Tissoires finally removes one of the big PITAs in the hid-core, which is the absolute need of having added all the new device IDs into the horrid hid_have_special_driver[]
2018-01-31netfilter: ipt_CLUSTERIP: fix out-of-bounds accesses in clusterip_tg_check()Dmitry Vyukov
Commit 136e92bbec0a switched local_nodes from an array to a bitmask but did not add proper bounds checks. As the result clusterip_config_init_nodelist() can both over-read ipt_clusterip_tgt_info.local_nodes and over-write clusterip_config.local_nodes. Add bounds checks for both. Fixes: 136e92bbec0a ("[NETFILTER] CLUSTERIP: use a bitmap to store node responsibility data") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31netfilter: x_tables: fix pointer leaks to userspaceDmitry Vyukov
Several netfilter matches and targets put kernel pointers into info objects, but don't set usersize in descriptors. This leads to kernel pointer leaks if a match/target is set and then read back to userspace. Properly set usersize for these matches/targets. Found with manual code inspection. Fixes: ec2318904965 ("xtables: extend matches and targets with .usersize") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31netfilter: ipset: Fix wraparound in hash:*net* typesJozsef Kadlecsik
Fix wraparound bug which could lead to memory exhaustion when adding an x.x.x.x-255.255.255.255 range to any hash:*net* types. Fixes Netfilter's bugzilla id #1212, reported by Thomas Schwark. Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-31drm: Check for lessee in DROP_MASTER ioctlKeith Packard
Don't let a lessee control what the current DRM master is set to; that's the job of the "real" master. Otherwise, the lessee would disable all access to master operations for the owner and all lessees under it. This matches the same check made in the SET_MASTER ioctl. Signed-off-by: Keith Packard <keithp@keithp.com> Fixes: 2ed077e467ee ("drm: Add drm_object lease infrastructure [v5]") Cc: <stable@vger.kernel.org> # v4.15+ Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20180119015159.1606-1-keithp@keithp.com
2018-01-31Merge branch 'topic/xilinx' into for-linusVinod Koul
2018-01-31Merge branch 'topic/virt-dma' into for-linusVinod Koul
2018-01-31Merge branch 'topic/timb' into for-linusVinod Koul
2018-01-31Merge branch 'topic/ti' into for-linusVinod Koul
2018-01-31Merge branch 'topic/tegra' into for-linusVinod Koul
2018-01-31Merge branch 'topic/stm' into for-linusVinod Koul
2018-01-31Merge branch 'topic/sprd' into for-linusVinod Koul
2018-01-31Merge branch 'topic/rcar' into for-linusVinod Koul
2018-01-31Merge branch 'topic/qcom_hidma' into for-linusVinod Koul
2018-01-31Merge branch 'topic/qcom' into for-linusVinod Koul
2018-01-31Merge branch 'topic/mic' into for-linusVinod Koul
2018-01-31Merge branch 'topic/imx' into for-linusVinod Koul
2018-01-31Merge branch 'topic/doc' into for-linusVinod Koul
2018-01-31Merge branch 'topic/device_changes' into for-linusVinod Koul
2018-01-31Merge branch 'topic/cppi' into for-linusVinod Koul
2018-01-31x86/kexec: Make kexec (mostly) work in 5-level paging modeKirill A. Shutemov
Currently kexec() will crash when switching into a 5-level paging enabled kernel. I missed that we need to change relocate_kernel() to set CR4.LA57 flag if the kernel has 5-level paging enabled. I avoided using #ifdef CONFIG_X86_5LEVEL here and inferred if we need to enable 5-level paging from previous CR4 value. This way the code is ready for boot-time switching between paging modes. With this patch applied, in addition to kexec 4-to-4 which always worked, we can kexec 4-to-5 and 5-to-5 - while 5-to-4 will need more work. Reported-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Baoquan He <bhe@redhat.com> Cc: <stable@vger.kernel.org> # v4.14+ Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y") Link: http://lkml.kernel.org/r/20180129110845.26633-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-30blk-mq: introduce BLK_STS_DEV_RESOURCEMing Lei
This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using that magic value. If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because: 1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() 3) if SCHED_RESTART is set concurently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided. One invariant is that queue will be rerun if SCHED_RESTART is set. Suggested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-30Merge tag 'f2fs-for-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've followed up to support some generic features such as cgroup, block reservation, linking fscrypt_ops, delivering write_hints, and some ioctls. And, we could fix some corner cases in terms of power-cut recovery and subtle deadlocks. Enhancements: - bitmap operations to handle NAT blocks - readahead to improve readdir speed - switch to use fscrypt_* - apply write hints for direct IO - add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid - modify b_avail and b_free to consider root reserved blocks - support cgroup writeback - support FIEMAP_FLAG_XATTR for fibmap - add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents - add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks - support inode creation time Bug fixs: - sysfile-based quota operations - memory footprint accounting - allow to write data on partial preallocation case - fix deadlock case on fallocate - fix to handle fill_super errors - fix missing inode updates of fsync'ed file - recover renamed file which was fsycn'ed before - drop inmemory pages in corner error case - keep last_disk_size correctly - recover missing i_inline flags during roll-forward Various clean-up patches were added as well" * tag 'f2fs-for-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (72 commits) f2fs: support inode creation time f2fs: rebuild sit page from sit info in mem f2fs: stop issuing discard if fs is readonly f2fs: clean up duplicated assignment in init_discard_policy f2fs: use GFP_F2FS_ZERO for cleanup f2fs: allow to recover node blocks given updated checkpoint f2fs: recover some i_inline flags f2fs: correct removexattr behavior for null valued extended attribute f2fs: drop page cache after fs shutdown f2fs: stop gc/discard thread after fs shutdown f2fs: hanlde error case in f2fs_ioc_shutdown f2fs: split need_inplace_update f2fs: fix to update last_disk_size correctly f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup f2fs: clean up error path of fill_super f2fs: avoid hungtask when GC encrypted block if io_bits is set f2fs: allow quota to use reserved blocks f2fs: fix to drop all inmem pages correctly f2fs: speed up defragment on sparse file f2fs: support F2FS_IOC_PRECACHE_EXTENTS ...
2018-01-30Merge tag 'nfs-for-4.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable bugfixes: - Fix breakages in the nfsstat utility due to the inclusion of the NFSv4 LOOKUPP operation - Fix a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall() due to nfs_idmap_legacy_upcall() being called without an 'aux' parameter - Fix a refcount leak in the standard O_DIRECT error path - Fix a refcount leak in the pNFS O_DIRECT fallback to MDS path - Fix CPU latency issues with nfs_commit_release_pages() - Fix the LAYOUTUNAVAILABLE error case in the file layout type - NFS: Fix a race between mmap() and O_DIRECT Features: - Support the statx() mask and query flags to enable optimisations when the user is requesting only attributes that are already up to date in the inode cache, or is specifying the AT_STATX_DONT_SYNC flag - Add a module alias for the SCSI pNFS layout type Bugfixes: - Automounting when resolving a NFSv4 referral should preserve the RDMA transport protocol settings - Various other RDMA bugfixes from Chuck - pNFS block layout fixes - Always set NFS_LOCK_LOST when a lock is lost" * tag 'nfs-for-4.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits) NFS: Fix a race between mmap() and O_DIRECT NFS: Remove a redundant call to unmap_mapping_range() pnfs/blocklayout: Ensure disk address in block device map pnfs/blocklayout: pnfs_block_dev_map uses bytes, not sectors lockd: Fix server refcounting SUNRPC: Fix null rpc_clnt dereference in rpc_task_queued tracepoint SUNRPC: Micro-optimize __rpc_execute SUNRPC: task_run_action should display tk_callback sunrpc: Format RPC events consistently for display SUNRPC: Trace xprt_timer events xprtrdma: Correct some documenting comments xprtrdma: Fix "bytes registered" accounting xprtrdma: Instrument allocation/release of rpcrdma_req/rep objects xprtrdma: Add trace points to instrument QP and CQ access upcalls xprtrdma: Add trace points in the client-side backchannel code paths xprtrdma: Add trace points for connect events xprtrdma: Add trace points to instrument MR allocation and recovery xprtrdma: Add trace points to instrument memory invalidation xprtrdma: Add trace points in reply decoder path xprtrdma: Add trace points to instrument memory registration ..
2018-01-30Merge branch 'work.sock_recvmsg' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull kern_recvmsg reduction from Al Viro: "kernel_recvmsg() is a set_fs()-using wrapper for sock_recvmsg(). In all but one case that is not needed - use of ITER_KVEC for ->msg_iter takes care of the data and does not care about set_fs(). The only exception is svc_udp_recvfrom() where we want cmsg to be store into kernel object; everything else can just use sock_recvmsg() and be done with that. A followup converting svc_udp_recvfrom() away from set_fs() (and killing kernel_recvmsg() off) is *NOT* in here - I'd like to hear what netdev folks think of the approach proposed in that followup)" * 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: tipc: switch to sock_recvmsg() smc: switch to sock_recvmsg() ipvs: switch to sock_recvmsg() mISDN: switch to sock_recvmsg() drbd: switch to sock_recvmsg() lustre lnet_sock_read(): switch to sock_recvmsg() cfs2: switch to sock_recvmsg() ncpfs: switch to sock_recvmsg() dlm: switch to sock_recvmsg() svc_recvfrom(): switch to sock_recvmsg()
2018-01-30Merge branch 'work.mqueue' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mqueue/bpf vfs cleanups from Al Viro: "mqueue and bpf go through rather painful and similar contortions to create objects in their dentry trees. Provide a primitive for doing that without abusing ->mknod(), switch bpf and mqueue to it. Another mqueue-related thing that has ended up in that branch is on-demand creation of internal mount (based upon the work of Giuseppe Scrivano)" * 'work.mqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mqueue: switch to on-demand creation of internal mount tidy do_mq_open() up a bit mqueue: clean prepare_open() up do_mq_open(): move all work prior to dentry_open() into a helper mqueue: fold mq_attr_ok() into mqueue_get_inode() move dentry_open() calls up into do_mq_open() mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod() new primitive: vfs_mkobj()
2018-01-30Merge branch 'misc.poll' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull poll annotations from Al Viro: "This introduces a __bitwise type for POLL### bitmap, and propagates the annotations through the tree. Most of that stuff is as simple as 'make ->poll() instances return __poll_t and do the same to local variables used to hold the future return value'. Some of the obvious brainos found in process are fixed (e.g. POLLIN misspelled as POLL_IN). At that point the amount of sparse warnings is low and most of them are for genuine bugs - e.g. ->poll() instance deciding to return -EINVAL instead of a bitmap. I hadn't touched those in this series - it's large enough as it is. Another problem it has caught was eventpoll() ABI mess; select.c and eventpoll.c assumed that corresponding POLL### and EPOLL### were equal. That's true for some, but not all of them - EPOLL### are arch-independent, but POLL### are not. The last commit in this series separates userland POLL### values from the (now arch-independent) kernel-side ones, converting between them in the few places where they are copied to/from userland. AFAICS, this is the least disruptive fix preserving poll(2) ABI and making epoll() work on all architectures. As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and it will trigger only on what would've triggered EPOLLWRBAND on other architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered at all on sparc. With this patch they should work consistently on all architectures" * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) make kernel-side POLL... arch-independent eventpoll: no need to mask the result of epi_item_poll() again eventpoll: constify struct epoll_event pointers debugging printk in sg_poll() uses %x to print POLL... bitmap annotate poll(2) guts 9p: untangle ->poll() mess ->si_band gets POLL... bitmap stored into a user-visible long field ring_buffer_poll_wait() return value used as return value of ->poll() the rest of drivers/*: annotate ->poll() instances media: annotate ->poll() instances fs: annotate ->poll() instances ipc, kernel, mm: annotate ->poll() instances net: annotate ->poll() instances apparmor: annotate ->poll() instances tomoyo: annotate ->poll() instances sound: annotate ->poll() instances acpi: annotate ->poll() instances crypto: annotate ->poll() instances block: annotate ->poll() instances x86: annotate ->poll() instances ...
2018-01-30Merge branch 'for-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting. Documentation updates and trivial changes; however, this pull request does containt he previusly discussed dropping of __must_check from strscpy()" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Documentation: Fix 'file_mapped' -> 'mapped_file' string: drop __must_check from strscpy() and restore strscpy() usages in cgroup cgroup, docs: document the root cgroup behavior of cpu and io controllers cgroup-v2.txt: fix typos cgroup: Update documentation reference Documentation/cgroup-v1: fix outdated programming details cgroup, docs: document cgroup v2 device controller
2018-01-30Merge branch 'for-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu update from Tejun Heo: "One trivial patch to convert the return type from int to bool" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: percpu_counter_initialized can be boolean
2018-01-30Merge branch 'for-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: "Nothing too interesting. Several patches to convert mdelay() to usleep_range(), removal of unused pata_at32, and other low level driver specific changes" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: pata_pdc2027x: Replace mdelay with msleep ata: pata_it821x: Replace mdelay with usleep_range in it821x_firmware_command ata: sata_mv: Replace mdelay with usleep_range in mv_reset_channel ata: remove pata_at32 phy: brcm-sata: remove unused variable phy: brcm-sata: fix semicolon.cocci warnings ata: ahci_brcm: Recover from failures to identify devices phy: brcm-sata: Implement calibrate callback ahci: Add Intel Cannon Lake PCH-H PCI ID ata_piix: constify pci_bits libata:pata_atiixp: Don't use unconnected secondary port on SB600 ata: ahci_brcm: Avoid clobbering SATA_TOP_CTRL_BUS_CTRL ahci: Allow setting a default LPM policy for mobile chipsets ahci: Add PCI ids for Intel Bay Trail, Cherry Trail and Apollo Lake AHCI ahci: Annotate PCI ids for mobile Intel chipsets as such
2018-01-30Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "Workqueue has an early init trick where workqueues can be created and work items queued on them before the workqueue subsystem is online. This helps simplifying early init and operation of low level subsystems which use workqueues for managerial things which aren't depended upon early during boot. Out of laziness, the early init didn't cover workqueues with WQ_MEM_RECLAIM, which is inconsistent and confusing because adding the flag simply makes the system fail to boot. Cover WQ_MEM_RECLAIM too. This was originally brought up for RCU but RCU didn't actually need this. I still think it's a good idea to cover it" * 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: allow WQ_MEM_RECLAIM on early init workqueues workqueue: separate out init_rescuer()
2018-01-30Merge branch 'userns-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns updates from Eric Biederman: "Between the holidays and other distractions only a small amount of namespace work made it into my tree this time. Just a final cleanup from a revert several kernels ago and a small typo fix from Wolffhardt Schwabe" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: fix typo in assignment of fs default overflow gid autofs4: Modify autofs_wait to use current_uid() and current_gid() userns: Don't fail follow_automount based on s_user_ns
2018-01-30Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo cleanups from Eric Biederman: "Long ago when 2.4 was just a testing release copy_siginfo_to_user was made to copy individual fields to userspace, possibly for efficiency and to ensure initialized values were not copied to userspace. Unfortunately the design was complex, it's assumptions unstated, and humans are fallible and so while it worked much of the time that design failed to ensure unitialized memory is not copied to userspace. This set of changes is part of a new design to clean up siginfo and simplify things, and hopefully make the siginfo handling robust enough that a simple inspection of the code can be made to ensure we don't copy any unitializied fields to userspace. The design is to unify struct siginfo and struct compat_siginfo into a single definition that is shared between all architectures so that anyone adding to the set of information shared with struct siginfo can see the whole picture. Hopefully ensuring all future si_code assignments are arch independent. The design is to unify copy_siginfo_to_user32 and copy_siginfo_from_user32 so that those function are complete and cope with all of the different cases documented in signinfo_layout. I don't think there was a single implementation of either of those functions that was complete and correct before my changes unified them. The design is to introduce a series of helpers including force_siginfo_fault that take the values that are needed in struct siginfo and build the siginfo structure for their callers. Ensuring struct siginfo is built correctly. The remaining work for 4.17 (unless someone thinks it is post -rc1 material) is to push usage of those helpers down into the architectures so that architecture specific code will not need to deal with the fiddly work of intializing struct siginfo, and then when struct siginfo is guaranteed to be fully initialized change copy siginfo_to_user into a simple wrapper around copy_to_user. Further there is work in progress on the issues that have been documented requires arch specific knowledge to sort out. The changes below fix or at least document all of the issues that have been found with siginfo generation. Then proceed to unify struct siginfo the 32 bit helpers that copy siginfo to and from userspace, and generally clean up anything that is not arch specific with regards to siginfo generation. It is a lot but with the unification you can of siginfo you can already see the code reduction in the kernel" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits) signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr mm/memory_failure: Remove unused trapno from memory_failure signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap signal: Helpers for faults with specialized siginfo layouts signal: Add send_sig_fault and force_sig_fault signal: Replace memset(info,...) with clear_siginfo for clarity signal: Don't use structure initializers for struct siginfo signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered ptrace: Use copy_siginfo in setsiginfo and getsiginfo signal: Unify and correct copy_siginfo_to_user32 signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32 signal: Unify and correct copy_siginfo_from_user32 signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h signal/powerpc: Remove redefinition of NSIGTRAP on powerpc signal: Move addr_lsb into the _sigfault union for clarity ...