summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-08-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "3 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails ipc/shm.c add ->pagesize function to shm_vm_ops memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
2018-08-02userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK failsMike Rapoport
The fix in commit 0cbb4b4f4c44 ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") cleared the vma->vm_userfaultfd_ctx but kept userfaultfd flags in vma->vm_flags that were copied from the parent process VMA. As the result, there is an inconsistency between the values of vma->vm_userfaultfd_ctx.ctx and vma->vm_flags which triggers BUG_ON in userfaultfd_release(). Clearing the uffd flags from vma->vm_flags in case of UFFD_EVENT_FORK failure resolves the issue. Link: http://lkml.kernel.org/r/1532931975-25473-1-git-send-email-rppt@linux.vnet.ibm.com Fixes: 0cbb4b4f4c44 ("userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails") Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reported-by: syzbot+121be635a7a35ddb7dcb@syzkaller.appspotmail.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02ipc/shm.c add ->pagesize function to shm_vm_opsJane Chu
Commit 05ea88608d4e ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct") adds a new ->pagesize() function to hugetlb_vm_ops, intended to cover all hugetlbfs backed files. With System V shared memory model, if "huge page" is specified, the "shared memory" is backed by hugetlbfs files, but the mappings initiated via shmget/shmat have their original vm_ops overwritten with shm_vm_ops, so we need to add a ->pagesize function to shm_vm_ops. Otherwise, vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma, result in below BUG: fs/hugetlbfs/inode.c 443 if (unlikely(page_mapped(page))) { 444 BUG_ON(truncate_op); resulting in hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated ------------[ cut here ]------------ kernel BUG at fs/hugetlbfs/inode.c:444! Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ... CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2 RIP: 0010:remove_inode_hugepages+0x3db/0x3e2 .... Call Trace: hugetlbfs_evict_inode+0x1e/0x3e evict+0xdb/0x1af iput+0x1a2/0x1f7 dentry_unlink_inode+0xc6/0xf0 __dentry_kill+0xd8/0x18d dput+0x1b5/0x1ed __fput+0x18b/0x216 ____fput+0xe/0x10 task_work_run+0x90/0xa7 exit_to_usermode_loop+0xdd/0x116 do_syscall_64+0x187/0x1ae entry_SYSCALL_64_after_hwframe+0x150/0x0 [jane.chu@oracle.com: relocate comment] Link: http://lkml.kernel.org/r/20180731044831.26036-1-jane.chu@oracle.com Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com Fixes: 05ea88608d4e13 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct") Signed-off-by: Jane Chu <jane.chu@oracle.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failureKirill Tkhai
In case of memcg_online_kmem() failure, memcg_cgroup::id remains hashed in mem_cgroup_idr even after memcg memory is freed. This leads to leak of ID in mem_cgroup_idr. This patch adds removal into mem_cgroup_css_alloc(), which fixes the problem. For better readability, it adds a generic helper which is used in mem_cgroup_alloc() and mem_cgroup_id_put_many() as well. Link: http://lkml.kernel.org/r/152354470916.22460.14397070748001974638.stgit@localhost.localdomain Fixes 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02drivers: net: lmc: fix case value for target abort errorColin Ian King
Current value for a target abort error is 0x010, however, this value should in fact be 0x002. As it stands, the range of error is 0..7 so it is currently never being detected. This bug has been in the driver since the early 2.6.12 days (or before). Detected by CoverityScan, CID#744290 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02blk-mq: fix blk_mq_tagset_busy_iterMing Lei
Commit d250bf4e776ff09d5("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter") uses 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT' to replace 'blk_mq_request_started(req)', this way is wrong, and causes lots of test system hang during booting. Fix the issue by using blk_mq_request_started(req) inside bt_tags_iter(). Fixes: d250bf4e776ff09d5 ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter") Cc: Josef Bacik <josef@toxicpanda.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Mark Brown <broonie@kernel.org> Cc: Matt Hart <matthew.hart@linaro.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: Hannes Reinecke <hare@suse.com>, Cc: "Martin K. Petersen" <martin.petersen@oracle.com>, Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: linux-scsi@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Mark Brown <broonie@kernel.org> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-02fs: fix iomap_bmap position calculationEric Sandeen
The position calculation in iomap_bmap() shifts bno the wrong way, so we don't progress properly and end up re-mapping block zero over and over, yielding an unchanging physical block range as the logical block advances: # filefrag -Be file ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 0: 21.. 21: 1: merged 1: 1.. 1: 21.. 21: 1: 22: merged Discontinuity: Block 1 is at 21 (was 22) 2: 2.. 2: 21.. 21: 1: 22: merged Discontinuity: Block 2 is at 21 (was 22) 3: 3.. 3: 21.. 21: 1: 22: merged This breaks the FIBMAP interface for anyone using it (XFS), which in turn breaks LILO, zipl, etc. Bug-actually-spotted-by: Darrick J. Wong <darrick.wong@oracle.com> Fixes: 89eb1906a953 ("iomap: add an iomap-based bmap implementation") Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-02scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGOJohannes Thumshirn
When receiving a LOGO request we forget to clear the FC_RP_STARTED flag before starting the rport delete routine. As the started flag was not cleared, we're not deleting the rport but waiting for a restart and thus are keeping the reference count of the rdata object at 1. This leads to the following kmemleak report: unreferenced object 0xffff88006542aa00 (size 512): comm "kworker/0:2", pid 24, jiffies 4294899222 (age 226.880s) hex dump (first 32 bytes): 68 96 fe 65 00 88 ff ff 00 00 00 00 00 00 00 00 h..e............ 01 00 00 00 08 00 00 00 02 c5 45 24 ac b8 00 10 ..........E$.... backtrace: [<(____ptrval____)>] fcoe_ctlr_vn_add.isra.5+0x7f/0x770 [libfcoe] [<(____ptrval____)>] fcoe_ctlr_vn_recv+0x12af/0x27f0 [libfcoe] [<(____ptrval____)>] fcoe_ctlr_recv_work+0xd01/0x32f0 [libfcoe] [<(____ptrval____)>] process_one_work+0x7ff/0x1420 [<(____ptrval____)>] worker_thread+0x87/0xef0 [<(____ptrval____)>] kthread+0x2db/0x390 [<(____ptrval____)>] ret_from_fork+0x35/0x40 [<(____ptrval____)>] 0xffffffffffffffff Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reported-by: ard <ard@kwaak.net> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02scsi: fcoe: drop frames in ELS LOGO error pathJohannes Thumshirn
Drop the frames in the ELS LOGO error path instead of just returning an error. This fixes the following kmemleak report: unreferenced object 0xffff880064cb1000 (size 424): comm "kworker/0:2", pid 24, jiffies 4294904293 (age 68.504s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<(____ptrval____)>] _fc_frame_alloc+0x2c/0x180 [libfc] [<(____ptrval____)>] fc_lport_enter_logo+0x106/0x360 [libfc] [<(____ptrval____)>] fc_fabric_logoff+0x8c/0xc0 [libfc] [<(____ptrval____)>] fcoe_if_destroy+0x79/0x3b0 [fcoe] [<(____ptrval____)>] fcoe_destroy_work+0xd2/0x170 [fcoe] [<(____ptrval____)>] process_one_work+0x7ff/0x1420 [<(____ptrval____)>] worker_thread+0x87/0xef0 [<(____ptrval____)>] kthread+0x2db/0x390 [<(____ptrval____)>] ret_from_fork+0x35/0x40 [<(____ptrval____)>] 0xffffffffffffffff which can be triggered by issuing echo eth0 > /sys/bus/fcoe/ctlr_destroy Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02scsi: fcoe: fix use-after-free in fcoe_ctlr_els_sendJohannes Thumshirn
KASAN reports a use-after-free in fcoe_ctlr_els_send() when we're sending a LOGO and have FIP debugging enabled. This is because we're first freeing the skb and then printing the frame's DID. But the DID is a member of the FC frame header which in turn is the skb's payload. Exchange the debug print and kfree_skb() calls so we're not touching the freed data. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02perf trace: Use perf_evsel__sc_tp_{uint,ptr} for "id"/"args" handling ↵Arnaldo Carvalho de Melo
syscalls:* events Now it looks just about the same as for the trace__sys_{enter,exit}. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-y59may7zx1eccnp4m3qm4u0b@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-02perf trace: Setup struct syscall_tp for syscalls:sys_{enter,exit}_NAME eventsArnaldo Carvalho de Melo
Mapping "__syscall_nr" to "id" and setting up "args" from the offset of "__syscall_nr" + sizeof(u64), as the payload for syscalls:* is the same as for raw_syscalls:*, just the fields have different names. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ogeenrpviwcpwl3oy1l55f3m@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-02perf trace: Allow setting up a syscall_tp struct without a format_fieldArnaldo Carvalho de Melo
To avoid having to ask libtraceevent to find a field by name when handling each tracepoint event, we setup a struct syscall_tp with a tp_field struct having an extractor function + the offset for the "id", "args" and "ret" raw_syscalls:sys_{enter,exit} tracepoints. Now that we want to do the same with syscalls:sys_{entry,exit}_NAME individual syscall tracepoints, where we have "id" as "__syscall_nr" and "args" as the actual series of per syscall parameters, we need more flexibility from the routines that set up these pre-looked up syscall tracepoint arg fields. The next cset will use it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-v59q5e0jrlzkpl9a1c7t81ni@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-02perf trace: Rename some syscall_tp methods to raw_syscallArnaldo Carvalho de Melo
Because raw_syscalls have the field for the syscall number as 'id' while the syscalls:sys_{enter,exit}_NAME have it as __syscall_nr... Since we want to support both for being able to enable just a syscalls:sys_{enter,exit}_name instead of asking for raw_syscalls:sys_{enter,exit} plus filters, make the method names for each kind of tracepoint more explicit. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-4rixbfzco6tsry0w9ghx3ktb@git.kernel.org Signef-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-02perf trace: Use beautifiers on syscalls:sys_enter_ handlersArnaldo Carvalho de Melo
We were using the beautifiers only when processing the raw_syscalls:sys_enter events, but we can as well use them for the syscalls:sys_enter_NAME events, as the layout is the same. Some more tweaking is needed as we're processing them straight away, i.e. there is no buffering in the sys_enter_NAME event to wait for things like vfs_getname to provide pointer contents and then flushing at sys_exit_NAME, so we need to state in the syscall_arg that this is unbuffered, just print the pointer values, beautifying just non-pointer syscall args. This just shows an alternative way of processing tracepoints, that we will end up using when creating "tracepoint" payloads that already copy pointer contents (or chunks of it, i.e. not the whole filename, but just the end of it, not all the bf for a read/write, but just the start, etc), directly in the kernel using eBPF. E.g.: # perf trace -e syscalls:*enter*sleep,*sleep sleep 1 0.303 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffc93d5ecc0 0.305 (1000.229 ms): sleep/8746 nanosleep(rqtp: 0x7ffc93d5ecc0) = 0 # perf trace -e syscalls:*_*sleep,*sleep sleep 1 0.288 ( ): syscalls:sys_enter_nanosleep:rqtp: 0x7ffecde87e40 0.289 ( ): sleep/8748 nanosleep(rqtp: 0x7ffecde87e40) ... 1000.479 ( ): syscalls:sys_exit_nanosleep:0x0 0.289 (1000.208 ms): sleep/8748 ... [continued]: nanosleep()) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-jehyd2zwhw00z3p7v7mg9632@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-02Merge tag 'pci-v4.18-fixes-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Fix integer overflow in new mobiveil driver (Dan Carpenter) - Fix race during NVMe removal/rescan (Hari Vyas) * tag 'pci-v4.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Fix is_added/is_busmaster race condition PCI: mobiveil: Avoid integer overflow in IB_WIN_SIZE
2018-08-02selftest/net: fix protocol family to work for IPv4.Maninder Singh
use actual protocol family passed by user rather than hardcoded AF_INTE6 to cerate sockets. current code is not working for IPv4. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Vaneet Narang <v.narang@samsung.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 regression fix from Will Deacon: "Ard found a nasty arm64 regression in 4.18 where the AES ghash/gcm code doesn't notify the kernel about its use of the vector registers, therefore potentially corrupting live user state. The fix is straightforward and Herbert agreed for it to go via arm64" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair
2018-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Fixes keep trickling in: 1) Various IP fragmentation memory limit hardening changes from Eric Dumazet. 2) Revert ipv6 metrics leak change, it causes more problems than it fixes for now. 3) Fix WoL regression in stmmac driver, from Jose Abreu. 4) Netlink socket spectre v1 gadget fix, from Jeremy Cline" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: Revert "net/ipv6: fix metrics leak" rxrpc: Fix user call ID check in rxrpc_service_prealloc_one net: dsa: Do not suspend/resume closed slave_dev netlink: Fix spectre v1 gadget in netlink_create() Documentation: dpaa2: Use correct heading adornment net: stmmac: Fix WoL for PCI-based setups bonding: avoid lockdep confusion in bond_get_stats() enic: do not call enic_change_mtu in enic_probe ipv4: frags: handle possible skb truesize change inet: frag: enforce memory limits earlier net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flow net/mlx5e: Fix null pointer access when setting MTU of vport representor net/mlx5e: Set port trust mode to PCP as default net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager net: dsa: mv88e6xxx: Fix SERDES support on 88E6141/6341 brcmfmac: fix regression in parsing NVRAM for multiple devices iwlwifi: add more card IDs for 9000 series
2018-08-02Squashfs: Compute expected length from inode size rather than block lengthPhillip Lougher
Previously in squashfs_readpage() when copying data into the page cache, it used the length of the datablock read from the filesystem (after decompression). However, if the filesystem has been corrupted this data block may be short, which will leave pages unfilled. The fix for this is to compute the expected number of bytes to copy from the inode size, and use this to detect if the block is short. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02squashfs: more metadata hardeningLinus Torvalds
The squashfs fragment reading code doesn't actually verify that the fragment is inside the fragment table. The end result _is_ verified to be inside the image when actually reading the fragment data, but before that is done, we may end up taking a page fault because the fragment table itself might not even exist. Another report from Anatoly and his endless squashfs image fuzzing. Reported-by: Анатолий Тросиненко <anatoly.trosinenko@gmail.com> Acked-by:: Phillip Lougher <phillip.lougher@gmail.com>, Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-02perf trace: Associate vfs_getname()'ed pathname with fd returned from 'openat'Arnaldo Carvalho de Melo
When the vfs_getname() wannabe tracepoint is in place: # perf probe -l probe:vfs_getname (on getname_flags:73@acme/git/linux/fs/namei.c with pathname) # 'perf trace' will use it to get the pathname when it is copied from userspace to the kernel, right after syscalls:sys_enter_open, copied in the 'probe:vfs_getname', stash it somewhere and then, at syscalls:sys_exit_open time, if the 'open' return is not -1, i.e. a successfull open syscall, associate that pathname to this return, i.e. the fd. We were not doing this for the 'openat' syscall, which would cause 'perf trace' to fallback to using /proc to get the fd, change it so that we use what we got from probe:vfs_getname, reducing the 'openat' beautification process cost, ditching the syscalls performed to read procfs state and avoiding some possible races in the process. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-xnp44ao3bkb6ejeczxfnjwsh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-02stop_machine: Reflow cpu_stop_queue_two_works()Peter Zijlstra
The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end. Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: isaacm@codeaurora.org Cc: matt@codeblueprint.co.uk Cc: psodagud@codeaurora.org Cc: gregkh@linuxfoundation.org Cc: pkondeti@codeaurora.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
2018-08-02clockevents: Warn if cpu_all_mask is used as cpumaskSudeep Holla
Using cpu_all_mask in clockevents cpumask may result in issues while comparing multiple clockevent devices to choose the preferred one. On one of the platforms with 2 system (i.e. non per-CPU) timers with different ratings, having cpu_all_mask for one of the device resulted in a boot hang due to a endless loop in clockevents_notify_released() as both were clocksources were selected as preferred. In order to prevent such issues in the future, warn if any clockevent driver sets cpu_all_mask as it's cpumask and just override it to use cpu_possible_mask. All the existing occurrences of cpu_all_mask are already replaced with cpu_possible_mask. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1531308264-24220-3-git-send-email-sudeep.holla@arm.com
2018-08-02tick/broadcast-hrtimer: Use cpu_possible_mask for ce_broadcast_hrtimerSudeep Holla
This is the last instance of cpu_all_mask usage in the core framework. Replace it with cpu_possible_mask like all other instances in the clockevent drivers. This makes it possible to add a warning in the core clockevents_register_device on usage of cpu_all_mask from any clockevent drivers in the future. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: https://lkml.kernel.org/r/1531308264-24220-2-git-send-email-sudeep.holla@arm.com
2018-08-02clocksource/drivers/arm_arch_timer: Fix bogus cpu_all_mask usageThomas Gleixner
Using cpu_all_mask as target mask for clockevents is wrong as it never can actually target not possible CPUs. Use cpu_possible_mask instead Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-08-02x86/iommu: Use NULL instead of 0Zhong Jiang
Fixes the following sparse warning: arch/x86/kernel/pci-iommu_table.c:63:37: warning: Using plain integer as NULL pointer Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <hpa@zytor.com> Cc: <gregkh@linuxfoundation.org> Link: https://lkml.kernel.org/r/1532162004-24670-1-git-send-email-zhongjiang@huawei.com
2018-08-02x86/boot: Use CC_SET()/CC_OUT() instead of open coding itUros Bizjak
Remove open-coded uses of set instructions with CC_SET()/CC_OUT(). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180629142844.15200-1-ubizjak@gmail.com
2018-08-02x86/mm: Remove redundant check for kmem_cache_create()Chengguang Xu
The flag 'SLAB_PANIC' implies panic on failure, So there is no need to check the returned pointer for NULL. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/1528804132-154948-1-git-send-email-cgxu519@gmx.com
2018-08-02x86/platform/UV: Remove redundant check of p == qColin Ian King
The check for p == q is dead code because the proceeding switch statements jump to the end of the outer for-loop with continue statements. Remove the dead code. Detected by CoverityScan, CID#145071 ("Structurally dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: kernel-janitors@vger.kernel.org Link: https://lkml.kernel.org/r/20180731090938.11856-1-colin.king@canonical.com
2018-08-02x86/platform/olpc: Use PTR_ERR_OR_ZERO()zhong jiang
Replace the open coded equivalent with PTR_ERR_OR_ZERO(). Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1533053286-34990-1-git-send-email-zhongjiang@huawei.com
2018-08-02x86/boot/compressed/64: Validate trampoline placement against E820Kirill A. Shutemov
There were two report of boot failure cased by trampoline placed into a reserved memory region. It can happen on machines that don't report EBDA correctly. Fix the problem by re-validating the found address against the E820 table. If the address is in a reserved area, find the next usable region below the initial address. Fixes: 3548e131ec6a ("x86/boot/compressed/64: Find a place for 32-bit trampoline") Reported-by: Dmitry Malkin <d.malkin@real-time-systems.com> Reported-by: youling 257 <youling257@gmail.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180801133225.38121-1-kirill.shutemov@linux.intel.com
2018-08-02debugobjects: Remove redundant NULL pointer checkZhong Jiang
kmem_cache_destroy() has a built in NULL pointer check, so the one at the call can be removed. Signed-off-by: Zhong Jiang <zhongjiang@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <longman@redhat.com> Cc: <arnd@arndb.de> Cc: <yang.shi@linux.alibaba.com> Link: https://lkml.kernel.org/r/1533054298-35824-1-git-send-email-zhongjiang@huawei.com
2018-08-02clocksource: ti-32k: Remove CLOCK_SOURCE_SUSPEND_NONSTOP flagKeerthy
Since commit 39232ed5a179 ("time: Introduce one suspend clocksource to compensate the suspend time") suspend/resume fails on AM437x platforms as the clocksource actually stops in suspend. Hence remove the CLOCK_SOURCE_SUSPEND_NONSTOP flag. Suggested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <daniel.lezcano@linaro.org> Cc: <linux-omap@vger.kernel.org> Cc: <baolin.wang@linaro.org> Cc: <d-gerlach@ti.com> Cc: <tony@atomide.com> Cc: <t-kristo@ti.com> Link: https://lkml.kernel.org/r/1533191716-20476-1-git-send-email-j-keerthy@ti.com
2018-08-02timers: Clear timer_base::must_forward_clk with timer_base::lock heldGaurav Kohli
timer_base::must_forward_clock is indicating that the base clock might be stale due to a long idle sleep. The forwarding of the base clock takes place in the timer softirq or when a timer is enqueued to a base which is idle. If the enqueue of timer to an idle base happens from a remote CPU, then the following race can happen: CPU0 CPU1 run_timer_softirq mod_timer base = lock_timer_base(timer); base->must_forward_clk = false if (base->must_forward_clk) forward(base); -> skipped enqueue_timer(base, timer, idx); -> idx is calculated high due to stale base unlock_timer_base(timer); base = lock_timer_base(timer); forward(base); The root cause is that timer_base::must_forward_clk is cleared outside the timer_base::lock held region, so the remote queuing CPU observes it as cleared, but the base clock is still stale. This can cause large granularity values for timers, i.e. the accuracy of the expiry time suffers. Prevent this by clearing the flag with timer_base::lock held, so that the forwarding takes place before the cleared flag is observable by a remote CPU. Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: linux-arm-msm@vger.kernel.org Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
2018-08-02Merge tag 'perf-core-for-mingo-4.19-20180801' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf trace: (Arnaldo Carvalho de Melo) - Do not require --no-syscalls to suppress strace like output, i.e. # perf trace -e sched:*switch will show just sched:sched_switch events, not strace-like formatted syscall events, use --syscalls to get the previous behaviour. If instead: # perf trace is used, i.e. no events specified, then --syscalls is implied and system wide strace like formatting will be applied to all syscalls. The behaviour when just a syscall subset is used with '-e' is unchanged: # perf trace -e *sleep,sched:*switch will work as before: just the 'nanosleep' syscall will be strace-like formatted plus the sched:sched_switch tracepoint event, system wide. - Allow string table generators to use a default header dir, allowing use of them without parameters to see the table it generates on stdout, e.g.: $ tools/perf/trace/beauty/kvm_ioctl.sh static const char *kvm_ioctl_cmds[] = { [0x00] = "GET_API_VERSION", [0x01] = "CREATE_VM", [0x02] = "GET_MSR_INDEX_LIST", [0x03] = "CHECK_EXTENSION", <BIG SNIP> [0xe0] = "CREATE_DEVICE", [0xe1] = "SET_DEVICE_ATTR", [0xe2] = "GET_DEVICE_ATTR", [0xe3] = "HAS_DEVICE_ATTR", }; $ See 'ls tools/perf/trace/beauty/*.sh' to see the available string table generators. - Add a generator for IPPROTO_ socket's protocol constants. perf record: (Kan Liang) - Fix error out while applying initial delay and using LBR, due to the use of a PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY event to track PERF_RECORD_MMAP events while waiting for the initial delay. Such events fail when configured asking PERF_SAMPLE_BRANCH_STACK in perf_event_attr.sample_type. perf c2c: (Jiri Olsa) - Fix report crash for empty browser, when processing a perf.data file without events of interest, either because not asked for in 'perf record' or because the workload didn't triggered such events. perf list: (Michael Petlan) - Align metric group description format with PMU event description. perf tests: (Sandipan Das) - Fix indexing when invoking subtests, which caused BPF tests to get results for the next test in the list, with the last one reporting a failure. eBPF: - Fix installation directory for header files included from eBPF proggies, avoiding clashing with relative paths used to build other software projects such as glibc. (Thomas Richter) - Show better message when failing to load an object. (Arnaldo Carvalho de Melo) General: (Christophe Leroy) - Allow overriding MAX_NR_CPUS at compile time, to make the tooling usable in systems with less memory, in time this has to be changed to properly allocate based on _NPROCESSORS_ONLN. Architecture specific: - Update arm64's ThunderX2 implementation defined pmu core events (Ganapatrao Kulkarni) - Fix complex event name parsing in 'perf test' for PowerPC, where the 'umask' event modifier isn't present. (Sandipan Das) CoreSight ARM hardware tracing: (Leo Yan) - Fix start tracing packet handling. - Support dummy address value for CS_ETM_TRACE_ON packet. - Generate branch sample when receiving a CS_ETM_TRACE_ON packet. - Generate branch sample for CS_ETM_TRACE_ON packet. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-08-02Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-08-01Revert "net/ipv6: fix metrics leak"David S. Miller
This reverts commit df18b50448fab1dff093731dfd0e25e77e1afcd1. This change causes other problems and use-after-free situations as found by syzbot. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01NFSv4: Fix _nfs4_do_setlk()Trond Myklebust
The patch to fix the case where a lock request was interrupted ended up changing default handling of errors such as NFS4ERR_DENIED and caused the client to immediately resend the lock request. Let's do a partial revert of that request so that the default is now to exit, but change the way we handle resends to take into account the fact that the user may have interrupted the request. Reported-by: Kenneth Johansson <ken@kenjo.org> Fixes: a3cf9bca2ace ("NFSv4: Don't add a new lock on an interrupted wait..") Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2018-08-01Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fix from Russell King: "Just a single fix this time around for recent binutils causing build problems when generating Thumb-2 code" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
2018-08-01mm: do not initialize TLB stack vma's with vma_init()Linus Torvalds
Commit 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") tried to initialize various left-over ad-hoc vma's "properly", but actually made things worse for the temporary vma's used for TLB flushing. vma_init() doesn't actually initialize all of the vma, just a few fields, so doing something like - struct vm_area_struct vma = { .vm_mm = tlb->mm, }; + struct vm_area_struct vma; + + vma_init(&vma, tlb->mm); was actually very bad: instead of having a nicely initialized vma with every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma with only a couple of fields initialized. And they weren't even fields that the code in question mostly cared about. The flush_tlb_range() function takes a "struct vma" rather than a "struct mm_struct", because a few architectures actually care about what kind of range it is - being able to only do an ITLB flush if it's a range that doesn't have data accesses enabled, for example. And all the normal users already have the vma for doing the range invalidation. But a few people want to call flush_tlb_range() with a range they just made up, so they also end up using a made-up vma. x86 just has a special "flush_tlb_mm_range()" function for this, but other architectures (arm and ia64) do the "use fake vma" thing instead, and thus got caught up in the vma_init() changes. At the same time, the TLB flushing code really doesn't care about most other fields in the vma, so vma_init() is just unnecessary and pointless. This fixes things by having an explicit "this is just an initializer for the TLB flush" initializer macro, which is used by the arm/arm64/ia64 people who mis-use this interface with just a dummy vma. Fixes: 2c4541e24c55 ("mm: use vma_init() to initialize VMAs on stack and data segments") Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01mm: delete historical BUG from zap_pmd_range()Hugh Dickins
Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted that mmap_sem must be held when splitting an "anonymous" vma there. Whether that's still strictly true nowadays is not entirely clear, but the danger of sometimes crashing on the BUG is now fairly clear. Even with the new stricter rules for anonymous vma marking, the condition it checks for can possible trigger. Commit 44960f2a7b63 ("staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages") is good, and originally I thought it was safe from that VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE) insists on VM_SHARED. But after I read John's earlier mail, drawing attention to the vfs_fallocate() in there: I may be wrong, and I don't know if Android has THP in the config anyway, but it looks to me like an unmap_mapping_range() from ashmem's vfs_fallocate() could hit precisely the VM_BUG_ON_VMA(), once it's vma_is_anonymous(). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01perf trace: Do not require --no-syscalls to suppress strace like outputArnaldo Carvalho de Melo
So far the --syscalls option was the default, requiring explicit --no-syscalls when wanting to process just some other event, invert that and assume it only when no other event was specified, allowing its explicit enablement when wanting to see all syscalls together with some other event: E.g: The existing default is maintained for a single workload: # perf trace sleep 1 <SNIP> 0.264 ( 0.003 ms): sleep/12762 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f62cbf04000 0.271 ( 0.001 ms): sleep/12762 close(fd: 3) = 0 0.295 (1000.130 ms): sleep/12762 nanosleep(rqtp: 0x7ffd15194fd0) = 0 1000.469 ( 0.006 ms): sleep/12762 close(fd: 1) = 0 1000.480 ( 0.004 ms): sleep/12762 close(fd: 2) = 0 1000.502 ( ): sleep/12762 exit_group() # For a pid: # pidof ssh 7826 3961 3226 2628 2493 # perf trace -p 3961 ? ( ): ... [continued]: select()) = 1 0.023 ( 0.005 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce870 ) = 0 0.036 ( 0.009 ms): read(fd: 5</dev/pts/7>, buf: 0x7ffcc8fca7b0, count: 16384 ) = 3 0.060 ( 0.004 ms): getpid( ) = 3961 (ssh) 0.079 ( 0.004 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce8e0 ) = 0 0.088 ( 0.003 ms): clock_gettime(which_clock: BOOTTIME, tp: 0x7ffcc8fce7c0 ) = 0 <SNIP> For system wide, threads, cgroups, user, etc when no event is specified, the existing behaviour is maintained, i.e. --syscalls is selected. When some event is specified, then --no-syscalls doesn't need to be specified: # perf trace -e tcp:tcp_probe ssh localhost 0.000 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=53 snd_nxt=0xb67ce8f7 snd_una=0xb67ce8f7 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43690 0.010 tcp:tcp_probe:src=[::1]:39074 dest=[::1]:22 mark=0 length=32 snd_nxt=0xa8f9ef38 snd_una=0xa8f9ef23 snd_cwnd=10 ssthresh=2147483647 snd_wnd=43690 srtt=31 rcv_wnd=43776 4.525 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=1240 snd_nxt=0xb67ce90c snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=43776 7.242 tcp:tcp_probe:src=[::1]:22 dest=[::1]:39074 mark=0 length=80 snd_nxt=0xb67ced44 snd_una=0xb67ce90c snd_cwnd=10 ssthresh=2147483647 snd_wnd=43776 srtt=18 rcv_wnd=174720 The authenticity of host 'localhost (::1)' can't be established. ECDSA key fingerprint is SHA256:TKZS58923458203490asekfjaklskljmkjfgPMBfHzY. ECDSA key fingerprint is MD5:d8:29:54:40:71:fa:b8:44:89:52:64:8a:35:42:d0:e8. Are you sure you want to continue connecting (yes/no)? ^C # To get the previous behaviour just use --syscalls and get all syscalls formatted strace like + the specified extra events: # trace -e sched:*switch --syscalls sleep 1 <SNIP> 0.160 ( 0.003 ms): sleep/12877 mprotect(start: 0x7fdfe2361000, len: 4096, prot: READ) = 0 0.164 ( 0.009 ms): sleep/12877 munmap(addr: 0x7fdfe2345000, len: 113155) = 0 0.211 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce68e000 0.212 ( 0.002 ms): sleep/12877 brk(brk: 0x55d3ce6af000) = 0x55d3ce6af000 0.215 ( 0.001 ms): sleep/12877 brk() = 0x55d3ce6af000 0.219 ( 0.004 ms): sleep/12877 open(filename: 0xe1f07c00, flags: CLOEXEC) = 3 0.225 ( 0.001 ms): sleep/12877 fstat(fd: 3, statbuf: 0x7fdfe2138aa0) = 0 0.227 ( 0.003 ms): sleep/12877 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7fdfdb1b8000 0.234 ( 0.001 ms): sleep/12877 close(fd: 3) = 0 0.257 ( ): sleep/12877 nanosleep(rqtp: 0x7fffb36b6020) ... 0.260 ( ): sched:sched_switch:prev_comm=sleep prev_pid=12877 prev_prio=120 prev_state=D ==> next_comm=swapper/3 next_pid=0 next_prio=120 0.257 (1000.134 ms): sleep/12877 ... [continued]: nanosleep()) = 0 1000.428 ( 0.006 ms): sleep/12877 close(fd: 1) = 0 1000.440 ( 0.004 ms): sleep/12877 close(fd: 2) = 0 1000.461 ( ): sleep/12877 exit_group() # When specifiying just some syscalls, the behaviour doesn't change, i.e.: # trace -e nanosleep -e sched:*switch sleep 1 0.000 ( ): sleep/14974 nanosleep(rqtp: 0x7ffc344ba9c0 ) ... 0.007 ( ): sched:sched_switch:prev_comm=sleep prev_pid=14974 prev_prio=120 prev_state=D ==> next_comm=swapper/2 next_pid=0 next_prio=120 0.000 (1000.139 ms): sleep/14974 ... [continued]: nanosleep()) = 0 # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-om2fulll97ytnxv40ler8jkf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-01rxrpc: Fix user call ID check in rxrpc_service_prealloc_oneYueHaibing
There just check the user call ID isn't already in use, hence should compare user_call_ID with xcall->user_call_ID, which is current node's user_call_ID. Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Suggested-by: David Howells <dhowells@redhat.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01Merge tag 'mmc-v4.18-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fix from Ulf Hansson: "MMC host: mxcmmc: Fix build error for powerpc" * tag 'mmc-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: mxcmmc: Fix missing parentheses and brace
2018-08-01Merge tag 'pm-urgent-4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix the scope of a recent intel_pstate driver optimization used incorrectly on some systems due to processor identification ambiguity and fix a few issues in the turbostat utility, including three recent regressions. Specifics: - Use ACPI FADT preferred PM Profile to distinguish Skylake desktop processors from some server ones with the same model number in order to limit the scope of the recent IO-wait boost optimization to servers, as intended (Srinivas Pandruvada). - Fix several issues in the turbostat utility: * Fix the -S option on 1-CPU systems (Len Brown). * Fix computations using incorrect processor core counts (Artem Bityutskiy). * Fix the x2apic debug message (Len Brown). * Fix logical node enumeration to allow for non-sequential physical nodes (Prarit Bhargava). * Fix reported family on modern AMD processors (Calvin Walton). * Clarify the RAPL column information in the man page (Len Brown)" * tag 'pm-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Limit the scope of HWP dynamic boost platforms tools/power turbostat: version 18.07.27 tools/power turbostat: Read extended processor family from CPUID tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes tools/power turbostat: fix x2apic debug message output file tools/power turbostat: fix bogus summary values tools/power turbostat: fix -S on UP systems tools/power turbostat: Update turbostat(8) RAPL throttling column description
2018-08-01squashfs metadata 2: electric boogalooLinus Torvalds
Anatoly continues to find issues with fuzzed squashfs images. This time, corrupt, missing, or undersized data for the page filling wasn't checked for, because the squashfs_{copy,read}_cache() functions did the squashfs_copy_data() call without checking the resulting data size. Which could result in the page cache pages being incompletely filled in, and no error indication to the user space reading garbage data. So make a helper function for the "fill in pages" case, because the exact same incomplete sequence existed in two places. [ I should have made a squashfs branch for these things, but I didn't intend to start doing them in the first place. My historical connection through cramfs is why I got into looking at these issues at all, and every time I (continue to) think it's a one-off. Because _this_ time is always the last time. Right? - Linus ] Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Tested-by: Willy Tarreau <w@1wt.eu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pagesJohn Stultz
Amit Pundir and Youling in parallel reported crashes with recent mainline kernels running Android: F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** F DEBUG : Build fingerprint: 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key F DEBUG : Revision: '0' F DEBUG : ABI: 'arm' F DEBUG : pid: 2261, tid: 2261, name: zygote >>> zygote <<< F DEBUG : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008 ... <snip> ... F DEBUG : backtrace: F DEBUG : #00 pc 00001c04 /system/lib/libc.so (memset+48) F DEBUG : #01 pc 0010c513 /system/lib/libart.so (create_mspace_with_base+82) F DEBUG : #02 pc 0015c601 /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned int)+40) F DEBUG : #03 pc 0015c3ed /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, std::__1::basic_string<char, std::__ 1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned int, unsigned int, bool)+36) ... This was bisected back to commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives"). create_mspace_with_base() in the trace above, utilizes ashmem, and with ashmem, for shared mappings we use shmem_zero_setup(), which sets the vma->vm_ops to &shmem_vm_ops. But for private ashmem mappings nothing sets the vma->vm_ops. Looking at the problematic patch, it seems to add a requirement that one call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will be used. Using the dummy_vm_ops seem to triggger SIGBUS when traversing unmapped pages. Thus, this patch adds a call to vma_set_anonymous() for ashmem private mappings and seems to avoid the reported problem. Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Colin Cross <ccross@google.com> Cc: Matthew Wilcox <willy@infradead.org> Reported-by: Amit Pundir <amit.pundir@linaro.org> Reported-by: Youling 257 <youling257@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01ia64: mark special ia64 memory areas anonymousLinus Torvalds
Commit bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") made newly allocated vma's have a dummy vm_ops field so that they wouldn't be mistaken for anonymous mappings, and if you wanted an anonymous vma you had to explicitly say so by calling "vma_set_anonymous()" on it. However, it missed the two special vmas that ia64 processes have: the register backing store and the NaT page. So they wouldn't actually act like anonymous ranges, and page faults on them caused a SIGBUS rather than the creation of a new anon page in them. That obviously will make any ia64 binary very unhappy indeed, and the boot fails early. Fixes: bfd40eaff5ab ("mm: fix vma_is_anonymous() false-positives") Reported-by: Tony Luck <tony.luck@intel.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-01net: dsa: Do not suspend/resume closed slave_devFlorian Fainelli
If a DSA slave network device was previously disabled, there is no need to suspend or resume it. Fixes: 2446254915a7 ("net: dsa: allow switch drivers to implement suspend/resume hooks") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>