path: root/include/linux
Age | Commit message | Author
2016-02-10 | lib/ucs2_string: Add ucs2 -> utf8 helper functions | Peter Jones
This adds ucs2_utf8size(), which tells us how big our ucs2 string is in bytes, and ucs2_as_utf8(), which translates from ucs2 to utf8. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
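The two helpers described above can be pictured with a user-space sketch. This is not the kernel code: the names `sketch_utf8_size` and `sketch_to_utf8` and their signatures are hypothetical, and the sketch assumes each 16-bit unit is encoded on its own (surrogates are not combined), which is what plain UCS-2 implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes needed to encode a NUL-terminated
 * UCS-2 string as UTF-8, not counting a terminator.
 */
size_t sketch_utf8_size(const uint16_t *src)
{
	size_t n = 0;

	assert(src != NULL);
	for (; *src != 0; src++) {
		if (*src < 0x80)
			n += 1;		/* ASCII range: one byte */
		else if (*src < 0x800)
			n += 2;		/* U+0080..U+07FF: two bytes */
		else
			n += 3;		/* rest of the 16-bit range: three bytes */
	}
	return n;
}

/*
 * Hypothetical helper: encode src into dst, writing at most max bytes.
 * A multi-byte sequence is never split; encoding stops before the first
 * character that does not fit. Returns the number of bytes written.
 * No NUL terminator is appended.
 */
size_t sketch_to_utf8(uint8_t *dst, const uint16_t *src, size_t max)
{
	size_t j = 0;

	assert(dst != NULL && src != NULL);
	for (; *src != 0; src++) {
		uint16_t c = *src;

		if (c < 0x80) {
			if (max - j < 1)
				break;
			dst[j++] = (uint8_t)c;
		} else if (c < 0x800) {
			if (max - j < 2)
				break;
			dst[j++] = (uint8_t)(0xC0 | (c >> 6));
			dst[j++] = (uint8_t)(0x80 | (c & 0x3F));
		} else {
			if (max - j < 3)
				break;
			dst[j++] = (uint8_t)(0xE0 | (c >> 12));
			dst[j++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
			dst[j++] = (uint8_t)(0x80 | (c & 0x3F));
		}
	}
	return j;
}
```

A caller would typically size the destination with the first helper and then pass that size as the limit to the second.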
2016-02-09 | Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux | Linus Torvalds
Pull module fixes from Rusty Russell: "Fix for async_probe module param added in 4.3 (clearly not widely used yet), and a much more interesting kallsyms race which has been around approximately forever. This fix is more invasive, and will require some care in backporting, but I hated all the bandaids I could think of, so... There are some more coming, which are only for breakages introduced this cycle (livepatch), but wanted these in now"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  modules: fix longstanding /proc/kallsyms vs module insertion race.
  module: wrapper for symbol name.
  modules: fix modparam async_probe request
2016-02-09 | sched/debug: Make schedstats a runtime tunable that is disabled by default | Mel Gorman
schedstats is very useful during debugging and performance tuning but it incurs overhead to calculate the stats. As such, even though it can be disabled at build time, it is often enabled as the information is useful. This patch adds a kernel command-line and sysctl tunable to enable or disable schedstats on demand (when it's built in). It is disabled by default as someone who knows they need it can also learn to enable it when necessary.
The benefits are dependent on how scheduler-intensive the workload is. If it is, then the patch reduces the number of cycles spent calculating the stats with a small benefit from reducing the cache footprint of the scheduler.
These measurements were taken from a 48-core 2-socket machine with Xeon(R) E5-2670 v3 cpus although they were also tested on a single-socket 8-core machine with Intel i7-3770 processors.

netperf-tcp
                          4.5.0-rc1             4.5.0-rc1
                            vanilla          nostats-v3r1
Hmean    64          560.45 (  0.00%)      575.98 (  2.77%)
Hmean    128         766.66 (  0.00%)      795.79 (  3.80%)
Hmean    256         950.51 (  0.00%)      981.50 (  3.26%)
Hmean    1024       1433.25 (  0.00%)     1466.51 (  2.32%)
Hmean    2048       2810.54 (  0.00%)     2879.75 (  2.46%)
Hmean    3312       4618.18 (  0.00%)     4682.09 (  1.38%)
Hmean    4096       5306.42 (  0.00%)     5346.39 (  0.75%)
Hmean    8192      10581.44 (  0.00%)    10698.15 (  1.10%)
Hmean    16384     18857.70 (  0.00%)    18937.61 (  0.42%)

Small gains here, UDP_STREAM showed nothing interesting and neither did the TCP_RR tests. The gains on the 8-core machine were very similar.

tbench4
                              4.5.0-rc1             4.5.0-rc1
                                vanilla          nostats-v3r1
Hmean    mb/sec-1        500.85 (  0.00%)      522.43 (  4.31%)
Hmean    mb/sec-2        984.66 (  0.00%)     1018.19 (  3.41%)
Hmean    mb/sec-4       1827.91 (  0.00%)     1847.78 (  1.09%)
Hmean    mb/sec-8       3561.36 (  0.00%)     3611.28 (  1.40%)
Hmean    mb/sec-16      5824.52 (  0.00%)     5929.03 (  1.79%)
Hmean    mb/sec-32     10943.10 (  0.00%)    10802.83 ( -1.28%)
Hmean    mb/sec-64     15950.81 (  0.00%)    16211.31 (  1.63%)
Hmean    mb/sec-128    15302.17 (  0.00%)    15445.11 (  0.93%)
Hmean    mb/sec-256    14866.18 (  0.00%)    15088.73 (  1.50%)
Hmean    mb/sec-512    15223.31 (  0.00%)    15373.69 (  0.99%)
Hmean    mb/sec-1024   14574.25 (  0.00%)    14598.02 (  0.16%)
Hmean    mb/sec-2048   13569.02 (  0.00%)    13733.86 (  1.21%)
Hmean    mb/sec-3072   12865.98 (  0.00%)    13209.23 (  2.67%)

Small gains of 2-4% at low thread counts and otherwise flat. The gains on the 8-core machine were slightly different.

tbench4 on 8-core i7-3770 single socket machine
Hmean    mb/sec-1        442.59 (  0.00%)      448.73 (  1.39%)
Hmean    mb/sec-2        796.68 (  0.00%)      794.39 ( -0.29%)
Hmean    mb/sec-4       1322.52 (  0.00%)     1343.66 (  1.60%)
Hmean    mb/sec-8       2611.65 (  0.00%)     2694.86 (  3.19%)
Hmean    mb/sec-16      2537.07 (  0.00%)     2609.34 (  2.85%)
Hmean    mb/sec-32      2506.02 (  0.00%)     2578.18 (  2.88%)
Hmean    mb/sec-64      2511.06 (  0.00%)     2569.16 (  2.31%)
Hmean    mb/sec-128     2313.38 (  0.00%)     2395.50 (  3.55%)
Hmean    mb/sec-256     2110.04 (  0.00%)     2177.45 (  3.19%)
Hmean    mb/sec-512     2072.51 (  0.00%)     2053.97 ( -0.89%)

In contrast, this shows a relatively steady 2-3% gain at higher thread counts. Due to the nature of the patch and the type of workload, it's not a surprise that the result will depend on the CPU used.

hackbench-pipes
                          4.5.0-rc1             4.5.0-rc1
                            vanilla          nostats-v3r1
Amean    1           0.0637 (  0.00%)      0.0660 ( -3.59%)
Amean    4           0.1229 (  0.00%)      0.1181 (  3.84%)
Amean    7           0.1921 (  0.00%)      0.1911 (  0.52%)
Amean    12          0.3117 (  0.00%)      0.2923 (  6.23%)
Amean    21          0.4050 (  0.00%)      0.3899 (  3.74%)
Amean    30          0.4586 (  0.00%)      0.4433 (  3.33%)
Amean    48          0.5910 (  0.00%)      0.5694 (  3.65%)
Amean    79          0.8663 (  0.00%)      0.8626 (  0.43%)
Amean    110         1.1543 (  0.00%)      1.1517 (  0.22%)
Amean    141         1.4457 (  0.00%)      1.4290 (  1.16%)
Amean    172         1.7090 (  0.00%)      1.6924 (  0.97%)
Amean    192         1.9126 (  0.00%)      1.9089 (  0.19%)

Some small gains and losses and while the variance data is not included, it's close to the noise. The UMA machine did not show anything particularly different.

pipetest
                             4.5.0-rc1             4.5.0-rc1
                               vanilla          nostats-v2r2
Min         Time        4.13 (  0.00%)        3.99 (  3.39%)
1st-qrtle   Time        4.38 (  0.00%)        4.27 (  2.51%)
2nd-qrtle   Time        4.46 (  0.00%)        4.39 (  1.57%)
3rd-qrtle   Time        4.56 (  0.00%)        4.51 (  1.10%)
Max-90%     Time        4.67 (  0.00%)        4.60 (  1.50%)
Max-93%     Time        4.71 (  0.00%)        4.65 (  1.27%)
Max-95%     Time        4.74 (  0.00%)        4.71 (  0.63%)
Max-99%     Time        4.88 (  0.00%)        4.79 (  1.84%)
Max         Time        4.93 (  0.00%)        4.83 (  2.03%)
Mean        Time        4.48 (  0.00%)        4.39 (  1.91%)
Best99%Mean Time        4.47 (  0.00%)        4.39 (  1.91%)
Best95%Mean Time        4.46 (  0.00%)        4.38 (  1.93%)
Best90%Mean Time        4.45 (  0.00%)        4.36 (  1.98%)
Best50%Mean Time        4.36 (  0.00%)        4.25 (  2.49%)
Best10%Mean Time        4.23 (  0.00%)        4.10 (  3.13%)
Best5%Mean  Time        4.19 (  0.00%)        4.06 (  3.20%)
Best1%Mean  Time        4.13 (  0.00%)        4.00 (  3.39%)

Small improvement and similar gains were seen on the UMA machine.
The gain is small but it stands to reason that doing less work in the scheduler is a good thing. The downside is that the lack of schedstats and tracepoints may be surprising to experts doing performance analysis until they find the existence of the schedstats= parameter or schedstats sysctl. It will be automatically activated for latencytop and sleep profiling to alleviate the problem. For tracepoints, there is a simple warning as it's not safe to activate schedstats in the context when it's known the tracepoint may be wanted but is unavailable.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1454663316-22048-1-git-send-email-mgorman@techsingularity.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 | net:Add sysctl_max_skb_frags | Hans Westgaard Ry
Devices may have limits on the number of fragments in an skb they support. Current codebase uses a constant as maximum for number of fragments one skb can hold and use. When enabling scatter/gather and running traffic with many small messages the codebase uses the maximum number of fragments and may thereby violate the max for certain devices. The patch introduces a global variable as max number of fragments. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-08 | nfs: fix nfs_size_to_loff_t | Christoph Hellwig
We support OFFSET_MAX just fine, so don't round down below it. Also switch to using min_t to make the helper more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: 433c92379d9c ("NFS: Clean up nfs_size_to_loff_t()") Cc: stable@vger.kernel.org # 2.6.23+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
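The clamping behaviour described for this helper can be sketched in user space as follows. The names are hypothetical, `INT64_MAX` is used as a stand-in for the kernel's OFFSET_MAX (an assumption), and a plain conditional replaces the kernel-only `min_t` macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for OFFSET_MAX: the largest positive signed 64-bit offset. */
#define SKETCH_OFFSET_MAX INT64_MAX

/*
 * Hypothetical helper: convert an unsigned 64-bit size to a signed offset,
 * clamping to the maximum itself rather than to some value below it.
 */
int64_t sketch_size_to_offset(uint64_t size)
{
	uint64_t max = (uint64_t)SKETCH_OFFSET_MAX;
	int64_t off = (int64_t)(size < max ? size : max);

	assert(off >= 0);	/* the clamp guarantees a non-negative offset */
	return off;
}
```

With this shape, a size equal to the maximum passes through unchanged and only larger values are clamped.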
2016-02-06 | pty: make sure super_block is still valid in final /dev/tty close | Herton R. Krzesinski
Considering current pty code and multiple devpts instances, it's possible to umount a devpts file system while a program still has /dev/tty opened pointing to a previously closed pty pair in that instance. In the case all ptmx and pts/N files are closed, umount can be done. If the program closes /dev/tty after umount is done, devpts_kill_index will now use an invalid super_block, which was already destroyed in the umount operation after running ->kill_sb. This is another "use after free" type of issue, but now related to the allocated super_block instance. To avoid the problem (warning at ida_remove and potential crashes) for this specific case, I added two functions in devpts which grab additional references to the super_block, which pty code now uses so it makes sure the super block structure is still valid until pty shutdown is done. I also moved the additional inode references to the same functions, which also covers a similar case with the inode being freed before /dev/tty final close/shutdown. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Cc: stable@vger.kernel.org # 2.6.29+ Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-05 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds
Merge fixes from Andrew Morton: "22 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
  radix-tree: fix oops after radix_tree_iter_retry
  MAINTAINERS: trim the file triggers for ABI/API
  dax: dirty inode only if required
  thp: make deferred_split_scan() work again
  mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
  ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
  um: asm/page.h: remove the pte_high member from struct pte_t
  mm, hugetlb: don't require CMA for runtime gigantic pages
  mm/hugetlb: fix gigantic page initialization/allocation
  mm: downgrade VM_BUG in isolate_lru_page() to warning
  mempolicy: do not try to queue pages from !vma_migratable()
  mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
  vmstat: make vmstat_update deferrable
  mm, vmstat: make quiet_vmstat lighter
  mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
  memblock: don't mark memblock_phys_mem_size() as __init
  dump_stack: avoid potential deadlocks
  mm: validate_mm browse_rb SMP race condition
  m32r: fix build failure due to SMP and MMU
  ...
2016-02-05 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds
Pull Ceph fixes from Sage Weil: "We have a few wire protocol compatibility fixes, ports of a few recent CRUSH mapping changes, and a couple error path fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: MOSDOpReply v7 encoding
  libceph: advertise support for TUNABLES5
  crush: decode and initialize chooseleaf_stable
  crush: add chooseleaf_stable tunable
  crush: ensure take bucket value is valid
  crush: ensure bucket id is valid before indexing buckets array
  ceph: fix snap context leak in error path
  ceph: checking for IS_ERR instead of NULL
2016-02-05 | radix-tree: fix oops after radix_tree_iter_retry | Konstantin Khlebnikov
Helper radix_tree_iter_retry() resets next_index to the current index. In following radix_tree_next_slot current chunk size becomes zero. This isn't checked and it tries to dereference null pointer in slot. Tagged iterator is fine because retry happens only at slot 0 where tag bitmask in iter->tags is filled with single bit. Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ohad Ben-Cohen <ohad@wizery.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-05 | mm: replace vma_lock_anon_vma with anon_vma_lock_read/write | Konstantin Khlebnikov
Sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if anon_vma appeared between lock and unlock. We have to check anon_vma first or call anon_vma_prepare() to be sure that it's here. There are only few users of these legacy helpers. Let's get rid of them. This patch fixes anon_vma lock imbalance in validate_mm(). Write lock isn't required here, read lock is enough. And reorders expand_downwards/expand_upwards: security_mmap_addr() and wrapping-around check don't have to be under anon vma lock. Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.com Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-05 | mm, hugetlb: don't require CMA for runtime gigantic pages | Vlastimil Babka
Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime") has added the runtime gigantic page allocation via alloc_contig_range(), making this support available only when CONFIG_CMA is enabled. Because it doesn't depend on MIGRATE_CMA pageblocks and the associated infrastructure, it is possible with few simple adjustments to require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA. After this patch, alloc_contig_range() and related functions are available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION enabled. Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION. This allows supporting runtime gigantic pages without the CMA-specific checks in page allocator fastpaths. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-04 | Merge remote-tracking branch 'mkp-scsi/4.5/scsi-fixes' into fixes | James Bottomley
2016-02-04 | block/sd: Return -EREMOTEIO when WRITE SAME and DISCARD are disabled | Martin K. Petersen
When a storage device rejects a WRITE SAME command we will disable write same functionality for the device and return -EREMOTEIO to the block layer. -EREMOTEIO will in turn prevent DM from retrying the I/O and/or failing the path. Yiwen Jiang discovered a small race where WRITE SAME requests issued simultaneously would cause -EIO to be returned. This happened because any requests being prepared after WRITE SAME had been disabled for the device caused us to return BLKPREP_KILL. The latter caused the block layer to return -EIO upon completion. To overcome this we introduce BLKPREP_INVALID which indicates that this is an invalid request for the device. blk_peek_request() is modified to return -EREMOTEIO in that case. Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Suggested-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinicke <hare@suse.de> Reviewed-by: Ewan Milne <emilne@redhat.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
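The status mapping this message describes can be illustrated with a small user-space sketch. The enum and function names below are hypothetical stand-ins for the prep results named in the message, not the block layer's definitions; only the errno values come from `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, supplied here only for non-Linux hosts */
#endif

/* Hypothetical stand-ins for the request-preparation results in the text. */
enum sketch_prep {
	SKETCH_PREP_OK,
	SKETCH_PREP_KILL,	/* request is dropped */
	SKETCH_PREP_INVALID,	/* request is not valid for this device */
};

/* Map a prep result to the status the submitter would see on completion. */
int sketch_prep_to_errno(enum sketch_prep result)
{
	switch (result) {
	case SKETCH_PREP_OK:
		return 0;
	case SKETCH_PREP_INVALID:
		return -EREMOTEIO;	/* distinct status, so callers need not treat it as a path failure */
	case SKETCH_PREP_KILL:
		return -EIO;
	}
	return -EIO;	/* defensive default for out-of-range values */
}
```

The point of the separate result is that an "invalid for this device" outcome stays distinguishable from a generic I/O error all the way up to the caller.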
2016-02-04 | libceph: MOSDOpReply v7 encoding | Ilya Dryomov
Empty request_redirect_t (struct ceph_request_redirect in the kernel client) is now encoded with a bool. NEW_OSDOPREPLY_ENCODING feature bit overlaps with already supported CRUSH_TUNABLES5. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-02-04 | libceph: advertise support for TUNABLES5 | Ilya Dryomov
Add TUNABLES5 feature (chooseleaf_stable tunable) to a set of features supported by default. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-02-04 | crush: add chooseleaf_stable tunable | Ilya Dryomov
Add a tunable to fix the bug that chooseleaf may cause unnecessary pg migrations when some device fails. Reflects ceph.git commit fdb3f664448e80d984470f32f04e2e6f03ab52ec. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-02-04 | lightnvm: allow to force mm initialization | Matias Bjørling
System block allows the device to initialize with its configured media manager. The system block is written to disk and read again when the media manager is determined. For this to work, the backend must store the data. Device drivers, such as null_blk, do not have any backend storage. This patch allows the media manager to be initialized without a storage backend. It also fixes the incorrect configuration of capabilities in null_blk, as it does not support the get/set bad block interface. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-03 | Merge branch 'mymd/for-next' into mymd/for-linus | Shaohua Li
2016-02-03 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds
Merge fixes from Andrew Morton: "18 fixes"
[ The 18 fixes turned into 17 commits, because one of the fixes was a fix for another patch in the series that I just folded in by editing the patch manually - hopefully correctly - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: fix memory leak in copy_huge_pmd()
  drivers/hwspinlock: fix race between radix tree insertion and lookup
  radix-tree: fix race in gang lookup
  mm/vmpressure.c: fix subtree pressure detection
  mm: polish virtual memory accounting
  mm: warn about VmData over RLIMIT_DATA
  Documentation: cgroup-v2: add memory.stat::sock description
  mm: memcontrol: drop superfluous entry in the per-memcg stats array
  drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
  proc: revert /proc/<pid>/maps [stack:TID] annotation
  numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
  MAINTAINERS: update Seth email
  ocfs2/cluster: fix memory leak in o2hb_region_release
  lib/test-string_helpers.c: fix and improve string_get_size() tests
  thp: limit number of object to scan on deferred_split_scan()
  thp: change deferred_split_count() to return number of THP in queue
  thp: make split_queue per-node
2016-02-03 | Merge tag 'devicetree-fixes-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux | Linus Torvalds
Pull DeviceTree fixes from Rob Herring:
 - Fix build error with *_OF_DECLARE() when used in modules
 - Add missing platform maintainers for dts files in MAINTAINERS
* tag 'devicetree-fixes-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: drop symbols declared by _OF_DECLARE() from modules
  MAINTAINERS: Add missing platform maintainers for dts files
2016-02-03 | radix-tree: fix race in gang lookup | Matthew Wilcox
If the indirect_ptr bit is set on a slot, that indicates we need to redo the lookup. Introduce a new function radix_tree_iter_retry() which forces the loop to retry the lookup by setting 'slot' to NULL and turning the iterator back to point at the problematic entry. This is a pretty rare problem to hit at the moment; the lookup has to race with a grow of the radix tree from a height of 0. The consequences of hitting this race are that gang lookup could return a pointer to a radix_tree_node instead of a pointer to whatever the user had inserted in the tree. Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator") Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ohad Ben-Cohen <ohad@wizery.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 | mm: polish virtual memory accounting | Konstantin Khlebnikov
* add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
* always account VMAs with flag VM_STACK as stack (as it was before)
* cleanup classifying helpers
* update comments and documentation
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 | mm: memcontrol: drop superfluous entry in the per-memcg stats array | Johannes Weiner
MEM_CGROUP_STAT_NSTATS is just a delimiter for cgroup1 statistics, not an actual array entry. Reuse it for the first cgroup2 stat entry, like in the event array. Fixes: b2807f07f4f8 ("mm: memcontrol: add "sock" to cgroup2 memory.stat") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 | proc: revert /proc/<pid>/maps [stack:TID] annotation | Johannes Weiner
Commit b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps. Finding the task of a stack VMA requires walking the entire thread list, turning this into quadratic behavior: a thousand threads means a thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a million combinations. The cost is not in proportion to the usefulness as described in the patch. Drop the [stack:TID] annotation to make /proc/<pid>/maps (and /proc/<pid>/numa_maps) usable again for higher thread counts. The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as identifying the stack VMA there is an O(1) operation. Siddhesh said: "The end users needed a way to identify thread stacks programmatically and there wasn't a way to do that. I'm afraid I no longer remember (or have access to the resources that would aid my memory since I changed employers) the details of their requirement. However, I did do this on my own time because I thought it was an interesting project for me and nobody really gave any feedback then as to its utility, so as far as I am concerned you could roll back the main thread maps information since the information is available in the thread-specific files" Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 | thp: make split_queue per-node | Kirill A. Shutemov
Andrea Arcangeli suggested to make split queue per-node to improve scalability. Let's do it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03 | modules: fix longstanding /proc/kallsyms vs module insertion race. | Rusty Russell
For CONFIG_KALLSYMS, we keep two symbol tables and two string tables. There's one full copy, marked SHF_ALLOC and laid out at the end of the module's init section. There's also a cut-down version that only contains core symbols and strings, and lives in the module's core section. After module init (and before we free the module memory), we switch the mod->symtab, mod->num_symtab and mod->strtab to point to the core versions. We do this under the module_mutex. However, kallsyms doesn't take the module_mutex: it uses preempt_disable() and rcu tricks to walk through the modules, because it's used in the oops path. It's also used in /proc/kallsyms. There's nothing atomic about the change of these variables, so we can get the old (larger!) num_symtab and the new symtab pointer; in fact this is what I saw when trying to reproduce. By grouping these variables together, we can use a carefully-dereferenced pointer to ensure we always get one or the other (the free of the module init section is already done in an RCU callback, so that's safe). We allocate the init one at the end of the module init section, and keep the core one inside the struct module itself (it could also have been allocated at the end of the module core, but that's probably overkill). Reported-by: Weilong Chen <chenweilong@huawei.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541 Cc: stable@kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
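The grouping idea in this message can be sketched in user space with C11 atomics. This is only an illustration: the struct and function names are hypothetical, and `stdatomic.h` release/acquire operations stand in for the kernel's RCU-style pointer publication, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical grouping of the three fields that must change together. */
struct sketch_kallsyms {
	const char *const *symtab;	/* stand-in for the symbol table */
	unsigned int num_symtab;
	const char *strtab;
};

struct sketch_module {
	_Atomic(struct sketch_kallsyms *) kallsyms;	/* currently published group */
	struct sketch_kallsyms core;			/* cut-down copy kept inside the module */
};

/*
 * Writer: fill in the core copy first, then publish it with one pointer
 * store, so no reader can see a mix of old and new fields.
 */
void sketch_switch_to_core(struct sketch_module *mod,
			   const char *const *symtab,
			   unsigned int num_symtab,
			   const char *strtab)
{
	mod->core.symtab = symtab;
	mod->core.num_symtab = num_symtab;
	mod->core.strtab = strtab;
	atomic_store_explicit(&mod->kallsyms, &mod->core, memory_order_release);
}

/* Reader: load the pointer once and use only that snapshot afterwards. */
unsigned int sketch_count_symbols(struct sketch_module *mod)
{
	struct sketch_kallsyms *k =
		atomic_load_explicit(&mod->kallsyms, memory_order_acquire);

	assert(k != NULL);
	return k->num_symtab;
}
```

Because the reader dereferences a single pointer, it gets either the full init-time group or the full core group, never the larger count paired with the smaller table.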
2016-02-01 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds
Pull networking fixes from David Miller: "This looks like a lot but it's a mixture of regression fixes as well as fixes for longer standing issues.
 1) Fix on-channel cancellation in mac80211, from Johannes Berg.
 2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables module, from Eric Dumazet.
 3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric Dumazet.
 4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is bound, from Craig Gallek.
 5) GRO key comparisons don't take lightweight tunnels into account, from Jesse Gross.
 6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric Dumazet.
 7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we register them, otherwise the NEWLINK netlink message is missing the proper attributes. From Thadeu Lima de Souza Cascardo.
 8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido Schimmel
 9) Handle fragments properly in ipv4 early socket demux, from Eric Dumazet.
 10) Don't ignore the ifindex key specifier on ipv6 output route lookups, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
  tcp: avoid cwnd undo after receiving ECN
  irda: fix a potential use-after-free in ircomm_param_request
  net: tg3: avoid uninitialized variable warning
  net: nb8800: avoid uninitialized variable warning
  net: vxge: avoid unused function warnings
  net: bgmac: clarify CONFIG_BCMA dependency
  net: hp100: remove unnecessary #ifdefs
  net: davinci_cpdma: use dma_addr_t for DMA address
  ipv6/udp: use sticky pktinfo egress ifindex on connect()
  ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
  netlink: not trim skb for mmaped socket when dump
  vxlan: fix a out of bounds access in __vxlan_find_mac
  net: dsa: mv88e6xxx: fix port VLAN maps
  fib_trie: Fix shift by 32 in fib_table_lookup
  net: moxart: use correct accessors for DMA memory
  ipv4: ipconfig: avoid unused ic_proto_used symbol
  bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
  bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
  bnxt_en: Ring free response from close path should use completion ring
  net_sched: drr: check for NULL pointer in drr_dequeue
  ...
2016-02-01 | Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds
Pull libnvdimm fixes from Dan Williams:
 "1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved area for storing a struct page array.
  2/ Fixes for dax operations on a raw block device to prevent pagecache collisions with dax mappings.
  3/ A fix for pfn_t usage in vm_insert_mixed that led to a null pointer de-reference.
  These have received build success notification from the kbuild robot across 153 configs and pass the latest ndctl tests"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  phys_to_pfn_t: use phys_addr_t
  mm: fix pfn_t to page conversion in vm_insert_mixed
  block: use DAX for partition table reads
  block: revert runtime dax control of the raw block device
  fs, block: force direct-I/O for dax-enabled block devices
  devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
  libnvdimm, pfn: fix restoring memmap location
  libnvdimm: fix mode determination for e820 devices
2016-01-31 | Merge tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty | Linus Torvalds
Pull tty/serial fixes from Greg KH: "Here are some small tty/serial driver fixes for 4.5-rc2. They resolve a number of reported problems (the ioctl one specifically has been pointed out by numerous people) and one patch adds some new device ids for the 8250_pci driver. All have been in linux-next successfully"
* tag 'tty-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_pci: Add Intel Broadwell ports
  staging/speakup: Use tty_ldisc_ref() for paste kworker
  n_tty: Fix unsafe reference to "other" ldisc
  tty: Fix unsafe ldisc reference via ioctl(TIOCGETD)
  tty: Retry failed reopen if tty teardown in-progress
  tty: Wait interruptibly for tty lock on reopen
2016-01-31 | Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull timer fixes from Thomas Gleixner: "The timer department delivers:
 - a regression fix for the NTP code along with a proper selftest
 - prevent a spurious timer interrupt in the NOHZ lowres code
 - a fix for user space interfaces returning the remaining time on architectures with CONFIG_TIME_LOW_RES=y
 - a few patches to fix COMPILE_TEST fallout"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz: Set the correct expiry when switching to nohz/lowres mode
  clocksource: Fix dependencies for archs w/o HAS_IOMEM
  clocksource: Select CLKSRC_MMIO where needed
  tick/sched: Hide unused oneshot timer code
  kselftests: timers: Add adjtimex SETOFFSET validity tests
  ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
  itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
  timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
  hrtimer: Handle remaining time proper for TIME_LOW_RES
  clockevents/tcb_clksrc: Prevent disabling an already disabled clock
2016-01-31 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull perf fixes from Thomas Gleixner: "This is much bigger than typical fixes, but Peter found a category of races that spurred more fixes and more debugging enhancements. Work started before the merge window, but got finished only now. Aside from that this contains the usual small fixes to perf and tools. Nothing particularly exciting"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
  perf: Remove/simplify lockdep annotation
  perf: Synchronously clean up child events
  perf: Untangle 'owner' confusion
  perf: Add flags argument to perf_remove_from_context()
  perf: Clean up sync_child_event()
  perf: Robustify event->owner usage and SMP ordering
  perf: Fix STATE_EXIT usage
  perf: Update locking order
  perf: Remove __free_event()
  perf/bpf: Convert perf_event_array to use struct file
  perf: Fix NULL deref
  perf/x86: De-obfuscate code
  perf/x86: Fix uninitialized value usage
  perf: Fix race in perf_event_exit_task_context()
  perf: Fix orphan hole
  perf stat: Do not clean event's private stats
  perf hists: Fix HISTC_MEM_DCACHELINE width setting
  perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
  perf tests: Remove wrong semicolon in while loop in CQM test
  perf: Synchronously free aux pages in case of allocation failure
  ...
2016-01-31Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull IRQ fixes from Ingo Molnar: "Mostly irqchip driver fixes, but also an irq core crash fix and a build fix" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mxs: Add missing set_handle_irq() irqchip/atmel-aic: Fix wrong bit operation for IRQ priority irqchip/gic-v3-its: Recompute the number of pages on page size change base: Export platform_msi_domain_[alloc,free]_irqs of: MSI: Simplify irqdomain lookup irqdomain: Allow domain lookup with DOMAIN_BUS_WIRED token irqchip: Fix dependencies for archs w/o HAS_IOMEM irqchip/s3c24xx: Mark init_eint as __maybe_unused genirq: Validate action before dereferencing it in handle_irq_event_percpu()
2016-01-31phys_to_pfn_t: use phys_addr_tDan Williams
A dma_addr_t is potentially smaller than a phys_addr_t on some archs. Don't truncate the address when doing the pfn conversion. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Matthew Wilcox <willy@linux.intel.com> [willy: fix pfn_t_to_phys as well] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
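The truncation described above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel's pfn_t code: the `demo_*` types, the helper names, and the 12-bit page shift are assumptions chosen for illustration, with a 32-bit type standing in for a narrow dma_addr_t and a 64-bit type standing in for phys_addr_t.

```c
#include <stdint.h>

/* Hypothetical stand-ins: on some configurations the DMA address type is
 * narrower than the physical address type. These are not kernel typedefs. */
typedef uint32_t demo_dma_addr_t;
typedef uint64_t demo_phys_addr_t;

/* Assumed page size of 4 KiB for the illustration. */
#define DEMO_PAGE_SHIFT 12

/* Lossy variant: the address has already been narrowed to 32 bits by the
 * parameter type, so any bits above bit 31 are gone before the shift. */
static inline uint64_t demo_pfn_from_dma(demo_dma_addr_t addr)
{
	return (uint64_t)addr >> DEMO_PAGE_SHIFT;
}

/* Lossless variant: the full physical address is kept until the shift. */
static inline uint64_t demo_pfn_from_phys(demo_phys_addr_t addr)
{
	return addr >> DEMO_PAGE_SHIFT;
}
```

For a physical address above 4 GiB, the two helpers return different page frame numbers, which is the class of bug the commit message describes.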
2016-01-30block: use DAX for partition table readsDan Williams
Avoid populating pagecache when the block device is in DAX mode. Otherwise these page cache entries collide with the fsync/msync implementation and break data durability guarantees. Cc: Jan Kara <jack@suse.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-30block: revert runtime dax control of the raw block deviceDan Williams
Dynamically enabling DAX requires that the page cache first be flushed and invalidated. This must occur atomically with the change of DAX mode otherwise we confuse the fsync/msync tracking and violate data durability guarantees. Eliminate the possibility of DAX-disabled to DAX-enabled transitions for now and revisit this for the next cycle. Cc: Jan Kara <jack@suse.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-30fs, block: force direct-I/O for dax-enabled block devicesDan Williams
Similar to the file I/O path, re-direct all I/O to the DAX path for I/O to a block-device special file. Both regular files and device special files can use the common filp->f_mapping->host lookup to determine if DAX is enabled. Otherwise, we confuse the DAX code that does not expect to find live data in the page cache: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217 __delete_from_page_cache+0x9f6/0xb60() Modules linked in: CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000 ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089 ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50 [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482 [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515 [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217 [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244 [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487 [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730 [< inline >] wp_pfn_shared mm/memory.c:2208 [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307 [< inline >] handle_pte_fault mm/memory.c:3323 [< inline >] __handle_mm_fault mm/memory.c:3417 [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446 [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238 [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331 [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264 [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986 [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185 ---[ end trace dae21e0f85f1f98c ]--- Fixes: 5a023cdba50c ("block: enable dax for raw block devices") Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-29Merge branch 'stable/for-linus-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb patchlet from Konrad Rzeszutek Wilk: "One trivial patch. Another patch (from Fengguang) is already in your tree courtesy of Andrew Morton - but I would prefer not to rebase my tree. Hence the diff is very small" * 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: Make linux/swiotlb.h standalone includible MAINTAINERS: add git URL for swiotlb
2016-01-29Merge branch 'stable/for-linus-4.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm Pull cleancache cleanups from Konrad Rzeszutek Wilk: "Simple cleanups" * 'stable/for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm: include/linux/cleancache.h: Clean up code cleancache: constify cleancache_ops structure
2016-01-29Merge tag 'iommu-fixes-v4.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: "Five patches queued up: - Two patches for the AMD and Intel IOMMU drivers to fix alias handling and ATS handling. - Fix build error with arm io-pgtable code - Two documentation fixes" * tag 'iommu-fixes-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Update struct iommu_ops comments iommu/vt-d: Fix link to Intel IOMMU Specification iommu/amd: Correct the wrong setting of alias DTE in do_attach iommu/vt-d: Don't skip PCI devices when disabling IOTLB iommu/io-pgtable-arm: Fix io-pgtable-arm build failure
2016-01-29workqueue: skip flush dependency checks for legacy workqueuesTejun Heo
fca839c00a12 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to flush a !WQ_MEM_RECLAIM workqueue. This assumes that workqueues marked with WQ_MEM_RECLAIM sit in memory reclaim path and making it depend on something which may need more memory to make forward progress can lead to deadlocks. Unfortunately, workqueues created with the legacy create*_workqueue() interface always have WQ_MEM_RECLAIM regardless of whether they are depended upon by memory reclaim or not. These spurious WQ_MEM_RECLAIM markings cause spurious triggering of the flush dependency checks. WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144() workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu ... Workqueue: deferwq deferred_probe_work_func [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14) [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4) [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0) [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40) [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144) [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c) [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180) [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10) [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338) [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac) [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8) [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278) [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c) [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec) [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc) [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8) [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148) [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114) [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8) [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8) [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac) [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0) [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94) [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114) [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c) [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98) [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8) [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c) [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4) [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c) Fix it by marking workqueues created via create*_workqueue() with __WQ_LEGACY and disabling flush dependency checks on them. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Thierry Reding <thierry.reding@gmail.com> Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com Fixes: fca839c00a12 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
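The rule described in the commit message above can be modelled in a few lines. This is a simplified user-space sketch under stated assumptions, not the kernel's check_flush_dependency(): the `DEMO_*` flag values and the function name are hypothetical, and the PF_MEMALLOC task case is left out. It only shows how a legacy marker can suppress the warning for workqueues that carry a reclaim flag they never asked for.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for WQ_MEM_RECLAIM and __WQ_LEGACY. */
enum {
	DEMO_WQ_MEM_RECLAIM = 1 << 0,
	DEMO_WQ_LEGACY      = 1 << 1,
};

/* Returns true when a flush from a workqueue with flusher_flags onto a
 * workqueue with target_flags should trigger the dependency warning. */
static bool demo_flush_would_warn(unsigned int flusher_flags,
				  unsigned int target_flags)
{
	/* Flushing a reclaim-safe workqueue is always fine. */
	if (target_flags & DEMO_WQ_MEM_RECLAIM)
		return false;

	/* Warn only if the flusher is reclaim-marked and not legacy:
	 * legacy workqueues carry the reclaim flag unconditionally, so
	 * the flag says nothing about their real dependencies. */
	return (flusher_flags & (DEMO_WQ_MEM_RECLAIM | DEMO_WQ_LEGACY)) ==
	       DEMO_WQ_MEM_RECLAIM;
}
```

Under this model, the deferwq case from the trace (a legacy, reclaim-marked workqueue flushing a plain one) no longer warns, while a workqueue explicitly created with the reclaim flag still does.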
2016-01-29iommu: Update struct iommu_ops commentsMagnus Damm
Update the comments around struct iommu_ops to match current state and fix a few typos while at it. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2016-01-29perf: Synchronously clean up child eventsPeter Zijlstra
The orphan cleanup workqueue doesn't always catch orphans, for example, if they never schedule after they are orphaned. IOW, the event leak is still very real. It also wouldn't work for kernel counters. Doing it synchronously is a little hairy due to lock inversion issues, but is made to work. Patch based on work by Alexander Shishkin. Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29perf/bpf: Convert perf_event_array to use struct fileAlexei Starovoitov
Robustify refcounting. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160126045947.GA40151@ast-mbp.thefacebook.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-28Merge tag 'trace-v4.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull minor tracing fixes from Steven Rostedt: "This includes three minor fixes, mostly due to cut-and-paste issues. The first is a cut and paste issue that changed the amount of stack to skip when tracing a stack dump from 0 to 6, which basically made the stack disappear for small stack traces. The second fix is just removing an unused field in a struct that is no longer used, and currently just wastes space. The third is another cut-and-paste fix that had a tracepoint recording the wrong field (it was recording the previous field a second time)" * tag 'trace-v4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/dma-buf/fence: Fix timeline str value on fence_annotate_wait_on ftrace: Remove unused nr_trampolines var tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()
2016-01-27include/linux/cleancache.h: Clean up codeChen Gang
Let cleancache_fs_enabled() call cleancache_fs_enabled_mapping() directly. Remove redundant variable ret in cleancache_get_page(). Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-27cleancache: constify cleancache_ops structureJulia Lawall
The cleancache_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-26tty: Wait interruptibly for tty lock on reopenPeter Hurley
Allow a signal to interrupt the wait for a tty reopen; e.g., if the tty has started final close and is waiting for the device to drain. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Cc: stable <stable@vger.kernel.org> # 4.4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-26irqdomain: Allow domain lookup with DOMAIN_BUS_WIRED tokenMarc Zyngier
Let's take the (outlandish) example of an interrupt controller capable of handling both wired interrupts and PCI MSIs. With the current code, the PCI MSI domain is going to be tagged with DOMAIN_BUS_PCI_MSI, and the wired domain with DOMAIN_BUS_ANY. Things get hairy when we start looking up the domain for a wired interrupt (typically when creating it based on some firmware information - DT or ACPI). In irq_create_fwspec_mapping(), we perform the lookup using DOMAIN_BUS_ANY, which is actually used as a wildcard. This gives us one chance out of two to end up with the wrong domain, and we try to configure a wired interrupt with the MSI domain. Everything grinds to a halt pretty quickly. What we really need to do is to start looking for a domain that would uniquely identify a wired interrupt domain, and only use DOMAIN_BUS_ANY as a fallback. In order to solve this, let's introduce a new DOMAIN_BUS_WIRED token, which is going to be used exactly as described above. Of course, this depends on the irqchip to setup the domain bus_token, and nobody had to implement this so far. Only so far. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1453816347-32720-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
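The lookup order described above (try the specific wired token first, fall back to the wildcard) can be sketched as follows. This is a self-contained model under stated assumptions, not the irqdomain API: the `demo_*` names, the integer firmware-node id, and the flat array of domains are hypothetical simplifications.

```c
#include <stddef.h>

/* Hypothetical tokens mirroring DOMAIN_BUS_ANY, DOMAIN_BUS_WIRED and
 * DOMAIN_BUS_PCI_MSI from the commit message. */
enum demo_bus_token { DEMO_BUS_ANY = 0, DEMO_BUS_WIRED, DEMO_BUS_PCI_MSI };

struct demo_domain {
	int fwnode_id;              /* stand-in for the firmware node */
	enum demo_bus_token token;  /* what kind of interrupts it serves */
};

/* Returns the first domain on fwnode_id matching token. DEMO_BUS_ANY acts
 * as a wildcard, so it may return any domain on that node. */
static const struct demo_domain *
demo_find(const struct demo_domain *doms, size_t n, int fwnode_id,
	  enum demo_bus_token token)
{
	for (size_t i = 0; i < n; i++) {
		if (doms[i].fwnode_id != fwnode_id)
			continue;
		if (token == DEMO_BUS_ANY || doms[i].token == token)
			return &doms[i];
	}
	return NULL;
}

/* Wired lookup: prefer a domain explicitly tagged as wired, and only fall
 * back to the wildcard if no such domain has been registered. */
static const struct demo_domain *
demo_find_wired(const struct demo_domain *doms, size_t n, int fwnode_id)
{
	const struct demo_domain *dom;

	dom = demo_find(doms, n, fwnode_id, DEMO_BUS_WIRED);
	if (!dom)
		dom = demo_find(doms, n, fwnode_id, DEMO_BUS_ANY);
	return dom;
}
```

With an MSI domain and a wired domain registered on the same node, the plain wildcard lookup can return the MSI domain, whereas the two-step lookup returns the wired one. Controllers that never set a token still resolve through the fallback.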
2016-01-25drivers: ata: wake port before DMA stop for ALPMDanesh Petigara
The AHCI driver code stops and starts port DMA engines at will without considering the power state of the particular port. The AHCI specification isn't very clear on how to handle this scenario, leaving implementation open to interpretation. Broadcom's STB SATA host controller is unable to handle port DMA controller restarts when the port in question is in low power mode. When a port enters partial or slumber mode, its PHY is powered down. When a controller restart is requested, the controller's internal state machine expects the PHY to be brought back up by software which never happens in this case, resulting in failures. To avoid this situation, logic is added to manually wake up the port just before its DMA engine is stopped, if the port happens to be in a low power state. HBA initiated power management ensures that the port eventually returns to its configured low power state, when the link is idle (as per the conditions listed in the spec). A new host flag is also added to ensure this logic is only exercised for hosts with the above limitation. tj: Formatting changes. Signed-off-by: Danesh Petigara <dpetigara@broadcom.com> Reviewed-by: Markus Mayer <mmayer@broadcom.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-01-25of: drop symbols declared by _OF_DECLARE() from modulesMasahiro Yamada
The users of this macro (OF_EARLYCON_DECLARE, CLK_OF_DECLARE, IRQCHIP_DECLARE, etc.) are only parsed in the early boot stage. Such symbols contained in modules are never used. This commit fixes the link error introduced by commit b8d20e06eaad ("serial: 8250_uniphier: add earlycon support"); the combination of CONFIG_SERIAL_8250_UNIPHIER=m and CONFIG_SERIAL_8250_CONSOLE=y fails to link: ERROR: "early_serial8250_setup" [drivers/tty/serial/8250/8250_uniphier.ko] undefined! Fixes: b8d20e06eaad ("serial: 8250_uniphier: add earlycon support") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rob Herring <robh@kernel.org>