summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-08-12perf/x86/intel/uncore: Add enable_box for client MSR uncoreKan Liang
There are bug reports about miscounting uncore counters on some client machines like Sandybridge, Broadwell and Skylake. It is very likely to be observed on idle systems. This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be cleared after Package C7, and nothing will be count. The related errata (HSD 158) could be found in: www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL in ->enable_box(). The workaround does not cover all cases. It helps for new events after returning from C7. But it cannot prevent C7, it will still miscount if a counter is already active. There is no drawback in leaving it enabled, so it does not need disable_box() here. Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12perf/x86/intel/uncore: Fix uncore num_countersKan Liang
Some uncore boxes' num_counters value for Haswell server and Broadwell server are not correct (too large, off by one). This issue was found by comparing the code with the document. Although there is no bug report from users yet, accessing non-existent counters is dangerous and the behavior is undefined: it may cause miscounting or even crashes. This patch makes them consistent with the uncore document. Reported-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-12uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructionsDenys Vlasenko
Since instruction decoder now supports EVEX-encoded instructions, two fixes are needed to correctly handle them in uprobes. Extended bits for MODRM.rm field need to be sanitized just like we do it for VEX3, to avoid encoding wrong register for register-relative access. EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be ignored by the CPU (since GPRs go only up to 15, not 31), but let's be paranoid here: proper encoding for register-relative access should have EVEX.x = 1. Secondly, we should fetch vex.vvvv for EVEX too. This is now super easy because instruction decoder populates vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> # v4.1+ Fixes: 8a764a875fe3 ("x86/asm/decoder: Create artificial 3rd byte for 2-byte VEX") Link: http://lkml.kernel.org/r/20160811154521.20469-1-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "7 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats mm, oom: fix uninitialized ret in task_will_free_mem() kasan: remove the unnecessary WARN_ONCE from quarantine.c mm: memcontrol: fix memcg id ref counter on swap charge move mm: memcontrol: fix swap counter leak on swapout from offline cgroup proc, meminfo: use correct helpers for calculating LRU sizes in meminfo mm/hugetlb: fix incorrect hugepages count during mem hotplug
2016-08-11mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdatsReza Arbab
The following oops occurs after a pgdat is hotadded: Unable to handle kernel paging request for data at address 0x00c30001 Faulting instruction address: 0xc00000000022f8f4 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter nls_utf8 isofs sg virtio_balloon uio_pdrv_genirq uio ip_tables xfs libcrc32c sr_mod cdrom sd_mod virtio_net ibmvscsi scsi_transport_srp virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.8.0-rc1-device #110 task: c000000000ef3080 task.stack: c000000000f6c000 NIP: c00000000022f8f4 LR: c00000000022f948 CTR: 0000000000000000 REGS: c000000000f6fa50 TRAP: 0300 Tainted: G W (4.8.0-rc1-device) MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 84002028 XER: 20000000 CFAR: d000000001d2013c DAR: 0000000000c30001 DSISR: 40000000 SOFTE: 0 NIP refresh_cpu_vm_stats+0x1a4/0x2f0 LR refresh_cpu_vm_stats+0x1f8/0x2f0 Call Trace: refresh_cpu_vm_stats+0x1f8/0x2f0 (unreliable) Add per_cpu_nodestats initialization to the hotplug codepath. Link: http://lkml.kernel.org/r/1470931473-7090-1-git-send-email-arbab@linux.vnet.ibm.com Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11mm, oom: fix uninitialized ret in task_will_free_mem()Geert Uytterhoeven
mm/oom_kill.c: In function `task_will_free_mem': mm/oom_kill.c:767: warning: `ret' may be used uninitialized in this function If __task_will_free_mem() is never called inside the for_each_process() loop, ret will not be initialized. Fixes: 1af8bb43269563e4 ("mm, oom: fortify task_will_free_mem()") Link: http://lkml.kernel.org/r/1470255599-24841-1-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11kasan: remove the unnecessary WARN_ONCE from quarantine.cAlexander Potapenko
It's quite unlikely that the user will so little memory that the per-CPU quarantines won't fit into the given fraction of the available memory. Even in that case he won't be able to do anything with the information given in the warning. Link: http://lkml.kernel.org/r/1470929182-101413-1-git-send-email-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11mm: memcontrol: fix memcg id ref counter on swap charge moveVladimir Davydov
Since commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") swap entries do not pin memcg->css.refcnt directly. Instead, they pin memcg->id.ref. So we should adjust the reference counters accordingly when moving swap charges between cgroups. Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Link: http://lkml.kernel.org/r/9ce297c64954a42dc90b543bc76106c4a94f07e8.1470219853.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11mm: memcontrol: fix swap counter leak on swapout from offline cgroupVladimir Davydov
An offline memory cgroup might have anonymous memory or shmem left charged to it and no swap. Since only swap entries pin the id of an offline cgroup, such a cgroup will have no id and so an attempt to swapout its anon/shmem will not store memory cgroup info in the swap cgroup map. As a result, memcg->swap or memcg->memsw will never get uncharged from it and any of its ascendants. Fix this by always charging swapout to the first ancestor cgroup that hasn't released its id yet. [hannes@cmpxchg.org: add comment to mem_cgroup_swapout] [vdavydov@virtuozzo.com: use WARN_ON_ONCE() in mem_cgroup_id_get_online()] Link: http://lkml.kernel.org/r/20160803123445.GJ13263@esperanza Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Link: http://lkml.kernel.org/r/5336daa5c9a32e776067773d9da655d2dc126491.1470219853.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> [3.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11proc, meminfo: use correct helpers for calculating LRU sizes in meminfoMel Gorman
meminfo_proc_show() and si_mem_available() are using the wrong helpers for calculating the size of the LRUs. The user-visible impact is that there appears to be an abnormally high number of unevictable pages. Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11mm/hugetlb: fix incorrect hugepages count during mem hotplugzhong jiang
When memory hotplug operates, free hugepages will be freed if the movable node is offline. Therefore, /proc/sys/vm/nr_hugepages will be incorrect. Fix it by reducing max_huge_pages when the node is offlined. n-horiguchi@ah.jp.nec.com said: : dissolve_free_huge_page intends to break a hugepage into buddy, and the : destination hugepage is supposed to be allocated from the pool of the : destination node, so the system-wide pool size is reduced. So adding : h->max_huge_pages-- makes sense to me. Link: http://lkml.kernel.org/r/1470624546-902-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "A couple of bug fixes have come in for v4.8 so far. Since the first few were originally meant to go into -rc1 (but didn't get sent in time for travel reasons), the branch is unfortunately based on top of a commit in the middle of the merge window rather than -rc1. Content-wise we have: - a fix for the last remaining broken build in kernelci, getting mach-shmobile to build again with SMP disabled - a fix for a realview regression that broke real hardware but not the qemu model that everyone uses in practice (needed for v4.7 as well) - a merge conflict fix for Tegra that also broke v4.7 - two Kconfig fixes for arm64 build regressions - a couple of arm32 build warning fixes (all harmless) - fix the RTC on Exynos7 Espresso (which apparently never worked right)" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: Merge tag 'pxa-fixes-v4.8' of https://github.com/rjarzmik/linux into randconfig-4.8 arm64: Kconfig: select HISILICON_IRQ_MBIGEN only if PCI is selected arm64: Kconfig: select ALPINE_MSI only if PCI is selected ARM: dts: realview: Fix PBX-A9 cache description ARM: tegra: fix erroneous address in dts ARM: dts: add syscon compatible string for AP syscon ARM: dts: add syscon compatible string for CP syscon ARM: oxnas: select reset controller framework ARM: hide mach-*/ include for ARM_SINGLE_ARMV7M ARM: don't include removed directories Revert "ARM: aspeed: adapt defconfigs for new CONFIG_PRINTK_TIME" ARM: shmobile: don't call platform_can_secondary_boot on UP MAINTAINER: alpine: add a mailing list ARM: do away with final ARCH_REQUIRE_GPIOLIB arm64: dts: Fix RTC by providing rtc_src clock
2016-08-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vhost fixes and cleanups from Michael Tsirkin: "Misc fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio/s390: deprecate old transport virtio/s390: keep early_put_chars virtio_blk: Fix a slient kernel panic virtio-vsock: fix include guard typo vhost/vsock: fix vhost virtio_vsock_pkt use-after-free 9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc() virtio: fix error handling for debug builds virtio: fix memory leak in virtqueue_add()
2016-08-11Merge tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch for a NULL dereference bug introduced in 4.8-rc1 and a handful of static checker fixes" * tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client: ceph: initialize pathbase in the !dentry case in encode_caps_cb() rbd: nuke the 32-bit pool id check rbd: destroy header_oloc in rbd_dev_release() ceph: fix null pointer dereference in ceph_flush_snaps() libceph: using kfree_rcu() to simplify the code libceph: make cancel_generic_request() static libceph: fix return value check in alloc_msg_with_page_vector()
2016-08-11nfsd: Fix race between FREE_STATEID and LOCKChuck Lever
When running LTP's nfslock01 test, the Linux client can send a LOCK and a FREE_STATEID request at the same time. The outcome is: Frame 324 R OPEN stateid [2,O] Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64 Frame 115008 R LOCK stateid [1,L] Frame 115012 C WRITE stateid [0,L] offset 672000 len 64 Frame 115016 R WRITE NFS4_OK Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64 Frame 115022 R LOCKU NFS4_OK Frame 115025 C FREE_STATEID stateid [2,L] Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64 Frame 115029 R FREE_STATEID NFS4_OK Frame 115030 R LOCK stateid [3,L] Frame 115034 C WRITE stateid [0,L] offset 672128 len 64 Frame 115038 R WRITE NFS4ERR_BAD_STATEID In other words, the server returns stateid L in a successful LOCK reply, but it has already released it. Subsequent uses of stateid L fail. To address this, protect the generation check in nfsd4_free_stateid with the st_mutex. This should guarantee that only one of two outcomes occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID returns NFS4ERR_LOCKS_HELD. Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Fix-suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-11arm64: Remove stack duplicating code from jprobesDavid A. Long
Because the arm64 calling standard allows stacked function arguments to be anywhere in the stack frame, do not attempt to duplicate the stack frame for jprobes handler functions. Documentation changes to describe this issue have been broken out into a separate patch in order to simultaneously address them in other architecture(s). Signed-off-by: David A. Long <dave.long@linaro.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-08-11nfsd: fix dentry refcounting on createJosef Bacik
b44061d0b9 introduced a dentry ref counting bug. Previously we were grabbing one ref to dchild in nfsd_create(), but with the creation of nfsd_create_locked() we have a ref for dchild from the lookup in nfsd_create(), and then another ref in nfsd_create_locked(). The ref from the lookup in nfsd_create() is never dropped and results in dentries still in use at unmount. Signed-off-by: Josef Bacik <jbacik@fb.com> Fixes: b44061d0b9 "nfsd: reorganize nfsd_create" Reported-by: kernel test robot <xiaolong.ye@intel.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-11bvec: avoid variable shadowing warningJohannes Berg
Due to the (indirect) nesting of min(..., min(...)), sparse will show a variable shadowing warning whenever bvec.h is included. Avoid that by assigning the inner min() to a temporary variable first. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-11doc: update block/queue-sysfs.txt entriesJoe Lawrence
Add descriptions for dax, io_poll, and write_same_max_bytes files. Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-11nvme: Suspend all queues before deletionGabriel Krisman Bertazi
When nvme_delete_queue fails in the first pass of the nvme_disable_io_queues() loop, we return early, failing to suspend all of the IO queues. Later, on the nvme_pci_disable path, this causes us to disable MSI without actually having freed all the IRQs, which triggers the BUG_ON in free_msi_irqs(), as show below. This patch refactors nvme_disable_io_queues to suspend all queues before start submitting delete queue commands. This way, we ensure that we have at least returned every IRQ before continuing with the removal path. [ 487.529200] kernel BUG at ../drivers/pci/msi.c:368! cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650] pc: c000000000627a50: free_msi_irqs+0x90/0x200 lr: c000000000627a40: free_msi_irqs+0x80/0x200 sp: c0000078c5b838d0 msr: 9000000100029033 current = 0xc0000078c5b40000 paca = 0xc000000002bd7600 softe: 0 irq_happened: 0x01 pid = 1376, comm = kworker/70:1H kernel BUG at ../drivers/pci/msi.c:368! Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413 (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016 enter ? for help [c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme] [c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme] [c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110 [c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170 [c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110 [c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0 [c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590 [c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660 [c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130 [c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-nvme@lists.infradead.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-11drm/i915: Deal with NV12 CbCr plane AUX surface on SKL+Ville Syrjälä
With NV12 we have two color planes to deal with so we must compute the surface and x/y offsets for the second plane as well. What makes this a bit nasty is that the hardware expects the surface offset to be specified as a distance from the main surface offset. What's worse, the distance must be non-negative (no neat wraparound or anything). So we must make sure that the main surface offset is always less or equal to the AUX surface offset. We do that by computing the AUX offset first and the main surface offset second. If the main surface offset ends up being above the AUX offset, we just push it down as far as is required while still maintaining the required alignment etc. Fortunately the AUX offset only reuqires 4K alignment, so we don't need to do any of the backwards searching for an acceptable offset that we must do for the main surface. And X tiled + NV12 isn't a supported combination anyway. Note that this just computes aux surface offsets, we do not yet program them into the actual hardware registers, and hence we can't yet expose NV12. v2: Rebase due to drm_plane_state src/dst rects s/TODO.../something else/ in the commit message/ (Daniel) Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-12-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-11drm/i915: Compute display surface offset in the plane check hook for SKL+Ville Syrjälä
SKL has nasty limitations with the display surface offsets: * source x offset + width must be less than the stride for X tiled surfaces or the display engine falls over * the surface offset requires lots of alignment (256K or 1M) These facts mean that we can't just pick any suitably aligned tile boundary as the offset and expect the resulting x offset to be useable. The solution is to start with the closest boundary as before, but then keep searching backwards until we find one that works, or don't. This means we must be prepared to fail, hence the whole surface offset calculation needs to be moved to the .check_plane() hook from the .update_plane() hook. While at it we can check that the source width/height don't exceed maximum plane size limits. We'll store the results of the computation in the plane state to make it easy for the .update_plane() hook to do its thing. v2: Replace for+break loop with while loop Rebase due to drm_plane_state src/dst rects Rebase due to plane_check_state() Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-11-git-send-email-ville.syrjala@linux.intel.com
2016-08-11drm/i915: Make intel_adjust_tile_offset() work for linear buffersVille Syrjälä
To make life less surprising we can make intel_adjust_tile_offset() deal with linear buffers as well. Currently it doesn't seem like there's a real need for this since only X tiling and NV12 (which would always be tiled currently) should need it. But I've used it for some debug hacks already so seems like a reasonable thing to have. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-10-git-send-email-ville.syrjala@linux.intel.com
2016-08-11drm/i915: Allow calling intel_adjust_tile_offset() multiple timesVille Syrjälä
Minimize the resulting X coordinate after intel_adjust_tile_offset() is done with it's offset adjustment. This allows calling intel_adjust_tile_offset() multiple times in case we need to adjust the offset several times. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-9-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-11drm/i915: Limit fb x offset due to fencesVille Syrjälä
If there's a fence on the object it will be aligned to the start of the object, and hence CPU rendering to any fb that straddles the fence edge will come out wrong due to lines wrapping at the wrong place. We have no API to manage fences on a sub-object level, so we can't really fix this in any way. Additonally gen2/3 fences are rather coarse grained so adjusting the offset migth not even be possible. Avoid these problems by requiring the fb layout to agree with the fence layout (if present). v2: Rebase due to i915_gem_object_get_tiling() & co. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-8-git-send-email-ville.syrjala@linux.intel.com
2016-08-11drm/i915: Adjust obj tiling vs. fb modifier rulesVille Syrjälä
Currently we require the object to be X tiled if the fb is X tiled. The argument is supposedly FBC GTT tracking. But actually that no longer holds water since FBC supports Y tiling as well on SKL+. A better rule IMO is to require that if there is a fence, the fb modifier match the object tiling mode. But if the object is linear, we can allow the fb modifier to be anything. The idea being that if the user set the tiling mode on the object, presumably the intention is to actually use the fence for CPU access. But if the tiling mode is not set, the user has no intention of using a fence (and can't actually since we disallow tiling mode changes when there are framebuffers associated with the object). On gen2/3 we must keep to the rule that the object and fb must be either both linear or both X tiled. No mixing allowed since the display engine itself will use the fence if it's present. v2: Fix typos v3: Rebase due to i915_gem_object_get_tiling() & co. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-7-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-11drm/i915: Use fb modifiers for display tiling decisionsVille Syrjälä
Soon the fence tiling mode may not always match the fb modifier even for X tiled buffers. So let's use the fb modifier consistently for all display tiling decisions. v2: Rebased due to s/ring/engine/ v3: Rebased due to s/engine/ring/ O_o v4: Rebase due to i915_gem_object_get_tiling() & co. Reviewed-by: Matthew Auld <matthew.auld@intel.com> (v1) Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-6-git-send-email-ville.syrjala@linux.intel.com
2016-08-11drm/i915: Pass around plane_state instead of fb+rotationVille Syrjälä
intel_compute_tile_offset() and intel_add_fb_offsets() get passed the fb and the rotation. As both of those come from the plane state we can just pass that in instead. For extra consitency pass the plane state to intel_fb_xy_to_linear() as well even though it only really needs the fb. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-5-git-send-email-ville.syrjala@linux.intel.com
2016-08-11drm/i915: Move SKL hw stride calculation into a helperVille Syrjälä
We repeat the SKL stride register value calculations a several places. Move it into a small helper function. v2: Rebase due to drm_plane_state src/dst rects Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-4-git-send-email-ville.syrjala@linux.intel.com
2016-08-11drm/i915: Don't pass pitch to intel_compute_page_offset()Ville Syrjälä
intel_compute_page_offset() can dig up the correct pitch from the fb itself, no need for the caller to pass it in. A bit of extra care is needed for the lower level _intel_compute_page_offset() since that one gets called before the rotated pitch under intel_fb is populated. Note that we don't actually call it with anything but DRM_ROTATE_0 there so we wouldn't actually look up the rotated pitch there, but still, leave the pitch as something the caller has to pass to _intel_compute_page_offset() as an indicator that something is a bit special. This leaves 'stride_div' in the skl plane update hooks as a mostly useless variable so just get rid of it. v2: Add a note why stride_div got nuked v3: Extract intel_fb_pitch() since it can be useful later Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2) Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-3-git-send-email-ville.syrjala@linux.intel.com
2016-08-11drm/i915: Rewrite fb rotation GTT handlingVille Syrjälä
Redo the fb rotation handling in order to: - eliminate the NV12 special casing - handle fb->offsets[] properly - make the rotation handling easier for the plane code To achieve these goals we reduce intel_rotation_info to only contain (for each plane) the rotated view width,height,stride in tile units, and the page offset into the object where the plane starts. Each plane is handled exactly the same way, no special casing for NV12 or other formats. We then store the computed rotation_info under intel_framebuffer so that we don't have to recompute it again. To handle fb->offsets[] we treat them as a linear offsets and convert them to x/y offsets from the start of the relevant GTT mapping (either normal or rotated). We store the x/y offsets under intel_framebuffer, and for some extra convenience we also store the rotated pitch (ie. tile aligned plane height). So for each plane we have the normal x/y offsets, rotated x/y offsets, and the rotated pitch. The normal pitch is available already in fb->pitches[]. While we're gathering up all that extra information, we can also easily compute the storage requirements for the framebuffer, so that we can check that the object is big enough to hold it. When it comes time to deal with the plane source coordinates, we first rotate the clipped src coordinates to match the relevant GTT view orientation, then add to them the fb x/y offsets. Next we compute the aligned surface page offset, and as a result we're left with some residual x/y offsets. Finally, if required by the hardware, we convert the remaining x/y offsets into a linear offset. For gen2/3 we simply skip computing the final page offset, and just convert the src+fb x/y offsets directly into a linear offset since that's what the hardware wants. After this all platforms, incluing SKL+, compute these things in exactly the same way (excluding alignemnt differences). v2: Use BIT(DRM_ROTATE_270) instead of ROTATE_270 when rotating plane src coordinates Drop some spurious changes that got left behind during development v3: Split out more changes to prep patches (Daniel) s/intel_fb->plane[].foo.bar/intel_fb->foo[].bar/ for brevity Rename intel_surf_gtt_offset to intel_fb_gtt_offset Kill the pointless 'plane' parameter from intel_fb_gtt_offset() v4: Fix alignment vs. alignment-1 when calling _intel_compute_tile_offset() from intel_fill_fb_info() Pass the pitch in tiles in stad of pixels to intel_adjust_tile_offset() from intel_fill_fb_info() Pass the full width/height of the rotated area to drm_rect_rotate() for clarity Use u32 for more offsets v5: Preserve the upper_32_bits()/lower_32_bits() handling for the fb ggtt offset (Sivakumar) v6: Rebase due to drm_plane_state src/dst rects Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470821001-25272-2-git-send-email-ville.syrjala@linux.intel.com Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2016-08-11x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()Sebastian Andrzej Siewior
I made a mistake while converting the driver to the hotplug state machine and as a result x2apic_cluster_probe() was accessing cpus_in_cluster before allocating it. This patch fixes it by setting the cpumask after the allocation the memory succeeded. While at it, I marked two functions static which are only used within this file. Reported-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6b2c28471de5 ("x86/x2apic: Convert to CPU hotplug state machine") Link: http://lkml.kernel.org/r/1470924515-9444-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11sched/cputime: Fix omitted ticks passed in parameterFrederic Weisbecker
Commit: f9bcf1e0e014 ("sched/cputime: Fix steal time accounting") ... fixes a leak on steal time accounting but forgets to account the ticks passed in parameters, assuming there is only one to take into account. Let's consider that parameter back. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Wanpeng Li <kernellwp@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: linux-tip-commits@vger.kernel.org Link: http://lkml.kernel.org/r/20160811125822.GB4214@lerouge Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11drm/i915: Account for TSEG size when determining 865G stolen baseVille Syrjälä
Looks like the TSEG lives just above TOUD, stolen comes after TSEG. The spec seems somewhat self-contradictory in places, in the ESMRAMC register desctription it says: TSEG Size: 10=(TOUD + 512 KB) to TOUD 11 =(TOUD + 1 MB) to TOUD so that agrees with TSEG being at TOUD. But the example given elsehwere in the spec says: TOUD equals 62.5 MB = 03E7FFFFh TSEG selected as 512 KB in size, Graphics local memory selected as 1 MB in size General System RAM available in system = 62.5 MB General system RAM range00000000h to 03E7FFFFh TSEG address range03F80000h to 03FFFFFFh TSEG pre-allocated from03F80000h to 03FFFFFFh Graphics local memory pre-allocated from03E80000h to 03F7FFFFh so here we have TSEG above stolen. Real world evidence agrees with the TOUD->TSEG->stolen order however, so let's fix up the code to account for the TSEG size. Cc: Taketo Kabe <fdporg@vega.pgw.jp> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: stable@vger.kernel.org Fixes: 0ad98c74e093 ("drm/i915: Determine the stolen memory base address on gen2") Fixes: a4dff76924fe ("x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms") Reported-by: Taketo Kabe <fdporg@vega.pgw.jp> Tested-by: Taketo Kabe <fdporg@vega.pgw.jp> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96473 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470653919-27251-1-git-send-email-ville.syrjala@linux.intel.com Link: http://download.intel.com/design/chipsets/datashts/25251405.pdf Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-08-11efi/capsule: Allocate whole capsule into virtual memoryAustin Christ
According to UEFI 2.6 section 7.5.3, the capsule should be in contiguous virtual memory and firmware may consume the capsule immediately. To correctly implement this functionality, the kernel driver needs to vmap the entire capsule at the time it is made available to firmware. The virtual allocation of the capsule update has been changed from kmap, which was only allocating the first page of the update, to vmap, and allocates the entire data payload. Signed-off-by: Austin Christ <austinwc@codeaurora.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Lee, Chun-Yi <jlee@suse.com> Cc: <stable@vger.kernel.org> # v4.7 Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kweh Hock Leong <hock.leong.kweh@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1470912120-22831-3-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11x86/platform/uv: Skip UV runtime services mapping in the ↵Alex Thorlton
efi_runtime_disabled case This problem has actually been in the UV code for a while, but we didn't catch it until recently, because we had been relying on EFI_OLD_MEMMAP to allow our systems to boot for a period of time. We noticed the issue when trying to kexec a recent community kernel, where we hit this NULL pointer dereference in efi_sync_low_kernel_mappings(): [ 0.337515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000880 [ 0.346276] IP: [<ffffffff8105df8d>] efi_sync_low_kernel_mappings+0x5d/0x1b0 The problem doesn't show up with EFI_OLD_MEMMAP because we skip the chunk of setup_efi_state() that sets the efi_loader_signature for the kexec'd kernel. When the kexec'd kernel boots, it won't set EFI_BOOT in setup_arch, so we completely avoid the bug. We always kexec with noefi on the command line, so this shouldn't be an issue, but since we're not actually checking for efi_runtime_disabled in uv_bios_init(), we end up trying to do EFI runtime callbacks when we shouldn't be. This patch just adds a check for efi_runtime_disabled in uv_bios_init() so that we don't map in uv_systab when runtime_disabled == true. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: <stable@vger.kernel.org> # v4.7 Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <rja@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1470912120-22831-2-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11x86/efi: Allocate a trampoline if needed in efi_free_boot_services()Andy Lutomirski
On my Dell XPS 13 9350 with firmware 1.4.4 and SGX on, if I boot Fedora 24's grub2-efi off a hard disk, my first 1MB of RAM looks like: efi: mem00: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB) efi: mem01: [Boot Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB) efi: mem02: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB) efi: mem03: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB) efi: mem04: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB) efi: mem05: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB) efi: mem06: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB) efi: mem07: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB) efi: mem08: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB) My EBDA is at 0x2c000, which blocks off everything from 0x2c000 and up, and my trampoline is 0x6000 bytes (6 pages), so it doesn't fit in the loader data range at 0x28000. Without this patch, it panics due to a failure to allocate the trampoline. With this patch, it works: [ +0.001744] Base memory trampoline at [ffff880000001000] 1000 size 24576 Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Matt Fleming <mfleming@suse.de> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/998c77b3bf709f3dfed85cb30701ed1a5d8a438b.1470821230.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11drm/i915/guc: Consolidate firmware major-minor to one placeTvrtko Ursulin
Currently to change the firmware one has to update the exported module firmware string and the major-minor versions used for verification after load. Consolidate that to a single place defining correct major and minor versions per platform. v2: Rebased for KBL. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Peter Antoine <peter.antoine@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470842206-35685-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-08-11drm/i915: Store number of active engines in device infoTvrtko Ursulin
Until now code was calling hweight32 to figure out the number from device_info->ring_mask at runtime. Instead we can cache it at engine init time and use directly. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1470842530-35854-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-08-11dma-buf/sync_file: only enable fence signalling on poll()Gustavo Padovan
Signalling doesn't need to be enabled at sync_file creation, it is only required if userspace waiting the fence to signal through poll(). Thus we delay fence_add_callback() until poll is called. It only adds the callback the first time poll() is called. This avoid re-adding the same callback multiple times. v2: rebase and update to work with new fence support for sync_file v3: use atomic operation to set enabled and protect fence_add_callback() v4: use user bit from fence flags (comment from Chris Wilson) v5: use ternary if on poll return (comment from Chris Wilson) Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> [sumits: remove unused var status] Link: http://patchwork.freedesktop.org/patch/msgid/1470404378-27961-1-git-send-email-gustavo@padovan.org
2016-08-11Documentation: add doc for sync_file_get_fence()Gustavo Padovan
Document the new function added to sync_file.c v2: Adapt to fence_array v3: Take in Chris Wilson suggestions Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2016-08-11dma-buf/sync_file: add sync_file_get_fence()Gustavo Padovan
Creates a function that given an sync file descriptor returns a fence containing all fences in the sync_file. v2: Comments by Daniel Vetter - Adapt to new version of fence_collection_init() - Hold a reference for the fence we return v3: - Adapt to use fput() directly - rename to sync_file_get_fence() as we always return one fence v4: Adapt to use fence_array v5: set fence through fence_get() Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2016-08-11dma-buf/sync_file: refactor fence storage in struct sync_fileGustavo Padovan
Create sync_file->fence to abstract the type of fence we are using for each sync_file. If only one fence is present we use a normal struct fence but if there is more fences to be added to the sync_file a fence_array is created. This change cleans up sync_file a bit. We don't need to have sync_file_cb array anymore. Instead, as we always have one fence, only one fence callback is registered per sync_file. v2: Comments from Chris Wilson and Christian König - Not using fence_ops anymore - fence_is_array() was created to differentiate fence from fence_array - fence_array_teardown() is now exported and used under fence_is_array() - struct sync_file lost num_fences member v3: Comments from Chris Wilson and Christian König - struct sync_file lost status member in favor of fence_is_signaled() - drop use of fence_array_teardown() - use sizeof(*fence) to allocate only an array on fence pointers v4: Comments from Chris Wilson - use sizeof(*fence) to reallocate array - fix typo in comments - protect num_fences sum against overflows - use array->base instead of casting the to struct fence v5: fixes checkpatch warnings v6: fix case where all fences are signaled. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2016-08-11dma-buf/fence-array: add fence_is_array()Gustavo Padovan
Add helper to check if fence is array. v2: Comments from Chris Wilson - remove ternary if from ops comparison - add EXPORT_SYMBOL(fence_array_ops) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
2016-08-11drm/imx-ldb: Add support to drm-bridgePeter Senna Tschudin
Add support to attach a drm_bridge to imx-ldb in addition to existing support to attach a LVDS panel. This patch does a simple code refactoring by moving code from for_each_child_of_node iterator to a new function named imx_ldb_panel_ddc(). This was necessary to allow the panel ddc code to run only when the imx_ldb is not attached to a bridge. Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Rob Herring <robh@kernel.org> Cc: Fabio Estevam <fabio.estevam@nxp.com> Cc: David Airlie <airlied@linux.ie> Cc: Thierry Reding <treding@nvidia.com> Cc: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Peter Senna Tschudin <peter.senna@collabora.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2016-08-11x86/boot: Rework reserve_real_mode() to allow multiple triesAndy Lutomirski
If reserve_real_mode() fails, panicing immediately means we're doomed. Make it safe to try more than once to allocate the trampoline: - Degrade a failure from panic() to pr_info(). (If we make it to setup_real_mode() without reserving the trampoline, we'll panic them.) - Factor out helpers so that platform code can supply a specific address to try. - Warn if reserve_real_mode() is called after we're done with the memblock allocator. If that were to happen, we would behave unpredictably. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Matt Fleming <mfleming@suse.de> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/876e383038f3e9971aa72fd20a4f5da05f9d193d.1470821230.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11x86/boot: Defer setup_real_mode() to early_initcall timeAndy Lutomirski
There's no need to run setup_real_mode() as early as we run it. Defer it to the same early_initcall that sets up the page permissions for the real mode code. This should be a code size reduction. More importantly, it give us a longer window in which we can allocate the real mode trampoline. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Matt Fleming <mfleming@suse.de> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/fd62f0da4f79357695e9bf3e365623736b05f119.1470821230.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directlyAndy Lutomirski
The initialization process for trampoline_cr4_features and mmu_cr4_features was confusing. The intent is for mmu_cr4_features and *trampoline_cr4_features to stay in sync, but trampoline_cr4_features is NULL until setup_real_mode() runs. The old code synchronized *trampoline_cr4_features *twice*, once in setup_real_mode() and once in setup_arch(). It also initialized mmu_cr4_features in setup_real_mode(), which causes the actual value of mmu_cr4_features to potentially depend on when setup_real_mode() is called. With this patch, mmu_cr4_features is initialized directly in setup_arch(), and *trampoline_cr4_features is synchronized to mmu_cr4_features when the trampoline is set up. After this patch, it should be safe to defer setup_real_mode(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Matt Fleming <mfleming@suse.de> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d48a263f9912389b957dd495a7127b009259ffe0.1470821230.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11x86/boot: Run reserve_bios_regions() after we initialize the memory mapAndy Lutomirski
reserve_bios_regions() is a quirk that reserves memory that we might otherwise think is available. There's no need to run it so early, and running it before we have the memory map initialized with its non-quirky inputs makes it hard to make reserve_bios_regions() more intelligent. Move it right after we populate the memblock state. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <mario_limonciello@dell.com> Cc: Matt Fleming <mfleming@suse.de> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/59f58618911005c799c6c9979ce6ae4881d907c2.1470821230.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-11x86/irq: Do not substract irq_tlb_count from irq_call_countAaron Lu
Since commit: 52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR") the TLB remote shootdown is done through call function vector. That commit didn't take care of irq_tlb_count, which a later commit: fd0f5869724f ("x86: Distinguish TLB shootdown interrupts from other functions call interrupts") ... tried to fix. The fix assumes every increase of irq_tlb_count has a corresponding increase of irq_call_count. So the irq_call_count is always bigger than irq_tlb_count and we could substract irq_tlb_count from irq_call_count. Unfortunately this is not true for the smp_call_function_single() case. The IPI is only sent if the target CPU's call_single_queue is empty when adding a csd into it in generic_exec_single. That means if two threads are both adding flush tlb csds to the same CPU's call_single_queue, only one IPI is sent. In other words, the irq_call_count is incremented by 1 but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be bigger than irq_call_count and the substract will produce a very large irq_call_count value due to overflow. Considering that: 1) it's not worth to send more IPIs for the sake of accurate counting of irq_call_count in generic_exec_single(); 2) it's not easy to tell if the call function interrupt is for TLB shootdown in __smp_call_function_single_interrupt(). Not to exclude TLB shootdown from call function count seems to be the simplest fix and this patch just does that. This bug was found by LKP's cyclic performance regression tracking recently with the vm-scalability test suite. I have bisected to commit: 3dec0ba0be6a ("mm/rmap: share the i_mmap_rwsem") This commit didn't do anything wrong but revealed the irq_call_count problem. IIUC, the commit makes rwc->remap_one in rmap_walk_file concurrent with multiple threads. When remap_one is try_to_unmap_one(), then multiple threads could queue flush TLB to the same CPU but only one IPI will be sent. Since the commit was added in Linux v3.19, the counting problem only shows up from v3.19 onwards. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: Alex Shi <alex.shi@linaro.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Link: http://lkml.kernel.org/r/20160811074430.GA18163@aaronlu.sh.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>