Age  Commit message  Author
2017-10-10  x86/unwind: Align stack pointer in unwinder dump  (Josh Poimboeuf)
When printing the unwinder dump, the stack pointer could be unaligned, for one of two reasons: - stack corruption; or - GCC created an unaligned stack. There's no way for the unwinder to tell the difference between the two, so we have to assume one or the other. GCC unaligned stacks are very rare, and have only been spotted before GCC 5. Presumably, if we're doing an unwinder stack dump, stack corruption is more likely than a GCC unaligned stack. So always align the stack before starting the dump. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2f540c515946ab09ed267e1a1d6421202a0cce08.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
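A minimal illustration of the idea in plain C (the helper name is hypothetical, not the function touched by the patch): round the stack pointer down to the word size before walking it, so a corrupt or GCC-misaligned value cannot knock every subsequent read off alignment.

    #include <stdint.h>

    /* Hypothetical helper: force word alignment on a possibly bogus stack
     * pointer before dumping the stack word by word. */
    static inline unsigned long *align_stack_pointer(uintptr_t sp)
    {
            return (unsigned long *)(sp & ~(sizeof(unsigned long) - 1));
    }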
2017-10-10  x86/unwind: Use MSB for frame pointer encoding on 32-bit  (Josh Poimboeuf)
On x86-32, Tetsuo Handa and Fengguang Wu reported unwinder warnings like:

  WARNING: kernel stack regs at f60bb9c8 in swapper:1 has bad 'bp' value 0ba00000

And also there were some stack dumps with a bunch of unreliable '?' symbols after an apic_timer_interrupt symbol, meaning the unwinder got confused when it tried to read the regs. The cause of those issues is that, with GCC 4.8 (and possibly older), there are cases where GCC misaligns the stack pointer in a leaf function for no apparent reason:

  c124a388 <acpi_rs_move_data>:
  c124a388:  55        push   %ebp
  c124a389:  89 e5     mov    %esp,%ebp
  c124a38b:  57        push   %edi
  c124a38c:  56        push   %esi
  c124a38d:  89 d6     mov    %edx,%esi
  c124a38f:  53        push   %ebx
  c124a390:  31 db     xor    %ebx,%ebx
  c124a392:  83 ec 03  sub    $0x3,%esp
  ...
  c124a3e3:  83 c4 03  add    $0x3,%esp
  c124a3e6:  5b        pop    %ebx
  c124a3e7:  5e        pop    %esi
  c124a3e8:  5f        pop    %edi
  c124a3e9:  5d        pop    %ebp
  c124a3ea:  c3        ret

If an interrupt occurs in such a function, the regs on the stack will be unaligned, which breaks the frame pointer encoding assumption. So on 32-bit, use the MSB instead of the LSB to encode the regs. This isn't an issue on 64-bit, because interrupts align the stack before writing to it. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/279a26996a482ca716605c7dbc7f2db9d8d91e81.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
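The encoding change can be sketched generically like this (macro and function names are illustrative, and the kernel's concrete convention for the tag bit is not reproduced here): an LSB tag relies on the pointer having its low bits clear, while an MSB tag does not depend on alignment at all.

    /* Generic sketch of tagging a frame pointer in its most significant bit
     * instead of its least significant bit.  An LSB tag assumes the pointer
     * is aligned (low bits zero); an MSB tag does not, so an unaligned
     * pt_regs address can no longer be mistaken for (or corrupt) the tag. */
    #define REGS_TAG_BIT  (1UL << (8 * sizeof(unsigned long) - 1))

    static inline unsigned long encode_frame_pointer(unsigned long regs_addr)
    {
            return regs_addr | REGS_TAG_BIT;    /* mark "this frame holds regs" */
    }

    static inline unsigned long decode_frame_pointer(unsigned long bp, int *has_regs)
    {
            *has_regs = (bp & REGS_TAG_BIT) != 0;
            return bp & ~REGS_TAG_BIT;
    }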
2017-10-10x86/unwind: Fix dereference of untrusted pointerJosh Poimboeuf
Tetsuo Handa and Fengguang Wu reported a panic in the unwinder: BUG: unable to handle kernel NULL pointer dereference at 000001f2 IP: update_stack_state+0xd4/0x340 *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be67 #592 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014 task: bb0b53c0 task.stack: bb3ac000 EIP: update_stack_state+0xd4/0x340 EFLAGS: 00010002 CPU: 0 EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570 ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: fffe0ff0 DR7: 00000400 Call Trace: ? unwind_next_frame+0xea/0x400 ? __unwind_start+0xf5/0x180 ? __save_stack_trace+0x81/0x160 ? save_stack_trace+0x20/0x30 ? __lock_acquire+0xfa5/0x12f0 ? lock_acquire+0x1c2/0x230 ? tick_periodic+0x3a/0xf0 ? _raw_spin_lock+0x42/0x50 ? tick_periodic+0x3a/0xf0 ? tick_periodic+0x3a/0xf0 ? debug_smp_processor_id+0x12/0x20 ? tick_handle_periodic+0x23/0xc0 ? local_apic_timer_interrupt+0x63/0x70 ? smp_trace_apic_timer_interrupt+0x235/0x6a0 ? trace_apic_timer_interrupt+0x37/0x3c ? strrchr+0x23/0x50 Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83 EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f CR2: 00000000000001f2 ---[ end trace 0d147fd4aba8ff50 ]--- Kernel panic - not syncing: Fatal exception in interrupt On x86-32, after decoding a frame pointer to get a regs address, regs_size() dereferences the regs pointer when it checks regs->cs to see if the regs are user mode. This is dangerous because it's possible that what looks like a decoded frame pointer is actually a corrupt value, and we don't want the unwinder to make things worse. Instead of calling regs_size() on an unsafe pointer, just assume they're kernel regs to start with. Later, once it's safe to access the regs, we can do the user mode check and corresponding safety check for the remaining two regs. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 5ed8d8bb38c5 ("x86/unwind: Move common code into update_stack_state()") Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
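Conceptually the fix looks like the following sketch (plain C, names hypothetical): validate that a kernel-regs-sized region starting at the decoded pointer lies entirely on the known stack before dereferencing anything; only then read regs->cs and, for a user frame, validate the two extra registers as well.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical bounds check: is [regs, regs + len) fully inside the
     * task's stack?  Only after this passes is it safe to dereference the
     * decoded pointer at all. */
    static bool region_on_stack(const void *regs, size_t len,
                                const void *stack_lo, const void *stack_hi)
    {
            const char *p = regs;

            return p >= (const char *)stack_lo &&
                   p + len <= (const char *)stack_hi;
    }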
2017-10-10  drm/etnaviv: remove unnecessary clock stabilization delay  (Philipp Zabel)
There is no reason to wait for clock stabilization here, as the clock framework guarantees that PLL clock sources are stable before clk_enable returns. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: reduce reset delay  (Philipp Zabel)
After reset assertion, we only have to wait for the reset signals to propagate through the GPU before deasserting the reset again. A few hundred clock cycles should be more than enough. Replace the msleep(1), which can actually take about 30 ms on i.MX6Q in some configurations, with a usleep_range of a few microseconds. If the delay was too short, the FE would not be idle afterwards, and the reset would be retried. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
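A hedged sketch of the timing change; the register, reset bit and delay bounds below are illustrative, not the driver's actual values.

    #include <linux/bitops.h>
    #include <linux/delay.h>
    #include <linux/io.h>
    #include <linux/types.h>

    /* Illustrative reset pulse: a few microseconds are enough for the reset
     * signal to propagate, whereas msleep(1) could stall for ~30 ms. */
    static void demo_gpu_reset_pulse(void __iomem *clock_control)
    {
            u32 val = readl(clock_control);

            writel(val | BIT(12), clock_control);   /* assert reset (bit is illustrative) */
            usleep_range(10, 20);                   /* was: msleep(1) */
            writel(val & ~BIT(12), clock_control);  /* deassert reset */
    }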
2017-10-10  powerpc: Don't call lockdep_assert_cpus_held() from arch_update_cpu_topology()  (Thiago Jung Bauermann)
It turns out that not all paths calling arch_update_cpu_topology() hold cpu_hotplug_lock, but that's OK because those paths can't race with any concurrent hotplug events. Warnings were reported with the following trace:

  lockdep_assert_cpus_held
  arch_update_cpu_topology
  sched_init_domains
  sched_init_smp
  kernel_init_freeable
  kernel_init
  ret_from_kernel_thread

Which is safe because it's called early in boot when hotplug is not live yet. And also this trace:

  lockdep_assert_cpus_held
  arch_update_cpu_topology
  partition_sched_domains
  cpuset_update_active_cpus
  sched_cpu_deactivate
  cpuhp_invoke_callback
  cpuhp_down_callbacks
  cpuhp_thread_fun
  smpboot_thread_fn
  kthread
  ret_from_kernel_thread

Which is safe because it's called as part of CPU hotplug, so although we don't hold the CPU hotplug lock, there is another thread driving the CPU hotplug operation which does hold the lock, and there is no race. Thanks to tglx for deciphering it for us. Fixes: 3e401f7a2e51 ("powerpc: Only obtain cpu_hotplug_lock if called by rtasd") Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-10  drm/etnaviv: remove unused function etnaviv_gem_new  (Lucas Stach)
We only ever do GEM object creation by handle, as there is no kernel internal use of GEM objects. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: remove stale comment  (Lucas Stach)
This comment is outdated, as the driver has been taking care of clock gating and the pulse eater for quite some time already. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-10-10  drm/etnaviv: submit supports performance monitor requests  (Christian Gmeiner)
We increment the minor driver version so userspace can detect perfmon support. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: enable debug registers on demand  (Christian Gmeiner)
Some performance registers are debug registers and need to be enabled in order to be functional. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: need to disable clock gating when doing profiling  (Christian Gmeiner)
As done by the Vivante kernel driver. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add MC perf domain  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add TX perf domain  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add RA perf domain  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add SE perf domain  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add PA perf domain  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add SH perf domain  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add PE perf domain  (Christian Gmeiner)
We need to iterate over all pixel pipelines to get the overall value.

Changes from v4 -> v5:
 - switch back to pixel pipe 0 to prevent GPU hang
 - PIXELS_RENDERED_2D is exposed for 2D pipe

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add HI perf domain  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: use 'sync points' for performance monitor requests  (Christian Gmeiner)
With 'sync points' we can sample the requested performance signals before and/or after the submitted command buffer.

Changes v2 -> v3:
 - fixed indentation and init nr_events to 1

Changes v4 -> v5:
 - simplify logic around fence handling.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: clear alloced event  (Christian Gmeiner)
Results in less code, as users no longer have to set every struct member to 0/NULL. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add 'sync point' support  (Christian Gmeiner)
In order to support performance counters in a sane way we need to provide a method to sync the GPU with the CPU. The GPU can process multiple command buffers/events per irq. With the help of a 'sync point' we can trigger an event and stop the GPU/FE immediately. When the CPU is done with its processing it simply needs to restart the FE and the GPU will process the command stream.

Changes from v1 -> v2:
 - process sync point with a work item to keep irq as fast as possible

Changes from v4 -> v5:
 - renamed pmrs_* to sync_point_*
 - call event_free(..) in sync_point_worker(..)

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
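A rough sketch of that flow (all names here are illustrative stand-ins, not the driver's): the interrupt handler only schedules a work item, and the worker does the CPU-side processing, frees the event and restarts the front-end.

    #include <linux/kernel.h>
    #include <linux/workqueue.h>

    struct demo_gpu {
            struct work_struct sync_point_work;     /* queued from the IRQ handler */
            int sync_point_event;
    };

    static void demo_sample_counters(struct demo_gpu *gpu) { /* CPU-side work */ }
    static void demo_event_free(struct demo_gpu *gpu, int event) { }
    static void demo_restart_fe(struct demo_gpu *gpu) { /* let the GPU continue */ }

    static void demo_sync_point_worker(struct work_struct *work)
    {
            struct demo_gpu *gpu = container_of(work, struct demo_gpu,
                                                sync_point_work);

            demo_sample_counters(gpu);
            demo_event_free(gpu, gpu->sync_point_event);
            demo_restart_fe(gpu);
    }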
2017-10-10  drm/etnaviv: add performance monitor request processing  (Christian Gmeiner)
Changes v4 -> v5:
 - make use of doms_meta array

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: copy pmrs from userspace  (Christian Gmeiner)
Changes from v1 -> v2:
 - renamed submit_perfmon_request() to submit_perfmon_validate()
 - extended flags validation
 - added comment about offset 0
 - moved assignment of cmdbuf->nr_pmrs below the copy_from_user of the pmrs.

Changes from v2 -> v3:
 - fixed flags validation

Changes v4 -> v5:
 - pass cmdbuf->exec_state to etnaviv_pm_req_validate(..)

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add performance monitor request validation  (Christian Gmeiner)
Check if the selected domain and signal combination exists.

Changes from v4 to v5:
 - add exec_state parameter

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: extend etnaviv_gpu_cmdbuf_new(..) with nr_pmrs  (Christian Gmeiner)
This commit extends etnaviv_gpu_cmdbuf_new(..) to define the number of struct etnaviv_perfmon elements that get used.

Changes from v1 -> v2:
 - make use of goto as requested by Lucas

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add internal representation of perfmon_request  (Christian Gmeiner)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add uapi for perfmon feature  (Christian Gmeiner)
Sadly we cannot read any registers via the command stream, so we need to extend the drm_etnaviv_gem_submit struct with performance monitor requests. Those requests get processed before or after the actual submitted command stream. The Vivante kernel driver has a special ioctl to read all perfmon registers at once and return them.

Changes from v1 -> v2:
 - use a 16 bit value for signals
 - fix padding issues

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: add infrastructure to query perf counter  (Christian Gmeiner)
Make it possible for userspace to query all performance domains and their signals. This information is needed to sample those signals via the submit ioctl. At the moment no performance domain is available.

Changes from v1 -> v2:
 - use a 16 bit value for signals
 - fix padding issues
 - add id member to domain and signal struct

Changes v4 -> v5:
 - provide for each pipe an own set of pm domains

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: make it possible to allocate multiple events  (Christian Gmeiner)
This makes it possible to allocate multiple events under the event spinlock. This change is needed to support 'sync'-points.

Changes v2 -> v3:
 - wait for the completion of all events
 - use 10sec timeout regardless of the number of events
 - removed validation if there are enough free events
 - fixed return value evaluation of event_alloc(..) in etnaviv_gpu_submit(..)

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
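One simple way to hand out several events atomically under a single spinlock is sketched below (names and the event count are illustrative; the driver's actual version additionally waits with a timeout for events to become free, which is not shown here). A submit that needs, for example, a pre-sync, a command and a post-sync event then either gets all of them or none.

    #include <linux/bitmap.h>
    #include <linux/errno.h>
    #include <linux/spinlock.h>

    #define DEMO_NR_EVENTS 30   /* illustrative number of hardware events */

    static DECLARE_BITMAP(demo_event_bitmap, DEMO_NR_EVENTS);
    static DEFINE_SPINLOCK(demo_event_lock);

    static int demo_event_alloc(unsigned int nr_events, unsigned int *events)
    {
            unsigned long flags;
            unsigned int i;

            spin_lock_irqsave(&demo_event_lock, flags);
            for (i = 0; i < nr_events; i++) {
                    unsigned int event = find_first_zero_bit(demo_event_bitmap,
                                                             DEMO_NR_EVENTS);

                    if (event >= DEMO_NR_EVENTS)
                            goto rollback;      /* not enough free events */
                    set_bit(event, demo_event_bitmap);
                    events[i] = event;
            }
            spin_unlock_irqrestore(&demo_event_lock, flags);
            return 0;

    rollback:
            while (i--)
                    clear_bit(events[i], demo_event_bitmap);
            spin_unlock_irqrestore(&demo_event_lock, flags);
            return -EBUSY;
    }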
2017-10-10  drm/etnaviv: use bitmap to keep track of events  (Christian Gmeiner)
This is prep work to be able to allocate multiple events in one go. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: rework clock initialization  (Lucas Stach)
The reset path wants to initialize the clock control register regardless of the DYNAMIC_FREQUENCY_SCALING feature, so don't call the clock update function, but explicitly load the register. Disabling of the debug registers is also moved into the reset function, so we always get to the same state after a GPU reset. This means the clock update function should not touch the bits already set in the clock control register, but instead only update the scaling bits. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
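The "only update the scaling bits" rule amounts to a read-modify-write along the lines of the sketch below; the mask value and names are illustrative, not the driver's actual definitions.

    #include <linux/io.h>
    #include <linux/types.h>

    #define DEMO_FSCALE_MASK  0x1fc     /* illustrative frequency-scaling bit field */

    /* Read the clock control register, replace only the scaling field, and
     * leave everything the reset path configured alone. */
    static void demo_update_clock(void __iomem *clock_ctrl, u32 fscale_bits)
    {
            u32 val = readl(clock_ctrl);

            val &= ~DEMO_FSCALE_MASK;
            val |= fscale_bits & DEMO_FSCALE_MASK;
            writel(val, clock_ctrl);
    }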
2017-10-10  drm/etnaviv: remove IOMMU dependency  (Lucas Stach)
Using the IOMMU API to manage the internal GPU MMU has been an historical accident and it keeps getting in the way, as well as entangling the driver with the inner workings of the IOMMU subsystem. Clean this up by removing the usage of iommu_domain, which is the last piece linking etnaviv to the IOMMU subsystem. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2017-10-10  drm/etnaviv: mmu: mark local functions static  (Lucas Stach)
And clean up the header file a bit. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-10-10  drm/etnaviv: mmu: stop using iommu map/unmap functions  (Lucas Stach)
This is a preparation to remove the etnaviv dependency on the IOMMU subsystem by importing the relevant parts of the iommu map/unmap functions into the driver. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-10-10  drm/etnaviv: iommuv1: remove map_lock  (Lucas Stach)
It wasn't protecting anything, as the single word writes used to set up or tear down a translation are already inherently atomic, so the spinlock is pure overhead. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-10-10  drm/etnaviv: iommuv1: fold pgtable_write into callers  (Lucas Stach)
A function doing a single assignment is not really helping the code flow. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-10-10  drm/etnaviv: iommuv1: fold pagetable alloc and free into caller  (Lucas Stach)
Those functions are simple enough to fold them into the calling function. This also fixes a correctness issue, as the alloc/free functions didn't specify the device the memory was allocated for. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-10-10  drm/etnaviv: remove iova_to_phys iommu ops  (Lucas Stach)
They are not used in any way, so can go away. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-10-10  drm/etnaviv: remove iommu fault handler  (Lucas Stach)
The handler has never been used, so it's really just dead code. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-10-10  drm/bridge/synopsys: dsi: remove is_panel_bridge  (benjamin.gaignard@linaro.org)
When using drm_of_panel_bridge_remove() we can simplify the code and remove is_panel_bridge from the dw_mipi_dsi structure. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Reviewed-by: Philippe Cornu <philippe.cornu@st.com> Tested-by: Philippe Cornu <philippe.cornu@st.com> Link: https://patchwork.freedesktop.org/patch/msgid/1506936888-23844-6-git-send-email-benjamin.gaignard@linaro.org
2017-10-10  drm/vc4: remove bridge from driver internal structure  (benjamin.gaignard@linaro.org)
With a call to drm_of_panel_bridge_remove() we can remove the bridge without storing it in the vc4_dpi internal driver structure. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Reviewed-by: Eric Anholt <eric@anholt.net> Link: https://patchwork.freedesktop.org/patch/msgid/1506936888-23844-5-git-send-email-benjamin.gaignard@linaro.org
2017-10-10  drm/stm: ltdc: remove bridge from driver internal structure  (benjamin.gaignard@linaro.org)
With a call to drm_of_panel_bridge_remove() we can remove the bridge without storing it in the ltdc internal driver structure. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Reviewed-by: Philippe Cornu <philippe.cornu@st.com> Tested-by: Philippe Cornu <philippe.cornu@st.com> Link: https://patchwork.freedesktop.org/patch/msgid/1506936888-23844-4-git-send-email-benjamin.gaignard@linaro.org
2017-10-10  drm/drm_of: add drm_of_panel_bridge_remove function  (benjamin.gaignard@linaro.org)
This function is the counterpart of drm_of_find_panel_or_bridge(): given a specific port and endpoint, it removes a previously allocated panel_bridge. Since drm_panel_bridge_remove() checks that the bridge parameter is not NULL and is a real drm_panel_bridge and not a simple bridge, it is safe to call it directly. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Reviewed-by: Philippe Cornu <philippe.cornu@st.com> Tested-by: Philippe Cornu <philippe.cornu@st.com> Link: https://patchwork.freedesktop.org/patch/msgid/1506936888-23844-3-git-send-email-benjamin.gaignard@linaro.org
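A hedged sketch of how such a helper can be composed from the two existing helpers named above; this is an illustration, not necessarily the literal implementation.

    #include <linux/of.h>
    #include <drm/drm_bridge.h>
    #include <drm/drm_of.h>

    /* Illustrative composition: look up the bridge behind the given port and
     * endpoint, then hand it to drm_panel_bridge_remove(), which is safe to
     * call even if the bridge turns out not to be a panel_bridge (see the
     * robustness commit below). */
    static void demo_of_panel_bridge_remove(const struct device_node *np,
                                            int port, int endpoint)
    {
            struct drm_bridge *bridge = NULL;

            if (drm_of_find_panel_or_bridge(np, port, endpoint, NULL, &bridge))
                    return;

            drm_panel_bridge_remove(bridge);
    }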
2017-10-10  drm/bridge: make drm_panel_bridge_remove more robust  (benjamin.gaignard@linaro.org)
Make sure that the bridge parameter is not NULL and can be safely cast into a panel_bridge structure. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org> Reviewed-by: Philippe Cornu <philippe.cornu@st.com> Tested-by: Philippe Cornu <philippe.cornu@st.com> Link: https://patchwork.freedesktop.org/patch/msgid/1506936755-23625-2-git-send-email-benjamin.gaignard@linaro.org
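A sketch of the kind of guard this adds; the symbol names below are illustrative, not the driver's.

    #include <linux/types.h>
    #include <drm/drm_bridge.h>

    /* Illustrative robustness check: ignore NULL, and only treat the bridge
     * as a panel_bridge when its ops table is the panel-bridge one, so a
     * container_of() on a foreign bridge cannot happen by mistake. */
    static bool demo_is_panel_bridge(const struct drm_bridge *bridge,
                                     const struct drm_bridge_funcs *panel_funcs)
    {
            return bridge && bridge->funcs == panel_funcs;
    }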
2017-10-10  powerpc/lib/sstep: Fix count leading zeros instructions  (Sandipan Das)
According to the GCC documentation, the behaviour of __builtin_clz() and __builtin_clzl() is undefined if the value of the input argument is zero. Without handling this special case, these builtins have been used for emulating the following instructions:

 * Count Leading Zeros Word (cntlzw[.])
 * Count Leading Zeros Doubleword (cntlzd[.])

This fixes the emulated behaviour of these instructions by adding an additional check for this special case. Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs") Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
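The shape of the fix in plain C (function names are illustrative, and __builtin_clzll is used here instead of __builtin_clzl for portability of the sketch): the zero input has to be special-cased to return the full register width, which is what cntlzw/cntlzd produce in hardware.

    #include <stdint.h>

    static unsigned int emulate_cntlzw(uint32_t val)
    {
            return val ? (unsigned int)__builtin_clz(val) : 32;
    }

    static unsigned int emulate_cntlzd(uint64_t val)
    {
            return val ? (unsigned int)__builtin_clzll(val) : 64;
    }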
2017-10-10  spi: a3700: Return correct value on timeout detection  (Maxime Chevallier)
When waiting for transfer completion, a3700_spi_wait_completion returns a boolean indicating if a timeout occurred. The function was returning 'true' every time, failing to detect any timeout. This patch makes it return 'false' when a timeout is reached. Signed-off-by: Maxime Chevallier <maxime.chevallier@smile.fr> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
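The shape of the bug, in a hedged generic sketch (not the driver's actual code): a wait loop that should report whether the transfer completed, but previously returned true on both the completion path and the timeout path. The fix is returning false when the loop falls through because of a timeout.

    #include <stdbool.h>

    static bool demo_wait_completion(bool (*xfer_done)(void *), void *ctx,
                                     unsigned int max_polls)
    {
            while (max_polls--) {
                    if (xfer_done(ctx))
                            return true;    /* completed in time */
            }
            return false;                   /* timed out - the previously missing case */
    }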
2017-10-10  sched/core: Ensure load_balance() respects the active_mask  (Peter Zijlstra)
While load_balance() masks the source CPUs against active_mask, it had a hole against the destination CPU. Ensure the destination CPU is also part of the 'domain-mask & active-mask' set. Reported-by: Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds") Signed-off-by: Ingo Molnar <mingo@kernel.org>
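A hedged sketch of the idea (illustrative, not the scheduler's literal code): filter the destination CPU against the same 'domain-mask & active-mask' set that the source CPUs are already filtered with.

    #include <linux/cpumask.h>
    #include <linux/types.h>

    static bool demo_dst_cpu_allowed(int dst_cpu, const struct cpumask *domain_mask)
    {
            return cpumask_test_cpu(dst_cpu, domain_mask) &&
                   cpumask_test_cpu(dst_cpu, cpu_active_mask);
    }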
2017-10-10  sched/core: Address more wake_affine() regressions  (Peter Zijlstra)
The trivial wake_affine_idle() implementation is very good for a number of workloads, but it comes apart at the moment there are no idle CPUs left, IOW. the overloaded case.

hackbench:

                    NO_WA_WEIGHT             WA_WEIGHT
  hackbench-20    : 7.362717561 seconds      6.450509391 seconds  (win)

netperf:

                    NO_WA_WEIGHT             WA_WEIGHT
  TCP_SENDFILE-1  : Avg: 54524.6             Avg: 52224.3
  TCP_SENDFILE-10 : Avg: 48185.2             Avg: 46504.3
  TCP_SENDFILE-20 : Avg: 29031.2             Avg: 28610.3
  TCP_SENDFILE-40 : Avg: 9819.72             Avg: 9253.12
  TCP_SENDFILE-80 : Avg: 5355.3              Avg: 4687.4
  TCP_STREAM-1    : Avg: 41448.3             Avg: 42254
  TCP_STREAM-10   : Avg: 24123.2             Avg: 25847.9
  TCP_STREAM-20   : Avg: 15834.5             Avg: 18374.4
  TCP_STREAM-40   : Avg: 5583.91             Avg: 5599.57
  TCP_STREAM-80   : Avg: 2329.66             Avg: 2726.41
  TCP_RR-1        : Avg: 80473.5             Avg: 82638.8
  TCP_RR-10       : Avg: 72660.5             Avg: 73265.1
  TCP_RR-20       : Avg: 52607.1             Avg: 52634.5
  TCP_RR-40       : Avg: 57199.2             Avg: 56302.3
  TCP_RR-80       : Avg: 25330.3             Avg: 26867.9
  UDP_RR-1        : Avg: 108266              Avg: 107844
  UDP_RR-10       : Avg: 95480               Avg: 95245.2
  UDP_RR-20       : Avg: 68770.8             Avg: 68673.7
  UDP_RR-40       : Avg: 76231               Avg: 75419.1
  UDP_RR-80       : Avg: 34578.3             Avg: 35639.1
  UDP_STREAM-1    : Avg: 64684.3             Avg: 66606
  UDP_STREAM-10   : Avg: 52701.2             Avg: 52959.5
  UDP_STREAM-20   : Avg: 30376.4             Avg: 29704
  UDP_STREAM-40   : Avg: 15685.8             Avg: 15266.5
  UDP_STREAM-80   : Avg: 8415.13             Avg: 7388.97

  (wins and losses)

sysbench:

                      NO_WA_WEIGHT             WA_WEIGHT
  sysbench-mysql-2  : 2135.17 per sec.         2142.51 per sec.
  sysbench-mysql-5  : 4809.68 per sec.         4800.19 per sec.
  sysbench-mysql-10 : 9158.59 per sec.         9157.05 per sec.
  sysbench-mysql-20 : 14570.70 per sec.        14543.55 per sec.
  sysbench-mysql-40 : 22130.56 per sec.        22184.82 per sec.
  sysbench-mysql-80 : 20995.56 per sec.        21904.18 per sec.
  sysbench-psql-2   : 1679.58 per sec.         1705.06 per sec.
  sysbench-psql-5   : 3797.69 per sec.         3879.93 per sec.
  sysbench-psql-10  : 7253.22 per sec.         7258.06 per sec.
  sysbench-psql-20  : 11166.75 per sec.        11220.00 per sec.
  sysbench-psql-40  : 17277.28 per sec.        17359.78 per sec.
  sysbench-psql-80  : 17112.44 per sec.        17221.16 per sec.

  (increase on the top end)

tbench:

  NO_WA_WEIGHT
  Throughput 685.211 MB/sec   2 clients   2 procs  max_latency=0.123 ms
  Throughput 1596.64 MB/sec   5 clients   5 procs  max_latency=0.119 ms
  Throughput 2985.47 MB/sec  10 clients  10 procs  max_latency=0.262 ms
  Throughput 4521.15 MB/sec  20 clients  20 procs  max_latency=0.506 ms
  Throughput 9438.1  MB/sec  40 clients  40 procs  max_latency=2.052 ms
  Throughput 8210.5  MB/sec  80 clients  80 procs  max_latency=8.310 ms

  WA_WEIGHT
  Throughput 697.292 MB/sec   2 clients   2 procs  max_latency=0.127 ms
  Throughput 1596.48 MB/sec   5 clients   5 procs  max_latency=0.080 ms
  Throughput 2975.22 MB/sec  10 clients  10 procs  max_latency=0.254 ms
  Throughput 4575.14 MB/sec  20 clients  20 procs  max_latency=0.502 ms
  Throughput 9468.65 MB/sec  40 clients  40 procs  max_latency=2.069 ms
  Throughput 8631.73 MB/sec  80 clients  80 procs  max_latency=8.605 ms

  (increase on the top end)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rik van Riel <riel@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10  sched/core: Fix wake_affine() performance regression  (Peter Zijlstra)
Eric reported a sysbench regression against commit: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()") Similarly, Rik was looking at the NAS-lu.C benchmark, which regressed against his v3.10 enterprise kernel.

PRE (current tip/master):

  ivb-ep sysbench:
     2: [30 secs]  transactions: 64110   (2136.94 per sec.)
     5: [30 secs]  transactions: 143644  (4787.99 per sec.)
    10: [30 secs]  transactions: 274298  (9142.93 per sec.)
    20: [30 secs]  transactions: 418683  (13955.45 per sec.)
    40: [30 secs]  transactions: 320731  (10690.15 per sec.)
    80: [30 secs]  transactions: 355096  (11834.28 per sec.)

  hsw-ex NAS:
  OMP_PROC_BIND/lu.C.x_threads_144_run_1.log: Time in seconds = 18.01
  OMP_PROC_BIND/lu.C.x_threads_144_run_2.log: Time in seconds = 17.89
  OMP_PROC_BIND/lu.C.x_threads_144_run_3.log: Time in seconds = 17.93
  lu.C.x_threads_144_run_1.log: Time in seconds = 434.68
  lu.C.x_threads_144_run_2.log: Time in seconds = 405.36
  lu.C.x_threads_144_run_3.log: Time in seconds = 433.83

POST (+patch):

  ivb-ep sysbench:
     2: [30 secs]  transactions: 64494   (2149.75 per sec.)
     5: [30 secs]  transactions: 145114  (4836.99 per sec.)
    10: [30 secs]  transactions: 278311  (9276.69 per sec.)
    20: [30 secs]  transactions: 437169  (14571.60 per sec.)
    40: [30 secs]  transactions: 669837  (22326.73 per sec.)
    80: [30 secs]  transactions: 631739  (21055.88 per sec.)

  hsw-ex NAS:
  lu.C.x_threads_144_run_1.log: Time in seconds = 23.36
  lu.C.x_threads_144_run_2.log: Time in seconds = 22.96
  lu.C.x_threads_144_run_3.log: Time in seconds = 22.52

This patch takes out all the shiny wake_affine() stuff and goes back to utter basics. Between the two CPUs involved with the wakeup (the CPU doing the wakeup and the CPU we ran on previously) pick the CPU we can run on _now_. This restores much of the regressions against the older kernels, but leaves some ground in the overloaded case. The default-enabled WA_WEIGHT (which will be introduced in the next patch) is an attempt to address the overloaded situation. Reported-by: Eric Farman <farman@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jinpuwang@gmail.com Cc: vcaputo@pengaru.com Fixes: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()") Signed-off-by: Ingo Molnar <mingo@kernel.org>