summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-03-20x86/dma: Use generic swiotlb_opsChristoph Hellwig
The generic swiotlb DMA ops were based on the x86 ones and provide equivalent functionality, so use them. Also fix the sta2x11 case. For that SOC the DMA map ops need an additional physical to DMA address translations. For swiotlb buffers that is done throught the phys_to_dma helper, but the sta2x11_dma_ops also added an additional translation on the return value from x86_swiotlb_alloc_coherent, which is only correct if that functions returns a direct allocation and not a swiotlb buffer. With the generic swiotlb and DMA-direct code phys_to_dma is not always used and the separate sta2x11_dma_ops can be replaced with a simple bit that marks if the additional physical to DMA address translation is needed. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-5-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y)Christoph Hellwig
The generic DMA-direct (CONFIG_DMA_DIRECT_OPS=y) implementation is now functionally equivalent to the x86 nommu dma_map implementation, so switch over to using it. That includes switching from using x86_dma_supported in various IOMMU drivers to use dma_direct_supported instead, which provides the same functionality. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-4-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20x86/dma: Remove dma_alloc_coherent_mask()Christoph Hellwig
These days all devices (including the ISA fallback device) have a coherent DMA mask set, so remove the workaround. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-3-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20Merge branch 'x86/mm' into x86/dma, to pick up dependenciesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirkChristoph Hellwig
There were only a few Pentium Pro multiprocessors systems where this errata applied. They are more than 20 years old now, and we've slowly dropped places which put the workarounds in and discouraged anyone from enabling the workaround. Get rid of it for good. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-2-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20spi: Fix unregistration of controller with fixed SPI bus numberJarkko Nikula
Commit 9b61e302210e (spi: Pick spi bus number from Linux idr or spi alias) ceased to unregister SPI buses with fixed bus numbers. Moreover this is visible only if CONFIG_SPI_DEBUG=y is set or when trying to re-register the same SPI controller. rmmod spi_pxa2xx_platform (with CONFIG_SPI_DEBUG=y): [ 26.788362] spi_master spi1: attempting to delete unregistered controller [spi1] modprobe spi_pxa2xx_platform: [ 37.883137] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:19.0/pxa2xx-spi.12/spi_master/spi1' [ 37.894984] CPU: 1 PID: 1467 Comm: modprobe Not tainted 4.16.0-rc4+ #21 [ 37.902384] Call Trace: ... [ 38.122680] kobject_add_internal failed for spi1 with -EEXIST, don't try to register things with the same name in the same directory. [ 38.136154] WARNING: CPU: 1 PID: 1467 at lib/kobject.c:238 kobject_add_internal+0x2a5/0x2f0 ... [ 38.513817] pxa2xx-spi pxa2xx-spi.12: problem registering spi master [ 38.521036] pxa2xx-spi: probe of pxa2xx-spi.12 failed with error -17 Fix this by not returning immediately from spi_unregister_controller() if idr_find() doesn't find controller with given ID/bus number. It finds only those controllers that were registered with dynamic SPI bus numbers. Only conditional cleanup between dynamic and fixed bus numbers is to remove allocated IDR. Fixes: 9b61e302210e (spi: Pick spi bus number from Linux idr or spi alias) Cc: stable@vger.kernel.org Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-03-20sched/debug: Adjust newlines for better alignmentJoe Lawrence
Scheduler debug stats include newlines that display out of alignment when prefixed by timestamps. For example, the dmesg utility: % echo t > /proc/sysrq-trigger % dmesg ... [ 83.124251] runnable tasks: S task PID tree-key switches prio wait-time sum-exec sum-sleep ----------------------------------------------------------------------------------------------------------- At the same time, some syslog utilities (like rsyslog by default) don't like the additional newlines control characters, saving lines like this to /var/log/messages: Mar 16 16:02:29 localhost kernel: #012runnable tasks:#012 S task PID tree-key ... ^^^^ ^^^^ Clean these up by moving newline characters to their own SEQ_printf invocation. This leaves the /proc/sched_debug unchanged, but brings the entire output into alignment when prefixed: % echo t > /proc/sysrq-trigger % dmesg ... [ 62.410368] runnable tasks: [ 62.410368] S task PID tree-key switches prio wait-time sum-exec sum-sleep [ 62.410369] ----------------------------------------------------------------------------------------------------------- [ 62.410369] I kworker/u12:0 5 1932.215593 332 120 0.000000 3.621252 0.000000 0 0 / and no escaped control characters from rsyslog in /var/log/messages: Mar 16 16:15:06 localhost kernel: runnable tasks: Mar 16 16:15:06 localhost kernel: S task PID tree-key ... Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/debug: Fix per-task line continuation for console outputJoe Lawrence
When the SEQ_printf() macro prints to the console, it runs a simple printk() without KERN_CONT "continued" line printing. The result of this is oddly wrapped task info, for example: % echo t > /proc/sysrq-trigger % dmesg ... runnable tasks: ... [ 29.608611] I [ 29.608613] rcu_sched 8 3252.013846 4087 120 [ 29.608614] 0.000000 29.090111 0.000000 [ 29.608615] 0 0 [ 29.608616] / Modify SEQ_printf to use pr_cont() for expected one-line results: % echo t > /proc/sysrq-trigger % dmesg ... runnable tasks: ... [ 106.716329] S cpuhp/5 37 2006.315026 14 120 0.000000 0.496893 0.000000 0 0 / Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-2-git-send-email-joe.lawrence@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/cgroup: Fix child event counting bugSong Liu
When a perf_event is attached to parent cgroup, it should count events for all children cgroups: parent_group <---- perf_event \ - child_group <---- process(es) However, in our tests, we found this perf_event cannot report reliable results. Here is an example case: # create cgroups mkdir -p /sys/fs/cgroup/p/c # start perf for parent group perf stat -e instructions -G "p" # on another console, run test process in child cgroup: stressapptest -s 2 -M 1000 & echo $! > /sys/fs/cgroup/p/c/cgroup.procs # after the test process is done, stop perf in the first console shows <not counted> instructions p The instruction should not be "not counted" as the process runs in the child cgroup. We found this is because perf_event->cgrp and cpuctx->cgrp are not identical, thus perf_event->cgrp are not updated properly. This patch fixes this by updating perf_cgroup properly for ancestor cgroup(s). Reported-by: Ephraim Park <ephiepark@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <jolsa@redhat.com> Cc: <kernel-team@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20180312165943.1057894-1-songliubraving@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake ↵Kan Liang
servers The number of CHAs is miscalculated on multi-domain PCI Skylake server systems, resulting in an uncore driver initialization error. Gary Kroening explains: "For systems with a single PCI segment, it is sufficient to look for the bus number to change in order to determine that all of the CHa's have been counted for a single socket. However, for multi PCI segment systems, each socket is given a new segment and the bus number does NOT change. So looking only for the bus number to change ends up counting all of the CHa's on all sockets in the system. This leads to writing CPU MSRs beyond a valid range and causes an error in ivbep_uncore_msr_init_box()." To fix this bug, query the number of CHAs from the CAPID6 register: it should read bits 27:0 in the CAPID6 register located at Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector of available LLC slices and the CHAs that manage those slices. Reported-by: Kroening, Gary <gary.kroening@hpe.com> Tested-by: Kroening, Gary <gary.kroening@hpe.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: abanman@hpe.com Cc: dimitri.sivanich@hpe.com Cc: hpa@zytor.com Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520967094-13219-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/x86/intel: Rename confusing 'freerunning PEBS' API and implementation ↵Kan Liang
to 'large PEBS' The 'freerunning PEBS' and 'large PEBS' are the same thing. Both of these names appear in the code and in the API, which causes confusion. Rename 'freerunning PEBS' to 'large PEBS' to unify the code, which eliminates the confusion. No functional change. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1520865937-22910-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20jump_label: Disable jump labels in __exit codeJosh Poimboeuf
With the following commit: 333522447063 ("jump_label: Explicitly disable jump labels in __init code") ... we explicitly disabled jump labels in __init code, so they could be detected and not warned about in the following commit: dc1dd184c2f0 ("jump_label: Warn on failed jump_label patching attempt") In-kernel __exit code has the same issue. It's never used, so it's freed along with the rest of initmem. But jump label entries in __exit code aren't explicitly disabled, so we get the following warning when enabling pr_debug() in __exit code: can't patch jump_label at dmi_sysfs_exit+0x0/0x2d WARNING: CPU: 0 PID: 22572 at kernel/jump_label.c:376 __jump_label_update+0x9d/0xb0 Fix the warning by disabling all jump labels in initmem (which includes both __init and __exit code). Reported-and-tested-by: Li Wang <liwang@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: dc1dd184c2f0 ("jump_label: Warn on failed jump_label patching attempt") Link: http://lkml.kernel.org/r/7121e6e595374f06616c505b6e690e275c0054d1.1521483452.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/x86/intel/uncore: Add missing filter constraint for SKX CHA eventStephane Eranian
Adding a filter constraint for Intel Skylake CHA event UNC_CHA_UPI_CREDITS_ACQUIRED (0x38). The event supports core-id/thread-id and link filtering. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1520869294-14176-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period()Dan Carpenter
We intended to clear the lowest 6 bits but because of a type bug we clear the high 32 bits as well. Andi says that periods are rarely more than U32_MAX so this bug probably doesn't have a huge runtime impact. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 294fe0f52a44 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20perf/x86/intel: Disable userspace RDPMC usage for large PEBSKan Liang
Userspace RDPMC cannot possibly work for large PEBS, which was introduced in: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)") When the PEBS interrupt threshold is larger than one, there is no way to get exact auto-reload times and value for userspace RDPMC. Disable the userspace RDPMC usage when large PEBS is enabled. The only exception is when the PEBS interrupt threshold is 1, in which case user-space RDPMC works well even with auto-reload events. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)") Link: http://lkml.kernel.org/r/1518474035-21006-6-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit 1af22eba248efe2de25658041a80a3d40fb3e92e)
2018-03-20sched/wait: Improve __var_waitqueue() code generationPeter Zijlstra
Since we fixed hash_64() to not suck there is no need to play games to attempt to improve the hash value on 64-bit. Also, since we don't use the bit value for the variables, use hash_ptr() directly. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait: Remove the wait_on_atomic_t() APIPeter Zijlstra
There are no users left (everyone got converted to wait_var_event()), remove it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, arch/mips: Fix and convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. And while there, fix a bug and add the missing wakeup... Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: James Hogan <jhogan@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, fs/ocfs2: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Joel Becker <jlbec@evilplan.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, fs/fscache: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, fs/btrfs: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, fs/afs: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, drivers/media: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. Unlike wake_up_atomic_t(), wake_up_var() will issue the wakeup even if the variable is not 0. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanimir Varbanov <stanimir.varbanov@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new ↵Peter Zijlstra
wait_var_event() API The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. Unlike wake_up_atomic_t(), wake_up_var() will issue the wakeup even if the variable is not 0. No change in functionality. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/wait: Introduce wait_var_event()Peter Zijlstra
As a replacement for the wait_on_atomic_t() API provide the wait_var_event() API. The wait_var_event() API is based on the very same hashed-waitqueue idea, but doesn't care about the type (atomic_t) or the specific condition (atomic_read() == 0). IOW. it's much more widely applicable/flexible. It shares all the benefits/disadvantages of a hashed-waitqueue approach with the existing wait_on_atomic_t/wait_on_bit() APIs. The API is modeled after the existing wait_event() API, but instead of taking a wait_queue_head, it takes an address. This addresses is hashed to obtain a wait_queue_head from the bit_wait_table. Similar to the wait_event() API, it takes a condition expression as second argument and will wait until this expression becomes true. The following are (mostly) identical replacements: wait_on_atomic_t(&my_atomic, atomic_t_wait, TASK_UNINTERRUPTIBLE); wake_up_atomic_t(&my_atomic); wait_var_event(&my_atomic, !atomic_read(&my_atomic)); wake_up_var(&my_atomic); The only difference is that wake_up_var() is an unconditional wakeup and doesn't check the previously hard-coded (atomic_read() == 0) condition here. This is of little concequence, since most callers are already conditional on atomic_dec_and_test() and the ones that are not, are trivial to make so. Tested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/fair: Update util_est only on util_avg updatesPatrick Bellasi
The estimated utilization of a task is currently updated every time the task is dequeued. However, to keep overheads under control, PELT signals are effectively updated at maximum once every 1ms. Thus, for really short running tasks, it can happen that their util_avg value has not been updates since their last enqueue. If such tasks are also frequently running tasks (e.g. the kind of workload generated by hackbench) it can also happen that their util_avg is updated only every few activations. This means that updating util_est at every dequeue potentially introduces not necessary overheads and it's also conceptually wrong if the util_avg signal has never been updated during a task activation. Let's introduce a throttling mechanism on task's util_est updates to sync them with util_avg updates. To make the solution memory efficient, both in terms of space and load/store operations, we encode a synchronization flag into the LSB of util_est.enqueued. This makes util_est an even values only metric, which is still considered good enough for its purpose. The synchronization bit is (re)set by __update_load_avg_se() once the PELT signal of a task has been updated during its last activation. Such a throttling mechanism allows to keep under control util_est overheads in the wakeup hot path, thus making it a suitable mechanism which can be enabled also on high-intensity workload systems. Thus, this now switches on by default the estimation utilization scheduler feature. Suggested-by: Chris Redpath <chris.redpath@arm.com> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-5-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/cpufreq/schedutil: Use util_est for OPP selectionPatrick Bellasi
When schedutil looks at the CPU utilization, the current PELT value for that CPU is returned straight away. In certain scenarios this can have undesired side effects and delays on frequency selection. For example, since the task utilization is decayed at wakeup time, a long sleeping big task newly enqueued does not add immediately a significant contribution to the target CPU. This introduces some latency before schedutil will be able to detect the best frequency required by that task. Moreover, the PELT signal build-up time is a function of the current frequency, because of the scale invariant load tracking support. Thus, starting from a lower frequency, the utilization build-up time will increase even more and further delays the selection of the actual frequency which better serves the task requirements. In order to reduce these kind of latencies, we integrate the usage of the CPU's estimated utilization in the sugov_get_util function. This allows to properly consider the expected utilization of a CPU which, for example, has just got a big task running after a long sleep period. Ultimately this allows to select the best frequency to run a task right after its wake-up. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-4-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/fair: Use util_est in LB and WU pathsPatrick Bellasi
When the scheduler looks at the CPU utilization, the current PELT value for a CPU is returned straight away. In certain scenarios this can have undesired side effects on task placement. For example, since the task utilization is decayed at wakeup time, when a long sleeping big task is enqueued it does not add immediately a significant contribution to the target CPU. As a result we generate a race condition where other tasks can be placed on the same CPU while it is still considered relatively empty. In order to reduce this kind of race conditions, this patch introduces the required support to integrate the usage of the CPU's estimated utilization in the wakeup path, via cpu_util_wake(), as well as in the load-balance path, via cpu_util() which is used by update_sg_lb_stats(). The estimated utilization of a CPU is defined to be the maximum between its PELT's utilization and the sum of the estimated utilization (at previous dequeue time) of all the tasks currently RUNNABLE on that CPU. This allows to properly represent the spare capacity of a CPU which, for example, has just got a big task running since a long sleep period. Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-3-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20sched/fair: Add util_est on top of PELTPatrick Bellasi
The util_avg signal computed by PELT is too variable for some use-cases. For example, a big task waking up after a long sleep period will have its utilization almost completely decayed. This introduces some latency before schedutil will be able to pick the best frequency to run a task. The same issue can affect task placement. Indeed, since the task utilization is already decayed at wakeup, when the task is enqueued in a CPU, this can result in a CPU running a big task as being temporarily represented as being almost empty. This leads to a race condition where other tasks can be potentially allocated on a CPU which just started to run a big task which slept for a relatively long period. Moreover, the PELT utilization of a task can be updated every [ms], thus making it a continuously changing value for certain longer running tasks. This means that the instantaneous PELT utilization of a RUNNING task is not really meaningful to properly support scheduler decisions. For all these reasons, a more stable signal can do a better job of representing the expected/estimated utilization of a task/cfs_rq. Such a signal can be easily created on top of PELT by still using it as an estimator which produces values to be aggregated on meaningful events. This patch adds a simple implementation of util_est, a new signal built on top of PELT's util_avg where: util_est(task) = max(task::util_avg, f(task::util_avg@dequeue)) This allows to remember how big a task has been reported by PELT in its previous activations via f(task::util_avg@dequeue), which is the new _task_util_est(struct task_struct*) function added by this patch. If a task should change its behavior and it runs longer in a new activation, after a certain time its util_est will just track the original PELT signal (i.e. task::util_avg). The estimated utilization of cfs_rq is defined only for root ones. That's because the only sensible consumer of this signal are the scheduler and schedutil when looking for the overall CPU utilization due to FAIR tasks. For this reason, the estimated utilization of a root cfs_rq is simply defined as: util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued) where: cfs_rq::util_est::enqueued = sum(_task_util_est(task)) for each RUNNABLE task on that root cfs_rq It's worth noting that the estimated utilization is tracked only for objects of interests, specifically: - Tasks: to better support tasks placement decisions - root cfs_rqs: to better support both tasks placement decisions as well as frequencies selection Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20Merge branch 'linus' into sched/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20locking/mutex: Improve documentationMatthew Wilcox
On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: > My memory is weak and our documentation is awful. What does > mutex_lock_killable() actually do and how does it differ from > mutex_lock_interruptible()? Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword the kernel-doc for mutex_lock_interruptible(). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cl@linux.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/20180315115812.GA9949@bombadil.infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20x86/boot/64: Verify alignment of the LOAD segmentH.J. Lu
Since the x86-64 kernel must be aligned to 2MB, refuse to boot the kernel if the alignment of the LOAD segment isn't a multiple of 2MB. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOrR7xSJgUfiCoZLuqWUwymRxXPoGBW38%2BpN%3D9g%2ByKNhZw@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20x86/build/64: Force the linker to use 2MB page sizeH.J. Lu
Binutils 2.31 will enable -z separate-code by default for x86 to avoid mixing code pages with data to improve cache performance as well as security. To reduce x86-64 executable and shared object sizes, the maximum page size is reduced from 2MB to 4KB. But x86-64 kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to linker to force 2MB page size regardless of the default page size used by linker. Tested with Linux kernel 4.15.6 on x86-64. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/CAMe9rOp4_%3D_8twdpTyAP2DhONOCeaTOsniJLoppzhoNptL8xzA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-19scsi: iscsi_tcp: set BDI_CAP_STABLE_WRITES when data digest enabledJianchao Wang
iscsi tcp will first send out data, then calculate and send data digest. If we don't have BDI_CAP_STABLE_WRITES, the page cache will be written in spite of the on going writeback. Consequently, wrong digest will be got and sent to target. To fix this, set BDI_CAP_STABLE_WRITES when data digest is enabled in iscsi_tcp .slave_configure callback. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Acked-by: Chris Leech <cleech@redhat.com> Acked-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-03-19scsi: sd: Remember that READ CAPACITY(16) succeededMartin K. Petersen
The USB storage glue sets the try_rc_10_first flag in an attempt to avoid wedging poorly implemented legacy USB devices. If the device capacity is too large to be expressed in the provided response buffer field of READ CAPACITY(10), a well-behaved device will set the reported capacity to 0xFFFFFFFF. We will then attempt to issue a READ CAPACITY(16) to obtain the real capacity. Since this part of the discovery logic is not covered by the first_scan flag, a warning will be printed a couple of times times per revalidate attempt if we upgrade from READ CAPACITY(10) to READ CAPACITY(16). Remember that we have successfully issued READ CAPACITY(16) so we can take the fast path on subsequent revalidate attempts. Reported-by: Menion <menion@gmail.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-03-20regulator: giving regulator controlling gpios a non-empty label when used ↵Nicholas Lowell
through the devicetree. When the label is empty, it causes missing information and limits diagnostics for instances such as 'cat /sys/kernel/debug/gpio' Setting the label to the regulator supply_name will point to the device using the gpio(s). Signed-off-by: Nicholas Lowell <nlowell@lexmark.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-03-19Merge branch 'phy-relax-error-checking'David S. Miller
Grygorii Strashko says: ==================== net: phy: relax error checking when creating sysfs link netdev->phydev Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per one netdevice, as result such drivers will produce warning during system boot and fail to connect second phy to netdevice when PHYLIB framework will try to create sysfs link netdev->phydev for second PHY in phy_attach_direct(), because sysfs link with the same name has been created already for the first PHY. As result, second CPSW external port will became unusable. This regression was introduced by commits: 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for attached_dev/phydev" a3995460491d ("net: phy: Relax error checking on sysfs_create_link()" Patch 1: exports sysfs_create_link_nowarn() function as preparation for Patch 2. Patch 2: relaxes error checking when PHYLIB framework is creating sysfs link netdev->phydev in phy_attach_direct(), suppresses warning by using sysfs_create_link_nowarn() and adds error message instead, so links creation failure is not fatal any more and system can continue working, which fixes TI CPSW issue and makes boot logs accessible in case of NFS boot, for example. This can be stable material 4.13+. Changes in v2: - commit messages updated. v1: https://patchwork.ozlabs.org/cover/886058/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19net: phy: relax error checking when creating sysfs link netdev->phydevGrygorii Strashko
Some ethernet drivers (like TI CPSW) may connect and manage >1 Net PHYs per one netdevice, as result such drivers will produce warning during system boot and fail to connect second phy to netdevice when PHYLIB framework will try to create sysfs link netdev->phydev for second PHY in phy_attach_direct(), because sysfs link with the same name has been created already for the first PHY. As result, second CPSW external port will became unusable. Fix it by relaxing error checking when PHYLIB framework is creating sysfs link netdev->phydev in phy_attach_direct(), suppressing warning by using sysfs_create_link_nowarn() and adding error message instead. After this change links (phy->netdev and netdev->phy) creation failure is not fatal any more and system can continue working, which fixes TI CPSW issue. Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Fixes: a3995460491d ("net: phy: Relax error checking on sysfs_create_link()") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19sysfs: symlink: export sysfs_create_link_nowarn()Grygorii Strashko
The sysfs_create_link_nowarn() is going to be used in phylib framework in subsequent patch which can be built as module. Hence, export sysfs_create_link_nowarn() to avoid build errors. Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Fixes: a3995460491d ("net: phy: Relax error checking on sysfs_create_link()") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20spi: rspi: use correct enum for DMA transfer directionStefan Agner
Use enum dma_transfer_direction as required by dmaengine_prep_slave_sg instead of enum dma_data_direction. This won't change behavior in practice as the enum values are equivalent. This fixes two warnings when building with clang: drivers/spi/spi-rspi.c:538:26: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion] rx->sgl, rx->nents, DMA_FROM_DEVICE, ^~~~~~~~~~~~~~~ drivers/spi/spi-rspi.c:558:26: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion] tx->sgl, tx->nents, DMA_TO_DEVICE, ^~~~~~~~~~~~~ Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-03-19drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.Dhinakaran Pandiyan
If bios sets up an MST output and hardware state readout code sees this is an SST configuration, when disabling the encoder we end up calling ->post_disable_dp() hook instead of the MST version. Consequently, we write to the DP_SET_POWER dpcd to set it D3 state. Further along when we try enable the encoder in MST mode, POWER_UP_PHY transaction fails to power up the MST hub. This results in continuous link training failures which keep the system busy delaying boot. We could identify bios MST boot discrepancy and handle it accordingly but a simple way to solve this is to write to the DP_SET_POWER dpcd for MST too. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105470 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reported-by: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Fixes: 5ea2355a100a ("drm/i915/mst: Use MST sideband message transactions for dpms control") Tested-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180314054825.1718-1-dhinakaran.pandiyan@intel.com (cherry picked from commit ad260ab32a4d94fa974f58262f8000472d34fd5b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2018-03-19Merge branch 'for-4.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two commits to fix the following subtle cgroup2 behavior bugs: - cpu.max was rejecting config when it shouldn't - thread mode enable was allowed when it shouldn't" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix rule checking for threaded mode switching sched, cgroup: Don't reject lower cpu.max on ancestors
2018-03-19ACPI / watchdog: Fix off-by-one error at resource assignmentTakashi Iwai
The resource allocation in WDAT watchdog has off-one-by error, it sets one byte more than the actual end address. This may eventually lead to unexpected resource conflicts. Fixes: 058dfc767008 (ACPI / watchdog: Add support for WDAT hardware watchdog) Cc: 4.9+ <stable@vger.kernel.org> # 4.9+ Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-03-19Merge branch 'for-4.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Two low-impact workqueue commits. One fixes workqueue creation error path and the other removes the unused cancel_work()" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove unused cancel_work() workqueue: use put_device() instead of kfree()
2018-03-19Merge branch 'for-4.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu fixes from Tejun Heo: "Late percpu pull request for v4.16-rc6. - percpu allocator pool replenishing no longer triggers OOM or warning messages. Also, the alloc interface now understands __GFP_NORETRY and __GFP_NOWARN. This is to allow avoiding OOMs from userland triggered actions like bpf map creation. Also added cond_resched() in alloc loop. - perpcu allocation now can be interrupted by kill sigs to avoid deadlocking OOM killer. - Added Dennis Zhou as a co-maintainer. He has rewritten the area map allocator, understands most of the code base and has been responsive for all bug reports" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() percpu: include linux/sched.h for cond_resched() percpu: add a schedule point in pcpu_balance_workfn() percpu: allow select gfp to be passed to underlying allocators percpu: add __GFP_NORETRY semantics to the percpu balancing path percpu: match chunk allocator declarations with definitions percpu: add Dennis Zhou as a percpu co-maintainer
2018-03-19Merge branch 'for-4.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "I sat on them too long and it's quite a few this late, but nothing has a wide blast area. The changes are... - Fix corner cases in SG command handling. - Recent introduction of default powersaving mode config option exposed several devices with broken powersaving behaviors. A number of patches to update the blacklist accordingly. - Fix a kernel panic on SAS hotplug. - Other misc and device specific updates" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata: Modify quirks for MX100 to limit NCQ_TRIM quirk to MU01 version libata: Make Crucial BX100 500GB LPM quirk apply to all firmware versions libata: Apply NOLPM quirk to Crucial M500 480 and 960GB SSDs libata: Enable queued TRIM for Samsung SSD 860 PCI: Add function 1 DMA alias quirk for Highpoint RocketRAID 644L ahci: Add PCI-id for the Highpoint Rocketraid 644L card ata: do not schedule hot plug if it is a sas host libata: disable LPM for Crucial BX100 SSD 500GB drive libata: Apply NOLPM quirk to Crucial MX100 512GB SSDs libata: update documentation for sysfs interfaces ata: sata_rcar: Remove unused variable in sata_rcar_init_controller() libata: transport: cleanup documentation of sysfs interface sata_rcar: Reset SATA PHY when Salvator-X board resumes libata: don't try to pass through NCQ commands to non-NCQ devices libata: remove WARN() for DMA or PIO command without data libata: fix length validation of ATAPI-relayed SCSI commands ata: libahci: fix comment indentation ahci: Add check for device presence (PCIe hot unplug) in ahci_stop_engine() libata: Fix compile warning with ATA_DEBUG enabled
2018-03-19drm/sun4i: backend: Support YUV planesMaxime Ripard
Now that we have the guarantee that we will have only a single YUV plane, actually support them. The way it works is not really straightforward, since we first need to enable the YUV mode in the plane that we want to setup, and then we have a few registers to setup the YUV buffer and parameters. We also need to setup the color correction to actually have something displayed. Reviewed-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Link: https://patchwork.freedesktop.org/patch/msgid/66088c1398bd3189123f28a89a7ccc669fe9f296.1519931807.git-series.maxime.ripard@bootlin.com
2018-03-19nfsd: remove blocked locks on client teardownJeff Layton
We had some reports of panics in nfsd4_lm_notify, and that showed a nfs4_lockowner that had outlived its so_client. Ensure that we walk any leftover lockowners after tearing down all of the stateids, and remove any blocked locks that they hold. With this change, we also don't need to walk the nbl_lru on nfsd_net shutdown, as that will happen naturally when we tear down the clients. Fixes: 76d348fadff5 (nfsd: have nfsd4_lock use blocking locks for v4.1+ locks) Reported-by: Frank Sorenson <fsorenso@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # 4.9 Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-03-19Merge branch 'bpf-sockmap-ulp'Daniel Borkmann
John Fastabend says: ==================== This series adds a BPF hook for sendmsg and senfile by using the ULP infrastructure and sockmap. A simple pseudocode example would be, // load the programs bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog); // lookup the sockmap bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map"); // get fd for sockmap map_fd_msg = bpf_map__fd(bpf_map_msg); // attach program to sockmap bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0); // Add a socket 'fd' to sockmap at location 'i' bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY); After the above snippet any socket attached to the map would run msg_prog on sendmsg and sendfile system calls. Three additional helpers are added bpf_msg_apply_bytes(), bpf_msg_cork_bytes(), and bpf_msg_pull_data(). With bpf_msg_apply_bytes BPF programs can tell the infrastructure how many bytes the given verdict should apply to. This has two cases. First, a BPF program applies verdict to fewer bytes than in the current sendmsg/sendfile msg this will apply the verdict to the first N bytes of the message then run the BPF program again with data pointers recalculated to the N+1 byte. The second case is the BPF program applies a verdict to more bytes than the current sendmsg or sendfile system call. In this case the infrastructure will cache the verdict and apply it to future sendmsg/sendfile calls until the byte limit is reached. This avoids the overhead of running BPF programs on large payloads. The helper bpf_msg_cork_bytes() handles a different case where a BPF program can not reach a verdict on a msg until it receives more bytes AND the program doesn't want to forward the packet until it is known to be "good". The example case being a user (albeit a dumb one probably) sends a N byte header in 1B system calls. The BPF program can call bpf_msg_cork_bytes with the required byte limit to reach a verdict and then the program will only be called again once N bytes are received. The last helper added in this series is bpf_msg_pull_data(). It is used to pull data in for modification or reading. Similar to how sk_pull_data() works msg_pull_data can be used to access data not in the initial (data_start, data_end) range. For sendpage() calls this is needed if any data is accessed because the BPF sendpage hook initializes the data_start and data_end pointers to zero. We do this because sendpage data is shared with the user and can be modified during or after the BPF verdict possibly invalidating any verdict the BPF program decides. For sendmsg the data is already copied by the sendmsg bpf infrastructure so we only copy the data if the user request a data range that is not already linearized. This happens if the user requests larger blocks of data that are not in a single scatterlist element. The common case seems to be accessing headers which normally are in the first scatterlist element and already linearized. For more examples please review the sample program. There are examples for all the actions and helpers there. Patches 1-8 implement the above sockmap/BPF infrastructure. The remaining patches flush out some minimal selftests and the sample sockmap program. The sockmap sample program is the main vehicle for testing this infrastructure and will be moved into selftests shortly. The final patch in this series is a simple shell script to run a set of tests. These are the tests I run after any changes to sockmap. The next task on the list after this series is to push those into selftests so we can avoid manually testing. Couple notes on future items in the pipeline, 0. move sample sockmap programs into selftests (noted above) 1. add additional support for tcp flags, most are ignored now. 2. add a Documentation/bpf/sockmap file with these details 3. support stacked ULP types to allow this and ktls to cooperate 4. Ingress flag support, redirect only supports egress here. The other redirect helpers support ingress and egress flags. 5. add optimizations, I cut a few optimizations here in the first iteration of the code for later study/implementation -v3 updates : u32 data pointers in msg_md changed to void * : page_address NULL check and flag verification in msg_pull_data : remove old note in commit msg that is no longer relevant : remove enum sk_msg_action its not used anywhere : fixup test_verifier W -> DW insn to account for data pointers : unintentionally dropped a smap_stop_tx() call in sockmap.c I propagated the ACKs forward because above changes were small one/two line changes. -v2 updates (discussion): Dave noticed that sendpage call was previously (in v1) running on the data directly. This allowed users to potentially modify the data after or during the BPF program. However doing a copy automatically even if the data is not accessed has measurable performance impact. So we added another helper modeled after the existing skb_pull_data() helper to allow users to selectively pull data from the msg. This is also useful in the sendmsg case when users need to access data outside the first scatterlist element or across scatterlist boundaries. While doing this I also unified the sendmsg and sendfile handlers a bit. Originally the sendfile call was optimized for never touching the data. I've decided for a first submission to drop this optimization and we can add it back later. It introduced unnecessary complexity, at least for a first posting, for a use case I have not entirely flushed out yet. When the use case is deployed we can add it back if needed. Then we can review concrete performance deltas as well on real-world use-cases/applications. Lastly, I reorganized the patches a bit. Now all sockmap changes are in a single patch and each helper gets its own patch. This, at least IMO, makes it easier to review because sockmap changes are not spread across the patch series. On the other hand now apply_bytes, cork_bytes logic is only activated later in the series. But that should be OK. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>