summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-02-09thermal: rcar: enable to use thermal-zone on DTKuninori Morimoto
This patch enables to use thermal-zone on DT if it was calles as "renesas,rcar-thermal-gen2". Previous style (= non thermal-zone) is still supported by "renesas,rcar-thermal" to keep compatibility for "git bisect". Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-02-09thermal: of: use for_each_available_child_of_node for child iteratorLaxman Dewangan
Use for_each_available_child_of_node() for iterating over each available child instead of iterating over each child and then checking their status. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-02-09Revert "workqueue: make sure delayed work run in local cpu"Tejun Heo
This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76. Workqueue used to implicity guarantee that work items queued without explicit CPU specified are put on the local CPU. Recent changes in timer broke the guarantee and led to vmstat breakage which was fixed by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU we need it to run on"). vmstat is the most likely to expose the issue and it's quite possible that there are other similar problems which are a lot more difficult to trigger. As a preventive measure, 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu") was applied to restore the local CPU guarnatee. Unfortunately, the change exposed a bug in timer code which got fixed by 22b886dd1018 ("timers: Use proper base migration in add_timer_on()"). Due to code restructuring, the commit couldn't be backported beyond certain point and stable kernels which only had 874bbfe600a6 started crashing. The local CPU guarantee was accidental more than anything else and we want to get rid of it anyway. As, with the vmstat case fixed, 874bbfe600a6 is causing more problems than it's fixing, it has been decided to take the chance and officially break the guarantee by reverting the commit. A debug feature will be added to force foreign CPU assignment to expose cases relying on the guarantee and fixes for the individual cases will be backported to stable as necessary. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu") Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz Cc: stable@vger.kernel.org Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shli@fb.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Michal Hocko <mhocko@kernel.org>
2016-02-09spi: spi-ti-qspi: add mmap mode read supportVignesh R
ti-qspi controller provides mmap port to read data from SPI flashes. mmap port is enabled in QSPI_SPI_SWITCH_REG. ctrl module register may also need to be accessed for some SoCs. The QSPI_SPI_SETUP_REGx needs to be populated with flash specific information like read opcode, read mode(quad, dual, normal), address width and dummy bytes. Once, controller is in mmap mode, the whole flash memory is available as a memory region at SoC specific address. This region can be accessed using normal memcpy() (or mem-to-mem dma copy). The ti-qspi controller hardware will internally communicate with SPI flash over SPI bus and get the requested data. Implement spi_flash_read() callback to support mmap read over SPI flash devices. With this, the read throughput increases from ~100kB/s to ~2.5 MB/s. Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09spi: introduce accelerated read support for spi flash devicesVignesh R
In addition to providing direct access to SPI bus, some spi controller hardwares (like ti-qspi) provide special port (like memory mapped port) that are optimized to improve SPI flash read performance. This means the controller can automatically send the SPI signals required to read data from the SPI flash device. For this, SPI controller needs to know flash specific information like read command to use, dummy bytes and address width. Introduce spi_flash_read() interface to support accelerated read over SPI flash devices. SPI master drivers can implement this callback to support interfaces such as memory mapped read etc. m25p80 flash driver and other flash drivers can call this make use of such interfaces. The interface should only be used with SPI flashes and cannot be used with other SPI devices. Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09block: fix module reference leak on put_disk() call for cgroups throttleRoman Pen
get_disk(),get_gendisk() calls have non explicit side effect: they increase the reference on the disk owner module. The following is the correct sequence how to get a disk reference and to put it: disk = get_gendisk(...); /* use disk */ owner = disk->fops->owner; put_disk(disk); module_put(owner); fs/block_dev.c is aware of this required module_put() call, but f.e. blkg_conf_finish(), which is located in block/blk-cgroup.c, does not put a module reference. To see a leakage in action cgroups throttle config can be used. In the following script I'm removing throttle for /dev/ram0 (actually this is NOP, because throttle was never set for this device): # lsmod | grep brd brd 5175 0 # i=100; while [ $i -gt 0 ]; do echo "1:0 0" > \ /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device; i=$(($i - 1)); \ done # lsmod | grep brd brd 5175 100 Now brd module has 100 references. The issue is fixed by calling module_put() just right away put_disk(). Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Cc: Gi-Oh Kim <gi-oh.kim@profitbricks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-09spi: core: add spi_split_transfers_maxsizeMartin Sperl
Add spi_split_transfers_maxsize method that splits spi_transfers transparently into multiple transfers that are below the given max-size. This makes use of the spi_res framework via spi_replace_transfers to allocate/free the extra transfers as well as reverting back the changes applied while processing the spi_message. Signed-off-by: Martin Sperl <kernel@martin.sperl.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09spi: core: add spi_replace_transfers methodMartin Sperl
Add the spi_replace_transfers method that can get used to replace some spi_transfers from a spi_message with other transfers. Signed-off-by: Martin Sperl <kernel@martin.sperl.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09spi: core: added spi_resource managementMartin Sperl
SPI resource management framework used while processing a spi_message via the spi-core. The basic idea is taken from devres, but as the allocation may happen fairly frequently, some provisioning (in the form of an unused spi_device pointer argument to spi_res_alloc) has been made so that at a later stage we may implement reuse objects allocated earlier avoiding the repeated allocation by keeping a cache of objects that we can reuse. This framework can get used for: * rewriting spi_messages * to fullfill alignment requirements of the spi_master HW * to fullfill transfer length requirements (e.g: transfers need to be less than 64k) * consolidate spi_messages with multiple transfers into a single transfer when the total transfer length is below a threshold. * reimplement spi_unmap_buf without explicitly needing to check if it has been mapped Signed-off-by: Martin Sperl <kernel@martin.sperl.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09spi: pxa2xx: Add support for both chip selects on Intel BraswellMika Westerberg
Intel Braswell LPSS SPI controller actually has two chip selects and there is no capabilities register where this could be found out. These two chip selects are controlled by bits which are in slightly differrent location than Broxton has. Braswell Windows driver also starts chip select (ACPI DeviceSelection) numbering from 1 so translate it to be suitable for Linux as well. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09spi: pxa2xx: Move chip select control bits into lpss_config structureMika Westerberg
Some Intel LPSS SPI controllers, like the one in Braswell has these bits in a different location so move these bits to be part of the LPSS configuration. Since not all LPSS SPI controllers support multiple native chip selects we refactor selecting chip select to its own function and check control->cs_sel_mask before switching to another chip select. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09spi: pxa2xx: Translate ACPI DeviceSelection to Linux chip select on BaytrailMika Westerberg
The Windows Baytrail SPI host controller driver uses 1 as the first (and only) value for ACPI DeviceSelection like can be seen in DSDT taken from Lenovo Thinkpad 10: Device (FPNT) { ... Method (_CRS, 0, NotSerialized) // _CRS: Current Resource Settings { Name (UBUF, ResourceTemplate () { SpiSerialBus (0x0001, // DeviceSelection PolarityLow, FourWireMode, 0x08, ControllerInitiated, 0x007A1200, ClockPolarityLow, ClockPhaseFirst, "\\_SB.SPI1", 0x00, ResourceConsumer,,) This will fail to enumerate in Linux with following error: [ 0.241296] pxa2xx-spi 80860F0E:00: cs1 >= max 1 [ 0.241312] spi_master spi32766: failed to add SPI device VFSI6101:00 from ACPI To make the Linux SPI core successfully enumerate the device we provide a custom version of ->fw_translate_cs() that translates DeviceSelection correctly. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09Merge branches 'pci/host-designware', 'pci/host-imx6', 'pci/host-layerscape' ↵Bjorn Helgaas
and 'pci/host-rcar' into next * pci/host-designware: PCI: designware: Remove PCI_PROBE_ONLY handling PCI: designware: Explain why we don't program ATU for some platforms * pci/host-imx6: PCI: imx6: Move link up check into imx6_pcie_wait_for_link() PCI: imx6: Remove broken Gen2 workaround PCI: imx6: Move PHY reset into imx6_pcie_establish_link() PCI: imx6: Move imx6_pcie_reset_phy() near other PHY handling functions * pci/host-layerscape: PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID * pci/host-rcar: PCI: rcar: Remove PCI_PROBE_ONLY handling
2016-02-09Merge branches 'pci/aer', 'pci/misc' and 'pci/virtualization' into nextBjorn Helgaas
* pci/aer: PCI/AER: Use list_first_entry_or_null() to simplify code PCI/AER: Restore pci_ops pointer while calling original pci_ops PCI/AER: Rename pci_ops_aer to aer_inj_pci_ops * pci/misc: PCI: Remove includes of asm/pci-bridge.h PCI: Remove empty asm-generic/pci-bridge.h ARM64: PCI: Remove generated include of asm-generic/pci-bridge.h PCI: Remove includes of empty asm-generic/pci-bridge.h PCI: Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h PCI/PME: Restructure pcie_pme_suspend() to prevent compiler warning PCI/PME: Remove redundant port lookup PCI: Check device_attach() return value always * pci/virtualization: PCI: Add ACS quirk for all Cavium devices
2016-02-09PCI: designware: Remove PCI_PROBE_ONLY handlingLorenzo Pieralisi
The PCIe designware host driver is not used in system configurations requiring the PCI_PROBE_ONLY flag to be set to prevent resources assignment, therefore the driver code handling the flag can be removed from the kernel. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Pratyush Anand <pratyush.anand@gmail.com> Acked-by: Jingoo Han Jingoo Han <jingoohan1@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Gabriele Paoloni <gabriele.paoloni@huawei.com> Cc: Zhou Wang <wangzhou1@hisilicon.com>
2016-02-09PCI: designware: Explain why we don't program ATU for some platformsJisheng Zhang
Some platforms don't support ATU, e.g., pci-keystone.c. These platforms use their own address translation component rather than ATU, and they provide the rd_other_conf and wr_other_conf methods to program the translation component and perform the access. Add a comment to explain why we don't program the ATU for these platforms. [bhelgaas: changelog] Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-02-09Merge branch 'topic/acpi' of ↵Mark Brown
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into spi-pxa2xx
2016-02-09spi: Let drivers translate ACPI DeviceSelection to suitable Linux chip selectMika Westerberg
In Windows it is up to the SPI host controller driver to handle the ACPI DeviceSelection as it likes. The SPI core does not take any part in it. This is different in Linux because we always expect to have chip select in range of 0 .. master->num_chipselect - 1. In order to support this in Linux we need a way to allow the driver to translate between ACPI DeviceSelection field and Linux chip select number so provide a new optional hook ->fw_translate_cs() that can be used by a driver to handle translation and call this hook if set during SPI slave ACPI enumeration. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: API: - Fix async algif_skcipher, it was broken by recent fixes. - Fix potential race condition in algif_skcipher with ctx. - Fix potential memory corruption in algif_skcipher. - Add missing lock to crypto_user when doing an alg dump. Drivers: - marvell/cesa was testing the wrong variable for NULL after allocation. - Fix potential double-free in atmel-sha. - Fix illegal call to sleepin function from atomic context in atmel-sha" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: marvell/cesa - fix test in mv_cesa_dev_dma_init() crypto: atmel-sha - remove calls of clk_prepare() from atomic contexts crypto: atmel-sha - fix atmel_sha_remove() crypto: algif_skcipher - Do not set MAY_BACKLOG on the async path crypto: algif_skcipher - Do not dereference ctx without socket lock crypto: algif_skcipher - Do not assume that req is unchanged crypto: user - lock crypto_alg_list on alg dump
2016-02-09scripts: add "prune-kernel" script to clean up old kernel imagesJ. Bruce Fields
Long ago, Dave Jones complained about CONFIG_LOCALVERSION_AUTO: "I don't use the auto config, because I end up filling up /boot unless I go through and clean them out by hand every time I install a new one (which I do probably a dozen or so times a day). Is there some easy way to prune old builds I'm missing?" To which Bruce replied: "I run this by hand every now and then. I'm probably doing it all wrong" And if he is running it wrong, then so am I - because I've been using this script ever since. It is true that CONFIG_LOCALVERSION_AUTO easily ends up filling your /boot partition if you don't clean up old versions regularly, and this script helps make that easier. Checked with Bruce to see that it's fine to add this to the kernel scripts. Maybe people will come up with enhancements, but more importantly, this way I won't misplace this script whenever I install a new machine and start doing custom kernels for it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-09arm64: disable kasan when accessing frame->fp in unwind_frameYang Shi
When boot arm64 kernel with KASAN enabled, the below error is reported by kasan: BUG: KASAN: out-of-bounds in unwind_frame+0xec/0x260 at addr ffffffc064d57ba0 Read of size 8 by task pidof/499 page:ffffffbdc39355c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 2 PID: 499 Comm: pidof Not tainted 4.5.0-rc1 #119 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Call trace: [<ffffffc00008d078>] dump_backtrace+0x0/0x290 [<ffffffc00008d32c>] show_stack+0x24/0x30 [<ffffffc0006a981c>] dump_stack+0x8c/0xd8 [<ffffffc0002e4400>] kasan_report_error+0x558/0x588 [<ffffffc0002e4958>] kasan_report+0x60/0x70 [<ffffffc0002e3188>] __asan_load8+0x60/0x78 [<ffffffc00008c92c>] unwind_frame+0xec/0x260 [<ffffffc000087e60>] get_wchan+0x110/0x160 [<ffffffc0003b647c>] do_task_stat+0xb44/0xb68 [<ffffffc0003b7730>] proc_tgid_stat+0x40/0x50 [<ffffffc0003ac840>] proc_single_show+0x88/0xd8 [<ffffffc000345be8>] seq_read+0x370/0x770 [<ffffffc00030aba0>] __vfs_read+0xc8/0x1d8 [<ffffffc00030c0ec>] vfs_read+0x94/0x168 [<ffffffc00030d458>] SyS_read+0xb8/0x128 [<ffffffc000086530>] el0_svc_naked+0x24/0x28 Memory state around the buggy address: ffffffc064d57a80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 f4 f4 ffffffc064d57b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffc064d57b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ ffffffc064d57c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc064d57c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Since the shadow byte pointed by the report is 0, so it may mean it is just hit oob in non-current task. So, disable the instrumentation to silence these warnings. Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-09nvme: fix Kconfig description for BLK_DEV_NVME_SCSIChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-09kernel/fs: fix I/O wait not accounted for RW O_DSYNCStephane Gasparini
When a process is doing Random Write with O_DSYNC flag the I/O wait are not accounted in the kernel (get_cpu_iowait_time_us). This is preventing the governor or the cpufreq driver to account for I/O wait and thus use the right pstate Signed-off-by: Stephane Gasparini <stephane.gasparini@linux.intel.com> Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-09MIPS: Fix early CM probingPaul Burton
Commit c014d164f21d ("MIPS: Add platform callback before initializing the L2 cache") added a platform_early_l2_init function in order to allow platforms to probe for the CM before L2 initialisation is performed, so that CM GCRs are available to mips_sc_probe. That commit actually fails to do anything useful, since it checks mips_cm_revision to determine whether it should call mips_cm_probe but the result of mips_cm_revision will always be 0 until mips_cm_probe has been called. Thus the "early" mips_cm_probe call never occurs. Fix this & drop the useless weak platform_early_l2_init function by simply calling mips_cm_probe from setup_arch. For platforms that don't select CONFIG_MIPS_CM this will be a no-op, and for those that do it removes the requirement for them to call mips_cm_probe manually (although doing so isn't harmful for now). Signed-off-by: Paul Burton <paul.burton@imgtec.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Cc: Andrzej Hajda <a.hajda@samsung.com> Cc: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Rob Herring <robh@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com> Cc: Jaedon Shin <jaedon.shin@gmail.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jonas Gorski <jogo@openwrt.org> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/12475/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-02-09regmap: irq: dispose all virtual irq before removing domainLaxman Dewangan
It is require to dispose all virtual irq of hwirq on chip created on given irq domain before removing this irq domain. Hence dispose all mapped irqs before deleting the irq domains in regmap_del_irq_chip(); Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com> Tested-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2016-02-09KVM: x86: consolidate different ways to test for in-kernel LAPICPaolo Bonzini
Different pieces of code checked for vcpu->arch.apic being (non-)NULL, or used kvm_vcpu_has_lapic (more optimized) or lapic_in_kernel. Replace everything with lapic_in_kernel's name and kvm_vcpu_has_lapic's implementation. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: x86: consolidate "has lapic" checks into irq.cPaolo Bonzini
Do for kvm_cpu_has_pending_timer and kvm_inject_pending_timer_irqs what the other irq.c routines have been doing. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: APIC: remove unnecessary double checks on APIC existencePaolo Bonzini
Usually the in-kernel APIC's existence is checked in the caller. Do not bother checking it again in lapic.c. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09x86/fpu: Default eagerfpu=on on all CPUsAndy Lutomirski
We have eager and lazy FPU modes, introduced in: 304bceda6a18 ("x86, fpu: use non-lazy fpu restore for processors supporting xsave") The result is rather messy. There are two code paths in almost all of the FPU code, and only one of them (the eager case) is tested frequently, since most kernel developers have new enough hardware that we use eagerfpu. It seems that, on any remotely recent hardware, eagerfpu is a win: glibc uses SSE2, so laziness is probably overoptimistic, and, in any case, manipulating TS is far slower that saving and restoring the full state. (Stores to CR0.TS are serializing and are poorly optimized.) To try to shake out any latent issues on old hardware, this changes the default to eager on all CPUs. If no performance or functionality problems show up, a subsequent patch could remove lazy mode entirely. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/ac290de61bf08d9cfc2664a4f5080257ffc1075a.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/fpu: Speed up lazy FPU restores slightlyAndy Lutomirski
If we have an FPU, there's no need to check CR0 for FPU emulation. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/980004297e233c27066d54e71382c44cdd36ef7c.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/fpu: Fold fpu_copy() into fpu__copy()Andy Lutomirski
Splitting it into two functions needlessly obfuscated the code. While we're at it, improve the comment slightly. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/3eb5a63a9c5c84077b2677a7dfe684eef96fe59e.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/fpu: Fix FNSAVE usage in eagerfpu modeAndy Lutomirski
In eager fpu mode, having deactivated FPU without immediately reloading some other context is illegal. Therefore, to recover from FNSAVE, we can't just deactivate the state -- we need to reload it if we're not actively context switching. We had this wrong in fpu__save() and fpu__copy(). Fix both. __kernel_fpu_begin() was fine -- add a comment. This fixes a warning triggerable with nofxsr eagerfpu=on. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/60662444e13c76f06e23c15c5dcdba31b4ac3d67.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/fpu: Fix math emulation in eager fpu modeAndy Lutomirski
Systems without an FPU are generally old and therefore use lazy FPU switching. Unsurprisingly, math emulation in eager FPU mode is a bit buggy. Fix it. There were two bugs involving kernel code trying to use the FPU registers in eager mode even if they didn't exist and one BUG_ON() that was incorrect. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: Honour passed pgprot in track_pfn_insert() and track_pfn_remap()Matthew Wilcox
track_pfn_insert() overwrites the pgprot that is passed in with a value based on the VMA's page_prot. This is a problem for people trying to do clever things with the new vm_insert_pfn_prot() as it will simply overwrite the passed protection flags. If we use the current value of the pgprot as the base, then it will behave as people are expecting. Also fix track_pfn_remap() in the same way. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1453742717-10326-2-git-send-email-matthew.r.wilcox@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/boot: Use proper array element type in memset() size calculationAlexander Kuleshov
I changed open coded zeroing loops to explicit memset()s in the following commit: 5e9ebbd87a99 ("x86/boot: Micro-optimize reset_early_page_tables()") The base for the size argument of memset was sizeof(pud_p/pmd_p), which are pointers - but the initialized array has pud_t/pmd_t elements. Luckily the two types had the same size, so this did not result in any runtime misbehavior. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1455025494-4063-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09locking/atomics: Update comment about READ_ONCE() and structuresKonrad Rzeszutek Wilk
The comment is out of data. Also point out the performance drawback of the barrier();__builtin_memcpy(); barrier() followed by another copy from stack (__u) to lvalue; Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453757600-11441-1-git-send-email-konrad.wilk@oracle.com [ Made it a bit more readable. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09sched/numa: Spread memory according to CPU and memory useRik van Riel
The pseudo-interleaving in NUMA placement has a fundamental problem: using hard usage thresholds to spread memory equally between nodes can prevent workloads from converging, or keep memory "trapped" on nodes where the workload is barely running any more. In order for workloads to properly converge, the memory migration should not be stopped when nodes reach parity, but instead be distributed according to how heavily memory is used from each node. This way memory migration and task migration reinforce each other, instead of one putting the brakes on the other. Remove the hard thresholds from the pseudo-interleaving code, and instead use a more gradual policy on memory placement. This also seems to improve convergence of workloads that do not run flat out, but sleep in between bursts of activity. We still want to slow down NUMA scanning and migration once a workload has settled on a few actively used nodes, so keep the 3/4 hysteresis in place. Keep track of whether a workload is actively running on multiple nodes, so task_numa_migrate does a full scan of the system for better task placement. In the case of running 3 SPECjbb2005 instances on a 4 node system, this code seems to result in fairer distribution of memory between nodes, with more memory bandwidth for each instance. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: mgorman@suse.de Link: http://lkml.kernel.org/r/20160125170739.2fc9a641@annuminas.surriel.com [ Minor readability tweaks. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()Andy Lutomirski
DMI cacheability is very confused on x86. dmi_early_remap() uses early_ioremap(), which uses FIXMAP_PAGE_IO, which is __PAGE_KERNEL_IO, which is __PAGE_KERNEL, which is cached. Don't ask me why this makes any sense. dmi_remap() uses ioremap(), which requests an uncached mapping. However, on non-EFI systems, the DMI data generally lives between 0xf0000 and 0x100000, which is in the legacy ISA range, which triggers a special case in the PAT code that overrides the cache mode requested by ioremap() and forces a WB mapping. On a UEFI boot, however, the DMI table can live at any physical address. On my laptop, it's around 0x77dd0000. That's nowhere near the legacy ISA range, so the ioremap() implicit uncached type is honored and we end up with a UC- mapping. UC- is a very, very slow way to read from main memory, so dmi_walk() is likely to take much longer than necessary. Given that, even on UEFI, we do early cached DMI reads, it seems safe to just ask for cached access. Switch to ioremap_cache(). I haven't tried to benchmark this, but I'd guess it saves several milliseconds of boot time. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean Delvare <jdelvare@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/3147c38e51f439f3c8911db34c7d4ab22d854915.1453791969.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: If INVPCID is available, use it to flush global mappingsAndy Lutomirski
On my Skylake laptop, INVPCID function 2 (flush absolutely everything) takes about 376ns, whereas saving flags, twiddling CR4.PGE to flush global mappings, and restoring flags takes about 539ns. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: Add a 'noinvpcid' boot option to turn off INVPCIDAndy Lutomirski
This adds a chicken bit to turn off INVPCID in case something goes wrong. It's an early_param() because we do TLB flushes before we parse __setup() parameters. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/mm: Add INVPCID helpersAndy Lutomirski
This adds helpers for each of the four currently-specified INVPCID modes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/kasan: Write protect kasan zero shadowAndrey Ryabinin
After kasan_init() executed, no one is allowed to write to kasan_zero_page, so write protect it. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1452516679-32040-3-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09x86/kasan: Clear kasan_zero_page after TLB flushAndrey Ryabinin
Currently we clear kasan_zero_page before __flush_tlb_all(). This works with current implementation of native_flush_tlb[_global]() because it doesn't cause do any writes to kasan shadow memory. But any subtle change made in native_flush_tlb*() could break this. Also current code seems doesn't work for paravirt guests (lguest). Only after the TLB flush we can be sure that kasan_zero_page is not used as early shadow anymore (instrumented code will not write to it). So it should cleared it only after the TLB flush. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09KVM/VMX: Add host irq information in trace event when updating IRTE for ↵Feng Wu
posted interrupts Add host irq information in trace event, so we can better understand which irq is in posted mode. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: x86: Add lowest-priority support for vt-d posted-interruptsFeng Wu
Use vector-hashing to deliver lowest-priority interrupts for VT-d posted-interrupts. This patch extends kvm_intr_is_single_vcpu() to support lowest-priority handling. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: x86: Use vector-hashing to deliver lowest-priority interruptsFeng Wu
Use vector-hashing to deliver lowest-priority interrupts, As an example, modern Intel CPUs in server platform use this method to handle lowest-priority interrupts. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: Recover IRTE to remapped mode if the interrupt is not single-destinationFeng Wu
When the interrupt is not single destination any more, we need to change back IRTE to remapped mode explicitly. Signed-off-by: Feng Wu <feng.wu@intel.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09KVM: x86: introduce do_shl32_div32Paolo Bonzini
This is similar to the existing div_frac function, but it returns the remainder too. Unlike div_frac, it can be used to implement long division, e.g. (a << 64) / b for 32-bit a and b. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09flow_dissector: Fix unaligned access in __skb_flow_dissector when used by ↵Alexander Duyck
eth_get_headlen This patch fixes an issue with unaligned accesses when using eth_get_headlen on a page that was DMA aligned instead of being IP aligned. The fact is when trying to check the length we don't need to be looking at the flow label so we can reorder the checks to first check if we are supposed to gather the flow label and then make the call to actually get it. v2: Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can cause us to check for the flow label. Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09ALSA: timer: Fix race at concurrent readsTakashi Iwai
snd_timer_user_read() has a potential race among parallel reads, as qhead and qused are updated outside the critical section due to copy_to_user() calls. Move them into the critical section, and also sanitize the relevant code a bit. Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>