linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-08-04	Merge tag 'iommu-fixes-v4.13-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: - fix a scheduling-while-atomic bug in the AMD IOMMU driver. It was found after the checker was enabled earlier. - a fix for the virtual APIC code in the AMD IOMMU driver which delivers device interrupts directly into KVM guests for assigned devices. - fixes for the recently merged lock-less page-table code for ARM. The redundant TLB syncs got reverted and locks added again around the TLB sync code. - fix for error handling in arm_smmu_add_device() - address sanitization fix for arm io-pgtable code * tag 'iommu-fixes-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix schedule-while-atomic BUG in initialization code iommu/amd: Enable ga_log_intr when enabling guest_mode iommu/io-pgtable: Sanitise map/unmap addresses iommu/arm-smmu: Fix the error path in arm_smmu_add_device Revert "iommu/io-pgtable: Avoid redundant TLB syncs" iommu/mtk: Avoid redundant TLB syncs locally iommu/arm-smmu: Reintroduce locking around TLB sync operations
2017-08-04	Merge tag 'mmc-v4.13-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "A couple of mmc fixes intended for v4.13-rc4. MMC core: - Fix NULL pointer dereference for block I/O during hotplug MMC host: - sdhci-of-at91: Fix card detect for non-removable cards" * tag 'mmc-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: block: bypass the queue even if usage is present for hotplug mmc: sdhci-of-at91: force card detect value for non removable devices
2017-08-04	Merge tag 'drm-fixes-for-v4.13-rc4' of ↵	Linus Torvalds
	git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Either my email ate everything or everyone is on holidays, either way all I can find is some lonely AMD fixes" [ Europe might be on vacation, and the Pacific NW is too hot for work. ] * tag 'drm-fixes-for-v4.13-rc4' of git://people.freedesktop.org/~airlied/linux: drm/amdgpu: Use list_del_init in amdgpu_mn_unregister drm/amdgpu: Fix undue fallthroughs in golden registers initialization drm/amdgpu: fix header on gfx9 clear state
2017-08-04	Merge tag 'powerpc-4.13-5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes for recently merged code: - a fix for the _PAGE_DEVMAP support, which was breaking KVM on Power9 radix - avoid a (harmless) lockdep warning in the early SMP code - return failure for some uses of dma_set_mask() rather than falling back to 32-bits - fix stack setup in watchdog soft_nmi_common() to use emergency stack - fix of_irq_to_resource() error check in of_fsl_spi_probe() Two fixes going to stable: - fix saving of Transactional Memory SPRs in core dump - fix __check_irq_replay missing decrementer interrupt And two misc: - fix 64-bit boot wrapper build with non-biarch compiler - work around a POWER9 PMU hang after state-loss idle Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner" * tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix __check_irq_replay missing decrementer interrupt powerpc/perf: POWER9 PMU stops after idle workaround powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check powerpc/64s: Fix stack setup in watchdog soft_nmi_common() powerpc/powernv/pci: Return failure for some uses of dma_set_mask() powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU powerpc/tm: Fix saving of TM SPRs in core dump powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
2017-08-04	sparc64: Fix exception handling in UltraSPARC-III memcpy.	David S. Miller
	Mikael Pettersson reported that some test programs in the strace-4.18 testsuite cause an OOPS. After some debugging it turns out that garbage values are returned when an exception occurs, causing the fixup memset() to be run with bogus arguments. The problem is that two of the exception handler stubs write the successfully copied length into the wrong register. Fixes: ee841d0aff64 ("sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.") Reported-by: Mikael Pettersson <mikpelinux@gmail.com> Tested-by: Mikael Pettersson <mikpelinux@gmail.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04	arm64: avoid overflow in VA_START and PAGE_OFFSET	Nick Desaulniers
	The bitmask used to define these values produces overflow, as seen by this compiler warning: arch/arm64/kernel/head.S:47:8: warning: integer overflow in preprocessor expression #elif (PAGE_OFFSET & 0x1fffff) != 0 ^~~~~~~~~~~ arch/arm64/include/asm/memory.h:52:46: note: expanded from macro 'PAGE_OFFSET' #define PAGE_OFFSET (UL(0xffffffffffffffff) << (VA_BITS - 1)) ~~~~~~~~~~~~~~~~~~ ^ It would be preferrable to use GENMASK_ULL() instead, but it's not set up to be used from assembly (the UL() macro token pastes UL suffixes when not included in assembly sources). Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Yury Norov <ynorov@caviumnetworks.com> Suggested-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-04	arm64: Fix potential race with hardware DBM in ptep_set_access_flags()	Catalin Marinas
	In a system with DBM (dirty bit management) capable agents there is a possible race between a CPU executing ptep_set_access_flags() (maybe non-DBM capable) and a hardware update of the dirty state (clearing of PTE_RDONLY). The scenario: a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old (PTE_AF clear) b) ptep_set_access_flags() is called as a result of a read access and it needs to set the pte to writable, clean and young (PTE_AF set) c) a DBM-capable agent, as a result of a different write access, is marking the entry as young (setting PTE_AF) and dirty (clearing PTE_RDONLY) The current ptep_set_access_flags() implementation would set the PTE_RDONLY bit in the resulting value overriding the DBM update and losing the dirty state. This patch fixes such race by setting PTE_RDONLY to the most permissive (lowest value) of the current entry and the new one. Fixes: 66dbd6e61a52 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM") Cc: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-08-04	Merge tag 'davinci-fixes-for-v4.13' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes Pull "DaVinci fixes for v4.13" from Sekhar Nori: Drop unused VPIF endpoints from device-tree. They should be used only when an actual remote-endpoint is connected. * tag 'davinci-fixes-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci: ARM: dts: da850-lcdk: drop unused VPIF endpoints ARM: dts: da850-evm: drop unused VPIF endpoints
2017-08-04	Merge tag 'sunxi-fixes-for-4.13' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes Pull "Allwinner fixes for 4.13" from Chen-Yu Tsai: Two fixes to correct the EMAC blocks memory region size to match the datasheet. One that converts raw A83T clock indices to macros from the clk dt-binding header, completing the A83T sunxi-ng clk driver. * tag 'sunxi-fixes-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: ARM: dts: sun8i: a83t: Switch to CCU device tree binding macros arm64: allwinner: sun50i-a64: Correct emac register size ARM: dts: sunxi: h3/h5: Correct emac register size
2017-08-04	Merge tag 'qcom-arm64-defconfig-fixes-for-4.13-rc2' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into fixes Pull "Qualcomm ARM64 based defconfig Fixes for v4.13-rc2" from Andy Gross: * Enable missing HWSPINLOCK * tag 'qcom-arm64-defconfig-fixes-for-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux: arm64: defconfig: enable missing HWSPINLOCK
2017-08-04	ARM: dts: tango4: Request RGMII RX and TX clock delays	Marc Gonzalez
	RX and TX clock delays are required. Request them explicitly. Fixes: cad008b8a77e6 ("ARM: dts: tango4: Initial device trees") Cc: stable@vger.kernel.org Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-08-04	bus: uniphier-system-bus: set up registers when resuming	Masahiro Yamada
	When resuming, set up registers that have been lost in the sleep state. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-08-04	Merge tag 'renesas-fixes3-for-v4.13' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes Pull "Third Round of Renesas ARM Based SoC Fixes for v4.13" from Simon Horman: Fix deadlock in regulator quirk for R-Car Gen 2 SoCs The da9063/da9210 regulator quirk for R-Car Gen2 boards uses a bus notifier, and unregisters the notifier when it is no longer needed. However, a notifier must not be unregistered from within the call chain. This bug went unnoticed, as blocking_notifier_chain_unregister() didn't take the semaphore during early boot. This is no longer the case as of upstream commit 1c3c5eab171590f8 ("sched/core: Enable might_sleep() and smp_processor_id() checks early") and a deadlock occurs. * tag 'renesas-fixes3-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: rcar-gen2: Fix deadlock in regulator quirk
2017-08-04	Merge tag 'mvebu-fixes-4.13-2' of git://git.infradead.org/linux-mvebu into fixes	Arnd Bergmann
	Pull "mvebu fixes for 4.13 (part 2)" from Gregory CLEMENT: All the fixes are for ARM64 mvebu: - Fix the RTC interrupt on A7K/A8K which was missed when switching from GIC to ICU - Mark the A7K/A8K crypto engine as dma coherent - Fix the number of GPIO on south bridge on Armada 3700 * tag 'mvebu-fixes-4.13-2' of git://git.infradead.org/linux-mvebu: ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge arm64: dts: marvell: mark the cp110 crypto engine as dma coherent arm64: dts: marvell: use ICU for the CP110 slave RTC
2017-08-04	Merge tag 'amlogic-fixes' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into fixes Pull "Amlogic fixes for v4.13-rc" from Kevin Hilman: - 2 minor DT fixes * tag 'amlogic-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic: ARM64: dts: meson-gxl-s905x-libretech-cc: fixup board definition ARM64: dts: meson-gx: use specific compatible for the AO pwms
2017-08-04	Merge tag 'v4.13-rockchip-dts32fixes-1' of ↵	Arnd Bergmann
	git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes Pull "Rockchip dts32 fixes for 4.13" from Heiko Stübner: Fix for the recently added mali dt support. The example showed a wrong value, so fix it before it gets copy-pasted to much. * tag 'v4.13-rockchip-dts32fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: dts: rockchip: fix mali gpu node on rk3288 dt-bindings: gpu: drop wrong compatible from midgard binding example
2017-08-04	drm/i915/gvt: Initialize MMIO Block with HW state	Tina Zhang
	MMIO block with tracked mmio, is introduced for the sake of performance of searching tracked mmio. All the tracked mmio needs to get the initial value from the HW state during vGPU being created. This patch is to initialize the tracked registers in MMIO block with the HW state. v2: Add "Fixes:" line for this patch (Zhenyu) Fixes: 65f9f6febf12 ("drm/i915/gvt: Optimize MMIO register handling for some large MMIO blocks") Signed-off-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-08-04	ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC	Vineet Gupta
	PAE40 confiuration in hardware extends some of the address registers for TLB/cache ops to 2 words. So far kernel was NOT setting the higher word if feature was not enabled in software which is wrong. Those need to be set to 0 in such case. Normally this would be done in the cache flush / tlb ops, however since these registers only exist conditionally, this would have to be conditional to a flag being set on boot which is expensive/ugly - specially for the more common case of PAE exists but not in use. Optimize that by zero'ing them once at boot - nobody will write to them afterwards Cc: stable@vger.kernel.org #4.4+ Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-04	ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses	Alexey Brodkin
	It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1 which hold MSB bits of the physical address correspondingly of region start and end otherwise SLC region operation is executed in unpredictable manner Without this patch, SLC flushes on HSDK (IOC disabled) were taking seconds. Cc: stable@vger.kernel.org #4.4+ Reported-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: PAR40 regs only written if PAE40 exist]
2017-08-04	ARC: dma: implement dma_unmap_page and sg variant	Vineet Gupta
	Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-04	ARCv2: SLC: Make sure busy bit is set properly for region ops	Alexey Brodkin
	c70c473396cb "ARCv2: SLC: Make sure busy bit is set properly on SLC flushing" fixes problem for entire SLC operation where the problem was initially caught. But given a nature of the issue it is perfectly possible for busy bit to be read incorrectly even when region operation was started. So extending initial fix for regional operation as well. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: stable@vger.kernel.org #4.10 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-04	ARC: [plat-sim] Include this platform unconditionally	Vineet Gupta
	Essentially remove CONFIG_ARC_PLAT_SIM There is no need for any platform specific code, just the board DTS match strings which we can include unconditionally Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-04	ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103	Eugeniy Paltsev
	Enable 64bit adressing, where it needed, to make possible enabling PAE40 on axs103. This patch doesn't affect on any functionality. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev-HKixBCOQz3hWk0Htik3J/w@public.gmane.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-04	drm/rockchip: vop: report error when check resource error	Mark yao
	The user would be confused while facing a error commit without any error report. Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494596-7090-1-git-send-email-mark.yao@rock-chips.com
2017-08-04	drm/rockchip: vop: round_up pitches to word align	Mark yao
	VOP pitch register is word align, need align to word. VOP_WIN0_VIR: bit[31:16] win0_vir_stride_uv Number of words of Win0 uv Virtual width bit[15:0] win0_vir_width Number of words of Win0 yrgb Virtual width ARGB888 : win0_vir_width RGB888 : (win0_vir_width*3/4) + (win0_vir_width%3) RGB565 : ceil(win0_vir_width/2) YUV : ceil(win0_vir_width/4) Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494591-7034-1-git-send-email-mark.yao@rock-chips.com
2017-08-04	drm/rockchip: vop: fix NV12 video display error	Mark yao
	fixup the scale calculation formula on the case src_height == (dst_height/2). Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494586-6984-1-git-send-email-mark.yao@rock-chips.com
2017-08-04	drm/rockchip: vop: fix iommu page fault when resume	Mark yao
	Iommu would get page fault with following path: vop_disable: 1, disable all windows and set vop config done 2, vop enter to standy, all windows not works, but their registers are not clean, when you read window's enable bit, may found the window is enable. vop_enable: 1, memcpy(vop->regsbak, vop->regs, len) save current vop registers to vop->regsbak, then you can found window is enable on regsbak. 2, VOP_WIN_SET(vop, win, gate, 1); force enable window gate, but gate and enable are on same hardware register, then window enable bit rewrite to vop hardware. 3, vop power on, and vop might try to scan destroyed buffer, then iommu get page fault. Move windows disable after vop regsbak restore, then vop regsbak mechanism would keep tracing the modify, everything would be safe. Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494582-6934-1-git-send-email-mark.yao@rock-chips.com
2017-08-04	powerpc/64: Fix __check_irq_replay missing decrementer interrupt	Nicholas Piggin
	If the decrementer wraps again and de-asserts the decrementer exception while hard-disabled, __check_irq_replay() has a test to notice the wrap when interrupts are re-enabled. The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked because the decrementer interrupt was always the first one checked after clearing the hard disable flag, but HMI check was moved ahead of that, which introduced this bug. This can cause a missed decrementer interrupt if we soft-disable interrupts then take an HMI which is recorded in irq_happened, then hard-disable interrupts for > 4s to wrap the decrementer. Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-04	powerpc/perf: POWER9 PMU stops after idle workaround	Nicholas Piggin
	POWER9 DD2 PMU can stop after a state-loss idle in some conditions. A solution is to set then clear MMCRA[60] after wake from state-loss idle. MMCRA[60] is a non-architected bit, see the user manual for details. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-04	Merge branch 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux ↵	Dave Airlie
	into drm-fixes Just a few small fixes for 4.13. * 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: Use list_del_init in amdgpu_mn_unregister drm/amdgpu: Fix undue fallthroughs in golden registers initialization drm/amdgpu: fix header on gfx9 clear state
2017-08-03	Merge branch 'tcp-xmit-timer-rearming'	David S. Miller
	Neal Cardwell says: ==================== tcp: fix xmit timer rearming to avoid stalls This patch series is a bug fix for a TCP loss recovery performance bug reported independently in recent netdev threads: (i) July 26, 2017: netdev thread "TCP fast retransmit issues" (ii) July 26, 2017: netdev thread: "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one outstanding TLP retransmission" Many thanks to Klavs Klavsen and Mao Wenan for the detailed reports, traces, and packetdrill test cases, which enabled us to root-cause this issue and verify the fix. - v1 -> v2: - In patch 2/3, changed an unclear comment in the pre-existing code in tcp_schedule_loss_probe() to be more clear (thanks to Eric Dumazet for suggesting we improve this). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03	tcp: fix xmit timer to only be reset if data ACKed/SACKed	Neal Cardwell
	Fix a TCP loss recovery performance bug raised recently on the netdev list, in two threads: (i) July 26, 2017: netdev thread "TCP fast retransmit issues" (ii) July 26, 2017: netdev thread: "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one outstanding TLP retransmission" The basic problem is that incoming TCP packets that did not indicate forward progress could cause the xmit timer (TLP or RTO) to be rearmed and pushed back in time. In certain corner cases this could result in the following problems noted in these threads: - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes could cause TCP to repeatedly schedule TLPs forever. We kept sending TLPs after every ~200ms, which elicited bogus SACKs, which caused more TLPs, ad infinitum; we never fired an RTO to fill in the holes. - Incoming data segments could, in some cases, cause us to reschedule our RTO or TLP timer further out in time, for no good reason. This could cause repeated inbound data to result in stalls in outbound data, in the presence of packet loss. This commit fixes these bugs by changing the TLP and RTO ACK processing to: (a) Only reschedule the xmit timer once per ACK. (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the ACK indicates sufficient forward progress (a packet was cumulatively ACKed, or we got a SACK for a packet that was sent before the most recent retransmit of the write queue head). This brings us back into closer compliance with the RFCs, since, as the comment for tcp_rearm_rto() notes, we should only restart the RTO timer after forward progress on the connection. Previously we were restarting the xmit timer even in these cases where there was no forward progress. As a side benefit, this commit simplifies and speeds up the TCP timer arming logic. We had been calling inet_csk_reset_xmit_timer() three times on normal ACKs that cumulatively acknowledged some data: 1) Once near the top of tcp_ack() to switch from TLP timer to RTO: if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) tcp_rearm_rto(sk); 2) Once in tcp_clean_rtx_queue(), to update the RTO: if (flag & FLAG_ACKED) { tcp_rearm_rto(sk); 3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO to TLP: if (icsk->icsk_pending == ICSK_TIME_RETRANS) tcp_schedule_loss_probe(sk); This commit, by only rescheduling the xmit timer once per ACK, simplifies the code and reduces CPU overhead. This commit was tested in an A/B test with Google web server traffic. SNMP stats and request latency metrics were within noise levels, substantiating that for normal web traffic patterns this is a rare issue. This commit was also tested with packetdrill tests to verify that it fixes the timer behavior in the corner cases discussed in the netdev threads mentioned above. This patch is a bug fix patch intended to be queued for -stable relases. Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)") Reported-by: Klavs Klavsen <kl@vsen.dk> Reported-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03	tcp: enable xmit timer fix by having TLP use time when RTO should fire	Neal Cardwell
	Have tcp_schedule_loss_probe() base the TLP scheduling decision based on when the RTO should fire. This is to enable the upcoming xmit timer fix in this series, where tcp_schedule_loss_probe() cannot assume that the last timer installed was an RTO timer (because we are no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every ACK). So tcp_schedule_loss_probe() must independently figure out when an RTO would want to fire. In the new TLP implementation following in this series, we cannot assume that icsk_timeout was set based on an RTO; after processing a cumulative ACK the icsk_timeout we see can be from a previous TLP or RTO. So we need to independently recalculate the RTO time (instead of reading it out of icsk_timeout). Removing this dependency on the nature of icsk_timeout makes things a little easier to reason about anyway. Note that the old and new code should be equivalent, since they are both saying: "if the RTO is in the future, but at an earlier time than the normal TLP time, then set the TLP timer to fire when the RTO would have fired". Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03	tcp: introduce tcp_rto_delta_us() helper for xmit timer fix	Neal Cardwell
	Pure refactor. This helper will be required in the xmit timer fix later in the patch series. (Because the TLP logic will want to make this calculation.) Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03	Merge tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio	Linus Torvalds
	Pull VFIO fixes from Alex Williamson: - SPAPR/EEH config build fix (Murilo Opsfelder Araujo) - Fix possible device lock deadlock (Alex Williamson) - Correctly size integrated endpoint PCIe capabilities (Alex Williamson) * tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio: vfio/pci: Fix handling of RC integrated endpoint PCIe capability size vfio/pci: Use pci_try_reset_function() on initial open include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH
2017-08-03	ipv6: set rt6i_protocol properly in the route when it is installed	Xin Long
	After commit c2ed1880fd61 ("net: ipv6: check route protocol when deleting routes"), ipv6 route checks rt protocol when trying to remove a rt entry. It introduced a side effect causing 'ip -6 route flush cache' not to work well. When flushing caches with iproute, all route caches get dumped from kernel then removed one by one by sending DELROUTE requests to kernel for each cache. The thing is iproute sends the request with the cache whose proto is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps it. But in kernel the rt_cache protocol is still 0, which causes the cache not to be matched and removed. So the real reason is rt6i_protocol in the route is not set when it is allocated. As David Ahern's suggestion, this patch is to set rt6i_protocol properly in the route when it is installed and remove the codes setting rtm_protocol according to rt6i_flags in rt6_fill_node. This is also an improvement to keep rt6i_protocol consistent with rtm_protocol. Fixes: c2ed1880fd61 ("net: ipv6: check route protocol when deleting routes") Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge misc fixes from Andrew Morton: "15 fixes" [ This does not merge the "fortify: use WARN instead of BUG for now" patch, which needs a bit of extra work to build cleanly with all configurations. Arnd is on it. - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2: don't clear SGID when inheriting ACLs mm: allow page_cache_get_speculative in interrupt context userfaultfd: non-cooperative: flush event_wqh at release time ipc: add missing container_of()s for randstruct cpuset: fix a deadlock due to incomplete patching of cpusets_enabled() userfaultfd_zeropage: return -ENOSPC in case mm has gone mm: take memory hotplug lock within numa_zonelist_order_handler() mm/page_io.c: fix oops during block io poll in swapin path zram: do not free pool->size_class kthread: fix documentation build warning kasan: avoid -Wmaybe-uninitialized warning userfaultfd: non-cooperative: notify about unmap of destination during mremap mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries pid: kill pidhash_size in pidhash_init() mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
2017-08-03	Merge tag 'acpi-4.13-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix two issues in the ACPI SoC drivers (Intel LPSS and AMD APD), a crash in the PCC mailbox initialization code and a WDAT watchdog initialization failure. Specifics: - Fix a device ID of Hisilicon Hip07/08 in the ACPI APD (AMD SoC) driver (Hanjun Guo). - Fix list corruption (introduced during the 4.11 cycle) in the ACPI LPSS (Intel SoC) driver (Hans de Goede). - Fix PCC mailbox handling code crash during initialization when PCCT is not present and PCC channel 0 is requested (Hoan Tran). - Fix a WDAT watchdog initialization issue causing platform device creation to fail due to partially overlapping address ranges in resources (Ryan Kennedy)" * tag 'acpi-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: APD: Fix HID for Hisilicon Hip07/08 mailbox: pcc: Fix crash when request PCC channel 0 ACPI / watchdog: Fix init failure with overlapping register regions ACPI / LPSS: Only call pwm_add_table() for the first PWM controller
2017-08-03	Merge tag 'pm-4.13-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two cpufreq issues, one introduced recently and one related to recent changes, fix cpufreq documentation, fix up recently added code in the Thunderbolt driver and update runtime PM framework documentation. Specifics: - Fix the handling of the scaling_cur_freq cpufreq policy attribute on x86 systems with the MPERF/APERF registers present to make it behave more as expected after recent changes (Rafael Wysocki). - Drop a leftover callback from the intel_pstate driver which also prevents the cpuinfo_cur_freq cpufreq policy attribute from being incorrectly exposed when intel_pstate works in the active mode (Rafael Wysocki). - Add a missing piece describing the cpuinfo_cur_freq policy attribute to cpufreq documentation (Rafael Wysocki). - Fix up a recently added part of the Thunderbolt driver to avoid aborting system suspends if its mailbox commands time out (Rafael Wysocki). - Update device runtime PM framework documentation to reflect the current behavior of the code (Johan Hovold)" * tag 'pm-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thunderbolt: icm: Ignore mailbox errors in icm_suspend() cpufreq: x86: Make scaling_cur_freq behave more as expected PM / runtime: Document new pm_runtime_set_suspended() constraint cpufreq: docs: Add missing cpuinfo_cur_freq description cpufreq: intel_pstate: Drop ->get from intel_pstate structure
2017-08-03	Merge branches 'acpi-soc', 'acpi-wdat' and 'acpi-cppc'	Rafael J. Wysocki
	* acpi-soc: ACPI: APD: Fix HID for Hisilicon Hip07/08 ACPI / LPSS: Only call pwm_add_table() for the first PWM controller * acpi-wdat: ACPI / watchdog: Fix init failure with overlapping register regions * acpi-cppc: mailbox: pcc: Fix crash when request PCC channel 0
2017-08-03	Merge branches 'pm-core' and 'pm-misc'	Rafael J. Wysocki
	* pm-core: PM / runtime: Document new pm_runtime_set_suspended() constraint * pm-misc: thunderbolt: icm: Ignore mailbox errors in icm_suspend()
2017-08-03	Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate'	Rafael J. Wysocki
	* pm-cpufreq-x86: cpufreq: x86: Make scaling_cur_freq behave more as expected * pm-cpufreq-docs: cpufreq: docs: Add missing cpuinfo_cur_freq description * intel_pstate: cpufreq: intel_pstate: Drop ->get from intel_pstate structure
2017-08-03	Merge tag 'usb-serial-4.13-rc4' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial fixes for v4.13-rc4 Here are some new device ids for v4.13-rc4. All have been in linux-next with no reported issues. Signed-off-by: Johan Hovold <johan@kernel.org>
2017-08-03	net: fix keepalive code vs TCP_FASTOPEN_CONNECT	Eric Dumazet
	syzkaller was able to trigger a divide by 0 in TCP stack [1] Issue here is that keepalive timer needs to be updated to not attempt to send a probe if the connection setup was deferred using TCP_FASTOPEN_CONNECT socket option added in linux-4.11 [1] divide error: 0000 [#1] SMP CPU: 18 PID: 0 Comm: swapper/18 Not tainted task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000 RIP: 0010:[<ffffffff8409cc0d>] [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160 Call Trace: <IRQ> [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20 [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0 [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160 [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230 [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0 [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280 [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257 [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0 [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80 [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90 <EOI> [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0 [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0 Tested: Following packetdrill no longer crashes the kernel `echo 0 >/proc/sys/net/ipv4/tcp_timestamps` // Cache warmup: send a Fast Open cookie request 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_SETFL, O_RDWR\|O_NONBLOCK) = 0 +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress) +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop> +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop> +0 > . 1:1(0) ack 1 +0 close(3) = 0 +0 > F. 1:1(0) ack 1 +0 < F. 1:1(0) ack 2 win 92 +0 > . 2:2(0) ack 2 +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4 +0 fcntl(4, F_SETFL, O_RDWR\|O_NONBLOCK) = 0 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 +.01 connect(4, ..., ...) = 0 +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0 +10 close(4) = 0 `echo 1 >/proc/sys/net/ipv4/tcp_timestamps` Fixes: 19f6d3f3c842 ("net/tcp-fastopen: Add new API support") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03	Merge tag 'batadv-net-for-davem-20170802' of git://git.open-mesh.org/linux-merge	David S. Miller
	Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix TT sync flag inconsistency problems, which can lead to excess packets, by Linus Luessing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03	Merge tag 'fixes-for-v4.13-rc4' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus Felipe writes: usb: fixes for v4.13-rc4 Another fix for isochronous transfers on dwc3. This time around, we're making sure that we use correct PIDs in all transfer sizes. MSM PHY driver got a fix for the use of devm_regulator_bulk_get() API which will avoid kernel crashes. Renesas DRD driver got 3 fixes: a fix on giveback, a fix for proper controller programming and the removal of set-but-never-used variable.
2017-08-03	Merge tag 'kvm-arm-for-v4.13-rc4' of ↵	Radim Krčmář
	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Fixes for v4.13-rc4 - Yet another race with VM destruction plugged - A set of small vgic fixes
2017-08-03	fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio	Ashish Samant
	Commit 8fba54aebbdf ("fuse: direct-io: don't dirty ITER_BVEC pages") fixes the ITER_BVEC page deadlock for direct io in fuse by checking in fuse_direct_io(), whether the page is a bvec page or not, before locking it. However, this check is missed when the "async_dio" mount option is enabled. In this case, set_page_dirty_lock() is called from the req->end callback in request_end(), when the fuse thread is returning from userspace to respond to the read request. This will cause the same deadlock because the bvec condition is not checked in this path. Here is the stack of the deadlocked thread, while returning from userspace: [13706.656686] INFO: task glusterfs:3006 blocked for more than 120 seconds. [13706.657808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [13706.658788] glusterfs D ffffffff816c80f0 0 3006 1 0x00000080 [13706.658797] ffff8800d6713a58 0000000000000086 ffff8800d9ad7000 ffff8800d9ad5400 [13706.658799] ffff88011ffd5cc0 ffff8800d6710008 ffff88011fd176c0 7fffffffffffffff [13706.658801] 0000000000000002 ffffffff816c80f0 ffff8800d6713a78 ffffffff816c790e [13706.658803] Call Trace: [13706.658809] [<ffffffff816c80f0>] ? bit_wait_io_timeout+0x80/0x80 [13706.658811] [<ffffffff816c790e>] schedule+0x3e/0x90 [13706.658813] [<ffffffff816ca7e5>] schedule_timeout+0x1b5/0x210 [13706.658816] [<ffffffff81073ffb>] ? gup_pud_range+0x1db/0x1f0 [13706.658817] [<ffffffff810668fe>] ? kvm_clock_read+0x1e/0x20 [13706.658819] [<ffffffff81066909>] ? kvm_clock_get_cycles+0x9/0x10 [13706.658822] [<ffffffff810f5792>] ? ktime_get+0x52/0xc0 [13706.658824] [<ffffffff816c6f04>] io_schedule_timeout+0xa4/0x110 [13706.658826] [<ffffffff816c8126>] bit_wait_io+0x36/0x50 [13706.658828] [<ffffffff816c7d06>] __wait_on_bit_lock+0x76/0xb0 [13706.658831] [<ffffffffa0545636>] ? lock_request+0x46/0x70 [fuse] [13706.658834] [<ffffffff8118800a>] __lock_page+0xaa/0xb0 [13706.658836] [<ffffffff810c8500>] ? wake_atomic_t_function+0x40/0x40 [13706.658838] [<ffffffff81194d08>] set_page_dirty_lock+0x58/0x60 [13706.658841] [<ffffffffa054d968>] fuse_release_user_pages+0x58/0x70 [fuse] [13706.658844] [<ffffffffa0551430>] ? fuse_aio_complete+0x190/0x190 [fuse] [13706.658847] [<ffffffffa0551459>] fuse_aio_complete_req+0x29/0x90 [fuse] [13706.658849] [<ffffffffa05471e9>] request_end+0xd9/0x190 [fuse] [13706.658852] [<ffffffffa0549126>] fuse_dev_do_write+0x336/0x490 [fuse] [13706.658854] [<ffffffffa054963e>] fuse_dev_write+0x6e/0xa0 [fuse] [13706.658857] [<ffffffff812a9ef3>] ? security_file_permission+0x23/0x90 [13706.658859] [<ffffffff81205300>] do_iter_readv_writev+0x60/0x90 [13706.658862] [<ffffffffa05495d0>] ? fuse_dev_splice_write+0x350/0x350 [fuse] [13706.658863] [<ffffffff812062a1>] do_readv_writev+0x171/0x1f0 [13706.658866] [<ffffffff810b3d00>] ? try_to_wake_up+0x210/0x210 [13706.658868] [<ffffffff81206361>] vfs_writev+0x41/0x50 [13706.658870] [<ffffffff81206496>] SyS_writev+0x56/0xf0 [13706.658872] [<ffffffff810257a1>] ? syscall_trace_leave+0xf1/0x160 [13706.658874] [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71 Fix this by making should_dirty a fuse_io_priv parameter that can be checked in fuse_aio_complete_req(). Reported-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-08-03	KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg	Christoffer Dall
	There is a small chance that the compiler could generate separate loads for the dist->propbaser which could be modified from another CPU. As we want to make sure we atomically update the entire value, and don't race with other updates, guarantee that the cmpxchg operation compares against the original value. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-08-03	KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"	Wanpeng Li
	------------[ cut here ]------------ WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7 RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] Call Trace: vmx_check_nested_events+0x131/0x1f0 [kvm_intel] ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm] ? vmx_vcpu_load+0x1be/0x220 [kvm_intel] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 do_syscall_64+0x8f/0x750 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which means that tells the kernel to not make use of any IOAPICs that may be present in the system. Actually external_intr variable in nested_vmx_vmexit() is the req_int_win variable passed from vcpu_enter_guest() which means that the L0's userspace requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) && L0's userspace reqeusts an irq window) is true, so there is no interrupt which L1 requires to inject to L2, we should not attempt to emualte "Acknowledge interrupt on exit" for the irq window requirement in this scenario. This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit" if there is no L1 requirement to inject an interrupt to L2. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Added code comment to make it obvious that the behavior is not correct. We should do a userspace exit with open interrupt window instead of the nested VM exit. This patch still improves the behavior, so it was accepted as a (temporary) workaround.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>