summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-07-02drm/i915: Fix buffer overflow in dsi_calc_mnp()Chris Wilson
smatch complain: drivers/gpu/drm/i915/intel_dsi_pll.c:101 dsi_calc_mnp() error: buffer overflow 'lfsr_converts' 39 <= 4294967234 and looks justified in doing so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-7-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-07-02drm/i915: Fix inconsistent indenting in vbt_panel_init()Chris Wilson
smatch complains: drivers/gpu/drm/i915/intel_dsi_panel_vbt.c:657 vbt_panel_init() warn: inconsistent indenting Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-6-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-07-02drm/i915: Match bitmask size to types in intel_fb_initial_config()Chris Wilson
smatch complains of: drivers/gpu/drm/i915/intel_fbdev.c:403 intel_fb_initial_config() warn: should '1 << i' be a 64 bit type? drivers/gpu/drm/i915/intel_fbdev.c:422 intel_fb_initial_config() warn: should '1 << i' be a 64 bit type? drivers/gpu/drm/i915/intel_fbdev.c:501 intel_fb_initial_config() warn: should '1 << i' be a 64 bit type? We are prepared to iterate over a u64 but don't limit the number of connectors we try to configure to a maximum of 64. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-5-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-07-02drm/i915: Fix inconsistent indenting in i915_error_state_to_str()Chris Wilson
smatch complains: drivers/gpu/drm/i915/i915_gpu_error.c:503 i915_error_state_to_str() warn: inconsistent indenting Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-4-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-07-02drm/i915: Fix indentation in i915_gem_framebuffer_info()Chris Wilson
smatch complains: drivers/gpu/drm/i915/i915_debugfs.c:1390 i915_frequency_info() Function too hairy. Giving up. drivers/gpu/drm/i915/i915_debugfs.c:1985 i915_gem_framebuffer_info() warn: inconsistent indenting Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-3-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-07-02drm/915: Fix long lines and random indent in gen6_set_rps_thresholds()Chris Wilson
smatch complains: drivers/gpu/drm/i915/intel_pm.c:4745 gen6_set_rps_thresholds() warn: inconsistent indenting Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-07-02drm/i915: Fix random indent in i915_drm_resume()Chris Wilson
smatch complains: drivers/gpu/drm/i915/i915_drv.c:1616 i915_drm_resume() warn: inconsistent indenting Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
2016-07-02Merge tag 'drm-fixes-for-v4.7-rc6' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes frlm Dave Airlie: "Just some AMD and Intel fixes, the AMD ones are further production Polaris fixes, and the Intel ones fix some early timeouts, some PCI ID changes and a couple of other fixes. Still a bit Internet challenged here, hopefully end of next week will solve it" * tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux: drm/i915: Fix missing unlock on error in i915_ppgtt_info() drm/amd/powerplay: workaround for UVD clock issue drm/amdgpu: add ACLK_CNTL setting for polaris10 drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11. drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10. drm/i915: Removing PCI IDs that are no longer listed as Kabylake. drm/i915: Add more Kabylake PCI IDs. drm/i915: Avoid early timeout during AUX transfers drm/i915/hsw: Avoid early timeout during LCPLL disable/restore drm/i915/lpt: Avoid early timeout during FDI PHY reset drm/i915/bxt: Avoid early timeout during PLL enable drm/i915: Refresh cached DP port register value on resume drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation drm/amd/powerplay: disable FFC. drm/amd/powerplay: add some definition for FFC feature on polaris.
2016-07-02Merge tag 'spi-fix-v4.7-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few small driver-specific fixes for SPI, all in the normal important if you hit them category especially the rockchip driver fix which addresses a race which has been exposed more frequently with some recent performance improvements" * tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: sunxi: fix transfer timeout spi: sun4i: fix FIFO limit spi: rockchip: Signal unfinished DMA transfers spi: spi-ti-qspi: Suspend the queue before removing the device
2016-07-02Merge tag 'regulator-fix-v4.7-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Two small fixes for the regulator subsystem - one fixing a crash with one of the devices supported by the max77620 driver, another fixing startup for the anatop regulator when it starts up with the regulator in bypass mode" * tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: max77620: check for valid regulator info regulator: anatop: allow regulator to be in bypass mode
2016-07-02Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A small fix for the newly added oxnas clk driver and a handful of rockchip clk driver fixes for newly added rk3399 support" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: Fix return value check in oxnas_stdclk_probe() clk: rockchip: release io resource when failing to init clk on rk3399 clk: rockchip: fix cpuclk registration error handling clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization" clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src clk: rockchip: mark rk3399 GIC clocks as critical clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
2016-07-02Merge tag 'asoc-fix-v4.7-rc5' of ↵Takashi Iwai
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v4.7 A small clutch of hardware specific fixes for various ASoC devices, all small individually and important if you have that device but not otherwise.
2016-07-02Merge branch 'for-next' of http://git.agner.ch/git/linux-drm-fsl-dcu into ↵Dave Airlie
drm-next The patchset contains a new helper in drm_fb_cma_helper.c for suspend/ resume when using cma backed framebuffers. * 'for-next' of http://git.agner.ch/git/linux-drm-fsl-dcu: drm/fsl-dcu: disable vblank events on CRTC disable drm/fsl-dcu: implement suspend/resume using atomic helpers drm/fsl-dcu: use clk helpers drm/fsl-dcu: move layer initialization to plane file drm/fsl-dcu: store layer registers in soc_data drm/fb_cma_helper: add suspend helper
2016-07-02Back-merge tag 'v4.7-rc5' into drm-nextDave Airlie
Linux 4.7-rc5 The fsl-dcu pull needs -rc3 so go to -rc5 for now.
2016-07-02Merge branch 'sti-drm-next-2016-06-30' of ↵Dave Airlie
http://git.linaro.org/people/benjamin.gaignard/kernel into drm-next This pull request include 3 minors fix and one feature evolution around ASoC hdmi codec support. * 'sti-drm-next-2016-06-30' of http://git.linaro.org/people/benjamin.gaignard/kernel: drm: sti: Add ASoC generic hdmi codec support. drm/sti: adjust delay for AWG drm: sti: fix clocking issues in crtc drm/sti: Use 64-bit timestamps
2016-07-02Merge tag 'drm-intel-fixes-2016-06-30' of ↵Dave Airlie
git://anongit.freedesktop.org/drm-intel into drm-fixes here's a batch of i915 fixes for 4.7. * tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel: drm/i915: Fix missing unlock on error in i915_ppgtt_info() drm/i915: Removing PCI IDs that are no longer listed as Kabylake. drm/i915: Add more Kabylake PCI IDs. drm/i915: Avoid early timeout during AUX transfers drm/i915/hsw: Avoid early timeout during LCPLL disable/restore drm/i915/lpt: Avoid early timeout during FDI PHY reset drm/i915/bxt: Avoid early timeout during PLL enable drm/i915: Refresh cached DP port register value on resume
2016-07-02Merge branch 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Just a few more late fixes for Polaris cards. * 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux: drm/amd/powerplay: workaround for UVD clock issue drm/amdgpu: add ACLK_CNTL setting for polaris10 drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11. drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10. drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation drm/amd/powerplay: disable FFC. drm/amd/powerplay: add some definition for FFC feature on polaris.
2016-07-02MIPS: Fix possible corruption of cache mode by mprotect.Ralf Baechle
The following testcase may result in a page table entries with a invalid CCA field being generated: static void *bindstack; static int sysrqfd; static void protect_low(int protect) { mprotect(bindstack, BINDSTACK_SIZE, protect); } static void sigbus_handler(int signal, siginfo_t * info, void *context) { void *addr = info->si_addr; write(sysrqfd, "x", 1); printf("sigbus, fault address %p (should not happen, but might)\n", addr); abort(); } static void run_bind_test(void) { unsigned int *p = bindstack; p[0] = 0xf001f001; write(sysrqfd, "x", 1); /* Set trap on access to p[0] */ protect_low(PROT_NONE); write(sysrqfd, "x", 1); /* Clear trap on access to p[0] */ protect_low(PROT_READ | PROT_WRITE | PROT_EXEC); write(sysrqfd, "x", 1); /* Check the contents of p[0] */ if (p[0] != 0xf001f001) { write(sysrqfd, "x", 1); /* Reached, but shouldn't be */ printf("badness, shouldn't happen but does\n"); abort(); } } int main(void) { struct sigaction sa; sysrqfd = open("/proc/sysrq-trigger", O_WRONLY); if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) { perror("sigprocmask"); return 0; } sa.sa_sigaction = sigbus_handler; sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART; if (sigaction(SIGBUS, &sa, NULL)) { perror("sigaction"); return 0; } bindstack = mmap(NULL, BINDSTACK_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (bindstack == MAP_FAILED) { perror("mmap bindstack"); return 0; } printf("bindstack: %p\n", bindstack); run_bind_test(); printf("done\n"); return 0; } There are multiple ingredients for this: 1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3 on all platforms except SB1 where it's CCA 5. 2) _page_cachable_default must have bits set which are not set _CACHE_CACHABLE_NONCOHERENT. 3) Either the defective version of pte_modify for XPA or the standard version must be in used. However pte_modify for the 36 bit address space support is no affected. In that case additional bits in the final CCA mode may generate an invalid value for the CCA field. On the R10000 system where this was tracked down for example a CCA 7 has been observed, which is Uncached Accelerated. Fixed by: 1) Using the proper CCA mode for PAGE_NONE just like for all the other PAGE_* pte/pmd bits. 2) Fix the two affected variants of pte_modify. Further code inspection also shows the same issue to exist in pmd_modify which would affect huge page systems. Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE and pmd_modify issue found by me. The history of this goes back beyond Linus' git history. Chris Dearman's commit 351336929ccf222ae38ff0cb7a8dd5fd5c6236a0 ("[MIPS] Allow setting of the cache attribute at run time.") missed the opportunity to fix this but it was originally introduced in lmo commit d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.") and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option CONFIG_MIPS_UNCACHED.") Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
2016-07-02ACPI,PCI,IRQ: separate ISA penalty calculationSinan Kaya
Since commit 103544d86976 (ACPI,PCI,IRQ: reduce resource requirements) the penalty values are calculated on the fly rather than at boot time. This works fine for PCI interrupts but not so well for ISA interrupts. The information on whether or not an ISA interrupt is in use is not available to the pci_link.c code directly. That information is obtained from the outside via acpi_penalize_isa_irq(). [If its "active" argument is true, then the IRQ is in use by ISA.] Since the current code relies on PCI Link objects for determination of penalties, we are factoring in the PCI penalty twice after acpi_penalize_isa_irq() function is called. To avoid that, limit the newly added functionality to just PCI interrupts so that old behavior is still maintained. Fixes: 103544d86976 (ACPI,PCI,IRQ: reduce resource requirements) Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Tested-by: Wim Osterholt <wim@djo.tudelft.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-02Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"Sinan Kaya
Trying to make the ISA and PCI init functionality common turned out to be a bad idea, because the ISA path depends on external functionality. Restore the previous behavior and limit the refactoring to PCI interrupts only. Fixes: 1fcb6a813c4f "ACPI,PCI,IRQ: remove redundant code in acpi_irq_penalty_init()" Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Tested-by: Wim Osterholt <wim@djo.tudelft.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-02ACPI,PCI,IRQ: factor in PCI possibleSinan Kaya
The change introduced in commit 103544d86976 (ACPI,PCI,IRQ: reduce resource requirements) omitted the initially applied PCI_POSSIBLE penalty when the IRQ is active. Incorrect calculation of the penalty leads the ACPI code to assigning a wrong interrupt number to a PCI INTx interrupt. This would not be as bad as it sounds in theory. It would just cause the interrupts to be shared and result in performance penalty. However, some drivers (like the parallel port driver) don't like interrupt sharing and in the above case they will causes all of the PCI drivers wanting to share the interrupt to be unable to request it. The issue has not been caught in testing because the behavior is platform-specific and depends on the peripherals ending up sharing the IRQ and their drivers. Before the above commit the code would add the PCI_POSSIBLE value divided by the number of possible IRQ users to the IRQ penalty during initialization. Later in that code path, if the IRQ is chosen as the active IRQ or if it is used by ISA; additional penalties are added. Fixes: 103544d86976 (ACPI,PCI,IRQ: reduce resource requirements) Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Tested-by: Wim Osterholt <wim@djo.tudelft.nl> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-01Merge tag 'acpi-4.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Fix an expression in the ACPI PCI IRQ management code added by a recent commit that overlooked missing parens in it, so the result of the computation is incorrect in some cases (Sinan Kaya)" * tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI,PCI,IRQ: correct operator precedence
2016-07-01Merge tag 'pm-4.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Three cpufreq fixes, one in the core (stable-candidate) and two in drivers (intel_pstate and cpufreq-dt). Specifics: - Fix a recent intel_pstate regression that caused the number of wakeups to increase significantly on an idle system in some cases due to excessive synchronize_sched() invocations (Rafael Wysocki). - Fix unnecessary invocations of WARN_ON() in the cpufreq core after cpufreq has been suspended introduced during the 4.6 cycla (Rafael Wysocki). - Fix an error code path in the cpufreq-dt-platdev driver that forgets to drop a reference to a DT node (Masahiro Yamada)" * tag 'pm-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy() cpufreq: dt: call of_node_put() before error out intel_pstate: Do not clear utilization update hooks on policy changes
2016-07-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Tmpfs readdir throughput regression fix (this cycle) + some -stable fodder all over the place. One missing bit is Miklos' tonight locks.c fix - NFS folks had already grabbed that one by the time I woke up ;-)" [ The locks.c fix came through the nfsd tree just moments ago ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: namespace: update event counter when umounting a deleted dentry 9p: use file_dentry() ceph: fix d_obtain_alias() misuses lockless next_positive() libfs.c: new helper - next_positive() dcache_{readdir,dir_lseek}(): don't bother with nested ->d_lock
2016-07-01Merge tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull lockd/locks fixes from Bruce Fields: "One fix for lockd soft lookups in an error path, and one fix for file leases on overlayfs" * tag 'nfsd-4.7-3' of git://linux-nfs.org/~bfields/linux: locks: use file_inode() lockd: unregister notifier blocks if the service fails to come up completely
2016-07-01Merge tag 'mfd-fixes-4.7.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull more MFD fixes from Lee Jones: "Apologies for missing these from the first pull request. Final patches fixing Reset API change" * tag 'mfd-fixes-4.7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: usb: dwc3: st: Use explicit reset_control_get_exclusive() API phy: phy-stih407-usb: Use explicit reset_control_get_exclusive() API phy: miphy28lp: Inform the reset framework that our reset line may be shared
2016-07-01Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "1/ Two regression fixes since v4.6: one for the byte order of a sysfs attribute (bz121161) and another for QEMU 2.6's NVDIMM _DSM (ACPI Device Specific Method) implementation that gets tripped up by new auto-probing behavior in the NFIT driver. 2/ A fix tagged for -stable that stops the kernel from clobbering/ignoring changes to the configuration of a 'pfn' instance ("struct page" driver). For example changing the alignment from 2M to 1G may silently revert to 2M if that value is currently stored on media. 3/ A fix from Eric for an xfstests failure in dax. It is not currently tagged for -stable since it requires an 8-exabyte file system to trigger, and there appear to be no user visible side effects" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix format interface code byte order dax: fix offset overflow in dax_io acpi, nfit: fix acpi_check_dsm() vs zero functions implemented libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment
2016-07-01tipc: fix nl compat regression for link statisticsRichard Alpe
Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes of the link name where left out. Making the output of tipc-config -ls look something like: Link statistics: dcast-link 1:data0-1.1.2:data0 1:data0-1.1.3:data0 Also, for the record, the patch that introduce this regression claims "Sending the whole object out can cause a leak". Which isn't very likely as this is a compat layer, where the data we are parsing is generated by us and we know the string to be NULL terminated. But you can of course never be to secure. Fixes: 5d2be1422e02 (tipc: fix an infoleak in tipc_nl_compat_link_dump) Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net: bcmsysport: Device stats are unsigned longFlorian Fainelli
On 64bits kernels, device stats are 64bits wide, not 32bits. Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01macsec: set actual real device for xmit when !protect_framesDaniel Borkmann
Avoid recursions of dev_queue_xmit() to the wrong net device when frames are unprotected, since at that time skb->dev still points to our own macsec dev and unlike macsec_encrypt_finish() dev pointer doesn't get updated to real underlying device. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net_sched: fix mirrored packets checksumWANG Cong
Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation") we need to fixup the checksum for CHECKSUM_COMPLETE when pushing skb on RX path. Otherwise we get similar splats. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01packet: Use symmetric hash for PACKET_FANOUT_HASH.David S. Miller
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same bucket. The core kernel SKB hash became non-symmetric when the ipv6 flow label and other entities were incorporated into the standard flow hash order to increase entropy. But there are no users of PACKET_FANOUT_HASH who want an assymetric hash, they all want a symmetric one. Therefore, use the flow dissector to compute a flat symmetric hash over only the protocol, addresses and ports. This hash does not get installed into and override the normal skb hash, so this change has no effect whatsoever on the rest of the stack. Reported-by: Eric Leblond <eric@regit.org> Tested-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01drm/i915: Remove debug noise on detecting fault-injection of missed interruptsChris Wilson
Since the tests can and do explicitly check debugfs/i915_ring_missed_irqs for the handling of a "missed interrupt", adding it to the dmesg at INFO is just noise. When it happens for real, we still class it as an ERROR. Note that I have chose to remove it entirely because when we detect the "missed interrupt" is irrelevant and the message contains no more information than we glean from looking in debugfs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-20-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Simplify enabling user-interrupts with L3-remappingChris Wilson
Borrow the idea from intel_lrc.c to precompute the mask of interrupts we wish to always enable to avoid having lots of conditionals inside the interrupt enabling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-19-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Move the get/put irq locking into the callerChris Wilson
With only a single callsite for intel_engine_cs->irq_get and ->irq_put, we can reduce the code size by moving the common preamble into the caller, and we can also eliminate the reference counting. For completeness, as we are no longer doing reference counting on irq, rename the get/put vfunctions to enable/disable respectively and are able to review the use of posting reads. We only require the serialisation with hardware when enabling the interrupt (i.e. so we cannot miss an interrupt by going to sleep before the hardware truly enables it). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-18-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Embed signaling node into the GEM requestChris Wilson
Under the assumption that enabling signaling will be a frequent operation, lets preallocate our attachments for signaling inside the (rather large) request struct (and so benefiting from the slab cache). v2: Convert from void * to more meaningful names and types. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-17-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Convert trace-irq to the breadcrumb waiterChris Wilson
If we convert the tracing over from direct use of ring->irq_get() and over to the breadcrumb infrastructure, we only have a single user of the ring->irq_get and so we will be able to simplify the driver routines (eliminating the redundant validation and irq refcounting). Process context is preferred over softirq (or even hardirq) for a couple of reasons: - we already utilize process context to have fast wakeup of a single client (i.e. the client waiting for the GPU inspects the seqno for itself following an interrupt to avoid the overhead of a context switch before it returns to userspace) - engine->irq_seqno() is not suitable for use from an softirq/hardirq context as we may require long waits (100-250us) to ensure the seqno write is posted before we read it from the CPU A signaling framework is a requirement for enabling dma-fences. v2: Move to a signaling framework based upon the waiter. v3: Track the first-signal to avoid having to walk the rbtree everytime. v4: Mark the signaler thread as RT priority to reduce latency in the indirect wakeups. v5: Make failure to allocate the thread fatal. v6: Rename kthreads to i915/signal:%u Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-16-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Stop setting wraparound seqno on initialisationChris Wilson
We have testcases to ensure that seqno wraparound works fine, so we can forgo forcing everyone to encounter seqno wraparound during early uptime. seqno wraparound incurs a full GPU stall so not forcing it will eliminate one jitter from the early system. Using the testcases, we have very deterministic testing which given how difficult it would be to debug an issue (GPU hang) stemming from a wraparound using pure postmortem analysis I see no value in forcing a wrap during boot. Advancing the global next_seqno after a GPU reset is equally pointless. References? https://bugs.freedesktop.org/show_bug.cgi?id=95023 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-15-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Only apply one barrier after a breadcrumb interrupt is postedChris Wilson
If we flag the seqno as potentially stale upon receiving an interrupt, we can use that information to reduce the frequency that we apply the heavyweight coherent seqno read (i.e. if we wake up a chain of waiters). v2: Use cmpxchg to replace READ_ONCE/WRITE_ONCE for more explicit control of the ordering wrt to interrupt generation and interrupt checking in the bottom-half. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-14-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Check the CPU cached value in HWS of seqno after waking the waiterChris Wilson
If we have multiple waiters, we may find that many complete on the same wake up. If we first inspect the seqno from the CPU cache, we may reduce the number of heavyweight coherent seqno reads we require. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-13-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)Chris Wilson
On Ironlake, there is no command nor register to ensure that the write from a MI_STORE command is completed (and coherent on the CPU) before the command parser continues. This means that the ordering between the seqno write and the subsequent user interrupt is undefined (like gen6+). So to ensure that the seqno write is completed after the final user interrupt we need to delay the read sufficiently to allow the write to complete. This delay is undefined by the bspec, and empirically requires 75us even though a register read combined with a clflush is less than 500ns. Hence, the delay is due to an on-chip buffer rather than the latency of the write to memory. Note that the render ring controls this by filling the PIPE_CONTROL fifo with stalling commands that force the earliest pipe-control with the seqno to be completed before the command parser continues. Given that we need a barrier operation for BSD, we may as well forgo the extra per-batch latency by using a common per-interrupt barrier. Studying the impact of adding the usleep shows that in both sequences of and individual synchronous no-op batches is negligible for the media engine (where the write now is unordered with the interrupt). Converting the render engine over from the current glutton of pie-controls over to the per-interrupt delays speeds up both the sequential and individual synchronous no-ops by 20% and 60%, respectively. This speed up holds even when looking at the throughput of small copies (4KiB->4MiB), both serial and synchronous, by about 20%. This is because despite adding a significant delay to the interrupt, in all likelihood we will see the seqno write without having to apply the barrier (only in the rare corner cases where the write is delayed on the last required is the delay necessary). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307 Testcase: igt/gem_sync #ilk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-12-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Refactor scratch object allocation for gen2 w/a bufferChris Wilson
The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch buffer. If we pass in the size we want to allocate for the scratch buffer, both callers can use the same routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-11-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Allocate scratch page from stolenChris Wilson
With the last direct CPU access to the scratch page removed, we can now allocate it from our small amount of reserved system pages (stolen memory). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-10-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Stop mapping the scratch page into CPU spaceChris Wilson
After the elimination of using the scratch page for Ironlake's breadcrumb, we no longer need to kmap the object. We therefore can move it into the high unmappable space and do not need to force the object to be coherent (i.e. snooped on !llc platforms). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-9-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Use HWS for seqno tracking everywhereChris Wilson
By using the same address for storing the HWS on every platform, we can remove the platform specific vfuncs and reduce the get-seqno routine to a single read of a cached memory location. v2: Fix semaphore_passed() to look at the signaling engine (not the waiter's) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Spin after waking up for an interruptChris Wilson
When waiting for an interrupt (waiting for the engine to complete some work), we know we are the only waiter to be woken on this engine. We also know when the GPU has nearly completed our request (or at least started processing it), so after being woken and we detect that the GPU is active and working on our request, allow us the bottom-half (the first waiter who wakes up to handle checking the seqno after the interrupt) to spin for a very short while to reduce client latencies. The impact is minimal, there was an improvement to the realtime-vs-many clients case, but exporting the function proves useful later. However, it is tempting to adjust irq_seqno_barrier to include the spin. The problem is first ensuring that the "start-of-request" seqno is coherent as we use that as our basis for judging when it is ok to spin. If we could, spinning there could dramatically shorten some sleeps, and allow us to make the barriers more conservative to handle missed seqno writes on more platforms (all gen7+ are known to have the occasional issue, at least). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-7-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Slaughter the thundering i915_wait_request herdChris Wilson
One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. Using igt/benchmarks/gem_latency, we can demonstrate this effect. The benchmark measures the number of GPU cycles between completion of a batch and the client waking up from a call to wait-ioctl. With many concurrent waiters, with each on a different request, we observe that the wakeup latency before the patch scales nearly linearly with the number of waiters (before external factors kick in making the scaling much worse). After applying the patch, we can see that only the single waiter for the request is being woken up, providing a constant wakeup latency for every operation. However, the situation is not quite as rosy for many waiters on the same request, though to the best of my knowledge this is much less likely in practice. Here, we can observe that the concurrent waiters incur extra latency from being woken up by the solitary bottom-half, rather than directly by the interrupt. This appears to be scheduler induced (having discounted adverse effects from having a rbtree walk/erase in the wakeup path), each additional wake_up_process() costs approximately 1us on big core. Another effect of performing the secondary wakeups from the first bottom-half is the incurred delay this imposes on high priority threads - rather than immediately returning to userspace and leaving the interrupt handler to wake the others. To offset the delay incurred with additional waiters on a request, we could use a hybrid scheme that did a quick read in the interrupt handler and dequeued all the completed waiters (incurring the overhead in the interrupt handler, not the best plan either as we then incur GPU submission latency) but we would still have to wake up the bottom-half every time to do the heavyweight slow read. Or we could only kick the waiters on the seqno with the same priority as the current task (i.e. in the realtime waiter scenario, only it is woken up immediately by the interrupt and simply queues the next waiter before returning to userspace, minimising its delay at the expense of the chain, and also reducing contention on its scheduler runqueue). This is effective at avoid long pauses in the interrupt handler and at avoiding the extra latency in realtime/high-priority waiters. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang v13: Rename intel_breadcrumb and friends to intel_wait in preparation for signal handling. v14: RCU commentary, assert_spin_locked v15: Hide BUG_ON behind the compiler; report on gem_latency findings. v16: Sort seqno-groups by priority so that first-waiter has the highest task priority (and so avoid priority inversion). v17: Add waiters to post-mortem GPU hang state. v18: Return early for a completed wait after acquiring the spinlock. Avoids adding ourselves to the tree if the is already complete, and skips the awkward question of why we don't do completion wakeups for waits earlier than or equal to ourselves. v19: Prepare for init_breadcrumbs to fail. Later patches may want to allocate during init, so be prepared to propagate back the error code. Testcase: igt/gem_concurrent_blit Testcase: igt/benchmarks/gem_latency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18 Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Separate GPU hang waitqueue from advanceChris Wilson
Currently __i915_wait_request uses a per-engine wait_queue_t for the dual purpose of waking after the GPU advances or for waking after an error. In the future, we may add even more wake sources and require greater separation, but for now we can conceptually simplify wakeups by separating the two sources. In particular, this allows us to use different wait-queues (e.g. one on the engine advancement, a global one for errors and one on each requests) without any hassle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-5-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Make queueing the hangcheck work inlineChris Wilson
Since the function is a small wrapper around schedule_delayed_work(), move it inline to remove the function call overhead for the principle caller. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-4-git-send-email-chris@chris-wilson.co.uk
2016-07-01drm/i915: Remove the dedicated hangcheck workqueueChris Wilson
The queue only ever contains at most one item and has no special flags. It is just a very simple wrapper around the system-wq - a complication with no benefits. v2: Use the system_long_wq as we may wish to capture the error state after detecting the hang - which may take a bit of time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-3-git-send-email-chris@chris-wilson.co.uk