git.armlinux.org.uk/linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2014-09-29	mei: debugfs: adjust print buffer	Alexander Usyskin
	In case of many me clients (15 and more) 1K buffer is not enough for full information print. Calculate buffer size according to real clients number. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	mei: add hbm and pg state in devstate debugfs print	Alexander Usyskin
	Add hbm state, pg enablement and state to devstate file in debugfs (<debugfs>/mei/devstate) Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	usb: hcd: add generic PHY support	Sergei Shtylyov
	Add the generic PHY support, analogous to the USB PHY support. Intended it to be used with the PCI EHCI/OHCI drivers and the xHCI platform driver. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	usb: rename phy to usb_phy in HCD	Antoine Tenart
	The USB PHY member of the HCD structure is renamed to 'usb_phy' and modifications are done in all drivers accessing it. This is in preparation to adding the generic PHY support. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> [Sergei: added missing 'drivers/usb/misc/lvstest.c' file, resolved rejects, updated changelog.] Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	usb: gadget: uvc: fix up uvcg_v4l2_get_unmapped_area typo	Arnd Bergmann
	Patch "usb: gadget: uvc: rename functions to avoid conflicts with host uvc" renamed a lot of symbols but missed one references that was inside of an #ifdef: drivers/usb/gadget/function/uvc_v4l2.c:363:23: error: 'uvcg_v4l2_get_unmapped_area' undeclared here (not in a function) .get_unmapped_area = uvcg_v4l2_get_unmapped_area, ^ drivers/usb/gadget/function/uvc_v4l2.c:344:22: warning: 'uvc_v4l2_get_unmapped_area' defined but not used [-Wunused-function] static unsigned long uvc_v4l2_get_unmapped_area(struct file *file, ^ This renames the reference according the changed function name. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 7ea95b110811 ("usb: gadget: uvc: rename functions to avoid conflicts with host uvc") Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Cc: Michael Grzeschik <m.grzeschik@pengutronix.de> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	USB: host: st: fix ehci/ohci driver selection	Arnd Bergmann
	The newly added sti ehci and ohci drivers come with a single Kconfig entry that does not depend on either of the base drivers, which leads to a link error when they are disabled: drivers/built-in.o: In function `ohci_platform_init': :(.init.text+0x14788): undefined reference to `ohci_init_driver' To fix that, this patch introduces two separate Kconfig options with proper dependencies, which avoids the problem and is also more consistent with the other glue drivers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: d115837259ada ("usb: host: ohci-st: Add OHCI driver support for ST STB devices") Cc: Peter Griffin <peter.griffin@linaro.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	usb: host: ehci-exynos: Remove unnecessary usb-phy support	Vivek Gautam
	Now that we have completely moved from older USB-PHY drivers to newer GENERIC-PHY drivers for PHYs available with USB controllers on Exynos series of SoCs, we can remove the support for the same in our host drivers too. We also defer the probe for our host in case we end up getting EPROBE_DEFER error when getting PHYs. Signed-off-by: Vivek Gautam <gautam.vivek@samsung.com> Acked-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	usb: core: return -ENOTSUPP for all targeted hosts	Peter Chen
	The current code only returns -ENOTSUPP for OTG host, but in fact, embedded host also needs to returns -ENOTSUPP if the peripheral is not at TPL. Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-29	pinctrl: alter device tree bindings for functions	Linus Walleij
	For function and group configuration nodes, use "function" "groups" string pairs, not "pins" where there should be "groups". Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2014-09-29	Bluetooth: 6lowpan: Enable multicast support	Jukka Rissanen
	Set multicast support for 6lowpan network interface. This is needed in every network interface that supports IPv6. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-29	Bluetooth: 6lowpan: Ensure header compression does not corrupt IPv6 header	Jukka Rissanen
	If skb is going to multiple destinations, then make sure that we do not overwrite the common IPv6 headers. So before compressing the IPv6 headers, we copy the skb and that is then sent to 6LoWPAN Bluetooth devices. This is a similar patch as what was done for IEEE 802.154 6LoWPAN in commit f19f4f9525cf3 ("ieee802154: 6lowpan: ensure header compression does not corrupt ipv6 header") Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-29	pinctrl: nomadik: refactor DT parser to take two paths	Linus Walleij
	We refactor the DT parser to look for either a config or a function and then look for further nodes and reserve maps, not the two things mixed up like prior to this patch. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2014-09-29	drm/i915: Don't spam dmesg with rps messages on vlv/chv	Ville Syrjälä
	If the GPU frequency isn't going to change don't spam dmesg with debug messages about it. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-09-29	arm: kvm: fix CPU hotplug	Vladimir Murzin
	On some platforms with no power management capabilities, the hotplug implementation is allowed to return from a smp_ops.cpu_die() call as a function return. Upon a CPU onlining event, the KVM CPU notifier tries to reinstall the hyp stub, which fails on platform where no reset took place following a hotplug event, with the message: CPU1: smp_ops.cpu_die() returned, trying to resuscitate CPU1: Booted secondary processor Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x80409540 unexpected data abort in Hyp mode at: 0x80401fe8 unexpected HVC/SVC trap in Hyp mode at: 0x805c6170 since KVM code is trying to reinstall the stub on a system where it is already configured. To prevent this issue, this patch adds a check in the KVM hotplug notifier that detects if the HYP stub really needs re-installing when a CPU is onlined and skips the installation call if the stub is already in place, which means that the CPU has not been reset. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-29	cpufreq: Replace strnicmp with strncasecmp	Rasmus Villemoes
	The kernel used to contain two functions for length-delimited, case-insensitive string comparison, strnicmp with correct semantics and a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper for the new strncasecmp to avoid breaking existing users. To allow the compat wrapper strnicmp to be removed at some point in the future, and to avoid the extra indirection cost, do s/strnicmp/strncasecmp/g. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29	cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec	Shilpasri G Bhat
	This patch ensures the cpus to kexec/reboot at nominal frequency. Nominal frequency is the highest cpu frequency on PowerPC at which the cores can run without getting throttled. If the host kernel had set the cpus to a low pstate and then it kexecs/reboots to a cpufreq disabled kernel it would cause the target kernel to perform poorly. It will also increase the boot up time of the target kernel. So set the cpus to high pstate, in this case to nominal frequency before rebooting to avoid such scenarios. The reboot notifier will set the cpus to nominal frequncy. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29	cpufreq: powernv: Set the pstate of the last hotplugged out cpu in ↵	Preeti U Murthy
	policy->cpus to minimum Its possible today that the pstate of a core is held at a high even after the entire core is hotplugged out if a load had just run on the hotplugged cpu. This is fair, since it is assumed that the pstate does not matter to a cpu in a deep idle state, which is the expected state of a hotplugged core on powerpc. However on powerpc, the pstate at a socket level is held at the maximum of the pstates of each core. Even if the pstates of the active cores on that socket is low, the socket pstate is held high due to the pstate of the hotplugged core in the above mentioned scenario. This can cost significant amount of power loss for no good. Besides, since it is a non active core, nothing can be done from the kernel's end to set the frequency of the core right. Hence make use of the stop_cpu callback to explicitly set the pstate of the core to a minimum when the last cpu of the core gets hotplugged out. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29	cpufreq: Allow stop CPU callback to be used by all cpufreq drivers	Preeti U Murthy
	Commit 367dc4aa932bfb3 ("cpufreq: Add stop CPU callback to cpufreq_driver interface") introduced the stop CPU callback for intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this callback is invoked so that drivers can take some action on the pstate of the cpu before it is taken offline. This callback was assumed to be useful only for those drivers which have implemented the set_policy CPU callback because they have no other way to take action about the cpufreq of a CPU which is being hotplugged out except in the exit callback which is called very late in the offline process. The drivers which implement the target/target_index callbacks were expected to take care of requirements like the ones that commit 367dc4aa addresses in the GOV_STOP notification event. But there are disadvantages to restricting the usage of stop CPU callback to cpufreq drivers that implement the set_policy callbacks and who want to take explicit action on the setting the cpufreq during a hotplug operation. 1.GOV_STOP gets called for every CPU offline and drivers would usually want to take action when the last cpu in the policy->cpus mask is taken offline. As long as there is more than one cpu in the policy->cpus mask, cpufreq core itself makes sure that the freq for the other cpus in this mask is set according to the maximum load. This is sensible and drivers which implement the target_index callback would mostly not want to modify that. However the cpufreq core leaves a loose end when the cpu in the policy->cpus mask is the last one to go offline; it does nothing explicit to the frequency of the core. Drivers may need a way to take some action here and stop CPU callback mechanism is the best way to do it today. 2. We cannot implement driver specific actions in the GOV_STOP mechanism. So we will need another driver callback which is invoked from here which is unnecessary. Therefore this patch extends the usage of stop CPU callback to be used by all cpufreq drivers as long as they have this callback implemented and irrespective of whether they are set_policy/target_index drivers. The assumption is if the drivers find the GOV_STOP path to be a suitable way of implementing what they want to do with the freq of the cpu going offine,they will not implement the stop CPU callback at all. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29	drm/i915: Flush the PTEs after updating them before suspend	Chris Wilson
	As we use WC updates of the PTE, we are responsible for notifying the hardware when to flush its TLBs. Do so after we zap all the PTEs before suspend (and the BIOS tries to read our GTT). Fixes a regression from commit 828c79087cec61eaf4c76bb32c222fbe35ac3930 Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Wed Oct 16 09:21:30 2013 -0700 drm/i915: Disable GGTT PTEs on GEN6+ suspend that survived and continue to cause harm even after commit e568af1c626031925465a5caaab7cca1303d55c7 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Mar 26 20:08:20 2014 +0100 drm/i915: Undo gtt scratch pte unmapping again v2: Trivial rebase. v3: Fixes requires pointer dances. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82340 Tested-by: ming.yao@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Cc: Takashi Iwai <tiwai@suse.de> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Todd Previte <tprevite@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-09-29	KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode	Aneesh Kumar K.V
	We use cma reserved area for creating guest hash page table. Don't do the reservation in non-hypervisor mode. This avoids unnecessary CMA reservation when booting with limited memory configs like fadump and kdump. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-29	cpufreq: integrator: fix integrator_cpufreq_remove return type	Arnd Bergmann
	When building this driver as a module, we get a helpful warning about the return type: drivers/cpufreq/integrator-cpufreq.c:232:2: warning: initialization from incompatible pointer type .remove = __exit_p(integrator_cpufreq_remove), If the remove callback returns void, the caller gets an undefined value as it expects an integer to be returned. This fixes the problem by passing down the value from cpufreq_unregister_driver. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29	cpufreq: pcc-cpufreq: Fix wait_event() under spinlock	Rafael J. Wysocki
	Fix the following bug introduced by commit 8fec051eea73 (cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin\|end}) that forgot to move the spin_lock() in pcc_cpufreq_target() past cpufreq_freq_transition_begin() which calls wait_event(): BUG: sleeping function called from invalid context at drivers/cpufreq/cpufreq.c:370 in_atomic(): 1, irqs_disabled(): 0, pid: 2636, name: modprobe Preemption disabled at:[<ffffffffa04d74d7>] pcc_cpufreq_target+0x27/0x200 [pcc_cpufreq] [ 51.025044] CPU: 57 PID: 2636 Comm: modprobe Tainted: G E 3.17.0-default #7 Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010 00000000ffffffff ffff88026c46b828 ffffffff81589dbd 0000000000000000 ffff880037978090 ffff88026c46b848 ffffffff8108e1df ffff880037978090 0000000000000000 ffff88026c46b878 ffffffff8108e298 ffff88026d73ec00 Call Trace: [<ffffffff81589dbd>] dump_stack+0x4d/0x90 [<ffffffff8108e1df>] ___might_sleep+0x10f/0x180 [<ffffffff8108e298>] __might_sleep+0x48/0xd0 [<ffffffff8145b905>] cpufreq_freq_transition_begin+0x75/0x140 drivers/cpufreq/cpufreq.c:370 wait_event(policy->transition_wait, !policy->transition_ongoing); [<ffffffff8108fc99>] ? preempt_count_add+0xb9/0xc0 [<ffffffffa04d7513>] pcc_cpufreq_target+0x63/0x200 [pcc_cpufreq] drivers/cpufreq/pcc-cpufreq.c:207 spin_lock(&pcc_lock); [<ffffffff810e0d0f>] ? update_ts_time_stats+0x7f/0xb0 [<ffffffff8145be55>] __cpufreq_driver_target+0x85/0x170 [<ffffffff8145e4c8>] od_check_cpu+0xa8/0xb0 [<ffffffff8145ef10>] dbs_check_cpu+0x180/0x1d0 [<ffffffff8145f310>] cpufreq_governor_dbs+0x3b0/0x720 [<ffffffff8145ebe3>] od_cpufreq_governor_dbs+0x33/0xe0 [<ffffffff814593d9>] __cpufreq_governor+0xa9/0x210 [<ffffffff81459fb2>] cpufreq_set_policy+0x1e2/0x2e0 [<ffffffff8145a6cc>] cpufreq_init_policy+0x8c/0x110 [<ffffffff8145c9a0>] ? cpufreq_update_policy+0x1b0/0x1b0 [<ffffffff8108fb99>] ? preempt_count_sub+0xb9/0x100 [<ffffffff8145c6c6>] __cpufreq_add_dev+0x596/0x6b0 [<ffffffffa016c608>] ? pcc_cpufreq_probe+0x4b4/0x4b4 [pcc_cpufreq] [<ffffffff8145c7ee>] cpufreq_add_dev+0xe/0x10 [<ffffffff81408e81>] subsys_interface_register+0xc1/0xf0 [<ffffffff8108fb99>] ? preempt_count_sub+0xb9/0x100 [<ffffffff8145b3d7>] cpufreq_register_driver+0x117/0x2a0 [<ffffffffa016c65d>] pcc_cpufreq_init+0x55/0x9f8 [pcc_cpufreq] [<ffffffffa016c608>] ? pcc_cpufreq_probe+0x4b4/0x4b4 [pcc_cpufreq] [<ffffffff81000298>] do_one_initcall+0xc8/0x1f0 [<ffffffff811a731d>] ? __vunmap+0x9d/0x100 [<ffffffff810eb9a0>] do_init_module+0x30/0x1b0 [<ffffffff810edfa6>] load_module+0x686/0x710 [<ffffffff810ebb20>] ? do_init_module+0x1b0/0x1b0 [<ffffffff810ee1db>] SyS_init_module+0x9b/0xc0 [<ffffffff8158f7a9>] system_call_fastpath+0x16/0x1b Fixes: 8fec051eea73 (cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin\|end}) Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: 3.15+ <stable@vger.kernel.org> # 3.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29	Merge back earlier 'pm-sleep' material for v3.18.	Rafael J. Wysocki

2014-09-29	Merge back earlier 'acpica' material for v3.18.	Rafael J. Wysocki

2014-09-29	drm/i915: Do not leak pages when freeing userptr objects	Tvrtko Ursulin
	sg_alloc_table_from_pages() can build us a table with coalesced ranges which means we need to iterate over pages and not sg table entries when releasing page references. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Barbalho, Rafael" <rafael.barbalho@intel.com> Tested-by: Rafael Barbalho <rafael.barbalho@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org [danvet: Remove unused local variable sg.] Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-09-29	Merge back earlier 'acpi-lpss' material for v3.18.	Rafael J. Wysocki

2014-09-29	pinctrl: nomadik: use utils map free function	Linus Walleij
	Stop brewing our own map free function and rely on the pinctrl utils helpers. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2014-09-29	drm/i915: Do not store the error pointer for a failed userptr registration	Chris Wilson
	If we fail to create our mmu notification, we report the error back and currently store the error inside the i915_mm_struct. This not only causes subsequent registerations of the same mm to fail (an issue if the first was interrupted by a signal and needed to be restarted) but also causes us to eventually try and free the error pointer. [ 73.419599] BUG: unable to handle kernel NULL pointer dereference at 000000000000004c [ 73.419831] IP: [<ffffffff8114af33>] mmu_notifier_unregister+0x23/0x130 [ 73.420065] PGD 8650c067 PUD 870bb067 PMD 0 [ 73.420319] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 73.420580] CPU: 0 PID: 42 Comm: kworker/0:1 Tainted: G W 3.17.0-rc6+ #1561 [ 73.420837] Hardware name: Intel Corporation SandyBridge Platform/LosLunas CRB, BIOS ASNBCPT1.86C.0075.P00.1106281639 06/28/2011 [ 73.421405] Workqueue: events __i915_mm_struct_free__worker [ 73.421724] task: ffff880088a81220 ti: ffff880088168000 task.ti: ffff880088168000 [ 73.422051] RIP: 0010:[<ffffffff8114af33>] [<ffffffff8114af33>] mmu_notifier_unregister+0x23/0x130 [ 73.422410] RSP: 0018:ffff88008816bd50 EFLAGS: 00010286 [ 73.422765] RAX: 0000000000000003 RBX: ffff880086485400 RCX: 0000000000000000 [ 73.423137] RDX: ffff88016d80ee90 RSI: ffff880086485400 RDI: 0000000000000044 [ 73.423513] RBP: ffff88008816bd70 R08: 0000000000000001 R09: 0000000000000000 [ 73.423895] R10: 0000000000000320 R11: 0000000000000001 R12: 0000000000000044 [ 73.424282] R13: ffff880166e5f008 R14: ffff88016d815200 R15: ffff880166e5f040 [ 73.424682] FS: 0000000000000000(0000) GS:ffff88016d800000(0000) knlGS:0000000000000000 [ 73.425099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 73.425537] CR2: 000000000000004c CR3: 0000000087f5f000 CR4: 00000000000407f0 [ 73.426157] Stack: [ 73.426597] ffff880088a81248 ffff880166e5f038 fffffffffffffffc ffff880166e5f008 [ 73.427096] ffff88008816bd98 ffffffff814a75f2 ffff880166e5f038 ffff8800880f8a28 [ 73.427603] ffff88016d812ac0 ffff88008816be00 ffffffff8106321a ffffffff810631af [ 73.428119] Call Trace: [ 73.428606] [<ffffffff814a75f2>] __i915_mm_struct_free__worker+0x42/0x80 [ 73.429116] [<ffffffff8106321a>] process_one_work+0x1ba/0x610 [ 73.429632] [<ffffffff810631af>] ? process_one_work+0x14f/0x610 [ 73.430153] [<ffffffff810636db>] worker_thread+0x6b/0x4a0 [ 73.430671] [<ffffffff8108d67d>] ? trace_hardirqs_on+0xd/0x10 [ 73.431501] [<ffffffff81063670>] ? process_one_work+0x610/0x610 [ 73.432030] [<ffffffff8106a206>] kthread+0xf6/0x110 [ 73.432561] [<ffffffff8106a110>] ? __kthread_parkme+0x80/0x80 [ 73.433100] [<ffffffff8169c22c>] ret_from_fork+0x7c/0xb0 [ 73.433644] [<ffffffff8106a110>] ? __kthread_parkme+0x80/0x80 [ 73.434194] Code: 0f 1f 84 00 00 00 00 00 66 66 66 66 90 8b 46 4c 85 c0 0f 8e 10 01 00 00 55 48 89 e5 41 55 41 54 53 48 89 f3 49 89 fc 48 83 ec 08 <48> 83 7f 08 00 0f 84 b1 00 00 00 48 c7 c7 40 e6 ac 82 e8 26 65 [ 73.435942] RIP [<ffffffff8114af33>] mmu_notifier_unregister+0x23/0x130 [ 73.437017] RSP <ffff88008816bd50> [ 73.437704] CR2: 000000000000004c Fixes regression from commit ad46cb533d586fdb256855437af876617c6cf609 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Aug 7 14:20:40 2014 +0100 drm/i915: Prevent recursive deadlock on releasing a busy userptr Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84207 Testcase: igt/gem_render_copy_redux Testcase: igt/gem_userptr_blits/create-destroy-sync Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jacek Danecki <jacek.danecki@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Jacek Danecki <jacek.danecki@intel.com> Cc: "Ursulin, Tvrtko" <tvrtko.ursulin@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-09-29	pinctrl: nomadik: use util function to reserve maps	Linus Walleij
	Stop brewing our own pin map reservation function and use the generic code. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2014-09-29	Revert "drm/i915/bdw: BDW Software Turbo"	Daniel Vetter
	This reverts commit c76bb61a71083b2d90504cc6d0dda2047c5d63ca. It's apparently too broken so that Rodrigo submitted a patch to add a config option for it. Given that the design is also ... suboptimal and that I've only merged this to get lead engineers and managers off my back for one second let's just revert this. /me puts on combat gear again It was worth a shot ... References: http://mid.mail-archive.com/1411686380-1953-1-git-send-email-rodrigo.vivi@intel.com Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Daisy Sun <daisy.sun@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-09-29	udf: remove redundant sys_tz declaration	Fabian Frederick
	sys_tz is already declared in include/linux/time.h Cc: Jan Kara <jack@suse.cz> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jan Kara <jack@suse.cz>
2014-09-29	PM / devfreq: exynos: Enable building exynos PPMU as module	Punit Agrawal
	Export symbols from the PPMU driver needed to build the exynos bus driver as a module. Cc: MyungJoo Ham <myungjoo.ham@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2014-09-29	PM / devfreq: Export helper functions for drivers	Ãrjan Eide
	These functions are indended for use by drivers and should be available also when the driver is built as a module. Cc: MyungJoo Ham <myungjoo.ham@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Ãrjan Eide <orjan.eide@arm.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2014-09-29	netfilter: conntrack: disable generic tracking for known protocols	Florian Westphal
	Given following iptables ruleset: -P FORWARD DROP -A FORWARD -m sctp --dport 9 -j ACCEPT -A FORWARD -p tcp --dport 80 -j ACCEPT -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT One would assume that this allows SCTP on port 9 and TCP on port 80. Unfortunately, if the SCTP conntrack module is not loaded, this allows all SCTP communication, to pass though, i.e. -p sctp -j ACCEPT, which we think is a security issue. This is because on the first SCTP packet on port 9, we create a dummy "generic l4" conntrack entry without any port information (since conntrack doesn't know how to extract this information). All subsequent packets that are unknown will then be in established state since they will fallback to proto_generic and will match the 'generic' entry. Our originally proposed version [1] completely disabled generic protocol tracking, but Jozsef suggests to not track protocols for which a more suitable helper is available, hence we now mitigate the issue for in tree known ct protocol helpers only, so that at least NAT and direction information will still be preserved for others. [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html Joint work with Daniel Borkmann. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29	PM / devfreq: Remove ARCH_HAS_OPP completely	Paul Bolle
	The Kconfig symbol ARCH_HAS_OPP became redundant in v3.16: commit 049d595a4db3 ("PM / OPP: Make OPP invisible to users in Kconfig") removed the only dependency that used it. Setting it had no effect anymore. So commit 78c5e0bb145d ("PM / OPP: Remove ARCH_HAS_OPP") removed it. For some reason that commit did not remove all select statements for that symbol. These statements are now useless. Remove one from devfreq too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2014-09-29	mmc: Fix incorrect warning when setting 0 Hz via debugfs	Adrian Hunter
	It is possible to turn off the card clock by setting the frequency to zero via debugfs e.g. echo 0 > /sys/kernel/debug/mmc0/clock However that produces an incorrect warning that is designed to warn if the frequency is below the minimum operating frequency. So correct the warning. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-09-29	netfilter: nf_tables: store and dump set policy	Arturo Borrero
	We want to know in which cases the user explicitly sets the policy options. In that case, we also want to dump back the info. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29	mmc: Fix use of wrong device in mmc_gpiod_free_cd()	Adrian Hunter
	mmc_gpiod_free_cd() is paired with mmc_gpiod_request_cd() and both must reference the same device which is the actual host controller device not the mmc_host class device. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-09-29	mmc: atmel-mci: fix mismatched section on atmci_cleanup_slot	Arnd Bergmann
	As of 528bc7808f4e ("mmc: atmel-mci: Release mmc resources on failure in probe"), the atmci_probe() function calls atmci_cleanup_slot in the failure path. This causes a new warning whenever the driver is built: WARNING: drivers/mmc/host/built-in.o(.init.text+0xa04): Section mismatch in reference from the function atmci_probe() to the function .exit.text:atmci_cleanup_slot() The function __init atmci_probe() references a function __exit atmci_cleanup_slot(). Gcc correctly warns about this function getting dropped in the link stage for the built-in case, which would cause undefined behavior when this error path is hit. The solution is to simply drop the __exit annotation. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 528bc7808f4e ("mmc: atmel-mci: Release mmc resources on failure in probe") Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-09-29	clk: ti: dra7-atl-clock: Mark the device as pm_runtime_irq_safe	Peter Ujfalusi
	It is safe to call the pm sync calls in interrupt context in this driver. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Tero Kristo <t-kristo@ti.com>
2014-09-29	clk: ti: LLVMLinux: Move __init outside of type definition	Behan Webster
	As written, the __init for ti_clk_get_div_table is in the middle of the return type. The gcc documentation indicates that section attributes should be added to the end of the function declaration: extern void foobar (void) __attribute__ ((section ("bar"))); However gcc seems to be very permissive with where attributes can be placed. clang on the other hand isn't so permissive, and fails if you put the section definition in the middle of the return type: drivers/clk/ti/divider.c:298:28: error: expected ';' after struct static struct clk_div_table ^ ; drivers/clk/ti/divider.c:298:1: warning: 'static' ignored on this declaration [-Wmissing-declarations] static struct clk_div_table ^ drivers/clk/ti/divider.c:299:9: error: type specifier missing, defaults to 'int' [-Werror,-Wimplicit-int] __init ti_clk_get_div_table(struct device_node node) ~~~~~~ ^ drivers/clk/ti/divider.c:345:9: warning: incompatible pointer types returning 'struct clk_div_table ' from a function with result type 'int ' [-Wincompatible-pointer-types] return table; ^~~~~ drivers/clk/ti/divider.c:419:9: warning: incompatible pointer types assigning to 'const struct clk_div_table ' from 'int ' [-Wincompatible-pointer-types] table = ti_clk_get_div_table(node); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~ 3 warnings and 2 errors generated. By convention, most of the kernel code puts section attributes between the return type and function name. In the case where the return type is a pointer, it's important to place the '' on left of the __init. This updated code works for both gcc and clang. Signed-off-by: Behan Webster <behanw@converseincode.com> Reviewed-by: Mark Charlebois <charlebm@gmail.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Tero Kristo <t-kristo@ti.com>
2014-09-29	clk: ti: consider the fact that of_clk_get() might return an error	Sebastian Andrzej Siewior
	I "forgot" to update the dtb and the kernel crashed: \|Unable to handle kernel NULL pointer dereference at virtual address 0000002e \|PC is at __clk_get_flags+0x4/0xc \|LR is at ti_dt_clockdomains_setup+0x70/0xe8 because I did not have the clock nodes. of_clk_get() returns an error pointer which is not checked here. Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tero Kristo <t-kristo@ti.com>
2014-09-29	clk: ti: dra7-atl-clock: fix a memory leak	Tero Kristo
	of_clk_add_provider makes an internal copy of the parent_names property while its called, thus it is no longer needed after this call and can be freed. Signed-off-by: Tero Kristo <t-kristo@ti.com> Cc: Mike Turquette <mturquette@linaro.org> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
2014-09-29	clk: ti: change clock init to use generic of_clk_init	Tero Kristo
	Previously, the TI clock driver initialized all the clocks hierarchically under each separate clock provider node. Now, each clock that requires IO access will instead check their parent node to find out which IO range to use. This patch allows the TI clock driver to use a few new features provided by the generic of_clk_init, and also allows registration of clock nodes outside the clock hierarchy (for example, any external clocks.) Signed-off-by: Tero Kristo <t-kristo@ti.com> Cc: Mike Turquette <mturquette@linaro.org> Cc: Paul Walmsley <paul@pwsan.com> Cc: Tony Lindgren <tony@atomide.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Ujfalusi <peter.ujfalusi@ti.com> Cc: Jyri Sarha <jsarha@ti.com> Cc: Stefan Assmann <sassmann@kpanic.de> Acked-by: Tony Lindgren <tony@atomide.com>
2014-09-29	Bluetooth: 6lowpan: Make sure skb exists before accessing it	Jukka Rissanen
	We need to make sure that the saved skb exists when resuming or suspending a CoC channel. This can happen if initial credits is 0 when channel is connected. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-29	Merge branch 'qca7000_spi'	David S. Miller
	Stefan Wahren says: ==================== add Qualcomm QCA7000 ethernet driver This patch series adds support for the Qualcomm QCA7000 Homeplug GreenPHY. The QCA7000 is serial-to-powerline bridge with two interfaces: UART and SPI. These patches handles only the last one, with an Ethernet over SPI protocol driver. This driver based on the Qualcomm code [1], but contains a lot of changes since last year: * devicetree support * DebugFS support * ethtool support * better error handling * performance improvements * code cleanup * some bugfixes The code has been tested only on Freescale i.MX28 boards, but should work on other platforms. [1] - https://github.com/IoE/qca7000 Changes in V3: - Use ether_addr_copy instead of memcpy - Remove qcaspi_set_mac_address - Improve DT parsing - replace OF_GPIO dependancy with OF - fix compile error caused by SET_ETHTOOL_OPS - fix possible endless loop when spi read fails - fix DT documentation - fix coding style - fix sparse warnings Changes in V2: - replace in DT the SPI intr GPIO with pure interrupt - make legacy mode a boolean DT property and remove it as module parameter - make burst length a module parameter instead of DT property - make pluggable a module parameter instead of DT property - improve DT documentation - replace debugFS register dump with ethtool function - replace debugFS stats with ethtool function - implement function to get ring parameter via ethtool - implement function to set TX ring count via ethtool - fix TX ring state in debugFS - optimize tx ring flush - add byte limit for TX ring to avoid bufferbloat - fix TX queue full and write buffer miss counter - fix SPI clk speed module parameter - fix possible packet loss - fix possible race during transmit ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29	net: qualcomm: new Ethernet over SPI driver for QCA7000	Stefan Wahren
	This patch adds the Ethernet over SPI driver for the Qualcomm QCA7000 HomePlug GreenPHY. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29	Documentation: add Device tree bindings for QCA7000	Stefan Wahren
	This patch adds the Device tree bindings for the Ethernet over SPI protocol driver of the Qualcomm QCA7000 HomePlug GreenPHY. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29	Merge branch 'dctcp'	David S. Miller
	Daniel Borkmann says: ==================== net: tcp: DCTCP congestion control algorithm This patch series adds support for the DataCenter TCP (DCTCP) congestion control algorithm. Please see individual patches for the details. The last patch adds DCTCP as a congestion control module, and previous ones add needed infrastructure to extend the congestion control framework. Joint work between Florian Westphal, Daniel Borkmann and Glenn Judd. v3 -> v2: - No changes anywhere, just a resend as requested by Dave - Added Stephen's ACK v1 -> v2: - Rebased to latest net-next - Addressed Eric's feedback, thanks! - Update stale comment wrt. DCTCP ECN usage - Don't call INET_ECN_xmit for every packet - Add dctcp ss/inetdiag support to expose internal stats to userspace ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29	net: tcp: add DCTCP congestion control algorithm	Daniel Borkmann
	This work adds the DataCenter TCP (DCTCP) congestion control algorithm [1], which has been first published at SIGCOMM 2010 [2], resp. follow-up analysis at SIGMETRICS 2011 [3] (and also, more recently as an informational IETF draft available at [4]). DCTCP is an enhancement to the TCP congestion control algorithm for data center networks. Typical data center workloads are i.e. i) partition/aggregate (queries; bursty, delay sensitive), ii) short messages e.g. 50KB-1MB (for coordination and control state; delay sensitive), and iii) large flows e.g. 1MB-100MB (data update; throughput sensitive). DCTCP has therefore been designed for such environments to provide/achieve the following three requirements: * High burst tolerance (incast due to partition/aggregate) * Low latency (short flows, queries) * High throughput (continuous data updates, large file transfers) with commodity, shallow buffered switches The basic idea of its design consists of two fundamentals: i) on the switch side, packets are being marked when its internal queue length > threshold K (K is chosen so that a large enough headroom for marked traffic is still available in the switch queue); ii) the sender/host side maintains a moving average of the fraction of marked packets, so each RTT, F is being updated as follows: F := X / Y, where X is # of marked ACKs, Y is total # of ACKs alpha := (1 - g) * alpha + g * F, where g is a smoothing constant The resulting alpha (iow: probability that switch queue is congested) is then being used in order to adaptively decrease the congestion window W: W := (1 - (alpha / 2)) * W The means for receiving marked packets resp. marking them on switch side in DCTCP is the use of ECN. RFC3168 describes a mechanism for using Explicit Congestion Notification from the switch for early detection of congestion, rather than waiting for segment loss to occur. However, this method only detects the presence of congestion, not the extent. In the presence of mild congestion, it reduces the TCP congestion window too aggressively and unnecessarily affects the throughput of long flows [4]. DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN) processing to estimate the fraction of bytes that encounter congestion, rather than simply detecting that some congestion has occurred. DCTCP then scales the TCP congestion window based on this estimate [4], thus it can derive multibit feedback from the information present in the single-bit sequence of marks in its control law. And thus act in proportion to the extent of congestion, not its presence. Switches therefore set the Congestion Experienced (CE) codepoint in packets when internal queue lengths exceed threshold K. Resulting, DCTCP delivers the same or better throughput than normal TCP, while using 90% less buffer space. It was found in [2] that DCTCP enables the applications to handle 10x the current background traffic, without impacting foreground traffic. Moreover, a 10x increase in foreground traffic did not cause any timeouts, and thus largely eliminates TCP incast collapse problems. The algorithm itself has already seen deployments in large production data centers since then. We did a long-term stress-test and analysis in a data center, short summary of our TCP incast tests with iperf compared to cubic: This test measured DCTCP throughput and latency and compared it with CUBIC throughput and latency for an incast scenario. In this test, 19 senders sent at maximum rate to a single receiver. The receiver simply ran iperf -s. The senders ran iperf -c <receiver> -t 30. All senders started simultaneously (using local clocks synchronized by ntp). This test was repeated multiple times. Below shows the results from a single test. Other tests are similar. (DCTCP results were extremely consistent, CUBIC results show some variance induced by the TCP timeouts that CUBIC encountered.) For this test, we report statistics on the number of TCP timeouts, flow throughput, and traffic latency. 1) Timeouts (total over all flows, and per flow summaries): CUBIC DCTCP Total 3227 25 Mean 169.842 1.316 Median 183 1 Max 207 5 Min 123 0 Stddev 28.991 1.600 Timeout data is taken by measuring the net change in netstat -s "other TCP timeouts" reported. As a result, the timeout measurements above are not restricted to the test traffic, and we believe that it is likely that all of the "DCTCP timeouts" are actually timeouts for non-test traffic. We report them nevertheless. CUBIC will also include some non-test timeouts, but they are drawfed by bona fide test traffic timeouts for CUBIC. Clearly DCTCP does an excellent job of preventing TCP timeouts. DCTCP reduces timeouts by at least two orders of magnitude and may well have eliminated them in this scenario. 2) Throughput (per flow in Mbps): CUBIC DCTCP Mean 521.684 521.895 Median 464 523 Max 776 527 Min 403 519 Stddev 105.891 2.601 Fairness 0.962 0.999 Throughput data was simply the average throughput for each flow reported by iperf. By avoiding TCP timeouts, DCTCP is able to achieve much better per-flow results. In CUBIC, many flows experience TCP timeouts which makes flow throughput unpredictable and unfair. DCTCP, on the other hand, provides very clean predictable throughput without incurring TCP timeouts. Thus, the standard deviation of CUBIC throughput is dramatically higher than the standard deviation of DCTCP throughput. Mean throughput is nearly identical because even though cubic flows suffer TCP timeouts, other flows will step in and fill the unused bandwidth. Note that this test is something of a best case scenario for incast under CUBIC: it allows other flows to fill in for flows experiencing a timeout. Under situations where the receiver is issuing requests and then waiting for all flows to complete, flows cannot fill in for timed out flows and throughput will drop dramatically. 3) Latency (in ms): CUBIC DCTCP Mean 4.0088 0.04219 Median 4.055 0.0395 Max 4.2 0.085 Min 3.32 0.028 Stddev 0.1666 0.01064 Latency for each protocol was computed by running "ping -i 0.2 <receiver>" from a single sender to the receiver during the incast test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure that traffic traversed the DCTCP queue and was not dropped when the queue size was greater than the marking threshold. The summary statistics above are over all ping metrics measured between the single sender, receiver pair. The latency results for this test show a dramatic difference between CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer which incurs the maximum queue latency (more buffer memory will lead to high latency.) DCTCP, on the other hand, deliberately attempts to keep queue occupancy low. The result is a two orders of magnitude reduction of latency with DCTCP - even with a switch with relatively little RAM. Switches with larger amounts of RAM will incur increasing amounts of latency for CUBIC, but not for DCTCP. 4) Convergence and stability test: This test measured the time that DCTCP took to fairly redistribute bandwidth when a new flow commences. It also measured DCTCP's ability to remain stable at a fair bandwidth distribution. DCTCP is compared with CUBIC for this test. At the commencement of this test, a single flow is sending at maximum rate (near 10 Gbps) to a single receiver. One second after that first flow commences, a new flow from a distinct server begins sending to the same receiver as the first flow. After the second flow has sent data for 10 seconds, the second flow is terminated. The first flow sends for an additional second. Ideally, the bandwidth would be evenly shared as soon as the second flow starts, and recover as soon as it stops. The results of this test are shown below. Note that the flow bandwidth for the two flows was measured near the same time, but not simultaneously. DCTCP performs nearly perfectly within the measurement limitations of this test: bandwidth is quickly distributed fairly between the two flows, remains stable throughout the duration of the test, and recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth fairly, and has trouble remaining stable. CUBIC DCTCP Seconds Flow 1 Flow 2 Seconds Flow 1 Flow 2 0 9.93 0 0 9.92 0 0.5 9.87 0 0.5 9.86 0 1 8.73 2.25 1 6.46 4.88 1.5 7.29 2.8 1.5 4.9 4.99 2 6.96 3.1 2 4.92 4.94 2.5 6.67 3.34 2.5 4.93 5 3 6.39 3.57 3 4.92 4.99 3.5 6.24 3.75 3.5 4.94 4.74 4 6 3.94 4 5.34 4.71 4.5 5.88 4.09 4.5 4.99 4.97 5 5.27 4.98 5 4.83 5.01 5.5 4.93 5.04 5.5 4.89 4.99 6 4.9 4.99 6 4.92 5.04 6.5 4.93 5.1 6.5 4.91 4.97 7 4.28 5.8 7 4.97 4.97 7.5 4.62 4.91 7.5 4.99 4.82 8 5.05 4.45 8 5.16 4.76 8.5 5.93 4.09 8.5 4.94 4.98 9 5.73 4.2 9 4.92 5.02 9.5 5.62 4.32 9.5 4.87 5.03 10 6.12 3.2 10 4.91 5.01 10.5 6.91 3.11 10.5 4.87 5.04 11 8.48 0 11 8.49 4.94 11.5 9.87 0 11.5 9.9 0 SYN/ACK ECT test: This test demonstrates the importance of ECT on SYN and SYN-ACK packets by measuring the connection probability in the presence of competing flows for a DCTCP connection attempt without ECT in the SYN packet. The test was repeated five times for each number of competing flows. Competing Flows 1 \| 2 \| 4 \| 8 \| 16 ------------------------------ Mean Connection Probability 1 \| 0.67 \| 0.45 \| 0.28 \| 0 Median Connection Probability 1 \| 0.65 \| 0.45 \| 0.25 \| 0 As the number of competing flows moves beyond 1, the connection probability drops rapidly. Enabling DCTCP with this patch requires the following steps: DCTCP must be running both on the sender and receiver side in your data center, i.e.: sysctl -w net.ipv4.tcp_congestion_control=dctcp Also, ECN functionality must be enabled on all switches in your data center for DCTCP to work. The default ECN marking threshold (K) heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at 1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]). In above tests, for each switch port, traffic was segregated into two queues. For any packet with a DSCP of 0x01 - or equivalently a TOS of 0x04 - the packet was placed into the DCTCP queue. All other packets were placed into the default drop-tail queue. For the DCTCP queue, RED/ECN marking was enabled, here, with a marking threshold of 75 KB. More details however, we refer you to the paper [2] under section 3). There are no code changes required to applications running in user space. DCTCP has been implemented in full isolation of the rest of the TCP code as its own congestion control module, so that it can run without a need to expose code to the core of the TCP stack, and thus nothing changes for non-DCTCP users. Changes in the CA framework code are minimal, and DCTCP algorithm operates on mechanisms that are already available in most Silicon. The gain (dctcp_shift_g) is currently a fixed constant (1/16) from the paper, but we leave the option that it can be chosen carefully to a different value by the user. In case DCTCP is being used and ECN support on peer site is off, DCTCP falls back after 3WHS to operate in normal TCP Reno mode. ss {-4,-6} -t -i diag interface: ... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054 ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584 send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15 reordering:101 rcv_space:29200 ... dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448 cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate 325.5Mbps rcv_rtt:1.5 rcv_space:29200 More information about DCTCP can be found in [1-4]. [1] http://simula.stanford.edu/~alizade/Site/DCTCP.html [2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf [3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf [4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00 Joint work with Florian Westphal and Glenn Judd. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>