linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2014-01-13	mtd: nand: add ONFI vendor block for Micron	Brian Norris
	Signed-off-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Huang Shijie <b32955@freescale.com>
2014-01-13	audit: correct a type mismatch in audit_syscall_exit()	AKASHI Takahiro
	audit_syscall_exit() saves a result of regs_return_value() in intermediate "int" variable and passes it to __audit_syscall_exit(), which expects its second argument as a "long" value. This will result in truncating the value returned by a system call and making a wrong audit record. I don't know why gcc compiler doesn't complain about this, but anyway it causes a problem at runtime on arm64 (and probably most 64-bit archs). Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13	audit: convert all sessionid declaration to unsigned int	Eric Paris
	Right now the sessionid value in the kernel is a combination of u32, int, and unsigned int. Just use unsigned int throughout. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13	audit: refactor audit_receive_msg() to clarify AUDIT__RULE cases	Richard Guy Briggs
	audit_receive_msg() needlessly contained a fallthrough case that called audit_receive_filter(), containing no common code between the cases. Separate them to make the logic clearer. Refactor AUDIT_LIST_RULES, AUDIT_ADD_RULE, AUDIT_DEL_RULE cases to create audit_rule_change(), audit_list_rules_send() functions. This should not functionally change the logic. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13	audit: fix incorrect type of sessionid	Richard Guy Briggs
	The type of task->sessionid is unsigned int, the return type of audit_get_sessionid should be consistent with it. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13	audit: fix netlink portid naming and types	Richard Guy Briggs
	Normally, netlink ports use the PID of the userspace process as the port ID. If the PID is already in use by a port, the kernel will allocate another port ID to avoid conflict. Re-name all references to netlink ports from pid to portid to reflect this reality and avoid confusion with actual PIDs. Ports use the __u32 type, so re-type all portids accordingly. (This patch is very similar to ebiederman's 5deadd69) Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13	audit: Simplify and correct audit_log_capset	Eric W. Biederman
	- Always report the current process as capset now always only works on the current process. This prevents reporting 0 or a random pid in a random pid namespace. - Don't bother to pass the pid as is available. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit bcc85f0af31af123e32858069eb2ad8f39f90e67) (cherry picked from commit f911cac4556a7a23e0b3ea850233d13b32328692) Signed-off-by: Richard Guy Briggs <rgb@redhat.com> [eparis: fix build error when audit disabled] Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13	PCI: Add global pci_lock_rescan_remove()	Rafael J. Wysocki
	There are multiple PCI device addition and removal code paths that may be run concurrently with the generic PCI bus rescan and device removal that can be triggered via sysfs. If that happens, it may lead to multiple different, potentially dangerous race conditions. The most straightforward way to address those problems is to run the code in question under the same lock that is used by the generic rescan/remove code in pci-sysfs.c. To prepare for those changes, move the definition of the global PCI remove/rescan lock to probe.c and provide global wrappers, pci_lock_rescan_remove() and pci_unlock_rescan_remove(), allowing drivers to manipulate that lock. Also provide pci_stop_and_remove_bus_device_locked() for the callers of pci_stop_and_remove_bus_device() who only need to hold the rescan/remove lock around it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-13	PCI: Cleanup pci.h whitespace	Bjorn Helgaas
	Put empty or trivial inline stub functions on one line when they fit. No functional change. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-13	PCI: Reorder so actual code comes before stubs	Bjorn Helgaas
	Consistently use the: #ifdef CONFIG_PCI_FOO int pci_foo(...); #else static inline int pci_foo(...) { return -1; } #endif pattern, instead of sometimes using "#ifndef CONFIG_PCI_FOO". No functional change. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-13	usb: chipidea: add freescale imx28 special write register method	Peter Chen
	According to Freescale imx28 Errata, "ENGR119653 USB: ARM to USB register error issue", All USB register write operations must use the ARM SWP instruction. So, we implement special hw_write and hw_test_and_clear for imx28. Discussion for it at below: http://marc.info/?l=linux-usb&m=137996395529294&w=2 This patch is needed for stable tree 3.11+. Cc: stable@vger.kernel.org Cc: robert.hodaszi@digi.com Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Merge branch 'pci/dead-code' into next	Bjorn Helgaas
	* pci/dead-code: PCI: Make local functions static PCI: Remove unused alloc_pci_dev() PCI: Remove unused pci_renumber_slot() PCI: Remove unused pcie_aspm_enabled() PCI: Remove unused pci_vpd_truncate() PCI: Remove unused ID-Based Ordering support PCI: Remove unused Optimized Buffer Flush/Fill support PCI: Remove unused Latency Tolerance Reporting support PCI: Removed unused parts of Page Request Interface support Conflicts: drivers/pci/pci.c include/linux/pci.h
2014-01-13	Revert "kernfs: replace kernfs_node->u.completion with ↵	Greg Kroah-Hartman
	kernfs_root->deactivate_waitq" This reverts commit ea1c472dfeada211a0100daa7976e8e8e779b858. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()"	Greg Kroah-Hartman
	This reverts commit a69d001cfc712b96ec9d7ba44d6285702a38dabf. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Revert "kernfs: remove KERNFS_REMOVED"	Greg Kroah-Hartman
	This reverts commit ae34372eb8408b3d07e870f1939f99007a730d28. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Revert "kernfs: restructure removal path to fix possible premature return"	Greg Kroah-Hartman
	This reverts commit 45a140e587f3d32d8d424ed940dffb61e1739047. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	phylib: Add of_phy_attach	Andy Fleming
	10G PHYs don't currently support running the state machine, which is implicitly setup via of_phy_connect(). Therefore, it is necessary to implement an OF version of phy_attach(), which does everything except start the state machine. Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13	phylib: Support attaching to generic 10g driver	Andy Fleming
	phy_attach_direct() may now attach to a generic 10G driver. It can also be used exactly as phy_connect_direct(), which will be useful when using of_mdio, as phy_connect (and therefore of_phy_connect) start the PHY state machine, which is currently irrelevant for 10G PHYs. Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13	phylib: introduce PHY_INTERFACE_MODE_XGMII for 10G PHY	Andy Fleming
	Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13	phylib: Add Clause 45 read/write functions	Andy Fleming
	Need an extra parameter to read or write Clause 45 PHYs, so need a different API with the extra parameter. Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13	Revert "kernfs: remove kernfs_addrm_cxt"	Greg Kroah-Hartman
	This reverts commit 99177a34110889a8f2c36420c34e3bcc9bfd8a70. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Revert "kernfs: implement kernfs_{de\|re}activate[_self]()"	Greg Kroah-Hartman
	This reverts commit 9f010c2ad5194a4b682e747984477850fabd03be. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its ↵	Greg Kroah-Hartman
	wrappers" This reverts commit 1ae06819c77cff1ea2833c94f8c093fe8a5c79db. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Revert "sysfs, driver-core: remove unused ↵	Greg Kroah-Hartman
	{sysfs\|device}_schedule_callback_owner()" This reverts commit d1ba277e79889085a2faec3b68b91ce89c63f888. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-13	Merge branch 'for-john' of ↵	John W. Linville
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2014-01-13	PCI: Make local functions static	Stephen Hemminger
	Using 'make namespacecheck' identify code which should be declared static. Checked for users in other driver/archs as well. Compile tested only. This stops exporting the following interfaces to modules: pci_target_state() pci_load_saved_state() [bhelgaas: retained pci_find_next_ext_capability() and pci_cfg_space_size()] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-13	PCI: Remove unused alloc_pci_dev()	Stephen Hemminger
	My philosophy is unused code is dead code. And dead code is subject to bit rot and is a likely source of bugs. Use it or lose it. This removes this unused and deprecated interface: alloc_pci_dev() [bhelgaas: split to separate patch] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-13	PCI: Remove unused pci_renumber_slot()	Stephen Hemminger
	My philosophy is unused code is dead code. And dead code is subject to bit rot and is a likely source of bugs. Use it or lose it. This reverts part of f46753c5e354 ("PCI: introduce pci_slot") and d25b7c8d6ba2 ("PCI: rename pci_update_slot_number to pci_renumber_slot"), removing this interface: pci_renumber_slot() [bhelgaas: split to separate patch, add historical link from Alex] Link: http://lkml.kernel.org/r/20081009043140.8678.44164.stgit@bob.kio Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Chiang <achiang@canonical.com>
2014-01-13	PCI: Remove unused pcie_aspm_enabled()	Stephen Hemminger
	My philosophy is unused code is dead code. And dead code is subject to bit rot and is a likely source of bugs. Use it or lose it. This reverts part of 3e1b16002af2 ("ACPI/PCI: PCIe ASPM _OSC support capabilities called when root bridge added"), removing this interface: pcie_aspm_enabled() [bhelgaas: split to separate patch] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Andrew Patterson <andrew.patterson@hp.com>
2014-01-13	PCI: Remove unused pci_vpd_truncate()	Stephen Hemminger
	My philosophy is unused code is dead code. And dead code is subject to bit rot and is a likely source of bugs. Use it or lose it. This reverts db5679437a2b ("PCI: add interface to set visible size of VPD"), removing this interface: pci_vpd_truncate() [bhelgaas: split to separate patch, also remove prototype from pci.h] Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-13	mmc: sdhci: add quirk for broken HS200 support	David Cohen
	This patch defines a quirk for platforms unable to enable HS200 support. Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Reviewed-by: Chuanxiao Dong <chuanxiao.dong@intel.com> Acked-by: Dong Aisheng <b29396@freescale.com> Cc: stable <stable@vger.kernel.org> # [3.13] Signed-off-by: Chris Ball <chris@printf.net>
2014-01-13	mmc: SDHI: add SoC specific workaround via HW version	Kuninori Morimoto
	One of Renesas SDHI chip needs workaround to use it, and, we can judge it based on chip version. This patch adds very quick-hack workaround method, since we still don't know how many chips need workaround in the future. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2014-01-13	mmc: tmio: add new TMIO_MMC_HAVE_HIGH_REG flags	Kuninori Morimoto
	The accessibility checking method to the higher register was added by 69d1fe18e92afb (mmc: tmio: only access registers above 0xff, if available) But, it doesn't care 32bit register. It is impossible to calculate it from the resource size, since there is 16/32 bit register IP (e.g. VERSION is located on 0xe2 if 16bit register, but it is located on 0x1c4 if 32bit register). This patch adds new TMIO_MMC_HAVE_HIGH_REG flags, tmio_mmc driver has it, and sh_mobile_sdhi doesn't have it today. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2014-01-13	mmc: tmio: bus_shift become tmio_mmc_data member	Kuninori Morimoto
	.bus_shift is used to 16/32bit register access offset calculation on tmio driver. tmio_mmc_xxx is used from Toshiba/Renesas now, but this bus_shift value depends on HW IP. This patch moves .bus_shift to tmio_mmc_data member and sets it on each driver. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2014-01-13	sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding	Peter Zijlstra
	With various drivers wanting to inject idle time; we get people calling idle routines outside of the idle loop proper. Therefore we need to be extra careful about not missing TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations. While looking at this, I also realized there's a small window in the existing idle loop where we can miss TIF_NEED_RESCHED; when it hits right after the tif_need_resched() test at the end of the loop but right before the need_resched() test at the start of the loop. So move preempt_fold_need_resched() out of the loop where we're guaranteed to have TIF_NEED_RESCHED set. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/preempt, locking: Rework local_bh_{dis,en}able()	Peter Zijlstra
	Currently local_bh_disable() is out-of-line for no apparent reason. So inline it to save a few cycles on call/return nonsense, the function body is a single add on x86 (a few loads and store extra on load/store archs). Also expose two new local_bh functions: __local_bh_{dis,en}able_ip(unsigned long ip, unsigned int cnt); Which implement the actual local_bh_{dis,en}able() behaviour. The next patch uses the exposed @cnt argument to optimize bh lock functions. With build fixes from Jacob Pan. Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	cgroup: remove stray references to css_id	Hugh Dickins
	Trivial: remove the few stray references to css_id, which itself was removed in v3.13's 2ff2a7d03bbe "cgroup: kill css_id". Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-01-13	sched/clock, x86: Use a static_key for sched_clock_stable	Peter Zijlstra
	In order to avoid the runtime condition and variable load turn sched_clock_stable into a static_key. Also provide a shorter implementation of local_clock() and cpu_clock(int) when sched_clock_stable==1. MAINLINE PRE POST sched_clock_stable: 1 1 1 (cold) sched_clock: 329841 221876 215295 (cold) local_clock: 301773 234692 220773 (warm) sched_clock: 38375 25602 25659 (warm) local_clock: 100371 33265 27242 (warm) rdtsc: 27340 24214 24208 sched_clock_stable: 0 0 0 (cold) sched_clock: 382634 235941 237019 (cold) local_clock: 396890 297017 294819 (warm) sched_clock: 38194 25233 25609 (warm) local_clock: 143452 71234 71232 (warm) rdtsc: 27345 24245 24243 Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/preempt: Take away preempt_enable_no_resched() from modules	Peter Zijlstra
	Discourage drivers/modules to be creative with preemption. Sadly all is implemented in macros and inline so if they want to do evil they still can, but at least try and discourage some. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: rjw@rjwysocki.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-fn7h6vu8wtgxk0ih402qcijx@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	locking: Optimize lock_bh functions	Peter Zijlstra
	Currently all _bh_ lock functions do two preempt_count operations: local_bh_disable(); preempt_disable(); and for the unlock: preempt_enable_no_resched(); local_bh_enable(); Since its a waste of perfectly good cycles to modify the same variable twice when you can do it in one go; use the new __local_bh_{dis,en}able_ip() functions that allow us to provide a preempt_count value to add/sub. So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to add/sub to be done in one go. As a bonus it gets rid of the preempt_enable_no_resched() usage. This reduces a 1000 loops of: spin_lock_bh(&bh_lock); spin_unlock_bh(&bh_lock); from 53596 cycles to 51995 cycles. I didn't do enough measurements to say for absolute sure that the result is significant but the the few runs I did for each suggest it is so. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: rjw@rjwysocki.net Cc: rui.zhang@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/deadline: Remove the sysctl_sched_dl knobs	Peter Zijlstra
	Remove the deadline specific sysctls for now. The problem with them is that the interaction with the exisiting rt knobs is nearly impossible to get right. The current (as per before this patch) situation is that the rt and dl bandwidth is completely separate and we enforce rt+dl < 100%. This is undesirable because this means that the rt default of 95% leaves us hardly any room, even though dl tasks are saver than rt tasks. Another proposed solution was (a discarted patch) to have the dl bandwidth be a fraction of the rt bandwidth. This is highly confusing imo. Furthermore neither proposal is consistent with the situation we actually want; which is rt tasks ran from a dl server. In which case the rt bandwidth is a direct subset of dl. So whichever way we go, the introduction of dl controls at this point is painful. Therefore remove them and instead share the rt budget. This means that for now the rt knobs are used for dl admission control and the dl runtime is accounted against the rt runtime. I realise that this isn't entirely desirable either; but whatever we do we appear to need to change the interface later, so better have a small interface for now. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks	Dario Faggioli
	In order of deadline scheduling to be effective and useful, it is important that some method of having the allocation of the available CPU bandwidth to tasks and task groups under control. This is usually called "admission control" and if it is not performed at all, no guarantee can be given on the actual scheduling of the -deadline tasks. Since when RT-throttling has been introduced each task group have a bandwidth associated to itself, calculated as a certain amount of runtime over a period. Moreover, to make it possible to manipulate such bandwidth, readable/writable controls have been added to both procfs (for system wide settings) and cgroupfs (for per-group settings). Therefore, the same interface is being used for controlling the bandwidth distrubution to -deadline tasks and task groups, i.e., new controls but with similar names, equivalent meaning and with the same usage paradigm are added. However, more discussion is needed in order to figure out how we want to manage SCHED_DEADLINE bandwidth at the task group level. Therefore, this patch adds a less sophisticated, but actually very sensible, mechanism to ensure that a certain utilization cap is not overcome per each root_domain (the single rq for !SMP configurations). Another main difference between deadline bandwidth management and RT-throttling is that -deadline tasks have bandwidth on their own (while -rt ones doesn't!), and thus we don't need an higher level throttling mechanism to enforce the desired bandwidth. This patch, therefore: - adds system wide deadline bandwidth management by means of: * /proc/sys/kernel/sched_dl_runtime_us, * /proc/sys/kernel/sched_dl_period_us, that determine (i.e., runtime / period) the total bandwidth available on each CPU of each root_domain for -deadline tasks; - couples the RT and deadline bandwidth management, i.e., enforces that the sum of how much bandwidth is being devoted to -rt -deadline tasks to stay below 100%. This means that, for a root_domain comprising M CPUs, -deadline tasks can be created until the sum of their bandwidths stay below: M * (sched_dl_runtime_us / sched_dl_period_us) It is also possible to disable this bandwidth management logic, and be thus free of oversubscribing the system up to any arbitrary level. Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/deadline: Add SCHED_DEADLINE inheritance logic	Dario Faggioli
	Some method to deal with rt-mutexes and make sched_dl interact with the current PI-coded is needed, raising all but trivial issues, that needs (according to us) to be solved with some restructuring of the pi-code (i.e., going toward a proxy execution-ish implementation). This is under development, in the meanwhile, as a temporary solution, what this commits does is: - ensure a pi-lock owner with waiters is never throttled down. Instead, when it runs out of runtime, it immediately gets replenished and it's deadline is postponed; - the scheduling parameters (relative deadline and default runtime) used for that replenishments --during the whole period it holds the pi-lock-- are the ones of the waiting task with earliest deadline. Acting this way, we provide some kind of boosting to the lock-owner, still by using the existing (actually, slightly modified by the previous commit) pi-architecture. We would stress the fact that this is only a surely needed, all but clean solution to the problem. In the end it's only a way to re-start discussion within the community. So, as always, comments, ideas, rants, etc.. are welcome! :-) Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> [ Added !RT_MUTEXES build fix. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	rtmutex: Turn the plist into an rb-tree	Peter Zijlstra
	Turn the pi-chains from plist to rb-tree, in the rt_mutex code, and provide a proper comparison function for -deadline and -priority tasks. This is done mainly because: - classical prio field of the plist is just an int, which might not be enough for representing a deadline; - manipulating such a list would become O(nr_deadline_tasks), which might be to much, as the number of -deadline task increases. Therefore, an rb-tree is used, and tasks are queued in it according to the following logic: - among two -priority (i.e., SCHED_BATCH/OTHER/RR/FIFO) tasks, the one with the higher (lower, actually!) prio wins; - among a -priority and a -deadline task, the latter always wins; - among two -deadline tasks, the one with the earliest deadline wins. Queueing and dequeueing functions are changed accordingly, for both the list of a task's pi-waiters and the list of tasks blocked on a pi-lock. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-again-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-10-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/deadline: Add period support for SCHED_DEADLINE tasks	Harald Gustafsson
	Make it possible to specify a period (different or equal than deadline) for -deadline tasks. Relative deadlines (D_i) are used on task arrivals to generate new scheduling (absolute) deadlines as "d = t + D_i", and periods (P_i) to postpone the scheduling deadlines as "d = d + P_i" when the budget is zero. This is in general useful to model (and schedule) tasks that have slow activation rates (long periods), but have to be scheduled soon once activated (short deadlines). Signed-off-by: Harald Gustafsson <harald.gustafsson@ericsson.com> Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-7-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic	Juri Lelli
	Introduces data structures relevant for implementing dynamic migration of -deadline tasks and the logic for checking if runqueues are overloaded with -deadline tasks and for choosing where a task should migrate, when it is the case. Adds also dynamic migrations to SCHED_DEADLINE, so that tasks can be moved among CPUs when necessary. It is also possible to bind a task to a (set of) CPU(s), thus restricting its capability of migrating, or forbidding migrations at all. The very same approach used in sched_rt is utilised: - -deadline tasks are kept into CPU-specific runqueues, - -deadline tasks are migrated among runqueues to achieve the following: * on an M-CPU system the M earliest deadline ready tasks are always running; * affinity/cpusets settings of all the -deadline tasks is always respected. Therefore, this very special form of "load balancing" is done with an active method, i.e., the scheduler pushes or pulls tasks between runqueues when they are woken up and/or (de)scheduled. IOW, every time a preemption occurs, the descheduled task might be sent to some other CPU (depending on its deadline) to continue executing (push). On the other hand, every time a CPU becomes idle, it might pull the second earliest deadline ready task from some other CPU. To enforce this, a pull operation is always attempted before taking any scheduling decision (pre_schedule()), as well as a push one after each scheduling decision (post_schedule()). In addition, when a task arrives or wakes up, the best CPU where to resume it is selected taking into account its affinity mask, the system topology, but also its deadline. E.g., from the scheduling point of view, the best CPU where to wake up (and also where to push) a task is the one which is running the task with the latest deadline among the M executing ones. In order to facilitate these decisions, per-runqueue "caching" of the deadlines of the currently running and of the first ready task is used. Queued but not running tasks are also parked in another rb-tree to speed-up pushes. Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-5-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched/deadline: Add SCHED_DEADLINE structures & implementation	Dario Faggioli
	Introduces the data structures, constants and symbols needed for SCHED_DEADLINE implementation. Core data structure of SCHED_DEADLINE are defined, along with their initializers. Hooks for checking if a task belong to the new policy are also added where they are needed. Adds a scheduling class, in sched/dl.c and a new policy called SCHED_DEADLINE. It is an implementation of the Earliest Deadline First (EDF) scheduling algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS) that makes it possible to isolate the behaviour of tasks between each other. The typical -deadline task will be made up of a computation phase (instance) which is activated on a periodic or sporadic fashion. The expected (maximum) duration of such computation is called the task's runtime; the time interval by which each instance need to be completed is called the task's relative deadline. The task's absolute deadline is dynamically calculated as the time instant a task (better, an instance) activates plus the relative deadline. The EDF algorithms selects the task with the smallest absolute deadline as the one to be executed first, while the CBS ensures each task to run for at most its runtime every (relative) deadline length time interval, avoiding any interference between different tasks (bandwidth isolation). Thanks to this feature, also tasks that do not strictly comply with the computational model sketched above can effectively use the new policy. To summarize, this patch: - introduces the data structures, constants and symbols needed; - implements the core logic of the scheduling algorithm in the new scheduling class file; - provides all the glue code between the new scheduling class and the core scheduler and refines the interactions between sched/dl and the other existing scheduling classes. Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com> Signed-off-by: Fabio Checconi <fchecconi@gmail.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	sched: Add new scheduler syscalls to support an extended scheduling ↵	Dario Faggioli
	parameters ABI Add the syscalls needed for supporting scheduling algorithms with extended scheduling parameters (e.g., SCHED_DEADLINE). In general, it makes possible to specify a periodic/sporadic task, that executes for a given amount of runtime at each instance, and is scheduled according to the urgency of their own timing constraints, i.e.: - a (maximum/typical) instance execution time, - a minimum interval between consecutive instances, - a time constraint by which each instance must be completed. Thus, both the data structure that holds the scheduling parameters of the tasks and the system calls dealing with it must be extended. Unfortunately, modifying the existing struct sched_param would break the ABI and result in potentially serious compatibility issues with legacy binaries. For these reasons, this patch: - defines the new struct sched_attr, containing all the fields that are necessary for specifying a task in the computational model described above; - defines and implements the new scheduling related syscalls that manipulate it, i.e., sched_setattr() and sched_getattr(). Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a proof of concept and for developing and testing purposes. Making them available on other architectures is straightforward. Since no "user" for these new parameters is introduced in this patch, the implementation of the new system calls is just identical to their already existing counterpart. Future patches that implement scheduling policies able to exploit the new data structure must also take care of modifying the sched_*attr() calls accordingly with their own purposes. Signed-off-by: Dario Faggioli <raistlin@linux.it> [ Rewrote to use sched_attr. ] Signed-off-by: Juri Lelli <juri.lelli@gmail.com> [ Removed sched_setscheduler2() for now. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	Merge branch 'sched/urgent' into sched/core	Ingo Molnar
	Pick up the latest fixes before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13	spi: Kill superfluous cast in spi_w8r16()	Geert Uytterhoeven
	spi_write_then_read() takes a "void " for rxbuf, so there's no need to cast the buffer pointer to "u8 ". Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Signed-off-by: Mark Brown <broonie@linaro.org>