linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2015-10-20	stop_machine: Ensure that a queued callback will be called before ↵	Oleg Nesterov
	cpu_stop_park() cpu_stop_queue_work() checks stopper->enabled before it queues the work, but ->enabled == T can only guarantee cpu_stop_signal_done() if we race with cpu_down(). This is not enough for stop_two_cpus() or stop_machine(), they will deadlock if multi_cpu_stop() won't be called by one of the target CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex. But stop_two_cpus() has to check cpu_active() to avoid the same race with hotplug, and this check is very unobvious and probably not even correct if we race with cpu_up(). Change cpu_down() pass to clear ->enabled before cpu_stopper_thread() flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set. Note also that smpboot_thread_call() calls cpu_stop_unpark() which sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until cpu_stopper_thread() is called at least once. This all means that if cpu_stop_queue_work() succeeds, we know that work->fn() will be called. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20151008145131.GA18139@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20	Merge branch 'sched/urgent' into sched/core, to pick up fixes and resolve ↵	Ingo Molnar
	conflicts Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20	Merge tag 'v4.3-rc6' into locking/core, to pick up fixes before applying new ↵	Ingo Molnar
	changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-19	ARM: 8446/1: amba: Remove unused callbacks for legacy system PM	Ulf Hansson
	Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-10-19	Merge branch 'for-mingo' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Miscellaneous fixes. (Paul E. McKenney, Boqun Feng, Oleg Nesterov, Patrick Marlier) - Improvements to expedited grace periods. (Paul E. McKenney) - Performance improvements to and locktorture tests for percpu-rwsem. (Oleg Nesterov, Paul E. McKenney) - Torture-test changes. (Paul E. McKenney, Davidlohr Bueso) - Documentation updates. (Paul E. McKenney) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-18	debugfs: Add debugfs_create_ulong()	Viresh Kumar
	Add debugfs_create_ulong() for the users of type 'unsigned long'. These will be 32 bits long on a 32 bit machine and 64 bits long on a 64 bit machine. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	vme: 8-bit status/id takes 256 values, not 255	Dmitry Kalinkin
	Fixes an off by one array size. Signed-off-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	msm: remove unused header	Gabriel Laskar
	Signed-off-by: Gabriel Laskar <gabriel@lse.epita.fr> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	tty: Abstract tty buffer work	Peter Hurley
	Introduce API functions to restart and cancel tty buffer work, rather than manipulate buffer work directly. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	tty: Combine SIGTTOU/SIGTTIN handling	Peter Hurley
	The job_control() check in n_tty_read() has nearly identical purpose and results as tty_check_change(). Both functions' purpose is to determine if the current task's pgrp is the foreground pgrp for the tty, and if not, to signal the current pgrp. Introduce __tty_check_change() which takes the signal to send and performs the shared operations for job control() and tty_check_change(). Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	dmaengine: hsu: remove platform data	Heikki Krogerus
	There are no platforms where it's not possible to calculate the number of channels based on IO space length, and since that is the only purpose for struct hsu_dma_platform_data, removing it. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	dmaengine: hsu: introduce stubs for the exported functions	Heikki Krogerus
	This allows UART drivers to register HSU DMA Engine without being forced to use ifdefs. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	tty: Remove wait_event_interruptible_tty()	Peter Hurley
	In-tree users of wait_event_interruptible_tty() have been removed; remove. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	tty: r3964: Replace/remove bogus tty lock use	Peter Hurley
	The tty lock is strictly for serializing tty lifetime events (open/close/hangup), and not for line discipline serialization. The tty core already provides serialization of concurrent writes to the same tty, and line discipline lifetime management (by ldisc references), so pinning the tty via tty_lock() is unnecessary and counter-productive; remove tty lock use. However, the line discipline is responsible for serializing reads (if required by the line discipline); add read_lock mutex to serialize calls of r3964_read(). Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	tty: r3964: Use tty->read_wait waitqueue	Peter Hurley
	The tty core provides read_wait waitqueue specifically for line disciplines to wait readers; otherwise, the line discipline may miss wakeups generated by the tty core. NB: The tty core already provides serialization for the line discipline's close() method, and guarantees no readers or writers will be using the closing instance of the line discipline. Completely remove that wakeup. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	tty: Remove tty_port::close_wait	Peter Hurley
	With the removal of tty_wait_until_sent_from_close(), tty drivers no longer wait during open for parallel closes to complete (instead, the tty core waits before calling the driver open() method). Thus, the close_wait waitqueue is no longer used for waiting. Remove struct tty_port::close_wait. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	tty: Remove tty_wait_until_sent_from_close()	Peter Hurley
	tty_wait_until_sent_from_close() drops the tty lock while waiting for the tty driver to finish sending previously accepted data (ie., data remaining in its write buffer and transmit fifo). tty_wait_until_sent_from_close() was added by commit a57a7bf3fc7e ("TTY: define tty_wait_until_sent_from_close") to prevent the entire tty subsystem from being unable to open new ttys while waiting for one tty to close while output drained. However, since commit 0911261d4cb6 ("tty: Don't take tty_mutex for tty count changes"), holding a tty lock while closing does not prevent other ttys from being opened/closed/hung up, but only prevents lifetime event changes for the tty under lock. Holding the tty lock while waiting for output to drain does prevent parallel non-blocking opens (O_NONBLOCK) from advancing or returning while the tty lock is held. However, all parallel opens _already_ block even if the tty lock is dropped while closing and the parallel open advances. Blocking in open has been in mainline since at least 2.6.29 (see tty_port_block_til_ready(); note the test for O_NONBLOCK is _after_ the wait while ASYNC_CLOSING). IOW, before this patch a non-blocking open will sleep anyway for the _entire_ duration of a parallel hardware shutdown, and when it wakes, the error return will cause a release of its tty, and it will restart with a fresh attempt to open. Similarly with a blocking open that is already waiting; when it's woken, the hardware shutdown has already completed to ASYNC_INITIALIZED is not set, which forces a release and restart as well. So, holding the tty lock across the _entire_ close (which is what this patch does), even while waiting for output to drain, is equivalent to the current outcome wrt parallel opens. Cc: Alan Cox <alan@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> CC: Arnd Bergmann <arnd@arndb.de> CC: Karsten Keil <isdn@linux-pingi.de> CC: linuxppc-dev@lists.ozlabs.org Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17	Merge branch 'master' of ↵	Pablo Neira Ayuso
	git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next This merge resolves conflicts with 75aec9df3a78 ("bridge: Remove br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve netns support in the network stack that reached upstream via David's net-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/bridge/br_netfilter_hooks.c
2015-10-16	Merge tag 'extcon-next-for-4.4' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next Chanwoo writes: Update extcon for 4.4 Detailed description for patchset: 1. Update the extcon core: - Modify the unique identification and name of each external connector with the additional prefix to clarify both attribute and meaning of external connector as following: : EXTCON_CHG_* mean the charger connector. : EXTCON_JACK_* mean the jack connector. : EXTCON_DISP_* mean the display port connector. - Keep the standard name of USB charging port by refering to the "Battery Charging v1.2 Spec and Adopters Agreement"[1] to use the standard name of USB charging port as following: : EXTCON_CHG_USB_SDP /* Standard Downstream Port / : EXTCON_CHG_USB_DCP / Dedicated Charging Port / : EXTCON_CHG_USB_CDP / Charging Downstream Port / : EXTCON_CHG_USB_ACA / Accessory Charger Adapter */ [1] www.usb.org/developers/docs/devclass_docs/BCv1.2_070312.zip 2. Update the extcon-arizona.c driver: - Support the WM8998 and WM1814 codec for jack detection. - Support for the ADC mode microphone detection and the general purpose switch for pop suppression. - Fix bug include fixing the headphone detection accuracy at the top end of the range and some corrections around the use of the microphone clamps. 3. Update the extcon-gpio.c driver: - Clean-up the extcon-gpio driver and fix minor issue before supporting the Device tree binding of it. 4. Clean-up and fix the minor issue for extcon drivers: - Export OF module alias information for extcon-rt8973a.c and extcon-sm5502.c. - Fix wrong type of variable of for extcon-rt8973a.c and extcon-sm5502.c. - Use resource managed API for extcon-axp288.c.
2015-10-16	gpiolib: Add and use OF_GPIO_SINGLE_ENDED flag	Laurent Pinchart
	The flag matches the DT GPIO_SINGLE_ENDED flag and allows drivers to parse and use the DT flag to handle single-ended (open-drain or open-source) GPIOs. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-10-16	PCI/ACPI: Add interface acpi_pci_root_create()	Jiang Liu
	Introduce common interface acpi_pci_root_create() and related data structures to create PCI root bus for ACPI PCI host bridges. It will be used to kill duplicated arch specific code for IA64 and x86. It may also help ARM64 in future. Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-16	ACPI/PCI: Enhance ACPI core to support sparse IO space	Jiang Liu
	Enhance ACPI resource parsing interfaces to support sparse IO space, which will be used to share common code between x86 and IA64 later. Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-16	Merge back earlier 'pm-domains' material for v4.4.	Rafael J. Wysocki

2015-10-16	gpiolib: provide generic request/free implementations	Jonas Gorski
	Provide generic request/free implementations that pinctrl aware gpio drivers can use instead of open coding if they use a 1:1 pin to gpio signal mapping. Signed-off-by: Jonas Gorski <jogo@openwrt.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-10-16	clk: Make clk input parameter of __clk_get_name() const	Geert Uytterhoeven
	When calling __clk_get_name() on a const clock: warning: passing argument 1 of '__clk_get_name' discards 'const' qualifier from pointer target type include/linux/clk-provider.h:613:13: note: expected 'struct clk ' but argument is of type 'const struct clk ' __clk_get_name() does not modify the passed clock, hence make it const. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2015-10-16	regulator: introduce min_dropout_uV	Sascha Hauer
	Many voltage Regulators need a input voltage that is higher than the output voltage. Allow to specify a minimum dropout voltage which will be used later to find the best input voltage for regulators. [Changed uv to uV for consistency and legibility -- broonie] Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Mark Brown <broonie@kernel.org>
2015-10-16	netfilter: turn NF_HOOK into an inline function	Arnd Bergmann
	A recent change to the dst_output handling caused a new warning when the call to NF_HOOK() is the only used of a local variable passed as 'dev', and CONFIG_NETFILTER is disabled: net/ipv6/ip6_output.c: In function 'ip6_output': net/ipv6/ip6_output.c:135:21: warning: unused variable 'dev' [-Wunused-variable] The reason for this is that the NF_HOOK macro in this case does not reference the variable at all, and the call to dev_net(dev) got removed from the ip6_output function. To avoid that warning now and in the future, this changes the macro into an equivalent inline function, which tells the compiler that the variable is passed correctly but still unused. The dn_forward function apparently had the same problem in the past and added a local workaround that no longer works with the inline function. In order to avoid a regression, we have to also remove the #ifdef from decnet in the same patch. Fixes: ede2059dbaf9 ("dst: Pass net into dst->output") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16	netfilter: remove hook owner refcounting	Florian Westphal
	since commit 8405a8fff3f8 ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook") all pending queued entries are discarded. So we can simply remove all of the owner handling -- when module is removed it also needs to unregister all its hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16	net: introduce pre-change upper device notifier	Jiri Pirko
	This newly introduced netdevice notifier is called before actual change upper happens. That provides a possibility for notifier handlers to know upper change will happen and react to it, including possibility to forbid the change. That is valuable for drivers which can check if the upper device linkage is supported and forbid that in case it is not. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16	PCI/MSI: Allow the MSI domain to be device-specific	Marc Zyngier
	So far, we've always considered that for a given PCI device, its MSI controller was either set by the architecture-specific pcibios hook, or simply inherited from the host bridge. This doesn't cover things like firmware-defined topologies like msi-map (DT) or IORT (ACPI), which can provide information about which MSI controller to use on a per-device basis. This patch adds the necessary hook into the MSI code to allow this feature, and provides the msi-map functionnality as a first implementation. Acked-by: Rob Herring <robh@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-10-16	of/irq: Use the msi-map property to provide device-specific MSI domain	Marc Zyngier
	While msi-parent is used to point to the MSI controller that works for all the devices behind a root complex, it doesn't allow configurations where each individual device can be routed to a separate MSI controller. The msi-map property provides this flexibility (and much more), so let's add a utility function that parses it, and return the corresponding MSI domain. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-10-16	of/irq: Add support code for multi-parent version of "msi-parent"	Marc Zyngier
	Since 126b16e2ad98 ("Docs: dt: add generic MSI bindings"), the definition of "msi-parent" has evolved, while maintaining some degree of compatibility. It can now express multiple MSI controllers as parents, as well as some sideband data being communicated to the controller. This patch adds the parsing of the property, iterating over the multiple parents until a suitable irqdomain is found. It can also fallback to the original parsing if the old binding is detected. This support code gets used in the subsequent patches. Suggested-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-10-16	PCI/MSI: Add helper function pci_msi_domain_get_msi_rid().	David Daney
	Add pci_msi_domain_get_msi_rid() to return the MSI requester id (RID). Initially needed by gic-v3 based systems. It will be used by follow on patch to drivers/irqchip/irq-gic-v3-its-pci-msi.c Initially supports mapping the RID via OF device tree. In the future, this could be extended to use ACPI _IORT tables as well. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-10-16	of/irq: Add new function of_msi_map_rid()	David Daney
	The device tree property "msi-map" specifies how to create the PCI requester id used in some MSI controllers. Add a new function of_msi_map_rid() that finds the msi-map property and applies its translation to a given requester id. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2015-10-16	extcon: Modify the id and name of external connector	Chanwoo Choi
	This patch modifies the id and name of external connector with the additional prefix to clarify both attribute and meaning of external connector as following: - EXTCON_CHG_* mean the charger connector. - EXTCON_JACK_* mean the jack connector. - EXTCON_DISP_* mean the display port connector. Following table show the new name of external connector with old name: -------------------------------------------------- Old extcon name \| New extcon name \| -------------------------------------------------- EXTCON_TA \| EXTCON_CHG_USB_DCP \| EXTCON_CHARGE_DOWNSTREAM\| EXTCON_CHG_USB_CDP \| EXTCON_FAST_CHARGER \| EXTCON_CHG_USB_FAST \| EXTCON_SLOW_CHARGER \| EXTCON_CHG_USB_SLOW \| -------------------------------------------------- EXTCON_MICROPHONE \| EXTCON_JACK_MICROPHONE \| EXTCON_HEADPHONE \| EXTCON_JACK_HEADPHONE \| EXTCON_LINE_IN \| EXTCON_JACK_LINE_IN \| EXTCON_LINE_OUT \| EXTCON_JACK_LINE_OUT \| EXTCON_VIDEO_IN \| EXTCON_JACK_VIDEO_IN \| EXTCON_VIDEO_OUT \| EXTCON_JACK_VIDEO_OUT \| EXTCON_SPDIF_IN \| EXTCON_JACK_SPDIF_IN \| EXTCON_SPDIF_OUT \| EXTCON_JACK_SPDIF_OUT \| -------------------------------------------------- EXTCON_HMDI \| EXTCON_DISP_HDMI \| EXTCON_MHL \| EXTCON_DISP_MHL \| EXTCON_DVI \| EXTCON_DISP_DVI \| EXTCON_VGA \| EXTCON_DISP_VGA \| -------------------------------------------------- And, when altering the name of USB charger connector, EXTCON refers to the "Battery Charging v1.2 Spec and Adopters Agreement"[1] to use the standard name of USB charging port as following. Following name of USB charging port are already used in power_supply subsystem. We chan check it on patch[2]. - EXTCON_CHG_USB_SDP /* Standard Downstream Port / - EXTCON_CHG_USB_DCP / Dedicated Charging Port / - EXTCON_CHG_USB_CDP / Charging Downstream Port / - EXTCON_CHG_USB_ACA / Accessory Charger Adapter */ [1] www.usb.org/developers/docs/devclass_docs/BCv1.2_070312.zip [2] commit 85efc8a18ced ("power_supply: Add types for USB chargers") Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> [ckeepax: For the Arizona changes] Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Reviewed-by: Roger Quadros <rogerq@ti.com>
2015-10-15	cgroup: add cgroup_subsys->free() method and use it to fix pids controller	Tejun Heo
	pids controller is completely broken in that it uncharges when a task exits allowing zombies to escape resource control. With the recent updates, cgroup core now maintains cgroup association till task free and pids controller can be fixed by uncharging on free instead of exit. This patch adds cgroup_subsys->free() method and update pids controller to use it instead of ->exit() for uncharging. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-10-15	cgroup: keep zombies associated with their original cgroups	Tejun Heo
	cgroup_exit() is called when a task exits and disassociates the exiting task from its cgroups and half-attach it to the root cgroup. This is unnecessary and undesirable. No controller actually needs an exiting task to be disassociated with non-root cgroups. Both cpu and perf_event controllers update the association to the root cgroup from their exit callbacks just to keep consistent with the cgroup core behavior. Also, this disassociation makes it difficult to track resources held by zombies or determine where the zombies came from. Currently, pids controller is completely broken as it uncharges on exit and zombies always escape the resource restriction. With cgroup association being reset on exit, fixing it is pretty painful. There's no reason to reset cgroup membership on exit. The zombie can be removed from its css_set so that it doesn't show up on "cgroup.procs" and thus can't be migrated or interfere with cgroup removal. It can still pin and point to the css_set so that its cgroup membership is maintained. This patch makes cgroup core keep zombies associated with their cgroups at the time of exit. * Previous patches decoupled populated_cnt tracking from css_set lifetime, so a dying task can be simply unlinked from its css_set while pinning and pointing to the css_set. This keeps css_set association from task side alive while hiding it from "cgroup.procs" and populated_cnt tracking. The css_set reference is dropped when the task_struct is freed. * ->exit() callback no longer needs the css arguments as the associated css never changes once PF_EXITING is set. Removed. * cpu and perf_events controllers no longer need ->exit() callbacks. There's no reason to explicitly switch away on exit. The final schedule out is enough. The callbacks are removed. * On traditional hierarchies, nothing changes. "/proc/PID/cgroup" still reports "/" for all zombies. On the default hierarchy, "/proc/PID/cgroup" keeps reporting the cgroup that the task belonged to at the time of exit. If the cgroup gets removed before the task is reaped, " (deleted)" is appended. v2: Build brekage due to missing dummy cgroup_free() when !CONFIG_CGROUP fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
2015-10-15	cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock	Tejun Heo
	css_set_rwsem is the inner lock protecting css_sets and is accessed from hot paths such as fork and exit. Internally, it has no reason to be a rwsem or even mutex. There are no internal blocking operations while holding it. This was rwsem because css task iteration used to expose it to external iterator users. As the previous patch updated css task iteration such that the locking is not leaked to its users, there's no reason to keep it a rwsem. This patch converts css_set_rwsem to a spinlock and rename it to css_set_lock. It uses bh-safe operations as a planned usage needs to access it from RCU callback context. Signed-off-by: Tejun Heo <tj@kernel.org>
2015-10-15	cgroup: don't hold css_set_rwsem across css task iteration	Tejun Heo
	css_sets are synchronized through css_set_rwsem but the locking scheme is kinda bizarre. The hot paths - fork and exit - have to write lock the rwsem making the rw part pointless; furthermore, many readers already hold cgroup_mutex. One of the readers is css task iteration. It read locks the rwsem over the entire duration of iteration. This leads to silly locking behavior. When cpuset tries to migrate processes of a cgroup to a different NUMA node, css_set_rwsem is held across the entire migration attempt which can take a long time locking out forking, exiting and other cgroup operations. This patch updates css task iteration so that it locks css_set_rwsem only while the iterator is being advanced. css task iteration involves two levels - css_set and task iteration. As css_sets in use are practically immutable, simply pinning the current one is enough for resuming iteration afterwards. Task iteration is tricky as tasks may leave their css_set while iteration is in progress. This is solved by keeping track of active iterators and advancing them if their next task leaves its css_set. v2: put_task_struct() in css_task_iter_next() moved outside css_set_rwsem. A later patch will add cgroup operations to task_struct free path which may grab the same lock and this avoids deadlock possibilities. css_set_move_task() updated to use list_for_each_entry_safe() when walking task_iters and advancing them. This is necessary as advancing an iter may remove it from the list. Signed-off-by: Tejun Heo <tj@kernel.org>
2015-10-15	cgroup: replace cgroup_has_tasks() with cgroup_is_populated()	Tejun Heo
	Currently, cgroup_has_tasks() tests whether the target cgroup has any css_set linked to it. This works because a css_set's refcnt converges with the number of tasks linked to it and thus there's no css_set linked to a cgroup if it doesn't have any live tasks. To help tracking resource usage of zombie tasks, putting the ref of css_set will be separated from disassociating the task from the css_set which means that a cgroup may have css_sets linked to it even when it doesn't have any live tasks. This patch replaces cgroup_has_tasks() with cgroup_is_populated() which tests cgroup->nr_populated instead which locally counts the number of populated css_sets. Unlike cgroup_has_tasks(), cgroup_is_populated() is recursive - if any of the descendants is populated, the cgroup is populated too. While this changes the meaning of the test, all the existing users are okay with the change. While at it, replace the open-coded ->populated_cnt test in cgroup_events_show() with cgroup_is_populated(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org>
2015-10-15	cgroup: make cgroup->nr_populated count the number of populated css_sets	Tejun Heo
	Currently, cgroup->nr_populated counts whether the cgroup has any css_sets linked to it and the number of children which has non-zero ->nr_populated. This works because a css_set's refcnt converges with the number of tasks linked to it and thus there's no css_set linked to a cgroup if it doesn't have any live tasks. To help tracking resource usage of zombie tasks, putting the ref of css_set will be separated from disassociating the task from the css_set which means that a cgroup may have css_sets linked to it even when it doesn't have any live tasks. This patch updates cgroup->nr_populated so that for the cgroup itself it counts the number of css_sets which have tasks associated with them so that empty css_sets don't skew the populated test. Signed-off-by: Tejun Heo <tj@kernel.org>
2015-10-15	block: don't release bdi while request_queue has live references	Tejun Heo
	bdi's are initialized in two steps, bdi_init() and bdi_register(), but destroyed in a single step by bdi_destroy() which, for a bdi embedded in a request_queue, is called during blk_cleanup_queue() which makes the queue invisible and starts the draining of remaining usages. A request_queue's user can access the congestion state of the embedded bdi as long as it holds a reference to the queue. As such, it may access the congested state of a queue which finished blk_cleanup_queue() but hasn't reached blk_release_queue() yet. Because the congested state was embedded in backing_dev_info which in turn is embedded in request_queue, accessing the congested state after bdi_destroy() was called was fine. The bdi was destroyed but the memory region for the congested state remained accessible till the queue got released. a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") changed the situation. Now, the root congested state which is expected to be pinned while request_queue remains accessible is separately reference counted and the base ref is put during bdi_destroy(). This means that the root congested state may go away prematurely while the queue is between bdi_dstroy() and blk_cleanup_queue(), which was detected by Andrey's KASAN tests. The root cause of this problem is that bdi doesn't distinguish the two steps of destruction, unregistration and release, and now the root congested state actually requires a separate release step. To fix the issue, this patch separates out bdi_unregister() and bdi_exit() from bdi_destroy(). bdi_unregister() is called from blk_cleanup_queue() and bdi_exit() from blk_release_queue(). bdi_destroy() is now just a simple wrapper calling the two steps back-to-back. While at it, the prototype of bdi_destroy() is moved right below bdi_setup_and_register() so that the counterpart operations are located together. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback") Cc: stable@vger.kernel.org # v4.2+ Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com> Link: http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kCY5GXJBwGfQ@mail.gmail.com Reviewed-by: Jan Kara <jack@suse.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-15	posix_cpu_timer: Reduce unnecessary sighand lock contention	Jason Low
	It was found while running a database workload on large systems that significant time was spent trying to acquire the sighand lock. The issue was that whenever an itimer expired, many threads ended up simultaneously trying to send the signal. Most of the time, nothing happened after acquiring the sighand lock because another thread had just already sent the signal and updated the "next expire" time. The fastpath_timer_check() didn't help much since the "next expire" time was updated after the threads exit fastpath_timer_check(). This patch addresses this by having the thread_group_cputimer structure maintain a boolean to signify when a thread in the group is already checking for process wide timers, and adds extra logic in the fastpath to check the boolean. Signed-off-by: Jason Low <jason.low2@hp.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: hideaki.kimura@hpe.com Cc: terry.rudd@hpe.com Cc: scott.norton@hpe.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1444849677-29330-5-git-send-email-jason.low2@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-15	posix_cpu_timer: Convert cputimer->running to bool	Jason Low
	In the next patch in this series, a new field 'checking_timer' will be added to 'struct thread_group_cputimer'. Both this and the existing 'running' integer field are just used as boolean values. To save space in the structure, we can make both of these fields booleans. This is a preparatory patch to convert the existing running integer field to a boolean. Suggested-by: George Spelvin <linux@horizon.com> Signed-off-by: Jason Low <jason.low2@hp.com> Reviewed: George Spelvin <linux@horizon.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: hideaki.kimura@hpe.com Cc: terry.rudd@hpe.com Cc: scott.norton@hpe.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1444849677-29330-4-git-send-email-jason.low2@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-14	net/mlx4_core: Replace VF zero mac with random mac in mlx4_core	Jack Morgenstein
	By design, when no default MAC addresses are set in the Hypervisor for VFs, the VFs are passed zero-macs. When such a MAC is received by the VF, it generates a random MAC address and registers that MAC address with the Hypervisor. This random mac generation is currently done in the mlx4_en module. There is a problem, though, if the mlx4_ib module is loaded by a VF before the mlx4_en module. In this case, for RoCE, mlx4_ib will see the un-replaced zero-mac and register that zero-mac as part of QP1 initialization. Having a zero-mac in the port's MAC table creates problems for a Baseboard Management Console. The BMC occasionally sends packets with a zero-mac destination MAC. If there is a zero-mac present in the port's MAC table, the FW will send such BMC packets to the host driver rather than to the wire, and BMC will stop working. To address this problem, we move the replacement of zero-mac addresses with random-mac addresses to procedure mlx4_slave_cap(), which is part of the driver startup for VFs, and is before activation of mlx4_ib and mlx4_en. As a result, zero-mac addresses will never be registered in the port MAC table by the driver. In addition, when mlx4_en does initialize the net device, it needs to set the NET_ADDR_RANDOM flag in the netdev structure if the address was randomly generated. This is done so that udev on the VM does not create a new device name after each VF probe (VM boot and such). To accomplish this, we add a per-port flag in mlx4_dev which gets set whenever mlx4_core replaces a zero-mac with a randomly-generated mac. This flag is examined when mlx4_en initializes the net-device. Fix was suggested by Matan Barak <matanb@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14	net/mlx5_core: Wait for FW readiness on startup	Eli Cohen
	On device initialization, wait till firmware indicates that that it is done with initialization before proceeding to initialize the device. Also update initialization segment layout to match driver/firmware interface definitions. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14	net/mlx5_core: Add pci error handlers to mlx5_core driver	Majd Dibbiny
	This patch implement the pci_error_handlers for mlx5_core which allow the driver to recover from PCI error. Once an error is detected in the PCI, the mlx5_pci_err_detected is called and it: 1) Marks the device to be in 'Internal Error' state. 2) Dispatches an event to the mlx5_ib to flush all the outstanding cqes with error. 3) Returns all the on going commands with error. 4) Unloads the driver. Afterwards, the FW is reset and mlx5_pci_slot_reset is called and it enables the device and restore it's pci state. If the later succeeds, mlx5_pci_resume is called, and it loads the SW stack. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14	net/mlx5_core: Fix internal error detection conditions	Eli Cohen
	The detection of a fatal condition has been updated to take into account the state reported by the device or by detecting an all ones read of the firmware version which indicates that the device is not accessible. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15	ACPI / tables: Add acpi_subtable_proc to ACPI table parsers	Lukasz Anaczkowski
	ACPI subtable parsing needs to be extended to allow two or more handlers to be run in the same ACPI table walk, thus adding acpi_subtable_proc structure which stores () ACPI table id () handler that processes table () counter how many items has been processed and passing it to acpi_parse_entries_array() and acpi_table_parse_entries_array(). This is needed to fix CPU enumeration when APIC/X2APIC entries are interleaved. Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-14	Merge tag 'efi-next' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi Pull v4.4 EFI updates from Matt Fleming: - Make the EFI System Resource Table (ESRT) driver explicitly non-modular by ripping out the module_* code since Kconfig doesn't allow it to be built as a module anyway. (Paul Gortmaker) - Make the x86 efi=debug kernel parameter, which enables EFI debug code and output, generic and usable by arm64. (Leif Lindholm) - Add support to the x86 EFI boot stub for 64-bit Graphics Output Protocol frame buffer addresses. (Matt Fleming) - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled in the firmware and set an efi.flags bit so the kernel knows when it can apply more strict runtime mapping attributes - Ard Biesheuvel - Auto-load the efi-pstore module on EFI systems, just like we currently do for the efivars module. (Ben Hutchings) - Add "efi_fake_mem" kernel parameter which allows the system's EFI memory map to be updated with additional attributes for specific memory ranges. This is useful for testing the kernel code that handles the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware doesn't include support. (Taku Izumi) Note: there is a semantic conflict between the following two commits: 8a53554e12e9 ("x86/efi: Fix multiple GOP device support") ae2ee627dc87 ("efifb: Add support for 64-bit frame buffer addresses") I fixed up the interaction in the merge commit, changing the type of current_fb_base from u32 to u64. Signed-off-by: Ingo Molnar <mingo@kernel.org>