summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2013-06-17n_tty: Encapsulate minimum_to_wake within N_TTYPeter Hurley
minimum_to_wake is unique to N_TTY processing, and belongs in per-ldisc data. Add the ldisc method, ldisc_ops::fasync(), to notify line disciplines when signal-driven I/O is enabled or disabled. When enabled for N_TTY (by fcntl(F_SETFL, O_ASYNC)), blocking reader/polls will be woken for any readable input. When disabled, blocking reader/polls are not woken until the read buffer is full. Canonical mode (L_ICANON(tty), n_tty_data::icanon) is not affected by the minimum_to_wake setting. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17Merge 3.10-rc6 into usb-nextGreg Kroah-Hartman
We want the fixes in this branch as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17Merge 3.10-rc6 into tty-nextGreg Kroah-Hartman
We want the changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17Merge 3.10-rc6 into staging-nextGreg Kroah-Hartman
We want the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17Merge 3.10-rc6 into char-misc-nextGreg Kroah-Hartman
We want the fixes in here.
2013-06-17ssb: add struct for serial flashRafał Miłecki
This data allow writing for example MTD driver. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-06-17Merge remote-tracking branch 'asoc/topic/wm8994' into asoc-nextMark Brown
2013-06-17Merge remote-tracking branch 'asoc/topic/ssm2518' into asoc-nextMark Brown
2013-06-17Merge remote-tracking branch 'asoc/topic/arizona' into asoc-nextMark Brown
2013-06-17pinctrl: establish pull-up/pull-down terminologyLinus Walleij
It is counter-intuitive to have "0" mean disable in a boolean manner for electronic properties of pins such as pull-up and pull-down. Therefore, define that a pull-up/pull-down argument of 0 to such a generic option means that the pin is short-circuited to VDD or GROUND. Pull disablement shall be done using PIN_CONFIG_BIAS_DISABLE. Cc: James Hogan <james.hogan@imgtec.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by Heiko Stuebner <heiko@sntech.de> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17Fix comment on pinctrl_gpio_range.pin_baseChristian Ruppert
The comment introduced with the recently added pinctrl_gpio_range.pins element was wrong. This corrects it. Thanks to Patrice Chotard for pointing this out. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17pinctrl: move the pm state stubsLinus Walleij
The stubs for the !PINCTRL case were placed in the wrong part of the file, causing breakage in linux-next when compiling SH without pinctrl. Fix it up by moving the stubs to the right place. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17pinctrl: add pin list based GPIO rangesChristian Ruppert
Traditionally, GPIO ranges are based on consecutive ranges of both GPIO and pin numbers. This patch allows for GPIO ranges with arbitrary lists of pin numbers. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17pinctrl: get rid of all platform data for coh901Linus Walleij
This deletes the dependency on any platform data for the COH901 pin controller. There is only one user in the kernel, and if we at some point want to support more variants, they shall provide their variant info through the device tree. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-17serial: sh-sci: HSCIF supportUlrich Hecht
Adds support for "High Speed Serial Communications Interface with FIFO", essentially a SCIF with 128-byte FIFOs and more accurate baud rate generator. Signed-off-by: Ulrich Hecht <ulrich.hecht@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2013-06-16percpu-refcount: use RCU-sched insted of normal RCUTejun Heo
percpu-refcount was incorrectly using preempt_disable/enable() for RCU critical sections against call_rcu(). 6a24474da8 ("percpu-refcount: consistently use plain (non-sched) RCU") fixed it by converting the preepmtion operations with rcu_read_[un]lock() citing that there isn't any advantage in using sched-RCU over using the usual one; however, rcu_read_[un]lock() for the preemptible RCU implementation - CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly more expensive than preempt_disable/enable(). In a contrived microbench which repeats the followings, - percpu_ref_get() - copy 32 bytes of data into percpu buffer - percpu_put_get() - copy 32 bytes of data into percpu buffer rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by about 15% when compared to using sched-RCU. As the RCU critical sections are extremely short, using sched-RCU shouldn't have any latency implications. Convert to RCU-sched. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au>
2013-06-16drivers: pinctrl sleep and idle states in the coreLinus Walleij
If a device have sleep and idle states in addition to the default state, look up these in the core and stash them in the pinctrl state container. Add accessor functions for pinctrl consumers to put the pins into "default", "sleep" and "idle" states passing nothing but the struct device * affected. Solution suggested by Kevin Hilman, Mark Brown and Dmitry Torokhov in response to a patch series from Hebbar Gururaja. Cc: Hebbar Gururaja <gururaja.hebbar@ti.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-16pinctrl: add pinconf-generic define for a pin-default pullHeiko Stübner
There exist controllers that don't support to set the pull to up or down separately but instead automatically set the pull direction based on embedded knowledge inside the controller, for example depending on the selected mux function of the pin. Therefore this patch adds another config option to use this default pull-state for a pin where it is not possible to know or decide if the pin will be pulled up or down. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-16pinconf-generic: add BIAS_BUS_HOLD pinconfJames Hogan
Add a new PIN_CONFIG_BIAS_BUS_HOLD pin configuration for a bus holder pin mode (also known as bus keeper, or repeater). This is a weak latch which drives the last value on a tristate bus. Another device on the bus can drive the bus high or low before going tristate to change the value driven by the pin. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-06-15clk: gate: add CLK_GATE_HIWORD_MASKHaojian Zhuang
In Rockchip Cortex-A9 based chips, they don't use paradigm of reading-changing-writing the register contents. Instead they use a hiword mask to indicate the changed bits. When b1 should be set as gate, it also needs to indicate the change by setting hiword mask (b1 << 16). The patch adds gate flag for this usage. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-15clk: divider: add CLK_DIVIDER_HIWORD_MASK flagHaojian Zhuang
In both Hisilicon & Rockchip Cortex-A9 based chips, they don't use the paradigm of reading-changing-writing the register contents. Instead they use a hiword mask to indicate the changed bits. When b01 should be set as setting divider, it also needs to indicate the change by setting hiword mask (b11 << 16). The patch adds divider flag for this usage. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-15clk: mux: add CLK_MUX_HIWORD_MASKHaojian Zhuang
In both Hisilicon & Rockchip Cortex-A9 based chips, they don't use the paradigm of reading-changing-writing the register contents. Instead they use a hiword mask to indicate the changed bits. When b01 should be set as switching mux, it also needs to indicate the change by setting hiword mask (b11 << 16). The patch adds mux flag for this usage. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Signed-off-by: Mike Turquette <mturquette@linaro.org>
2013-06-16Merge tag 'omap-pm-v3.11/avs' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap-pm into pm-omap Pull PM / AVS: SmartReflex: misc. cleanups for 3.11 from Kevin Hilman.
2013-06-14smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().David Daney
Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts are enabled too early"), "bloody murder" is now being screamed. With a MIPS OCTEON config, we use on_each_cpu() in our irq_chip.irq_bus_sync_unlock() function. This gets called in early as a result of the time_init() call. Because the !SMP version of on_each_cpu() unconditionally enables irqs, we get: WARNING: at init/main.c:560 start_kernel+0x250/0x410() Interrupts were enabled early CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801 Call Trace: show_stack+0x68/0x80 warn_slowpath_common+0x78/0xb0 warn_slowpath_fmt+0x38/0x48 start_kernel+0x250/0x410 Suggested fix: Do what we already do in the SMP version of on_each_cpu(), and use local_irq_save/local_irq_restore. Because we need a flags variable, make it a static inline to avoid name space issues. [ Change from v1: Convert on_each_cpu to a static inline function, add #include <linux/irqflags.h> to avoid build breakage on some files. on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as on_each_cpu(), but they are not causing !SMP bugs for me, so I will defer changing them to a less urgent patch. ] Signed-off-by: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-14Merge tag 'renesas-gpio-rcar-for-v3.11' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/drivers From Simon Horman: Renesas ARM based SoC GPIO R-Car updates for v3.11 DT support to GPIO R-Car driver by Laurent Pinchart. * tag 'renesas-gpio-rcar-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: (131 commits) gpio-rcar: Add DT support
2013-06-14Merge tag 'renesas-phy-rcar-usb-for-v3.11' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/soc From Simon Horman: Renesas USB updates for v3.11 These updates are by Sergei Shtylyov to clean-up USB support present for R8A7779/Marzen and then extend USB support coverage to R8A7778/BOCK-W. * tag 'renesas-phy-rcar-usb-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: BOCK-W: add USB support ARM: shmobile: r8a7778: add USB support phy-rcar-usb: add R8A7778 support phy-rcar-usb: handle platform data ARM: shmobile: Marzen: pass platform data to USB PHY device phy-rcar-usb: add platform data phy-rcar-usb: correct base address ARM: shmobile: r8a7779: remove USB PHY 2nd memory resource phy-rcar-usb: remove EHCI internal buffer setup ARM: shmobile: r8a7779: setup EHCI internal buffer ehci-platform: add pre_setup() method to platform data ARM: shmobile: Marzen: move USB EHCI, OHCI, and PHY devices to R8A7779 code Conflicts: arch/arm/mach-shmobile/board-marzen.c arch/arm/mach-shmobile/setup-r8a7778.c Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14Merge tag 'ux500-dma40-for-arm-soc-2' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson into next/drivers From Linus Walleij: Second set of DMA40 changes: refactorings and device tree support for the DMA40. Now with MUSB and some platform data removal. * tag 'ux500-dma40-for-arm-soc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson: dmaengine: ste_dma40: Fetch disabled channels from DT dmaengine: ste_dma40: Fetch the number of physical channels from DT ARM: ux500: Stop passing DMA platform data though AUXDATA dmaengine: ste_dma40: Allow memcpy channels to be configured from DT dmaengine: ste_dma40_ll: Replace meaningless register set with comment dmaengine: ste_dma40: Convert data_width from register bit format to value dmaengine: ste_dma40_ll: Use the BIT macro to replace ugly '(1 << x)'s ARM: ux500: Remove recently unused stedma40_xfer_dir enums dmaengine: ste_dma40: Replace ST-E's home-brew DMA direction defs with generic ones ARM: ux500: Replace ST-E's home-brew DMA direction definition with the generic one dmaengine: ste_dma40: Use the BIT macro to replace ugly '(1 << x)'s ARM: ux500: Remove empty function u8500_of_init_devices() ARM: ux500: Remove ux500-musb platform registation when booting with DT usb: musb: ux500: add device tree probing support usb: musb: ux500: attempt to find channels by name before using pdata usb: musb: ux500: harden checks for platform data usb: musb: ux500: take the dma_mask from coherent_dma_mask usb: musb: ux500: move the MUSB HDRC configuration into the driver usb: musb: ux500: move channel number knowledge into the driver
2013-06-14Merge branch 'pci/jiang-bus-lock-v3' into nextBjorn Helgaas
* pci/jiang-bus-lock-v3: PCI: Return early on allocation failures to unindent mainline code PCI: Simplify IOV implementation and fix reference count races PCI: Drop redundant setting of bus->is_added in virtfn_add_bus() unicore32/PCI: Remove redundant call of pci_bus_add_devices() m68k/PCI: Remove redundant call of pci_bus_add_devices() PCI: Rename pci_release_bus_bridge_dev() to pci_release_host_bridge_dev() PCI: Fix refcount issue in pci_create_root_bus() error recovery path ia64/PCI: Clean up pci_scan_root_bus() usage PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus) PCI: Introduce pci_alloc_dev(struct pci_bus*) to replace alloc_pci_dev() PCI: Introduce pci_bus_{get|put}() to manage PCI bus reference count Conflicts: drivers/pci/probe.c
2013-06-14Merge branch 'pci/misc' into nextBjorn Helgaas
* pci/misc: PCI / ACPI / PM: Use correct power state strings in messages PCI: Fix comment typo for pcie_pme_remove() PCI: Add pcibios_release_device()
2013-06-14Merge tag 'omap-for-v3.11/soc-signed' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/soc From Tony Lindgren: Omap SoC changes. Mostly improves am33xx support, and adds minimal support for am43x SoCs. * tag 'omap-for-v3.11/soc-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: AM43x: SRAM base and size ARM: OMAP2+: AM43x: GP or HS ? ARM: OMAP2+: AM43x: early init ARM: OMAP2+: AM43x: static mapping ARM: OMAP2+: AM437x: SoC revision detection ARM: OMAP2+: AM43x: soc_is support ARM: OMAP2+: AM43x: kbuild ARM: OMAP2+: AM43x: Kconfig ARM: OMAP2+: separate out OMAP4 restart ARM: AM33XX: clk: Add clock node for EHRPWM TBCLK ARM: OMAP3: clock data: get rid of unused USB host clock aliases and dummies ARM: OMAP2+: AM33xx: Add missing reset status info to GFX hwmod + Linux 3.10-rc5 Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14Merge tag 'omap-for-v3.11/cleanup-signed' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/cleanup From Tony Lindgren: Move omap4 over to device tree based booting. This allows us to get rid a big pile of platform init code for things that are already handled by device tree related code. As am33xx is already device tree based, we can also remove the same data for am33xx. * tag 'omap-for-v3.11/cleanup-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP4: hwmod data: Remove irq entries from mcspi, mmc hwmods ARM: OMAP4: hwmod data: add DSS data back ARM: OMAP4: hwmod data: Clean up the data file ARM: AM33XX: hwmod data: irq, dma and addr info clean up ARM: OMAP2+: Remove omap4 ocp2scp pdata ARM: OMAP2+: Remove omap4 pdata for USB ARM: OMAP2+: Remove omap4 pdata from hsmmc.c ARM: OMAP2+: Remove legacy mux data for omap4 ARM: OMAP2+: Remove board-omap4panda.c ARM: OMAP2+: Remove board-4430sdp.c ARM: OMAP2+: Legacy support for wl12xx when booted with devicetree Resolved merge conflict due to a fix for 3.10 (the fix is removed since the code is no longer used -- data comes from device tree). Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14Merge tag 'omap-for-v3.10/fixes-v3.10-rc4' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/cleanup Pulling in a set of fixes from Tony Lindgren to resolve conflicts with later cleanup branch: A set of small fixes for omaps for the -rc cycle: - am7303 iva2 reset PM regression fix - am33xx uart2 dma channel fix - am33xx gpmc properties fix - omap44xx rtc wake-up mux fix for nirq pins - omap36xx clock divider restore fix There's also one tiny non-critical .dts fix for omap5 timer pwm properties. * tag 'omap-for-v3.10/fixes-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: omap3: clock: fix wrong container_of in clock36xx.c ARM: dts: OMAP5: Fix missing PWM capability to timer nodes ARM: dts: omap4-panda|sdp: Fix mux for twl6030 IRQ pin and msecure line ARM: dts: AM33xx: Fix properties on gpmc node arm: omap2: fix AM33xx hwmod infos for UART2 + Linux 3.10-rc4 Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14Merge tag 'omap-for-v3.11/fixes-non-critical-signed' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/fixes-non-critical From Tony Lindgren: Non-critical fixes for omaps for v3.11 merge window. * tag 'omap-for-v3.11/fixes-non-critical-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2+: omap-usb-host: Fix memory leaks ARM: OMAP2+: Fix serial init for device tree based booting arm/omap: use const char properly ARM: OMAP2+: devices: Do not print error when dss_hdmi hwmod lookup fails ARM: OMAP2+: devices: Do not print error when DMIC hwmod lookup fails ARM: OMAP2+: devices: Do not print error when McPDM hwmod lookup fails ARM: OMAP: add vdds_sdi supply for omapdss_sdi.0 ARM: OMAP: add vdds_dsi supply for omapdss_dpi.0 ARM: OMAP: fix dsi regulator names + Linux 3.10-rc5 Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-14mm: thp: Correct the HPAGE_PMD_ORDER check.Steve Capper
All Transparent Huge Pages are allocated by the buddy allocator. A compile time check is in place that fails when the order of a transparent huge page is too large to be allocated by the buddy allocator. Unfortunately that compile time check passes when: HPAGE_PMD_ORDER == MAX_ORDER ( which is incorrect as the buddy allocator can only allocate memory of order strictly less than MAX_ORDER. ) This patch updates the compile time check to fail in the above case. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14mm: hugetlb: Copy huge_pmd_share from x86 to mm.Steve Capper
Under x86, multiple puds can be made to reference the same bank of huge pmds provided that they represent a full PUD_SIZE of shared huge memory that is aligned to a PUD_SIZE boundary. The code to share pmds does not require any architecture specific knowledge other than the fact that pmds can be indexed, thus can be beneficial to some other architectures. This patch copies the huge pmd sharing (and unsharing) logic from x86/ to mm/ and introduces a new config option to activate it: CONFIG_ARCH_WANTS_HUGE_PMD_SHARE Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-13cgroup: update sane_behavior documentationTejun Heo
f12dc02014 ("cgroup: mark "tasks" cgroup file as insane") and cc5943a781 ("cgroup: mark "notify_on_release" and "release_agent" cgroup files insane") forgot to update the changed behavior documentation in cgroup.h. Update it. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13cgroup: use percpu refcnt for cgroup_subsys_statesTejun Heo
A css (cgroup_subsys_state) is how each cgroup is represented to a controller. As such, it can be used in hot paths across the various subsystems different controllers are associated with. One of the common operations is reference counting, which up until now has been implemented using a global atomic counter and can have significant adverse impact on scalability. For example, css refcnt can be gotten and put multiple times by blkcg for each IO request. For highops configurations which try to do as much per-cpu as possible, the global frequent refcnting can be very expensive. In general, given the various and hugely diverse paths css's end up being used from, we need to make it cheap and highly scalable. In its usage, css refcnting isn't very different from module refcnting. This patch converts css refcnting to use the recently added percpu_ref. css_get/tryget/put() directly maps to the matching percpu_ref operations and the deactivation logic is no longer necessary as percpu_ref already has refcnt killing. The only complication is that as the refcnt is per-cpu, percpu_ref_kill() in itself doesn't ensure that further tryget operations will fail, which we need to guarantee before invoking ->css_offline()'s. This is resolved collecting kill confirmation using percpu_ref_kill_and_confirm() and initiating the offline phase of destruction after all css refcnt's are confirmed to be seen as killed on all CPUs. The previous patches already splitted destruction into two phases, so percpu_ref_kill_and_confirm() can be hooked up easily. This patch removes css_refcnt() which is used for rcu dereference sanity check in css_id(). While we can add a percpu refcnt API to ask the same question, css_id() itself is scheduled to be removed fairly soon, so let's not bother with it. Just drop the sanity check and use rcu_dereference_raw() instead. v2: - init_cgroup_css() was calling percpu_ref_init() without checking the return value. This causes two problems - the obvious lack of error handling and percpu_ref_init() being called from cgroup_init_subsys() before the allocators are up, which triggers warnings but doesn't cause actual problems as the refcnt isn't used for roots anyway. Fix both by moving percpu_ref_init() to cgroup_create(). - The base references were put too early by percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the refs one extra time. This wasn't noticeable because css's go through another RCU grace period before being freed. Update cgroup_destroy_locked() to grab an extra reference before killing the refcnts. This problem was noticed by Kent. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <koverstreet@google.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Alasdair G. Kergon" <agk@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Glauber Costa <glommer@gmail.com>
2013-06-13Merge branch 'for-3.11' of ↵Tejun Heo
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.11 This is to receive percpu_refcount which will replace atomic_t reference count in cgroup_subsys_state. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13cgroup: split cgroup destruction into two stepsTejun Heo
Split cgroup_destroy_locked() into two steps and put the latter half into cgroup_offline_fn() which is executed from a work item. The latter half is responsible for offlining the css's, removing the cgroup from internal lists, and propagating release notification to the parent. The separation is to allow using percpu refcnt for css. Note that this allows for other cgroup operations to happen between the first and second halves of destruction, including creating a new cgroup with the same name. As the target cgroup is marked DEAD in the first half and cgroup internals don't care about the names of cgroups, this should be fine. A comment explaining this will be added by the next patch which implements the actual percpu refcnting. As RCU freeing is guaranteed to happen after the second step of destruction, we can use the same work item for both. This patch renames cgroup->free_work to ->destroy_work and uses it for both purposes. INIT_WORK() is now performed right before queueing the work item. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13percpu-refcount: implement percpu_tryget() along with ↵Tejun Heo
percpu_ref_kill_and_confirm() Implement percpu_tryget() which stops giving out references once the percpu_ref is visible as killed. Because the refcnt is per-cpu, different CPUs will start to see a refcnt as killed at different points in time and tryget() may continue to succeed on subset of cpus for a while after percpu_ref_kill() returns. For use cases where it's necessary to know when all CPUs start to see the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new function takes an extra argument @confirm_kill which is invoked when the refcnt is guaranteed to be viewed as killed on all CPUs. While this isn't the prettiest interface, it doesn't force synchronous wait and is much safer than requiring the caller to do its own call_rcu(). v2: Patch description rephrased to emphasize that tryget() may continue to succeed on some CPUs after kill() returns as suggested by Kent. v3: Function comment in percpu_ref_kill_and_confirm() updated warning people to not depend on the implied RCU grace period from the confirm callback as it's an implementation detail. Signed-off-by: Tejun Heo <tj@kernel.org> Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13net/mlx4: Add VF link state supportRony Efraim
Add support to change the link state of VF (vPort) Signed-off-by: Rony Efraim <ronye@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13net/core: Add VF link state controlRony Efraim
Add netlink directives and ndo entry to allow for controling VF link, which can be in one of three states: Auto - VF link state reflects the PF link state (default) Up - VF link state is up, traffic from VF to VF works even if the actual PF link is down Down - VF link state is down, no traffic from/to this VF, can be of use while configuring the VF Signed-off-by: Rony Efraim <ronye@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13net-rps: fixes for rps flow limitWillem de Bruijn
Caught by sparse: - __rcu: missing annotation to sd->flow_limit - __user: direct access in cpumask_scnprintf Also - add endline character when printing bitmap if room in buffer - avoid bucket overflow by reducing FLOW_LIMIT_HISTORY The last item warrants some explanation. The hashtable buckets are subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal to bucket size, since all packets may end up in a single bucket. The current (rather arbitrary) history value of 256 happens to match the buffer size (u8). As a result, with a single flow, the first 128 packets are accepted (correct), the second 128 packets dropped (correct) and then the history[] array has filled, so that each subsequent new packet causes an increment in the bucket for new_flow plus a decrement for old_flow: a steady state. This is fine if packets are dropped, as the steady state goes away as soon as a mix of traffic reappears. But, because the 256th packet overflowed the bucket to 0: no packets are dropped. Instead of explicitly adding an overflow check, this patch changes FLOW_LIMIT_HISTORY to never be able to overflow a single bucket. Reported-by: Fengguang Wu <fengguang.wu@intel.com> (first item) Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13Merge branch 'rcu/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fixes from Paul McKenney: "I must confess that this past merge window was not RCU's best showing. This series contains three more fixes for RCU regressions: 1. A fix to __DECLARE_TRACE_RCU() that causes it to act as an interrupt from idle rather than as a task switch from idle. This change is needed due to the recent use of _rcuidle() tracepoints that can be invoked from interrupt handlers as well as from idle. Without this fix, invoking _rcuidle() tracepoints from interrupt handlers results in splats and (more seriously) confusion on RCU's part as to whether a given CPU is idle or not. This confusion can in turn result in too-short grace periods and therefore random memory corruption. 2. A fix to a subtle deadlock that could result due to RCU doing a wakeup while holding one of its rcu_node structure's locks. Although the probability of occurrence is low, it really does happen. The fix, courtesy of Steven Rostedt, uses irq_work_queue() to avoid the deadlock. 3. A fix to a silent deadlock (invisible to lockdep) due to the interaction of timeouts posted by RCU debug code enabled by CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU hotplug operations. This will not occur in production kernels, but really does occur in randconfig testing. Diagnosis courtesy of Steven Rostedt" * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration rcu: Don't call wakeup() with rcu_node structure ->lock held trace: Allow idle-safe tracepoints to be called from irq
2013-06-13percpu-refcount: implement percpu_ref_cancel_init()Tejun Heo
Normally, percpu_ref_init() initializes and percpu_ref_kill() initiates destruction which completes asynchronously. The asynchronous destruction can be problematic in init failure path where the caller wants to destroy half-constructed object - distinguishing half-constructed objects from the usual release method can be painful for complex objects. This patch implements percpu_ref_cancel_init() which synchronously destroys the percpu_ref without invoking release. To avoid unintentional misuses, the function requires the ref to have finished percpu_ref_init() but never used and triggers WARN otherwise. v2: Explain the weird name and usage restriction in the function comment. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13percpu-refcount: add __must_check to percpu_ref_init() and don't use ↵Tejun Heo
ACCESS_ONCE() in percpu_ref_kill_rcu() Two small changes. * Unlike most init functions, percpu_ref_init() allocates memory and may fail. Let's mark it with __must_check in case the caller forgets. * percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to dereference @ref->pcpu_count, which can be misleading. The pointer is guaranteed to be valid and visible and can't change underneath the function. Drop ACCESS_ONCE(). Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13cgroup: remove cgroup->count and useTejun Heo
cgroup->count tracks the number of css_sets associated with the cgroup and used only to verify that no css_set is associated when the cgroup is being destroyed. It's superflous as the destruction path can simply check whether cgroup->cset_links is empty instead. Drop cgroup->count and check ->cset_links directly from cgroup_destroy_locked(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13cgroup: rename CGRP_REMOVED to CGRP_DEADTejun Heo
We will add another flag indicating that the cgroup is in the process of being killed. REMOVING / REMOVED is more difficult to distinguish and cgroup_is_removing()/cgroup_is_removed() are a bit awkward. Also, later percpu_ref usage will involve "kill"ing the refcnt. s/CGRP_REMOVED/CGRP_DEAD/ s/cgroup_is_removed()/cgroup_is_dead() This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13cgroup: clean up css_[try]get() and css_put()Tejun Heo
* __css_get() isn't used by anyone. Fold it into css_get(). * Add proper function comments to all css reference functions. This patch is purely cosmetic. v2: Typo fix as per Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13cgroup: bring some sanity to naming around cg_cgroup_linkTejun Heo
cgroups and css_sets are mapped M:N and this M:N mapping is represented by struct cg_cgroup_link which forms linked lists on both sides. The naming around this mapping is already confusing and struct cg_cgroup_link exacerbates the situation quite a bit. >From cgroup side, it starts off ->css_sets and runs through ->cgrp_link_list. From css_set side, it starts off ->cg_links and runs through ->cg_link_list. This is rather reversed as cgrp_link_list is used to iterate css_sets and cg_link_list cgroups. Also, this is the only place which is still using the confusing "cg" for css_sets. This patch cleans it up a bit. * s/cgroup->css_sets/cgroup->cset_links/ s/css_set->cg_links/css_set->cgrp_links/ s/cgroup_iter->cg_link/cgroup_iter->cset_link/ * s/cg_cgroup_link/cgrp_cset_link/ * s/cgrp_cset_link->cg/cgrp_cset_link->cset/ s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/ s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/ * s/init_css_set_link/init_cgrp_cset_link/ s/free_cg_links/free_cgrp_cset_links/ s/allocate_cg_links/allocate_cgrp_cset_links/ * s/cgl[12]/link[12]/ in compare_css_sets() * s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar adustments. * Comment and whiteline adjustments. After the changes, we have list_for_each_entry(link, &cont->cset_links, cset_link) { struct css_set *cset = link->cset; instead of list_for_each_entry(link, &cont->css_sets, cgrp_link_list) { struct css_set *cset = link->cg; This patch is purely cosmetic. v2: Fix broken sentences in the patch description. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>