Age | Commit message (Collapse) | Author |
|
This change is meant to make the code much more readable for MTQC and MRQC
configuration.
The big change is that I simplified much of the logic so that we are
essentially handling just 4 cases and their variants. In the cases where
RSS is disabled we are actually just programming the RETA table with all
1s resulting in a single queue RSS. In the case of SR-IOV I am treating
that as a subset of VMDq. This all results int he following configuration
for the hardware:
DCB
En Dis
VMDq En VMDQ/DCB VMDq/RSS
Dis DCB/RSS RSS
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This change cleans up some of the logic in an attempt to try and simplify
things for how we are configuring DCB w/ RSS.
In this patch I basically did 3 things. I updated the logic for getting
the first register index. I applied the fact that all TCs get the same
number of queues to simplify the looping logic in caching the DCB ring
register. Finally I updated how we configure the RQTC register to match
the fact that all TCs are assigned the same number of queues.
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
It makes much more sense for us to configure the real number of Tx and Rx
queues in the ixgbe_open call than it does in ixgbe_set_num_queues. By
setting the number in ixgbe_open we can avoid a number of unecessary
updates and only have to make the calls once.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Previously we were exiting without cleaning up the memory internally on the
ixgbe_setup_rx_resources and ixgbe_setup_tx_resources calls. Instead of
forcing the caller to clean things up for us we should instead just unwind
the rings and free the memory as we go. This way we can more gracefully
clean up the rings in the event of an allocation failure.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
When the link status changes on the PF we need to notify the VFs. In order
to do this we should ping all of the VFs in order to trigger a link status
change on them as well.
This fixes issues in which the PF would reset, but the VF didn't because the
NAK flag was not set in the VF mailbox.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
We want it to be possible for target_submit_cmd() to return errors up
to its fabric module callers. For now just update the prototype to
return an int, and update all callers to handle non-zero return values
as an error.
This is immediately useful for tcm_qla2xxx to fix a long-standing active
I/O session shutdown race, but tcm_fc, usb-gadget, and sbp-target the
fabric maintainers need to check + ACK that handling a target_submit_cmd()
failure due to session shutdown does not introduce regressions
(nab: Respin against for-next after initial NACK + update docbook comment +
fix double se_cmd init in exception path for usb-gadget)
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: Arun Easi <arun.easi@qlogic.com>
Cc: Chris Boot <bootc@bootc.net>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|
Merge Andrew's remaining patches for 3.5:
"Nine fixes"
* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (9 commits)
mm: fix lost kswapd wakeup in kswapd_stop()
m32r: make memset() global for CONFIG_KERNEL_BZIP2=y
m32r: add memcpy() for CONFIG_KERNEL_GZIP=y
m32r: consistently use "suffix-$(...)"
m32r: fix 'fix breakage from "m32r: use generic ptrace_resume code"' fallout
m32r: fix pull clearing RESTORE_SIGMASK into block_sigmask() fallout
m32r: remove duplicate definition of PTRACE_O_TRACESYSGOOD
mn10300: fix "pull clearing RESTORE_SIGMASK into block_sigmask()" fallout
bootmem: make ___alloc_bootmem_node_nopanic() really nopanic
|
|
Offlining memory may block forever, waiting for kswapd() to wake up
because kswapd() does not check the event kthread->should_stop before
sleeping.
The proper pattern, from Documentation/memory-barriers.txt, is:
--- waker ---
event_indicated = 1;
wake_up_process(event_daemon);
--- sleeper ---
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (event_indicated)
break;
schedule();
}
set_current_state() may be wrapped by:
prepare_to_wait();
In the kswapd() case, event_indicated is kthread->should_stop.
=== offlining memory (waker) ===
kswapd_stop()
kthread_stop()
kthread->should_stop = 1
wake_up_process()
wait_for_completion()
=== kswapd_try_to_sleep (sleeper) ===
kswapd_try_to_sleep()
prepare_to_wait()
.
.
schedule()
.
.
finish_wait()
The schedule() needs to be protected by a test of kthread->should_stop,
which is wrapped by kthread_should_stop().
Reproducer:
Do heavy file I/O in background.
Do a memory offline/online in a tight loop
Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the m32r compile error:
arch/m32r/boot/compressed/misc.c:31:14: error: static declaration of 'memset' follows non-static declaration
make[5]: *** [arch/m32r/boot/compressed/misc.o] Error 1
make[4]: *** [arch/m32r/boot/compressed/vmlinux] Error 2
by removing the static keyword.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the m32r link error:
LD arch/m32r/boot/compressed/vmlinux
arch/m32r/boot/compressed/misc.o: In function `zlib_updatewindow':
misc.c:(.text+0x190): undefined reference to `memcpy'
misc.c:(.text+0x190): relocation truncated to fit: R_M32R_26_PLTREL against undefined symbol `memcpy'
make[5]: *** [arch/m32r/boot/compressed/vmlinux] Error 1
by adding our own implementation of memcpy().
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a556bec9955c ("m32r: fix arch/m32r/boot/compressed/Makefile")
changed "$(suffix_y)" to "$(suffix-y)", but didn't update any location
where "suffix_y" is set, causing:
make[5]: *** No rule to make target `arch/m32r/boot/compressed/vmlinux.bin.', needed by `arch/m32r/boot/compressed/piggy.o'. Stop.
make[4]: *** [arch/m32r/boot/compressed/vmlinux] Error 2
make[3]: *** [zImage] Error 2
Correct the other locations to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit acdc0d5ef9dd ('m32r: fix breakage from "m32r: use generic
ptrace_resume code"') tried to fix a problem in commit e34112e3966fc
("m32r: use generic ptrace_resume code") by returning values in a
function returning void, causing:
arch/m32r/kernel/ptrace.c: In function 'user_enable_single_step':
arch/m32r/kernel/ptrace.c:594:3: warning: 'return' with a value, in function returning void [enabled by default]
arch/m32r/kernel/ptrace.c:598:3: warning: 'return' with a value, in function returning void [enabled by default]
arch/m32r/kernel/ptrace.c:601:3: warning: 'return' with a value, in function returning void [enabled by default]
arch/m32r/kernel/ptrace.c:604:2: warning: 'return' with a value, in function returning void [enabled by default]
Remove the unneeded return values.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a610d6e672d6 ("pull clearing RESTORE_SIGMASK into
block_sigmask()") caused:
arch/m32r/kernel/signal.c: In function 'handle_signal':
arch/m32r/kernel/signal.c:289:6: warning: 'return' with a value, in function returning void [enabled by default]
Remove the return value it forgot to remove.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the m32r build warning:
include/linux/ptrace.h:66:0: warning: "PTRACE_O_TRACESYSGOOD" redefined [enabled by default]
arch/m32r/include/asm/ptrace.h:117:0: note: this is the location of the previous definition
We already have it in <linux/ptrace.h>, so remove it from <asm/ptrace.h>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a610d6e672d6 ("pull clearing RESTORE_SIGMASK into
block_sigmask()") caused:
arch/mn10300/kernel/signal.c: In function 'handle_signal':
arch/mn10300/kernel/signal.c:462:3: warning: 'return' with no value, in function returning non-void [-Wreturn-type]
Add the missing return values, and restore the indentation while we're
at it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In reaction to commit 99ab7b19440a ("mm: sparse: fix usemap allocation
above node descriptor section") Johannes said:
| while backporting the below patch, I realised that your fix busted
| f5bf18fa22f8 again. The problem was not a panicking version on
| allocation failure but when the usemap size was too large such that
| goal + size > limit triggers the BUG_ON in the bootmem allocator. So
| we need a version that passes limit ONLY if the usemap is smaller than
| the section.
after checking the code, the name of ___alloc_bootmem_node_nopanic()
does not reflect the fact.
Make bootmem really not panic.
Hope will kill bootmem sooner.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes Wang <hayeswang@realtek.com>
|
|
It is not needed for mac_ocp_{write / read}. Actually bit 31 of OCPDR
does not change and r8168_mac_ocp_read always returns ~0.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Tested-by: Francois Romieu <romieu@fr.zoreil.com>
|
|
git://git.linaro.org/people/ljones/linux-3.0-ux500 into next/dt
From Lee Jones <lee.jones@linaro.org>:
* 'for-arm-soc-next' of git://git.linaro.org/people/ljones/linux-3.0-ux500:
ARM: ux500: Remove PMU platform registration when booting with DT
ARM: ux500: Remove temporary snowball_of_platform_devs enablement structure
ARM: ux500: Ensure vendor specific properties have the vendor's identifier
pinctrl: pinctrl-nomadik: Append sleepmode property with vendor specific prefixes
ARM: ux500: Move rtc-pl031 registration to Device Tree when enabled
ARM: ux500: Enable the AB8500 RTC for all DT:ed DB8500 based devices
ARM: ux500: Correctly reference IRQs supplied by the AB8500 from Device Tree
ARM: ux500: Apply ab8500-debug node do the db8500 DT structure
ARM: ux500: Add a ab8500-usb Device Tree node for db8500 based devices
ARM: ux500: Add db8500 Device Tree node for misc/ab8500-pwm
ARM: ux500: Add db8500 Device Tree node for ab8500-sysctrl
ARM: ux500: Enable LED heartbeat functionality on Snowbal via DT
ARM: ux500: Enable LED heartbeat functionality on Snowball
ARM: ux500: Add support for input/ponkey into the db8500's Device Tree
ARM: ux500: Add a ab8500-gpadc node to the db8500 Device Tree
ARM: ux500: Enable the user LED on Snowball via Device Tree
ARM: ux500: Kconfig: Compile in leds-gpio support for Snowball
ARM: ux500: Provide auxdata to be used as name base clock search for nmk-i2c
ARM: ux500: Remove unused i2c platform_data initialisation code
ARM: ux500: Enable Device Tree support mmci for Snowball
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
prerequisite for ux500/dt branch
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/soc
From Kukjin Kim <kgene.kim@samsung.com>:
This is general development for Samsung stuff for v3.6
* 'next/devel-samsung' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: SAMSUNG: Introduce Kconfig variable for Samsung custom clk API
ARM: EXYNOS: Add missing static storage class specifier in pmu.c file
ARM: EXYNOS: Make combiner_init function static
ARM: EXYNOS: Update HSOTG PHY clock setting for EXYNOS4X12
ARM: EXYNOS: Clear SYS_WDTRESET bit to use watchdog reset
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull a last-minute PM update from Rafael J. Wysocki:
"This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future
reuse of the capability in question in related cases."
* tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND
|
|
Bindings for DMA channels are still under discussion and will
change once this has been resolved. Therefore we mark them
the newly added ones as preliminary. Let's hope nobody starts
relying on them...
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/dt
From Kukjin Kim <kgene.kim@samsung.com>:
It is for supporting spi dt for exynos4210 and exynos5250, and got the
ack from Grant Likely for spi driver.
Note: Since this is including spi driver changes, so it was made based
on next/devel-dma-ops which touches same file, Samsung spi driver for
avoiding bad conflicts.
* 'next/dt-samsung' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: dts: Add nodes for spi controllers for SAMSUNG EXYNOS5 platforms
ARM: EXYNOS: Enable platform support for SPI controllers for EXYNOS5
ARM: EXYNOS: Add spi clock support for EXYNOS5
ARM: dts: Add nodes for spi controllers for SAMSUNG EXYNOS4 platforms
ARM: EXYNOS: Enable platform support for SPI controllers for EXYNOX4
ARM: EXYNOS: Fix the incorrect hierarchy of spi controller bus clock
ARM: EXYNOS: Add device tree node for EXYNOS4 interrupt combiner controller
spi: s3c64xx: add device tree support
spi: s3c64xx: Remove the 'set_level' callback from controller data
ARM: SAMSUNG: Modify s3c64xx_spi{0|1|2}_set_platdata function
ARM: SAMSUNG: Remove pdev pointer parameter from spi gpio setup functions
spi: s3c64xx: move controller information into driver data
spi: s3c64xx: remove unused S3C64XX_SPI_ST_TRLCNTZ macro
ARM: S3C64XX: Add a new dma request id for device tree based dma channel lookup
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Required as a dependency for samsung/dt changes.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/boards
From Kukjin Kim <kgene.kim@samsung.com>:
The branch includes updating each Samsung boards such as SMDK4X12,
Aquila, Goni and so on, and it is for audio platform device and
supporting of HSOTG or framebuffer.
* 'next/board-samsung-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: EXYNOS: Add framebuffer support for SMDK4X12
ARM: EXYNOS: Add HSOTG support to SMDK4X12
ARM: S5PV210: Add audio platform device in Goni board
ARM: S5PV210: Add audio platform device in Aquila board
ARM: EXYNOS: Add audio platform device in SMDKV310 board
ARM: S3C64XX: Don't specify an irq_base for WM1192-EV1 board
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/cleanup
From Kukjin Kim <kgene.kim@samsung.com>:
Here is second Samsung cleanup pull request for v3.6.
* 'next/cleanup-samsung-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: S3C24XX: Remove unused GPIO definitions for Openmoko GTA02 board
ARM: S3C24XX: Remove unused GPIO definitions for port J
ARM: S3C24XX: Remove unused GPA, GPE, GPH bank GPIO aliases
ARM: S3C24XX: Convert the touchscreen setup code to common GPIO API
ARM: S3C24XX: Convert the PM code to gpiolib API
ARM: S3C24XX: Convert QT2410 board file to the gpiolib API
ARM: S3C24XX: Convert SMDK board file to the gpiolib API
ARM: S3C24XX: Free the backlight gpio requested in Mini2440 board code
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
free_nh_exceptions() should use rcu_dereference_protected(..., 1)
since its called after one RCU grace period.
Also add some const-ification in recent code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
From Sascha Hauer <s.hauer@pengutronix.de>:
i.MX clk noncritical fixes and updates
* tag 'imx-clk' of git://git.pengutronix.de/git/imx/linux-2.6:
ARM: imx: clk-imx31: Fix clock id for rnga driver
ARM: imx: add missing item to the list of clock event modes
ARM: i.MX5x CSPI: Fixed clock name for CSPI
ARM: i.MX5x clocks: Fix GPT clocks
ARM: i.MX5x clocks: Fix parent for PWM clocks
ARM: i.MX5x clocks: Add EPIT support
ARM: mx27: Reenable silicon version print
ARM: clk-imx27: Fix rtc clock id
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
One more addition from Thomas Petazzoni:
* mvebu/newsoc:
arm: mvebu: generate DTBs for supported SoCs
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Add the necessary dtb-$(CONFIG_...) entries so that "make dtbs"
generates the Device Tree Blobs that correspond to the selected mvebu
SoCs.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Patches from Alexander Shiyan <shc_work@mail.ru>:
* clps711x/cleanup:
ARM: clps711x: Remove the setting of the time
ARM: clps711x: Removed superfluous transform virt_to_bus and related functions
ARM: clps711x/p720t: Replace __initcall by .init_early call
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
This patch fixes "bug: interrupts were enabled early" due to
do_settimeofday() which used in common code for clps711x-platform.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
This override does not allow support for multiple machines in a single core.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Since we are trying to do to support multiple machines in a single kernel,
we need to eliminate the use of __initcall to be used for all machines.
Using .init_early call solves this problem.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Need to mask it with (FNHE_HASH_SIZE - 1).
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the time field as it is not used.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Remove the usage field as it is not used.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Remove the latency_ticks field as it is not used.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
When suspending the system with an important RTC wake alarm active,
it is possible that the RTC alarm will expire before the system has
gone to sleep (e.g. short alarm timer, or an unusually long suspend
routine).
If this happens, the RTC alarm should trigger a wakeup event, possibly
aborting system suspend. This condition can be detected in the form
of an RTC alarm interrupt.
Signed-off-by: Paul Fox <pgf@laptop.org>
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
With trustee gone, CPU hotplug code can be simplified.
* gcwq_claim/release_management() now grab and release gcwq lock too
respectively and gained _and_lock and _and_unlock postfixes.
* All CPU hotplug logic was implemented in workqueue_cpu_callback()
which was called by workqueue_cpu_up/down_callback() for the correct
priority. This was because up and down paths shared a lot of logic,
which is no longer true. Remove workqueue_cpu_callback() and move
all hotplug logic into the two actual callbacks.
This patch doesn't make any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
With the previous changes, a disassociated global_cwq now can run as
an unbound one on its own - it can create workers as necessary to
drain remaining works after the CPU has been brought down and manage
the number of workers using the usual idle timer mechanism making
trustee completely redundant except for the actual unbinding
operation.
This patch removes the trustee and let a disassociated global_cwq
manage itself. Unbinding is moved to a work item (for CPU affinity)
which is scheduled and flushed from CPU_DONW_PREPARE.
This patch moves nr_running clearing outside gcwq and manager locks to
simplify the code. As nr_running is unused at the point, this is
safe.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
Currently, during CPU offlining, after all pending work items are
drained, the trustee butchers all workers. Also, on CPU onlining
failure, workqueue_cpu_callback() ensures that the first idle worker
is destroyed. Combined, these guarantee that an offline CPU doesn't
have any worker for it once all the lingering work items are finished.
This guarantee isn't really necessary and makes CPU on/offlining more
expensive than needs to be, especially for platforms which use CPU
hotplug for powersaving.
This patch lets offline CPUs removes idle worker butchering from the
trustee and let a CPU which failed onlining keep the created first
worker. The first worker is created if the CPU doesn't have any
during CPU_DOWN_PREPARE and started right away. If onlining succeeds,
the rebind_workers() call in CPU_ONLINE will rebind it like any other
workers. If onlining fails, the worker is left alone till the next
try.
This makes CPU hotplugs cheaper by allowing global_cwqs to keep
workers across them and simplifies code.
Note that trustee doesn't re-arm idle timer when it's done and thus
the disassociated global_cwq will keep all workers until it comes back
online. This will be improved by further patches.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
Currently, if there are left workers when a CPU is being brough back
online, the trustee kills all idle workers and scheduled rebind_work
so that they re-bind to the CPU after the currently executing work is
finished. This works for busy workers because concurrency management
doesn't try to wake up them from scheduler callbacks, which require
the target task to be on the local run queue. The busy worker bumps
concurrency counter appropriately as it clears WORKER_UNBOUND from the
rebind work item and it's bound to the CPU before returning to the
idle state.
To reduce CPU on/offlining overhead (as many embedded systems use it
for powersaving) and simplify the code path, workqueue is planned to
be modified to retain idle workers across CPU on/offlining. This
patch reimplements CPU online rebinding such that it can also handle
idle workers.
As noted earlier, due to the local wakeup requirement, rebinding idle
workers is tricky. All idle workers must be re-bound before scheduler
callbacks are enabled. This is achieved by interlocking idle
re-binding. Idle workers are requested to re-bind and then hold until
all idle re-binding is complete so that no bound worker starts
executing work item. Only after all idle workers are re-bound and
parked, CPU_ONLINE proceeds to release them and queue rebind work item
to busy workers thus guaranteeing scheduler callbacks aren't invoked
until all idle workers are ready.
worker_rebind_fn() is renamed to busy_worker_rebind_fn() and
idle_worker_rebind() for idle workers is added. Rebinding logic is
moved to rebind_workers() and now called from CPU_ONLINE after
flushing trustee. While at it, add CPU sanity check in
worker_thread().
Note that now a worker may become idle or the manager between trustee
release and rebinding during CPU_ONLINE. As the previous patch
updated create_worker() so that it can be used by regular manager
while unbound and this patch implements idle re-binding, this is safe.
This prepares for removal of trustee and keeping idle workers across
CPU hotplugs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
Currently, create_worker()'s callers are responsible for deciding
whether the newly created worker should be bound to the associated CPU
and create_worker() sets WORKER_UNBOUND only for the workers for the
unbound global_cwq. Creation during normal operation is always via
maybe_create_worker() and @bind is true. For workers created during
hotplug, @bind is false.
Normal operation path is planned to be used even while the CPU is
going through hotplug operations or offline and this static decision
won't work.
Drop @bind from create_worker() and decide whether to bind by looking
at GCWQ_DISASSOCIATED. create_worker() will also set WORKER_UNBOUND
autmatically if disassociated. To avoid flipping GCWQ_DISASSOCIATED
while create_worker() is in progress, the flag is now allowed to be
changed only while holding all manager_mutexes on the global_cwq.
This requires that GCWQ_DISASSOCIATED is not cleared behind trustee's
back. CPU_ONLINE no longer clears DISASSOCIATED before flushing
trustee, which clears DISASSOCIATED before rebinding remaining workers
if asked to release. For cases where trustee isn't around, CPU_ONLINE
clears DISASSOCIATED after flushing trustee. Also, now, first_idle
has UNBOUND set on creation which is explicitly cleared by CPU_ONLINE
while binding it. These convolutions will soon be removed by further
simplification of CPU hotplug path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
POOL_MANAGING_WORKERS is used to ensure that at most one worker takes
the manager role at any given time on a given global_cwq. Trustee
later hitched on it to assume manager adding blocking wait for the
bit. As trustee already needed a custom wait mechanism, waiting for
MANAGING_WORKERS was rolled into the same mechanism.
Trustee is scheduled to be removed. This patch separates out
MANAGING_WORKERS wait into per-pool mutex. Workers use
mutex_trylock() to test for manager role and trustee uses mutex_lock()
to claim manager roles.
gcwq_claim/release_management() helpers are added to grab and release
manager roles of all pools on a global_cwq. gcwq_claim_management()
always grabs pool manager mutexes in ascending pool index order and
uses pool index as lockdep subclass.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
Currently, WORKER_UNBOUND is used to mark workers for the unbound
global_cwq and WORKER_ROGUE is used to mark workers for disassociated
per-cpu global_cwqs. Both are used to make the marked worker skip
concurrency management and the only place they make any difference is
in worker_enter_idle() where WORKER_ROGUE is used to skip scheduling
idle timer, which can easily be replaced with trustee state testing.
This patch replaces WORKER_ROGUE with WORKER_UNBOUND and drops
WORKER_ROGUE. This is to prepare for removing trustee and handling
disassociated global_cwqs as unbound.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
Workqueue used CPU_DYING notification to mark GCWQ_DISASSOCIATED.
This was necessary because workqueue's CPU_DOWN_PREPARE happened
before other DOWN_PREPARE notifiers and workqueue needed to stay
associated across the rest of DOWN_PREPARE.
After the previous patch, workqueue's DOWN_PREPARE happens after
others and can set GCWQ_DISASSOCIATED directly. Drop CPU_DYING and
let the trustee set GCWQ_DISASSOCIATED after disabling concurrency
management.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
Currently, all workqueue cpu hotplug operations run off
CPU_PRI_WORKQUEUE which is higher than normal notifiers. This is to
ensure that workqueue is up and running while bringing up a CPU before
other notifiers try to use workqueue on the CPU.
Per-cpu workqueues are supposed to remain working and bound to the CPU
for normal CPU_DOWN_PREPARE notifiers. This holds mostly true even
with workqueue offlining running with higher priority because
workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
runs the per-cpu workqueue without concurrency management without
explicitly detaching the existing workers.
However, if the trustee needs to create new workers, it creates
unbound workers which may wander off to other CPUs while
CPU_DOWN_PREPARE notifiers are in progress. Furthermore, if the CPU
down is cancelled, the per-CPU workqueue may end up with workers which
aren't bound to the CPU.
While reliably reproducible with a convoluted artificial test-case
involving scheduling and flushing CPU burning work items from CPU down
notifiers, this isn't very likely to happen in the wild, and, even
when it happens, the effects are likely to be hidden by the following
successful CPU down.
Fix it by using different priorities for up and down notifiers - high
priority for up operations and low priority for down operations.
Workqueue cpu hotplug operations will soon go through further cleanup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
|
|
As discussed in
http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990,
the capability introduced in 4d7e30d98939a0340022ccd49325a3d70f7e0238
to govern EPOLLWAKEUP seems misnamed: this capability is about governing
the ability to suspend the system, not using a particular API flag
(EPOLLWAKEUP). We should make the name of the capability more general
to encourage reuse in related cases. (Whether or not this capability
should also be used to govern the use of /sys/power/wake_lock is a
question that needs to be separately resolved.)
This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure
that the old capability name doesn't make it out into the wild, could you
please apply and push up the tree to ensure that it is incorporated
for the 3.5 release.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|