summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-07-24MIPS: BCM63XX: Add PCIe Support for BCM6328Jonas Gorski
Add support for the PCIe port found on BCM6328. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/3956/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: Move the PCI initialization into its own functionJonas Gorski
Also make the cpu check a bit more explicit. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/3953/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: Add basic BCM6328 supportJonas Gorski
This includes CPU speed, memory size detection and working UART, but lacking the appropriate drivers, no support for attached flash. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/3951/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: Use the Chip ID register for identifying the SoCJonas Gorski
Newer BCM63XX SoCs use virtually the same CPU ID, differing only in the revision bits. But since they all have the Chip ID register at the same location, we can use that to identify the SoC we are running on. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/3955/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: Add flash type detectionJonas Gorski
On BCM6358 and BCM6368 the attached flash type is exposed through a bootstrapping register. Use it for auto detecting the flash type on those and default to parallel flash for earlier SoCs. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/3954/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: Move flash registration out of board_bcm963xx.cJonas Gorski
board_bcm963xx.c is already large enough. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Cc: linux-mips@linux-mips.org Cc: Maxime Bizon <mbizon@freebox.fr> Cc: Florian Fainelli <florian@openwrt.org> Cc: Kevin Cernekee <cernekee@gmail.com> Patchwork: https://patchwork.linux-mips.org/patch/3952/ Reviewed-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24hw_random: add Broadcom BCM63xx RNG driverFlorian Fainelli
Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Cc: mpm@selenic.com Cc: herbert@gondor.apana.org.au Patchwork: https://patchwork.linux-mips.org/patch/3327/ Patchwork: https://patchwork.linux-mips.org/patch/4072/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: add RNG driver platform_device stubFlorian Fainelli
Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Cc: mpm@selenic.com Cc: herbert@gondor.apana.org.au Patchwork: https://patchwork.linux-mips.org/patch/3325/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: add RNG peripheral definitionsFlorian Fainelli
Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Cc: mpm@selenic.com Cc: herbert@gondor.apana.org.au Patchwork: https://patchwork.linux-mips.org/patch/3326/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: add support for "ipsec" clockFlorian Fainelli
This module is only available on BCM6368 so far and does not require resetting the block. Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Cc: mpm@selenic.com Cc: herbert@gondor.apana.org.au Patchwork: https://patchwork.linux-mips.org/patch/3324/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: OCTEON: Remove some unused files.David Daney
These FPA related files are not used anywhere in the kernel. Remove them. Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/3892/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24MIPS: BCM63XX: Fix platform_devices idFlorian Fainelli
There is only one watchdog and VoIP DSP platform devices per board, use -1 as the platform_device id accordingly. Signed-off-by: Florian Fainelli <florian@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/3313/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-07-24drbd: announce FLUSH/FUA capability to upper layersLars Ellenberg
Unconditionally announce FLUSH/FUA to upper layers. If the lower layers on either node do not actually support this, generic_make_request() will deal with it. If this causes performance regressions on your setup, make sure there are no volatile caches involved, and mount -o nobarrier or equivalent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: fix max_bio_size to be unsignedLars Ellenberg
We capped our max_bio_size respectively max_hw_sectors with min_t(int, lower level limit, our limit); unfortunately, some drivers, e.g. the kvm virtio block driver, initialize their limits to "-1U", and that is of course a smaller "int" value than our limit. Impact: we started to request 16 MB resync requests, which lead to protocol error and a reconnect loop. Fix all relevant constants and parameters to be unsigned int. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: flush drbd work queue before invalidate/invalidate remoteLars Ellenberg
If you do back to back wait-sync/invalidate on a Primary in a tight loop, during application IO load, you could trigger a race: kernel: block drbd6: FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? Fix this by changing the order of the drbd_queue_work() and the wake_up() in dec_ap_pending(), and adding the additional drbd_flush_workqueue() before requesting the full sync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: fix potential access after freeLars Ellenberg
Occasionally, if we disconnect, we triggered this assert: block drbd7: ASSERT FAILED tl_hash[27] == c30b0f04, expected NULL hlist_del() happens only on master bio completion. We used to wait for pending IO to complete before freeing tl_hash on disconnect. We no longer do so, since we learned to "freeze" IO on disconnect. If the local disk is too slow, we may reach C_STANDALONE early, and there are still some requests pending locally when we call drbd_free_tl_hash(). If we now free the tl_hash, and later the local IO completion completes the master bio, which then does hlist_del() and clobbers freed memory. Do hlist_del_init() and hlist_add_fake() before kfree(tl_hash), so the hlist_del() on master bio completion is harmless. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24i2c-omap: Add support for I2C_M_STOP message flagLaurent Pinchart
Generate a stop condition after each message marked with I2C_M_STOP. [JD: Add I2C_FUNC_PROTOCOL_MANGLING.] Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c: Fall back to emulated SMBus if the operation isn't supported nativelyLaurent Pinchart
Adapter drivers might support only a subset of the SMBus operations natively. Those drivers currently have to manually emulate unsupported operations using I2C. Make the i2c_smbus_xfer() function fall back to i2c_smbus_xfer_emulated() when the adapter's .smbus_xfer() operation returns -EOPNOTSUPP, like it already does when the .smbus_xfer() operation isn't available at all. [JD: Minor optimization.] Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c: Add SCCB supportLaurent Pinchart
SCCB is a serial communication bus developed by Omnivision. Its 2-wire mode is very similar to SMBus byte data transactions, but requires the controller to ignore the ACK bit and to insert a stop condition after each message. Add a device SCCB flag and a message stop flag to be passed to controller drivers. [JD: Kill rogue definition in go7007 driver.] Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-tiny-usb: Add support for the Robofuzz OSIF USB/I2C converterEmmanuel Deloget
Robofuzz OSIF is a generic USB/iIC interface that embeds an ATMega8A AVR-RISC microcontroler. The device is based upon Till Harbaum's i2c-tiny-usb and although it enhances the original design with further functionnalities it still maintain compatibility with it with respect to the USB/I2C interface. Signed-off-by: Emmanuel Deloget <logout@free.fr> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-i801: Enable IRQ for byte_by_byte transactionsDaniel Kurtz
Byte-by-byte transactions are used primarily for accessing I2C devices with an SMBus controller. For these transactions, for each byte that is read or written, the SMBus controller generates a BYTE_DONE IRQ. The isr reads/writes the next byte, and clears the IRQ flag to start the next byte. On the penultimate IRQ, the isr also sets the LAST_BYTE flag. There is no locking around the cmd/len/count/data variables, since the I2C adapter lock ensures there is never multiple simultaneous transactions for the same device, and the driver thread never accesses these variables while interrupts might be occurring. The end result is faster I2C block read and write transactions. Note: This patch has only been tested and verified by doing I2C read and write block transfers on Cougar Point 6 Series PCH, as well as I2C read block transfers on ICH5. Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-i801: Enable interrupts on ICH5/7/8/9/10Jean Delvare
Enable interrupts on more devices. ICH5, ICH7(-M) and ICH10 have been tested to work OK. ICH8 and ICH9 are expected to work just fine as they are very close to ICH7 and ICH10. Ultimately we want to enable this feature on at least every device since the ICH5, but for now we limit the exposure. We'll enable it for other devices if we don't get negative feedback. As a bonus, let the user know when interrupts are used. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Daniel Kurtz <djkurtz@chromium.org>
2012-07-24i2c-i801: Enable IRQ for SMBus transactionsDaniel Kurtz
Add a new 'feature' to i2c-i801 to enable using PCI interrupts. When the feature is enabled, then an isr is installed for the device's PCI IRQ. An I2C/SMBus transaction is always terminated by one of the following interrupt sources: FAILED, BUS_ERR, DEV_ERR, or on success: INTR. When the isr fires for one of these cases, it sets the ->status variable and wakes up the waitq. The waitq then saves off the status code, and clears ->status (in preparation for some future transaction). The SMBus controller generates an INTR irq at the end of each transaction where INTREN was set in the HST_CNT register. No locking is needed around accesses to priv->status since all writes to it are serialized: it is only ever set once in the isr at the end of a transaction, and cleared while no interrupts can occur. In addition, the I2C adapter lock guarantees that entire I2C transactions for a single adapter are always serialized. For this patch, the INTREN bit is set only for SMBus block, byte and word transactions, but not for I2C reads or writes. The use of the DS (BYTE_DONE) interrupt with byte-by-byte I2C transactions is implemented in a subsequent patch. The interrupt feature has only been enabled for COUGARPOINT hardware. In addition, it is disabled if SMBus is using the SMI# interrupt. Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-i801: Consolidate pollingJean Delvare
(Based on earlier work by Daniel Kurtz.) Come up with a consistent, driver-wide strategy for event polling. For intermediate steps of byte-by-byte block transactions, check for BYTE_DONE or any error flag being set. At the end of every transaction (regardless of PEC being used), check for both BUSY being cleared and INTR or any error flag being set. This ensures proper action for all transaction types. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Daniel Kurtz <djkurtz@chromium.org>
2012-07-24i2c-i801: Drop ENABLE_INT9Daniel Kurtz
Later patches enable interrupts. This preliminary patch removes the older unsupported ENABLE_INT9 flag. Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-i801: Rename some SMBHSTCNT bit constantsDaniel Kurtz
Rename the SMBHSTCNT register bit access constants to match the style of other register bits. Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-i801: Check and return errors during byte-by-byte transfersDaniel Kurtz
If an error is detected in the polling loop, abort the transaction and return an error code. * DEV_ERR is set if the device does not respond with an acknowledge, and the SMBus controller times out (minimum 25ms). * BUS_ERR is set if a bus arbitration collision is detected. In other words, when the SMBus controller tries to generate a START condition, but detects that the SMBDATA is being held low, usually by another SMBus/I2C master. * FAILED is only set if a transaction is stopped by software (using the SMBHSTCNT KILL bit). Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-i801: Clear only status bits in HST_STSDaniel Kurtz
Writing back the whole status register could clear unwanted bits. In particular, it could clear the "INUSE_STS" bit, which is a 'hardware semaphore', that might be useful to use some day. To prepare for this, let's ban writing back the whole status to register HST_STS, of which this is the only instance. Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-i801: Refactor use of LAST_BYTE in i801_block_transaction_byte_by_byteDaniel Kurtz
As a slight optimization, pull some logic out of the polling loop during byte-by-byte transactions by just setting the I801_LAST_BYTE bit, as defined in the i801 (PCH) datasheet, when reading the last byte of a byte-by-byte I2C_SMBUS_READ. Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-smbus: Use module_i2c_driver()Fabio Estevam
Using module_i2c_driver() makes the code smaller and cleaner. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c/writing-clients: Mention module_i2c_driver()Jean Delvare
Based on a previous patch from Peter Meerwald. Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Peter Meerwald <p.meerwald@bct-electronic.com>
2012-07-24i2c-piix4: Support AMD auxiliary SMBus controllerAndrew Armenia
Some AMD chipsets, such as the SP5100, have an auxiliary SMBus controller with a second set of registers. This patch adds support for this auxiliary controller. Tested on ASUS KCMA-D8 motherboard. Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-piix4: Separate registration and probing codeAndrew Armenia
Some chipsets have multiple sets of SMBus registers each controlling a separate SMBus. Supporting these chipsets properly will require registering multiple I2C adapters for one piix4. The code to initialize and register the i2c_adapter structure has been separated from piix4_probe and allows registration of a piix4 adapter given its base address. Note that the i2c_adapter and i2c_piix4_adapdata structures are now dynamically allocated. Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c-piix4: Eliminate piix4_smba global variableAndrew Armenia
Some chipsets have multiple sets of piix4-compatible SMBus registers. Eliminating the global variable will allow these chipsets to be fully supported. Return value from piix4_setup and piix4_sb800_setup now returns the smba value detected. This is stored in a struct i2c_piix4_adapdata. Thus the global variable is eliminated. Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24i2c/busses: Use module_pci_driverAxel Lin
Convert the drivers in drivers/i2c/busses/* to usemodule_pci_driver() macro which makes the code smaller and a bit simpler. Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Olof Johansson <olof@lixom.net> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Cc: Tomoya MORINAGA <tomoya.rohm@gmail.com>
2012-07-24i2c: Update Guenter Roeck's e-mail addressGuenter Roeck
My old e-mail address won't be valid for much longer. Time to update it. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2012-07-24drbd: call local-io-error handler earlyLars Ellenberg
In case we want to hard-reset from the local-io-error handler, we need to call it before notifying the peer or aborting local IO. Otherwise the peer will advance its data generation UUIDs even if secondary. This way, local io error looks like a "regular" node crash, which reduces the number of different failure cases. This may be useful in a bigger picture where crashed or otherwise "misbehaving" nodes are automatically re-deployed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: do not reset rs_pending_cnt too earlyLars Ellenberg
Fix asserts like block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 ! We reset the resync lru cache and related information (rs_pending_cnt), once we successfully finished a resync or online verify, or if the replication connection is lost. We also need to reset it if a resync or online verify is aborted because a lower level disk failed. In that case the replication link is still established, and we may still have packets queued in the network buffers which want to touch rs_pending_cnt. We do not have any synchronization mechanism to know for sure when all such pending resync related packets have been drained. To avoid this counter to go negative (and violate the ASSERT that it will always be >= 0), just do not reset it when we lose a disk. It is good enough to make sure it is re-initialized before the next resync can start: reset it when we re-attach a disk. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: reset congestion information before reporting it in /proc/drbdLars Ellenberg
We cache the congestion status in mdev->congestion_reason whenever drbd_congested() was called. Reset this cached info before reporting it when reading /proc/drbd. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: report congestion if we are waiting for some userland callbackLars Ellenberg
If the drbd worker thread is synchronously waiting for some userland callback, we don't want some casual pageout to block on us. Have drbd_congested() report congestion in that case. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: differentiate between normal and forced detachLars Ellenberg
Aborting local requests (not waiting for completion from the lower level disk) is dangerous: if the master bio has been completed to upper layers, data pages may be re-used for other things already. If local IO is still pending and later completes, this may cause crashes or corrupt unrelated data. Only abort local IO if explicitly requested. Intended use case is a lower level device that turned into a tarpit, not completing io requests, not even doing error completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24drbd: cleanup, remove two unused global flagsLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-07-24sched: Fix race in task_group()Peter Zijlstra
Stefan reported a crash on a kernel before a3e5d1091c1 ("sched: Don't call task_group() too many times in set_task_rq()"), he found the reason to be that the multiple task_group() invocations in set_task_rq() returned different values. Looking at all that I found a lack of serialization and plain wrong comments. The below tries to fix it using an extra pointer which is updated under the appropriate scheduler locks. Its not pretty, but I can't really see another way given how all the cgroup stuff works. Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24sched: Improve balance_cpu() to consider other cpus in its group as target ↵Srivatsa Vaddagiri
of (pinned) task Current load balance scheme requires only one cpu in a sched_group (balance_cpu) to look at other peer sched_groups for imbalance and pull tasks towards itself from a busy cpu. Tasks thus pulled by balance_cpu could later get picked up by cpus that are in the same sched_group as that of balance_cpu. This scheme however fails to pull tasks that are not allowed to run on balance_cpu (but are allowed to run on other cpus in its sched_group). That can affect fairness and in some worst case scenarios cause starvation. Consider a two core (2 threads/core) system running tasks as below: Core0 Core1 / \ / \ C0 C1 C2 C3 | | | | v v v v F0 T1 F1 [idle] T2 F0 = SCHED_FIFO task (pinned to C0) F1 = SCHED_FIFO task (pinned to C2) T1 = SCHED_OTHER task (pinned to C1) T2 = SCHED_OTHER task (pinned to C1 and C2) F1 could become a cpu hog, which will starve T2 unless C1 pulls it. Between C0 and C1 however, C0 is required to look for imbalance between cores, which will fail to pull T2 towards Core0. T2 will starve eternally in this case. The same scenario can arise in presence of non-rt tasks as well (say we replace F1 with high irq load). We tackle this problem by having balance_cpu move pinned tasks to one of its sibling cpus (where they can run). We first check if load balance goal can be met by ignoring pinned tasks, failing which we retry move_tasks() with a new env->dst_cpu. This patch modifies load balance semantics on who can move load towards a given cpu in a given sched_domain. Before this patch, a given_cpu or a ilb_cpu acting on behalf of an idle given_cpu is responsible for moving load to given_cpu. With this patch applied, balance_cpu can in addition decide on moving some load to a given_cpu. There is a remote possibility that excess load could get moved as a result of this (balance_cpu and given_cpu/ilb_cpu deciding *independently* and at *same* time to move some load to a given_cpu). However we should see less of such conflicting decisions in practice and moreover subsequent load balance cycles should correct the excess load moved to given_cpu. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FE06CDB.2060605@linux.vnet.ibm.com [ minor edits ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24sched: Reset loop counters if all tasks are pinned and we need to redo load ↵Prashanth Nageshappa
balance While load balancing, if all tasks on the source runqueue are pinned, we retry after excluding the corresponding source cpu. However, loop counters env.loop and env.loop_break are not reset before retrying, which can lead to failure in moving the tasks. In this patch we reset env.loop and env.loop_break to their inital values before we retry. Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FE06EEF.2090709@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24sched: Reorder 'struct lb_env' members to reduce its sizePrashanth Nageshappa
Members of 'struct lb_env' are not in appropriate order to reuse compiler added padding on 64bit architectures. In this patch we reorder those struct members and help reduce the size of the structure from 96 bytes to 80 bytes on 64 bit architectures. Suggested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FE06DDE.7000403@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24sched: Improve scalability via 'CPU buddies', which withstand random ↵Mike Galbraith
perturbations Traversing an entire package is not only expensive, it also leads to tasks bouncing all over a partially idle and possible quite large package. Fix that up by assigning a 'buddy' CPU to try to motivate. Each buddy may try to motivate that one other CPU, if it's busy, tough, it may then try its SMT sibling, but that's all this optimization is allowed to cost. Sibling cache buddies are cross-wired to prevent bouncing. 4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better: clients 1 2 4 8 16 32 64 128 .......................................................................... pre 30 41 118 645 3769 6214 12233 14312 post 299 603 1211 2418 4697 6847 11606 14557 A nice increase in performance. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24cpusets: Remove/update outdated commentsSrivatsa S. Bhat
cpuset_track_online_cpus() is no longer present. So remove the outdated comment and replace it with reference to cpuset_update_active_cpus() which is its equivalent. Also, we don't lack memory hot-unplug anymore. And David Rientjes pointed out how it is dealt with. So update that comment as well. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120524141700.3692.98192.stgit@srivatsabhat.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24cpusets, hotplug: Restructure functions that are invoked during hotplugSrivatsa S. Bhat
Separate out the cpuset related handling for CPU/Memory online/offline. This also helps us exploit the most obvious and basic level of optimization that any notification mechanism (CPU/Mem online/offline) has to offer us: "We *know* why we have been invoked. So stop pretending that we are lost, and do only the necessary amount of processing!". And while at it, rename scan_for_empty_cpusets() to scan_cpusets_upon_hotplug(), which is more appropriate considering how it is restructured. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24cpusets, hotplug: Implement cpuset tree traversal in a helper functionSrivatsa S. Bhat
At present, the functions that deal with cpusets during CPU/Mem hotplug are quite messy, since a lot of the functionality is mixed up without clear separation. And this takes a toll on optimization as well. For example, the function cpuset_update_active_cpus() is called on both CPU offline and CPU online events; and it invokes scan_for_empty_cpusets(), which makes sense only for CPU offline events. And hence, the current code ends up unnecessarily traversing the cpuset tree during CPU online also. As a first step towards cleaning up those functions, encapsulate the cpuset tree traversal in a helper function, so as to facilitate upcoming changes. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120524141635.3692.893.stgit@srivatsabhat.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>