From fe5d2f4a15967bbe907e7b3e31e49dae7af7cc6b Mon Sep 17 00:00:00 2001 From: Jonathan Brassow Date: Thu, 21 Feb 2013 13:28:10 +1100 Subject: DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms Until now, dm-raid.c only supported the "near" algorthm of MD's RAID10 implementation. This patch adds support for the "far" and "offset" algorithms, but only with the improved redundancy that is brought with the introduction of the 'use_far_sets' bit, which shifts copied stripes according to smaller sets vs the entire array. That is, the 17th bit of the 'layout' variable that defines the RAID10 implementation will always be set. (More information on how the 'layout' variable selects the RAID10 algorithm can be found in the opening comments of drivers/md/raid10.c.) Signed-off-by: Jonathan Brassow Signed-off-by: NeilBrown --- Documentation/device-mapper/dm-raid.txt | 44 +++++++++++++++++++++++++++------ 1 file changed, 37 insertions(+), 7 deletions(-) (limited to 'Documentation') diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt index 56fb62b09fc5..b428556197c9 100644 --- a/Documentation/device-mapper/dm-raid.txt +++ b/Documentation/device-mapper/dm-raid.txt @@ -30,6 +30,7 @@ The target is named "raid" and it accepts the following parameters: raid10 Various RAID10 inspired algorithms chosen by additional params - RAID10: Striped Mirrors (aka 'Striping on top of mirrors') - RAID1E: Integrated Adjacent Stripe Mirroring + - RAID1E: Integrated Offset Stripe Mirroring - and other similar RAID10 variants Reference: Chapter 4 of @@ -64,15 +65,15 @@ The target is named "raid" and it accepts the following parameters: synchronisation state for each region. [raid10_copies <# copies>] - [raid10_format near] + [raid10_format ] These two options are used to alter the default layout of a RAID10 configuration. The number of copies is can be - specified, but the default is 2. There are other variations - to how the copies are laid down - the default and only current - option is "near". Near copies are what most people think of - with respect to mirroring. If these options are left - unspecified, or 'raid10_copies 2' and/or 'raid10_format near' - are given, then the layouts for 2, 3 and 4 devices are: + specified, but the default is 2. There are also three + variations to how the copies are laid down - the default + is "near". Near copies are what most people think of with + respect to mirroring. If these options are left unspecified, + or 'raid10_copies 2' and/or 'raid10_format near' are given, + then the layouts for 2, 3 and 4 devices are: 2 drives 3 drives 4 drives -------- ---------- -------------- A1 A1 A1 A1 A2 A1 A1 A2 A2 @@ -85,6 +86,33 @@ The target is named "raid" and it accepts the following parameters: 3-device layout is what might be called a 'RAID1E - Integrated Adjacent Stripe Mirroring'. + If 'raid10_copies 2' and 'raid10_format far', then the layouts + for 2, 3 and 4 devices are: + 2 drives 3 drives 4 drives + -------- -------------- -------------------- + A1 A2 A1 A2 A3 A1 A2 A3 A4 + A3 A4 A4 A5 A6 A5 A6 A7 A8 + A5 A6 A7 A8 A9 A9 A10 A11 A12 + .. .. .. .. .. .. .. .. .. + A2 A1 A3 A1 A2 A2 A1 A4 A3 + A4 A3 A6 A4 A5 A6 A5 A8 A7 + A6 A5 A9 A7 A8 A10 A9 A12 A11 + .. .. .. .. .. .. .. .. .. + + If 'raid10_copies 2' and 'raid10_format offset', then the + layouts for 2, 3 and 4 devices are: + 2 drives 3 drives 4 drives + -------- ------------ ----------------- + A1 A2 A1 A2 A3 A1 A2 A3 A4 + A2 A1 A3 A1 A2 A2 A1 A4 A3 + A3 A4 A4 A5 A6 A5 A6 A7 A8 + A4 A3 A6 A4 A5 A6 A5 A8 A7 + A5 A6 A7 A8 A9 A9 A10 A11 A12 + A6 A5 A9 A7 A8 A10 A9 A12 A11 + .. .. .. .. .. .. .. .. .. + Here we see layouts closely akin to 'RAID1E - Integrated + Offset Stripe Mirroring'. + <#raid_devs>: The number of devices composing the array. Each device consists of two entries. The first is the device containing the metadata (if any); the second is the one containing the @@ -142,3 +170,5 @@ Version History 1.3.0 Added support for RAID 10 1.3.1 Allow device replacement/rebuild for RAID 10 1.3.2 Fix/improve redundancy checking for RAID10 +1.4.0 Non-functional change. Removes arg from mapping function. +1.4.1 Add RAID10 "far" and "offset" algorithm support. -- cgit From e3333e572f29109740b48302035a59f9db0fa8e9 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 20 Feb 2013 20:58:42 -0800 Subject: hwmon: Update my e-mail address in driver documentation Most of the hwmon driver documentation still listed my old invalid e-mail address. Fix it. Signed-off-by: Guenter Roeck Acked-by: Jean Delvare --- Documentation/hwmon/adm1275 | 2 +- Documentation/hwmon/jc42 | 2 +- Documentation/hwmon/lineage-pem | 2 +- Documentation/hwmon/lm25066 | 2 +- Documentation/hwmon/ltc2978 | 2 +- Documentation/hwmon/ltc4261 | 2 +- Documentation/hwmon/max16064 | 2 +- Documentation/hwmon/max16065 | 2 +- Documentation/hwmon/max34440 | 2 +- Documentation/hwmon/max8688 | 2 +- Documentation/hwmon/pmbus | 2 +- Documentation/hwmon/smm665 | 2 +- Documentation/hwmon/ucd9000 | 2 +- Documentation/hwmon/ucd9200 | 2 +- Documentation/hwmon/zl6100 | 2 +- 15 files changed, 15 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/adm1275 b/Documentation/hwmon/adm1275 index 2cfa25667123..15b4a20d5062 100644 --- a/Documentation/hwmon/adm1275 +++ b/Documentation/hwmon/adm1275 @@ -15,7 +15,7 @@ Supported chips: Addresses scanned: - Datasheet: www.analog.com/static/imported-files/data_sheets/ADM1276.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/jc42 b/Documentation/hwmon/jc42 index 165077121238..868d74d6b773 100644 --- a/Documentation/hwmon/jc42 +++ b/Documentation/hwmon/jc42 @@ -49,7 +49,7 @@ Supported chips: Addresses scanned: I2C 0x18 - 0x1f Author: - Guenter Roeck + Guenter Roeck Description diff --git a/Documentation/hwmon/lineage-pem b/Documentation/hwmon/lineage-pem index 2ba5ed126858..83b2ddc160c8 100644 --- a/Documentation/hwmon/lineage-pem +++ b/Documentation/hwmon/lineage-pem @@ -8,7 +8,7 @@ Supported devices: Documentation: http://www.lineagepower.com/oem/pdf/CPLI2C.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/lm25066 b/Documentation/hwmon/lm25066 index a21db81c4591..26025e419d35 100644 --- a/Documentation/hwmon/lm25066 +++ b/Documentation/hwmon/lm25066 @@ -19,7 +19,7 @@ Supported chips: Datasheet: http://www.national.com/pf/LM/LM5066.html -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978 index c365f9beb5dd..1642348de2d6 100644 --- a/Documentation/hwmon/ltc2978 +++ b/Documentation/hwmon/ltc2978 @@ -11,7 +11,7 @@ Supported chips: Addresses scanned: - Datasheet: http://cds.linear.com/docs/Datasheet/3880f.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/ltc4261 b/Documentation/hwmon/ltc4261 index eba2e2c4b94d..9378a75c6134 100644 --- a/Documentation/hwmon/ltc4261 +++ b/Documentation/hwmon/ltc4261 @@ -8,7 +8,7 @@ Supported chips: Datasheet: http://cds.linear.com/docs/Datasheet/42612fb.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/max16064 b/Documentation/hwmon/max16064 index f8b478076f6d..d59cc7829bec 100644 --- a/Documentation/hwmon/max16064 +++ b/Documentation/hwmon/max16064 @@ -7,7 +7,7 @@ Supported chips: Addresses scanned: - Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX16064.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/max16065 b/Documentation/hwmon/max16065 index c11f64a1f2ad..208a29e43010 100644 --- a/Documentation/hwmon/max16065 +++ b/Documentation/hwmon/max16065 @@ -24,7 +24,7 @@ Supported chips: http://datasheets.maxim-ic.com/en/ds/MAX16070-MAX16071.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/max34440 b/Documentation/hwmon/max34440 index 47651ff341ae..37cbf472a19d 100644 --- a/Documentation/hwmon/max34440 +++ b/Documentation/hwmon/max34440 @@ -27,7 +27,7 @@ Supported chips: Addresses scanned: - Datasheet: http://datasheets.maximintegrated.com/en/ds/MAX34461.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/max8688 b/Documentation/hwmon/max8688 index fe849871df32..e78078638b91 100644 --- a/Documentation/hwmon/max8688 +++ b/Documentation/hwmon/max8688 @@ -7,7 +7,7 @@ Supported chips: Addresses scanned: - Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX8688.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/pmbus b/Documentation/hwmon/pmbus index 3d3a0f97f966..cf756ed48ff9 100644 --- a/Documentation/hwmon/pmbus +++ b/Documentation/hwmon/pmbus @@ -34,7 +34,7 @@ Supported chips: Addresses scanned: - Datasheet: n.a. -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/smm665 b/Documentation/hwmon/smm665 index 59e316140542..a341eeedab75 100644 --- a/Documentation/hwmon/smm665 +++ b/Documentation/hwmon/smm665 @@ -29,7 +29,7 @@ Supported chips: http://www.summitmicro.com/prod_select/summary/SMM766/SMM766_2086.pdf http://www.summitmicro.com/prod_select/summary/SMM766B/SMM766B_2122.pdf -Author: Guenter Roeck +Author: Guenter Roeck Module Parameters diff --git a/Documentation/hwmon/ucd9000 b/Documentation/hwmon/ucd9000 index 0df5f276505b..805e33edb978 100644 --- a/Documentation/hwmon/ucd9000 +++ b/Documentation/hwmon/ucd9000 @@ -11,7 +11,7 @@ Supported chips: http://focus.ti.com/lit/ds/symlink/ucd9090.pdf http://focus.ti.com/lit/ds/symlink/ucd90910.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/ucd9200 b/Documentation/hwmon/ucd9200 index fd7d07b1908a..1e8060e631bd 100644 --- a/Documentation/hwmon/ucd9200 +++ b/Documentation/hwmon/ucd9200 @@ -15,7 +15,7 @@ Supported chips: http://focus.ti.com/lit/ds/symlink/ucd9246.pdf http://focus.ti.com/lit/ds/symlink/ucd9248.pdf -Author: Guenter Roeck +Author: Guenter Roeck Description diff --git a/Documentation/hwmon/zl6100 b/Documentation/hwmon/zl6100 index 3d924b6b59e9..756b57c6b73e 100644 --- a/Documentation/hwmon/zl6100 +++ b/Documentation/hwmon/zl6100 @@ -54,7 +54,7 @@ http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146401 http://archive.ericsson.net/service/internet/picov/get?DocNo=28701-EN/LZT146256 -Author: Guenter Roeck +Author: Guenter Roeck Description -- cgit From 6d21a416562c2532f45cbcd40f47653f61d45533 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Sun, 27 Jan 2013 09:36:36 -0800 Subject: hwmon: (pmbus/ltc2978) Update datasheet links Links to datasheets are no longer valid. Provide links to product information instead (which provides links to the datasheets and is hopefully more persistent). Signed-off-by: Guenter Roeck Acked-by: Jean Delvare --- Documentation/hwmon/ltc2978 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/ltc2978 b/Documentation/hwmon/ltc2978 index 1642348de2d6..e4d75c606c97 100644 --- a/Documentation/hwmon/ltc2978 +++ b/Documentation/hwmon/ltc2978 @@ -5,11 +5,11 @@ Supported chips: * Linear Technology LTC2978 Prefix: 'ltc2978' Addresses scanned: - - Datasheet: http://cds.linear.com/docs/Datasheet/2978fa.pdf + Datasheet: http://www.linear.com/product/ltc2978 * Linear Technology LTC3880 Prefix: 'ltc3880' Addresses scanned: - - Datasheet: http://cds.linear.com/docs/Datasheet/3880f.pdf + Datasheet: http://www.linear.com/product/ltc3880 Author: Guenter Roeck -- cgit From ab302bb0b87fe01f3110c1c5d5a6fca439d27c6b Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Thu, 28 Feb 2013 13:05:13 +0100 Subject: hwmon: (adt7410) Document ADT7420 support The adt7410 driver supports the ADT7420, but its documentation file makes no mention of that. Add this refrence, and a brief a description of the differences between the ADT7410 and the ADT7420. Signed-off-by: Jean Delvare Cc: Lars-Peter Clausen Cc: Hartmut Knaack Cc: Guenter Roeck Signed-off-by: Guenter Roeck --- Documentation/hwmon/adt7410 | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/hwmon/adt7410 b/Documentation/hwmon/adt7410 index 96004000dc2a..58150c480e56 100644 --- a/Documentation/hwmon/adt7410 +++ b/Documentation/hwmon/adt7410 @@ -4,9 +4,14 @@ Kernel driver adt7410 Supported chips: * Analog Devices ADT7410 Prefix: 'adt7410' - Addresses scanned: I2C 0x48 - 0x4B + Addresses scanned: None Datasheet: Publicly available at the Analog Devices website http://www.analog.com/static/imported-files/data_sheets/ADT7410.pdf + * Analog Devices ADT7420 + Prefix: 'adt7420' + Addresses scanned: None + Datasheet: Publicly available at the Analog Devices website + http://www.analog.com/static/imported-files/data_sheets/ADT7420.pdf Author: Hartmut Knaack @@ -27,6 +32,10 @@ value per second or even justget one sample on demand for power saving. Besides, it can completely power down its ADC, if power management is required. +The ADT7420 is register compatible, the only differences being the package, +a slightly narrower operating temperature range (-40°C to +150°C), and a +better accuracy (0.25°C instead of 0.50°C.) + Configuration Notes ------------------- -- cgit From 4b87581036849723242cadaa161e9b01234ef9ae Mon Sep 17 00:00:00 2001 From: Nishanth Menon Date: Tue, 26 Feb 2013 23:53:02 +0000 Subject: PM / OPP: improve introductory documentation Make Operating Performance Points (OPP) library introductory chapter a little more reader-friendly. Split the chapter into two sections, highlight the definition with an example and minor rewording to be verbose. Reported-by: Linus Torvalds Signed-off-by: Nishanth Menon Reviewed-by: Randy Dunlap Signed-off-by: Rafael J. Wysocki --- Documentation/power/opp.txt | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt index 3035d00757ad..425c51d56aef 100644 --- a/Documentation/power/opp.txt +++ b/Documentation/power/opp.txt @@ -1,6 +1,5 @@ -*=============* -* OPP Library * -*=============* +Operating Performance Points (OPP) Library +========================================== (C) 2009-2010 Nishanth Menon , Texas Instruments Incorporated @@ -16,15 +15,31 @@ Contents 1. Introduction =============== +1.1 What is an Operating Performance Point (OPP)? + Complex SoCs of today consists of a multiple sub-modules working in conjunction. In an operational system executing varied use cases, not all modules in the SoC need to function at their highest performing frequency all the time. To facilitate this, sub-modules in a SoC are grouped into domains, allowing some -domains to run at lower voltage and frequency while other domains are loaded -more. The set of discrete tuples consisting of frequency and voltage pairs that +domains to run at lower voltage and frequency while other domains run at +voltage/frequency pairs that are higher. + +The set of discrete tuples consisting of frequency and voltage pairs that the device will support per domain are called Operating Performance Points or OPPs. +As an example: +Let us consider an MPU device which supports the following: +{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V}, +{1GHz at minimum voltage of 1.3V} + +We can represent these as three OPPs as the following {Hz, uV} tuples: +{300000000, 1000000} +{800000000, 1200000} +{1000000000, 1300000} + +1.2 Operating Performance Points Library + OPP library provides a set of helper functions to organize and query the OPP information. The library is located in drivers/base/power/opp.c and the header is located in include/linux/opp.h. OPP library can be enabled by enabling -- cgit From 755727b7fb1e0ebe46824159107749cf635d43b1 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 8 Mar 2013 12:43:35 -0800 Subject: Randy has moved Update email address and CREDITS info. xenotime.net is defunct. Signed-off-by: Randy Dunlap Cc: Harry Wei Cc: Keiichi KII Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/SubmittingPatches | 3 +-- Documentation/printk-formats.txt | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index c379a2a6949f..aa0c1e63f050 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -60,8 +60,7 @@ own source tree. For example: "dontdiff" is a list of files which are generated by the kernel during the build process, and should be ignored in any diff(1)-generated patch. The "dontdiff" file is included in the kernel tree in -2.6.12 and later. For earlier kernel versions, you can get it -from . +2.6.12 and later. Make sure your patch does not include any extra files which do not belong in a patch submission. Make sure to review your patch -after- diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index e8a6aa473bab..6e953564de03 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt @@ -170,5 +170,5 @@ Reminder: sizeof() result is of type size_t. Thank you for your cooperation and attention. -By Randy Dunlap and +By Randy Dunlap and Andrew Murray -- cgit From 4357fb570b3709c145384065d04b698a30dc722e Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 12 Feb 2013 07:56:27 -0800 Subject: rcu: Make bugginess of code sample more evident One of the code samples in whatisRCU.txt shows a bug, but someone scanning the document quickly might mistake it for a valid use of RCU. Add some screaming comments to help keep speed-readers on track. Reported-by: Nathan Zimmer Signed-off-by: Paul E. McKenney Acked-by: Rafael J. Wysocki --- Documentation/RCU/whatisRCU.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 0cc7820967f4..10df0b82f459 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt @@ -265,9 +265,9 @@ rcu_dereference() rcu_read_lock(); p = rcu_dereference(head.next); rcu_read_unlock(); - x = p->address; + x = p->address; /* BUG!!! */ rcu_read_lock(); - y = p->data; + y = p->data; /* BUG!!! */ rcu_read_unlock(); Holding a reference from one RCU read-side critical section -- cgit From 3f944adb9d1ca912902783e7aede2a5b5c19a605 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 4 Mar 2013 17:55:49 -0800 Subject: rcu: Documentation update This commit applies a few updates based on a quick review of the RCU documentations. Signed-off-by: Paul E. McKenney --- Documentation/RCU/checklist.txt | 26 ++++++++++++++++---------- Documentation/RCU/lockdep.txt | 5 +++++ Documentation/RCU/rcubarrier.txt | 15 ++++++++++++++- 3 files changed, 35 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 31ef8fe07f82..79e789b8b8ea 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt @@ -217,9 +217,14 @@ over a rather long period of time, but improvements are always welcome! whether the increased speed is worth it. 8. Although synchronize_rcu() is slower than is call_rcu(), it - usually results in simpler code. So, unless update performance - is critically important or the updaters cannot block, - synchronize_rcu() should be used in preference to call_rcu(). + usually results in simpler code. So, unless update performance is + critically important, the updaters cannot block, or the latency of + synchronize_rcu() is visible from userspace, synchronize_rcu() + should be used in preference to call_rcu(). Furthermore, + kfree_rcu() usually results in even simpler code than does + synchronize_rcu() without synchronize_rcu()'s multi-millisecond + latency. So please take advantage of kfree_rcu()'s "fire and + forget" memory-freeing capabilities where it applies. An especially important property of the synchronize_rcu() primitive is that it automatically self-limits: if grace periods @@ -268,7 +273,8 @@ over a rather long period of time, but improvements are always welcome! e. Periodically invoke synchronize_rcu(), permitting a limited number of updates per grace period. - The same cautions apply to call_rcu_bh() and call_rcu_sched(). + The same cautions apply to call_rcu_bh(), call_rcu_sched(), + call_srcu(), and kfree_rcu(). 9. All RCU list-traversal primitives, which include rcu_dereference(), list_for_each_entry_rcu(), and @@ -296,9 +302,9 @@ over a rather long period of time, but improvements are always welcome! all currently executing rcu_read_lock()-protected RCU read-side critical sections complete. It does -not- necessarily guarantee that all currently running interrupts, NMIs, preempt_disable() - code, or idle loops will complete. Therefore, if you do not have - rcu_read_lock()-protected read-side critical sections, do -not- - use synchronize_rcu(). + code, or idle loops will complete. Therefore, if your + read-side critical sections are protected by something other + than rcu_read_lock(), do -not- use synchronize_rcu(). Similarly, disabling preemption is not an acceptable substitute for rcu_read_lock(). Code that attempts to use preemption @@ -401,9 +407,9 @@ over a rather long period of time, but improvements are always welcome! read-side critical sections. It is the responsibility of the RCU update-side primitives to deal with this. -17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and - the __rcu sparse checks to validate your RCU code. These - can help find problems as follows: +17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the + __rcu sparse checks (enabled by CONFIG_SPARSE_RCU_POINTER) to + validate your RCU code. These can help find problems as follows: CONFIG_PROVE_RCU: check that accesses to RCU-protected data structures are carried out under the proper RCU diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt index a102d4b3724b..cd83d2348fef 100644 --- a/Documentation/RCU/lockdep.txt +++ b/Documentation/RCU/lockdep.txt @@ -64,6 +64,11 @@ checking of rcu_dereference() primitives: but retain the compiler constraints that prevent duplicating or coalescsing. This is useful when when testing the value of the pointer itself, for example, against NULL. + rcu_access_index(idx): + Return the value of the index and omit all barriers, but + retain the compiler constraints that prevent duplicating + or coalescsing. This is useful when when testing the + value of the index itself, for example, against -1. The rcu_dereference_check() check expression can be any boolean expression, but would normally include a lockdep expression. However, diff --git a/Documentation/RCU/rcubarrier.txt b/Documentation/RCU/rcubarrier.txt index 38428c125135..2e319d1b9ef2 100644 --- a/Documentation/RCU/rcubarrier.txt +++ b/Documentation/RCU/rcubarrier.txt @@ -79,7 +79,20 @@ complete. Pseudo-code using rcu_barrier() is as follows: 2. Execute rcu_barrier(). 3. Allow the module to be unloaded. -The rcutorture module makes use of rcu_barrier in its exit function +There are also rcu_barrier_bh(), rcu_barrier_sched(), and srcu_barrier() +functions for the other flavors of RCU, and you of course must match +the flavor of rcu_barrier() with that of call_rcu(). If your module +uses multiple flavors of call_rcu(), then it must also use multiple +flavors of rcu_barrier() when unloading that module. For example, if +it uses call_rcu_bh(), call_srcu() on srcu_struct_1, and call_srcu() on +srcu_struct_2(), then the following three lines of code will be required +when unloading: + + 1 rcu_barrier_bh(); + 2 srcu_barrier(&srcu_struct_1); + 3 srcu_barrier(&srcu_struct_2); + +The rcutorture module makes use of rcu_barrier() in its exit function as follows: 1 static void -- cgit From 6231069bdab575fce862ca786f1c0ba5e4e9ba3b Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 6 Mar 2013 13:37:09 -0800 Subject: rcu: Add softirq-stall indications to stall-warning messages If RCU's softirq handler is prevented from executing, an RCU CPU stall warning can result. Ways to prevent RCU's softirq handler from executing include: (1) CPU spinning with interrupts disabled, (2) infinite loop in some softirq handler, and (3) in -rt kernels, an infinite loop in a set of real-time threads running at priorities higher than that of RCU's softirq handler. Because this situation can be difficult to track down, this commit causes the count of RCU softirq handler invocations to be printed with RCU CPU stall warnings. This information does require some interpretation, as now documented in Documentation/RCU/stallwarn.txt. Reported-by: Thomas Gleixner Signed-off-by: Paul E. McKenney Tested-by: Paul Gortmaker --- Documentation/RCU/stallwarn.txt | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 1927151b386b..e38b8df3d727 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt @@ -92,14 +92,14 @@ If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, more information is printed with the stall-warning message, for example: INFO: rcu_preempt detected stall on CPU - 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 + 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543 (t=65000 jiffies) In kernels with CONFIG_RCU_FAST_NO_HZ, even more information is printed: INFO: rcu_preempt detected stall on CPU - 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 drain=0 . timer not pending + 0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 nonlazy_posted: 25 .D (t=65000 jiffies) The "(64628 ticks this GP)" indicates that this CPU has taken more @@ -116,13 +116,28 @@ number between the two "/"s is the value of the nesting, which will be a small positive number if in the idle loop and a very large positive number (as shown above) otherwise. -For CONFIG_RCU_FAST_NO_HZ kernels, the "drain=0" indicates that the CPU is -not in the process of trying to force itself into dyntick-idle state, the -"." indicates that the CPU has not given up forcing RCU into dyntick-idle -mode (it would be "H" otherwise), and the "timer not pending" indicates -that the CPU has not recently forced RCU into dyntick-idle mode (it -would otherwise indicate the number of microseconds remaining in this -forced state). +The "softirq=" portion of the message tracks the number of RCU softirq +handlers that the stalled CPU has executed. The number before the "/" +is the number that had executed since boot at the time that this CPU +last noted the beginning of a grace period, which might be the current +(stalled) grace period, or it might be some earlier grace period (for +example, if the CPU might have been in dyntick-idle mode for an extended +time period. The number after the "/" is the number that have executed +since boot until the current time. If this latter number stays constant +across repeated stall-warning messages, it is possible that RCU's softirq +handlers are no longer able to execute on this CPU. This can happen if +the stalled CPU is spinning with interrupts are disabled, or, in -rt +kernels, if a high-priority process is starving RCU's softirq handler. + +For CONFIG_RCU_FAST_NO_HZ kernels, the "last_accelerate:" prints the +low-order 16 bits (in hex) of the jiffies counter when this CPU last +invoked rcu_try_advance_all_cbs() from rcu_needs_cpu() or last invoked +rcu_accelerate_cbs() from rcu_prepare_for_idle(). The "nonlazy_posted:" +prints the number of non-lazy callbacks posted since the last call to +rcu_needs_cpu(). Finally, an "L" indicates that there are currently +no non-lazy callbacks ("." is printed otherwise, as shown above) and +"D" indicates that dyntick-idle processing is enabled ("." is printed +otherwise, for example, if disabled via the "nohz=" kernel boot parameter). Multiple Warnings From One Stall -- cgit From a488985851cf2facd2227bd982cc2c251df56268 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 3 Dec 2012 08:16:28 -0800 Subject: rcu: Distinguish "rcuo" kthreads by RCU flavor Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by the CPU number, for example, "rcuo". This is problematic given that there are either two or three RCU flavors, each of which gets a per-CPU kthread with exactly the same name. This commit therefore introduces a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh, 'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have "rcuob/0", "rcuop/0", and "rcuos/0". Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Tested-by: Dietmar Eggemann --- Documentation/kernel-parameters.txt | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 4609e81dbc37..a17ba16c8fc8 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2461,9 +2461,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. In kernels built with CONFIG_RCU_NOCB_CPU=y, set the specified list of CPUs to be no-callback CPUs. Invocation of these CPUs' RCU callbacks will - be offloaded to "rcuoN" kthreads created for - that purpose. This reduces OS jitter on the + be offloaded to "rcuox/N" kthreads created for + that purpose, where "x" is "b" for RCU-bh, "p" + for RCU-preempt, and "s" for RCU-sched, and "N" + is the CPU number. This reduces OS jitter on the offloaded CPUs, which can be useful for HPC and + real-time workloads. It can also improve energy efficiency for asymmetric multiprocessors. -- cgit From c0f4dfd4f90f1667d234d21f15153ea09a2eaa66 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 28 Dec 2012 11:30:36 -0800 Subject: rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks Because RCU callbacks are now associated with the number of the grace period that they must wait for, CPUs can now take advance callbacks corresponding to grace periods that ended while a given CPU was in dyntick-idle mode. This eliminates the need to try forcing the RCU state machine while entering idle, thus reducing the CPU intensiveness of RCU_FAST_NO_HZ, which should increase its energy efficiency. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney --- Documentation/kernel-parameters.txt | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index a17ba16c8fc8..22303b2e74bc 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2490,6 +2490,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted. leaf rcu_node structure. Useful for very large systems. + rcutree.jiffies_till_first_fqs= [KNL,BOOT] + Set delay from grace-period initialization to + first attempt to force quiescent states. + Units are jiffies, minimum value is zero, + and maximum value is HZ. + + rcutree.jiffies_till_next_fqs= [KNL,BOOT] + Set delay between subsequent attempts to force + quiescent states. Units are jiffies, minimum + value is one, and maximum value is HZ. + rcutree.qhimark= [KNL,BOOT] Set threshold of queued RCU callbacks over which batch limiting is disabled. @@ -2504,16 +2515,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted. rcutree.rcu_cpu_stall_timeout= [KNL,BOOT] Set timeout for RCU CPU stall warning messages. - rcutree.jiffies_till_first_fqs= [KNL,BOOT] - Set delay from grace-period initialization to - first attempt to force quiescent states. - Units are jiffies, minimum value is zero, - and maximum value is HZ. + rcutree.rcu_idle_gp_delay= [KNL,BOOT] + Set wakeup interval for idle CPUs that have + RCU callbacks (RCU_FAST_NO_HZ=y). - rcutree.jiffies_till_next_fqs= [KNL,BOOT] - Set delay between subsequent attempts to force - quiescent states. Units are jiffies, minimum - value is one, and maximum value is HZ. + rcutree.rcu_idle_lazy_gp_delay= [KNL,BOOT] + Set wakeup interval for idle CPUs that have + only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y). + Lazy RCU callbacks are those which RCU can + prove do nothing more than free memory. rcutorture.fqs_duration= [KNL,BOOT] Set duration of force_quiescent_state bursts. -- cgit From 49717cb40410fe4b563968680ff7c513967504c6 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 11 Apr 2013 08:07:11 -0700 Subject: kthread: Document ways of reducing OS jitter due to per-CPU kthreads The Linux kernel uses a number of per-CPU kthreads, any of which might contribute to OS jitter at any time. The usual approach to normal kthreads, namely to bind them to a "housekeeping" CPU, does not work with these kthreads because they cannot operate correctly if moved to some other CPU. This commit therefore lists ways of controlling OS jitter from the Linux kernel's per-CPU kthreads. It also lists some ways of diagnosing excessive jitter. Signed-off-by: Paul E. McKenney Cc: Frederic Weisbecker Cc: Steven Rostedt Cc: Borislav Petkov Cc: Arjan van de Ven Cc: Kevin Hilman Cc: Christoph Lameter Cc: Thomas Gleixner Cc: Olivier Baetz Cc: Pradeep Satyanarayana Reviewed-by: Randy Dunlap Reviewed-by: Borislav Petkov --- Documentation/kernel-per-CPU-kthreads.txt | 202 ++++++++++++++++++++++++++++++ 1 file changed, 202 insertions(+) create mode 100644 Documentation/kernel-per-CPU-kthreads.txt (limited to 'Documentation') diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt new file mode 100644 index 000000000000..cbf7ae412da4 --- /dev/null +++ b/Documentation/kernel-per-CPU-kthreads.txt @@ -0,0 +1,202 @@ +REDUCING OS JITTER DUE TO PER-CPU KTHREADS + +This document lists per-CPU kthreads in the Linux kernel and presents +options to control their OS jitter. Note that non-per-CPU kthreads are +not listed here. To reduce OS jitter from non-per-CPU kthreads, bind +them to a "housekeeping" CPU dedicated to such work. + + +REFERENCES + +o Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. + +o Documentation/cgroups: Using cgroups to bind tasks to sets of CPUs. + +o man taskset: Using the taskset command to bind tasks to sets + of CPUs. + +o man sched_setaffinity: Using the sched_setaffinity() system + call to bind tasks to sets of CPUs. + +o /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state, + writing "0" to offline and "1" to online. + +o In order to locate kernel-generated OS jitter on CPU N: + + cd /sys/kernel/debug/tracing + echo 1 > max_graph_depth # Increase the "1" for more detail + echo function_graph > current_tracer + # run workload + cat per_cpu/cpuN/trace + + +KTHREADS + +Name: ehca_comp/%u +Purpose: Periodically process Infiniband-related work. +To reduce its OS jitter, do any of the following: +1. Don't use eHCA Infiniband hardware, instead choosing hardware + that does not require per-CPU kthreads. This will prevent these + kthreads from being created in the first place. (This will + work for most people, as this hardware, though important, is + relatively old and is produced in relatively low unit volumes.) +2. Do all eHCA-Infiniband-related work on other CPUs, including + interrupts. +3. Rework the eHCA driver so that its per-CPU kthreads are + provisioned only on selected CPUs. + + +Name: irq/%d-%s +Purpose: Handle threaded interrupts. +To reduce its OS jitter, do the following: +1. Use irq affinity to force the irq threads to execute on + some other CPU. + +Name: kcmtpd_ctr_%d +Purpose: Handle Bluetooth work. +To reduce its OS jitter, do one of the following: +1. Don't use Bluetooth, in which case these kthreads won't be + created in the first place. +2. Use irq affinity to force Bluetooth-related interrupts to + occur on some other CPU and furthermore initiate all + Bluetooth activity on some other CPU. + +Name: ksoftirqd/%u +Purpose: Execute softirq handlers when threaded or when under heavy load. +To reduce its OS jitter, each softirq vector must be handled +separately as follows: +TIMER_SOFTIRQ: Do all of the following: +1. To the extent possible, keep the CPU out of the kernel when it + is non-idle, for example, by avoiding system calls and by forcing + both kernel threads and interrupts to execute elsewhere. +2. Build with CONFIG_HOTPLUG_CPU=y. After boot completes, force + the CPU offline, then bring it back online. This forces + recurring timers to migrate elsewhere. If you are concerned + with multiple CPUs, force them all offline before bringing the + first one back online. Once you have onlined the CPUs in question, + do not offline any other CPUs, because doing so could force the + timer back onto one of the CPUs in question. +NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following: +1. Force networking interrupts onto other CPUs. +2. Initiate any network I/O on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +BLOCK_SOFTIRQ: Do all of the following: +1. Force block-device interrupts onto some other CPU. +2. Initiate any block I/O on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +BLOCK_IOPOLL_SOFTIRQ: Do all of the following: +1. Force block-device interrupts onto some other CPU. +2. Initiate any block I/O and block-I/O polling on other CPUs. +3. Once your application has started, prevent CPU-hotplug operations + from being initiated from tasks that might run on the CPU to + be de-jittered. (It is OK to force this CPU offline and then + bring it back online before you start your application.) +TASKLET_SOFTIRQ: Do one or more of the following: +1. Avoid use of drivers that use tasklets. (Such drivers will contain + calls to things like tasklet_schedule().) +2. Convert all drivers that you must use from tasklets to workqueues. +3. Force interrupts for drivers using tasklets onto other CPUs, + and also do I/O involving these drivers on other CPUs. +SCHED_SOFTIRQ: Do all of the following: +1. Avoid sending scheduler IPIs to the CPU to be de-jittered, + for example, ensure that at most one runnable kthread is present + on that CPU. If a thread that expects to run on the de-jittered + CPU awakens, the scheduler will send an IPI that can result in + a subsequent SCHED_SOFTIRQ. +2. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y, + CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU + to be de-jittered is marked as an adaptive-ticks CPU using the + "nohz_full=" boot parameter. This reduces the number of + scheduler-clock interrupts that the de-jittered CPU receives, + minimizing its chances of being selected to do the load balancing + work that runs in SCHED_SOFTIRQ context. +3. To the extent possible, keep the CPU out of the kernel when it + is non-idle, for example, by avoiding system calls and by + forcing both kernel threads and interrupts to execute elsewhere. + This further reduces the number of scheduler-clock interrupts + received by the de-jittered CPU. +HRTIMER_SOFTIRQ: Do all of the following: +1. To the extent possible, keep the CPU out of the kernel when it + is non-idle. For example, avoid system calls and force both + kernel threads and interrupts to execute elsewhere. +2. Build with CONFIG_HOTPLUG_CPU=y. Once boot completes, force the + CPU offline, then bring it back online. This forces recurring + timers to migrate elsewhere. If you are concerned with multiple + CPUs, force them all offline before bringing the first one + back online. Once you have onlined the CPUs in question, do not + offline any other CPUs, because doing so could force the timer + back onto one of the CPUs in question. +RCU_SOFTIRQ: Do at least one of the following: +1. Offload callbacks and keep the CPU in either dyntick-idle or + adaptive-ticks state by doing all of the following: + a. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y, + CONFIG_NO_HZ_FULL=y, and, in addition ensure that the CPU + to be de-jittered is marked as an adaptive-ticks CPU using + the "nohz_full=" boot parameter. Bind the rcuo kthreads + to housekeeping CPUs, which can tolerate OS jitter. + b. To the extent possible, keep the CPU out of the kernel + when it is non-idle, for example, by avoiding system + calls and by forcing both kernel threads and interrupts + to execute elsewhere. +2. Enable RCU to do its processing remotely via dyntick-idle by + doing all of the following: + a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y. + b. Ensure that the CPU goes idle frequently, allowing other + CPUs to detect that it has passed through an RCU quiescent + state. If the kernel is built with CONFIG_NO_HZ_FULL=y, + userspace execution also allows other CPUs to detect that + the CPU in question has passed through a quiescent state. + c. To the extent possible, keep the CPU out of the kernel + when it is non-idle, for example, by avoiding system + calls and by forcing both kernel threads and interrupts + to execute elsewhere. + +Name: rcuc/%u +Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels. +To reduce its OS jitter, do at least one of the following: +1. Build the kernel with CONFIG_PREEMPT=n. This prevents these + kthreads from being created in the first place, and also obviates + the need for RCU priority boosting. This approach is feasible + for workloads that do not require high degrees of responsiveness. +2. Build the kernel with CONFIG_RCU_BOOST=n. This prevents these + kthreads from being created in the first place. This approach + is feasible only if your workload never requires RCU priority + boosting, for example, if you ensure frequent idle time on all + CPUs that might execute within the kernel. +3. Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y, + which offloads all RCU callbacks to kthreads that can be moved + off of CPUs susceptible to OS jitter. This approach prevents the + rcuc/%u kthreads from having any work to do, so that they are + never awakened. +4. Ensure that the CPU never enters the kernel, and, in particular, + avoid initiating any CPU hotplug operations on this CPU. This is + another way of preventing any callbacks from being queued on the + CPU, again preventing the rcuc/%u kthreads from having any work + to do. + +Name: rcuob/%d, rcuop/%d, and rcuos/%d +Purpose: Offload RCU callbacks from the corresponding CPU. +To reduce its OS jitter, do at least one of the following: +1. Use affinity, cgroups, or other mechanism to force these kthreads + to execute on some other CPU. +2. Build with CONFIG_RCU_NOCB_CPUS=n, which will prevent these + kthreads from being created in the first place. However, please + note that this will not eliminate OS jitter, but will instead + shift it to RCU_SOFTIRQ. + +Name: watchdog/%u +Purpose: Detect software lockups on each CPU. +To reduce its OS jitter, do at least one of the following: +1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these + kthreads from being created in the first place. +2. Echo a zero to /proc/sys/kernel/watchdog to disable the + watchdog timer. +3. Echo a large number of /proc/sys/kernel/watchdog_thresh in + order to reduce the frequency of OS jitter due to the watchdog + timer down to a level that is acceptable for your workload. -- cgit