diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-block-bcache | 156 | ||||
-rw-r--r-- | Documentation/ABI/testing/sysfs-class-mtd | 6 | ||||
-rw-r--r-- | Documentation/acpi/enumeration.txt | 77 | ||||
-rw-r--r-- | Documentation/bcache.txt | 431 | ||||
-rw-r--r-- | Documentation/block/cfq-iosched.txt | 47 | ||||
-rw-r--r-- | Documentation/cgroups/memory.txt | 4 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/dma/atmel-dma.txt | 35 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/mtd/gpmc-nand.txt | 28 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/mtd/partition.txt | 36 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/net/gpmc-eth.txt | 56 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/thermal/armada-thermal.txt | 22 | ||||
-rw-r--r-- | Documentation/dmatest.txt | 81 | ||||
-rw-r--r-- | Documentation/filesystems/f2fs.txt | 4 | ||||
-rw-r--r-- | Documentation/gpio.txt | 10 | ||||
-rw-r--r-- | Documentation/thermal/exynos_thermal_emulation | 8 | ||||
-rw-r--r-- | Documentation/thermal/sysfs-api.txt | 28 | ||||
-rw-r--r-- | Documentation/zh_CN/gpio.txt | 8 |
17 files changed, 954 insertions, 83 deletions
diff --git a/Documentation/ABI/testing/sysfs-block-bcache b/Documentation/ABI/testing/sysfs-block-bcache new file mode 100644 index 000000000000..9e4bbc5d51fd --- /dev/null +++ b/Documentation/ABI/testing/sysfs-block-bcache @@ -0,0 +1,156 @@ +What: /sys/block/<disk>/bcache/unregister +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + A write to this file causes the backing device or cache to be + unregistered. If a backing device had dirty data in the cache, + writeback mode is automatically disabled and all dirty data is + flushed before the device is unregistered. Caches unregister + all associated backing devices before unregistering themselves. + +What: /sys/block/<disk>/bcache/clear_stats +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + Writing to this file resets all the statistics for the device. + +What: /sys/block/<disk>/bcache/cache +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a backing device that has cache, a symlink to + the bcache/ dir of that cache. + +What: /sys/block/<disk>/bcache/cache_hits +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: integer number of full cache hits, + counted per bio. A partial cache hit counts as a miss. + +What: /sys/block/<disk>/bcache/cache_misses +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: integer number of cache misses. + +What: /sys/block/<disk>/bcache/cache_hit_ratio +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: cache hits as a percentage. + +What: /sys/block/<disk>/bcache/sequential_cutoff +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: Threshold past which sequential IO will + skip the cache. Read and written as bytes in human readable + units (i.e. echo 10M > sequntial_cutoff). + +What: /sys/block/<disk>/bcache/bypassed +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + Sum of all reads and writes that have bypassed the cache (due + to the sequential cutoff). Expressed as bytes in human + readable units. + +What: /sys/block/<disk>/bcache/writeback +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: When on, writeback caching is enabled and + writes will be buffered in the cache. When off, caching is in + writethrough mode; reads and writes will be added to the + cache but no write buffering will take place. + +What: /sys/block/<disk>/bcache/writeback_running +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: when off, dirty data will not be written + from the cache to the backing device. The cache will still be + used to buffer writes until it is mostly full, at which point + writes transparently revert to writethrough mode. Intended only + for benchmarking/testing. + +What: /sys/block/<disk>/bcache/writeback_delay +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: In writeback mode, when dirty data is + written to the cache and the cache held no dirty data for that + backing device, writeback from cache to backing device starts + after this delay, expressed as an integer number of seconds. + +What: /sys/block/<disk>/bcache/writeback_percent +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For backing devices: If nonzero, writeback from cache to + backing device only takes place when more than this percentage + of the cache is used, allowing more write coalescing to take + place and reducing total number of writes sent to the backing + device. Integer between 0 and 40. + +What: /sys/block/<disk>/bcache/synchronous +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a cache, a boolean that allows synchronous mode to be + switched on and off. In synchronous mode all writes are ordered + such that the cache can reliably recover from unclean shutdown; + if disabled bcache will not generally wait for writes to + complete but if the cache is not shut down cleanly all data + will be discarded from the cache. Should not be turned off with + writeback caching enabled. + +What: /sys/block/<disk>/bcache/discard +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a cache, a boolean allowing discard/TRIM to be turned off + or back on if the device supports it. + +What: /sys/block/<disk>/bcache/bucket_size +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a cache, bucket size in human readable units, as set at + cache creation time; should match the erase block size of the + SSD for optimal performance. + +What: /sys/block/<disk>/bcache/nbuckets +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a cache, the number of usable buckets. + +What: /sys/block/<disk>/bcache/tree_depth +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a cache, height of the btree excluding leaf nodes (i.e. a + one node tree will have a depth of 0). + +What: /sys/block/<disk>/bcache/btree_cache_size +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + Number of btree buckets/nodes that are currently cached in + memory; cache dynamically grows and shrinks in response to + memory pressure from the rest of the system. + +What: /sys/block/<disk>/bcache/written +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a cache, total amount of data in human readable units + written to the cache, excluding all metadata. + +What: /sys/block/<disk>/bcache/btree_written +Date: November 2010 +Contact: Kent Overstreet <kent.overstreet@gmail.com> +Description: + For a cache, sum of all btree writes in human readable units. diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd index 938ef71e2035..3105644b3bfc 100644 --- a/Documentation/ABI/testing/sysfs-class-mtd +++ b/Documentation/ABI/testing/sysfs-class-mtd @@ -14,8 +14,7 @@ Description: The /sys/class/mtd/mtd{0,1,2,3,...} directories correspond to each /dev/mtdX character device. These may represent physical/simulated flash devices, partitions on a flash - device, or concatenated flash devices. They exist regardless - of whether CONFIG_MTD_CHAR is actually enabled. + device, or concatenated flash devices. What: /sys/class/mtd/mtdXro/ Date: April 2009 @@ -23,8 +22,7 @@ KernelVersion: 2.6.29 Contact: linux-mtd@lists.infradead.org Description: These directories provide the corresponding read-only device - nodes for /sys/class/mtd/mtdX/ . They are only created - (for the benefit of udev) if CONFIG_MTD_CHAR is enabled. + nodes for /sys/class/mtd/mtdX/ . What: /sys/class/mtd/mtdX/dev Date: April 2009 diff --git a/Documentation/acpi/enumeration.txt b/Documentation/acpi/enumeration.txt index b0d541042ac6..d9be7a97dff3 100644 --- a/Documentation/acpi/enumeration.txt +++ b/Documentation/acpi/enumeration.txt @@ -66,6 +66,83 @@ the ACPI device explicitly to acpi_platform_device_ids list defined in drivers/acpi/acpi_platform.c. This limitation is only for the platform devices, SPI and I2C devices are created automatically as described below. +DMA support +~~~~~~~~~~~ +DMA controllers enumerated via ACPI should be registered in the system to +provide generic access to their resources. For example, a driver that would +like to be accessible to slave devices via generic API call +dma_request_slave_channel() must register itself at the end of the probe +function like this: + + err = devm_acpi_dma_controller_register(dev, xlate_func, dw); + /* Handle the error if it's not a case of !CONFIG_ACPI */ + +and implement custom xlate function if needed (usually acpi_dma_simple_xlate() +is enough) which converts the FixedDMA resource provided by struct +acpi_dma_spec into the corresponding DMA channel. A piece of code for that case +could look like: + + #ifdef CONFIG_ACPI + struct filter_args { + /* Provide necessary information for the filter_func */ + ... + }; + + static bool filter_func(struct dma_chan *chan, void *param) + { + /* Choose the proper channel */ + ... + } + + static struct dma_chan *xlate_func(struct acpi_dma_spec *dma_spec, + struct acpi_dma *adma) + { + dma_cap_mask_t cap; + struct filter_args args; + + /* Prepare arguments for filter_func */ + ... + return dma_request_channel(cap, filter_func, &args); + } + #else + static struct dma_chan *xlate_func(struct acpi_dma_spec *dma_spec, + struct acpi_dma *adma) + { + return NULL; + } + #endif + +dma_request_slave_channel() will call xlate_func() for each registered DMA +controller. In the xlate function the proper channel must be chosen based on +information in struct acpi_dma_spec and the properties of the controller +provided by struct acpi_dma. + +Clients must call dma_request_slave_channel() with the string parameter that +corresponds to a specific FixedDMA resource. By default "tx" means the first +entry of the FixedDMA resource array, "rx" means the second entry. The table +below shows a layout: + + Device (I2C0) + { + ... + Method (_CRS, 0, NotSerialized) + { + Name (DBUF, ResourceTemplate () + { + FixedDMA (0x0018, 0x0004, Width32bit, _Y48) + FixedDMA (0x0019, 0x0005, Width32bit, ) + }) + ... + } + } + +So, the FixedDMA with request line 0x0018 is "tx" and next one is "rx" in +this example. + +In robust cases the client unfortunately needs to call +acpi_dma_request_slave_chan_by_index() directly and therefore choose the +specific FixedDMA resource by its index. + SPI serial bus support ~~~~~~~~~~~~~~~~~~~~~~ Slave devices behind SPI bus have SpiSerialBus resource attached to them. diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt new file mode 100644 index 000000000000..77db8809bd96 --- /dev/null +++ b/Documentation/bcache.txt @@ -0,0 +1,431 @@ +Say you've got a big slow raid 6, and an X-25E or three. Wouldn't it be +nice if you could use them as cache... Hence bcache. + +Wiki and git repositories are at: + http://bcache.evilpiepirate.org + http://evilpiepirate.org/git/linux-bcache.git + http://evilpiepirate.org/git/bcache-tools.git + +It's designed around the performance characteristics of SSDs - it only allocates +in erase block sized buckets, and it uses a hybrid btree/log to track cached +extants (which can be anywhere from a single sector to the bucket size). It's +designed to avoid random writes at all costs; it fills up an erase block +sequentially, then issues a discard before reusing it. + +Both writethrough and writeback caching are supported. Writeback defaults to +off, but can be switched on and off arbitrarily at runtime. Bcache goes to +great lengths to protect your data - it reliably handles unclean shutdown. (It +doesn't even have a notion of a clean shutdown; bcache simply doesn't return +writes as completed until they're on stable storage). + +Writeback caching can use most of the cache for buffering writes - writing +dirty data to the backing device is always done sequentially, scanning from the +start to the end of the index. + +Since random IO is what SSDs excel at, there generally won't be much benefit +to caching large sequential IO. Bcache detects sequential IO and skips it; +it also keeps a rolling average of the IO sizes per task, and as long as the +average is above the cutoff it will skip all IO from that task - instead of +caching the first 512k after every seek. Backups and large file copies should +thus entirely bypass the cache. + +In the event of a data IO error on the flash it will try to recover by reading +from disk or invalidating cache entries. For unrecoverable errors (meta data +or dirty data), caching is automatically disabled; if dirty data was present +in the cache it first disables writeback caching and waits for all dirty data +to be flushed. + +Getting started: +You'll need make-bcache from the bcache-tools repository. Both the cache device +and backing device must be formatted before use. + make-bcache -B /dev/sdb + make-bcache -C /dev/sdc + +make-bcache has the ability to format multiple devices at the same time - if +you format your backing devices and cache device at the same time, you won't +have to manually attach: + make-bcache -B /dev/sda /dev/sdb -C /dev/sdc + +To make bcache devices known to the kernel, echo them to /sys/fs/bcache/register: + + echo /dev/sdb > /sys/fs/bcache/register + echo /dev/sdc > /sys/fs/bcache/register + +To register your bcache devices automatically, you could add something like +this to an init script: + + echo /dev/sd* > /sys/fs/bcache/register_quiet + +It'll look for bcache superblocks and ignore everything that doesn't have one. + +Registering the backing device makes the bcache show up in /dev; you can now +format it and use it as normal. But the first time using a new bcache device, +it'll be running in passthrough mode until you attach it to a cache. See the +section on attaching. + +The devices show up at /dev/bcacheN, and can be controlled via sysfs from +/sys/block/bcacheN/bcache: + + mkfs.ext4 /dev/bcache0 + mount /dev/bcache0 /mnt + +Cache devices are managed as sets; multiple caches per set isn't supported yet +but will allow for mirroring of metadata and dirty data in the future. Your new +cache set shows up as /sys/fs/bcache/<UUID> + +ATTACHING: + +After your cache device and backing device are registered, the backing device +must be attached to your cache set to enable caching. Attaching a backing +device to a cache set is done thusly, with the UUID of the cache set in +/sys/fs/bcache: + + echo <UUID> > /sys/block/bcache0/bcache/attach + +This only has to be done once. The next time you reboot, just reregister all +your bcache devices. If a backing device has data in a cache somewhere, the +/dev/bcache# device won't be created until the cache shows up - particularly +important if you have writeback caching turned on. + +If you're booting up and your cache device is gone and never coming back, you +can force run the backing device: + + echo 1 > /sys/block/sdb/bcache/running + +(You need to use /sys/block/sdb (or whatever your backing device is called), not +/sys/block/bcache0, because bcache0 doesn't exist yet. If you're using a +partition, the bcache directory would be at /sys/block/sdb/sdb2/bcache) + +The backing device will still use that cache set if it shows up in the future, +but all the cached data will be invalidated. If there was dirty data in the +cache, don't expect the filesystem to be recoverable - you will have massive +filesystem corruption, though ext4's fsck does work miracles. + +ERROR HANDLING: + +Bcache tries to transparently handle IO errors to/from the cache device without +affecting normal operation; if it sees too many errors (the threshold is +configurable, and defaults to 0) it shuts down the cache device and switches all +the backing devices to passthrough mode. + + - For reads from the cache, if they error we just retry the read from the + backing device. + + - For writethrough writes, if the write to the cache errors we just switch to + invalidating the data at that lba in the cache (i.e. the same thing we do for + a write that bypasses the cache) + + - For writeback writes, we currently pass that error back up to the + filesystem/userspace. This could be improved - we could retry it as a write + that skips the cache so we don't have to error the write. + + - When we detach, we first try to flush any dirty data (if we were running in + writeback mode). It currently doesn't do anything intelligent if it fails to + read some of the dirty data, though. + +TROUBLESHOOTING PERFORMANCE: + +Bcache has a bunch of config options and tunables. The defaults are intended to +be reasonable for typical desktop and server workloads, but they're not what you +want for getting the best possible numbers when benchmarking. + + - Bad write performance + + If write performance is not what you expected, you probably wanted to be + running in writeback mode, which isn't the default (not due to a lack of + maturity, but simply because in writeback mode you'll lose data if something + happens to your SSD) + + # echo writeback > /sys/block/bcache0/cache_mode + + - Bad performance, or traffic not going to the SSD that you'd expect + + By default, bcache doesn't cache everything. It tries to skip sequential IO - + because you really want to be caching the random IO, and if you copy a 10 + gigabyte file you probably don't want that pushing 10 gigabytes of randomly + accessed data out of your cache. + + But if you want to benchmark reads from cache, and you start out with fio + writing an 8 gigabyte test file - so you want to disable that. + + # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff + + To set it back to the default (4 mb), do + + # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff + + - Traffic's still going to the spindle/still getting cache misses + + In the real world, SSDs don't always keep up with disks - particularly with + slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So + you want to avoid being bottlenecked by the SSD and having it slow everything + down. + + To avoid that bcache tracks latency to the cache device, and gradually + throttles traffic if the latency exceeds a threshold (it does this by + cranking down the sequential bypass). + + You can disable this if you need to by setting the thresholds to 0: + + # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us + # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us + + The default is 2000 us (2 milliseconds) for reads, and 20000 for writes. + + - Still getting cache misses, of the same data + + One last issue that sometimes trips people up is actually an old bug, due to + the way cache coherency is handled for cache misses. If a btree node is full, + a cache miss won't be able to insert a key for the new data and the data + won't be written to the cache. + + In practice this isn't an issue because as soon as a write comes along it'll + cause the btree node to be split, and you need almost no write traffic for + this to not show up enough to be noticable (especially since bcache's btree + nodes are huge and index large regions of the device). But when you're + benchmarking, if you're trying to warm the cache by reading a bunch of data + and there's no other traffic - that can be a problem. + + Solution: warm the cache by doing writes, or use the testing branch (there's + a fix for the issue there). + +SYSFS - BACKING DEVICE: + +attach + Echo the UUID of a cache set to this file to enable caching. + +cache_mode + Can be one of either writethrough, writeback, writearound or none. + +clear_stats + Writing to this file resets the running total stats (not the day/hour/5 minute + decaying versions). + +detach + Write to this file to detach from a cache set. If there is dirty data in the + cache, it will be flushed first. + +dirty_data + Amount of dirty data for this backing device in the cache. Continuously + updated unlike the cache set's version, but may be slightly off. + +label + Name of underlying device. + +readahead + Size of readahead that should be performed. Defaults to 0. If set to e.g. + 1M, it will round cache miss reads up to that size, but without overlapping + existing cache entries. + +running + 1 if bcache is running (i.e. whether the /dev/bcache device exists, whether + it's in passthrough mode or caching). + +sequential_cutoff + A sequential IO will bypass the cache once it passes this threshhold; the + most recent 128 IOs are tracked so sequential IO can be detected even when + it isn't all done at once. + +sequential_merge + If non zero, bcache keeps a list of the last 128 requests submitted to compare + against all new requests to determine which new requests are sequential + continuations of previous requests for the purpose of determining sequential + cutoff. This is necessary if the sequential cutoff value is greater than the + maximum acceptable sequential size for any single request. + +state + The backing device can be in one of four different states: + + no cache: Has never been attached to a cache set. + + clean: Part of a cache set, and there is no cached dirty data. + + dirty: Part of a cache set, and there is cached dirty data. + + inconsistent: The backing device was forcibly run by the user when there was + dirty data cached but the cache set was unavailable; whatever data was on the + backing device has likely been corrupted. + +stop + Write to this file to shut down the bcache device and close the backing + device. + +writeback_delay + When dirty data is written to the cache and it previously did not contain + any, waits some number of seconds before initiating writeback. Defaults to + 30. + +writeback_percent + If nonzero, bcache tries to keep around this percentage of the cache dirty by + throttling background writeback and using a PD controller to smoothly adjust + the rate. + +writeback_rate + Rate in sectors per second - if writeback_percent is nonzero, background + writeback is throttled to this rate. Continuously adjusted by bcache but may + also be set by the user. + +writeback_running + If off, writeback of dirty data will not take place at all. Dirty data will + still be added to the cache until it is mostly full; only meant for + benchmarking. Defaults to on. + +SYSFS - BACKING DEVICE STATS: + +There are directories with these numbers for a running total, as well as +versions that decay over the past day, hour and 5 minutes; they're also +aggregated in the cache set directory as well. + +bypassed + Amount of IO (both reads and writes) that has bypassed the cache + +cache_hits +cache_misses +cache_hit_ratio + Hits and misses are counted per individual IO as bcache sees them; a + partial hit is counted as a miss. + +cache_bypass_hits +cache_bypass_misses + Hits and misses for IO that is intended to skip the cache are still counted, + but broken out here. + +cache_miss_collisions + Counts instances where data was going to be inserted into the cache from a + cache miss, but raced with a write and data was already present (usually 0 + since the synchronization for cache misses was rewritten) + +cache_readaheads + Count of times readahead occured. + +SYSFS - CACHE SET: + +average_key_size + Average data per key in the btree. + +bdev<0..n> + Symlink to each of the attached backing devices. + +block_size + Block size of the cache devices. + +btree_cache_size + Amount of memory currently used by the btree cache + +bucket_size + Size of buckets + +cache<0..n> + Symlink to each of the cache devices comprising this cache set. + +cache_available_percent + Percentage of cache device free. + +clear_stats + Clears the statistics associated with this cache + +dirty_data + Amount of dirty data is in the cache (updated when garbage collection runs). + +flash_vol_create + Echoing a size to this file (in human readable units, k/M/G) creates a thinly + provisioned volume backed by the cache set. + +io_error_halflife +io_error_limit + These determines how many errors we accept before disabling the cache. + Each error is decayed by the half life (in # ios). If the decaying count + reaches io_error_limit dirty data is written out and the cache is disabled. + +journal_delay_ms + Journal writes will delay for up to this many milliseconds, unless a cache + flush happens sooner. Defaults to 100. + +root_usage_percent + Percentage of the root btree node in use. If this gets too high the node + will split, increasing the tree depth. + +stop + Write to this file to shut down the cache set - waits until all attached + backing devices have been shut down. + +tree_depth + Depth of the btree (A single node btree has depth 0). + +unregister + Detaches all backing devices and closes the cache devices; if dirty data is + present it will disable writeback caching and wait for it to be flushed. + +SYSFS - CACHE SET INTERNAL: + +This directory also exposes timings for a number of internal operations, with +separate files for average duration, average frequency, last occurence and max +duration: garbage collection, btree read, btree node sorts and btree splits. + +active_journal_entries + Number of journal entries that are newer than the index. + +btree_nodes + Total nodes in the btree. + +btree_used_percent + Average fraction of btree in use. + +bset_tree_stats + Statistics about the auxiliary search trees + +btree_cache_max_chain + Longest chain in the btree node cache's hash table + +cache_read_races + Counts instances where while data was being read from the cache, the bucket + was reused and invalidated - i.e. where the pointer was stale after the read + completed. When this occurs the data is reread from the backing device. + +trigger_gc + Writing to this file forces garbage collection to run. + +SYSFS - CACHE DEVICE: + +block_size + Minimum granularity of writes - should match hardware sector size. + +btree_written + Sum of all btree writes, in (kilo/mega/giga) bytes + +bucket_size + Size of buckets + +cache_replacement_policy + One of either lru, fifo or random. + +discard + Boolean; if on a discard/TRIM will be issued to each bucket before it is + reused. Defaults to off, since SATA TRIM is an unqueued command (and thus + slow). + +freelist_percent + Size of the freelist as a percentage of nbuckets. Can be written to to + increase the number of buckets kept on the freelist, which lets you + artificially reduce the size of the cache at runtime. Mostly for testing + purposes (i.e. testing how different size caches affect your hit rate), but + since buckets are discarded when they move on to the freelist will also make + the SSD's garbage collection easier by effectively giving it more reserved + space. + +io_errors + Number of errors that have occured, decayed by io_error_halflife. + +metadata_written + Sum of all non data writes (btree writes and all other metadata). + +nbuckets + Total buckets in this cache + +priority_stats + Statistics about how recently data in the cache has been accessed. This can + reveal your working set size. + +written + Sum of all data that has been written to the cache; comparison with + btree_written gives the amount of write inflation in bcache. diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt index a5eb7d19a65d..9887f0414c16 100644 --- a/Documentation/block/cfq-iosched.txt +++ b/Documentation/block/cfq-iosched.txt @@ -5,7 +5,7 @@ The main aim of CFQ scheduler is to provide a fair allocation of the disk I/O bandwidth for all the processes which requests an I/O operation. CFQ maintains the per process queue for the processes which request I/O -operation(syncronous requests). In case of asynchronous requests, all the +operation(synchronous requests). In case of asynchronous requests, all the requests from all the processes are batched together according to their process's I/O priority. @@ -66,6 +66,47 @@ This parameter is used to set the timeout of synchronous requests. Default value of this is 124ms. In case to favor synchronous requests over asynchronous one, this value should be decreased relative to fifo_expire_async. +group_idle +----------- +This parameter forces idling at the CFQ group level instead of CFQ +queue level. This was introduced after after a bottleneck was observed +in higher end storage due to idle on sequential queue and allow dispatch +from a single queue. The idea with this parameter is that it can be run with +slice_idle=0 and group_idle=8, so that idling does not happen on individual +queues in the group but happens overall on the group and thus still keeps the +IO controller working. +Not idling on individual queues in the group will dispatch requests from +multiple queues in the group at the same time and achieve higher throughput +on higher end storage. + +Default value for this parameter is 8ms. + +latency +------- +This parameter is used to enable/disable the latency mode of the CFQ +scheduler. If latency mode (called low_latency) is enabled, CFQ tries +to recompute the slice time for each process based on the target_latency set +for the system. This favors fairness over throughput. Disabling low +latency (setting it to 0) ignores target latency, allowing each process in the +system to get a full time slice. + +By default low latency mode is enabled. + +target_latency +-------------- +This parameter is used to calculate the time slice for a process if cfq's +latency mode is enabled. It will ensure that sync requests have an estimated +latency. But if sequential workload is higher(e.g. sequential read), +then to meet the latency constraints, throughput may decrease because of less +time for each process to issue I/O request before the cfq queue is switched. + +Though this can be overcome by disabling the latency_mode, it may increase +the read latency for some applications. This parameter allows for changing +target_latency through the sysfs interface which can provide the balanced +throughput and read latency. + +Default value for target_latency is 300ms. + slice_async ----------- This parameter is same as of slice_sync but for asynchronous queue. The @@ -98,8 +139,8 @@ in the device exceeds this parameter. This parameter is used for synchronous request. In case of storage with several disk, this setting can limit the parallel -processing of request. Therefore, increasing the value can imporve the -performace although this can cause the latency of some I/O to increase due +processing of request. Therefore, increasing the value can improve the +performance although this can cause the latency of some I/O to increase due to more number of requests. CFQ Group scheduling diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index 09027a9fece5..ddf4f93967a9 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -480,7 +480,9 @@ memory.stat file includes following statistics # per-memory cgroup local status cache - # of bytes of page cache memory. -rss - # of bytes of anonymous and swap cache memory. +rss - # of bytes of anonymous and swap cache memory (includes + transparent hugepages). +rss_huge - # of bytes of anonymous transparent hugepages. mapped_file - # of bytes of mapped file (includes tmpfs/shmem) pgpgin - # of charging events to the memory cgroup. The charging event happens each time a page is accounted as either mapped diff --git a/Documentation/devicetree/bindings/dma/atmel-dma.txt b/Documentation/devicetree/bindings/dma/atmel-dma.txt index 3c046ee6e8b5..c80e8a3402f0 100644 --- a/Documentation/devicetree/bindings/dma/atmel-dma.txt +++ b/Documentation/devicetree/bindings/dma/atmel-dma.txt @@ -1,14 +1,39 @@ * Atmel Direct Memory Access Controller (DMA) Required properties: -- compatible: Should be "atmel,<chip>-dma" -- reg: Should contain DMA registers location and length -- interrupts: Should contain DMA interrupt +- compatible: Should be "atmel,<chip>-dma". +- reg: Should contain DMA registers location and length. +- interrupts: Should contain DMA interrupt. +- #dma-cells: Must be <2>, used to represent the number of integer cells in +the dmas property of client devices. -Examples: +Example: -dma@ffffec00 { +dma0: dma@ffffec00 { compatible = "atmel,at91sam9g45-dma"; reg = <0xffffec00 0x200>; interrupts = <21>; + #dma-cells = <2>; +}; + +DMA clients connected to the Atmel DMA controller must use the format +described in the dma.txt file, using a three-cell specifier for each channel: +a phandle plus two interger cells. +The three cells in order are: + +1. A phandle pointing to the DMA controller. +2. The memory interface (16 most significant bits), the peripheral interface +(16 less significant bits). +3. The peripheral identifier for the hardware handshaking interface. The +identifier can be different for tx and rx. + +Example: + +i2c0@i2c@f8010000 { + compatible = "atmel,at91sam9x5-i2c"; + reg = <0xf8010000 0x100>; + interrupts = <9 4 6>; + dmas = <&dma0 1 7>, + <&dma0 1 8>; + dma-names = "tx", "rx"; }; diff --git a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt index e7f8d7ed47eb..6a983c1d87cd 100644 --- a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt +++ b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt @@ -56,20 +56,20 @@ Example for an AM33xx board: nand-bus-width = <16>; ti,nand-ecc-opt = "bch8"; - gpmc,sync-clk = <0>; - gpmc,cs-on = <0>; - gpmc,cs-rd-off = <44>; - gpmc,cs-wr-off = <44>; - gpmc,adv-on = <6>; - gpmc,adv-rd-off = <34>; - gpmc,adv-wr-off = <44>; - gpmc,we-off = <40>; - gpmc,oe-off = <54>; - gpmc,access = <64>; - gpmc,rd-cycle = <82>; - gpmc,wr-cycle = <82>; - gpmc,wr-access = <40>; - gpmc,wr-data-mux-bus = <0>; + gpmc,sync-clk-ps = <0>; + gpmc,cs-on-ns = <0>; + gpmc,cs-rd-off-ns = <44>; + gpmc,cs-wr-off-ns = <44>; + gpmc,adv-on-ns = <6>; + gpmc,adv-rd-off-ns = <34>; + gpmc,adv-wr-off-ns = <44>; + gpmc,we-off-ns = <40>; + gpmc,oe-off-ns = <54>; + gpmc,access-ns = <64>; + gpmc,rd-cycle-ns = <82>; + gpmc,wr-cycle-ns = <82>; + gpmc,wr-access-ns = <40>; + gpmc,wr-data-mux-bus-ns = <0>; #address-cells = <1>; #size-cells = <1>; diff --git a/Documentation/devicetree/bindings/mtd/partition.txt b/Documentation/devicetree/bindings/mtd/partition.txt index 6e1f61f1e789..9315ac96b49b 100644 --- a/Documentation/devicetree/bindings/mtd/partition.txt +++ b/Documentation/devicetree/bindings/mtd/partition.txt @@ -5,8 +5,12 @@ on platforms which have strong conventions about which portions of a flash are used for what purposes, but which don't use an on-flash partition table such as RedBoot. -#address-cells & #size-cells must both be present in the mtd device and be -equal to 1. +#address-cells & #size-cells must both be present in the mtd device. There are +two valid values for both: +<1>: for partitions that require a single 32-bit cell to represent their + size/address (aka the value is below 4 GiB) +<2>: for partitions that require two 32-bit cells to represent their + size/address (aka the value is 4 GiB or greater). Required properties: - reg : The partition's offset and size within the mtd bank. @@ -36,3 +40,31 @@ flash@0 { reg = <0x0100000 0x200000>; }; }; + +flash@1 { + #address-cells = <1>; + #size-cells = <2>; + + /* a 4 GiB partition */ + partition@0 { + label = "filesystem"; + reg = <0x00000000 0x1 0x00000000>; + }; +}; + +flash@2 { + #address-cells = <2>; + #size-cells = <2>; + + /* an 8 GiB partition */ + partition@0 { + label = "filesystem #1"; + reg = <0x0 0x00000000 0x2 0x00000000>; + }; + + /* a 4 GiB partition */ + partition@200000000 { + label = "filesystem #2"; + reg = <0x2 0x00000000 0x1 0x00000000>; + }; +}; diff --git a/Documentation/devicetree/bindings/net/gpmc-eth.txt b/Documentation/devicetree/bindings/net/gpmc-eth.txt index 24cb4e46f675..ace4a64b3695 100644 --- a/Documentation/devicetree/bindings/net/gpmc-eth.txt +++ b/Documentation/devicetree/bindings/net/gpmc-eth.txt @@ -26,16 +26,16 @@ Required properties: - bank-width: Address width of the device in bytes. GPMC supports 8-bit and 16-bit devices and so must be either 1 or 2 bytes. - compatible: Compatible string property for the ethernet child device. -- gpmc,cs-on: Chip-select assertion time -- gpmc,cs-rd-off: Chip-select de-assertion time for reads -- gpmc,cs-wr-off: Chip-select de-assertion time for writes -- gpmc,oe-on: Output-enable assertion time -- gpmc,oe-off Output-enable de-assertion time -- gpmc,we-on: Write-enable assertion time -- gpmc,we-off: Write-enable de-assertion time -- gpmc,access: Start cycle to first data capture (read access) -- gpmc,rd-cycle: Total read cycle time -- gpmc,wr-cycle: Total write cycle time +- gpmc,cs-on-ns: Chip-select assertion time +- gpmc,cs-rd-off-ns: Chip-select de-assertion time for reads +- gpmc,cs-wr-off-ns: Chip-select de-assertion time for writes +- gpmc,oe-on-ns: Output-enable assertion time +- gpmc,oe-off-ns: Output-enable de-assertion time +- gpmc,we-on-ns: Write-enable assertion time +- gpmc,we-off-ns: Write-enable de-assertion time +- gpmc,access-ns: Start cycle to first data capture (read access) +- gpmc,rd-cycle-ns: Total read cycle time +- gpmc,wr-cycle-ns: Total write cycle time - reg: Chip-select, base address (relative to chip-select) and size of the memory mapped for the device. Note that base address will be typically 0 as this @@ -65,24 +65,24 @@ gpmc: gpmc@6e000000 { bank-width = <2>; gpmc,mux-add-data; - gpmc,cs-on = <0>; - gpmc,cs-rd-off = <186>; - gpmc,cs-wr-off = <186>; - gpmc,adv-on = <12>; - gpmc,adv-rd-off = <48>; - gpmc,adv-wr-off = <48>; - gpmc,oe-on = <54>; - gpmc,oe-off = <168>; - gpmc,we-on = <54>; - gpmc,we-off = <168>; - gpmc,rd-cycle = <186>; - gpmc,wr-cycle = <186>; - gpmc,access = <114>; - gpmc,page-burst-access = <6>; - gpmc,bus-turnaround = <12>; - gpmc,cycle2cycle-delay = <18>; - gpmc,wr-data-mux-bus = <90>; - gpmc,wr-access = <186>; + gpmc,cs-on-ns = <0>; + gpmc,cs-rd-off-ns = <186>; + gpmc,cs-wr-off-ns = <186>; + gpmc,adv-on-ns = <12>; + gpmc,adv-rd-off-ns = <48>; + gpmc,adv-wr-off-ns = <48>; + gpmc,oe-on-ns = <54>; + gpmc,oe-off-ns = <168>; + gpmc,we-on-ns = <54>; + gpmc,we-off-ns = <168>; + gpmc,rd-cycle-ns = <186>; + gpmc,wr-cycle-ns = <186>; + gpmc,access-ns = <114>; + gpmc,page-burst-access-ns = <6>; + gpmc,bus-turnaround-ns = <12>; + gpmc,cycle2cycle-delay-ns = <18>; + gpmc,wr-data-mux-bus-ns = <90>; + gpmc,wr-access-ns = <186>; gpmc,cycle2cycle-samecsen; gpmc,cycle2cycle-diffcsen; diff --git a/Documentation/devicetree/bindings/thermal/armada-thermal.txt b/Documentation/devicetree/bindings/thermal/armada-thermal.txt new file mode 100644 index 000000000000..fff93d5f92de --- /dev/null +++ b/Documentation/devicetree/bindings/thermal/armada-thermal.txt @@ -0,0 +1,22 @@ +* Marvell Armada 370/XP thermal management + +Required properties: + +- compatible: Should be set to one of the following: + marvell,armada370-thermal + marvell,armadaxp-thermal + +- reg: Device's register space. + Two entries are expected, see the examples below. + The first one is required for the sensor register; + the second one is required for the control register + to be used for sensor initialization (a.k.a. calibration). + +Example: + + thermal@d0018300 { + compatible = "marvell,armada370-thermal"; + reg = <0xd0018300 0x4 + 0xd0018304 0x4>; + status = "okay"; + }; diff --git a/Documentation/dmatest.txt b/Documentation/dmatest.txt new file mode 100644 index 000000000000..279ac0a8c5b1 --- /dev/null +++ b/Documentation/dmatest.txt @@ -0,0 +1,81 @@ + DMA Test Guide + ============== + + Andy Shevchenko <andriy.shevchenko@linux.intel.com> + +This small document introduces how to test DMA drivers using dmatest module. + + Part 1 - How to build the test module + +The menuconfig contains an option that could be found by following path: + Device Drivers -> DMA Engine support -> DMA Test client + +In the configuration file the option called CONFIG_DMATEST. The dmatest could +be built as module or inside kernel. Let's consider those cases. + + Part 2 - When dmatest is built as a module... + +After mounting debugfs and loading the module, the /sys/kernel/debug/dmatest +folder with nodes will be created. They are the same as module parameters with +addition of the 'run' node that controls run and stop phases of the test. + +Note that in this case test will not run on load automatically. + +Example of usage: + % echo dma0chan0 > /sys/kernel/debug/dmatest/channel + % echo 2000 > /sys/kernel/debug/dmatest/timeout + % echo 1 > /sys/kernel/debug/dmatest/iterations + % echo 1 > /sys/kernel/debug/dmatest/run + +Hint: available channel list could be extracted by running the following +command: + % ls -1 /sys/class/dma/ + +After a while you will start to get messages about current status or error like +in the original code. + +Note that running a new test will stop any in progress test. + +The following command should return actual state of the test. + % cat /sys/kernel/debug/dmatest/run + +To wait for test done the user may perform a busy loop that checks the state. + + % while [ $(cat /sys/kernel/debug/dmatest/run) = "Y" ] + > do + > echo -n "." + > sleep 1 + > done + > echo + + Part 3 - When built-in in the kernel... + +The module parameters that is supplied to the kernel command line will be used +for the first performed test. After user gets a control, the test could be +interrupted or re-run with same or different parameters. For the details see +the above section "Part 2 - When dmatest is built as a module..." + +In both cases the module parameters are used as initial values for the test case. +You always could check them at run-time by running + % grep -H . /sys/module/dmatest/parameters/* + + Part 4 - Gathering the test results + +The module provides a storage for the test results in the memory. The gathered +data could be used after test is done. + +The special file 'results' in the debugfs represents gathered data of the in +progress test. The messages collected are printed to the kernel log as well. + +Example of output: + % cat /sys/kernel/debug/dmatest/results + dma0chan0-copy0: #1: No errors with src_off=0x7bf dst_off=0x8ad len=0x3fea (0) + +The message format is unified across the different types of errors. A number in +the parens represents additional information, e.g. error code, error counter, +or status. + +Comparison between buffers is stored to the dedicated structure. + +Note that the verify result is now accessible only via file 'results' in the +debugfs. diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index dcf338e62b71..bd3c56c67380 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -146,7 +146,7 @@ USAGE Format options -------------- --l [label] : Give a volume label, up to 256 unicode name. +-l [label] : Give a volume label, up to 512 unicode name. -a [0 or 1] : Split start location of each area for heap-based allocation. 1 is set by default, which performs this. -o [int] : Set overprovision ratio in percent over volume size. @@ -156,6 +156,8 @@ Format options -z [int] : Set the number of sections per zone. 1 is set by default. -e [str] : Set basic extension list. e.g. "mp3,gif,mov" +-t [0 or 1] : Disable discard command or not. + 1 is set by default, which conducts discard. ================================================================================ DESIGN diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index 77a1d11af723..6f83fa965b4b 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -72,11 +72,11 @@ in this document, but drivers acting as clients to the GPIO interface must not care how it's implemented.) That said, if the convention is supported on their platform, drivers should -use it when possible. Platforms must declare GENERIC_GPIO support in their -Kconfig (boolean true), and provide an <asm/gpio.h> file. Drivers that can't -work without standard GPIO calls should have Kconfig entries which depend -on GENERIC_GPIO. The GPIO calls are available, either as "real code" or as -optimized-away stubs, when drivers use the include file: +use it when possible. Platforms must select ARCH_REQUIRE_GPIOLIB or +ARCH_WANT_OPTIONAL_GPIOLIB in their Kconfig. Drivers that can't work without +standard GPIO calls should have Kconfig entries which depend on GPIOLIB. The +GPIO calls are available, either as "real code" or as optimized-away stubs, +when drivers use the include file: #include <linux/gpio.h> diff --git a/Documentation/thermal/exynos_thermal_emulation b/Documentation/thermal/exynos_thermal_emulation index b73bbfb697bb..36a3e79c1203 100644 --- a/Documentation/thermal/exynos_thermal_emulation +++ b/Documentation/thermal/exynos_thermal_emulation @@ -13,11 +13,11 @@ Thermal emulation mode supports software debug for TMU's operation. User can set manually with software code and TMU will read current temperature from user value not from sensor's value. -Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support in available. -When it's enabled, sysfs node will be created under -/sys/bus/platform/devices/'exynos device name'/ with name of 'emulation'. +Enabling CONFIG_THERMAL_EMULATION option will make this support available. +When it's enabled, sysfs node will be created as +/sys/devices/virtual/thermal/thermal_zone'zone id'/emul_temp. -The sysfs node, 'emulation', will contain value 0 for the initial state. When you input any +The sysfs node, 'emul_node', will contain value 0 for the initial state. When you input any temperature you want to update to sysfs node, it automatically enable emulation mode and current temperature will be changed into it. (Exynos also supports user changable delay time which would be used to delay of diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt index 6859661c9d31..a71bd5b90fe8 100644 --- a/Documentation/thermal/sysfs-api.txt +++ b/Documentation/thermal/sysfs-api.txt @@ -31,15 +31,17 @@ temperature) and throttle appropriate devices. 1. thermal sysfs driver interface functions 1.1 thermal zone device interface -1.1.1 struct thermal_zone_device *thermal_zone_device_register(char *name, +1.1.1 struct thermal_zone_device *thermal_zone_device_register(char *type, int trips, int mask, void *devdata, - struct thermal_zone_device_ops *ops) + struct thermal_zone_device_ops *ops, + const struct thermal_zone_params *tzp, + int passive_delay, int polling_delay)) This interface function adds a new thermal zone device (sensor) to /sys/class/thermal folder as thermal_zone[0-*]. It tries to bind all the thermal cooling devices registered at the same time. - name: the thermal zone name. + type: the thermal zone type. trips: the total number of trip points this thermal zone supports. mask: Bit string: If 'n'th bit is set, then trip point 'n' is writeable. devdata: device private data @@ -57,6 +59,12 @@ temperature) and throttle appropriate devices. will be fired. .set_emul_temp: set the emulation temperature which helps in debugging different threshold temperature points. + tzp: thermal zone platform parameters. + passive_delay: number of milliseconds to wait between polls when + performing passive cooling. + polling_delay: number of milliseconds to wait between polls when checking + whether trip points have been crossed (0 for interrupt driven systems). + 1.1.2 void thermal_zone_device_unregister(struct thermal_zone_device *tz) @@ -265,6 +273,10 @@ emul_temp Unit: millidegree Celsius WO, Optional + WARNING: Be careful while enabling this option on production systems, + because userland can easily disable the thermal policy by simply + flooding this sysfs node with low temperature values. + ***************************** * Cooling device attributes * ***************************** @@ -363,7 +375,7 @@ This function returns the thermal_instance corresponding to a given {thermal_zone, cooling_device, trip_point} combination. Returns NULL if such an instance does not exist. -5.3:notify_thermal_framework: +5.3:thermal_notify_framework: This function handles the trip events from sensor drivers. It starts throttling the cooling devices according to the policy configured. For CRITICAL and HOT trip points, this notifies the respective drivers, @@ -375,11 +387,3 @@ platform data is provided, this uses the step_wise throttling policy. This function serves as an arbitrator to set the state of a cooling device. It sets the cooling device to the deepest cooling state if possible. - -5.5:thermal_register_governor: -This function lets the various thermal governors to register themselves -with the Thermal framework. At run time, depending on a zone's platform -data, a particular governor is used for throttling. - -5.6:thermal_unregister_governor: -This function unregisters a governor from the thermal framework. diff --git a/Documentation/zh_CN/gpio.txt b/Documentation/zh_CN/gpio.txt index 4fa7b4e6f856..d5b8f01833f4 100644 --- a/Documentation/zh_CN/gpio.txt +++ b/Documentation/zh_CN/gpio.txt @@ -84,10 +84,10 @@ GPIO 公约 控制器的抽象函数来实现它。(有一些可选的代码能支持这种策略的实现,本文档 后面会介绍,但作为 GPIO 接口的客户端驱动程序必须与它的实现无关。) -也就是说,如果在他们的平台上支持这个公约,驱动应尽可能的使用它。平台 -必须在 Kconfig 中声明对 GENERIC_GPIO的支持 (布尔型 true),并提供 -一个 <asm/gpio.h> 文件。那些调用标准 GPIO 函数的驱动应该在 Kconfig -入口中声明依赖GENERIC_GPIO。当驱动包含文件: +也就是说,如果在他们的平台上支持这个公约,驱动应尽可能的使用它。同时,平台 +必须在 Kconfig 中选择 ARCH_REQUIRE_GPIOLIB 或者 ARCH_WANT_OPTIONAL_GPIOLIB +选项。那些调用标准 GPIO 函数的驱动应该在 Kconfig 入口中声明依赖GENERIC_GPIO。 +当驱动包含文件: #include <linux/gpio.h> |