Age | Commit message (Collapse) | Author |
|
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
By default the atmel_serial driver in RS485 mode disables receiving data until
all data in the send buffer has been sent. This flag allows to receive data
even whilst sending data.
Signed-off-by: Bernhard Roth <br@pwrnet.de>
Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This reverts commit 6b1a98d1c4851235d9b6764b3f7b7db7909fc760.
It causes a build error that needs to be resolved differently.
Cc: Alan Cox <alan@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
We can have the NFC core layer allocating the tx head and tail
room for the drivers and avoid 1 or more SKBs copy on write on
the Tx path.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Allow userspace to set NL80211_MESHCONF_GATE_ANNOUNCEMENTS attribute,
which will advertise this mesh node as being a mesh gate.
NL80211_HWMP_ROOTMODE must be set or this will do nothing.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Allow userspace to set Root Announcement Interval for our mesh
interface. Also, RANN interval is now in proper units of TUs.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
In this implementation, a mesh gate is a root node with a certain bit
set in its RANN flags. The mpath to this root node is marked as a path
to a gate, and added to our list of known gates for this if_mesh. Once a
path discovery process fails, we forward the unresolved frames to a
known gate. Thanks to Luis Rodriguez for refactoring and bug fix help.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
fuse: mark pages accessed when written to
fuse: delete dead .write_begin and .write_end aops
fuse: fix flock
fuse: fix non-ANSI void function notation
|
|
Cleaning up the code a little bit. attempt_plug_merge() traverses the plug
list anyway, we can do the request counting there, so stack size is reduced
a little bit.
The motivation here is I suspect if we should count the requests for each
queue (task could handle multiple disks in the meantime), but my test doesn't
show it's worthy doing. If somebody proves we should do it, below change
will make that more easier.
Signed-off-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
"4-finger scroll" is a gesture supported by some applications and
operating systems.
"Resting thumb" is when a clickpad user rests a finger (e.g., a
thumb), in a "click zone" (typically the bottom of the touchpad) in
anticipation of click+move=select gestures.
Thus, "4-finger scroll + resting thumb" is a 5-finger gesture.
To allow userspace to detect this gesture, we send BTN_TOOL_QUINTTAP.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Acked-by: Chase Douglas <chase.douglas@canonical.com>
Acked-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
|
|
The Synopsys DesignWare 8250 is an 8250 that has an extra interrupt that
gets raised when writing to the LCR when busy. To handle this we need
special serial_out, serial_in and handle_irq methods. Add a new
function serial8250_use_designware_io() that configures a uart_port with
these accessors.
Cc: Alan Cox <alan@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Now that platforms can override the port IRQ handler and the only user
of these UPIO modes has been converted over, kill off UPIO_DWAPB and
UPIO_DWAPB32.
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Some ports (e.g. Synopsys DesignWare 8250) have special requirements for
handling the interrupts. Allow these platforms to specify their own
interrupt handler that will override the default.
serial8250_handle_irq() is provided so that platforms can extend the IRQ
handler rather than completely replacing it.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Some serial ports may have unusal requirements for interrupt handling
(e.g. the Synopsys DesignWare 8250-alike port and it's busy detect
interrupt). Add a .handle_irq callback that can be used for platforms
to override the interrupt behaviour in a similar fashion to the
.serial_out and .serial_in callbacks.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
We used it really only serial and ami_serial. The rest of the
callsites were BUG/WARN_ONs to check if BTM is held. Now that we
pruned tty_locked from both of the real users, we can get rid of
tty_lock along with __big_tty_mutex_owner.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
tty_wakeup can be called from any context. So there is no need to have
an extra tasklet for calling that. Hence save some space and remove
the tasklet completely.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
tty_operations->remove is normally called like:
queue_release_one_tty
->tty_shutdown
->tty_driver_remove_tty
->tty_operations->remove
However tty_shutdown() is called from queue_release_one_tty() only if
tty_operations->shutdown is NULL. But for pty, it is not.
pty_unix98_shutdown() is used there as ->shutdown.
So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
called. This results in invalid pty_count. I.e. what can be seen in
/proc/sys/kernel/pty/nr.
I see this was already reported at:
https://lkml.org/lkml/2009/11/5/370
But it was not fixed since then.
This patch is kind of a hackish way. The problem lies in ->install. We
allocate there another tty (so-called tty->link). So ->install is
called once, but ->remove twice, for both tty and tty->link. The fix
here is to count both tty and tty->link and divide the count by 2 for
user.
And to have ->remove called, let's make tty_driver_remove_tty() global
and call that from pty_unix98_shutdown() (tty_operations->shutdown).
While at it, let's document that when ->shutdown is defined,
tty_shutdown() is not called.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Now ${LINUX}/drivers/usb/* can use usb_endpoint_maxp(desc) to get maximum packet size
instead of le16_to_cpu(desc->wMaxPacketSize).
This patch fix it up
Cc: Armin Fuerst <fuerst@in.tum.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Johannes Erdfelt <johannes@erdfelt.com>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Oliver Neukum <oliver@neukum.name>
Cc: David Kubicek <dave@awk.cz>
Cc: Johan Hovold <jhovold@gmail.com>
Cc: Brad Hards <bhards@bigpond.net.au>
Acked-by: Felipe Balbi <balbi@ti.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Dahlmann <dahlmann.thomas@arcor.de>
Cc: David Brownell <david-b@pacbell.net>
Cc: David Lopo <dlopo@chipidea.mips.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Michal Nazarewicz <m.nazarewicz@samsung.com>
Cc: Xie Xiaobo <X.Xie@freescale.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Jiang Bo <tanya.jiang@freescale.com>
Cc: Yuan-hsin Chen <yhchen@faraday-tech.com>
Cc: Darius Augulis <augulis.darius@gmail.com>
Cc: Xiaochen Shen <xiaochen.shen@intel.com>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: OKI SEMICONDUCTOR, <toshiharu-linux@dsn.okisemi.com>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Ben Dooks <ben@simtec.co.uk>
Cc: Thomas Abraham <thomas.ab@samsung.com>
Cc: Herbert Pötzl <herbert@13thfloor.at>
Cc: Arnaud Patard <arnaud.patard@rtp-net.org>
Cc: Roman Weissgaerber <weissg@vienna.at>
Acked-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Tony Olech <tony.olech@elandigitalsystems.com>
Cc: Florian Floe Echtler <echtler@fs.tum.de>
Cc: Christian Lucht <lucht@codemercs.com>
Cc: Juergen Stuber <starblue@sourceforge.net>
Cc: Georges Toth <g.toth@e-biz.lu>
Cc: Bill Ryder <bryder@sgi.com>
Cc: Kuba Ober <kuba@mareimbrium.org>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule,
and lave REQ_META purely for marking requests as metadata in blktrace.
All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Replace all occurnanced of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Previously, netif_dbg() was using dynamic_dev_dbg() to perform
the underlying printk. Fix it to use __netdev_printk(), instead.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Previously, if dynamic debug was enabled netdev_dbg() was using
dynamic_dev_dbg() to print out the underlying msg. Fix this by making
sure netdev_dbg() uses __netdev_printk().
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Remove no longer used dynamic debug control variables. The
definitions were removed a while ago, but we forgot to clean
up the extern references.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Unlike dynamic_pr_debug, dynamic uses of dev_dbg can not
currently add task_pid/KBUILD_MODNAME/__func__/__LINE__
to selected debug output.
Add a new function similar to dynamic_pr_debug to
optionally emit these prefixes.
Cc: Aloisio Almeida <aloisio.almeida@openbossa.org>
Noticed-by: Aloisio Almeida <aloisio.almeida@openbossa.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
bare get/put_page on SKB pages fragments and to isolate users from subsequent
changes to the skb_frag_t data structure.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
not to confuse with Table 9-7 in USB 2.0 spec
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Certain platform specific or Host-WiLink Interface specific actions would be
required to be taken when the chip is being enabled and after the chip is
disabled such as configuration of the mux modes for the GPIO of host connected
to the nshutdown of the chip or relinquishing UART after the chip is disabled.
Similar actions can also be taken when the chip is in deep sleep or when the
chip is awake. Performance enhancements such as configuring the host to run
faster when chip is awake and slower when chip is asleep can also be made
here.
Signed-off-by: Pavan Savoy <pavan_savoy@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Make mesh path selection frames Mesh Action category, remove outdated
Mesh Path Selection category and defines, use updated reason codes, add
mesh_action_is_path_sel for readability, and update/correct path
selection IEs.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
This patch updates the mesh peering frames to the format specified in
the recently ratified 802.11s standard. Several changes took place to
make this happen:
- Change RX path to handle new self-protected frames
- Add new Peering management IE
- Remove old Peer Link IE
- Remove old plink_action field in ieee80211_mgmt header
These changes by themselves would either break peering, or work by
coincidence, so squash them all into this patch.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
802.11s introduces a new action frame category, add action codes as well
as an entry in ieee80211_mgmt.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
We need to disable ext. PA lines for reading SPROM. It's disabled by
default, but this patch allows using bcma after loading wl, which leaves
workaround enabled.
Cc: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
While at it also fix the indention of the other IEEE80211_BAR_CTRL_ defines.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
drivers/staging/ath6kl/miscdrv/ar3kps/ar3kpsparser.c
drivers/staging/ath6kl/os/linux/ar6000_drv.c
|
|
|
|
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
Factor out the register read/write code to use the register map API. We
still need some wm831x specific code and locking in place to check that
the user key is handled correctly but only on the write side, reads are
not affected by the key.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
|
|
It is useful for the register cache code to be able to specify the
default values for the device registers. The major use is when restoring
the register cache after suspend, knowing the register defaults allows
us to skip registers that are at their default values when we resume which
can be a substantial win on larger modern devices. For some devices
(mostly older ones) the hardware does not support readback so the only way we
can know the values is from code and so initializing the cache with default
values makes it much easier for drivers work with read/modify/write
updates.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
|
|
|
|
This patch (as1482) adds a macro for testing whether or not a
pm_message value represents an autosuspend or autoresume (i.e., a
runtime PM) event. Encapsulating this notion seems preferable to
open-coding the test all over the place.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
Revert "cfq: Remove special treatment for metadata rqs."
block: fix flush machinery for stacking drivers with differring flush flags
block: improve rq_affinity placement
blktrace: add FLUSH/FUA support
Move some REQ flags to the common bio/request area
allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
xen/blkback: Make description more obvious.
cfq-iosched: Add documentation about idling
block: Make rq_affinity = 1 work as expected
block: swim3: fix unterminated of_device_id table
block/genhd.c: remove useless cast in diskstats_show()
drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
bsg-lib: add module.h include
cfq-iosched: Reduce linked group count upon group destruction
blk-throttle: correctly determine sync bio
loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
loop: add management interface for on-demand device allocation
loop: replace linked list of allocated devices with an idr index
...
|
|
Use NUMA aware allocations to reduce latencies and increase throughput.
sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: Greg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We currently use a bit in fl_flags to record whether a lease is being
broken, and set fl_type to the type (RDLCK or UNLCK) that it will
eventually have. This means that once the lease break starts, we forget
what the lease's type *used* to be. Breaking a read lease will then
result in blocking read opens, even though there's no conflict--because
the lease type is now F_UNLCK and we can no longer tell whether it was
previously a read or write lease.
So, instead keep fl_type as the original type (the type which we
enforce), and keep track of whether we're unlocking or merely
downgrading by replacing the single FL_INPROGRESS flag by
FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags.
To get this right we also need to track separate downgrade and break
times, to handle the case where a write-leased file gets conflicting
opens first for read, then later for write.
(I first considered just eliminating the downgrade behavior
completely--nfsv4 doesn't need it, and nobody as far as I can tell
actually uses it currently--but Jeremy Allison tells me that Windows
oplocks do behave this way, so Samba will probably use this some day.)
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
F_INPROGRESS isn't exposed to userspace. To me it makes more sense in
fl_flags....
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: OF: Don't crash when bridge parent is NULL.
PCI: export pcie_bus_configure_settings symbol
PCI: code and comments cleanup
PCI: make cardbus-bridge resources optional
PCI: make SRIOV resources optional
PCI : ability to relocate assigned pci-resources
PCI: honor child buses add_size in hot plug configuration
PCI: Set PCI-E Max Payload Size on fabric
|
|
Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
partial pages. The partial page list is used in slab_free() to avoid
per node lock taking.
In __slab_alloc() we can then take multiple partial pages off the per
node partial list in one go reducing node lock pressure.
We can also use the per cpu partial list in slab_alloc() to avoid scanning
partial lists for pages with free objects.
The main effect of a per cpu partial list is that the per node list_lock
is taken for batches of partial pages instead of individual ones.
Potential future enhancements:
1. The pickup from the partial list could be perhaps be done without disabling
interrupts with some work. The free path already puts the page into the
per cpu partial list without disabling interrupts.
2. __slab_free() may have some code paths that could use optimization.
Performance:
Before After
./hackbench 100 process 200000
Time: 1953.047 1564.614
./hackbench 100 process 20000
Time: 207.176 156.940
./hackbench 100 process 20000
Time: 204.468 156.940
./hackbench 100 process 20000
Time: 204.879 158.772
./hackbench 10 process 20000
Time: 20.153 15.853
./hackbench 10 process 20000
Time: 20.153 15.986
./hackbench 10 process 20000
Time: 19.363 16.111
./hackbench 1 process 20000
Time: 2.518 2.307
./hackbench 1 process 20000
Time: 2.258 2.339
./hackbench 1 process 20000
Time: 2.864 2.163
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
|
|
Revert the pass-good area introduced in ffd1f609ab10 ("writeback:
introduce max-pause and pass-good dirty limits") and make the max-pause
area smaller and safe.
This fixes ~30% performance regression in the ext3 data=writeback
fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.
Using deadline scheduler also has a regression, but not that big as CFQ,
so this suggests we have some write starvation.
The test logs show that
- the disks are sometimes under utilized
- global dirty pages sometimes rush high to the pass-good area for
several hundred seconds, while in the mean time some bdi dirty pages
drop to very low value (bdi_dirty << bdi_thresh). Then suddenly the
global dirty pages dropped under global dirty threshold and bdi_dirty
rush very high (for example, 2 times higher than bdi_thresh). During
which time balance_dirty_pages() is not called at all.
So the problems are
1) The random writes progress so slow that they break the assumption of
the max-pause logic that "8 pages per 200ms is typically more than
enough to curb heavy dirtiers".
2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh)
and then (bdi_thresh >> bdi_dirty) for others.
3) The higher max-pause/pass-good thresholds somehow leads to the bad
swing of dirty pages.
The fix is to allow the task to slightly dirty over task_bdi_thresh, but
no way to exceed bdi_dirty and/or global dirty_thresh.
Tests show that it fixed the JBOD regression completely (both behavior
and performance), while still being able to cut down large pause times
in balance_dirty_pages() for single-disk cases.
Reported-by: Li Shaohua <shaohua.li@intel.com>
Tested-by: Li Shaohua <shaohua.li@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|