summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-09-28virtio: remove CONFIG_VIRTIO_RINGRusty Russell
Everyone who selects VIRTIO is also made to select VIRTIO_RING; just make them synonymous, since we removed the indirection layer some time ago. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio: add help to CONFIG_VIRTIO option.Rusty Russell
Trying to enable a virtio driver (eg CONFIG_VIRTIO_BLK) is painful because it depends on CONFIG_VIRTIO. CONFIG_VIRTIO doesn't tell you how to turn it on (it's selected from anything which provides a virtio bus). This patch at least adds some documentation, visible in menuconfig, as a hint. Reported-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio: support reserved vqsMichael S. Tsirkin
virtio network device multiqueue support reserves vq 3 for future use (useful both for future extensions and to make it pretty - this way receive vqs have even and transmit - odd numbers). Make it possible to skip initialization for specific vq numbers by specifying NULL for name. Document this usage as well as (existing) NULL callback. Drivers using this not coded up yet, so I simply tested with virtio-pci and verified that this patch does not break existing drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio: introduce an API to set affinity for a virtqueueJason Wang
Sometimes, virtio device need to configure irq affinity hint to maximize the performance. Instead of just exposing the irq of a virtqueue, this patch introduce an API to set the affinity for a virtqueue. The api is best-effort, the affinity hint may not be set as expected due to platform support, irq sharing or irq type. Currently, only pci method were implemented and we set the affinity according to: - if device uses INTX, we just ignore the request - if device has per vq vector, we force the affinity hint - if the virtqueues share MSI, make the affinity OR over all affinities requested Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio-ring: move queue_index to vring_virtqueueJason Wang
Instead of storing the queue index in transport-specific virtio structs, this patch moves them to vring_virtqueue and introduces an helper to get the value. This lets drivers simplify their management and tracing of virtqueues. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio_balloon: not EXPERIMENTAL any more.Rusty Russell
It is not experimental in any vaguely-sane sense. Reported-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio-balloon: dependency fixMichael S. Tsirkin
Devices should depend on virtio, not select it. It's supposed to be selected by the particular driver, e.g. VIRTIO_PCI. Make balloon depend on VIRTIO and EXPERIMENTAL (to match description). Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio-blk: fix NULL checking in virtblk_alloc_req()Dan Carpenter
Smatch complains about the inconsistent NULL checking here. Fix it to return NULL on failure. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed accidental deletion)
2012-09-28virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio pathAsias He
We need to support both REQ_FLUSH and REQ_FUA for bio based path since it does not get the sequencing of REQ_FUA into REQ_FLUSH that request based drivers can request. REQ_FLUSH is emulated by: A) If the bio has no data to write: 1. Send VIRTIO_BLK_T_FLUSH to device, 2. In the flush I/O completion handler, finish the bio B) If the bio has data to write: 1. Send VIRTIO_BLK_T_FLUSH to device 2. In the flush I/O completion handler, send the actual write data to device 3. In the write I/O completion handler, finish the bio REQ_FUA is emulated by: 1. Send the actual write data to device 2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to device 3. In the flush I/O completion handler, finish the bio Changes in v7: - Using vbr->flags to trace request type - Dropped unnecessary struct virtio_blk *vblk parameter - Reuse struct virtblk_req in bio done function Cahnges in v6: - Reworked REQ_FLUSH and REQ_FUA emulatation order Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shli@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio-blk: Add bio-based IO path for virtio-blkAsias He
This patch introduces bio-based IO path for virtio-blk. Compared to request-based IO path, bio-based IO path uses driver provided ->make_request_fn() method to bypasses the IO scheduler. It handles the bio to device directly without allocating a request in block layer. This reduces the IO path in guest kernel to achieve high IOPS and lower latency. The downside is that guest can not use the IO scheduler to merge and sort requests. However, this is not a big problem if the backend disk in host side uses faster disk device. When the bio-based IO path is not enabled, virtio-blk still uses the original request-based IO path, no performance difference is observed. Using a slow device e.g. normal SATA disk, the bio-based IO path for sequential read and write are slower than req-based IO path due to lack of merge in guest kernel. So we make the bio-based path optional. Performance evaluation: ----------------------------- 1) Fio test is performed in a 8 vcpu guest with ramdisk based guest using kvm tool. Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : 28%, 24%, 21%, 16% Latency improvement: 32%, 17%, 21%, 16% Long version: With bio-based IO path: seq-read : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec rand-write: io=3095.7MB, bw=96198KB/s, iops=192396 , runt= 32952msec clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30 clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35 clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39 clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45 cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954 cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137 cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648 cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375 With request-based IO path: seq-read : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49 clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15 clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27 clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68 cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622 cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959 cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082 cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985 2) Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using kvm tool. Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : 11%, 11%, 13%, 10% Latency improvement: 10%, 10%, 12%, 10% Long Version: With bio-based IO path: read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29 clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80 clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07 clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25 cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517 cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604 cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612 cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105 With request-based IO path: read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84 clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64 clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55 clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95 cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641 cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689 cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223 cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409 3) Fio test is performed in a 8 vcpu guest with normal SATA based guest using kvm tool. Short version: With bio-based IO path, sequential read/write, random read/write IOPS boost : -10%, -10%, 4.4%, 0.5% Latency improvement: -12%, -15%, 2.5%, 0.8% Long Version: With bio-based IO path: read : io=124812KB, bw=36537KB/s, iops=9060 , runt= 3416msec write: io=169180KB, bw=24406KB/s, iops=6065 , runt= 6932msec read : io=256200KB, bw=2089.3KB/s, iops=520 , runt=122630msec write: io=257988KB, bw=1545.7KB/s, iops=384 , runt=166910msec clat (msec): min=1 , max=1527 , avg=28.06, stdev=89.54 clat (msec): min=2 , max=344 , avg=41.12, stdev=38.70 clat (msec): min=8 , max=1984 , avg=490.63, stdev=207.28 clat (msec): min=33 , max=4131 , avg=659.19, stdev=304.71 cpu : usr=4.85%, sys=17.15%, ctx=31593, majf=0, minf=7 cpu : usr=3.04%, sys=11.45%, ctx=39377, majf=0, minf=0 cpu : usr=0.47%, sys=1.59%, ctx=262986, majf=0, minf=16 cpu : usr=0.47%, sys=1.46%, ctx=337410, majf=0, minf=0 With request-based IO path: read : io=150120KB, bw=40420KB/s, iops=10037 , runt= 3714msec write: io=194932KB, bw=27029KB/s, iops=6722 , runt= 7212msec read : io=257136KB, bw=2001.1KB/s, iops=498 , runt=128443msec write: io=258276KB, bw=1537.2KB/s, iops=382 , runt=168028msec clat (msec): min=1 , max=1542 , avg=24.84, stdev=32.45 clat (msec): min=3 , max=628 , avg=35.62, stdev=39.71 clat (msec): min=8 , max=2540 , avg=503.28, stdev=236.97 clat (msec): min=41 , max=4398 , avg=653.88, stdev=302.61 cpu : usr=3.91%, sys=15.75%, ctx=26968, majf=0, minf=23 cpu : usr=2.50%, sys=10.56%, ctx=19090, majf=0, minf=0 cpu : usr=0.16%, sys=0.43%, ctx=20159, majf=0, minf=16 cpu : usr=0.18%, sys=0.53%, ctx=81364, majf=0, minf=0 How to use: ----------------------------- Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk use_bio=1' to enable ->make_request_fn() based I/O path. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Tejun Heo <tj@kernel.org> Cc: Shaohua Li <shli@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Asias He <asias@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio: console: fix error handling in init() functionAlexey Khoroshilov
If register_virtio_driver() fails, virtio-ports class is not destroyed. The patch adds error handling of register_virtio_driver(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28tools: Fix pthread flag for Makefile of trace-agent used by virtio-traceYoshihiro YUNOMAE
pthread flag should not be -lpthread but -pthread using gcc. The -lpthread links the external multithread library. On the other hand, the -pthread manages both the gcc's preprocessor and linker to be able to compile with pthread. Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28tools: Add guest trace agent as a user toolYoshihiro YUNOMAE
This patch adds a user tool, "trace agent" for sending trace data of a guest to a Host in low overhead. This agent has the following functions: - splice a page of ring-buffer to read_pipe without memory copying - splice the page from write_pipe to virtio-console without memory copying - write trace data to stdout by using -o option - controlled by start/stop orders from a Host Changes in v2: - Cleanup (change fprintf() to pr_err() and an include guard) Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Acked-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio/console: Allocate scatterlist according to the current pipe sizeMasami Hiramatsu
Allocate scatterlist according to the current pipe size. This allows splicing bigger buffer if the pipe size has been changed by fcntl. Changes in v2: - Just a minor fix for avoiding a confliction with previous patch. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28ftrace: Allow stealing pages from pipe bufferMasami Hiramatsu
Use generic steal operation on pipe buffer to allow stealing ring buffer's read page from pipe buffer. Note that this could reduce the performance of splice on the splice_write side operation without affinity setting. Since the ring buffer's read pages are allocated on the tracing-node, but the splice user does not always execute splice write side operation on the same node. In this case, the page will be accessed from the another node. Thus, it is strongly recommended to assign the splicing thread to corresponding node. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio/console: Wait until the port is ready on spliceMasami Hiramatsu
Wait if the port is not connected or full on splice like as write is doing. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio/console: Add a failback for unstealable pipe bufferMasami Hiramatsu
Add a failback memcpy path for unstealable pipe buffer. If buf->ops->steal() fails, virtio-serial tries to copy the page contents to an allocated page, instead of just failing splice(). Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28virtio/console: Add splice_write supportMasami Hiramatsu
Enable to use splice_write from pipe to virtio-console port. This steals pages from pipe and directly send it to host. Note that this may accelerate only the guest to host path. Changes in v2: - Use GFP_KERNEL instead of GFP_ATOMIC in syscall context function. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28netdev: octeon: fix return value check in octeon_mgmt_init_phy()Wei Yongjun
In case of error, the function of_phy_connect() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28Merge tag 'v3.6-rc7' into nextJames Morris
Linux 3.6-rc7 Requested by David Howells so he can merge his key susbsystem work into my tree with requisite -linus changesets.
2012-09-27Revert "be2net: fix vfs enumeration"David S. Miller
This reverts commit 51af6d7c1f31e0f3d42c87d53657ec7acb6e3462. Breaks the build with CONFIG_PCI_ATS not enabled. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter deltasLen Brown
# turbostat -d 0x34 is useful for printing the number of SMI's within an interval on Nehalem and newer processors. where # turbostat -m 0x34 will simply print out the total SMI count since reset. Suggested-by: Andi Kleen Signed-off-by: Len Brown <len.brown@intel.com>
2012-09-27Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "The three nouveau fixes quiten unneeded dmesg spam that people are seeing and pondering, The udl fix stops it from trying to driver monitors that are too big, where we get a black screen. And a vmware memory alloc problem." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/nvc0/fifo: ignore bits in PFIFO_INTR that aren't set in PFIFO_INTR_EN drm/udl: limit modes to the sku pixel limits. vmwgfx: corruption in vmw_event_fence_action_create() drm/nvc0/ltcg: mask off intr 0x10 drm/nouveau: silence a debug message triggered by newer userspace
2012-09-27Merge tag 'usb-3.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg Kroah-Hartman: "Here are two USB bugfixes for your 3.6-rc7 tree. The OHCI fix has been reported a number of times and is a regression from 3.5, and the patch that causes the regression was on the way to the -stable trees before I was reminded (again) that this fix needed to get to your tree soon. The host controller bugfix was reported in older kernels as being pretty easy to trigger, and has been tested by Red Hat and their customers. Both have been in the usb-next branch in the -next tree for a while now, I just cherry-picked them out to get to you in time for the 3.6 release. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'usb-3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: Fix race condition when removing host controllers USB: ohci-at91: fix null pointer in ohci_hcd_at91_overcurrent_irq
2012-09-27ALSA: snd-usb: fix next_packet_size calls for pause caseDaniel Mack
Also fix the calls to next_packet_size() for the pause case. This was missed in 245baf983 ("ALSA: snd-usb: fix calls to next_packet_size"). Signed-off-by: Daniel Mack <zonque@gmail.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Reported-and-tested-by: Christian Tefzer <ctrefzer@gmx.de> Cc: stable@kernel.org [ Taking directly because Takashi is on vacation - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-27Merge tag 'asoc-3.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound Pull ASoC update from Mark Brown: "One small and obvious driver-specific fix. Takashi is on vacation now so he asked me to send directly, it's a pretty bad bug with low regression risk." * tag 'asoc-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound: ASoC: wm2000: Correct register size
2012-09-27net: struct napi_struct fields reorderingEric Dumazet
Remove two holes on 64bit arches, and put dev_list at the end of napi_struct since its not used in fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27net: use bigger pages in __netdev_alloc_fragEric Dumazet
We currently use percpu order-0 pages in __netdev_alloc_frag to deliver fragments used by __netdev_alloc_skb() Depending on NIC driver and arch being 32 or 64 bit, it allows a page to be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096 Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows : - Better filling of space (the ending hole overhead is less an issue) - Less calls to page allocator or accesses to page->_count - Could allow struct skb_shared_info futures changes without major performance impact. This patch implements a transparent fallback to smaller pages in case of memory pressure. It also uses a standard "struct page_frag" instead of a custom one. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27inetpeer: fix token initializationNicolas Dichtel
When jiffies wraps around (for example, 5 minutes after the boot, see INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be < XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus some icmp packets can be unexpectedly dropped. Fix this case by initializing last_rate to 60 seconds in the past. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27qlcnic: Fix scheduling while atomic bugNarendra K
In the device close path, 'qlcnic_fw_destroy_ctx' and 'qlcnic_poll_rsp' call msleep. But 'qlcnic_fw_destroy_ctx' and 'qlcnic_poll_rsp' are called with 'adapter->tx_clean_lock' spin lock held resulting in scheduling while atomic bug causing the following trace. I observed that the commit 012dc19a45b2b9cc2ebd14aaa401cf782c2abba4 from John Fastabend addresses a similar issue in ixgbevf driver. Adopting the same approach used in the commit, this patch uses mdelay to address the issue. [79884.999115] BUG: scheduling while atomic: ip/30846/0x00000002 [79885.005562] INFO: lockdep is turned off. [79885.009958] Modules linked in: qlcnic fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE bnep bluetooth rfkill ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables dcdbas coretemp kvm_intel kvm iTCO_wdt ixgbe iTCO_vendor_support crc32c_intel ghash_clmulni_intel nfsd microcode sb_edac pcspkr edac_core dca bnx2x shpchp auth_rpcgss nfs_acl lpc_ich mfd_core mdio lockd libcrc32c wmi acpi_pad acpi_power_meter sunrpc uinput sd_mod sr_mod cdrom crc_t10dif ahci libahci libata megaraid_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded: qlcnic] [79885.083608] Pid: 30846, comm: ip Tainted: G W O 3.6.0-rc7+ #1 [79885.090805] Call Trace: [79885.093569] [<ffffffff816764d8>] __schedule_bug+0x68/0x76 [79885.099699] [<ffffffff8168358e>] __schedule+0x99e/0xa00 [79885.105634] [<ffffffff81683929>] schedule+0x29/0x70 [79885.111186] [<ffffffff81680def>] schedule_timeout+0x16f/0x350 [79885.117724] [<ffffffff811afb7a>] ? init_object+0x4a/0x90 [79885.123770] [<ffffffff8107c190>] ? __internal_add_timer+0x140/0x140 [79885.130873] [<ffffffff81680fee>] schedule_timeout_uninterruptible+0x1e/0x20 [79885.138773] [<ffffffff8107e830>] msleep+0x20/0x30 [79885.144159] [<ffffffffa04c7fbf>] qlcnic_issue_cmd+0xef/0x290 [qlcnic] [79885.151478] [<ffffffffa04c8265>] qlcnic_fw_cmd_destroy_rx_ctx+0x55/0x90 [qlcnic] [79885.159868] [<ffffffffa04c92fd>] qlcnic_fw_destroy_ctx+0x2d/0xa0 [qlcnic] [79885.167576] [<ffffffffa04bf2ed>] __qlcnic_down+0x11d/0x180 [qlcnic] [79885.174708] [<ffffffffa04bf6f8>] qlcnic_close+0x18/0x20 [qlcnic] [79885.181547] [<ffffffff8153b4c5>] __dev_close_many+0x95/0xe0 [79885.187899] [<ffffffff8153b548>] __dev_close+0x38/0x50 [79885.193761] [<ffffffff81545101>] __dev_change_flags+0xa1/0x180 [79885.200419] [<ffffffff81545298>] dev_change_flags+0x28/0x70 [79885.206779] [<ffffffff815531b8>] do_setlink+0x378/0xa00 [79885.212731] [<ffffffff81354fe1>] ? nla_parse+0x31/0xe0 [79885.218612] [<ffffffff815558ee>] rtnl_newlink+0x37e/0x560 [79885.224768] [<ffffffff812cfa19>] ? selinux_capable+0x39/0x50 [79885.231217] [<ffffffff812cbf98>] ? security_capable+0x18/0x20 [79885.237765] [<ffffffff81555114>] rtnetlink_rcv_msg+0x114/0x2f0 [79885.244412] [<ffffffff81551f87>] ? rtnl_lock+0x17/0x20 [79885.250280] [<ffffffff81551f87>] ? rtnl_lock+0x17/0x20 [79885.256148] [<ffffffff81555000>] ? __rtnl_unlock+0x20/0x20 [79885.262413] [<ffffffff81570fc1>] netlink_rcv_skb+0xa1/0xb0 [79885.268661] [<ffffffff81551fb5>] rtnetlink_rcv+0x25/0x40 [79885.274727] [<ffffffff815708bd>] netlink_unicast+0x19d/0x220 [79885.281146] [<ffffffff81570c45>] netlink_sendmsg+0x305/0x3f0 [79885.287595] [<ffffffff8152b188>] ? sock_update_classid+0x148/0x2e0 [79885.294650] [<ffffffff81525c2c>] sock_sendmsg+0xbc/0xf0 [79885.300600] [<ffffffff8152600c>] __sys_sendmsg+0x3ac/0x3c0 [79885.306853] [<ffffffff8109be23>] ? up_read+0x23/0x40 [79885.312510] [<ffffffff816896cc>] ? do_page_fault+0x2bc/0x570 [79885.318968] [<ffffffff81191854>] ? sys_brk+0x44/0x150 [79885.324715] [<ffffffff811c458c>] ? fget_light+0x24c/0x520 [79885.330875] [<ffffffff815286f9>] sys_sendmsg+0x49/0x90 [79885.336707] [<ffffffff8168e429>] system_call_fastpath+0x16/0x1b Signed-off-by: Narendra K <narendra_k@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27tcp: Remove unused parameter from tcp_v4_save_optionsChristoph Paasch
struct sock *sk is not used inside tcp_v4_save_options. Thus it can be removed. Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27bnx2: Clean up remaining iounmapNeil Horman
commit c0357e975afdbbedab5c662d19bef865f02adc17 modified bnx2 to switch from using ioremap/iounmap to pci_iomap/pci_iounmap. They missed a spot in the error path of bnx2_init_one though. This patch just cleans that up. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Michael Chan <mcan@broadcom.com> CC: "David S. Miller" <davem@davemloft.net> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27ipv6: gre: remove ip6gre_header_parse()Eric Dumazet
dev_parse_header() callers provide 8 bytes of storage, so it's not possible to store an IPv6 address. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull one more arm-soc bugfix from Olof Johansson: "Here's a bugfix for orion5x. Without this, PCI doesn't initialize properly because of too small coherent pool to cover the allocations needed. A similar fix has already been done on kirkwood." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: Orion5x: Fix too small coherent pool.
2012-09-27Merge branch 'fixes-for-3.6' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull ARM dma-mapping fix from Marek Szyprowski: "This patch fixes a potential memory leak in the ARM dma-mapping code." * 'fixes-for-3.6' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: ARM: dma-mapping: Fix potential memory leak in atomic_pool_init()
2012-09-27Merge tag 'gpio-fixes-v3.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fix from Linus Walleij: "A late GPIO fix: Roland Stigge found a problem in the LPC32xx driver where a callback ignores one of its arguments. It needs to go into stable too so sending this upstream immediately." * tag 'gpio-fixes-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio-lpc32xx: Fix value handling of gpio_direction_output()
2012-09-27Merge tag 'md-3.6-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull two md bugfixes from NeilBrown: "One (missing spinlock init) was only introduced recently. The other has been present as long as raid10 has been supported, so is tagged for -stable." * tag 'md-3.6-fixes' of git://neil.brown.name/md: md/raid10: fix "enough" function for detecting if array is failed. md/raid5: add missing spin_lock_init.
2012-09-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edacLinus Torvalds
Pull EDAC fixes from Mauro Carvalho Chehab: "Three edac fixes at the memory enumeration logic: - i3200_edac: Fixes a regression at the memory rank size, when the memorias are dual-rank; - i5000_edac: Fix a longstanding bug when calculating the memory size: before Kernel 3.6, the memory size were right only with one specific configuration; - sb_edac: Fixes a bug since the initial release of the driver: with 16GB DIMMs, there's an overflow at the memory size, causing the number of pages per dimm (an unsigned value) to have the highest bit equal to 1, effectively mangling the memory size. The third bug can potentially affect the error decoding logic as well." * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: sb_edac: Avoid overflow errors at memory size calculation i5000: Fix the memory size calculation with 2R memories i3200_edac: Fix memory rank size
2012-09-27trivial select_parent documentation fixJ. Bruce Fields
"Search list for X" sounds like you're trying to find X on a list. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-27net: remove sk_init() helperEric Dumazet
It seems sk_init() has no value today and even does strange things : # grep . /proc/sys/net/core/?mem_* /proc/sys/net/core/rmem_default:212992 /proc/sys/net/core/rmem_max:131071 /proc/sys/net/core/wmem_default:212992 /proc/sys/net/core/wmem_max:131071 We can remove it completely. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27pkt_sched: Fix warning false positives.David S. Miller
GCC refuses to recognize that all error control flows do in fact set err to something. Add an explicit initialization to shut it up. net/sched/sch_drr.c: In function ‘drr_enqueue’: net/sched/sch_drr.c:359:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized] net/sched/sch_qfq.c: In function ‘qfq_enqueue’: net/sched/sch_qfq.c:885:11: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27bna: Fix warning false positive.David S. Miller
GCC can't see that in all non-error-return paths we do in fact set *using_dac to something. Add an explicit initialization to remove this warning: drivers/net/ethernet/brocade/bna/bnad.c: In function ‘bnad_pci_probe’: drivers/net/ethernet/brocade/bna/bnad.c:3079:5: warning: ‘using_dac’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/net/ethernet/brocade/bna/bnad.c:3233:7: note: ‘using_dac’ was declared here Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27net: phy: smsc: Implement PHY config_init for LAN87xxMarek Vasut
The LAN8710/LAN8720 chips do have broken the "FlexPWR" smart power-saving capability. Enabling it leads to the PHY not being able to detect Link when cold-started without cable connected. Thus, make sure this is disabled. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Christian Hohnstaedt <chohnstaedt@innominate.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Otavio Salvador <otavio@ossystems.com.br> Acked-by: Otavio Salvador <otavio@ossystems.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27nf_defrag_ipv6: fix oops on module unloadingKonstantin Khlebnikov
fix copy-paste error introduced in linux-next commit "ipv6: add a new namespace for nf_conntrack_reasm" Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Amerigo Wang <amwang@redhat.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27be2net: fix vfs enumerationIvan Vecera
Current VFs enumeration algorithm used in be_find_vfs does not take domain number into the match. The match found in igb/ixgbe is more elegant and safe. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27tunnel: drop packet if ECN present with not-ECTstephen hemminger
Linux tunnels were written before RFC6040 and therefore never implemented the corner case of ECN getting set in the outer header and the inner header not being ready for it. Section 4.2. Default Tunnel Egress Behaviour. o If the inner ECN field is Not-ECT, the decapsulator MUST NOT propagate any other ECN codepoint onwards. This is because the inner Not-ECT marking is set by transports that rely on dropped packets as an indication of congestion and would not understand or respond to any other ECN codepoint [RFC4774]. Specifically: * If the inner ECN field is Not-ECT and the outer ECN field is CE, the decapsulator MUST drop the packet. * If the inner ECN field is Not-ECT and the outer ECN field is Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the outgoing packet with the ECN field cleared to Not-ECT. This patch moves the ECN decap logic out of the individual tunnels into a common place. It also adds logging to allow detecting broken systems that set ECN bits incorrectly when tunneling (or an intermediate router might be changing the header). Overloads rx_frame_error to keep track of ECN related error. Thanks to Chris Wright who caught this while reviewing the new VXLAN tunnel. This code was tested by injecting faulty logic in other end GRE to send incorrectly encapsulated packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27xfrm: remove extranous rcu_read_lockstephen hemminger
The handlers for xfrm_tunnel are always invoked with rcu read lock already. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27gre: remove unnecessary rcu_read_lock/unlockstephen hemminger
The gre function pointers for receive and error handling are always called (from gre.c) with rcu_read_lock already held. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27gre: fix handling of key 0stephen hemminger
GRE driver incorrectly uses zero as a flag value. Zero is a perfectly valid value for key, and the tunnel should match packets with no key only with tunnels created without key, and vice versa. This is a slightly visible change since previously it might be possible to construct a working tunnel that sent key 0 and received only because of the key wildcard of zero. I.e the sender sent key of zero, but tunnel was defined without key. Note: using gre key 0 requires iproute2 utilities v3.2 or later. The original utility code was broken as well. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-27sparc: bpf_jit_comp: add XOR instruction for BPF JIT JITDaniel Borkmann
This patch is a follow-up for patch "filter: add XOR instruction for use with X/K" that implements BPF SPARC JIT parts for the BPF XOR operation. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>