linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2008-11-06	block: add timer on blkdev_dequeue_request() not elv_next_request()	Tejun Heo
	Block queue supports two usage models - one where block driver peeks at the front of queue using elv_next_request(), processes it and finishes it and the other where block driver peeks at the front of queue, dequeue the request using blkdev_dequeue_request() and finishes it. The latter is more flexible as it allows the driver to process multiple commands concurrently. These two inconsistent usage models affect the block layer implementation confusing. For some, elv_next_request() is considered the issue point while others consider blkdev_dequeue_request() the issue point. Till now the inconsistency mostly affect only accounting, so it didn't really break anything seriously; however, with block layer timeout, this inconsistency hits hard. Block layer considers elv_next_request() the issue point and adds timer but SCSI layer thinks it was just peeking and when the request can't process the command right away, it's just left there without further processing. This makes the request dangling on the timer list and, when the timer goes off, the request which the SCSI layer and below think is still on the block queue ends up in the EH queue, causing various problems - EH hang (failed count goes over busy count and EH never wakes up), WARN_ON() and oopses as low level driver trying to handle the unknown command, etc. depending on the timing. As SCSI midlayer is the only user of block layer timer at the moment, moving blk_add_timer() to elv_dequeue_request() fixes the problem; however, this two usage models definitely need to be cleaned up in the future. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-06	bio: define __BIOVEC_PHYS_MERGEABLE	Jeremy Fitzhardinge
	Define __BIOVEC_PHYS_MERGEABLE as the default implementation of BIOVEC_PHYS_MERGEABLE, so that its available for reuse within an arch-specific definition of BIOVEC_PHYS_MERGEABLE. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-06	block: remove unused ll_new_mergeable()	FUJITA Tomonori
	Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-11-06	x86: mention ACPI in top-level Kconfig menu	Bjorn Helgaas
	Impact: clarify menuconfig text Mention ACPI in the top-level menu to give a clue as to where it lives. This matches what ia64 does. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06	md: fix bug in raid10 recovery.	NeilBrown
	Adding a spare to a raid10 doesn't cause recovery to start. This is due to an silly type in commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda and so is a bug in 2.6.27 and .28-rc. Thanks to Thomas Backlund for bisecting to find this. Cc: Thomas Backlund <tmb@mandriva.org> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-06	md: revert the recent addition of a call to the BLKRRPART ioctl.	NeilBrown
	It turns out that it is only safe to call blkdev_ioctl when the device is actually open (as ->bd_disk is set to NULL on last close). And it is quite possible for do_md_stop to be called when the device is not open. So discard the call to blkdev_ioctl(BLKRRPART) which was added in commit 934d9c23b4c7e31840a895ba4b7e88d6413c81f3 It is just as easy to call this ioctl from userspace when needed (on mdadm -S) so leave it out of the kernel Signed-off-by: NeilBrown <neilb@suse.de>
2008-11-06	x86: size NR_IRQS on 32-bit systems the same way as 64-bit	Yinghai Lu
	Impact: make NR_IRQS big enough for system with lots of apic/pins If lots of IO_APIC's are there (or can be there), size the same way as 64-bit, depending on MAX_IO_APICS and NR_CPUS. This fixes the boot problem reported by Ben Hutchings on a 32-bit server with 5 IO-APICs and 240 IO-APIC pins. Signed-off-by: Yinghai <yinghai@kernel.org> Tested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06	x86: don't allow nr_irqs > NR_IRQS	Ben Hutchings
	Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can return a value larger than NR_IRQS. This will lead to probe_irq_on() overrunning the irq_desc array. I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a Supermicro dual Xeon system. NR_IRQS is 224 but probe_nr_irqs() detects 5 IOAPICs and returns 240. Here are the log messages: Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0]) Tue Nov 4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24]) Tue Nov 4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48]) Tue Nov 4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72]) Tue Nov 4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96]) Tue Nov 4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119 Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Tue Nov 4 16:53:47 2008 Enabling APIC mode: Flat. Using 5 I/O APICs Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05	[JFFS2] fix race condition in jffs2_lzo_compress()	Geert Uytterhoeven
	deflate_mutex protects the globals lzo_mem and lzo_compress_buf. However, jffs2_lzo_compress() unlocks deflate_mutex _before_ it has copied out the compressed data from lzo_compress_buf. Correct this by moving the mutex unlock after the copy. In addition, document what deflate_mutex actually protects. Cc: stable@kernel.org Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Richard Purdie <rpurdie@openedhand.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-11-05	net/9p: fix printk format warnings	Randy Dunlap
	Fix printk format warnings in net/9p. Built cleanly on 7 arches. net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'u64' net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05	unsigned fid->fid cannot be negative	Roel Kluin
	Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05	9p: rdma: remove duplicated #include	Huang Weiyi
	Removed duplicated #include <rdma/ib_verbs.h> in net/9p/trans_rdma.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05	p9: Fix leak of waitqueue in request allocation path	Tom Tucker
	If a T or R fcall cannot be allocated, the function returns an error but neglects to free the wait queue that was successfully allocated. If it comes through again a second time this wq will be overwritten with a new allocation and the old allocation will be leaked. Also, if the client is subsequently closed, the close path will attempt to clean up these allocations, so set the req fields to NULL to avoid duplicate free. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05	9p: Remove unneeded free of fcall for Flush	Tom Tucker
	T and R fcall are reused until the client is destroyed. There does not need to be a special case for Flush Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05	9p: Make all client spin locks IRQ safe	Tom Tucker
	The client lock must be IRQ safe. Some of the lock acquisition paths took regular spin locks. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05	9p: rdma: Set trans prior to requesting async connection ops	Tom Tucker
	The RDMA connection manager is fundamentally asynchronous. Since the async callback context is the client pointer, the transport in the client struct needs to be set prior to calling the first async op. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-11-05	IB/mlx4: Set umem field to NULL in mlx4_ib_alloc_fast_reg_mr()	Vladimir Sokolovsky
	Set mr->umem to NULL in mlx4_ib_alloc_fast_reg_mr(). Otherwise ib_dereg_mr() may invoke ib_umem_release() on a random pointer value and get an oops. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-05	[SCSI] scsi_error regression: Fix idempotent command handling	Mike Christie
	Drivers want to be able to return DID_TRANSPORT_DISRUPTED and have it do the right thing for commands like tape and passthrouh as far as retries go. The LLDs previously used DID_BUS_BUSY or DID_ERROR which followed the cmd->retries limit, but DID_TRANSPORT_DISRUPTED was skipping that check so it could have caused a problem with tape commands. This patch has DID_TRANSPORT_DISRUPTED check the cmd->retries/cmd->allowed. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: Fix hexdump data in s390dbf traces	Christof Schmitt
	Fix multiple problems found in the hexdump data: - length calculation was wrong, traces were incomplete - FC payloads were dumped in different record than the output function tried to read - minor fixes in output - allow complete RSCN traces (up to 1024 bytes according to spec) Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: fix erp timeout cleanup for port open requests	Martin Petermann
	If an open port fsf request times out (in erp) the corresponding erp_action member of the fsf request need to set to NULL. If the port structure will be removed later-on there will be still a reference in the fsf request to the non existing erp_action otherwise. Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: Wait for port scan to complete when setting adapter online	Christof Schmitt
	Attaching a unit immediately after setting the adapter online should be possible. The problem right now is that the port_scan runs from a workqueue and has not finished when the set_online call returns and the sysfs structures for the ports are not available yet. Fix that by waiting for the port scan to complete. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: Fix cast warning	Christof Schmitt
	Fix leftover from last typecast patch: drivers/s390/scsi/zfcp_aux.c: In function ‘zfcp_port_enqueue’: drivers/s390/scsi/zfcp_aux.c:629: warning: format ‘%016llx’ expects type ‘long long unsigned int’, but argument 3 has type ‘u64’ Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: Fix request list handling in error path	Christof Schmitt
	Fix the handling of the request list in the error path: - Use irqsave for the lock as in the good path. - Before removing the request, check if it is still in the list, a call to dismiss_all might have changed the list in between. - zfcp_qdio_send does not change the queue counters on failure, trying revert something is wrong, so remove this. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: fix mempool usage for status_read requests	Christof Schmitt
	When allocating fsf requests without qtcb, store the pointer to the mempool in the fsf requests for later call to mempool_free. This codepath is only used by the status_read requests. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: fix req_list_locking.	Heiko Carstens
	The per adapter req_list_lock must be held with interrupts disabled, otherwise we might end up with nice deadlocks as lockdep tells us (see below). zfcp 0.0.1804: QDIO problem occurred. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.27-rc8-00035-g4a77035-dirty #86 --------------------------------------------------------- swapper/0 just changed the state of lock: (&adapter->erp_lock){++..}, at: [<00000000002c82ae>] zfcp_erp_adapter_reopen+0x4e/0x8c but this lock took another, hard-irq-unsafe lock in the past: (&adapter->req_list_lock){-+..} and interrupts could create inverse lock ordering between them. [tons of backtraces, but only the interesting part follows] the second lock's dependencies: -> (&adapter->req_list_lock){-+..} ops: 2280627634176 { initial-use at: [<0000000000071f10>] __lock_acquire+0x504/0x18bc [<000000000007335c>] lock_acquire+0x94/0xbc [<00000000003d7224>] _spin_lock_irqsave+0x6c/0xb0 [<00000000002cf684>] zfcp_fsf_req_dismiss_all+0x50/0x140 [<00000000002c87ee>] zfcp_erp_adapter_strategy_generic+0x66/0x3d0 [<00000000002c9498>] zfcp_erp_thread+0x88c/0x1318 [<000000000001b0d2>] kernel_thread_starter+0x6/0xc [<000000000001b0cc>] kernel_thread_starter+0x0/0xc in-softirq-W at: [<0000000000072172>] __lock_acquire+0x766/0x18bc [<000000000007335c>] lock_acquire+0x94/0xbc [<00000000003d7224>] _spin_lock_irqsave+0x6c/0xb0 [<00000000002ca73e>] zfcp_qdio_int_resp+0xbe/0x2ac [<000000000027a1d6>] qdio_kick_inbound_handler+0x82/0xa0 [<000000000027daba>] tiqdio_inbound_processing+0x62/0xf8 [<0000000000047ba4>] tasklet_action+0x100/0x1f4 [<0000000000048b5a>] __do_softirq+0xae/0x154 [<0000000000021e4a>] do_softirq+0xea/0xf0 [<00000000000485de>] irq_exit+0xde/0xe8 [<0000000000268c64>] do_IRQ+0x160/0x1fc [<00000000000261a2>] io_return+0x0/0x8 [<000000000001b8f8>] cpu_idle+0x17c/0x224 hardirq-on-W at: [<0000000000072190>] __lock_acquire+0x784/0x18bc [<000000000007335c>] lock_acquire+0x94/0xbc [<00000000003d702c>] _spin_lock+0x5c/0x9c [<00000000002caff6>] zfcp_fsf_req_send+0x3e/0x158 [<00000000002ce7fe>] zfcp_fsf_exchange_config_data+0x106/0x124 [<00000000002c8948>] zfcp_erp_adapter_strategy_generic+0x1c0/0x3d0 [<00000000002c98ea>] zfcp_erp_thread+0xcde/0x1318 [<000000000001b0d2>] kernel_thread_starter+0x6/0xc [<000000000001b0cc>] kernel_thread_starter+0x0/0xc } ... key at: [<0000000000e356c8>] __key.26629+0x0/0x8 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christof Schmitt <christof.schmit@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] zfcp: Dont clear reference from SCSI device to unit	Christof Schmitt
	It is possible that a remote port has a problem, the SCSI device gets deleted after the rport timeout and then the timeout for pending SCSI commands trigger an abort. For this case, don't delete the reference from the SCSI device to the zfcp unit, so that we can still have the reference to issue an abort request. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: Swen Schillig <swen@vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] qla2xxx: Update version number to 8.02.01-k9.	Andrew Vasquez
	Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] qla2xxx: Return a FAILED status when abort mailbox-command fails.	Michael Reed
	Mike Reed noted (https://bugzilla.novell.com/show_bug.cgi?id=421330) that the driver was incorrectly returning a SUCCESS status if the driver's request to the firmware to abort a command failed. By doing so, the mid-layer believed, incorrectly, that the command has completed and has been returned (ultimately clearing scsi_cmnd.request_buffer) yet the driver still has the command. What should correctly happen is a mid-layer escalation (device-reset, etc.) of recovery during which the driver will eventually return the outstanding commands to the mid-layer. Cc: Stable Tree <stable@kernel.org> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] qla2xxx: Do not honour max_vports from firmware for 2G ISPs and below.	Shyam Sundar
	For 23XX ISPs, max_vports may return an invalid value. Do not honour it. Cc: Stable Tree <stable@kernel.org> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] qla2xxx: Use pci_disable_rom() to manipulate PCI config space.	Andrew Vasquez
	http://bugzilla.kernel.org/show_bug.cgi?id=9422 Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] qla2xxx: Correct Atmel flash-part handling.	Lalit Chandivade
	Use correct block size (4K) for erase command 0x20 for Atmel Flash. Use dword addresses for determining sector boundary. Cc: Stable Tree <stable@kernel.org> Signed-off-by: Lalit Chandivade <lalit.chandivade@qlogic.com> Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	[SCSI] megaraid: fix mega_internal_command oops	FUJITA Tomonori
	scsi_cmnd->cmnd was changed from a static array to a pointer post 2.6.25. It breaks mega_internal_command(): static int mega_internal_command(adapter_t adapter, megacmd_t mc, mega_passthru pthru) { ... scb = &adapter->int_scb; memset(scb, 0, sizeof(scb_t)); scmd = &adapter->int_scmd; memset(scmd, 0, sizeof(Scsi_Cmnd)); sdev = kzalloc(sizeof(struct scsi_device), GFP_KERNEL); scmd->device = sdev; scmd->device->host = adapter->host; scmd->host_scribble = (void )scb; scmd->cmnd[0] = MEGA_INTERNAL_CMD; mega_internal_command() uses scsi_cmnd allocated internally so scmd->cmnd is NULL here. This patch adds a static array for cdb to adapter_t and uses it here. This also uses scsi_allocate_command/scsi_free_command, the recommended way to allocate struct scsi_cmnd since the driver might use sense_buffer in struct scsi_cmnd. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Tested-by: Pascal Terjan <pterjan@gmail.com> Reported-by: Pascal Terjan <pterjan@gmail.com> Acked-by: "Yang, Bo" <Bo.Yang@lsi.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-11-05	sched: re-tune balancing	Ingo Molnar
	Impact: improve wakeup affinity on NUMA systems, tweak SMP systems Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain balancing defaults on NUMA and SMP systems. Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason why we would not want to have wakeup affinity across nodes as well. (we already do this in the standard NUMA template.) lat_ctx on a NUMA box is particularly happy about this change: before: \| phoenix:~/l> ./lat_ctx -s 0 2 \| "size=0k ovr=2.60 \| 2 5.70 after: \| phoenix:~/l> ./lat_ctx -s 0 2 \| "size=0k ovr=2.65 \| 2 2.07 a 2.75x speedup. pipe-test is similarly happy about it too: \| phoenix:~/sched-tests> ./pipe-test \| 18.26 usecs/loop. \| 14.70 usecs/loop. \| 14.38 usecs/loop. \| 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1 \| 8.63 usecs/loop. \| 8.59 usecs/loop. \| 9.03 usecs/loop. \| 8.94 usecs/loop. \| 8.96 usecs/loop. \| 8.63 usecs/loop. Also: - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings) - enable SD_WAKE_BALANCE on SMP domains Sysbench+postgresql improves all around the board, quite significantly: .28-rc3-11474e2c .28-rc3-11474e2c-tune ------------------------------------------------- 1: 571 688 +17.08% 2: 1236 1206 -2.55% 4: 2381 2642 +9.89% 8: 4958 5164 +3.99% 16: 9580 9574 -0.07% 32: 7128 8118 +12.20% 64: 7342 8266 +11.18% 128: 7342 8064 +8.95% 256: 7519 7884 +4.62% 512: 7350 7731 +4.93% ------------------------------------------------- SUM: 55412 59341 +6.62% So it's a win both for the runup portion, the peak area and the tail. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05	[MTD] [NOR] Fix cfi_send_gen_cmd handling of x16 devices in x8 mode (v4)	Eric W. Biederman
	For "unlock" cycles to 16bit devices in 8bit compatibility mode we need to use the byte addresses 0xaaa and 0x555. These effectively match the word address 0x555 and 0x2aa, except the latter has its low bit set. Most chips don't care about the value of the 'A-1' pin in x8 mode, but some -- like the ST M29W320D -- do. So we need to be careful to set it where appropriate. cfi_send_gen_cmd is only ever passed addresses where the low byte is 0x00, 0x55 or 0xaa. Of those, only addresses ending 0xaa are affected by this patch, by masking in the extra low bit when the device is known to be in compatibility mode. [dwmw2: Do it only when (cmd_ofs & 0xff) == 0xaa] v4: Fix stupid typo in cfi_build_cmd_addr that failed to compile I'm writing this patch way to late at night. v3: Bring all of the work back into cfi_build_cmd_addr including calling of map_bankwidth(map) and cfi_interleave(cfi) So every caller doesn't need to. v2: Only modified the address if we our device_type is larger than our bus width. Cc: stable@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-11-05	tcp: Fix recvmsg MSG_PEEK influence of blocking behavior.	David S. Miller
	Vito Caputo noticed that tcp_recvmsg() returns immediately from partial reads when MSG_PEEK is used. In particular, this means that SO_RCVLOWAT is not respected. Simply remove the test. And this matches the behavior of several other systems, including BSD. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05	netfilter: netns ct: walk netns list under RTNL	Alexey Dobriyan
	netns list (just list) is under RTNL. But helper and proto unregistration happen during rmmod when RTNL is not held, and that's how it was tested: modprobe/rmmod vs clone(CLONE_NEWNET)/exit. BUG: unable to handle kernel paging request at 0000000000100100 <=== IP: [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack] PGD 15e300067 PUD 15e1d8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/kernel/uevent_seqnum CPU 0 Modules linked in: nf_conntrack_proto_sctp(-) nf_conntrack_proto_dccp(-) af_packet iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 sr_mod cdrom [last unloaded: nf_conntrack_proto_sctp] Pid: 16758, comm: rmmod Not tainted 2.6.28-rc2-netns-xfrm #3 RIP: 0010:[<ffffffffa009890f>] [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack] RSP: 0018:ffff88015dc1fec8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: 00000000001000f8 RCX: 0000000000000000 RDX: ffffffffa009575c RSI: 0000000000000003 RDI: ffffffffa00956b5 RBP: ffff88015dc1fed8 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: ffff88015dc1fe48 R12: ffffffffa0458f60 R13: 0000000000000880 R14: 00007fff4c361d30 R15: 0000000000000880 FS: 00007f624435a6f0(0000) GS:ffffffff80521580(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000100100 CR3: 0000000168969000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rmmod (pid: 16758, threadinfo ffff88015dc1e000, task ffff880179864218) Stack: ffffffffa0459100 0000000000000000 ffff88015dc1fee8 ffffffffa0457934 ffff88015dc1ff78 ffffffff80253fef 746e6e6f635f666e 6f72705f6b636172 00707463735f6f74 ffffffff8024cb30 00000000023b8010 0000000000000000 Call Trace: [<ffffffffa0457934>] nf_conntrack_proto_sctp_fini+0x10/0x1e [nf_conntrack_proto_sctp] [<ffffffff80253fef>] sys_delete_module+0x19f/0x1fe [<ffffffff8024cb30>] ? trace_hardirqs_on_caller+0xf0/0x114 [<ffffffff803ea9b2>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8020b52b>] system_call_fastpath+0x16/0x1b Code: 13 35 e0 e8 c4 6c 1a e0 48 8b 1d 6d c6 46 e0 eb 16 48 89 df 4c 89 e2 48 c7 c6 fc 85 09 a0 e8 61 cd ff ff 48 8b 5b 08 48 83 eb 08 <48> 8b 43 08 0f 18 08 48 8d 43 08 48 3d 60 4f 50 80 75 d3 5b 41 RIP [<ffffffffa009890f>] nf_conntrack_l4proto_unregister+0x96/0xae [nf_conntrack] RSP <ffff88015dc1fec8> CR2: 0000000000100100 ---[ end trace bde8ac82debf7192 ]--- Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05	ALSA: hda - Add a quirk for MEDION MD96630	Takashi Iwai
	Use model=lenovo-ms7195-dig for MEDION MD96630 laptop (17c0:4085) with ALC888 codec. Reference: Novell bnc#412548 https://bugzilla.novell.com/show_bug.cgi?id=412528 Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-11-05	ipv6: fix run pending DAD when interface becomes ready	Benjamin Thery
	With some net devices types, an IPv6 address configured while the interface was down can stay 'tentative' forever, even after the interface is set up. In some case, pending IPv6 DADs are not executed when the device becomes ready. I observed this while doing some tests with kvm. If I assign an IPv6 address to my interface eth0 (kvm driver rtl8139) when it is still down then the address is flagged tentative (IFA_F_TENTATIVE). Then, I set eth0 up, and to my surprise, the address stays 'tentative', no DAD is executed and the address can't be pinged. I also observed the same behaviour, without kvm, with virtual interfaces types macvlan and veth. Some easy steps to reproduce the issue with macvlan: 1. ip link add link eth0 type macvlan 2. ip -6 addr add 2003::ab32/64 dev macvlan0 3. ip addr show dev macvlan0 ... inet6 2003::ab32/64 scope global tentative ... 4. ip link set macvlan0 up 5. ip addr show dev macvlan0 ... inet6 2003::ab32/64 scope global tentative ... Address is still tentative I think there's a bug in net/ipv6/addrconf.c, addrconf_notify(): addrconf_dad_run() is not always run when the interface is flagged IF_READY. Currently it is only run when receiving NETDEV_CHANGE event. Looks like some (virtual) devices doesn't send this event when becoming up. For both NETDEV_UP and NETDEV_CHANGE events, when the interface becomes ready, run_pending should be set to 1. Patch below. 'run_pending = 1' could be moved below the if/else block but it makes the code less readable. Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05	net/9p: fix printk format warnings	Randy Dunlap
	Fix printk format warnings in net/9p. Built cleanly on 7 arches. net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'u64' net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64' net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64' net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-05	sched: fix buddies for group scheduling	Peter Zijlstra
	Impact: scheduling order fix for group scheduling For each level in the hierarchy, set the buddy to point to the right entity. Therefore, when we do the hierarchical schedule, we have a fair chance of ending up where we meant to. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05	sched: backward looking buddy	Peter Zijlstra
	Impact: improve/change/fix wakeup-buddy scheduling Currently we only have a forward looking buddy, that is, we prefer to schedule to the task we last woke up, under the presumption that its going to consume the data we just produced, and therefore will have cache hot benefits. This allows co-waking producer/consumer task pairs to run ahead of the pack for a little while, keeping their cache warm. Without this, we would interleave all pairs, utterly trashing the cache. This patch introduces a backward looking buddy, that is, suppose that in the above scenario, the consumer preempts the producer before it can go to sleep, we will therefore miss the wakeup from consumer to producer (its already running, after all), breaking the cycle and reverting to the cache-trashing interleaved schedule pattern. The backward buddy will try to schedule back to the task that woke us up in case the forward buddy is not available, under the assumption that the last task will be the one with the most cache hot task around barring current. This will basically allow a task to continue after it got preempted. In order to avoid starvation, we allow either buddy to get wakeup_gran ahead of the pack. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05	sched: fix fair preempt check	Peter Zijlstra
	Impact: fix cross-class preemption Inter-class wakeup preemptions should go on class order. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05	sched: cleanup fair task selection	Peter Zijlstra
	Impact: cleanup Clean up task selection Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05	ftrace: fix breakage in bin_fmt results	Eric Anholt
	In 777e208d40d0953efc6fb4ab58590da3f7d8f02d we changed from outputting field->cpu (a char) to iter->cpu (unsigned int), increasing the resulting structure size by 3 bytes. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05	powerpc: Fix "unused variable" warning in pci_dlpar.c	Stephen Rothwell
	This gets rid of this build warning: arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic': arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b' This is one of the very few warnings left in a ppc64_defconfig build and getting rid of it will make it easier to see future introduced ones (in fact this was introduced very recently). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05	powerpc/cell: Fix compile error in ras.c	Alexey Dobriyan
	This fixes this error on Cell when CONFIG_KEXEC = n: arch/powerpc/platforms/cell/ras.c:299: error: implicit declaration of function 'crash_shutdown_register' We have to include <asm/kexec.h> because it contains the dummy definition of crash_shutdown_register that is used when CONFIG_KEXEC=n, but <linux/kexec.h> doesn't include <asm/kexec.h> in that case. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-05	powerpc/ps3: Fix compile error in ps3-lpm.c	Alexey Dobriyan
	Compiling with CONFIG_SMP = n and CONFIG_PS3_LPM != n gives this error: drivers/ps3/ps3-lpm.c:838: error: implicit declaration of function 'get_hard_smp_processor_id' This fixes it. We have to include <asm/smp.h> rather than <linux/smp.h> because the UP definition of get_hard_smp_processor_id() is in <asm/smp.h>, and <linux/smp.h> only includes <asm/smp.h> if CONFIG_SMP = y. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-11-04	net: fix packet socket delivery in rx irq handler	Patrick McHardy
	The changes to deliver hardware accelerated VLAN packets to packet sockets (commit bc1d0411) caused a warning for non-NAPI drivers. The __vlan_hwaccel_rx() function is called directly from the drivers RX function, for non-NAPI drivers that means its still in RX IRQ context: [ 27.779463] ------------[ cut here ]------------ [ 27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81() ... [ 27.782520] [<c0264755>] netif_nit_deliver+0x5b/0x75 [ 27.782590] [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162 [ 27.782664] [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1] [ 27.782738] [<c0155b17>] handle_IRQ_event+0x23/0x51 [ 27.782808] [<c015692e>] handle_edge_irq+0xc2/0x102 [ 27.782878] [<c0105fd5>] do_IRQ+0x4d/0x64 Split hardware accelerated VLAN reception into two parts to fix this: - __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN device lookup, then calls netif_receive_skb()/netif_rx() - vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb() in softirq context, performs the real reception and delivery to packet sockets. Reported-and-tested-by: Ramon Casellas <ramon.casellas@cttc.es> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-04	xfrm: Have af-specific init_tempsel() initialize family field of temporary ↵	Andreas Steffen
	selector While adding MIGRATE support to strongSwan, Andreas Steffen noticed that the selectors provided in XFRM_MSG_ACQUIRE have their family field uninitialized (those in MIGRATE do have their family set). Looking at the code, this is because the af-specific init_tempsel() (called via afinfo->init_tempsel() in xfrm_init_tempsel()) do not set the value. Reported-by: Andreas Steffen <andreas.steffen@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
2008-11-04	ARM: OMAP: Fix define for twl4030 irqs	Tony Lindgren
	Otherwise twl4030 gpios won't work. Signed-off-by: Tony Lindgren <tony@atomide.com>