linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2013-11-20	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull block IO fixes from Jens Axboe: "Normally I'd defer my initial for-linus pull request until after the merge window, but a race was uncovered in the virtio-blk conversion to blk-mq that could cause hangs. So here's a small collection of fixes for you to pull: - The fix for the virtio-blk IO hang reported by Dave Chinner, from Shaohua and myself. - Add the Insert blktrace event for blk-mq. This makes 'btt' happy when it is doing it's state transition analysis. - Ensure that blk-mq has disk/partition stats enabled by default, instead of making it opt-in. - A fix for __bio_add_page() and large sector counts" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: add blktrace insert event trace virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations blk-mq: ensure that we set REQ_IO_STAT so diskstats work bio: fix argument of __bio_add_page() for max_sectors > 0xffff
2013-11-20	Merge tag 'md/3.13' of git://neil.brown.name/md	Linus Torvalds
	Pull md update from Neil Brown: "Mostly optimisations and obscure bug fixes. - raid5 gets less lock contention - raid1 gets less contention between normal-io and resync-io during resync" * tag 'md/3.13' of git://neil.brown.name/md: md/raid5: Use conf->device_lock protect changing of multi-thread resources. md/raid5: Before freeing old multi-thread worker, it should flush them. md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE. UAPI: include <asm/byteorder.h> in linux/raid/md_p.h raid1: Rewrite the implementation of iobarrier. raid1: Add some macros to make code clearly. raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array. raid1: Add a field array_frozen to indicate whether raid in freeze state. md: Convert use of typedef ctl_table to struct ctl_table md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes. md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread. md: fix some places where mddev_lock return value is not checked. raid5: Retry R5_ReadNoMerge flag when hit a read error. raid5: relieve lock contention in get_active_stripe() raid5: relieve lock contention in get_active_stripe() wait: add wait_event_cmd() md/raid5.c: add proper locking to error path of raid5_start_reshape. md: fix calculation of stacking limits on level change. raid5: Use slow_path to release stripe when mddev->thread is null
2013-11-20	r8152: fix incorrect type in assignment	hayeswang
	The data from the hardware should be little endian. Correct the declaration. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20	r8152: support stopping/waking tx queue	hayeswang
	The maximum packet number which a tx aggregation buffer could contain is the tx_qlen. tx_qlen = buffer size / (packet size + descriptor size). If the tx buffer is empty and the queued packets are more than the maximum value which is defined above, stop the tx queue. Wake the tx queue if tx queue is stopped and the queued packets are less than tx_qlen. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20	r8152: modify the tx flow	hayeswang
	Remove the code for sending the packet in the rtl8152_start_xmit(). Let rtl8152_start_xmit() to queue the packet only, and schedule a tasklet to send the queued packets. This simplify the code and make sure all the packet would be sent by the original order. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20	r8152: fix tx/rx memory overflow	hayeswang
	The tx/rx would access the memory which is out of the desired range. Modify the method of checking the end of the memory to avoid it. For r8152_tx_agg_fill(), the variable remain may become negative. However, the declaration is unsigned, so the while loop wouldn't break when reaching the end of the desied memory. Although to change the declaration from unsigned to signed is enough to fix it, I also modify the checking method for safe. Replace remain = rx_buf_sz - sizeof(tx_desc) - (u32)((void )tx_data - agg->head); with remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head); to make sure the variable remain is always positive. Then, the overflow wouldn't happen. For rx_bottom(), the rx_desc should not be used to calculate the packet length before making sure the rx_desc is in the desired range. Change the checking to two parts. First, check the descriptor is in the memory. The other, using the descriptor to find out the packet length and check if the packet is in the memory. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20	iscsi-target: Expose default_erl as TPG attribute	Nicholas Bellinger
	This patch exposes default_erl as a TPG attribute so that it may be set TPG wide in demo-mode, but still allow the existing NodeACL attribute to be overridden on a per initiator basis. Reported-by: Arshad Hussain <arshad.hussain@calsoftinc.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-20	target_core_configfs: split up ALUA supported states	Hannes Reinecke
	Split up the various ALUA states into individual attributes to make parsing easier and adhere to the one value per attribute sysfs principle. (nab: Convert strict_strtoul -> kstrtoul usage) Signed-off-by: Hannes Reinecke <hare@suse.de>
2013-11-20	target_core_alua: Make supported states configurable	Hannes Reinecke
	Signed-off-by: Hannes Reinecke <hare@suse.de>
2013-11-20	target_core_alua: Store supported ALUA states	Hannes Reinecke
	The supported ALUA states might be different for individual devices, so store it in a separate field. (nab: Remove unnecessary line continuation) Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-20	target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED	Hannes Reinecke
	Rename ALUA_ACCESS_STATE_OPTMIZED to ALUA_ACCESS_STATE_OPTIMIZED. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-20	target_core_alua: spellcheck	Hannes Reinecke
	Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-20	target core: rename (ex,im)plict -> (ex,im)plicit	Hannes Reinecke
	Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-20	hwmon: (acpi_power_meter) Fix acpi_bus_get_device() return value check	Yijing Wang
	Since acpi_bus_get_device() returns plain int and not acpi_status, ACPI_FAILURE() should not be used for checking its return value. Fix that. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2013-11-20	drm/i915: Fix gen3 self-refresh watermarks	Daniel Vetter
	This regression has been introduced in commit 4fe8590a921d0b2e36e542dbfa89a8c5993f5a3f Author: Ville Syrjälä <ville.syrjala@linux.intel.com> Date: Wed Sep 4 18:25:22 2013 +0300 drm/i915: Use adjusted_mode appropriately when computing watermarks I guess we should renable the enabled local variable into something a notch more descriptive, but that's something for -next. The effect on my i945gme netbook is pretty severe amounts of underruns - usually the very first pixel gets used for the entire screeen. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-20	drm/ttm: Remove set_need_resched from the ttm fault handler	Thomas Hellstrom
	Addresses "[BUG] completely bonkers use of set_need_resched + VM_FAULT_NOPAGE". In the first occurence it was used to try to be nice while releasing the mmap_sem and retrying the fault to work around a locking inversion. The second occurence was never used. There has been some discussion whether we should change the locking order to mmap_sem -> bo_reserve. This patch doesn't address that issue, and leaves that locking order undefined. The solution that we release the mmap_sem if tryreserve fails and wait for the buffer to become unreserved is something we want in any case, and follows how the core vm system waits for pages to be come unlocked while releasing the mmap_sem. The code also outlines what needs to be changed if we want to establish the locking order as mmap_sem -> bo::reserve. One slight issue that remains with this code is that the fault handler might be prone to starvation if another thread countinously reserves the buffer. IMO that usage pattern is highly unlikely. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2013-11-20	drm/ttm: Don't move non-existing data	Thomas Hellstrom
	If ttm_bo_move_memcpy was instructed to move a non-populated ttm to io memory, it would first populate the ttm, then move the data and then destroy the ttm. That's stupid. However, some drivers might have relied on this to clear io memory from old stuff. So instead of a NOP, which would be the most efficient, just clear the destination. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
2013-11-19	iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN	Nicholas Bellinger
	This patch changes iscsit_sequence_cmd() logic to no longer reject non-immediate CmdSNs that exceed MaxCmdSN with a protocol error, but instead silently ignore them. This is done to correctly follow RFC-3720 Section 3.2.2.1: For non-immediate commands, the CmdSN field can take any value from ExpCmdSN to MaxCmdSN inclusive. The target MUST silently ignore any non-immediate command outside of this range or non- immediate duplicates within the range. Reported-by: Santosh Kulkarni <santosh.kulkarni@calsoftinc.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-19	iscsi-target: Convert iscsi_session statistics to atomic_long_t	Nicholas Bellinger
	This patch converts a handful of iscsi_session statistics to type atomic_long_t, instead of using iscsi_session->session_stats_lock when incrementing these values. More importantly, go ahead and drop the spinlock usage within iscsit_setup_scsi_cmd(), iscsit_check_dataout_hdr(), iscsit_send_datain(), and iscsit_build_rsp_pdu() fast-path code. (Squash in Roland's target: Remove write-only stats fields and lock from struct se_node_acl) Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-19	virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations	Shaohua Li
	It isn't safe to call it without holding the vblk->vq_lock. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Fixed another condition of virtqueue_kick() not holding the lock. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-19	aacraid: prevent invalid pointer dereference	Mahesh Rajashekhara
	It appears that driver runs into a problem here if fibsize is too small because we allocate user_srbcmd with fibsize size only but later we access it until user_srbcmd->sg.count to copy it over to srbcmd. It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this structure already includes one sg element and this is not needed for commands without data. So, we would recommend to add the following (instead of test for fibsize == 0). Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...
2013-11-19	genetlink: make multicast groups const, prevent abuse	Johannes Berg
	Register generic netlink multicast groups as an array with the family and give them contiguous group IDs. Then instead of passing the global group ID to the various functions that send messages, pass the ID relative to the family - for most families that's just 0 because the only have one group. This avoids the list_head and ID in each group, adding a new field for the mcast group ID offset to the family. At the same time, this allows us to prevent abusing groups again like the quota and dropmon code did, since we can now check that a family only uses a group it owns. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19	genetlink: pass family to functions using groups	Johannes Berg
	This doesn't really change anything, but prepares for the next patch that will change the APIs to pass the group ID within the family, rather than the global group ID. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19	genetlink: only pass array to genl_register_family_with_ops()	Johannes Berg
	As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19	drm/radeon: hook up backlight functions for CI and KV family.	Samuel Li
	Fixes crashes when handling atif events due to the lack of a callback being registered. Signed-off-by: Samuel Li <samuel.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2013-11-19	atm: idt77252: fix dev refcnt leak	Ying Xue
	init_card() calls dev_get_by_name() to get a network deceive. But it doesn't decrease network device reference count after the device is used. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19	Merge branch 'acpi-hotplug'	Rafael J. Wysocki
	* acpi-hotplug: PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
2013-11-19	PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration	Mika Westerberg
	Commit bd950799d951 (PCI: acpiphp: Convert to dynamic debug) removed users of acpiphp_debug variable and the variable itself but the declaration was left in the header file. Drop this unused declaration. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-19	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull second set of s390 patches from Martin Schwidefsky: "The handling of the PCI hotplug notifications has been improved, the zfcp dumper can now detect the HSA size dynamically and the default install kernel has been changed to the compressed bzImage. And two bug-fixes for scm and 3720" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: implement hotplug notifications s390/scm_block: do not hide eadm subchannel dependency s390/sclp: Consolidate early sclp init calls to sclp_early_detect() s390/sclp: Move early code from sclp_cmd.c to sclp_early.c s390/sclp: Determine HSA size dynamically for zfcpdump s390/sclp: Move declarations for sclp_sdias into separate header file s390/pci: implement pcibios_remove_bus s390/pci: improve handling of bus resources s390/3270: fix missing device_destroy() call s390/boot: Install bzImage as default kernel image
2013-11-19	drm/i915: Replicate BIOS eDP bpp clamping hack for hsw	Daniel Vetter
	Haswell's DDI encoders have their own ->get_config callback and in commit c6cd2ee2d59111a07cd9199564c9bdcb2d11e5cf Author: Jani Nikula <jani.nikula@intel.com> Date: Mon Oct 21 10:52:07 2013 +0300 drm/i915/dp: workaround BIOS eDP bpp clamping issue we've forgotten to replicate this hack. So let's do it that. Note for backporters: The above commit and all it's depencies need to be backported first. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71049 Cc: stable@vger.kernel.org Tested-by: Gökçen Eraslan <gokcen.eraslan@gmail.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-19	drm/i915: Do not enable package C8 on unsupported hardware	Chris Wilson
	If the hardware does not support package C8, then do not even schedule work to enable it. Thereby we can eliminate a bunch of dangerous work. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-19	drm/i915: Hold pc8 lock around toggling pc8.gpu_idle	Chris Wilson
	We need to hold the pc8 lock around toggling the value of gpu_idle. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-11-19	md/raid5: Use conf->device_lock protect changing of multi-thread resources.	majianpeng
	When we change group_thread_cnt from sysfs entry, it can OOPS. The kernel messages are: [ 135.299021] BUG: unable to handle kernel NULL pointer dereference at (null) [ 135.299073] IP: [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440 [ 135.299107] PGD 0 [ 135.299122] Oops: 0000 [#1] SMP [ 135.299144] Modules linked in: netconsole e1000e ptp pps_core [ 135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24 [ 135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000 [ 135.299283] RIP: 0010:[<ffffffff815188ab>] [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440 [ 135.299323] RSP: 0018:ffff8800b77a5c48 EFLAGS: 00010002 [ 135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008 [ 135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00 [ 135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000 [ 135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00 [ 135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70 [ 135.299479] FS: 0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000 [ 135.299510] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0 [ 135.299559] Stack: [ 135.299570] ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300 [ 135.299611] 000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8 [ 135.299654] 000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98 [ 135.299696] Call Trace: [ 135.299711] [<ffffffff8107383e>] ? __wake_up+0x4e/0x70 [ 135.299733] [<ffffffff81518f88>] raid5d+0x4c8/0x680 [ 135.299756] [<ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0 [ 135.299781] [<ffffffff81524c9f>] md_thread+0x11f/0x170 [ 135.299804] [<ffffffff81069cd0>] ? wake_up_bit+0x40/0x40 [ 135.299826] [<ffffffff81524b80>] ? md_rdev_init+0x110/0x110 [ 135.299850] [<ffffffff81069656>] kthread+0xc6/0xd0 [ 135.299871] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70 [ 135.299899] [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0 [ 135.299923] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70 [ 135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90 [ 135.300005] RIP [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440 [ 135.300005] RSP <ffff8800b77a5c48> [ 135.300005] CR2: 0000000000000000 [ 135.300005] ---[ end trace 504854e5bb7562ed ]--- [ 135.300005] Kernel panic - not syncing: Fatal exception This is because raid5d() can be running when the multi-thread resources are changed via system. We see need to provide locking. mddev->device_lock is suitable, but we cannot simple call alloc_thread_groups under this lock as we cannot allocate memory while holding a spinlock. So change alloc_thread_groups() to allocate and return the data structures, then raid5_store_group_thread_cnt() can take the lock while updating the pointers to the data structures. This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x stable series. Fixes: b721420e8719131896b009b11edbbd27 Cc: stable@vger.kernel.org (3.12) Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Shaohua Li <shli@kernel.org>
2013-11-19	md/raid5: Before freeing old multi-thread worker, it should flush them.	majianpeng
	When changing group_thread_cnt from sysfs entry, the kernel can oops. The kernel messages are: [ 740.961389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 740.961444] IP: [<ffffffff81062570>] process_one_work+0x30/0x500 [ 740.961476] PGD b9013067 PUD b651e067 PMD 0 [ 740.961503] Oops: 0000 [#1] SMP [ 740.961525] Modules linked in: netconsole e1000e ptp pps_core [ 740.961577] CPU: 0 PID: 3683 Comm: kworker/u8:5 Not tainted 3.12.0+ #23 [ 740.961602] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 740.961646] task: ffff88013abe0000 ti: ffff88013a246000 task.ti: ffff88013a246000 [ 740.961673] RIP: 0010:[<ffffffff81062570>] [<ffffffff81062570>] process_one_work+0x30/0x500 [ 740.961708] RSP: 0018:ffff88013a247e08 EFLAGS: 00010086 [ 740.961730] RAX: ffff8800b912b400 RBX: ffff88013a61e680 RCX: ffff8800b912b400 [ 740.961757] RDX: ffff8800b912b600 RSI: ffff8800b912b600 RDI: ffff88013a61e680 [ 740.961782] RBP: ffff88013a247e48 R08: ffff88013a246000 R09: 000000000002c09d [ 740.961808] R10: 000000000000010f R11: 0000000000000000 R12: ffff88013b00cc00 [ 740.961833] R13: 0000000000000000 R14: ffff88013b00cf80 R15: ffff88013a61e6b0 [ 740.961861] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000 [ 740.961893] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 740.962001] CR2: 00000000000000b8 CR3: 00000000b24fe000 CR4: 00000000000407f0 [ 740.962001] Stack: [ 740.962001] 0000000000000008 ffff8800b912b600 ffff88013b00cc00 ffff88013a61e680 [ 740.962001] ffff88013b00cc00 ffff88013b00cc18 ffff88013b00cf80 ffff88013a61e6b0 [ 740.962001] ffff88013a247eb8 ffffffff810639c6 0000000000012a80 ffff88013a247fd8 [ 740.962001] Call Trace: [ 740.962001] [<ffffffff810639c6>] worker_thread+0x206/0x3f0 [ 740.962001] [<ffffffff810637c0>] ? manage_workers+0x2c0/0x2c0 [ 740.962001] [<ffffffff81069656>] kthread+0xc6/0xd0 [ 740.962001] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70 [ 740.962001] [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0 [ 740.962001] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70 [ 740.962001] Code: 89 e5 41 57 41 56 41 55 45 31 ed 41 54 53 48 89 fb 48 83 ec 18 48 8b 06 4c 8b 67 48 48 89 c1 30 c9 a8 04 4c 0f 45 e9 80 7f 58 00 <49> 8b 45 08 44 8b b0 00 01 00 00 78 0c 41 f6 44 24 10 04 0f 84 [ 740.962001] RIP [<ffffffff81062570>] process_one_work+0x30/0x500 [ 740.962001] RSP <ffff88013a247e08> [ 740.962001] CR2: 0000000000000008 [ 740.962001] ---[ end trace 39181460000748de ]--- [ 740.962001] Kernel panic - not syncing: Fatal exception This can happen if there are some stripes left, fewer than MAX_STRIPE_BATCH. A worker is queued to handle them. But before calling raid5_do_work, raid5d handles those stripes making conf->active_stripe = 0. So mddev_suspend() can return. We might then free old worker resources before the queued raid5_do_work() handled them. When it runs, it crashes. raid5d() raid5_store_group_thread_cnt() queue_work mddev_suspend() handle_strips active_stripe=0 free(old worker resources) process_one_work raid5_do_work To avoid this, we should only flush the worker resources before freeing them. This fixes a bug introduced in 3.12 so is suitable for the 3.12.x stable series. Cc: stable@vger.kernel.org (3.12) Fixes: b721420e8719131896b009b11edbbd27 Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Shaohua Li <shli@kernel.org>
2013-11-19	md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE.	majianpeng
	For R5_ReadNoMerge,it mean this bio can't merge with other bios or request.It used REQ_FLUSH to achieve this. But REQ_NOMERGE can do the same work. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	raid1: Rewrite the implementation of iobarrier.	majianpeng
	There is an iobarrier in raid1 because of contention between normal IO and resync IO. It suspends all normal IO when resync/recovery happens. However if normal IO is out side the resync window, there is no contention. So this patch changes the barrier mechanism to only block IO that could contend with the resync that is currently happening. We partition the whole space into five parts. \|---------\|-----------\|------------\|----------------\|-------\| start next_resync start_next_window end_window start + RESYNC_WINDOW = next_resync next_resync + NEXT_NORMALIO_DISTANCE = start_next_window start_next_window + NEXT_NORMALIO_DISTANCE = end_window Firstly we introduce some concepts: 1 - RESYNC_WINDOW: For resync, there are 32 resync requests at most at the same time. A sync request is RESYNC_BLOCK_SIZE(641024). So the RESYNC_WINDOW is 32 RESYNC_BLOCK_SIZE, that is 2MB. 2 - NEXT_NORMALIO_DISTANCE: the distance between next_resync and start_next_window. It also indicates the distance between start_next_window and end_window. It is currently 3 * RESYNC_WINDOW_SIZE but could be tuned if this turned out not to be optimal. 3 - next_resync: the next sector at which we will do sync IO. 4 - start: a position which is at most RESYNC_WINDOW before next_resync. 5 - start_next_window: a position which is NEXT_NORMALIO_DISTANCE beyond next_resync. Normal-io after this position doesn't need to wait for resync-io to complete. 6 - end_window: a position which is 2 * NEXT_NORMALIO_DISTANCE beyond next_resync. This also doesn't need to wait, but is counted differently. 7 - current_window_requests: the count of normalIO between start_next_window and end_window. 8 - next_window_requests: the count of normalIO after end_window. NormalIO will be partitioned into four types: NormIO1: the end sector of bio is smaller or equal the start NormIO2: the start sector of bio larger or equal to end_window NormIO3: the start sector of bio larger or equal to start_next_window. NormIO4: the location between start_next_window and end_window \|--------\|-----------\|--------------------\|----------------\|-------------\| \| start \| next_resync \| start_next_window \| end_window \| NormIO1 NormIO4 NormIO4 NormIO3 NormIO2 For NormIO1, we don't need any io barrier. For NormIO4, we used a similar approach to the original iobarrier mechanism. The normalIO and resyncIO must be kept separate. For NormIO2/3, we add two fields to struct r1conf: "current_window_requests" and "next_window_requests". They indicate the count of active requests in the two window. For these, we don't wait for resync io to complete. For resync action, if there are NormIO4s, we must wait for it. If not, we can proceed. But if resync action reaches start_next_window and current_window_requests > 0 (that is there are NormIO3s), we must wait until the current_window_requests becomes zero. When current_window_requests becomes zero, start_next_window also moves forward. Then current_window_requests will replaced by next_window_requests. There is a problem which when and how to change from NormIO2 to NormIO3. Only then can sync action progress. We add a field in struct r1conf "start_next_window". A: if start_next_window == MaxSector, it means there are no NormIO2/3. So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE B: if current_window_requests == 0 && next_window_requests != 0, it means start_next_window move to end_window There is another problem which how to differentiate between old NormIO2(now it is NormIO3) and NormIO2. For example, there are many bios which are NormIO2 and a bio which is NormIO3. NormIO3 firstly completed, so the bios of NormIO2 became NormIO3. We add a field in struct r1bio "start_next_window". This is used to record the position conf->start_next_window when the call to wait_barrier() is made in make_request(). In allow_barrier(), we check the conf->start_next_window. If r1bio->stat_next_window == conf->start_next_window, it means there is no transition between NormIO2 and NormIO3. If r1bio->start_next_window != conf->start_next_window, it mean there was a transition between NormIO2 and NormIO3. There can only have been one transition. So it only means the bio is old NormIO2. For one bio, there may be many r1bio's. So we make sure all the r1bio->start_next_window are the same value. If we met blocked_dev in make_request(), it must call allow_barrier and wait_barrier. So the former and the later value of conf->start_next_window will be change. If there are many r1bio's with differnet start_next_window, for the relevant bio, it depend on the last value of r1bio. It will cause error. To avoid this, we must wait for previous r1bios to complete. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	raid1: Add some macros to make code clearly.	majianpeng
	In a subsequent patch, we'll use some const parameters. Using macros will make the code clearly. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array ↵	majianpeng
	when reconfiguring the array. We used to use raise_barrier to suspend normal IO while we reconfigure the array. However raise_barrier will soon only suspend some normal IO, not all. So we need something else. Change it to use freeze_array. But freeze_array not only suspends normal io, it also suspends resync io. For the place where call raise_barrier for reconfigure, it isn't a problem. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	raid1: Add a field array_frozen to indicate whether raid in freeze state.	majianpeng
	Because the following patch will rewrite the content between normal IO and resync IO. So we used a parameter to indicate whether raid is in freeze array. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	md: Convert use of typedef ctl_table to struct ctl_table	Joe Perches
	This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	md/raid5: avoid deadlock when raid5 array has unack badblocks during ↵	NeilBrown
	md_stop_writes. When raid5 recovery hits a fresh badblock, this badblock will flagged as unack badblock until md_update_sb() is called. But md_stop will take reconfig lock which means raid5d can't call md_update_sb() in md_check_recovery(), the badblock will always be unack, so raid5d thread enters an infinite loop and md_stop_write() can never stop sync_thread. This causes deadlock. To solve this, when STOP_ARRAY ioctl is issued and sync_thread is running, we need set md->recovery FROZEN and INTR flags and wait for sync_thread to stop before we (re)take reconfig lock. This requires that raid5 reshape_request notices MD_RECOVERY_INTR (which it probably should have noticed anyway) and stops waiting for a metadata update in that case. Reported-by: Jianpeng Ma <majianpeng@gmail.com> Reported-by: Bian Yu <bianyu@kedacom.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread.	NeilBrown
	We currently use kthread_should_stop() in various places in the sync/reshape code to abort early. However some places set MD_RECOVERY_INTR but don't immediately call md_reap_sync_thread() (and we will shortly get another one). When this happens we are relying on md_check_recovery() to reap the thread and that only happen when it finishes normally. So MD_RECOVERY_INTR must lead to a normal finish without the kthread_should_stop() test. So replace all relevant tests, and be more careful when the thread is interrupted not to acknowledge that latest step in a reshape as it may not be fully committed yet. Also add a test on MD_RECOVERY_INTR in the 'is_mddev_idle' loop so we don't wait have to wait for the speed to drop before we can abort. Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	md: fix some places where mddev_lock return value is not checked.	NeilBrown
	Sometimes we need to lock and mddev and cannot cope with failure due to interrupt. In these cases we should use mutex_lock, not mutex_lock_interruptible. Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	raid5: Retry R5_ReadNoMerge flag when hit a read error.	Bian Yu
	Because of block layer merge, one bio fails will cause other bios which belongs to the same request fails, so raid5_end_read_request will record all these bios as badblocks. If retry request with R5_ReadNoMerge flag to avoid bios merge, badblocks can only record sector which is bad exactly. test: hdparm --yes-i-know-what-i-am-doing --make-bad-sector 300000 /dev/sdb mdadm -C /dev/md0 -l5 -n3 /dev/sd[bcd] --assume-clean mdadm /dev/md0 -f /dev/sdd mdadm /dev/md0 -r /dev/sdd mdadm --zero-superblock /dev/sdd mdadm /dev/md0 -a /dev/sdd 1. Without this patch: cat /sys/block/md0/md/rd/bad_blocks 299776 256 299776 256 2. With this patch: cat /sys/block/md0/md/rd/bad_blocks 300000 8 300000 8 Signed-off-by: Bian Yu <bianyu@kedacom.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	raid5: relieve lock contention in get_active_stripe()	Shaohua Li
	track empty inactive list count, so md_raid5_congested() can use it to make decision. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-19	Merge branch 'pm-cpufreq'	Rafael J. Wysocki
	* pm-cpufreq: cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs() cpufreq: OMAP: Fix compilation error 'r & ret undeclared' cpufreq: conservative: set requested_freq to policy max when it is over policy max
2013-11-19	Merge branch 'pm-runtime'	Rafael J. Wysocki
	* pm-runtime: PM / Runtime: Fix error path for prepare PM / Runtime: Update documentation around probe\|remove\|suspend
2013-11-19	Merge branch 'pm-cpuidle'	Rafael J. Wysocki
	* pm-cpuidle: intel_idle: Support Intel Atom Processor C2000 Product Family
2013-11-19	Merge branch 'acpi-video'	Rafael J. Wysocki
	* acpi-video: ACPI / video: clean up DMI table for initial black screen problem