2013-06-14 ASoC: wm8962: Remove remaining direct register cache accesses (Nicolin Chen)
Also fix return values for headphone switch updates. Signed-off-by: Nicolin Chen <b42378@freescale.com> Signed-off-by: Mark Brown <broonie@linaro.org> Cc: stable@vger.kernel.org
2013-06-14 ARM64: mm: THP support. (Steve Capper)
Bring Transparent HugePage support to ARM64. The size of a transparent huge page depends on the normal page size. A transparent huge page is always represented as a pmd. If PAGE_SIZE is 4KB, THPs are 2MB. If PAGE_SIZE is 64KB, THPs are 512MB. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
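The two THP sizes quoted above follow from the page-table geometry. The sketch below reproduces that arithmetic under the assumption that page-table entries are 8 bytes wide, so one table page holds PAGE_SIZE / 8 entries and one pmd entry covers a full table of pages. The helper name `thp_size_bytes` is hypothetical and is not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API.
 * Assumption: page-table entries are 8 bytes, so one table page holds
 * page_size / 8 entries, and a pmd entry maps one full table of pages.
 */
static uint64_t thp_size_bytes(uint64_t page_size)
{
    uint64_t ptes_per_table = page_size / 8;

    return ptes_per_table * page_size;
}
```

With 4KB pages this gives 512 entries of 4KB each (2MB); with 64KB pages it gives 8192 entries of 64KB each (512MB), matching the figures in the commit message.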
2013-06-14 ARM64: mm: Raise MAX_ORDER for 64KB pages and THP. (Steve Capper)
The buddy allocator has a default MAX_ORDER of 11, which is too low to allocate enough memory for 512MB Transparent HugePages if our base page size is 64KB. This patch introduces MAX_ZONE_ORDER and sets it to 14 when 64KB pages are used in conjunction with THP, otherwise the default value of 11 is used. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
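To see why 11 is too low and 14 is enough, it may help to compute the allocation order of a 512MB block made of 64KB pages. The sketch below does that with a hypothetical helper, `order_for`, which is an illustration and not part of the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= size, i.e. the buddy order needed for `size`.
 */
static unsigned int order_for(uint64_t size, uint64_t page_size)
{
    unsigned int order = 0;

    while ((page_size << order) < size)
        order++;
    return order;
}
```

A 512MB block of 64KB pages has order 13. Since the buddy allocator only serves orders strictly below MAX_ORDER, the limit has to be at least 14 for such a block to be allocatable.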
2013-06-14 ARM64: mm: HugeTLB support. (Steve Capper)
Add huge page support to ARM64. Different huge page sizes are supported depending on the size of normal pages:
PAGE_SIZE is 4KB:
  2MB - (pmds) these can be allocated at any time.
  1024MB - (puds) usually allocated on bootup with the command line with something like: hugepagesz=1G hugepages=6
PAGE_SIZE is 64KB:
  512MB - (pmds) usually allocated on bootup via command line.
Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2013-06-14 ARM64: mm: Move PTE_PROT_NONE bit. (Steve Capper)
Under ARM64, PTEs can be broadly categorised as follows:
- Present and valid: Bit #0 is set. The PTE is valid and memory access to the region may fault.
- Present and invalid: Bit #0 is clear and bit #1 is set. Represents present memory with PROT_NONE protection. The PTE is an invalid entry, and the user fault handler will raise a SIGSEGV.
- Not present (file or swap): Bits #0 and #1 are clear. Memory represented has been paged out. The PTE is an invalid entry, and the fault handler will try and re-populate the memory where necessary.
Huge PTEs are block descriptors that have bit #1 clear. If we wish to represent PROT_NONE huge PTEs we then run into a problem as there is no way to distinguish between regular and huge PTEs if we set bit #1. To resolve this ambiguity this patch moves PTE_PROT_NONE from bit #1 to bit #2 and moves PTE_FILE from bit #2 to bit #3. The number of swap/file bits is reduced by 1 as a consequence, leaving 60 bits for file and swap entries. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
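The three PTE categories can be sketched as a small classifier. This is a minimal illustration that assumes the bit layout described in the message after the move (valid at bit #0, PTE_PROT_NONE at bit #2, PTE_FILE at bit #3). The enum and the function `classify_pte` are hypothetical names and do not appear in the patch.

```c
#include <stdint.h>

/* Bit positions as described in the commit message, after the move. */
#define PTE_VALID     (UINT64_C(1) << 0)
#define PTE_PROT_NONE (UINT64_C(1) << 2) /* previously bit #1 */
#define PTE_FILE      (UINT64_C(1) << 3) /* previously bit #2 */

/* Hypothetical classification, for illustration only. */
enum pte_kind {
    PTE_KIND_VALID,       /* present and valid */
    PTE_KIND_PROT_NONE,   /* present but invalid (PROT_NONE) */
    PTE_KIND_NOT_PRESENT  /* file or swap entry */
};

static enum pte_kind classify_pte(uint64_t pte)
{
    if (pte & PTE_VALID)
        return PTE_KIND_VALID;
    if (pte & PTE_PROT_NONE)
        return PTE_KIND_PROT_NONE;
    return PTE_KIND_NOT_PRESENT;
}
```

Because bit #1 is no longer involved in the decision, a PROT_NONE huge PTE can keep bit #1 clear like any other block descriptor.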
2013-06-14 ARM64: mm: Make PAGE_NONE pages read only and no-execute. (Steve Capper)
If we consider the following code sequence:
  my_pte = pte_modify(entry, myprot);
  x = pte_write(my_pte);
  y = pte_exec(my_pte);
If myprot comes from a PROT_NONE page, then x and y will both be true, which is undesirable behaviour. This patch sets the no-execute and read-only bits for PAGE_NONE such that the code above will return false for both x and y. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
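The effect can be modelled in a few lines, assuming that writability and executability are inferred from the absence of a read-only bit and a no-execute bit respectively. The bit positions and the PAGE_NONE_OLD / PAGE_NONE_NEW names below are assumptions made for this sketch, not values taken from the patch.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define PTE_PROT_NONE (UINT64_C(1) << 2)
#define PTE_RDONLY    (UINT64_C(1) << 7)
#define PTE_UXN       (UINT64_C(1) << 54)

/* Hypothetical protection values before and after the change. */
#define PAGE_NONE_OLD (PTE_PROT_NONE)
#define PAGE_NONE_NEW (PTE_PROT_NONE | PTE_RDONLY | PTE_UXN)

/* Writable unless the read-only bit is set. */
static int pte_write(uint64_t pte)
{
    return !(pte & PTE_RDONLY);
}

/* Executable unless the no-execute bit is set. */
static int pte_exec(uint64_t pte)
{
    return !(pte & PTE_UXN);
}
```

With the old value both helpers report true for a PROT_NONE page; once the read-only and no-execute bits are part of PAGE_NONE, both report false.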
2013-06-14 ARM64: mm: Restore memblock limit when map_mem finished. (Steve Capper)
In paging_init the memblock limit is set to restrict any addresses returned by early_alloc to fit within the initial direct kernel mapping in swapper_pg_dir. This allows map_mem to allocate puds, pmds and ptes from the initial direct kernel mapping. The limit stays low after paging_init() though, meaning any bootmem allocations will be from a restricted subset of memory. Gigabyte huge pages, for instance, are normally allocated from bootmem as their order (18) is too large for the default buddy allocator (MAX_ORDER = 11). This patch restores the memblock limit when map_mem has finished, allowing gigabyte huge pages (and other objects) to be allocated from all of bootmem. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2013-06-14 mm: thp: Correct the HPAGE_PMD_ORDER check. (Steve Capper)
All Transparent Huge Pages are allocated by the buddy allocator. A compile time check is in place that fails when the order of a transparent huge page is too large to be allocated by the buddy allocator. Unfortunately that compile time check passes when HPAGE_PMD_ORDER == MAX_ORDER, which is incorrect as the buddy allocator can only allocate memory of order strictly less than MAX_ORDER. This patch updates the compile time check to fail in the above case. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
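The off-by-one can be illustrated as follows. The commit message does not show the check itself, so this sketch uses a C11 `_Static_assert` as a stand-in, and the two helper functions are hypothetical models of the old and corrected conditions. The macro values are illustrative (4KB pages with 2MB THPs).

```c
/* Illustrative values only: 4KB pages, 2MB transparent huge pages. */
#define MAX_ORDER       11
#define HPAGE_PMD_ORDER 9

/* Stand-in for the corrected compile time check: equality must fail too. */
_Static_assert(HPAGE_PMD_ORDER < MAX_ORDER,
               "THP order cannot be served by the buddy allocator");

/* Hypothetical model of the old check: only rejects order > max_order. */
static int old_check_passes(int hpage_order, int max_order)
{
    return !(hpage_order > max_order);
}

/* Hypothetical model of the corrected check: rejects order >= max_order. */
static int new_check_passes(int hpage_order, int max_order)
{
    return !(hpage_order >= max_order);
}
```

The two models differ only when the orders are equal, which is exactly the case the patch addresses.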
2013-06-14 x86: mm: Remove general hugetlb code from x86. (Steve Capper)
huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have already been copied over to mm. This patch removes the x86 copies of these functions and activates the general ones by enabling: CONFIG_ARCH_WANT_GENERAL_HUGETLB Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 mm: hugetlb: Copy general hugetlb code from x86 to mm. (Steve Capper)
The huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d functions in x86/mm/hugetlbpage.c do not rely on any architecture specific knowledge other than the fact that pmds and puds can be treated as huge ptes. To allow other architectures to use this code (and reduce the need for code duplication), this patch copies these functions into mm, replaces the use of pud_large with pud_huge and provides a config flag to activate them: CONFIG_ARCH_WANT_GENERAL_HUGETLB If CONFIG_ARCH_WANT_HUGE_PMD_SHARE is also active then the huge_pmd_share code will be called by huge_pte_alloc (otherwise we call pmd_alloc and skip the sharing code). Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 x86: mm: Remove x86 version of huge_pmd_share. (Steve Capper)
The huge_pmd_share code has been copied over to mm/hugetlb.c to make it accessible to other architectures. Remove the x86 copy of the huge_pmd_share code and enable the ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the general one. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2013-06-14 mm: hugetlb: Copy huge_pmd_share from x86 to mm. (Steve Capper)
Under x86, multiple puds can be made to reference the same bank of huge pmds provided that they represent a full PUD_SIZE of shared huge memory that is aligned to a PUD_SIZE boundary. The code to share pmds does not require any architecture specific knowledge other than the fact that pmds can be indexed, thus can be beneficial to some other architectures. This patch copies the huge pmd sharing (and unsharing) logic from x86/ to mm/ and introduces a new config option to activate it: CONFIG_ARCH_WANT_HUGE_PMD_SHARE Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
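The alignment condition for sharing can be sketched as a range check: the PUD_SIZE-aligned block containing an address must lie entirely inside the shared mapping. This is a simplified illustration; the function name `range_shareable`, its parameters, and the 1GB PUD_SIZE are assumptions made for the example, and the real code checks further conditions not modelled here.

```c
#include <stdint.h>

/* Assumption for illustration: one pud entry covers 1GB. */
#define PUD_SHIFT 30
#define PUD_SIZE  (UINT64_C(1) << PUD_SHIFT)
#define PUD_MASK  (~(PUD_SIZE - 1))

/*
 * Hypothetical helper: returns 1 if the PUD_SIZE-aligned block containing
 * addr lies fully inside the mapping [map_start, map_end).
 */
static int range_shareable(uint64_t map_start, uint64_t map_end, uint64_t addr)
{
    uint64_t base = addr & PUD_MASK;
    uint64_t end = base + PUD_SIZE;

    return map_start <= base && end <= map_end;
}
```

A mapping that starts one page past a PUD_SIZE boundary fails the check for addresses in its first block, because that block is only partially covered.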
2013-06-14 sgi: xpc: Convert use of typedef ctl_table to struct ctl_table (Joe Perches)
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Robin Holt <holt@sgi.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-06-14 tcm_qla2xxx: Fix residual for underrun commands that fail (Roland Dreier)
Suppose an initiator sends a DATA IN command with an allocation length shorter than the FC transfer length -- we get a target message like TARGET_CORE[qla2xxx]: Expected Transfer Length: 256 does not match SCSI CDB Length: 0 for SAM Opcode: 0x12 In that case, the target core adjusts the data_length and sets se_cmd->residual_count for the underrun. But now suppose that command fails and we end up in tcm_qla2xxx_queue_status() -- that function unconditionally overwrites residual_count with the already adjusted data_length, and the initiator will burp with a message like qla2xxx [0000:00:06.0]-301d:0: Dropped frame(s) detected (0x100 of 0x100 bytes). Fix this by adding on to the existing underflow residual count instead. Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: Giridhar Malavali <giridhar.malavali@qlogic.com> Cc: Chad Dupuis <chad.dupuis@qlogic.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
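The difference between overwriting and accumulating the residual can be shown with a toy model. The struct and both functions below are hypothetical simplifications, not the driver's actual types, and they only model the two fields named in the message.

```c
#include <stdint.h>

/* Hypothetical, trimmed stand-in for the command state. */
struct cmd {
    uint32_t data_length;    /* already adjusted by the target core */
    uint32_t residual_count; /* underrun recorded by the target core */
};

/* Behaviour described as buggy: the earlier residual is lost. */
static void queue_status_overwrite(struct cmd *c)
{
    c->residual_count = c->data_length;
}

/* Behaviour described as the fix: add to the existing residual. */
static void queue_status_accumulate(struct cmd *c)
{
    c->residual_count += c->data_length;
}
```

In the scenario from the message (expected transfer length 256, CDB length 0), the target core leaves data_length at 0 and residual_count at 256. Overwriting then reports a residual of 0, while accumulating keeps the 256 bytes the initiator expects.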
2013-06-14 target/iscsi: don't corrupt bh_count in iscsit_stop_time2retain_timer() (Jörn Engel)
Here is a fun one. Bug seems to have been introduced by commit 140854cb, almost two years ago. I have no idea why we only started seeing it now, but we did.
Rough callgraph:
core_tpg_set_initiator_node_queue_depth()
`-> spin_lock_irqsave(&tpg->session_lock, flags);
`-> lio_tpg_shutdown_session()
    `-> iscsit_stop_time2retain_timer()
        `-> spin_unlock_bh(&se_tpg->session_lock);
        `-> spin_lock_bh(&se_tpg->session_lock);
`-> spin_unlock_irqrestore(&tpg->session_lock, flags);
core_tpg_set_initiator_node_queue_depth() used to call spin_lock_bh(), but 140854cb changed that to spin_lock_irqsave(). However, lio_tpg_shutdown_session() still claims to be called with spin_lock_bh() held, as does iscsit_stop_time2retain_timer():
 * Called with spin_lock_bh(&struct se_portal_group->session_lock) held
Stale documentation is mostly annoying, but in this case dropping the lock with the _bh variant is plain wrong. It is also wrong to drop locks two functions below the lock-holder, but I will ignore that bit for now. After some more locking and unlocking we eventually hit this backtrace:
------------[ cut here ]------------
WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0xe8/0x100()
Pid: 24645, comm: lio_helper.py Tainted: G O 3.6.11+
Call Trace:
 [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
 [<ffffffffa040ae37>] ? iscsit_inc_conn_usage_count+0x37/0x50 [iscsi_target_mod]
 [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff810472f8>] local_bh_enable_ip+0xe8/0x100
 [<ffffffff815b8365>] _raw_spin_unlock_bh+0x15/0x20
 [<ffffffffa040ae37>] iscsit_inc_conn_usage_count+0x37/0x50 [iscsi_target_mod]
 [<ffffffffa041149a>] iscsit_stop_session+0xfa/0x1c0 [iscsi_target_mod]
 [<ffffffffa0417fab>] lio_tpg_shutdown_session+0x7b/0x90 [iscsi_target_mod]
 [<ffffffffa033ede4>] core_tpg_set_initiator_node_queue_depth+0xe4/0x290 [target_core_mod]
 [<ffffffffa0409032>] iscsit_tpg_set_initiator_node_queue_depth+0x12/0x20 [iscsi_target_mod]
 [<ffffffffa0415c29>] lio_target_nacl_store_cmdsn_depth+0xa9/0x180 [iscsi_target_mod]
 [<ffffffffa0331b49>] target_fabric_nacl_base_attr_store+0x39/0x40 [target_core_mod]
 [<ffffffff811b857d>] configfs_write_file+0xbd/0x120
 [<ffffffff81148f36>] vfs_write+0xc6/0x180
 [<ffffffff81149251>] sys_write+0x51/0x90
 [<ffffffff815c0969>] system_call_fastpath+0x16/0x1b
---[ end trace 3747632b9b164652 ]---
As a pure band-aid, this patch drops the _bh. Signed-off-by: Joern Engel <joern@logfs.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-06-13 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs (Linus Torvalds)
Pull btrfs fixes from Chris Mason: "This is an assortment of crash fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: stop all workers before cleaning up roots
  Btrfs: fix use-after-free bug during umount
  Btrfs: init relocate extent_io_tree with a mapping
  btrfs: Drop inode if inode root is NULL
  Btrfs: don't delete fs_roots until after we cleanup the transaction
2013-06-13 mei: me: clear interrupts on the resume path (Tomas Winkler)
We need to clear pending interrupts on the resume path. This brings the device into a defined state before starting the reset flow. This should solve suspend/resume issues:
mei_me : wait hw ready failed. status = 0x0
mei_me : version message write failed
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-13 mei: nfc: fix nfc device freeing (Tomas Winkler)
The nfc_dev is a static variable and is not cleaned properly upon reset mainly ndev->cl and ndev->cl_info are not set to NULL after freeing which mei_stop:198: mei_me 0000:00:16.0: stopping the device. [ 404.253427] general protection fault: 0000 [#2] SMP [ 404.253437] Modules linked in: mei_me(-) binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse loop dm_mod hid_generic usbhid hid coretemp acpi_cpufreq mperf kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul snd_hda_codec_hdmi glue_helper aes_x86_64 e1000e snd_hda_intel snd_hda_codec ehci_pci iTCO_wdt iTCO_vendor_support ehci_hcd snd_hwdep xhci_hcd snd_pcm usbcore ptp mei sg microcode snd_timer pps_core i2c_i801 snd pcspkr battery rtc_cmos lpc_ich mfd_core soundcore usb_common snd_page_alloc ac ext3 jbd mbcache drm_kms_helper drm intel_agp i2c_algo_bit intel_gtt i2c_core sd_mod crc_t10dif thermal fan video button processor thermal_sys hwmon ahci libahci libata scsi_mod [last unloaded: mei_me] [ 404.253591] CPU: 0 PID: 5551 Comm: modprobe Tainted: G D W 3.10.0-rc3 #1 [ 404.253611] task: ffff880143cd8300 ti: ffff880144a2a000 task.ti: ffff880144a2a000 [ 404.253619] RIP: 0010:[<ffffffff81334e5d>] [<ffffffff81334e5d>] device_del+0x1d/0x1d0 [ 404.253638] RSP: 0018:ffff880144a2bcf8 EFLAGS: 00010206 [ 404.253645] RAX: 2020302e30202030 RBX: ffff880144fdb000 RCX: 0000000000000086 [ 404.253652] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffff880144fdb000 [ 404.253659] RBP: ffff880144a2bd18 R08: 0000000000000651 R09: 0000000000000006 [ 404.253666] R10: 0000000000000651 R11: 0000000000000006 R12: ffff880144fdb000 [ 404.253673] R13: ffff880149371098 R14: ffff880144482c00 R15: ffffffffa04710e0 [ 404.253681] FS: 00007f251c59a700(0000) GS:ffff88014e200000(0000) knlGS:0000000000000000 [ 404.253689] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 404.253696] CR2: ffffffffff600400 CR3: 0000000145319000 
CR4: 00000000001407f0 [ 404.253703] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 404.253710] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 404.253716] Stack: [ 404.253720] ffff880144fdb000 ffff880143ffe000 ffff880149371098 ffffffffa0471000 [ 404.253732] ffff880144a2bd38 ffffffff8133502d ffff88014e20cf48 ffff880143ffe1d8 [ 404.253744] ffff880144a2bd48 ffffffffa02a4749 ffff880144a2bd58 ffffffffa02a4ba1 [ 404.253755] Call Trace: [ 404.253766] [<ffffffff8133502d>] device_unregister+0x1d/0x60 [ 404.253787] [<ffffffffa02a4749>] mei_cl_remove_device+0x9/0x10 [mei] [ 404.253804] [<ffffffffa02a4ba1>] mei_nfc_host_exit+0x21/0x30 [mei] [ 404.253819] [<ffffffffa029c2dd>] mei_stop+0x3d/0x90 [mei] [ 404.253830] [<ffffffffa046e220>] mei_me_remove+0x60/0xe0 [mei_me] [ 404.253843] [<ffffffff81278f37>] pci_device_remove+0x37/0xb0 [ 404.253855] [<ffffffff81337c68>] __device_release_driver+0x98/0x100 [ 404.253865] [<ffffffff81337d80>] driver_detach+0xb0/0xc0 [ 404.253876] [<ffffffff81336b4f>] bus_remove_driver+0x8f/0x120 [ 404.253891] [<ffffffff81075990>] ? try_to_wake_up+0x2b0/0x2b0 [ 404.253903] [<ffffffff81338a48>] driver_unregister+0x58/0x90 [ 404.253913] [<ffffffff8127906b>] pci_unregister_driver+0x2b/0xb0 [ 404.253924] [<ffffffffa046f244>] mei_me_driver_exit+0x10/0xdcc [mei_me] [ 404.253936] [<ffffffff810a50d8>] SyS_delete_module+0x198/0x2b0 [ 404.253949] [<ffffffff814850d9>] ? do_page_fault+0x9/0x10 [ 404.253961] [<ffffffff81489692>] system_call_fastpath+0x16/0x1b [ 404.253967] Code: 41 5c 41 5d 41 5e 41 5f c9 c3 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 8b 87 88 00 00 00 4c 8b 37 48 85 c0 74 18 <48> 8b 78 78 4c 89 e2 be 02 00 00 00 48 81 c7 f8 00 00 00 e8 3b [ 404.254048] RIP [<ffffffff81334e5d>] device_del+0x1d/0x1d0 Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-13 mei: init: Flush scheduled work before resetting the device (Samuel Ortiz)
Flushing pending work items before resetting the device makes more sense than doing so afterwards. Some of them, e.g. the NFC initialization one, find themselves with client IDs changed after the reset, eventually triggering a client.c:mei_me_cl_by_id() warning after a few modprobe/rmmod cycles. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-14 ARM: davinci: remove __init attribute from function declaration (Lad, Prabhakar)
__init attribute does not make sense on function declarations. Get rid of them in mach-davinci. Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Cc: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
2013-06-13 cpuset: rename @cont to @cgrp (Li Zefan)
Cont is short for container. Control groups were named process containers at first, but then people found that container already has a meaning in the Linux kernel. Clean up the leftover variable name @cont. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 cgroup: update sane_behavior documentation (Tejun Heo)
f12dc02014 ("cgroup: mark "tasks" cgroup file as insane") and cc5943a781 ("cgroup: mark "notify_on_release" and "release_agent" cgroup files insane") forgot to update the changed behavior documentation in cgroup.h. Update it. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 cgroup: use percpu refcnt for cgroup_subsys_states (Tejun Heo)
A css (cgroup_subsys_state) is how each cgroup is represented to a controller. As such, it can be used in hot paths across the various subsystems different controllers are associated with. One of the common operations is reference counting, which up until now has been implemented using a global atomic counter and can have significant adverse impact on scalability. For example, css refcnt can be gotten and put multiple times by blkcg for each IO request. For highops configurations which try to do as much per-cpu as possible, the global frequent refcnting can be very expensive. In general, given the various and hugely diverse paths css's end up being used from, we need to make it cheap and highly scalable. In its usage, css refcnting isn't very different from module refcnting. This patch converts css refcnting to use the recently added percpu_ref. css_get/tryget/put() map directly to the matching percpu_ref operations and the deactivation logic is no longer necessary as percpu_ref already has refcnt killing. The only complication is that as the refcnt is per-cpu, percpu_ref_kill() in itself doesn't ensure that further tryget operations will fail, which we need to guarantee before invoking ->css_offline()'s. This is resolved by collecting kill confirmation using percpu_ref_kill_and_confirm() and initiating the offline phase of destruction after all css refcnt's are confirmed to be seen as killed on all CPUs. The previous patches already split destruction into two phases, so percpu_ref_kill_and_confirm() can be hooked up easily. This patch removes css_refcnt() which is used for rcu dereference sanity check in css_id(). While we can add a percpu refcnt API to ask the same question, css_id() itself is scheduled to be removed fairly soon, so let's not bother with it. Just drop the sanity check and use rcu_dereference_raw() instead.
v2:
- init_cgroup_css() was calling percpu_ref_init() without checking the return value. This causes two problems - the obvious lack of error handling and percpu_ref_init() being called from cgroup_init_subsys() before the allocators are up, which triggers warnings but doesn't cause actual problems as the refcnt isn't used for roots anyway. Fix both by moving percpu_ref_init() to cgroup_create().
- The base references were put too early by percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the refs one extra time. This wasn't noticeable because css's go through another RCU grace period before being freed. Update cgroup_destroy_locked() to grab an extra reference before killing the refcnts. This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <koverstreet@google.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Alasdair G. Kergon" <agk@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Glauber Costa <glommer@gmail.com>
2013-06-13 Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.11 (Tejun Heo)
This is to receive percpu_refcount which will replace atomic_t reference count in cgroup_subsys_state. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 cgroup: split cgroup destruction into two steps (Tejun Heo)
Split cgroup_destroy_locked() into two steps and put the latter half into cgroup_offline_fn() which is executed from a work item. The latter half is responsible for offlining the css's, removing the cgroup from internal lists, and propagating release notification to the parent. The separation is to allow using percpu refcnt for css. Note that this allows for other cgroup operations to happen between the first and second halves of destruction, including creating a new cgroup with the same name. As the target cgroup is marked DEAD in the first half and cgroup internals don't care about the names of cgroups, this should be fine. A comment explaining this will be added by the next patch which implements the actual percpu refcnting. As RCU freeing is guaranteed to happen after the second step of destruction, we can use the same work item for both. This patch renames cgroup->free_work to ->destroy_work and uses it for both purposes. INIT_WORK() is now performed right before queueing the work item. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 cgroup: reorder the operations in cgroup_destroy_locked() (Tejun Heo)
This patch reorders the operations in cgroup_destroy_locked() such that the userland visible parts happen before css offlining and removal from the ->sibling list. This will be used to make css use percpu refcnt. While at it, split out CGRP_DEAD related comment from the refcnt deactivation one and correct / clarify how different guarantees are met. While this patch changes the specific order of operations, it shouldn't cause any noticeable behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-13 percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm() (Tejun Heo)
Implement percpu_tryget() which stops giving out references once the percpu_ref is visible as killed. Because the refcnt is per-cpu, different CPUs will start to see a refcnt as killed at different points in time and tryget() may continue to succeed on a subset of CPUs for a while after percpu_ref_kill() returns. For use cases where it's necessary to know when all CPUs start to see the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new function takes an extra argument @confirm_kill which is invoked when the refcnt is guaranteed to be viewed as killed on all CPUs. While this isn't the prettiest interface, it doesn't force synchronous wait and is much safer than requiring the caller to do its own call_rcu().
v2: Patch description rephrased to emphasize that tryget() may continue to succeed on some CPUs after kill() returns as suggested by Kent.
v3: Function comment in percpu_ref_kill_and_confirm() updated warning people to not depend on the implied RCU grace period from the confirm callback as it's an implementation detail.
Signed-off-by: Tejun Heo <tj@kernel.org> Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13 sctp: fully initialize sctp_outq in sctp_outq_init (Neil Horman)
In commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86 (refactor sctp_outq_teardown to insure proper re-initalization) we modified sctp_outq_teardown to use sctp_outq_init to fully re-initialize the outq structure. Steve West recently asked me why I removed the q->error = 0 initialization from sctp_outq_teardown. I did so because I was operating under the impression that sctp_outq_init would properly initialize that value for us, but it doesn't. sctp_outq_init operates under the assumption that the outq struct is all 0's (as it is when called from sctp_association_init), but using it in __sctp_outq_teardown violates that assumption. We should do a memset in sctp_outq_init to ensure that the entire structure is in a known state there instead. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: "West, Steve (NSN - US/Fort Worth)" <steve.west@nsn.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: netdev@vger.kernel.org CC: davem@davemloft.net Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
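The idea of making the init routine safe to call on a used structure can be sketched as below. The struct `outq` and its fields are a hypothetical, heavily trimmed stand-in for the real queue; only the `error` field is taken from the message.

```c
#include <string.h>

/* Hypothetical, trimmed stand-in for the outqueue structure. */
struct outq {
    int error;
    unsigned int out_qlen;
    int cork;
};

/*
 * Init that no longer assumes the caller passed zeroed memory:
 * clear everything first, then set any non-zero defaults.
 */
static void outq_init(struct outq *q)
{
    memset(q, 0, sizeof(*q));
    /* non-zero defaults, if any, would be assigned here */
}
```

Calling `outq_init()` on a queue that previously recorded an error now leaves `error` at 0, which is what a teardown path that reuses the init routine relies on.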
2013-06-13 netiucv: Hold rtnl between name allocation and device registration. (Benjamin Poirier)
Fixes a race condition between concurrent initializations of netiucv devices that try to use the same name.
sysfs: cannot create duplicate filename '/devices/iucv/netiucv2'
[...]
Call Trace:
([<00000000002edea4>] sysfs_add_one+0xb0/0xdc)
 [<00000000002eecd4>] create_dir+0x80/0xfc
 [<00000000002eee38>] sysfs_create_dir+0xe8/0x118
 [<00000000003835a8>] kobject_add_internal+0x120/0x2d0
 [<00000000003839d6>] kobject_add+0x62/0x9c
 [<00000000003d9564>] device_add+0xcc/0x510
 [<000003e00212c7b4>] netiucv_register_device+0xc0/0x1ec [netiucv]
Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Tested-by: Ursula Braun <braunu@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 tulip: Properly check dma mapping result (Neil Horman)
Tulip throws an error when dma debugging is enabled, as it doesn't properly check dma mapping results with dma_mapping_error() during tx ring refills. Easy fix: just add it in, and drop the frame if the mapping is bad. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Grant Grundler <grundler@parisc-linux.org> CC: "David S. Miller" <davem@davemloft.net> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-14 f2fs: recover wrong pino after checkpoint during fsync (Jaegeuk Kim)
If a file is linked, f2fs loses its parent inode number, so fsync calls for the linked file have to do a checkpoint every time. But, if we can recover its parent inode number after the checkpoint, we can use the roll-forward mechanism for further fsync calls, which improves fsync performance significantly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 f2fs: optimize do_write_data_page() (Haicheng Li)
Since "need_inplace_update() == true" is a very rare case, use unlikely() to give the compiler a chance to optimize the code. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
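For readers unfamiliar with the hint, the sketch below shows how `unlikely()` wraps a rarely true condition. The macro is written in terms of the GCC/Clang builtin `__builtin_expect`; the enum and the function `choose_write_path` are hypothetical stand-ins for the branch in do_write_data_page() and are not taken from the patch.

```c
/* Branch hint built on the GCC/Clang builtin; requires one of those compilers. */
#define unlikely(x) __builtin_expect(!!(x), 0)

enum write_path { WRITE_OUT_OF_PLACE, WRITE_IN_PLACE };

/* Hypothetical stand-in for the rarely taken in-place update branch. */
static enum write_path choose_write_path(int need_inplace_update)
{
    if (unlikely(need_inplace_update))
        return WRITE_IN_PLACE;
    return WRITE_OUT_OF_PLACE;
}
```

The hint does not change the result of the function; it only tells the compiler which branch to lay out as the fall-through path.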
2013-06-14 f2fs: make locate_dirty_segment() as static (Haicheng Li)
It's used only locally and could be static. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 f2fs: remove unnecessary parameter "offset" from __add_sum_entry() (Haicheng Li)
We can get the value directly from pointer "curseg". Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 f2fs: avoid frequent write_inode calls (Jaegeuk Kim)
If update_inode is called, we don't need to do write_inode. So, let's use a *dirty* flag for each inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 f2fs: optimise the truncate_data_blocks_range() range (Namjae Jeon)
The function truncate_data_blocks_range() decrements the valid block count of inode via dec_valid_block_count(). Since this function updates the i_blocks field of inode, we can update this field once we have calculated the total number of blocks to be freed. Therefore we can decrement valid blocks outside of the for loop.
  if (nr_free) {
+         dec_valid_block_count(sbi, dn->inode, nr_free);
          set_page_dirty(dn->node_page);
          sync_inode_page(dn);
  }
'nr_free' tells the total number of blocks freed. So, we can just directly pass this value to dec_valid_block_count() and update the i_blocks. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
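The batching idea can be sketched outside the kernel as follows. Everything here is a simplified assumption made for illustration: the block array, the NULL_ADDR marker value, and the function `truncate_range` are hypothetical and only model counting freed blocks in the loop and applying a single decrement afterwards.

```c
#include <stddef.h>

/* Assumed marker for an unallocated block slot in this sketch. */
#define NULL_ADDR 0u

/*
 * Hypothetical model: free every allocated slot in blocks[0..count),
 * count them in nr_free, and update the valid block counter once.
 */
static unsigned int truncate_range(unsigned int *blocks, size_t count,
                                   unsigned int *valid_block_count)
{
    unsigned int nr_free = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (blocks[i] == NULL_ADDR)
            continue;          /* nothing allocated here */
        blocks[i] = NULL_ADDR; /* release the block */
        nr_free++;
    }

    if (nr_free)
        *valid_block_count -= nr_free; /* one update instead of one per block */

    return nr_free;
}
```

The counter ends up with the same value as with a per-block decrement, but the update happens once per call rather than once per freed block.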
2013-06-14 f2fs: use the F2FS specific flags in f2fs_ioctl() (Namjae Jeon)
f2fs_ioctl() uses the generic flags. Since F2FS specific flags are defined, let's use those instead. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-13 Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux (Linus Torvalds)
Pull device tree bug fixes from Grant Likely: "This branch contains the following bug fixes:
- Fix locking vs. interrupts. Bug caught by lockdep checks
- Fix parsing of cpp #line directive output by dtc
- Fix 'make clean' for dtc temporary files.
There is also a commit that regenerates the dtc lexer and parser files with Bison 2.5. The only purpose of this commit is to separate the functional change in the dtc bug fix from the code generation change caused by a different Bison version"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  dtc: ensure #line directives don't consume data from the next line
  dtc: Update generated files to output from Bison 2.5
  of: Fix locking vs. interrupts
  kbuild: make sure we clean up DTB temporary files
2013-06-14 md/raid10: check In_sync flag in 'enough()'. (NeilBrown)
It isn't really enough to check that the rdev is present, we need to also be sure that the device is still In_sync. Doing this requires using rcu_dereference to access the rdev, and holding the rcu_read_lock() to ensure the rdev doesn't disappear while we look at it. Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 md/raid10: locking changes for 'enough()'. (NeilBrown)
As 'enough' accesses conf->prev and conf->geo, which can change spontaneously, it should guard against changes. This can be done with device_lock as start_reshape holds device_lock while updating 'geo' and end_reshape holds it while updating 'prev'. So 'error' needs to hold 'device_lock'. On the other hand, raid10_end_read_request knows which of the two it really wants to access, and as it is an active request on that one, the value cannot change underneath it. So change _enough to take a flag rather than a pointer, pass the appropriate flag from raid10_end_read_request(), and remove the locking. All other calls to 'enough' are made with reconfig_mutex held, so neither 'prev' nor 'geo' can change. Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 md: replace strict_strto*() with kstrto*() (Jingoo Han)
The usage of strict_strtoul() is not preferred, because strict_strtoul() is obsolete. Thus, kstrtoul() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 md: Wait for md_check_recovery before attempting device removal. (Hannes Reinecke)
When a device has failed, it needs to be removed from the personality module before it can be removed from the array as a whole. The first step is performed by md_check_recovery() which is called from the raid management thread. So when a HOT_REMOVE ioctl arrives, wait briefly for md_check_recovery to have run. This increases the chance that the ioctl will succeed. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Neil Brown <nfbrown@suse.de>
2013-06-14dm-raid: silence compiler warning on rebuilds_per_group.NeilBrown
This doesn't really need to be initialised, but it doesn't hurt, silences the compiler, and as it is a counter it makes sense for it to start at zero. Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14DM RAID: Fix raid_resume not reviving failed devices in all casesJonathan Brassow
When a device fails in a RAID array, it is marked as Faulty. Later, md_check_recovery is called which (through the call chain) calls 'hot_remove_disk' in order to have the personalities remove the device from use in the array. Sometimes, it is possible for the array to be suspended before the personalities get their chance to perform 'hot_remove_disk'. This is normally not an issue. If the array is deactivated, then the failed device will be noticed when the array is reinstantiated. If the array is resumed and the disk is still missing, md_check_recovery will be called upon resume and 'hot_remove_disk' will be called at that time. However, (for dm-raid) if the device has been restored, a resume on the array would cause it to attempt to revive the device by calling 'hot_add_disk'. If 'hot_remove_disk' had not been called, a situation is then created where the device is thought to concurrently be the replacement and the device to be replaced. Thus, the device is first sync'ed with the rest of the array (because it is the replacement device) and then marked Faulty and removed from the array (because it is also the device being replaced). The solution is to check and see if the device had properly been removed before the array was suspended. This is done by seeing whether the device's 'raid_disk' field is -1 - a condition that implies that 'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1) -> hot_remove_disk' has been called. If 'raid_disk' is not -1, then 'hot_remove_disk' must be called to complete the removal of the previously faulty device before it can be revived via 'hot_add_disk'. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
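The ordering rule described above (finish the pending removal first, then re-add) can be written down as a tiny decision function. This is a userspace sketch only: `struct model_dev`, `enum revive_step` and `revive_steps()` are hypothetical names, and the `superblock_readable` flag is an assumed stand-in for whatever test decides that the device has been restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one array member. */
struct model_dev {
	bool faulty;	/* device was marked Faulty */
	int raid_disk;	/* -1 once the personality has removed it */
};

enum revive_step {
	STEP_HOT_REMOVE = 1,	/* complete the pending removal */
	STEP_HOT_ADD = 2	/* revive the device */
};

/*
 * Decide which steps a resume would have to perform for one device.
 * Writes up to two steps into `steps` and returns how many were written.
 */
int revive_steps(const struct model_dev *dev, bool superblock_readable,
		 enum revive_step steps[2])
{
	int n = 0;

	assert(dev != NULL && steps != NULL);

	/* Only failed devices that are readable again are candidates. */
	if (!dev->faulty || !superblock_readable)
		return 0;

	/* raid_disk != -1 means the earlier removal never completed. */
	if (dev->raid_disk != -1)
		steps[n++] = STEP_HOT_REMOVE;

	steps[n++] = STEP_HOT_ADD;
	return n;
}
```

The point of the model is the middle branch: skipping the removal step when 'raid_disk' is not -1 is what leaves a device registered as both the replacement and the device being replaced.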
2013-06-14DM RAID: Break-up untidy functionJonathan Brassow
Clean-up excessive indentation by moving some code in raid_resume() into its own function. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14DM RAID: Add ability to restore transiently failed devices on resumeJonathan Brassow
This patch adds code to the resume function to check over the devices in the RAID array. If any are found to be marked as failed and their superblocks can be read, an attempt is made to reintegrate them into the array. This allows the user to refresh the array with a simple suspend and resume of the array - rather than having to load a completely new table, allocate and initialize all the structures and throw away the old instantiation. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-13dtc: ensure #line directives don't consume data from the next lineGrant Likely
Previously, the #line parsing regex ended with ({WS}+[0-9]+)?. The {WS} could match line-break characters. If the #line directive did not contain the optional flags field at the end, this could cause any integer data on the next line to be consumed as part of the #line directive parsing. This could cause syntax errors (e.g. #line parsing consuming the leading 0 from a hex literal 0x1234, leaving x1234 to be parsed as cell data, which is a syntax error), or invalid compilation results (e.g. simply consuming literal 1234 as part of the #line processing, thus removing it from the cell data). Fix this by replacing {WS} with [ \t] so that it can't match line-breaks. Convert all instances of {WS}, even though the other instances should be irrelevant for any well-formed #line directive. This is done for consistency and ultimate safety. [Cherry picked from DTC commit a1ee6f068e1c8dbc62873645037a353d7852d5cc] Reported-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
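The effect of a whitespace class that includes line breaks can be reproduced with POSIX regular expressions. The sketch below is an approximation, not the dtc lexer: dtc uses flex, and the two patterns here (`LINE_RE_OLD`, `LINE_RE_NEW`) and the helper `directive_match_len()` are hypothetical, simplified stand-ins. `[[:space:]]` plays the role of a {WS} that matches newlines, and `[ \t]` plays the role of the fix.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Approximation of the old rule: whitespace may include line breaks. */
#define LINE_RE_OLD \
	"^#line[[:space:]]+[0-9]+[[:space:]]+\"[^\"]*\"([[:space:]]+[0-9]+)?"

/* Approximation of the fixed rule: only blanks and tabs separate fields. */
#define LINE_RE_NEW \
	"^#line[ \t]+[0-9]+[ \t]+\"[^\"]*\"([ \t]+[0-9]+)?"

/*
 * Return the number of characters of `input` matched by `pattern`
 * (POSIX extended syntax), or -1 if it does not match or fails to compile.
 */
long directive_match_len(const char *pattern, const char *input)
{
	regex_t re;
	regmatch_t m;
	long len = -1;

	assert(pattern != NULL && input != NULL);

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, input, 1, &m, 0) == 0)
		len = (long)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

For an input such as `#line 10 "a.dts"` followed by a line break and `0x1234`, the old-style pattern's match extends past the line break and swallows the leading `0`, while the fixed pattern stops at the closing quote. When the optional flags field is present on the same line, both patterns match the same text.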
2013-06-13dtc: Update generated files to output from Bison 2.5Grant Likely
This patch merely updates the generated dtc parser and lexer files to the output generated by Bison 2.5. The previous versions were generated from version 2.4.1. The only reason for this commit is to minimize the diff on the next commit which fixes a bug in the DTC #line directive parsing. Otherwise the Bison changes would be intermingled with the functional changes. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2013-06-13of: Fix locking vs. interruptsBenjamin Herrenschmidt
The OF code uses irqsafe locks everywhere except in a handful of functions for no obvious reasons. Since the conversion from the old rwlocks, this now triggers lockdep warnings when used at interrupt time. At least one driver (ibmvscsi) seems to be doing that from softirq context. This converts the few non-irqsafe locks into irqsafe ones, making them consistent with the rest of the code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-13kbuild: make sure we clean up DTB temporary filesIan Campbell
Various temporary files used when building DTB files were not suffixed with .tmp and therefore were not cleaned up by "make clean". Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Stephen Warren <swarren@nvidia.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Grant Likely <grant.likely@linaro.org>