Age | Commit message (Collapse) | Author |
|
Also fix return values for headphone switch updates.
Signed-off-by: Nicolin Chen <b42378@freescale.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: stable@vger.kernel.org
|
|
Bring Transparent HugePage support to ARM. The size of a
transparent huge page depends on the normal page size. A
transparent huge page is always represented as a pmd.
If PAGE_SIZE is 4KB, THPs are 2MB.
If PAGE_SIZE is 64KB, THPs are 512MB.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The buddy allocator has a default MAX_ORDER of 11, which is too
low to allocate enough memory for 512MB Transparent HugePages if
our base page size is 64KB.
This patch introduces MAX_ZONE_ORDER and sets it to 14 when 64KB
pages are used in conjuction with THP, otherwise the default value
of 11 is used.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Add huge page support to ARM64, different huge page sizes are
supported depending on the size of normal pages:
PAGE_SIZE is 4KB:
2MB - (pmds) these can be allocated at any time.
1024MB - (puds) usually allocated on bootup with the command line
with something like: hugepagesz=1G hugepages=6
PAGE_SIZE is 64KB:
512MB - (pmds) usually allocated on bootup via command line.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Under ARM64, PTEs can be broadly categorised as follows:
- Present and valid: Bit #0 is set. The PTE is valid and memory
access to the region may fault.
- Present and invalid: Bit #0 is clear and bit #1 is set.
Represents present memory with PROT_NONE protection. The PTE
is an invalid entry, and the user fault handler will raise a
SIGSEGV.
- Not present (file or swap): Bits #0 and #1 are clear.
Memory represented has been paged out. The PTE is an invalid
entry, and the fault handler will try and re-populate the
memory where necessary.
Huge PTEs are block descriptors that have bit #1 clear. If we wish
to represent PROT_NONE huge PTEs we then run into a problem as
there is no way to distinguish between regular and huge PTEs if we
set bit #1.
To resolve this ambiguity this patch moves PTE_PROT_NONE from
bit #1 to bit #2 and moves PTE_FILE from bit #2 to bit #3. The
number of swap/file bits is reduced by 1 as a consequence, leaving
60 bits for file and swap entries.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
If we consider the following code sequence:
my_pte = pte_modify(entry, myprot);
x = pte_write(my_pte);
y = pte_exec(my_pte);
If myprot comes from a PROT_NONE page, then x and y will both be
true which is undesireable behaviour.
This patch sets the no-execute and read-only bits for PAGE_NONE
such that the code above will return false for both x and y.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
In paging_init the memblock limit is set to restrict any addresses
returned by early_alloc to fit within the initial direct kernel
mapping in swapper_pg_dir. This allows map_mem to allocate puds,
pmds and ptes from the initial direct kernel mapping.
The limit stays low after paging_init() though, meaning any
bootmem allocations will be from a restricted subset of memory.
Gigabyte huge pages, for instance, are normally allocated from
bootmem as their order (18) is too large for the default buddy
allocator (MAX_ORDER = 11).
This patch restores the memblock limit when map_mem has finished,
allowing gigabyte huge pages (and other objects) to be allocated
from all of bootmem.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
All Transparent Huge Pages are allocated by the buddy allocator.
A compile time check is in place that fails when the order of a
transparent huge page is too large to be allocated by the buddy
allocator. Unfortunately that compile time check passes when:
HPAGE_PMD_ORDER == MAX_ORDER
( which is incorrect as the buddy allocator can only allocate
memory of order strictly less than MAX_ORDER. )
This patch updates the compile time check to fail in the above
case.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
|
|
huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have
already been copied over to mm.
This patch removes the x86 copies of these functions and activates
the general ones by enabling:
CONFIG_ARCH_WANT_GENERAL_HUGETLB
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d
functions in x86/mm/hugetlbpage.c do not rely on any architecture
specific knowledge other than the fact that pmds and puds can be
treated as huge ptes.
To allow other architectures to use this code (and reduce the need
for code duplication), this patch copies these functions into mm,
replaces the use of pud_large with pud_huge and provides a config
flag to activate them:
CONFIG_ARCH_WANT_GENERAL_HUGETLB
If CONFIG_ARCH_WANT_HUGE_PMD_SHARE is also active then the
huge_pmd_share code will be called by huge_pte_alloc (othewise we
call pmd_alloc and skip the sharing code).
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The huge_pmd_share code has been copied over to mm/hugetlb.c to
make it accessible to other architectures.
Remove the x86 copy of the huge_pmd_share code and enable the
ARCH_WANT_HUGE_PMD_SHARE config flag. That way we reference the
general one.
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Under x86, multiple puds can be made to reference the same bank of
huge pmds provided that they represent a full PUD_SIZE of shared
huge memory that is aligned to a PUD_SIZE boundary.
The code to share pmds does not require any architecture specific
knowledge other than the fact that pmds can be indexed, thus can
be beneficial to some other architectures.
This patch copies the huge pmd sharing (and unsharing) logic from
x86/ to mm/ and introduces a new config option to activate it:
CONFIG_ARCH_WANTS_HUGE_PMD_SHARE
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Robin Holt <holt@sgi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
Suppose an initiator sends a DATA IN command with an allocation length
shorter than the FC transfer length -- we get a target message like
TARGET_CORE[qla2xxx]: Expected Transfer Length: 256 does not match SCSI CDB Length: 0 for SAM Opcode: 0x12
In that case, the target core adjusts the data_length and sets
se_cmd->residual_count for the underrun. But now suppose that command
fails and we end up in tcm_qla2xxx_queue_status() -- that function
unconditionally overwrites residual_count with the already adjusted
data_length, and the initiator will burp with a message like
qla2xxx [0000:00:06.0]-301d:0: Dropped frame(s) detected (0x100 of 0x100 bytes).
Fix this by adding on to the existing underflow residual count instead.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|
Here is a fun one. Bug seems to have been introduced by commit 140854cb,
almost two years ago. I have no idea why we only started seeing it now,
but we did.
Rough callgraph:
core_tpg_set_initiator_node_queue_depth()
`-> spin_lock_irqsave(&tpg->session_lock, flags);
`-> lio_tpg_shutdown_session()
`-> iscsit_stop_time2retain_timer()
`-> spin_unlock_bh(&se_tpg->session_lock);
`-> spin_lock_bh(&se_tpg->session_lock);
`-> spin_unlock_irqrestore(&tpg->session_lock, flags);
core_tpg_set_initiator_node_queue_depth() used to call spin_lock_bh(),
but 140854cb changed that to spin_lock_irqsave(). However,
lio_tpg_shutdown_session() still claims to be called with spin_lock_bh()
held, as does iscsit_stop_time2retain_timer():
* Called with spin_lock_bh(&struct se_portal_group->session_lock) held
Stale documentation is mostly annoying, but in this case the dropping
the lock with the _bh variant is plain wrong. It is also wrong to drop
locks two functions below the lock-holder, but I will ignore that bit
for now.
After some more locking and unlocking we eventually hit this backtrace:
------------[ cut here ]------------
WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0xe8/0x100()
Pid: 24645, comm: lio_helper.py Tainted: G O 3.6.11+
Call Trace:
[<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
[<ffffffffa040ae37>] ? iscsit_inc_conn_usage_count+0x37/0x50 [iscsi_target_mod]
[<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810472f8>] local_bh_enable_ip+0xe8/0x100
[<ffffffff815b8365>] _raw_spin_unlock_bh+0x15/0x20
[<ffffffffa040ae37>] iscsit_inc_conn_usage_count+0x37/0x50 [iscsi_target_mod]
[<ffffffffa041149a>] iscsit_stop_session+0xfa/0x1c0 [iscsi_target_mod]
[<ffffffffa0417fab>] lio_tpg_shutdown_session+0x7b/0x90 [iscsi_target_mod]
[<ffffffffa033ede4>] core_tpg_set_initiator_node_queue_depth+0xe4/0x290 [target_core_mod]
[<ffffffffa0409032>] iscsit_tpg_set_initiator_node_queue_depth+0x12/0x20 [iscsi_target_mod]
[<ffffffffa0415c29>] lio_target_nacl_store_cmdsn_depth+0xa9/0x180 [iscsi_target_mod]
[<ffffffffa0331b49>] target_fabric_nacl_base_attr_store+0x39/0x40 [target_core_mod]
[<ffffffff811b857d>] configfs_write_file+0xbd/0x120
[<ffffffff81148f36>] vfs_write+0xc6/0x180
[<ffffffff81149251>] sys_write+0x51/0x90
[<ffffffff815c0969>] system_call_fastpath+0x16/0x1b
---[ end trace 3747632b9b164652 ]---
As a pure band-aid, this patch drops the _bh.
Signed-off-by: Joern Engel <joern@logfs.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This is an assortment of crash fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: stop all workers before cleaning up roots
Btrfs: fix use-after-free bug during umount
Btrfs: init relocate extent_io_tree with a mapping
btrfs: Drop inode if inode root is NULL
Btrfs: don't delete fs_roots until after we cleanup the transaction
|
|
We need to clear pending interrupts on the resume
path. This brings the device into defined state
before starting the reset flow
This should solve suspend/resume issues:
mei_me : wait hw ready failed. status = 0x0
mei_me : version message write failed
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The nfc_dev is a static variable and is not cleaned properly upon reset
mainly ndev->cl and ndev->cl_info are not set to NULL after freeing which
mei_stop:198: mei_me 0000:00:16.0: stopping the device.
[ 404.253427] general protection fault: 0000 [#2] SMP
[ 404.253437] Modules linked in: mei_me(-) binfmt_misc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse loop dm_mod hid_generic usbhid hid coretemp acpi_cpufreq mperf kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul snd_hda_codec_hdmi glue_helper aes_x86_64 e1000e snd_hda_intel snd_hda_codec ehci_pci iTCO_wdt iTCO_vendor_support ehci_hcd snd_hwdep xhci_hcd snd_pcm usbcore ptp mei sg microcode snd_timer pps_core i2c_i801 snd pcspkr battery rtc_cmos lpc_ich mfd_core soundcore usb_common snd_page_alloc ac ext3 jbd mbcache drm_kms_helper drm intel_agp i2c_algo_bit intel_gtt i2c_core sd_mod crc_t10dif thermal fan video button processor thermal_sys hwmon ahci libahci libata scsi_mod [last unloaded: mei_me]
[ 404.253591] CPU: 0 PID: 5551 Comm: modprobe Tainted: G D W 3.10.0-rc3 #1
[ 404.253611] task: ffff880143cd8300 ti: ffff880144a2a000 task.ti: ffff880144a2a000
[ 404.253619] RIP: 0010:[<ffffffff81334e5d>] [<ffffffff81334e5d>] device_del+0x1d/0x1d0
[ 404.253638] RSP: 0018:ffff880144a2bcf8 EFLAGS: 00010206
[ 404.253645] RAX: 2020302e30202030 RBX: ffff880144fdb000 RCX: 0000000000000086
[ 404.253652] RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffff880144fdb000
[ 404.253659] RBP: ffff880144a2bd18 R08: 0000000000000651 R09: 0000000000000006
[ 404.253666] R10: 0000000000000651 R11: 0000000000000006 R12: ffff880144fdb000
[ 404.253673] R13: ffff880149371098 R14: ffff880144482c00 R15: ffffffffa04710e0
[ 404.253681] FS: 00007f251c59a700(0000) GS:ffff88014e200000(0000) knlGS:0000000000000000
[ 404.253689] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 404.253696] CR2: ffffffffff600400 CR3: 0000000145319000 CR4: 00000000001407f0
[ 404.253703] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 404.253710] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 404.253716] Stack:
[ 404.253720] ffff880144fdb000 ffff880143ffe000 ffff880149371098 ffffffffa0471000
[ 404.253732] ffff880144a2bd38 ffffffff8133502d ffff88014e20cf48 ffff880143ffe1d8
[ 404.253744] ffff880144a2bd48 ffffffffa02a4749 ffff880144a2bd58 ffffffffa02a4ba1
[ 404.253755] Call Trace:
[ 404.253766] [<ffffffff8133502d>] device_unregister+0x1d/0x60
[ 404.253787] [<ffffffffa02a4749>] mei_cl_remove_device+0x9/0x10 [mei]
[ 404.253804] [<ffffffffa02a4ba1>] mei_nfc_host_exit+0x21/0x30 [mei]
[ 404.253819] [<ffffffffa029c2dd>] mei_stop+0x3d/0x90 [mei]
[ 404.253830] [<ffffffffa046e220>] mei_me_remove+0x60/0xe0 [mei_me]
[ 404.253843] [<ffffffff81278f37>] pci_device_remove+0x37/0xb0
[ 404.253855] [<ffffffff81337c68>] __device_release_driver+0x98/0x100
[ 404.253865] [<ffffffff81337d80>] driver_detach+0xb0/0xc0
[ 404.253876] [<ffffffff81336b4f>] bus_remove_driver+0x8f/0x120
[ 404.253891] [<ffffffff81075990>] ? try_to_wake_up+0x2b0/0x2b0
[ 404.253903] [<ffffffff81338a48>] driver_unregister+0x58/0x90
[ 404.253913] [<ffffffff8127906b>] pci_unregister_driver+0x2b/0xb0
[ 404.253924] [<ffffffffa046f244>] mei_me_driver_exit+0x10/0xdcc [mei_me]
[ 404.253936] [<ffffffff810a50d8>] SyS_delete_module+0x198/0x2b0
[ 404.253949] [<ffffffff814850d9>] ? do_page_fault+0x9/0x10
[ 404.253961] [<ffffffff81489692>] system_call_fastpath+0x16/0x1b
[ 404.253967] Code: 41 5c 41 5d 41 5e 41 5f c9 c3 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 8b 87 88 00 00 00 4c 8b 37 48 85 c0 74 18 <48> 8b 78 78 4c 89 e2 be 02 00 00 00 48 81 c7 f8 00 00 00 e8 3b
[ 404.254048] RIP [<ffffffff81334e5d>] device_del+0x1d/0x1d0
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Flushing pending work items before resetting the device makes more
sense than doing so afterwards. Some of them, like e.g. the NFC
initialization one, find themselves with client IDs changed after
the reset, eventually leading to trigger a client.c:mei_me_cl_by_id()
warning after a few modprobe/rmmod cycles.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
__init attribute does not make sense on function declarations.
Get rid of them in mach-davinci.
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Cc: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
|
|
Cont is short for container. control group was named process container
at first, but then people found container already has a meaning in
linux kernel.
Clean up the leftover variable name @cont.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
f12dc02014 ("cgroup: mark "tasks" cgroup file as insane") and
cc5943a781 ("cgroup: mark "notify_on_release" and "release_agent"
cgroup files insane") forgot to update the changed behavior
documentation in cgroup.h. Update it.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
A css (cgroup_subsys_state) is how each cgroup is represented to a
controller. As such, it can be used in hot paths across the various
subsystems different controllers are associated with.
One of the common operations is reference counting, which up until now
has been implemented using a global atomic counter and can have
significant adverse impact on scalability. For example, css refcnt
can be gotten and put multiple times by blkcg for each IO request.
For highops configurations which try to do as much per-cpu as
possible, the global frequent refcnting can be very expensive.
In general, given the various and hugely diverse paths css's end up
being used from, we need to make it cheap and highly scalable. In its
usage, css refcnting isn't very different from module refcnting.
This patch converts css refcnting to use the recently added
percpu_ref. css_get/tryget/put() directly maps to the matching
percpu_ref operations and the deactivation logic is no longer
necessary as percpu_ref already has refcnt killing.
The only complication is that as the refcnt is per-cpu,
percpu_ref_kill() in itself doesn't ensure that further tryget
operations will fail, which we need to guarantee before invoking
->css_offline()'s. This is resolved collecting kill confirmation
using percpu_ref_kill_and_confirm() and initiating the offline phase
of destruction after all css refcnt's are confirmed to be seen as
killed on all CPUs. The previous patches already splitted destruction
into two phases, so percpu_ref_kill_and_confirm() can be hooked up
easily.
This patch removes css_refcnt() which is used for rcu dereference
sanity check in css_id(). While we can add a percpu refcnt API to ask
the same question, css_id() itself is scheduled to be removed fairly
soon, so let's not bother with it. Just drop the sanity check and use
rcu_dereference_raw() instead.
v2: - init_cgroup_css() was calling percpu_ref_init() without checking
the return value. This causes two problems - the obvious lack
of error handling and percpu_ref_init() being called from
cgroup_init_subsys() before the allocators are up, which
triggers warnings but doesn't cause actual problems as the
refcnt isn't used for roots anyway. Fix both by moving
percpu_ref_init() to cgroup_create().
- The base references were put too early by
percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
refs one extra time. This wasn't noticeable because css's go
through another RCU grace period before being freed. Update
cgroup_destroy_locked() to grab an extra reference before
killing the refcnts. This problem was noticed by Kent.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Kent Overstreet <koverstreet@google.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Alasdair G. Kergon" <agk@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.11
This is to receive percpu_refcount which will replace atomic_t
reference count in cgroup_subsys_state.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Split cgroup_destroy_locked() into two steps and put the latter half
into cgroup_offline_fn() which is executed from a work item. The
latter half is responsible for offlining the css's, removing the
cgroup from internal lists, and propagating release notification to
the parent. The separation is to allow using percpu refcnt for css.
Note that this allows for other cgroup operations to happen between
the first and second halves of destruction, including creating a new
cgroup with the same name. As the target cgroup is marked DEAD in the
first half and cgroup internals don't care about the names of cgroups,
this should be fine. A comment explaining this will be added by the
next patch which implements the actual percpu refcnting.
As RCU freeing is guaranteed to happen after the second step of
destruction, we can use the same work item for both. This patch
renames cgroup->free_work to ->destroy_work and uses it for both
purposes. INIT_WORK() is now performed right before queueing the work
item.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
|
This patch reorders the operations in cgroup_destroy_locked() such
that the userland visible parts happen before css offlining and
removal from the ->sibling list. This will be used to make css use
percpu refcnt.
While at it, split out CGRP_DEAD related comment from the refcnt
deactivation one and correct / clarify how different guarantees are
met.
While this patch changes the specific order of operations, it
shouldn't cause any noticeable behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
|
|
percpu_ref_kill_and_confirm()
Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed. Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on subset of cpus
for a while after percpu_ref_kill() returns.
For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.
While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().
v2: Patch description rephrased to emphasize that tryget() may
continue to succeed on some CPUs after kill() returns as suggested
by Kent.
v3: Function comment in percpu_ref_kill_and_confirm() updated warning
people to not depend on the implied RCU grace period from the
confirm callback as it's an implementation detail.
Signed-off-by: Tejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
|
|
In commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86
(refactor sctp_outq_teardown to insure proper re-initalization)
we modified sctp_outq_teardown to use sctp_outq_init to fully re-initalize the
outq structure. Steve West recently asked me why I removed the q->error = 0
initalization from sctp_outq_teardown. I did so because I was operating under
the impression that sctp_outq_init would properly initalize that value for us,
but it doesn't. sctp_outq_init operates under the assumption that the outq
struct is all 0's (as it is when called from sctp_association_init), but using
it in __sctp_outq_teardown violates that assumption. We should do a memset in
sctp_outq_init to ensure that the entire structure is in a known state there
instead.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: "West, Steve (NSN - US/Fort Worth)" <steve.west@nsn.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: netdev@vger.kernel.org
CC: davem@davemloft.net
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
fixes a race condition between concurrent initializations of netiucv devices
that try to use the same name.
sysfs: cannot create duplicate filename '/devices/iucv/netiucv2'
[...]
Call Trace:
([<00000000002edea4>] sysfs_add_one+0xb0/0xdc)
[<00000000002eecd4>] create_dir+0x80/0xfc
[<00000000002eee38>] sysfs_create_dir+0xe8/0x118
[<00000000003835a8>] kobject_add_internal+0x120/0x2d0
[<00000000003839d6>] kobject_add+0x62/0x9c
[<00000000003d9564>] device_add+0xcc/0x510
[<000003e00212c7b4>] netiucv_register_device+0xc0/0x1ec [netiucv]
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Tested-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tulip throws an error when dma debugging is enabled, as it doesn't properly
check dma mapping results with dma_mapping_error() durring tx ring refills.
Easy fix, just add it in, and drop the frame if the mapping is bad
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Grant Grundler <grundler@parisc-linux.org>
CC: "David S. Miller" <davem@davemloft.net>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a file is linked, f2fs loose its parent inode number so that fsync calls
for the linked file should do checkpoint all the time.
But, if we can recover its parent inode number after the checkpoint, we can
adjust roll-forward mechanism for the further fsync calls, which is able to
improve the fsync performance significatly.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Since "need_inplace_update() == true" is a very rare case, using unlikely()
to give compiler a chance to optimize the code.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
It's used only locally and could be static.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
We can get the value directly from pointer "curseg".
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
If update_inode is called, we don't need to do write_inode.
So, let's use a *dirty* flag for each inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
The function truncate_data_blocks_range() decrements the valid
block count of inode via dec_valid_block_count(). Since this
function updates the i_blocks field of inode, we can update this
field once we have calculated total the number of blocks
to be freed.
Therefore we can decrement valid blocks outside of the for loop.
if (nr_free) {
+ dec_valid_block_count(sbi, dn->inode, nr_free);
set_page_dirty(dn->node_page);
sync_inode_page(dn);
}
'nr_free' tells the total number of blocks freed. So, we can
just directly pass this value to dec_valid_block_count() and update
the i_blocks.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
In f2fs_ioctl() function, it is using generic flags.
Since F2FS specific flags are defined. So lets use
those flags.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
Pull device tree bug fixes from Grant Likely:
"This branch contains the following bug fixes:
- Fix locking vs. interrupts. Bug caught by lockdep checks
- Fix parsing of cpp #line directive output by dtc
- Fix 'make clean' for dtc temporary files.
There is also a commit that regenerates the dtc lexer and parser files
with Bison 2.5. The only purpose of this commit is to separate the
functional change in the dtc bug fix from the code generation change
caused by a different Bison version"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
dtc: ensure #line directives don't consume data from the next line
dtc: Update generated files to output from Bison 2.5
of: Fix locking vs. interrupts
kbuild: make sure we clean up DTB temporary files
|
|
It isn't really enough to check that the rdev is present, we need to
also be sure that the device is still In_sync.
Doing this requires using rcu_dereference to access the rdev, and
holding the rcu_read_lock() to ensure the rdev doesn't disappear while
we look at it.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
As 'enough' accesses conf->prev and conf->geo, which can change
spontanously, it should guard against changes.
This can be done with device_lock as start_reshape holds device_lock
while updating 'geo' and end_reshape holds it while updating 'prev'.
So 'error' needs to hold 'device_lock'.
On the other hand, raid10_end_read_request knows which of the two it
really wants to access, and as it is an active request on that one,
the value cannot change underneath it.
So change _enough to take flag rather than a pointer, pass the
appropriate flag from raid10_end_read_request(), and remove the locking.
All other calls to 'enough' are made with reconfig_mutex held, so
neither 'prev' nor 'geo' can change.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
When a device has failed, it needs to be removed from the personality
module before it can be removed from the array as a whole.
The first step is performed by md_check_recovery() which is called
from the raid management thread.
So when a HOT_REMOVE ioctl arrives, wait briefly for md_check_recovery
to have run. This increases the chance that the ioctl will succeed.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Neil Brown <nfbrown@suse.de>
|
|
This doesn't really need to be initialised, but it doesn't hurt,
silences the compiler, and as it is a counter it makes sense for it to
start at zero.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
DM RAID: Fix raid_resume not reviving failed devices in all cases
When a device fails in a RAID array, it is marked as Faulty. Later,
md_check_recovery is called which (through the call chain) calls
'hot_remove_disk' in order to have the personalities remove the device
from use in the array.
Sometimes, it is possible for the array to be suspended before the
personalities get their chance to perform 'hot_remove_disk'. This is
normally not an issue. If the array is deactivated, then the failed
device will be noticed when the array is reinstantiated. If the
array is resumed and the disk is still missing, md_check_recovery will
be called upon resume and 'hot_remove_disk' will be called at that
time. However, (for dm-raid) if the device has been restored,
a resume on the array would cause it to attempt to revive the device
by calling 'hot_add_disk'. If 'hot_remove_disk' had not been called,
a situation is then created where the device is thought to concurrently
be the replacement and the device to be replaced. Thus, the device
is first sync'ed with the rest of the array (because it is the replacement
device) and then marked Faulty and removed from the array (because
it is also the device being replaced).
The solution is to check and see if the device had properly been removed
before the array was suspended. This is done by seeing whether the
device's 'raid_disk' field is -1 - a condition that implies that
'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
-> hot_remove_disk' has been called. If 'raid_disk' is not -1, then
'hot_remove_disk' must be called to complete the removal of the previously
faulty device before it can be revived via 'hot_add_disk'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
DM RAID: Break-up untidy function
Clean-up excessive indentation by moving some code in raid_resume()
into its own function.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
DM RAID: Add ability to restore transiently failed devices on resume
This patch adds code to the resume function to check over the devices
in the RAID array. If any are found to be marked as failed and their
superblocks can be read, an attempt is made to reintegrate them into
the array. This allows the user to refresh the array with a simple
suspend and resume of the array - rather than having to load a
completely new table, allocate and initialize all the structures and
throw away the old instantiation.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Previously, the #line parsing regex ended with ({WS}+[0-9]+)?. The {WS}
could match line-break characters. If the #line directive did not contain
the optional flags field at the end, this could cause any integer data on
the next line to be consumed as part of the #line directive parsing. This
could cause syntax errors (i.e. #line parsing consuming the leading 0
from a hex literal 0x1234, leaving x1234 to be parsed as cell data,
which is a syntax error), or invalid compilation results (i.e. simply
consuming literal 1234 as part of the #line processing, thus removing it
from the cell data).
Fix this by replacing {WS} with [ \t] so that it can't match line-breaks.
Convert all instances of {WS}, even though the other instances should be
irrelevant for any well-formed #line directive. This is done for
consistency and ultimate safety.
[Cherry picked from DTC commit a1ee6f068e1c8dbc62873645037a353d7852d5cc]
Reported-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
This patch merely updates the generated dtc parser and lexer files to
the output generated by Bison 2.5. The previous versions were generated
from version 2.4.1. The only reason for this commit is to minimize the
diff on the next commit which fixes a bug in the DTC #line directive
parsing. Otherwise the Bison changes would be intermingled with the
functional changes.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
The OF code uses irqsafe locks everywhere except in a handful of functions
for no obvious reasons. Since the conversion from the old rwlocks, this
now triggers lockdep warnings when used at interrupt time. At least one
driver (ibmvscsi) seems to be doing that from softirq context.
This converts the few non-irqsafe locks into irqsafe ones, making them
consistent with the rest of the code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
|
|
Various temporary files used when building DTB files were not suffixed with
.tmp and therefore were not cleaned up by "make clean".
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
|