Age | Commit message (Collapse) | Author |
|
reserved space leak
[BUG]
The following simple workload from fsstress can lead to qgroup reserved
data space leak:
0/0: creat f0 x:0 0 0
0/0: creat add id=0,parent=-1
0/1: write f0[259 1 0 0 0 0] [600030,27288] 0
0/4: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 64 627318] return 25, fallback to stat()
0/4: dwrite f0[259 1 0 0 64 627318] [610304,106496] 0
This would cause btrfs qgroup to leak 20480 bytes for data reserved
space. If btrfs qgroup limit is enabled, such leak can lead to
unexpected early EDQUOT and unusable space.
[CAUSE]
When doing direct IO, kernel will try to writeback existing buffered
page cache, then invalidate them:
generic_file_direct_write()
|- filemap_write_and_wait_range();
|- invalidate_inode_pages2_range();
However for btrfs, the bi_end_io hook doesn't finish all its heavy work
right after bio ends. In fact, it delays its work further:
submit_extent_page(end_io_func=end_bio_extent_writepage);
end_bio_extent_writepage()
|- btrfs_writepage_endio_finish_ordered()
|- btrfs_init_work(finish_ordered_fn);
<<< Work queue execution >>>
finish_ordered_fn()
|- btrfs_finish_ordered_io();
|- Clear qgroup bits
This means, when filemap_write_and_wait_range() returns,
btrfs_finish_ordered_io() is not guaranteed to be executed, thus the
qgroup bits for related range are not cleared.
Now into how the leak happens, this will only focus on the overlapping
part of buffered and direct IO part.
1. After buffered write
The inode had the following range with QGROUP_RESERVED bit:
596 616K
|///////////////|
Qgroup reserved data space: 20K
2. Writeback part for range [596K, 616K)
Write back finished, but btrfs_finish_ordered_io() not get called
yet.
So we still have:
596K 616K
|///////////////|
Qgroup reserved data space: 20K
3. Pages for range [596K, 616K) get released
This will clear all qgroup bits, but don't update the reserved data
space.
So we have:
596K 616K
| |
Qgroup reserved data space: 20K
That number doesn't match the qgroup bit range anymore.
4. Dio prepare space for range [596K, 700K)
Qgroup reserved data space for that range, we got:
596K 616K 700K
|///////////////|///////////////////////|
Qgroup reserved data space: 20K + 104K = 124K
5. btrfs_finish_ordered_range() gets executed for range [596K, 616K)
Qgroup free reserved space for that range, we got:
596K 616K 700K
| |///////////////////////|
We need to free that range of reserved space.
Qgroup reserved data space: 124K - 20K = 104K
6. btrfs_finish_ordered_range() gets executed for range [596K, 700K)
However qgroup bit for range [596K, 616K) is already cleared in
previous step, so we only free 84K for qgroup reserved space.
596K 616K 700K
| | |
We need to free that range of reserved space.
Qgroup reserved data space: 104K - 84K = 20K
Now there is no way to release that 20K unless disabling qgroup or
unmounting the fs.
[FIX]
This patch will change the timing of btrfs_qgroup_release/free_data()
call. Here it uses buffered COW write as an example.
The new timing | The old timing
----------------------------------------+---------------------------------------
btrfs_buffered_write() | btrfs_buffered_write()
|- btrfs_qgroup_reserve_data() | |- btrfs_qgroup_reserve_data()
|
btrfs_run_delalloc_range() | btrfs_run_delalloc_range()
|- btrfs_add_ordered_extent() |
|- btrfs_qgroup_release_data() |
The reserved is passed into |
btrfs_ordered_extent structure |
|
btrfs_finish_ordered_io() | btrfs_finish_ordered_io()
|- The reserved space is passed to | |- btrfs_qgroup_release_data()
btrfs_qgroup_record | The resereved space is passed
| to btrfs_qgroup_recrod
|
btrfs_qgroup_account_extents() | btrfs_qgroup_account_extents()
|- btrfs_qgroup_free_refroot() | |- btrfs_qgroup_free_refroot()
The point of such change is to ensure, when ordered extents are
submitted, the qgroup reserved space is already released, to keep the
timing aligned with file_write_and_wait_range().
So that qgroup data reserved space is all bound to btrfs_ordered_extent
and solve the timing mismatch.
Fixes: f695fdcef83a ("btrfs: qgroup: Introduce functions to release/free qgroup reserve data space")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The incoming qgroup reserved space timing will move the data reservation
to ordered extent completely.
However in btrfs_punch_hole_lock_range() will call
btrfs_invalidate_page(), which will clear QGROUP_RESERVED bit for the
range.
In current stage it's OK, but if we're making ordered extents handle the
reserved space, then btrfs_punch_hole_lock_range() can clear the
QGROUP_RESERVED bit before we submit ordered extent, leading to qgroup
reserved space leakage.
So here change the timing to make reserve data space after
btrfs_punch_hole_lock_range().
The new timing is fine for either current code or the new code.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
insert_reserved_file_extent()
This is to prepare for the incoming timing change of qgroup reserved
data space and ordered extent.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Function insert_reserved_file_extent() takes a long list of parameters,
which are all for btrfs_file_extent_item, even including two reserved
members, encryption and other_encoding.
This makes the parameter list unnecessary long for a function which only
gets called twice.
This patch will refactor the parameter list, by using
btrfs_file_extent_item as parameter directly to hugely reduce the number
of parameters.
Also, since there are only two callers, one in btrfs_finish_ordered_io()
which inserts file extent for ordered extent, and one
__btrfs_prealloc_file_range().
These two call sites have completely different context, where ordered
extent can be compressed, but will always be regular extent, while the
preallocated one is never going to be compressed and always has PREALLOC
type.
So use two small wrapper for these two different call sites to improve
readability.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Add proper variable for the scrub page and use it instead of repeatedly
dereferencing the other structures.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use a simpler iteration over tree block pages, same what csum_tree_block
does: first page always exists, loop over the rest.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Add proper variable for the scrub page and use it instead of repeatedly
dereferencing the other structures.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We have sectorsize same as PAGE_SIZE, the checksum can be calculated in
one go.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Add proper variable for the scrub page and use it instead of repeatedly
dereferencing the other structures.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The page contents with the checksum is available during the entire
function so we don't need to make a copy.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
BTRFS_SUPER_INFO_SIZE is 4096, and fits to a page on all supported
architectures, so we can calculate the checksum in one go.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
As the page mapping has been removed, rename the variables to 'kaddr'
that we use everywhere else. The type is changed to 'char *' so pointer
arithmetic works without casts.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
All pages that scrub uses in the scrub_block::pagev array are allocated
with GFP_KERNEL and never part of any mapping, so kmap is not necessary,
we only need to know the page address.
In scrub_write_page_to_dev_replace we don't even need to call
flush_dcache_page because of the same reason as above.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This patch introduces a new "rescue=" mount option group for all mount
options for data recovery.
Different rescue sub options are seperated by ':'. E.g
"ro,rescue=nologreplay:usebackuproot".
The original plan was to use ';', but ';' needs to be escaped/quoted,
or it will be interpreted by bash, similar to '|'.
And obviously, user can specify rescue options one by one like:
"ro,rescue=nologreplay,rescue=usebackuproot".
The following mount options are converted to "rescue=", old mount
options are deprecated but still available for compatibility purpose:
- usebackuproot
Now it's "rescue=usebackuproot"
- nologreplay
Now it's "rescue=nologreplay"
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
relocation
We currently use btrfs_check_data_free_space() when allocating space for
relocating data extents, but that is not necessary because that function
combines btrfs_alloc_data_chunk_ondemand(), which does the actual space
reservation, and btrfs_qgroup_reserve_data().
We can use btrfs_alloc_data_chunk_ondemand() directly because we know we
do not need to reserve qgroup space since we are dealing with a relocation
tree, which can never have qgroups (btrfs_qgroup_reserve_data() does
nothing as is_fstree() returns false for a relocation tree).
Conversely we can use btrfs_free_reserved_data_space_noquota() directly
instead of btrfs_free_reserved_data_space(), since we had no qgroup
reservation when allocating space.
This change is preparatory work for another patch in this series that
makes relocation reserve the exact amount of space it needs to relocate
a data block group. The function btrfs_check_data_free_space() has
the incovenient of requiring a start offset argument and we will want to
be able to allocate space for multiple ranges, which are not consecutive,
at once.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The start argument for btrfs_free_reserved_data_space_noquota() is only
used to make sure the amount of bytes we decrement from the bytes_may_use
counter of the data space_info object is aligned to the filesystem's
sector size. It serves no other purpose.
All its current callers always pass a length argument that is already
aligned to the sector size, so we can make the start argument go away.
In fact its presence makes it impossible to use it in a context where we
just want to free a number of bytes for a range for which either we do
not know its start offset or for freeing multiple ranges at once (which
are not contiguous).
This change is preparatory work for a patch (third patch in this series)
that makes relocation of data block groups that are not full reserve less
data space.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
allocation
As there is a dump_stack() done on memory allocation failures, these
messages might as well be deleted instead.
Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor tweaks ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the helper function where it is open coded to increment the
block_group reference count As btrfs_get_block_group() is a one-liner we
could have open-coded it, but its partner function
btrfs_put_block_group() isn't one-liner which does the free part in it.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
__btrfs_return_cluster_to_free_space() returns only 0. And all its
parent functions don't need the return value either so make this a void
function.
Further, as none of the callers of btrfs_return_cluster_to_free_space()
is actually using the return from this function, make this function also
return void.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Initially when the 'removed' flag was added to a block group to avoid
races between block group removal and fitrim, by commit 04216820fe83d5
("Btrfs: fix race between fs trimming and block group remove/allocation"),
we had to lock the chunks mutex because we could be moving the block
group from its current list, the pending chunks list, into the pinned
chunks list, or we could just be adding it to the pinned chunks if it was
not in the pending chunks list. Both lists were protected by the chunk
mutex.
However we no longer have those lists since commit 1c11b63eff2a67
("btrfs: replace pending/pinned chunks lists with io tree"), and locking
the chunk mutex is no longer necessary because of that. The same happens
at btrfs_unfreeze_block_group(), we lock the chunk mutex because the block
group's extent map could be part of the pinned chunks list and the call
to remove_extent_mapping() could be deleting it from that list, which
used to be protected by that mutex.
So just remove those lock and unlock calls as they are not needed anymore.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When find_first_block_group() finds a block group item in the extent-tree,
it does a lookup of the object in the extent mapping tree and does further
checks on the item.
Factor out this step from find_first_block_group() so we can further
simplify the code.
While we're at it, we can also just return early in
find_first_block_group(), if the tree slot isn't found.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We already have an fs_info in our function parameters, there's no need
to do the maths again and get fs_info from the extent_root just to get
the mapping_tree.
Instead directly grab the mapping_tree from fs_info.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Adresses held in 'logical' array are always guaranteed to fall within
the boundaries of the block group. That is, 'start' can never be
smaller than cache->start. This invariant follows from the way the
address are calculated in btrfs_rmap_block:
stripe_nr = physical - map->stripes[i].physical;
stripe_nr = div64_u64(stripe_nr, map->stripe_len);
bytenr = chunk_start + stripe_nr * io_stripe_size;
I.e it's always some IO stripe within the given chunk.
Exploit this invariant to simplify the body of the loop by removing the
unnecessary 'if' since its 'else' part is the one always executed.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
extent_map::orig_block_len contains the size of a physical stripe when
it's used to describe block groups (calculated in read_one_chunk via
calc_stripe_length or calculated in decide_stripe_size and then assigned
to extent_map::orig_block_len in create_chunk). Exploit this fact to get
the size directly rather than opencoding the calculations. No functional
changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The call to btrfs_btree_balance_dirty has been there since the early
days of BTRFS, when the btree was directly modified from the write path,
hence dirtied btree inode pages. With the implementation of b888db2bd7b6
("Btrfs: Add delayed allocation to the extent based page tree code")
13 years ago the btree is no longer modified from the write path, hence
there is no point in calling this function. Just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower utility updates for v5.9 from Shuah Khan:
"This cpupower update for Linux 5.9-rc1 consists of 2 fixes to
coccicheck warnings and one change to replacing HTTP links with
HTTPS ones."
* tag 'linux-cpupower-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower: Replace HTTP links with HTTPS ones
cpupower: Fix NULL but dereferenced coccicheck errors
cpupower: Fix comparing pointer to 0 coccicheck warns
|
|
Instead of having #ifdef/#endif blocks inside sync_core() for X86_64 and
X86_32, implement the new function iret_to_self() with two versions.
In this manner, avoid having to use even more more #ifdef/#endif blocks
when adding support for SERIALIZE in sync_core().
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200727043132.15082-4-ricardo.neri-calderon@linux.intel.com
|
|
Having sync_core() in processor.h is problematic since it is not possible
to check for hardware capabilities via the *cpu_has() family of macros.
The latter needs the definitions in processor.h.
It also looks more intuitive to relocate the function to sync_core.h.
This changeset does not make changes in functionality.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200727043132.15082-3-ricardo.neri-calderon@linux.intel.com
|
|
The Intel architecture defines a set of Serializing Instructions (a
detailed definition can be found in Vol.3 Section 8.3 of the Intel "main"
manual, SDM). However, these instructions do more than what is required,
have side effects and/or may be rather invasive. Furthermore, some of
these instructions are only available in kernel mode or may cause VMExits.
Thus, software using these instructions only to serialize execution (as
defined in the manual) must handle the undesired side effects.
As indicated in the name, SERIALIZE is a new Intel architecture
Serializing Instruction. Crucially, it does not have any of the mentioned
side effects. Also, it does not cause VMExit and can be used in user mode.
This new instruction is currently documented in the latest "extensions"
manual (ISE). It will appear in the "main" manual in the future.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20200727043132.15082-2-ricardo.neri-calderon@linux.intel.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
The function is only called from within init_64.c and can be static.
Also remove it from pgtable_64.h.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20200721095953.6218-4-joro@8bytes.org
|
|
Remove the code to sync the vmalloc and ioremap ranges for x86-64. The
page-table pages are all pre-allocated now so that synchronization is
no longer necessary.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20200721095953.6218-3-joro@8bytes.org
|
|
Pre-allocate the page-table pages for the vmalloc area at the level
which needs synchronization on x86-64, which is P4D for 5-level and
PUD for 4-level paging.
Doing this at boot makes sure no synchronization of that area is
necessary at runtime. The synchronization takes the pgd_lock and
iterates over all page-tables in the system, so it can take quite long
and is better avoided.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/20200721095953.6218-2-joro@8bytes.org
|
|
Fold acpi_os_map_cleanup_deferred() into acpi_os_map_remove() and
pass the latter to INIT_RCU_WORK() in acpi_os_drop_map_ref() to make
the code more straightforward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There is no reason (knwon to me) why any of the existing users of
acpi_os_unmap_iomem() would need to wait for the unused memory
mappings left by it to actually go away, so use the deferred
unmapping of ACPI memory introduced previously in that function.
While at it, fold __acpi_os_unmap_iomem() back into
acpi_os_unmap_iomem(), which has become a simple wrapper around it,
and make acpi_os_unmap_memory() call the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There is no reason (knwon to me) why any of the existing users of
acpi_os_unmap_generic_address() would need to wait for the unused
memory mappings left by it to actually go away, so use the deferred
unmapping of ACPI memory introduced previously in that function.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The ACPICA's strategy with respect to the handling of memory mappings
associated with memory operation regions is to avoid mapping the
entire region at once which may be problematic at least in principle
(for example, it may lead to conflicts with overlapping mappings
having different attributes created by drivers). It may also be
wasteful, because memory opregions on some systems take up vast
chunks of address space while the fields in those regions actually
accessed by AML are sparsely distributed.
For this reason, a one-page "window" is mapped for a given opregion
on the first memory access through it and if that "window" does not
cover an address range accessed through that opregion subsequently,
it is unmapped and a new "window" is mapped to replace it. Next,
if the new "window" is not sufficient to acess memory through the
opregion in question in the future, it will be replaced with yet
another "window" and so on. That may lead to a suboptimal sequence
of memory mapping and unmapping operations, for example if two fields
in one opregion separated from each other by a sufficiently wide
chunk of unused address space are accessed in an alternating pattern.
The situation may still be suboptimal if the deferred unmapping
introduced previously is supported by the OS layer. For instance,
the alternating memory access pattern mentioned above may produce
a relatively long list of mappings to release with substantial
duplication among the entries in it, which could be avoided if
acpi_ex_system_memory_space_handler() did not release the mapping
used by it previously as soon as the current access was not covered
by it.
In order to improve that, modify acpi_ex_system_memory_space_handler()
to preserve all of the memory mappings created by it until the memory
regions associated with them go away.
Accordingly, update acpi_ev_system_memory_region_setup() to unmap all
memory associated with memory opregions that go away.
Reported-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Xiang Li <xiang.z.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The ACPI OS layer in Linux uses RCU to protect the walkers of the
list of ACPI memory mappings from seeing an inconsistent state
while it is being updated. Among other situations, that list can
be walked in (NMI and non-NMI) interrupt context, so using a
sleeping lock to protect it is not an option.
However, performance issues related to the RCU usage in there
appear, as described by Dan Williams:
"Recently a performance problem was reported for a process invoking
a non-trival ASL program. The method call in this case ends up
repetitively triggering a call path like:
acpi_ex_store
acpi_ex_store_object_to_node
acpi_ex_write_data_to_field
acpi_ex_insert_into_field
acpi_ex_write_with_update_rule
acpi_ex_field_datum_io
acpi_ex_access_region
acpi_ev_address_space_dispatch
acpi_ex_system_memory_space_handler
acpi_os_map_cleanup.part.14
_synchronize_rcu_expedited.constprop.89
schedule
The end result of frequent synchronize_rcu_expedited() invocation is
tiny sub-millisecond spurts of execution where the scheduler freely
migrates this apparently sleepy task. The overhead of frequent
scheduler invocation multiplies the execution time by a factor
of 2-3X."
The source of this is that acpi_ex_system_memory_space_handler()
unmaps the memory mapping currently cached by it at the access time
if that mapping doesn't cover the memory area being accessed.
Consequently, if there is a memory opregion with two fields
separated from each other by an unused chunk of address space that
is large enough for not being covered by a single mapping, and they
happen to be used in an alternating pattern, the unmapping will
occur on every acpi_ex_system_memory_space_handler() invocation for
that memory opregion and that will lead to significant overhead.
Moreover, acpi_ex_system_memory_space_handler() carries out the
memory unmapping with the namespace and interpreter mutexes held
which may lead to additional latency, because all of the tasks
wanting to acquire on of these mutexes need to wait for the
memory unmapping operation to complete.
To address that, rework acpi_os_unmap_memory() so that it does not
release the memory mapping covering the given address range right
away and instead make it queue up the mapping at hand for removal
via queue_rcu_work().
Reported-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Xiang Li <xiang.z.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add dedicated Krzysztof Kozlowski's Git repository on @kernel.org for
memory controller drivers.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Fix firwmare -> firmware.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
|
|
Remove unneeded blank line and align indentation with open parenthesis.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Align indentation with open parenthesis (or fix existing alignment).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Macros arguments should be enclosed by parenthesis for safety.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Use proper kerneldoc to fix GCC warnings like:
drivers/memory/of_memory.c:30: warning: Function parameter or member 'dev' not described in 'of_get_min_tck'
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Fix arbitary -> arbitrary.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Remove some unneeded blank lines, align indentation with open
parenthesis (or fix existing alignment).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Driver uses 'unsigned int' in other places instead of 'unsigned'.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Macros arguments should be enclosed by parenthesis for safety.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Use proper kerneldoc to fix GCC warnings like:
drivers/memory/omap-gpmc.c:299: warning: Function parameter or member 'cs' not described in 'gpmc_get_clk_period'
drivers/memory/omap-gpmc.c:432: warning: Excess function parameter 'ma' description in 'get_gpmc_timing_reg'
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|