Age | Commit message (Collapse) | Author |
|
And check for valid nsec value before passing into timespec64_to_jiffies().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Store in memory pointed to by ->d_fsdata. Use ->d_init() to allocate the
storage. Need to use RCU freeing because the data is used in RCU lookup
mode.
We could cast ->d_fsdata directly on 64bit archs, but I don't think this is
worth the extra complexity.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with
userspace. When it is set in the INIT response, ACL support will be
enabled. ACL support also implies "default_permissions".
When ACL support is enabled, the kernel will cache and have responsibility
for enforcing ACLs. ACL xattrs will be passed to userspace, which is
responsible for updating the ACLs in the filesystem, keeping the file mode
in sync, and inheritance of default ACLs when new filesystem nodes are
created.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Only userspace filesystem can do the killing of suid/sgid without races.
So introduce an INIT flag and negotiate support for this.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on
chown and truncate, and (since writeback_cache) write. The problem with
this is that it'll potentially restore a stale mode.
The poper fix would be to let the filesystems do the suid/sgid clearing on
the relevant operations. Possibly some are already doing it but there's no
way we can detect this.
So fix this by refreshing and recalculating the mode. Do this only if
ATTR_KILL_S[UG]ID is set to not destroy performance for writes. This is
still racy but the size of the window is reduced.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
|
|
Without "default_permissions" the userspace filesystem's lookup operation
needs to perform the check for search permission on the directory.
If directory does not allow search for everyone (this is quite rare) then
userspace filesystem has to set entry timeout to zero to make sure
permissions are always performed.
Changing the mode bits of the directory should also invalidate the
(previously cached) dentry to make sure the next lookup will have a chance
of updating the timeout, if needed.
Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
"One small change to make joydev (which is used by older games) to bind
to devices that export Z axis but not X or Y (such as TRC rudder)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: joydev - recognize devices with Z axis as joysticks
|
|
In preparation for enabling multiple namespaces per pmem region, convert
the label tracking to use a linked list. In particular this will allow
select_pmem_id() to move labels from the unvalidated state to the
validated state. Currently we only track one validated set per-region.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Before we add more libnvdimm-private fields to nd_mapping make it clear
which parameters are input vs libnvdimm internals. Use struct
nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make
struct nd_mapping private to libnvdimm.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Existing implemenetation writes to all the flush hint addresses for a
given ND region. This is not necessary as the flushes are per imc and
not per DIMM. Search the mappings and clear out the duplicates at init
to avoid multiple flush to the same imc.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
This patch add update_ckpt_flags() to clean up the flow.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
While we call ->writepages, there are two cases:
a. we didn't writeout any dirty pages, since they are writebacked by other
thread concurrently.
b. we writeout dirty pages, and have already submitted bio to block layer.
In these cases, we don't need to do additional bio flushing unnecessarily,
it may split bio in cache into smaller one.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In sync_node_pages, we won't check and commit last merged pages in private
bio cache of f2fs, as these pages were taged as writeback, someone who is
waiting for writebacking of the page will be blocked until the cache was
committed by someone else.
We need to commit node type bio cache to avoid potential deadlock or long
delay of waiting writeback.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There exists almost same codes when get the value of pre_version
and cur_version in function validate_checkpoint, this patch adds
get_checkpoint_version to clean up redundant codes.
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds to support checkpoint error injection in f2fs for testing
fatal error tolerance, it will be useful that it can simulate abnormal
power off by f2fs itself instead of calling godown ioctl by running apps.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In ->remount_fs, we didn't recover original fault injection config if
we encounter error, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Do fault injection initialization in default_options to keep consistent
with other default option configurating.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch remove redundant value definition in build_sit_entries
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Previously, we only support global fault injection configuration, so that
when we configure type/rate of fault injection through sysfs, mount
option, it will influence all f2fs partition which is being used.
It is not make sence, since it will be not convenient if developer want
to test separated partitions with different fault injection rate/type
simultaneously, also it's not possible to enable fault injection in one
partition and disable fault injection in other one.
>From now on, we move global configuration of fault injection in module
into per-superblock, hence injection testing can be more flexible.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Just adjust segment bit info printed in procfs.
Before:
1008 5|0 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1009 3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13
1010 3|1 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
After:
1008 5|0 | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1009 4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef
1010 4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When getting EIO while handling orphan inodes, we can get some dirty node
pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
to flush node pages. But in this case, we should prevent to do that, since
we will try again from the start.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Null-terminating the fscrypt_symlink_data on read is unnecessary because
it is not string data --- it contains binary ciphertext.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch fixes to handle EIO during recover_orphan_inode() given the below
panic.
F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs]
------------[ cut here ]------------
RIP: 0010:[<ffffffffc0b244e3>] [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs]
RSP: 0018:ffff92f8b7fb7c30 EFLAGS: 00010246
RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c
RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8
RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58
R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb
R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620
FS: 00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0
Stack:
ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28
ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0
ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d
Call Trace:
[<ffffffffbc27ff55>] evict+0xc5/0x1a0
[<ffffffffbc28090d>] iput+0x1ad/0x2c0
[<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs]
[<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs]
[<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0
[<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffffbc264e49>] mount_fs+0x39/0x170
[<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160
[<ffffffffbc2881df>] do_mount+0x1cf/0xd00
[<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170
[<ffffffffbc289003>] SyS_mount+0x83/0xd0
[<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Otherwise, we can hit
f2fs_bug_on(sbi, !PageUptodate(sum_page));
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We should call put_page for preloaded summary pages in do_garbage_collect.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds a return value of write_checkpoint for f2fs_gc.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch improves the migration of dirty pages and allows migrating atomic
written pages that F2FS uses in Page Cache. Instead of the fallback releasing
page path, it provides better performance for memory compaction, CMA and other
users of memory page migrating. For dirty pages, there is no need to write back
first when migrating. For an atomic written page before committing, we can
migrate the page and update the related 'inmem_pages' list at the same time.
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix some coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch introduces spinlock to protect updating process of ckpt_flags
field in struct f2fs_checkpoint, it avoids incorrectly updating in race
condition.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
nvdimm_clear_poison cleared the user-visible badblocks, and sent
commands to the NVDIMM to clear the areas marked as 'poison', but it
neglected to clear the same areas from the internal poison_list which is
used to marshal ARS results before sorting them by namespace. As a
result, once on-demand ARS functionality was added:
37b137f nfit, libnvdimm: allow an ARS scrub to be triggered on demand
A scrub triggered from either sysfs or an MCE was found to be adding
stale entries that had been cleared from gendisk->badblocks, but were
still present in nvdimm_bus->poison_list. Additionally, the stale entries
could be triggered into producing stale disk->badblocks by simply disabling
and re-enabling the namespace or region.
This adds the missing step of clearing poison_list entries when clearing
poison, so that it is always in sync with badblocks.
Fixes: 37b137f ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
pmem_do_bvec used to kmap_atomic at the begin, and only unmap at the
end. Things like nvdimm_clear_poison may want to do nvdimm subsystem
bookkeeping operations that may involve taking locks or doing memory
allocations, and we can't do that from the atomic context. Reduce the
atomic context to just what needs it - the memcpy to/from pmem.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Starting a full Address Range Scrub (ARS) on hitting a memory error
machine check exception may not always be desirable. Provide a way
through sysfs to toggle the behavior between just adding the address
(cache line) where the MCE happened to the poison list and doing a full
scrub. The former (selective insertion of the address) is done
unconditionally.
Cc: linux-acpi@vger.kernel.org
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Merge more fixes from Andrew Morton:
"Three fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/property.h: fix typo/compile error
ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()
mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
|
|
This fixes commit d76eebfa175e ("include/linux/property.h: fix build
issues with gcc-4.4.4").
With that commit we get the following compile error when using the
PROPERTY_ENTRY_INTEGER_ARRAY macro.
include/linux/property.h:201:39: error: `u32_data' undeclared (first
use in this function)
PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u32, _val_)
^
include/linux/property.h:193:17: note: in definition of macro
`PROPERTY_ENTRY_INTEGER_ARRAY'
{ .pointer = { _type_##_data = _val_ } }, \
^
This needs a '.' to reference the union member. It seems this was just
overlooked here since it is done correctly in similar constructs in
other parts of the original commit.
This fix is in preparation of upcoming commits that will use this macro.
Fixes: commit d76eebfa175e ("include/linux/property.h: fix build issues with gcc-4.4.4")
Link: http://lkml.kernel.org/r/2de3b929290d88a723ed829a3e3cbd02044714df.1475114627.git.johnyoun@synopsys.com
Signed-off-by: John Youn <johnyoun@synopsys.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.
In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it;
there are 2 process repeatedly performing the following operations
respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then
ftruncate(fd, CLUSTER_SIZE) again and again.
This is the backtrace when the deadlock happens:
__wait_on_bit_lock+0x50/0xa0
__lock_page+0xb7/0xc0
ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
do_page_mkwrite+0x66/0xc0
handle_mm_fault+0x685/0x1350
__do_page_fault+0x1d8/0x4d0
trace_do_page_fault+0x37/0xf0
do_async_page_fault+0x19/0x70
async_page_fault+0x28/0x30
In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
disk space for this write; ocfs2_try_to_free_truncate_log() will be
called if -ENOSPC is returned; if we're lucky to get enough clusters,
which is usually the case, we start over again.
But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
will deadlock when trying to grab the target page again.
Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
Another deadlock will happen in __do_page_mkwrite() if
ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a
locked target page.
These two errors fail on the same path, so fix them by unlocking the
target page manually before ocfs2_free_write_ctxt().
Jan Kara helps me clear out the JBD2 part, and suggest the hint for root
cause.
Changes since v1:
1. Also put ENOMEM error case into consideration.
Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: He Gang <ghe@suse.com>
Acked-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
replace_page_cache_page()
Antonio reports the following crash when using fuse under memory pressure:
kernel BUG at /build/linux-a2WvEb/linux-4.4.0/mm/workingset.c:346!
invalid opcode: 0000 [#1] SMP
Modules linked in: all of them
CPU: 2 PID: 63 Comm: kswapd0 Not tainted 4.4.0-36-generic #55-Ubuntu
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
task: ffff88040cae6040 ti: ffff880407488000 task.ti: ffff880407488000
RIP: shadow_lru_isolate+0x181/0x190
Call Trace:
__list_lru_walk_one.isra.3+0x8f/0x130
list_lru_walk_one+0x23/0x30
scan_shadow_nodes+0x34/0x50
shrink_slab.part.40+0x1ed/0x3d0
shrink_zone+0x2ca/0x2e0
kswapd+0x51e/0x990
kthread+0xd8/0xf0
ret_from_fork+0x3f/0x70
which corresponds to the following sanity check in the shadow node
tracking:
BUG_ON(node->count & RADIX_TREE_COUNT_MASK);
The workingset code tracks radix tree nodes that exclusively contain
shadow entries of evicted pages in them, and this (somewhat obscure)
line checks whether there are real pages left that would interfere with
reclaim of the radix tree node under memory pressure.
While discussing ways how fuse might sneak pages into the radix tree
past the workingset code, Miklos pointed to replace_page_cache_page(),
and indeed there is a problem there: it properly accounts for the old
page being removed - __delete_from_page_cache() does that - but then
does a raw raw radix_tree_insert(), not accounting for the replacement
page. Eventually the page count bits in node->count underflow while
leaving the node incorrectly linked to the shadow node LRU.
To address this, make sure replace_page_cache_page() uses the tracked
page insertion code, page_cache_tree_insert(). This fixes the page
accounting and makes sure page-containing nodes are properly unlinked
from the shadow node LRU again.
Also, make the sanity checks a bit less obscure by using the helpers for
checking the number of pages and shadows in a radix tree node.
Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Link: http://lkml.kernel.org/r/20160919155822.29498-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org> [3.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
Commit d9676fa152c83b ("ARCv2: Enable LOCKDEP"), changed
local_save_flags() to not return raw STATUS32 but encoded in the form
such that it could be fed directly to CLRI/SETI instructions.
However the STATUS32.E[] was not captured correctly as it corresponds to
bits [4:1] in the register and not [3:0]
Fixes: d9676fa152c83b ("ARCv2: Enable LOCKDEP")
Cc: Evgeny Voevodin <evgeny.voevodin@intel.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
Seem like values assigned as absolute number and not and
shift value, i.e. should be 0 for one node (2^0) and 1 for
couple of nodes (2^1)
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
In the end of "arc_init_IRQ" STATUS32.IE flag is going to be affected by
"flag" instruction but "flag" never touches IE flag on ARCv2. So "kflag"
instruction must be used instead of "flag".
Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Cc: stable@vger.kernel.org #4.2+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
We used to keep the .exit.* sections as linker would fail in final link
due to references from .debug_frame which itself could not be discardrd
due to the forced "write,alloc" attributes for it.
| LD init/built-in.o
| `.exit.text' referenced in section `.debug_frame' of arch/arc/built-in.o: defined in discarded section `.exit.text' of arch/arc/built-in.o
| Makefile:949: recipe for target 'vmlinux' failed
With .debug_frame now retired, this hack is no longer needed.
kernel binary is now a little bit smaller as well.
closes STAR 9000549913
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
This uses a new set of annoations viz. ENTRY_CFI/END_CFI to enabel cfi
ops generation.
Note that we didn't change the normal ENTRY/EXIT as we don't actually
want unwind info in the trap/exception/interrutp handlers which use
these, as unwinder then gets confused (it keeps recursing vs. stopping).
Semantically these are leaf routines and unwinding should stop when it
hits those routines.
Before
------
28.52% 1.19% 9929 hackbench libuClibc-1.0.17.so [.] __write_nocancel
|
---__write_nocancel
|--8.95%--EV_Trap
| --8.25%--sys_write
| |--3.93%--sock_write_iter
...
|--2.62%--memset <==== [LEAF entry as no unwind info]
^^^^^^
After
-----
29.46% 1.24% 13622 hackbench libuClibc-1.0.17.so [.] __write_nocancel
|
---__write_nocancel
|--9.31%--EV_Trap
| --8.62%--sys_write
| |--4.17%--sock_write_iter
...
|--6.19%--sys_write
| --6.19%--sock_write_iter
| unix_stream_sendmsg
| |--1.62%--sock_alloc_send_pskb
| |--0.89%--sock_def_readable
| |--0.88%--_raw_spin_unlock_irqrestore
| |--0.69%--memset
| | ^^^^^^ <==== [now in proper callframe]
| |
| --0.52%--skb_copy_datagram_from_iter
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
1. detect whether binutils supports the cfi pseudo ops
2. define conditional macros to generate the ops
3. define new ENTRY_CFI/END_CFI to annotate hand asm code.
- Needed because we don't want to emit dwarf info in general ENTRY/END
used by lowest level trap/exception/interrutp handlers as unwinder
gets confused trying to unwind out of them. We want unwinder to
instead stop when it hits onfo those routines
- These provide minimal start/end cfi ops assuming routine doesn't
touch stack memory/regs
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
This essentially removes ENTRY() assembler annotation for this symbol
since it didn't have a pairing END()
This in ahead of introducing cfi pseudo ops in ENTRY/END which expects
paired cfi_startproc/cfi_endproc
| ../arch/arc/kernel/entry.S: Assembler messages:
| ../arch/arc/kernel/entry.S:270: Error: previous CFI entry not closed (missing .cfi_endproc)
| ../scripts/Makefile.build:326: recipe for target 'arch/arc/kernel/entry-arcv2.o' failed
| make[4]: *** [arch/arc/kernel/entry-arcv2.o] Error 1
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
In .debug_frame based unwinding regime, we used to force -gdwarf-2 since
kernel unwinder only claimed to handle dwarf 2. This changed since commit
6d0d506012c93d ("ARC: dw2 unwind: Don't bail for CIE.version != 1")
which added some support beyond dwarf 2, atleast to handle CIE != 1
The ill-effect of -gdwarf-2 is that it forces generation of .debug_*
sections, which bloats loadable modules .ko files. For the curious, this
doesn't affect vmlinx binary since linker script discards .debug_* but
same discard is not yet implemented for modules.
So it seems we can drop the -gdwarf-2 toggle, which should not be needed
anyways given that we now use .eh_frame based unwinding.
I've verified using GNU 2016.09-engo10 that the actual unwind info is
not different with or w/o this toggle - but the debug_* sections are
gone for good.
before
-----
arc-linux-readelf -S q_proc.ko-unwinding-1-eh_frame-switch | grep debug
[15] .debug_info PROGBITS 00000000 000300 00d08d 00 0 0 1
[16] .rela.debug_info RELA 00000000 0162a0 008844 0c I 29 15 4
[17] .debug_abbrev PROGBITS 00000000 00d38d 0005f8 00 0 0 1
[18] .debug_loc PROGBITS 00000000 00d985 000070 00 0 0 1
[19] .rela.debug_loc RELA 00000000 01eae4 0000c0 0c I 29 18 4
[20] .debug_aranges PROGBITS 00000000 00d9f5 000040 00 0 0 1
[21] .rela.debug_arang RELA 00000000 01eba4 000030 0c I 29 20 4
[22] .debug_ranges PROGBITS 00000000 00da35 000018 00 0 0 1
[23] .rela.debug_range RELA 00000000 01ebd4 000030 0c I 29 22 4
[24] .debug_line PROGBITS 00000000 00da4d 000b5b 00 0 0 1
[25] .rela.debug_line RELA 00000000 01ec04 0000cc 0c I 29 24 4
[26] .debug_str PROGBITS 00000000 00e5a8 007831 01 MS 0 0 1
after
----
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
So finally after almost 8 years of dealing with .debug_frame, we are
finally switching to .eh_frame. The reason being stripped kernel
binaries had non-functional unwinder as .debug_frame was gone.
Also, in general .eh_frame seems more common way of doing unwinding.
This also folds a revert of f52e126cc747 ("ARC: unwind: ensure that
.debug_frame is generated (vs. .eh_frame)") to ensure that we start
getting .eh_frame
Reported-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
This paves way for switching to .eh_frame based unwindiing
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
We used to live with PERF_COUNT_HW_CACHE_REFERENCES and
PERF_COUNT_HW_CACHE_REFERENCES not specified on ARC.
Those events are actually aliases to 2 cache events that we do support
and so this change sets "cache-reference" and "cache-misses" events
in the same way as "L1-dcache-loads" and L1-dcache-load-misses.
And while at it adding debug info for cache events as well as doing a
subtle fix in HW events debug info - config value is much better
represented by hex so we may see not only event index but as well other
control bits set (if they exist).
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|