Age | Commit message (Collapse) | Author |
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit 3034a14 "ima: pass 'opened' flag to identify newly created files"
stopped identifying empty files as new files. However new empty files
can be created using the mknodat syscall. On systems with IMA-appraisal
enabled, these empty files are not labeled with security.ima extended
attributes properly, preventing them from subsequently being opened in
order to write the file data contents. This patch defines a new hook
named ima_post_path_mknod() to mark these empty files, created using
mknodat, as new in order to allow the file data contents to be written.
In addition, files with security.ima xattrs containing a file signature
are considered "immutable" and can not be modified. The file contents
need to be written, before signing the file. This patch relaxes this
requirement for new files, allowing the file signature to be written
before the file contents.
Changelog:
- defer identifying files with signatures stored as security.ima
(based on Dmitry Rozhkov's comments)
- removing tests (eg. dentry, dentry->d_inode, inode->i_size == 0)
(based on Al's review)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Al Viro <<viro@zeniv.linux.org.uk>
Tested-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
|
|
This patch is based on top of the "vfs: support for a common kernel file
loader" patch set. In general when the kernel is reading a file into
memory it does not want anything else writing to it.
The kernel currently only forbids write access to a file being executed.
This patch extends this locking to files being read by the kernel.
Changelog:
- moved function to kernel_read_file() - Mimi
- updated patch description - Mimi
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@huawei.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
|
|
* if we have a hashed negative dentry and either CREAT|EXCL on
r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
with failing may_o_create(), we should fail with EROFS or the
error may_o_create() has returned, but not ENOENT. Which is what
the current code ends up returning.
* if we have CREAT|TRUNC hitting a regular file on a read-only
filesystem, we can't fail with EROFS here. At the very least,
not until we'd done follow_managed() - we might have a writable
file (or a device, for that matter) bound on top of that one.
Moreover, the code downstream will see that O_TRUNC and attempt
to grab the write access (*after* following possible mount), so
if we really should fail with EROFS, it will happen. No need
to do that inside atomic_open().
The real logics is much simpler than what the current code is
trying to do - if we decided to go for simple lookup, ended
up with a negative dentry *and* had create_error set, fail with
create_error. No matter whether we'd got that negative dentry
from lookup_real() or had found it in dcache.
Cc: stable@vger.kernel.org # v3.6+
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
A fault in a user provided buffer may lead anywhere, and lockdep warns
that we have a potential deadlock between the mm->mmap_sem and the
kernfs file mutex:
[ 82.811702] ======================================================
[ 82.811705] [ INFO: possible circular locking dependency detected ]
[ 82.811709] 4.5.0-rc4-gfxbench+ #1 Not tainted
[ 82.811711] -------------------------------------------------------
[ 82.811714] kms_setmode/5859 is trying to acquire lock:
[ 82.811717] (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8150d9c1>] drm_gem_mmap+0x1a1/0x270
[ 82.811731]
but task is already holding lock:
[ 82.811734] (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
[ 82.811745]
which lock already depends on the new lock.
[ 82.811749]
the existing dependency chain (in reverse order) is:
[ 82.811752]
-> #3 (&mm->mmap_sem){++++++}:
[ 82.811761] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.811766] [<ffffffff8118bc65>] __might_fault+0x75/0xa0
[ 82.811771] [<ffffffff8124da4a>] kernfs_fop_write+0x8a/0x180
[ 82.811787] [<ffffffff811d1023>] __vfs_write+0x23/0xe0
[ 82.811792] [<ffffffff811d1d74>] vfs_write+0xa4/0x190
[ 82.811797] [<ffffffff811d2c14>] SyS_write+0x44/0xb0
[ 82.811801] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.811807]
-> #2 (s_active#6){++++.+}:
[ 82.811814] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.811819] [<ffffffff8124c070>] __kernfs_remove+0x210/0x2f0
[ 82.811823] [<ffffffff8124d040>] kernfs_remove_by_name_ns+0x40/0xa0
[ 82.811828] [<ffffffff8124e9e0>] sysfs_remove_file_ns+0x10/0x20
[ 82.811832] [<ffffffff815318d4>] device_del+0x124/0x250
[ 82.811837] [<ffffffff81531a19>] device_unregister+0x19/0x60
[ 82.811841] [<ffffffff8153c051>] cpu_cache_sysfs_exit+0x51/0xb0
[ 82.811846] [<ffffffff8153c628>] cacheinfo_cpu_callback+0x38/0x70
[ 82.811851] [<ffffffff8109ae89>] notifier_call_chain+0x39/0xa0
[ 82.811856] [<ffffffff8109aef9>] __raw_notifier_call_chain+0x9/0x10
[ 82.811860] [<ffffffff810786de>] cpu_notify+0x1e/0x40
[ 82.811865] [<ffffffff81078779>] cpu_notify_nofail+0x9/0x20
[ 82.811869] [<ffffffff81078ac3>] _cpu_down+0x233/0x340
[ 82.811874] [<ffffffff81079019>] disable_nonboot_cpus+0xc9/0x350
[ 82.811878] [<ffffffff810d2e11>] suspend_devices_and_enter+0x5a1/0xb50
[ 82.811883] [<ffffffff810d3903>] pm_suspend+0x543/0x8d0
[ 82.811888] [<ffffffff810d1b77>] state_store+0x77/0xe0
[ 82.811892] [<ffffffff813fa68f>] kobj_attr_store+0xf/0x20
[ 82.811897] [<ffffffff8124e740>] sysfs_kf_write+0x40/0x50
[ 82.811902] [<ffffffff8124dafc>] kernfs_fop_write+0x13c/0x180
[ 82.811906] [<ffffffff811d1023>] __vfs_write+0x23/0xe0
[ 82.811910] [<ffffffff811d1d74>] vfs_write+0xa4/0x190
[ 82.811914] [<ffffffff811d2c14>] SyS_write+0x44/0xb0
[ 82.811918] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.811923]
-> #1 (cpu_hotplug.lock){+.+.+.}:
[ 82.811929] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.811933] [<ffffffff817b6f72>] mutex_lock_nested+0x62/0x3b0
[ 82.811940] [<ffffffff810784c1>] get_online_cpus+0x61/0x80
[ 82.811944] [<ffffffff811170eb>] stop_machine+0x1b/0xe0
[ 82.811949] [<ffffffffa0178edd>] gen8_ggtt_insert_entries__BKL+0x2d/0x30 [i915]
[ 82.812009] [<ffffffffa017d3a6>] ggtt_bind_vma+0x46/0x70 [i915]
[ 82.812045] [<ffffffffa017eb70>] i915_vma_bind+0x140/0x290 [i915]
[ 82.812081] [<ffffffffa01862b9>] i915_gem_object_do_pin+0x899/0xb00 [i915]
[ 82.812117] [<ffffffffa0186555>] i915_gem_object_pin+0x35/0x40 [i915]
[ 82.812154] [<ffffffffa019a23e>] intel_init_pipe_control+0xbe/0x210 [i915]
[ 82.812192] [<ffffffffa0197312>] intel_logical_rings_init+0xe2/0xde0 [i915]
[ 82.812232] [<ffffffffa0186fe3>] i915_gem_init+0xf3/0x130 [i915]
[ 82.812278] [<ffffffffa02097ed>] i915_driver_load+0xf2d/0x1770 [i915]
[ 82.812318] [<ffffffff81512474>] drm_dev_register+0xa4/0xb0
[ 82.812323] [<ffffffff8151467e>] drm_get_pci_dev+0xce/0x1e0
[ 82.812328] [<ffffffffa01472cf>] i915_pci_probe+0x2f/0x50 [i915]
[ 82.812360] [<ffffffff8143f907>] pci_device_probe+0x87/0xf0
[ 82.812366] [<ffffffff81535f89>] driver_probe_device+0x229/0x450
[ 82.812371] [<ffffffff81536233>] __driver_attach+0x83/0x90
[ 82.812375] [<ffffffff81533c61>] bus_for_each_dev+0x61/0xa0
[ 82.812380] [<ffffffff81535879>] driver_attach+0x19/0x20
[ 82.812384] [<ffffffff8153535f>] bus_add_driver+0x1ef/0x290
[ 82.812388] [<ffffffff81536e9b>] driver_register+0x5b/0xe0
[ 82.812393] [<ffffffff8143e83b>] __pci_register_driver+0x5b/0x60
[ 82.812398] [<ffffffff81514866>] drm_pci_init+0xd6/0x100
[ 82.812402] [<ffffffffa027c094>] 0xffffffffa027c094
[ 82.812406] [<ffffffff810003de>] do_one_initcall+0xae/0x1d0
[ 82.812412] [<ffffffff811595a0>] do_init_module+0x5b/0x1cb
[ 82.812417] [<ffffffff81106160>] load_module+0x1c20/0x2480
[ 82.812422] [<ffffffff81106bae>] SyS_finit_module+0x7e/0xa0
[ 82.812428] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.812433]
-> #0 (&dev->struct_mutex){+.+.+.}:
[ 82.812439] [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
[ 82.812443] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.812456] [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
[ 82.812460] [<ffffffff81196a14>] mmap_region+0x334/0x580
[ 82.812466] [<ffffffff81196fc4>] do_mmap+0x364/0x410
[ 82.812470] [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
[ 82.812474] [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
[ 82.812479] [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
[ 82.812484] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.812489]
other info that might help us debug this:
[ 82.812493] Chain exists of:
&dev->struct_mutex --> s_active#6 --> &mm->mmap_sem
[ 82.812502] Possible unsafe locking scenario:
[ 82.812506] CPU0 CPU1
[ 82.812508] ---- ----
[ 82.812510] lock(&mm->mmap_sem);
[ 82.812514] lock(s_active#6);
[ 82.812519] lock(&mm->mmap_sem);
[ 82.812522] lock(&dev->struct_mutex);
[ 82.812526]
*** DEADLOCK ***
[ 82.812531] 1 lock held by kms_setmode/5859:
[ 82.812533] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
[ 82.812541]
stack backtrace:
[ 82.812547] CPU: 0 PID: 5859 Comm: kms_setmode Not tainted 4.5.0-rc4-gfxbench+ #1
[ 82.812550] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0040.2015.0814.1353 08/14/2015
[ 82.812553] 0000000000000000 ffff880079407bf0 ffffffff813f8505 ffffffff825fb270
[ 82.812560] ffffffff825c4190 ffff880079407c30 ffffffff810c84ac ffff880079407c90
[ 82.812566] ffff8800797ed328 ffff8800797ecb00 0000000000000001 ffff8800797ed350
[ 82.812573] Call Trace:
[ 82.812578] [<ffffffff813f8505>] dump_stack+0x67/0x92
[ 82.812582] [<ffffffff810c84ac>] print_circular_bug+0x1fc/0x310
[ 82.812586] [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
[ 82.812590] [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
[ 82.812594] [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
[ 82.812599] [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
[ 82.812603] [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
[ 82.812608] [<ffffffff81196a14>] mmap_region+0x334/0x580
[ 82.812612] [<ffffffff81196fc4>] do_mmap+0x364/0x410
[ 82.812616] [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
[ 82.812629] [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
[ 82.812633] [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
[ 82.812637] [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
Highly unlikely though this scenario is, we can avoid the issue entirely
by moving the copy operation from out under the kernfs_get_active()
tracking by assigning the preallocated buffer its own mutex. The
temporary buffer allocation doesn't require mutex locking as it is
entirely local.
The locked section was extended by the addition of the preallocated buf
to speed up md user operations in
commit 2b75869bba676c248d8d25ae6d2bd9221dfffdb6
Author: NeilBrown <neilb@suse.de>
Date: Mon Oct 13 16:41:28 2014 +1100
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: NeilBrown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of just printing warning messages, if the orphan list is
corrupted, declare the file system is corrupted. If there are any
reserved inodes in the orphaned inode list, declare the file system
corrupted and stop right away to avoid doing more potential damage to
the file system.
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.
This can be reproduced via:
mke2fs -t ext4 /tmp/foo.img 100
debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
mount -o loop /tmp/foo.img /mnt
(But don't do this if you are using an unpatched kernel if you care
about the system staying functional. :-)
This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)
[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf
Cc: stable@vger.kernel.org
Reported by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Merge fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
Documentation/sysctl/vm.txt: update numa_zonelist_order description
lib/stackdepot.c: allow the stack trace hash to be zero
rapidio: fix potential NULL pointer dereference
mm/memory-failure: fix race with compound page split/merge
ocfs2/dlm: return zero if deref_done message is successfully handled
Ananth has moved
kcov: don't profile branches in kcov
kcov: don't trace the code coverage code
mm: wake kcompactd before kswapd's short sleep
.mailmap: add Frank Rowand
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: call swap_slot_free_notify() with page lock held
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
mailmap: fix Krzysztof Kozlowski's misspelled name
thp: keep huge zero page pinned until tlb flush
mm: exclude HugeTLB pages from THP page_mapped() logic
kexec: export OFFSET(page.compound_head) to find out compound tail page
kexec: update VMCOREINFO for compound_order/dtor
|
|
Single caller passes GFP_NOFS. We can get rid of the
gfpflags_allow_blocking checks as NOFS can block but does not recurse to
filesystem through reclaim.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Similar to __clear_extent_bit, do not fail if the state preallocation
fails as we might not need it. One less BUG_ON.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Single caller passes GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Single caller passes GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Single caller passes GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Callers pass GFP_NOFS and tests pass GFP_KERNEL, but using NOFS there
does not hurt. No need to pass the flags around.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Callers pass GFP_NOFS. No need to pass the flags around.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Callers pass GFP_NOFS. No need to pass the flags around.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Callers pass GFP_NOFS and GFP_KERNEL. No need to pass the flags around.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
All callers pass GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
dlm_deref_lockres_done_handler() should return zero if the message is
successfully handled.
Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message").
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There is a lifecycle fix in the auth code, a fix for a narrow race
condition on map, and a helpful message in the log when there is a
feature mismatch (which happens frequently now that the default
server-side options have changed)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: report unsupported features to syslog
rbd: fix rbd map vs notify races
libceph: make authorizer destruction independent of ceph_auth_client
|
|
The BTRFS_IOC_SEARCH_TREE ioctl returns file system items directly
to userspace. In order to decode them, full type information is required.
Create a new header, btrfs_tree to contain these since most users won't
need them.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
struct btrfs_ioctl_defrag_range_args is used by the BTRFS_IOC_DEFRAG_RANGE
ioctl.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The BTRFS_BALANCE_* flags are used by struct btrfs_ioctl_balance_args.flags
and btrfs_ioctl_balance_args.{data,meta,sys}.flags in the BTRFS_IOC_BALANCE
ioctl.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The compat/compat_ro/incompat feature flags are used by the feature set/get
ioctls.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The BTRFS_QGROUP_LIMIT_* flags are required to tell the kernel which
fields are valid when using the BTRFS_IOC_QGROUP_LIMIT ioctl.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
BTRFS_LABEL_SIZE is required to define the BTRFS_IOC_GET_FSLABEL and
BTRFS_IOC_SET_FSLABEL ioctls.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
A refactor patch, and avoids user input verification in the
btrfs_dev_replace_start(), and so this function can be reused.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Local variable fs_info, contains root->fs_info, use it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Rename BTRFS_DEVICE_BY_ID so it's more descriptive that we specify the
device by id, it'll be part of the public API. The mask of supported
flags is also renamed, only for internal use.
The error code for unknown flags is EOPNOTSUPP, fixed.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
For clarity how we are going to find the device, let's call it a device
specifier, devspec for short. Also rename the arguments that are a
leftover from previous function purpose.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We should avoid duplicating the device constraints, let's use the
btrfs_raid_array in btrfs_check_raid_min_devices.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Before this patch, btrfs_check_raid_min_devices would do an off-by-one
check of the constraints and not the miminmum check, as its name
suggests. This is not a problem if the only caller is device remove, but
would be confusing for others.
Add an argument with the exact number and let the caller(s) decide if
this needs any adjustments, like when device replace is running.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Underscores are for special functions, use the full prefix for better
stacktrace recognition.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Optimize check for stale device to only be checked when there is device
added or changed. If there is no update to the device, there is no need
to call btrfs_free_stale_device().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This introduces new ioctl BTRFS_IOC_RM_DEV_V2, which uses enhanced struct
btrfs_ioctl_vol_args_v2 to carry devid as an user argument.
The patch won't delete the old ioctl interface and so kernel remains
backward compatible with user land progs.
Test case/script:
echo "0 $(blockdev --getsz /dev/sdf) linear /dev/sdf 0" | dmsetup create bad_disk
mkfs.btrfs -f -d raid1 -m raid1 /dev/sdd /dev/sde /dev/mapper/bad_disk
mount /dev/sdd /btrfs
dmsetup suspend bad_disk
echo "0 $(blockdev --getsz /dev/sdf) error /dev/sdf 0" | dmsetup load bad_disk
dmsetup resume bad_disk
echo "bad disk failed. now deleting/replacing"
btrfs dev del 3 /btrfs
echo $?
btrfs fi show /btrfs
umount /btrfs
btrfs-show-super /dev/sdd | egrep num_device
dmsetup remove bad_disk
wipefs -a /dev/sdf
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Martin <m_btrfs@ml1.co.uk>
[ adjust messages, s/disk/device/ ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
With the previous patches now the btrfs_scratch_superblocks() is ready to
be used in btrfs_rm_device() so use it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ use GFP_KERNEL ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The operation of device replace and device delete follows same steps upto
some depth with in btrfs kernel, however they don't share codes. This
enhancement will help replace and delete to share codes.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_rm_device() has a section of the code which can be replaced
btrfs_find_device_by_user_input()
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The patch renames btrfs_dev_replace_find_srcdev() to
btrfs_find_device_by_user_input() and moves it to volumes.c, so that
delete device can use it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
__check_raid_min_device() which was pealed from btrfs_rm_device()
maintianed its original code to show the block move. This patch cleans up
__check_raid_min_device().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
move a section of btrfs_rm_device() code to check for min number of the
devices into the function __check_raid_min_devices()
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
A part of code from btrfs_scan_one_device() is moved to a new function
btrfs_read_disk_super(), so that former function looks cleaner. (In this
process it also moves the code which ensures null terminating label). So
this creates easy opportunity to merge various duplicate codes on read
disk super. Earlier attempt to merge duplicate codes highlighted that
there were some issues for which there are duplicate codes (to read disk
super), however it was not clear what was the issue. So until we figure
that out, its better to keep them in a separate functions.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ use GFP_KERNEL, PAGE_CACHE_ removal related fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently UDF superblock magic doesn't appear in any userspace header
files and thus userspace apps have hard time checking for this fs. Let's
export the magic to userspace as with any other filesystem.
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Now we force to create empty block group to keep data profile alive,
however, in the below example, we eventually get an empty block group
while we're trying to get more space for other types (metadata/system),
- Before,
block group "A": size=2G, used=1.2G
block group "B": size=2G, used=512M
- After "btrfs balance start -dusage=50 mount_point",
block group "A": size=2G, used=(1.2+0.5)G
block group "C": size=2G, used=0
Since there is no data in block group C, it won't be deleted
automatically and we have to get the unused 2G until the next mount.
Balance itself just moves data and doesn't remove data, so it's safe
to not create such a empty block group if we already have data
allocated in other block groups.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
delalloc space
The delalloc reserved space is calculated in terms of number of bytes
used by an integral number of blocks. This is done by rounding down the
value of 'pos' to the nearest multiple of sectorsize.
The file offset value held by 'pos' variable may not be aligned to
sectorsize and hence when passing it as an argument to
btrfs_delalloc_release_space(), we may end up releasing larger delalloc
space than we originally had reserved.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|