Age | Commit message (Collapse) | Author |
|
When a chain is updated, a counter can be attached. if so,
the nft_counters_enabled should be increased.
test commands:
%nft add table ip filter
%nft add chain ip filter input { type filter hook input priority 4\; }
%iptables-compat -Z input
%nft delete chain ip filter input
we can see below messages.
[ 286.443720] jump label: negative count!
[ 286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0
[ 286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[ 286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: G W 4.17.0-rc2+ #12
[ 286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0
[ 286.449144] RSP: 0018:ffff88010e5176f0 EFLAGS: 00010286
[ 286.449144] RAX: 000000000000001b RBX: ffffffffc0179500 RCX: ffffffffb8a82522
[ 286.449144] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88011b7e5eac
[ 286.449144] RBP: 0000000000000000 R08: ffffed00236fce5c R09: ffffed00236fce5b
[ 286.449144] R10: ffffffffc0179503 R11: ffffed00236fce5c R12: 0000000000000000
[ 286.449144] R13: ffff88011a28e448 R14: ffff88011a28e470 R15: dffffc0000000000
[ 286.449144] FS: 00007f0384328700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 286.449144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 286.449144] CR2: 00007f038394bf10 CR3: 0000000104a86000 CR4: 00000000001006f0
[ 286.449144] Call Trace:
[ 286.449144] static_key_slow_dec+0x6a/0x70
[ 286.449144] nf_tables_chain_destroy+0x19d/0x210 [nf_tables]
[ 286.449144] nf_tables_commit+0x1891/0x1c50 [nf_tables]
[ 286.449144] nfnetlink_rcv+0x1148/0x13d0 [nfnetlink]
[ ... ]
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The table field in nft_obj_filter is not an array. In order to check
tablename, we should check if the pointer is set.
Test commands:
%nft add table ip filter
%nft add counter ip filter ct1
%nft reset counters
Splat looks like:
[ 306.510504] kasan: CONFIG_KASAN_INLINE enabled
[ 306.516184] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 306.524775] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 306.528284] Modules linked in: nft_objref nft_counter nf_tables nfnetlink ip_tables x_tables
[ 306.528284] CPU: 0 PID: 1488 Comm: nft Not tainted 4.17.0-rc4+ #17
[ 306.528284] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 306.528284] RIP: 0010:nf_tables_dump_obj+0x52c/0xa70 [nf_tables]
[ 306.528284] RSP: 0018:ffff8800b6cb7520 EFLAGS: 00010246
[ 306.528284] RAX: 0000000000000000 RBX: ffff8800b6c49820 RCX: 0000000000000000
[ 306.528284] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffed0016d96e9a
[ 306.528284] RBP: ffff8800b6cb75c0 R08: ffffed00236fce7c R09: ffffed00236fce7b
[ 306.528284] R10: ffffffff9f6241e8 R11: ffffed00236fce7c R12: ffff880111365108
[ 306.528284] R13: 0000000000000000 R14: ffff8800b6c49860 R15: ffff8800b6c49860
[ 306.528284] FS: 00007f838b007700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 306.528284] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 306.528284] CR2: 00007ffeafabcf78 CR3: 00000000b6cbe000 CR4: 00000000001006f0
[ 306.528284] Call Trace:
[ 306.528284] netlink_dump+0x470/0xa20
[ 306.528284] __netlink_dump_start+0x5ae/0x690
[ 306.528284] ? nf_tables_getobj+0x1b3/0x740 [nf_tables]
[ 306.528284] nf_tables_getobj+0x2f5/0x740 [nf_tables]
[ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
[ 306.528284] ? nf_tables_getobj+0x740/0x740 [nf_tables]
[ 306.528284] ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables]
[ 306.528284] ? nft_obj_notify+0x100/0x100 [nf_tables]
[ 306.528284] nfnetlink_rcv_msg+0x8ff/0x932 [nfnetlink]
[ 306.528284] ? nfnetlink_rcv_msg+0x216/0x932 [nfnetlink]
[ 306.528284] netlink_rcv_skb+0x1c9/0x2f0
[ 306.528284] ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
[ 306.528284] ? debug_check_no_locks_freed+0x270/0x270
[ 306.528284] ? netlink_ack+0x7a0/0x7a0
[ 306.528284] ? ns_capable_common+0x6e/0x110
[ ... ]
Fixes: e46abbcc05aa8 ("netfilter: nf_tables: Allow table names of up to 255 chars")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch fixes the following splat.
[118709.054937] BUG: using smp_processor_id() in preemptible [00000000] code: test/1571
[118709.054970] caller is nft_update_chain_stats.isra.4+0x53/0x97 [nf_tables]
[118709.054980] CPU: 2 PID: 1571 Comm: test Not tainted 4.17.0-rc6+ #335
[...]
[118709.054992] Call Trace:
[118709.055011] dump_stack+0x5f/0x86
[118709.055026] check_preemption_disabled+0xd4/0xe4
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The extra forward declaration of pm_qos_get_value() is redundant, so
drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Kernel library has a common function to match user input from sysfs
against an array of strings. Thus, replace bch_read_string_list() by
__sysfs_match_string().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is couple of functions that are used exclusively in sysfs.c.
Move it to there and make them static.
Besides above, it will allow further clean up.
No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is couple of string arrays that are used exclusively in sysfs.c.
Move it to there and make them static.
Besides above, it will allow further clean up.
No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently bcache does not handle backing device failure, if backing
device is offline and disconnected from system, its bcache device can still
be accessible. If the bcache device is in writeback mode, I/O requests even
can success if the requests hit on cache device. That is to say, when and
how bcache handles offline backing device is undefined.
This patch tries to handle backing device offline in a rather simple way,
- Add cached_dev->status_update_thread kernel thread to update backing
device status in every 1 second.
- Add cached_dev->offline_seconds to record how many seconds the backing
device is observed to be offline. If the backing device is offline for
BACKING_DEV_OFFLINE_TIMEOUT (30) seconds, set dc->io_disable to 1 and
call bcache_device_stop() to stop the bache device which linked to the
offline backing device.
Now if a backing device is offline for BACKING_DEV_OFFLINE_TIMEOUT seconds,
its bcache device will be removed, then user space application writing on
it will get error immediately, and handler the device failure in time.
This patch is quite simple, does not handle more complicated situations.
Once the bcache device is stopped, users need to recovery the backing
device, register and attach it manually.
Changelog:
v3: call wait_for_kthread_stop() before exits kernel thread.
v2: remove "bcache: " prefix when calling pr_warn().
v1: initial version.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Kconfig got text processing tools like we see in Make. Add Kconfig
helper macros to scripts/Kconfig.include like we collect Makefile
macros in scripts/Kbuild.include.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
|
|
The kernel configuration phase is now tightly coupled with the compiler
in use. It will be nice to show the compiler information in Kconfig.
The compiler information will be displayed like this:
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- config
scripts/kconfig/conf --oldaskconfig Kconfig
*
* Linux/arm64 4.16.0-rc1 Kernel Configuration
*
*
* Compiler: aarch64-linux-gnu-gcc (Linaro GCC 7.2-2017.11) 7.2.1 20171011
*
*
* General setup
*
Compile also drivers which will not load (COMPILE_TEST) [N/y/?]
If you use GUI methods such as menuconfig, it will be displayed in the
top menu.
This is simply implemented by using the 'comment' statement. So, it
will be saved into the .config file as well.
This commit has a very important meaning. If the compiler is upgraded,
Kconfig must be re-run since different compilers have different sets
of supported options.
All referenced environments are written to include/config/auto.conf.cmd
so that any environment change triggers syncconfig, and prompt the user
to input new values if needed.
With this commit, something like follows will be added to
include/config/auto.conf.cmd
ifneq "$(CC_VERSION_TEXT)" "aarch64-linux-gnu-gcc (Linaro GCC 7.2-2017.11) 7.2.1 20171011"
include/config/auto.conf: FORCE
endif
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Here are the test cases I used for developing the text expansion
feature.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Add a document for the macro language introduced to Kconfig.
The motivation of this work is to move the compiler option tests to
Kconfig from Makefile. A number of kernel features require the
compiler support. Enabling such features blindly in Kconfig ends up
with a lot of nasty build-time testing in Makefiles. If a chosen
feature turns out unsupported by the compiler, what the build system
can do is either to disable it (silently!) or to forcibly break the
build, despite Kconfig has let the user to enable it. By moving the
compiler capability tests to Kconfig, features unsupported by the
compiler will be hidden automatically.
This change was strongly prompted by Linus Torvalds. You can find
his suggestions [1] [2] in ML. The original idea was to add a new
attribute with 'option shell=...', but I found more generalized text
expansion would make Kconfig more powerful and lovely. The basic
ideas are from Make, but there are some differences.
[1]: https://lkml.org/lkml/2016/12/9/577
[2]: https://lkml.org/lkml/2018/2/7/527
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
|
|
When using a recursively expanded variable, it is a common mistake
to make circular reference.
For example, Make terminates the following code:
X = $(X)
Y := $(X)
Let's detect the circular expansion in Kconfig, too.
On the other hand, a function that recurses itself is a commonly-used
programming technique. So, Make does not check recursion in the
reference with 'call'. For example, the following code continues
running eternally:
X = $(call X)
Y := $(X)
Kconfig allows circular expansion if one or more arguments are given,
but terminates when the same function is recursively invoked 1000 times,
assuming it is a programming mistake.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The special variables, $(filename) and $(lineno), are expanded to a
file name and its line number being parsed, respectively.
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Syntax:
$(info,<text>)
$(warning-if,<condition>,<text>)
$(error-if,<condition>,<text)
The 'info' function prints a message to stdout as in Make.
The 'warning-if' and 'error-if' are similar to 'warning' and 'error'
in Make, but take the condition parameter. They are effective only
when the <condition> part is y.
Kconfig does not implement the lazy expansion as used in the 'if'
'and, 'or' functions in Make. In other words, Kconfig does not
support conditional expansion. The unconditional 'error' function
would always terminate the parsing, hence would be useless in Kconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Make expands the lefthand side of assignment statements. In fact,
Kbuild relies on it since kernel makefiles mostly look like this:
obj-$(CONFIG_FOO) += foo.o
Do likewise in Kconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Support += operator. This appends a space and the text on the
righthand side to a variable.
The timing of the evaluation of the righthand side depends on the
flavor of the variable. If the lefthand side was originally defined
as a simple variable, the righthand side is expanded immediately.
Otherwise, the expansion is deferred. Appending something to an
undefined variable results in a recursive variable.
To implement this, we need to remember the flavor of variables.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The previous commit added variable and user-defined function. They
work similarly in the sense that the evaluation is deferred until
they are used.
This commit adds another type of variable, simply expanded variable,
as we see in Make.
The := operator defines a simply expanded variable, expanding the
righthand side immediately. This works like traditional programming
language variables.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Now, we got a basic ability to test compiler capability in Kconfig.
config CC_HAS_STACKPROTECTOR
def_bool $(shell,($(CC) -Werror -fstack-protector -E -x c /dev/null -o /dev/null 2>/dev/null) && echo y || echo n)
This works, but it is ugly to repeat this long boilerplate.
We want to describe like this:
config CC_HAS_STACKPROTECTOR
bool
default $(cc-option,-fstack-protector)
It is straight-forward to add a new function, but I do not like to
hard-code specialized functions like that. Hence, here is another
feature, user-defined function. This works as a textual shorthand
with parameterization.
A user-defined function is defined by using the = operator, and can
be referenced in the same way as built-in functions. A user-defined
function in Make is referenced like $(call my-func,arg1,arg2), but I
omitted the 'call' to make the syntax shorter.
The definition of a user-defined function contains $(1), $(2), etc.
in its body to reference the parameters. It is grammatically valid
to pass more or fewer arguments when calling it. We already exploit
this feature in our makefiles; scripts/Kbuild.include defines cc-option
which takes two arguments at most, but most of the callers pass only
one argument.
By the way, a variable is supported as a subset of this feature since
a variable is "a user-defined function with zero argument". In this
context, I mean "variable" as recursively expanded variable. I will
add a different flavored variable in the next commit.
The code above can be written as follows:
[Example Code]
success = $(shell,($(1)) >/dev/null 2>&1 && echo y || echo n)
cc-option = $(success,$(CC) -Werror $(1) -E -x c /dev/null -o /dev/null)
config CC_HAS_STACKPROTECTOR
def_bool $(cc-option,-fstack-protector)
[Result]
$ make -s alldefconfig && tail -n 1 .config
CONFIG_CC_HAS_STACKPROTECTOR=y
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Currently, any statement line starts with a keyword with TF_COMMAND
flag. So, the following three lines are dead code.
alloc_string(yytext, yyleng);
zconflval.string = text;
return T_WORD;
If a T_WORD token is returned in this context, it will cause syntax
error in the parser anyway.
The next commit will support the assignment statement where a line
starts with an arbitrary identifier. So, I want the lexer to switch
to the PARAM state only when it sees a command keyword.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Now that 'shell' function is supported, this can be self-contained in
Kconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
|
|
This accepts a single command to execute. It returns the standard
output from it.
[Example code]
config HELLO
string
default "$(shell,echo hello world)"
config Y
def_bool $(shell,echo y)
[Result]
$ make -s alldefconfig && tail -n 2 .config
CONFIG_HELLO="hello world"
CONFIG_Y=y
Caveat:
Like environments, functions are expanded in the lexer. You cannot
pass symbols to function arguments. This is a limitation to simplify
the implementation. I want to avoid the dynamic function evaluation,
which would introduce much more complexity.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This commit adds a new concept 'function' to do more text processing
in Kconfig.
A function call looks like this:
$(function,arg1,arg2,arg3,...)
This commit adds the basic infrastructure to expand functions.
Change the text expansion helpers to take arguments.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
If "mainmenu" is not specified, "Linux Kernel Configuration" is used
as a default prompt.
Given that Kconfig is used in other projects than Linux, let's use
a more generic prompt, "Main menu".
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
There is no more caller of sym_expand_string_value().
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Now that environments are expanded in the lexer, conf_parse() does
not need to expand them explicitly.
The hack introduced by commit 0724a7c32a54 ("kconfig: Don't leak
main menus during parsing") can go away.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
|
|
There are two callers of file_lookup(), but there is no more reason
to expand the given path.
[1] zconf_initscan()
This is used to open the first Kconfig. sym_expand_string_value()
has never been used in a useful way here; before opening the first
Kconfig file, obviously there is no symbol to expand. If you use
expand_string_value() instead, environments in KBUILD_KCONFIG would
be expanded, but I do not see practical benefits for that.
[2] zconf_nextfile()
This is used to open the next file from 'source' statement.
Symbols in the path like "arch/$SRCARCH/Kconfig" needed expanding,
but it was replaced with the direct environment expansion. The
environment has already been expanded before the token is passed
to the parser.
By the way, file_lookup() was already buggy; it expanded a given path,
but it used the path before expansion for look-up:
if (!strcmp(name, file->name)) {
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
|
|
To get access to environment variables, Kconfig needs to define a
symbol using "option env=" syntax. It is tedious to add a symbol entry
for each environment variable given that we need to define much more
such as 'CC', 'AS', 'srctree' etc. to evaluate the compiler capability
in Kconfig.
Adding '$' for symbol references is grammatically inconsistent.
Looking at the code, the symbols prefixed with 'S' are expanded by:
- conf_expand_value()
This is used to expand 'arch/$ARCH/defconfig' and 'defconfig_list'
- sym_expand_string_value()
This is used to expand strings in 'source' and 'mainmenu'
All of them are fixed values independent of user configuration. So,
they can be changed into the direct expansion instead of symbols.
This change makes the code much cleaner. The bounce symbols 'SRCARCH',
'ARCH', 'SUBARCH', 'KERNELVERSION' are gone.
sym_init() hard-coding 'UNAME_RELEASE' is also gone. 'UNAME_RELEASE'
should be replaced with an environment variable.
ARCH_DEFCONFIG is a normal symbol, so it should be simply referenced
without '$' prefix.
The new syntax is addicted by Make. The variable reference needs
parentheses, like $(FOO), but you can omit them for single-letter
variables, like $F. Yet, in Makefiles, people tend to use the
parenthetical form for consistency / clarification.
At this moment, only the environment variable is supported, but I will
extend the concept of 'variable' later on.
The variables are expanded in the lexer so we can simplify the token
handling on the parser side.
For example, the following code works.
[Example code]
config MY_TOOLCHAIN_LIST
string
default "My tools: CC=$(CC), AS=$(AS), CPP=$(CPP)"
[Result]
$ make -s alldefconfig && tail -n 1 .config
CONFIG_MY_TOOLCHAIN_LIST="My tools: CC=gcc, AS=as, CPP=gcc -E"
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Kbuild provides a couple of ways to specify CROSS_COMPILE:
[1] Command line
[2] Environment
[3] arch/*/Makefile (only some architectures)
[4] CONFIG_CROSS_COMPILE
[4] is problematic for the compiler capability tests in Kconfig.
CONFIG_CROSS_COMPILE allows users to change the compiler prefix from
'make menuconfig', etc. It means, the compiler options would have
to be all re-calculated everytime CONFIG_CROSS_COMPILE is changed.
To avoid complexity and performance issues, I'd like to evaluate
the shell commands statically, i.e. only parsing Kconfig files.
I guess the majority is [1] or [2]. Currently, there are only
5 defconfig files that specify CONFIG_CROSS_COMPILE.
arch/arm/configs/lpc18xx_defconfig
arch/hexagon/configs/comet_defconfig
arch/nds32/configs/defconfig
arch/openrisc/configs/or1ksim_defconfig
arch/openrisc/configs/simple_smp_defconfig
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
The kbuild cache was introduced to remember the result of shell
commands, some of which are expensive to compute, such as
$(call cc-option,...).
However, this turned out not so clever as I had first expected.
Actually, it is problematic. For example, "$(CC) -print-file-name"
is cached. If the compiler is updated, the stale search path causes
build error, which is difficult to figure out. Another problem
scenario is cache files could be touched while install targets are
running under the root permission. We can patch them if desired,
but the build infrastructure is getting uglier and uglier.
Now, we are going to move compiler flag tests to the configuration
phase. If this is completed, the result of compiler tests will be
naturally cached in the .config file. We will not have performance
issues of incremental building since this testing only happens at
Kconfig time.
To start this work with a cleaner code base, remove the kbuild
cache first.
Revert the following commits:
Commit 9a234a2e3843 ("kbuild: create directory for make cache only when necessary")
Commit e17c400ae194 ("kbuild: shrink .cache.mk when it exceeds 1000 lines")
Commit 4e56207130ed ("kbuild: Cache a few more calls to the compiler")
Commit 3298b690b21c ("kbuild: Add a cache for generated variables")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
|
|
Looks like this got lost in a merge.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The snapshot trigger currently only affects the main ring buffer, even when
it is used by the instances. This can be confusing as the snapshot trigger
is listed in the instance.
> # cd /sys/kernel/tracing
> # mkdir instances/foo
> # echo snapshot > instances/foo/events/syscalls/sys_enter_fchownat/trigger
> # echo top buffer > trace_marker
> # echo foo buffer > instances/foo/trace_marker
> # touch /tmp/bar
> # chown rostedt /tmp/bar
> # cat instances/foo/snapshot
# tracer: nop
#
#
# * Snapshot is freed *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
> # cat snapshot
# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
bash-1189 [000] .... 111.488323: tracing_mark_write: top buffer
Not only did the snapshot occur in the top level buffer, but the instance
snapshot buffer should have been allocated, and it is still free.
Cc: stable@vger.kernel.org
Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
on INFINIBAND. However, when with CONFIG_INIFIBAND=m, this leads to a
link error when another driver using it is built-in. The
INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
a 'bool' symbol that does not force anything to be a module in turn.
fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_request':
trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_destroy_trans':
trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
Fixes: 9533b292a7ac ("IB: remove redundant INFINIBAND kconfig dependencies")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
btrfs_read_fs_root_no_name() may return ERR_PTR(-ENOENT) or
ERR_PTR(-ENOMEM) and therefore search_ioctl() and
btrfs_search_path_in_tree() should use PTR_ERR() instead of -ENOENT,
which all other callers of btrfs_read_fs_root_no_name() do.
Drop the error message as it would be confusing, the caller of ioctl
will likely interpret the error code and not look into the syslog.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
I got a report that after upgrading to 4.16, someone's filesystems
weren't mounting:
[ 23.845852] BTRFS info (device loop0): unrecognized mount option 'subvol='
Before 4.16, this mounted the default subvolume. It turns out that this
empty "subvol=" is actually an application bug, but it was causing the
application to fail, so it's an ABI break if you squint.
The generic parsing code we use for mount options (match_token())
doesn't match an empty string as "%s". Previously, setup_root_args()
removed the "subvol=" string, but the mount path was cleaned up to not
need that. Add a dummy Opt_subvol_empty to fix this.
The simple workaround is to use / or . for the value of 'subvol=' .
Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
CC: stable@vger.kernel.org # 4.16+
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Looks like the original idea was to print the hex of the flags which is
not coded with their flag name. So use the current buf pointer bp
instead of buf.
Reaching the uknown flags should never happen, it's there just in case.
Fixes: ebce0e01b930b ("btrfs: make block group flags in balance printks human-readable")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The dedupe range is 16 MiB, with 4 KiB pages and 8 byte pointers, the
arrays can be 32KiB large. To avoid allocation failures due to
fragmented memory, use the allocation with fallback to vmalloc.
The arrays are allocated and freed only inside btrfs_extent_same and
reused for all the ranges.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We support big dedup requests by splitting range to smaller parts, and
call dedupe logic on each of them.
Instead of repeated allocation and deallocation, allocate once at the
beginning and reuse in the iteration.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently btrfs_dedupe_file_range silently restricts the dedupe range to
to 16MiB to limit locking and working memory size and is documented in
manual page as implementation specific.
Let's remove that restriction by iterating over the dedup range in 16MiB
steps. This is backward compatible and will not change anything for
requests smaller then 16MiB.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Split btrfs_extent_same() to two parts where one is the main EXTENT_SAME
entry and a helper that can be repeatedly called on a range. This will
be used in following patches.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_link() calls btrfs_orphan_del() if it's linking an O_TMPFILE but
it doesn't reserve space to do so. Even before the removal of the
orphan_block_rsv it wasn't using it.
Fixes: ef3b9af50bfa ("Btrfs: implement inode_operations callback tmpfile")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and
BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make
them consecutive again.
Signed-off-by: Omar Sandoval <osandov@fb.com>
[ switch them enums so we don't have to do that again ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Now that we don't keep long-standing reservations for orphan items,
root->orphan_block_rsv isn't used. We can git rid of it, along with:
- root->orphan_lock, which was used to protect root->orphan_block_rsv
- root->orphan_inodes, which was used as a refcount for root->orphan_block_rsv
- BTRFS_INODE_ORPHAN_META_RESERVED, which was used to track reservations
in root->orphan_block_rsv
- btrfs_orphan_commit_root(), which was the last user of any of these
and does nothing else
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, we keep space reserved for all inode orphan items until the
inode is evicted (i.e., all references to it are dropped). We hit an
issue where an application would keep a bunch of deleted files open (by
design) and thus keep a large amount of space reserved, causing ENOSPC
errors when other operations tried to reserve space. This long-standing
reservation isn't absolutely necessary for a couple of reasons:
- We can almost always make the reservation we need or steal from the
global reserve for the orphan item
- If we can't, it's not the end of the world if we drop the orphan item
on the floor and let the next mount clean it up
So, get rid of persistent reservation and just reserve space in
btrfs_evict_inode().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The truncate loop in btrfs_evict_inode() does two things at once:
- It refills the temporary block reserve, potentially stealing from the
global reserve or committing
- It calls btrfs_truncate_inode_items()
The tangle of continues hides the fact that these two steps are actually
separate. Split the first step out into a separate function both for
clarity and so that we can reuse it in a later patch.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In btrfs_evict_inode(), if btrfs_truncate_inode_items() fails, the inode
item will still be in the tree but we still return the ino to the ino
cache. That will blow up later when someone tries to allocate that ino,
so don't return it to the cache.
Fixes: 581bb050941b ("Btrfs: Cache free inode numbers in memory")
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_orphan_commit_root() tries to delete an orphan item for a
subvolume in the tree root, but we don't actually insert that item in
the first place. See commit 0a0d4415e338 ("Btrfs: delete dead code in
btrfs_orphan_add()"). We can get rid of it.
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Now that we don't add orphan items for truncate, there can't be races on
adding or deleting an orphan item, so this bit is unnecessary.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, we insert an orphan item during a truncate so that if there's
a crash, we don't leak extents past the on-disk i_size. However, since
commit 7f4f6e0a3f6d ("Btrfs: only update disk_i_size as we remove
extents"), we keep disk_i_size in sync with the extent items as we
truncate, so orphan cleanup will never have any extents to remove. Don't
bother with the superfluous orphan item.
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_free_extent() can fail because of ENOMEM. There's no reason to
panic here, we can just abort the transaction.
Fixes: f4b9aa8d3b87 ("btrfs_truncate")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|