Age | Commit message (Collapse) | Author |
|
While profiling TCP stack, I noticed one useless atomic operation
in tcp_sendmsg(), caused by skb_header_release().
It turns out all current skb_header_release() users have a fresh skb,
that no other user can see, so we can avoid one atomic operation.
Introduce __skb_header_release() to clearly document this.
This gave me a 1.5 % improvement on TCP_RR workload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sparse complains about fec_enet_select_queue() not being static.
Feedback from David Miller [1] was to remove this function instead of making it
static:
"Please just delete this function.
It's overriding code which does exactly the same thing.
Actually, more precisely, this code is duplicating code in a way that
bypasses many core facilitites of the networking. For example, this
override means that socket based flow steering, XPS, etc. are all
not happening on these devices.
Without ->ndo_select_queue(), the flow dissector does __netdev_pick_tx
which is exactly what you want to happen."
[1] http://www.spinics.net/lists/netdev/msg297653.html
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
pull request: wireless-next 2014-09-22
Please pull this batch of updates intended for the 3.18 stream...
For the mac80211 bits, Johannes says:
"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."
For the bluetooth bits, Johan says:
"Here are some more patches intended for 3.18. Most of them are cleanups
or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed
channels which should now work better together with the L2CAP
information request procedure."
For the iwlwifi bits, Emmanuel says:
"I fix here dvm which was broken by my last pull request. Arik
continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal
keeps digging into rate scaling code, more to come soon. Besides this,
nothing really special here."
Beyond that, there are the usual big batches of updates to ath9k, b43,
mwifiex, and wil6210 as well as a handful of other bits here and there.
Also, rtlwifi gets some btcoexist attention from Larry.
Please let me know if there are problems!
====================
Had to adjust the wil6210 code to comply with Joe Perches's recent
change in net-next to make the netdev_*() routines return void instead
of 'int'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In macvtap device delete and open calls can race and
this causes a list curruption of the vlan queue_list.
The race intself is triggered by the idr accessors
that located the vlan device. The device is stored
into and removed from the idr under both an rtnl and
a mutex. However, when attempting to locate the device
in idr, only a mutex is taken. As a result, once cpu
perfoming a delete may take an rtnl and wait for the mutex,
while another cput doing an open() will take the idr
mutex first to fetch the device pointer and later take
an rtnl to add a queue for the device which may have
just gotten deleted.
With this patch, we now hold the rtnl for the duration
of the macvtap_open() call thus making sure that
open will not race with delete.
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No caller or macro uses the return value so make all
the functions return void.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No caller or macro uses the return value so make it void.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Manish Chopra says:
====================
qlcnic: Bug fixes.
This patch series contains following bug fixes:
* Fixes related to ethtool statistics.
* Fix for flash read related API.
Please apply this series to 'net'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
o When TX queues are not allocated, driver does not fill TX queues stats in the buffer.
However, it is also not advancing data pointer by TX queue stats length, which would
misplace all successive stats data in the buffer and will result in mismatch between
stats strings and it's values.
o Fix this by advancing data pointer by TX queue stats length when
queues are not allocated.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
o TX queues stats must be read when queues are allocated regardless
of interface is up or not.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
o Driver is doing memset with zero for total number of stats bytes when
it has already filled some data in the stats buffer, which can overwrite
memory area beyond the length of stats buffer.
o Fix this by initializing stats buffer with zero before filling any data in it.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the API.
In qlcnic_83xx_setup_idc_parameters() routine use qlcnic_83xx_flash_read32() API
which takes flash lock internally instead of the lockless version
qlcnic_83xx_lockless_flash_read32().
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexei Starovoitov says:
====================
eBPF syscall, verifier, testsuite
v14 -> v15:
- got rid of macros with hidden control flow (suggested by David)
replaced macro with explicit goto or return and simplified
where possible (affected patches #9 and #10)
- rebased, retested
v13 -> v14:
- small change to 1st patch to ease 'new userspace with old kernel'
problem (done similar to perf_copy_attr()) (suggested by Daniel)
- the rest unchanged
v12 -> v13:
- replaced 'foo __user *' pointers with __aligned_u64 (suggested by David)
- added __attribute__((aligned(8)) to 'union bpf_attr' to keep
constant alignment between patches
- updated manpage and syscall wrappers due to __aligned_u64
- rebased, retested on x64 with 32-bit and 64-bit userspace and on i386,
build tested on arm32,sparc64
v11 -> v12:
- dropped patch 11 and copied few macros to libbpf.h (suggested by Daniel)
- replaced 'enum bpf_prog_type' with u32 to be safe in compat (.. Andy)
- implemented and tested compat support (not part of this set) (.. Daniel)
- changed 'void *log_buf' to 'char *' (.. Daniel)
- combined struct bpf_work_struct and bpf_prog_info (.. Daniel)
- added better return value explanation to manpage (.. Andy)
- added log_buf/log_size explanation to manpage (.. Andy & Daniel)
- added a lot more info about prog_type and map_type to manpage (.. Andy)
- rebased, tweaked test_stubs
Patches 1-4 establish BPF syscall shell for maps and programs.
Patches 5-10 add verifier step by step
Patch 11 adds test stubs for 'unspec' program type and verifier testsuite
from user space
Note that patches 1,3,4,7 add commands and attributes to the syscall
while being backwards compatible from each other, which should demonstrate
how other commands can be added in the future.
After this set the programs can be loaded for testing only. They cannot
be attached to any events. Though manpage talks about tracing and sockets,
it will be a subject of future patches.
Please take a look at manpage:
BPF(2) Linux Programmer's Manual BPF(2)
NAME
bpf - perform a command on eBPF map or program
SYNOPSIS
#include <linux/bpf.h>
int bpf(int cmd, union bpf_attr *attr, unsigned int size);
DESCRIPTION
bpf() syscall is a multiplexor for a range of different operations on
eBPF which can be characterized as "universal in-kernel virtual
machine". eBPF is similar to original Berkeley Packet Filter (or
"classic BPF") used to filter network packets. Both statically analyze
the programs before loading them into the kernel to ensure that
programs cannot harm the running system.
eBPF extends classic BPF in multiple ways including ability to call in-
kernel helper functions and access shared data structures like eBPF
maps. The programs can be written in a restricted C that is compiled
into eBPF bytecode and executed on the eBPF virtual machine or JITed
into native instruction set.
eBPF Design/Architecture
eBPF maps is a generic storage of different types. User process can
create multiple maps (with key/value being opaque bytes of data) and
access them via file descriptor. In parallel eBPF programs can access
maps from inside the kernel. It's up to user process and eBPF program
to decide what they store inside maps.
eBPF programs are similar to kernel modules. They are loaded by the
user process and automatically unloaded when process exits. Each eBPF
program is a safe run-to-completion set of instructions. eBPF verifier
statically determines that the program terminates and is safe to
execute. During verification the program takes a hold of maps that it
intends to use, so selected maps cannot be removed until the program is
unloaded. The program can be attached to different events. These events
can be packets, tracepoint events and other types in the future. A new
event triggers execution of the program which may store information
about the event in the maps. Beyond storing data the programs may call
into in-kernel helper functions which may, for example, dump stack, do
trace_printk or other forms of live kernel debugging. The same program
can be attached to multiple events. Different programs can access the
same map:
tracepoint tracepoint tracepoint sk_buff sk_buff
event A event B event C on eth0 on eth1
| | | | |
| | | | |
--> tracing <-- tracing socket socket
prog_1 prog_2 prog_3 prog_4
| | | |
|--- -----| |-------| map_3
map_1 map_2
Syscall Arguments
bpf() syscall operation is determined by cmd which can be one of the
following:
BPF_MAP_CREATE
Create a map with given type and attributes and return map FD
BPF_MAP_LOOKUP_ELEM
Lookup element by key in a given map and return its value
BPF_MAP_UPDATE_ELEM
Create or update element (key/value pair) in a given map
BPF_MAP_DELETE_ELEM
Lookup and delete element by key in a given map
BPF_MAP_GET_NEXT_KEY
Lookup element by key in a given map and return key of next
element
BPF_PROG_LOAD
Verify and load eBPF program
attr is a pointer to a union of type bpf_attr as defined below.
size is the size of the union.
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type;
__u32 key_size; /* size of key in bytes */
__u32 value_size; /* size of value in bytes */
__u32 max_entries; /* max number of entries in a map */
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
__u32 map_fd;
__aligned_u64 key;
union {
__aligned_u64 value;
__aligned_u64 next_key;
};
};
struct { /* anonymous struct used by BPF_PROG_LOAD command */
__u32 prog_type;
__u32 insn_cnt;
__aligned_u64 insns; /* 'const struct bpf_insn *' */
__aligned_u64 license; /* 'const char *' */
__u32 log_level; /* verbosity level of eBPF verifier */
__u32 log_size; /* size of user buffer */
__aligned_u64 log_buf; /* user supplied 'char *' buffer */
};
} __attribute__((aligned(8)));
eBPF maps
maps is a generic storage of different types for sharing data between
kernel and userspace.
Any map type has the following attributes:
. type
. max number of elements
. key size in bytes
. value size in bytes
The following wrapper functions demonstrate how this syscall can be
used to access the maps. The functions use the cmd argument to invoke
different operations.
BPF_MAP_CREATE
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries)
{
union bpf_attr attr = {
.map_type = map_type,
.key_size = key_size,
.value_size = value_size,
.max_entries = max_entries
};
return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
bpf() syscall creates a map of map_type type and given
attributes key_size, value_size, max_entries. On success it
returns process-local file descriptor. On error, -1 is returned
and errno is set to EINVAL or EPERM or ENOMEM.
The attributes key_size and value_size will be used by verifier
during program loading to check that program is calling
bpf_map_*_elem() helper functions with correctly initialized key
and that program doesn't access map element value beyond
specified value_size. For example, when map is created with
key_size = 8 and program does:
bpf_map_lookup_elem(map_fd, fp - 4)
such program will be rejected, since in-kernel helper function
bpf_map_lookup_elem(map_fd, void *key) expects to read 8 bytes
from 'key' pointer, but 'fp - 4' starting address will cause out
of bounds stack access.
Similarly, when map is created with value_size = 1 and program
does:
value = bpf_map_lookup_elem(...);
*(u32 *)value = 1;
such program will be rejected, since it accesses value pointer
beyond specified 1 byte value_size limit.
Currently only hash table map_type is supported:
enum bpf_map_type {
BPF_MAP_TYPE_UNSPEC,
BPF_MAP_TYPE_HASH,
};
map_type selects one of the available map implementations in
kernel. For all map_types eBPF programs access maps with the
same bpf_map_lookup_elem()/bpf_map_update_elem() helper
functions.
BPF_MAP_LOOKUP_ELEM
int bpf_lookup_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.value = ptr_to_u64(value),
};
return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
bpf() syscall looks up an element with given key in a map fd.
If element is found it returns zero and stores element's value
into value. If element is not found it returns -1 and sets
errno to ENOENT.
BPF_MAP_UPDATE_ELEM
int bpf_update_elem(int fd, void *key, void *value)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.value = ptr_to_u64(value),
};
return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
The call creates or updates element with given key/value in a
map fd. On success it returns zero. On error, -1 is returned
and errno is set to EINVAL or EPERM or ENOMEM or E2BIG. E2BIG
indicates that number of elements in the map reached max_entries
limit specified at map creation time.
BPF_MAP_DELETE_ELEM
int bpf_delete_elem(int fd, void *key)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
};
return bpf(BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}
The call deletes an element in a map fd with given key. Returns
zero on success. If element is not found it returns -1 and sets
errno to ENOENT.
BPF_MAP_GET_NEXT_KEY
int bpf_get_next_key(int fd, void *key, void *next_key)
{
union bpf_attr attr = {
.map_fd = fd,
.key = ptr_to_u64(key),
.next_key = ptr_to_u64(next_key),
};
return bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
}
The call looks up an element by key in a given map fd and
returns key of the next element into next_key pointer. If key is
not found, it return zero and returns key of the first element
into next_key. If key is the last element, it returns -1 and
sets errno to ENOENT. Other possible errno values are ENOMEM,
EFAULT, EPERM, EINVAL. This method can be used to iterate over
all elements of the map.
close(map_fd)
will delete the map map_fd. Exiting process will delete all
maps automatically.
eBPF programs
BPF_PROG_LOAD
This cmd is used to load eBPF program into the kernel.
char bpf_log_buf[LOG_BUF_SIZE];
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, int insn_cnt,
const char *license)
{
union bpf_attr attr = {
.prog_type = prog_type,
.insns = ptr_to_u64(insns),
.insn_cnt = insn_cnt,
.license = ptr_to_u64(license),
.log_buf = ptr_to_u64(bpf_log_buf),
.log_size = LOG_BUF_SIZE,
.log_level = 1,
};
return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
prog_type is one of the available program types:
enum bpf_prog_type {
BPF_PROG_TYPE_UNSPEC,
BPF_PROG_TYPE_SOCKET,
BPF_PROG_TYPE_TRACING,
};
By picking prog_type program author selects a set of helper
functions callable from eBPF program and corresponding format of
struct bpf_context (which is the data blob passed into the
program as the first argument). For example, the programs
loaded with prog_type = TYPE_TRACING may call bpf_printk()
helper, whereas TYPE_SOCKET programs may not. The set of
functions available to the programs under given type may
increase in the future.
Currently the set of functions for TYPE_TRACING is:
bpf_map_lookup_elem(map_fd, void *key) // lookup key in a map_fd
bpf_map_update_elem(map_fd, void *key, void *value) // update key/value
bpf_map_delete_elem(map_fd, void *key) // delete key in a map_fd
bpf_ktime_get_ns(void) // returns current ktime
bpf_printk(char *fmt, int fmt_size, ...) // prints into trace buffer
bpf_memcmp(void *ptr1, void *ptr2, int size) // non-faulting memcmp
bpf_fetch_ptr(void *ptr) // non-faulting load pointer from any address
bpf_fetch_u8(void *ptr) // non-faulting 1 byte load
bpf_fetch_u16(void *ptr) // other non-faulting loads
bpf_fetch_u32(void *ptr)
bpf_fetch_u64(void *ptr)
and bpf_context is defined as:
struct bpf_context {
/* argN fields match one to one to arguments passed to trace events */
u64 arg1, arg2, arg3, arg4, arg5, arg6;
/* return value from kretprobe event or from syscall_exit event */
u64 ret;
};
The set of helper functions for TYPE_SOCKET is TBD.
More program types may be added in the future. Like
BPF_PROG_TYPE_USER_TRACING for unprivileged programs.
BPF_PROG_TYPE_UNSPEC is used for testing only. Such programs
cannot be attached to events.
insns array of "struct bpf_insn" instructions
insn_cnt number of instructions in the program
license license string, which must be GPL compatible to call
helper functions marked gpl_only
log_buf user supplied buffer that in-kernel verifier is using to
store verification log. Log is a multi-line string that should
be used by program author to understand how verifier came to
conclusion that program is unsafe. The format of the output can
change at any time as verifier evolves.
log_size size of user buffer. If size of the buffer is not large
enough to store all verifier messages, -1 is returned and errno
is set to ENOSPC.
log_level verbosity level of eBPF verifier, where zero means no
logs provided
close(prog_fd)
will unload eBPF program
The maps are accesible from programs and generally tie the two
together. Programs process various events (like tracepoint, kprobe,
packets) and store the data into maps. User space fetches data from
maps. Either the same or a different map may be used by user space as
configuration space to alter program behavior on the fly.
Events
Once an eBPF program is loaded, it can be attached to an event. Various
kernel subsystems have different ways to do so. For example:
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd));
will attach the program prog_fd to socket sock which was received by
prior call to socket().
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
will attach the program prog_fd to perf event event_fd which was
received by prior call to perf_event_open().
Another way to attach the program to a tracing event is:
event_fd = open("/sys/kernel/debug/tracing/events/skb/kfree_skb/filter");
write(event_fd, "bpf-123"); /* where 123 is eBPF program FD */
/* here program is attached and will be triggered by events */
close(event_fd); /* to detach from event */
EXAMPLES
/* eBPF+sockets example:
* 1. create map with maximum of 2 elements
* 2. set map[6] = 0 and map[17] = 0
* 3. load eBPF program that counts number of TCP and UDP packets received
* via map[skb->ip->proto]++
* 4. attach prog_fd to raw socket via setsockopt()
* 5. print number of received TCP/UDP packets every second
*/
int main(int ac, char **av)
{
int sock, map_fd, prog_fd, key;
long long value = 0, tcp_cnt, udp_cnt;
map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value), 2);
if (map_fd < 0) {
printf("failed to create map '%s'\n", strerror(errno));
/* likely not run as root */
return 1;
}
key = 6; /* ip->proto == tcp */
assert(bpf_update_elem(map_fd, &key, &value) == 0);
key = 17; /* ip->proto == udp */
assert(bpf_update_elem(map_fd, &key, &value) == 0);
struct bpf_insn prog[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = r1 */
BPF_LD_ABS(BPF_B, 14 + 9), /* r0 = ip->proto */
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4),/* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = r2 - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* r1 = map_fd */
BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), /* r0 = map_lookup(r1, r2) */
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* if (r0 == 0) goto pc+2 */
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
BPF_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* lock *(u64 *)r0 += r1 */
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(), /* return r0 */
};
prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET, prog, sizeof(prog), "GPL");
assert(prog_fd >= 0);
sock = open_raw_sock("lo");
assert(setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd,
sizeof(prog_fd)) == 0);
for (;;) {
key = 6;
assert(bpf_lookup_elem(map_fd, &key, &tcp_cnt) == 0);
key = 17;
assert(bpf_lookup_elem(map_fd, &key, &udp_cnt) == 0);
printf("TCP %lld UDP %lld packets0, tcp_cnt, udp_cnt);
sleep(1);
}
return 0;
}
RETURN VALUE
For a successful call, the return value depends on the operation:
BPF_MAP_CREATE
The new file descriptor associated with eBPF map.
BPF_PROG_LOAD
The new file descriptor associated with eBPF program.
All other commands
Zero.
On error, -1 is returned, and errno is set appropriately.
ERRORS
EPERM bpf() syscall was made without sufficient privilege (without the
CAP_SYS_ADMIN capability).
ENOMEM Cannot allocate sufficient memory.
EBADF fd is not an open file descriptor
EFAULT One of the pointers ( key or value or log_buf or insns ) is
outside accessible address space.
EINVAL The value specified in cmd is not recognized by this kernel.
EINVAL For BPF_MAP_CREATE, either map_type or attributes are invalid.
EINVAL For BPF_MAP_*_ELEM commands, some of the fields of "union
bpf_attr" unused by this command are not set to zero.
EINVAL For BPF_PROG_LOAD, attempt to load invalid program (unrecognized
instruction or uses reserved fields or jumps out of range or
loop detected or calls unknown function).
EACCES For BPF_PROG_LOAD, though program has valid instructions, it was
rejected, since it was deemed unsafe (may access disallowed
memory region or uninitialized stack/register or function
constraints don't match actual types or misaligned access). In
such case it is recommended to call bpf() again with log_level =
1 and examine log_buf for specific reason provided by verifier.
ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM, indicates that
element with given key was not found.
E2BIG program is too large or a map reached max_entries limit (max
number of elements).
NOTES
These commands may be used only by a privileged process (one having the
CAP_SYS_ADMIN capability).
SEE ALSO
eBPF architecture and instruction set is explained in
Documentation/networking/filter.txt
Linux 2014-09-16 BPF(2)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1.
the library includes a trivial set of BPF syscall wrappers:
int bpf_create_map(int key_size, int value_size, int max_entries);
int bpf_update_elem(int fd, void *key, void *value);
int bpf_lookup_elem(int fd, void *key, void *value);
int bpf_delete_elem(int fd, void *key);
int bpf_get_next_key(int fd, void *key, void *next_key);
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct sock_filter_int *insns, int insn_len,
const char *license);
bpf_prog_load() stores verifier log into global bpf_log_buf[] array
and BPF_*() macros to build instructions
2.
test stubs configure eBPF infra with 'unspec' map and program types.
These are fake types used by user space testsuite only.
3.
verifier tests valid and invalid programs and expects predefined
error log messages from kernel.
40 tests so far.
$ sudo ./test_verifier
#0 add+sub+mul OK
#1 unreachable OK
#2 unreachable2 OK
#3 out of range jump OK
#4 out of range jump2 OK
#5 test1 ld_imm64 OK
...
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds verifier core which simulates execution of every insn and
records the state of registers and program stack. Every branch instruction seen
during simulation is pushed into state stack. When verifier reaches BPF_EXIT,
it pops the state from the stack and continues until it reaches BPF_EXIT again.
For program:
1: bpf_mov r1, xxx
2: if (r1 == 0) goto 5
3: bpf_mov r0, 1
4: goto 6
5: bpf_mov r0, 2
6: bpf_exit
The verifier will walk insns: 1, 2, 3, 4, 6
then it will pop the state recorded at insn#2 and will continue: 5, 6
This way it walks all possible paths through the program and checks all
possible values of registers. While doing so, it checks for:
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention
- instruction encoding is not using reserved fields
Kernel subsystem configures the verifier with two callbacks:
- bool (*is_valid_access)(int off, int size, enum bpf_access_type type);
that provides information to the verifer which fields of 'ctx'
are accessible (remember 'ctx' is the first argument to eBPF program)
- const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
returns argument constraints of kernel helper functions that eBPF program
may call, so that verifier can checks that R1-R5 types match the prototype
More details in Documentation/networking/filter.txt and in kernel/bpf/verifier.c
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
check that control flow graph of eBPF program is a directed acyclic graph
check_cfg() does:
- detect loops
- detect unreachable instructions
- check that program terminates with BPF_EXIT insn
- check that all branches are within program boundary
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
eBPF programs passed from userspace are using pseudo BPF_LD_IMM64 instructions
to refer to process-local map_fd. Scan the program for such instructions and
if FDs are valid, convert them to 'struct bpf_map' pointers which will be used
by verifier to check access to maps in bpf_map_lookup/update() calls.
If program passes verifier, convert pseudo BPF_LD_IMM64 into generic by dropping
BPF_PSEUDO_MAP_FD flag.
Note that eBPF interpreter is generic and knows nothing about pseudo insns.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
add optional attributes for BPF_PROG_LOAD syscall:
union bpf_attr {
struct {
...
__u32 log_level; /* verbosity level of eBPF verifier */
__u32 log_size; /* size of user buffer */
__aligned_u64 log_buf; /* user supplied 'char *buffer' */
};
};
when log_level > 0 the verifier will return its verification log in the user
supplied buffer 'log_buf' which can be used by program author to analyze why
verifier rejected given program.
'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt
provides several examples of these messages, like the program:
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
BPF_EXIT_INSN(),
will be rejected with the following multi-line message in log_buf:
0: (7a) *(u64 *)(r10 -8) = 0
1: (bf) r2 = r10
2: (07) r2 += -8
3: (b7) r1 = 0
4: (85) call 1
5: (15) if r0 == 0x0 goto pc+1
R0=map_ptr R10=fp
6: (7a) *(u64 *)(r0 +4) = 0
misaligned access off 4 size 8
The format of the output can change at any time as verifier evolves.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
this patch adds all of eBPF verfier documentation and empty bpf_check()
The end goal for the verifier is to statically check safety of the program.
Verifier will catch:
- loops
- out of range jumps
- unreachable instructions
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention
More details in Documentation/networking/filter.txt
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
in native eBPF programs userspace is using pseudo BPF_CALL instructions
which encode one of 'enum bpf_func_id' inside insn->imm field.
Verifier checks that program using correct function arguments to given func_id.
If all checks passed, kernel needs to fixup BPF_CALL->imm fields by
replacing func_id with in-kernel function pointer.
eBPF interpreter just calls the function.
In-kernel eBPF users continue to use generic BPF_CALL.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
eBPF programs are similar to kernel modules. They are loaded by the user
process and automatically unloaded when process exits. Each eBPF program is
a safe run-to-completion set of instructions. eBPF verifier statically
determines that the program terminates and is safe to execute.
The following syscall wrapper can be used to load the program:
int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, int insn_cnt,
const char *license)
{
union bpf_attr attr = {
.prog_type = prog_type,
.insns = ptr_to_u64(insns),
.insn_cnt = insn_cnt,
.license = ptr_to_u64(license),
};
return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
where 'insns' is an array of eBPF instructions and 'license' is a string
that must be GPL compatible to call helper functions marked gpl_only
Upon succesful load the syscall returns prog_fd.
Use close(prog_fd) to unload the program.
User space tests and examples follow in the later patches
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'maps' is a generic storage of different types for sharing data between kernel
and userspace.
The maps are accessed from user space via BPF syscall, which has commands:
- create a map with given type and attributes
fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
returns fd or negative error
- lookup key in a given map referenced by fd
err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
using attr->map_fd, attr->key, attr->value
returns zero and stores found elem into value or negative error
- create or update key/value pair in a given map
err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
using attr->map_fd, attr->key, attr->value
returns zero or negative error
- find and delete element by key in a given map
err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
using attr->map_fd, attr->key
- iterate map elements (based on input key return next_key)
err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size)
using attr->map_fd, attr->key, attr->next_key
- close(fd) deletes the map
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
done as separate commit to ease conflict resolution
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
BPF syscall is a multiplexor for a range of different operations on eBPF.
This patch introduces syscall with single command to create a map.
Next patch adds commands to access maps.
'maps' is a generic storage of different types for sharing data between kernel
and userspace.
Userspace example:
/* this syscall wrapper creates a map with given type and attributes
* and returns map_fd on success.
* use close(map_fd) to delete the map
*/
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries)
{
union bpf_attr attr = {
.map_type = map_type,
.key_size = key_size,
.value_size = value_size,
.max_entries = max_entries
};
return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
'union bpf_attr' is backwards compatible with future extensions.
More details in Documentation/networking/filter.txt and in manpage
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
"A small fixup to i8042 adding Asus X450LCP to the nomux list"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - fix Asus X450LCP touchpad detection
|
|
AHB bus support was added in v2.6.38, through commit a0b907ee2a71
("ath5k: Add AHB bus support."). That code can only be build if the
Kconfig symbol ATHEROS_AR231X is set. But that symbol has never been
added to the tree. So AHB bus support has always been dead code.
Let's remove all code that depends on ATHEROS_AR231X. If that symbol
ever gets added to the tree the AHB bus support can be re-added too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.18 pull request
This is the NFC pull request for 3.18.
We've had major updates for TI and ST Microelectronics drivers:
For TI's trf7970a driver:
- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues
For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes
Finally, Marvell add ISO15693 support to the NCI stack, together with a
couple of NCI fixes."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
|
|
Jesper reported that br_netfilter always registers the hooks since
this is part of the bridge core. This harms performance for people that
don't need this.
This patch modularizes br_netfilter so it can be rmmod'ed, thus,
the hooks can be unregistered. I think the bridge netfilter should have
been a separated module since the beginning, Patrick agreed on that.
Note that this is breaking compatibility for users that expect that
bridge netfilter is going to be available after explicitly 'modprobe
bridge' or via automatic load through brctl.
However, the damage can be easily undone by modprobing br_netfilter.
The bridge core also spots a message to provide a clue to people that
didn't notice that this has been deprecated.
On top of that, the plan is that nftables will not rely on this software
layer, but integrate the connection tracking into the bridge layer to
enable stateful filtering and NAT, which is was bridge netfilter users
seem to require.
This patch still keeps the fake_dst_ops in the bridge core, since this
is required by when the bridge port is initialized. So we can safely
modprobe/rmmod br_netfilter anytime.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
|
|
Move nf_bridge_copy_header() as static inline in netfilter_bridge.h
header file. This patch prepares the modularization of the br_netfilter
code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Reduce boilerplate code by using __seq_open_private() instead of seq_open()
in xt_match_open() and xt_target_open().
Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
serial8250_do_startup() adds UART_IER_RDI and UART_IER_RLSI to ier.
serial8250_stop_rx() should remove both.
This is what the serial-omap driver has been doing and is now moved to
the 8250-core since it does no look to be *that* omap specific.
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
serial8250_find_match_or_unused()
Tony noticed that the old omap-serial driver picked the uart "number"
based on the hint given from device tree or platform device's id.
The 8250 based omap driver doesn't do this because the core code does
not honour the ->line argument which is passed by the driver.
This patch aims to keep the same behaviour as with omap-serial. The
function will first try to use the line suggested ->line argument and
then fallback to the old strategy in case the port is taken.
That means the the third uart will always be ttyS2 even if the previous
two have not been enabled in DT.
Reviewed-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The serial8250_do_startup() function unconditionally clears the
interrupts and for that it reads from the RX-FIFO without checking if
there is a byte in the FIFO or not. This works fine on OMAP4+ HW like
AM335x or DRA7.
OMAP3630 ES1.1 (which means probably all OMAP3 and earlier) does not like
this:
|Unhandled fault: external abort on non-linefetch (0x1028) at 0xfb020000
|Internal error: : 1028 [#1] ARM
|Modules linked in:
|CPU: 0 PID: 1 Comm: swapper Not tainted 3.16.0-00022-g7edcb57-dirty #1213
|task: de0572c0 ti: de058000 task.ti: de058000
|PC is at mem32_serial_in+0xc/0x1c
|LR is at serial8250_do_startup+0x220/0x85c
|Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
|Control: 10c5387d Table: 80004019 DAC: 00000015
|[<c03051d4>] (mem32_serial_in) from [<c0307fe8>] (serial8250_do_startup+0x220/0x85c)
|[<c0307fe8>] (serial8250_do_startup) from [<c0309e00>] (omap_8250_startup+0x5c/0xe0)
|[<c0309e00>] (omap_8250_startup) from [<c030863c>] (serial8250_startup+0x18/0x2c)
|[<c030863c>] (serial8250_startup) from [<c030394c>] (uart_startup+0x78/0x1d8)
|[<c030394c>] (uart_startup) from [<c0304678>] (uart_open+0xe8/0x114)
|[<c0304678>] (uart_open) from [<c02e9e10>] (tty_open+0x1a8/0x5a4)
Reviewed-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
While comparing the OMAP-serial and the 8250 part of this I noticed that
the latter does not use run time-pm. Here are the pieces. It is
basically a get before first register access and a last_busy + put after
last access. This has to be enabled from userland _and_ UART_CAP_RPM is
required for this.
The runtime PM can usually work transparently in the background however
there is one exception to this: After serial8250_tx_chars() completes
there still may be unsent bytes in the FIFO (depending on CPU speed vs
baud rate + flow control). Even if the TTY-buffer is empty we do not
want RPM to disable the device because it won't send the remaining
bytes. Instead we leave serial8250_tx_chars() with RPM enabled and wait
for the FIFO empty interrupt. Once we enter serial8250_tx_chars() with
an empty buffer we know that the FIFO is empty and since we are not going
to send anything, we can disable the device.
That xchg() is to ensure that serial8250_tx_chars() can be called
multiple times and only the first invocation will actually invoke the
runtime PM function. So that the last invocation of __stop_tx() will
disable runtime pm.
NOTE: do not enable RPM on the device unless you know what you do! If
the device goes idle, it won't be woken up by incomming RX data _unless_
there is a wakeup irq configured which is usually the RX pin configure
for wakeup via the reset module. The RX activity will then wake up the
device from idle. However the first character is garbage and lost. The
following bytes will be received once the device is up in time. On the
beagle board xm (omap3) it takes approx 13ms from the first wakeup byte
until the first byte that is received properly if the device was in
core-off.
v5…v8:
- drop RPM from serial8250_set_mctrl() it will be used in
restore path which already has RPM active and holds
dev->power.lock
v4…v5:
- add a wrapper around rpm function and introduce UART_CAP_RPM
to ensure RPM put is invoked after the TX FIFO is empty.
v3…v4:
- added runtime to the console code
- removed device_may_wakeup() from serial8250_set_sleep()
Cc: mika.westerberg@linux.intel.com
Reviewed-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The OMAP UART provides support for HW assisted flow control. What is
missing is the support to throttle / unthrottle callbacks which are used
by the omap-serial driver at the moment.
This patch adds the callbacks. It should be safe to add them since they
are only invoked from the serial_core (uart_throttle()) if the feature
flags are set.
Reviewed-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"A CONFIG_STACK_GROWSUP=y fix, and a hotplug llc CPU mask fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix unreleased llc_shared_mask bit during CPU hotplug
sched: Fix end_of_stack() and location of stack canary for architectures using CONFIG_STACK_GROWSUP
|
|
Merge fixes from Andrew Morton:
"9 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: softdirty: keep bit when zapping file pte
fs/cachefiles: add missing \n to kerror conversions
genalloc: fix device node resource counter
drivers/rtc/rtc-efi.c: add missing module alias
mm, slab: initialize object alignment on cache creation
mm: softdirty: addresses before VMAs in PTE holes aren't softdirty
ocfs2/dlm: do not get resource spinlock if lockres is new
nilfs2: fix data loss with mmap()
ocfs2: free vol_label in ocfs2_delete_osb()
|
|
This fixes the same bug as b43790eedd31 ("mm: softdirty: don't forget to
save file map softdiry bit on unmap") and 9aed8614af5a ("mm/memory.c:
don't forget to set softdirty on file mapped fault") where the return
value of pte_*mksoft_dirty was being ignored.
To be sure that no other pte/pmd "mk" function return values were being
ignored, I annotated the functions in arch/x86/include/asm/pgtable.h
with __must_check and rebuilt.
The userspace effect of this bug is that the softdirty mark might be
lost if a file mapped pte get zapped.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 0227d6abb378 ("fs/cachefiles: replace kerror by pr_err") didn't
include newline featuring in original kerror definition
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reported-by: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org> [3.16.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Decrement the np_pool device_node refcount, which was incremented on
the preceding of_parse_phandle() call.
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Without proper alias kernel module is not loaded for rtc-efi driver.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Cc: dann frazier <dannf@dannf.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since commit 4590685546a3 ("mm/sl[aou]b: Common alignment code"), the
"ralign" automatic variable in __kmem_cache_create() may be used as
uninitialized.
The proper alignment defaults to BYTES_PER_WORD and can be overridden by
SLAB_RED_ZONE or the alignment specified by the caller.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Andrei Elovikov <a.elovikov@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In PTE holes that contain VM_SOFTDIRTY VMAs, unmapped addresses before
VM_SOFTDIRTY VMAs are reported as softdirty by /proc/pid/pagemap. This
bug was introduced in commit 68b5a6524856 ("mm: softdirty: respect
VM_SOFTDIRTY in PTE holes"). That commit made /proc/pid/pagemap look at
VM_SOFTDIRTY in PTE holes but neglected to observe the start of VMAs
returned by find_vma.
Tested:
Wrote a selftest that creates a PMD-sized VMA then unmaps the first
page and asserts that the page is not softdirty. I'm going to send the
pagemap selftest in a later commit.
Signed-off-by: Peter Feiner <pfeiner@google.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Jamie Liu <jamieliu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a deadlock case which reported by Guozhonghua:
https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html
This case is caused by &res->spinlock and &dlm->master_lock
misordering in different threads.
It was introduced by commit 8d400b81cc83 ("ocfs2/dlm: Clean up refmap
helpers"). Since lockres is new, it doesn't not require the
&res->spinlock. So remove it.
Fixes: 8d400b81cc83 ("ocfs2/dlm: Clean up refmap helpers")
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Guozhonghua <guozhonghua@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system. It is easily
reproducible with the following script:
----------------[BEGIN SCRIPT]--------------------
mkfs.nilfs2 -f /dev/sdb
mount /dev/sdb /mnt
dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
/root/mmaptest/mmaptest /mnt/testfile 30 10 5
sync
CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
umount /mnt
mount /dev/sdb /mnt
CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
umount /mnt
echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
echo "AFTER MMAP:\t$CHECKSUM_AFTER"
echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
----------------[END SCRIPT]--------------------
The mmaptest tool looks something like this (very simplified, with
error checking removed):
----------------[BEGIN mmaptest]--------------------
data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, file_offset);
for (i = 0; i < write_count; ++i) {
memcpy(data + i * 4096, buf, sizeof(buf));
msync(data, file_size - file_offset, MS_SYNC))
}
----------------[END mmaptest]--------------------
The output of the script looks something like this:
BEFORE MMAP: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
AFTER MMAP: 6604a1c31f10780331a6850371b3a313 /mnt/testfile
AFTER REMOUNT: 281ed1d5ae50e8419f9b978aab16de83 /mnt/testfile
So it is clear, that the changes done using mmap() do not survive a
remount. This can be reproduced a 100% of the time. The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").
If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it. In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.
This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.
[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
osb->vol_label is malloced in ocfs2_initialize_super but not freed if
error occurs or during umount, thus causing a memory leak.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The warning was introduced in 2009 (commit 4bf1fa5a34aa ([ARM] 5613/1:
implement CALLER_ADDRESSx)). The only "problem" here is that
CALLER_ADDRESSx for x > 1 returns NULL which doesn't do much harm.
The drawback of implementing a fix (i.e. use unwind tables to implement CALLER_ADDRESSx) is that much of the unwinder code would need to be marked as not
traceable.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Syntactically FOOTBRIDGE and ARCH_FOOTBRIDGE are identical (the former
is defined in an if ARCH_FOOTBRIDGE block and the latter selects the
former).
Sematically FOOTBRIDGE means "we have a DC21285 (aka footbridge) device
in the system" and ARCH_FOOTBRIDGE is the support for boards with a
footbridge device, so ARCH_FOOTBRIDGE is the better symbol here.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|