Age | Commit message (Collapse) | Author |
|
This patch introduces a new syntax to perf event parser:
# perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
By utilizing the basic facilities in bpf-loader.c which allow setting
different slots in a BPF map separately, the newly introduced syntax
allows perf to control specific elements in a BPF map.
Test result:
# cat ./test_bpf_map_3.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(unsigned char),
.max_entries = 100,
};
SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
int func(void *ctx, int err, long nsec)
{
char fmt[] = "%ld\n";
long usec = nsec * 0x10624dd3 >> 38; // nsec / 1000
int key = (int)usec;
unsigned char *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), (unsigned char)*pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
Normal case:
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 15
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
usleep-655 [006] d... 2745434.122814: : 102
usleep-904 [006] d... 2745439.916264: : 103
# ./perf record -e './test_bpf_map_3.c/map:channel.value[all]=104/' usleep 99
# cat /sys/kernel/debug/tracing/trace | grep usleep
usleep-405 [004] d... 2745423.547822: : 101
usleep-655 [006] d... 2745434.122814: : 102
usleep-904 [006] d... 2745439.916264: : 103
usleep-1537 [003] d... 2745538.053737: : 104
Error case:
# ./perf record -e './test_bpf_map_3.c/map:channel.value[10...1000]=104/' usleep 99
event syntax error: '..annel.value[10...1000]=104/'
\___ Index too large
Hint: Valid config terms:
map:[<arraymap>].value<indices>=[value]
map:[<eventmap>].event<indices>=[event]
where <indices> is something like [0,3...5] or [all]
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-9-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This patch introduces basic facilities to support config different slots
in a BPF map one by one.
array.nr_ranges and array.ranges are introduced into 'struct
parse_events_term', where ranges is an array of indices range (start,
length) which will be configured by this config term. nr_ranges is the
size of the array. The array is passed to 'struct bpf_map_priv'. To
indicate the new type of configuration, BPF_MAP_KEY_RANGES is added as a
new key type. bpf_map_config_foreach_key() is extended to iterate over
those indices instead of all possible keys.
Code in this commit will be enabled by following commit which enables
the indices syntax for array configuration.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Spotted-by: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93441
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455719489-3008-1-git-send-email-imre.deak@intel.com
(cherry picked from commit 4d800030238878c1a98d1d3a37a3d673eea661ce)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-13-git-send-email-imre.deak@intel.com
(cherry picked from commit ecb2448218acf23c401434c26be256147833b221)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-12-git-send-email-imre.deak@intel.com
(cherry picked from commit 5b0921748c0b1d0362bbfa802dc25a5c23de7e76)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-11-git-send-email-imre.deak@intel.com
(cherry picked from commit 3f3f42b887fbffc3353e44ef9f32456c19ae4280)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-10-git-send-email-imre.deak@intel.com
(cherry picked from commit 6fa9a5ecf7a54450b255229ac1fc6df276cf0653)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
While at it also add the missing reference around the HW access in
i915_interrupt_info().
v2:
- update the commit message mentioning that this also fixes the
HW access in the interrupt info debugfs entry (Daniel)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-9-git-send-email-imre.deak@intel.com
(cherry picked from commit e129649b7a3e1d50d196e159492496777769437e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-8-git-send-email-imre.deak@intel.com
(cherry picked from commit e27daab49718e3232318d8b539cb302521b4b724)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93439
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-7-git-send-email-imre.deak@intel.com
(cherry picked from commit 1c8fdda1ea947ae8cf994969a1c285acc7089cb9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-6-git-send-email-imre.deak@intel.com
(cherry picked from commit 4feed0ebfa45879bc422c9a0bfa3cffec82ea60a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-5-git-send-email-imre.deak@intel.com
(cherry picked from commit 6392f8478e6f119467b1ad06e30e1f078e62efc1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-4-git-send-email-imre.deak@intel.com
(cherry picked from commit 12fda3876d08519bdf6f0acc70dd35754b422ed5)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The assumption when adding the intel_display_power_is_enabled() checks
was that if it returns success the power can't be turned off afterwards
during the HW access, which is guaranteed by modeset locks. This isn't
always true, so make sure we hold a dedicated reference for the time of
the access.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Revieved-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455296121-4742-3-git-send-email-imre.deak@intel.com
(cherry picked from commit 1729050eb4bbc192e54069e82069f2811313c1dd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
We have many places in the code where we check if a given display power
domain is enabled and if so access registers backed by this power
domain. We assumed that some modeset lock will prevent the power
reference from vanishing in the middle of the HW access, but this
assumption doesn't always hold. In such cases we get either the wakeref
not held, or an unclaimed register access error message. To fix this in
a future-proof way that's independent of other locks wrap any such
access with a get_ref_if_enabled()/put_ref() pair.
Kudos to Ville and Joonas for the ideas of this new interface.
v2:
- init the power_domains ptr when declaring it everywhere (Joonas)
v3:
- don't report the device to be powered if runtime PM is disabled
CC: Mika Kuoppala <mika.kuoppala@intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455711462-7442-1-git-send-email-imre.deak@intel.com
(cherry picked from commit 09731280028ce03e6a27e1998137f1775a2839f3)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
A new syntax is added to the parser so that the user can access
predefined perf events in BPF objects.
After this patch, BPF programs for perf are finally able to utilize
bpf_perf_event_read() introduced in commit 35578d798400 ("bpf: Implement
function bpf_perf_event_read() that get the selected hardware PMU
counter").
Test result:
# cat test_bpf_map_2.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
static int (*get_smp_processor_id)(void) =
(void *)BPF_FUNC_get_smp_processor_id;
static int (*perf_event_read)(struct bpf_map_def *, int) =
(void *)BPF_FUNC_perf_event_read;
struct bpf_map_def SEC("maps") pmu_map = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = __NR_CPUS__,
};
SEC("func_write=sys_write")
int func_write(void *ctx)
{
unsigned long long val;
char fmt[] = "sys_write: pmu=%llu\n";
val = perf_event_read(&pmu_map, get_smp_processor_id());
trace_printk(fmt, sizeof(fmt), val);
return 0;
}
SEC("func_write_return=sys_write%return")
int func_write_return(void *ctx)
{
unsigned long long val = 0;
char fmt[] = "sys_write_return: pmu=%llu\n";
val = perf_event_read(&pmu_map, get_smp_processor_id());
trace_printk(fmt, sizeof(fmt), val);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
Normal case:
# echo "" > /sys/kernel/debug/tracing/trace
# perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
[SNIP]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (7 samples) ]
# cat /sys/kernel/debug/tracing/trace | grep ls
ls-17066 [000] d... 938449.863301: : sys_write: pmu=1157327
ls-17066 [000] dN.. 938449.863342: : sys_write_return: pmu=1225218
ls-17066 [000] d... 938449.863349: : sys_write: pmu=1241922
ls-17066 [000] dN.. 938449.863369: : sys_write_return: pmu=1267445
Normal case (system wide):
# echo "" > /sys/kernel/debug/tracing/trace
# perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.811 MB perf.data (120 samples) ]
# cat /sys/kernel/debug/tracing/trace | grep -v '18446744073709551594' | grep -v perf | head -n 20
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
gmain-30828 [002] d... 2740551.068992: : sys_write: pmu=84373
gmain-30828 [002] d... 2740551.068992: : sys_write_return: pmu=87696
gmain-30828 [002] d... 2740551.068996: : sys_write: pmu=100658
gmain-30828 [002] d... 2740551.068997: : sys_write_return: pmu=102572
Error case 1:
# perf record -e './test_bpf_map_2.c' ls /
[SNIP]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace | grep ls
ls-17115 [007] d... 2724279.665625: : sys_write: pmu=18446744073709551614
ls-17115 [007] dN.. 2724279.665651: : sys_write_return: pmu=18446744073709551614
ls-17115 [007] d... 2724279.665658: : sys_write: pmu=18446744073709551614
ls-17115 [007] dN.. 2724279.665677: : sys_write_return: pmu=18446744073709551614
(18446744073709551614 is 0xfffffffffffffffe (-2))
Error case 2:
# perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=evt/' -a
event syntax error: '..ps:pmu_map.event=evt/'
\___ Event not found for map setting
Hint: Valid config terms:
map:[<arraymap>].value=[value]
map:[<eventmap>].event=[event]
[SNIP]
Error case 3:
# ls /proc/2348/task/
2348 2505 2506 2507 2508
# perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -p 2348
ERROR: Apply config to BPF failed: Cannot set event to BPF map in multi-thread tracing
Error case 4:
# perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
ERROR: Apply config to BPF failed: Doesn't support inherit event (Hint: use -i to turn off inherit)
Error case 5:
# perf record -i -e raw_syscalls:sys_enter -e './test_bpf_map_2.c/map:pmu_map.event=raw_syscalls:sys_enter/' ls
ERROR: Apply config to BPF failed: Can only put raw, hardware and BPF output event into a BPF map
Error case 6:
# perf record -i -e './test_bpf_map_2.c/map:pmu_map.event=123/' ls /
event syntax error: '.._map.event=123/'
\___ Incorrect value type for map
[SNIP]
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-7-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
bpf__apply_obj_config() is introduced as the core API to apply object
config options to all BPF objects. This patch also does the real work
for setting values for BPF_MAP_TYPE_PERF_ARRAY maps by inserting value
stored in map's private field into the BPF map.
This patch is required because we are not always able to set all BPF
config during parsing. Further patch will set events created by perf to
BPF_MAP_TYPE_PERF_EVENT_ARRAY maps, which is not exist until
perf_evsel__open().
bpf_map_foreach_key() is introduced to iterate over each key needs to be
configured. This function would be extended to support more map types
and different key settings.
In perf record, before start recording, call bpf__apply_config() to turn
on all BPF config options.
Test result:
# cat ./test_bpf_map_1.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
SEC("func=sys_nanosleep")
int func(void *ctx)
{
int key = 0;
char fmt[] = "%d\n";
int *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), *pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
# echo "" > /sys/kernel/debug/tracing/trace
# ./perf record -e './test_bpf_map_1.c/map:channel.value=11/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
usleep-18593 [007] d... 2394714.395539: : 11
# ./perf record -e './test_bpf_map_1.c/map:channel.value=101/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1 #P:8
[SNIP]
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
usleep-18593 [007] d... 2394714.395539: : 11
usleep-19000 [006] d... 2394831.057840: : 101
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-6-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This patch adds the final step for BPF map configuration. A new syntax
is appended into parser so user can config BPF objects through '/' '/'
enclosed config terms.
After this patch, following syntax is available:
# perf record -e ./test_bpf_map_1.c/map:channel.value=10/ ...
It would takes effect after appling following commits.
Test result:
# cat ./test_bpf_map_1.c
/************************ BEGIN **************************/
#include <uapi/linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
(void *)BPF_FUNC_trace_printk;
struct bpf_map_def SEC("maps") channel = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(int),
.max_entries = 1,
};
SEC("func=sys_nanosleep")
int func(void *ctx)
{
int key = 0;
char fmt[] = "%d\n";
int *pval = map_lookup_elem(&channel, &key);
if (!pval)
return 0;
trace_printk(fmt, sizeof(fmt), *pval);
return 0;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
/************************* END ***************************/
- Normal case:
# ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB perf.data ]
- Error case:
# ./perf record -e './test_bpf_map_1.c/map:channel.value/' usleep 10
event syntax error: '..ps:channel:value/'
\___ Config value not set (missing '=')
Hint: Valid config term:
map:[<arraymap>]:value=[value]
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
# ./perf record -e './test_bpf_map_1.c/xmap:channel.value=10/' usleep 10
event syntax error: '..pf_map_1.c/xmap:channel.value=10/'
\___ Invalid object config option
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:xchannel.value=10/' usleep 10
event syntax error: '..p_1.c/map:xchannel.value=10/'
\___ Target map not exist
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:channel.xvalue=10/' usleep 10
event syntax error: '..ps:channel.xvalue=10/'
\___ Invalid object map config option
[SNIP]
# ./perf record -e './test_bpf_map_1.c/map:channel.value=x10/' usleep 10
event syntax error: '..nnel.value=x10/'
\___ Incorrect value type for map
[SNIP]
Change BPF_MAP_TYPE_ARRAY to '1' in test_bpf_map_1.c:
# ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
event syntax error: '..ps:channel.value=10/'
\___ Can't use this config term to this type of map
Hint: Valid config term:
map:[<arraymap>].value=[value]
(add -v to see detail)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
[for parser part]
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-5-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
bpf__config_obj() is introduced as a core API to config BPF object after
loading. One configuration option of maps is introduced. After this
patch BPF object can accept assignments like:
map:my_map.value=1234
(map.my_map.value looks pretty. However, there's a small but hard to fix
problem related to flex's greedy matching. Please see [1]. Choose ':'
to avoid it in a simpler way.)
This patch is more complex than the work it does because the
consideration of extension. In designing BPF map configuration, the
following things should be considered:
1. Array indices selection: perf should allow user setting different
value for different slots in an array, with syntax like:
map:my_map.value[0,3...6]=1234;
2. A map should be set by different config terms, each for a part
of it. For example, set each slot to the pid of a thread;
3. Type of value: integer is not the only valid value type. A perf
counter can also be put into a map after commit 35578d798400
("bpf: Implement function bpf_perf_event_read() that get the
selected hardware PMU counter")
4. For a hash table, it should be possible to use a string or other
value as a key;
5. It is possible that map configuration is unable to be setup
during parsing. A perf counter is an example.
Therefore, this patch does the following:
1. Instead of updating map element during parsing, this patch stores
map config options in 'struct bpf_map_priv'. Following patches
will apply those configs at an appropriate time;
2. Link map operations in a list so a map can have multiple config
terms attached, so different parts can be configured separately;
3. Make 'struct bpf_map_priv' extensible so that the following patches
can add new types of keys and operations;
4. Use bpf_obj_config__map_funcs array to support more map config options.
Since the patch changing the event parser to parse BPF object config is
relative large, I've put it in another commit. Code in this patch can be
tested after applying the next patch.
[1] http://lkml.kernel.org/g/564ED621.4050500@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Cody P Schafer <dev@codyps.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill Smelkov <kirr@nexedi.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1456132275-98875-4-git-send-email-wangnan0@huawei.com
Signed-off-by: He Kuang <hekuang@huawei.com>
[ Changes "maps:my_map.value" to "map:my_map.value", improved error messages ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The dynamic entry is created for each field in a tracepoint event.
Since they have no fixed hpp format index, it should skip when
perf_hpp__reset_width() is called.
This caused following assertion failure..
$ perf record -e sched:sched_switch -a sleep 1
$ perf report -s comm,next_pid --stdio
perf: ui/hist.c:651: perf_hpp__reset_width:
Assertion `!(fmt->idx >= PERF_HPP__MAX_INDEX)' failed.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1456064558-13086-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It missed to update column length of the 'trace' sort key in the
hists__calc_col_len() so it might truncate the output. It calculated
the column length in the ->cmp() callback originally but it doesn't
guarantee it's called always.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1456064558-13086-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The srcline, srcfile and trace sort keys can have long entries. With
commit 89fee7094323 ("perf hists: Do column alignment on the format
iterator"), it now aligns output with hist_entry__snprintf_alignment().
So each (possibly long) sort entries don't need to do it themselves.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1456101153-14519-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Normally the hist entry's srcline and/or srcfile is set during sorting.
However sometime it's possible to a hist entry's srcline is not set yet
after the sorting. This is because the entry is so unique and other
sort keys already make it distinct. Then the srcline/file sort didn't
have a chance to be called during the sorting. In that case it has NULL
srcline/srcfile field and shows nothing.
Before:
$ perf report -s comm,sym,srcline
...
Overhead Command Symbol
-----------------------------------------------------------------
34.42% swapper [k] intel_idle intel_idle.c:0
2.44% perf [.] __poll_nocancel (null)
1.70% gnome-shell [k] fw_domains_get (null)
1.04% Xorg [k] sock_poll (null)
After:
34.42% swapper [k] intel_idle intel_idle.c:0
2.44% perf [.] __poll_nocancel .:0
1.70% gnome-shell [k] fw_domains_get fw_domains_get+42
1.04% Xorg [k] sock_poll socket.c:0
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1456101111-14400-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
A dynamic entry is created for each tracepoint event. When it sets up
the sort key, it checks with existing keys using ->equal() callback.
But it missed to set the ->equal for dynamic entries. The following
segfault was due to the missing ->equal() callback.
(gdb) bt
#0 0x0000000000140003 in ?? ()
#1 0x0000000000537769 in fmt_equal (b=0x2106980, a=0x21067a0) at ui/hist.c:548
#2 perf_hpp__setup_output_field (list=0x8c6d80 <perf_hpp_list>) at ui/hist.c:560
#3 0x00000000004e927e in setup_sorting (evlist=<optimized out>) at util/sort.c:2642
#4 0x000000000043cf50 in cmd_report (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>)
at builtin-report.c:932
#5 0x00000000004865a1 in run_builtin (p=p@entry=0x8bbce0 <commands+192>, argc=argc@entry=7,
argv=argv@entry=0x7ffd24d56ce0) at perf.c:390
#6 0x000000000042dc1f in handle_internal_command (argv=0x7ffd24d56ce0, argc=7) at perf.c:451
#7 run_argv (argv=0x7ffd24d56a70, argcp=0x7ffd24d56a7c) at perf.c:495
#8 main (argc=7, argv=0x7ffd24d56ce0) at perf.c:620
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1456064558-13086-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Older compilers don't like this, for instance, on RHEL6.7:
CC /tmp/build/perf/util/parse-events.o
util/parse-events.c:844: error: redefinition of typedef ‘config_term_func_t’
util/parse-events.c:353: note: previous declaration of ‘config_term_func_t’ was here
So remove the second definition, that should've been just moved in 43d0b97817a4
("perf tools: Enable config and setting names for legacy cache events"), not copied.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 43d0b97817a4 ("perf tools: Enable config and setting names for legacy cache events")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In RHEL 6.7:
CC /tmp/build/perf/util/parse-events.o
cc1: warnings being treated as errors
util/parse-events.c: In function ‘parse_events_add_cache’:
util/parse-events.c:366: error: declaration of ‘error’ shadows a global declaration
util/util.h:136: error: shadowed declaration is here
Rename it to 'err'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 43d0b97817a4 ("perf tools: Enable config and setting names for legacy cache events")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
ARMv6 CPUs do not have virtualisation extensions, but hyp-stub.S is
still included into the image to keep it generic. In order to use ARMv7
instructions during HYP initialisation, add -march=armv7-a flag to
hyp-stub's build.
On an ARMv6 CPU, __hyp_stub_install returns as soon as it detects that
the mode isn't HYP, so we will never reach those instructions.
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
git commit 904818e2f229f3d94ec95f6932a6358c81e73d78
"s390/kernel: introduce fpu-internal.h with fpu helper functions"
introduced the fpregs_store / fp_regs_load helper. These function
fail to save and restore the floating pointer control registers.
The effect is that the FPC is not correctly handled on signal
delivery and signal return.
Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
git commit 8070361799ae1e3f4ef347bd10f0a508ac10acfb
"s390: add support for vector extension"
broke 31-bit compat processes in regard to signal handling.
The restore_sigregs_ext32() function is used to restore the additional
elements from the user space signal frame. Among the additional elements
are the upper registers halves for 64-bit register support for 31-bit
processes. The copy_from_user that is used to retrieve the high-gprs
array from the user stack uses an incorrect length, 8 bytes instead of
64 bytes. This causes incorrect upper register halves to get loaded.
Cc: stable@vger.kernel.org # 3.8+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
We can get a hash pte fault with 4k base page size and find the pte
already inserted with 64K base page size. In that case we need to clear
the existing slot information from the old pte. Fix this correctly
With THP, we also clear the slot information with respect to all
the 64K hash pte mapping that 16MB page. They are all invalid
now. This make sure we don't find the slot valid when we fault with
4k base page size. Finding the slot valid should not result in any wrong
behavior because we do check again in hash page table for the validity.
But we can avoid that check completely.
Fixes: a43c0eb8364c022 ("powerpc/mm: Convert 4k hash insert to C")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
During error recovery, the device could be removed as part of the
partial hotplug. The criterion used to come with partial hotplug
is: if the device driver provides error_detected(), slot_reset()
and resume() callbacks, it's immune from hotplug. Otherwise,
it's going to experience partial hotplug during EEH recovery. But
the criterion isn't correct enough: mlx4_core driver for Mellanox
adapters provides error_detected(), slot_reset() callbacks, but
resume() isn't there. Those Mellanox adapters won't be to involved
in the partial hotplug.
This fixes the criterion to a practical one: adpater with driver
that provides error_detected(), slot_reset() will be immune from
partial hotplug. resume() isn't mandatory.
Fixes: f2da4ccf ("powerpc/eeh: More relaxed hotplug criterion")
Cc: stable@vger.kernel.org #v4.4+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Although the ARM vDSO is cleanly separated by code/data with the code
being read-only in userspace mappings, the code page is still writable
from the kernel.
There have been exploits (such as http://itszn.com/blog/?p=21) that
take advantage of this on x86 to go from a bad kernel write to full
root.
Prevent this specific exploit class on ARM as well by putting the vDSO
code page in post-init read-only memory as well.
Before:
vdso: 1 text pages at base 80927000
root@Vexpress:/ cat /sys/kernel/debug/kernel_page_tables
---[ Modules ]---
---[ Kernel Mapping ]---
0x80000000-0x80100000 1M RW NX SHD
0x80100000-0x80600000 5M ro x SHD
0x80600000-0x80800000 2M ro NX SHD
0x80800000-0xbe000000 984M RW NX SHD
After:
vdso: 1 text pages at base 8072b000
root@Vexpress:/ cat /sys/kernel/debug/kernel_page_tables
---[ Modules ]---
---[ Kernel Mapping ]---
0x80000000-0x80100000 1M RW NX SHD
0x80100000-0x80600000 5M ro x SHD
0x80600000-0x80800000 2M ro NX SHD
0x80800000-0xbe000000 984M RW NX SHD
Inspired by https://lkml.org/lkml/2016/1/19/494 based on work by the
PaX Team, Brad Spengler, and Kees Cook.
Signed-off-by: David Brown <david.brown@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Lynch <nathan_lynch@mentor.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1455748879-21872-8-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The vDSO does not need to be writable after __init, so mark it as
__ro_after_init. The result kills the exploit method of writing to the
vDSO from kernel space resulting in userspace executing the modified code,
as shown here to bypass SMEP restrictions: http://itszn.com/blog/?p=21
The memory map (with added vDSO address reporting) shows the vDSO moving
into read-only memory:
Before:
[ 0.143067] vDSO @ ffffffff82004000
[ 0.143551] vDSO @ ffffffff82006000
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81800000 8M ro PSE GLB x pmd
0xffffffff81800000-0xffffffff819f3000 1996K ro GLB x pte
0xffffffff819f3000-0xffffffff81a00000 52K ro NX pte
0xffffffff81a00000-0xffffffff81e00000 4M ro PSE GLB NX pmd
0xffffffff81e00000-0xffffffff81e05000 20K ro GLB NX pte
0xffffffff81e05000-0xffffffff82000000 2028K ro NX pte
0xffffffff82000000-0xffffffff8214f000 1340K RW GLB NX pte
0xffffffff8214f000-0xffffffff82281000 1224K RW NX pte
0xffffffff82281000-0xffffffff82400000 1532K RW GLB NX pte
0xffffffff82400000-0xffffffff83200000 14M RW PSE GLB NX pmd
0xffffffff83200000-0xffffffffc0000000 974M pmd
After:
[ 0.145062] vDSO @ ffffffff81da1000
[ 0.146057] vDSO @ ffffffff81da4000
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81800000 8M ro PSE GLB x pmd
0xffffffff81800000-0xffffffff819f3000 1996K ro GLB x pte
0xffffffff819f3000-0xffffffff81a00000 52K ro NX pte
0xffffffff81a00000-0xffffffff81e00000 4M ro PSE GLB NX pmd
0xffffffff81e00000-0xffffffff81e0b000 44K ro GLB NX pte
0xffffffff81e0b000-0xffffffff82000000 2004K ro NX pte
0xffffffff82000000-0xffffffff8214c000 1328K RW GLB NX pte
0xffffffff8214c000-0xffffffff8227e000 1224K RW NX pte
0xffffffff8227e000-0xffffffff82400000 1544K RW GLB NX pte
0xffffffff82400000-0xffffffff83200000 14M RW PSE GLB NX pmd
0xffffffff83200000-0xffffffffc0000000 974M pmd
Based on work by PaX Team and Brad Spengler.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-7-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The new __ro_after_init section should be writable before init, but
not after. Validate that it gets updated at init and can't be written
to afterwards.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
One of the easiest ways to protect the kernel from attack is to reduce
the internal attack surface exposed when a "write" flaw is available. By
making as much of the kernel read-only as possible, we reduce the
attack surface.
Many things are written to only during __init, and never changed
again. These cannot be made "const" since the compiler will do the wrong
thing (we do actually need to write to them). Instead, move these items
into a memory region that will be made read-only during mark_rodata_ro()
which happens after all kernel __init code has finished.
This introduces __ro_after_init as a way to mark such memory, and adds
some documentation about the existing __read_mostly marking.
This improves the security of the Linux kernel by marking formerly
read-write memory regions as read-only on a fully booted up system.
Based on work by PaX Team and Brad Spengler.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled.
This simplifies the code and also makes it clearer that read-only mapped
memory is just as fundamental a security feature in kernel-space as it is
in user-space.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
mappings
It may be useful to debug writes to the readonly sections of memory,
so provide a cmdline "rodata=off" to allow for this. This can be
expanded in the future to support "log" and "write" modes, but that
will need to be architecture-specific.
This also makes KDB software breakpoints more usable, as read-only
mappings can now be disabled on any kernel.
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Instead of defining mark_rodata_ro() in each architecture, consolidate it.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Gross <agross@codeaurora.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashok Kumar <ashoks@broadcom.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Brown <david.brown@linaro.org>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Link: http://lkml.kernel.org/r/1455748879-21872-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2016-02-21
this is a pull reqeust of one patch for net/master.
The patch is by Gerhard Uttenthaler and fixes a potential tx overflow in the
ems_usb driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yuval Mintz says:
====================
bnx2x: Fix 848xx phys
This series contains link-related fixes, mostly for the 848xx phys
[2 patches are for 84833, and 2 patches are for 84858].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current initialization sequence is lacking, causing some configurations
to fail.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The phy's firmware version isn't being parsed properly as it's
currently parsed like the rest of the 848xx phys.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's a problem in current 84833 phy configuration -
in case 1Gb link is configured and jumbo-sized packets are being
used, device will experience RX crc errors.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when link is using KR2 it cannot be forced to any speed other
than 20g.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.om>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2016-02-20
Here's an important patch for 4.5 which fixes potential invalid pointer
access when processing completed Bluetooth HCI commands.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The dm9000 driver doesn't work in at least one device-tree
configuration, spitting an error message on irq resource :
[ 1.062495] dm9000 8000000.ethernet: insufficient resources
[ 1.068439] dm9000 8000000.ethernet: not found (-2).
[ 1.073451] dm9000: probe of 8000000.ethernet failed with error -2
The reason behind is that the interrupt might be provided by a gpio
controller, not probed when dm9000 is probed, and needing the probe
deferral mechanism to apply.
Currently, the interrupt is directly taken from resources. This patch
changes this to use the more generic platform_get_irq(), which handles
the deferral.
Moreover, since commit Fixes: 7085a7401ba5 ("drivers: platform: parse
IRQ flags from resources"), the interrupt trigger flags are honored in
platform_get_irq(), so remove the needless code in dm9000.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Marcel Ziswiler <marcel@ziswiler.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Tested-by: Sergei Ianovich <ynvich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
address
fix incorrect indexing of dev->dev_addr[] when copying the MAC address
of FMV-J182 at buf[5].
Signed-off-by: Ken Kawasaki <ken_kawasaki@nifty.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Device emulation supports max size of 4096.
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Murali Karicheri says:
====================
net: ti: netcp: restore get/set_pad_info() functionality
This series fixes a regression and add some improvements for the ease
of maintainance. Incorporated comments against v1.
Changelogs:
v2 : combined 2-3 into one patch as this involves a header change
fixed a parse warning in 3/4 per comment from Arnd.
Removed Sign-off from Arnd against 1/4
added comments in 3/3 to alert on the usage of sw data per review
comments
v1 : added 2-4 to accomodate feedback received from review
v0 : initial version to fix the regression (From Grygorii)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|