Age | Commit message (Collapse) | Author |
|
This patch supports probing for the new BPF_MAP_TYPE_SK_STORAGE.
BPF_MAP_TYPE_SK_STORAGE enforces BTF usage, so the new probe
requires to create and load a BTF also.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch sync the bpf.h to tools/.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
After allowing a bpf prog to
- directly read the skb->sk ptr
- get the fullsock bpf_sock by "bpf_sk_fullsock()"
- get the bpf_tcp_sock by "bpf_tcp_sock()"
- get the listener sock by "bpf_get_listener_sock()"
- avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock"
into different bpf running context.
this patch is another effort to make bpf's network programming
more intuitive to do (together with memory and performance benefit).
When bpf prog needs to store data for a sk, the current practice is to
define a map with the usual 4-tuples (src/dst ip/port) as the key.
If multiple bpf progs require to store different sk data, multiple maps
have to be defined. Hence, wasting memory to store the duplicated
keys (i.e. 4 tuples here) in each of the bpf map.
[ The smallest key could be the sk pointer itself which requires
some enhancement in the verifier and it is a separate topic. ]
Also, the bpf prog needs to clean up the elem when sk is freed.
Otherwise, the bpf map will become full and un-usable quickly.
The sk-free tracking currently could be done during sk state
transition (e.g. BPF_SOCK_OPS_STATE_CB).
The size of the map needs to be predefined which then usually ended-up
with an over-provisioned map in production. Even the map was re-sizable,
while the sk naturally come and go away already, this potential re-size
operation is arguably redundant if the data can be directly connected
to the sk itself instead of proxy-ing through a bpf map.
This patch introduces sk->sk_bpf_storage to provide local storage space
at sk for bpf prog to use. The space will be allocated when the first bpf
prog has created data for this particular sk.
The design optimizes the bpf prog's lookup (and then optionally followed by
an inline update). bpf_spin_lock should be used if the inline update needs
to be protected.
BPF_MAP_TYPE_SK_STORAGE:
-----------------------
To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in
this patch) needs to be created. Multiple BPF_MAP_TYPE_SK_STORAGE maps can
be created to fit different bpf progs' needs. The map enforces
BTF to allow printing the sk-local-storage during a system-wise
sk dump (e.g. "ss -ta") in the future.
The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete
a "sk-local-storage" data from a particular sk.
Think of the map as a meta-data (or "type") of a "sk-local-storage". This
particular "type" of "sk-local-storage" data can then be stored in any sk.
The main purposes of this map are mostly:
1. Define the size of a "sk-local-storage" type.
2. Provide a similar syscall userspace API as the map (e.g. lookup/update,
map-id, map-btf...etc.)
3. Keep track of all sk's storages of this "type" and clean them up
when the map is freed.
sk->sk_bpf_storage:
------------------
The main lookup/update/delete is done on sk->sk_bpf_storage (which
is a "struct bpf_sk_storage"). When doing a lookup,
the "map" pointer is now used as the "key" to search on the
sk_storage->list. The "map" pointer is actually serving
as the "type" of the "sk-local-storage" that is being
requested.
To allow very fast lookup, it should be as fast as looking up an
array at a stable-offset. At the same time, it is not ideal to
set a hard limit on the number of sk-local-storage "type" that the
system can have. Hence, this patch takes a cache approach.
The last search result from sk_storage->list is cached in
sk_storage->cache[] which is a stable sized array. Each
"sk-local-storage" type has a stable offset to the cache[] array.
In the future, a map's flag could be introduced to do cache
opt-out/enforcement if it became necessary.
The cache size is 16 (i.e. 16 types of "sk-local-storage").
Programs can share map. On the program side, having a few bpf_progs
running in the networking hotpath is already a lot. The bpf_prog
should have already consolidated the existing sock-key-ed map usage
to minimize the map lookup penalty. 16 has enough runway to grow.
All sk-local-storage data will be removed from sk->sk_bpf_storage
during sk destruction.
bpf_sk_storage_get() and bpf_sk_storage_delete():
------------------------------------------------
Instead of using bpf_map_(lookup|update|delete)_elem(),
the bpf prog needs to use the new helper bpf_sk_storage_get() and
bpf_sk_storage_delete(). The verifier can then enforce the
ARG_PTR_TO_SOCKET argument. The bpf_sk_storage_get() also allows to
"create" new elem if one does not exist in the sk. It is done by
the new BPF_SK_STORAGE_GET_F_CREATE flag. An optional value can also be
provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE.
The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock. Together,
it has eliminated the potential use cases for an equivalent
bpf_map_update_elem() API (for bpf_prog) in this patch.
Misc notes:
----------
1. map_get_next_key is not supported. From the userspace syscall
perspective, the map has the socket fd as the key while the map
can be shared by pinned-file or map-id.
Since btf is enforced, the existing "ss" could be enhanced to pretty
print the local-storage.
Supporting a kernel defined btf with 4 tuples as the return key could
be explored later also.
2. The sk->sk_lock cannot be acquired. Atomic operations is used instead.
e.g. cmpxchg is done on the sk->sk_bpf_storage ptr.
Please refer to the source code comments for the details in
synchronization cases and considerations.
3. The mem is charged to the sk->sk_omem_alloc as the sk filter does.
Benchmark:
---------
Here is the benchmark data collected by turning on
the "kernel.bpf_stats_enabled" sysctl.
Two bpf progs are tested:
One bpf prog with the usual bpf hashmap (max_entries = 8192) with the
sk ptr as the key. (verifier is modified to support sk ptr as the key
That should have shortened the key lookup time.)
Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.
Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for
each egress skb and then bump the cnt. netperf is used to drive
data with 4096 connected UDP sockets.
BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run)
27: cgroup_skb name egress_sk_map tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633
loaded_at 2019-04-15T13:46:39-0700 uid 0
xlated 344B jited 258B memlock 4096B map_ids 16
btf_id 5
BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run)
30: cgroup_skb name egress_sk_stora tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739
loaded_at 2019-04-15T13:47:54-0700 uid 0
xlated 168B jited 156B memlock 4096B map_ids 17
btf_id 6
Here is a high-level picture on how are the objects organized:
sk
┌──────┐
│ │
│ │
│ │
│*sk_bpf_storage─────▶ bpf_sk_storage
└──────┘ ┌───────┐
┌───────────┤ list │
│ │ │
│ │ │
│ │ │
│ └───────┘
│
│ elem
│ ┌────────┐
├─▶│ snode │
│ ├────────┤
│ │ data │ bpf_map
│ ├────────┤ ┌─────────┐
│ │map_node│◀─┬─────┤ list │
│ └────────┘ │ │ │
│ │ │ │
│ elem │ │ │
│ ┌────────┐ │ └─────────┘
└─▶│ snode │ │
├────────┤ │
bpf_map │ data │ │
┌─────────┐ ├────────┤ │
│ list ├───────▶│map_node│ │
│ │ └────────┘ │
│ │ │
│ │ elem │
└─────────┘ ┌────────┐ │
┌─▶│ snode │ │
│ ├────────┤ │
│ │ data │ │
│ ├────────┤ │
│ │map_node│◀─┘
│ └────────┘
│
│
│ ┌───────┐
sk └──────────│ list │
┌──────┐ │ │
│ │ │ │
│ │ │ │
│ │ └───────┘
│*sk_bpf_storage───────▶bpf_sk_storage
└──────┘
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Matt Mullins says:
====================
This adds an opt-in interface for tracepoints to expose a writable context to
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that are attached, while
supporting read-only access from existing BPF_PROG_TYPE_RAW_TRACEPOINT
programs, as well as from non-BPF-based tracepoints.
The initial motivation is to support tracing that can be observed from the
remote end of an NBD socket, e.g. by adding flags to the struct nbd_request
header. Earlier attempts included adding an NBD-specific tracepoint fd, but in
code review, I was recommended to implement it more generically -- as a result,
this patchset is far simpler than my initial try.
v4->v5:
* rebased onto bpf-next/master and fixed merge conflicts
* "tools: sync bpf.h" also syncs comments that have previously changed
in bpf-next
v3->v4:
* fixed a silly copy/paste typo in include/trace/events/bpf_test_run.h
(_TRACE_NBD_H -> _TRACE_BPF_TEST_RUN_H)
* fixed incorrect/misleading wording in patch 1's commit message,
since the pointer cannot be directly dereferenced in a
BPF_PROG_TYPE_RAW_TRACEPOINT
* cleaned up the error message wording if the prog_tests fail
* Addressed feedback from Yonghong
* reject non-pointer-sized accesses to the buffer pointer
* use sizeof(struct nbd_request) as one-byte-past-the-end in
raw_tp_writable_reject_nbd_invalid.c
* use BPF_MOV64_IMM instead of BPF_LD_IMM64
v2->v3:
* Andrew addressed Josef's comments:
* C-style commenting in nbd.c
* Collapsed identical events into a single DECLARE_EVENT_CLASS.
This saves about 2kB of kernel text
v1->v2:
* add selftests
* sync tools/include/uapi/linux/bpf.h
* reject variable offset into the buffer
* add string representation of PTR_TO_TP_BUFFER to reg_type_str
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This tests that:
* a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
uses either:
* a variable offset to the tracepoint buffer, or
* an offset beyond the size of the tracepoint buffer
* a tracer can modify the buffer provided when attached to a writable
tracepoint in bpf_prog_test_run
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, and fixes up the
error: enumeration value ‘BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE’ not handled in switch [-Werror=switch-enum]
build errors it would otherwise cause in libbpf.
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This adds four tracepoints to nbd, enabling separate tracing of payload
and header sending/receipt.
In the send path for headers that have already been sent, we also
explicitly initialize the handle so it can be referenced by the later
tracepoint.
Signed-off-by: Andrew Hall <hall@fb.com>
Signed-off-by: Matt Mullins <mmullins@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This adds a tracepoint that can both observe the nbd request being sent
to the server, as well as modify that request , e.g., setting a flag in
the request that will cause the server to collect detailed tracing data.
The struct request * being handled is included to permit correlation
with the block tracepoints.
Signed-off-by: Matt Mullins <mmullins@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This is an opt-in interface that allows a tracepoint to provide a safe
buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
The size of the buffer must be a compile-time constant, and is checked
before allowing a BPF program to attach to a tracepoint that uses this
feature.
The pointer to this buffer will be the first argument of tracepoints
that opt in; the pointer is valid and can be bpf_probe_read() by both
BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
programs that attach to such a tracepoint, but the buffer to which it
points may only be written by the latter.
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016,
lets add support for STADD and use that in favor of LDXR / STXR loop for
the XADD mapping if available. STADD is encoded as an alias for LDADD with
XZR as the destination register, therefore add LDADD to the instruction
encoder along with STADD as special case and use it in the JIT for CPUs
that advertise LSE atomics in CPUID register. If immediate offset in the
BPF XADD insn is 0, then use dst register directly instead of temporary
one.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.
Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This way, slhc_free() accepts what slhc_init() returns, whether that is
an error or not.
In particular, the pattern in sl_alloc_bufs() is
slcomp = slhc_init(16, 16);
...
slhc_free(slcomp);
for the error handling path, and rather than complicate that code, just
make it ok to always free what was returned by the init function.
That's what the code used to do before commit 4ab42d78e37a ("ppp, slip:
Validate VJ compression slot parameters completely") when slhc_init()
just returned NULL for the error case, with no actual indication of the
details of the error.
Reported-by: syzbot+45474c076a4927533d2e@syzkaller.appspotmail.com
Fixes: 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely")
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Merge misc fixes from Andrew Morton:
"9 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
mm/page_alloc.c: fix never set ALLOC_NOFRAGMENT flag
mm/page_alloc.c: avoid potential NULL pointer dereference
mm, page_alloc: always use a captured page regardless of compaction result
mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
lib/test_vmalloc.c: do not create cpumask_t variable on stack
lib/Kconfig.debug: fix build error without CONFIG_BLOCK
zram: pass down the bvec we need to read into in the work struct
mm/memory_hotplug.c: drop memory device reference after find_memory_block()
|
|
Currently any changed config register values don't take effect, as the
function to write them back is called with the wrong register offset.
Fixes: ff8f83708b3e (Input: synaptics-rmi4 - add support for 2D
sensors and F11)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Various updates, notably:
* extended key ID support (from 802.11-2016)
* per-STA TX power control support
* mac80211 TX performance improvements
* HE (802.11ax) updates
* mesh link probing support
* enhancements of multi-BSSID support (also related to HE)
* OWE userspace processing support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- keep the tail of an unaligned initrd reserved
- adjust ftrace_make_call() to deal with the relative nature of PLTs
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/module: ftrace: deal with place relative nature of PLTs
arm64: mm: Ensure tail of unaligned initrd is reserved
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Three tracing fixes:
- Use "nosteal" for ring buffer splice pages
- Memory leak fix in error path of trace_pid_write()
- Fix preempt_enable_no_resched() (use preempt_enable()) in ring
buffer code"
* tag 'trace-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
trace: Fix preempt_enable_no_resched() abuse
tracing: Fix a memory leak by early error exit in trace_pid_write()
tracing: Fix buffer_ref pipe ops
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Not much to say about them, regular fixes:
- Fix a bug on the errorpath of gpiochip_add_data_with_key()
- IRQ type setting on the spreadtrum GPIO driver"
* tag 'gpio-v5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Fix gpiochip_add_data_with_key() error path
gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC
|
|
Pull drm fixes from Dave Airlie:
"Regular drm fixes, nothing too outstanding, I'm guessing Easter was
slowing people down.
i915:
- FEC enable fix
- BXT display lanes fix
ttm:
- fix reinit for reloading drivers regression
imx:
- DP CSC fix
sun4i:
- module unload/load fix
vc4:
- memory leak fix
- compile fix
dw-hdmi:
- rockchip scdc overflow fix
sched:
- docs fix
vmwgfx:
- dma api layering fix"
* tag 'drm-fixes-2019-04-26' of git://anongit.freedesktop.org/drm/drm:
drm/bridge: dw-hdmi: fix SCDC configuration for ddc-i2c-bus
drm/vmwgfx: Fix dma API layer violation
drm/vc4: Fix compilation error reported by kbuild test bot
drm/sun4i: Unbind components before releasing DRM and memory
drm/vc4: Fix memory leak during gpu reset.
drm/sched: Fix description of drm_sched_stop
drm/imx: don't skip DP channel disable for background plane
gpu: ipu-v3: dp: fix CSC handling
drm/ttm: fix re-init of global structures
drm/sun4i: Fix component unbinding and component master deletion
drm/sun4i: Set device driver data at bind time for use in unbind
drm/sun4i: Add missing drm_atomic_helper_shutdown at driver unbind
drm/i915: Restore correct bxt_ddi_phy_calc_lane_lat_optim_mask() calculation
drm/i915: Do not enable FEC without DSC
drm: bridge: dw-hdmi: Fix overflow workaround for Rockchip SoCs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One patch to fix a crash in io submission path, due to memory
allocation errors.
In short, the multipage bio work that landed in 5.1 caused larger bios
that in turn require larger temporary memory for checksums. The patch
is a workaround, we're going to rework the allocation so it does not
require the vmalloc fallback.
It took a while to identify that it's caused by patches in 5.1 and not
a patchset that did some changes in error handling in the code. I've
tested it on various memory/cpu combinations, it could hit OOM but
does not crash.
The timestamp of the patch is less than a day due to updates in the
changelog, tests were running meanwhile"
* tag 'for-5.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Switch memory allocations in async csum calculation path to kvmalloc
|
|
Pull cifs fixes from Steve French:
"Three small SMB3 fixes (all for stable as well): two leaks and a
rename bug"
* tag '5.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix page reference leak with readv/writev
cifs: do not attempt cifs operation on smb2+ rename error
cifs: fix memory leak in SMB2_read
|
|
Syzkaller report this:
sysctl could not get directory: /net//bridge -12
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
RSP: 0018:ffff8881bb507778 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a
RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568
RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4
R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
erase_entry fs/proc/proc_sysctl.c:178 [inline]
erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
start_unregistering fs/proc/proc_sysctl.c:331 [inline]
drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
get_subdir fs/proc/proc_sysctl.c:1022 [inline]
__register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
br_netfilter_init+0x68/0x1000 [br_netfilter]
do_one_initcall+0xbc/0x47d init/main.c:901
do_init_module+0x1b5/0x547 kernel/module.c:3456
load_module+0x6405/0x8c10 kernel/module.c:3804
__do_sys_finit_module+0x162/0x190 kernel/module.c:3898
do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace 68741688d5fbfe85 ]---
commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer
dereference in put_links") forgot to handle start_unregistering() case,
while header->parent is NULL, it calls erase_header() and as seen in the
above syzkaller call trace, accessing &header->parent->root will trigger
a NULL pointer dereference.
As that commit explained, there is also no need to call
start_unregistering() if header->parent is NULL.
Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 0a79cdad5eb2 ("mm: use alloc_flags to record if kswapd can wake")
removed setting of the ALLOC_NOFRAGMENT flag. Bring it back.
The runtime effect is that ALLOC_NOFRAGMENT behaviour is restored so
that allocations are spread across local zones to avoid fragmentation
due to mixing pageblocks as long as possible.
Link: http://lkml.kernel.org/r/20190423120806.3503-2-aryabinin@virtuozzo.com
Fixes: 0a79cdad5eb2 ("mm: use alloc_flags to record if kswapd can wake")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ac.preferred_zoneref->zone passed to alloc_flags_nofragment() can be NULL.
'zone' pointer unconditionally derefernced in alloc_flags_nofragment().
Bail out on NULL zone to avoid potential crash. Currently we don't see
any crashes only because alloc_flags_nofragment() has another bug which
allows compiler to optimize away all accesses to 'zone'.
Link: http://lkml.kernel.org/r/20190423120806.3503-1-aryabinin@virtuozzo.com
Fixes: 6bb154504f8b ("mm, page_alloc: spread allocations across zones before introducing fragmentation")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During the development of commit 5e1f0f098b46 ("mm, compaction: capture
a page under direct compaction"), a paranoid check was added to ensure
that if a captured page was available after compaction that it was
consistent with the final state of compaction. The intent was to catch
serious programming bugs such as using a stale page pointer and causing
corruption problems.
However, it is possible to get a captured page even if compaction was
unsuccessful if an interrupt triggered and happened to free pages in
interrupt context that got merged into a suitable high-order page. It's
highly unlikely but Li Wang did report the following warning on s390
occuring when testing OOM handling. Note that the warning is slightly
edited for clarity.
WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_direct_compact+0x182/0x190
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390
des_s390 des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc
ip_tables xfs libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod
qeth qdio lcs ctcm ccwgroup fsm dm_mirror dm_region_hash dm_log
dm_mod
CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1
This patch simply removes the check entirely instead of trying to be
clever about pages freed from interrupt context. If a serious
programming error was introduced, it is highly likely to be caught by
prep_new_page() instead.
Link: http://lkml.kernel.org/r/20190419085133.GH18914@techsingularity.net
Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Li Wang <liwang@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
model
Mikulas Patocka reported that commit 1c30844d2dfe ("mm: reclaim small
amounts of memory when an external fragmentation event occurs") "broke"
memory management on parisc.
The machine is not NUMA but the DISCONTIG model creates three pgdats
even though it's a UMA machine for the following ranges
0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB
2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB
Mikulas reported:
With the patch 1c30844d2, the kernel will incorrectly reclaim the
first zone when it fills up, ignoring the fact that there are two
completely free zones. Basiscally, it limits cache size to 1GiB.
For example, if I run:
# dd if=/dev/sda of=/dev/null bs=1M count=2048
- with the proper kernel, there should be "Buffers - 2GiB"
when this command finishes. With the patch 1c30844d2, buffers
will consume just 1GiB or slightly more, because the kernel was
incorrectly reclaiming them.
The page allocator and reclaim makes assumptions that pgdats really
represent NUMA nodes and zones represent ranges and makes decisions on
that basis. Watermark boosting for small pgdats leads to unexpected
results even though this would have behaved reasonably on SPARSEMEM.
DISCONTIG is essentially deprecated and even parisc plans to move to
SPARSEMEM so there is no need to be fancy, this patch simply disables
watermark boosting by default on DISCONTIGMEM.
Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On my "Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz" system(12 CPUs) i get the
warning from the compiler about frame size:
warning: the frame size of 1096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
the size of cpumask_t depends on number of CPUs, therefore just make use
of cpumask_of() in set_cpus_allowed_ptr() as a second argument.
Link: http://lkml.kernel.org/r/20190418193925.9361-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If CONFIG_TEST_KMOD is set to M, while CONFIG_BLOCK is not set, XFS and
BTRFS can not be compiled successly.
Link: http://lkml.kernel.org/r/20190410075434.35220-1-yuehaibing@huawei.com
Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When scheduling work item to read page we need to pass down the proper
bvec struct which points to the page to read into. Before this patch it
uses a randomly initialized bvec (only if PAGE_SIZE != 4096) which is
wrong.
Note that without this patch on arch/kernel where PAGE_SIZE != 4096
userspace could read random memory through a zram block device (thought
userspace probably would have no control on the address being read).
Link: http://lkml.kernel.org/r/20190408183219.26377-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Right now we are using find_memory_block() to get the node id for the
pfn range to online. We are missing to drop a reference to the memory
block device. While the device still gets unregistered via
device_unregister(), resulting in no user visible problem, the device is
never released via device_release(), resulting in a memory leak. Fix
that by properly using a put_device().
Link: http://lkml.kernel.org/r/20190411110955.1430-1-david@redhat.com
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pagupta@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Huazhong Tan says:
====================
code optimizations & bugfixes for HNS3 driver
This patch-set includes code optimizations and bugfixes for the HNS3
ethernet controller driver.
[patch 1/11 - 3/11] fixes some bugs about the IO path
[patch 4/11 - 6/11] includes some optimization and bugfixes
about mailbox handling
[patch 7/11 - 11/11] adds misc code optimizations and bugfixes.
Change log:
V2->V3: fixes comments from Neil Horman, removes [patch 8/12]
V1->V2: adds modification on [patch 8/12]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's meaningless to trigger reset when failed to send command to IMP,
because the failure is usually caused by no authority, illegal command
and so on. When that happened, we just need to return the status code
for further debugging.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a check for the hns3_put_ring_config() to prevent
double free, and for more readable, move the NULL assignment of
priv->ring_data into the hns3_put_ring_config().
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The test results show that the maximum time of hardware return
to mac link state is 500MS.The software needs to set twice the
maximum time of hardware return state (1000MS).
If not modified, the loopback test returns probability failure.
Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When configure pause, current implementation returns directly
after setup PFC without setup BP, which is not sufficient.
So this patch fixes it, only return while setting PFC failed.
Fixes: 44e59e375bf7 ("net: hns3: do not return GE PFC setting err when initializing")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the hardware does not handle mailboxes and the hardware
reset include TQP reset, so it is unnecessary to reset TQP
in the hclgevf_ae_stop() while doing VF reset. Also it is
unnecessary to reset the remaining TQP when one reset fails.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch uses a reserved byte in the hclge_mbx_vf_to_pf_cmd
to save the need_resp flag, so when PF received the mailbox,
it can use it to decise whether send a response to VF.
For hclge_set_vf_uc_mac_addr(), it should use mbx_need_resp flag
to decide whether send response to VF.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since irq handler and mailbox task will both update arq's count,
so arq's count should use atomic_t instead of u32, otherwise
its value may go wrong finally.
Fixes: 07a0556a3a73 ("net: hns3: Changes to support ARQ(Asynchronous Receive Queue)")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
HCLGEVF_STATE_CMD_DISABLE is more suitable than
HCLGEVF_STATE_RST_HANDLING to stop sending keep alive msg,
since HCLGEVF_STATE_RST_HANDLING only be set when the reset
task is running.
Fixes: c59a85c07e77 ("net: hns3: stop sending keep alive msg to PF when VF is resetting")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bdinfo handled in hns3_handle_bdinfo is only valid on the
last BD of the current packet, currently the bd info may be handled
based on the first BD if the packet has more than two BDs, which
may cause rx error.
This patch fixes it by using the last BD of the current packet in
hns3_handle_bdinfo.
Also, hns3_set_rx_skb_rss_type has used RSS hash value from the last
BD of the current packet, so remove the same last BD calculation in
hns3_set_rx_skb_rss_type and call it from hns3_handle_bdinfo.
Fixes: e55970950556 ("net: hns3: Add handling of GRO Pkts not fully RX'ed in NAPI poll")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hns3_desc_unused() returns how many BD have been cleaned, but new
buffer has not been attached to them. The register of
HNS3_RING_RX_RING_FBDNUM_REG returns how many BD need allocating new
buffer to or need to cleaned. So the remaining BD need to be clean
is HNS3_RING_RX_RING_FBDNUM_REG - hns3_desc_unused().
Also, new buffer can not attach to the pending BD when the last BD is
not handled, because memcpy has not been done on the first pending BD.
This patch fixes by subtracting the pending BD num from unused_count
after 'HNS3_RING_RX_RING_FBDNUM_REG - unused_count' is used to calculate
the BD bum need to be clean.
Fixes: e55970950556 ("net: hns3: Add handling of GRO Pkts not fully RX'ed in NAPI poll")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hns3_clean_tx_ring calls hns3_nic_reclaim_one_desc to clean
buffers and set ring->next_to_clean, then hns3_nic_net_xmit
reuses the cleaned buffers. But there are no memory barriers
when buffers gets recycled, so the recycled buffers can be
corrupted.
This patch uses smp_store_release to update ring->next_to_clean
and smp_load_acquire to read ring->next_to_clean to properly
hand off buffers from hns3_clean_tx_ring to hns3_nic_net_xmit.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Device may be shutdown without the hardware being reinitialized, in
which case we want to ensure we cleanup properly.
This is especially important for kexec with traffic flowing.
The shutdown procedures resembles the remove procedures, so we can reuse
those common tasks.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/nfc/st95hf/core.c: In function 'st95hf_irq_thread_handler':
drivers/nfc/st95hf/core.c:786:26: warning:
variable 'nfcddev' set but not used [-Wunused-but-set-variable]
drivers/nfc/st95hf/core.c:784:17: warning:
variable 'dev' set but not used [-Wunused-but-set-variable]
They are never used since introduction in
commit cab47333f0f7 ("NFC: Add STMicroelectronics ST95HF driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
marvell_get_sset_count() returns how many statistics counters there
are. If the PHY supports fibre, there are 3, otherwise two.
marvell_get_strings() does not make this distinction, and always
returns 3 strings. This then often results in writing past the end
of the buffer for the strings.
Fixes: 2170fef78a40 ("Marvell phy: add field to get errors from fiber link.")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I forgot to remove one rcu_read_unlock() before a return statement.
Joy of mixing goto and return styles in a function :)
Fixes: 4109a2c3b91e ("tipc: tipc_udp_recv() cleanup vs rcu verbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When allocating the next family->id it makes more sense to use
idr_alloc_cyclic to avoid re-using a previously used family->id as much
as possible.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PHY's behave differently when being reset. Some reset registers to
defaults, some don't. Some trigger an autoneg restart, some don't.
So let's also set the autoneg restart bit when resetting. Then PHY
behavior should be more consistent. Clearing BMCR_ISOLATE serves the
same purpose and is borrowed from genphy_restart_aneg.
BMCR holds the speed / duplex settings in fixed mode. Therefore
we may have an issue if a soft reset resets BMCR to its default.
So better call genphy_setup_forced() afterwards in fixed mode.
We've seen no related complaint in the last >10 yrs, so let's
treat it as an improvement.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net
Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: stable@vger.kernel.org
Fixes: 2c2d7329d8af ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In trace_pid_write(), the buffer for trace parser is allocated through
kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
it is then freed through kfree() in trace_parser_put(). However, it is
possible that trace_pid_write() is terminated due to unexpected errors,
e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
is a memory leak bug.
To fix this issue, free the allocated buffer when an error is encountered.
Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu
Fixes: f4d34a87e9c10 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|