summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-03-09net sched actions: calculate add/delete event message sizeRoman Mashak
Introduce routines to calculate size of the shared tc netlink attributes and the full message size including netlink header and tc service header. Update add/delete action logic to have the size for event messages, the size is passed to tcf_add_notify() and tcf_del_notify() where the notification message is being allocated and constructed. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09net sched actions: add new tc_action_ops callbackRoman Mashak
Add a new callback in tc_action_ops, it will be needed by the tc actions to compute its size when a ADD/DELETE notification message is constructed. This routine has to take into account optional/variable size TLVs specific per action. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09net sched actions: update Add/Delete action API with new argumentRoman Mashak
Introduce a new function argument to carry total attributes size for correct allocation of skb in event messages. Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09net: do not create fallback tunnels for non-default namespacesEric Dumazet
fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0, ip6tnl0, ip6gre0) are automatically created when the corresponding module is loaded. These tunnels are also automatically created when a new network namespace is created, at a great cost. In many cases, netns are used for isolation purposes, and these extra network devices are a waste of resources. We are using thousands of netns per host, and hit the netns creation/delete bottleneck a lot. (Many thanks to Kirill for recent work on this) Add a new sysctl so that we can opt-out from this automatic creation. Note that these tunnels are still created for the initial namespace, to be the least intrusive for typical setups. Tested: lpk43:~# cat add_del_unshare.sh for i in `seq 1 40` do (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & done wait lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net lpk43:~# time ./add_del_unshare.sh real 0m37.521s user 0m0.886s sys 7m7.084s lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net lpk43:~# time ./add_del_unshare.sh real 0m4.761s user 0m0.851s sys 1m8.343s lpk43:~# Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09tools: tc-testing: Can pause just before post-suiteBrenda J. Butler
With option -P, the test script will pause just before the post_suite functions are called. This allows the tester to inspect the system before it is torn down. Signed-off-by: Brenda J. Butler <bjb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09tools: tc-testing: Can refer to $TESTID in test specBrenda J. Butler
When processing the commands in the test cases, substitute the test id for $TESTID. This helps to make more flexible tests. For example, the testid can be given as a command line argument. As an example, if we wish to save the test output to a file named for the test case, we can write in the test case: "cmdUnderTest": "some test command | tee -a $TESTID.out" Signed-off-by: Brenda J. Butler <bjb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09net: dsa: mv88e6xxx: Fix irq free'ingAndrew Lunn
Call the common irq free function, rather than going recursive and blowing away the stack, followed by the machine. Fixes: 294d711ee8c0 ("net: dsa: mv88e6xxx: Poll when no interrupt defined") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09tc-testing: add csum testsRoman Mashak
Signed-off-by: Roman Mashak <mrv@mojatatu.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event()Eric Dumazet
A tun device type can trivially be set to arbitrary value using TUNSETLINK ioctl(). Therefore, lowpan_device_event() must really check that ieee802154_ptr is not NULL. Fixes: 2c88b5283f60d ("ieee802154: 6lowpan: remove check on null") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option()Lorenzo Bianconi
Fix the following slab-out-of-bounds kasan report in ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not linear and the accessed data are not in the linear data region of orig_skb. [ 1503.122508] ================================================================== [ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990 [ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932 [ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124 [ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 [ 1503.123527] Call Trace: [ 1503.123579] <IRQ> [ 1503.123638] print_address_description+0x6e/0x280 [ 1503.123849] kasan_report+0x233/0x350 [ 1503.123946] memcpy+0x1f/0x50 [ 1503.124037] ndisc_send_redirect+0x94e/0x990 [ 1503.125150] ip6_forward+0x1242/0x13b0 [...] [ 1503.153890] Allocated by task 1932: [ 1503.153982] kasan_kmalloc+0x9f/0xd0 [ 1503.154074] __kmalloc_track_caller+0xb5/0x160 [ 1503.154198] __kmalloc_reserve.isra.41+0x24/0x70 [ 1503.154324] __alloc_skb+0x130/0x3e0 [ 1503.154415] sctp_packet_transmit+0x21a/0x1810 [ 1503.154533] sctp_outq_flush+0xc14/0x1db0 [ 1503.154624] sctp_do_sm+0x34e/0x2740 [ 1503.154715] sctp_primitive_SEND+0x57/0x70 [ 1503.154807] sctp_sendmsg+0xaa6/0x1b10 [ 1503.154897] sock_sendmsg+0x68/0x80 [ 1503.154987] ___sys_sendmsg+0x431/0x4b0 [ 1503.155078] __sys_sendmsg+0xa4/0x130 [ 1503.155168] do_syscall_64+0x171/0x3f0 [ 1503.155259] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.155436] Freed by task 1932: [ 1503.155527] __kasan_slab_free+0x134/0x180 [ 1503.155618] kfree+0xbc/0x180 [ 1503.155709] skb_release_data+0x27f/0x2c0 [ 1503.155800] consume_skb+0x94/0xe0 [ 1503.155889] sctp_chunk_put+0x1aa/0x1f0 [ 1503.155979] sctp_inq_pop+0x2f8/0x6e0 [ 1503.156070] sctp_assoc_bh_rcv+0x6a/0x230 [ 1503.156164] sctp_inq_push+0x117/0x150 [ 1503.156255] sctp_backlog_rcv+0xdf/0x4a0 [ 1503.156346] __release_sock+0x142/0x250 [ 1503.156436] release_sock+0x80/0x180 [ 1503.156526] sctp_sendmsg+0xbb0/0x1b10 [ 1503.156617] sock_sendmsg+0x68/0x80 [ 1503.156708] ___sys_sendmsg+0x431/0x4b0 [ 1503.156799] __sys_sendmsg+0xa4/0x130 [ 1503.156889] do_syscall_64+0x171/0x3f0 [ 1503.156980] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.157158] The buggy address belongs to the object at ffff8800298ab600 which belongs to the cache kmalloc-1024 of size 1024 [ 1503.157444] The buggy address is located 176 bytes inside of 1024-byte region [ffff8800298ab600, ffff8800298aba00) [ 1503.157702] The buggy address belongs to the page: [ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 1503.158053] flags: 0x4000000000008100(slab|head) [ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e [ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000 [ 1503.158523] page dumped because: kasan: bad access detected [ 1503.158698] Memory state around the buggy address: [ 1503.158816] ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.158988] ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1503.159338] ^ [ 1503.159436] ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159610] ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159785] ================================================================== [ 1503.159964] Disabling lock debugging due to kernel taint The test scenario to trigger the issue consists of 4 devices: - H0: data sender, connected to LAN0 - H1: data receiver, connected to LAN1 - GW0 and GW1: routers between LAN0 and LAN1. Both of them have an ethernet connection on LAN0 and LAN1 On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for data from LAN0 to LAN1. Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send buffer size is set to 16K). While data streams are active flush the route cache on HA multiple times. I have not been able to identify a given commit that introduced the issue since, using the reproducer described above, the kasan report has been triggered from 4.14 and I have not gone back further. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09ASoC: amd: 16bit resolution support for i2s sp instanceVijendar Mukunda
Moved 16bit resolution condition check for stoney platform to acp_hw_params.Depending upon substream required register value need to be programmed rather than enabling 16bit resolution support all time in acp init. Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2018-03-09usb: usbmon: Read text within supplied buffer sizePete Zaitcev
This change fixes buffer overflows and silent data corruption with the usbmon device driver text file read operations. Signed-off-by: Fredrik Noring <noring@nocrew.org> Signed-off-by: Pete Zaitcev <zaitcev@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-09loop: Fix lost writes caused by missing flagRoss Zwisler
The following commit: commit aa4d86163e4e ("block: loop: switch to VFS ITER_BVEC") replaced __do_lo_send_write(), which used ITER_KVEC iterators, with lo_write_bvec() which uses ITER_BVEC iterators. In this change, though, the WRITE flag was lost: - iov_iter_kvec(&from, ITER_KVEC | WRITE, &kvec, 1, len); + iov_iter_bvec(&i, ITER_BVEC, bvec, 1, bvec->bv_len); This flag is necessary for the DAX case because we make decisions based on whether or not the iterator is a READ or a WRITE in dax_iomap_actor() and in dax_iomap_rw(). We end up going through this path in configurations where we combine a PMEM device with 4k sectors, a loopback device and DAX. The consequence of this missed flag is that what we intend as a write actually turns into a read in the DAX code, so no data is ever written. The very simplest test case is to create a loopback device and try and write a small string to it, then hexdump a few bytes of the device to see if the write took. Without this patch you read back all zeros, with this you read back the string you wrote. For XFS this causes us to fail or panic during the following xfstests: xfs/074 xfs/078 xfs/216 xfs/217 xfs/250 For ext4 we have a similar issue where writes never happen, but we don't currently have any xfstests that use loopback and show this issue. Fix this by restoring the WRITE flag argument to iov_iter_bvec(). This causes the xfstests to all pass. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Fixes: commit aa4d86163e4e ("block: loop: switch to VFS ITER_BVEC") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-09drm/i915/gvt: keep oa config in shadow ctxMin He
When populating shadow ctx from guest, we should handle oa related registers in hw ctx, so that they will not be overlapped by guest oa configs. This patch made it possible to capture oa data from host for both host and guests. Signed-off-by: Min He <min.he@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2018-03-09drm/i915/gvt: Add runtime_pm_get/put into gvt_switch_mmioXiong Zhang
If user continuously create vgpu, boot guest, shoutdown guest and destroy vgpu from remote, the following calltrace exists in dmesg sometimes: [ 6412.954721] RPM wakelock ref not held during HW access [ 6412.954795] WARNING: CPU: 7 PID: 11941 at linux/drivers/gpu/drm/i915/intel_drv.h:1800 intel_uncore_forcewake_get.part.7+0x96/0xa0 [i915] [ 6412.954915] Call Trace: [ 6412.954951] intel_uncore_forcewake_get+0x18/0x20 [i915] [ 6412.954989] intel_gvt_switch_mmio+0x8e/0x770 [i915] [ 6412.954996] ? __slab_free+0x14d/0x2c0 [ 6412.955001] ? __slab_free+0x14d/0x2c0 [ 6412.955006] ? __slab_free+0x14d/0x2c0 [ 6412.955041] intel_vgpu_stop_schedule+0x92/0xd0 [i915] [ 6412.955073] intel_gvt_deactivate_vgpu+0x48/0x60 [i915] [ 6412.955078] __intel_vgpu_release+0x55/0x260 [kvmgt] when this happens, gvt_switch_mmio is called at vgpu destroy, host i915 is idle and doesn't hold RPM wakelock, igd is in powersave mode, but gvt_switch_mmio require igd power on to access register, so intel_runtime_pm_get should be added to make sure igd power on before gvt_switch_mmio. v2: Move runtime_pm_get/put into gvt_switch_mmio.(Zhenyu) Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
2018-03-09clocksource/atmel-st: Add 'depends on HAS_IOMEM' to fix unmet dependencyMasahiro Yamada
The ATMEL_ST config selects MFD_SYSCON, but does not depend on HAS_IOMEM. Compile testing on architecture without HAS_IOMEM causes "unmet direct dependencies" in Kconfig phase. Detected by "make ARCH=score allyesconfig". Add the proper dependency to the ATMEL_ST config. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/1520335233-11277-1-git-send-email-yamada.masahiro@socionext.com
2018-03-09rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsitesBoqun Feng
When running rcutorture with TREE03 config, CONFIG_PROVE_LOCKING=y, and kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a HARDIRQ-safe->HARDIRQ-unsafe deadlock: ================================ WARNING: inconsistent lock state 4.16.0-rc4+ #1 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. takes: __schedule+0xbe/0xaf0 {IN-HARDIRQ-W} state was registered at: _raw_spin_lock+0x2a/0x40 scheduler_tick+0x47/0xf0 ... other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&rq->lock); <Interrupt> lock(&rq->lock); *** DEADLOCK *** 1 lock held by rcu_torture_rea/724: rcu_torture_read_lock+0x0/0x70 stack backtrace: CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: lock_acquire+0x90/0x200 ? __schedule+0xbe/0xaf0 _raw_spin_lock+0x2a/0x40 ? __schedule+0xbe/0xaf0 __schedule+0xbe/0xaf0 preempt_schedule_irq+0x2f/0x60 retint_kernel+0x1b/0x2d RIP: 0010:rcu_read_unlock_special+0x0/0x680 ? rcu_torture_read_unlock+0x60/0x60 __rcu_read_unlock+0x64/0x70 rcu_torture_read_unlock+0x17/0x60 rcu_torture_reader+0x275/0x450 ? rcutorture_booster_init+0x110/0x110 ? rcu_torture_stall+0x230/0x230 ? kthread+0x10e/0x130 kthread+0x10e/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ? call_usermodehelper_exec_async+0x11a/0x150 ret_from_fork+0x3a/0x50 This happens with the following even sequence: preempt_schedule_irq(); local_irq_enable(); __schedule(): local_irq_disable(); // irq off ... rcu_note_context_switch(): rcu_note_preempt_context_switch(): rcu_read_unlock_special(): local_irq_save(flags); ... raw_spin_unlock_irqrestore(...,flags); // irq remains off rt_mutex_futex_unlock(): raw_spin_lock_irq(); ... raw_spin_unlock_irq(); // accidentally set irq on <return to __schedule()> rq_lock(): raw_spin_lock(); // acquiring rq->lock with irq on which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause deadlocks in scheduler code. This problem was introduced by commit 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints"). That brought the user of rt_mutex_futex_unlock() with irq off. To fix this, replace the *lock_irq() in rt_mutex_futex_unlock() with *lock_irq{save,restore}() to make it safe to call rt_mutex_futex_unlock() with irq off. Fixes: 02a7c234e540 ("rcu: Suppress lockdep false-positive ->boost_mtx complaints") Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Link: https://lkml.kernel.org/r/20180309065630.8283-1-boqun.feng@gmail.com
2018-03-09bpf: comment why dots in filenames under BPF virtual FS are not allowedQuentin Monnet
When pinning a file under the BPF virtual file system (traditionally /sys/fs/bpf), using a dot in the name of the location to pin at is not allowed. For example, trying to pin at "/sys/fs/bpf/foo.bar" will be rejected with -EPERM. This check was introduced at the same time as the BPF file system itself, with commit b2197755b263 ("bpf: add support for persistent maps/progs"). At this time, it was checked in a function called "bpf_dname_reserved()", which made clear that using a dot was reserved for future extensions. This function disappeared and the check was moved elsewhere with commit 0c93b7d85d40 ("bpf: reject invalid names right in ->lookup()"), and the meaning of the dot ban was lost. The present commit simply adds a comment in the source to explain to the reader that the usage of dots is reserved for future usage. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09Merge branch 'bpf-tools-makefile-improvements'Daniel Borkmann
Jiri Benc says: ==================== Currently, 'make bpf' in the tools/ directory does not provide the standard quiet output except for bpftool (which is however listed with a wrong directory). Worse, it does not respect the build output directory. The 'make bpf_install' does not work as one would expect, either. It installs unconditionally to /usr/bin without respecting DESTDIR and prefix. This patchset improves that behavior. ==================== Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09tools: bpf: silence make by not deleting intermediate fileJiri Benc
Even in quiet mode, make finishes with rm tools/bpf/bpf_exp.lex.c That's because it considers the file to be intermediate. Silence that by mentioning the lex.c file instead of the lex.o file; the dependency still stays. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09tools: bpf: respect quiet/verbose buildJiri Benc
Default to quiet build, with V=1 enabling verbose build as is usual. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09tools: bpf: call descend in MakefileJiri Benc
Use the descend macro to properly propagate $(subdir) to bpftool. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09tools: bpf: make install should build firstJiri Benc
Make the 'install' target depend on the 'all' target to build the binaries first. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09tools: bpf: consistent make bpf_installJiri Benc
Currently, make bpf_install in tools/ does not respect DESTDIR. Moreover, it installs to /usr/bin/ unconditionally. Let it respect DESTDIR and allow prefix to be specified. Also, to be more consistent with bpftool and with the usual customs, default the prefix to /usr/local instead of /usr. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09tools: bpf: respect output directory during buildJiri Benc
Currently, the programs under tools/bpf (with the notable exception of bpftool) do not respect the output directory (make O=dir). Fix that. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09tools: bpftool: silence 'missing initializer' warningsJiri Benc
When building bpf tool, gcc emits piles of warnings: prog.c: In function ‘prog_fd_by_tag’: prog.c:101:9: warning: missing initializer for field ‘type’ of ‘struct bpf_prog_info’ [-Wmissing-field-initializers] struct bpf_prog_info info = {}; ^ In file included from /home/storage/jbenc/git/net-next/tools/lib/bpf/bpf.h:26:0, from prog.c:47: /home/storage/jbenc/git/net-next/tools/include/uapi/linux/bpf.h:925:8: note: ‘type’ declared here __u32 type; ^ As these warnings are not useful, switch them off. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-09x86/kprobes: Fix kernel crash when probing .entry_trampoline codeFrancis Deslauriers
Disable the kprobe probing of the entry trampoline: .entry_trampoline is a code area that is used to ensure page table isolation between userspace and kernelspace. At the beginning of the execution of the trampoline, we load the kernel's CR3 register. This has the effect of enabling the translation of the kernel virtual addresses to physical addresses. Before this happens most kernel addresses can not be translated because the running process' CR3 is still used. If a kprobe is placed on the trampoline code before that change of the CR3 register happens the kernel crashes because int3 handling pages are not accessible. To fix this, add the .entry_trampoline section to the kprobe blacklist to prohibit the probing of code before all the kernel pages are accessible. Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: mathieu.desnoyers@efficios.com Cc: mhiramat@kernel.org Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-09perf/core: Fix ctx_event_type in ctx_resched()Song Liu
In ctx_resched(), EVENT_FLEXIBLE should be sched_out when EVENT_PINNED is added. However, ctx_resched() calculates ctx_event_type before checking this condition. As a result, pinned events will NOT get higher priority than flexible events. The following shows this issue on an Intel CPU (where ref-cycles can only use one hardware counter). 1. First start: perf stat -C 0 -e ref-cycles -I 1000 2. Then, in the second console, run: perf stat -C 0 -e ref-cycles:D -I 1000 The second perf uses pinned events, which is expected to have higher priority. However, because it failed in ctx_resched(). It is never run. This patch fixes this by calculating ctx_event_type after re-evaluating event_type. Reported-by: Ephraim Park <ephiepark@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <jolsa@redhat.com> Cc: <kernel-team@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 487f05e18aa4 ("perf/core: Optimize event rescheduling on active contexts") Link: http://lkml.kernel.org/r/20180306055504.3283731-1-songliubraving@fb.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-08net: usb: asix88179_178a: de-duplicate codeAlexander Kurz
Remove the duplicated code for asix88179_178a bind and reset methods. Signed-off-by: Alexander Kurz <akurz@blala.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: usb: asix88179_178a: set permanent address once onlyAlexander Kurz
The permanent address of asix88179_178a devices is read at probe time and should not be overwritten later. Otherwise it may be overwritten unintentionally with a configured address. Signed-off-by: Alexander Kurz <akurz@blala.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08Merge branch 'ntuple-filters-with-RSS'David S. Miller
Edward Cree says: ==================== ntuple filters with RSS This series introduces the ability to mark an ethtool steering filter to use RSS spreading, and the ability to create and configure multiple RSS contexts with different indirection tables, hash keys, and hash fields. An implementation for the sfc driver (for 7000-series and later SFC NICs) is included in patch 2/2. The anticipated use case of this feature is for steering traffic destined for a container (or virtual machine) to the subset of CPUs on which processes in the container (or the VM's vCPUs) are bound, while retaining the scalability of RSS spreading from the viewpoint inside the container. The use of both a base queue number (ring_cookie) and indirection table is intended to allow re-use of a single RSS context to target multiple sets of CPUs. For instance, if an 8-core system is hosting three containers on CPUs [1,2], [3,4] and [6,7], then a single RSS context with an equal-weight [0,1] indirection table could be used to target all three containers by setting ring_cookie to 1, 3 and 6 on the respective filters. v2: Initialised ctx in efx_ef10_filter_insert() to avoid (false positive) gcc warning. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08sfc: support RSS spreading of ethtool ntuple filtersEdward Cree
Use a linked list to associate user-facing context IDs with FW-facing context IDs (since the latter can change after an MC reset). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net: ethtool: extend RXNFC API to support RSS spreading of filter matchesEdward Cree
We use a two-step process to configure a filter with RSS spreading. First, the RSS context is allocated and configured using ETHTOOL_SRSSH; this returns an identifier (rss_context) which can then be passed to subsequent invocations of ETHTOOL_SRXCLSRLINS to specify that the offset from the RSS indirection table lookup should be added to the queue number (ring_cookie) when delivering the packet. Drivers for devices which can only use the indirection table entry directly (not add it to a base queue number) should reject rule insertions combining RSS with a nonzero ring_cookie. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08rds: rds_info_from_znotifier() can be statickbuild test robot
Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08rds: rds_message_zcopy_from_user() can be statickbuild test robot
Fixes: d40a126b16ea ("rds: refactor zcopy code into rds_message_zcopy_from_user") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net/ncsi: unlock on error in ncsi_set_interface_nl()Dan Carpenter
There are two error paths which are missing unlocks in this function. Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08net/ncsi: use kfree_skb() instead of kfree()Dan Carpenter
We're supposed to use kfree_skb() to free these sk_buffs. Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08liquidio: avoid doing useless workPrasad Kanneganti
Avoid doing useless work by making sure that the response_list is not empty before scheduling work to process it. Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08liquidio: Resolved mbox read issue while reading more than one 64bit dataIntiyaz Basha
Corrected length check when data received in the mbox is more than one 64 bit data value Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08Merge branch 'nvme-4.16-rc5' of git://git.infradead.org/nvme into for-linusJens Axboe
Pull NVMe fixes for this series from Keith: "A few late fixes for 4.16: * Reverting sysfs slave device links for native nvme multipathing. The hidden disk attributes broke common user tools. * A fix for a PPC pci error handling regression. * Update pci interrupt count to consider the actual IRQ spread, fixing potentially poor initial queue affinity. * Off-by-one errors in nvme-fc queue sizes * A fabrics discovery fix to be more tolerant with user tools." * 'nvme-4.16-rc5' of git://git.infradead.org/nvme: nvme_fc: rework sqsize handling nvme-fabrics: Ignore nr_io_queues option for discovery controllers Revert "nvme: create 'slaves' and 'holders' entries for hidden controllers" nvme: pci: pass max vectors as num_possible_cpus() to pci_alloc_irq_vectors nvme-pci: Fix EEH failure on ppc
2018-03-09Merge branch 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Fixes for 4.16. A bit bigger than I would have liked, but most of that is DC fixes which Harry helped me pull together from the past few weeks. Highlights: - Fix DL DVI with DC - Various RV fixes for DC - Overlay fixes for DC - Fix HDMI2 handling on boards without HBR tables in the vbios - Fix crash with pass-through on SI on amdgpu - Fix RB harvesting on KV - Fix hibernation failures on UVD with certain cards * 'drm-fixes-4.16' of git://people.freedesktop.org/~agd5f/linux: (35 commits) drm/amd/display: validate plane format on primary plane drm/amdgpu:Always save uvd vcpu_bo in VM Mode drm/amdgpu:Correct max uvd handles drm/amd/display: early return if not in vga mode in disable_vga drm/amd/display: Fix takover from VGA mode drm/amd/display: Fix memleaks when atomic check fails. drm/amd/display: Return success when enabling interrupt drm/amd/display: Use crtc enable/disable_vblank hooks drm/amd/display: update infoframe after dig fe is turned on drm/amd/display: fix boot-up on vega10 drm/amd/display: fix cursor related Pstate hang drm/amd/display: Set irq state only on existing crtcs drm/amd/display: Fixed non-native modes not lighting up drm/amd/display: Call update_stream_signal directly from amdgpu_dm drm/amd/display: Make create_stream_for_sink more consistent drm/amd/display: Don't block dual-link DVI modes drm/amd/display: Don't allow dual-link DVI on all ASICs. drm/amd/display: Pass signal directly to enable_tmds_output drm/amd/display: Remove unnecessary fail labels in create_stream_for_sink drm/amd/display: Move MAX_TMDS_CLOCK define to header ...
2018-03-09Merge tag 'drm-misc-fixes-2018-03-07' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes sun4i fixes on clk, division by zero and LVDS. * tag 'drm-misc-fixes-2018-03-07' of git://anongit.freedesktop.org/drm/drm-misc: drm/sun4i: crtc: Call drm_crtc_vblank_on / drm_crtc_vblank_off drm/sun4i: rgb: Fix potential division by zero drm/sun4i: tcon: Reduce the scope of the LVDS error a bit drm/sun4i: Release exclusive clock lock when disabling TCON drm/sun4i: Fix dclk_set_phase
2018-03-09Merge tag 'drm-intel-fixes-2018-03-07' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - 2 fixes: 1 for perf and 1 execlist submission race. * tag 'drm-intel-fixes-2018-03-07' of git://anongit.freedesktop.org/drm/drm-intel: drm/i915: Suspend submission tasklets around wedging drm/i915/perf: fix perf stream opening lock
2018-03-08module: propagate error in modules_open()Leon Yu
otherwise kernel can oops later in seq_release() due to dereferencing null file->private_data which is only set if seq_open() succeeds. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 IP: seq_release+0xc/0x30 Call Trace: close_pdeo+0x37/0xd0 proc_reg_release+0x5d/0x60 __fput+0x9d/0x1d0 ____fput+0x9/0x10 task_work_run+0x75/0x90 do_exit+0x252/0xa00 do_group_exit+0x36/0xb0 SyS_exit_group+0xf/0x10 Fixes: 516fb7f2e73d ("/proc/module: use the same logic as /proc/kallsyms for address exposure") Cc: Jessica Yu <jeyu@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2018-03-08Merge tag 'mlx5-updates-2018-02-28-2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-updates-2018-02-28-2 (IPSec-2) This series follows our previous one to lay out the foundations for IPSec in user-space and extend current kernel netdev IPSec support. As noted in our previous pull request cover letter "mlx5-updates-2018-02-28-1 (IPSec-1)", the IPSec mechanism will be supported through our flow steering mechanism. Therefore, we need to change the initialization order. Furthermore, IPsec is also supported in both egress and ingress. Since our current flow steering is egress only, we add an empty (only implemented through FPGA steering ops) egress namespace to handle that case. We also implement the required flow steering callbacks and logic in our FPGA driver. We extend the FPGA support for ESN and modifying a xfrm too. Therefore, we add support for some new FPGA command interface that supports them. The other required bits are added too. The new features and requirements are advertised via cap bits. Last but not least, we revise our driver's accel_esp API. This API will be shared between our netdev and IB driver, so we need to have all the required functionality from both worlds. Regards, Aviad and Matan ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-08usb: host: xhci-rcar: add support for r8a77965Yoshihiro Shimoda
This patch adds support for r8a77965 (R-Car M3-N). Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Rob Herring <robh@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08Merge tag 'mips_fixes_4.16_4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips Pull MIPS fixes from James Hogan: "A miscellaneous pile of MIPS fixes for 4.16: - move put_compat_sigset() to evade hardened usercopy warnings (4.16) - select ARCH_HAVE_PC_{SERIO,PARPORT} for Loongson64 platforms (4.16) - fix kzalloc() failure handling in ath25 (3.19) and Octeon (4.0) - fix disabling of IPIs during BMIPS suspend (3.19)" * tag 'mips_fixes_4.16_4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: MIPS: BMIPS: Do not mask IPIs during suspend MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_SERIO MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_PARPORT signals: Move put_compat_sigset to compat.h to silence hardened usercopy MIPS: OCTEON: irq: Check for null return on kzalloc allocation MIPS: ath25: Check for kzalloc allocation failure
2018-03-08USB: storage: Add JMicron bridge 152d:2567 to unusual_devs.hTeijo Kinnunen
This USB-SATA controller seems to be similar with JMicron bridge 152d:2566 already on the list. Adding it here fixes "Invalid field in cdb" errors. Signed-off-by: Teijo Kinnunen <teijo.kinnunen@code-q.fi> Cc: stable@vger.kernel.org Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-08Merge tag 'chrome-platform-4.16-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform Pull chrome platform fix from Benson Leung: "Revert a problematic patch that constified something imporperly" * tag 'chrome-platform-4.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform: Revert "platform/chrome: chromeos_laptop: make chromeos_laptop const"
2018-03-08NFS: Fix unstable write completionTrond Myklebust
We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to ensure that all outstanding COMMIT requests to the inode in question are complete. Currently we may exit early from both nfs_commit_inode() and nfs_write_inode() even if there are COMMIT requests in flight, or unstable writes on the commit list. In order to get the right semantics w.r.t. sync_inode(), we don't need to have nfs_commit_inode() reset the inode dirty flags when called from nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that nfs_write_inode() leaves them in the right state if there are outstanding commits, or stable pages. Reported-by: Scott Mayhew <smayhew@redhat.com> Fixes: dc4fd9ab01ab ("nfs: don't wait on commit in nfs_commit_inode()...") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>