summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-08-29acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuseBoqun Feng
COMPLETION_INITIALIZER_ONSTACK() is supposed to be used as an initializer, in other words, it should only be used in assignment expressions or compound literals. So the usage in drivers/acpi/nfit/core.c: COMPLETION_INITIALIZER_ONSTACK(flush.cmp); ... is inappropriate. Besides, this usage could also break the build for another fix that reduces stack sizes caused by COMPLETION_INITIALIZER_ONSTACK(), because that fix changes COMPLETION_INITIALIZER_ONSTACK() from rvalue to lvalue, and usage as above will report the following error: drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe': include/linux/completion.h:77:3: error: value computed is not used [-Werror=unused-value] (*({ init_completion(&work); &work; })) This patch fixes this by replacing COMPLETION_INITIALIZER_ONSTACK() with init_completion() in acpi_nfit_flush_probe(), which does the same initialization without any other problems. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/20170824142239.15178-1-boqun.feng@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29locking/pvqspinlock: Relax cmpxchg's to improve performance on some ↵Waiman Long
architectures All the locking related cmpxchg's in the following functions are replaced with the _acquire variants: - pv_queued_spin_steal_lock() - trylock_clear_pending() This change should help performance on architectures that use LL/SC. The cmpxchg in pv_kick_node() is replaced with a relaxed version with explicit memory barrier to make sure that it is fully ordered in the writing of next->lock and the reading of pn->state whether the cmpxchg is a success or failure without affecting performance in non-LL/SC architectures. On a 2-socket 12-core 96-thread Power8 system with pvqspinlock explicitly enabled, the performance of a locking microbenchmark with and without this patch on a 4.13-rc4 kernel with Xinhui's PPC qspinlock patch were as follows: # of thread w/o patch with patch % Change ----------- --------- ---------- -------- 8 5054.8 Mop/s 5209.4 Mop/s +3.1% 16 3985.0 Mop/s 4015.0 Mop/s +0.8% 32 2378.2 Mop/s 2396.0 Mop/s +0.7% Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1502741222-24360-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29smp: Avoid using two cache lines for struct call_single_dataYing Huang
struct call_single_data is used in IPIs to transfer information between CPUs. Its size is bigger than sizeof(unsigned long) and less than cache line size. Currently it is not allocated with any explicit alignment requirements. This makes it possible for allocated call_single_data to cross two cache lines, which results in double the number of the cache lines that need to be transferred among CPUs. This can be fixed by requiring call_single_data to be aligned with the size of call_single_data. Currently the size of call_single_data is the power of 2. If we add new fields to call_single_data, we may need to add padding to make sure the size of new definition is the power of 2 as well. Fortunately, this is enforced by GCC, which will report bad sizes. To set alignment requirements of call_single_data to the size of call_single_data, a struct definition and a typedef is used. To test the effect of the patch, I used the vm-scalability multiple thread swap test case (swap-w-seq-mt). The test will create multiple threads and each thread will eat memory until all RAM and part of swap is used, so that huge number of IPIs are triggered when unmapping memory. In the test, the throughput of memory writing improves ~5% compared with misaligned call_single_data, because of faster IPIs. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Huang, Ying <ying.huang@intel.com> [ Add call_single_data_t and align with size of call_single_data. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29locking/lockdep: Untangle xhlock history save/restore from task independencePeter Zijlstra
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to ensure the temporal IRQ events don't interact with task state, the XHLOCK_PROC is a fundament different beast that just happens to share the interface. The purpose of XHLOCK_PROC is to annotate independent execution inside one task. For example workqueues, each work should appear to run in its own 'pristine' 'task'. Remove XHLOCK_PROC in favour of its own interface to avoid confusion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boqun.feng@gmail.com Cc: david@fromorbit.com Cc: johannes@sipsolutions.net Cc: kernel-team@lge.com Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/x86: Fix caps/ for !IntelPeter Zijlstra
Move the 'max_precise' capability into generic x86 code where it belongs. This fixes a sysfs splat on !Intel systems where we fail to set x86_pmu_caps_group.atts. Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory") Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core, x86: Add PERF_SAMPLE_PHYS_ADDRKan Liang
For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. Add a new sample type for physical address. perf already has a facility to collect data virtual address. This patch introduces a function to convert the virtual address to physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided. - For kernel direct mapping addresses, virt_to_phys is used to convert the virtual addresses to physical address. - For user virtual addresses, __get_user_pages_fast is used to walk the pages tables for user physical address. - This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added. The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or privileged user. Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mpe@ellerman.id.au Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core, pt, bts: Get rid of itrace_startedAlexander Shishkin
I just noticed that hw.itrace_started and hw.config are aliased to the same location. Now, the PT driver happens to use both, which works out fine by sheer luck: - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the program order, although there are no compiler barriers to ensure that, - to the perf_log_itrace_start() hw.itrace_start looks set at the same time as when it is intended to be set because both stores happen in the same path, - hw.config is never reset to zero in the PT driver. Now, the use of hw.config by the PT driver makes more sense (it being a HW PMU) than messing around with itrace_started, which is an awkward API to begin with. This patch replaces hw.itrace_started with an attach_state bit and an API call for the PMU drivers to use to communicate the condition. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29Documentation: Hardware tag matchingArtemy Kovalyov
Add document providing definitions of terms and core explanations for tag matching (TM) protocols, eager and rendezvous, TM application header, tag list manipulations and matching process. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/mlx5: Support IB_SRQT_TMArtemy Kovalyov
Pass to mlx5_core flag to enable rendezvous offload, list_size and CQ when SRQ created with IB_SRQT_TM. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29net/mlx5: Add XRQ supportArtemy Kovalyov
Add support to new XRQ(eXtended shared Receive Queue) hardware object. It supports SRQ semantics with addition of extended receive buffers topologies and offloads. Currently supports tag matching topology and rendezvouz offload. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/mlx5: Fill XRQ capabilitiesArtemy Kovalyov
Provide driver specific values for XRQ capabilities. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Expose XRQ capabilitiesArtemy Kovalyov
Make XRQ capabilities available via ibv_query_device() verb. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Add new SRQ type IB_SRQT_TMArtemy Kovalyov
Add new SRQ type capable of new tag matching feature. When SRQ receives a message it will search through the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. In case the tag matching results in a match, the received message will be placed in the address specified by the receive buffer. In case no match was found the message will be placed in a generic buffer until the corresponding receive buffer will be posted. These messages are called unexpected and their set is called an unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Add XRQ creation parameter to UAPIArtemy Kovalyov
Add tm_list_size parameter to struct ib_uverbs_create_xsrq. If SRQ type is tag-matching this field defines maximum size of tag matching list. Otherwise, it is expected to be zero. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Add new SRQ type IB_SRQT_TMArtemy Kovalyov
This patch adds new SRQ type - IB_SRQT_TM. The new SRQ type supports tag matching and rendezvous offloads for MPI applications. When SRQ receives a message it will search through the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. In case the tag matching results in a match, the received message will be placed in the address specified by the receive buffer. In case no match was found the message will be placed in a generic buffer until the corresponding receive buffer will be posted. These messages are called unexpected and their set is called an unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Separate CQ handle in SRQ contextArtemy Kovalyov
Before this change CQ attached to SRQ was part of XRC specific extension. Moving CQ handle out makes it available to other types extending SRQ functionality. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Add XRQ capabilitiesArtemy Kovalyov
This patch adds following TM XRQ capabilities: * max_rndv_hdr_size - Max size of rendezvous request message * max_num_tags - Max number of entries in tag matching list * max_ops - Max number of outstanding list operations * max_sge - Max number of SGE in tag matching entry * flags - the following flags are currently defined: - IB_TM_CAP_RC - Support tag matching on RC transport Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29net/mlx5: Update HW layout definitionsArtemy Kovalyov
* add offload_type field to mlx5_ifc_qpc_bits * update mlx5_ifc_xrqc_bits layout Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29x86/boot: Prevent faulty bootparams.screeninfo from causing harmJan H. Schönherr
If a zero for the number of lines manages to slip through, scroll() may underflow some offset calculations, causing accesses outside the video memory. Make the check in __putstr() more pessimistic to prevent that. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot: Provide more slack space during decompressionJan H. Schönherr
The current slack space is not enough for LZ4, which has a worst case overhead of 0.4% for data that cannot be further compressed. With an LZ4 compressed kernel with an embedded initrd, the output is likely to overwrite the input. Increase the slack space to avoid that. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/ftrace: Fix double traces of perf on ftrace:functionZhou Chengming
When running perf on the ftrace:function tracepoint, there is a bug which can be reproduced by: perf record -e ftrace:function -a sleep 20 & perf record -e ftrace:function ls perf script ls 10304 [005] 171.853235: ftrace:function: perf_output_begin ls 10304 [005] 171.853237: ftrace:function: perf_output_begin ls 10304 [005] 171.853239: ftrace:function: task_tgid_nr_ns ls 10304 [005] 171.853240: ftrace:function: task_tgid_nr_ns ls 10304 [005] 171.853242: ftrace:function: __task_pid_nr_ns ls 10304 [005] 171.853244: ftrace:function: __task_pid_nr_ns We can see that all the function traces are doubled. The problem is caused by the inconsistency of the register function perf_ftrace_event_register() with the probe function perf_ftrace_function_call(). The former registers one probe for every perf_event. And the latter handles all perf_events on the current cpu. So when two perf_events on the current cpu, the traces of them will be doubled. So this patch adds an extra parameter "event" for perf_tp_event, only send sample data to this event when it's not NULL. Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: huawei.libin@huawei.com Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29perf/core: Fix potential double-fetch bugMeng Xu
While examining the kernel source code, I found a dangerous operation that could turn into a double-fetch situation (a race condition bug) where the same userspace memory region are fetched twice into kernel with sanity checks after the first fetch while missing checks after the second fetch. 1. The first fetch happens in line 9573 get_user(size, &uattr->size). 2. Subsequently the 'size' variable undergoes a few sanity checks and transformations (line 9577 to 9584). 3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size) 4. Given that 'uattr' can be fully controlled in userspace, an attacker can race condition to override 'uattr->size' to arbitrary value (say, 0xFFFFFFFF) after the first fetch but before the second fetch. The changed value will be copied to 'attr->size'. 5. There is no further checks on 'attr->size' until the end of this function, and once the function returns, we lose the context to verify that 'attr->size' conforms to the sanity checks performed in step 2 (line 9577 to 9584). 6. My manual analysis shows that 'attr->size' is not used elsewhere later, so, there is no working exploit against it right now. However, this could easily turns to an exploitable one if careless developers start to use 'attr->size' later. To fix this, override 'attr->size' from the second fetch to the one from the first fetch, regardless of what is actually copied in. In this way, it is assured that 'attr->size' is consistent with the checks performed after the first fetch. Signed-off-by: Meng Xu <mengxu.gatech@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: alexander.shishkin@linux.intel.com Cc: meng.xu@gatech.edu Cc: sanidhya@gatech.edu Cc: taesoo@gatech.edu Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()Jiri Slaby
ALIGN+GLOBAL is effectively what ENTRY() does, so use ENTRY() which is dedicated for exactly this purpose -- global functions. Note that stub32_clone() is a C-like leaf function -- it has a standard call frame -- it only switches one argument and continues by jumping into C. Since each ENTRY() should be balanced by some END*() marker, we add a corresponding ENDPROC() to stub32_clone() too. Besides that, x86's custom GLOBAL macro is going to die very soon. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170824080624.7768-2-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/fpu/math-emu: Add ENDPROC to functionsJiri Slaby
Functions in math-emu are annotated as ENTRY() symbols, but their ends are not annotated at all. But these are standard functions called from C, with proper stack register update etc. Omitting the ends means: * the annotations are not paired and we cannot deal with such functions e.g. in objtool * the symbols are not marked as functions in the object file * there are no sizes of the functions in the object file So fix this by adding ENDPROC() to each such case in math-emu. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170824080624.7768-1-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot/64: Extract efi_pe_entry() from startup_64()Jiri Slaby
Similarly to the 32-bit code, efi_pe_entry body() is somehow squashed into startup_64(). In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry() to start at 0x210. But this requirement was removed long time ago, in: 99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_64() into a separate function, we do so. We also annotate the function appropriatelly by ENTRY+ENDPROC. ABI offsets are preserved: 0000000000000000 T startup_32 0000000000000200 T startup_64 0000000000000390 T efi64_stub_entry On the top-level, it looked like: .org 0x200 ENTRY(startup_64) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr GLOBAL(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq preferred_addr(%rax), %rax jmp *%rax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_64) ENDPROC(startup_64) And it is now converted into: .org 0x200 ENTRY(startup_64) ... ; a lot of assembly (startup_64) ENDPROC(startup_64) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq startup_64(%rax), %rax jmp *%rax ENDPROC(efi_pe_entry) #endif Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/boot/32: Extract efi_pe_entry() from startup_32()Jiri Slaby
The efi_pe_entry() body is somehow squashed into startup_32(). In the old days, we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start at 0x10. But this requirement was removed long time ago, in: 99f857db8857 ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_32() into a separate function, we do so and we separate it to two functions as they are marked already: efi_pe_entry() + efi32_stub_entry(). We also annotate the functions appropriatelly by ENTRY+ENDPROC. ABI offset is preserved: 0000 128 FUNC GLOBAL DEFAULT 6 startup_32 0080 60 FUNC GLOBAL DEFAULT 6 efi_pe_entry 00bc 68 FUNC GLOBAL DEFAULT 6 efi32_stub_entry On the top-level, it looked like this: ENTRY(startup_32) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal preferred_addr(%eax), %eax jmp *%eax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_32) ENDPROC(startup_32) And it is now converted into: ENTRY(startup_32) ... ; a lot of assembly (startup_32) ENDPROC(startup_32) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENDPROC(efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal startup_32(%eax), %eax jmp *%eax ENDPROC(efi32_stub_entry) #endif Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time beingIngo Molnar
Mike Galbraith bisected a boot crash back to the following commit: 7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection") The crash/hang pattern is: > Symptom is a few splats as below, with box finally hanging. Network > comes up, but neither ssh nor console login is possible. > > ------------[ cut here ]------------ > WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0 > ... > __sk_destruct() > rcu_process_callbacks() > __do_softirq() > irq_exit() > smp_apic_timer_interrupt() > apic_timer_interrupt() We are at -rc7 already, and the code has grown some dependencies, so instead of a plain revert disable the config temporarily, in the hope of getting real fixes. Reported-by: Mike Galbraith <efault@gmx.de> Tested-by: Mike Galbraith <efault@gmx.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/tip-7a46ec0e2f4850407de5e1d19a44edee6efa58ec@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29x86/ldt: Fix off by one in get_segment_base()Dan Carpenter
ldt->entries[] is allocated in alloc_ldt_struct(). It has ldt->nr_entries elements and ldt->nr_entries is capped at LDT_ENTRIES. So if "idx" is == ldt->nr_entries then we're reading beyond the end of the buffer. It seems duplicative to have two limit checks when one would work just as well so I removed the check against LDT_ENTRIES. The gdt_page.gdt[] array has GDT_ENTRIES entries. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Fixes: d07bdfd322d3 ("perf/x86: Fix USER/KERNEL tagging of samples properly") Link: http://lkml.kernel.org/r/20170818102516.gqwm4xdvvuvjw5ho@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29devicetree: bindings: Remove deprecated propertiesMagnus Damm
The deprecated DT properties are part of the GIT history, no need to keep them around any longer. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29devicetree: bindings: Remove unused 32-bit CMT bindingsMagnus Damm
Remove the 32-bit CMT compat strings to reduce maintenance burden. It should be fine to break DT compatibility because the 32-bit CMT DT binding was never part of any upstream DTS file. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29devicetree: bindings: Deprecate property, update exampleMagnus Damm
Deprecate "renesas,channels-mask" and update the r8a7790 CMT example. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29devicetree: bindings: r8a73a4 and R-Car Gen2 CMT bindingsMagnus Damm
Update SoC-specific bindings for r8a73a4 and R-Car Gen2 CMT0 and CMT1. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29devicetree: bindings: R-Car Gen2 CMT0 and CMT1 bindingsMagnus Damm
Add documentation for new separate CMT0 and CMT1 DT compatible strings for R-Car Gen2. These compat strings allow us to enable CMT1-specific features in the driver. The old compat strings will be deprecated in the not so distant future. Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29devicetree: bindings: Remove sh7372 CMT bindingMagnus Damm
Remove the sh7372 CMT compat string to reduce maintenance burden. It should be fine to break DT compatibility because: 1) The sh7372 SoC support has been removed from upstream 2) The sh7372 CMT DT binding was never part of upstream DTS 3) The CMT driver never matches on the sh7372 binding Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29clocksource/drivers/imx-tpm: Add imx tpm timer supportDong Aisheng
IMX Timer/PWM Module (TPM) supports both timer and pwm function while this patch only adds the timer support. PWM would be added later. The TPM counter, compare and capture registers are clocked by an asynchronous clock that can remain enabled in low power modes. NOTE: We observed in a very small probability, the bus fabric contention between GPU and A7 may results a few cycles delay of writing CNT registers which may cause the min_delta event got missed, so we need add a ETIME check here in case it happened. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Anson Huang <Anson.Huang@nxp.com> Cc: Bai Ping <ping.bai@nxp.com> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29dt-bindings: timer: Add nxp tpm timer binding docDong Aisheng
Adding NXP Low Power Timer/Pulse Width Modulation Module (TPM) binding doc. Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Bai Ping <ping.bai@nxp.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-29x86/microcode/intel: Improve microcode patches saving flowBorislav Petkov
Avoid potentially dereferencing a NULL pointer when saving a microcode patch for early loading on the application processors. While at it, drop the IS_ERR() checking in favor of simpler, NULL-ptr checks which are sufficient and rename __alloc_microcode_buf() to memdup_patch() to more precisely denote what it does. No functionality change. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170825100456.n236w3jebteokfd6@pd.tnic
2017-08-29x86/mm: Fix SME encryption stack ptr handlingBorislav Petkov
sme_encrypt_execute() stashes the stack pointer on entry into %rbp because it allocates a one-page stack in the non-encrypted area for the encryption routine to use. When the latter is done, it restores it from %rbp again, before returning. However, it uses the FRAME_* macros partially but restores %rsp from %rbp explicitly with a MOV. And this is fine as long as the macros *actually* do something. Unless, you do a !CONFIG_FRAME_POINTER build where those macros are empty. Then, we still restore %rsp from %rbp but %rbp contains *something* and this leads to a stack corruption. The manifestation being a triple-fault during early boot when testing SME. Good luck to me debugging this with the clumsy endless-loop-in-asm method and narrowing it down gradually. :-( So, long story short, open-code the frame macros so that there's no monkey business and we avoid subtly breaking SME depending on the .config. Fixes: 6ebcb060713f ("x86/mm: Add support to encrypt the kernel in-place") Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Link: http://lkml.kernel.org/r/20170827163924.25552-1-bp@alien8.de
2017-08-28net: dsa: Don't dereference dst->cpu_dp->netdevFlorian Fainelli
If we do not have a master network device attached dst->cpu_dp will be NULL and accessing cpu_dp->netdev will create a trace similar to the one below. The correct check is on dst->cpu_dp period. [ 1.004650] DSA: switch 0 0 parsed [ 1.008078] Unable to handle kernel NULL pointer dereference at virtual address 00000010 [ 1.016195] pgd = c0003000 [ 1.018918] [00000010] *pgd=80000000004003, *pmd=00000000 [ 1.024349] Internal error: Oops: 206 [#1] SMP ARM [ 1.029157] Modules linked in: [ 1.032228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00071-g45b45afab9bd-dirty #7 [ 1.040772] Hardware name: Broadcom STB (Flattened Device Tree) [ 1.046704] task: ee08f840 task.stack: ee090000 [ 1.051258] PC is at dsa_register_switch+0x5e0/0x9dc [ 1.056234] LR is at dsa_register_switch+0x5d0/0x9dc [ 1.061211] pc : [<c08fb28c>] lr : [<c08fb27c>] psr: 60000213 [ 1.067491] sp : ee091d88 ip : 00000000 fp : 0000000c [ 1.072728] r10: 00000000 r9 : 00000001 r8 : ee208010 [ 1.077965] r7 : ee2b57b0 r6 : ee2b5780 r5 : 00000000 r4 : ee208e0c [ 1.084506] r3 : 00000000 r2 : 00040d00 r1 : 2d1b2000 r0 : 00000016 [ 1.091050] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 1.098199] Control: 32c5387d Table: 00003000 DAC: fffffffd [ 1.103957] Process swapper/0 (pid: 1, stack limit = 0xee090210) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29Merge branch 'drm-vmwgfx-next' of ↵Dave Airlie
git://people.freedesktop.org/~syeh/repos_linux into drm-next vmwgfx add fence fd support. * 'drm-vmwgfx-next' of git://people.freedesktop.org/~syeh/repos_linux: drm/vmwgfx: Bump the version for fence FD support drm/vmwgfx: Add export fence to file descriptor support drm/vmwgfx: Add support for imported Fence File Descriptor drm/vmwgfx: Prepare to support fence fd drm/vmwgfx: Fix incorrect command header offset at restart drm/vmwgfx: Support the NOP_ERROR command drm/vmwgfx: Restart command buffers after errors drm/vmwgfx: Move irq bottom half processing to threads drm/vmwgfx: Don't use drm_irq_[un]install
2017-08-29Merge tag 'exynos-drm-next-for-v4.14' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next Summary: - Provide NV12MT pixel format support of Mixer driver in generic way. - Refactor Exynos KMS drivers . Refactoring to panel detection way . Refactoring to setting up possible_crtcs . Refactoring to video and command mode support - Some cleanups * tag 'exynos-drm-next-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos: drm/exynos: simplify set_pixfmt() in DECON and FIMD drivers drm/exynos: consistent use of cpp drm/exynos: mixer: remove src offset from mixer_graph_buffer() drm/exynos: mixer: simplify mixer_graph_buffer() drm/exynos: mixer: simplify vp_video_buffer() drm/exynos: mixer: enable NV12MT support for the video plane drm/exynos: mixer: fix chroma comment in vp_video_buffer() arm64: dts: exynos: remove i80-if-timings nodes dt-bindings: exynos5433-decon: remove i80-if-timings property drm/exynos/decon5433: use mode info stored in CRTC to detect i80 mode drm/exynos: add mode_valid callback to exynos_drm drm/exynos/decon5433: refactor irq requesting code drm/exynos/mic: use mode info stored in CRTC to detect i80 mode drm/exynos/dsi: propagate info about command mode from panel drm/exynos/dsi: refactor panel detection logic drm/exynos: use helper to set possible crtcs drm/exynos/decon5433: use readl_poll_timeout helpers
2017-08-29Merge tag 'drm-misc-next-fixes-2017-08-28' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-misc into drm-next UAPI Changes: - Rename u32 to __u32 in struct drm_format_modifier_blob (Lionel) Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> * tag 'drm-misc-next-fixes-2017-08-28' of git://anongit.freedesktop.org/git/drm-misc: drm: rename u32 in __u32 in uapi
2017-08-29drm/syncobj: Add a signal ioctl (v3)Jason Ekstrand
This IOCTL provides a mechanism for userspace to trigger a sync object directly. There are other ways that userspace can trigger a syncobj such as submitting a dummy batch somewhere or hanging on to a triggered sync_file and doing an import. This just provides an easy way to manually trigger the sync object without weird hacks. The motivation for this IOCTL is Vulkan fences. Vulkan lets you create a fence already in the signaled state so that you can wait on it immediatly without stalling. We could also handle this with a new create flag to ask the driver to create a syncobj that is already signaled but the IOCTL seemed a bit cleaner and more generic. v2: - Take an array of sync objects (Dave Airlie) v3: - Throw -EINVAL if pad != 0 Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-29drm/syncobj: Add a reset ioctl (v3)Jason Ekstrand
This just resets the dma_fence to NULL so it looks like it's never been signaled. This will be useful once we add the new wait API for allowing wait on "submit and signal" behavior. v2: - Take an array of sync objects (Dave Airlie) v3: - Throw -EINVAL if pad != 0 Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian König <christian.koenig@amd.com> (v1) Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-28page waitqueue: always add new entries at the endLinus Torvalds
Commit 3510ca20ece0 ("Minor page waitqueue cleanups") made the page queue code always add new waiters to the back of the queue, which helps upcoming patches to batch the wakeups for some horrid loads where the wait queues grow to thousands of entries. However, I forgot about the nasrt add_page_wait_queue() special case code that is only used by the cachefiles code. That one still continued to add the new wait queue entries at the beginning of the list. Fix it, because any sane batched wakeup will require that we don't suddenly start getting new entries at the beginning of the list that we already handled in a previous batch. [ The current code always does the whole list while holding the lock, so wait queue ordering doesn't matter for correctness, but even then it's better to add later entries at the end from a fairness standpoint ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28bridge: check for null fdb->dst before notifying switchdev driversRoopa Prabhu
current switchdev drivers dont seem to support offloading fdb entries pointing to the bridge device which have fdb->dst not set to any port. This patch adds a NULL fdb->dst check in the switchdev notifier code. This patch fixes the below NULL ptr dereference: $bridge fdb add 00:02:00:00:00:33 dev br0 self [ 69.953374] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 69.954044] IP: br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] PGD 66527067 [ 69.954044] P4D 66527067 [ 69.954044] PUD 7899c067 [ 69.954044] PMD 0 [ 69.954044] [ 69.954044] Oops: 0000 [#1] SMP [ 69.954044] Modules linked in: [ 69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1 [ 69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 69.954044] task: ffff88007b827140 task.stack: ffffc90001564000 [ 69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80 [ 69.954044] RSP: 0018:ffffc90001567918 EFLAGS: 00010246 [ 69.954044] RAX: 0000000000000000 RBX: ffff8800795e0880 RCX: 00000000000000c0 [ 69.954044] RDX: ffffc90001567920 RSI: 000000000000001c RDI: ffff8800795d0600 [ 69.954044] RBP: ffffc90001567938 R08: ffff8800795d0600 R09: 0000000000000000 [ 69.954044] R10: ffffc90001567a88 R11: ffff88007b849400 R12: ffff8800795e0880 [ 69.954044] R13: ffff8800795d0600 R14: ffffffff81ef8880 R15: 000000000000001c [ 69.954044] FS: 00007f93d3085700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 69.954044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 69.954044] CR2: 0000000000000008 CR3: 0000000066551000 CR4: 00000000000006e0 [ 69.954044] Call Trace: [ 69.954044] fdb_notify+0x3f/0xf0 [ 69.954044] __br_fdb_add.isra.12+0x1a7/0x370 [ 69.954044] br_fdb_add+0x178/0x280 [ 69.954044] rtnl_fdb_add+0x10a/0x200 [ 69.954044] rtnetlink_rcv_msg+0x1b4/0x240 [ 69.954044] ? skb_free_head+0x21/0x40 [ 69.954044] ? rtnl_calcit.isra.18+0xf0/0xf0 [ 69.954044] netlink_rcv_skb+0xed/0x120 [ 69.954044] rtnetlink_rcv+0x15/0x20 [ 69.954044] netlink_unicast+0x180/0x200 [ 69.954044] netlink_sendmsg+0x291/0x370 [ 69.954044] ___sys_sendmsg+0x180/0x2e0 [ 69.954044] ? filemap_map_pages+0x2db/0x370 [ 69.954044] ? do_wp_page+0x11d/0x420 [ 69.954044] ? __handle_mm_fault+0x794/0xd80 [ 69.954044] ? vma_link+0xcb/0xd0 [ 69.954044] __sys_sendmsg+0x4c/0x90 [ 69.954044] SyS_sendmsg+0x12/0x20 [ 69.954044] do_syscall_64+0x63/0xe0 [ 69.954044] entry_SYSCALL64_slow_path+0x25/0x25 [ 69.954044] RIP: 0033:0x7f93d2bad690 [ 69.954044] RSP: 002b:00007ffc7217a638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 69.954044] RAX: ffffffffffffffda RBX: 00007ffc72182eac RCX: 00007f93d2bad690 [ 69.954044] RDX: 0000000000000000 RSI: 00007ffc7217a670 RDI: 0000000000000003 [ 69.954044] RBP: 0000000059a1f7f8 R08: 0000000000000006 R09: 000000000000000a [ 69.954044] R10: 00007ffc7217a400 R11: 0000000000000246 R12: 00007ffc7217a670 [ 69.954044] R13: 00007ffc72182a98 R14: 00000000006114c0 R15: 00007ffc72182aa0 [ 69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47 20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d 55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00 [ 69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP: ffffc90001567918 [ 69.954044] CR2: 0000000000000008 [ 69.954044] ---[ end trace 03e9eec4a82c238b ]--- Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configsTejun Heo
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of @node. The assumption seems that if !NUMA, there shouldn't be more than one node and thus reporting cpu_online_mask regardless of @node is correct. However, that assumption was broken years ago to support DISCONTIGMEM and whether a system has multiple nodes or not is separately controlled by NEED_MULTIPLE_NODES. This means that, on a system with !NUMA && NEED_MULTIPLE_NODES, cpumask_of_node() will report cpu_online_mask for all possible nodes, indicating that the CPUs are associated with multiple nodes which is an impossible configuration. This bug has been around forever but doesn't look like it has caused any noticeable symptoms. However, it triggers a WARN recently added to workqueue to verify NUMA affinity configuration. Fix it by reporting empty cpumask on non-zero nodes if !NUMA. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28IB/rxe: Handle NETDEV_CHANGE eventsAndrew Boyer
Without this fix, ports configured on top of ixgbe miss link up notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even though the port is up and working. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28IB/rxe: Avoid ICRC errors by copying into the skb firstAndrew Boyer
The current process is to first calculate the CRC and then copy the client data into the packet. This leaves a window in which the packet contents and CRC can get out of sync, if the client changes the data after the CRC is calculated but before the data is copied. By copying the data into the packet and then calculating the CRC directly from the packet contents we eliminate the window. This can be seen with qperf's ud_bi_bw test. This seems like very strange/reckless client behavior, but whether the client has mangled its data or not RXE should be able to transfer it reliably. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>