summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-07-22packet: propagate sock_cmsg_send() errorSoheil Hassas Yeganeh
sock_cmsg_send() can return different error codes and not only -EINVAL, and we should properly propagate them. Fixes: c14ac9451c34 ("sock: enable timestamping using control messages") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22Merge branch 'macsec-gro'David S. Miller
Paolo Abeni says: ==================== macsec: enable s/w offloads This patches leverage gro_cells infrastructure to enable both GRO and RPS on macsec devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22macsec: enable GRO and RPS on macsec devicesPaolo Abeni
Use gro_gells to trigger GRO and allow RPS on macsec traffic after decryption. Also, be sure to avoid clearing software offload features in macsec_fix_features(). Overall this increase TCP tput by 30% on recent h/w. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22gro_cells: gro_cells_receive now return error codePaolo Abeni
so that the caller can update stats accordingly, if needed Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22Merge branch 'xfs-4.8-misc-fixes-4' into for-nextDave Chinner
2016-07-22xfs: remove EXPERIMENTAL tag from sparse inode featureDave Chinner
Been around for long enough now, hasn't caused any regression test failures in the past 3 months, so it's time to make it a fully supported feature. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-22xfs: bufferhead chains are invalid after end_page_writebackDave Chinner
In xfs_finish_page_writeback(), we have a loop that looks like this: do { if (off < bvec->bv_offset) goto next_bh; if (off > end) break; bh->b_end_io(bh, !error); next_bh: off += bh->b_size; } while ((bh = bh->b_this_page) != head); The b_end_io function is end_buffer_async_write(), which will call end_page_writeback() once all the buffers have marked as no longer under IO. This issue here is that the only thing currently protecting both the bufferhead chain and the page from being reclaimed is the PageWriteback state held on the page. While we attempt to limit the loop to just the buffers covered by the IO, we still read from the buffer size and follow the next pointer in the bufferhead chain. There is no guarantee that either of these are valid after the PageWriteback flag has been cleared. Hence, loops like this are completely unsafe, and result in use-after-free issues. One such problem was caught by Calvin Owens with KASAN: ..... INFO: Freed in 0x103fc80ec age=18446651500051355200 cpu=2165122683 pid=-1 free_buffer_head+0x41/0x90 __slab_free+0x1ed/0x340 kmem_cache_free+0x270/0x300 free_buffer_head+0x41/0x90 try_to_free_buffers+0x171/0x240 xfs_vm_releasepage+0xcb/0x3b0 try_to_release_page+0x106/0x190 shrink_page_list+0x118e/0x1a10 shrink_inactive_list+0x42c/0xdf0 shrink_zone_memcg+0xa09/0xfa0 shrink_zone+0x2c3/0xbc0 ..... Call Trace: <IRQ> [<ffffffff81e8b8e4>] dump_stack+0x68/0x94 [<ffffffff8153a995>] print_trailer+0x115/0x1a0 [<ffffffff81541174>] object_err+0x34/0x40 [<ffffffff815436e7>] kasan_report_error+0x217/0x530 [<ffffffff81543b33>] __asan_report_load8_noabort+0x43/0x50 [<ffffffff819d651f>] xfs_destroy_ioend+0x3bf/0x4c0 [<ffffffff819d69d4>] xfs_end_bio+0x154/0x220 [<ffffffff81de0c58>] bio_endio+0x158/0x1b0 [<ffffffff81dff61b>] blk_update_request+0x18b/0xb80 [<ffffffff821baf57>] scsi_end_request+0x97/0x5a0 [<ffffffff821c5558>] scsi_io_completion+0x438/0x1690 [<ffffffff821a8d95>] scsi_finish_command+0x375/0x4e0 [<ffffffff821c3940>] scsi_softirq_done+0x280/0x340 Where the access is occuring during IO completion after the buffer had been freed from direct memory reclaim. Prevent use-after-free accidents in this end_io processing loop by pre-calculating the loop conditionals before calling bh->b_end_io(). The loop is already limited to just the bufferheads covered by the IO in progress, so the offset checks are sufficient to prevent accessing buffers in the chain after end_page_writeback() has been called by the the bh->b_end_io() callout. Yet another example of why Bufferheads Must Die. cc: <stable@vger.kernel.org> # 4.7 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reported-and-Tested-by: Calvin Owens <calvinowens@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-22xfs: allocate log vector buffers outside CIL context lockDave Chinner
One of the problems we currently have with delayed logging is that under serious memory pressure we can deadlock memory reclaim. THis occurs when memory reclaim (such as run by kswapd) is reclaiming XFS inodes and issues a log force to unpin inodes that are dirty in the CIL. The CIL is pushed, but this will only occur once it gets the CIL context lock to ensure that all committing transactions are complete and no new transactions start being committed to the CIL while the push switches to a new context. The deadlock occurs when the CIL context lock is held by a committing process that is doing memory allocation for log vector buffers, and that allocation is then blocked on memory reclaim making progress. Memory reclaim, however, is blocked waiting for a log force to make progress, and so we effectively deadlock at this point. To solve this problem, we have to move the CIL log vector buffer allocation outside of the context lock so that memory reclaim can always make progress when it needs to force the log. The problem with doing this is that a CIL push can take place while we are determining if we need to allocate a new log vector buffer for an item and hence the current log vector may go away without warning. That means we canot rely on the existing log vector being present when we finally grab the context lock and so we must have a replacement buffer ready to go at all times. To ensure this, introduce a "shadow log vector" buffer that is always guaranteed to be present when we gain the CIL context lock and format the item. This shadow buffer may or may not be used during the formatting, but if the log item does not have an existing log vector buffer or that buffer is too small for the new modifications, we swap it for the new shadow buffer and format the modifications into that new log vector buffer. The result of this is that for any object we modify more than once in a given CIL checkpoint, we double the memory required to track dirty regions in the log. For single modifications then we consume the shadow log vectorwe allocate on commit, and that gets consumed by the checkpoint. However, if we make multiple modifications, then the second transaction commit will allocate a shadow log vector and hence we will end up with double the memory usage as only one of the log vectors is consumed by the CIL checkpoint. The remaining shadow vector will be freed when th elog item is freed. This can probably be optimised in future - access to the shadow log vector is serialised by the object lock (as opposited to the active log vector, which is controlled by the CIL context lock) and so we can probably free shadow log vector from some objects when the log item is marked clean on removal from the AIL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-22libxfs: directory node splitting does not have an extra blockDave Chinner
xfsprogs source commit 4280e59dcbc4cd8e01585efe788a68eb378048e8 xfs_da3_split() has to handle all three versions of the directory/attribute btree structure. The attr tree is v1, the dir tre is v2 or v3. The main difference between the v1 and v2/3 trees is the way tree nodes are split - in the v1 tree we can require a double split to occur because the object to be inserted may be larger than the space made by splitting a leaf. In this case we need to do a double split - one to split the full leaf, then another to allocate an empty leaf block in the correct location for the new entry. This does not happen with dir (v2/v3) formats as the objects being inserted are always guaranteed to fit into the new space in the split blocks. Indeed, for directories they *may* be an extra block on this buffer pointer. However, it's guaranteed not to be a leaf block (i.e. a directory data block) - the directory code only ever places hash index or free space blocks in this pointer (as a cursor of sorts), and so to use it as a directory data block will immediately corrupt the directory. The problem is that the code assumes that there may be extra blocks that we need to link into the tree once we've split the root, but this is not true for either dir or attr trees, because the extra attr block is always consumed by the last node split before we split the root. Hence the linking in an extra block is always wrong at the root split level, and this manifests itself in repair as a directory corruption in a repaired directory, leaving the directory rebuild incomplete. This is a dir v2 zero-day bug - it was in the initial dir v2 commit that was made back in February 1998. Fix this by ensuring the linking of the blocks after the root split never tries to make use of the extra blocks that may be held in the cursor. They are held there for other purposes and should never be touched by the root splitting code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-22xfs: remove dax code from object file when disabledArnd Bergmann
We check IS_DAX(inode) before calling either xfs_file_dax_read or xfs_file_dax_write, and this will lead the call being optimized out at compile time when CONFIG_FS_DAX is disabled. However, the two functions are marked STATIC, so they become global symbols when CONFIG_XFS_DEBUG is set, leaving us with two unused global functions that call into an undefined function and a broken "allmodconfig" build: fs/built-in.o: In function `xfs_file_dax_read': fs/xfs/xfs_file.c:348: undefined reference to `dax_do_io' fs/built-in.o: In function `xfs_file_dax_write': fs/xfs/xfs_file.c:758: undefined reference to `dax_do_io' Marking the two functions 'static noinline' instead of 'STATIC' will let the compiler drop the symbols when there are no callers but avoid the implicit inlining. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 16d4d43595b4 ("xfs: split direct I/O and DAX path") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-22xfs: skip dirty pages in ->releasepage()Brian Foster
XFS has had scattered reports of delalloc blocks present at ->releasepage() time. This results in a warning with a stack trace similar to the following: ... Call Trace: [<ffffffffa23c5b8f>] dump_stack+0x63/0x84 [<ffffffffa20837a7>] warn_slowpath_common+0x97/0xe0 [<ffffffffa208380a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa2326caf>] xfs_vm_releasepage+0x10f/0x140 [<ffffffffa218c680>] ? page_mkclean_one+0xd0/0xd0 [<ffffffffa218d3a0>] ? anon_vma_prepare+0x150/0x150 [<ffffffffa21521c2>] try_to_release_page+0x32/0x50 [<ffffffffa2166b2e>] shrink_active_list+0x3ce/0x3e0 [<ffffffffa21671c7>] shrink_lruvec+0x687/0x7d0 [<ffffffffa21673ec>] shrink_zone+0xdc/0x2c0 [<ffffffffa2168539>] kswapd+0x4f9/0x970 [<ffffffffa2168040>] ? mem_cgroup_shrink_node_zone+0x1a0/0x1a0 [<ffffffffa20a0d99>] kthread+0xc9/0xe0 [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 [<ffffffffa26b404f>] ret_from_fork+0x3f/0x70 [<ffffffffa20a0cd0>] ? kthread_stop+0x100/0x100 This occurs because it is possible for shrink_active_list() to send pages marked dirty to ->releasepage() when certain buffer_head threshold conditions are met. shrink_active_list() doesn't check the page dirty state apparently to handle an old ext3 corner case where in some cases clean pages would not have the dirty bit cleared, thus it is up to the filesystem to determine how to handle the page. XFS currently handles the delalloc case properly, but this behavior makes the warning spurious. Update the XFS ->releasepage() handler to explicitly skip dirty pages. Retain the existing delalloc/unwritten checks so we continue to warn if such buffers exist on clean pages when they shouldn't. Diagnosed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-21Documentation: dtb: xgene: Add hwmon dts binding documentationhotran
This patch adds the APM X-Gene hwmon device tree node documentation. Signed-off-by: Hoan Tran <hotran@apm.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-07-21cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()Viresh Kumar
The handlers provided by cpufreq core are sufficient for resolving the frequency for drivers providing ->target_index(), as the core already has the frequency table and so ->resolve_freq() isn't required for such platforms. This patch disallows drivers with ->target_index() callback to use the ->resolve_freq() callback. Also, it fixes a potential kernel crash for drivers providing ->target() but no ->resolve_freq(). Fixes: e3c062360870 "cpufreq: add cpufreq_driver_resolve_freq()" Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21ACPI: enable ACPI_PROCESSOR_IDLE on ARM64Sudeep Holla
Now that ACPI processor idle driver supports LPI(Low Power Idle), lets enable ACPI_PROCESSOR_IDLE for ARM64 too. This patch just removes the IA64 and X86 dependency on ACPI_PROCESSOR_IDLE Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21arm64: add support for ACPI Low Power Idle(LPI)Sudeep Holla
This patch adds appropriate callbacks to support ACPI Low Power Idle (LPI) on ARM64. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21drivers: firmware: psci: initialise idle states using ACPI LPISudeep Holla
This patch adds support for initialisation of PSCI CPUIdle states from Low Power Idle(_LPI) entries in the ACPI tables when acpi is enabled. Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}Sudeep Holla
The function arm_enter_idle_state is exactly the same in both generic ARM{32,64} CPUIdle driver and will be the same even on ARM64 backend for ACPI processor idle driver. So we can unify it and move it to a common place by introducing CPU_PM_CPU_IDLE_ENTER macro that can be used in all places avoiding duplication. This is in preparation of reuse of the generic cpuidle entry function for ACPI LPI support on ARM64. Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21arm64: cpuidle: drop __init section marker to arm_cpuidle_initSudeep Holla
Commit ea389daa7fd9 (arm64: cpuidle: add __init section marker to arm_cpuidle_init) added the __init annotation to arm_cpuidle_init as it was not needed after booting which was correct at that time. However with the introduction of ACPI LPI support, this will be used from cpuhotplug path in ACPI processor driver. This patch drops the __init annotation from arm_cpuidle_init to avoid the following warning: WARNING: vmlinux.o(.text+0x113c8): Section mismatch in reference from the function acpi_processor_ffh_lpi_probe() to the function .init.text:arm_cpuidle_init() The function acpi_processor_ffh_lpi_probe() references the function __init arm_cpuidle_init(). This is often because acpi_processor_ffh_lpi_probe() lacks a __init annotation or the annotation of arm_cpuidle_init is wrong. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21ACPI / processor_idle: Add support for Low Power Idle(LPI) statesSudeep Holla
ACPI 6.0 introduced an optional object _LPI that provides an alternate method to describe Low Power Idle states. It defines the local power states for each node in a hierarchical processor topology. The OSPM can use _LPI object to select a local power state for each level of processor hierarchy in the system. They used to produce a composite power state request that is presented to the platform by the OSPM. Since multiple processors affect the idle state for any non-leaf hierarchy node, coordination of idle state requests between the processors is required. ACPI supports two different coordination schemes: Platform coordinated and OS initiated. This patch adds initial support for Platform coordination scheme of LPI. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATESudeep Holla
ACPI 6.0 adds a new method to specify the CPU idle states(C-states) called Low Power Idle(LPI) states. Since new architectures like ARM64 use only LPIs, introduce ACPI_PROCESSOR_CSTATE to encapsulate all the code supporting the old style C-states(_CST). This patch will help to extend the processor_idle module to support LPI. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21PCI / PM: check all fields in pci_set_platform_pm()Andy Shevchenko
When assign new PCI platform PM operations check for all mandatory fields to prevent NULL pointer dereference. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21cpufreq: acpi-cpufreq: use cached frequency mapping when possibleSteve Muckle
A call to cpufreq_driver_resolve_freq will cache the mapping from the desired target frequency to the frequency table index. If there is a mapping for the desired target frequency then use it instead of looking up the mapping again. Signed-off-by: Steve Muckle <smuckle@linaro.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21cpufreq: schedutil: map raw required frequency to driver frequencySteve Muckle
The slow-path frequency transition path is relatively expensive as it requires waking up a thread to do work. Should support be added for remote CPU cpufreq updates that is also expensive since it requires an IPI. These activities should be avoided if they are not necessary. To that end, calculate the actual driver-supported frequency required by the new utilization value in schedutil by using the recently added cpufreq_driver_resolve_freq API. If it is the same as the previously requested driver frequency then there is no need to continue with the update assuming the cpu frequency limits have not changed. This will have additional benefits should the semantics of the rate limit be changed to apply solely to frequency transitions rather than to frequency calculations in schedutil. The last raw required frequency is cached. This allows the driver frequency lookup to be skipped in the event that the new raw required frequency matches the last one, assuming a frequency update has not been forced due to limits changing (indicated by a next_freq value of UINT_MAX, see sugov_should_update_freq). Signed-off-by: Steve Muckle <smuckle@linaro.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21GFS2: Fix gfs2_replay_incr_blk for multiple journal sizesBob Peterson
Before this patch, if you used gfs2_jadd to add new journals of a size smaller than the existing journals, replaying those new journals would withdraw. That's because function gfs2_replay_incr_blk was using the number of journal blocks (jd_block) from the superblock's journal pointer. In other words, "My journal's max size" rather than "the journal we're replaying's size." This patch changes the function to use the size of the pertinent journal rather than always using the journal we happen to be using. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-07-21Merge branch 'for-next/kprobes' into for-next/coreCatalin Marinas
* kprobes: arm64: kprobes: Add KASAN instrumentation around stack accesses arm64: kprobes: Cleanup jprobe_return arm64: kprobes: Fix overflow when saving stack arm64: kprobes: WARN if attempting to step with PSTATE.D=1 kprobes: Add arm64 case in kprobe example module arm64: Add kernel return probes support (kretprobes) arm64: Add trampoline code for kretprobes arm64: kprobes instruction simulation support arm64: Treat all entry code as non-kprobe-able arm64: Blacklist non-kprobe-able symbol arm64: Kprobes with single stepping support arm64: add conditional instruction simulation support arm64: Add more test functions to insn.c arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature
2016-07-21x86/fpu: Do not BUG_ON() in early FPU codeDave Hansen
I don't think it is really possible to have a system where CPUID enumerates support for XSAVE but that it does not have FP/SSE (they are "legacy" features and always present). But, I did manage to hit this case in qemu when I enabled its somewhat shaky XSAVE support. The bummer is that the FPU is set up before we parse the command-line or have *any* console support including earlyprintk. That turned what should have been an easy thing to debug in to a bit more of an odyssey. So a BUG() here is worthless. All it does it guarantee that if/when we hit this case we have an empty console. So, remove the BUG() and try to limp along by disabling XSAVE and trying to continue. Add a comment on why we are doing this, and also add a common "out_disable" path for leaving fpu__init_system_xstate(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160720194551.63BB2B58@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-21arm64: Honor nosmp kernel command line optionSuzuki K Poulose
Passing "nosmp" should boot the kernel with a single processor, without provision to enable secondary CPUs even if they are present. "nosmp" is implemented by setting maxcpus=0. At the moment we still mark the secondary CPUs present even with nosmp, which allows the userspace to bring them up. This patch corrects the smp_prepare_cpus() to honor the maxcpus == 0. Commit 44dbcc93ab67145 ("arm64: Fix behavior of maxcpus=N") fixed the behavior for maxcpus >= 1, but broke maxcpus = 0. Fixes: 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N") Cc: <stable@vger.kernel.org> # 4.7+ Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: updated code comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21arm64: Fix incorrect per-cpu usage for boot CPUSuzuki K Poulose
In smp_prepare_boot_cpu(), we invoke cpuinfo_store_boot_cpu to store the cpuinfo in a per-cpu ptr, before initialising the per-cpu offset for the boot CPU. This patch reorders the sequence to make sure we initialise the per-cpu offset before accessing the per-cpu area. Commit 4b998ff1885eec ("arm64: Delay cpuinfo_store_boot_cpu") fixed the issue where we modified the per-cpu area even before the kernel initialises the per-cpu areas, but failed to wait until the boot cpu updated it's offset. Fixes: 4b998ff1885e ("arm64: Delay cpuinfo_store_boot_cpu") Cc: <stable@vger.kernel.org> # 4.4+ Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21cpufreq: add cpufreq_driver_resolve_freq()Steve Muckle
Cpufreq governors may need to know what a particular target frequency maps to in the driver without necessarily wanting to set the frequency. Support this operation via a new cpufreq API, cpufreq_driver_resolve_freq(). This API returns the lowest driver frequency equal or greater than the target frequency (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver limitations. The mapping is also cached in the policy so that a subsequent fast_switch operation can avoid repeating the same lookup. The API will call a new cpufreq driver callback, resolve_freq(), if it has been registered by the driver. Otherwise the frequency is resolved via cpufreq_frequency_table_target(). Rather than require ->target() style drivers to provide a resolve_freq() callback it is left to the caller to ensure that the driver implements this callback if necessary to use cpufreq_driver_resolve_freq(). Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Steve Muckle <smuckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21perf tools: Add AVX-512 instructions to the new instructions testAdrian Hunter
Previous patches added support for Intel's AVX-512 instructions to the kernel and perf tools instruction decoders. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). Add a representative set of instructions to perf's "new instructions" test. e.g. perf test "new instructions" Or to view a particular instruction: perf test -v "new instructions" 2>&1 | grep vbroadcasti64x4 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21perf tools: Add AVX-512 support to the instruction decoder used by Intel PTAdrian Hunter
Add support for Intel's AVX-512 instructions to perf tools instruction decoder used by Intel PT. The kernel's instruction decoder was updated in a previous patch. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). AVX-512 instructions are identified by a EVEX prefix which, for the purpose of instruction decoding, can be treated as though it were a 4-byte VEX prefix. Existing instructions which can now accept an EVEX prefix need not be further annotated in the op code map (x86-opcode-map.txt). In the case of new instructions, the op code map is updated accordingly. Also add associated Mask Instructions that are used to manipulate mask registers used in AVX-512 instructions. A representative set of instructions is added to the perf tools new instructions test in a subsequent patch. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21x86/insn: Add AVX-512 support to the instruction decoderAdrian Hunter
Add support for Intel's AVX-512 instructions to the instruction decoder. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). AVX-512 instructions are identified by a EVEX prefix which, for the purpose of instruction decoding, can be treated as though it were a 4-byte VEX prefix. Existing instructions which can now accept an EVEX prefix need not be further annotated in the op code map (x86-opcode-map.txt). In the case of new instructions, the op code map is updated accordingly. Also add associated Mask Instructions that are used to manipulate mask registers used in AVX-512 instructions. The 'perf tools' instruction decoder is updated in a subsequent patch. And a representative set of instructions is added to the perf tools new instructions test in a subsequent patch. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPTSrinivas Pandruvada
The MSR MSR_HWP_INTERRUPT is valid only when CPUID.06H:EAX[8] = 1, so check for feature before accessing this MSR. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21intel_pstate: Update cpu_frequency tracepoint every timeRafael J. Wysocki
Currently, intel_pstate only updates the cpu_frequency tracepoint if the new P-state to set is different from the current one, but that causes powertop to report 100% idle on an 100% loaded system sometimes. Prevent that from happening by updating the cpu_frequency tracepoint every time intel_pstate_update_pstate() is called. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>-
2016-07-21cpufreq: intel_pstate: clean remnant struct elementCarsten Emde
When I was working with the Intel P state driver I came across a remnant struct element that is no longer needed after the function intel_pstate_calc_freq() was retired. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21ACPI / DPTF: move int340x_thermal.c to the DPTF folderSrinivas Pandruvada
Since DPTF has its own folder under ACPI, move this file also there. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21ACPI / DPTF: Add DPTF power participant driverSrinivas Pandruvada
This driver adds support for Dynamic Platform and Thermal Framework (DPTF) Platform Power Participant device (INT3407) support. This participant is responsible for exposing platform telemetry such as: max_platform_power platform_power_source adapter_rating battery_steady_power charger_type These attributes are presented via sysfs interface under the INT3407 platform device: $ls /sys/bus/platform/devices/INT3407\:00/dptf_power/ adapter_rating_mw battery_steady_power_mw charger_type max_platform_power_mw platform_power_source ` ACPI methods description used in this driver: PMAX: Maximum platform power that can be supported by the battery in mW. PSRC: System charge source, 0x00 = DC 0x01 = AC 0x02 = USB 0x03 = Wireless Charger ARTG: Adapter rating in mW (Maximum Adapter power) Must be 0 if no AC adapter is plugged in. CTYP: Charger Type, Traditional : 0x01 Hybrid: 0x02 NVDC: 0x03 PBSS: Returns max sustained power for battery in milliWatts. The INT3407 also contains _BTS and _BIX objects, which are compliant to ACPI 5.0, specification. Those objects are already used by ACPI battery (PNP0C0A) driver and information about them is exported via Linux power supply class registration. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21arm64: kprobes: Add KASAN instrumentation around stack accessesCatalin Marinas
This patch disables KASAN around the memcpy from/to the kernel or IRQ stacks to avoid warnings like below: BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0xe4/0x170 at addr ffff800935cbbbc0 Read of size 128 by task swapper/0/1 page:ffff7e0024d72ec0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x1000000000000000() page dumped because: kasan: bad access detected CPU: 4 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc4+ #1 Hardware name: ARM Juno development board (r0) (DT) Call trace: [<ffff20000808ad88>] dump_backtrace+0x0/0x280 [<ffff20000808b01c>] show_stack+0x14/0x20 [<ffff200008563a64>] dump_stack+0xa4/0xc8 [<ffff20000824a1fc>] kasan_report_error+0x4fc/0x528 [<ffff20000824a5e8>] kasan_report+0x40/0x48 [<ffff20000824948c>] check_memory_region+0x144/0x1a0 [<ffff200008249814>] memcpy+0x34/0x68 [<ffff200008c3ee2c>] setjmp_pre_handler+0xe4/0x170 [<ffff200008c3ec5c>] kprobe_breakpoint_handler+0xec/0x1d8 [<ffff2000080853a4>] brk_handler+0x5c/0xa0 [<ffff2000080813f0>] do_debug_exception+0xa0/0x138 Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21arm64: kprobes: Cleanup jprobe_returnMarc Zyngier
jprobe_return seems to have aged badly. Comments referring to non-existent behaviours, and a dangerous habit of messing with registers without telling the compiler. This patches applies the following remedies: - Fix the comments to describe the actual behaviour - Tidy up the asm sequence to directly assign the stack pointer without clobbering extra registers - Mark the rest of the function as unreachable() so that the compiler knows that there is no need for an epilogue - Stop making jprobe_return_break a global function (you really don't want to call that guy, and it isn't even a function). Tested with tcp_probe. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21x86/boot: Reorganize and clean up the BIOS area reservation codeIngo Molnar
So the reserve_ebda_region() code has accumulated a number of problems over the years that make it really difficult to read and understand: - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks... - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it super confusing to read. - It does not help at all that we have various memory range markers, half of which are 'start of range', half of which are 'end of range' - but this crucial property is not obvious in the naming at all ... gave me a headache trying to understand all this. - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is obvious, all values here are addresses!), while it does not highlight that it's the _start_ of the EBDA region ... - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value that is a pointer to a value, not a memory range address! - The function name itself is a misnomer: it says 'reserve_ebda_region()' while its main purpose is to reserve all the firmware ROM typically between 640K and 1MB, while the 'EBDA' part is only a small part of that ... - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this too should be about whether to reserve firmware areas in the paravirt case. - In fact thinking about this as 'end of RAM' is confusing: what this function *really* wants to reserve is firmware data and code areas! Once the thinking is inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure 'reserved area' notion everything becomes a lot clearer. To improve all this rewrite the whole code (without changing the logic): - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start' and propagate this concept through all the variable names and constants. BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES BIOS_START_MIN // was: INSANE_CUTOFF ebda_start // was: ebda_addr bios_start // was: lowmem BIOS_START_MAX // was: LOWMEM_CAP - Then clean up the name of the function itself by renaming it to reserve_bios_regions() and renaming the ::ebda_search paravirt flag to ::reserve_bios_regions. - Fix up all the comments (fix typos), harmonize and simplify their formulation and remove comments that become unnecessary due to the much better naming all around. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-20Merge tag 'nfc-next-4.8-1' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.8 pull request This is the first NFC pull request for 4.8. We have: - A fairly large NFC digital stack patchset: * RTOX fixes. * Proper DEP RWT support. * ACK and NACK PDUs handling fixes, in both initiator and target modes. * A few memory leak fixes. - A conversion of the nfcsim driver to use the digital stack. The driver supports the DEP protocol in both NFC-A and NFC-F. - Error injection through debugfs for the nfcsim driver. - Improvements to the port100 driver for the Sony USB chipset, in particular to the command abort and cancellation code paths. - A few minor fixes for the pn533, trf7970a and fdp drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20samples: Add an IPv6 '-6' option to the pktgen scriptsMartin KaFai Lau
Add a '-6' option to the sample pktgen scripts for sending out IPv6 packets. [root@kerneldev010.prn1 ~/pktgen]# ./pktgen_sample03_burst_single_flow.sh -i eth0 -s 64 -d fe80::f652:14ff:fec2:a14c -m f4:52:14:c2:a1:4c -b 32 -6 [root@kerneldev011.prn1 ~]# tcpdump -i eth0 -nn -c3 port 9 tcpdump: WARNING: eth0: no IPv4 address assigned tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 14:38:51.815297 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16 14:38:51.815311 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16 14:38:51.815313 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20Merge branch 'xdp-cleanups'David S. Miller
Brenden Blanco says: ==================== misc cleanups for xdp This addresses several of the non-blocking comments left over from the xdp patch set. See individual patches for details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20bpf: make xdp sample variable names more meaningfulBrenden Blanco
The naming choice of index is not terribly descriptive, and dropcnt is in fact incorrect for xdp2. Pick better names for these: ipproto and rxcnt. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20rtnl: protect do_setlink from IFLA_XDP_ATTACHEDBrenden Blanco
The IFLA_XDP_ATTACHED nested attribute is meant for read-only, and while do_setlink properly ignores it, it should be more paranoid and reject commands that try to set it. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20net/mlx4_en: use READ_ONCE when freeing xdp_progBrenden Blanco
For consistency, and in order to hint at the synchronous nature of the xdp_prog field, use READ_ONCE in the destroy path of the ring. All occurrences should now use either READ_ONCE or xchg. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2016-07-20 This series contains updates to fm10k only. Ngai-Mint provides a fix to clear PCIE_GMBX bits to ensure the proper functioning of the mailbox global interrupt after a data path reset. Jake provides most of the patches in the series, starting with a early return from fm10k_down() if we are already down to prevent conflict with other threads. Fixed an issue where fm10k_update_stats() could cause a null pointer dereference, specifically if it is called when we are going down and the rings have been removed. Cleans up and fixes the data path reset flow, Tx hang routine and stop_hw(). Re-worked the fm10k_reinit() to be more maintainable and fixed several inconsistencies with the work flow. Implemented fm10k_prepare_suspend() and fm10k_handle_resume() which abstract around the now existing fm10k_prepare_for_reset and fm10k_handle_reset. The new functions also handle stopping the service task, which is something that the original re-init flow does not need. Fixed an issue where if an FLR occurs, VF devices will be knocked out of bus master mode, and the driver will be unable to recover from the reset properly, so ensure bus master is enabled after every reset. Fixed an issue where a reset will occur as if for no reason, regularly every few minutes until the switch manager software is loaded, which is caused by continuously requesting the lport map so only do the request after we have verified the switch mailbox is tx_ready. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Herbert Xu
Merge the crypto tree to resolve conflict in qat Makefile.
2016-07-21crypto: qat - make qat_asym_algs.o depend on asn1 headersJan Stancek
Parallel build can sporadically fail because asn1 headers may not be built yet by the time qat_asym_algs.o is compiled: drivers/crypto/qat/qat_common/qat_asym_algs.c:55:32: fatal error: qat_rsapubkey-asn1.h: No such file or directory #include "qat_rsapubkey-asn1.h" Cc: stable@vger.kernel.org Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-20Merge branch 'mv88r6xxx-eeprom-rework'David S. Miller
Vivien Didelot says: ==================== net: dsa: mv88e6xxx: rework EEPROM code Some switches can access an optional external EEPROM via its registers. The 88E6352 family of switches have 8-bit address / 16-bit data access. The new 88E6390 family has 16-bit address / 8-bit data access. This patchset cleans up the EEPROM code with 16-suffixed Global2 helpers and makes it easy to add future support for 8-bit data EEPROM access. It also removes unnecessary mutexes and a few locked access functions. Changes in v2: - add missing Signed-off-by tag ==================== Signed-off-by: David S. Miller <davem@davemloft.net>