linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-07-21	Merge branch 'for-next/kprobes' into for-next/core	Catalin Marinas
	* kprobes: arm64: kprobes: Add KASAN instrumentation around stack accesses arm64: kprobes: Cleanup jprobe_return arm64: kprobes: Fix overflow when saving stack arm64: kprobes: WARN if attempting to step with PSTATE.D=1 kprobes: Add arm64 case in kprobe example module arm64: Add kernel return probes support (kretprobes) arm64: Add trampoline code for kretprobes arm64: kprobes instruction simulation support arm64: Treat all entry code as non-kprobe-able arm64: Blacklist non-kprobe-able symbol arm64: Kprobes with single stepping support arm64: add conditional instruction simulation support arm64: Add more test functions to insn.c arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature
2016-07-21	x86/fpu: Do not BUG_ON() in early FPU code	Dave Hansen
	I don't think it is really possible to have a system where CPUID enumerates support for XSAVE but that it does not have FP/SSE (they are "legacy" features and always present). But, I did manage to hit this case in qemu when I enabled its somewhat shaky XSAVE support. The bummer is that the FPU is set up before we parse the command-line or have any console support including earlyprintk. That turned what should have been an easy thing to debug in to a bit more of an odyssey. So a BUG() here is worthless. All it does it guarantee that if/when we hit this case we have an empty console. So, remove the BUG() and try to limp along by disabling XSAVE and trying to continue. Add a comment on why we are doing this, and also add a common "out_disable" path for leaving fpu__init_system_xstate(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160720194551.63BB2B58@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-21	arm64: Honor nosmp kernel command line option	Suzuki K Poulose
	Passing "nosmp" should boot the kernel with a single processor, without provision to enable secondary CPUs even if they are present. "nosmp" is implemented by setting maxcpus=0. At the moment we still mark the secondary CPUs present even with nosmp, which allows the userspace to bring them up. This patch corrects the smp_prepare_cpus() to honor the maxcpus == 0. Commit 44dbcc93ab67145 ("arm64: Fix behavior of maxcpus=N") fixed the behavior for maxcpus >= 1, but broke maxcpus = 0. Fixes: 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N") Cc: <stable@vger.kernel.org> # 4.7+ Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> [catalin.marinas@arm.com: updated code comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21	arm64: Fix incorrect per-cpu usage for boot CPU	Suzuki K Poulose
	In smp_prepare_boot_cpu(), we invoke cpuinfo_store_boot_cpu to store the cpuinfo in a per-cpu ptr, before initialising the per-cpu offset for the boot CPU. This patch reorders the sequence to make sure we initialise the per-cpu offset before accessing the per-cpu area. Commit 4b998ff1885eec ("arm64: Delay cpuinfo_store_boot_cpu") fixed the issue where we modified the per-cpu area even before the kernel initialises the per-cpu areas, but failed to wait until the boot cpu updated it's offset. Fixes: 4b998ff1885e ("arm64: Delay cpuinfo_store_boot_cpu") Cc: <stable@vger.kernel.org> # 4.4+ Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21	cpufreq: add cpufreq_driver_resolve_freq()	Steve Muckle
	Cpufreq governors may need to know what a particular target frequency maps to in the driver without necessarily wanting to set the frequency. Support this operation via a new cpufreq API, cpufreq_driver_resolve_freq(). This API returns the lowest driver frequency equal or greater than the target frequency (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver limitations. The mapping is also cached in the policy so that a subsequent fast_switch operation can avoid repeating the same lookup. The API will call a new cpufreq driver callback, resolve_freq(), if it has been registered by the driver. Otherwise the frequency is resolved via cpufreq_frequency_table_target(). Rather than require ->target() style drivers to provide a resolve_freq() callback it is left to the caller to ensure that the driver implements this callback if necessary to use cpufreq_driver_resolve_freq(). Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Steve Muckle <smuckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21	perf tools: Add AVX-512 instructions to the new instructions test	Adrian Hunter
	Previous patches added support for Intel's AVX-512 instructions to the kernel and perf tools instruction decoders. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). Add a representative set of instructions to perf's "new instructions" test. e.g. perf test "new instructions" Or to view a particular instruction: perf test -v "new instructions" 2>&1 \| grep vbroadcasti64x4 Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21	perf tools: Add AVX-512 support to the instruction decoder used by Intel PT	Adrian Hunter
	Add support for Intel's AVX-512 instructions to perf tools instruction decoder used by Intel PT. The kernel's instruction decoder was updated in a previous patch. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). AVX-512 instructions are identified by a EVEX prefix which, for the purpose of instruction decoding, can be treated as though it were a 4-byte VEX prefix. Existing instructions which can now accept an EVEX prefix need not be further annotated in the op code map (x86-opcode-map.txt). In the case of new instructions, the op code map is updated accordingly. Also add associated Mask Instructions that are used to manipulate mask registers used in AVX-512 instructions. A representative set of instructions is added to the perf tools new instructions test in a subsequent patch. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21	x86/insn: Add AVX-512 support to the instruction decoder	Adrian Hunter
	Add support for Intel's AVX-512 instructions to the instruction decoder. AVX-512 instructions are documented in Intel Architecture Instruction Set Extensions Programming Reference (February 2016). AVX-512 instructions are identified by a EVEX prefix which, for the purpose of instruction decoding, can be treated as though it were a 4-byte VEX prefix. Existing instructions which can now accept an EVEX prefix need not be further annotated in the op code map (x86-opcode-map.txt). In the case of new instructions, the op code map is updated accordingly. Also add associated Mask Instructions that are used to manipulate mask registers used in AVX-512 instructions. The 'perf tools' instruction decoder is updated in a subsequent patch. And a representative set of instructions is added to the perf tools new instructions test in a subsequent patch. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dan Williams <dan.j.williams@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: http://lkml.kernel.org/r/1469003437-32706-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-21	cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT	Srinivas Pandruvada
	The MSR MSR_HWP_INTERRUPT is valid only when CPUID.06H:EAX[8] = 1, so check for feature before accessing this MSR. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21	intel_pstate: Update cpu_frequency tracepoint every time	Rafael J. Wysocki
	Currently, intel_pstate only updates the cpu_frequency tracepoint if the new P-state to set is different from the current one, but that causes powertop to report 100% idle on an 100% loaded system sometimes. Prevent that from happening by updating the cpu_frequency tracepoint every time intel_pstate_update_pstate() is called. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>-
2016-07-21	cpufreq: intel_pstate: clean remnant struct element	Carsten Emde
	When I was working with the Intel P state driver I came across a remnant struct element that is no longer needed after the function intel_pstate_calc_freq() was retired. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21	ACPI / DPTF: move int340x_thermal.c to the DPTF folder	Srinivas Pandruvada
	Since DPTF has its own folder under ACPI, move this file also there. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21	ACPI / DPTF: Add DPTF power participant driver	Srinivas Pandruvada
	This driver adds support for Dynamic Platform and Thermal Framework (DPTF) Platform Power Participant device (INT3407) support. This participant is responsible for exposing platform telemetry such as: max_platform_power platform_power_source adapter_rating battery_steady_power charger_type These attributes are presented via sysfs interface under the INT3407 platform device: $ls /sys/bus/platform/devices/INT3407\:00/dptf_power/ adapter_rating_mw battery_steady_power_mw charger_type max_platform_power_mw platform_power_source ` ACPI methods description used in this driver: PMAX: Maximum platform power that can be supported by the battery in mW. PSRC: System charge source, 0x00 = DC 0x01 = AC 0x02 = USB 0x03 = Wireless Charger ARTG: Adapter rating in mW (Maximum Adapter power) Must be 0 if no AC adapter is plugged in. CTYP: Charger Type, Traditional : 0x01 Hybrid: 0x02 NVDC: 0x03 PBSS: Returns max sustained power for battery in milliWatts. The INT3407 also contains _BTS and _BIX objects, which are compliant to ACPI 5.0, specification. Those objects are already used by ACPI battery (PNP0C0A) driver and information about them is exported via Linux power supply class registration. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-21	arm64: kprobes: Add KASAN instrumentation around stack accesses	Catalin Marinas
	This patch disables KASAN around the memcpy from/to the kernel or IRQ stacks to avoid warnings like below: BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0xe4/0x170 at addr ffff800935cbbbc0 Read of size 128 by task swapper/0/1 page:ffff7e0024d72ec0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x1000000000000000() page dumped because: kasan: bad access detected CPU: 4 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc4+ #1 Hardware name: ARM Juno development board (r0) (DT) Call trace: [<ffff20000808ad88>] dump_backtrace+0x0/0x280 [<ffff20000808b01c>] show_stack+0x14/0x20 [<ffff200008563a64>] dump_stack+0xa4/0xc8 [<ffff20000824a1fc>] kasan_report_error+0x4fc/0x528 [<ffff20000824a5e8>] kasan_report+0x40/0x48 [<ffff20000824948c>] check_memory_region+0x144/0x1a0 [<ffff200008249814>] memcpy+0x34/0x68 [<ffff200008c3ee2c>] setjmp_pre_handler+0xe4/0x170 [<ffff200008c3ec5c>] kprobe_breakpoint_handler+0xec/0x1d8 [<ffff2000080853a4>] brk_handler+0x5c/0xa0 [<ffff2000080813f0>] do_debug_exception+0xa0/0x138 Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21	arm64: kprobes: Cleanup jprobe_return	Marc Zyngier
	jprobe_return seems to have aged badly. Comments referring to non-existent behaviours, and a dangerous habit of messing with registers without telling the compiler. This patches applies the following remedies: - Fix the comments to describe the actual behaviour - Tidy up the asm sequence to directly assign the stack pointer without clobbering extra registers - Mark the rest of the function as unreachable() so that the compiler knows that there is no need for an epilogue - Stop making jprobe_return_break a global function (you really don't want to call that guy, and it isn't even a function). Tested with tcp_probe. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-07-21	x86/boot: Reorganize and clean up the BIOS area reservation code	Ingo Molnar
	So the reserve_ebda_region() code has accumulated a number of problems over the years that make it really difficult to read and understand: - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks... - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it super confusing to read. - It does not help at all that we have various memory range markers, half of which are 'start of range', half of which are 'end of range' - but this crucial property is not obvious in the naming at all ... gave me a headache trying to understand all this. - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is obvious, all values here are addresses!), while it does not highlight that it's the _start_ of the EBDA region ... - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value that is a pointer to a value, not a memory range address! - The function name itself is a misnomer: it says 'reserve_ebda_region()' while its main purpose is to reserve all the firmware ROM typically between 640K and 1MB, while the 'EBDA' part is only a small part of that ... - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this too should be about whether to reserve firmware areas in the paravirt case. - In fact thinking about this as 'end of RAM' is confusing: what this function really wants to reserve is firmware data and code areas! Once the thinking is inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure 'reserved area' notion everything becomes a lot clearer. To improve all this rewrite the whole code (without changing the logic): - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start' and propagate this concept through all the variable names and constants. BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES BIOS_START_MIN // was: INSANE_CUTOFF ebda_start // was: ebda_addr bios_start // was: lowmem BIOS_START_MAX // was: LOWMEM_CAP - Then clean up the name of the function itself by renaming it to reserve_bios_regions() and renaming the ::ebda_search paravirt flag to ::reserve_bios_regions. - Fix up all the comments (fix typos), harmonize and simplify their formulation and remove comments that become unnecessary due to the much better naming all around. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-20	Merge tag 'nfc-next-4.8-1' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.8 pull request This is the first NFC pull request for 4.8. We have: - A fairly large NFC digital stack patchset: * RTOX fixes. * Proper DEP RWT support. * ACK and NACK PDUs handling fixes, in both initiator and target modes. * A few memory leak fixes. - A conversion of the nfcsim driver to use the digital stack. The driver supports the DEP protocol in both NFC-A and NFC-F. - Error injection through debugfs for the nfcsim driver. - Improvements to the port100 driver for the Sony USB chipset, in particular to the command abort and cancellation code paths. - A few minor fixes for the pn533, trf7970a and fdp drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	samples: Add an IPv6 '-6' option to the pktgen scripts	Martin KaFai Lau
	Add a '-6' option to the sample pktgen scripts for sending out IPv6 packets. [root@kerneldev010.prn1 ~/pktgen]# ./pktgen_sample03_burst_single_flow.sh -i eth0 -s 64 -d fe80::f652:14ff:fec2:a14c -m f4:52:14:c2:a1:4c -b 32 -6 [root@kerneldev011.prn1 ~]# tcpdump -i eth0 -nn -c3 port 9 tcpdump: WARNING: eth0: no IPv4 address assigned tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 14:38:51.815297 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16 14:38:51.815311 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16 14:38:51.815313 IP6 fe80::f652:14ff:fec2:2ad2.9 > fe80::f652:14ff:fec2:a14c.9: UDP, length 16 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	Merge branch 'xdp-cleanups'	David S. Miller
	Brenden Blanco says: ==================== misc cleanups for xdp This addresses several of the non-blocking comments left over from the xdp patch set. See individual patches for details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	bpf: make xdp sample variable names more meaningful	Brenden Blanco
	The naming choice of index is not terribly descriptive, and dropcnt is in fact incorrect for xdp2. Pick better names for these: ipproto and rxcnt. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	rtnl: protect do_setlink from IFLA_XDP_ATTACHED	Brenden Blanco
	The IFLA_XDP_ATTACHED nested attribute is meant for read-only, and while do_setlink properly ignores it, it should be more paranoid and reject commands that try to set it. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	net/mlx4_en: use READ_ONCE when freeing xdp_prog	Brenden Blanco
	For consistency, and in order to hint at the synchronous nature of the xdp_prog field, use READ_ONCE in the destroy path of the ring. All occurrences should now use either READ_ONCE or xchg. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	Merge branch '100GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2016-07-20 This series contains updates to fm10k only. Ngai-Mint provides a fix to clear PCIE_GMBX bits to ensure the proper functioning of the mailbox global interrupt after a data path reset. Jake provides most of the patches in the series, starting with a early return from fm10k_down() if we are already down to prevent conflict with other threads. Fixed an issue where fm10k_update_stats() could cause a null pointer dereference, specifically if it is called when we are going down and the rings have been removed. Cleans up and fixes the data path reset flow, Tx hang routine and stop_hw(). Re-worked the fm10k_reinit() to be more maintainable and fixed several inconsistencies with the work flow. Implemented fm10k_prepare_suspend() and fm10k_handle_resume() which abstract around the now existing fm10k_prepare_for_reset and fm10k_handle_reset. The new functions also handle stopping the service task, which is something that the original re-init flow does not need. Fixed an issue where if an FLR occurs, VF devices will be knocked out of bus master mode, and the driver will be unable to recover from the reset properly, so ensure bus master is enabled after every reset. Fixed an issue where a reset will occur as if for no reason, regularly every few minutes until the switch manager software is loaded, which is caused by continuously requesting the lport map so only do the request after we have verified the switch mailbox is tx_ready. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-21	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Herbert Xu
	Merge the crypto tree to resolve conflict in qat Makefile.
2016-07-21	crypto: qat - make qat_asym_algs.o depend on asn1 headers	Jan Stancek
	Parallel build can sporadically fail because asn1 headers may not be built yet by the time qat_asym_algs.o is compiled: drivers/crypto/qat/qat_common/qat_asym_algs.c:55:32: fatal error: qat_rsapubkey-asn1.h: No such file or directory #include "qat_rsapubkey-asn1.h" Cc: stable@vger.kernel.org Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-20	Merge branch 'mv88r6xxx-eeprom-rework'	David S. Miller
	Vivien Didelot says: ==================== net: dsa: mv88e6xxx: rework EEPROM code Some switches can access an optional external EEPROM via its registers. The 88E6352 family of switches have 8-bit address / 16-bit data access. The new 88E6390 family has 16-bit address / 8-bit data access. This patchset cleans up the EEPROM code with 16-suffixed Global2 helpers and makes it easy to add future support for 8-bit data EEPROM access. It also removes unnecessary mutexes and a few locked access functions. Changes in v2: - add missing Signed-off-by tag ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	net: dsa: mv88e6xxx: kill last locked reg_read	Vivien Didelot
	Get rid of the last usage of the locked mv88e6xxx_reg_read function with a new mv88e6xxx_port_read helper, useful later for chips with different port registers base address. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	net: dsa: mv88e6xxx: rework EEPROM access	Vivien Didelot
	The 6352 family of switches and compatibles provide a 8-bit address and 16-bit data access to an optional EEPROM. Newer chip such as the 6390 family slightly changed the access to 16-bit address and 8-bit data. This commit cleans up the EEPROM access code for 16-bit access and makes it easy to eventually introduce future support for 8-bit access. Here's a list of notable changes brought by this patch: - provide Global2 unlocked helpers for EEPROM commands - remove eeprom_mutex, only reg_lock is necessary for driver functions - eeprom_len is 0 for chip without EEPROM, so return it directly - the Running bit must be 0 before r/w, so wait for Busy and Running - remove now unused mv88e6xxx_wait and mv88e6xxx_reg_write - other than that, the logic (in _{get,set}_eeprom16) didn't change Chips with an 8-bit EEPROM access will require to implement the 8-suffixed variant of G2 helpers and the related flag: #define MV88E6XXX_FLAGS_EEPROM8 \ (MV88E6XXX_FLAG_G2_EEPROM_CMD \| \ MV88E6XXX_FLAG_G2_EEPROM_ADDR) Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	net: dsa: mv88e6xxx: remove unused phy_mutex	Vivien Didelot
	Only reg_lock is necessary now and phy_mutex is dead. Remove it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	net/faraday: Disallow using reversed MAC address from hardware	Gavin Shan
	The initial MAC address is retrieved from hardware if it's not provided by device-tree. The reserved MAC address from hardware will be used if non-reserved MAC address is invalid. It will cause mismatched MAC address seen by hardware and software. This disallows using the reserved hardware MAC address to avoid the mismatched MAC address seen by hardware and software. Fixes: 113ce107afe9 ("net/faraday: Read MAC address from chip") Suggested-by: David Laight <David.Laight@ACULAB.COM> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20	dm: allow bio-based table to be upgraded to bio-based with DAX support	Toshi Kani
	Allow table type DM_TYPE_BIO_BASED to extend with DM_TYPE_DAX_BIO_BASED since DM_TYPE_DAX_BIO_BASED supports bio-based requests. This is needed to allow a snapshot of an LV with DAX support to be removed. One of the intermediate table reloads that lvm2 does switches from DM_TYPE_BIO_BASED to DM_TYPE_DAX_BIO_BASED. No known reason to disallow this so... Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20	dm snap: add fake origin_direct_access	Toshi Kani
	dax-capable mapped-device is marked as DM_TYPE_DAX_BIO_BASED, which supports both dax and bio-based operations. dm-snap needs to work with dax-capable device when bio-based operation is used. Add fake origin_direct_access() to origin device so that its origin device is also marked as DM_TYPE_DAX_BIO_BASED for dax-capable device. This allows to extend target's DM table. dm-snap works normally when bio-based operation is used. dm-snap does not support dax operation, and mount with dax option to a target device or snapshot device fails. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20	dm stripe: add DAX support	Toshi Kani
	Change dm-stripe to implement direct_access function, stripe_direct_access(), which maps bdev and sector and calls direct_access function of its physical target device. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20	dm error: add DAX support	Mike Snitzer
	Allow the error target to replace an existing DAX-enabled target. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20	dm linear: add DAX support	Toshi Kani
	Change dm-linear to implement direct_access function, linear_direct_access(), which maps sector and calls direct_access function of its physical target device. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20	dm: add infrastructure for DAX support	Toshi Kani
	Change mapped device to implement direct_access function, dm_blk_direct_access(), which calls a target direct_access function. 'struct target_type' is extended to have target direct_access interface. This function limits direct accessible size to the dm_target's limit with max_io_len(). Add dm_table_supports_dax() to iterate all targets and associated block devices to check for DAX support. To add DAX support to a DM target the target must only implement the direct_access function. Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that mapped device supports DAX and is bio based. This new type is used to assure that all target devices have DAX support and remain that way after QUEUE_FLAG_DAX is set in mapped device. At initial table load, QUEUE_FLAG_DAX is set to mapped device when setting DM_TYPE_DAX_BIO_BASED to the type. Any subsequent table load to the mapped device must have the same type, or else it fails per the check in table_load(). Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20	Merge remote-tracking branch 'jens/for-4.8/core' into dm-4.8	Mike Snitzer
	DM's DAX support depends on block core's newly added QUEUE_FLAG_DAX.
2016-07-20	block: Fix front merge check	Damien Le Moal
	For a front merge, the maximum number of sectors of the request must be checked against the front merge BIO sector, not the current sector of the request. Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20	block: do not merge requests without consulting with io scheduler	Tahsin Erdogan
	Before merging a bio into an existing request, io scheduler is called to get its approval first. However, the requests that come from a plug flush may get merged by block layer without consulting with io scheduler. In case of CFQ, this can cause fairness problems. For instance, if a request gets merged into a low weight cgroup's request, high weight cgroup now will depend on low weight cgroup to get scheduled. If high weigt cgroup needs that io request to complete before submitting more requests, then it will also lose its timeslice. Following script demonstrates the problem. Group g1 has a low weight, g2 and g3 have equal high weights but g2's requests are adjacent to g1's requests so they are subject to merging. Due to these merges, g2 gets poor disk time allocation. cat > cfq-merge-repro.sh << "EOF" #!/bin/bash set -e IO_ROOT=/mnt-cgroup/io mkdir -p $IO_ROOT if ! mount \| grep -qw $IO_ROOT; then mount -t cgroup none -oblkio $IO_ROOT fi cd $IO_ROOT for i in g1 g2 g3; do if [ -d $i ]; then rmdir $i fi done mkdir g1 && echo 10 > g1/blkio.weight mkdir g2 && echo 495 > g2/blkio.weight mkdir g3 && echo 495 > g3/blkio.weight RUNTIME=10 (echo $BASHPID > g1/cgroup.procs && fio --readonly --name name1 --filename /dev/sdb \ --rw read --size 64k --bs 64k --time_based \ --runtime=$RUNTIME --offset=0k &> /dev/null)& (echo $BASHPID > g2/cgroup.procs && fio --readonly --name name1 --filename /dev/sdb \ --rw read --size 64k --bs 64k --time_based \ --runtime=$RUNTIME --offset=64k &> /dev/null)& (echo $BASHPID > g3/cgroup.procs && fio --readonly --name name1 --filename /dev/sdb \ --rw read --size 64k --bs 64k --time_based \ --runtime=$RUNTIME --offset=256k &> /dev/null)& sleep $((RUNTIME+1)) for i in g1 g2 g3; do echo ---- $i ---- cat $i/blkio.time done EOF # ./cfq-merge-repro.sh ---- g1 ---- 8:16 162 ---- g2 ---- 8:16 165 ---- g3 ---- 8:16 686 After applying the patch: # ./cfq-merge-repro.sh ---- g1 ---- 8:16 90 ---- g2 ---- 8:16 445 ---- g3 ---- 8:16 471 Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20	nvme/pci: Provide SR-IOV support	Keith Busch
	This registers an sr-iov callback for nvme. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20	block: Fix spelling in a source code comment	Bart Van Assche
	Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20	nvme: initialize variable before logical OR'ing it	Jay Freyensee
	It is typically not good coding or secure coding practice to logical OR a variable without an initialization value first. Here on this line: integrity.flags \|= BLK_INTEGRITY_DEVICE_CAPABLE; BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable never set to an initial value. This patch fixes that. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20	block: expose QUEUE_FLAG_DAX in sysfs	Yigal Korman
	Provides the ability to identify DAX enabled devices in userspace. Signed-off-by: Yigal Korman <yigal@plexistor.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20	block: add QUEUE_FLAG_DAX for devices to advertise their DAX support	Toshi Kani
	Currently, presence of direct_access() in block_device_operations indicates support of DAX on its block device. Because block_device_operations is instantiated with 'const', this DAX capablity may not be enabled conditinally. In preparation for supporting DAX to device-mapper devices, add QUEUE_FLAG_DAX to request_queue flags to advertise their DAX support. This will allow to set the DAX capability based on how mapped device is composed. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <linux-s390@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20	Input: tty/vt/keyboard - use memdup_user()	Muhammad Falak R Wani
	Use memdup_user to duplicate a memory region from user-space to kernel-space, instead of open coding using kmalloc & copy_from_user. Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-07-20	scsi:libsas: fix oops caused by assigning a freed task to ->lldd_task	Wei Fang
	A freed task has been assigned to ->lldd_task when lldd_execute_task() failed in sas_ata_qc_issue(), and access of ->lldd_task will cause an oops: Call trace: [<ffffffc000641f64>] sas_ata_post_internal+0x6c/0x150 [<ffffffc0006c0d64>] ata_exec_internal_sg+0x32c/0x588 [<ffffffc0006c1048>] ata_exec_internal+0x88/0xe8 [<ffffffc0006c13b4>] ata_dev_read_id+0x204/0x5e0 [<ffffffc0006c17f0>] ata_dev_reread_id+0x60/0xc8 [<ffffffc0006c3098>] ata_dev_revalidate+0x88/0x1e0 [<ffffffc0006cf828>] ata_eh_recover+0xcf8/0x13a8 [<ffffffc0006d075c>] ata_do_eh+0x5c/0xe0 [<ffffffc0006d0828>] ata_std_error_handler+0x48/0x98 [<ffffffc0006d042c>] ata_scsi_port_error_handler+0x474/0x658 [<ffffffc000641b78>] async_sas_ata_eh+0x50/0x80 [<ffffffc0000ca664>] async_run_entry_fn+0x64/0x180 [<ffffffc0000c085c>] process_one_work+0x164/0x438 [<ffffffc0000c0c74>] worker_thread+0x144/0x4b0 [<ffffffc0000c70fc>] kthread+0xfc/0x110 Fix this by reassigning NULL to ->lldd_task in error path. Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-07-20	Input: tsc200x - report proper input_dev name	Michael Welling
	Passes input_id struct to the common probe function for the tsc200x drivers instead of just the bustype. This allows for the use of the product variable to set the input_dev->name variable according to the type of touchscreen used. Note that when we introduced support for TSC2004 we started calling everything TSC200X, so let's keep this quirk. Signed-off-by: Michael Welling <mwelling@ieee.org> Cc: stable@vger.kernel.org Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-07-20	tty/vt/keyboard: fix OOB access in do_compute_shiftstate()	Dmitry Torokhov
	The size of individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS, which is currently 256, whereas number of keys/buttons in input device (and therefor in key_down) is much larger - KEY_CNT - 768, and that can cause out-of-bound access when we do sym = U(key_maps[0][k]); with large 'k'. To fix it we should not attempt iterating beyond smaller of NR_KEYS and KEY_CNT. Also while at it let's switch to for_each_set_bit() instead of open-coding it. Reported-by: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-07-20	fnic: pci_dma_mapping_error() doesn't return an error code	Dan Carpenter
	pci_dma_mapping_error() returns true on error and false on success. Fixes: fd6ddfa4c1dd ('fnic: check pci_map_single() return value') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-07-21	netfilter: nf_tables: allow to filter out rules by table and chain	Pablo Neira Ayuso
	If the table and/or chain attributes are set in a rule dump request, we filter out the rules based on this selection. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>