linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-11-19	net: fix bogus cast in skb_pagelen() and use unsigned variables	Alexey Dobriyan
	1) cast to "int" is unnecessary: u8 will be promoted to int before decrementing, small positive numbers fit into "int", so their values won't be changed during promotion. Once everything is int including loop counters, signedness doesn't matter: 32-bit operations will stay 32-bit operations. But! Someone tried to make this loop smart by making everything of the same type apparently in an attempt to optimise it. Do the optimization, just differently. Do the cast where it matters. :^) 2) frag size is unsigned entity and sum of fragments sizes is also unsigned. Make everything unsigned, leave no MOVSX instruction behind. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4) function old new delta skb_cow_data 835 834 -1 ip_do_fragment 2549 2548 -1 ip6_fragment 3130 3128 -2 Total: Before=154865032, After=154865028, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-19	netlink: use "unsigned int" in nla_next()	Alexey Dobriyan
	->nla_len is unsigned entity (it's length after all) and u16, thus it can't overflow when being aligned into int/unsigned int. (nlmsg_next has the same code, but I didn't yet convince myself it is correct to do so). There is pointer arithmetic in this function and offset being unsigned is better: add/remove: 0/0 grow/shrink: 1/64 up/down: 5/-309 (-304) function old new delta nl80211_set_wiphy 1444 1449 +5 team_nl_cmd_options_set 997 995 -2 tcf_em_tree_validate 872 870 -2 switchdev_port_bridge_setlink 352 350 -2 switchdev_port_br_afspec 312 310 -2 rtm_to_fib_config 428 426 -2 qla4xxx_sysfs_ddb_set_param 2193 2191 -2 qla4xxx_iface_set_param 4470 4468 -2 ovs_nla_free_flow_actions 152 150 -2 output_userspace 518 516 -2 ... nl80211_set_reg 654 649 -5 validate_scan_freqs 148 142 -6 validate_linkmsg 288 282 -6 nl80211_parse_connkeys 489 483 -6 nlattr_set 231 224 -7 nf_tables_delsetelem 267 260 -7 do_setlink 3416 3408 -8 netlbl_cipsov4_add_std 1672 1659 -13 nl80211_parse_sched_scan 2902 2888 -14 nl80211_trigger_scan 1738 1720 -18 do_execute_actions 2821 2738 -83 Total: Before=154865355, After=154865051, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-19	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM fixes from Radim Krčmář: "ARM: - Fix handling of the 32bit cycle counter - Fix cycle counter filtering x86: - Fix a race leading to double unregistering of user notifiers - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead - Use SRCU around kvm_lapic_set_vapic_addr - Avoid recursive flushing of asynchronous page faults - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP - Let userspace know that KVM_GET_CLOCK is useful with master clock; 4.9 changed the return value to better match the guest clock, but didn't provide means to let guests take advantage of it" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr KVM: async_pf: avoid recursive flushing of work items kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use KVM: Disable irq while unregistering user notifier KVM: x86: do not go through vcpu in __get_kvmclock_ns KVM: arm64: Fix the issues when guest PMCCFILTR is configured arm64: KVM: pmu: Fix AArch32 cycle counter access
2016-11-19	kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use	Paolo Bonzini
	Userspace can read the exact value of kvmclock by reading the TSC and fetching the timekeeping parameters out of guest memory. This however is brittle and not necessary anymore with KVM 4.11. Provide a mechanism that lets userspace know if the new KVM_GET_CLOCK semantics are in effect, and---since we are at it---if the clock is stable across all VCPUs. Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19	iommu/vt-d: Fix PASID table allocation	David Woodhouse
	Somehow I ended up with an off-by-three error in calculating the size of the PASID and PASID State tables, which triggers allocations failures as those tables unfortunately have to be physically contiguous. In fact, even the correct maximum size of 8MiB is problematic and is wont to lead to allocation failures. Since I have extracted a promise that this will be fixed in hardware, I'm happy to limit it on the current hardware to a maximum of 0x20000 PASIDs, which gives us 1MiB tables — still not ideal, but better than before. Reported by Mika Kuoppala <mika.kuoppala@linux.intel.com> and also by Xunlei Pang <xlpang@redhat.com> who submitted a simpler patch to fix only the allocation (and not the free) to the "correct" limit... which was still problematic. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: stable@vger.kernel.org
2016-11-19	virtio_net: Do not clear memory for struct virtio_net_hdr twice.	Jarno Rajahalme
	virtio_net_hdr_from_skb() clears the memory for the header, so there is no point for the callers to do the same. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-19	virtio_net.h: Fix comment.	Jarno Rajahalme
	Fix incorrent comment after the final #endif. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	Merge tag 'acpi-4.9-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "They fix an ACPI thermal management regression introduced by a recent FADT handling cleanup, an ACPI tools build issue introduced by a recent ACPICA commit and a PCC mailbox initialization bug causing lockdep to complain loudly. Specifics: - Revert a recent ACPICA cleanup that attempted to get rid of all FADT version 2 legacy, but broke ACPI thermal management on at least one system (Rafael Wysocki). - Fix cross-compiled builds of ACPI tools that stopped working after a recent cleanup related to the handling of header files in ACPICA (Lv Zheng). - Fix a locking issue in the PCC channel initialization code that invokes devm_request_irq() under a spinlock (among other things) and causes lockdep to complain (Hoan Tran)" * tag 'acpi-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: tools/power/acpi: Remove direct kernel source include reference mailbox: PCC: Fix lockdep warning when request PCC channel Revert "ACPICA: FADT support cleanup"
2016-11-18	Merge tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux	Linus Torvalds
	Pull nfsd bugfix from Bruce Fields: "Just one fix for an NFS/RDMA crash" * tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux: sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
2016-11-18	Merge tag 'v4.9-rc4' into sound	Jonathan Corbet
	Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18	Merge branches 'acpica-fixes', 'acpi-cppc-fixes' and 'acpi-tools-fixes'	Rafael J. Wysocki
	* acpica-fixes: Revert "ACPICA: FADT support cleanup" * acpi-cppc-fixes: mailbox: PCC: Fix lockdep warning when request PCC channel * acpi-tools-fixes: tools/power/acpi: Remove direct kernel source include reference
2016-11-18	netns: fix get_net_ns_by_fd(int pid) typo	Stefan Hajnoczi
	The argument to get_net_ns_by_fd() is a /proc/$PID/ns/net file descriptor not a pid. Fix the typo. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Rami Rosen <roszenrami@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info	Florian Fainelli
	In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST, provide an inline stub for mvebu_mbus_get_dram_win_info(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables	Raju Lakkaraju
	For operation in cabling environments that are incompatible with 1000BASE-T, PHY device may provide an automatic link speed downshift operation. When enabled, the device automatically changes its 1000BASE-T auto-negotiation to the next slower speed after a configured number of failed attempts at 1000BASE-T. This feature is useful in setting up in networks using older cable installations that include only pairs A and B, and not pairs C and D. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE	Raju Lakkaraju
	Adding get_tunable/set_tunable function pointer to the phy_driver structure, and uses these function pointers to implement the ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLE	Raju Lakkaraju
	Defines a generic API to get/set phy tunables. The API is using the existing ethtool_tunable/tunable_type_id types which is already being used for mac level tunables. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Add MPCNT register infrastructure	Gal Pressman
	Add the needed infrastructure for future use of MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Set driver version infrastructure	Saeed Mahameed
	Add driver_version capability bit is enabled, and set driver version command in mlx5_ifc firmware header. The only purpose of this command is to store a driver version/OS string in FW to be reported and displayed in various management systems, such as IPMI/BMC. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Add handling for port module event	Huy Nguyen
	For each asynchronous port module event: 1. print with ratelimit to the dmesg log 2. increment the corresponding event counter Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Port module event hardware structures	Huy Nguyen
	Add hardware structures and constants definitions needed for module events support. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Make the command interface cache more flexible	Mohamad Haj Yahia
	Add more cache command size sets and more entries for each set based on the current commands set different sizes and commands frequency. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	netns: make struct pernet_operations::id unsigned int	Alexey Dobriyan
	Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	udp: enable busy polling for all sockets	Eric Dumazet
	UDP busy polling is restricted to connected UDP sockets. This is because sk_busy_loop() only takes care of one NAPI context. There are cases where it could be extended. 1) Some hosts receive traffic on a single NIC, with one RX queue. 2) Some applications use SO_REUSEPORT and associated BPF filter to split the incoming traffic on one UDP socket per RX queue/thread/cpu 3) Some UDP sockets are used to send/receive traffic for one flow, but they do not bother with connect() This patch records the napi_id of first received skb, giving more reach to busy polling. Tested: lpaa23:~# echo 70 >/proc/sys/net/core/busy_read lpaa24:~# echo 70 >/proc/sys/net/core/busy_read lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done Before patch : 27867 28870 37324 41060 41215 36764 36838 44455 41282 43843 After patch : 73920 73213 70147 74845 71697 68315 68028 75219 70082 73707 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	block: Change extern inline to static inline	Tobias Klauser
	With compilers which follow the C99 standard (like modern versions of gcc and clang), "extern inline" does the opposite thing from older versions of gcc (emits code for an externally linkable version of the inline function). "static inline" does the intended behavior in all cases instead. Description taken from commit 6d91857d4826 ("staging, rtl8192e, LLVMLinux: Change extern inline to static inline"). This also fixes the following GCC warning when building with CONFIG_PM disabled: ./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes] Fixes: d07ab6d11477 ("block: Add blk_set_runtime_active()") Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17	cpufreq: intel_pstate: Request P-states control from SMM if needed	Rafael J. Wysocki
	Currently, intel_pstate is unable to control P-states on my IvyBridge-based Acer Aspire S5, because they are controlled by SMM on that machine by default and it is necessary to request OS control of P-states from it via the SMI Command register exposed in the ACPI FADT. intel_pstate doesn't do that now, but acpi-cpufreq and other cpufreq drivers for x86 platforms do. Address this problem by making intel_pstate use the ACPI-defined mechanism as well. However, intel_pstate is not modular and it doesn't need the module refcount tricks played by acpi_processor_notify_smm(), so export the core of this function to it as acpi_processor_pstate_control() and make it call that. [The changes in processor_perflib.c related to this should not make any functional difference for the acpi_processor_notify_smm() users]. To be safe, only call acpi_processor_notify_smm() from intel_pstate if ACPI _PPC support is enabled in it. Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-11-17	Merge tag 'clk-renesas-for-v4.10-tag2' of ↵	Stephen Boyd
	git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into clk-next Pull Renesas clk driver updates from Geerty Uytterhoeven: - Add R-Car RST driver for obtaining mode pin state, and move the related functionality from platform code to DT, - Add r8a7743 and r8a7745 CPG Core Clock Definitions. The commits here are intermingled with arm-soc material because of the hard dependency we're breaking between mach code and driver code. We're replacing that with a driver dependency between the soc driver and the clk driver. * tag 'clk-renesas-for-v4.10-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers: (25 commits) clk: renesas: Add r8a7745 CPG Core Clock Definitions clk: renesas: Add r8a7743 CPG Core Clock Definitions clk: renesas: rcar-gen2: Remove obsolete rcar_gen2_clocks_init() clk: renesas: r8a7779: Remove obsolete r8a7779_clocks_init() clk: renesas: r8a7778: Remove obsolete r8a7778_clocks_init() ARM: shmobile: rcar-gen2: Stop passing mode pins state to clock driver ARM: shmobile: r8a7779: Stop passing mode pins state to clock driver ARM: shmobile: r8a7778: Stop passing mode pins state to clock driver clk: renesas: rcar-gen3-cpg: Remove obsolete rcar_gen3_read_mode_pins() clk: renesas: r8a7796: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7795: Obtain mode pin values from R-Car RST driver clk: renesas: rcar-gen2: Obtain mode pin values using RST driver clk: renesas: r8a7779: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7778: Obtain mode pin values using R-Car RST driver arm64: renesas: r8a7796 dtsi: Add device node for RST module arm64: renesas: r8a7795 dtsi: Add device node for RST module ARM: dts: r8a7794: Add device node for RST module ARM: dts: r8a7793: Add device node for RST module ARM: dts: r8a7792: Add device node for RST module ARM: dts: r8a7791: Add device node for RST module ...
2016-11-17	blk-mq: make the polling code adaptive	Jens Axboe
	The previous commit introduced the hybrid sleep/poll mode. Take that one step further, and use the completion latencies to automatically sleep for half the mean completion time. This is a good approximation. This changes the 'io_poll_delay' sysfs file a bit to expose the various options. Depending on the value, the polling code will behave differently: -1 Never enter hybrid sleep mode 0 Use half of the completion mean for the sleep delay >0 Use this specific value as the sleep delay Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17	blk-mq: implement hybrid poll mode for sync O_DIRECT	Jens Axboe
	This patch enables a hybrid polling mode. Instead of polling after IO submission, we can induce an artificial delay, and then poll after that. For example, if the IO is presumed to complete in 8 usecs from now, we can sleep for 4 usecs, wake up, and then do our polling. This still puts a sleep/wakeup cycle in the IO path, but instead of the wakeup happening after the IO has completed, it'll happen before. With this hybrid scheme, we can achieve big latency reductions while still using the same (or less) amount of CPU. Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17	mremap: fix race between mremap() and page cleanning	Aaron Lu
	Prior to 3.15, there was a race between zap_pte_range() and page_mkclean() where writes to a page could be lost. Dave Hansen discovered by inspection that there is a similar race between move_ptes() and page_mkclean(). We've been able to reproduce the issue by enlarging the race window with a msleep(), but have not been able to hit it without modifying the code. So, we think it's a real issue, but is difficult or impossible to hit in practice. The zap_pte_range() issue is fixed by commit 1cf35d47712d("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts"). And this patch is to fix the race between page_mkclean() and mremap(). Here is one possible way to hit the race: suppose a process mmapped a file with READ \| WRITE and SHARED, it has two threads and they are bound to 2 different CPUs, e.g. CPU1 and CPU2. mmap returned X, then thread 1 did a write to addr X so that CPU1 now has a writable TLB for addr X on it. Thread 2 starts mremaping from addr X to Y while thread 1 cleaned the page and then did another write to the old addr X again. The 2nd write from thread 1 could succeed but the value will get lost. thread 1 thread 2 (bound to CPU1) (bound to CPU2) 1: write 1 to addr X to get a writeable TLB on this CPU 2: mremap starts 3: move_ptes emptied PTE for addr X and setup new PTE for addr Y and then dropped PTL for X and Y 4: page laundering for N by doing fadvise FADV_DONTNEED. When done, pageframe N is deemed clean. 5: write 2 to addr X 6: tlb flush for addr X 7: munmap (Y, pagesize) to make the page unmapped 8: fadvise with FADV_DONTNEED again to kick the page off the pagecache 9: pread the page from file to verify the value. If 1 is there, it means we have lost the written 2. the write may or may not cause segmentation fault, it depends on if the TLB is still on the CPU. Please note that this is only one specific way of how the race could occur, it didn't mean that the race could only occur in exact the above config, e.g. more than 2 threads could be involved and fadvise() could be done in another thread, etc. For anonymous pages, they could race between mremap() and page reclaim: THP: a huge PMD is moved by mremap to a new huge PMD, then the new huge PMD gets unmapped/splitted/pagedout before the flush tlb happened for the old huge PMD in move_page_tables() and we could still write data to it. The normal anonymous page has similar situation. To fix this, check for any dirty PTE in move_ptes()/move_huge_pmd() and if any, did the flush before dropping the PTL. If we did the flush for every move_ptes()/move_huge_pmd() call then we do not need to do the flush in move_pages_tables() for the whole range. But if we didn't, we still need to do the whole range flush. Alternatively, we can track which part of the range is flushed in move_ptes()/move_huge_pmd() and which didn't to avoid flushing the whole range in move_page_tables(). But that would require multiple tlb flushes for the different sub-ranges and should be less efficient than the single whole range flush. KBuild test on my Sandybridge desktop doesn't show any noticeable change. v4.9-rc4: real 5m14.048s user 32m19.800s sys 4m50.320s With this commit: real 5m13.888s user 32m19.330s sys 4m51.200s Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-17	vfio: Define device_api strings	Kirti Wankhede
	Defined device API strings. Vendor driver using mediated device framework should use corresponding string for device_api attribute. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Introduce vfio_set_irqs_validate_and_prepare()	Kirti Wankhede
	Vendor driver using mediated device framework would use same mechnism to validate and prepare IRQs. Introducing this function to reduce code replication in multiple drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Introduce common function to add capabilities	Kirti Wankhede
	Vendor driver using mediated device framework should use vfio_info_add_capability() to add capabilities. Introduced this function to reduce code duplication in vendor drivers. vfio_info_cap_shift() manipulated a data buffer to add an offset to each element in a chain. This data buffer is documented in a uapi header. Changing vfio_info_cap_shift symbol to be available to all drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu: Add blocking notifier to notify DMA_UNMAP	Kirti Wankhede
	Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers about DMA_UNMAP. Exported two APIs vfio_register_notifier() and vfio_unregister_notifier(). Notifier should be registered, if external user wants to use vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages. Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate mappings. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops	Kirti Wankhede
	Added APIs for pining and unpining set of pages. These call back into backend iommu module to actually pin and unpin pages. Added two new callback functions to struct vfio_iommu_driver_ops. Backend IOMMU module that supports pining and unpinning pages for mdev devices should provide these functions. Renamed static functions in vfio_type1_iommu.c to resolve conflicts Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Mediated device Core driver	Kirti Wankhede
	Design for Mediated Device Driver: Main purpose of this driver is to provide a common interface for mediated device management that can be used by different drivers of different devices. This module provides a generic interface to create the device, add it to mediated bus, add device to IOMMU group and then add it to vfio group. Below is the high Level block diagram, with Nvidia, Intel and IBM devices as example, since these are the devices which are going to actively use this module as of now. +---------------+ \| \| \| +-----------+ \| mdev_register_driver() +--------------+ \| \| \| +<------------------------+ __init() \| \| \| mdev \| \| \| \| \| \| bus \| +------------------------>+ \|<-> VFIO user \| \| driver \| \| probe()/remove() \| vfio_mdev.ko \| APIs \| \| \| \| \| \| \| +-----------+ \| +--------------+ \| \| \| MDEV CORE \| \| MODULE \| \| mdev.ko \| \| +-----------+ \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| nvidia.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| Physical \| \| \| \| device \| \| mdev_register_device() +--------------+ \| \| interface \| \|<------------------------+ \| \| \| \| \| \| i915.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| \| \| \| \| \| \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| ccw_device.ko\|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| +-----------+ \| +---------------+ Core driver provides two types of registration interfaces: 1. Registration interface for mediated bus driver: /** * struct mdev_driver - Mediated device's driver * @name: driver name * @probe: called when new device created * @remove:called when device removed * @driver:device driver structure * */ struct mdev_driver { const char name; int (probe) (struct device dev); void (remove) (struct device dev); struct device_driver driver; }; Mediated bus driver for mdev device should use this interface to register and unregister with core driver respectively: int mdev_register_driver(struct mdev_driver drv, struct module owner); void mdev_unregister_driver(struct mdev_driver drv); Mediated bus driver is responsible to add/delete mediated devices to/from VFIO group when devices are bound and unbound to the driver. 2. Physical device driver interface This interface provides vendor driver the set APIs to manage physical device related work in its driver. APIs are : dev_attr_groups: attributes of the parent device. * mdev_attr_groups: attributes of the mediated device. * supported_type_groups: attributes to define supported type. This is mandatory field. * create: to allocate basic resources in vendor driver for a mediated device. This is mandatory to be provided by vendor driver. * remove: to free resources in vendor driver when mediated device is destroyed. This is mandatory to be provided by vendor driver. * open: open callback of mediated device * release: release callback of mediated device * read : read emulation callback. * write: write emulation callback. * ioctl: ioctl callback. * mmap: mmap emulation callback. Drivers should use these interfaces to register and unregister device to mdev core driver respectively: extern int mdev_register_device(struct device dev, const struct parent_ops ops); extern void mdev_unregister_device(struct device *dev); There are no locks to serialize above callbacks in mdev driver and vfio_mdev driver. If required, vendor driver can have locks to serialize above APIs in their driver. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued	Daniel Vetter
	Tvrtko needs commit b3c11ac267d461d3d597967164ff7278a919a39f Author: Eric Engestrom <eric@engestrom.ch> Date: Sat Nov 12 01:12:56 2016 +0000 drm: move allocation out of drm_get_format_name() to be able to apply his patches without conflicts. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-11-17	drm: Nuke modifier[1-3]	Ville Syrjälä
	It has been suggested that having per-plane modifiers is making life more difficult for userspace, so let's just retire modifier[1-3] and use modifier[0] to apply to the entire framebuffer. Obviosuly this means that if individual planes need different tiling layouts and whatnot we will need a new modifier for each combination of planes with different tiling layouts. For a bit of extra backwards compatilbilty the kernel will allow non-zero modifier[1+] but it require that they will match modifier[0]. This in case there's existing userspace out there that sets modifier[1+] to something non-zero with planar formats. Mostly a cocci job, with a bit of manual stuff mixed in. @@ struct drm_framebuffer *fb; expression E; @@ - fb->modifier[E] + fb->modifier @@ struct drm_framebuffer fb; expression E; @@ - fb.modifier[E] + fb.modifier Cc: Kristian Høgsberg <hoegsberg@gmail.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tomeu Vizoso <tomeu@tomeuvizoso.net> Cc: dczaplejewicz@collabora.co.uk Suggested-by: Kristian Høgsberg <hoegsberg@gmail.com> Acked-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Daniel Stone <daniels@collabora.com> Acked-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1479295996-26246-1-git-send-email-ville.syrjala@linux.intel.com
2016-11-17	locking/core: Provide common cpu_relax_yield() definition	Christian Borntraeger
	No need to duplicate the same define everywhere. Since the only user is stop-machine and the only provider is s390, we can use a default implementation of cpu_relax_yield() in sched.h. Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-s390 <linux-s390@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1479298985-191589-1-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16	Merge branch 'topic/st_fdma' of ↵	Bjorn Andersson
	git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/slave-dma into rproc-next
2016-11-16	sctp: use new rhlist interface on sctp transport rhashtable	Xin Long
	Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17	Merge tag 'drm-vc4-next-2016-11-16' of https://github.com/anholt/linux into ↵	Dave Airlie
	drm-next This pull request brings in fragment shader threading and ETC1 support for vc4.
2016-11-16	netpoll: more efficient locking	Eric Dumazet
	Callers of netpoll_poll_lock() own NAPI_STATE_SCHED Callers of netpoll_poll_unlock() have BH blocked between the NAPI_STATE_SCHED being cleared and poll_lock is released. We can avoid the spinlock which has no contention, and use cmpxchg() on poll_owner which we need to set anyway. This removes a possible lockdep violation after the cited commit, since sk_busy_loop() re-enables BH before calling busy_poll_stop() Fixes: 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h	Hans de Goede
	acpi_video.c passed the ACPI_VIDEO_NOTIFY_* defines as type code to acpi_notifier_call_chain(). Move these defines to acpi/video.h so that acpi_notifier listeners can check the type code using these defines. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17	Merge tag 'drm-misc-next-2016-11-16' of ↵	Dave Airlie
	git://anongit.freedesktop.org/git/drm-misc into drm-next Another pile of misc: - Explicit fencing for atomic! Big thanks to Gustavo, Sean, Rob 3x, Brian and anyone else I've forgotten to make this happen. - roll out fbdev helper ops to drivers (Stefan Christ) - last bits of drm_crtc split-up&kerneldoc - some drm_irq.c crtc functions cleanup - prepare_fb helper for cma, works correctly with explicit fencing (Marek Vasut) - misc small patches all over * tag 'drm-misc-next-2016-11-16' of git://anongit.freedesktop.org/git/drm-misc: (51 commits) drm/fence: add out-fences support drm/fence: add fence timeline to drm_crtc drm/fence: add in-fences support drm/bridge: analogix_dp: return error if transfer none byte drm: drm_irq.h header cleanup drm/irq: Unexport drm_vblank_on/off drm/irq: Unexport drm_vblank_count drm/irq: Make drm_vblank_pre/post_modeset internal drm/nouveau: Use drm_crtc_vblank_off/on drm/amdgpu: Use drm_crtc_vblank_on/off for dce6 drm/color: document NULL values and default settings better drm: Drop externs from drm_crtc.h drm: Move tile group code into drm_connector.c drm: Extract drm_mode_config.[hc] Revert "drm: Add aspect ratio parsing in DRM layer" Revert "drm: Add and handle new aspect ratios in DRM layer" drm/print: Move kerneldoc next to definition drm: Consolidate dumb buffer docs drm: Clean up kerneldoc for struct drm_driver drm: Extract drm_drv.h ...
2016-11-16	lwtunnel: subtract tunnel headroom from mtu on output redirect	David Lebrun
	This patch changes the lwtunnel_headroom() function which is called in ipv4_mtu() and ip6_mtu(), to also return the correct headroom value when the lwtunnel state is OUTPUT_REDIRECT. This patch enables e.g. SR-IPv6 encapsulations to work without manually setting the route mtu. Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	ACPI / tebles: remove redundant declare of acpi_table_parse_entries()	Longpeng $Mike$
	This function declared twice, so remove one declaration of it. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16	tools/power/acpi: Remove direct kernel source include reference	Lv Zheng
	Avoid breaking cross-compiled ACPI tools builds by rearranging the handling of kernel header files. This patch also contains OUTPUT/srctree cleanups in order to make above fix working for various build environments. Fixes: e323c02dee59 (ACPICA: MSVC9: Fix <sys/stat.h> inclusion order issue) Reported-and-tested-by: Yisheng Xie <xieyisheng1@huawei.com> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16	drm/vc4: Add fragment shader threading support	Jonas Pfeil
	FS threading brings performance improvements of 0-20% in glmark2. The validation code checks for thread switch signals and ensures that the registers of the other thread are not touched, and that our clamps are not live across thread switches. It also checks that the threading and branching instructions do not interfere. (Original patch by Jonas, changes by anholt for style cleanup, removing validation the kernel doesn't need to do, and adding the flag for userspace). v2: Minor style fixes from checkpatch. Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de> Signed-off-by: Eric Anholt <eric@anholt.net>
2016-11-16	Merge tag 'sunxi-clk-for-4.10' of ↵	Stephen Boyd
	https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into clk-next Pull Allwinner clock changes from Maxime Ripard: The usual patches from us, but most notably the introduction of the A64 clocks unit. * tag 'sunxi-clk-for-4.10' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux: clk: sunxi-ng: sun8i-h3: Set CLK_SET_RATE_PARENT for audio module clocks clk: sunxi-ng: sun8i-a23: Set CLK_SET_RATE_PARENT for audio module clocks clk: sunxi-ng: Add A64 clocks clk: sunxi-ng: Implement minimum for multipliers clk: sunxi-ng: Add minimums for all the relevant structures and clocks clk: sunxi-ng: Finish to convert to structures for arguments clk: sunxi-ng: Remove the use of rational computations clk: sunxi-ng: Rename the internal structures clk: sunxi: mod0: improve function-level documentation
2016-11-16	Merge tag 'imx-clk-4.10' of ↵	Stephen Boyd
	git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into clk-next Pull i.MX clock updates from Shawn Guo: - A patch series to fix the long standing issue with glitchy parent mux of ldb_di_clk, which can hang up LVDS display when ipu_di_clk is sourced from ldb_di_clk. - A patch to add imx6ull clock support on top of imx6ul clock driver. * tag 'imx-clk-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: clk: imx: clk-imx6ul: add clk support for imx6ull clk: imx6: Fix procedure to switch the parent of LDB_DI_CLK clk: imx6: Make the LDB_DI0 and LDB_DI1 clocks read-only clk: imx6: Mask mmdc_ch1 handshake for periph2_sel and mmdc_ch1_axi_podf