linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2016-11-18	netns: fix get_net_ns_by_fd(int pid) typo	Stefan Hajnoczi
	The argument to get_net_ns_by_fd() is a /proc/$PID/ns/net file descriptor not a pid. Fix the typo. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Rami Rosen <roszenrami@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info	Florian Fainelli
	In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST, provide an inline stub for mvebu_mbus_get_dram_win_info(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	hw_breakpoint: Allow watchpoint of length 3,5,6 and 7	Pratyush Anand
	We only support breakpoint/watchpoint of length 1, 2, 4 and 8. If we can support other length as well, then user may watch more data with less number of watchpoints (provided hardware supports it). For example: if we have to watch only 4th, 5th and 6th byte from a 64 bit aligned address, we will have to use two slots to implement it currently. One slot will watch a half word at offset 4 and other a byte at offset 6. If we can have a watchpoint of length 3 then we can watch it with single slot as well. ARM64 hardware does support such functionality, therefore adding these new definitions in generic layer. Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-18	ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables	Raju Lakkaraju
	For operation in cabling environments that are incompatible with 1000BASE-T, PHY device may provide an automatic link speed downshift operation. When enabled, the device automatically changes its 1000BASE-T auto-negotiation to the next slower speed after a configured number of failed attempts at 1000BASE-T. This feature is useful in setting up in networks using older cable installations that include only pairs A and B, and not pairs C and D. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE	Raju Lakkaraju
	Adding get_tunable/set_tunable function pointer to the phy_driver structure, and uses these function pointers to implement the ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLE	Raju Lakkaraju
	Defines a generic API to get/set phy tunables. The API is using the existing ethtool_tunable/tunable_type_id types which is already being used for mac level tunables. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Add MPCNT register infrastructure	Gal Pressman
	Add the needed infrastructure for future use of MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Set driver version infrastructure	Saeed Mahameed
	Add driver_version capability bit is enabled, and set driver version command in mlx5_ifc firmware header. The only purpose of this command is to store a driver version/OS string in FW to be reported and displayed in various management systems, such as IPMI/BMC. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Add handling for port module event	Huy Nguyen
	For each asynchronous port module event: 1. print with ratelimit to the dmesg log 2. increment the corresponding event counter Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Port module event hardware structures	Huy Nguyen
	Add hardware structures and constants definitions needed for module events support. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	net/mlx5: Make the command interface cache more flexible	Mohamad Haj Yahia
	Add more cache command size sets and more entries for each set based on the current commands set different sizes and commands frequency. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	netns: make struct pernet_operations::id unsigned int	Alexey Dobriyan
	Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	udp: enable busy polling for all sockets	Eric Dumazet
	UDP busy polling is restricted to connected UDP sockets. This is because sk_busy_loop() only takes care of one NAPI context. There are cases where it could be extended. 1) Some hosts receive traffic on a single NIC, with one RX queue. 2) Some applications use SO_REUSEPORT and associated BPF filter to split the incoming traffic on one UDP socket per RX queue/thread/cpu 3) Some UDP sockets are used to send/receive traffic for one flow, but they do not bother with connect() This patch records the napi_id of first received skb, giving more reach to busy polling. Tested: lpaa23:~# echo 70 >/proc/sys/net/core/busy_read lpaa24:~# echo 70 >/proc/sys/net/core/busy_read lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done Before patch : 27867 28870 37324 41060 41215 36764 36838 44455 41282 43843 After patch : 73920 73213 70147 74845 71697 68315 68028 75219 70082 73707 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18	Merge tag 'usb-for-v4.10' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next Felipe writes: usb: patches for v4.10 merge window One big merge this time with a total of 166 non-merge commits. Most of the work, by far, is on dwc2 this time (68.2%) with dwc3 a far second (22.5%). The remaining 9.3% are scattered on gadget drivers. The most important changes for dwc2 are the peripheral side DMA support implemented by Synopsys folks and support for the new IOT dwc2 compatible core from Synopsys. In dwc3 land we have support for high-bandwidth, high-speed isochronous endpoints and some non-critical fixes for large scatter lists. Apart from these, we have our usual set of cleanups, non-critical fixes, etc.
2016-11-18	block: Change extern inline to static inline	Tobias Klauser
	With compilers which follow the C99 standard (like modern versions of gcc and clang), "extern inline" does the opposite thing from older versions of gcc (emits code for an externally linkable version of the inline function). "static inline" does the intended behavior in all cases instead. Description taken from commit 6d91857d4826 ("staging, rtl8192e, LLVMLinux: Change extern inline to static inline"). This also fixes the following GCC warning when building with CONFIG_PM disabled: ./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes] Fixes: d07ab6d11477 ("block: Add blk_set_runtime_active()") Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-18	usb: gadget: fix request length error for isoc transfer	Peter Chen
	For isoc endpoint descriptor, the wMaxPacketSize is not real max packet size (see Table 9-13. Standard Endpoint Descriptor, USB 2.0 specifcation), it may contain the number of packet, so the real max packet should be ep->desc->wMaxPacketSize && 0x7ff. Cc: Felipe F. Tonello <eu@felipetonello.com> Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Fixes: 16b114a6d797 ("usb: gadget: fix usb_ep_align_maybe endianness and new usb_ep_aligna") Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
2016-11-17	cpufreq: intel_pstate: Request P-states control from SMM if needed	Rafael J. Wysocki
	Currently, intel_pstate is unable to control P-states on my IvyBridge-based Acer Aspire S5, because they are controlled by SMM on that machine by default and it is necessary to request OS control of P-states from it via the SMI Command register exposed in the ACPI FADT. intel_pstate doesn't do that now, but acpi-cpufreq and other cpufreq drivers for x86 platforms do. Address this problem by making intel_pstate use the ACPI-defined mechanism as well. However, intel_pstate is not modular and it doesn't need the module refcount tricks played by acpi_processor_notify_smm(), so export the core of this function to it as acpi_processor_pstate_control() and make it call that. [The changes in processor_perflib.c related to this should not make any functional difference for the acpi_processor_notify_smm() users]. To be safe, only call acpi_processor_notify_smm() from intel_pstate if ACPI _PPC support is enabled in it. Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-11-17	Merge tag 'clk-renesas-for-v4.10-tag2' of ↵	Stephen Boyd
	git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into clk-next Pull Renesas clk driver updates from Geerty Uytterhoeven: - Add R-Car RST driver for obtaining mode pin state, and move the related functionality from platform code to DT, - Add r8a7743 and r8a7745 CPG Core Clock Definitions. The commits here are intermingled with arm-soc material because of the hard dependency we're breaking between mach code and driver code. We're replacing that with a driver dependency between the soc driver and the clk driver. * tag 'clk-renesas-for-v4.10-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers: (25 commits) clk: renesas: Add r8a7745 CPG Core Clock Definitions clk: renesas: Add r8a7743 CPG Core Clock Definitions clk: renesas: rcar-gen2: Remove obsolete rcar_gen2_clocks_init() clk: renesas: r8a7779: Remove obsolete r8a7779_clocks_init() clk: renesas: r8a7778: Remove obsolete r8a7778_clocks_init() ARM: shmobile: rcar-gen2: Stop passing mode pins state to clock driver ARM: shmobile: r8a7779: Stop passing mode pins state to clock driver ARM: shmobile: r8a7778: Stop passing mode pins state to clock driver clk: renesas: rcar-gen3-cpg: Remove obsolete rcar_gen3_read_mode_pins() clk: renesas: r8a7796: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7795: Obtain mode pin values from R-Car RST driver clk: renesas: rcar-gen2: Obtain mode pin values using RST driver clk: renesas: r8a7779: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7778: Obtain mode pin values using R-Car RST driver arm64: renesas: r8a7796 dtsi: Add device node for RST module arm64: renesas: r8a7795 dtsi: Add device node for RST module ARM: dts: r8a7794: Add device node for RST module ARM: dts: r8a7793: Add device node for RST module ARM: dts: r8a7792: Add device node for RST module ARM: dts: r8a7791: Add device node for RST module ...
2016-11-17	blk-mq: make the polling code adaptive	Jens Axboe
	The previous commit introduced the hybrid sleep/poll mode. Take that one step further, and use the completion latencies to automatically sleep for half the mean completion time. This is a good approximation. This changes the 'io_poll_delay' sysfs file a bit to expose the various options. Depending on the value, the polling code will behave differently: -1 Never enter hybrid sleep mode 0 Use half of the completion mean for the sleep delay >0 Use this specific value as the sleep delay Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17	blk-mq: implement hybrid poll mode for sync O_DIRECT	Jens Axboe
	This patch enables a hybrid polling mode. Instead of polling after IO submission, we can induce an artificial delay, and then poll after that. For example, if the IO is presumed to complete in 8 usecs from now, we can sleep for 4 usecs, wake up, and then do our polling. This still puts a sleep/wakeup cycle in the IO path, but instead of the wakeup happening after the IO has completed, it'll happen before. With this hybrid scheme, we can achieve big latency reductions while still using the same (or less) amount of CPU. Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17	mremap: fix race between mremap() and page cleanning	Aaron Lu
	Prior to 3.15, there was a race between zap_pte_range() and page_mkclean() where writes to a page could be lost. Dave Hansen discovered by inspection that there is a similar race between move_ptes() and page_mkclean(). We've been able to reproduce the issue by enlarging the race window with a msleep(), but have not been able to hit it without modifying the code. So, we think it's a real issue, but is difficult or impossible to hit in practice. The zap_pte_range() issue is fixed by commit 1cf35d47712d("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts"). And this patch is to fix the race between page_mkclean() and mremap(). Here is one possible way to hit the race: suppose a process mmapped a file with READ \| WRITE and SHARED, it has two threads and they are bound to 2 different CPUs, e.g. CPU1 and CPU2. mmap returned X, then thread 1 did a write to addr X so that CPU1 now has a writable TLB for addr X on it. Thread 2 starts mremaping from addr X to Y while thread 1 cleaned the page and then did another write to the old addr X again. The 2nd write from thread 1 could succeed but the value will get lost. thread 1 thread 2 (bound to CPU1) (bound to CPU2) 1: write 1 to addr X to get a writeable TLB on this CPU 2: mremap starts 3: move_ptes emptied PTE for addr X and setup new PTE for addr Y and then dropped PTL for X and Y 4: page laundering for N by doing fadvise FADV_DONTNEED. When done, pageframe N is deemed clean. 5: write 2 to addr X 6: tlb flush for addr X 7: munmap (Y, pagesize) to make the page unmapped 8: fadvise with FADV_DONTNEED again to kick the page off the pagecache 9: pread the page from file to verify the value. If 1 is there, it means we have lost the written 2. the write may or may not cause segmentation fault, it depends on if the TLB is still on the CPU. Please note that this is only one specific way of how the race could occur, it didn't mean that the race could only occur in exact the above config, e.g. more than 2 threads could be involved and fadvise() could be done in another thread, etc. For anonymous pages, they could race between mremap() and page reclaim: THP: a huge PMD is moved by mremap to a new huge PMD, then the new huge PMD gets unmapped/splitted/pagedout before the flush tlb happened for the old huge PMD in move_page_tables() and we could still write data to it. The normal anonymous page has similar situation. To fix this, check for any dirty PTE in move_ptes()/move_huge_pmd() and if any, did the flush before dropping the PTL. If we did the flush for every move_ptes()/move_huge_pmd() call then we do not need to do the flush in move_pages_tables() for the whole range. But if we didn't, we still need to do the whole range flush. Alternatively, we can track which part of the range is flushed in move_ptes()/move_huge_pmd() and which didn't to avoid flushing the whole range in move_page_tables(). But that would require multiple tlb flushes for the different sub-ranges and should be less efficient than the single whole range flush. KBuild test on my Sandybridge desktop doesn't show any noticeable change. v4.9-rc4: real 5m14.048s user 32m19.800s sys 4m50.320s With this commit: real 5m13.888s user 32m19.330s sys 4m51.200s Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-17	mei: bus: split RX and async notification callbacks	Alexander Usyskin
	Split callbacks for RX and async notification events on mei bus to eliminate synchronization problems and to open way for RX optimizations. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-17	vfio: Define device_api strings	Kirti Wankhede
	Defined device API strings. Vendor driver using mediated device framework should use corresponding string for device_api attribute. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Introduce vfio_set_irqs_validate_and_prepare()	Kirti Wankhede
	Vendor driver using mediated device framework would use same mechnism to validate and prepare IRQs. Introducing this function to reduce code replication in multiple drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Introduce common function to add capabilities	Kirti Wankhede
	Vendor driver using mediated device framework should use vfio_info_add_capability() to add capabilities. Introduced this function to reduce code duplication in vendor drivers. vfio_info_cap_shift() manipulated a data buffer to add an offset to each element in a chain. This data buffer is documented in a uapi header. Changing vfio_info_cap_shift symbol to be available to all drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu: Add blocking notifier to notify DMA_UNMAP	Kirti Wankhede
	Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers about DMA_UNMAP. Exported two APIs vfio_register_notifier() and vfio_unregister_notifier(). Notifier should be registered, if external user wants to use vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages. Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate mappings. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops	Kirti Wankhede
	Added APIs for pining and unpining set of pages. These call back into backend iommu module to actually pin and unpin pages. Added two new callback functions to struct vfio_iommu_driver_ops. Backend IOMMU module that supports pining and unpinning pages for mdev devices should provide these functions. Renamed static functions in vfio_type1_iommu.c to resolve conflicts Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	vfio: Mediated device Core driver	Kirti Wankhede
	Design for Mediated Device Driver: Main purpose of this driver is to provide a common interface for mediated device management that can be used by different drivers of different devices. This module provides a generic interface to create the device, add it to mediated bus, add device to IOMMU group and then add it to vfio group. Below is the high Level block diagram, with Nvidia, Intel and IBM devices as example, since these are the devices which are going to actively use this module as of now. +---------------+ \| \| \| +-----------+ \| mdev_register_driver() +--------------+ \| \| \| +<------------------------+ __init() \| \| \| mdev \| \| \| \| \| \| bus \| +------------------------>+ \|<-> VFIO user \| \| driver \| \| probe()/remove() \| vfio_mdev.ko \| APIs \| \| \| \| \| \| \| +-----------+ \| +--------------+ \| \| \| MDEV CORE \| \| MODULE \| \| mdev.ko \| \| +-----------+ \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| nvidia.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| Physical \| \| \| \| device \| \| mdev_register_device() +--------------+ \| \| interface \| \|<------------------------+ \| \| \| \| \| \| i915.ko \|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| \| \| \| \| \| \| \| mdev_register_device() +--------------+ \| \| \| +<------------------------+ \| \| \| \| \| \| ccw_device.ko\|<-> physical \| \| \| +------------------------>+ \| device \| \| \| \| callback +--------------+ \| +-----------+ \| +---------------+ Core driver provides two types of registration interfaces: 1. Registration interface for mediated bus driver: /** * struct mdev_driver - Mediated device's driver * @name: driver name * @probe: called when new device created * @remove:called when device removed * @driver:device driver structure * */ struct mdev_driver { const char name; int (probe) (struct device dev); void (remove) (struct device dev); struct device_driver driver; }; Mediated bus driver for mdev device should use this interface to register and unregister with core driver respectively: int mdev_register_driver(struct mdev_driver drv, struct module owner); void mdev_unregister_driver(struct mdev_driver drv); Mediated bus driver is responsible to add/delete mediated devices to/from VFIO group when devices are bound and unbound to the driver. 2. Physical device driver interface This interface provides vendor driver the set APIs to manage physical device related work in its driver. APIs are : dev_attr_groups: attributes of the parent device. * mdev_attr_groups: attributes of the mediated device. * supported_type_groups: attributes to define supported type. This is mandatory field. * create: to allocate basic resources in vendor driver for a mediated device. This is mandatory to be provided by vendor driver. * remove: to free resources in vendor driver when mediated device is destroyed. This is mandatory to be provided by vendor driver. * open: open callback of mediated device * release: release callback of mediated device * read : read emulation callback. * write: write emulation callback. * ioctl: ioctl callback. * mmap: mmap emulation callback. Drivers should use these interfaces to register and unregister device to mdev core driver respectively: extern int mdev_register_device(struct device dev, const struct parent_ops ops); extern void mdev_unregister_device(struct device *dev); There are no locks to serialize above callbacks in mdev driver and vfio_mdev driver. If required, vendor driver can have locks to serialize above APIs in their driver. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17	Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued	Daniel Vetter
	Tvrtko needs commit b3c11ac267d461d3d597967164ff7278a919a39f Author: Eric Engestrom <eric@engestrom.ch> Date: Sat Nov 12 01:12:56 2016 +0000 drm: move allocation out of drm_get_format_name() to be able to apply his patches without conflicts. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-11-17	xenfs: Use proc_create_mount_point() to create /proc/xen	Seth Forshee
	Mounting proc in user namespace containers fails if the xenbus filesystem is mounted on /proc/xen because this directory fails the "permanently empty" test. proc_create_mount_point() exists specifically to create such mountpoints in proc but is currently proc-internal. Export this interface to modules, then use it in xenbus when creating /proc/xen. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-11-17	drm: Nuke modifier[1-3]	Ville Syrjälä
	It has been suggested that having per-plane modifiers is making life more difficult for userspace, so let's just retire modifier[1-3] and use modifier[0] to apply to the entire framebuffer. Obviosuly this means that if individual planes need different tiling layouts and whatnot we will need a new modifier for each combination of planes with different tiling layouts. For a bit of extra backwards compatilbilty the kernel will allow non-zero modifier[1+] but it require that they will match modifier[0]. This in case there's existing userspace out there that sets modifier[1+] to something non-zero with planar formats. Mostly a cocci job, with a bit of manual stuff mixed in. @@ struct drm_framebuffer *fb; expression E; @@ - fb->modifier[E] + fb->modifier @@ struct drm_framebuffer fb; expression E; @@ - fb.modifier[E] + fb.modifier Cc: Kristian Høgsberg <hoegsberg@gmail.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tomeu Vizoso <tomeu@tomeuvizoso.net> Cc: dczaplejewicz@collabora.co.uk Suggested-by: Kristian Høgsberg <hoegsberg@gmail.com> Acked-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Daniel Stone <daniels@collabora.com> Acked-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1479295996-26246-1-git-send-email-ville.syrjala@linux.intel.com
2016-11-17	locking/core: Provide common cpu_relax_yield() definition	Christian Borntraeger
	No need to duplicate the same define everywhere. Since the only user is stop-machine and the only provider is s390, we can use a default implementation of cpu_relax_yield() in sched.h. Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-s390 <linux-s390@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1479298985-191589-1-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16	Merge branch 'topic/st_fdma' of ↵	Bjorn Andersson
	git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/slave-dma into rproc-next
2016-11-16	sctp: use new rhlist interface on sctp transport rhashtable	Xin Long
	Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17	Merge tag 'drm-vc4-next-2016-11-16' of https://github.com/anholt/linux into ↵	Dave Airlie
	drm-next This pull request brings in fragment shader threading and ETC1 support for vc4.
2016-11-16	netpoll: more efficient locking	Eric Dumazet
	Callers of netpoll_poll_lock() own NAPI_STATE_SCHED Callers of netpoll_poll_unlock() have BH blocked between the NAPI_STATE_SCHED being cleared and poll_lock is released. We can avoid the spinlock which has no contention, and use cmpxchg() on poll_owner which we need to set anyway. This removes a possible lockdep violation after the cited commit, since sk_busy_loop() re-enables BH before calling busy_poll_stop() Fixes: 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	ACPI / video: Move ACPI_VIDEO_NOTIFY_* defines to acpi/video.h	Hans de Goede
	acpi_video.c passed the ACPI_VIDEO_NOTIFY_* defines as type code to acpi_notifier_call_chain(). Move these defines to acpi/video.h so that acpi_notifier listeners can check the type code using these defines. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17	Merge tag 'drm-misc-next-2016-11-16' of ↵	Dave Airlie
	git://anongit.freedesktop.org/git/drm-misc into drm-next Another pile of misc: - Explicit fencing for atomic! Big thanks to Gustavo, Sean, Rob 3x, Brian and anyone else I've forgotten to make this happen. - roll out fbdev helper ops to drivers (Stefan Christ) - last bits of drm_crtc split-up&kerneldoc - some drm_irq.c crtc functions cleanup - prepare_fb helper for cma, works correctly with explicit fencing (Marek Vasut) - misc small patches all over * tag 'drm-misc-next-2016-11-16' of git://anongit.freedesktop.org/git/drm-misc: (51 commits) drm/fence: add out-fences support drm/fence: add fence timeline to drm_crtc drm/fence: add in-fences support drm/bridge: analogix_dp: return error if transfer none byte drm: drm_irq.h header cleanup drm/irq: Unexport drm_vblank_on/off drm/irq: Unexport drm_vblank_count drm/irq: Make drm_vblank_pre/post_modeset internal drm/nouveau: Use drm_crtc_vblank_off/on drm/amdgpu: Use drm_crtc_vblank_on/off for dce6 drm/color: document NULL values and default settings better drm: Drop externs from drm_crtc.h drm: Move tile group code into drm_connector.c drm: Extract drm_mode_config.[hc] Revert "drm: Add aspect ratio parsing in DRM layer" Revert "drm: Add and handle new aspect ratios in DRM layer" drm/print: Move kerneldoc next to definition drm: Consolidate dumb buffer docs drm: Clean up kerneldoc for struct drm_driver drm: Extract drm_drv.h ...
2016-11-16	lwtunnel: subtract tunnel headroom from mtu on output redirect	David Lebrun
	This patch changes the lwtunnel_headroom() function which is called in ipv4_mtu() and ip6_mtu(), to also return the correct headroom value when the lwtunnel state is OUTPUT_REDIRECT. This patch enables e.g. SR-IPv6 encapsulations to work without manually setting the route mtu. Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	ACPI / tebles: remove redundant declare of acpi_table_parse_entries()	Longpeng $Mike$
	This function declared twice, so remove one declaration of it. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16	tools/power/acpi: Remove direct kernel source include reference	Lv Zheng
	Avoid breaking cross-compiled ACPI tools builds by rearranging the handling of kernel header files. This patch also contains OUTPUT/srctree cleanups in order to make above fix working for various build environments. Fixes: e323c02dee59 (ACPICA: MSVC9: Fix <sys/stat.h> inclusion order issue) Reported-and-tested-by: Yisheng Xie <xieyisheng1@huawei.com> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16	drm/vc4: Add fragment shader threading support	Jonas Pfeil
	FS threading brings performance improvements of 0-20% in glmark2. The validation code checks for thread switch signals and ensures that the registers of the other thread are not touched, and that our clamps are not live across thread switches. It also checks that the threading and branching instructions do not interfere. (Original patch by Jonas, changes by anholt for style cleanup, removing validation the kernel doesn't need to do, and adding the flag for userspace). v2: Minor style fixes from checkpatch. Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de> Signed-off-by: Eric Anholt <eric@anholt.net>
2016-11-16	Merge tag 'sunxi-clk-for-4.10' of ↵	Stephen Boyd
	https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into clk-next Pull Allwinner clock changes from Maxime Ripard: The usual patches from us, but most notably the introduction of the A64 clocks unit. * tag 'sunxi-clk-for-4.10' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux: clk: sunxi-ng: sun8i-h3: Set CLK_SET_RATE_PARENT for audio module clocks clk: sunxi-ng: sun8i-a23: Set CLK_SET_RATE_PARENT for audio module clocks clk: sunxi-ng: Add A64 clocks clk: sunxi-ng: Implement minimum for multipliers clk: sunxi-ng: Add minimums for all the relevant structures and clocks clk: sunxi-ng: Finish to convert to structures for arguments clk: sunxi-ng: Remove the use of rational computations clk: sunxi-ng: Rename the internal structures clk: sunxi: mod0: improve function-level documentation
2016-11-16	Merge tag 'imx-clk-4.10' of ↵	Stephen Boyd
	git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into clk-next Pull i.MX clock updates from Shawn Guo: - A patch series to fix the long standing issue with glitchy parent mux of ldb_di_clk, which can hang up LVDS display when ipu_di_clk is sourced from ldb_di_clk. - A patch to add imx6ull clock support on top of imx6ul clock driver. * tag 'imx-clk-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: clk: imx: clk-imx6ul: add clk support for imx6ull clk: imx6: Fix procedure to switch the parent of LDB_DI_CLK clk: imx6: Make the LDB_DI0 and LDB_DI1 clocks read-only clk: imx6: Mask mmdc_ch1 handshake for periph2_sel and mmdc_ch1_axi_podf
2016-11-16	net: busy-poll: return busypolling status to drivers	Eric Dumazet
	NAPI drivers use napi_complete_done() or napi_complete() when they drained RX ring and right before re-enabling device interrupts. In busy polling, we can avoid interrupts being delivered since we are polling RX ring in a controlled loop. Drivers can chose to use napi_complete_done() return value to reduce interrupts overhead while busy polling is active. This is optional, legacy drivers should work fine even if not updated. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	net: busy-poll: remove need_resched() from sk_can_busy_loop()	Eric Dumazet
	Now sk_busy_loop() can schedule by itself, we can remove need_resched() check from sk_can_busy_loop() Also add a const to its struct sock parameter. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	net: busy-poll: allow preemption in sk_busy_loop()	Eric Dumazet
	After commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), sk_busy_loop() needs a bit of care : softirqs might be delayed since we do not allow preemption yet. This patch adds preemptiom points in sk_busy_loop(), and makes sure no unnecessary cache line dirtying or atomic operations are done while looping. A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED, so that sk_busy_loop() does not have to grab it again. Similarly, netpoll_poll_lock() is done one time. This gives about 10 to 20 % improvement in various busy polling tests, especially when many threads are busy polling in configurations with large number of NIC queues. This should allow experimenting with bigger delays without hurting overall latencies. Tested: On a 40Gb mlx4 NIC, 32 RX/TX queues. echo 70 >/proc/sys/net/core/busy_read for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done Before: After: 1: 90072 92819 2: 157289 184007 3: 235772 213504 4: 344074 357513 5: 394755 458267 6: 461151 487819 7: 549116 625963 8: 544423 716219 9: 720460 738446 10: 794686 837612 11: 915998 923960 12: 937507 925107 13: 1019677 971506 14: 1046831 1113650 15: 1114154 1148902 16: 1105221 1179263 17: 1266552 1299585 18: 1258454 1383817 19: 1341453 1312194 20: 1363557 1488487 21: 1387979 1501004 22: 1417552 1601683 23: 1550049 1642002 24: 1568876 1601915 25: 1560239 1683607 26: 1640207 1745211 27: 1706540 1723574 28: 1638518 1722036 29: 1734309 1757447 30: 1782007 1855436 31: 1724806 1888539 32: 1717716 1944297 33: 1778716 1869118 34: 1805738 1983466 35: 1815694 2020758 36: 1893059 2035632 37: 1843406 2034653 38: 1888830 2086580 39: 1972827 2143567 40: 1877729 2181851 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	ipv4: Restore fib_trie_flush_external function and fix call ordering	Alexander Duyck
	The patch that removed the FIB offload infrastructure was a bit too aggressive and also removed code needed to clean up us splitting the table if additional rules were added. Specifically the function fib_trie_flush_external was called at the end of a new rule being added to flush the foreign trie entries from the main trie. I updated the code so that we only call fib_trie_flush_external on the main table so that we flush the entries for local from main. This way we don't call it for every rule change which is what was happening previously. Fixes: 347e3b28c1ba2 ("switchdev: remove FIB offload infrastructure") Reported-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	bpf: fix range arithmetic for bpf map access	Josef Bacik
	I made some invalid assumptions with BPF_AND and BPF_MOD that could result in invalid accesses to bpf map entries. Fix this up by doing a few things 1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real life and just adds extra complexity. 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the minimum value to 0 for positive AND's. 3) Don't do operations on the ranges if they are set to the limits, as they are by definition undefined, and allowing arithmetic operations on those values could make them appear valid when they really aren't. This fixes the testcase provided by Jann as well as a few other theoretical problems. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16	Merge tag 'iio-for-4.10c' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: Third set of IIO new device support, features and cleanup for the 4.10 cycle. Includes Peter Rosin's interesting drivers for a comparator. First complex use we have had with an analog front end made from discrete components. Brian Masney's work on moving the tsl2583 driver out of staging also feature extensively! New Drivers * DAC based on a digital potentiometer - New driver for the use of a dpot as a DAC. Includes bindings and Axentia entry in vendor prefixes. * Envelope detector baed on DAC and a comparator including device tree bindings. Staging Graduation * tsl2583. Core new features - Core provision for _available attributes. This one had been stalled for a long time until Peter picked it up and ran with it! - In kernel interface helpers to retrieve available info from channels. Driver new features * mcp4531 - Add range of available raw values (used for the dpot dac driver). Driver cleanups and fixes for issues introduced * ad7766 - Testing the wrong variable following devm_regulator_bulk_get introduced with the driver earlier in this cycle. * ad9832 - Fix a wrong ordering in the probe introduced in the previous set of patches. A use before allocation bug. * cros_ec_sensors - Testing for an error in a u8 will never work. * mpu3050 - Remove duplicate initializer for the module owner. - Add missing i2c dependency. - Inform the i2c mux core how it is used - step one in implifying device tree bindings. * st-sensors - Get rid of large number of uninformative defines in favour of putting the constants where they are relevant. It is clear what they are from where they are used. * tsl2583 - Fix unused function warning when CONFIG_PM disabled and remove the ifdefs in favour of __maybe_unused. - Refactor taos_chip_on to only read relevant registers. - Make sure calibscale and integration time are being set. - Verify chip is in ready to be used before calibration. - Remove some repeated checks for chip status (it's protected by a mutex so can't change until it's released) - Change current state storage from a tristate enum to a boolean seeing as only two values are actually used now. - Drop a redundant write to the control regiser in taos_probe (it's a noop) - Drop the FSF mailing address. - Clean up logging to not use hard coded function names (use __func__ instead). - Cleanup up variable and function name prefixes. - Alignment of #define fixes. - Fix comparison between signed and unsigned integer warnings. - Add some newlines in favour of readability. - Combine the two sysfs ABI docs that somehow ended up in different places. - Fix multiline comment syntax. - Move a code block to inside an else statement as it makes more sense there. - Change tsl2583_als_calibrate to return 0 rather than a value nothing reads. - Drop some pointless brackets - Don't assume 32bit unsigned int. - Change to a per device instance lux table. - Add missing tsl2583 to the list of supported devices in the intro comments. - Improve commment on clearing of interrupts. - Drop some uninformative comments. - Drop a memset call that doesn't do anything useful any more. - Don't initialize some return variables that are always set. - Add Brian Masney as a module author after all these changes.