summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2016-11-18Merge tag 'acpi-4.9-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "They fix an ACPI thermal management regression introduced by a recent FADT handling cleanup, an ACPI tools build issue introduced by a recent ACPICA commit and a PCC mailbox initialization bug causing lockdep to complain loudly. Specifics: - Revert a recent ACPICA cleanup that attempted to get rid of all FADT version 2 legacy, but broke ACPI thermal management on at least one system (Rafael Wysocki). - Fix cross-compiled builds of ACPI tools that stopped working after a recent cleanup related to the handling of header files in ACPICA (Lv Zheng). - Fix a locking issue in the PCC channel initialization code that invokes devm_request_irq() under a spinlock (among other things) and causes lockdep to complain (Hoan Tran)" * tag 'acpi-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: tools/power/acpi: Remove direct kernel source include reference mailbox: PCC: Fix lockdep warning when request PCC channel Revert "ACPICA: FADT support cleanup"
2016-11-18Merge tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfix from Bruce Fields: "Just one fix for an NFS/RDMA crash" * tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux: sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
2016-11-18Merge tag 'v4.9-rc4' into soundJonathan Corbet
Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18Merge branches 'acpica-fixes', 'acpi-cppc-fixes' and 'acpi-tools-fixes'Rafael J. Wysocki
* acpica-fixes: Revert "ACPICA: FADT support cleanup" * acpi-cppc-fixes: mailbox: PCC: Fix lockdep warning when request PCC channel * acpi-tools-fixes: tools/power/acpi: Remove direct kernel source include reference
2016-11-18netns: fix get_net_ns_by_fd(int pid) typoStefan Hajnoczi
The argument to get_net_ns_by_fd() is a /proc/$PID/ns/net file descriptor not a pid. Fix the typo. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Rami Rosen <roszenrami@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_infoFlorian Fainelli
In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST, provide an inline stub for mvebu_mbus_get_dram_win_info(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18hw_breakpoint: Allow watchpoint of length 3,5,6 and 7Pratyush Anand
We only support breakpoint/watchpoint of length 1, 2, 4 and 8. If we can support other length as well, then user may watch more data with less number of watchpoints (provided hardware supports it). For example: if we have to watch only 4th, 5th and 6th byte from a 64 bit aligned address, we will have to use two slots to implement it currently. One slot will watch a half word at offset 4 and other a byte at offset 6. If we can have a watchpoint of length 3 then we can watch it with single slot as well. ARM64 hardware does support such functionality, therefore adding these new definitions in generic layer. Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-18ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunablesRaju Lakkaraju
For operation in cabling environments that are incompatible with 1000BASE-T, PHY device may provide an automatic link speed downshift operation. When enabled, the device automatically changes its 1000BASE-T auto-negotiation to the next slower speed after a configured number of failed attempts at 1000BASE-T. This feature is useful in setting up in networks using older cable installations that include only pairs A and B, and not pairs C and D. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLERaju Lakkaraju
Adding get_tunable/set_tunable function pointer to the phy_driver structure, and uses these function pointers to implement the ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLERaju Lakkaraju
Defines a generic API to get/set phy tunables. The API is using the existing ethtool_tunable/tunable_type_id types which is already being used for mac level tunables. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18net/mlx5: Add MPCNT register infrastructureGal Pressman
Add the needed infrastructure for future use of MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18net/mlx5: Set driver version infrastructureSaeed Mahameed
Add driver_version capability bit is enabled, and set driver version command in mlx5_ifc firmware header. The only purpose of this command is to store a driver version/OS string in FW to be reported and displayed in various management systems, such as IPMI/BMC. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18net/mlx5: Add handling for port module eventHuy Nguyen
For each asynchronous port module event: 1. print with ratelimit to the dmesg log 2. increment the corresponding event counter Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18net/mlx5: Port module event hardware structuresHuy Nguyen
Add hardware structures and constants definitions needed for module events support. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18net/mlx5: Make the command interface cache more flexibleMohamad Haj Yahia
Add more cache command size sets and more entries for each set based on the current commands set different sizes and commands frequency. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18netns: make struct pernet_operations::id unsigned intAlexey Dobriyan
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18udp: enable busy polling for all socketsEric Dumazet
UDP busy polling is restricted to connected UDP sockets. This is because sk_busy_loop() only takes care of one NAPI context. There are cases where it could be extended. 1) Some hosts receive traffic on a single NIC, with one RX queue. 2) Some applications use SO_REUSEPORT and associated BPF filter to split the incoming traffic on one UDP socket per RX queue/thread/cpu 3) Some UDP sockets are used to send/receive traffic for one flow, but they do not bother with connect() This patch records the napi_id of first received skb, giving more reach to busy polling. Tested: lpaa23:~# echo 70 >/proc/sys/net/core/busy_read lpaa24:~# echo 70 >/proc/sys/net/core/busy_read lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done Before patch : 27867 28870 37324 41060 41215 36764 36838 44455 41282 43843 After patch : 73920 73213 70147 74845 71697 68315 68028 75219 70082 73707 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18Merge tag 'usb-for-v4.10' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next Felipe writes: usb: patches for v4.10 merge window One big merge this time with a total of 166 non-merge commits. Most of the work, by far, is on dwc2 this time (68.2%) with dwc3 a far second (22.5%). The remaining 9.3% are scattered on gadget drivers. The most important changes for dwc2 are the peripheral side DMA support implemented by Synopsys folks and support for the new IOT dwc2 compatible core from Synopsys. In dwc3 land we have support for high-bandwidth, high-speed isochronous endpoints and some non-critical fixes for large scatter lists. Apart from these, we have our usual set of cleanups, non-critical fixes, etc.
2016-11-18block: Change extern inline to static inlineTobias Klauser
With compilers which follow the C99 standard (like modern versions of gcc and clang), "extern inline" does the opposite thing from older versions of gcc (emits code for an externally linkable version of the inline function). "static inline" does the intended behavior in all cases instead. Description taken from commit 6d91857d4826 ("staging, rtl8192e, LLVMLinux: Change extern inline to static inline"). This also fixes the following GCC warning when building with CONFIG_PM disabled: ./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes] Fixes: d07ab6d11477 ("block: Add blk_set_runtime_active()") Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-18usb: gadget: fix request length error for isoc transferPeter Chen
For isoc endpoint descriptor, the wMaxPacketSize is not real max packet size (see Table 9-13. Standard Endpoint Descriptor, USB 2.0 specifcation), it may contain the number of packet, so the real max packet should be ep->desc->wMaxPacketSize && 0x7ff. Cc: Felipe F. Tonello <eu@felipetonello.com> Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Fixes: 16b114a6d797 ("usb: gadget: fix usb_ep_align_maybe endianness and new usb_ep_aligna") Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
2016-11-17scsi: fc: move FC transport's bsg code to bsg-libJohannes Thumshirn
Now that all conversions are done, move the FibreChannel bsg code over to the bsg library. This patch is derived from work done by Mike Christie in 2011 [1] but only the iscsi parts got merged back then. [1] http://marc.info/?l=linux-scsi&m=131149780921009&w=2 Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17block: add bsg_job_put() and bsg_job_get()Johannes Thumshirn
Add bsg_job_put() and bsg_job_get() so don't need to export bsg_destroy_job() any more. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: fc: use bsg_job_doneJohannes Thumshirn
fc_bsg_jobdone() and bsg_job_done() are 1:1 copies now so use the bsg-lib one instead of the FC private implementation. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: fc: use bsg_softirq_doneJohannes Thumshirn
bsg_softirq_done() and fc_bsg_softirq_done() are copies of each other, so ditch the fc specific one. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: fc: Use bsg_destroy_jobJohannes Thumshirn
fc_destroy_bsgjob() and bsg_destroy_job() are now 1:1 copies, so use the latter. As bsg_destroy_job() comes from bsg-lib we need to select it in Kconfig once CONFOG_SCSI_FC_ATTRS is active. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: change FC drivers to use 'struct bsg_job'Johannes Thumshirn
Change FC drivers to use 'struct bsg_job' from bsg-lib.h instead of 'struct fc_bsg_job' from scsi_transport_fc.h and remove 'struct fc_bsg_job'. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17block: add reference counting for struct bsg_jobJohannes Thumshirn
Add reference counting to 'struct bsg_job' so we can implement a reuqest timeout handler for bsg_jobs, which is needed for Fibre Channel. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: fc: implement kref backed reference countingJohannes Thumshirn
Implement kref backed reference counting instead of rolling our own. This elimnates the need of the following fields in 'struct fc_bsg_job': * ref_cnt * state_flags * job_lock bringing us close to unification of 'struct fc_bsg_job' and 'struct bsg_job'. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: fc: provide fc_bsg_to_rport() helperJohannes Thumshirn
Provide fc_bsg_to_rport() helper that will become handy when we're moving from struct fc_bsg_job to a plain struct bsg_job. Also move all LLDDs to use the new helper. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: fc: provide fc_bsg_to_shost() helperJohannes Thumshirn
Provide fc_bsg_to_shost() helper that will become handy when we're moving from struct fc_bsg_job to a plain struct bsg_job. Also use this little helper in the LLDDs. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: fc: Export fc_bsg_jobdone and use it in FC driversJohannes Thumshirn
Export fc_bsg_jobdone so drivers can use it directly instead of doing the round-trip via struct fc_bsg_job::job_done() and use it in the LLDDs. That way we can also unify the interfaces of fc_bsg_jobdone and bsg_job_done. As we've converted all LLDDs over to use fc_bsg_jobdone() directly, we can remove the function pointer from struct fc_bsg_job as well. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17scsi: Get rid of struct fc_bsg_bufferJohannes Thumshirn
struct fc_bsg_buffer is just a clone of struct bsg_buffer from bsg-lib, so use this one instead. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-11-17cpufreq: intel_pstate: Request P-states control from SMM if neededRafael J. Wysocki
Currently, intel_pstate is unable to control P-states on my IvyBridge-based Acer Aspire S5, because they are controlled by SMM on that machine by default and it is necessary to request OS control of P-states from it via the SMI Command register exposed in the ACPI FADT. intel_pstate doesn't do that now, but acpi-cpufreq and other cpufreq drivers for x86 platforms do. Address this problem by making intel_pstate use the ACPI-defined mechanism as well. However, intel_pstate is not modular and it doesn't need the module refcount tricks played by acpi_processor_notify_smm(), so export the core of this function to it as acpi_processor_pstate_control() and make it call that. [The changes in processor_perflib.c related to this should not make any functional difference for the acpi_processor_notify_smm() users]. To be safe, only call acpi_processor_notify_smm() from intel_pstate if ACPI _PPC support is enabled in it. Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-11-17Merge tag 'clk-renesas-for-v4.10-tag2' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into clk-next Pull Renesas clk driver updates from Geerty Uytterhoeven: - Add R-Car RST driver for obtaining mode pin state, and move the related functionality from platform code to DT, - Add r8a7743 and r8a7745 CPG Core Clock Definitions. The commits here are intermingled with arm-soc material because of the hard dependency we're breaking between mach code and driver code. We're replacing that with a driver dependency between the soc driver and the clk driver. * tag 'clk-renesas-for-v4.10-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers: (25 commits) clk: renesas: Add r8a7745 CPG Core Clock Definitions clk: renesas: Add r8a7743 CPG Core Clock Definitions clk: renesas: rcar-gen2: Remove obsolete rcar_gen2_clocks_init() clk: renesas: r8a7779: Remove obsolete r8a7779_clocks_init() clk: renesas: r8a7778: Remove obsolete r8a7778_clocks_init() ARM: shmobile: rcar-gen2: Stop passing mode pins state to clock driver ARM: shmobile: r8a7779: Stop passing mode pins state to clock driver ARM: shmobile: r8a7778: Stop passing mode pins state to clock driver clk: renesas: rcar-gen3-cpg: Remove obsolete rcar_gen3_read_mode_pins() clk: renesas: r8a7796: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7795: Obtain mode pin values from R-Car RST driver clk: renesas: rcar-gen2: Obtain mode pin values using RST driver clk: renesas: r8a7779: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7778: Obtain mode pin values using R-Car RST driver arm64: renesas: r8a7796 dtsi: Add device node for RST module arm64: renesas: r8a7795 dtsi: Add device node for RST module ARM: dts: r8a7794: Add device node for RST module ARM: dts: r8a7793: Add device node for RST module ARM: dts: r8a7792: Add device node for RST module ARM: dts: r8a7791: Add device node for RST module ...
2016-11-17blk-mq: make the polling code adaptiveJens Axboe
The previous commit introduced the hybrid sleep/poll mode. Take that one step further, and use the completion latencies to automatically sleep for half the mean completion time. This is a good approximation. This changes the 'io_poll_delay' sysfs file a bit to expose the various options. Depending on the value, the polling code will behave differently: -1 Never enter hybrid sleep mode 0 Use half of the completion mean for the sleep delay >0 Use this specific value as the sleep delay Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17blk-mq: implement hybrid poll mode for sync O_DIRECTJens Axboe
This patch enables a hybrid polling mode. Instead of polling after IO submission, we can induce an artificial delay, and then poll after that. For example, if the IO is presumed to complete in 8 usecs from now, we can sleep for 4 usecs, wake up, and then do our polling. This still puts a sleep/wakeup cycle in the IO path, but instead of the wakeup happening after the IO has completed, it'll happen before. With this hybrid scheme, we can achieve big latency reductions while still using the same (or less) amount of CPU. Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17mremap: fix race between mremap() and page cleanningAaron Lu
Prior to 3.15, there was a race between zap_pte_range() and page_mkclean() where writes to a page could be lost. Dave Hansen discovered by inspection that there is a similar race between move_ptes() and page_mkclean(). We've been able to reproduce the issue by enlarging the race window with a msleep(), but have not been able to hit it without modifying the code. So, we think it's a real issue, but is difficult or impossible to hit in practice. The zap_pte_range() issue is fixed by commit 1cf35d47712d("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts"). And this patch is to fix the race between page_mkclean() and mremap(). Here is one possible way to hit the race: suppose a process mmapped a file with READ | WRITE and SHARED, it has two threads and they are bound to 2 different CPUs, e.g. CPU1 and CPU2. mmap returned X, then thread 1 did a write to addr X so that CPU1 now has a writable TLB for addr X on it. Thread 2 starts mremaping from addr X to Y while thread 1 cleaned the page and then did another write to the old addr X again. The 2nd write from thread 1 could succeed but the value will get lost. thread 1 thread 2 (bound to CPU1) (bound to CPU2) 1: write 1 to addr X to get a writeable TLB on this CPU 2: mremap starts 3: move_ptes emptied PTE for addr X and setup new PTE for addr Y and then dropped PTL for X and Y 4: page laundering for N by doing fadvise FADV_DONTNEED. When done, pageframe N is deemed clean. 5: *write 2 to addr X 6: tlb flush for addr X 7: munmap (Y, pagesize) to make the page unmapped 8: fadvise with FADV_DONTNEED again to kick the page off the pagecache 9: pread the page from file to verify the value. If 1 is there, it means we have lost the written 2. *the write may or may not cause segmentation fault, it depends on if the TLB is still on the CPU. Please note that this is only one specific way of how the race could occur, it didn't mean that the race could only occur in exact the above config, e.g. more than 2 threads could be involved and fadvise() could be done in another thread, etc. For anonymous pages, they could race between mremap() and page reclaim: THP: a huge PMD is moved by mremap to a new huge PMD, then the new huge PMD gets unmapped/splitted/pagedout before the flush tlb happened for the old huge PMD in move_page_tables() and we could still write data to it. The normal anonymous page has similar situation. To fix this, check for any dirty PTE in move_ptes()/move_huge_pmd() and if any, did the flush before dropping the PTL. If we did the flush for every move_ptes()/move_huge_pmd() call then we do not need to do the flush in move_pages_tables() for the whole range. But if we didn't, we still need to do the whole range flush. Alternatively, we can track which part of the range is flushed in move_ptes()/move_huge_pmd() and which didn't to avoid flushing the whole range in move_page_tables(). But that would require multiple tlb flushes for the different sub-ranges and should be less efficient than the single whole range flush. KBuild test on my Sandybridge desktop doesn't show any noticeable change. v4.9-rc4: real 5m14.048s user 32m19.800s sys 4m50.320s With this commit: real 5m13.888s user 32m19.330s sys 4m51.200s Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-17mei: bus: split RX and async notification callbacksAlexander Usyskin
Split callbacks for RX and async notification events on mei bus to eliminate synchronization problems and to open way for RX optimizations. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-17vfio: Define device_api stringsKirti Wankhede
Defined device API strings. Vendor driver using mediated device framework should use corresponding string for device_api attribute. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17vfio: Introduce vfio_set_irqs_validate_and_prepare()Kirti Wankhede
Vendor driver using mediated device framework would use same mechnism to validate and prepare IRQs. Introducing this function to reduce code replication in multiple drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17vfio: Introduce common function to add capabilitiesKirti Wankhede
Vendor driver using mediated device framework should use vfio_info_add_capability() to add capabilities. Introduced this function to reduce code duplication in vendor drivers. vfio_info_cap_shift() manipulated a data buffer to add an offset to each element in a chain. This data buffer is documented in a uapi header. Changing vfio_info_cap_shift symbol to be available to all drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17vfio iommu: Add blocking notifier to notify DMA_UNMAPKirti Wankhede
Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers about DMA_UNMAP. Exported two APIs vfio_register_notifier() and vfio_unregister_notifier(). Notifier should be registered, if external user wants to use vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages. Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate mappings. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_opsKirti Wankhede
Added APIs for pining and unpining set of pages. These call back into backend iommu module to actually pin and unpin pages. Added two new callback functions to struct vfio_iommu_driver_ops. Backend IOMMU module that supports pining and unpinning pages for mdev devices should provide these functions. Renamed static functions in vfio_type1_iommu.c to resolve conflicts Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17vfio: Mediated device Core driverKirti Wankhede
Design for Mediated Device Driver: Main purpose of this driver is to provide a common interface for mediated device management that can be used by different drivers of different devices. This module provides a generic interface to create the device, add it to mediated bus, add device to IOMMU group and then add it to vfio group. Below is the high Level block diagram, with Nvidia, Intel and IBM devices as example, since these are the devices which are going to actively use this module as of now. +---------------+ | | | +-----------+ | mdev_register_driver() +--------------+ | | | +<------------------------+ __init() | | | mdev | | | | | | bus | +------------------------>+ |<-> VFIO user | | driver | | probe()/remove() | vfio_mdev.ko | APIs | | | | | | | +-----------+ | +--------------+ | | | MDEV CORE | | MODULE | | mdev.ko | | +-----------+ | mdev_register_device() +--------------+ | | | +<------------------------+ | | | | | | nvidia.ko |<-> physical | | | +------------------------>+ | device | | | | callback +--------------+ | | Physical | | | | device | | mdev_register_device() +--------------+ | | interface | |<------------------------+ | | | | | | i915.ko |<-> physical | | | +------------------------>+ | device | | | | callback +--------------+ | | | | | | | | mdev_register_device() +--------------+ | | | +<------------------------+ | | | | | | ccw_device.ko|<-> physical | | | +------------------------>+ | device | | | | callback +--------------+ | +-----------+ | +---------------+ Core driver provides two types of registration interfaces: 1. Registration interface for mediated bus driver: /** * struct mdev_driver - Mediated device's driver * @name: driver name * @probe: called when new device created * @remove:called when device removed * @driver:device driver structure * **/ struct mdev_driver { const char *name; int (*probe) (struct device *dev); void (*remove) (struct device *dev); struct device_driver driver; }; Mediated bus driver for mdev device should use this interface to register and unregister with core driver respectively: int mdev_register_driver(struct mdev_driver *drv, struct module *owner); void mdev_unregister_driver(struct mdev_driver *drv); Mediated bus driver is responsible to add/delete mediated devices to/from VFIO group when devices are bound and unbound to the driver. 2. Physical device driver interface This interface provides vendor driver the set APIs to manage physical device related work in its driver. APIs are : * dev_attr_groups: attributes of the parent device. * mdev_attr_groups: attributes of the mediated device. * supported_type_groups: attributes to define supported type. This is mandatory field. * create: to allocate basic resources in vendor driver for a mediated device. This is mandatory to be provided by vendor driver. * remove: to free resources in vendor driver when mediated device is destroyed. This is mandatory to be provided by vendor driver. * open: open callback of mediated device * release: release callback of mediated device * read : read emulation callback. * write: write emulation callback. * ioctl: ioctl callback. * mmap: mmap emulation callback. Drivers should use these interfaces to register and unregister device to mdev core driver respectively: extern int mdev_register_device(struct device *dev, const struct parent_ops *ops); extern void mdev_unregister_device(struct device *dev); There are no locks to serialize above callbacks in mdev driver and vfio_mdev driver. If required, vendor driver can have locks to serialize above APIs in their driver. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queuedDaniel Vetter
Tvrtko needs commit b3c11ac267d461d3d597967164ff7278a919a39f Author: Eric Engestrom <eric@engestrom.ch> Date: Sat Nov 12 01:12:56 2016 +0000 drm: move allocation out of drm_get_format_name() to be able to apply his patches without conflicts. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-11-17xenfs: Use proc_create_mount_point() to create /proc/xenSeth Forshee
Mounting proc in user namespace containers fails if the xenbus filesystem is mounted on /proc/xen because this directory fails the "permanently empty" test. proc_create_mount_point() exists specifically to create such mountpoints in proc but is currently proc-internal. Export this interface to modules, then use it in xenbus when creating /proc/xen. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-11-17drm: Nuke modifier[1-3]Ville Syrjälä
It has been suggested that having per-plane modifiers is making life more difficult for userspace, so let's just retire modifier[1-3] and use modifier[0] to apply to the entire framebuffer. Obviosuly this means that if individual planes need different tiling layouts and whatnot we will need a new modifier for each combination of planes with different tiling layouts. For a bit of extra backwards compatilbilty the kernel will allow non-zero modifier[1+] but it require that they will match modifier[0]. This in case there's existing userspace out there that sets modifier[1+] to something non-zero with planar formats. Mostly a cocci job, with a bit of manual stuff mixed in. @@ struct drm_framebuffer *fb; expression E; @@ - fb->modifier[E] + fb->modifier @@ struct drm_framebuffer fb; expression E; @@ - fb.modifier[E] + fb.modifier Cc: Kristian Høgsberg <hoegsberg@gmail.com> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tomeu Vizoso <tomeu@tomeuvizoso.net> Cc: dczaplejewicz@collabora.co.uk Suggested-by: Kristian Høgsberg <hoegsberg@gmail.com> Acked-by: Ben Widawsky <ben@bwidawsk.net> Acked-by: Daniel Stone <daniels@collabora.com> Acked-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1479295996-26246-1-git-send-email-ville.syrjala@linux.intel.com
2016-11-17locking/core: Provide common cpu_relax_yield() definitionChristian Borntraeger
No need to duplicate the same define everywhere. Since the only user is stop-machine and the only provider is s390, we can use a default implementation of cpu_relax_yield() in sched.h. Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-s390 <linux-s390@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1479298985-191589-1-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16Merge branch 'topic/st_fdma' of ↵Bjorn Andersson
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/slave-dma into rproc-next
2016-11-16sctp: use new rhlist interface on sctp transport rhashtableXin Long
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>