path: root/include/linux
Age | Commit message | Author
2016-11-22 | kvm: x86: don't print warning messages for unimplemented msrs | Bandan Das
Change unimplemented msrs messages to use pr_debug. If CONFIG_DYNAMIC_DEBUG is set, then these messages can be enabled at run time or else -DDEBUG can be used at compile time to enable them. These messages will still be printed if ignore_msrs=1. Signed-off-by: Bandan Das <bsd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-22 | block: bio: pass bvec table to bio_init() | Ming Lei
Some drivers often use an external bvec table, so introduce this helper for that case. It is always safe to access bio->bi_io_vec in this way for this case. After converting to this usage, it will become a bit easier to evaluate the remaining direct accesses to bio->bi_io_vec, which helps prepare for the multipage bvec support that follows. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixed up the new O_DIRECT cases. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22 | kvm: Introduce kvm_write_guest_offset_cached() | Pan Xinhui
It allows us to update a structure partially, such as a single status field. We can also save a kvm_read_guest_cached() call if we just update one field of the struct regardless of its current value. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
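The following userspace sketch illustrates the idea. Only the bytes of one field are written, at that field's offset, into an already-validated cached mapping, so the whole structure is never read and rewritten. struct guest_cache, struct steal_info and write_offset_cached() are hypothetical stand-ins and are not KVM's gfn_to_hva_cache API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cached mapping: a host view of guest memory whose
 * length was validated once, when the cache was initialized. */
struct guest_cache {
    unsigned char *hva;
    size_t len;
};

/* Hypothetical guest-visible structure with several fields. */
struct steal_info {
    uint64_t steal;
    uint32_t version;
    uint8_t  preempted;
    uint8_t  pad[3];
};

/* Copy len bytes to `offset` within the cached region and leave the
 * other fields untouched. Returns 0 on success, or -1 if the write
 * would run past the cached length. */
static int write_offset_cached(struct guest_cache *gc, const void *data,
                               size_t offset, size_t len)
{
    if (offset > gc->len || len > gc->len - offset)
        return -1;
    memcpy(gc->hva + offset, data, len);
    return 0;
}
```

A caller that only needs to flag preemption passes `offsetof(struct steal_info, preempted)` and `sizeof(uint8_t)`. It does not need to read the current value of the structure first.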
2016-11-22 | sched/core: Introduce the vcpu_is_preempted(cpu) interface | Pan Xinhui
This patch is the first step towards improving lock holder preemption behaviour. vcpu_is_preempted(cpu) does the obvious thing: it tells us whether a vCPU is preempted or not. It defaults to false on architectures that don't support it. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Translated the changelog to English. ] Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-2-git-send-email-xinhui.pan@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
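Here is a minimal sketch of the "defaults to false" pattern. It assumes that an architecture able to answer the question defines vcpu_is_preempted itself before the generic fallback is seen. keep_spinning() is a hypothetical caller that shows the intended use in a spin-wait loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Generic fallback: without hypervisor support there is no way to
 * tell, so report "not preempted". The argument is still evaluated
 * (and discarded) so callers get consistent behaviour. */
#ifndef vcpu_is_preempted
#define vcpu_is_preempted(cpu) ((void)(cpu), false)
#endif

/* Hypothetical spin-wait decision: keep spinning only while the lock
 * is held and the owner's vCPU is actually running. */
static bool keep_spinning(int owner_cpu, bool lock_still_held)
{
    return lock_still_held && !vcpu_is_preempted(owner_cpu);
}
```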
2016-11-22 | Merge branch 'linus' into locking/core, to pick up fixes | Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22 | sched/autogroup: Do not use autogroup->tg in zombie threads | Oleg Nesterov
Exactly because for_each_thread() in autogroup_move_group() can't see it and update its ->sched_task_group before _put() and possibly free(). So the exiting task needs another sched_move_task() before exit_notify() and we need to re-introduce the PF_EXITING (or similar) check removed by the previous change for another reason. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hartsjc@redhat.com Cc: vbendel@redhat.com Cc: vlovejoy@redhat.com Link: http://lkml.kernel.org/r/20161114184612.GA15968@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-22 | led: core: Fix blink_brightness setting race | Hans de Goede
All 3 of led_timer_func, led_set_brightness and led_set_software_blink set blink_brightness. If led_timer_func or led_set_software_blink races with led_set_brightness, it may end up overwriting the new blink_brightness. The new atomic work_flags does not protect against this, as it only protects the flags and not blink_brightness. This commit introduces a new new_blink_brightness value which gets set by led_set_brightness and read by led_timer_func when it turns the LED on, fixing this. Dealing with the new brightness at LED-on time makes the new brightness apply sooner. This also fixes the case where a led_set_brightness call, made while a oneshot blink that ends with the LED on is running, was not applied. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
2016-11-22 | led: core: Use atomic bit-field for the blink-flags | Hans de Goede
All the LED_BLINK* flags are accessed read-modify-write from e.g. led_set_brightness and led_blink_set_oneshot while both set_brightness_work and the blink_timer may be running. If these race, the modify step done by one of them may be lost. Switch the LED_BLINK* flags to a new atomic work_flags bit-field to avoid this race. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
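The following userspace sketch shows the fix using C11 atomics. Each flag is a bit in one atomic word, and every set or clear is a single atomic read-modify-write. As a result, concurrent updates to different bits cannot lose each other's changes. The kernel itself uses set_bit()/test_and_set_bit() on an unsigned long; the bit names and helpers below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers standing in for the LED_BLINK* flags. */
enum { BLINK_ONESHOT = 0, BLINK_ONESHOT_STOP = 1, BLINK_INVERT = 2 };

/* Atomically set a bit and report whether it was already set
 * (the same contract as test_and_set_bit()). */
static bool flag_test_and_set(atomic_ulong *flags, int bit)
{
    unsigned long mask = 1UL << bit;
    return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Atomically clear one bit without disturbing the others. */
static void flag_clear(atomic_ulong *flags, int bit)
{
    atomic_fetch_and(flags, ~(1UL << bit));
}

static bool flag_test(atomic_ulong *flags, int bit)
{
    return (atomic_load(flags) >> bit) & 1UL;
}
```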
2016-11-21 | PM / sleep / ACPI: Use the ACPI_FADT_LOW_POWER_S0 flag | Rafael J. Wysocki
Modify the ACPI system sleep support setup code to select suspend-to-idle as the default system sleep state if the ACPI_FADT_LOW_POWER_S0 flag is set in the FADT and the default sleep state was not selected from the kernel command line. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2016-11-21 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds
Pull networking fixes from David Miller:
1) Clear congestion control state when changing algorithms on an existing socket, from Florian Westphal.
2) Fix register bit values in altr_tse_pcs portion of stmmac driver, from Jia Jie Ho.
3) Fix PTP handling in stmmac driver for GMAC4, from Giuseppe CAVALLARO.
4) Fix udplite multicast delivery handling, which ignored the udp_table parameter passed into the lookups, from Pablo Neira Ayuso.
5) Synchronize the space estimated by rtnl_vfinfo_size and the space actually used by rtnl_fill_vfinfo. From Sabrina Dubroca.
6) Fix memory leak in fib_info when splitting nodes, from Alexander Duyck.
7) If a driver does a napi_hash_del() explicitly and not via netif_napi_del(), it must perform RCU synchronization as needed. Fix this in virtio-net and bnxt drivers, from Eric Dumazet.
8) Likewise, it is not necessary to invoke napi_hash_del() if we are also doing netif_napi_del() in the same code path. Remove such calls from be2net and cxgb4 drivers, also from Eric Dumazet.
9) Don't allocate an ID in peernet2id_alloc() if the netns is dead, from WANG Cong.
10) Fix OF node and device struct leaks in of_mdio, from Johan Hovold.
11) We cannot cache routes in ip6_tunnel when using inherited traffic classes, from Paolo Abeni.
12) Fix several crashes and leaks in cpsw driver, from Johan Hovold.
13) Splice operations cannot use freezable blocking calls in AF_UNIX, from WANG Cong.
14) Link dump filtering by master device and kind support added an error in loop index updates during the dump if we actually do filter, fix from Zhang Shengju.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (59 commits) tcp: zero ca_priv area when switching cc algorithms net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC tipc: eliminate obsolete socket locking policy description rtnl: fix the loop index update error in rtnl_dump_ifinfo() l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() net: macb: add check for dma mapping error in start_xmit() rtnetlink: fix FDB size computation netns: fix get_net_ns_by_fd(int pid) typo af_unix: conditionally use freezable blocking calls in read net: ethernet: ti: cpsw: fix fixed-link phy probe deferral net: ethernet: ti: cpsw: add missing sanity check net: ethernet: ti: cpsw: fix secondary-emac probe error path net: ethernet: ti: cpsw: fix of_node and phydev leaks net: ethernet: ti: cpsw: fix deferred probe net: ethernet: ti: cpsw: fix mdio device reference leak net: ethernet: ti: cpsw: fix bad register access in probe error path net: sky2: Fix shutdown crash cfg80211: limit scan results cache size net sched filters: pass netlink message flags in event notification ...
2016-11-21 | block: clear all of bi_opf in bio_set_op_attrs | Christoph Hellwig
Since commit 87374179 ("block: add a proper block layer data direction encoding"), bio_set_op_attrs only ORs the new op and flags into bi_opf instead of clearing the old value. I've not seen any breakage with the new behavior, but it seems dangerous. Also convert it to an inline function to make the argument passing safer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
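The difference between OR-ing and assigning can be shown with a toy encoding. In this sketch, struct fake_bio and the bit layout are hypothetical, and the real bi_opf encoding differs. The point is only that OR-ing in a new op leaves stale op and flag bits behind, while assigning replaces them.

```c
#include <assert.h>

/* Hypothetical layout: low 8 bits hold the op, higher bits hold flags. */
#define OP_BITS 8
#define OP_MASK ((1u << OP_BITS) - 1)

struct fake_bio { unsigned int bi_opf; };

/* Behaviour described in the commit text: OR in the new values,
 * so any previous op or flag bits survive. */
static inline void set_op_attrs_or(struct fake_bio *bio, unsigned int op,
                                   unsigned int flags)
{
    bio->bi_opf |= op | flags;
}

/* Fixed behaviour: overwrite bi_opf entirely. Being an inline
 * function, each argument is type-checked and evaluated once. */
static inline void set_op_attrs(struct fake_bio *bio, unsigned int op,
                                unsigned int flags)
{
    bio->bi_opf = op | flags;
}
```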
2016-11-21 | bpf: add __must_check attributes to refcount manipulating helpers | Daniel Borkmann
Helpers like bpf_prog_add(), bpf_prog_inc() and bpf_map_inc() can fail with an error, so make sure the caller properly checks their return value rather than ignoring it, which in the worst case could lead to a use after free. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
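Here is a userspace sketch of what the attribute buys. The kernel's __must_check expands to GCC's warn_unused_result attribute. The macro is spelled MUST_CHECK here to avoid a reserved identifier. obj_add() and its OBJ_MAX_REF limit are hypothetical stand-ins for a refcount helper that can fail.

```c
#include <assert.h>
#include <errno.h>

/* Same attribute the kernel's __must_check uses. */
#define MUST_CHECK __attribute__((warn_unused_result))

/* Hypothetical refcounted object with a bounded count. */
struct obj { int refcnt; };
#define OBJ_MAX_REF 32768

/* Take n references, or fail without touching the count. Ignoring
 * the error would let the caller believe it holds references it does
 * not, which is how a use after free can follow. */
static MUST_CHECK int obj_add(struct obj *o, int n)
{
    if (o->refcnt + n > OBJ_MAX_REF)
        return -EBUSY;
    o->refcnt += n;
    return 0;
}
```

With GCC or Clang, a bare call such as `obj_add(&o, 1);` that discards the result draws a compile-time warning.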
2016-11-21 | cpufreq: Make cpufreq_update_policy() void | Rafael J. Wysocki
The return value of cpufreq_update_policy() is never used, so make it void. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-11-21 | EDAC, amd64: Add Deferred Error type | Yazen Ghannam
Currently, deferred errors are classified as correctable in EDAC. Add a new error type for deferred errors so that they are correctly reported to the user. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-7-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-21 | sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q | Waiman Long
Currently the wake_q data structure is defined by the WAKE_Q() macro. This macro, however, looks like a function doing something, as "wake" is a verb. Even checkpatch.pl was confused, as it reported warnings like:
    WARNING: Missing a blank line after declarations
    #548: FILE: kernel/futex.c:3665:
    +       int ret;
    +       WAKE_Q(wake_q);
This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q(), which clarifies what the macro is doing and eliminates the checkpatch.pl warnings. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com [ Resolved conflict and added missing rename. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
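The following simplified userspace sketch shows why the DEFINE_ prefix fits: the macro expands to a declaration with an initializer, not a call. The struct and helper names are hypothetical, and the in-kernel wake_q differs in detail (for example, it marks the tail with a sentinel rather than NULL).

```c
#include <assert.h>
#include <stddef.h>

struct wake_node { struct wake_node *next; int id; };
struct wake_head { struct wake_node *first; struct wake_node **lastp; };

/* Declares and initializes an empty queue head named `name`; the
 * lastp pointer starts at the head's own `first` field. */
#define DEFINE_WAKE_Q(name) \
    struct wake_head name = { NULL, &name.first }

/* Append a node in O(1) by writing through the tail pointer. */
static void wake_q_add(struct wake_head *h, struct wake_node *n)
{
    n->next = NULL;
    *h->lastp = n;
    h->lastp = &n->next;
}

static int wake_q_count(const struct wake_head *h)
{
    int n = 0;
    for (const struct wake_node *p = h->first; p; p = p->next)
        n++;
    return n;
}
```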
2016-11-21 | EDAC: Add LRDDR4 DRAM type | Yazen Ghannam
AMD Fam17h systems can support Load-Reduced DDR4 DIMMs. So add this new type to edac.h in preparation for the Fam17h EDAC update. Also, let's fix a format issue with the LRDDR3 line while we're here. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1479423463-8536-3-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-11-19 | net: fix bogus cast in skb_pagelen() and use unsigned variables | Alexey Dobriyan
1) The cast to "int" is unnecessary: u8 will be promoted to int before decrementing, and small positive numbers fit into "int", so their values won't change during promotion. Once everything is int, including loop counters, signedness doesn't matter: 32-bit operations will stay 32-bit operations. But! Someone apparently tried to make this loop smart by making everything the same type, in an attempt to optimise it. Do the optimization, just differently. Do the cast where it matters. :^)
2) Frag size is an unsigned entity, and the sum of fragment sizes is also unsigned. Make everything unsigned, leave no MOVSX instruction behind.
    add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4)
    function                                     old     new   delta
    skb_cow_data                                 835     834      -1
    ip_do_fragment                              2549    2548      -1
    ip6_fragment                                3130    3128      -2
    Total: Before=154865032, After=154865028, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
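The following userspace sketch follows the loop shape the commit describes: all-unsigned accumulation, with a single signed cast in the loop condition, where signedness actually matters. struct frag and struct shinfo are hypothetical stand-ins for the skb fields. This is an illustration, not the kernel source.

```c
#include <assert.h>

/* Hypothetical miniature of the fields skb_pagelen() reads. */
struct frag   { unsigned int size; };
struct shinfo { unsigned char nr_frags; struct frag frags[17]; };

/* Sizes and their sum are unsigned. When nr_frags is 0, i starts at
 * UINT_MAX and (int)i is -1 on GCC/Clang (the conversion is
 * implementation-defined in ISO C), so the body does not run. After
 * frags[0] is added, i-- wraps the same way and the loop stops. */
static unsigned int pagelen(const struct shinfo *si, unsigned int headlen)
{
    unsigned int i, len = 0;

    for (i = si->nr_frags - 1; (int)i >= 0; i--)
        len += si->frags[i].size;
    return len + headlen;
}
```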
2016-11-19 | iommu/vt-d: Fix PASID table allocation | David Woodhouse
Somehow I ended up with an off-by-three error in calculating the size of the PASID and PASID State tables, which triggers allocation failures, as those tables unfortunately have to be physically contiguous. In fact, even the *correct* maximum size of 8MiB is problematic and is wont to lead to allocation failures. Since I have extracted a promise that this *will* be fixed in hardware, I'm happy to limit it on the current hardware to a maximum of 0x20000 PASIDs, which gives us 1MiB tables: still not ideal, but better than before. Reported by Mika Kuoppala <mika.kuoppala@linux.intel.com> and also by Xunlei Pang <xlpang@redhat.com>, who submitted a simpler patch to fix only the allocation (and not the free) to the "correct" limit... which was still problematic. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: stable@vger.kernel.org
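The sizes quoted above follow from simple arithmetic. This sketch assumes 8-byte table entries (the value implied by 1MiB / 0x20000) and 4KiB pages, and the helper names are hypothetical, not the intel-iommu code. Under those assumptions, the full 20-bit PASID space (2^20 entries) needs an 8MiB table, which is an order-11 contiguous block. The 0x20000 cap needs 1MiB, which is order 8.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 8-byte entries, 4 KiB pages. */
#define PASID_ENTRY_BYTES 8ULL
#define PAGE_BYTES        4096ULL

/* Bytes needed for a table with one entry per PASID. */
static uint64_t pasid_table_bytes(uint64_t nr_pasids)
{
    return nr_pasids * PASID_ENTRY_BYTES;
}

/* Smallest page order whose block holds `bytes`, i.e. how large a
 * physically contiguous allocation must be found. */
static unsigned int alloc_order(uint64_t bytes)
{
    unsigned int order = 0;

    while ((PAGE_BYTES << order) < bytes)
        order++;
    return order;
}
```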
2016-11-19 | virtio_net: Do not clear memory for struct virtio_net_hdr twice. | Jarno Rajahalme
virtio_net_hdr_from_skb() clears the memory for the header, so there is no point in the callers doing the same. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-19 | virtio_net.h: Fix comment. | Jarno Rajahalme
Fix incorrect comment after the final #endif. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | Merge tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
Pull nfsd bugfix from Bruce Fields: "Just one fix for an NFS/RDMA crash" * tag 'nfsd-4.9-2' of git://linux-nfs.org/~bfields/linux: sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports
2016-11-18 | Merge tag 'v4.9-rc4' into sound | Jonathan Corbet
Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18 | bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info | Florian Fainelli
In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST, provide an inline stub for mvebu_mbus_get_dram_win_info(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE | Raju Lakkaraju
Add get_tunable/set_tunable function pointers to the phy_driver structure, and use these function pointers to implement the ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | net/mlx5: Add MPCNT register infrastructure | Gal Pressman
Add the needed infrastructure for future use of MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | net/mlx5: Set driver version infrastructure | Saeed Mahameed
Add the driver_version capability bit and the set driver version command to the mlx5_ifc firmware header. The only purpose of this command is to store a driver version/OS string in FW to be reported and displayed in various management systems, such as IPMI/BMC. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | net/mlx5: Add handling for port module event | Huy Nguyen
For each asynchronous port module event: 1. print with ratelimit to the dmesg log 2. increment the corresponding event counter Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | net/mlx5: Port module event hardware structures | Huy Nguyen
Add hardware structures and constants definitions needed for module events support. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | net/mlx5: Make the command interface cache more flexible | Mohamad Haj Yahia
Add more cache command size sets, and more entries for each set, based on the sizes and frequency of the current commands. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18 | Merge tag 'usb-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next | Greg Kroah-Hartman
Felipe writes:
usb: patches for v4.10 merge window
One big merge this time, with a total of 166 non-merge commits. Most of the work, by far, is on dwc2 this time (68.2%), with dwc3 a far second (22.5%). The remaining 9.3% are scattered across gadget drivers. The most important changes for dwc2 are the peripheral-side DMA support implemented by Synopsys folks and support for the new IOT dwc2-compatible core from Synopsys. In dwc3 land we have support for high-bandwidth, high-speed isochronous endpoints and some non-critical fixes for large scatter lists. Apart from these, we have our usual set of cleanups, non-critical fixes, etc.
2016-11-18 | block: Change extern inline to static inline | Tobias Klauser
With compilers which follow the C99 standard (like modern versions of gcc and clang), "extern inline" does the opposite thing from older versions of gcc (emits code for an externally linkable version of the inline function). "static inline" does the intended behavior in all cases instead. Description taken from commit 6d91857d4826 ("staging, rtl8192e, LLVMLinux: Change extern inline to static inline"). This also fixes the following GCC warning when building with CONFIG_PM disabled: ./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes] Fixes: d07ab6d11477 ("block: Add blk_set_runtime_active()") Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-18 | usb: gadget: fix request length error for isoc transfer | Peter Chen
For an isoc endpoint descriptor, wMaxPacketSize is not the real max packet size (see Table 9-13, Standard Endpoint Descriptor, USB 2.0 specification); it may also contain the number of additional transactions per microframe, so the real max packet size should be ep->desc->wMaxPacketSize & 0x7ff. Cc: Felipe F. Tonello <eu@felipetonello.com> Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Fixes: 16b114a6d797 ("usb: gadget: fix usb_ep_align_maybe endianness and new usb_ep_aligna") Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
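Here is a small sketch of the decoding described above. It assumes wMaxPacketSize has already been converted to CPU byte order, since descriptors store it little-endian. Per USB 2.0 Table 9-13, bits 10..0 hold the packet size and bits 12..11 hold the number of additional transactions per microframe. The helper names are hypothetical. The mask needs a bitwise &; a logical && would collapse the result to 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 10..0: maximum packet size in bytes. */
static unsigned int maxp_size(uint16_t wMaxPacketSize)
{
    return wMaxPacketSize & 0x7ff;
}

/* Bits 12..11: additional transactions per microframe, returned as
 * the total number of transactions (1..3 for valid descriptors). */
static unsigned int maxp_mult(uint16_t wMaxPacketSize)
{
    return ((wMaxPacketSize >> 11) & 0x3) + 1;
}
```

For example, 0x1400 describes a high-bandwidth endpoint with 1024-byte packets and 3 transactions per microframe.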
2016-11-17 | Merge tag 'clk-renesas-for-v4.10-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into clk-next | Stephen Boyd
Pull Renesas clk driver updates from Geert Uytterhoeven:
- Add R-Car RST driver for obtaining mode pin state, and move the related functionality from platform code to DT,
- Add r8a7743 and r8a7745 CPG Core Clock Definitions.
The commits here are intermingled with arm-soc material because of the hard dependency we're breaking between mach code and driver code. We're replacing that with a driver dependency between the soc driver and the clk driver. * tag 'clk-renesas-for-v4.10-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers: (25 commits) clk: renesas: Add r8a7745 CPG Core Clock Definitions clk: renesas: Add r8a7743 CPG Core Clock Definitions clk: renesas: rcar-gen2: Remove obsolete rcar_gen2_clocks_init() clk: renesas: r8a7779: Remove obsolete r8a7779_clocks_init() clk: renesas: r8a7778: Remove obsolete r8a7778_clocks_init() ARM: shmobile: rcar-gen2: Stop passing mode pins state to clock driver ARM: shmobile: r8a7779: Stop passing mode pins state to clock driver ARM: shmobile: r8a7778: Stop passing mode pins state to clock driver clk: renesas: rcar-gen3-cpg: Remove obsolete rcar_gen3_read_mode_pins() clk: renesas: r8a7796: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7795: Obtain mode pin values from R-Car RST driver clk: renesas: rcar-gen2: Obtain mode pin values using RST driver clk: renesas: r8a7779: Obtain mode pin values from R-Car RST driver clk: renesas: r8a7778: Obtain mode pin values using R-Car RST driver arm64: renesas: r8a7796 dtsi: Add device node for RST module arm64: renesas: r8a7795 dtsi: Add device node for RST module ARM: dts: r8a7794: Add device node for RST module ARM: dts: r8a7793: Add device node for RST module ARM: dts: r8a7792: Add device node for RST module ARM: dts: r8a7791: Add device node for RST module ...
2016-11-17 | blk-mq: make the polling code adaptive | Jens Axboe
The previous commit introduced the hybrid sleep/poll mode. Take that one step further, and use the completion latencies to automatically sleep for half the mean completion time. This is a good approximation. This changes the 'io_poll_delay' sysfs file a bit to expose the various options. Depending on the value, the polling code will behave differently:
    -1  Never enter hybrid sleep mode
     0  Use half of the completion mean for the sleep delay
    >0  Use this specific value as the sleep delay
Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
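The three io_poll_delay cases can be summarized in a few lines. poll_sleep_delay() is a hypothetical helper, not the blk-mq implementation. The unit of the delay is left abstract here, since it only needs to match the unit of the measured mean, and a return value of -1 means "poll without sleeping".

```c
#include <assert.h>

/* Returns how long to sleep before starting to poll, or -1 meaning
 * "never enter hybrid sleep, poll right away". */
static long long poll_sleep_delay(long long io_poll_delay,
                                  long long mean_completion)
{
    if (io_poll_delay < 0)
        return -1;                   /* -1: never enter hybrid sleep */
    if (io_poll_delay == 0)
        return mean_completion / 2;  /* 0: half the mean completion time */
    return io_poll_delay;            /* >0: fixed, user-chosen delay */
}
```

With a mean of 8000 and io_poll_delay set to 0, this sleeps for 4000 before polling. That matches the "complete in 8 usecs, sleep for 4" example in the hybrid-poll commit.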
2016-11-17 | blk-mq: implement hybrid poll mode for sync O_DIRECT | Jens Axboe
This patch enables a hybrid polling mode. Instead of polling after IO submission, we can induce an artificial delay, and then poll after that. For example, if the IO is presumed to complete in 8 usecs from now, we can sleep for 4 usecs, wake up, and then do our polling. This still puts a sleep/wakeup cycle in the IO path, but instead of the wakeup happening after the IO has completed, it'll happen before. With this hybrid scheme, we can achieve big latency reductions while still using the same (or less) amount of CPU. Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-17 | mremap: fix race between mremap() and page cleanning | Aaron Lu
Prior to 3.15, there was a race between zap_pte_range() and page_mkclean() where writes to a page could be lost. Dave Hansen discovered by inspection that there is a similar race between move_ptes() and page_mkclean(). We've been able to reproduce the issue by enlarging the race window with a msleep(), but have not been able to hit it without modifying the code. So, we think it's a real issue, but it is difficult or impossible to hit in practice. The zap_pte_range() issue is fixed by commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts"). This patch fixes the race between page_mkclean() and mremap(). Here is one possible way to hit the race: suppose a process mmapped a file with READ | WRITE and SHARED, and it has two threads bound to 2 different CPUs, e.g. CPU1 and CPU2. mmap returned X, then thread 1 did a write to addr X so that CPU1 now has a writable TLB entry for addr X. Thread 2 starts mremapping from addr X to Y while thread 1 cleans the page and then does another write to the old addr X. The 2nd write from thread 1 could succeed, but the value will get lost.
    thread 1 (bound to CPU1)               thread 2 (bound to CPU2)
    1: write 1 to addr X to get a
       writeable TLB on this CPU
                                           2: mremap starts
                                           3: move_ptes emptied PTE for addr X
                                              and setup new PTE for addr Y and
                                              then dropped PTL for X and Y
    4: page laundering for N by doing
       fadvise FADV_DONTNEED. When done,
       pageframe N is deemed clean.
    5: *write 2 to addr X
                                           6: tlb flush for addr X
    7: munmap (Y, pagesize) to make the page unmapped
    8: fadvise with FADV_DONTNEED again to kick the page off the pagecache
    9: pread the page from file to verify the value. If 1 is there, it means we have lost the written 2.
    *the write may or may not cause segmentation fault, it depends on if the TLB is still on the CPU.
Please note that this is only one specific way the race could occur; it doesn't mean the race can only occur in exactly the above configuration, e.g.
more than 2 threads could be involved and fadvise() could be done in another thread, etc. For anonymous pages, there could be a race between mremap() and page reclaim: with THP, a huge PMD is moved by mremap to a new huge PMD, then the new huge PMD gets unmapped/split/paged out before the TLB flush happens for the old huge PMD in move_page_tables(), and we could still write data to it. Normal anonymous pages have a similar situation. To fix this, check for any dirty PTE in move_ptes()/move_huge_pmd() and, if there is one, do the flush before dropping the PTL. If we did the flush for every move_ptes()/move_huge_pmd() call, then we do not need to do the flush in move_page_tables() for the whole range. But if we didn't, we still need to do the whole-range flush. Alternatively, we could track which parts of the range were flushed in move_ptes()/move_huge_pmd() and which weren't, to avoid flushing the whole range in move_page_tables(). But that would require multiple TLB flushes for the different sub-ranges and should be less efficient than the single whole-range flush. A KBuild test on my Sandybridge desktop doesn't show any noticeable change:
    v4.9-rc4:          real 5m14.048s  user 32m19.800s  sys 4m50.320s
    With this commit:  real 5m13.888s  user 32m19.330s  sys 4m51.200s
Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-17 | mei: bus: split RX and async notification callbacks | Alexander Usyskin
Split callbacks for RX and async notification events on mei bus to eliminate synchronization problems and to open way for RX optimizations. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-17 | vfio: Introduce vfio_set_irqs_validate_and_prepare() | Kirti Wankhede
Vendor drivers using the mediated device framework would use the same mechanism to validate and prepare IRQs. Introduce this function to reduce code duplication across multiple drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17 | vfio: Introduce common function to add capabilities | Kirti Wankhede
Vendor drivers using the mediated device framework should use vfio_info_add_capability() to add capabilities. This function is introduced to reduce code duplication in vendor drivers. vfio_info_cap_shift() manipulates a data buffer to add an offset to each element in a chain. This data buffer is documented in a uapi header. Make the vfio_info_cap_shift symbol available to all drivers. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
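The following userspace sketch shows what shifting a capability chain means. The header layout mirrors struct vfio_info_cap_header from the vfio uapi header (id, version, next). cap_shift() is a hypothetical re-implementation for illustration, not the kernel's vfio_info_cap_shift(). Each `next` field is a byte offset, and 0 ends the chain. When the chain is copied `offset` bytes into a larger reply buffer, every non-zero link must grow by `offset`.

```c
#include <assert.h>
#include <stdint.h>

/* Same fields as struct vfio_info_cap_header, redeclared so the
 * sketch builds without kernel headers. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;   /* offset of the next header; 0 terminates */
};

/* Walk the chain starting at buf[0] using the original (buf-relative)
 * links, and rewrite each non-zero link to be relative to a buffer
 * in which this chain will sit `offset` bytes further in. */
static void cap_shift(unsigned char *buf, uint32_t offset)
{
    struct cap_header *hdr = (struct cap_header *)buf;

    while (hdr->next) {
        uint32_t local = hdr->next;   /* still relative to buf */
        hdr->next += offset;
        hdr = (struct cap_header *)(buf + local);
    }
}
```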
2016-11-17 | vfio iommu: Add blocking notifier to notify DMA_UNMAP | Kirti Wankhede
Add a blocking notifier to the IOMMU TYPE1 driver to notify vendor drivers about DMA_UNMAP. Export two APIs, vfio_register_notifier() and vfio_unregister_notifier(). The notifier should be registered if an external user wants to use the vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages. Vendor drivers should use the VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate mappings. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17 | vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops | Kirti Wankhede
Added APIs for pinning and unpinning a set of pages. These call back into the backend IOMMU module to actually pin and unpin pages. Added two new callback functions to struct vfio_iommu_driver_ops. A backend IOMMU module that supports pinning and unpinning pages for mdev devices should provide these functions. Renamed static functions in vfio_type1_iommu.c to resolve conflicts. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17 | vfio: Mediated device Core driver | Kirti Wankhede
Design for Mediated Device Driver: The main purpose of this driver is to provide a common interface for mediated device management that can be used by drivers of different devices. This module provides a generic interface to create the device, add it to the mediated bus, add the device to an IOMMU group and then add it to a VFIO group. Below is the high-level block diagram, with Nvidia, Intel and IBM devices as examples, since these are the devices which are going to actively use this module as of now.
[Block diagram: the MDEV CORE module (mdev.ko) contains the mdev bus driver and the physical device interface. vfio_mdev.ko registers with the bus driver via mdev_register_driver(), receives probe()/remove() calls, and exposes the VFIO user APIs. The vendor drivers nvidia.ko, i915.ko and ccw_device.ko each register via mdev_register_device(), receive callbacks from the core, and drive their physical device.]
The core driver provides two types of registration interfaces:
1. Registration interface for the mediated bus driver:
    /**
     * struct mdev_driver - Mediated device's driver
     * @name: driver name
     * @probe: called when new device created
     * @remove: called when device removed
     * @driver: device driver structure
     **/
    struct mdev_driver {
            const char *name;
            int  (*probe)  (struct device *dev);
            void (*remove) (struct device *dev);
            struct device_driver driver;
    };
The mediated bus driver for mdev devices should use this interface to register and unregister with the core driver respectively:
    int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
    void mdev_unregister_driver(struct mdev_driver *drv);
The mediated bus driver is responsible for adding/deleting mediated devices to/from the VFIO group when devices are bound to and unbound from the driver.
2. Physical device driver interface
This interface provides the vendor driver with a set of APIs to manage physical-device-related work in its driver. The APIs are:
* dev_attr_groups: attributes of the parent device.
* mdev_attr_groups: attributes of the mediated device.
* supported_type_groups: attributes to define supported types. This is a mandatory field.
* create: allocates basic resources in the vendor driver for a mediated device. This must be provided by the vendor driver.
* remove: frees resources in the vendor driver when a mediated device is destroyed. This must be provided by the vendor driver.
* open: open callback of the mediated device.
* release: release callback of the mediated device.
* read: read emulation callback.
* write: write emulation callback.
* ioctl: ioctl callback.
* mmap: mmap emulation callback.
Drivers should use these interfaces to register and unregister a device with the mdev core driver respectively:
    extern int mdev_register_device(struct device *dev, const struct parent_ops *ops);
    extern void mdev_unregister_device(struct device *dev);
There are no locks to serialize the above callbacks in the mdev driver and the vfio_mdev driver. If required, the vendor driver can have locks to serialize the above APIs in its driver.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Jike Song <jike.song@intel.com> Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
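The vendor-side contract described in this commit (supported_type_groups, create and remove are mandatory; open, release, read, write, ioctl and mmap are optional) can be illustrated with a small userspace model. This is a sketch under stated assumptions: model_parent_ops, model_register_device(), the demo callbacks and the -EINVAL/-EBUSY return values are all hypothetical and are not the kernel's struct parent_ops or mdev_register_device(). Like the real interface, the model takes no locks.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the registration contract; only a few members
 * of the real parent_ops are represented here. */
struct model_parent_ops {
	const char *const *supported_type_groups; /* mandatory */
	int (*create)(void *mdev);                /* mandatory */
	int (*remove)(void *mdev);                /* mandatory */
	int (*open)(void *mdev);                  /* optional */
};

/* This model tracks a single registered parent device. */
static const struct model_parent_ops *registered_ops;

/* Hypothetical vendor callbacks used to exercise the model. */
static int demo_create(void *mdev) { (void)mdev; return 0; }
static int demo_remove(void *mdev) { (void)mdev; return 0; }
static const char *const demo_types[] = { "demo-type-1", NULL };

/* Refuse registration unless every mandatory member is provided. */
int model_register_device(const struct model_parent_ops *ops)
{
	if (!ops || !ops->supported_type_groups || !ops->create || !ops->remove)
		return -EINVAL;
	if (registered_ops)
		return -EBUSY; /* model limitation: one parent at a time */
	registered_ops = ops;
	return 0;
}

void model_unregister_device(void)
{
	registered_ops = NULL;
}
```

In this model a vendor driver that leaves out remove is rejected at registration time rather than failing later when a mediated device is destroyed.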
2016-11-17  xenfs: Use proc_create_mount_point() to create /proc/xen  Seth Forshee
Mounting proc in user namespace containers fails if the xenbus filesystem is mounted on /proc/xen because this directory fails the "permanently empty" test. proc_create_mount_point() exists specifically to create such mountpoints in proc but is currently proc-internal. Export this interface to modules, then use it in xenbus when creating /proc/xen. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2016-11-17  locking/core: Provide common cpu_relax_yield() definition  Christian Borntraeger
No need to duplicate the same define everywhere. Since the only user is stop-machine and the only provider is s390, we can use a default implementation of cpu_relax_yield() in sched.h. Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-s390 <linux-s390@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1479298985-191589-1-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
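A common default like the one described is usually written as an overridable macro: an architecture that needs its own primitive (s390, per the commit) defines the name first, and every other architecture gets the generic fallback. The sketch below shows that pattern in userspace. The counting cpu_relax() stub and spin_until_set() are hypothetical stand-ins for the architecture primitive and for a stop-machine-style wait loop; this is not the actual sched.h text.

```c
/* Hypothetical stand-in for the architecture's cpu_relax(). */
static unsigned long relax_calls;
static inline void cpu_relax(void) { relax_calls++; }

/*
 * An architecture with a special yield primitive would #define
 * cpu_relax_yield before this point; otherwise the fallback is used.
 */
#ifndef cpu_relax_yield
#define cpu_relax_yield() cpu_relax()
#endif

/* Hypothetical wait loop: spin until *flag is set or max_spins is hit.
 * Returns the number of relax calls made. */
int spin_until_set(const volatile int *flag, int max_spins)
{
	int spins = 0;

	while (!*flag && spins < max_spins) {
		cpu_relax_yield();
		spins++;
	}
	return spins;
}
```

Because the caller only ever uses cpu_relax_yield(), no architecture that lacks a special implementation needs its own copy of the define.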
2016-11-16  Merge branch 'topic/st_fdma' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/slave-dma into rproc-next  Bjorn Andersson
2016-11-16  netpoll: more efficient locking  Eric Dumazet
Callers of netpoll_poll_lock() own NAPI_STATE_SCHED. Callers of netpoll_poll_unlock() have BH blocked between NAPI_STATE_SCHED being cleared and poll_lock being released. We can therefore avoid the spinlock, which sees no contention, and use cmpxchg() on poll_owner, which we need to set anyway. This removes a possible lockdep violation after the cited commit, since sk_busy_loop() re-enables BH before calling busy_poll_stop(). Fixes: 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
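The idea of replacing an uncontended spinlock with a compare-and-swap on the owner field can be modelled in userspace with C11 atomics. This is a sketch, not the netpoll code: struct model_napi and the model_* functions are hypothetical, -1 is assumed to mean "no owner", and C11 atomic_compare_exchange_strong() stands in for the kernel's cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: poll_owner holds a CPU id, or -1 when free. */
struct model_napi {
	atomic_int poll_owner;
};

void model_napi_init(struct model_napi *n)
{
	atomic_init(&n->poll_owner, -1);
}

/* Single attempt: install `cpu` only if the field still reads -1. */
bool model_poll_trylock(struct model_napi *n, int cpu)
{
	int expected = -1;

	return atomic_compare_exchange_strong(&n->poll_owner, &expected, cpu);
}

/* Retry the compare-and-swap until it succeeds; no separate spinlock. */
void model_poll_lock(struct model_napi *n, int cpu)
{
	while (!model_poll_trylock(n, cpu))
		; /* a kernel loop would typically cpu_relax() here */
}

/* Release-store -1 so the next owner observes our earlier writes. */
void model_poll_unlock(struct model_napi *n)
{
	atomic_store_explicit(&n->poll_owner, -1, memory_order_release);
}
```

Since poll_owner has to be written anyway, folding the lock into that single atomic write saves the extra lock word and its cache line traffic.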
2016-11-16  ACPI / tables: remove redundant declaration of acpi_table_parse_entries()  Longpeng (Mike)
This function is declared twice, so remove one of the declarations. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16  net: busy-poll: return busypolling status to drivers  Eric Dumazet
NAPI drivers use napi_complete_done() or napi_complete() when they have drained the RX ring, right before re-enabling device interrupts. During busy polling, we can avoid having interrupts delivered, since we are polling the RX ring in a controlled loop. Drivers can choose to use the return value of napi_complete_done() to reduce interrupt overhead while busy polling is active. This is optional; legacy drivers should work fine even if not updated. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
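A driver opting in would re-arm its interrupt only when napi_complete_done() says it may. The userspace model below sketches that decision. struct bp_napi, model_napi_complete_done() and model_driver_poll() are hypothetical, and the stub's rule (report false while a busy-poll loop is active) is a simplified assumption based on the description above.

```c
#include <stdbool.h>

/* Hypothetical model state for one RX queue. */
struct bp_napi {
	bool in_busy_poll; /* a busy-poll loop is currently driving the queue */
	int irq_enables;   /* how many times the "device" IRQ was re-armed */
};

/* Stub completion: false means "someone is still polling, keep IRQs off". */
static bool model_napi_complete_done(struct bp_napi *n, int work_done)
{
	(void)work_done;
	return !n->in_busy_poll;
}

/* Hypothetical driver poll routine: process up to `budget` packets. */
int model_driver_poll(struct bp_napi *n, int budget, int rx_pending)
{
	int work_done = rx_pending < budget ? rx_pending : budget;

	/* Ring drained: complete, and re-arm the IRQ only if allowed. */
	if (work_done < budget && model_napi_complete_done(n, work_done))
		n->irq_enables++;
	return work_done;
}
```

A legacy driver that ignores the return value and always re-arms still works; it just takes interrupts that a busy poller did not need.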
2016-11-16  net: busy-poll: allow preemption in sk_busy_loop()  Eric Dumazet
After commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), sk_busy_loop() needs a bit of care: softirqs might be delayed, since we do not allow preemption yet. This patch adds preemption points in sk_busy_loop(), and makes sure no unnecessary cache line dirtying or atomic operations are done while looping. A new flag, NAPI_STATE_IN_BUSY_POLL, is added to napi->state. It prevents napi_complete_done() from clearing NAPIF_STATE_SCHED, so that sk_busy_loop() does not have to grab it again. Similarly, netpoll_poll_lock() is taken only once. This gives about a 10 to 20% improvement in various busy polling tests, especially when many threads are busy polling in configurations with a large number of NIC queues. This should allow experimenting with bigger delays without hurting overall latencies.
Tested: On a 40Gb mlx4 NIC, 32 RX/TX queues.

    echo 70 >/proc/sys/net/core/busy_read
    for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done

           Before:     After:
     1:      90072      92819
     2:     157289     184007
     3:     235772     213504
     4:     344074     357513
     5:     394755     458267
     6:     461151     487819
     7:     549116     625963
     8:     544423     716219
     9:     720460     738446
    10:     794686     837612
    11:     915998     923960
    12:     937507     925107
    13:    1019677     971506
    14:    1046831    1113650
    15:    1114154    1148902
    16:    1105221    1179263
    17:    1266552    1299585
    18:    1258454    1383817
    19:    1341453    1312194
    20:    1363557    1488487
    21:    1387979    1501004
    22:    1417552    1601683
    23:    1550049    1642002
    24:    1568876    1601915
    25:    1560239    1683607
    26:    1640207    1745211
    27:    1706540    1723574
    28:    1638518    1722036
    29:    1734309    1757447
    30:    1782007    1855436
    31:    1724806    1888539
    32:    1717716    1944297
    33:    1778716    1869118
    34:    1805738    1983466
    35:    1815694    2020758
    36:    1893059    2035632
    37:    1843406    2034653
    38:    1888830    2086580
    39:    1972827    2143567
    40:    1877729    2181851

Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Adam Belay <abelay@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
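The effect of NAPI_STATE_IN_BUSY_POLL can be modelled with two state bits: while the flag is set, completion leaves SCHED in place, so the busy-poll loop acquires it once instead of on every iteration. This is a single-threaded sketch with hypothetical names and bit values (M_STATE_*, model_complete_done(), model_busy_loop()); it does not reproduce the kernel's atomic bit operations or its NAPIF_* values.

```c
#include <stdbool.h>

#define M_STATE_SCHED        0x1u /* hypothetical: owner may poll */
#define M_STATE_IN_BUSY_POLL 0x2u /* hypothetical: busy-poll loop active */

/* Completion path: keep SCHED while a busy poller still owns the queue. */
bool model_complete_done(unsigned int *state)
{
	if (*state & M_STATE_IN_BUSY_POLL)
		return false;         /* busy poller keeps ownership */
	*state &= ~M_STATE_SCHED; /* normal completion releases SCHED */
	return true;
}

/* Busy-poll loop model: returns how often SCHED had to be (re)acquired. */
int model_busy_loop(unsigned int *state, int iters, bool use_flag)
{
	int grabs = 0;

	if (use_flag)
		*state |= M_STATE_IN_BUSY_POLL; /* set once for the whole loop */

	for (int i = 0; i < iters; i++) {
		if (!(*state & M_STATE_SCHED)) {
			*state |= M_STATE_SCHED; /* (re)acquire ownership */
			grabs++;
		}
		(void)model_complete_done(state); /* driver drained the ring */
	}

	/* End of loop: drop both bits, as a busy_poll_stop()-like step would. */
	*state &= ~(M_STATE_SCHED | M_STATE_IN_BUSY_POLL);
	return grabs;
}
```

In the model, ten iterations without the flag re-acquire SCHED ten times; with the flag they acquire it once, which mirrors the reduction in atomic operations the commit describes.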
2016-11-16  bpf: fix range arithmetic for bpf map access  Josef Bacik
I made some invalid assumptions with BPF_AND and BPF_MOD that could result in invalid accesses to bpf map entries. Fix this up by doing a few things:
1) Kill BPF_MOD support. This doesn't actually get used by the compiler in real life and just adds extra complexity.
2) Fix the logic for BPF_AND: don't allow AND of negative numbers, and set the minimum value to 0 for positive ANDs.
3) Don't do operations on the ranges if they are set to the limits, as they are by definition undefined, and allowing arithmetic operations on those values could make them appear valid when they really aren't.
This fixes the testcase provided by Jann as well as a few other theoretical problems. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
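Rules 2) and 3) above can be illustrated with a small userspace range tracker. This is our reading of the rules, not the verifier's code. The names, the INT64_MIN/INT64_MAX "unknown" sentinels and the overflow handling are hypothetical; the verifier uses its own limit constants. For AND we additionally assume the result's maximum is the operand's maximum, which holds because x & m lies in [0, m] whenever m >= 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinels meaning "range unknown". */
#define R_MIN_UNKNOWN INT64_MIN
#define R_MAX_UNKNOWN INT64_MAX

struct model_range {
	int64_t min, max;
};

static bool at_limit(const struct model_range *r)
{
	return r->min == R_MIN_UNKNOWN || r->max == R_MAX_UNKNOWN;
}

static void set_unknown(struct model_range *r)
{
	r->min = R_MIN_UNKNOWN;
	r->max = R_MAX_UNKNOWN;
}

static bool add_overflows(int64_t a, int64_t b)
{
	return (b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b);
}

/* dst &= src: refuse negative or unknown operands (rule 2); otherwise the
 * result lies in [0, src->max] whatever dst was. */
void model_range_and(struct model_range *dst, const struct model_range *src)
{
	if (at_limit(src) || src->min < 0) {
		set_unknown(dst);
		return;
	}
	dst->min = 0;
	dst->max = src->max;
}

/* dst += src: never do arithmetic on a range already at the limits
 * (rule 3), and give up instead of overflowing. */
void model_range_add(struct model_range *dst, const struct model_range *src)
{
	if (at_limit(dst) || at_limit(src) ||
	    add_overflows(dst->min, src->min) ||
	    add_overflows(dst->max, src->max)) {
		set_unknown(dst);
		return;
	}
	dst->min += src->min;
	dst->max += src->max;
}
```

Rule 3 is what keeps an unknown range from turning into a finite-looking one: adding 1 to "unknown" stays unknown instead of producing bounds that could pass a map bounds check.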