linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-03-23	cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely	Rafael J. Wysocki
	The way the schedutil governor uses the PELT metric causes it to underestimate the CPU utilization in some cases. That can be easily demonstrated by running kernel compilation on a Sandy Bridge Intel processor, running turbostat in parallel with it and looking at the values written to the MSR_IA32_PERF_CTL register. Namely, the expected result would be that when all CPUs were 100% busy, all of them would be requested to run in the maximum P-state, but observation shows that this clearly isn't the case. The CPUs run in the maximum P-state for a while and then are requested to run slower and go back to the maximum P-state after a while again. That causes the actual frequency of the processor to visibly oscillate below the sustainable maximum in a jittery fashion which clearly is not desirable. That has been attributed to CPU utilization metric updates on task migration that cause the total utilization value for the CPU to be reduced by the utilization of the migrated task. If that happens, the schedutil governor may see a CPU utilization reduction and will attempt to reduce the CPU frequency accordingly right away. That may be premature, though, for example if the system is generally busy and there are other runnable tasks waiting to be run on that CPU already. This is unlikely to be an issue on systems where cpufreq policies are shared between multiple CPUs, because in those cases the policy utilization is computed as the maximum of the CPU utilization values over the whole policy and if that turns out to be low, reducing the frequency for the policy most likely is a good idea anyway. On systems with one CPU per policy, however, it may affect performance adversely and even lead to increased energy consumption in some cases. On those systems it may be addressed by taking another utilization metric into consideration, like whether or not the CPU whose frequency is about to be reduced has been idle recently, because if that's not the case, the CPU is likely to be busy in the near future and its frequency should not be reduced. To that end, use the counter of idle calls in the timekeeping code. Namely, make the schedutil governor look at that counter for the current CPU every time before its frequency is about to be reduced. If the counter has not changed since the previous iteration of the governor computations for that CPU, the CPU has been busy for all that time and its frequency should not be decreased, so if the new frequency would be lower than the one set previously, the governor will skip the frequency update. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Joel Fernandes <joelaf@google.com>
2017-03-23	iommu/iova: Fix compile error with CONFIG_IOMMU_IOVA=m	Joerg Roedel
	The #ifdef in iova.h only catches the CONFIG_IOMMU_IOVA=y case, so that compilation as a module fails with duplicate function definition errors. Fix it by catching both cases in the #if. Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22	bpf: Add hash of maps support	Martin KaFai Lau
	This patch adds hash of maps support (hashmap->bpf_map). BPF_MAP_TYPE_HASH_OF_MAPS is added. A map-in-map contains a pointer to another map and lets call this pointer 'inner_map_ptr'. Notes on deleting inner_map_ptr from a hash map: 1. For BPF_F_NO_PREALLOC map-in-map, when deleting an inner_map_ptr, the htab_elem itself will go through a rcu grace period and the inner_map_ptr resides in the htab_elem. 2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC), when deleting an inner_map_ptr, the htab_elem may get reused immediately. This situation is similar to the existing prealloc-ated use cases. However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls inner_map->ops->map_free(inner_map) which will go through a rcu grace period (i.e. all bpf_map's map_free currently goes through a rcu grace period). Hence, the inner_map_ptr is still safe for the rcu reader side. This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the check_map_prealloc() in the verifier. preallocation is a must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using map-in-map first. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	bpf: Add array of maps support	Martin KaFai Lau
	This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.	Joel Scherpelz
	This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	of: Add function for generating a DT modalias with a newline	Rob Herring
	The modalias sysfs attr is lacking a newline for DT aliases on platform devices. The macio and ibmebus correctly add the newline, but open code it. Introduce a new function, of_device_modalias(), that fills the buffer with the modalias including the newline and update users of the old of_device_get_modalias function. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Frank Rowand <frowand.list@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22	net: phy: remove the indirect MMD read/write methods	Russell King
	Remove the indirect MMD read/write methods which are now no longer necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	net: phy: make phy_(read\|write)_mmd() generic MMD accessors	Russell King
	Make phy_(read\|write)_mmd() generic 802.3 clause 45 register accessors for both Clause 22 and Clause 45 PHYs, using either the direct register reading for Clause 45, or the indirect method for Clause 22 PHYs. Allow this behaviour to be overriden by PHY drivers where necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	net: phy: move phy MMD accessors to phy-core.c	Russell King
	Move the phy_(read\|write)__mmd() helpers out of line, they will become our main MMD accessor functions, and so will be a little more complex. This complexity doesn't belong in an inline function. Also move the _indirect variants as well to keep like functionality together. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	net: stmmac: Use AVB mode by default	Thierry Reding
	Prior to the recent multi-queue changes the driver would configure the queues to use the AVB mode, but the mode then got switched to DCB. The hardware still works fine in DCB mode, but my testing capabilities are limited, so it's safer to revert to the prior setting anyway. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	net: convert sk_filter.refcnt from atomic_t to refcount_t	Reshetova, Elena
	refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22	iommu/dma: Make PCI window reservation generic	Robin Murphy
	Now that we're applying the IOMMU API reserved regions to our IOVA domains, we shouldn't need to privately special-case PCI windows, or indeed anything else which isn't specific to our iommu-dma layer. However, since those aren't IOMMU-specific either, rather than start duplicating code into IOMMU drivers let's transform the existing function into an iommu_get_resv_regions() helper that they can share. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22	iommu: Disambiguate MSI region types	Robin Murphy
	The introduction of reserved regions has left a couple of rough edges which we could do with sorting out sooner rather than later. Since we are not yet addressing the potential dynamic aspect of software-managed reservations and presenting them at arbitrary fixed addresses, it is incongruous that we end up displaying hardware vs. software-managed MSI regions to userspace differently, especially since ARM-based systems may actually require one or the other, or even potentially both at once, (which iommu-dma currently has no hope of dealing with at all). Let's resolve the former user-visible inconsistency ASAP before the ABI has been baked into a kernel release, in a way that also lays the groundwork for the latter shortcoming to be addressed by follow-up patches. For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use IOMMU_RESV_MSI to describe the hardware type, and document everything a little bit. Since the x86 MSI remapping hardware falls squarely under this meaning of IOMMU_RESV_MSI, apply that type to their regions as well, so that we tell the same story to userspace across all platforms. Secondly, as the various region types require quite different handling, and it really makes little sense to ever try combining them, convert the bitfield-esque #defines to a plain enum in the process before anyone gets the wrong impression. Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region") Reviewed-by: Eric Auger <eric.auger@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: David Woodhouse <dwmw2@infradead.org> CC: kvm@vger.kernel.org Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22	iommu: Add dummy implementations for !IOMMU_IOVA	Thierry Reding
	Currently, building code which uses the API guarded by the IOMMU_IOVA will fail to link if IOMMU_IOVA is not enabled. Often this code will be using the API provided by the IOMMU_API Kconfig symbol, but support for this can be optional, with code falling back to contiguous memory. This commit implements dummy functions for the IOVA API so that it can be compiled out. With both IOMMU_API and IOMMU_IOVA optional, code can now be built with or without support for IOMMU without having to resort to #ifdefs in the user code. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22	iommu/vt-d: Use lo_hi_readq() / lo_hi_writeq()	Andy Shevchenko
	There is already helper functions to do 64-bit I/O on 32-bit machines or buses, thus we don't need to reinvent the wheel. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22	Merge tag 'iio-fixes-for-4.11c' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: Thirds set of IIO fixes for the 4.11 cycle. * core - iio sw-device - ensure configfs is enabled both when building as module and built in. * ak8974 - drop incorrect __exit markup on remove. * hid-sensor-trigger - code reorganise to avoid losing settings if a power cycle occurs during S3. * lsm6dsx - fix incorrect overwrite of parts of FIFO_CTRL2 register during watermark configuration. * ti-am335x - fix a hard to hit bug when reenabling from a fifo overrun by waiting for current cycle to finish.
2017-03-22	Merge tag 'iio-for-4.12b' of ↵	Greg Kroah-Hartman
	git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: 2nd set of new device support, features and cleanups for IIO in the 4.12 cycle A good collection of outreachy related patches in here - mostly staging driver cleanup. Also a fair number of patches added explicit OF device ID tables for i2c drivers - a precursor to dropping (eventually) the implicit probing. New Device Support * Allwinner SoC ADC. - So far covers the sun4i-a10, sun5i-a13 and sun6i-a31 general purpose ADCs, including thermal side of things. This missed the last cycle due to my incompetence, so good to get in now, particularly as various patches dependent on it are appearing. * ltc2632 - new driver supporting ltc2632-l12, ltc2632-l10, ltc2632-l8, ltc2632-h12, ltc-2632-h10, ltc-2632-h8 dacs Cleanups * Documentation - drop a broken reference to i2c/trivial-devices * ad2s1200 - drop & from function pointers for consistency. * ad2s1210 - formatting fixes. * ad7152 - octal permissions instead of symbolic. - drop & from function pointers for consistent usage. * ad7192 - drop & from function pointers for consistent usage. * ad7280 - replace core mlock usage with a local lock as mlock is intended only to protect the current device state (direct reads, or triggered and buffered) * ad7746 - drop & from function pointers for consistent usage. - replace core mlock usage with a local lock as mlock is intended only to protect the current device state (direct reads, or triggered and buffered) * ad7754 - move contents of header file into source file as not used anywhere else. * ad7759 - move contents of header file into source file as not used anywhere else. * ad7780 - drop & from function pointers for consistent usage. * ad7832 - replace core mlock usage with a local lock as mlock is intended only to protect the current device state (direct reads, or triggered and buffered) * ad9834 - replace core mlock usage with a local lock as mlock is intended only to protect the current device state (direct reads, or triggered and buffered) - drop an unnecessary goto in favour of direct return. * adis16060 - drop & from function pointers as inconsistent. * adis16201 - drop a local mutex as the adis core already protects everything necessary. - drop & from function pointers for consistent usage. * adis16203 - drop & from function pointers for consistent usage. * adis16209 - drop a local mutex as the adis core already protects everything necessary. - use an enum for scan index giving slightly nicer code. - drop & from function pointers for consistent usage. * adis16240 - drop a local mutex as the adis core already protects everything necessary. - use an enum for scan index giving slightly nicer code. - drop & from function pointers for consistent usage. * apds9960 - add OF device ID table. * bma180 - add OF device ID table. - prefer unsigned int to bare use of unsigned. * bmc150_magn - add OF device ID table. * hmp03 - add OF device ID table. * ina2xx - add OF device ID table. * itg3200 - add OF device ID table. * mag3110 - add OF device ID table. * max11100 - remove .owner field as it is set by the spi core. * max5821 - add .of_match_table set to the ID table which was present but not used. * mcp4725 - add OF device ID table. * mlx96014 - add OF device ID table. * mma7455 - add OF device ID table. * mma7660 - add OF device ID table. * mpl3115 - add OF device ID table. * mpu6050 - add OF device ID table. * pc104 - mask pc104 drivers behind a global pc104 config option. * ti-ads1015 - add OF device ID table. * tsl2563 - add OF device ID table. * us5182d - add OF device ID table.
2017-03-22	hwmon: Add missing HWMON_T_ALARM	Peter Huewe
	Unfortunately the HWMON_T_ALARM define was missing, although the associated entry was present in hwmon_temp_attributes. This is needed to convert drivers to the new interface which use channel based alarms. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-03-21	tcp: mark skbs with SCM_TIMESTAMPING_OPT_STATS	Soheil Hassas Yeganeh
	SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled while packets are collected on the error queue. So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags is not enough to safely assume that the skb contains OPT_STATS data. Add a bit in sock_exterr_skb to indicate whether the skb contains opt_stats data. Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: JongHwan Kim <zzoru007@gmail.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21	rhashtable: Add rhashtable_lookup_get_insert_fast	Andreas Gruenbacher
	Add rhashtable_lookup_get_insert_fast for fixed keys, similar to rhashtable_lookup_get_insert_key for explicit keys. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21	Merge tag 'reset-fixes-for-4.11' of git://git.pengutronix.de/git/pza/linux ↵	Olof Johansson
	into fixes Reset controller fixes for v4.11 Fix optional reset_control_get_stubs to return NULL and remove warnings from reset_control_* stubs. This fixes commit bb475230b8e5 ("reset: make optional functions really optional"), which was merged in reset-for-4.11, and would cause consumer drivers depending on the new behaviour of optional resets to fail probing if RESET_CONTROLLER Kconfig option is disabled. * tag 'reset-fixes-for-4.11' of git://git.pengutronix.de/git/pza/linux: reset: fix optional reset_control_get stubs to return NULL Signed-off-by: Olof Johansson <olof@lixom.net>
2017-03-21	net: stmmac: RX queue routing configuration	Joao Pinto
	This patch adds the configuration of RX queues' routing. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21	net: stmmac: TX and RX queue priority configuration	Joao Pinto
	This patch adds the configuration of RX and TX queues' priority. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21	net: usb: usb: remove old api ethtool_{get\|set}_settings	Philippe Reynes
	The function usbnet_{get\|set}_settings is no longer used, so we remove it. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21	net: usb: usbnet: add new api ethtool_{get\|set}_link_ksettings	Philippe Reynes
	The ethtool api {get\|set}_settings is deprecated. We add the new api {get\|set}_link_ksettings to this driver. As I don't have the hardware, I'd be very pleased if someone may test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21	vsock: track pkt owner vsock	Peng Tao
	So that we can cancel a queued pkt later if necessary. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21	Merge tag 'gpio-v4.11-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Here is the first set of GPIO fixes for 4.11. It was delayed a bit beacuse I was chicken when linux-next was not rotating last week. This hits the ST serial driver in drivers/tty/serial and that has an ACK from Greg, he suggested to keep the old GPIO fwnode API around to smoothen things in the merge Windod and those have now served their purpose so we take them out and convert the last driver to the new API. Apart from that it's fixes as usual. Summary: - set the parent on the Altera A10SR driver, also fix high level IRQs. - fix error path on the mockup driver. - compilation noise about unused functions fixed. - fix missed interrupts on the MCP23S08 expander, this is also tagged for stable. - retire the interrim helpers devm_get_gpiod_from_child() used to smoothen merging in the merge window" * tag 'gpio-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio:mcp23s08 Fixed missing interrupts serial: st-asc: Use new GPIOD API to obtain RTS pin gpio: altera: Use handle_level_irq when configured as a level_high gpio: xgene: mark PM functions as __maybe_unused gpio: mockup: return -EFAULT if copy_from_user() fails gpio: altera-a10sr: Set gpio_chip parent property
2017-03-21	reset: fix optional reset_control_get stubs to return NULL	Philipp Zabel
	When RESET_CONTROLLER is not enabled, the optional reset_control_get stubs should now also return NULL. Since it is now valid for reset_control_assert/deassert/reset/status/put to be called unconditionally, with NULL as an argument for optional resets, the stubs are not allowed to warn anymore. Fixes: bb475230b8e5 ("reset: make optional functions really optional") Reported-by: Andrzej Hajda <a.hajda@samsung.com> Tested-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Andrzej Hajda <a.hajda@samsung.com> Cc: Ramiro Oliveira <Ramiro.Oliveira@synopsys.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2017-03-21	blk-stat: convert to callback-based statistics reporting	Omar Sandoval
	Currently, statistics are gathered in ~0.13s windows, and users grab the statistics whenever they need them. This is not ideal for both in-tree users: 1. Writeback throttling wants its own dynamically sized window of statistics. Since the blk-stats statistics are reset after every window and the wbt windows don't line up with the blk-stats windows, wbt doesn't see every I/O. 2. Polling currently grabs the statistics on every I/O. Again, depending on how the window lines up, we may miss some I/Os. It's also unnecessary overhead to get the statistics on every I/O; the hybrid polling heuristic would be just as happy with the statistics from the previous full window. This reworks the blk-stats infrastructure to be callback-based: users register a callback that they want called at a given time with all of the statistics from the window during which the callback was active. Users can dynamically bucketize the statistics. wbt and polling both currently use read vs. write, but polling can be extended to further subdivide based on request size. The callbacks are kept on an RCU list, and each callback has percpu stats buffers. There will only be a few users, so the overhead on the I/O completion side is low. The stats flushing is also simplified considerably: since the timer function is responsible for clearing the statistics, we don't have to worry about stale statistics. wbt is a trivial conversion. After the conversion, the windowing problem mentioned above is fixed. For polling, we register an extra callback that caches the previous window's statistics in the struct request_queue for the hybrid polling heuristic to use. Since we no longer have a single stats buffer for the request queue, this also removes the sysfs and debugfs stats entries. To replace those, we add a debugfs entry for the poll statistics. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-21	blk-stat: move BLK_RQ_STAT_BATCH definition to blk-stat.c	Omar Sandoval
	This is an implementation detail that no-one outside of blk-stat.c uses. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-21	HID: hiddev: reallocate hiddev's minor number	Jaejoong Kim
	We need to store the minor number each drivers. In case of hidraw, the minor number is stored stores in struct hidraw. But hiddev's minor is located in struct hid_device. The hid-core driver announces a kernel message which driver is loaded when HID device connected, but hiddev's minor number is always zero. To proper display hiddev's minor number, we need to store the minor number asked from usb core and do some refactoring work (move from hiddev.c to hiddev.h) to access hiddev in hid-core. [jkosina@suse.cz: rebase on top of newer codebase] Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jaejoong Kim <climbbb.kim@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-21	HID: remove initial reading of reports at connect	Benjamin Tissoires
	It looks like a bunch of devices do not like to be polled for their reports at init time. When you look into the details, it seems that for those that are requiring the quirk HID_QUIRK_NO_INIT_REPORTS, the driver fails to retrieve part of the features/inputs while others (more generic) work. IMO, it should be acceptable to remove the need for the quirk in the general case. On the small amount of cases where we actually need to read the current values, the driver in charge (hid-mt or wacom) already retrieves the features manually. There are 2 cases where we might need to retrieve the reports at init: 1. hiddev devices with specific use-space tool 2. a device that would require the driver to fetch a specific feature/input at plug For case 2, I have seen this a few time on hid-multitouch. It is solved in hid-multitouch directly by fetching the feature. I hope it won't be too common and this can be solved on a per-case basis (crossing fingers). For case 1, we moved the implementation of HID_QUIRK_NO_INIT_REPORTS in hiddev. When somebody starts calling ioctls that needs an initial update, the hiddev device will fetch the initial state of the reports to mimic the current behavior. This adds a small amount of time during the first HIDIOCGUSAGE(S), but it should be acceptable in most cases. To keep the currently known broken devices, we have to keep around HID_QUIRK_NO_INIT_REPORTS, but the scope will only be for hiddev. Note that I don't think hidraw would be affected and I checked that the FF drivers that need to interact with the report fields are all using output reports, which are not initialized by usbhid_init_reports(). NO_INIT_INPUT_REPORTS is then replaced by HID_QUIRK_NO_INIT_REPORTS: there is no point keeping it for just one device. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-21	usb/early: Add driver for xhci debug capability	Lu Baolu
	XHCI debug capability (DbC) is an optional but standalone functionality provided by an xHCI host controller. Software learns this capability by walking through the extended capability list of the host. XHCI specification describes DbC in section 7.6. This patch introduces the code to probe and initialize the debug capability hardware during early boot. With hardware initialized, the debug target (system on which this code is running) will present a debug device through the debug port (normally the first USB3 port). The debug device is fully compliant with the USB framework and provides the equivalent of a very high performance (USB3) full-duplex serial link between the debug host and target. The DbC functionality is independent of the xHCI host. There isn't any precondition from the xHCI host side for the DbC to work. One use for this feature is kernel debugging, for example when your machine crashes very early before the regular console code is initialized. Other uses include simpler, lockless logging instead of a full-blown printk console driver and klogd. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-usb@vger.kernel.org Link: http://lkml.kernel.org/r/1490083293-3792-3-git-send-email-baolu.lu@linux.intel.com [ Small fix to the Kconfig help text. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-21	drm/edid: check for HF-VSDB block	Thierry Reding
	This patch implements a small function that finds if a given CEA db is hdmi-forum vendor specific data block or not. V2: Rebase. V3: Added R-B from Jose. V4: Rebase V5: Rebase V6: Rebase V7: Rebase V8: Rebase V9: Rebase V10: Rebase Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Shashank Sharma <shashank.sharma@intel.com> Reviewed-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1489404244-16608-3-git-send-email-shashank.sharma@intel.com
2017-03-21	chardev: add helper function to register char devs with a struct device	Logan Gunthorpe
	Credit for this patch goes is shared with Dan Williams [1]. I've taken things one step further to make the helper function more useful and clean up calling code. There's a common pattern in the kernel whereby a struct cdev is placed in a structure along side a struct device which manages the life-cycle of both. In the naive approach, the reference counting is broken and the struct device can free everything before the chardev code is entirely released. Many developers have solved this problem by linking the internal kobjs in this fashion: cdev.kobj.parent = &parent_dev.kobj; The cdev code explicitly gets and puts a reference to it's kobj parent. So this seems like it was intended to be used this way. Dmitrty Torokhov first put this in place in 2012 with this commit: 2f0157f char_dev: pin parent kobject and the first instance of the fix was then done in the input subsystem in the following commit: 4a215aa Input: fix use-after-free introduced with dynamic minor changes Subsequently over the years, however, this issue seems to have tripped up multiple developers independently. For example, see these commits: 0d5b7da iio: Prevent race between IIO chardev opening and IIO device (by Lars-Peter Clausen in 2013) ba0ef85 tpm: Fix initialization of the cdev (by Jason Gunthorpe in 2015) 5b28dde [media] media: fix use-after-free in cdev_put() when app exits after driver unbind (by Shauh Khan in 2016) This technique is similarly done in at least 15 places within the kernel and probably should have been done so in another, at least, 5 places. The kobj line also looks very suspect in that one would not expect drivers to have to mess with kobject internals in this way. Even highly experienced kernel developers can be surprised by this code, as seen in [2]. To help alleviate this situation, and hopefully prevent future wasted effort on this problem, this patch introduces a helper function to register a char device along with its parent struct device. This creates a more regular API for tying a char device to its parent without the developer having to set members in the underlying kobject. This patch introduce cdev_device_add and cdev_device_del which replaces a common pattern including setting the kobj parent, calling cdev_add and then calling device_add. It also introduces cdev_set_parent for the few cases that set the kobject parent without using device_add. [1] https://lkml.org/lkml/2017/2/13/700 [2] https://lkml.org/lkml/2017/2/10/370 Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-20	x86/arch_prctl: Add ARCH_[GET\|SET]_CPUID	Kyle Huey
	Intel supports faulting on the CPUID instruction beginning with Ivy Bridge. When enabled, the processor will fault on attempts to execute the CPUID instruction with CPL>0. Exposing this feature to userspace will allow a ptracer to trap and emulate the CPUID instruction. When supported, this feature is controlled by toggling bit 0 of MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of https://bugzilla.kernel.org/attachment.cgi?id=243991 Implement a new pair of arch_prctls, available on both x86-32 and x86-64. ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting is enabled (and thus the CPUID instruction is not available) or 1 if CPUID faulting is not enabled. ARCH_SET_CPUID: Set the CPUID state to the second argument. If cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will be deactivated. Returns ENODEV if CPUID faulting is not supported on this system. The state of the CPUID faulting flag is propagated across forks, but reset upon exec. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20	x86/syscalls/32: Wire up arch_prctl on x86-32	Kyle Huey
	Hook up arch_prctl to call do_arch_prctl() on x86-32, and in 32 bit compat mode on x86-64. This allows to have arch_prctls that are not specific to 64 bits. On UML, simply stub out this syscall. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-7-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20	clk: tegra: Add SATA seq input control	Peter De Schrijver
	This will be used by the powergating driver to ensure proper sequencer state when the SATA domain is powergated. Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-03-20	clk: tegra: Handle UTMIPLL IDDQ	Peter De Schrijver
	Export UTMIPLL IDDQ functions. These will be needed when powergating the XUSB partition. Signed-off-by: BH Hsieh <bhsieh@nvidia.com> Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2017-03-18	mm/gup: Implement the dev_pagemap() logic in the generic ↵	Kirill A. Shutemov
	get_user_pages_fast() function This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's only implemented on x86. On non-x86, the new code will be compiled out. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-17	Merge branch 'x86-acpi-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 acpi fixes from Thomas Gleixner: "This update deals with the fallout of the recent work to make cpuid/node mappings persistent. It turned out that the boot time ACPI based mapping tripped over ACPI inconsistencies and caused regressions. It's partially reverted and the fragile part replaced by an implementation which makes the mapping persistent when a CPU goes online for the first time" * 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi/processor: Check for duplicate processor ids at hotplug time acpi/processor: Implement DEVICE operator for processor enumeration x86/acpi: Restore the order of CPU IDs Revert"x86/acpi: Enable MADT APIs to return disabled apicids" Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
2017-03-17	dma-fence: add dma_fence_match_context helper	Philipp Zabel
	Add a helper to check if all fences in a fence array are from a given context. For convenience, the function can also handle being given a non-array fence. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: http://patchwork.freedesktop.org/patch/msgid/1489768492-25190-1-git-send-email-p.zabel@pengutronix.de
2017-03-17	hrtimer: Remove hrtimer_peek_ahead_timers() leftovers	Stephen Boyd
	This function was removed in commit c6eb3f70d448 (hrtimer: Get rid of hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted. Delete it now. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/20170317010814.2591-1-sboyd@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-17	cgroup, kthread: close race window where new kthreads can be migrated to ↵	Tejun Heo
	non-root cgroups Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Cc: stable@vger.kernel.org # v4.3+ (we can't close the race on < v4.3) Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-17	Merge branch 'linus' into x86/mm, to pick up a bugfix	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-17	docs: Add kernel-doc comments to VME driver API	Martyn Welch
	Add kernel-doc comments to the VME driver API and structures. This documentation will be integrated into the RST documentation in a later patch. Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17	vmbus: expose debug info for drivers	Stephen Hemminger
	Allow driver to get debug information about state of the ring. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17	vmbus: cleanup header file style	Stephen Hemminger
	Minor changes to align hyper-v vmbus include files with current linux kernel style. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17	vmbus: remove useless return's	Stephen Hemminger
	No need for empty return at end of void function Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17	fpga: Add flag to indicate bitstream needs decrypting	Moritz Fischer
	Add a flag that is passed to the write_init() callback, indicating that the bitstream is encrypted. The low-level driver will deal with the flag, or return an error, if encrypted bitstreams are not supported. Signed-off-by: Moritz Fischer <mdf@kernel.org> Acked-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Alan Tull <atull@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>