summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2017-03-22kobject: Export kobject_get_unless_zero()Jan Kara
Make the function available for outside use and fortify it against NULL kobject. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22block: Fix oops in locked_inode_to_wb_and_lock_list()Jan Kara
When block device is closed, we call inode_detach_wb() in __blkdev_put() which sets inode->i_wb to NULL. That is contrary to expectations that inode->i_wb stays valid once set during the whole inode's lifetime and leads to oops in wb_get() in locked_inode_to_wb_and_lock_list() because inode_to_wb() returned NULL. The reason why we called inode_detach_wb() is not valid anymore though. BDI is guaranteed to stay along until we call bdi_put() from bdev_evict_inode() so we can postpone calling inode_detach_wb() to that moment. Also add a warning to catch if someone uses inode_detach_wb() in a dangerous way. Reported-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Do not wait for cgwbs release in bdi_unregister()Jan Kara
Currently we wait for all cgwbs to get released in cgwb_bdi_destroy() (called from bdi_unregister()). That is however unnecessary now when cgwb->bdi is a proper refcounted reference (thus bdi cannot get released before all cgwbs are released) and when cgwb_bdi_destroy() shuts down writeback directly. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()Jan Kara
Currently we waited for all cgwbs to get freed in cgwb_bdi_destroy() which also means that writeback has been shutdown on them. Since this wait is going away, directly shutdown writeback on cgwbs from cgwb_bdi_destroy() to avoid live writeback structures after bdi_unregister() has finished. To make that safe with concurrent shutdown from cgwb_release_workfn(), we also have to make sure wb_shutdown() returns only after the bdi_writeback structure is really shutdown. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-22bdi: Mark congested->bdi as internalJan Kara
congested->bdi pointer is used only to be able to remove congested structure from bdi->cgwb_congested_tree on structure release. Moreover the pointer can become NULL when we unregister the bdi. Rename the field to __bdi and add a comment to make it more explicit this is internal stuff of memcg writeback code and people should not use the field as such use will be likely race prone. We do not bother with converting congested->bdi to a proper refcounted reference. It will be slightly ugly to special-case bdi->wb.congested to avoid effectively a cyclic reference of bdi to itself and the reference gets cleared from bdi_unregister() making it impossible to reference a freed bdi. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-23cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurelyRafael J. Wysocki
The way the schedutil governor uses the PELT metric causes it to underestimate the CPU utilization in some cases. That can be easily demonstrated by running kernel compilation on a Sandy Bridge Intel processor, running turbostat in parallel with it and looking at the values written to the MSR_IA32_PERF_CTL register. Namely, the expected result would be that when all CPUs were 100% busy, all of them would be requested to run in the maximum P-state, but observation shows that this clearly isn't the case. The CPUs run in the maximum P-state for a while and then are requested to run slower and go back to the maximum P-state after a while again. That causes the actual frequency of the processor to visibly oscillate below the sustainable maximum in a jittery fashion which clearly is not desirable. That has been attributed to CPU utilization metric updates on task migration that cause the total utilization value for the CPU to be reduced by the utilization of the migrated task. If that happens, the schedutil governor may see a CPU utilization reduction and will attempt to reduce the CPU frequency accordingly right away. That may be premature, though, for example if the system is generally busy and there are other runnable tasks waiting to be run on that CPU already. This is unlikely to be an issue on systems where cpufreq policies are shared between multiple CPUs, because in those cases the policy utilization is computed as the maximum of the CPU utilization values over the whole policy and if that turns out to be low, reducing the frequency for the policy most likely is a good idea anyway. On systems with one CPU per policy, however, it may affect performance adversely and even lead to increased energy consumption in some cases. On those systems it may be addressed by taking another utilization metric into consideration, like whether or not the CPU whose frequency is about to be reduced has been idle recently, because if that's not the case, the CPU is likely to be busy in the near future and its frequency should not be reduced. To that end, use the counter of idle calls in the timekeeping code. Namely, make the schedutil governor look at that counter for the current CPU every time before its frequency is about to be reduced. If the counter has not changed since the previous iteration of the governor computations for that CPU, the CPU has been busy for all that time and its frequency should not be decreased, so if the new frequency would be lower than the one set previously, the governor will skip the frequency update. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Joel Fernandes <joelaf@google.com>
2017-03-22bpf: Add hash of maps supportMartin KaFai Lau
This patch adds hash of maps support (hashmap->bpf_map). BPF_MAP_TYPE_HASH_OF_MAPS is added. A map-in-map contains a pointer to another map and lets call this pointer 'inner_map_ptr'. Notes on deleting inner_map_ptr from a hash map: 1. For BPF_F_NO_PREALLOC map-in-map, when deleting an inner_map_ptr, the htab_elem itself will go through a rcu grace period and the inner_map_ptr resides in the htab_elem. 2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC), when deleting an inner_map_ptr, the htab_elem may get reused immediately. This situation is similar to the existing prealloc-ated use cases. However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls inner_map->ops->map_free(inner_map) which will go through a rcu grace period (i.e. all bpf_map's map_free currently goes through a rcu grace period). Hence, the inner_map_ptr is still safe for the rcu reader side. This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the check_map_prealloc() in the verifier. preallocation is a must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using map-in-map first. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22bpf: Add array of maps supportMartin KaFai Lau
This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.Joel Scherpelz
This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22net: phy: remove the indirect MMD read/write methodsRussell King
Remove the indirect MMD read/write methods which are now no longer necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22net: phy: make phy_(read|write)_mmd() generic MMD accessorsRussell King
Make phy_(read|write)_mmd() generic 802.3 clause 45 register accessors for both Clause 22 and Clause 45 PHYs, using either the direct register reading for Clause 45, or the indirect method for Clause 22 PHYs. Allow this behaviour to be overriden by PHY drivers where necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22net: phy: move phy MMD accessors to phy-core.cRussell King
Move the phy_(read|write)__mmd() helpers out of line, they will become our main MMD accessor functions, and so will be a little more complex. This complexity doesn't belong in an inline function. Also move the _indirect variants as well to keep like functionality together. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22net: stmmac: Use AVB mode by defaultThierry Reding
Prior to the recent multi-queue changes the driver would configure the queues to use the AVB mode, but the mode then got switched to DCB. The hardware still works fine in DCB mode, but my testing capabilities are limited, so it's safer to revert to the prior setting anyway. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22net: convert sk_filter.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22iommu: Disambiguate MSI region typesRobin Murphy
The introduction of reserved regions has left a couple of rough edges which we could do with sorting out sooner rather than later. Since we are not yet addressing the potential dynamic aspect of software-managed reservations and presenting them at arbitrary fixed addresses, it is incongruous that we end up displaying hardware vs. software-managed MSI regions to userspace differently, especially since ARM-based systems may actually require one or the other, or even potentially both at once, (which iommu-dma currently has no hope of dealing with at all). Let's resolve the former user-visible inconsistency ASAP before the ABI has been baked into a kernel release, in a way that also lays the groundwork for the latter shortcoming to be addressed by follow-up patches. For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use IOMMU_RESV_MSI to describe the hardware type, and document everything a little bit. Since the x86 MSI remapping hardware falls squarely under this meaning of IOMMU_RESV_MSI, apply that type to their regions as well, so that we tell the same story to userspace across all platforms. Secondly, as the various region types require quite different handling, and it really makes little sense to ever try combining them, convert the bitfield-esque #defines to a plain enum in the process before anyone gets the wrong impression. Fixes: d30ddcaa7b02 ("iommu: Add a new type field in iommu_resv_region") Reviewed-by: Eric Auger <eric.auger@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: David Woodhouse <dwmw2@infradead.org> CC: kvm@vger.kernel.org Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-03-22Merge tag 'iio-fixes-for-4.11c' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus Jonathan writes: Thirds set of IIO fixes for the 4.11 cycle. * core - iio sw-device - ensure configfs is enabled both when building as module and built in. * ak8974 - drop incorrect __exit markup on remove. * hid-sensor-trigger - code reorganise to avoid losing settings if a power cycle occurs during S3. * lsm6dsx - fix incorrect overwrite of parts of FIFO_CTRL2 register during watermark configuration. * ti-am335x - fix a hard to hit bug when reenabling from a fifo overrun by waiting for current cycle to finish.
2017-03-22hwmon: Add missing HWMON_T_ALARMPeter Huewe
Unfortunately the HWMON_T_ALARM define was missing, although the associated entry was present in hwmon_temp_attributes. This is needed to convert drivers to the new interface which use channel based alarms. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-03-21tcp: mark skbs with SCM_TIMESTAMPING_OPT_STATSSoheil Hassas Yeganeh
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled while packets are collected on the error queue. So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags is not enough to safely assume that the skb contains OPT_STATS data. Add a bit in sock_exterr_skb to indicate whether the skb contains opt_stats data. Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: JongHwan Kim <zzoru007@gmail.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21rhashtable: Add rhashtable_lookup_get_insert_fastAndreas Gruenbacher
Add rhashtable_lookup_get_insert_fast for fixed keys, similar to rhashtable_lookup_get_insert_key for explicit keys. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21Merge tag 'reset-fixes-for-4.11' of git://git.pengutronix.de/git/pza/linux ↵Olof Johansson
into fixes Reset controller fixes for v4.11 Fix optional reset_control_get_stubs to return NULL and remove warnings from reset_control_* stubs. This fixes commit bb475230b8e5 ("reset: make optional functions really optional"), which was merged in reset-for-4.11, and would cause consumer drivers depending on the new behaviour of optional resets to fail probing if RESET_CONTROLLER Kconfig option is disabled. * tag 'reset-fixes-for-4.11' of git://git.pengutronix.de/git/pza/linux: reset: fix optional reset_control_get stubs to return NULL Signed-off-by: Olof Johansson <olof@lixom.net>
2017-03-21net: stmmac: RX queue routing configurationJoao Pinto
This patch adds the configuration of RX queues' routing. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21net: stmmac: TX and RX queue priority configurationJoao Pinto
This patch adds the configuration of RX and TX queues' priority. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21net: usb: usb: remove old api ethtool_{get|set}_settingsPhilippe Reynes
The function usbnet_{get|set}_settings is no longer used, so we remove it. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21net: usb: usbnet: add new api ethtool_{get|set}_link_ksettingsPhilippe Reynes
The ethtool api {get|set}_settings is deprecated. We add the new api {get|set}_link_ksettings to this driver. As I don't have the hardware, I'd be very pleased if someone may test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21vsock: track pkt owner vsockPeng Tao
So that we can cancel a queued pkt later if necessary. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21Merge tag 'gpio-v4.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Here is the first set of GPIO fixes for 4.11. It was delayed a bit beacuse I was chicken when linux-next was not rotating last week. This hits the ST serial driver in drivers/tty/serial and that has an ACK from Greg, he suggested to keep the old GPIO fwnode API around to smoothen things in the merge Windod and those have now served their purpose so we take them out and convert the last driver to the new API. Apart from that it's fixes as usual. Summary: - set the parent on the Altera A10SR driver, also fix high level IRQs. - fix error path on the mockup driver. - compilation noise about unused functions fixed. - fix missed interrupts on the MCP23S08 expander, this is also tagged for stable. - retire the interrim helpers devm_get_gpiod_from_child() used to smoothen merging in the merge window" * tag 'gpio-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio:mcp23s08 Fixed missing interrupts serial: st-asc: Use new GPIOD API to obtain RTS pin gpio: altera: Use handle_level_irq when configured as a level_high gpio: xgene: mark PM functions as __maybe_unused gpio: mockup: return -EFAULT if copy_from_user() fails gpio: altera-a10sr: Set gpio_chip parent property
2017-03-21reset: fix optional reset_control_get stubs to return NULLPhilipp Zabel
When RESET_CONTROLLER is not enabled, the optional reset_control_get stubs should now also return NULL. Since it is now valid for reset_control_assert/deassert/reset/status/put to be called unconditionally, with NULL as an argument for optional resets, the stubs are not allowed to warn anymore. Fixes: bb475230b8e5 ("reset: make optional functions really optional") Reported-by: Andrzej Hajda <a.hajda@samsung.com> Tested-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Andrzej Hajda <a.hajda@samsung.com> Cc: Ramiro Oliveira <Ramiro.Oliveira@synopsys.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2017-03-21blk-stat: convert to callback-based statistics reportingOmar Sandoval
Currently, statistics are gathered in ~0.13s windows, and users grab the statistics whenever they need them. This is not ideal for both in-tree users: 1. Writeback throttling wants its own dynamically sized window of statistics. Since the blk-stats statistics are reset after every window and the wbt windows don't line up with the blk-stats windows, wbt doesn't see every I/O. 2. Polling currently grabs the statistics on every I/O. Again, depending on how the window lines up, we may miss some I/Os. It's also unnecessary overhead to get the statistics on every I/O; the hybrid polling heuristic would be just as happy with the statistics from the previous full window. This reworks the blk-stats infrastructure to be callback-based: users register a callback that they want called at a given time with all of the statistics from the window during which the callback was active. Users can dynamically bucketize the statistics. wbt and polling both currently use read vs. write, but polling can be extended to further subdivide based on request size. The callbacks are kept on an RCU list, and each callback has percpu stats buffers. There will only be a few users, so the overhead on the I/O completion side is low. The stats flushing is also simplified considerably: since the timer function is responsible for clearing the statistics, we don't have to worry about stale statistics. wbt is a trivial conversion. After the conversion, the windowing problem mentioned above is fixed. For polling, we register an extra callback that caches the previous window's statistics in the struct request_queue for the hybrid polling heuristic to use. Since we no longer have a single stats buffer for the request queue, this also removes the sysfs and debugfs stats entries. To replace those, we add a debugfs entry for the poll statistics. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-21blk-stat: move BLK_RQ_STAT_BATCH definition to blk-stat.cOmar Sandoval
This is an implementation detail that no-one outside of blk-stat.c uses. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-21HID: hiddev: reallocate hiddev's minor numberJaejoong Kim
We need to store the minor number each drivers. In case of hidraw, the minor number is stored stores in struct hidraw. But hiddev's minor is located in struct hid_device. The hid-core driver announces a kernel message which driver is loaded when HID device connected, but hiddev's minor number is always zero. To proper display hiddev's minor number, we need to store the minor number asked from usb core and do some refactoring work (move from hiddev.c to hiddev.h) to access hiddev in hid-core. [jkosina@suse.cz: rebase on top of newer codebase] Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jaejoong Kim <climbbb.kim@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-21HID: remove initial reading of reports at connectBenjamin Tissoires
It looks like a bunch of devices do not like to be polled for their reports at init time. When you look into the details, it seems that for those that are requiring the quirk HID_QUIRK_NO_INIT_REPORTS, the driver fails to retrieve part of the features/inputs while others (more generic) work. IMO, it should be acceptable to remove the need for the quirk in the general case. On the small amount of cases where we actually need to read the current values, the driver in charge (hid-mt or wacom) already retrieves the features manually. There are 2 cases where we might need to retrieve the reports at init: 1. hiddev devices with specific use-space tool 2. a device that would require the driver to fetch a specific feature/input at plug For case 2, I have seen this a few time on hid-multitouch. It is solved in hid-multitouch directly by fetching the feature. I hope it won't be too common and this can be solved on a per-case basis (crossing fingers). For case 1, we moved the implementation of HID_QUIRK_NO_INIT_REPORTS in hiddev. When somebody starts calling ioctls that needs an initial update, the hiddev device will fetch the initial state of the reports to mimic the current behavior. This adds a small amount of time during the first HIDIOCGUSAGE(S), but it should be acceptable in most cases. To keep the currently known broken devices, we have to keep around HID_QUIRK_NO_INIT_REPORTS, but the scope will only be for hiddev. Note that I don't think hidraw would be affected and I checked that the FF drivers that need to interact with the report fields are all using output reports, which are not initialized by usbhid_init_reports(). NO_INIT_INPUT_REPORTS is then replaced by HID_QUIRK_NO_INIT_REPORTS: there is no point keeping it for just one device. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-21usb/early: Add driver for xhci debug capabilityLu Baolu
XHCI debug capability (DbC) is an optional but standalone functionality provided by an xHCI host controller. Software learns this capability by walking through the extended capability list of the host. XHCI specification describes DbC in section 7.6. This patch introduces the code to probe and initialize the debug capability hardware during early boot. With hardware initialized, the debug target (system on which this code is running) will present a debug device through the debug port (normally the first USB3 port). The debug device is fully compliant with the USB framework and provides the equivalent of a very high performance (USB3) full-duplex serial link between the debug host and target. The DbC functionality is independent of the xHCI host. There isn't any precondition from the xHCI host side for the DbC to work. One use for this feature is kernel debugging, for example when your machine crashes very early before the regular console code is initialized. Other uses include simpler, lockless logging instead of a full-blown printk console driver and klogd. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-usb@vger.kernel.org Link: http://lkml.kernel.org/r/1490083293-3792-3-git-send-email-baolu.lu@linux.intel.com [ Small fix to the Kconfig help text. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-20x86/arch_prctl: Add ARCH_[GET|SET]_CPUIDKyle Huey
Intel supports faulting on the CPUID instruction beginning with Ivy Bridge. When enabled, the processor will fault on attempts to execute the CPUID instruction with CPL>0. Exposing this feature to userspace will allow a ptracer to trap and emulate the CPUID instruction. When supported, this feature is controlled by toggling bit 0 of MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of https://bugzilla.kernel.org/attachment.cgi?id=243991 Implement a new pair of arch_prctls, available on both x86-32 and x86-64. ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting is enabled (and thus the CPUID instruction is not available) or 1 if CPUID faulting is not enabled. ARCH_SET_CPUID: Set the CPUID state to the second argument. If cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will be deactivated. Returns ENODEV if CPUID faulting is not supported on this system. The state of the CPUID faulting flag is propagated across forks, but reset upon exec. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-20x86/syscalls/32: Wire up arch_prctl on x86-32Kyle Huey
Hook up arch_prctl to call do_arch_prctl() on x86-32, and in 32 bit compat mode on x86-64. This allows to have arch_prctls that are not specific to 64 bits. On UML, simply stub out this syscall. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: linux-kselftest@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Robert O'Callahan <robert@ocallahan.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: Jeff Dike <jdike@addtoit.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: user-mode-linux-user@lists.sourceforge.net Cc: David Matlack <dmatlack@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: linux-fsdevel@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/20170320081628.18952-7-khuey@kylehuey.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-18mm/gup: Implement the dev_pagemap() logic in the generic ↵Kirill A. Shutemov
get_user_pages_fast() function This is a preparation patch for the transition of x86 to the generic GUP_fast() implementation. Prepare generic GUP_fast() to handle dev_pagemap(). At the moment, it's only implemented on x86. On non-x86, the new code will be compiled out. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dann Frazier <dann.frazier@canonical.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170316152655.37789-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-17Merge branch 'x86-acpi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 acpi fixes from Thomas Gleixner: "This update deals with the fallout of the recent work to make cpuid/node mappings persistent. It turned out that the boot time ACPI based mapping tripped over ACPI inconsistencies and caused regressions. It's partially reverted and the fragile part replaced by an implementation which makes the mapping persistent when a CPU goes online for the first time" * 'x86-acpi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: acpi/processor: Check for duplicate processor ids at hotplug time acpi/processor: Implement DEVICE operator for processor enumeration x86/acpi: Restore the order of CPU IDs Revert"x86/acpi: Enable MADT APIs to return disabled apicids" Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
2017-03-17hrtimer: Remove hrtimer_peek_ahead_timers() leftoversStephen Boyd
This function was removed in commit c6eb3f70d448 (hrtimer: Get rid of hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted. Delete it now. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/20170317010814.2591-1-sboyd@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-17cgroup, kthread: close race window where new kthreads can be migrated to ↵Tejun Heo
non-root cgroups Creation of a kthread goes through a couple interlocked stages between the kthread itself and its creator. Once the new kthread starts running, it initializes itself and wakes up the creator. The creator then can further configure the kthread and then let it start doing its job by waking it up. In this configuration-by-creator stage, the creator is the only one that can wake it up but the kthread is visible to userland. When altering the kthread's attributes from userland is allowed, this is fine; however, for cases where CPU affinity is critical, kthread_bind() is used to first disable affinity changes from userland and then set the affinity. This also prevents the kthread from being migrated into non-root cgroups as that can affect the CPU affinity and many other things. Unfortunately, the cgroup side of protection is racy. While the PF_NO_SETAFFINITY flag prevents further migrations, userland can win the race before the creator sets the flag with kthread_bind() and put the kthread in a non-root cgroup, which can lead to all sorts of problems including incorrect CPU affinity and starvation. This bug got triggered by userland which periodically tries to migrate all processes in the root cpuset cgroup to a non-root one. Per-cpu workqueue workers got caught while being created and ended up with incorrected CPU affinity breaking concurrency management and sometimes stalling workqueue execution. This patch adds task->no_cgroup_migration which disallows the task to be migrated by userland. kthreadd starts with the flag set making every child kthread start in the root cgroup with migration disallowed. The flag is cleared after the kthread finishes initialization by which time PF_NO_SETAFFINITY is set if the kthread should stay in the root cgroup. It'd be better to wait for the initialization instead of failing but I couldn't think of a way of implementing that without adding either a new PF flag, or sleeping and retrying from waiting side. Even if userland depends on changing cgroup membership of a kthread, it either has to be synchronized with kthread_create() or periodically repeat, so it's unlikely that this would break anything. v2: Switch to a simpler implementation using a new task_struct bit field suggested by Oleg. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-and-debugged-by: Chris Mason <clm@fb.com> Cc: stable@vger.kernel.org # v4.3+ (we can't close the race on < v4.3) Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-17Merge branch 'linus' into x86/mm, to pick up a bugfixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16bpf: add helper inlining infra and optimize map_array lookupAlexei Starovoitov
Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem() into a sequence of bpf instructions. When JIT is on the sequence of bpf instructions is the sequence of native cpu instructions with significantly faster performance than indirect call and two function's prologue/epilogue. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16net/mlx4_core: Avoid delays during VF driver device shutdownJack Morgenstein
Some Hypervisors detach VFs from VMs by instantly causing an FLR event to be generated for a VF. In the mlx4 case, this will cause that VF's comm channel to be disabled before the VM has an opportunity to invoke the VF device's "shutdown" method. For such Hypervisors, there is a race condition between the VF's shutdown method and its internal-error detection/reset thread. The internal-error detection/reset thread (which runs every 5 seconds) also detects a disabled comm channel. If the internal-error detection/reset flow wins the race, we still get delays (while that flow tries repeatedly to detect comm-channel recovery). The cited commit fixed the command timeout problem when the internal-error detection/reset flow loses the race. This commit avoids the unneeded delays when the internal-error detection/reset flow wins. Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reported-by: Simon Xiao <sixiao@microsoft.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16drivers core: remove assert_held_device_hotplug()Heiko Carstens
The last caller of assert_held_device_hotplug() is gone, so remove it again. Link: http://lkml.kernel.org/r/20170314125226.16779-3-heiko.carstens@de.ibm.com Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-16kasan: add a prototype of task_struct to avoid warningMasami Hiramatsu
Add a prototype of task_struct to fix below warning on arm64. In file included from arch/arm64/kernel/probes/kprobes.c:19:0: include/linux/kasan.h:81:132: error: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] static inline void kasan_unpoison_task_stack(struct task_struct *task) {} As same as other types (kmem_cache, page, and vm_struct) this adds a prototype of task_struct data structure on top of kasan.h. [arnd] A related warning was fixed before, but now appears in a different line in the same file in v4.11-rc2. The patch from Masami Hiramatsu still seems appropriate, so let's take his version. Fixes: 71af2ed5eeea ("kasan, sched/headers: Remove <linux/sched.h> from <linux/kasan.h>") Link: https://patchwork.kernel.org/patch/9569839/ Link: http://lkml.kernel.org/r/20170313141517.3397802-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-16Merge tag 'perf-core-for-mingo-4.12-20170316' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New features: - Add 'brstackinsn' field in 'perf script' to reuse the x86 instruction decoder used in the Intel PT code to study hot paths to samples (Andi Kleen) Kernel changes: - Default UPROBES_EVENTS to Y (Alexei Starovoitov) - Fix check for kretprobe offset within function entry (Naveen N. Rao) Infrastructure changes: - Introduce util func is_sdt_event() (Ravi Bangoria) - Make perf_event__synthesize_mmap_events() scale on older kernels where reading /proc/pid/maps is way slower than reading /proc/pid/task/pid/maps (Stephane Eranian) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16crypto: ccp - Assign DMA commands to the channel's CCPGary R Hook
The CCP driver generally uses a round-robin approach when assigning operations to available CCPs. For the DMA engine, however, the DMA mappings of the SGs are associated with a specific CCP. When an IOMMU is enabled, the IOMMU is programmed based on this specific device. If the DMA operations are not performed by that specific CCP then addressing errors and I/O page faults will occur. Update the CCP driver to allow a specific CCP device to be requested for an operation and use this in the DMA engine support. Cc: <stable@vger.kernel.org> # 4.9.x- Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-16locking/lockdep: Handle statically initialized PER_CPU locks properlyThomas Gleixner
If a PER_CPU struct which contains a spin_lock is statically initialized via: DEFINE_PER_CPU(struct foo, bla) = { .lock = __SPIN_LOCK_UNLOCKED(bla.lock) }; then lockdep assigns a seperate key to each lock because the logic for assigning a key to statically initialized locks is to use the address as the key. With per CPU locks the address is obvioulsy different on each CPU. That's wrong, because all locks should have the same key. To solve this the following modifications are required: 1) Extend the is_kernel/module_percpu_addr() functions to hand back the canonical address of the per CPU address, i.e. the per CPU address minus the per CPU offset. 2) Check the lock address with these functions and if the per CPU check matches use the returned canonical address as the lock key, so all per CPU locks have the same key. 3) Move the static_obj(key) check into look_up_lock_class() so this check can be avoided for statically initialized per CPU locks. That's required because the canonical address fails the static_obj(key) check for obvious reasons. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Merged Dan's fixups for !MODULES and !SMP into this patch. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Murphy <dmurphy@ti.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170227143736.pectaimkjkan5kow@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16locking/lockdep: Add new check to lock_downgrade()J. R. Okajima
Commit: f8319483f57f ("locking/lockdep: Provide a type check for lock_is_held") didn't fully cover rwsems as downgrade_write() was left out. Introduce lock_downgrade() and use it to add new checks. See-also: http://marc.info/?l=linux-kernel&m=148581164003149&w=2 Originally-written-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1486053497-9948-3-git-send-email-hooanon05g@gmail.com [ Rewrote the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16perf/core: Keep AUX flags in the output handleWill Deacon
In preparation for adding more flags to perf AUX records, introduce a separate API for setting the flags for a session, rather than appending more bool arguments to perf_aux_output_end. This allows to set each flag at the time a corresponding condition is detected, instead of tracking it in each driver's private state. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170220133352.17995-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16vmbus: remove hv_event_tasklet_disable/enableDexuan Cui
With the recent introduction of per-channel tasklet, we need to update the way we handle the 3 concurrency issues: 1. hv_process_channel_removal -> percpu_channel_deq vs. vmbus_chan_sched -> list_for_each_entry(..., percpu_list); 2. vmbus_process_offer -> percpu_channel_enq/deq vs. vmbus_chan_sched. 3. vmbus_close_internal vs. the per-channel tasklet vmbus_on_event; The first 2 issues can be handled by Stephen's recent patch "vmbus: use rcu for per-cpu channel list", and the third issue can be handled by calling tasklet_disable in vmbus_close_internal here. We don't need the original hv_event_tasklet_disable/enable since we now use per-channel tasklet instead of the previous per-CPU tasklet, and actually we must remove them due to the side effect now: vmbus_process_offer -> hv_event_tasklet_enable -> tasklet_schedule will start the per-channel callback prematurely, cauing NULL dereferencing (the channel may haven't been properly configured to run the callback yet). Fixes: 631e63a9f346 ("vmbus: change to per channel tasklet") Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>