path: root/include/linux
Age  Commit message  Author
2019-07-25  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (David S. Miller)
Alexei Starovoitov says:

====================
pull-request: bpf 2019-07-25

The following pull-request contains BPF updates for your *net* tree. The main changes are:

1) fix segfault in libbpf, from Andrii.
2) fix gso_segs access, from Eric.
3) tls/sockmap fixes, from Jakub and John.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25  net/mlx5e: Prevent encap flow counter update async to user query  (Ariel Levkovich)
This patch prevents a race between a user-invoked cached counters query and the neighbor last usage updater. The cached flow counter stats can be queried by calling "mlx5_fc_query_cached", which provides the number of bytes and packets that passed via this flow since the last time this counter was queried. It does so by subtracting the last saved stats from the current, cached stats and then updating the last saved stats with the cached stats. It also provides the lastuse value for that flow. Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the last usage time of encapsulation flows, it calls the flow counter query method periodically and asynchronously to user queries of the flow counter using cls_flower. This call causes the driver to update the last reported bytes and packets from the cache; therefore, future user queries of the flow stats will return lower-than-expected numbers for bytes and packets, since the last saved stats in the driver were updated asynchronously to the last saved stats in cls_flower. This results in wrong stats for encapsulation flows being presented to the user. Since the neighbor usage updater only needs the lastuse value from the cached counter, the fix is to use a dedicated lastuse query call that returns the lastuse value without syncing the cached stats with the last saved stats. Fixes: f6dfb4c3f216 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters") Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
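A sketch of such a dedicated query; the name mlx5_fc_query_lastuse() and the cache layout are assumptions for illustration. The point is that it reads only the cached lastuse timestamp and never touches the last saved byte/packet stats:

  /* Sketch, assuming a counter cache with a lastuse field: return the
   * cached lastuse timestamp without folding the cached stats into the
   * last saved stats. */
  u64 mlx5_fc_query_lastuse(struct mlx5_fc *counter)
  {
          return counter->cache.lastuse;
  }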
2019-07-25  net/mlx5: Fix modify_cq_in alignment  (Edward Srouji)
Fix modify_cq_in alignment to match the device specification. After this fix the 'cq_umem_valid' field will be in the right offset. Cc: <stable@vger.kernel.org> # 4.19 Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits") Signed-off-by: Edward Srouji <edwards@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-25  mm/hmm: remove the legacy hmm_pfn_* APIs  (Christoph Hellwig)
Switch the one remaining user in nouveau over to its replacement, and remove all the wrappers. Link: https://lore.kernel.org/r/20190724065258.16603-7-hch@lst.de Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25  mm/hmm: move hmm_vma_range_done and hmm_vma_fault to nouveau  (Christoph Hellwig)
These two functions are marked as legacy APIs to get rid of, but they suit the current nouveau flow. Move them to their only user in preparation for fixing a locking bug involving caller and callee. All comments referring to the old API have been removed, as this is now a driver-private helper. Link: https://lore.kernel.org/r/20190724065258.16603-3-hch@lst.de Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25  lib/dim: Fix -Wunused-const-variable warnings  (Leon Romanovsky)
DIM causes the following warnings during kernel compilation, which indicate that tx_profile and rx_profile are supposed to be declared in *.c and not in *.h files.

  In file included from ./include/rdma/ib_verbs.h:64,
                   from ./include/linux/mlx5/device.h:37,
                   from ./include/linux/mlx5/driver.h:51,
                   from ./include/linux/mlx5/vport.h:36,
                   from drivers/infiniband/hw/mlx5/ib_virt.c:34:
  ./include/linux/dim.h:326:1: warning: _tx_profile_ defined but not used [-Wunused-const-variable=]
    326 | tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
        | ^~~~~~~~~~
  ./include/linux/dim.h:320:1: warning: _rx_profile_ defined but not used [-Wunused-const-variable=]
    320 | rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = {
        | ^~~~~~~~~~

Fixes: 4f75da3666c0 ("linux/dim: Move implementation to .c files") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25  leds: core: Add support for composing LED class device names  (Jacek Anaszewski)
Add generic support for composing a LED class device name. The newly introduced led_compose_name() function composes the device name according to either the <color:function> or the <devicename:color:function> pattern, depending on the configuration of the initialization data. Backward compatibility with in-driver hard-coded LED class device names is assured thanks to the default_label and devicename properties of the newly introduced struct led_init_data. If none of the aforementioned properties is found then, for OF nodes, the node name is adopted for the LED class device name. At the occasion of amending Documentation/leds/leds-class.txt, unify spelling: colour -> color. Alongside these changes a new tool is added: tools/leds/get_led_device_info.sh. The tool allows retrieving details of a LED class device's parent device, which proves that using a vendor or product name for the devicename part of a LED name doesn't convey any added value, since that information is already available in sysfs. The script also performs basic validation of a LED class device name. Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Baolin Wang <baolin.wang@linaro.org> Cc: Dan Murphy <dmurphy@ti.com> Cc: Daniel Mack <daniel@zonque.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Oleh Kravchenko <oleg@kaa.org.ua> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Simon Shields <simon@lineageos.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Pavel Machek <pavel@ucw.cz>
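A minimal sketch of how the init data drives the composed name; the field values here are hypothetical:

  struct led_init_data init_data = {
          .fwnode        = fwnode,     /* DT node supplying color/function */
          .default_label = "status",   /* fallback if no function is given */
          .devicename    = "phone1",   /* optional <devicename> prefix */
  };

  /* yields e.g. "red:indicator" with no devicename,
   * or "phone1:red:indicator" with one */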
2019-07-25  leds: class: Improve LED and LED flash class registration API  (Jacek Anaszewski)
Replace of_led_classdev_register() with led_classdev_register_ext(), which accepts the easily extendable struct led_init_data instead of the fixed struct device_node argument. The latter can now be passed in the fwnode property of struct led_init_data. The modification is driven by the need to pass additional arguments required for the forthcoming generic mechanism for composing LED names. Currently the LED name is conveyed in the "name" char pointer property of struct led_classdev. This is redundant, since the LED class device name is accessible throughout the whole LED class device lifetime via the associated struct device's kobj->name property. The change will not break any existing clients, since the patch also alters the existing led_classdev{_flash}_register() macro wrappers, which pass NULL in place of init_data; that leads to the legacy name initialization path based on the struct led_classdev's "name" property. Three existing users of devm_of_led_classdev_register() are modified to use devm_led_classdev_register(), which will not impact their operation since they in fact didn't need to pass a struct device_node on registration in the first place. Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Baolin Wang <baolin.wang@linaro.org> Cc: Dan Murphy <dmurphy@ti.com> Cc: Daniel Mack <daniel@zonque.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Oleh Kravchenko <oleg@kaa.org.ua> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Simon Shields <simon@lineageos.org> Acked-by: Pavel Machek <pavel@ucw.cz>
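A sketch of the new registration call, assuming a driver with an OF node np; passing NULL init_data selects the legacy name path:

  struct led_init_data init_data = {
          .fwnode = of_fwnode_handle(np),
  };
  int ret;

  /* extended registration: the name is composed from init_data */
  ret = led_classdev_register_ext(parent, &my_led_cdev, &init_data);

  /* legacy wrapper, equivalent to passing init_data == NULL */
  ret = led_classdev_register(parent, &my_led_cdev);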
2019-07-25  qed*: Change dpi_addr to be denoted with __iomem  (Michal Kalderon)
Several casts were required around the dpi_addr parameter in qed_rdma_if.h. This is an address on the doorbell BAR and should therefore be marked with __iomem. Link: https://lore.kernel.org/r/20190709141735.19193-5-michal.kalderon@marvell.com Reported-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
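The effect of the annotation, in a hedged sketch (identifiers illustrative):

  void __iomem *dpi_addr;               /* mapping of the doorbell BAR */

  writel(db_val, dpi_addr + db_offset); /* OK: MMIO accessor */
  /* *(u32 *)(dpi_addr + db_offset) = db_val;  <- sparse now warns */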
2019-07-25  platform/x86: wmi: add missing struct parameter description  (Mattias Jacobsson)
Add a description for the context parameter in the struct wmi_device_id. Reported-by: kbuild test robot <lkp@intel.com> Fixes: a48e23385fcf ("platform/x86: wmi: add context pointer field to struct wmi_device_id") Signed-off-by: Mattias Jacobsson <2pi@mok.nu> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-07-25  Merge branch 'access-creds'  (Linus Torvalds)
The access() (and faccessat()) credentials change can cause an unnecessary load on the RCU machinery because every access() call ends up freeing the temporary access credential using RCU. This isn't really noticeable on small machines, but if you have hundreds of cores you can cause huge slowdowns due to RCU storms. It's easy to avoid: the temporary access credentials aren't normally accessed using RCU at all, so we can avoid the whole issue by just marking them as such. * access-creds: access: avoid the RCU grace period for the temporary subjective credentials
2019-07-25  sched/core: Prevent race condition between cpuset and __sched_setscheduler()  (Juri Lelli)
No synchronisation mechanism exists between the cpuset subsystem and calls to the function __sched_setscheduler(). As such, it is possible that new root domains are created on the cpuset side while a deadline acceptance test is carried out in __sched_setscheduler(), leading to a potential oversell of CPU bandwidth. Grab the cpuset_rwsem read lock from the core scheduler, so as to prevent situations such as the one described above from happening. The only exception is normalize_rt_tasks(), which needs to work under tasklist_lock and therefore can't grab cpuset_rwsem. We are fine with this, as this function is only called by sysrq and, if that gets triggered, DEADLINE guarantees have already gone out the window anyway. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Cc: claudio@evidence.eu.com Cc: lizefan@huawei.com Cc: longman@redhat.com Cc: luca.abeni@santannapisa.it Cc: mathieu.poirier@linaro.org Cc: rostedt@goodmis.org Cc: tj@kernel.org Cc: tommaso.cucinotta@santannapisa.it Link: https://lkml.kernel.org/r/20190719140000.31694-9-juri.lelli@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
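The resulting shape of the admission path, as a sketch; cpuset_read_lock()/cpuset_read_unlock() are assumed to be the thin read-side wrappers around cpuset_rwsem that this series adds:

  cpuset_read_lock();     /* keep root domains stable */
  /* ... DEADLINE bandwidth acceptance test ... */
  cpuset_read_unlock();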
2019-07-25  cgroup/cpuset: Change cpuset_rwsem and hotplug lock order  (Juri Lelli)
cpuset_rwsem is going to be acquired from sched_setscheduler() with a following patch. There are however paths (e.g., spawn_ksoftirqd) in which sched_setscheduler() is eventually called while holding the hotplug lock; this creates a dependency between the hotplug lock (to be always acquired first) and cpuset_rwsem (to be always acquired after the hotplug lock). Fix paths which currently take the two locks in the wrong order (after a following patch is applied). Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Cc: claudio@evidence.eu.com Cc: lizefan@huawei.com Cc: longman@redhat.com Cc: luca.abeni@santannapisa.it Cc: mathieu.poirier@linaro.org Cc: rostedt@goodmis.org Cc: tj@kernel.org Cc: tommaso.cucinotta@santannapisa.it Link: https://lkml.kernel.org/r/20190719140000.31694-7-juri.lelli@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  cpusets: Rebuild root domain deadline accounting information  (Mathieu Poirier)
When the topology of root domains is modified by CPUset or CPUhotplug operations, information about the current deadline bandwidth held in the root domain is lost. This patch addresses the issue by recalculating the lost deadline bandwidth information: it iterates through the deadline tasks held in CPUsets and adds their current load to the root domain they are associated with. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> [ Various additional modifications. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Cc: claudio@evidence.eu.com Cc: lizefan@huawei.com Cc: longman@redhat.com Cc: luca.abeni@santannapisa.it Cc: rostedt@goodmis.org Cc: tj@kernel.org Cc: tommaso.cucinotta@santannapisa.it Link: https://lkml.kernel.org/r/20190719140000.31694-4-juri.lelli@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  sched/topology: Add partition_sched_domains_locked()  (Mathieu Poirier)
Introduce the partition_sched_domains_locked() function by taking the mutex locking code out of the original function. That way the work done by partition_sched_domains_locked() can be reused without dropping the mutex lock. No change of functionality is introduced by this patch. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Cc: claudio@evidence.eu.com Cc: lizefan@huawei.com Cc: longman@redhat.com Cc: luca.abeni@santannapisa.it Cc: rostedt@goodmis.org Cc: tommaso.cucinotta@santannapisa.it Link: https://lkml.kernel.org/r/20190719140000.31694-2-juri.lelli@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
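The original function then becomes a thin wrapper around the new one, roughly:

  void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
                               struct sched_domain_attr *dattr_new)
  {
          mutex_lock(&sched_domains_mutex);
          partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
          mutex_unlock(&sched_domains_mutex);
  }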
2019-07-25  sched/core: Convert get_task_struct() to return the task  (Matthew Wilcox (Oracle))
Returning the pointer that was passed in allows us to write slightly more idiomatic code. Convert a few users. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190704221323.24290-1-willy@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
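The helper itself is tiny; a sketch of the converted form and the idiom it enables:

  static inline struct task_struct *get_task_struct(struct task_struct *t)
  {
          refcount_inc(&t->usage);
          return t;        /* hand the pointer back to the caller */
  }

  /* callers can now write e.g.: */
  tsk = get_task_struct(p);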
2019-07-25  cpu/hotplug: Cache number of online CPUs  (Thomas Gleixner)
Re-evaluating the bitmap weight of the online CPUs bitmap in every invocation of num_online_cpus() over and over is a pretty useless exercise, especially when num_online_cpus() is used in code paths like the IPI delivery of x86 or the membarrier code. Cache the number of online CPUs in the core and just return the cached variable. The accessor function provides only a snapshot when used without protection against concurrent CPU hotplug. The storage needs to use an atomic_t because the kexec and reboot code (ab)use set_cpu_online() in their 'shutdown' handlers without any form of serialization, as pointed out by Mathieu. Regular CPU hotplug usage is properly serialized. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907091622590.1634@nanos.tec.linutronix.de
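With the cached count, the accessor reduces to an atomic read; roughly:

  extern atomic_t __num_online_cpus;

  static inline unsigned int num_online_cpus(void)
  {
          return atomic_read(&__num_online_cpus);
  }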
2019-07-25  cpumask: Implement cpumask_or_equal()  (Thomas Gleixner)
The IPI code of x86 needs to evaluate whether the target cpumask is equal to the cpu_online_mask, or equal except for the calling CPU. To replace the current implementation, which requires a temporary cpumask (and might therefore involve allocations), add a new function that compares a cpumask to the result of two other cpumasks or'ed together before comparison. This allows the required decision to be made in one go, and the calling code can then check whether the calling CPU is set in the target mask with cpumask_test_cpu(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105220.585449120@linutronix.de
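A sketch of the helper consistent with the description, assuming a matching bitmap_or_equal() primitive underneath:

  /* true iff (*src1p | *src2p) == *src3p, with no temporary cpumask */
  static inline bool cpumask_or_equal(const struct cpumask *src1p,
                                      const struct cpumask *src2p,
                                      const struct cpumask *src3p)
  {
          return bitmap_or_equal(cpumask_bits(src1p), cpumask_bits(src2p),
                                 cpumask_bits(src3p), nr_cpumask_bits);
  }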
2019-07-25  smp/hotplug: Track booted once CPUs in a cpumask  (Thomas Gleixner)
The booted-once information, which is required to deal correctly with the MCE broadcast issue on x86, is stored in the per-CPU hotplug state, which is perfectly fine for the intended purpose. x86 needs that information for supporting NMI broadcasting via shortcuts, but retrieving it from per-CPU data is cumbersome. Move it to a cpumask so the information can be checked against the cpu_present_mask quickly. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190722105219.818822855@linutronix.de
2019-07-25  locking/lockdep: Reduce space occupied by stack traces  (Bart Van Assche)
Although commit 669de8bda87b ("kernel/workqueue: Use dynamic lockdep keys for workqueues") unregisters dynamic lockdep keys when a workqueue is destroyed, a side effect of that commit is that all stack traces associated with the lockdep key are leaked when a workqueue is destroyed. Fix this by storing each unique stack trace once. Other changes in this patch are:

  - Use NULL instead of { .nr_entries = 0 } to represent 'no trace'.
  - Store a pointer to a stack trace in struct lock_class and struct lock_list instead of storing 'nr_entries' and 'offset'.

This patch prevents the following program from triggering the "BUG: MAX_STACK_TRACE_ENTRIES too low!" complaint:

  #include <fcntl.h>
  #include <unistd.h>

  int main()
  {
          for (;;) {
                  int fd = open("/dev/infiniband/rdma_cm", O_RDWR);
                  close(fd);
          }
  }

Suggested-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yuyang Du <duyuyang@gmail.com> Link: https://lkml.kernel.org/r/20190722182443.216015-4-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  stacktrace: Constify 'entries' arguments  (Bart Van Assche)
Make it clear to humans and to the compiler that the stack trace ('entries') arguments are not modified. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190722182443.216015-3-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  locking/lockdep: Make it clear that what lock_class::key points at is not modified  (Bart Van Assche)
This patch does not change the behavior of the lockdep code. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190722182443.216015-2-bvanassche@acm.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  sched/fair: Use RCU accessors consistently for ->numa_group  (Jann Horn)
The old code used RCU annotations and accessors inconsistently for ->numa_group, which can lead to use-after-frees and NULL dereferences. Let all accesses to ->numa_group use proper RCU helpers to prevent such issues. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults") Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
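The reader side then follows the standard RCU pattern; a generic sketch, assuming ->numa_group carries the __rcu annotation:

  struct numa_group *ng;

  rcu_read_lock();
  ng = rcu_dereference(p->numa_group);    /* annotated __rcu pointer */
  if (ng)
          gid = ng->gid;
  rcu_read_unlock();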
2019-07-25  sched/fair: Don't free p->numa_faults with concurrent readers  (Jann Horn)
When going through execve(), zero out the NUMA fault statistics instead of freeing them. During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU. Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()") Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25  driver core: Remove device link creation limitation  (Rafael J. Wysocki)
If device_link_add() is called for a consumer/supplier pair with an existing device link between them and the existing link's type is not in agreement with the flags passed to that function by its caller, NULL will be returned. That is seriously inconvenient, because it forces the callers of device_link_add() to worry about what others may or may not do, even if that is not relevant to them for any other reason.

It turns out, however, that this limitation can be made to go away relatively easily. The underlying observation is that if DL_FLAG_STATELESS has been passed to device_link_add() in flags for the given consumer/supplier pair at least once, calling either device_link_del() or device_link_remove() to release the link returned by it should work, but there are no other requirements associated with that flag. In turn, if at least one of the callers of device_link_add() for the given consumer/supplier pair has not passed DL_FLAG_STATELESS to it in flags, the driver core should track the status of the link and act on it as appropriate (i.e. the link should be treated as "managed"). This means that DL_FLAG_STATELESS needs to be set for managed device links and it should be valid to call device_link_del() or device_link_remove() to drop references to them in certain situations.

To allow that to happen, introduce a new (internal) device link flag called DL_FLAG_MANAGED and make device_link_add() set it automatically whenever DL_FLAG_STATELESS is not passed to it. Also make it take additional references to existing device links that were previously stateless (that is, with DL_FLAG_STATELESS set and DL_FLAG_MANAGED unset) and will need to be managed going forward, and initialize their status (which has been DL_STATE_NONE so far). Accordingly, when a managed device link is dropped automatically by the driver core, make it clear DL_FLAG_MANAGED, reset the link's status back to DL_STATE_NONE and drop the reference to it associated with DL_FLAG_MANAGED instead of just deleting it right away (to allow it to stay around in case it still needs to be released explicitly by someone).

With that, since setting DL_FLAG_STATELESS doesn't mean that the device link in question is not managed any more, replace all of the status-tracking checks against DL_FLAG_STATELESS with analogous checks against DL_FLAG_MANAGED and update the documentation to reflect these changes. While at it, make device_link_add() reject flags that it does not recognize, including DL_FLAG_MANAGED.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Saravana Kannan <saravanak@google.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/2305283.AStDPdUUnE@kreacher Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
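For a purely stateless user nothing changes at the call site; a sketch of the typical pairing:

  struct device_link *link;

  link = device_link_add(consumer, supplier, DL_FLAG_STATELESS);
  if (!link)
          return -EINVAL;
  /* ... */
  device_link_del(link);  /* drops this caller's reference */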
2019-07-25  intel_th: msu: Introduce buffer interface  (Alexander Shishkin)
Introduce the concept of external buffers, which is a mechanism for creating trace sinks that receive trace data from MSC buffers and transfer it elsewhere. An external buffer can implement its own window allocation/deallocation if it has to. It must provide a callback that's used to notify it when a window fills up, so that it can then start a DMA transaction from that window 'elsewhere'. This window remains in a 'locked' state and won't be used for storing new trace data until the buffer 'unlocks' it with a provided API call, at which point the window can be used again for storing trace data. This relies on a functional "last block" interrupt, so not all versions of Trace Hub can use this feature, which does not reflect on existing users. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20190705141425.19894-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25  usb: host: oxu210hp-hcd: remove include/linux/oxu210hp.h  (Masahiro Yamada)
struct oxu210hp_platform_data is defined, but not used at all.

  $ git grep oxu210hp_platform_data
  include/linux/oxu210hp.h:struct oxu210hp_platform_data {

include/linux/oxu210hp.h exists just for defining an unused structure, so it can go away. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Link: https://lore.kernel.org/r/20190721144909.5295-1-yamada.masahiro@socionext.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-25  Input: allow drivers specify timestamp for input events  (Atif Niyaz)
Currently, evdev stamps events with timestamps acquired in evdev_events(). However, this timestamping may not be accurate in terms of measuring when the actual event happened. Let's allow individual drivers to specify the timestamp in order to provide a more accurate sense of time for the event. It is expected that drivers will set the timestamp in their hard interrupt routine. Signed-off-by: Atif Niyaz <atifniyaz@google.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
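Expected usage, sketched for a hypothetical driver; input_set_timestamp() is the accessor this change adds:

  static irqreturn_t my_touch_irq(int irq, void *dev_id)
  {
          struct input_dev *input = dev_id;

          input_set_timestamp(input, ktime_get()); /* time of the IRQ */
          /* ... read hardware, input_report_abs()/input_report_key() ... */
          input_sync(input);
          return IRQ_HANDLED;
  }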
2019-07-24  fpga: altera-pr-ip: Make alt_pr_unregister function void  (Moritz Fischer)
Make the alt_pr_unregister function void, since it always returns 0 and nothing acts on the return value anyway. Signed-off-by: Moritz Fischer <mdf@kernel.org>
2019-07-24  access: avoid the RCU grace period for the temporary subjective credentials  (Linus Torvalds)
It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU work because it installs a temporary credential that gets allocated and freed for each system call. The allocation and freeing overhead is mostly benign, but because credentials can be accessed under the RCU read lock, the freeing involves an RCU grace period. Which is not a huge deal normally, but if you have a lot of access() calls, this causes a fair amount of secondary damage: instead of a nice alloc/free pattern that hits hot per-CPU slab caches, you have all those delayed frees, and on big machines with hundreds of cores, the RCU overhead can end up being enormous.

But it turns out that all of this is entirely unnecessary. Exactly because access() only installs the credential as the thread-local subjective credential, the temporary cred pointer doesn't actually need to be RCU free'd at all. Once we're done using it, we can just free it synchronously and avoid all the RCU overhead.

So add a 'non_rcu' flag to 'struct cred', which can be set by users that know they only use it in non-RCU context (there are other potential users for this). We can make it a union with the rcu freeing list head that we need for the RCU case, so this doesn't need any extra storage.

Note that this also makes 'get_current_cred()' clear the new non_rcu flag, in case we have filesystems that take a long-term reference to the cred and then expect the RCU delayed freeing afterwards. It's not entirely clear that this is required, but it makes for clear semantics: the subjective cred remains non-RCU as long as you only access it synchronously using the thread-local accessors, but you _can_ use it as a generic cred if you want to.

It is possible that we should just remove the whole RCU markings for ->cred entirely. Only ->real_cred is really supposed to be accessed through RCU, and the long-term cred copies that nfs uses might want to explicitly re-enable RCU freeing if required, rather than have get_current_cred() do it implicitly. But this is a "minimal semantic changes" change for the immediate problem.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Glauber <jglauber@marvell.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
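The union mentioned above costs no extra storage; roughly:

  struct cred {
          /* ... */
          union {
                  int             non_rcu;  /* can we skip RCU deletion? */
                  struct rcu_head rcu;      /* RCU deletion hook */
          };
  };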
2019-07-24  lib/timerqueue: Rely on rbtree semantics for next timer  (Davidlohr Bueso)
Simplify the timerqueue code by using cached rbtrees and relying on the tree's leftmost-node semantics to get the timer with the earliest expiration time. This is a drop-in conversion, and therefore semantics remain untouched. The runtime overhead of cached rbtrees is pretty much the same as the current head->next method, noting that when removing the leftmost node, a common operation for the timerqueue, rb_next(leftmost) is O(1) as well, so the next timer will either be the right node or its parent. Therefore no extra pointer chasing. Finally, the size of struct timerqueue_head remains the same. Passes several hours of rcutorture. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190724152323.bojciei3muvfxalm@linux-r8p5
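With an rb_root_cached, the 'next timer' lookup is just the cached leftmost node; a sketch of the converted accessor (relying on node being the first member, so rb_entry() of a NULL leftmost stays NULL):

  static inline struct timerqueue_node *
  timerqueue_getnext(struct timerqueue_head *head)
  {
          struct rb_node *leftmost = rb_first_cached(&head->rb_root);

          return rb_entry(leftmost, struct timerqueue_node, node);
  }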
2019-07-24  iommu: Introduce iommu_iotlb_gather_add_page()  (Will Deacon)
Introduce a helper function for drivers to use when updating an iommu_iotlb_gather structure in response to an ->unmap() call, rather than having to open-code the logic in every page-table implementation. Signed-off-by: Will Deacon <will@kernel.org>
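A sketch of the helper's logic per the description: if the new page is disjoint from the pending range or uses a different page size, sync first; then extend the range:

  static inline void
  iommu_iotlb_gather_add_page(struct iommu_domain *domain,
                              struct iommu_iotlb_gather *gather,
                              unsigned long iova, size_t size)
  {
          unsigned long start = iova, end = start + size;

          /* new granule or disjoint range: flush what is pending */
          if (gather->pgsize != size ||
              end < gather->start || start > gather->end) {
                  if (gather->pgsize)
                          iommu_tlb_sync(domain, gather);
                  gather->pgsize = size;
          }

          if (gather->end < end)
                  gather->end = end;
          if (gather->start > start)
                  gather->start = start;
  }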
2019-07-24  iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes  (Will Deacon)
To permit batching of TLB flushes across multiple calls to the IOMMU driver's ->unmap() implementation, introduce a new structure for tracking the address range to be flushed and the granularity at which the flushing is required. This is hooked into the IOMMU API and its callers are updated to make use of the new structure. Subsequent patches will plumb this into the IOMMU drivers as well, but for now the gathering information is ignored. Signed-off-by: Will Deacon <will@kernel.org>
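The structure only needs to track the span and the granule; roughly:

  struct iommu_iotlb_gather {
          unsigned long   start;
          unsigned long   end;
          size_t          pgsize;   /* page size used for the range */
  };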
2019-07-24  iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops  (Will Deacon)
In preparation for TLB flush gathering in the IOMMU API, rename the iommu_gather_ops structure in io-pgtable to iommu_flush_ops, which better describes its purpose and avoids the potential for confusion between different levels of the API.

  $ find linux/ -type f -name '*.[ch]' | xargs sed -i 's/gather_ops/flush_ops/g'

Signed-off-by: Will Deacon <will@kernel.org>
2019-07-24  iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops  (Will Deacon)
Commit add02cfdc9bc ("iommu: Introduce Interface for IOMMU TLB Flushing") added three new TLB flushing operations to the IOMMU API so that the underlying driver operations can be batched when unmapping large regions of IO virtual address space. However, the ->iotlb_range_add() callback has not been implemented by any IOMMU drivers (amd_iommu.c implements it as an empty function, which incurs the overhead of an indirect branch). Instead, drivers either flush the entire IOTLB in the ->iotlb_sync() callback or perform the necessary invalidation during ->unmap(). Attempting to implement ->iotlb_range_add() for arm-smmu-v3.c revealed two major issues:

  1. The page size used to map the region in the page-table is not known, and so it is not generally possible to issue TLB flushes in the most efficient manner.

  2. The only mutable state passed to the callback is a pointer to the iommu_domain, which can be accessed concurrently and therefore requires expensive synchronisation to keep track of the outstanding flushes.

Remove the callback entirely in preparation for extending ->unmap() and ->iotlb_sync() to update a token on the caller's stack. Signed-off-by: Will Deacon <will@kernel.org>
2019-07-24  can: Add SPDX license identifiers for CAN subsystem  (Oliver Hartkopp)
Add missing SPDX identifiers for the CAN network layer and correct the SPDX license for two of its include files to make sure the BSD-3-Clause applies for the entire subsystem. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-07-24  can: remove obsolete empty ioctl() handler  (Oliver Hartkopp)
With commit c7cbdbf29f488a ("net: rework SIOCGSTAMP ioctl handling") the only ioctl function in can_ioctl() has been removed. As this SIOCGSTAMP ioctl command is now handled in net/socket.c we can entirely remove the CAN specific ioctl functions. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-07-23  PCI: Remove pci_block_cfg_access() et al (unused)  (Kelsey Skunberg)
Remove the following unused functions from include/linux/pci.h:

  pci_block_cfg_access()
  pci_block_cfg_access_in_atomic()
  pci_unblock_cfg_access()

These were added by fb51ccbf217c ("PCI: Rework config space blocking services"), though no callers were added. Link: https://lore.kernel.org/r/20190715203416.37547-1-skunberg.kelsey@gmail.com Signed-off-by: Kelsey Skunberg <skunberg.kelsey@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
2019-07-23  bpf: fix narrower loads on s390  (Ilya Leoshkevich)
The very first check in test_pkt_md_access is failing on s390, which happens because loading a part of a struct __sk_buff field produces an incorrect result. The preprocessed code of the check is:

  {
          __u8 tmp = *((volatile __u8 *)&skb->len +
                  ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
          if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF))
                  return 2;
  };

clang generates the following code for it:

  0: 71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3)
  1: 61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0)
  2: 57 30 00 00 00 00 00 ff r3 &= 255
  3: 5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10>

Finally, the verifier transforms it to:

  0: (61) r2 = *(u32 *)(r1 +104)
  1: (bc) w2 = w2
  2: (74) w2 >>= 24
  3: (bc) w2 = w2
  4: (54) w2 &= 255
  5: (bc) w2 = w2

The problem is that when the verifier emits the code to replace a partial load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of the struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a bitwise AND, it assumes that the machine is little endian and incorrectly decides to use a shift. Adjust the shift count calculation to account for endianness. Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
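A sketch of the corrected shift computation (variable names illustrative, not the verbatim kernel code; size_default is the width of the full field, off and size describe the narrow load):

  u8 shift, load_off = off & (size_default - 1);

  #ifdef __LITTLE_ENDIAN
  shift = load_off * 8;
  #else /* big endian: count from the most significant end */
  shift = (size_default - (load_off + size)) * 8;
  #endif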
2019-07-23  dma-mapping: use dma_get_mask in dma_addressing_limited  (Eric Auger)
We currently have cases where the dma_addressing_limited() gets called with dma_mask unset. This causes a NULL pointer dereference. Use dma_get_mask() accessor to prevent the crash. Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper") Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
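The fixed helper reads the mask through the accessor, which falls back safely when dev->dma_mask is unset; roughly:

  static inline bool dma_addressing_limited(struct device *dev)
  {
          return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
                 dma_get_required_mask(dev);
  }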
2019-07-23  block: blk-mq: Remove blk_mq_sched_started_request and started_request  (Marcos Paulo de Souza)
blk_mq_sched_started_request is a function that checks if the elevator related to the request has started_request implemented, but currently none of the available I/O schedulers implement started_request, so remove both. Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-23  fbdev: Ditch fb_edid_add_monspecs  (Daniel Vetter)
It's dead code ever since

  commit 34280340b1dc74c521e636f45cd728f9abf56ee2
  Author: Geert Uytterhoeven <geert+renesas@glider.be>
  Date:   Fri Dec 4 17:01:43 2015 +0100

      fbdev: Remove unused SH-Mobile HDMI driver

Also with this gone we can remove the cea_modes db. This entire thing is massively incomplete anyway, compared to the CEA parsing that drm_edid.c does. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tavis Ormandy <taviso@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190721201956.941-1-daniel.vetter@ffwll.ch
2019-07-23  iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA  (Joerg Roedel)
The stub function for !CONFIG_IOMMU_IOVA needs to be 'static inline'. Fixes: effa467870c76 ('iommu/vt-d: Don't queue_iova() if there is no flush queue') Signed-off-by: Joerg Roedel <jroedel@suse.de>
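The failure mode is the classic one for header stubs: without 'static inline', every translation unit including the header emits its own definition. The pattern, sketched with queue_iova() (signature assumed from context):

  #ifdef CONFIG_IOMMU_IOVA
  void queue_iova(struct iova_domain *iovad, unsigned long pfn,
                  unsigned long pages, unsigned long data);
  #else
  static inline void queue_iova(struct iova_domain *iovad, unsigned long pfn,
                                unsigned long pages, unsigned long data)
  {
  }
  #endif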
2019-07-23  PM: sleep: Drop dpm_noirq_begin() and dpm_noirq_end()  (Rafael J. Wysocki)
Note that after previous changes dpm_noirq_begin() and dpm_noirq_end() each have only one caller, so move the code from them to their respective callers and drop them. Also note that dpm_noirq_resume_devices() and dpm_noirq_suspend_devices() need not be exported any more, so make them both static. This change is not expected to alter functionality. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-23  PM: sleep: Simplify suspend-to-idle control flow  (Rafael J. Wysocki)
After commit 33e4f80ee69b ("ACPI / PM: Ignore spurious SCI wakeups from suspend-to-idle") the "noirq" phases of device suspend and resume may run multiple times during suspend-to-idle, if there are spurious system wakeup events while suspended. However, this is complicated and fragile and actually unnecessary.

The main reason for doing this is that on some systems the EC may signal system wakeup events (power button events, for example) as well as events that should not cause the system to resume (spurious system wakeup events). Thus, in order to determine whether or not a given event signaled by the EC while suspended is a proper system wakeup one, the EC GPE needs to be dispatched, and to start with that was achieved by allowing the ACPI SCI action handler to run, which was only possible after calling resume_device_irqs(). However, dispatching the EC GPE this way turned out to take too much time in some cases and some EC events might be missed due to that, so commit 68e22011856f ("ACPI: EC: Dispatch the EC GPE directly on s2idle wake") started to dispatch the EC GPE right after a wakeup event has been detected, so in fact the full ACPI SCI action handler doesn't need to run any more to deal with the wakeups coming from the EC.

Use this observation to simplify the suspend-to-idle control flow so that the "noirq" phases of device suspend and resume are each run only once in every suspend-to-idle cycle, which is reported to significantly reduce power drawn by some systems when suspended to idle (by allowing them to reach a deep platform-wide low-power state through the suspend-to-idle flow). [What appears to happen is that the "noirq" resume of devices after a spurious EC wakeup brings some devices into a state in which they prevent the platform from reaching the deep low-power state going forward, even after a subsequent "noirq" suspend phase, and on some systems the EC triggers such wakeups already when the "noirq" suspend of devices is running for the first time in the given suspend/resume cycle, so the platform cannot reach the deep low-power state at all.]

First, make acpi_s2idle_wake() use the acpi_ec_dispatch_gpe() return value to determine whether or not the wakeup may have been triggered by the EC (in which case the system wakeup is canceled and ACPI events are processed in order to determine whether or not the event is a proper system wakeup one) and use rearm_wake_irq() (introduced by a previous change) in it to rearm the ACPI SCI for system wakeup detection in case the system will remain suspended.

Second, drop acpi_s2idle_sync(), which is not needed any more, and the corresponding global platform suspend-to-idle callback.

Next, drop the pm_wakeup_pending() check (which is an optimization only) from __device_suspend_noirq() to prevent it from returning errors on system wakeups occurring before the "noirq" phase of device suspend is complete (as in the case of suspend-to-idle it is not known whether or not these wakeups are spurious at that point), in order to avoid having to carry out a "noirq" resume of devices on a spurious system wakeup.

Finally, change the code flow in s2idle_loop() to (1) run the "noirq" suspend of devices once before starting the loop, (2) check for spurious EC wakeups (via the platform ->wake callback) for the first time before calling s2idle_enter(), and (3) run the "noirq" resume of devices once after leaving the loop.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
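The resulting control flow, in outline (a sketch, not the verbatim kernel code):

  static void s2idle_loop(void)
  {
          /* "noirq" suspend of devices runs once, before this loop */
          for (;;) {
                  if (/* platform ->wake: the pending wakeup is genuine */)
                          break;
                  s2idle_enter();
                  /* spurious EC wakeup: stay in the loop and re-enter */
          }
          /* "noirq" resume of devices runs once, after the loop */
  }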
2019-07-23  PCI: irq: Introduce rearm_wake_irq()  (Rafael J. Wysocki)
Introduce a new function, rearm_wake_irq(), allowing a wakeup IRQ to be armed for system wakeup detection again, without running any action handlers associated with it, after it has been armed for wakeup detection and triggered. That is useful for IRQs, like the ACPI SCI, that may deliver wakeup as well as non-wakeup interrupts when armed for system wakeup detection. In those cases, it may be possible to determine whether or not the delivered interrupt is a system wakeup one without running the entire action handler (or handlers, if the IRQ is shared) for the IRQ, and if the interrupt turns out to be a non-wakeup one, the IRQ can be rearmed with the help of the new function. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
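Typical use, per the ACPI SCI example above; the spuriousness check here is hypothetical:

  /* wakeup IRQ fired, but the event turned out not to be a wakeup:
   * arm the IRQ for wakeup detection again without running handlers */
  if (event_was_spurious())
          rearm_wake_irq(acpi_sci_irq);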
2019-07-22  net: Convert skb_frag_t to bio_vec  (Matthew Wilcox (Oracle))
There are a lot of users of frag->page_offset, so use a union to avoid converting those users today. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
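A heavily hedged sketch of the union idea; the field order and names here are illustrative, mirroring struct bio_vec:

  typedef struct skb_frag_struct skb_frag_t;

  struct skb_frag_struct {
          struct page *bv_page;
          __u32        bv_len;
          union {
                  __u32 page_offset;  /* legacy name, kept compiling */
                  __u32 bv_offset;
          };
  };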
2019-07-22  net: Rename skb_frag_t size to bv_len  (Matthew Wilcox (Oracle))
Improves compatibility with bvec. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22  net: Rename skb_frag page to bv_page  (Matthew Wilcox (Oracle))
One step closer to turning the skb_frag_t into a bio_vec. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22  net: Reorder the contents of skb_frag_t  (Matthew Wilcox (Oracle))
Match the layout of bio_vec. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>