summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-10-30SUNRPC: Destroy the back channel when we destroy the host transportTrond Myklebust
When we're destroying the host transport mechanism, we should ensure that we do not leak memory by failing to release any back channel slots that might still exist. Reported-by: Neil Brown <neilb@suse.de> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-30Merge branches 'doc.2019.10.29a', 'fixes.2019.10.30a', 'nohz.2019.10.28a', ↵Paul E. McKenney
'replace.2019.10.30a', 'torture.2019.10.05a' and 'lkmm.2019.10.05a' into HEAD doc.2019.10.29a: RCU documentation updates. fixes.2019.10.30a: RCU miscellaneous fixes. nohz.2019.10.28a: RCU NO_HZ and NO_HZ_FULL updates. replace.2019.10.30a: Replace rcu_swap_protected() with rcu_replace(). torture.2019.10.05a: RCU torture-test updates. lkmm.2019.10.05a: Linux kernel memory model updates.
2019-10-30rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()Paul E. McKenney
Although the rcu_swap_protected() macro follows the example of swap(), the interactions with RCU make its update of its argument somewhat counter-intuitive. This commit therefore introduces an rcu_replace_pointer() that returns the old value of the RCU pointer instead of doing the argument update. Once all the uses of rcu_swap_protected() are updated to instead use rcu_replace_pointer(), rcu_swap_protected() will be removed. Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane M Seymour <shane.seymour@hpe.com> Cc: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-30rcu: Remove unused function hlist_bl_del_init_rcu()Ethan Hansen
The function hlist_bl_del_init_rcu() is declared in rculist_bl.h, but never used. This commit therefore removes it. Signed-off-by: Ethan Hansen <1ethanhansen@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-29net/mlx5: Fix flow counter list auto bits structRoi Dayan
The union should contain the extended dest and counter list. Remove the resevered 0x40 bits which is redundant. This change doesn't break any functionally. Everything works today because the code in fs_cmd.c is using the correct structs if extended dest or the basic dest. Fixes: 1b115498598f ("net/mlx5: Introduce extended destination fields") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-29PM: wakeup: Add routine to help fetch wakeup source object.Ran Wang
Some user might want to go through all registered wakeup sources and doing things accordingly. For example, SoC PM driver might need to do HW programming to prevent powering down specific IP which wakeup source depending on. So add this API to help walk through all registered wakeup source objects on that list and return them one by one. Signed-off-by: Ran Wang <ran.wang_1@nxp.com> Tested-by: Leonard Crestez <leonard.crestez@nxp.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Li Yang <leoyang.li@nxp.com>
2019-10-29net: add __sys_accept4_file() helperJens Axboe
This is identical to __sys_accept4(), except it takes a struct file instead of an fd, and it also allows passing in extra file->f_flags flags. The latter is done to support masking in O_NONBLOCK without manipulating the original file flags. No functional changes in this patch. Cc: netdev@vger.kernel.org Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29io-wq: small threadpool implementation for io_uringJens Axboe
This adds support for io-wq, a smaller and specialized thread pool implementation. This is meant to replace workqueues for io_uring. Among the reasons for this addition are: - We can assign memory context smarter and more persistently if we manage the life time of threads. - We can drop various work-arounds we have in io_uring, like the async_list. - We can implement hashed work insertion, to manage concurrency of buffered writes without needing a) an extra workqueue, or b) needlessly making the concurrency of said workqueue very low which hurts performance of multiple buffered file writers. - We can implement cancel through signals, for cancelling interruptible work like read/write (or send/recv) to/from sockets. - We need the above cancel for being able to assign and use file tables from a process. - We can implement a more thorough cancel operation in general. - We need it to move towards a syslet/threadlet model for even faster async execution. For that we need to take ownership of the used threads. This list is just off the top of my head. Performance should be the same, or better, at least that's what I've seen in my testing. io-wq supports basic NUMA functionality, setting up a pool per node. io-wq hooks up to the scheduler schedule in/out just like workqueue and uses that to drive the need for more/less workers. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29gpu: host1x: Add direction flags to relocationsThierry Reding
Add direction flags to host1x relocations performed during job pinning. These flags indicate the kinds of accesses that hardware is allowed to perform on the relocated buffers. Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-10-29gpu: host1x: Overhaul host1x_bo_{pin,unpin}() APIThierry Reding
The host1x_bo_pin() and host1x_bo_unpin() APIs are used to pin and unpin buffers during host1x job submission. Pinning currently returns the SG table and the DMA address (an IOVA if an IOMMU is used or a physical address if no IOMMU is used) of the buffer. The DMA address is only used for buffers that are relocated, whereas the host1x driver will map gather buffers into its own IOVA space so that they can be processed by the CDMA engine. This approach has a couple of issues. On one hand it's not very useful to return a DMA address for the buffer if host1x doesn't need it. On the other hand, returning the SG table of the buffer is suboptimal because a single SG table cannot be shared for multiple mappings, because the DMA address is stored within the SG table, and the DMA address may be different for different devices. Subsequent patches will move the host1x driver over to the DMA API which doesn't work with a single shared SG table. Fix this by returning a new SG table each time a buffer is pinned. This allows the buffer to be referenced by multiple jobs for different engines. Change the prototypes of host1x_bo_pin() and host1x_bo_unpin() to take a struct device *, specifying the device for which the buffer should be pinned. This is required in order to be able to properly construct the SG table. While at it, make host1x_bo_pin() return the SG table because that allows us to return an ERR_PTR()-encoded error code if we need to, or return NULL to signal that we don't need the SG table to be remapped and can simply use the DMA address as-is. At the same time, returning the DMA address is made optional because in the example of command buffers, host1x doesn't need to know the DMA address since it will have to create its own mapping anyway. Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-10-29regulator: fixed: add off-on-delayPeng Fan
Depends on board design, the gpio controlling regulator may connects with a big capacitance. When need off, it takes some time to let the regulator to be truly off. If not add enough delay, the regulator might have always been on, so introduce off-on-delay to handle such case. Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/1572311875-22880-3-git-send-email-peng.fan@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-10-29resource: Add a resource_list_first_type helperRob Herring
A common pattern is looping over a resource_list just to get a matching entry with a specific type. Add resource_list_first_type() helper which implements this. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2019-10-29sched/kcpustat: Introduce vtime-aware kcpustat accessor for CPUTIME_SYSTEMFrederic Weisbecker
Kcpustat is not correctly supported on nohz_full CPUs. The tick doesn't fire and the cputime therefore doesn't move forward. The issue has shown up after the vanishing of the remaining 1Hz which has made the stall visible. We are solving that with checking the task running on a CPU through RCU and reading its vtime delta that we add to the raw kcpustat values. We make sure that we fetch a coherent raw-kcpustat/vtime-delta couple sequence while checking that the CPU referred by the target vtime is the correct one, under the locked vtime seqcount. Only CPUTIME_SYSTEM is handled here as a start because it's the trivial case. User and guest time will require more preparation work to correctly handle niceness. Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Link: https://lkml.kernel.org/r/20191025020303.19342-1-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29context_tracking: Check static key on context_tracking_enabled_*cpu()Frederic Weisbecker
guest_enter() doesn't call context_tracking_enabled() before calling context_tracking_enabled_this_cpu(). Therefore the guest code doesn't benefit from the static key on the fast path. Just make sure that context_tracking_enabled_*cpu() functions check the static key by themselves to propagate the optimization. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-11-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29sched/vtime: Introduce vtime_accounting_enabled_cpu()Frederic Weisbecker
This allows us to check if a remote CPU runs vtime accounting (ie: is nohz_full). We'll need that to reliably support reading kcpustat on nohz_full CPUs. Also simplify a bit the condition in the local flavoured function while at it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-10-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29sched/vtime: Rename vtime_accounting_cpu_enabled() to ↵Frederic Weisbecker
vtime_accounting_enabled_this_cpu() Standardize the naming on top of the vtime_accounting_enabled_*() base. Also make it clear we are checking the vtime state of the *current* CPU with this function. We'll need to add an API to check that state on remote CPUs as well, so we must disambiguate the naming. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-9-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29context_tracking: Introduce context_tracking_enabled_cpu()Frederic Weisbecker
This allows us to check if a remote CPU runs context tracking (ie: is nohz_full). We'll need that to reliably support "nice" accounting on kcpustat. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-8-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29context_tracking: Rename context_tracking_is_cpu_enabled() to ↵Frederic Weisbecker
context_tracking_enabled_this_cpu() Standardize the naming on top of the context_tracking_enabled_*() base. Also make it clear we are checking the context tracking state of the *current* CPU with this function. We'll need to add an API to check that state on remote CPUs as well, so we must disambiguate the naming. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-7-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29context_tracking: Rename context_tracking_is_enabled() => ↵Frederic Weisbecker
context_tracking_enabled() Remove the superfluous "is" in the middle of the name. We want to standardize the naming so that it can be expanded through suffixes: context_tracking_enabled() context_tracking_enabled_cpu() context_tracking_enabled_this_cpu() Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-6-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29context_tracking: Remove context_tracking_active()Frederic Weisbecker
This function is a leftover from old removal or rename. We can drop it. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-5-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29sched/cputime: Add vtime guest task stateFrederic Weisbecker
Record guest as a VTIME state instead of guessing it from VTIME_SYS and PF_VCPU. This is going to simplify the cputime read side especially as its state machine is going to further expand in order to fully support kcpustat on nohz_full. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-4-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29sched/cputime: Add vtime idle task stateFrederic Weisbecker
Record idle as a VTIME state instead of guessing it from VTIME_SYS and is_idle_task(). This is going to simplify the cputime read side especially as its state machine is going to further expand in order to fully support kcpustat on nohz_full. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-3-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29sched/vtime: Record CPU under seqcount for kcpustat needsFrederic Weisbecker
In order to compute the kcpustat delta on a nohz CPU, we'll need to fetch the task running on that target. Checking that its vtime state snapshot actually refers to the relevant target involves recording that CPU under the seqcount locked on task switch. This is a step toward making kcpustat moving forward on full nohz CPUs. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-2-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28net: fix sk_page_frag() recursion from memory reclaimTejun Heo
sk_page_frag() optimizes skb_frag allocations by using per-task skb_frag cache when it knows it's the only user. The condition is determined by seeing whether the socket allocation mask allows blocking - if the allocation may block, it obviously owns the task's context and ergo exclusively owns current->task_frag. Unfortunately, this misses recursion through memory reclaim path. Please take a look at the following backtrace. [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10 ... tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 sock_xmit.isra.24+0xa1/0x170 [nbd] nbd_send_cmd+0x1d2/0x690 [nbd] nbd_queue_rq+0x1b5/0x3b0 [nbd] __blk_mq_try_issue_directly+0x108/0x1b0 blk_mq_request_issue_directly+0xbd/0xe0 blk_mq_try_issue_list_directly+0x41/0xb0 blk_mq_sched_insert_requests+0xa2/0xe0 blk_mq_flush_plug_list+0x205/0x2a0 blk_flush_plug_list+0xc3/0xf0 [1] blk_finish_plug+0x21/0x2e _xfs_buf_ioapply+0x313/0x460 __xfs_buf_submit+0x67/0x220 xfs_buf_read_map+0x113/0x1a0 xfs_trans_read_buf_map+0xbf/0x330 xfs_btree_read_buf_block.constprop.42+0x95/0xd0 xfs_btree_lookup_get_block+0x95/0x170 xfs_btree_lookup+0xcc/0x470 xfs_bmap_del_extent_real+0x254/0x9a0 __xfs_bunmapi+0x45c/0xab0 xfs_bunmapi+0x15/0x30 xfs_itruncate_extents_flags+0xca/0x250 xfs_free_eofblocks+0x181/0x1e0 xfs_fs_destroy_inode+0xa8/0x1b0 destroy_inode+0x38/0x70 dispose_list+0x35/0x50 prune_icache_sb+0x52/0x70 super_cache_scan+0x120/0x1a0 do_shrink_slab+0x120/0x290 shrink_slab+0x216/0x2b0 shrink_node+0x1b6/0x4a0 do_try_to_free_pages+0xc6/0x370 try_to_free_mem_cgroup_pages+0xe3/0x1e0 try_charge+0x29e/0x790 mem_cgroup_charge_skmem+0x6a/0x100 __sk_mem_raise_allocated+0x18e/0x390 __sk_mem_schedule+0x2a/0x40 [0] tcp_sendmsg_locked+0x8eb/0xe10 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x26d/0x2b0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In [0], tcp_send_msg_locked() was using current->page_frag when it called sk_wmem_schedule(). It already calculated how many bytes can be fit into current->page_frag. Due to memory pressure, sk_wmem_schedule() called into memory reclaim path which called into xfs and then IO issue path. Because the filesystem in question is backed by nbd, the control goes back into the tcp layer - back into tcp_sendmsg_locked(). nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes sense - it's in the process of freeing memory and wants to be able to, e.g., drop clean pages to make forward progress. However, this confused sk_page_frag() called from [2]. Because it only tests whether the allocation allows blocking which it does, it now thinks current->page_frag can be used again although it already was being used in [0]. After [2] used current->page_frag, the offset would be increased by the used amount. When the control returns to [0], current->page_frag's offset is increased and the previously calculated number of bytes now may overrun the end of allocated memory leading to silent memory corruptions. Fix it by adding gfpflags_normal_context() which tests sleepable && !reclaim and use it to determine whether to use current->task_frag. v2: Eric didn't like gfp flags being tested twice. Introduce a new helper gfpflags_normal_context() and combine the two tests. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: add skb_queue_empty_lockless()Eric Dumazet
Some paths call skb_queue_empty() without holding the queue lock. We must use a barrier in order to not let the compiler do strange things, and avoid KCSAN splats. Adding a barrier in skb_queue_empty() might be overkill, I prefer adding a new helper to clearly identify points where the callers might be lockless. This might help us finding real bugs. The corresponding WRITE_ONCE() should add zero cost for current compilers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28Merge branch 'odp_rework' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== In order to hoist the interval tree code out of the drivers and into the mmu_notifiers it is necessary for the drivers to not use the interval tree for other things. This series replaces the interval tree with an xarray and along the way re-aligns all the locking to use a sensible SRCU model where the 'update' step is done by modifying an xarray. The result is overall much simpler and with less locking in the critical path. Many functions were reworked for clarity and small details like using 'imr' to refer to the implicit MR make the entire code flow here more readable. This also squashes at least two race bugs on its own, and quite possibily more that haven't been identified. ==================== Merge conflicts with the odp statistics patch resolved. * branch 'odp_rework': RDMA/odp: Remove broken debugging call to invalidate_range RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray RDMA/mlx5: Rework implicit ODP destroy RDMA/mlx5: Avoid double lookups on the pagefault path RDMA/mlx5: Reduce locking in implicit_mr_get_data() RDMA/mlx5: Use an xarray for the children of an implicit ODP RDMA/mlx5: Split implicit handling from pagefault_mr RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it RDMA/mlx5: Rework implicit_mr_get_data RDMA/mlx5: Delete struct mlx5_priv->mkey_table RDMA/mlx5: Use a dedicated mkey xarray for ODP RDMA/mlx5: Split sig_err MR data into its own xarray RDMA/mlx5: Use SRCU properly in ODP prefetch Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Delete struct mlx5_priv->mkey_tableJason Gunthorpe
No users are left, delete it. Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28Merge tag 'v5.4-rc5' into rdma.git for-nextJason Gunthorpe
Linux 5.4-rc5 For dependencies in the next patches Conflict resolved by keeping the delete of the unlock. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/putMarc Zyngier
When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-10-28Merge tag 'reset-for-v5.5' of git://git.pengutronix.de/git/pza/linux into ↵Olof Johansson
arm/drivers Reset controller updates for v5.5 This tag adds support for Meson SM1 ARB resets, Uniphier Pro5 USB3 resets, the Meson-A1 reset controller, SocFPGA Agilex resets, and Realtek RTD1195/RTD1295 resets. It adds some reset controller API keywords for get_maintainers.pl and makes a few remaining reset_control_ops const. Also included are a conversion of the Qualcomm device tree bindings to yaml and a few small kerneldoc improvements. * tag 'reset-for-v5.5' of git://git.pengutronix.de/git/pza/linux: reset: document (devm_)reset_control_get_optional variants reset: improve of_xlate documentation reset: simple: Add Realtek RTD1195/RTD1295 reset: simple: Keep alphabetical order MAINTAINERS: add reset controller framework keywords reset: zynqmp: Make reset_control_ops const reset: hisilicon: hi3660: Make reset_control_ops const reset: build simple reset controller driver for Agilex reset: add support for the Meson-A1 SoC Reset Controller dt-bindings: reset: add bindings for the Meson-A1 SoC Reset Controller reset: uniphier-glue: Add Pro5 USB3 support dt-bindings: reset: pdc: Convert PDC Global bindings to yaml dt-bindings: reset: aoss: Convert AOSS reset bindings to yaml reset: Remove copy'n'paste redundancy in the comments reset: meson-audio-arb: add sm1 support reset: dt-bindings: meson: update arb bindings for sm1 Link: https://lore.kernel.org/r/ede6874508472d0917dca770ef80b90626b0f205.camel@pengutronix.de Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-28export: avoid code duplication in include/linux/export.hMasahiro Yamada
include/linux/export.h has lots of code duplication between EXPORT_SYMBOL and EXPORT_SYMBOL_NS. To improve the maintainability and readability, unify the implementation. When the symbol has no namespace, pass the empty string "" to the 'ns' parameter. The drawback of this change is, it grows the code size. When the symbol has no namespace, sym->namespace was previously NULL, but it is now an empty string "". So, it increases 1 byte for every no namespace EXPORT_SYMBOL. A typical kernel configuration has 10K exported symbols, so it increases 10KB in rough estimation. I did not come up with a good idea to refactor it without increasing the code size. I am not sure how big a deal it is, but at least include/linux/export.h looks nicer. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> [maennich: rebase on top of 3 fixes for the namespace feature] Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
2019-10-28fs: add generic UNRESVSP and ZERO_RANGE ioctl handlersChristoph Hellwig
These use the same scheme as the pre-existing mapping of the XFS RESVP ioctls to ->falloc, so just extend it and remove the XFS implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick: fix compile error on s390] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-28Merge remote-tracking branch 'arm64/for-next/fixes' into for-next/coreCatalin Marinas
This is required to solve the conflicts with subsequent merges of two more errata workaround branches. * arm64/for-next/fixes: arm64: tags: Preserve tags for addresses translated via TTBR1 arm64: mm: fix inverted PAR_EL1.F check arm64: sysreg: fix incorrect definition of SYS_PAR_EL1_F arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled arm64: hibernate: check pgd table allocation arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled arm64: Fix kcore macros after 52-bit virtual addressing fallout arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected arm64: Avoid Cavium TX2 erratum 219 when switching TTBR arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
2019-10-28perf/core, perf/x86: Introduce swap_task_ctx() method at 'struct pmu'Alexey Budankov
Declare swap_task_ctx() methods at the generic and x86 specific pmu types to bridge calls to platform specific PMU code on optimized context switch path between equivalent task perf event contexts. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/9a0aa84a-f062-9b64-3133-373658550c4b@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Some minor fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vringh: fix copy direction of vringh_iov_push_kern() vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt' virtio_ring: fix stalls for packed rings
2019-10-28Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28reset: fix reset_control_ops kerneldoc commentRandy Dunlap
Add a missing short description to the reset_control_ops documentation. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> [p.zabel@pengutronix.de: rebased and updated commit message] Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-10-28powerpc/xmon: Restrict when kernel is locked downChristopher M. Riedl
Xmon should be either fully or partially disabled depending on the kernel lockdown state. Put xmon into read-only mode for lockdown=integrity and prevent user entry into xmon when lockdown=confidentiality. Xmon checks the lockdown state on every attempted entry: (1) during early xmon'ing (2) when triggered via sysrq (3) when toggled via debugfs (4) when triggered via a previously enabled breakpoint The following lockdown state transitions are handled: (1) lockdown=none -> lockdown=integrity set xmon read-only mode (2) lockdown=none -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon (3) lockdown=integrity -> lockdown=confidentiality clear all breakpoints, set xmon read-only mode, prevent user re-entry into xmon Suggested-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190907061124.1947-3-cmr@informatik.wtf
2019-10-28drm/tegra: Move IOMMU group into host1x clientThierry Reding
Handling of the IOMMU group attachment is common to all clients, so move the group into the client to simplify code. Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-10-28gpu: host1x: Request channels for clients, not devicesThierry Reding
A struct device doesn't carry much information that a channel might be interested in, but the client very much does. Request channels for the clients rather than their parent devices and store a pointer to them in order to have that information available when needed. Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-10-28perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/Geert Uytterhoeven
As per POSIX, the correct spelling of the error code is EACCES: include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Kosina <trivial@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20191024122904.12463-1-geert+renesas@glider.be Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'Stefano Garzarella
The 'work' field was introduced with commit 06a8fc78367d0 ("VSOCK: Introduce virtio_vsock_common.ko") but it is never used in the code, so we can remove it to save memory allocated in the per-packet 'struct virtio_vsock_pkt' Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-28x86/speculation/taa: Add sysfs reporting for TSX Async AbortPawan Gupta
Add the sysfs reporting file for TSX Async Abort. It exposes the vulnerability and the mitigation state similar to the existing files for the other hardware vulnerabilities. Sysfs file path is: /sys/devices/system/cpu/vulnerabilities/tsx_async_abort Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Mark Gross <mgross@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
2019-10-27Merge branch 'ib-fwnode-gpiod-get-index' of ↵Dmitry Torokhov
https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio into next Pull in fwnode-gpiod-get-index API.
2019-10-27Merge tag 'v5.4-rc5' into nextDmitry Torokhov
Sync up with mainline.
2019-10-27Merge 5.4-rc5 into staging-nextGreg Kroah-Hartman
We want the staging fixes in here for testing and building on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-27Merge 5.4-rc5 into driver-core-nextGreg Kroah-Hartman
We want the sysfs fix in here as well to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-10-27 The following pull-request contains BPF updates for your *net-next* tree. We've added 52 non-merge commits during the last 11 day(s) which contain a total of 65 files changed, 2604 insertions(+), 1100 deletions(-). The main changes are: 1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF assembly code. The work here teaches BPF verifier to recognize kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints such that verifier allows direct use of bpf_skb_event_output() helper used in tc BPF et al (w/o probing memory access) that dumps skb data into perf ring buffer. Also add direct loads to probe memory in order to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov. 2) Big batch of changes to improve libbpf and BPF kselftests. Besides others: generalization of libbpf's CO-RE relocation support to now also include field existence relocations, revamp the BPF kselftest Makefile to add test runner concept allowing to exercise various ways to build BPF programs, and teach bpf_object__open() and friends to automatically derive BPF program type/expected attach type from section names to ease their use, from Andrii Nakryiko. 3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu. 4) Allow to read BTF as raw data from bpftool. Most notable use case is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa. 5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which manages to improve "rx_drop" performance by ~4%., from Björn Töpel. 6) Fix to restore the flow dissector after reattach BPF test and also fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki. 7) Improve verifier's BTF ctx access for use outside of raw_tp, from Martin KaFai Lau. 8) Improve documentation for AF_XDP with new sections and to reflect latest features, from Magnus Karlsson. 9) Add back 'version' section parsing to libbpf for old kernels, from John Fastabend. 10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(), from KP Singh. 11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order to improve insn coverage for built BPF progs, from Yonghong Song. 12) Misc minor cleanups and fixes, from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-10-27 The following pull-request contains BPF updates for your *net* tree. We've added 7 non-merge commits during the last 11 day(s) which contain a total of 7 files changed, 66 insertions(+), 16 deletions(-). The main changes are: 1) Fix two use-after-free bugs in relation to RCU in jited symbol exposure to kallsyms, from Daniel Borkmann. 2) Fix NULL pointer dereference in AF_XDP rx-only sockets, from Magnus Karlsson. 3) Fix hang in netdev unregister for hash based devmap as well as another overflow bug on 32 bit archs in memlock cost calculation, from Toke Høiland-Jørgensen. 4) Fix wrong memory access in LWT BPF programs on reroute due to invalid dst. Also fix BPF selftests to use more compatible nc options, from Jiri Benc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26Merge tag 'driver-core-5.4-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg KH: "Here is a single sysfs fix for 5.4-rc5. It resolves an error if you actually try to use the __BIN_ATTR_WO() macro, seems I never tested it properly before :( This has been in linux-next for a while with no reported issues" * tag 'driver-core-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: Fixes __BIN_ATTR_WO() macro