2020-09-29blk-mq: call commit_rqs while list empty but error happenyangerkun
Blk-mq should call commit_rqs once 'bd.last != true' and no more requests will come (so that virtscsi can kick the virtqueue, for example). We already do that in blk_mq_dispatch_rq_list()/blk_mq_try_issue_list_directly() while the list is not empty and 'queued > 0'. However, the same situation can occur when the last request in the list calls queue_rq and returns an error such as BLK_STS_IOERR, which does not requeue the request: the list ends up empty, but commit_rqs still needs to be called (otherwise the request for virtscsi stays there until another request kicks the virtqueue, and it eventually times out). We found this problem by running an fsstress test while repeatedly offlining/onlining the virtscsi device. Fixes: d666ba98f849 ("blk-mq: add mq_ops->commit_rqs()") Reported-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
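A minimal sketch of the dispatch-loop shape being described (the helper name is illustrative; the real code lives in blk_mq_dispatch_rq_list()): the point is that ->commit_rqs() must also fire when the final request errors out and leaves the list empty with nothing queued.

    #include <linux/blk-mq.h>

    /* Sketch only, not the actual blk-mq code. */
    static void dispatch_list_sketch(struct blk_mq_hw_ctx *hctx,
                                     struct list_head *list)
    {
            struct request_queue *q = hctx->queue;
            int queued = 0;
            bool errors = false;

            while (!list_empty(list)) {
                    struct request *rq = list_first_entry(list, struct request,
                                                          queuelist);
                    struct blk_mq_queue_data bd = {
                            .rq   = rq,
                            .last = list_is_last(&rq->queuelist, list),
                    };
                    blk_status_t ret;

                    list_del_init(&rq->queuelist);
                    ret = q->mq_ops->queue_rq(hctx, &bd);
                    if (ret == BLK_STS_OK)
                            queued++;
                    else
                            errors = true;  /* e.g. BLK_STS_IOERR, not requeued */
            }

            /* Before the fix, only "queued > 0" triggered the commit. */
            if ((queued > 0 || errors) && q->mq_ops->commit_rqs)
                    q->mq_ops->commit_rqs(hctx);
    }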
2020-09-29io_uring: fix async buffered reads when readahead is disabledHao Xu
The async buffered reads feature does not work when readahead is turned off. There are two things to address: - when retrying in io_read, not only the IOCB_WAITQ flag but also the IOCB_NOWAIT flag is still set, which makes the read go down the would_block path in generic_file_buffered_read() and return -EAGAIN. After that, the io-wq thread work is queued and the async read is later done the old way. - even if we clear IOCB_NOWAIT when retrying, the feature still does not work properly, since generic_file_buffered_read() goes to lock_page_killable() after calling mapping->a_ops->readpage() to do the IO, which makes the process sleep. Fixes: 1a0a7853b901 ("mm: support async buffered reads in generic_file_buffered_read()") Fixes: 3b2a4439e0ae ("io_uring: get rid of kiocb_wait_page_queue_init()") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
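A small illustrative sketch of the first point (the flag and field names are the real kiocb ones, the surrounding helper is hypothetical): the retry must drop IOCB_NOWAIT so generic_file_buffered_read() does not bail out with -EAGAIN.

    #include <linux/fs.h>
    #include <linux/pagemap.h>

    /* Sketch: prepare a kiocb for the async buffered-read retry path. */
    static void prepare_buffered_retry_sketch(struct kiocb *kiocb,
                                              struct wait_page_queue *wait)
    {
            kiocb->ki_flags |= IOCB_WAITQ;    /* wait on the page via the waitqueue */
            kiocb->ki_flags &= ~IOCB_NOWAIT;  /* otherwise would_block returns -EAGAIN */
            kiocb->ki_waitq = wait;
    }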
2020-09-29Merge tag 'gpio-fixes-for-v5.9-rc7' of ↵Linus Walleij
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into fixes gpio: fixes for v5.9-rc7 - fix uninitialized variable in gpio-pca953x - enable all 160 lines and fix interrupt configuration in gpio-aspeed-gpio - fix ast2600 bank properties in gpio-aspeed
2020-09-28ethtool: mark netlink family as __ro_after_initJakub Kicinski
Like all genl families, ethtool_genl_family must not be a plain constant, because it's modified/initialized by genl_register_family(). After init, however, it's only passed to genlmsg_put() & co., so we can mark it as __ro_after_init. Since the genl_family structure contains function pointers, mark this as a fix. Fixes: 2b4a8990b7df ("ethtool: introduce ethtool netlink interface") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
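Illustratively, the change amounts to annotating the family definition (a sketch; the remaining initializers are elided):

    #include <net/genetlink.h>
    #include <linux/ethtool_netlink.h>

    /* Writable during genl_register_family(), read-only afterwards. */
    static struct genl_family ethtool_genl_family __ro_after_init = {
            .name    = ETHTOOL_GENL_NAME,
            .version = ETHTOOL_GENL_VERSION,
            /* ... ops, policy, module, etc. elided ... */
    };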
2020-09-28genetlink: add missing kdoc for validation flagsJakub Kicinski
Validation flags are missing kdoc, add it. Fixes: ef6243acb478 ("genetlink: optionally validate strictly/dumps") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net: usb: ax88179_178a: add MCT usb 3.0 adapterWilken Gottwalt
Adds the driver_info and USB IDs of the AX88179-based MCT U3-A9003 USB 3.0 Ethernet adapter. Signed-off-by: Wilken Gottwalt <wilken.gottwalt@mailbox.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net: usb: ax88179_178a: fix missing stop entry in driver_infoWilken Gottwalt
Adds the missing .stop entry in the Belkin driver_info structure. Fixes: e20bd60bf62a ("net: usb: asix88179_178a: Add support for the Belkin B2B128") Signed-off-by: Wilken Gottwalt <wilken.gottwalt@mailbox.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28Input: i8042 - add nopnp quirk for Acer Aspire 5 A515Jiri Kosina
Touchpad on this laptop is not detected properly during boot, as PNP enumerates (wrongly) AUX port as disabled on this machine. Fix that by adding this board (with admittedly quite funny DMI identifiers) to nopnp quirk list. Reported-by: Andrés Barrantes Silman <andresbs2000@protonmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2009252337340.3336@cbobk.fhfr.pm Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-09-28Input: trackpoint - enable Synaptics trackpointsVincent Huang
Add Synaptics IDs in trackpoint_start_protocol() to mark them as valid. Signed-off-by: Vincent Huang <vincent.huang@tw.synaptics.com> Fixes: 6c77545af100 ("Input: trackpoint - add new trackpoint variant IDs") Reviewed-by: Harry Cutts <hcutts@chromium.org> Tested-by: Harry Cutts <hcutts@chromium.org> Link: https://lore.kernel.org/r/20200924053013.1056953-1-vincent.huang@tw.synaptics.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2020-09-28net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locksManivannan Sadhasivam
The rcu read locks are needed to avoid potential race condition while dereferencing radix tree from multiple threads. The issue was identified by syzbot. Below is the crash report: ============================= WARNING: suspicious RCU usage 5.7.0-syzkaller #0 Not tainted ----------------------------- include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/u4:1/21: #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline] #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241 #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243 stack backtrace: CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: qrtr_ns_handler qrtr_ns_worker Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1e9/0x30e lib/dump_stack.c:118 radix_tree_deref_slot include/linux/radix-tree.h:176 [inline] ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline] qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674 process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268 worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414 kthread+0x353/0x380 kernel/kthread.c:268 Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace") Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
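A sketch of the pattern the fix applies (simplified from ctrl_cmd_new_lookup(); the server struct here is a stand-in for the nameservice's internal one): radix_tree_deref_slot() must run under rcu_read_lock().

    #include <linux/radix-tree.h>
    #include <linux/rcupdate.h>

    /* Stand-in for the nameservice's internal server struct. */
    struct server_sketch {
            unsigned int node;
            unsigned int port;
    };

    static void lookup_servers_sketch(struct radix_tree_root *servers)
    {
            struct radix_tree_iter iter;
            void __rcu **slot;

            rcu_read_lock();
            radix_tree_for_each_slot(slot, servers, &iter, 0) {
                    struct server_sketch *srv = radix_tree_deref_slot(slot);

                    if (!srv)
                            continue;
                    /* ... read srv, or copy what is needed out of it ... */
            }
            rcu_read_unlock();
    }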
2020-09-28arm64: mte: Fix typo in memory tagging ABI documentationWill Deacon
We offer both PTRACE_PEEKMTETAGS and PTRACE_POKEMTETAGS requests via ptrace(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28Merge branch 'net-core-fix-a-lockdep-splat-in-the-dev_addr_list'David S. Miller
Taehee Yoo says: ==================== net: core: fix a lockdep splat in the dev_addr_list. This patchset avoids a lockdep splat. When a stacked interface graph is changed, netif_addr_lock() is called recursively and it internally calls spin_lock_nested(). The parameter of spin_lock_nested() is 'dev->lower_level'; this is called the subclass. The problem with 'dev->lower_level' is that while it is being used as the subclass of spin_lock_nested(), its value can change. So spin_lock_nested() can be called recursively with the same subclass value, which lockdep treats as a deadlock. In order to avoid this, a new variable is needed, and it is going to be used as the parameter of spin_lock_nested(). The first and second patches are preparation for the third patch, where the problem is fixed. The first patch adds __netdev_upper_dev_unlink(). The existing netdev_upper_dev_unlink() is renamed to __netdev_upper_dev_unlink(), and netdev_upper_dev_unlink() is added as a wrapper around it. The second patch adds the netdev_nested_priv structure. netdev_walk_all_{ upper | lower }_dev() pass both private functions and a "data" pointer so that callers can handle their own state. At this point, the data pointer type is void *. In order to make it easier to add common variables and functions, this new netdev_nested_priv structure is added. The third patch adds a new variable 'nested_level' to the net_device structure. This variable will be used as the parameter of spin_lock_nested() for dev->addr_list_lock. With this variable, the lockdep splat can be avoided. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net: core: add nested_level variable in net_deviceTaehee Yoo
This patch adds a new variable 'nested_level' to the net_device structure. This variable will be used as the parameter of spin_lock_nested() for dev->addr_list_lock. netif_addr_lock() can be called recursively, so spin_lock_nested() is used instead of spin_lock(), with dev->lower_level as the subclass parameter. But the dev->lower_level value can be updated while it is in use, so lockdep warns about a possible deadlock scenario. When a stacked interface is deleted, netif_{uc | mc}_sync() is called recursively, so spin_lock_nested() is also called recursively. At that moment, dev->lower_level is used as its parameter, but dev->lower_level is updated immediately as interfaces are unlinked/linked. Thus, after unlinking, dev->lower_level should no longer be used as the parameter of spin_lock_nested(). A (macvlan) | B (vlan) | C (bridge) | D (macvlan) | E (vlan) | F (bridge) A->lower_level : 6 B->lower_level : 5 C->lower_level : 4 D->lower_level : 3 E->lower_level : 2 F->lower_level : 1 When interface 'A' is removed, it releases its resources and netif_addr_lock() is called. Then netdev_upper_dev_unlink() is called recursively and dev->lower_level is updated. There is no problem here. But when the bridge module is removed, the 'C' and 'F' interfaces are removed at once. If 'F' is removed first, the lower_level values look like this: A->lower_level : 5 B->lower_level : 4 C->lower_level : 3 D->lower_level : 2 E->lower_level : 1 F->lower_level : 1 Then 'C' is removed, and at that moment netif_addr_lock() is called recursively. The ordering is: C(3)->D(2)->E(1)->F(1) At this point the lower_level values of 'E' and 'F' are the same, so lockdep warns about a possible deadlock scenario. In order to avoid this problem, a new variable 'nested_level' is added. Its value is the same as dev->lower_level - 1, but it is only updated in rtnl_unlock(), so it can safely be used as the parameter of spin_lock_nested() in the rtnl context. A sketch of the resulting locking pattern follows the splat below. 
Test commands: ip link add br0 type bridge vlan_filtering 1 ip link add vlan1 link br0 type vlan id 10 ip link add macvlan2 link vlan1 type macvlan ip link add br3 type bridge vlan_filtering 1 ip link set macvlan2 master br3 ip link add vlan4 link br3 type vlan id 10 ip link add macvlan5 link vlan4 type macvlan ip link add br6 type bridge vlan_filtering 1 ip link set macvlan5 master br6 ip link add vlan7 link br6 type vlan id 10 ip link add macvlan8 link vlan7 type macvlan ip link set br0 up ip link set vlan1 up ip link set macvlan2 up ip link set br3 up ip link set vlan4 up ip link set macvlan5 up ip link set br6 up ip link set vlan7 up ip link set macvlan8 up modprobe -rv bridge Splat looks like: [ 36.057436][ T744] WARNING: possible recursive locking detected [ 36.058848][ T744] 5.9.0-rc6+ #728 Not tainted [ 36.059959][ T744] -------------------------------------------- [ 36.061391][ T744] ip/744 is trying to acquire lock: [ 36.062590][ T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30 [ 36.064922][ T744] [ 36.064922][ T744] but task is already holding lock: [ 36.066626][ T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.068851][ T744] [ 36.068851][ T744] other info that might help us debug this: [ 36.070731][ T744] Possible unsafe locking scenario: [ 36.070731][ T744] [ 36.072497][ T744] CPU0 [ 36.073238][ T744] ---- [ 36.074007][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.075290][ T744] lock(&vlan_netdev_addr_lock_key); [ 36.076590][ T744] [ 36.076590][ T744] *** DEADLOCK *** [ 36.076590][ T744] [ 36.078515][ T744] May be due to missing lock nesting notation [ 36.078515][ T744] [ 36.080491][ T744] 3 locks held by ip/744: [ 36.081471][ T744] #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490 [ 36.083614][ T744] #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60 [ 36.085942][ T744] #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80 [ 36.088400][ T744] [ 36.088400][ T744] stack backtrace: [ 36.089772][ T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728 [ 36.091364][ T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 36.093630][ T744] Call Trace: [ 36.094416][ T744] dump_stack+0x77/0x9b [ 36.095385][ T744] __lock_acquire+0xbc3/0x1f40 [ 36.096522][ T744] lock_acquire+0xb4/0x3b0 [ 36.097540][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.098657][ T744] ? rtmsg_ifinfo+0x1f/0x30 [ 36.099711][ T744] ? __dev_notify_flags+0xa5/0xf0 [ 36.100874][ T744] ? rtnl_is_locked+0x11/0x20 [ 36.101967][ T744] ? __dev_set_promiscuity+0x7b/0x1a0 [ 36.103230][ T744] _raw_spin_lock_bh+0x38/0x70 [ 36.104348][ T744] ? dev_set_rx_mode+0x19/0x30 [ 36.105461][ T744] dev_set_rx_mode+0x19/0x30 [ 36.106532][ T744] dev_set_promiscuity+0x36/0x50 [ 36.107692][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.108929][ T744] dev_set_promiscuity+0x1e/0x50 [ 36.110093][ T744] br_port_set_promisc+0x1f/0x40 [bridge] [ 36.111415][ T744] br_manage_promisc+0x8b/0xe0 [bridge] [ 36.112728][ T744] __dev_set_promiscuity+0x123/0x1a0 [ 36.113967][ T744] ? 
__hw_addr_sync_one+0x23/0x50 [ 36.115135][ T744] __dev_set_rx_mode+0x68/0x90 [ 36.116249][ T744] dev_uc_sync+0x70/0x80 [ 36.117244][ T744] dev_uc_add+0x50/0x60 [ 36.118223][ T744] macvlan_open+0x18e/0x1f0 [macvlan] [ 36.119470][ T744] __dev_open+0xd6/0x170 [ 36.120470][ T744] __dev_change_flags+0x181/0x1d0 [ 36.121644][ T744] dev_change_flags+0x23/0x60 [ 36.122741][ T744] do_setlink+0x30a/0x11e0 [ 36.123778][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.124929][ T744] ? __nla_validate_parse.part.6+0x45/0x8e0 [ 36.126309][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.127457][ T744] __rtnl_newlink+0x546/0x8e0 [ 36.128560][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.129623][ T744] ? deactivate_slab.isra.85+0x6a1/0x850 [ 36.130946][ T744] ? __lock_acquire+0x92c/0x1f40 [ 36.132102][ T744] ? lock_acquire+0xb4/0x3b0 [ 36.133176][ T744] ? is_bpf_text_address+0x5/0xe0 [ 36.134364][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.135445][ T744] ? rcu_read_lock_sched_held+0x32/0x60 [ 36.136771][ T744] ? kmem_cache_alloc_trace+0x2d8/0x380 [ 36.138070][ T744] ? rtnl_newlink+0x2e/0x70 [ 36.139164][ T744] rtnl_newlink+0x47/0x70 [ ... ] Fixes: 845e0ebb4408 ("net: change addr_list_lock back to static key") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
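As a rough sketch of the idea above (the real helper is netif_addr_lock_nested() in netdevice.h; treat the accessor here as illustrative): the subclass passed to spin_lock_nested() now comes from a value that only changes in rtnl_unlock(), so recursive address-list locking never sees two devices with the same subclass.

    #include <linux/netdevice.h>
    #include <linux/spinlock.h>

    /* Sketch: take the address-list lock with a stable nesting subclass. */
    static inline void addr_list_lock_nested_sketch(struct net_device *dev)
    {
            /*
             * dev->nested_level mirrors dev->lower_level - 1 but is only
             * updated in rtnl_unlock(), so it is stable while RTNL is held.
             */
            spin_lock_nested(&dev->addr_list_lock, dev->nested_level);
    }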
2020-09-28net: core: introduce struct netdev_nested_priv for nested interface ↵Taehee Yoo
infrastructure Functions related to the nested interface infrastructure, such as netdev_walk_all_{ upper | lower }_dev(), pass both private functions and a "data" pointer so that callers can handle their own state. At this point, the data pointer type is void *. In order to make it easier to add common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added to this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net: core: add __netdev_upper_dev_unlink()Taehee Yoo
The netdev_upper_dev_unlink() has to work differently according to flags. This is the same idea as __netdev_upper_dev_link(). In the following patches, new flags will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28arm64: cpufeature: Export symbol read_sanitised_ftr_reg()Jean-Philippe Brucker
The SMMUv3 driver would like to read the MMFR0 PARANGE field in order to share CPU page tables with devices. Allow the driver to be built as a module by exporting the read_sanitised_ftr_reg() cpufeature symbol. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20200918101852.582559-7-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28arm64: mm: Pin down ASIDs for sharing mm with devicesJean-Philippe Brucker
To enable address space sharing with the IOMMU, introduce arm64_mm_context_get() and arm64_mm_context_put(), that pin down a context and ensure that it will keep its ASID after a rollover. Export the symbols to let the modular SMMUv3 driver use them. Pinning is necessary because a device constantly needs a valid ASID, unlike tasks that only require one when running. Without pinning, we would need to notify the IOMMU when we're about to use a new ASID for a task, and it would get complicated when a new task is assigned a shared ASID. Consider the following scenario with no ASID pinned: 1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1) 2. Task t2 is scheduled on CPUx, gets ASID (1, 2) 3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1) We would now have to immediately generate a new ASID for t1, notify the IOMMU, and finally enable task tn. We are holding the lock during all that time, since we can't afford having another CPU trigger a rollover. The IOMMU issues invalidation commands that can take tens of milliseconds. It gets needlessly complicated. All we wanted to do was schedule task tn, that has no business with the IOMMU. By letting the IOMMU pin tasks when needed, we avoid stalling the slow path, and let the pinning fail when we're out of shareable ASIDs. After a rollover, the allocator expects at least one ASID to be available in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS - 1) is the maximum number of ASIDs that can be shared with the IOMMU. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20200918101852.582559-5-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Remove _sdei_event_unregister()Gavin Shan
_sdei_event_unregister() is called by sdei_event_unregister() and sdei_device_freeze(). _sdei_event_unregister() covers the shared and private events, but sdei_device_freeze() only covers the shared events. So the logic to cover the private events isn't needed by sdei_device_freeze(). sdei_event_unregister sdei_device_freeze _sdei_event_unregister sdei_unregister_shared _sdei_event_unregister This removes _sdei_event_unregister(). Its logic is moved to its callers accordingly. This shouldn't cause any logical changes. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-14-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Remove _sdei_event_register()Gavin Shan
The function _sdei_event_register() is called by sdei_event_register() and sdei_device_thaw(), as the following call chain shows. _sdei_event_register() covers the shared and private events, but sdei_device_thaw() only covers the shared events. So the logic to cover the private events in _sdei_event_register() isn't needed by sdei_device_thaw(). Similarly, sdei_reregister_event_llocked() covers the shared and private events with regard to re-enablement, and the logic to cover the private events isn't needed by sdei_device_thaw() either. sdei_event_register sdei_device_thaw _sdei_event_register sdei_reregister_shared sdei_reregister_event_llocked _sdei_event_register This removes _sdei_event_register() and sdei_reregister_event_llocked(). Their logic is moved to sdei_event_register() and sdei_reregister_shared(). This shouldn't cause any logical changes. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-13-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Introduce sdei_do_local_call()Gavin Shan
During CPU hotplug, the private events are registered, enabled or unregistered on the specific CPU. Each case repeats the same steps: initialize the cross-call argument, make the function call on the local CPU, and check the returned error. This introduces sdei_do_local_call() to cover these steps. The other benefit is that CROSSCALL_INIT and struct sdei_crosscall_args become visible only to sdei_do_{cross, local}_call(). Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-12-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
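A rough sketch of what such a helper looks like (the struct layout and field names here are illustrative, not the exact arm_sdei.c definitions):

    #include <linux/smp.h>
    #include <linux/atomic.h>

    struct sdei_event;      /* opaque here */

    /* Illustrative stand-in for struct sdei_crosscall_args. */
    struct crosscall_args_sketch {
            struct sdei_event *event;
            atomic_t errors;
            int first_error;
    };

    /* Sketch: run an SDEI cross-call callback on the local CPU only. */
    static int sdei_do_local_call_sketch(smp_call_func_t fn,
                                         struct sdei_event *event)
    {
            struct crosscall_args_sketch arg = {
                    .event = event,
                    .first_error = 0,
            };

            atomic_set(&arg.errors, 0);
            fn(&arg);               /* same callback, but only on this CPU */

            return arg.first_error;
    }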
2020-09-28firmware: arm_sdei: Cleanup on cross call functionGavin Shan
This applies cleanup to the cross call functions; no functional changes are introduced: * Wrap the code block of CROSSCALL_INIT inside "do { } while (0)", as the Linux kernel usually does. Otherwise, scripts/checkpatch.pl reports a warning about it. * Use smp_call_func_t for the @fn argument in sdei_do_cross_call(), as the function is called on the target CPU(s). * Remove the unnecessary space before @event in sdei_do_cross_call(). Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-11-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
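The do/while(0) wrapping mentioned in the first bullet is the standard way to make a multi-statement macro behave like a single statement; a generic illustration (not the actual CROSSCALL_INIT definition):

    /* Without the wrapper, "if (x) INIT_ARGS(a, e); else ..." would break. */
    #define INIT_ARGS(arg, e)                       \
            do {                                    \
                    (arg).event = (e);              \
                    atomic_set(&(arg).errors, 0);   \
            } while (0)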
2020-09-28firmware: arm_sdei: Remove while loop in sdei_event_unregister()Gavin Shan
This removes the unnecessary while loop in sdei_event_unregister() for the following two reasons. This shouldn't cause any functional changes. * The while loop executes only once, so it isn't needed. * With the while loop removed, the nested statements can be avoided, making the code a bit cleaner. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20200922130423.10173-10-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Remove while loop in sdei_event_register()Gavin Shan
This removes the unnecessary while loop in sdei_event_register() for the following two reasons. This shouldn't cause any functional changes. * The while loop executes only once, so it isn't needed. * With the while loop removed, the nested statements can be avoided, making the code a bit cleaner. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20200922130423.10173-9-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
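The change in this patch and in the unregister counterpart above is the same mechanical transformation; a generic before/after sketch (hypothetical helpers, not the literal sdei code):

    /* Hypothetical helpers standing in for the real register/enable steps. */
    int do_register_sketch(void);
    int do_enable_sketch(void);

    /* Before: a loop that only ever runs once adds a level of nesting. */
    int event_register_before(void)
    {
            int err = 0;

            do {
                    err = do_register_sketch();
                    if (err)
                            break;
                    err = do_enable_sketch();
            } while (0);

            return err;
    }

    /* After: the same logic, flattened. */
    int event_register_after(void)
    {
            int err;

            err = do_register_sketch();
            if (err)
                    return err;

            return do_enable_sketch();
    }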
2020-09-28firmware: arm_sdei: Remove redundant error message in sdei_probe()Gavin Shan
This removes the redundant error message in sdei_probe() because the failing case can be identified from the errno in the next error message. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-8-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Remove duplicate check in sdei_get_conduit()Gavin Shan
The following two checks are redundant because @acpi_disabled doesn't depend on CONFIG_ACPI, so the duplicate check (IS_ENABLED(CONFIG_ACPI)) can be dropped. More detail, to keep the commit log complete: * @acpi_disabled is defined in arch/arm64/kernel/acpi.c when CONFIG_ACPI is enabled. * @acpi_disabled is defined in include/acpi.h when CONFIG_ACPI is disabled. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-7-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
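Illustratively, the redundancy looks like this (a sketch, not the literal sdei_get_conduit() code; probe_via_acpi_sketch() is a hypothetical helper):

    #include <linux/acpi.h>
    #include <linux/errno.h>

    int probe_via_acpi_sketch(void);        /* hypothetical helper */

    /* Before: the IS_ENABLED() test adds nothing, acpi_disabled always exists. */
    static int get_conduit_before(void)
    {
            if (IS_ENABLED(CONFIG_ACPI) && !acpi_disabled)
                    return probe_via_acpi_sketch();
            return -ENODEV;
    }

    /* After: the same behaviour with the duplicate check dropped. */
    static int get_conduit_after(void)
    {
            if (!acpi_disabled)
                    return probe_via_acpi_sketch();
            return -ENODEV;
    }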
2020-09-28firmware: arm_sdei: Unregister driver on error in sdei_init()Gavin Shan
The SDEI platform device is created from a device-tree node or from the ACPI (SDEI) table. For the latter case, the platform device is created explicitly by this module. It's better to unregister the driver when creating the device fails, to keep things symmetric: the driver, owned by this module, isn't needed if the device doesn't exist. Besides, the errno (@ret) should be updated accordingly in this case. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-6-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Avoid nested statements in sdei_init()Gavin Shan
In sdei_init(), the nested statements can be avoided by bailing out early on error from platform_driver_register() or when the ACPI SDEI table is absent. With this, the code is a bit more readable. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-5-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Retrieve event number from event instanceGavin Shan
In sdei_event_create(), the event number is retrieved from the variable @event_num for the shared event. The event number was stored in the event instance. So we can fetch it from the event instance, similar to what we're doing for the private event. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-4-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28firmware: arm_sdei: Common block for failing path in sdei_event_create()Gavin Shan
There are multiple calls to kfree(event) in the failing path of sdei_event_create() to free the SDEI event. This means kfree() has to be called again whenever more code is added to the failing path, and it's easy to miss doing that and introduce a memory leak. This introduces a common block for the failing path in sdei_event_create() to resolve the issue. This shouldn't cause functional changes. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200922130423.10173-3-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
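The "common block" is the usual goto-based cleanup pattern; a generic sketch with illustrative types (not the exact sdei_event_create() code):

    #include <linux/slab.h>
    #include <linux/err.h>

    /* Illustrative type only; the real struct sdei_event differs. */
    struct event_sketch {
            u32 event_num;
            void *reg_data;
    };

    static struct event_sketch *event_create_sketch(u32 event_num)
    {
            int err;
            struct event_sketch *event;

            event = kzalloc(sizeof(*event), GFP_KERNEL);
            if (!event)
                    return ERR_PTR(-ENOMEM);
            event->event_num = event_num;

            event->reg_data = kzalloc(64, GFP_KERNEL);
            if (!event->reg_data) {
                    err = -ENOMEM;
                    goto fail;              /* every failure path funnels here */
            }

            if (event_num > 0xffff) {       /* stand-in for a real validity check */
                    err = -EINVAL;
                    goto fail;              /* no second kfree(event) to forget */
            }

            return event;

    fail:
            kfree(event->reg_data);         /* kfree(NULL) is a no-op, so this is safe */
            kfree(event);
            return ERR_PTR(err);
    }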
2020-09-28firmware: arm_sdei: Remove sdei_is_err()Gavin Shan
sdei_is_err() is only called by sdei_to_linux_errno(), and the logic of checking the error number is common to both, so they can be combined. This removes sdei_is_err() and folds its logic into sdei_to_linux_errno(). The assignment of @err to zero is also dropped in invoke_sdei_fn() because it's always overwritten afterwards. This shouldn't cause functional changes. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20200922130423.10173-2-gshan@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28net_sched: remove a redundant goto chain checkCong Wang
All TC actions call tcf_action_check_ctrlact() to validate the goto chain, so this check in tcf_action_init_1() is actually redundant. Remove it to avoid the trouble of leaking memory. Fixes: e49d8c22f126 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()") Reported-by: Vlad Buslov <vladbu@mellanox.com> Suggested-by: Davide Caratti <dcaratti@redhat.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net: bridge: fdb: don't flush ext_learn entriesNikolay Aleksandrov
When a user-space software manages fdb entries externally it should set the ext_learn flag which marks the fdb entry as externally managed and avoids expiring it (they're treated as static fdbs). Unfortunately on events where fdb entries are flushed (STP down, netlink fdb flush etc) these fdbs are also deleted automatically by the bridge. That in turn causes trouble for the managing user-space software (e.g. in MLAG setups we lose remote fdb entries on port flaps). These entries are completely externally managed so we should avoid automatically deleting them, the only exception are offloaded entries (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as before. Fixes: eb100e0e24a2 ("net: bridge: allow to add externally learned entries from user-space") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-09-28 1) Fix a build warning in ip_vti if CONFIG_IPV6 is not set. From YueHaibing. 2) Restore IPCB on espintcp before handing the packet to xfrm as the information there is still needed. From Sabrina Dubroca. 3) Fix pmtu updating for xfrm interfaces. From Sabrina Dubroca. 4) Some xfrm state information was not cloned with xfrm_do_migrate. Fixes to clone the full xfrm state, from Antony Antony. 5) Use the correct address family in xfrm_state_find. The struct flowi must always be interpreted along with the original address family. This got lost over the years. Fix from Herbert Xu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28spi: fsl-dspi: fix NULL pointer dereferenceMichael Walle
Since commit 530b5affc675 ("spi: fsl-dspi: fix use-after-free in remove path") this driver causes a kernel oops: [ 1.891065] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080 [..] [ 2.056973] Call trace: [ 2.059425] dspi_setup+0xc8/0x2e0 [ 2.062837] spi_setup+0xcc/0x248 [ 2.066160] spi_add_device+0xb4/0x198 [ 2.069918] of_register_spi_device+0x250/0x370 [ 2.074462] spi_register_controller+0x4f4/0x770 [ 2.079094] dspi_probe+0x5bc/0x7b0 [ 2.082594] platform_drv_probe+0x5c/0xb0 [ 2.086615] really_probe+0xec/0x3c0 [ 2.090200] driver_probe_device+0x60/0xc0 [ 2.094308] device_driver_attach+0x7c/0x88 [ 2.098503] __driver_attach+0x60/0xe8 [ 2.102263] bus_for_each_dev+0x7c/0xd0 [ 2.106109] driver_attach+0x2c/0x38 [ 2.109692] bus_add_driver+0x194/0x1f8 [ 2.113538] driver_register+0x6c/0x128 [ 2.117385] __platform_driver_register+0x50/0x60 [ 2.122105] fsl_dspi_driver_init+0x24/0x30 [ 2.126302] do_one_initcall+0x54/0x2d0 [ 2.130149] kernel_init_freeable+0x1ec/0x258 [ 2.134520] kernel_init+0x1c/0x120 [ 2.138018] ret_from_fork+0x10/0x34 [ 2.141606] Code: 97e0b11d aa0003f3 b4000680 f94006e0 (f9404000) [ 2.147723] ---[ end trace 26cf63e6cbba33a8 ]--- This is because since this commit, the allocation of the drivers private data is done explicitly and in this case spi_alloc_master() won't set the correct pointer. Also move the platform_set_drvdata() to have both next to each other. Fixes: 530b5affc675 ("spi: fsl-dspi: fix use-after-free in remove path") Signed-off-by: Michael Walle <michael@walle.cc> Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20200928085500.28254-1-michael@walle.cc Signed-off-by: Mark Brown <broonie@kernel.org>
2020-09-28Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - NFSv4.2: copy_file_range needs to invalidate caches on success - NFSv4.2: Fix security label length not being reset - pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read - pNFS/flexfiles: Fix signed/unsigned type issues with mirror indices" * tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS/flexfiles: Be consistent about mirror index types pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read NFSv4.2: fix client's attribute cache management for copy_file_range nfs: Fix security label length not being reset
2020-09-28arm_pmu: arm64: Use NMIs for PMUJulien Thierry
Add required PMU interrupt operations for NMIs. Request interrupt lines as NMIs when possible, otherwise fall back to normal interrupts. NMIs are only supported on the arm64 architecture with a GICv3 irqchip. [Alexandru E.: Added that NMIs only work on arm64 + GICv3, print message when PMU is using NMIs] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-8-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28arm_pmu: Introduce pmu_irq_opsJulien Thierry
Currently the PMU interrupt can either be a normal irq or a percpu irq. Supporting NMI will introduce two cases for each existing one. It becomes a mess of 'if's when managing the interrupt. Define sets of callbacks for operations commonly done on the interrupt. The appropriate set of callbacks is selected at interrupt request time and simplifies interrupt enabling/disabling and freeing. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-7-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
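A sketch of the callback-set idea (the member names match the description above, but treat the exact shape as illustrative rather than the literal arm_pmu.c definition):

    #include <linux/interrupt.h>
    #include <linux/percpu.h>

    /* One set of operations per interrupt flavour (irq, NMI, percpu, ...). */
    struct pmu_irq_ops_sketch {
            void (*enable_pmuirq)(unsigned int irq);
            void (*disable_pmuirq)(unsigned int irq);
            void (*free_pmuirq)(unsigned int irq, int cpu, void __percpu *devid);
    };

    static void pmu_irq_enable_sketch(unsigned int irq)
    {
            enable_irq(irq);
    }

    static void pmu_irq_disable_sketch(unsigned int irq)
    {
            disable_irq_nosync(irq);
    }

    static void pmu_irq_free_sketch(unsigned int irq, int cpu,
                                    void __percpu *devid)
    {
            free_irq(irq, per_cpu_ptr(devid, cpu));
    }

    /* Selected once, at interrupt request time, for the plain-irq case. */
    static const struct pmu_irq_ops_sketch pmuirq_ops_sketch = {
            .enable_pmuirq  = pmu_irq_enable_sketch,
            .disable_pmuirq = pmu_irq_disable_sketch,
            .free_pmuirq    = pmu_irq_free_sketch,
    };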
2020-09-28KVM: arm64: pmu: Make overflow handler NMI safeJulien Thierry
kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from NMI context, defer waking the vcpu to an irq_work queue. A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent running the irq_work for a non-existent vcpu by calling irq_work_sync() on the PMU destroy path. [Alexandru E.: Added irq_work_sync()] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20200924110706.254996-6-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
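The deferral pattern being described, in sketch form: irq_work_queue()/irq_work_sync() and in_nmi() are the real kernel APIs, but the overflow_work member name and the exact callback shape are assumptions for illustration.

    #include <linux/irq_work.h>
    #include <linux/kvm_host.h>

    /* Sketch: overflow handling that is safe to reach from NMI context. */
    static void pmu_overflow_kick_sketch(struct kvm_vcpu *vcpu)
    {
            if (in_nmi())
                    irq_work_queue(&vcpu->arch.pmu.overflow_work);  /* defer the kick */
            else
                    kvm_vcpu_kick(vcpu);
    }

    static void pmu_destroy_sketch(struct kvm_vcpu *vcpu)
    {
            /* Make sure no deferred work still references this vcpu. */
            irq_work_sync(&vcpu->arch.pmu.overflow_work);
            /* ... free the vcpu's PMU state ... */
    }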
2020-09-28arm64: perf: Defer irq_work to IPI_IRQ_WORKJulien Thierry
When handling events, armv8pmu_handle_irq() calls perf_event_overflow(), and subsequently calls irq_work_run() to handle any work queued by perf_event_overflow(). As perf_event_overflow() raises IPI_IRQ_WORK when queuing the work, this isn't strictly necessary and the work could be handled as part of the IPI_IRQ_WORK handler. In the common case the IPI handler will run immediately after the PMU IRQ handler, and where the PE is heavily loaded with interrupts other handlers may run first, widening the window where some counters are disabled. In practice this window is unlikely to be a significant issue, and removing the call to irq_work_run() would make the PMU IRQ handler NMI safe in addition to making it simpler, so let's do that. [Alexandru E.: Reworded commit message] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-5-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28arm64: perf: Remove PMU lockingJulien Thierry
The PMU is disabled and enabled, and the counters are programmed from contexts where interrupts or preemption is disabled. The functions to toggle the PMU and to program the PMU counters access the registers directly and don't access data modified by the interrupt handler. That, and the fact that they're always called from non-preemptible contexts, means that we don't need to disable interrupts or use a spinlock. [Alexandru E.: Explained why locking is not needed, removed WARN_ONs] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-4-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28arm64: perf: Avoid PMXEV* indirectionMark Rutland
Currently we access the counter registers and their respective type registers indirectly. This requires us to write to PMSELR, issue an ISB, then access the relevant PMXEV* registers. This is unfortunate, because: * Under virtualization, accessing one register requires two traps to the hypervisor, even though we could access the register directly with a single trap. * We have to issue an ISB which we could otherwise avoid the cost of. * When we use NMIs, the NMI handler will have to save/restore the select register in case the code it preempted was attempting to access a counter or its type register. We can avoid these issues by directly accessing the relevant registers. This patch adds helpers to do so. In armv8pmu_enable_event() we still need the ISB to prevent the PE from reordering the write to PMINTENSET_EL1 register. If the interrupt is enabled before we disable the counter and the new event is configured, we might get an interrupt triggered by the previously programmed event overflowing, but which we wrongly attribute to the event that we are enabling. Execute an ISB after we disable the counter. In the process, remove the comment that refers to the ARMv7 PMU. [Julien T.: Don't inline read/write functions to avoid big code-size increase, remove unused read_pmevtypern function, fix counter index issue.] [Alexandru E.: Removed comment, removed trailing semicolons in macros, added ISB] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-3-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28arm64: perf: Add missing ISB in armv8pmu_enable_counter()Alexandru Elisei
Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In armv8pmu_enable_event(), the PE can reorder configuring the event type after we have enabled the counter and the interrupt. This can lead to an interrupt being asserted because of the previous event type that we were counting using the same counter, not the one that we've just configured. The same rationale applies to writes to the PMINTENSET_EL1 register. The PE can reorder enabling the interrupt at any point in the future after we have enabled the event. Prevent both situations from happening by adding an ISB just before we enable the event counter. Fixes: 030896885ade ("arm64: Performance counters support") Reported-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200924110706.254996-2-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28perf: Add Arm CMN-600 PMU driverRobin Murphy
Initial driver for PMU event counting on the Arm CMN-600 interconnect. CMN sports an obnoxiously complex distributed PMU system as part of its debug and trace features, which can do all manner of things like sampling, cross-triggering and generating CoreSight trace. This driver covers the PMU functionality, plus the relevant aspects of watchpoints for simply counting matching flits. Tested-by: Tsahi Zidenberg <tsahee@amazon.com> Tested-by: Tuan Phan <tuanphan@os.amperecomputing.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28perf: Add Arm CMN-600 DT bindingRobin Murphy
Document the requirements for the CMN-600 DT binding. The internal topology is almost entirely discoverable by walking a tree of ID registers, but sadly both the starting point for that walk and the exact format of those registers are configuration-dependent and not discoverable from some sane fixed location. Oh well. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28mm: do not rely on mm == current->mm in __get_user_pages_lockedJason A. Donenfeld
It seems likely this block was pasted from internal_get_user_pages_fast, which is not passed an mm struct and therefore uses current's. But __get_user_pages_locked is passed an explicit mm, and current->mm is not always valid. This was hit when being called from i915, which uses: pin_user_pages_remote-> __get_user_pages_remote-> __gup_longterm_locked-> __get_user_pages_locked Before, this would lead to an OOPS: BUG: kernel NULL pointer dereference, address: 0000000000000064 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S U O 5.9.0-rc7+ #140 Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS N2OET47W (1.34 ) 08/06/2020 Workqueue: i915-userptr-acquire __i915_gem_userptr_get_pages_worker [i915] RIP: 0010:__get_user_pages_remote+0xd7/0x310 Call Trace: __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915] process_one_work+0x1ca/0x390 worker_thread+0x48/0x3c0 kthread+0x114/0x130 ret_from_fork+0x1f/0x30 CR2: 0000000000000064 This commit fixes the problem by using the mm pointer passed to the function rather than the bogus one in current. Fixes: 008cfe4418b3 ("mm: Introduce mm_struct.has_pinned") Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-by: Harald Arnesen <harald@skogtun.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
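In sketch form, the fix is simply to act on the mm that was passed in rather than current->mm (the has_pinned field comes from the Fixes: commit named above; the surrounding function is abbreviated):

    #include <linux/mm.h>

    /* Sketch of the relevant part of __get_user_pages_locked(). */
    static long gup_locked_sketch(struct mm_struct *mm, unsigned long start,
                                  unsigned long nr_pages, unsigned int gup_flags)
    {
            /*
             * Operate on the mm that was passed in: current->mm may be NULL
             * when a kernel worker (e.g. i915's userptr worker) gets here.
             */
            if (gup_flags & FOLL_PIN)
                    atomic_set(&mm->has_pinned, 1);

            /* ... the actual page-pinning work is elided ... */
            return 0;
    }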
2020-09-28io_uring: fix potential ABBA deadlock in ->show_fdinfo()Jens Axboe
syzbot reports a potential lock deadlock between the normal IO path and ->show_fdinfo(): ====================================================== WARNING: possible circular locking dependency detected 5.9.0-rc6-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.2/19710 is trying to acquire lock: ffff888098ddc450 (sb_writers#4){.+.+}-{0:0}, at: io_write+0x6b5/0xb30 fs/io_uring.c:3296 but task is already holding lock: ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&ctx->uring_lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 __io_uring_show_fdinfo fs/io_uring.c:8417 [inline] io_uring_show_fdinfo+0x194/0xc70 fs/io_uring.c:8460 seq_show+0x4a8/0x700 fs/proc/fd.c:65 seq_read+0x432/0x1070 fs/seq_file.c:208 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 kernel_readv fs/splice.c:355 [inline] default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 do_splice_to+0x137/0x170 fs/splice.c:871 splice_direct_to_actor+0x307/0x980 fs/splice.c:950 do_splice_direct+0x1b3/0x280 fs/splice.c:1059 do_sendfile+0x55f/0xd40 fs/read_write.c:1540 __do_sys_sendfile64 fs/read_write.c:1601 [inline] __se_sys_sendfile64 fs/read_write.c:1587 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&p->lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 seq_read+0x61/0x1070 fs/seq_file.c:155 pde_read fs/proc/inode.c:306 [inline] proc_reg_read+0x221/0x300 fs/proc/inode.c:318 do_loop_readv_writev fs/read_write.c:734 [inline] do_loop_readv_writev fs/read_write.c:721 [inline] do_iter_read+0x48e/0x6e0 fs/read_write.c:955 vfs_readv+0xe5/0x150 fs/read_write.c:1073 kernel_readv fs/splice.c:355 [inline] default_file_splice_read.constprop.0+0x4e6/0x9e0 fs/splice.c:412 do_splice_to+0x137/0x170 fs/splice.c:871 splice_direct_to_actor+0x307/0x980 fs/splice.c:950 do_splice_direct+0x1b3/0x280 fs/splice.c:1059 do_sendfile+0x55f/0xd40 fs/read_write.c:1540 __do_sys_sendfile64 fs/read_write.c:1601 [inline] __se_sys_sendfile64 fs/read_write.c:1587 [inline] __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1587 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (sb_writers#4){.+.+}-{0:0}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441 lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x228/0x450 fs/super.c:1672 io_write+0x6b5/0xb30 fs/io_uring.c:3296 io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719 __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175 io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254 io_submit_sqe fs/io_uring.c:6324 [inline] io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521 __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us 
debug this: Chain exists of: sb_writers#4 --> &p->lock --> &ctx->uring_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ctx->uring_lock); lock(&p->lock); lock(&ctx->uring_lock); lock(sb_writers#4); *** DEADLOCK *** 1 lock held by syz-executor.2/19710: #0: ffff8880a11b8428 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0xe9a/0x1bd0 fs/io_uring.c:8348 stack backtrace: CPU: 0 PID: 19710 Comm: syz-executor.2 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4441 lock_acquire+0x1f3/0xaf0 kernel/locking/lockdep.c:5029 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x228/0x450 fs/super.c:1672 io_write+0x6b5/0xb30 fs/io_uring.c:3296 io_issue_sqe+0x18f/0x5c50 fs/io_uring.c:5719 __io_queue_sqe+0x280/0x1160 fs/io_uring.c:6175 io_queue_sqe+0x692/0xfa0 fs/io_uring.c:6254 io_submit_sqe fs/io_uring.c:6324 [inline] io_submit_sqes+0x1761/0x2400 fs/io_uring.c:6521 __do_sys_io_uring_enter+0xeac/0x1bd0 fs/io_uring.c:8349 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45e179 Code: 3d b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b b2 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f1194e74c78 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 00000000000082c0 RCX: 000000000045e179 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004 RBP: 000000000118cf98 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c R13: 00007ffd1aa5756f R14: 00007f1194e759c0 R15: 000000000118cf4c Fix this by just not diving into details if we fail to trylock the io_uring mutex. We know the ctx isn't going away during this operation, but we cannot safely iterate buffers/files/personalities if we don't hold the io_uring mutex. Reported-by: syzbot+2f8fa4e860edc3066aba@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
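A sketch of the approach ("not diving into details if we fail to trylock"): mutex_trylock() is the real API, while the helper shape and output are illustrative.

    #include <linux/seq_file.h>
    #include <linux/mutex.h>

    static void show_fdinfo_sketch(struct seq_file *m, struct mutex *uring_lock)
    {
            /*
             * Never block on uring_lock from ->show_fdinfo(): the holder may,
             * through the seq_file/splice path, end up waiting on us.
             */
            bool has_lock = mutex_trylock(uring_lock);

            seq_puts(m, "io_uring instance\n");     /* lock-free summary is always safe */

            if (has_lock) {
                    /* ... dump registered files, buffers and personalities ... */
                    mutex_unlock(uring_lock);
            }
    }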
2020-09-28io_uring: always delete double poll wait entry on matchJens Axboe
syzbot reports a crash with tty polling, which is using the double poll handling: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] CPU: 0 PID: 6874 Comm: syz-executor749 Not tainted 5.9.0-rc6-next-20200924-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:io_poll_get_single fs/io_uring.c:4778 [inline] RIP: 0010:io_poll_double_wake+0x51/0x510 fs/io_uring.c:4845 Code: fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 9e 03 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 5d 08 48 8d 7b 48 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 63 03 00 00 0f b6 6b 48 bf 06 00 00 RSP: 0018:ffffc90001c1fb70 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000004 RDX: 0000000000000009 RSI: ffffffff81d9b3ad RDI: 0000000000000048 RBP: dffffc0000000000 R08: ffff8880a3cac798 R09: ffffc90001c1fc60 R10: fffff52000383f73 R11: 0000000000000000 R12: 0000000000000004 R13: ffff8880a3cac798 R14: ffff8880a3cac7a0 R15: 0000000000000004 FS: 0000000001f98880(0000) GS:ffff8880ae400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f18886916c0 CR3: 0000000094c5a000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __wake_up_common+0x147/0x650 kernel/sched/wait.c:93 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123 tty_ldisc_hangup+0x1cf/0x680 drivers/tty/tty_ldisc.c:735 __tty_hangup.part.0+0x403/0x870 drivers/tty/tty_io.c:625 __tty_hangup drivers/tty/tty_io.c:575 [inline] tty_vhangup+0x1d/0x30 drivers/tty/tty_io.c:698 pty_close+0x3f5/0x550 drivers/tty/pty.c:79 tty_release+0x455/0xf60 drivers/tty/tty_io.c:1679 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:165 [inline] exit_to_user_mode_prepare+0x1e2/0x1f0 kernel/entry/common.c:192 syscall_exit_to_user_mode+0x7a/0x2c0 kernel/entry/common.c:267 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x401210 which is due to a failure in removing the double poll wait entry if we hit a wakeup match. This can cause multiple invocations of the wakeup, which isn't safe. Cc: stable@vger.kernel.org # v5.8 Reported-by: syzbot+81b3883093f772addf6d@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-28PCI: MSI: Fix Kconfig dependencies for PCI_MSI_ARCH_FALLBACKSThomas Gleixner
The unconditional selection of PCI_MSI_ARCH_FALLBACKS has an unmet dependency because PCI_MSI_ARCH_FALLBACKS is defined in a 'if PCI' clause. As it is only relevant when PCI_MSI is enabled, update the affected architecture Kconfigs to make the selection of PCI_MSI_ARCH_FALLBACKS depend on 'if PCI_MSI'. Fixes: 077ee78e3928 ("PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Links: https://lore.kernel.org/r/cdfd63305caa57785b0925dd24c0711ea02c8527.camel@redhat.com
2020-09-28arm64: perf: Add support caps under sysfsShaokun Zhang
ARMv8.4-PMU introduces the PMMIR_EL1 registers and some new PMU events, like STALL_SLOT etc, are related to it. Let's add a caps directory to /sys/bus/event_source/devices/armv8_pmuv3_0/ and support slots from PMMIR_EL1 registers in this entry. The user programs can get the slots from sysfs directly. /sys/bus/event_source/devices/armv8_pmuv3_0/caps/slots is exposed under sysfs. Both ARMv8.4-PMU and STALL_SLOT event are implemented, it returns the slots from PMMIR_EL1, otherwise it will return 0. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/1600754025-53535-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Will Deacon <will@kernel.org>
2020-09-28ARM: dts: bcm2835: Change firmware compatible from simple-bus to simple-mfdMaxime Ripard
The current binding for the RPi firmware uses the simple-bus compatible as a fallback to benefit from its automatic probing of child nodes. However, simple-bus also comes with some constraints, like having the ranges, our case. Let's switch to simple-mfd that provides the same probing logic without those constraints. Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://lore.kernel.org/r/20200924082642.18144-1-maxime@cerno.tech Signed-off-by: Rob Herring <robh@kernel.org>