summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-25btrfs: rename btrfs_alloc_chunk to btrfs_create_chunkNikolay Borisov
The user facing function used to allocate new chunks is btrfs_chunk_alloc, unfortunately there is yet another similar sounding function - btrfs_alloc_chunk. This creates confusion, especially since the latter function can be considered "private" in the sense that it implements the first stage of chunk creation and as such is called by btrfs_chunk_alloc. To avoid the awkwardness that comes with having similarly named but distinctly different in their purpose function rename btrfs_alloc_chunk to btrfs_create_chunk, given that the main purpose of this function is to orchestrate the whole process of allocating a chunk - reserving space into devices, deciding on characteristics of the stripe size and creating the in-memory structures. Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-25Linux 5.15-rc7v5.15-rc7Linus Torvalds
2021-10-25secretmem: Prevent secretmem_users from wrapping to zeroMatthew Wilcox (Oracle)
Commit 110860541f44 ("mm/secretmem: use refcount_t instead of atomic_t") attempted to fix the problem of secretmem_users wrapping to zero and allowing suspend once again. But it was reverted in commit 87066fdd2e30 ("Revert 'mm/secretmem: use refcount_t instead of atomic_t'") because of the problems it caused - a refcount_t was not semantically the right type to use. Instead prevent secretmem_users from wrapping to zero by forbidding new users if the number of users has wrapped from positive to negative. This stops a long way short of reaching the necessary 4 billion users where it wraps to zero again, so there's no need to be clever with special anti-wrap types or checking the return value from atomic_inc(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jordy Zomer <jordy@pwning.systems> Cc: Kees Cook <keescook@chromium.org>, Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-25spi: Fix tegra20 build with CONFIG_PM=n once againLinus Torvalds
Commit efafec27c565 ("spi: Fix tegra20 build with CONFIG_PM=n") already fixed the build without PM support once. There was an alternative fix by Guenter in commit 2bab94090b01 ("spi: tegra20-slink: Declare runtime suspend and resume functions conditionally"), and Mark then merged the two correctly in ffb1e76f4f32 ("Merge tag 'v5.15-rc2' into spi-5.15"). But for some inexplicable reason, Mark then merged things _again_ in commit 59c4e190b10c ("Merge tag 'v5.15-rc3' into spi-5.15"), and screwed things up at that point, and the __maybe_unused attribute on tegra_slink_runtime_resume() went missing. Reinstate it, so that alpha (and other architectures without PM support) builds cleanly again. Btw, this is another prime example of how random back-merges are not good. Just don't do them. Subsystem developers should not merge my tree in any normal circumstances. Both of those merge commits pointed to above are bad: even the one that got the merge result right doesn't even mention _why_ it was done, and the one that got it wrong is obviously broken. Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-25Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: - Fix clang-related relocation warning in futex code - Fix incorrect use of get_kernel_nofault() - Fix bad code generation in __get_user_check() when kasan is enabled - Ensure TLB function table is correctly aligned - Remove duplicated string function definitions in decompressor - Fix link-time orphan section warnings - Fix old-style function prototype for arch_init_kprobes() - Only warn about XIP address when not compile testing - Handle BE32 big endian for keystone2 remapping * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9148/1: handle CONFIG_CPU_ENDIAN_BE32 in arch/arm/kernel/head.S ARM: 9141/1: only warn about XIP address when not compile testing ARM: 9139/1: kprobes: fix arch_init_kprobes() prototype ARM: 9138/1: fix link warning with XIP + frame-pointer ARM: 9134/1: remove duplicate memcpy() definition ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B aligned ARM: 9132/1: Fix __get_user_check failure with ARM KASAN images ARM: 9125/1: fix incorrect use of get_kernel_nofault() ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
2021-10-25Merge tag 'libata-5.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull libata fix from Damien Le Moal: "A single fix in this pull request addressing an invalid error code return in the sata_mv driver (from Zheyu)" * tag 'libata-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: sata_mv: Fix the error handling of mv_chip_id()
2021-10-25Merge tag 'pinctrl-v5.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some late pin control fixes, the most generally annoying will probably be the AMD IRQ storm fix affecting the Microsoft surface. Summary: - Three fixes pertaining to Broadcom DT bindings. Some stuff didn't work out as inteded, we need to back out - A resume bug fix in the STM32 driver - Disable and mask the interrupts on probe in the AMD pinctrl driver, affecting Microsoft surface" * tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: disable and mask interrupts on probe pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume() Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode" dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
2021-10-25sbitmap: silence data race warningJens Axboe
KCSAN complaints about the sbitmap hint update: ================================================================== BUG: KCSAN: data-race in sbitmap_queue_clear / sbitmap_queue_clear write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 1: sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606 blk_mq_put_tag+0x82/0x90 __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507 blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541 __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565 blk_mq_end_request+0x37/0x50 block/blk-mq.c:574 lo_complete_rq+0xca/0x170 drivers/block/loop.c:541 blk_complete_reqs block/blk-mq.c:584 [inline] blk_done_softirq+0x69/0x90 block/blk-mq.c:589 __do_softirq+0x12c/0x26e kernel/softirq.c:558 run_ksoftirqd+0x13/0x20 kernel/softirq.c:920 smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164 kthread+0x262/0x280 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 0: sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606 blk_mq_put_tag+0x82/0x90 __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507 blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541 __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565 blk_mq_end_request+0x37/0x50 block/blk-mq.c:574 lo_complete_rq+0xca/0x170 drivers/block/loop.c:541 blk_complete_reqs block/blk-mq.c:584 [inline] blk_done_softirq+0x69/0x90 block/blk-mq.c:589 __do_softirq+0x12c/0x26e kernel/softirq.c:558 run_ksoftirqd+0x13/0x20 kernel/softirq.c:920 smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164 kthread+0x262/0x280 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 value changed: 0x00000035 -> 0x00000044 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.15.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ================================================================== which is a data race, but not an important one. This is just updating the percpu alloc hint, and the reader of that hint doesn't ever require it to be valid. Just annotate it with data_race() to silence this one. Reported-by: syzbot+4f8bfd804b4a1f95b8f6@syzkaller.appspotmail.com Acked-by: Marco Elver <elver@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25fs: get rid of the res2 iocb->ki_complete argumentJens Axboe
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperVTianyu Lan
Hyper-V needs to issue the GHCB HV call in order to read/write MSRs in Isolation VMs. For that, expose sev_es_ghcb_hv_call(). The Hyper-V Isolation VMs are unenlightened guests and run a paravisor at VMPL0 for communicating. GHCB pages are being allocated and set up by that paravisor. Linux gets the GHCB page's physical address via MSR_AMD64_SEV_ES_GHCB from the paravisor and should not change it. Add a @set_ghcb_msr parameter to sev_es_ghcb_hv_call() to control whether the function should set the GHCB's address prior to the call or not and export that function for use by HyperV. [ bp: - Massage commit message - add a struct ghcb forward declaration to fix randconfig builds. ] Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-6-ltykernel@gmail.com
2021-10-25usb: remove res2 argument from gadget code completionsJens Axboe
The USB gadget code is the only code that every tried to utilize the 2nd argument of the aio completions, but there are strong suspicions that it was never actually used by anything on the userspace side. Out of the 3 cases that touch it, two of them just pass in the same as res, and the last one passes in error/transfer in res like any other normal use case. Remove the 2nd argument, pass 0 like the rest of the in-kernel users of kiocb based IO. Link: https://lore.kernel.org/linux-block/20211021174021.273c82b1.john@metanate.com/ Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25net-sysfs: initialize uid and gid before calling net_ns_get_ownershipXin Long
Currently in net_ns_get_ownership() it may not be able to set uid or gid if make_kuid or make_kgid returns an invalid value, and an uninit-value issue can be triggered by this. This patch is to fix it by initializing the uid and gid before calling net_ns_get_ownership(), as it does in kobject_get_ownership() Fixes: e6dee9f3893c ("net-sysfs: add netdev_change_owner()") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25xen/netfront: stop tx queues during live migrationDongli Zhang
The tx queues are not stopped during the live migration. As a result, the ndo_start_xmit() may access netfront_info->queues which is freed by talk_to_netback()->xennet_destroy_queues(). This patch is to netif_device_detach() at the beginning of xen-netfront resuming, and netif_device_attach() at the end of resuming. CPU A CPU B talk_to_netback() -> if (info->queues) xennet_destroy_queues(info); to free netfront_info->queues xennet_start_xmit() to access netfront_info->queues -> err = xennet_create_queues(info, &num_queues); The idea is borrowed from virtio-net. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: Prevent infinite while loop in skb_tx_hash()Michael Chan
Drivers call netdev_set_num_tc() and then netdev_set_tc_queue() to set the queue count and offset for each TC. So the queue count and offset for the TCs may be zero for a short period after dev->num_tc has been set. If a TX packet is being transmitted at this time in the code path netdev_pick_tx() -> skb_tx_hash(), skb_tx_hash() may see nonzero dev->num_tc but zero qcount for the TC. The while loop that keeps looping while hash >= qcount will not end. Fix it by checking the TC's qcount to be nonzero before using it. Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx") Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a stringMark Zhang
When copying the device name, the length of the data memcpy copied exceeds the length of the source buffer, which cause the KASAN issue below. Use strscpy_pad() instead. BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] Read of size 64 at addr ffff88811a10f5e0 by task rping/140263 CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1d/0xa0 kasan_report+0xcb/0x110 kasan_check_range+0x13d/0x180 memcpy+0x20/0x60 ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] ib_nl_make_request+0x1c6/0x380 [ib_core] send_mad+0x20a/0x220 [ib_core] ib_sa_path_rec_get+0x3e3/0x800 [ib_core] cma_query_ib_route+0x29b/0x390 [rdma_cm] rdma_resolve_route+0x308/0x3e0 [rdma_cm] ucma_resolve_route+0xe1/0x150 [rdma_ucm] ucma_write+0x17b/0x1f0 [rdma_ucm] vfs_write+0x142/0x4d0 ksys_write+0x133/0x160 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f26499aa90f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48 RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003 RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00 R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810 Allocated by task 131419: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 proc_self_get_link+0x8b/0x100 pick_link+0x4f1/0x5c0 step_into+0x2eb/0x3d0 walk_component+0xc8/0x2c0 link_path_walk+0x3b8/0x580 path_openat+0x101/0x230 do_filp_open+0x12e/0x240 do_sys_openat2+0x115/0x280 __x64_sys_openat+0xce/0x140 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-25net: nxp: lpc_eth.c: avoid hang when bringing interface downTrevor Woerner
A hard hang is observed whenever the ethernet interface is brought down. If the PHY is stopped before the LPC core block is reset, the SoC will hang. Comparing lpc_eth_close() and lpc_eth_open() I re-arranged the ordering of the functions calls in lpc_eth_close() to reset the hardware before stopping the PHY. Fixes: b7370112f519 ("lpc32xx: Added ethernet driver") Signed-off-by: Trevor Woerner <twoerner@gmail.com> Acked-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25blk-cgroup: synchronize blkg creation against policy deactivationYu Kuai
Our test reports a null pointer dereference: [ 168.534653] ================================================================== [ 168.535614] Disabling lock debugging due to kernel taint [ 168.536346] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 168.537274] #PF: supervisor read access in kernel mode [ 168.537964] #PF: error_code(0x0000) - not-present page [ 168.538667] PGD 0 P4D 0 [ 168.539025] Oops: 0000 [#1] PREEMPT SMP KASAN [ 168.539656] CPU: 13 PID: 759 Comm: bash Tainted: G B 5.15.0-rc2-next-202100 [ 168.540954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_0738364 [ 168.542736] RIP: 0010:bfq_pd_init+0x88/0x1e0 [ 168.543318] Code: 98 00 00 00 e8 c9 e4 5b ff 4c 8b 65 00 49 8d 7c 24 08 e8 bb e4 5b ff 4d0 [ 168.545803] RSP: 0018:ffff88817095f9c0 EFLAGS: 00010002 [ 168.546497] RAX: 0000000000000001 RBX: ffff888101a1c000 RCX: 0000000000000000 [ 168.547438] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff888106553428 [ 168.548402] RBP: ffff888106553400 R08: ffffffff961bcaf4 R09: 0000000000000001 [ 168.549365] R10: ffffffffa2e16c27 R11: fffffbfff45c2d84 R12: 0000000000000000 [ 168.550291] R13: ffff888101a1c098 R14: ffff88810c7a08c8 R15: ffffffffa55541a0 [ 168.551221] FS: 00007fac75227700(0000) GS:ffff88839ba80000(0000) knlGS:0000000000000000 [ 168.552278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 168.553040] CR2: 0000000000000008 CR3: 0000000165ce7000 CR4: 00000000000006e0 [ 168.554000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 168.554929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 168.555888] Call Trace: [ 168.556221] <TASK> [ 168.556510] blkg_create+0x1c0/0x8c0 [ 168.556989] blkg_conf_prep+0x574/0x650 [ 168.557502] ? stack_trace_save+0x99/0xd0 [ 168.558033] ? blkcg_conf_open_bdev+0x1b0/0x1b0 [ 168.558629] tg_set_conf.constprop.0+0xb9/0x280 [ 168.559231] ? kasan_set_track+0x29/0x40 [ 168.559758] ? kasan_set_free_info+0x30/0x60 [ 168.560344] ? tg_set_limit+0xae0/0xae0 [ 168.560853] ? do_sys_openat2+0x33b/0x640 [ 168.561383] ? do_sys_open+0xa2/0x100 [ 168.561877] ? __x64_sys_open+0x4e/0x60 [ 168.562383] ? __kasan_check_write+0x20/0x30 [ 168.562951] ? copyin+0x48/0x70 [ 168.563390] ? _copy_from_iter+0x234/0x9e0 [ 168.563948] tg_set_conf_u64+0x17/0x20 [ 168.564467] cgroup_file_write+0x1ad/0x380 [ 168.565014] ? cgroup_file_poll+0x80/0x80 [ 168.565568] ? __mutex_lock_slowpath+0x30/0x30 [ 168.566165] ? pgd_free+0x100/0x160 [ 168.566649] kernfs_fop_write_iter+0x21d/0x340 [ 168.567246] ? cgroup_file_poll+0x80/0x80 [ 168.567796] new_sync_write+0x29f/0x3c0 [ 168.568314] ? new_sync_read+0x410/0x410 [ 168.568840] ? __handle_mm_fault+0x1c97/0x2d80 [ 168.569425] ? copy_page_range+0x2b10/0x2b10 [ 168.570007] ? _raw_read_lock_bh+0xa0/0xa0 [ 168.570622] vfs_write+0x46e/0x630 [ 168.571091] ksys_write+0xcd/0x1e0 [ 168.571563] ? __x64_sys_read+0x60/0x60 [ 168.572081] ? __kasan_check_write+0x20/0x30 [ 168.572659] ? do_user_addr_fault+0x446/0xff0 [ 168.573264] __x64_sys_write+0x46/0x60 [ 168.573774] do_syscall_64+0x35/0x80 [ 168.574264] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 168.574960] RIP: 0033:0x7fac74915130 [ 168.575456] Code: 73 01 c3 48 8b 0d 58 ed 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 444 [ 168.577969] RSP: 002b:00007ffc3080e288 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 168.578986] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fac74915130 [ 168.579937] RDX: 0000000000000009 RSI: 000056007669f080 RDI: 0000000000000001 [ 168.580884] RBP: 000056007669f080 R08: 000000000000000a R09: 00007fac75227700 [ 168.581841] R10: 000056007655c8f0 R11: 0000000000000246 R12: 0000000000000009 [ 168.582796] R13: 0000000000000001 R14: 00007fac74be55e0 R15: 00007fac74be08c0 [ 168.583757] </TASK> [ 168.584063] Modules linked in: [ 168.584494] CR2: 0000000000000008 [ 168.584964] ---[ end trace 2475611ad0f77a1a ]--- This is because blkg_alloc() is called from blkg_conf_prep() without holding 'q->queue_lock', and elevator is exited before blkg_create(): thread 1 thread 2 blkg_conf_prep spin_lock_irq(&q->queue_lock); blkg_lookup_check -> return NULL spin_unlock_irq(&q->queue_lock); blkg_alloc blkcg_policy_enabled -> true pd = ->pd_alloc_fn blkg->pd[i] = pd blk_mq_exit_sched bfq_exit_queue blkcg_deactivate_policy spin_lock_irq(&q->queue_lock); __clear_bit(pol->plid, q->blkcg_pols); spin_unlock_irq(&q->queue_lock); q->elevator = NULL; spin_lock_irq(&q->queue_lock); blkg_create if (blkg->pd[i]) ->pd_init_fn -> q->elevator is NULL spin_unlock_irq(&q->queue_lock); Because blkcg_deactivate_policy() requires queue to be frozen, we can grab q_usage_counter to synchoronize blkg_conf_prep() against blkcg_deactivate_policy(). Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211020014036.2141723-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25block: refactor bio_iov_bvec_set()Pavel Begunkov
Combine bio_iov_bvec_set() and bio_iov_bvec_set_append() and let the caller to do iov_iter_advance(). Also get rid of __bio_iov_bvec_set(), which was duplicated in the final binary, and replace a weird iov_iter_truncate() of a temporal iter copy with min() better reflecting the intention. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/bcf1ac36fce769a514e19475f3623cd86a1d8b72.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25block: add single bio async direct IO helperPavel Begunkov
As with __blkdev_direct_IO_simple(), we can implement direct IO more efficiently if there is only one bio. Add __blkdev_direct_IO_async() and blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS with nullblk to 4.7+. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0ae4109b7a6934adede490f84d188d53b97051b.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25block: ataflop: more blk-mq refactoring fixesMichael Schmitz
As it turns out, my earlier patch in commit 86d46fdaa12a (block: ataflop: fix breakage introduced at blk-mq refactoring) was incomplete. This patch fixes any remaining issues found during more testing and code review. Requests exceeding 4 k are handled in 4k segments but __blk_mq_end_request() is never called on these (still sectors outstanding on the request). With redo_fd_request() removed, there is no provision to kick off processing of the next segment, causing requests exceeding 4k to hang. (By setting /sys/block/fd0/queue/max_sectors_k <= 4 as workaround, this behaviour can be avoided). Instead of reintroducing redo_fd_request(), requeue the remainder of the request by calling blk_mq_requeue_request() on incomplete requests (i.e. when blk_update_request() still returns true), and rely on the block layer to queue the residual as new request. Both error handling and formatting needs to release the ST-DMA lock, so call finish_fdc() on these (this was previously handled by redo_fd_request()). finish_fdc() may be called legitimately without the ST-DMA lock held - make sure we only release the lock if we actually held it. In a similar way, early exit due to errors in ataflop_queue_rq() must release the lock. After minor errors, fd_error sets up to recalibrate the drive but never re-runs the current operation (another task handled by redo_fd_request() before). Call do_fd_action() to get the next steps (seek, retry read/write) underway. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Fixes: 6ec3938cff95f (ataflop: convert to blk-mq) CC: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20211024002013.9332-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clusterise ki_flags access in rw_prepPavel Begunkov
ioprio setup doesn't depend on other fields that are modified in io_prep_rw() and we can move it down in the function without worrying about performance. It's useful as it makes iocb->ki_flags accesses/modifications closer together, so it's more likely the compiler will cache it in a register and avoid extra reloads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8ee98779c06f1b59f6039b1e292db4332efd664b.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: kill unused param from io_file_supports_nowaitPavel Begunkov
io_file_supports_nowait() doesn't use rw argument anymore, remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4bd6709fc573d70c866ea656cb7a7dbe94be8026.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean up timeout async_data allocationPavel Begunkov
opcode prep functions are one of the first things that are called, we can't have ->async_data allocated at this point and it's certainly a bug. Reflect this assumption in io_timeout_prep() and add a WARN_ONCE just in case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/75a28ca7dbcc5af8b6cd9092819e8384c24dedd4.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: don't try io-wq polling if not supportedPavel Begunkov
If an opcode doesn't support polling, just let it be executed synchronously in iowq, otherwise it will do a nonblock attempt just to fail in io_arm_poll_handler() and return back to blocking execution. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6401256db01b88f448f15fcd241439cb76f5b940.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: check if opcode needs poll first on armingPavel Begunkov
->pollout or ->pollin are set only for opcodes that need a file, so if io_arm_poll_handler() tests them first we can be sure that the request has file set and the ->file check can be removed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9adfe4f543d984875e516fce6da35348aab48668.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean iowq submit work cancellationPavel Begunkov
If we've got IO_WQ_WORK_CANCEL in io_wq_submit_work(), handle the error on the same lines as the check instead of having a weird code flow. The main loop doesn't change but goes one indention left. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff4a09cf41f7a22bbb294b6f1faea721e21fe615.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean io_wq_submit_work()'s main loopPavel Begunkov
Do a bit of cleaning for the main loop of io_wq_submit_work(). Get rid of switch, just replace it with a single if as we're retrying in both other cases. Kill issue_sqe label, Get rid of needs_poll nesting and disambiguate a bit the comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ed12ce0c64e051f9a6b8a37a24f8ea554d299c29.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25cfg80211: correct bridge/4addr mode checkJanusz Dziedzic
Without the patch we fail: $ sudo brctl addbr br0 $ sudo brctl addif br0 wlp1s0 $ sudo iw wlp1s0 set 4addr on command failed: Device or resource busy (-16) Last command failed but iface was already in 4addr mode. Fixes: ad4bb6f8883a ("cfg80211: disallow bridging managed/adhoc interfaces") Signed-off-by: Janusz Dziedzic <janusz.dziedzic@gmail.com> Link: https://lore.kernel.org/r/20211024201546.614379-1-janusz.dziedzic@gmail.com [add fixes tag, fix indentation, edit commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-25cfg80211: fix management registrations lockingJohannes Berg
The management registrations locking was broken, the list was locked for each wdev, but cfg80211_mgmt_registrations_update() iterated it without holding all the correct spinlocks, causing list corruption. Rather than trying to fix it with fine-grained locking, just move the lock to the wiphy/rdev (still need the list on each wdev), we already need to hold the wdev lock to change it, so there's no contention on the lock in any case. This trivially fixes the bug since we hold one wdev's lock already, and now will hold the lock that protects all lists. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Fixes: 6cd536fe62ef ("cfg80211: change internal management frame registration API") Link: https://lore.kernel.org/r/20211025133111.5cf733eab0f4.I7b0abb0494ab712f74e2efcd24bb31ac33f7eee9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-25KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()David Woodhouse
In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before making a final check whether the vCPU should be woken from HLT by any incoming interrupt. This is a problem for the get_user() in __kvm_xen_has_interrupt(), which really shouldn't be sleeping when the task state has already been set. I think it's actually harmless as it would just manifest itself as a spurious wakeup, but it's causing a debug warning: [ 230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80 Fix the warning by turning it into an *explicit* spurious wakeup. When invoked with !task_is_running(current) (and we might as well add in_atomic() there while we're at it), just return 1 to indicate that an IRQ is pending, which will cause a wakeup and then something will call it again in a context that *can* sleep so it can fault the page back in. Cc: stable@vger.kernel.org Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-25Merge tag 'kvm-s390-master-5.15-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for interrupt delivery Two bugs that might result in CPUs not woken up when interrupts are pending.
2021-10-25Merge branch 'ksettings-locking-fixes'David S. Miller
Andrew Lunn says: ==================== ksettings_{get|set} lock fixes Walter Stoll <Walter.Stoll@duagon.com> reported a race condition between "ethtool -s eth0 speed 100 duplex full autoneg off" and phylib reading the current status from the PHY. Both ksetting_get and ksetting_set fail the take the phydev mutex, and as a result, there is a small window of time where the phydev members are not self consistent. Patch 1 fixes phy_ethtool_ksettings_get by adding the needed lock. Patches 2 and 3 move code around and perform to refactoring, to allow patch 4 to fix phy_ethtool_ksettings_set by added the lock. Thanks go to Walter for the detailed origional report, suggested fix, and testing of the proposed patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25phy: phy_ethtool_ksettings_set: Lock the PHY while changing settingsAndrew Lunn
There is a race condition where the PHY state machine can change members of the phydev structure at the same time userspace requests a change via ethtool. To prevent this, have phy_ethtool_ksettings_set take the PHY lock. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Reported-by: Walter Stoll <Walter.Stoll@duagon.com> Suggested-by: Walter Stoll <Walter.Stoll@duagon.com> Tested-by: Walter Stoll <Walter.Stoll@duagon.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25phy: phy_start_aneg: Add an unlocked versionAndrew Lunn
Split phy_start_aneg into a wrapper which takes the PHY lock, and a helper doing the real work. This will be needed when phy_ethtook_ksettings_set takes the lock. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25phy: phy_ethtool_ksettings_set: Move after phy_start_anegAndrew Lunn
This allows it to make use of a helper which assume the PHY is already locked. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25phy: phy_ethtool_ksettings_get: Lock the phy for consistencyAndrew Lunn
The PHY structure should be locked while copying information out if it, otherwise there is no guarantee of self consistency. Without the lock the PHY state machine could be updating the structure. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlockDavid Woodhouse
On the preemption path when updating a Xen guest's runstate times, this lock is taken inside the scheduler rq->lock, which is a raw spinlock. This was shown in a lockdep warning: [ 89.138354] ============================= [ 89.138356] [ BUG: Invalid wait context ] [ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E [ 89.138360] ----------------------------- [ 89.138361] xen_shinfo_test/2575 is trying to lock: [ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138442] other info that might help us debug this: [ 89.138444] context-{5:5} [ 89.138445] 4 locks held by xen_shinfo_test/2575: [ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm] [ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm] [ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0 [ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm] ... [ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm] [ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm] [ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm] [ 89.138870] kvm_sched_out+0x2f/0x40 [kvm] [ 89.138900] __schedule+0x5de/0xbd0 Cc: stable@vger.kernel.org Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-25ARM: 9148/1: handle CONFIG_CPU_ENDIAN_BE32 in arch/arm/kernel/head.SLABBE Corentin
My intel-ixp42x-welltech-epbx100 no longer boot since 4.14. This is due to commit 463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel mapping regression") which forgot to handle CONFIG_CPU_ENDIAN_BE32 as possible BE config. Suggested-by: Krzysztof Hałasa <khalasa@piap.pl> Fixes: 463dbba4d189 ("ARM: 9104/2: Fix Keystone 2 kernel mapping regression") Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2021-10-25irq: arm: perform irqentry in entry codeMark Rutland
In preparation for removing HANDLE_DOMAIN_IRQ_IRQENTRY, have arch/arm perform all the irqentry accounting in its entry code. For configurations with CONFIG_GENERIC_IRQ_MULTI_HANDLER, we can use generic_handle_arch_irq(). Other than asm_do_IRQ(), all C calls to handle_IRQ() are from irqchip handlers which will be called from generic_handle_arch_irq(), so to avoid double accounting IRQ entry, the entry logic is moved from handle_IRQ() into asm_do_IRQ(). For ARMv7M the entry assembly is tightly coupled with the NVIC irqchip, and while the entry code should logically live under arch/arm/, moving the entry logic there makes things more convoluted. So for now, place the entry logic in the NVIC irqchip, but separated into a separate function to make the split of responsibility clear. For all other configurations without CONFIG_GENERIC_IRQ_MULTI_HANDLER, IRQ entry is already handled in arch code, and requires no changes. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRYMark Rutland
Going forward we want architecture/entry code to perform all the necessary work to enter/exit IRQ context, with irqchip code merely handling the mapping of the interrupt to any handler(s). Among other reasons, this is necessary to consistently fix some longstanding issues with the ordering of lockdep/RCU/tracing instrumentation which many architectures get wrong today in their entry code. Importantly, rcu_irq_{enter,exit}() must be called precisely once per IRQ exception, so that rcu_is_cpu_rrupt_from_idle() can correctly identify when an interrupt was taken from an idle context which must be explicitly preempted. Currently handle_domain_irq() calls rcu_irq_{enter,exit}() via irq_{enter,exit}(), but entry code needs to be able to call rcu_irq_{enter,exit}() earlier for correct ordering across lockdep/RCU/tracing updates for sequences such as: lockdep_hardirqs_off(CALLER_ADDR0); rcu_irq_enter(); trace_hardirqs_off_finish(); To permit each architecture to be converted to the new style in turn, this patch adds a new CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY selected by all current users of HANDLE_DOMAIN_IRQ, which gates the existing behaviour. When CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY is not selected, handle_domain_irq() requires entry code to perform the irq_{enter,exit}() work, with an explicit check for this matching the style of handle_domain_nmi(). Subsequent patches will: 1) Add the necessary IRQ entry accounting to each architecture in turn, dropping CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY from that architecture's Kconfig. 2) Remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY once it is no longer selected. 3) Convert irqchip drivers to consistently use generic_handle_domain_irq() rather than handle_domain_irq(). 4) Remove handle_domain_irq() and CONFIG_HANDLE_DOMAIN_IRQ. ... which should leave us with a clear split of responsiblity across the entry and irqchip code, making it possible to perform additional cleanups and fixes for the aforementioned longstanding issues with entry code. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQMark Rutland
In preparation for removing HANDLE_DOMAIN_IRQ, have arch/nds32 perform all the necessary IRQ entry accounting in its entry code. Currently arch/nds32 is tightly coupled with the ativic32 irqchip, and while the entry code should logically live under arch/nds32/, moving the entry logic there makes things more convoluted. So for now, place the entry logic in the ativic32 irqchip, but separated into a separate function to make the split of responsibility clear. In future this should probably use GENERIC_IRQ_MULTI_HANDLER to cleanly decouple this. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Chen <deanbo422@gmail.com>
2021-10-25irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQMark Rutland
In preparation for removing HANDLE_DOMAIN_IRQ, have arch/arc perform all the necessary IRQ entry accounting in its entry code. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@kernel.org>
2021-10-25irq: add generic_handle_arch_irq()Mark Rutland
Several architectures select GENERIC_IRQ_MULTI_HANDLER and branch to handle_arch_irq() without performing any entry accounting. Add a generic wrapper to handle the common irqentry work when invoking handle_arch_irq(). Where an architecture needs to perform some entry accounting itself, it will need to invoke handle_arch_irq() itself. In subsequent patches it will become the responsibilty of the entry code to set the irq regs when entering an IRQ (rather than deferring this to an irqchip handler), so generic_handle_arch_irq() is made to set the irq regs now. This can be redundant in some cases, but is never harmful as saving/restoring the old regs nests safely. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: unexport handle_irq_desc()Mark Rutland
There are no modular users of handle_irq_desc(). Remove the export before we gain any. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: simplify handle_domain_{irq,nmi}()Mark Rutland
There's no need for handle_domain_{irq,nmi}() to open-code the NULL check performed by handle_irq_desc(), nor the resolution of the desc performed by generic_handle_domain_irq(). Use generic_handle_domain_irq() directly, as this is functioanlly equivalent and clearer. At the same time, delete the stale comments, which are no longer helpful. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: mips: simplify do_domain_IRQ()Mark Rutland
There's no need fpr arch/mips's do_domain_IRQ() to open-code the NULL check performed by handle_irq_desc(), nor the resolution of the desc performed by generic_handle_domain_irq(). Use generic_handle_domain_irq() directly, as this is functioanlly equivalent and clearer. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: mips: stop (ab)using handle_domain_irq()Mark Rutland
On MIPS, the only user of handle_domain_irq() is octeon_irq_ciu3_ip2(), which is called from the platform-specific plat_irq_dispatch() function invoked from the early assembly code. No other irqchip relevant to arch/mips uses handle_domain_irq(): * No other plat_irq_dispatch() function transitively calls handle_domain_irq(). * No other vectored IRQ dispatch function registered with set_vi_handler() calls handle_domain_irq(). * No chained irqchip handlers call handle_domain_irq(), which makes sense as this is meant to only be used by root irqchip handlers. Currently octeon_irq_ciu3_ip2() passes NULL as the `regs` argument to handle_domain_irq(), and as handle_domain_irq() will pass this to set_irq_regs(), any invoked IRQ handlers will erroneously see a NULL pt_regs if they call get_pt_regs(). Fix this by calling generic_handle_domain_irq() directly, and performing the necessary irq_{enter,exit}() logic directly in octeon_irq_ciu3_ip2(). At the same time, deselect HANDLE_DOMAIN_IRQ, which subsequent patches will remove. Other than the corrected behaviour of get_pt_regs(), there should be no functional change as a result of this patch. Fixes: ce210d35bb93c2c5 ("MIPS: OCTEON: Add support for OCTEON III interrupt controller.") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: mips: simplify bcm6345_l1_irq_handle()Mark Rutland
As bcm6345_l1_irq_handle() only needs to know /whether/ an IRQ was resolved, and doesn't need to know the specific IRQ, it's simpler for it to call generic_handle_domain_irq() directly and check the return code, so let's do that. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: mips: avoid nested irq_enter()Mark Rutland
As bcm6345_l1_irq_handle() is a chained irqchip handler, it will be invoked within the context of the root irqchip handler, which must have entered IRQ context already. When bcm6345_l1_irq_handle() calls arch/mips's do_IRQ() , this will nest another call to irq_enter(), and the resulting nested increment to `rcu_data.dynticks_nmi_nesting` will cause rcu_is_cpu_rrupt_from_idle() to fail to identify wakeups from idle, resulting in failure to preempt, and RCU stalls. Chained irqchip handlers must invoke IRQ handlers by way of thee core irqchip code, i.e. generic_handle_irq() or generic_handle_domain_irq() and should not call do_IRQ(), which is intended only for root irqchip handlers. Fix bcm6345_l1_irq_handle() by calling generic_handle_irq() directly. Fixes: c7c42ec2baa1de7a ("irqchips/bmips: Add bcm6345-l1 interrupt controller") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25x86/of: Kill unused early_init_dt_scan_chosen_arch()Rob Herring
There are no callers for early_init_dt_scan_chosen_arch(), so remove it. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Link: https://lkml.kernel.org/r/20211022164642.2815706-1-robh@kernel.org