summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-25workqueue: doc: Call out the non-reentrance conditionsBoqun Feng
The current doc of workqueue API suggests that work items are non-reentrant: any work item is guaranteed to be executed by at most one worker system-wide at any given time. However this is not true, the following case can cause a work item W executed by two workers at the same time: queue_work_on(0, WQ1, W); // after a worker picks up W and clear the pending bit queue_work_on(1, WQ2, W); // workers on CPU0 and CPU1 will execute W in the same time. , which means the non-reentrance of a work item is conditional, and Lai Jiangshan provided a nice summary[1] of the conditions, therefore use it to describe a work item instance and improve the doc. [1]: https://lore.kernel.org/lkml/CAJhGHyDudet_xyNk=8xnuO2==o-u06s0E0GZVP4Q67nmQ84Ceg@mail.gmail.com/ Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-25RDMA/mlx5: fix build error with INFINIBAND_USER_ACCESS=nArnd Bergmann
The mlx5_ib_fs_add_op_fc/mlx5_ib_fs_remove_op_fc functions are only available when user access is enabled, without that we run into a link error: ERROR: modpost: "mlx5_ib_fs_add_op_fc" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! ERROR: modpost: "mlx5_ib_fs_remove_op_fc" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! Conditionally compiling the newly added code section makes it build, though this is probably not a correct fix. Fixes: a29b934ceb4c ("RDMA/mlx5: Add modify_op_stat() support") Link: https://lore.kernel.org/r/20211019061602.3062196-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-25Merge tag 'libata-5.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull libata fix from Damien Le Moal: "A single fix in this pull request addressing an invalid error code return in the sata_mv driver (from Zheyu)" * tag 'libata-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: sata_mv: Fix the error handling of mv_chip_id()
2021-10-25Merge tag 'pinctrl-v5.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some late pin control fixes, the most generally annoying will probably be the AMD IRQ storm fix affecting the Microsoft surface. Summary: - Three fixes pertaining to Broadcom DT bindings. Some stuff didn't work out as inteded, we need to back out - A resume bug fix in the STM32 driver - Disable and mask the interrupts on probe in the AMD pinctrl driver, affecting Microsoft surface" * tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: amd: disable and mask interrupts on probe pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume() Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode" dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
2021-10-25sbitmap: silence data race warningJens Axboe
KCSAN complaints about the sbitmap hint update: ================================================================== BUG: KCSAN: data-race in sbitmap_queue_clear / sbitmap_queue_clear write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 1: sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606 blk_mq_put_tag+0x82/0x90 __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507 blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541 __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565 blk_mq_end_request+0x37/0x50 block/blk-mq.c:574 lo_complete_rq+0xca/0x170 drivers/block/loop.c:541 blk_complete_reqs block/blk-mq.c:584 [inline] blk_done_softirq+0x69/0x90 block/blk-mq.c:589 __do_softirq+0x12c/0x26e kernel/softirq.c:558 run_ksoftirqd+0x13/0x20 kernel/softirq.c:920 smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164 kthread+0x262/0x280 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 0: sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606 blk_mq_put_tag+0x82/0x90 __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507 blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541 __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565 blk_mq_end_request+0x37/0x50 block/blk-mq.c:574 lo_complete_rq+0xca/0x170 drivers/block/loop.c:541 blk_complete_reqs block/blk-mq.c:584 [inline] blk_done_softirq+0x69/0x90 block/blk-mq.c:589 __do_softirq+0x12c/0x26e kernel/softirq.c:558 run_ksoftirqd+0x13/0x20 kernel/softirq.c:920 smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164 kthread+0x262/0x280 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 value changed: 0x00000035 -> 0x00000044 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.15.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ================================================================== which is a data race, but not an important one. This is just updating the percpu alloc hint, and the reader of that hint doesn't ever require it to be valid. Just annotate it with data_race() to silence this one. Reported-by: syzbot+4f8bfd804b4a1f95b8f6@syzkaller.appspotmail.com Acked-by: Marco Elver <elver@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25ASoC: topology: Fix stub for snd_soc_tplg_component_remove()Mark Brown
When removing the index argument from snd_soc_topology_component_remove() commit a5b8f71c5477f (ASoC: topology: Remove multistep topology loading) forgot to update the stub for !SND_SOC_TOPOLOGY use, causing build failures for anything that tries to make use of it. Fixes: a5b8f71c5477f (ASoC: topology: Remove multistep topology loading) Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20211025154844.2342120-1-broonie@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-25ASoC: qcom: common: Respect status = "disabled" on DAI link nodesStephan Gerhold
At the moment, the DAI link nodes in the device tree always have to be specified completely in each device tree. However, the available interfaces (e.g. Primary/Secondary/Tertiary/Quaternary MI2S) are common for all devices of a SoC, so the majority of the definitions can be placed in a common device tree include to reduce boilerplate. Make it possible to define such stubs in device tree includes by respecting the "status" property for the DAI link nodes. This is a trivial change that just requires switching to the _available_ OF functions that check the "status" property additionally. This allows defining a stub like: sound_dai_quaternary: dai-link-quaternary { link-name = "Quaternary MI2S"; status = "disabled"; /* Needs extra codec configuration */ cpu { sound-dai = <&q6afedai QUATERNARY_MI2S_RX>; }; platform { sound-dai = <&q6routing>; }; }; where the codec would be filled in by the device-specific device tree. For existing device trees this change does not make any difference. A missing "status" property is treated like status = "okay". Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20211025105503.49444-1-stephan@gerhold.net Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-25ASoC: dt-bindings: lpass: add binding headers for digital codecsSrinivasa Rao Mandadapu
Add header defining for lpass internal digital codecs rx,tx and va dai node id's. Signed-off-by: Srinivasa Rao Mandadapu <srivasam@codeaurora.org> Link: https://lore.kernel.org/r/1633670491-27432-1-git-send-email-srivasam@codeaurora.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-25Merge series "ASoC: wm8962: Conversion to json-schema and fix" from Geert ↵Mark Brown
Uytterhoeven <geert+renesas@glider.be>: Hi all, This patch series converts the Wolfson WM8962 Device Tree binding documentation to json-schema, after fixing an issue in the imx8mn-beacon DTS file. Thanks for your comments! Geert Uytterhoeven (2): arm64: dts: imx: imx8mn-beacon: Drop undocumented clock-names reference ASoC: dt-bindings: wlf,wm8962: Convert to json-schema .../devicetree/bindings/sound/wlf,wm8962.yaml | 118 ++++++++++++++++++ .../devicetree/bindings/sound/wm8962.txt | 43 ------- .../freescale/imx8mn-beacon-baseboard.dtsi | 1 - 3 files changed, 118 insertions(+), 44 deletions(-) create mode 100644 Documentation/devicetree/bindings/sound/wlf,wm8962.yaml delete mode 100644 Documentation/devicetree/bindings/sound/wm8962.txt -- 2.25.1 Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
2021-10-25fs: get rid of the res2 iocb->ki_complete argumentJens Axboe
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperVTianyu Lan
Hyper-V needs to issue the GHCB HV call in order to read/write MSRs in Isolation VMs. For that, expose sev_es_ghcb_hv_call(). The Hyper-V Isolation VMs are unenlightened guests and run a paravisor at VMPL0 for communicating. GHCB pages are being allocated and set up by that paravisor. Linux gets the GHCB page's physical address via MSR_AMD64_SEV_ES_GHCB from the paravisor and should not change it. Add a @set_ghcb_msr parameter to sev_es_ghcb_hv_call() to control whether the function should set the GHCB's address prior to the call or not and export that function for use by HyperV. [ bp: - Massage commit message - add a struct ghcb forward declaration to fix randconfig builds. ] Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211025122116.264793-6-ltykernel@gmail.com
2021-10-25usb: remove res2 argument from gadget code completionsJens Axboe
The USB gadget code is the only code that every tried to utilize the 2nd argument of the aio completions, but there are strong suspicions that it was never actually used by anything on the userspace side. Out of the 3 cases that touch it, two of them just pass in the same as res, and the last one passes in error/transfer in res like any other normal use case. Remove the 2nd argument, pass 0 like the rest of the in-kernel users of kiocb based IO. Link: https://lore.kernel.org/linux-block/20211021174021.273c82b1.john@metanate.com/ Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25selftests: x86: fix [-Wstringop-overread] warn in test_process_vm_readv()Shuah Khan
Fix the following [-Wstringop-overread] by passing in the variable instead of the value. test_vsyscall.c: In function ‘test_process_vm_readv’: test_vsyscall.c:500:22: warning: ‘__builtin_memcmp_eq’ specified bound 4096 exceeds source size 0 [-Wstringop-overread] 500 | if (!memcmp(buf, (const void *)0xffffffffff600000, 4096)) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-10-25net-sysfs: initialize uid and gid before calling net_ns_get_ownershipXin Long
Currently in net_ns_get_ownership() it may not be able to set uid or gid if make_kuid or make_kgid returns an invalid value, and an uninit-value issue can be triggered by this. This patch is to fix it by initializing the uid and gid before calling net_ns_get_ownership(), as it does in kobject_get_ownership() Fixes: e6dee9f3893c ("net-sysfs: add netdev_change_owner()") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25xen/netfront: stop tx queues during live migrationDongli Zhang
The tx queues are not stopped during the live migration. As a result, the ndo_start_xmit() may access netfront_info->queues which is freed by talk_to_netback()->xennet_destroy_queues(). This patch is to netif_device_detach() at the beginning of xen-netfront resuming, and netif_device_attach() at the end of resuming. CPU A CPU B talk_to_netback() -> if (info->queues) xennet_destroy_queues(info); to free netfront_info->queues xennet_start_xmit() to access netfront_info->queues -> err = xennet_create_queues(info, &num_queues); The idea is borrowed from virtio-net. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: Prevent infinite while loop in skb_tx_hash()Michael Chan
Drivers call netdev_set_num_tc() and then netdev_set_tc_queue() to set the queue count and offset for each TC. So the queue count and offset for the TCs may be zero for a short period after dev->num_tc has been set. If a TX packet is being transmitted at this time in the code path netdev_pick_tx() -> skb_tx_hash(), skb_tx_hash() may see nonzero dev->num_tc but zero qcount for the TC. The while loop that keeps looping while hash >= qcount will not end. Fix it by checking the TC's qcount to be nonzero before using it. Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx") Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net/tls: getsockopt supports complete algorithm listTianjia Zhang
AES_CCM_128 and CHACHA20_POLY1305 are already supported by tls, similar to setsockopt, getsockopt also needs to support these two algorithms. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net/tls: tls_crypto_context add supported algorithms contextTianjia Zhang
tls already supports the SM4 GCM/CCM algorithms. It is also necessary to add support for these two algorithms in tls_crypto_context to avoid potential issues caused by forced type conversion. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a stringMark Zhang
When copying the device name, the length of the data memcpy copied exceeds the length of the source buffer, which cause the KASAN issue below. Use strscpy_pad() instead. BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] Read of size 64 at addr ffff88811a10f5e0 by task rping/140263 CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x7d print_address_description.constprop.0+0x1d/0xa0 kasan_report+0xcb/0x110 kasan_check_range+0x13d/0x180 memcpy+0x20/0x60 ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core] ib_nl_make_request+0x1c6/0x380 [ib_core] send_mad+0x20a/0x220 [ib_core] ib_sa_path_rec_get+0x3e3/0x800 [ib_core] cma_query_ib_route+0x29b/0x390 [rdma_cm] rdma_resolve_route+0x308/0x3e0 [rdma_cm] ucma_resolve_route+0xe1/0x150 [rdma_ucm] ucma_write+0x17b/0x1f0 [rdma_ucm] vfs_write+0x142/0x4d0 ksys_write+0x133/0x160 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f26499aa90f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48 RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003 RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00 R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810 Allocated by task 131419: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 proc_self_get_link+0x8b/0x100 pick_link+0x4f1/0x5c0 step_into+0x2eb/0x3d0 walk_component+0xc8/0x2c0 link_path_walk+0x3b8/0x580 path_openat+0x101/0x230 do_filp_open+0x12e/0x240 do_sys_openat2+0x115/0x280 __x64_sys_openat+0xce/0x140 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-25mlxsw: spectrum: Use 'bitmap_zalloc()' when applicableChristophe JAILLET
Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid some open-coded arithmetic in allocator arguments. Also change the corresponding 'kfree()' into 'bitmap_free()' to keep consistency. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: nxp: lpc_eth.c: avoid hang when bringing interface downTrevor Woerner
A hard hang is observed whenever the ethernet interface is brought down. If the PHY is stopped before the LPC core block is reset, the SoC will hang. Comparing lpc_eth_close() and lpc_eth_open() I re-arranged the ordering of the functions calls in lpc_eth_close() to reset the hardware before stopping the PHY. Fixes: b7370112f519 ("lpc32xx: Added ethernet driver") Signed-off-by: Trevor Woerner <twoerner@gmail.com> Acked-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25usbb: catc: use correct API for MAC addressesOliver Neukum
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. In the case of catc we need a new temporary buffer to conform to the rules for DMA coherency. That in turn necessitates a reworking of error handling in probe(). Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25blk-cgroup: synchronize blkg creation against policy deactivationYu Kuai
Our test reports a null pointer dereference: [ 168.534653] ================================================================== [ 168.535614] Disabling lock debugging due to kernel taint [ 168.536346] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 168.537274] #PF: supervisor read access in kernel mode [ 168.537964] #PF: error_code(0x0000) - not-present page [ 168.538667] PGD 0 P4D 0 [ 168.539025] Oops: 0000 [#1] PREEMPT SMP KASAN [ 168.539656] CPU: 13 PID: 759 Comm: bash Tainted: G B 5.15.0-rc2-next-202100 [ 168.540954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_0738364 [ 168.542736] RIP: 0010:bfq_pd_init+0x88/0x1e0 [ 168.543318] Code: 98 00 00 00 e8 c9 e4 5b ff 4c 8b 65 00 49 8d 7c 24 08 e8 bb e4 5b ff 4d0 [ 168.545803] RSP: 0018:ffff88817095f9c0 EFLAGS: 00010002 [ 168.546497] RAX: 0000000000000001 RBX: ffff888101a1c000 RCX: 0000000000000000 [ 168.547438] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff888106553428 [ 168.548402] RBP: ffff888106553400 R08: ffffffff961bcaf4 R09: 0000000000000001 [ 168.549365] R10: ffffffffa2e16c27 R11: fffffbfff45c2d84 R12: 0000000000000000 [ 168.550291] R13: ffff888101a1c098 R14: ffff88810c7a08c8 R15: ffffffffa55541a0 [ 168.551221] FS: 00007fac75227700(0000) GS:ffff88839ba80000(0000) knlGS:0000000000000000 [ 168.552278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 168.553040] CR2: 0000000000000008 CR3: 0000000165ce7000 CR4: 00000000000006e0 [ 168.554000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 168.554929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 168.555888] Call Trace: [ 168.556221] <TASK> [ 168.556510] blkg_create+0x1c0/0x8c0 [ 168.556989] blkg_conf_prep+0x574/0x650 [ 168.557502] ? stack_trace_save+0x99/0xd0 [ 168.558033] ? blkcg_conf_open_bdev+0x1b0/0x1b0 [ 168.558629] tg_set_conf.constprop.0+0xb9/0x280 [ 168.559231] ? kasan_set_track+0x29/0x40 [ 168.559758] ? kasan_set_free_info+0x30/0x60 [ 168.560344] ? tg_set_limit+0xae0/0xae0 [ 168.560853] ? do_sys_openat2+0x33b/0x640 [ 168.561383] ? do_sys_open+0xa2/0x100 [ 168.561877] ? __x64_sys_open+0x4e/0x60 [ 168.562383] ? __kasan_check_write+0x20/0x30 [ 168.562951] ? copyin+0x48/0x70 [ 168.563390] ? _copy_from_iter+0x234/0x9e0 [ 168.563948] tg_set_conf_u64+0x17/0x20 [ 168.564467] cgroup_file_write+0x1ad/0x380 [ 168.565014] ? cgroup_file_poll+0x80/0x80 [ 168.565568] ? __mutex_lock_slowpath+0x30/0x30 [ 168.566165] ? pgd_free+0x100/0x160 [ 168.566649] kernfs_fop_write_iter+0x21d/0x340 [ 168.567246] ? cgroup_file_poll+0x80/0x80 [ 168.567796] new_sync_write+0x29f/0x3c0 [ 168.568314] ? new_sync_read+0x410/0x410 [ 168.568840] ? __handle_mm_fault+0x1c97/0x2d80 [ 168.569425] ? copy_page_range+0x2b10/0x2b10 [ 168.570007] ? _raw_read_lock_bh+0xa0/0xa0 [ 168.570622] vfs_write+0x46e/0x630 [ 168.571091] ksys_write+0xcd/0x1e0 [ 168.571563] ? __x64_sys_read+0x60/0x60 [ 168.572081] ? __kasan_check_write+0x20/0x30 [ 168.572659] ? do_user_addr_fault+0x446/0xff0 [ 168.573264] __x64_sys_write+0x46/0x60 [ 168.573774] do_syscall_64+0x35/0x80 [ 168.574264] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 168.574960] RIP: 0033:0x7fac74915130 [ 168.575456] Code: 73 01 c3 48 8b 0d 58 ed 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 444 [ 168.577969] RSP: 002b:00007ffc3080e288 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 168.578986] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fac74915130 [ 168.579937] RDX: 0000000000000009 RSI: 000056007669f080 RDI: 0000000000000001 [ 168.580884] RBP: 000056007669f080 R08: 000000000000000a R09: 00007fac75227700 [ 168.581841] R10: 000056007655c8f0 R11: 0000000000000246 R12: 0000000000000009 [ 168.582796] R13: 0000000000000001 R14: 00007fac74be55e0 R15: 00007fac74be08c0 [ 168.583757] </TASK> [ 168.584063] Modules linked in: [ 168.584494] CR2: 0000000000000008 [ 168.584964] ---[ end trace 2475611ad0f77a1a ]--- This is because blkg_alloc() is called from blkg_conf_prep() without holding 'q->queue_lock', and elevator is exited before blkg_create(): thread 1 thread 2 blkg_conf_prep spin_lock_irq(&q->queue_lock); blkg_lookup_check -> return NULL spin_unlock_irq(&q->queue_lock); blkg_alloc blkcg_policy_enabled -> true pd = ->pd_alloc_fn blkg->pd[i] = pd blk_mq_exit_sched bfq_exit_queue blkcg_deactivate_policy spin_lock_irq(&q->queue_lock); __clear_bit(pol->plid, q->blkcg_pols); spin_unlock_irq(&q->queue_lock); q->elevator = NULL; spin_lock_irq(&q->queue_lock); blkg_create if (blkg->pd[i]) ->pd_init_fn -> q->elevator is NULL spin_unlock_irq(&q->queue_lock); Because blkcg_deactivate_policy() requires queue to be frozen, we can grab q_usage_counter to synchoronize blkg_conf_prep() against blkcg_deactivate_policy(). Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211020014036.2141723-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25block: refactor bio_iov_bvec_set()Pavel Begunkov
Combine bio_iov_bvec_set() and bio_iov_bvec_set_append() and let the caller to do iov_iter_advance(). Also get rid of __bio_iov_bvec_set(), which was duplicated in the final binary, and replace a weird iov_iter_truncate() of a temporal iter copy with min() better reflecting the intention. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/bcf1ac36fce769a514e19475f3623cd86a1d8b72.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25block: add single bio async direct IO helperPavel Begunkov
As with __blkdev_direct_IO_simple(), we can implement direct IO more efficiently if there is only one bio. Add __blkdev_direct_IO_async() and blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS with nullblk to 4.7+. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0ae4109b7a6934adede490f84d188d53b97051b.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25block: ataflop: more blk-mq refactoring fixesMichael Schmitz
As it turns out, my earlier patch in commit 86d46fdaa12a (block: ataflop: fix breakage introduced at blk-mq refactoring) was incomplete. This patch fixes any remaining issues found during more testing and code review. Requests exceeding 4 k are handled in 4k segments but __blk_mq_end_request() is never called on these (still sectors outstanding on the request). With redo_fd_request() removed, there is no provision to kick off processing of the next segment, causing requests exceeding 4k to hang. (By setting /sys/block/fd0/queue/max_sectors_k <= 4 as workaround, this behaviour can be avoided). Instead of reintroducing redo_fd_request(), requeue the remainder of the request by calling blk_mq_requeue_request() on incomplete requests (i.e. when blk_update_request() still returns true), and rely on the block layer to queue the residual as new request. Both error handling and formatting needs to release the ST-DMA lock, so call finish_fdc() on these (this was previously handled by redo_fd_request()). finish_fdc() may be called legitimately without the ST-DMA lock held - make sure we only release the lock if we actually held it. In a similar way, early exit due to errors in ataflop_queue_rq() must release the lock. After minor errors, fd_error sets up to recalibrate the drive but never re-runs the current operation (another task handled by redo_fd_request() before). Call do_fd_action() to get the next steps (seek, retry read/write) underway. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Fixes: 6ec3938cff95f (ataflop: convert to blk-mq) CC: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20211024002013.9332-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clusterise ki_flags access in rw_prepPavel Begunkov
ioprio setup doesn't depend on other fields that are modified in io_prep_rw() and we can move it down in the function without worrying about performance. It's useful as it makes iocb->ki_flags accesses/modifications closer together, so it's more likely the compiler will cache it in a register and avoid extra reloads. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8ee98779c06f1b59f6039b1e292db4332efd664b.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: kill unused param from io_file_supports_nowaitPavel Begunkov
io_file_supports_nowait() doesn't use rw argument anymore, remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4bd6709fc573d70c866ea656cb7a7dbe94be8026.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean up timeout async_data allocationPavel Begunkov
opcode prep functions are one of the first things that are called, we can't have ->async_data allocated at this point and it's certainly a bug. Reflect this assumption in io_timeout_prep() and add a WARN_ONCE just in case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/75a28ca7dbcc5af8b6cd9092819e8384c24dedd4.1634987320.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: don't try io-wq polling if not supportedPavel Begunkov
If an opcode doesn't support polling, just let it be executed synchronously in iowq, otherwise it will do a nonblock attempt just to fail in io_arm_poll_handler() and return back to blocking execution. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6401256db01b88f448f15fcd241439cb76f5b940.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: check if opcode needs poll first on armingPavel Begunkov
->pollout or ->pollin are set only for opcodes that need a file, so if io_arm_poll_handler() tests them first we can be sure that the request has file set and the ->file check can be removed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9adfe4f543d984875e516fce6da35348aab48668.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean iowq submit work cancellationPavel Begunkov
If we've got IO_WQ_WORK_CANCEL in io_wq_submit_work(), handle the error on the same lines as the check instead of having a weird code flow. The main loop doesn't change but goes one indention left. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff4a09cf41f7a22bbb294b6f1faea721e21fe615.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25io_uring: clean io_wq_submit_work()'s main loopPavel Begunkov
Do a bit of cleaning for the main loop of io_wq_submit_work(). Get rid of switch, just replace it with a single if as we're retrying in both other cases. Kill issue_sqe label, Get rid of needs_poll nesting and disambiguate a bit the comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ed12ce0c64e051f9a6b8a37a24f8ea554d299c29.1634987320.git.asml.silence@gmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-25wcn36xx: Fix tx_status mechanismLoic Poulain
This change fix the TX ack mechanism in various ways: - For NO_ACK tagged packets, we don't need to wait for TX_ACK indication and so are not subject to the single packet ack limitation. So we don't have to stop the tx queue, and can call the tx status callback as soon as DMA transfer has completed. - Fix skb ownership/reference. Only start status indication timeout once the DMA transfer has been completed. This avoids the skb to be both referenced in the DMA tx ring and by the tx_ack_skb pointer, preventing any use-after-free or double-free. - This adds a sanity (paranoia?) check on the skb tx ack pointer. - Resume TX queue if TX status tagged packet TX fails. Cc: stable@vger.kernel.org Fixes: fdf21cc37149 ("wcn36xx: Add TX ack support") Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1634567281-28997-1-git-send-email-loic.poulain@linaro.org
2021-10-25cfg80211: correct bridge/4addr mode checkJanusz Dziedzic
Without the patch we fail: $ sudo brctl addbr br0 $ sudo brctl addif br0 wlp1s0 $ sudo iw wlp1s0 set 4addr on command failed: Device or resource busy (-16) Last command failed but iface was already in 4addr mode. Fixes: ad4bb6f8883a ("cfg80211: disallow bridging managed/adhoc interfaces") Signed-off-by: Janusz Dziedzic <janusz.dziedzic@gmail.com> Link: https://lore.kernel.org/r/20211024201546.614379-1-janusz.dziedzic@gmail.com [add fixes tag, fix indentation, edit commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-25wcn36xx: Fix (QoS) null data frame bitrate/modulationLoic Poulain
We observe unexpected connection drops with some APs due to non-acked mac80211 generated null data frames (keep-alive). After debugging and capture, we noticed that null frames are submitted at standard data bitrate and that the given APs are in trouble with that. After setting the null frame bitrate to control bitrate, all null frames are acked as expected and connection is maintained. Not sure if it's a requirement of the specification, but it seems the right thing to do anyway, null frames are mostly used for control purpose (power-saving, keep-alive...), and submitting them with a slower/simpler bitrate/modulation is more robust. Cc: stable@vger.kernel.org Fixes: 512b191d9652 ("wcn36xx: Fix TX data path") Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1634560399-15290-1-git-send-email-loic.poulain@linaro.org
2021-10-25cfg80211: fix management registrations lockingJohannes Berg
The management registrations locking was broken, the list was locked for each wdev, but cfg80211_mgmt_registrations_update() iterated it without holding all the correct spinlocks, causing list corruption. Rather than trying to fix it with fine-grained locking, just move the lock to the wiphy/rdev (still need the list on each wdev), we already need to hold the wdev lock to change it, so there's no contention on the lock in any case. This trivially fixes the bug since we hold one wdev's lock already, and now will hold the lock that protects all lists. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Fixes: 6cd536fe62ef ("cfg80211: change internal management frame registration API") Link: https://lore.kernel.org/r/20211025133111.5cf733eab0f4.I7b0abb0494ab712f74e2efcd24bb31ac33f7eee9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-25Merge tag 'wireless-drivers-next-2021-10-25' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.16 Third set of patches for v5.16. This time we have a small one to quickly fix two mt76 build failures I had missed in my previous pull request. Major changes: mt76 * fix linking when CONFIG_MMC is disabled * fix dev_err() format warning * mt7615: mt7622: fix ibss and meshpoint ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25Merge branch 'gve-jumbo-frame'David S. Miller
Jeroen de Borst says: ==================== gve: Add jumbo-frame support for GQ This patchset introduces jumbo-frame support for the GQ queue format. The device already supports jumbo-frames on TX. This introduces multi-descriptor RX packets using a packet continuation bit. A widely deployed driver has a bug with causes it to fail to load when a MTU greater than 2048 bytes is configured. A jumbo-frame device option is introduced to pass a jumbo-frame MTU only to drivers that support it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25gve: Add a jumbo-frame device option.Shailend Chand
A widely deployed driver has a bug that will cause the driver not to load when a max_mtu > 2048 is present in the device descriptor. To avoid this bug while still enabling jumbo frames, we present a lower max_mtu in the device descriptor and pass the actual max_mtu in a separate device option. The driver supports 2 different queue formats. To enable features on one queue format, but not the other, a supported_features mask was added to the device options in the device descriptor. Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25gve: Implement packet continuation for RX.David Awogbemila
This enables the driver to receive RX packets spread across multiple buffers: For a given multi-fragment packet the "packet continuation" bit is set on all descriptors except the last one. These descriptors' payloads are combined into a single SKB before the SKB is handed to the networking stack. This change adds a "packet buffer size" notion for RX queues. The CreateRxQueue AdminQueue command sent to the device now includes the packet_buffer_size. We opt for a packet_buffer_size of PAGE_SIZE / 2 to give the driver the opportunity to flip pages where we can instead of copying. Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25gve: Add RX context.David Awogbemila
This refactor moves the skb_head and skb_tail fields into a new gve_rx_ctx struct. This new struct will contain information about the current packet being processed. This is in preparation for multi-descriptor RX packets. Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block()David Woodhouse
In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before making a final check whether the vCPU should be woken from HLT by any incoming interrupt. This is a problem for the get_user() in __kvm_xen_has_interrupt(), which really shouldn't be sleeping when the task state has already been set. I think it's actually harmless as it would just manifest itself as a spurious wakeup, but it's causing a debug warning: [ 230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80 Fix the warning by turning it into an *explicit* spurious wakeup. When invoked with !task_is_running(current) (and we might as well add in_atomic() there while we're at it), just return 1 to indicate that an IRQ is pending, which will cause a wakeup and then something will call it again in a context that *can* sleep so it can fault the page back in. Cc: stable@vger.kernel.org Fixes: 40da8ccd724f ("KVM: x86/xen: Add event channel interrupt vector upcall") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-25Merge branch 'mlxsw-selftests-updates'David S. Miller
Ido Schimmel says: ==================== selftests: mlxsw: Various updates This patchset contains various updates to mlxsw selftests. Patch #1 replaces open-coded compatibility checks with dedicated helpers. These helpers are used to skip tests when run on incompatible machines. Patch #2 avoids spurious failures in some tests by using permanent neighbours instead of reachable ones. Patch #3 reduces the run time of a test by not iterating over all the available trap policers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25selftests: mlxsw: Reduce test run timeIdo Schimmel
Instead of iterating over all the available trap policers, only perform the tests with three policers: The first, the last and the one in the middle of the range. On a Spectrum-3 system, this reduces the run time from almost an hour to a few minutes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25selftests: mlxsw: Use permanent neighbours instead of reachable onesIdo Schimmel
The nexthop objects tests configure dummy reachable neighbours so that the nexthops will have a MAC address and be programmed to the device. Since these are dummy reachable neighbours, they can be transitioned by the kernel to a failed state if they are around for too long. This can happen, for example, if the "TIMEOUT" variable is configured with a too high value. Make the tests more robust by configuring the neighbours as permanent, so that the tests do not depend on the configured timeout value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25selftests: mlxsw: Add helpers for skipping selftestsPetr Machata
A number of mlxsw-specific selftests currently detect whether they are run on a compatible machine, and bail out silently when not. These tests are however done in a somewhat impenetrable manner by directly comparing PCI IDs against a blacklist or a whitelist, and bailing out silently if the machine is not compatible. Instead, add a helper, mlxsw_only_on_spectrum(), which allows specifying the supported machines in a human-readable manner. If the current machine is incompatible, the helper emits a SKIP message and returns an error code, based on which the caller can gracefully bail out in a suitable way. This allows a more readable conditions such as: mlxsw_only_on_spectrum 2+ || return Convert all existing open-coded guards to the new helper. Also add two new guards to do_mark_test() and do_drop_test(), which are supported only on Spectrum-2+, but the corresponding check was not there. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25Merge tag 'kvm-s390-master-5.15-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for interrupt delivery Two bugs that might result in CPUs not woken up when interrupts are pending.
2021-10-25Merge branch 'ksettings-locking-fixes'David S. Miller
Andrew Lunn says: ==================== ksettings_{get|set} lock fixes Walter Stoll <Walter.Stoll@duagon.com> reported a race condition between "ethtool -s eth0 speed 100 duplex full autoneg off" and phylib reading the current status from the PHY. Both ksetting_get and ksetting_set fail the take the phydev mutex, and as a result, there is a small window of time where the phydev members are not self consistent. Patch 1 fixes phy_ethtool_ksettings_get by adding the needed lock. Patches 2 and 3 move code around and perform to refactoring, to allow patch 4 to fix phy_ethtool_ksettings_set by added the lock. Thanks go to Walter for the detailed origional report, suggested fix, and testing of the proposed patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25phy: phy_ethtool_ksettings_set: Lock the PHY while changing settingsAndrew Lunn
There is a race condition where the PHY state machine can change members of the phydev structure at the same time userspace requests a change via ethtool. To prevent this, have phy_ethtool_ksettings_set take the PHY lock. Fixes: 2d55173e71b0 ("phy: add generic function to support ksetting support") Reported-by: Walter Stoll <Walter.Stoll@duagon.com> Suggested-by: Walter Stoll <Walter.Stoll@duagon.com> Tested-by: Walter Stoll <Walter.Stoll@duagon.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>