summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-10-27Merge tag 'trace-v5.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull nds32 tracing fix from Steven Rostedt: "Fix nds32le build when DYNAMIC_FTRACE is disabled A randconfig found that nds32le architecture fails to build due to a prototype mismatch between a ftrace function pointer and the function it was to be assigned to. That function pointer prototype missed being updated when all the ftrace callbacks were updated" * tag 'trace-v5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/nds32: Update the proto for ftrace_trace_function to match ftrace_stub
2021-10-27Merge tag 'nios2_fixes_for_v5.15_part3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux Pull nios2 fix from Dinh Nguyen: "Fix a build error for allmodconfig" * tag 'nios2_fixes_for_v5.15_part3' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: nios2: Make NIOS2_DTB_SOURCE_BOOL depend on !COMPILE_TEST
2021-10-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Nothing very exciting here, it has been a quiet cycle overall. Usual collection of small bug fixes: - irdma issues with CQ entries, VLAN completions and a mutex deadlock - Incorrect DCT packets in mlx5 - Userspace triggered overflows in qib - Locking error in hfi - Typo in errno value in qib/hfi1 - Double free in qedr - Leak of random kernel memory to userspace with a netlink callback" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a string RDMA/irdma: Do not hold qos mutex twice on QP resume RDMA/irdma: Set VLAN in UD work completion correctly RDMA/mlx5: Initialize the ODP xarray when creating an ODP MR rdma/qedr: Fix crash due to redundant release of device's qp memory RDMA/rdmavt: Fix error code in rvt_create_qp() IB/hfi1: Fix abba locking issue with sc_disable() IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fields RDMA/mlx5: Set user priority for DCT RDMA/irdma: Process extended CQ entries correctly
2021-10-27ftrace/nds32: Update the proto for ftrace_trace_function to match ftrace_stubSteven Rostedt (VMware)
The ftrace callback prototype was changed to pass a special ftrace_regs instead of pt_regs as the last parameter, but the static ftrace for nds32 missed updating ftrace_trace_function and this caused a warning when compared to ftrace_stub: ../arch/nds32/kernel/ftrace.c: In function '_mcount': ../arch/nds32/kernel/ftrace.c:24:35: error: comparison of distinct pointer types lacks a cast [-Werror] 24 | if (ftrace_trace_function != ftrace_stub) | ^~ Link: https://lore.kernel.org/all/20211027055554.19372-1-rdunlap@infradead.org/ Link: https://lkml.kernel.org/r/20211027125101.33449969@gandalf.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Fixes: d19ad0775dcd6 ("ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker togetherRobin H. Johnson
Running endpoint security solutions like Sentinel1 that use perf-based tracing heavily lead to this repeated dump complaining about dockerd. The default value of 2048 is nowhere near not large enough. Using the prior patch "tracing: show size of requested buffer", we get "perf buffer not large enough, wanted 6644, have 6144", after repeated up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all, so below is the trace for 6K. I'm wondering if this value should be selectable at boot time, but this is a good starting point. ``` ------------[ cut here ]------------ perf buffer not large enough, wanted 6644, have 6144 WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 perf_trace_buf_alloc+0x8c/0xa0 Modules linked in: [..] CPU: 1 PID: 4997 Comm: sh Tainted: G T 5.13.13-x86_64-00039-gb3959163488e #63 Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019 RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0 Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89 RSP: 0018:ffffb922026b7d58 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9da5ee012000 RCX: 0000000000000027 RDX: ffff9da881657828 RSI: 0000000000000001 RDI: ffff9da881657820 RBP: 00000000000019f4 R08: 0000000000000000 R09: ffffb922026b7b80 R10: ffffb922026b7b78 R11: ffffffff91dda688 R12: 000000000000000f R13: ffff9da5ee012108 R14: ffff9da8816570a0 R15: ffffb922026b7e30 FS: 00007f420db1a080(0000) GS:ffff9da881640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000060 CR3: 00000002504a8006 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: kprobe_perf_func+0x11e/0x270 ? do_execveat_common.isra.0+0x1/0x1c0 ? do_execveat_common.isra.0+0x5/0x1c0 kprobe_ftrace_handler+0x10e/0x1d0 0xffffffffc03aa0c8 ? do_execveat_common.isra.0+0x1/0x1c0 do_execveat_common.isra.0+0x5/0x1c0 __x64_sys_execve+0x33/0x40 do_syscall_64+0x6b/0xc0 ? do_syscall_64+0x11/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f420dc1db37 Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd4e8b4e38 EFLAGS: 00000246 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f420dc1db37 RDX: 0000564338d1e740 RSI: 0000564338d32d50 RDI: 0000564338d28f00 RBP: 0000564338d28f00 R08: 0000564338d32d50 R09: 0000000000000020 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000564338d28f00 R13: 0000564338d32d50 R14: 0000564338d1e740 R15: 0000564338d28c60 ---[ end trace 83ab3e8e16275e49 ]--- ``` Link: https://lkml.kernel.org/r/20210831043723.13481-2-robbat2@gentoo.org Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27tracing: Show size of requested perf bufferRobin H. Johnson
If the perf buffer isn't large enough, provide a hint about how large it needs to be for whatever is running. Link: https://lkml.kernel.org/r/20210831043723.13481-1-robbat2@gentoo.org Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27bootconfig: Initialize ret in xbc_parse_tree()Steven Rostedt (VMware)
The do while loop continues while ret is zero, but ret is never initialized. The check for ret in the loop at the while should always be initialized, but if an empty string were to be passed in, q would be NULL and p would be '\0', and it would break out of the loop without ever setting ret. Set ret to zero, and then xbc_verify_tree() would be called and catch that it is an empty tree and report the proper error. Link: https://lkml.kernel.org/r/20211027105753.6ab9da5f@gandalf.local.home Fixes: bdac5c2b243f ("bootconfig: Allocate xbc_data inside xbc_init()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27ftrace: do CPU checking after preemption disabled王贇
With CONFIG_DEBUG_PREEMPT we observed reports like: BUG: using smp_processor_id() in preemptible caller is perf_ftrace_function_call+0x6f/0x2e0 CPU: 1 PID: 680 Comm: a.out Not tainted Call Trace: <TASK> dump_stack_lvl+0x8d/0xcf check_preemption_disabled+0x104/0x110 ? optimize_nops.isra.7+0x230/0x230 ? text_poke_bp_batch+0x9f/0x310 perf_ftrace_function_call+0x6f/0x2e0 ... __text_poke+0x5/0x620 text_poke_bp_batch+0x9f/0x310 This telling us the CPU could be changed after task is preempted, and the checking on CPU before preemption will be invalid. Since now ftrace_test_recursion_trylock() will help to disable the preemption, this patch just do the checking after trylock() to address the issue. Link: https://lkml.kernel.org/r/54880691-5fe2-33e7-d12f-1fa6136f5183@linux.alibaba.com CC: Steven Rostedt <rostedt@goodmis.org> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27ftrace: disable preemption when recursion locked王贇
As the documentation explained, ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() were supposed to disable and enable preemption properly, however currently this work is done outside of the function, which could be missing by mistake. And since the internal using of trace_test_and_set_recursion() and trace_clear_recursion() also require preemption disabled, we can just merge the logical. This patch will make sure the preemption has been disabled when trace_test_and_set_recursion() return bit >= 0, and trace_clear_recursion() will enable the preemption if previously enabled. Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com CC: Petr Mladek <pmladek@suse.com> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: Miroslav Benes <mbenes@suse.cz> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> [ Removed extra line in comment - SDR ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27Merge tag 'mac80211-for-net-2021-10-27' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two fixes: * bridge vs. 4-addr mode check was wrong * management frame registrations locking was wrong, causing list corruption/crashes ==================== Link: https://lore.kernel.org/r/20211027143756.91711-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27KVM: SEV-ES: fix another issue with string I/O VMGEXITsPaolo Bonzini
If the guest requests string I/O from the hypervisor via VMGEXIT, SW_EXITINFO2 will contain the REP count. However, sev_es_string_io was incorrectly treating it as the size of the GHCB buffer in bytes. This fixes the "outsw" test in the experimental SEV tests of kvm-unit-tests. Cc: stable@vger.kernel.org Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest") Reported-by: Marc Orr <marcorr@google.com> Tested-by: Marc Orr <marcorr@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-27block: re-flow blk_mq_rq_ctx_init()Jens Axboe
Now that we have flags passed in, we can do a final re-arrange of the flow of blk_mq_rq_ctx_init() so we're always writing request in the order in which it is laid out. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-5-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: prefetch request to be initializedJens Axboe
Now we have the tags available in __blk_mq_alloc_requests_batch(), we can start fetching the first request cacheline before calling into the request initialization. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-4-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: pass in blk_mq_tags to blk_mq_rq_ctx_init()Jens Axboe
Instead of getting this from data for every invocation of request initialization, pass it in as an argument instead. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: add rq_flags to struct blk_mq_alloc_dataJens Axboe
There's a hole here we can use, and it's faster to set this earlier rather than need to check q->elevator multiple times. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-2-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27nios2: Make NIOS2_DTB_SOURCE_BOOL depend on !COMPILE_TESTGuenter Roeck
nios2:allmodconfig builds fail with make[1]: *** No rule to make target 'arch/nios2/boot/dts/""', needed by 'arch/nios2/boot/dts/built-in.a'. Stop. make: [Makefile:1868: arch/nios2/boot/dts] Error 2 (ignored) This is seen with compile tests since those enable NIOS2_DTB_SOURCE_BOOL, which in turn enables NIOS2_DTB_SOURCE. This causes the build error because the default value for NIOS2_DTB_SOURCE is an empty string. Disable NIOS2_DTB_SOURCE_BOOL for compile tests to avoid the error. Fixes: 2fc8483fdcde ("nios2: Build infrastructure") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2021-10-27Merge branch 'hns3-fixes'David S. Miller
Guangbin Huang says: ==================== net: hns3: add some fixes for -net This series adds some fixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: adjust string spaces of some parameters of tx bd info in debugfsGuangbin Huang
This patch adjusts the string spaces of some parameters of tx bd info in debugfs according to their maximum needs. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: expand buffer len for some debugfs commandGuangbin Huang
The specified buffer length for three debugfs files fd_tcam, uc and tqp is not enough for their maximum needs, so this patch fixes them. Fixes: b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs") Fixes: 1556ea9120ff ("net: hns3: refactor dump mac list of debugfs") Fixes: d96b0e59468d ("net: hns3: refactor dump reg of debugfs") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: add more string spaces for dumping packets number of queue info ↵Jie Wang
in debugfs As the width of packets number registers is 32 bits, they needs at most 10 characters for decimal data printing, but now the string spaces is not enough, so this patch fixes it. Fixes: e44c495d95e ("net: hns3: refactor queue info of debugfs") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: fix data endian problem of some functions of debugfsJie Wang
The member data in struct hclge_desc is type of __le32, it needs endian conversion before using it, and some functions of debugfs didn't do that, so this patch fixes it. Fixes: c0ebebb9ccc1 ("net: hns3: Add "dcb register" status information query function") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: ignore reset event before initialization process is doneGuangbin Huang
Currently, if there is a reset event triggered by RAS during device in initialization process, driver may run reset process concurrently with initialization process. In this case, it may cause problem. For example, the RSS indirection table may has not been alloc memory in initialization process yet, but it is used in reset process, it will cause a call trace like this: [61228.744836] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 ... [61228.897677] Workqueue: hclgevf hclgevf_service_task [hclgevf] [61228.911390] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--) [61228.918670] pc : hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf] [61228.927812] lr : hclgevf_set_rss_indir_table+0x90/0x190 [hclgevf] [61228.937248] sp : ffff8000162ebb50 [61228.941087] x29: ffff8000162ebb50 x28: ffffb77add72dbc0 x27: ffff0820c7dc8080 [61228.949516] x26: 0000000000000000 x25: ffff0820ad4fc880 x24: ffff0820c7dc8080 [61228.958220] x23: ffff0820c7dc8090 x22: 00000000ffffffff x21: 0000000000000040 [61228.966360] x20: ffffb77add72b9c0 x19: 0000000000000000 x18: 0000000000000030 [61228.974646] x17: 0000000000000000 x16: ffffb77ae713feb0 x15: ffff0820ad4fcce8 [61228.982808] x14: ffffffffffffffff x13: ffff8000962eb7f7 x12: 00003834ec70c960 [61228.991990] x11: 00e0fafa8c206982 x10: 9670facc78a8f9a8 x9 : ffffb77add717530 [61229.001123] x8 : ffff0820ad4fd6b8 x7 : 0000000000000000 x6 : 0000000000000011 [61229.010249] x5 : 00000000000cb1b0 x4 : 0000000000002adb x3 : 0000000000000049 [61229.018662] x2 : ffff8000162ebbb8 x1 : 0000000000000000 x0 : 0000000000000480 [61229.027002] Call trace: [61229.030177] hclgevf_set_rss_indir_table+0xb4/0x190 [hclgevf] [61229.039009] hclgevf_rss_init_hw+0x128/0x1b4 [hclgevf] [61229.046809] hclgevf_reset_rebuild+0x17c/0x69c [hclgevf] [61229.053862] hclgevf_reset_service_task+0x4cc/0xa80 [hclgevf] [61229.061306] hclgevf_service_task+0x6c/0x630 [hclgevf] [61229.068491] process_one_work+0x1dc/0x48c [61229.074121] worker_thread+0x15c/0x464 [61229.078562] kthread+0x168/0x16c [61229.082873] ret_from_fork+0x10/0x18 [61229.088221] Code: 7900e7f6 f904a683 d503201f 9101a3e2 (38616b43) [61229.095357] ---[ end trace 153661a538f6768c ]--- To fix this problem, don't schedule reset task before initialization process is done. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: change hclge/hclgevf workqueue to WQ_UNBOUND modeYufeng Mo
Currently, the workqueue of hclge/hclgevf is executed on the CPU that initiates scheduling requests by default. In stress scenarios, the CPU may be busy and workqueue scheduling is completed after a long period of time. To avoid this situation and implement proper scheduling, use the WQ_UNBOUND mode instead. In this way, the workqueue can be performed on a relatively idle CPU. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27net: hns3: fix pause config problem after autoneg disabledGuangbin Huang
If a TP port is configured by follow steps: 1.ethtool -s ethx autoneg off speed 100 duplex full 2.ethtool -A ethx rx on tx on 3.ethtool -s ethx autoneg on(rx&tx negotiated pause results are off) 4.ethtool -s ethx autoneg off speed 100 duplex full In step 3, driver will set rx&tx pause parameters of hardware to off as pause parameters negotiated with link partner are off. After step 4, the "ethtool -a ethx" command shows both rx and tx pause parameters are on. However, pause parameters of hardware are still off and port has no flow control function actually. To fix this problem, if autoneg is disabled, driver uses its saved parameters to restore pause of hardware. If the speed is not changed in this case, there is no link state changed for phy, it will cause the pause parameter is not taken effect, so we need to force phy to go down and up. Fixes: aacbe27e82f0 ("net: hns3: modify how pause options is displayed") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27block: Fix partition check for host-aware zoned block devicesShin'ichiro Kawasaki
Commit a33df75c6328 ("block: use an xarray for disk->part_tbl") modified the method to check partition existence in host-aware zoned block devices from disk_has_partitions() helper function call to empty check of xarray disk->part_tbl. However, disk->part_tbl always has single entry for disk->part0 and never becomes empty. This resulted in the host-aware zoned devices always judged to have partitions, and it made the sysfs queue/zoned attribute to be "none" instead of "host-aware" regardless of partition existence in the devices. This also caused DEBUG_LOCKS_WARN_ON(lock->magic != lock) for sdkp->rev_mutex in scsi layer when the kernel detects host-aware zoned device. Since block layer handled the host-aware zoned devices as non- zoned devices, scsi layer did not have chance to initialize the mutex for zone revalidation. Therefore, the warning was triggered. To fix the issues, call the helper function disk_has_partitions() in place of disk->part_tbl empty check. Since the function was removed with the commit a33df75c6328, reimplement it to walk through entries in the xarray disk->part_tbl. Fixes: a33df75c6328 ("block: use an xarray for disk->part_tbl") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.14+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211026060115.753746-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: add async version of bio_set_polledPavel Begunkov
If we know that a iocb is async we can optimise bio_set_polled() a bit, add a new helper bio_set_polled_async(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8fa137885164a5d05fadcff4c3521da8d5a83d00.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: kill DIO_MULTI_BIOPavel Begunkov
Now __blkdev_direct_IO() serves only multi-bio I/O, thus remove not used anymore single bio refcounting optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/88eb488aae9ed4852a30f3a7132f296f56e43b80.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: kill unused polling bits in __blkdev_direct_IO()Pavel Begunkov
With addition of __blkdev_direct_IO_async(), __blkdev_direct_IO() now serves only multio-bio I/O, which we don't poll. Now we can remove anything related to I/O polling from it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8c597a6b7ee612df394853bfd24726aee5b898e.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27block: avoid extra iter advance with async iocbPavel Begunkov
Nobody cares about iov iterators state if we return -EIOCBQUEUED, so as the we now have __blkdev_direct_IO_async(), which gets pages only once, we can skip expensive iov_iter_advance(). It's around 1-2% of all CPU spent. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6158edfbfa2ae3bc24aed29a72f035df18fad2f.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-27spi: fsi: Fix contention in the FSI2SPI engineEddie James
There was nothing to protect multiple SPI controllers on the same FSI2SPI device from being accessed through the FSI2SPI device at the same time. For example, multiple writes to the command and data registers might occur for different SPI controllers, resulting in complete chaos in the SPI engine. To prevent this, add a FSI2SPI device level mutex and lock it in the SPI register read and write functions. Fixes: bbb6b2f9865b ("spi: Add FSI-attached SPI controller driver") Signed-off-by: Eddie James <eajames@linux.ibm.com> Link: https://lore.kernel.org/r/20211026193327.52420-1-eajames@linux.ibm.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-10-27Revert "btrfs: compression: drop kmap/kunmap from generic helpers"David Sterba
This reverts commit 4c2bf276b56d8d27ddbafcdf056ef3fc60ae50b0. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-27nvmet-tcp: fix header digest verificationAmit Engel
Pass the correct length to nvmet_tcp_verify_hdgst, which is the pdu header length. This fixes a wrong behaviour where header digest verification passes although the digest is wrong. Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvmet: use flex_array_size and struct_sizeLen Baker
In an effort to avoid open-coded arithmetic in the kernel [1], use the flex_array_size() and struct_size() helpers instead of an open-coded calculation. [1] https://github.com/KSPP/linux/issues/160 Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvmet: register discovery subsystem as 'current'Hannes Reinecke
Register the discovery subsystem as the 'current' discovery subsystem, and add a new discovery log page entry for it. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvmet: switch check for subsystem typeHannes Reinecke
Invert the check for discovery subsystem type to allow for additional discovery subsystem types. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvme: add new discovery log page entry definitionsHannes Reinecke
TP8014 adds a new SUBTYPE value and a new field EFLAGS for the discovery log page entry. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvmet-tcp: fix data digest pointer calculationVarun Prakash
exp_ddgst is of type __le32, &cmd->exp_ddgst + cmd->offset increases &cmd->exp_ddgst by 4 * cmd->offset, fix this by type casting &cmd->exp_ddgst to u8 *. Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvme-tcp: fix data digest pointer calculationVarun Prakash
ddgst is of type __le32, &req->ddgst + req->offset increases &req->ddgst by 4 * req->offset, fix this by type casting &req->ddgst to u8 *. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27nvme-tcp: fix possible req->offset corruptionVarun Prakash
With commit db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context") r2t and response PDU can get processed while send function is executing. Current data digest send code uses req->offset after kernel_sendmsg(), this creates a race condition where req->offset gets reset before it is used in send function. This can happen in two cases - 1. Target sends r2t PDU which resets req->offset. 2. Target send response PDU which completes the req and then req is used for a new command, nvme_tcp_setup_cmd_pdu() resets req->offset. Fix this by storing req->offset in a local variable and using this local variable after kernel_sendmsg(). Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context") Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-26doc: Fix typo in request queue sysfs documentationDamien Le Moal
Fix a typo (are -> as) in the introduction paragraph of Documentation/block/queue-sysfs.rst. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-6-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26doc: document sysfs queue/independent_access_ranges attributesDamien Le Moal
Update the file Documentation/block/queue-sysfs.rst to add a description of a device queue sysfs entries related to independent access ranges (e.g. concurrent positioning ranges for multi-actuator hard-disks). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-5-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26libata: support concurrent positioning ranges logDamien Le Moal
Add support to discover if an ATA device supports the Concurrent Positioning Ranges data log (address 0x47), indicating that the device is capable of seeking to multiple different locations in parallel using multiple actuators serving different LBA ranges. Also add support to translate the concurrent positioning ranges log into its equivalent Concurrent Positioning Ranges VPD page B9h in libata-scsi.c. The format of the Concurrent Positioning Ranges Log is defined in ACS-5 r9. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-4-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26scsi: sd: add concurrent positioning ranges supportDamien Le Moal
Add the sd_read_cpr() function to the sd scsi disk driver to discover if a device has multiple concurrent positioning ranges (i.e. multiple actuators on an HDD). The existence of VPD page B9h indicates if a device has multiple concurrent positioning ranges. The page content describes each range supported by the device. sd_read_cpr() is called from sd_revalidate_disk() and uses the block layer functions disk_alloc_independent_access_ranges() and disk_set_independent_access_ranges() to represent the set of actuators of the device as independent access ranges. The format of the Concurrent Positioning Ranges VPD page B9h is defined in section 6.6.6 of SBC-5. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-3-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26block: Add independent access ranges supportDamien Le Moal
The Concurrent Positioning Ranges VPD page (for SCSI) and data log page (for ATA) contain parameters describing the set of contiguous LBAs that can be served independently by a single LUN multi-actuator hard-disk. Similarly, a logically defined block device composed of multiple disks can in some cases execute requests directed at different sector ranges in parallel. A dm-linear device aggregating 2 block devices together is an example. This patch implements support for exposing a block device independent access ranges to the user through sysfs to allow optimizing device accesses to increase performance. To describe the set of independent sector ranges of a device (actuators of a multi-actuator HDDs or table entries of a dm-linear device), The type struct blk_independent_access_ranges is introduced. This structure describes the sector ranges using an array of struct blk_independent_access_range structures. This range structure defines the start sector and number of sectors of the access range. The ranges in the array cannot overlap and must contain all sectors within the device capacity. The function disk_set_independent_access_ranges() allows a device driver to signal to the block layer that a device has multiple independent access ranges. In this case, a struct blk_independent_access_ranges is attached to the device request queue by the function disk_set_independent_access_ranges(). The function disk_alloc_independent_access_ranges() is provided for drivers to allocate this structure. struct blk_independent_access_ranges contains kobjects (struct kobject) to expose to the user through sysfs the set of independent access ranges supported by a device. When the device is initialized, sysfs registration of the ranges information is done from blk_register_queue() using the block layer internal function disk_register_independent_access_ranges(). If a driver calls disk_set_independent_access_ranges() for a registered queue, e.g. when a device is revalidated, disk_set_independent_access_ranges() will execute disk_register_independent_access_ranges() to update the sysfs attribute files. The sysfs file structure created starts from the independent_access_ranges sub-directory and contains the start sector and number of sectors of each range, with the information for each range grouped in numbered sub-directories. E.g. for a dual actuator HDD, the user sees: $ tree /sys/block/sdk/queue/independent_access_ranges/ /sys/block/sdk/queue/independent_access_ranges/ |-- 0 | |-- nr_sectors | `-- sector `-- 1 |-- nr_sectors `-- sector For a regular device with a single access range, the independent_access_ranges sysfs directory does not exist. Device revalidation may lead to changes to this structure and to the attribute values. When manipulated, the queue sysfs_lock and sysfs_dir_lock mutexes are held for atomicity, similarly to how the blk-mq and elevator sysfs queue sub-directories are protected. The code related to the management of independent access ranges is added in the new file block/blk-ia-ranges.c. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-2-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26tracing/histogram: Document expression arithmetic and constantsKalesh Singh
Histogram expressions now support division, and multiplication in addition to the already supported subtraction and addition operators. Numeric constants can also be used in a hist trigger expressions or assigned to a variable and used by refernce in an expression. Link: https://lkml.kernel.org/r/20211025200852.3002369-9-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Optimize division by a power of 2Kalesh Singh
The division is a slow operation. If the divisor is a power of 2, use a shift instead. Results were obtained using Android's version of perf (simpleperf[1]) as described below: 1. hist_field_div() is modified to call 2 test functions: test_hist_field_div_[not]_optimized(); passing them the same args. Use noinline and volatile to ensure these are not optimized out by the compiler. 2. Create a hist event trigger that uses division: events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>' >> trigger events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x' >> trigger 3. Run Android's lmkd_test[2] to generate rss_stat events, and record CPU samples with Android's simpleperf: simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g -f 2000 -o perf.data == Results == Divisor is a power of 2 (divisor == 32): test_hist_field_div_not_optimized | 8,717,091 cpu-cycles test_hist_field_div_optimized | 1,643,137 cpu-cycles If the divisor is a power of 2, the optimized version is ~5.3x faster. Divisor is not a power of 2 (divisor == 33): test_hist_field_div_not_optimized | 4,444,324 cpu-cycles test_hist_field_div_optimized | 5,497,958 cpu-cycles If the divisor is not a power of 2, as expected, the optimized version is slightly slower (~24% slower). [1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md [2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Covert expr to const if both operands are constantsKalesh Singh
If both operands of a hist trigger expression are constants, convert the expression to a constant. This optimization avoids having to perform the same calculation multiple times and also saves on memory since the merged constants are represented by a single struct hist_field instead or multiple. Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Simplify handling of .sym-offset in expressionsKalesh Singh
The '-' in .sym-offset can confuse the hist trigger arithmetic expression parsing. Simplify the handling of this by replacing the 'sym-offset' with 'symXoffset'. This allows us to correctly evaluate expressions where the user may have inadvertently added a .sym-offset modifier to one of the operands in an expression, instead of bailing out. In this case the .sym-offset has no effect on the evaluation of the expression. The only valid use of the .sym-offset is as a hist key modifier. Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Fix operator precedence for hist triggers expressionKalesh Singh
The current histogram expression evaluation logic evaluates the expression from right to left. This can lead to incorrect results if the operations are not associative (as is the case for subtraction and, the now added, division operators). e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2 64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2 Division and multiplication are currently limited to single operation expression due to operator precedence support not yet implemented. Rework the expression parsing to support the correct evaluation of expressions containing operators of different precedences; and fix the associativity error by evaluating expressions with operators of the same precedence from left to right. Examples: (1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \ >> event/trigger (2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger (3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger (4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Add division and multiplication support for hist triggersKalesh Singh
Adds basic support for division and multiplication operations for hist trigger variable expressions. For simplicity this patch only supports, division and multiplication for a single operation expression (e.g. x=$a/$b), as currently expressions are always evaluated right to left. This can lead to some incorrect results: e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger 8-4-2 should evaluate to 2 i.e. (8-4)-2 but currently x evaluate to 6 i.e. 8-(4-2) Multiplication and division in sub-expressions will work correctly, once correct operator precedence support is added (See next patch in this series). For the undefined case of division by 0, the histogram expression evaluates to (u64)(-1). Since this cannot be detected when the expression is created, it is the responsibility of the user to be aware and account for this possibility. Examples: echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \ >> event/trigger echo 'hist:keys=common_pid:y=5*$b' \ >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>