summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-12-11IB/IPoIB: Fix queue count inconsistency for PKEY child interfacesDragos Tatulea
There are 2 ways to create IPoIB PKEY child interfaces: 1) Writing a PKEY to /sys/class/net/<ib parent interface>/create_child. 2) Using netlink with iproute. While with sysfs the child interface has the same number of tx and rx queues as the parent, with netlink there will always be 1 tx and 1 rx queue for the child interface. That's because the get_num_tx/rx_queues() netlink ops are missing and the default value of 1 is taken for the number of queues (in rtnl_create_link()). This change adds the get_num_tx/rx_queues() ops which allows for interfaces with multiple queues to be created over netlink. This constant only represents the max number of tx and rx queues on that net device. Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/f4a42c8aa43c02d5ae5559a60c3e5e0f18c82531.1670485816.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-11RDMA: Add missed netdev_put() for the netdevice_trackerJason Gunthorpe
The netdev core will detect if any untracked puts are done on tracked pointers and throw refcount warnings: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 1 PID: 33 at lib/refcount.c:31 refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 33 Comm: kworker/u4:2 Not tainted 6.1.0-rc8-next-20221207-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: ib-unreg-wq ib_unregister_work RIP: 0010:refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31 Code: 05 5a 60 51 0a 01 e8 35 0a b5 05 0f 0b e9 d3 fe ff ff e8 6c 9b 75 fd 48 c7 c7 c0 6d a6 8a c6 05 37 60 51 0a 01 e8 16 0a b5 05 <0f> 0b e9 b4 fe +ff ff 48 89 ef e8 5a b5 c3 fd e9 5c fe ff ff 0f 1f RSP: 0018:ffffc90000aa7b30 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8880172f9d40 RSI: ffffffff8166b1dc RDI: fffff52000154f58 RBP: ffff88807906c600 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000080000001 R11: 0000000000000000 R12: 1ffff92000154f6b R13: 0000000000000000 R14: ffff88807906c600 R15: ffff888046894000 FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe350a8ff8 CR3: 000000007a9e7000 CR4: 00000000003526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] ref_tracker_free+0x539/0x6b0 lib/ref_tracker.c:118 netdev_tracker_free include/linux/netdevice.h:4039 [inline] netdev_put include/linux/netdevice.h:4056 [inline] dev_put include/linux/netdevice.h:4082 [inline] free_netdevs+0x1f8/0x470 drivers/infiniband/core/device.c:2204 __ib_unregister_device+0xa0/0x1a0 drivers/infiniband/core/device.c:1478 ib_unregister_work+0x19/0x30 drivers/infiniband/core/device.c:1586 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 So change the missed dev_put for pdata->netdev to also follow the tracker. Fixes: 09f530f0c6d6 ("RDMA: Add netdevice_tracker to ib_device_set_netdev()") Reported-by: syzbot+3fd8326d9a0812d19218@syzkaller.appspotmail.com Reported-by: syzbot+a1ed8ffe3121380cd5dd@syzkaller.appspotmail.com Reported-by: syzbot+8d0a099c8a6d1e4e601c@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-e99919867b8d+1e2-netdev_tracker2_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-09RDMA/rxe: Enable RDMA FLUSH capability for rxe deviceLi Zhijian
Now we are ready to enable RDMA FLUSH capability for RXE. It can support Global Visibility and Persistence placement types. Link: https://lore.kernel.org/r/20221206130201.30986-11-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/cm: Make QP FLUSHABLE for supported deviceLi Zhijian
Similar to RDMA and Atomic qp attributes enabled by default in CM, enable FLUSH attribute for supported device. That makes applications that are built with rdma_create_ep, rdma_accept APIs have FLUSH qp attribute natively so that user is able to request FLUSH operation simpler. Note that, a FLUSH operation requires FLUSH are supported by both device(HCA) and memory region(MR) and QP at the same time, so it's safe to enable FLUSH qp attribute by default here. FLUSH attribute can be disable by modify_qp() interface. Link: https://lore.kernel.org/r/20221206130201.30986-10-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Implement flush completionLi Zhijian
Per IBA SPEC, FLUSH will ack in rdma read response with 0 length. Use IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) code to tell userspace a FLUSH completion. Link: https://lore.kernel.org/r/20221206130201.30986-9-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Implement flush execution in responder sideLi Zhijian
Only the requested placement types that also registered in the destination memory region are acceptable. Otherwise, responder will also reply NAK "Remote Access Error" if it found a placement type violation. We will persist data via arch_wb_cache_pmem(), which could be architecture specific. This commit also adds 2 helpers to update qp.resp from the incoming packet. Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Implement RC RDMA FLUSH service in requester sideLi Zhijian
Implement FLUSH request operation in the requester. Link: https://lore.kernel.org/r/20221206130201.30986-7-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Extend rxe packet format to support flushLi Zhijian
Extend rxe opcode tables, headers, helper and constants to support flush operations. Refer to the IBA A19.4.1 for more FETH definition details Link: https://lore.kernel.org/r/20221206130201.30986-6-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Allow registering persistent flag for pmem MR onlyLi Zhijian
Memory region could support at most 2 flush access flags: IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL But we only allow user to register persistent flush flags to the pmem MR where it has the ability of persisting data across power cycles. So registering a persistent access flag to a non-pmem MR will be rejected. Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com CC: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Extend rxe user ABI to support flushLi Zhijian
This commit extends the rxe user ABI to support the flush operation defined in IBA A19.4.1. These changes are backward compatible with the existing rxe user ABI. The user API request a flush by filling this structure. Link: https://lore.kernel.org/r/20221206130201.30986-4-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA: Extend RDMA kernel verbs ABI to support flushLi Zhijian
This commit extends the RDMA kernel verbs ABI to support the flush operation defined in IBA A19.4.1. These changes are backward compatible with the existing RDMA kernel verbs ABI. It makes device/HCA support new FLUSH attributes/capabilities, and it also makes memory region support new FLUSH access flags. Users can use ibv_reg_mr(3) to register flush access flags. Only the access flags also supported by device's capabilities can be registered successfully. Once registered successfully, it means the MR is flushable. Similarly, A flushable MR should also have one or both of GLOBAL_VISIBILITY and PERSISTENT attributes/capabilities like device/HCA. Link: https://lore.kernel.org/r/20221206130201.30986-3-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA: Extend RDMA user ABI to support flushLi Zhijian
This commit extends the RDMA user ABI to support the flush operation defined in IBA A19.4.1. These changes are backward compatible with the existing RDMA user ABI. Link: https://lore.kernel.org/r/20221206130201.30986-2-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Fix incorrect responder length checkingBob Pearson
The code in rxe_resp.c at check_length() is incorrect as it compares pkt->opcode an 8 bit value against various mask bits which are all higher than 256 so nothing is ever reported. This patch rewrites this to compare against pkt->mask which is correct. However this now exposes another error. For UD send packets the value of the pmtu cannot be determined from qp->mtu. All that is required here is to later check if the payload fits into the posted receive buffer in that case. Fixes: 837a55847ead ("RDMA/rxe: Implement packet length validation on responder") Link: https://lore.kernel.org/r/20221208210945.28607-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Fix oops with zero length readsDaisuke Matsuda
The commit 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") causes a kernel crash. If responder get a zero-byte RDMA Read request, qp->resp.mr is not set in check_rkey() (see IBA C9-88). The mr is NULL in this case, and a NULL pointer dereference occurs as shown below. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe] Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246 RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000 RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000 RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000 FS: 00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0 Call Trace: <IRQ> read_reply+0xda/0x310 [rdma_rxe] rxe_responder+0x82d/0xe50 [rdma_rxe] do_task+0x84/0x170 [rdma_rxe] tasklet_action_common.constprop.0+0xa7/0x120 __do_softirq+0xcb/0x2ac do_softirq+0x63/0x90 </IRQ> Support a NULL mr during read_reply() Fixes: 686d348476ee ("RDMA/rxe: Remove unnecessary mr testing") Fixes: b5f9a01fae42 ("RDMA/rxe: Fix mr leak in RESPST_ERR_RNR") Link: https://lore.kernel.org/r/20221209045926.531689-1-matsuda-daisuke@fujitsu.com Link: https://lore.kernel.org/r/20221202145713.13152-1-lizhijian@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09Merge tag 'v6.1-rc8' into rdma.git for-nextJason Gunthorpe
For dependencies in following patches Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-08RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB defineZhu Yanjun
IB_FLOW_SPEC_IB is not used in mlx5 and can be deleted. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20221208101954.687960-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-07RDMA/hns: Fix XRC caps on HIP08Chengchang Tang
XRC caps has been set by default. But in fact, XRC is not supported in HIP08. Fixes: 32548870d438 ("RDMA/hns: Add support for XRC on HIP09") Link: https://lore.kernel.org/r/20221126102911.2921820-7-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix error code of CMDChengchang Tang
The error code is fixed to EIO when CMD fails to excute. This patch converts the error status reported by firmware to linux errno. Fixes: a04ff739f2a9 ("RDMA/hns: Add command queue support for hip08 RoCE driver") Link: https://lore.kernel.org/r/20221126102911.2921820-6-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix page size cap from firmwareChengchang Tang
Add verification to make sure the roce page size cap is supported by the system page size. Fixes: ba6bb7e97421 ("RDMA/hns: Add interfaces to get pf capabilities from firmware") Link: https://lore.kernel.org/r/20221126102911.2921820-5-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix PBL page MTR findChengchang Tang
Now, The address of the first two pages in the MR will be searched, which use to speed up the lookup of the pbl table for hardware. An exception will occur when there is only one page in this MR. This patch fix the number of page to search. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Link: https://lore.kernel.org/r/20221126102911.2921820-4-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix AH attr queried by query_qpChengchang Tang
The queried AH attr is invalid. This patch fix it. This problem is found by rdma-core test test_mr_rereg_pd ERROR: test_mr_rereg_pd (tests.test_mr.MRTest) Test that cover rereg MR's PD with this flow: ---------------------------------------------------------------------- Traceback (most recent call last): File "./tests/test_mr.py", line 157, in test_mr_rereg_pd self.restate_qps() File "./tests/test_mr.py", line 113, in restate_qps self.server.qp.to_rts(self.server_qp_attr) File "qp.pyx", line 1137, in pyverbs.qp.QP.to_rts File "qp.pyx", line 1123, in pyverbs.qp.QP.to_rtr pyverbs.pyverbs_error.PyverbsRDMAError: Failed to modify QP state to RTR. Errno: 22, Invalid argument Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/20221126102911.2921820-3-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hns: Fix the gid problem caused by free mrYixing Liu
After the hns roce driver is loaded, if you modify the mac address of the network port, the following error will appear: __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fe22:abb5 error=-28 hns3 0000:7d:00.0 hns_0: attr path_mtu(1) invalid while modify qp The reason for the error is that the gid being occupied will cause the failure to modify the gid. The gid is occupied by the loopback QP used by free mr. When the mac address is modified, the gid will change. If there is a busy QP at this time, the gid will not be released and the modification will fail. The QP of free mr is created using the ib interface. The ib interface will add a reference count to the gid, resulting in this error scenario. Considering that free mr is solving a bug in HIP08, not an actual business, it is not necessary to use ib interfaces. Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Link: https://lore.kernel.org/r/20221126102911.2921820-2-xuhaoyue1@hisilicon.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/cma: Change RoCE packet life time from 18 to 16Chao Leng
The ack timeout retransmission time is affected by the following two factors: one is packet life time, another is the HCA processing time. Now the default packet lifetime(CMA_IBOE_PACKET_LIFETIME) is 18. That means the minimum ack timeout is 2 seconds (2^(18+1)*4us=2.097seconds). The packet lifetime means the maximum transmission time of packets on the network, 2 seconds is too long. Assume the network is a clos topology with three layers, every packet will pass through five hops of switches. Assume the buffer of every switch is 128MB and the port transmission rate is 25 Gbit/s, the maximum transmission time of the packet is 200ms(128MB*5/25Gbit/s). Add double redundancy, it is less than 500ms. So change the CMA_IBOE_PACKET_LIFETIME to 16, the maximum transmission time of the packet will be about 500+ms, it is long enough. Link: https://lore.kernel.org/r/20221125010026.755-1-lengchao@huawei.com Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-07RDMA/hfi1: use sysfs_emit() to instead of scnprintf()ye xingchen
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202212071632188074249@zte.com.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04Linux 6.1-rc8v6.1-rc8Linus Torvalds
2022-12-04Revert "mm: align larger anonymous mappings on THP boundaries"Linus Torvalds
This reverts commit f35b5d7d676e59e401690b678cd3cfec5e785c23. It has been reported to cause huge performance regressions on some loads (will-it-scale.per_process_ops, but also building the kernel with clang). The commit did speed up gcc builds by a small amount, so it's not an unambiguous regression, but until the big regressions are understood, let's revert it. Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com Reported-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%2BrGce@dev-arch.thelio-3990X/ Cc: Huang, Ying <ying.huang@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-04char: tpm: Protect tpm_pm_suspend with locksJan Dabros
Currently tpm transactions are executed unconditionally in tpm_pm_suspend() function, which may lead to races with other tpm accessors in the system. Specifically, the hw_random tpm driver makes use of tpm_get_random(), and this function is called in a loop from a kthread, which means it's not frozen alongside userspace, and so can race with the work done during system suspend: tpm tpm0: tpm_transmit: tpm_recv: error -52 tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 Call Trace: tpm_tis_status.cold+0x19/0x20 tpm_transmit+0x13b/0x390 tpm_transmit_cmd+0x20/0x80 tpm1_pm_suspend+0xa6/0x110 tpm_pm_suspend+0x53/0x80 __pnp_bus_suspend+0x35/0xe0 __device_suspend+0x10f/0x350 Fix this by calling tpm_try_get_ops(), which itself is a wrapper around tpm_chip_start(), but takes the appropriate mutex. Signed-off-by: Jan Dabros <jsd@semihalf.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/all/c5ba47ef-393f-1fba-30bd-1230d1b4b592@suse.cz/ Cc: stable@vger.kernel.org Fixes: e891db1a18bf ("tpm: turn on TPM on suspend for TPM 1.x") [Jason: reworked commit message, added metadata] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-04Merge tag 'perf_urgent_for_v6.1_rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Fix a use-after-free case where the perf pending task callback would see an already freed event * tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix perf_pending_task() UaF
2022-12-04Merge tag 'timers_urgent_for_v6.1_rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Revert a fix to RISC-V timers supposed to address an uncertainty whether clock events are received during S3 or not which locks up other RISC-V platforms. The issue will be fixed differently later. * tag 'timers_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "clocksource/drivers/riscv: Events are stopped during CPU suspend"
2022-12-04Merge tag 'powerpc-6.1-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix oops in 32-bit BPF tail call tests - Add missing declaration for machine_check_early_boot() Thanks to Christophe Leroy and Naveen N. Rao. * tag 'powerpc-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Add missing declaration for machine_check_early_boot() powerpc/bpf/32: Fix Oops on tail call tests
2022-12-04Merge tag 'input-for-v6.1-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fix from Dmitry Torokhov: - a fix for Raydium touchscreen driver to stop leaking memory when sending commands to the chip * tag 'input-for-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()
2022-12-04RDMA/mlx5: no need to kfree NULL pointerLi Zhijian
Goto label 'free' where it will kfree the 'in' is not needed though it's safe to kfree NULL. Return err code directly to simplify the code. 1973 free: 1974 kfree(in); 1975 return err; Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20221203033714.25870-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA/srp: Fix error return code in srp_parse_options()Wang Yufen
In the previous iteration of the while loop, the "ret" may have been assigned a value of 0, so the error return code -EINVAL may have been incorrectly set to 0. To fix set valid return code before calling to goto. Also investigate each case separately as Andy suggessted. Fixes: e711f968c49c ("IB/srp: replace custom implementation of hex2bin()") Fixes: 2a174df0c602 ("IB/srp: Use kstrtoull() instead of simple_strtoull()") Fixes: 19f313438c77 ("IB/srp: Add RDMA/CM support") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/1669953638-11747-2-git-send-email-wangyufen@huawei.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA/hfi1: Fix error return code in parse_platform_config()Wang Yufen
In the previous iteration of the while loop, the "ret" may have been assigned a value of 0, so the error return code -EINVAL may have been incorrectly set to 0. To fix set valid return code before calling to goto. Fixes: 97167e813415 ("staging/rdma/hfi1: Tune for unknown channel if configuration file is absent") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/1669953638-11747-1-git-send-email-wangyufen@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-04RDMA: Disable IB HW for UMLRandy Dunlap
Disable all of drivers/infiniband/hw/ and rdmavt for UML builds until someone needs it and provides patches to support it. This prevents build errors in hw/qib/qib_wc_x86_64.c. Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-rdma@vger.kernel.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-um@lists.infradead.org Link: https://lore.kernel.org/r/20221202211940.29111-1-rdunlap@infradead.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-12-03Merge tag 'i2c-for-6.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "A power state fix in the core for ACPI devices, a regression fix regarding bus recovery for the cadence driver, a DMA handling fix for the imx driver, and two error path fixes (npcm7xx and qcom-geni)" * tag 'i2c-for-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx: Only DMA messages with I2C_M_DMA_SAFE flag set i2c: qcom-geni: fix error return code in geni_i2c_gpi_xfer i2c: cadence: Fix regression with bus recovery i2c: Restore initial power state if probe fails i2c: npcm7xx: Fix error handling in npcm_i2c_init()
2022-12-03Merge tag 'dax-fixes-6.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "A few bug fixes around the handling of "Soft Reserved" memory and memory tiering information. Linux is starting to enounter more real world systems that deploy an ACPI HMAT to describe different performance classes of memory, as well the "special purpose" (Linux "Soft Reserved") designation from EFI. These fixes result from that testing. It has all appeared in -next for a while with no known issues. - Fix duplicate overlapping device-dax instances for HMAT described "Soft Reserved" Memory - Fix missing node targets in the sysfs representation of memory tiers - Remove a confusing variable initialization" * tag 'dax-fixes-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: Fix duplicate 'hmem' device registration ACPI: HMAT: Fix initiator registration for single-initiator systems ACPI: HMAT: remove unnecessary variable initialization
2022-12-02Merge tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "Just a small NVMe merge for this week, fixing protection of the name space list, and a missing clear of a reserved field when unused" * tag 'block-6.1-2022-12-02' of git://git.kernel.dk/linux: nvme: fix SRCU protection of nvme_ns_head list nvme-pci: clear the prp2 field when not used
2022-12-02Merge tag 'pinctrl-v6.1-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Three driver fixes. The Intel fix looks like the most important. - Fix a potential divide by zero in pinctrl-singe (OMAP and HiSilicon) - Disable IRQs on startup in the Mediatek driver. This is a classic, we should be looking out for this more. - Save and restore pins in 'direct IRQ' mode in the Intel driver, this works around firmware bugs" * tag 'pinctrl-v6.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: intel: Save and restore pins in "direct IRQ" mode pinctrl: meditatek: Startup with the IRQs disabled pinctrl: single: Fix potential division by zero
2022-12-02Merge tag 'riscv-for-linus-6.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - build fix for the NR_CPUS Kconfig SBI version dependency - fixes to early memory initialization, to fix page permissions in EFI and post-initmem-free - build fix for the VDSO, to avoid trying to profile the VDSO functions - fixes for kexec crash handling, to fix multi-core and interrupt related initialization inside the crash kernel - fix for a race condition when handling multiple concurrect kernel stack overflows * tag 'riscv-for-linus-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: kexec: Fixup crash_smp_send_stop without multi cores riscv: kexec: Fixup irq controller broken in kexec crash path riscv: mm: Proper page permissions after initmem free riscv: vdso: fix section overlapping under some conditions riscv: fix race when vmap stack overflow riscv: Sync efi page table's kernel mappings before switching riscv: Fix NR_CPUS range conditions
2022-12-02Merge tag 'mmc-v6.1-rc5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix ambiguous TRIM and DISCARD args - Fix removal of debugfs file for mmc_test MMC host: - mtk-sd: Add missing clk_disable_unprepare() in an error path - sdhci: Fix I/O voltage switch delay for UHS-I SD cards - sdhci-esdhc-imx: Fix CQHCI exit halt state check - sdhci-sprd: Fix voltage switch" * tag 'mmc-v6.1-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-sprd: Fix no reset data and command after voltage switch mmc: sdhci: Fix voltage switch delay mmc: mtk-sd: Fix missing clk_disable_unprepare in msdc_of_clock_parse() mmc: mmc_test: Fix removal of debugfs file mmc: sdhci-esdhc-imx: correct CQHCI exit halt state check mmc: core: Fix ambiguous TRIM and DISCARD arg
2022-12-02Merge tag 'iommu-fixes-v6.1-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: "Intel VT-d fixes: - IO/TLB flush fix - Various pci_dev refcount fixes" * tag 'iommu-fixes-v6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init() iommu/vt-d: Fix PCI device refcount leak in has_external_pci() iommu/vt-d: Fix PCI device refcount leak in prq_event_thread() iommu/vt-d: Add a fix for devices need extra dtlb flush
2022-12-02x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3Pawan Gupta
The "force" argument to write_spec_ctrl_current() is currently ambiguous as it does not guarantee the MSR write. This is due to the optimization that writes to the MSR happen only when the new value differs from the cached value. This is fine in most cases, but breaks for S3 resume when the cached MSR value gets out of sync with the hardware MSR value due to S3 resetting it. When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write is skipped. Which results in SPEC_CTRL mitigations not getting restored. Move the MSR write from write_spec_ctrl_current() to a new function that unconditionally writes to the MSR. Update the callers accordingly and rename functions. [ bp: Rework a bit. ] Fixes: caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value") Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-02Input: raydium_ts_i2c - fix memory leak in raydium_i2c_send()Zhang Xiaoxu
There is a kmemleak when test the raydium_i2c_ts with bpf mock device: unreferenced object 0xffff88812d3675a0 (size 8): comm "python3", pid 349, jiffies 4294741067 (age 95.695s) hex dump (first 8 bytes): 11 0e 10 c0 01 00 04 00 ........ backtrace: [<0000000068427125>] __kmalloc+0x46/0x1b0 [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts] [<000000006e631aee>] raydium_i2c_initialize.cold+0xbc/0x3e4 [raydium_i2c_ts] [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts] [<00000000a310de16>] i2c_device_probe+0x651/0x680 [<00000000f5a96bf3>] really_probe+0x17c/0x3f0 [<00000000096ba499>] __driver_probe_device+0xe3/0x170 [<00000000c5acb4d9>] driver_probe_device+0x49/0x120 [<00000000264fe082>] __device_attach_driver+0xf7/0x150 [<00000000f919423c>] bus_for_each_drv+0x114/0x180 [<00000000e067feca>] __device_attach+0x1e5/0x2d0 [<0000000054301fc2>] bus_probe_device+0x126/0x140 [<00000000aad93b22>] device_add+0x810/0x1130 [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0 [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110 [<00000000ffec4177>] of_i2c_notify+0x100/0x160 unreferenced object 0xffff88812d3675c8 (size 8): comm "python3", pid 349, jiffies 4294741070 (age 95.692s) hex dump (first 8 bytes): 22 00 36 2d 81 88 ff ff ".6-.... backtrace: [<0000000068427125>] __kmalloc+0x46/0x1b0 [<0000000090180f91>] raydium_i2c_send+0xd4/0x2bf [raydium_i2c_ts] [<000000001d5c9620>] raydium_i2c_initialize.cold+0x223/0x3e4 [raydium_i2c_ts] [<00000000dc6fcf38>] raydium_i2c_probe+0x3cd/0x6bc [raydium_i2c_ts] [<00000000a310de16>] i2c_device_probe+0x651/0x680 [<00000000f5a96bf3>] really_probe+0x17c/0x3f0 [<00000000096ba499>] __driver_probe_device+0xe3/0x170 [<00000000c5acb4d9>] driver_probe_device+0x49/0x120 [<00000000264fe082>] __device_attach_driver+0xf7/0x150 [<00000000f919423c>] bus_for_each_drv+0x114/0x180 [<00000000e067feca>] __device_attach+0x1e5/0x2d0 [<0000000054301fc2>] bus_probe_device+0x126/0x140 [<00000000aad93b22>] device_add+0x810/0x1130 [<00000000c086a53f>] i2c_new_client_device+0x352/0x4e0 [<000000003c2c248c>] of_i2c_register_device+0xf1/0x110 [<00000000ffec4177>] of_i2c_notify+0x100/0x160 After BANK_SWITCH command from i2c BUS, no matter success or error happened, the tx_buf should be freed. Fixes: 3b384bd6c3f2 ("Input: raydium_ts_i2c - do not split tx transactions") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Link: https://lore.kernel.org/r/20221202103412.2120169-1-zhangxiaoxu5@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-12-02Merge tag 'sound-6.1-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Likely the last piece for 6.1; the only significant fixes are ASoC core ops fixes, while others are device-specific (rather minor) fixes in ASoC and FireWire drivers. All appear safe enough to take as a late stage material" * tag 'sound-6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: dice: fix regression for Lexicon I-ONIX FW810S ASoC: cs42l51: Correct PGA Volume minimum value ASoC: ops: Correct bounds check for second channel on SX controls ASoC: tlv320adc3xxx: Fix build error for implicit function declaration ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx() ASoC: ops: Fix bounds check for _sx controls ASoC: fsl_micfil: explicitly clear CHnF flags ASoC: fsl_micfil: explicitly clear software reset bit
2022-12-02Merge tag 'drm-fixes-2022-12-02' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Things do seem to have finally settled down, just four i915 and one amdgpu this week. Probably won't have much for next week if you do push rc8 out. i915: - Fix dram info readout - Remove non-existent pipes from bigjoiner pipe mask - Fix negative value passed as remaining time - Never return 0 if not all requests retired amdgpu: - VCN fix for vangogh" * tag 'drm-fixes-2022-12-02' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: enable Vangogh VCN indirect sram mode drm/i915: Never return 0 if not all requests retired drm/i915: Fix negative value passed as remaining time drm/i915: Remove non-existent pipes from bigjoiner pipe mask drm/i915/mtl: Fix dram info readout
2022-12-02Merge tag 'mm-hotfixes-stable-2022-12-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "15 hotfixes, 11 marked cc:stable. Only three or four of the latter address post-6.0 issues, which is hopefully a sign that things are converging" * tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible" Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths mm/khugepaged: fix GUP-fast interaction by sending IPI mm/khugepaged: take the right locks for page table retraction mm: migrate: fix THP's mapcount on isolation mm: introduce arch_has_hw_nonleaf_pmd_young() mm: add dummy pmd_young() for architectures not having it mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes() tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing madvise: use zap_page_range_single for madvise dontneed mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
2022-12-02v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() failsLinus Torvalds
The V4L2_MEMORY_USERPTR interface is long deprecated and shouldn't be used (and is discouraged for any modern v4l drivers). And Seth Jenkins points out that the fallback to VM_PFNMAP/VM_IO is fundamentally racy and dangerous. Note that it's not even a case that should trigger, since any normal user pointer logic ends up just using the pin_user_pages_fast() call that does the proper page reference counting. That's not the problem case, only if you try to use special device mappings do you have any issues. Normally I'd just remove this during the merge window, but since Seth pointed out the problem cases, we really want to know as soon as possible if there are actually any users of this odd special case of a legacy interface. Neither Hans nor Mauro seem to think that such mis-uses of the old legacy interface should exist. As Mauro says: "See, V4L2 has actually 4 streaming APIs: - Kernel-allocated mmap (usually referred simply as just mmap); - USERPTR mmap; - read(); - dmabuf; The USERPTR is one of the oldest way to use it, coming from V4L version 1 times, and by far the least used one" And Hans chimed in on the USERPTR interface: "To be honest, I wouldn't mind if it goes away completely, but that's a bit of a pipe dream right now" but while removing this legacy interface entirely may be a pipe dream we can at least try to remove the unlikely (and actively broken) case of using special device mappings for USERPTR accesses. This replaces it with a WARN_ONCE() that we can remove once we've hopefully confirmed that no actual users exist. NOTE! Longer term, this means that a 'struct frame_vector' only ever contains proper page pointers, and all the games we have with converting them to pages can go away (grep for 'frame_vector_to_pages()' and the uses of 'vec->is_pfns'). But this is just the first step, to verify that this code really is all dead, and do so as quickly as possible. Reported-by: Seth Jenkins <sethjenkins@google.com> Acked-by: Hans Verkuil <hverkuil@xs4all.nl> Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-02Merge tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme into block-6.1Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.1 - fix SRCU protection of nvme_ns_head list (Caleb Sander) - clear the prp2 field when not used (Lei Rao)" * tag 'nvme-6.1-2022-01-02' of git://git.infradead.org/nvme: nvme: fix SRCU protection of nvme_ns_head list nvme-pci: clear the prp2 field when not used
2022-12-02iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()Xiongfeng Wang
for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() for the error path to avoid reference count leak. Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-3-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>