summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-18nfc: fdp: Add MODULE_FIRMWARE macrosJuerg Haefliger
The module loads firmware so add MODULE_FIRMWARE macros to provide that information via modinfo. Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-18ieee802154/adf7242: Add MODULE_FIRMWARE macroJuerg Haefliger
The module loads firmware so add a MODULE_FIRMWARE macro to provide that information via modinfo. Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-18erofs: use separate xattr parsers for listxattr/getxattrJingbo Xu
There's a callback styled xattr parser, i.e. xattr_foreach(), which is shared among listxattr and getxattr. Convert it to two separate xattr parsers to serve listxattr and getxattr for better readability. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613074114.120115-6-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-06-18erofs: unify inline/shared xattr iterators for listxattr/getxattrJingbo Xu
Make inline_{list,get}xattr() as well as inline_xattr_iter_begin() unified as erofs_xattr_iter_inline(), and shared_{list,get}xattr() unified as erofs_xattr_iter_shared(). After these changes, both erofs_xattr_iter_{inline,shared}() return 0 on success, and negative error on failure. One thing worth noting is that, the logic of returning it->buffer_ofs when there's no shared xattrs in shared_listxattr() is moved to erofs_listxattr() to make the unification possible. The only difference is that, semantically the old behavior will return ENOATTR rather than it->buffer_ofs if ENOATTR encountered when listxattr is parsing upon a specific shared xattr, while now the new behavior will return it->buffer_ofs in this case. This is not an issue, as listxattr upon a specific xattr won't return ENOATTR. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613074114.120115-5-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-06-18erofs: make the size of read data stored in buffer_ofsJingbo Xu
Since now xattr_iter structures have been unified, make the size of the read data stored in buffer_ofs. Don't bother reusing buffer_size for this use, which may be confusing. This is in preparation for the following further cleanup. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613074114.120115-4-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-06-18erofs: unify xattr_iter structuresJingbo Xu
Unify xattr_iter/listxattr_iter/getxattr_iter structures into erofs_xattr_iter structure. This is in preparation for the following further cleanup. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613074114.120115-3-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-06-18erofs: use absolute position in xattr iteratorJingbo Xu
Replace blkaddr/ofs with pos in 'struct erofs_xattr_iter'. After erofs_bread() is introduced to replace raw page cache APIs for metadata I/Os handling, xattr_iter_fixup() is no longer needed anymore. In addition, it is also unnecessary to check if the iterated position is span over the block boundary as absolute offset is used instead of blkaddr + offset pairs. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613074114.120115-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-06-18erofs: fix compact 4B support for 16k block sizeGao Xiang
In compact 4B, two adjacent lclusters are packed together as a unit to form on-disk indexes for effective random access, as below: (amortized = 4, vcnt = 2) _____________________________________________ |___@_____ encoded bits __________|_ blkaddr _| 0 . amortized * vcnt = 8 . . . . amortized * vcnt - 4 = 4 . . .____________________________. |_type (2 bits)_|_clusterofs_| Therefore, encoded bits for each pack are 32 bits (4 bytes). IOWs, since each lcluster can get 16 bits for its type and clusterofs, the maximum supported lclustersize for compact 4B format is 16k (14 bits). Fix this to enable compact 4B format for 16k lclusters (blocks), which is tested on an arm64 server with 16k page size. Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230601112341.56960-1-hsiangkao@linux.alibaba.com
2023-06-18erofs: convert erofs_read_metabuf() to erofs_bread() for xattrJingbo Xu
buf->inode is constant once initialized during erofs_buf's lifetime. Thus call erofs_init_metabuf() and erofs_bread() separately to avoid the repetition of assigning buf->inode when iterating xattrs. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230601024347.108469-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-06-18erofs: use poison pointer to replace the hard-coded addressGao Xiang
It's safer and cleaner to replace such hard-coded illegal pointer with poison pointers. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-7-hsiangkao@linux.alibaba.com
2023-06-18erofs: use struct lockref to replace handcrafted approachGao Xiang
Let's avoid the current handcrafted lockref although `struct lockref` inclusion usually increases extra 4 bytes with an explicit spinlock if CONFIG_DEBUG_SPINLOCK is off. Apart from the size difference, note that the meaning of refcount is also changed to active users. IOWs, it doesn't take an extra refcount for XArray tree insertion. I don't observe any significant performance difference at least on our cloud compute server but the new one indeed simplifies the overall codebase a bit. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230529123727.79943-1-hsiangkao@linux.alibaba.com
2023-06-18PCI: hv: Add a per-bus mutex state_lockDexuan Cui
In the case of fast device addition/removal, it's possible that hv_eject_device_work() can start to run before create_root_hv_pci_bus() starts to run; as a result, the pci_get_domain_bus_and_slot() in hv_eject_device_work() can return a 'pdev' of NULL, and hv_eject_device_work() can remove the 'hpdev', and immediately send a message PCI_EJECTION_COMPLETE to the host, and the host immediately unassigns the PCI device from the guest; meanwhile, create_root_hv_pci_bus() and the PCI device driver can be probing the dead PCI device and reporting timeout errors. Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the mutex before powering on the PCI bus in hv_pci_enter_d0(): when hv_eject_device_work() starts to run, it's able to find the 'pdev' and call pci_stop_and_remove_bus_device(pdev): if the PCI device driver has loaded, the PCI device driver's probe() function is already called in create_root_hv_pci_bus() -> pci_bus_add_devices(), and now hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to call the PCI device driver's remove() function and remove the device reliably; if the PCI device driver hasn't loaded yet, the function call hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to remove the PCI device reliably and the PCI device driver's probe() function won't be called; if the PCI device driver's probe() is already running (e.g., systemd-udev is loading the PCI device driver), it must be holding the per-device lock, and after the probe() finishes and releases the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to proceed to remove the device reliably. Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-18Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"Dexuan Cui
This reverts commit d6af2ed29c7c1c311b96dac989dcb991e90ee195. The statement "the hv_pci_bus_exit() call releases structures of all its child devices" in commit d6af2ed29c7c is not true: in the path hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the child "struct hv_pci_dev *hpdev" that is created earlier in pci_devices_present_work() -> new_pcichild_device(). The commit d6af2ed29c7c was originally made in July 2020 for RHEL 7.7, where the old version of hv_pci_bus_exit() was used; when the commit was rebased and merged into the upstream, people didn't notice that it's not really necessary. The commit itself doesn't cause any issue, but it makes hv_pci_probe() more complicated. Revert it to facilitate some upcoming changes to hv_pci_probe(). Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Wei Hu <weh@microsoft.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-5-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-18PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_devDexuan Cui
The hpdev->state is never really useful. The only use in hv_pci_eject_device() and hv_eject_device_work() is not really necessary. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-4-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-18PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panicDexuan Cui
When the host tries to remove a PCI device, the host first sends a PCI_EJECT message to the guest, and the guest is supposed to gracefully remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host; the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to the guest (when the guest receives this message, the device is already unassigned from the guest) and the guest can do some final cleanup work; if the guest fails to respond to the PCI_EJECT message within one minute, the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and removes the PCI device forcibly. In the case of fast device addition/removal, it's possible that the PCI device driver is still configuring MSI-X interrupts when the guest receives the PCI_EJECT message; the channel callback calls hv_pci_eject_device(), which sets hpdev->state to hv_pcichild_ejecting, and schedules a work hv_eject_device_work(); if the PCI device driver is calling pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the while loop in hv_compose_msi_msg() due to the updated hpdev->state, and leave data->chip_data with its default value of NULL; later, when the PCI device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest crashes in hv_arch_irq_unmask() due to data->chip_data being NULL. Fix the issue by not testing hpdev->state in the while loop: when the guest receives PCI_EJECT, the device is still assigned to the guest, and the guest has one minute to finish the device removal gracefully. We don't really need to (and we should not) test hpdev->state in the loop. Fixes: de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-18PCI: hv: Fix a race condition bug in hv_pci_query_relations()Dexuan Cui
Since day 1 of the driver, there has been a race between hv_pci_query_relations() and survey_child_resources(): during fast device hotplug, hv_pci_query_relations() may error out due to device-remove and the stack variable 'comp' is no longer valid; however, pci_devices_present_work() -> survey_child_resources() -> complete() may be running on another CPU and accessing the no-longer-valid 'comp'. Fix the race by flushing the workqueue before we exit from hv_pci_query_relations(). Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615044451.5580-2-decui@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-18ata: libata-scsi: Avoid deadlock on rescan after device resumeDamien Le Moal
When an ATA port is resumed from sleep, the port is reset and a power management request issued to libata EH to reset the port and rescanning the device(s) attached to the port. Device rescanning is done by scheduling an ata_scsi_dev_rescan() work, which will execute scsi_rescan_device(). However, scsi_rescan_device() takes the generic device lock, which is also taken by dpm_resume() when the SCSI device is resumed as well. If a device rescan execution starts before the completion of the SCSI device resume, the rcu locking used to refresh the cached VPD pages of the device, combined with the generic device locking from scsi_rescan_device() and from dpm_resume() can cause a deadlock. Avoid this situation by changing struct ata_port scsi_rescan_task to be a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is modified to check if the SCSI device associated with the ATA device that must be rescanned is not suspended. If the SCSI device is still suspended, ata_scsi_dev_rescan() returns early and reschedule itself for execution after an arbitrary delay of 5ms. Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reported-by: Joe Breuer <linux-kernel@jmbreuer.net> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530 Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Joe Breuer <linux-kernel@jmbreuer.net>
2023-06-17io_uring/poll: serialize poll linked timer start with poll removalJens Axboe
We selectively grab the ctx->uring_lock for poll update/removal, but we really should grab it from the start to fully synchronize with linked timeouts. Normally this is indeed the case, but if requests are forced async by the application, we don't fully cover removal and timer disarm within the uring_lock. Make this simpler by having consistent locking state for poll removal. Cc: stable@vger.kernel.org # 6.1+ Reported-by: Querijn Voet <querijnqyn@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-17arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencingMichael Kelley
State CPUHP_AP_HYPERV_ONLINE has been introduced to correctly sequence the initialization of hyperv_pcpu_input_arg. Use this new state for Hyper-V initialization so that hyperv_pcpu_input_arg is allocated early enough. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1684862062-51576-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-17x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offlineMichael Kelley
These commits a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg") 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs") update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg because that memory will be correctly marked as decrypted or encrypted for all VM types (CoCo or normal). But problems ensue when CPUs in the VM go online or offline after virtual PCI devices have been configured. When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN. But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which may call the virtual PCI driver and fault trying to use the as yet uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo VM if the MMIO read and write hypercalls are used from state CPUHP_AP_IRQ_AFFINITY_ONLINE. When a CPU is taken offline, IRQs may be reassigned in state CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to use the hyperv_pcpu_input_arg that has already been freed by a higher state. Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE) and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for Hyper-V initialization so that hyperv_pcpu_input_arg is allocated early enough. Fix the offlining problem by not freeing hyperv_pcpu_input_arg when a CPU goes offline. Retain the allocated memory, and reuse it if the CPU comes back online later. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-17Merge tag 'staging-6.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fix from Greg KH: "Here is a single staging driver "fix" for 6.4-rc7. I've been sitting on it in my tree for many weeks as it is just a simple documentation update, with the hope that maybe some other staging driver fixes would need to be merged for 6.4-final, but that does not seem to be the case. So please, pull in this one documentation update so that Aaro doesn't get emails going forward that he can't do anything about" * tag 'staging-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: octeon: delete my name from TODO contact
2023-06-17Merge tag 'usb-6.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt fixes from Greg KH: "Here are some small USB and Thunderbolt driver fixes and new device ids for 6.4-rc7 to resolve some reported problems. Included in here are: - new USB serial device ids - USB gadget core fixes for long-dissussed problems - dwc3 bugfixes for reported issues. - typec driver fixes - thunderbolt driver fixes All of these have been in linux-next this week with no reported issues" * tag 'usb-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: gadget: udc: core: Prevent soft_connect_store() race usb: gadget: udc: core: Offload usb_udc_vbus_handler processing usb: typec: Fix fast_role_swap_current show function usb: typec: ucsi: Fix command cancellation USB: dwc3: fix use-after-free on core driver unbind USB: dwc3: qcom: fix NULL-deref on suspend usb: dwc3: gadget: Reset num TRBs before giving back the request usb: gadget: udc: renesas_usb3: Fix RZ/V2M {modprobe,bind} error USB: serial: option: add Quectel EM061KGL series thunderbolt: Mask ring interrupt on Intel hardware as well thunderbolt: Do not touch CL state configuration during discovery thunderbolt: Increase DisplayPort Connection Manager handshake timeout thunderbolt: dma_test: Use correct value for absent rings when creating paths
2023-06-17Merge tag 'tty-6.4-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull serial driver fixes from Greg KH: "Here are two small serial driver fixes for 6.4-rc7 that resolve some reported problems: - lantiq serial driver irq fix - fsl_lpuart serial driver watermark fix Both of these have been in linux-next this week with no reported issues" * tag 'tty-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: serial: fsl_lpuart: reduce RX watermark to 0 on LS1028A serial: lantiq: add missing interrupt ack
2023-06-17SUNRPC: Address RCU warning in net/sunrpc/svc.cChuck Lever
$ make C=1 W=1 net/sunrpc/svc.o make[1]: Entering directory 'linux/obj/manet.1015granger.net' GEN Makefile CALL linux/server-development/scripts/checksyscalls.sh DESCEND objtool INSTALL libsubcmd_headers DESCEND bpf/resolve_btfids INSTALL libsubcmd_headers CC [M] net/sunrpc/svc.o CHECK linux/server-development/net/sunrpc/svc.c linux/server-development/net/sunrpc/svc.c:1225:9: warning: incorrect type in argument 1 (different address spaces) linux/server-development/net/sunrpc/svc.c:1225:9: expected struct spinlock [usertype] *lock linux/server-development/net/sunrpc/svc.c:1225:9: got struct spinlock [noderef] __rcu * linux/server-development/net/sunrpc/svc.c:1227:40: warning: incorrect type in argument 1 (different address spaces) linux/server-development/net/sunrpc/svc.c:1227:40: expected struct spinlock [usertype] *lock linux/server-development/net/sunrpc/svc.c:1227:40: got struct spinlock [noderef] __rcu * make[1]: Leaving directory 'linux/obj/manet.1015granger.net' Warning introduced by commit 913292c97d75 ("sched.h: Annotate sighand_struct with __rcu"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Use sysfs_emit in place of strlcpy/sprintfAzeem Shaikh
Part of an effort to remove strlcpy() tree-wide [1]. Direct replacement is safe here since the getter in kernel_params_ops handles -errno return [2]. [1] https://github.com/KSPP/linux/issues/89 [2] https://elixir.bootlin.com/linux/v6.4-rc6/source/include/linux/moduleparam.h#L52 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Remove transport class dprintk call sitesChuck Lever
Remove a couple of dprintk call sites that are of little value. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Fix comments for transport class registrationChuck Lever
The preceding block comment before svc_register_xprt_class() is not related to that function. While we're here, add proper documenting comments for these two publicly-visible functions. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: Remove an unused argument from __svc_rdma_put_rw_ctxt()Chuck Lever
Clean up. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: trace cc_release callsChuck Lever
This event brackets the svcrdma_post_* trace points. If this trace event is enabled but does not appear as expected, that indicates a chunk_ctxt leak. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: Convert "might sleep" comment into a code annotationChuck Lever
Try to catch incorrect calling contexts mechanically rather than by code review. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17NFSD: Add an nfsd4_encode_nfstime4() helperChuck Lever
Clean up: de-duplicate some common code. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Move initialization of rq_stimeChuck Lever
Micro-optimization: Call ktime_get() only when ->xpo_recvfrom() has given us a full RPC message to process. rq_stime isn't used otherwise, so this avoids pointless work. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Optimize page release in svc_rdma_sendto()Chuck Lever
Now that we have bulk page allocation and release APIs, it's more efficient to use those than it is for nfsd threads to wait for send completions. Previous patches have eliminated the calls to wait_for_completion() and complete(), in order to avoid scheduler overhead. Now release pages-under-I/O in the send completion handler using the efficient bulk release API. I've measured a 7% reduction in cumulative CPU utilization in svc_rdma_sendto(), svc_rdma_wc_send(), and svc_xprt_release(). In particular, using release_pages() instead of complete() cuts the time per svc_rdma_wc_send() call by two-thirds. This helps improve scalability because svc_rdma_wc_send() is single-threaded per connection. Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: Prevent page release when nothing was receivedChuck Lever
I noticed that svc_rqst_release_pages() was still unnecessarily releasing a page when svc_rdma_recvfrom() returns zero. Fixes: a53d5cb0646a ("svcrdma: Avoid releasing a page in svc_xprt_release()") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17sfc: use budget for TX completionsÍñigo Huguet
When running workloads heavy unbalanced towards TX (high TX, low RX traffic), sfc driver can retain the CPU during too long times. Although in many cases this is not enough to be visible, it can affect performance and system responsiveness. A way to reproduce it is to use a debug kernel and run some parallel netperf TX tests. In some systems, this will lead to this message being logged: kernel:watchdog: BUG: soft lockup - CPU#12 stuck for 22s! The reason is that sfc driver doesn't account any NAPI budget for the TX completion events work. With high-TX/low-RX traffic, this makes that the CPU is held for long time for NAPI poll. Documentations says "drivers can process completions for any number of Tx packets but should only process up to budget number of Rx packets". However, many drivers do limit the amount of TX completions that they process in a single NAPI poll. In the same way, this patch adds a limit for the TX work in sfc. With the patch applied, the watchdog warning never appears. Tested with netperf in different combinations: single process / parallel processes, TCP / UDP and different sizes of UDP messages. Repeated the tests before and after the patch, without any noticeable difference in network or CPU performance. Test hardware: Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz (4 cores, 2 threads/core) Solarflare Communications XtremeScale X2522-25G Network Adapter Fixes: 5227ecccea2d ("sfc: remove tx and MCDI handling from NAPI budget consideration") Fixes: d19a53721863 ("sfc_ef100: TX path for EF100 NICs") Reported-by: Fei Liu <feliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20230615084929.10506-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-17parisc: Delete redundant register definitions in <asm/assembly.h>Ben Hutchings
We define sp and ipsw in <asm/asmregs.h> using ".reg", and when using current binutils (snapshot 2.40.50.20230611) the definitions in <asm/assembly.h> using "=" conflict with those: arch/parisc/include/asm/assembly.h: Assembler messages: arch/parisc/include/asm/assembly.h:93: Error: symbol `sp' is already defined arch/parisc/include/asm/assembly.h:95: Error: symbol `ipsw' is already defined Delete the duplicate definitions in <asm/assembly.h>. Also delete the definition of gp, which isn't used anywhere. Signed-off-by: Ben Hutchings <benh@debian.org> Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2023-06-16Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of clk driver fixes: - Fix an OOB issue in the Mediatek mt8365 driver where arrays of clks are mismatched in size - Use the proper clk_ops for a few clks in the Mediatek mt8365 driver - Stop using abs() in clk_composite_determine_rate() because 64-bit math goes wrong on large unsigned long numbers that are subtracted and passed into abs() - Zero initialize a struct clk_init_data in clk-loongson2 to avoid stack junk confusing clk_hw_register() - Actually use a pointer to __iomem for writel() in pxa3xx_clk_update_accr() so we don't oops" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: pxa: fix NULL pointer dereference in pxa3xx_clk_update_accr clk: clk-loongson2: Zero init clk_init_data clk: mediatek: mt8365: Fix inverted topclk operations clk: composite: Fix handling of high clock rates clk: mediatek: mt8365: Fix index issue
2023-06-16ksmbd: validate session id and tree id in the compound requestNamjae Jeon
This patch validate session id and tree id in compound request. If first operation in the compound is SMB2 ECHO request, ksmbd bypass session and tree validation. So work->sess and work->tcon could be NULL. If secound request in the compound access work->sess or tcon, It cause NULL pointer dereferecing error. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21165 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16ksmbd: fix out-of-bound read in smb2_writeNamjae Jeon
ksmbd_smb2_check_message doesn't validate hdr->NextCommand. If ->NextCommand is bigger than Offset + Length of smb2 write, It will allow oversized smb2 write length. It will cause OOB read in smb2_write. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21164 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16ksmbd: add mnt_want_write to ksmbd vfs functionsNamjae Jeon
ksmbd is doing write access using vfs helpers. There are the cases that mnt_want_write() is not called in vfs helper. This patch add missing mnt_want_write() to ksmbd vfs functions. Cc: stable@vger.kernel.org Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16ksmbd: validate command payload sizeNamjae Jeon
->StructureSize2 indicates command payload size. ksmbd should validate this size with rfc1002 length before accessing it. This patch remove unneeded check and add the validation for this. [ 8.912583] BUG: KASAN: slab-out-of-bounds in ksmbd_smb2_check_message+0x12a/0xc50 [ 8.913051] Read of size 2 at addr ffff88800ac7d92c by task kworker/0:0/7 ... [ 8.914967] Call Trace: [ 8.915126] <TASK> [ 8.915267] dump_stack_lvl+0x33/0x50 [ 8.915506] print_report+0xcc/0x620 [ 8.916558] kasan_report+0xae/0xe0 [ 8.917080] kasan_check_range+0x35/0x1b0 [ 8.917334] ksmbd_smb2_check_message+0x12a/0xc50 [ 8.917935] ksmbd_verify_smb_message+0xae/0xd0 [ 8.918223] handle_ksmbd_work+0x192/0x820 [ 8.918478] process_one_work+0x419/0x760 [ 8.918727] worker_thread+0x2a2/0x6f0 [ 8.919222] kthread+0x187/0x1d0 [ 8.919723] ret_from_fork+0x1f/0x30 [ 8.919954] </TASK> Cc: stable@vger.kernel.org Reported-by: Chih-Yen Chang <cc85nod@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-06-16Merge tag 'drm-fixes-2023-06-17' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "A bunch of misc fixes across the board. amdgpu is the usual bulk with a revert and other fixes, nouveau has a race fix that was causing a UAF that was hard hanging systems, otherwise some qaic, bridge and radeon. amdgpu: - GFX9 preemption fixes - Add missing radeon secondary PCI ID - vblflash fixes - SMU 13 fix - VCN 4.0 fix - Re-enable TOPDOWN flag for large BAR systems to fix regression - eDP fix - PSR hang fix - DPIA fix radeon: - fbdev client warning fix qaic: - leak fix - null ptr deref fix nouveau: - use-after-free caused by fence race fix - runtime pm fix - NULL ptr checks bridge: - ti-sn65dsi86: Avoid possible buffer overflow" * tag 'drm-fixes-2023-06-17' of git://anongit.freedesktop.org/drm/drm: (21 commits) nouveau: fix client work fence deletion race drm/amd/display: limit DPIA link rate to HBR3 drm/amd/display: fix the system hang while disable PSR drm/amd/display: edp do not add non-edid timings Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system" drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1 drm/radeon: Disable outputs when releasing fbdev client drm/amd/pm: workaround for compute workload type on some skus drm/amd: Tighten permissions on VBIOS flashing attributes drm/amd: Make sure image is written to trigger VBIOS image update flow drm/amdgpu: add missing radeon secondary PCI ID drm/amdgpu: Implement gfx9 patch functions for resubmission drm/amdgpu: Modify indirect buffer packages for resubmission drm/amdgpu: Program gds backup address as zero if no gds allocated drm/nouveau: add nv_encoder pointer check for NULL drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled drm/nouveau/dp: check for NULL nv_connector->native_mode drm/bridge: ti-sn65dsi86: Avoid possible buffer overflow drm/nouveau: don't detect DSM for non-NVIDIA device accel/qaic: Fix NULL pointer deref in qaic_destroy_drm_device() ...
2023-06-16afs: Fix vlserver probe RTT handlingDavid Howells
In the same spirit as commit ca57f02295f1 ("afs: Fix fileserver probe RTT handling"), don't rule out using a vlserver just because there haven't been enough packets yet to calculate a real rtt. Always set the server's probe rtt from the estimate provided by rxrpc_kernel_get_srtt, which is capped at 1 second. This could lead to EDESTADDRREQ errors when accessing a cell for the first time, even though the vl servers are known and have responded to a probe. Fixes: 1d4adfaf6574 ("rxrpc: Make rxrpc_kernel_get_srtt() indicate validity") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2023-June/006746.html Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-16Merge tag 'v6.4-rockchip-dtsfixes1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes Fixes for the reset pin on nanopi r5c, a reset line on SOQuartz, a duplicate usb regulator on rock64 and PCIe register mappings on rk356x. Also some missing cache properties. * tag 'v6.4-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Fix rk356x PCIe register and range mappings arm64: dts: rockchip: fix button reset pin for nanopi r5c arm64: dts: rockchip: fix nEXTRST on SOQuartz arm64: dts: rockchip: add missing cache properties arm64: dts: rockchip: fix USB regulator on ROCK64 Link: https://lore.kernel.org/r/2885657.e9J7NaK4W3@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-06-16ieee802154: Replace strlcpy with strscpyAzeem Shaikh
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). Direct replacement is safe here since the return values from the helper macros are ignored by the callers. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230613003326.3538391-1-azeemshaikh38@gmail.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-06-17Merge tag 'drm-misc-fixes-2023-06-16' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes maybe in time for v6.4-rc7: - qaic leak and null deref fix. - Fix runtime pm in nouveau. - Fix array overflow in ti-sn65dsi86 pwm chip handling. - Assorted null check fixes in nouveau. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <dev@lankhorst.se> Link: https://patchwork.freedesktop.org/patch/msgid/641eb8a8-fbd7-90ad-0805-310b7fec9344@lankhorst.se
2023-06-16net/mlx5e: Fix scheduling of IPsec ASO query while in atomicLeon Romanovsky
ASO query can be scheduled in atomic context as such it can't use usleep. Use udelay as recommended in Documentation/timers/timers-howto.rst. Fixes: 76e463f6508b ("net/mlx5e: Overcome slow response for first IPsec ASO WQE") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: Drop XFRM state lock when modifying flow steeringLeon Romanovsky
XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really need to hold lock while modifying flow steering rules to drop traffic. That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit() work will be canceled anyway and won't run in parallel. Fixes: b2f7b01d36a9 ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: Fix ESN update kernel panicPatrisious Haddad
Previously during mlx5e_ipsec_handle_event the driver tried to execute an operation that could sleep, while holding a spinlock, which caused the kernel panic mentioned below. Move the function call that can sleep outside of the spinlock context. Call Trace: <TASK> dump_stack_lvl+0x49/0x6c __schedule_bug.cold+0x42/0x4e schedule_debug.constprop.0+0xe0/0x118 __schedule+0x59/0x58a ? __mod_timer+0x2a1/0x3ef schedule+0x5e/0xd4 schedule_timeout+0x99/0x164 ? __pfx_process_timeout+0x10/0x10 __wait_for_common+0x90/0x1da ? __pfx_schedule_timeout+0x10/0x10 wait_func+0x34/0x142 [mlx5_core] mlx5_cmd_invoke+0x1f3/0x313 [mlx5_core] cmd_exec+0x1fe/0x325 [mlx5_core] mlx5_cmd_do+0x22/0x50 [mlx5_core] mlx5_cmd_exec+0x1c/0x40 [mlx5_core] mlx5_modify_ipsec_obj+0xb2/0x17f [mlx5_core] mlx5e_ipsec_update_esn_state+0x69/0xf0 [mlx5_core] ? wake_affine+0x62/0x1f8 mlx5e_ipsec_handle_event+0xb1/0xc0 [mlx5_core] process_one_work+0x1e2/0x3e6 ? __pfx_worker_thread+0x10/0x10 worker_thread+0x54/0x3ad ? __pfx_worker_thread+0x10/0x10 kthread+0xda/0x101 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x37 </TASK> BUG: workqueue leaked lock or atomic: kworker/u256:4/0x7fffffff/189754#012 last function: mlx5e_ipsec_handle_event [mlx5_core] CPU: 66 PID: 189754 Comm: kworker/u256:4 Kdump: loaded Tainted: G W 6.2.0-2596.20230309201517_5.el8uek.rc1.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X9-2/ASMMBX9-2, BIOS 61070300 08/17/2022 Workqueue: mlx5e_ipsec: eth%d mlx5e_ipsec_handle_event [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x49/0x6c process_one_work.cold+0x2b/0x3c ? __pfx_worker_thread+0x10/0x10 worker_thread+0x54/0x3ad ? __pfx_worker_thread+0x10/0x10 kthread+0xda/0x101 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x37 </TASK> BUG: scheduling while atomic: kworker/u256:4/189754/0x00000000 Fixes: cee137a63431 ("net/mlx5e: Handle ESN update events") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5e: Don't delay release of hardware objectsLeon Romanovsky
XFRM core provides two callbacks to release resources, one is .xdo_dev_policy_delete() and another is .xdo_dev_policy_free(). This separation allows delayed release so "ip xfrm policy free" commands won't starve. Unfortunately, mlx5 command interface can't run in .xdo_dev_policy_free() callbacks as the latter runs in ATOMIC context. BUG: scheduling while atomic: swapper/7/0/0x00000100 Modules linked in: act_mirred act_tunnel_key cls_flower sch_ingress vxlan mlx5_vdpa vringh vhost_iotlb vdpa rpcrdma rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.3.0+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x33/0x50 __schedule_bug+0x4e/0x60 __schedule+0x5d5/0x780 ? __mod_timer+0x286/0x3d0 schedule+0x50/0x90 schedule_timeout+0x7c/0xf0 ? __bpf_trace_tick_stop+0x10/0x10 __wait_for_common+0x88/0x190 ? usleep_range_state+0x90/0x90 cmd_exec+0x42e/0xb40 [mlx5_core] mlx5_cmd_do+0x1e/0x40 [mlx5_core] mlx5_cmd_exec+0x18/0x30 [mlx5_core] mlx5_cmd_delete_fte+0xa8/0xd0 [mlx5_core] del_hw_fte+0x60/0x120 [mlx5_core] mlx5_del_flow_rules+0xec/0x270 [mlx5_core] ? default_send_IPI_single_phys+0x26/0x30 mlx5e_accel_ipsec_fs_del_pol+0x1a/0x60 [mlx5_core] mlx5e_xfrm_free_policy+0x15/0x20 [mlx5_core] xfrm_policy_destroy+0x5a/0xb0 xfrm4_dst_destroy+0x7b/0x100 dst_destroy+0x37/0x120 rcu_core+0x2d6/0x540 __do_softirq+0xcd/0x273 irq_exit_rcu+0x82/0xb0 sysvec_apic_timer_interrupt+0x72/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x16/0x20 RIP: 0010:default_idle+0x13/0x20 Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 7a 4d ee 00 85 c0 7e 07 0f 00 2d 2f 98 2e 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 40 b4 02 00 RSP: 0018:ffff888100843ee0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffff888100812b00 RCX: 4000000000000000 RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000002d2ec RBP: 0000000000000007 R08: 00000021daeded59 R09: 0000000000000001 R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 default_idle_call+0x30/0xb0 do_idle+0x1c1/0x1d0 cpu_startup_entry+0x19/0x20 start_secondary+0xfe/0x120 secondary_startup_64_no_verify+0xf3/0xfb </TASK> bad: scheduling from the idle thread! Fixes: a5b8ca9471d3 ("net/mlx5e: Add XFRM policy offload logic") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>