summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-01SUNRPC: Remove EXPORT_SYMBOL_GPL for svc_process_bc()Chuck Lever
svc_process_bc(), previously known as bc_svc_process(), was added in commit 4d6bbb6233c9 ("nfs41: Backchannel bc_svc_process()") but there has never been a call site outside of the sunrpc.ko module. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Add callback operation lifetime trace pointsChuck Lever
Help observe the flow of callback operations. bc_shutdown() records exactly when the backchannel RPC client is destroyed and cl_cb_client is replaced with NULL. Examples include: nfsd-955 [004] 650.013997: nfsd_cb_queue: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try) kworker/u21:4-497 [004] 650.014050: nfsd_cb_seq_status: task:00000001@00000001 sessionid=65b3c5b8:f541f749:00000001:00000000 tk_status=-107 seq_status=1 kworker/u21:4-497 [004] 650.014051: nfsd_cb_restart: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff88810e39f400 (first try) kworker/u21:4-497 [004] 650.014066: nfsd_cb_queue: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff88810e39f400 (need restart) kworker/u16:0-10 [006] 650.065750: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UNKNOWN kworker/u16:0-10 [006] 650.065752: nfsd_cb_bc_update: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try) kworker/u16:0-10 [006] 650.065754: nfsd_cb_bc_shutdown: addr=192.168.122.6:0 client 65b3c5b8:f541f749 cb=0xffff8881134b02f8 (first try) kworker/u16:0-10 [006] 650.065810: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=DOWN Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Rename nfsd_cb_state trace pointChuck Lever
Make it clear where backchannel state is updated. Example trace point output: kworker/u16:0-10 [006] 2800.080404: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UP nfsd-940 [003] 2800.478368: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UNKNOWN kworker/u16:0-10 [003] 2800.478828: nfsd_cb_new_state: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=DOWN kworker/u16:0-10 [005] 2802.039724: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UP kworker/u16:0-10 [005] 2810.611452: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=FAULT kworker/u16:0-10 [005] 2810.616832: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=UNKNOWN kworker/u16:0-10 [005] 2810.616931: nfsd_cb_start: addr=192.168.122.6:0 client 65b3c5b8:f541f749 state=DOWN Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Replace dprintks in nfsd4_cb_sequence_done()Chuck Lever
Improve observability of backchannel session operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Add nfsd_seq4_status trace eventChuck Lever
Add a trace point that records SEQ4_STATUS flags returned in an NFSv4.1 SEQUENCE response. SEQ4_STATUS flags report backchannel issues and changes to lease state to clients. Knowing what the server is reporting to clients is useful for debugging both configuration and operational issues in real time. For example, upcoming patches will enable server administrators to revoke parts of a client's lease; that revocation is indicated to the client when a subsequent SEQUENCE operation has one or more SEQ4_STATUS flags that are set. Sample trace records: nfsd-927 [006] 615.581821: nfsd_seq4_status: xid=0x095ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT nfsd-927 [006] 615.588043: nfsd_seq4_status: xid=0x0a5ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT nfsd-928 [003] 615.588448: nfsd_seq4_status: xid=0x0b5ded07 sessionid=65a032c3:b7845faf:00000001:00000000 status_flags=BACKCHANNEL_FAULT Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Retransmit callbacks after client reconnectsChuck Lever
NFSv4.1 clients assume that if they disconnect, that will force the server to resend pending callback operations once a fresh connection has been established. Turns out NFSD has not been resending after reconnect. Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Reschedule CB operations when backchannel rpc_clnt is shut downChuck Lever
As part of managing a client disconnect, NFSD closes down and replaces the backchannel rpc_clnt. If a callback operation is pending when the backchannel rpc_clnt is shut down, currently nfsd4_run_cb_work() just discards that callback. But there are multiple cases to deal with here: o The client's lease is getting destroyed. Throw the CB away. o The client disconnected. It might be forcing a retransmit of CB operations, or it could have disconnected for other reasons. Reschedule the CB so it is retransmitted when the client reconnects. Since callback operations can now be rescheduled, ensure that cb_ops->prepare can be called only once by moving the cb_ops->prepare paragraph down to just before the rpc_call_async() call. Fixes: 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and nfsd4_shutdown_callback()") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Convert the callback workqueue to use delayed_workChuck Lever
Normally, NFSv4 callback operations are supposed to be sent to the client as soon as they are queued up. In a moment, I will introduce a recovery path where the server has to wait for the client to reconnect. We don't want a hard busy wait here -- the callback should be requeued to try again in several milliseconds. For now, convert nfsd4_callback from struct work_struct to struct delayed_work, and queue with a zero delay argument. This should avoid behavior changes for current operation. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: Reset cb_seq_status after NFS4ERR_DELAYChuck Lever
I noticed that once an NFSv4.1 callback operation gets a NFS4ERR_DELAY status on CB_SEQUENCE and then the connection is lost, the callback client loops, resending it indefinitely. The switch arm in nfsd4_cb_sequence_done() that handles NFS4ERR_DELAY uses rpc_restart_call() to rearm the RPC state machine for the retransmit, but that path does not call the rpc_prepare_call callback again. Thus cb_seq_status is set to -10008 by the first NFS4ERR_DELAY result, but is never set back to 1 for the retransmits. nfsd4_cb_sequence_done() thinks it's getting nothing but a long series of CB_SEQUENCE NFS4ERR_DELAY replies. Fixes: 7ba6cad6c88f ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: make svc_stat per-network namespace instead of globalJosef Bacik
The final bit of stats that is global is the rpc svc_stat. Move this into the nfsd_net struct and use that everywhere instead of the global struct. Remove the unused global struct. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: remove nfsd_stats, make th_cnt a global counterJosef Bacik
This is the last global stat, take it out of the nfsd_stats struct and make it a global part of nfsd, report it the same as always. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: make all of the nfsd stats per-network namespaceJosef Bacik
We have a global set of counters that we modify for all of the nfsd operations, but now that we're exposing these stats across all network namespaces we need to make the stats also be per-network namespace. We already have some caching stats that are per-network namespace, so move these definitions into the same counter and then adjust all the helpers and users of these stats to provide the appropriate nfsd_net struct so that the stats are maintained for the per-network namespace objects. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: expose /proc/net/sunrpc/nfsd in net namespacesJosef Bacik
We are running nfsd servers inside of containers with their own network namespace, and we want to monitor these services using the stats found in /proc. However these are not exposed in the proc inside of the container, so we have to bind mount the host /proc into our containers to get at this information. Separate out the stat counters init and the proc registration, and move the proc registration into the pernet operations entry and exit points so that these stats can be exposed inside of network namespaces. This is an intermediate step, this just exposes the global counters in the network namespace. Subsequent patches will move these counters into the per-network namespace container. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: rename NFSD_NET_* to NFSD_STATS_*Josef Bacik
We're going to merge the stats all into per network namespace in subsequent patches, rename these nn counters to be consistent with the rest of the stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01sunrpc: use the struct net as the svc proc privateJosef Bacik
nfsd is the only thing using this helper, and it doesn't use the private currently. When we switch to per-network namespace stats we will need the struct net * in order to get to the nfsd_net. Use the net as the proc private so we can utilize this when we make the switch over. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01sunrpc: remove ->pg_stats from svc_programJosef Bacik
Now that this isn't used anywhere, remove it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01sunrpc: pass in the sv_stats struct through svc_create_pooledJosef Bacik
Since only one service actually reports the rpc stats there's not much of a reason to have a pointer to it in the svc_program struct. Adjust the svc_create_pooled function to take the sv_stats as an argument and pass the struct through there as desired instead of getting it from the svc_program->pg_stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: stop setting ->pg_stats for unused statsJosef Bacik
A lot of places are setting a blank svc_stats in ->pg_stats and never utilizing these stats. Remove all of these extra structs as we're not reporting these stats anywhere. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01sunrpc: don't change ->sv_stats if it doesn't existJosef Bacik
We check for the existence of ->sv_stats elsewhere except in the core processing code. It appears that only nfsd actual exports these values anywhere, everybody else just has a write only copy of sv_stats in their svc_program. Add a check for ->sv_stats before every adjustment to allow us to eliminate the stats struct from all the users who don't report the stats. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: fix LISTXATTRS returning more bytes than maxcountJorge Mora
The maxcount is the maximum number of bytes for the LISTXATTRS4resok result. This includes the cookie and the count for the name array, thus subtract 12 bytes from the maxcount: 8 (cookie) + 4 (array count) when filling up the name array. Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic") Signed-off-by: Jorge Mora <mora@netapp.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: fix LISTXATTRS returning a short list with eof=TRUEJorge Mora
If the XDR buffer is not large enough to fit all attributes and the remaining bytes left in the XDR buffer (xdrleft) is equal to the number of bytes for the current attribute, then the loop will prematurely exit without setting eof to FALSE. Also in this case, adding the eof flag to the buffer will make the reply 4 bytes larger than lsxa_maxcount. Need to check if there are enough bytes to fit not only the next attribute name but also the eof as well. Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic") Signed-off-by: Jorge Mora <mora@netapp.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: change LISTXATTRS cookie encoding to big-endianJorge Mora
Function nfsd4_listxattr_validate_cookie() expects the cookie as an offset to the list thus it needs to be encoded in big-endian. Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic") Signed-off-by: Jorge Mora <mora@netapp.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01NFSD: fix nfsd4_listxattr_validate_cookieJorge Mora
If LISTXATTRS is sent with a correct cookie but a small maxcount, this could lead function nfsd4_listxattr_validate_cookie to return NFS4ERR_BAD_COOKIE. If maxcount = 20, then second check on function gives RHS = 3 thus any cookie larger than 3 returns NFS4ERR_BAD_COOKIE. There is no need to validate the cookie on the return XDR buffer since attribute referenced by cookie will be the first in the return buffer. Fixes: 23e50fe3a5e6 ("nfsd: implement the xattr functions and en/decode logic") Signed-off-by: Jorge Mora <mora@netapp.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: use __fput_sync() to avoid delayed closing of files.NeilBrown
Calling fput() directly or though filp_close() from a kernel thread like nfsd causes the final __fput() (if necessary) to be called from a workqueue. This means that nfsd is not forced to wait for any work to complete. If the ->release or ->destroy_inode function is slow for any reason, this can result in nfsd closing files more quickly than the workqueue can complete the close and the queue of pending closes can grow without bounces (30 million has been seen at one customer site, though this was in part due to a slowness in xfs which has since been fixed). nfsd does not need this. It is quite appropriate and safe for nfsd to do its own close work. There is no reason that close should ever wait for nfsd, so no deadlock can occur. It should be safe and sensible to change all fput() calls to __fput_sync(). However in the interests of caution this patch only changes two - the two that can be most directly affected by client behaviour and could occur at high frequency. - the fput() implicitly in flip_close() is changed to __fput_sync() by calling get_file() first to ensure filp_close() doesn't do the final fput() itself. If is where files opened for IO are closed. - the fput() in nfsd_read() is also changed. This is where directories opened for readdir are closed. This ensure that minimal fput work is queued to the workqueue. This removes the need for the flush_delayed_fput() call in nfsd_file_close_inode_sync() Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01nfsd: Don't leave work of closing files to a work queueNeilBrown
The work of closing a file can have non-trivial cost. Doing it in a separate work queue thread means that cost isn't imposed on the nfsd threads and an imbalance can be created. This can result in files being queued for the work queue more quickly that the work queue can process them, resulting in unbounded growth of the queue and memory exhaustion. To avoid this work imbalance that exhausts memory, this patch moves all closing of files into the nfsd threads. This means that when the work imposes a cost, that cost appears where it would be expected - in the work of the nfsd thread. A subsequent patch will ensure the final __fput() is called in the same (nfsd) thread which calls filp_close(). Files opened for NFSv3 are never explicitly closed by the client and are kept open by the server in the "filecache", which responds to memory pressure, is garbage collected even when there is no pressure, and sometimes closes files when there is particular need such as for rename. These files currently have filp_close() called in a dedicated work queue, so their __fput() can have no effect on nfsd threads. This patch discards the work queue and instead has each nfsd thread call flip_close() on as many as 8 files from the filecache each time it acts on a client request (or finds there are no pending client requests). If there are more to be closed, more threads are woken. This spreads the work of __fput() over multiple threads and imposes any cost on those threads. The number 8 is somewhat arbitrary. It needs to be greater than 1 to ensure that files are closed more quickly than they can be added to the cache. It needs to be small enough to limit the per-request delays that will be imposed on clients when all threads are busy closing files. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01SUNRPC: Use a static buffer for the checksum initialization vectorChuck Lever
Allocating and zeroing a buffer during every call to krb5_etm_checksum() is inefficient. Instead, set aside a static buffer that is the maximum crypto block size, and use a portion (or all) of that. Reported-by: Markus Elfring <Markus.Elfring@web.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01SUNRPC: fix some memleaks in gssx_dec_option_arrayZhipeng Lu
The creds and oa->data need to be freed in the error-handling paths after their allocation. So this patch add these deallocations in the corresponding paths. Fixes: 1d658336b05f ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01SUNRPC: fix a memleak in gss_import_v2_contextZhipeng Lu
The ctx->mech_used.data allocated by kmemdup is not freed in neither gss_import_v2_context nor it only caller gss_krb5_import_sec_context, which frees ctx on error. Thus, this patch reform the last call of gss_import_v2_context to the gss_krb5_import_ctx_v2, preventing the memleak while keepping the return formation. Fixes: 47d848077629 ("gss_krb5: handle new context format from gssd") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-01io_uring/sqpoll: statistics of the true utilization of sq threadsXiaobing Li
Count the running time and actual IO processing time of the sqpoll thread, and output the statistical data to fdinfo. Variable description: "work_time" in the code represents the sum of the jiffies of the sq thread actually processing IO, that is, how many milliseconds it actually takes to process IO. "total_time" represents the total time that the sq thread has elapsed from the beginning of the loop to the current time point, that is, how many milliseconds it has spent in total. The test tool is fio, and its parameters are as follows: [global] ioengine=io_uring direct=1 group_reporting bs=128k norandommap=1 randrepeat=0 refill_buffers ramp_time=30s time_based runtime=1m clocksource=clock_gettime overwrite=1 log_avg_msec=1000 numjobs=1 [disk0] filename=/dev/nvme0n1 rw=read iodepth=16 hipri sqthread_poll=1 The test results are as follows: Every 2.0s: cat /proc/9230/fdinfo/6 | grep -E Sq SqMask: 0x3 SqHead: 3197153 SqTail: 3197153 CachedSqHead: 3197153 SqThread: 9231 SqThreadCpu: 11 SqTotalTime: 18099614 SqWorkTime: 16748316 The test results corresponding to different iodepths are as follows: |-----------|-------|-------|-------|------|-------| | iodepth | 1 | 4 | 8 | 16 | 64 | |-----------|-------|-------|-------|------|-------| |utilization| 2.9% | 8.8% | 10.9% | 92.9%| 84.4% | |-----------|-------|-------|-------|------|-------| | idle | 97.1% | 91.2% | 89.1% | 7.1% | 15.6% | |-----------|-------|-------|-------|------|-------| Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com> Link: https://lore.kernel.org/r/20240228091251.543383-1-xiaobing.li@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01io_uring/net: move recv/recvmsg flags out of retry loopJens Axboe
The flags don't change, just intialize them once rather than every loop for multishot. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01qnx4: convert qnx4 to use the new mount apiBill O'Donnell
Convert the qnx4 filesystem to use the new mount API. Tested mount, umount, and remount using a qnx4 boot image. Signed-off-by: Bill O'Donnell <bodonnel@redhat.com> Link: https://lore.kernel.org/r/20240229161649.800957-1-bodonnel@redhat.com Acked-by: Anders Larsen <al@alarsen.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01fs: use inode_set_ctime_to_ts to set inode ctime to current timeNguyen Dinh Phi
The function inode_set_ctime_current simply retrieves the current time and assigns it to the field __i_ctime without any alterations. Therefore, it is possible to set ctime to now directly using inode_set_ctime_to_ts Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Link: https://lore.kernel.org/r/20240228173031.3208743-1-phind.uet@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01Merge tag 'arm-smmu-updates' of ↵Joerg Roedel
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu Arm SMMU updates for 6.9 - Device-tree binding updates for a bunch of Qualcomm SoCs - SMMUv2: * Support for Qualcomm X1E80100 MDSS - SMMUv3: * Significant rework of the driver's STE manipulation and domain handling code. This is the initial part of a larger scale rework aiming to improve the driver's implementation of the IOMMU API in preparation for hooking up IOMMUFD support.
2024-03-01iommu/sva: Fix SVA handle sharing in multi device caseZhangfei Gao
iommu_sva_bind_device will directly goto out in multi-device case when found existing domain, ignoring list_add handle, which causes the handle to fail to be shared. Fixes: 65d4418c5002 ("iommu/sva: Restore SVA handle sharing") Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240227064821.128-1-zhangfei.gao@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Use device rbtree in iopf reporting pathLu Baolu
The existing I/O page fault handler currently locates the PCI device by calling pci_get_domain_bus_and_slot(). This function searches the list of all PCI devices until the desired device is found. To improve lookup efficiency, replace it with device_rbtree_find() to search the device within the probed device rbtree. The I/O page fault is initiated by the device, which does not have any synchronization mechanism with the software to ensure that the device stays in the probed device tree. Theoretically, a device could be released by the IOMMU subsystem after device_rbtree_find() and before iopf_get_dev_fault_param(), which would cause a use-after-free problem. Add a mutex to synchronize the I/O page fault reporting path and the IOMMU release device path. This lock doesn't introduce any performance overhead, as the conflict between I/O page fault reporting and device releasing is very rare. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240220065939.121116-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Use rbtree to track iommu probed devicesLu Baolu
Use a red-black tree(rbtree) to track devices probed by the driver's probe_device callback. These devices need to be looked up quickly by a source ID when the hardware reports a fault, either recoverable or unrecoverable. Fault reporting paths are critical. Searching a list in this scenario is inefficient, with an algorithm complexity of O(n). An rbtree is a self-balancing binary search tree, offering an average search time complexity of O(log(n)). This significant performance improvement makes rbtrees a better choice. Furthermore, rbtrees are implemented on a per-iommu basis, eliminating the need for global searches and further enhancing efficiency in critical fault paths. The rbtree is protected by a spin lock with interrupts disabled to ensure thread-safe access even within interrupt contexts. Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240220065939.121116-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Merge intel_svm_bind_mm() into its callerTina Zhang
intel_svm_set_dev_pasid() is the only caller of intel_svm_bind_mm(). Merge them and remove intel_svm_bind_mm(). No functional change intended. Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20240219125723.1645703-4-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Remove initialization for dynamically heap-allocated rcu_headTina Zhang
The rcu_head structures allocated dynamically in the heap don't need any initialization. Therefore, remove the init_rcu_head(). Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20240219125723.1645703-3-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Remove treatment for revoking PASIDs with pending page faultsTina Zhang
Commit 2f26e0a9c986 ("iommu/vt-d: Add basic SVM PASID support") added a special treatment to mandate that no page faults may be outstanding for the PASID after intel_svm_unbind_mm() is called, as the PASID will be released and reused after unbind. This is unnecessary anymore as no outstanding page faults have been ensured in the driver's remove_dev_pasid path: - Tear down the pasid entry, which guarantees that new page faults for the PASID will be rejected by the iommu hardware. - All outstanding page faults have been responded to. - All hardware pending faults are drained in intel_drain_pasid_prq(). Remove this unnecessary code. Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20240219125723.1645703-2-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Add the document for Intel IOMMU debugfsJingqi Liu
This document guides users to dump the Intel IOMMU internals by debugfs. Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20240207090742.23857-1-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Use kcalloc() instead of kzalloc()Erick Archer
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1]. Here the multiplication is obviously safe because DMAR_LATENCY_NUM is the number of latency types defined in the "latency_type" enum. enum latency_type { DMAR_LATENCY_INV_IOTLB = 0, DMAR_LATENCY_INV_DEVTLB, DMAR_LATENCY_INV_IEC, DMAR_LATENCY_PRQ, DMAR_LATENCY_NUM }; However, using kcalloc() is more appropriate [2] and improves readability. This patch has no effect on runtime behavior. Link: https://github.com/KSPP/linux/issues/162 [1] Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2] Signed-off-by: Erick Archer <erick.archer@gmx.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240211175143.9229-1-erick.archer@gmx.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WALu Baolu
Commit 62edf5dc4a524 ("intel-iommu: Restore DMAR_BROKEN_GFX_WA option for broken graphics drivers") was introduced 24 years ago as a temporary workaround for graphics drivers that used physical addresses for DMA and avoided DMA APIs. This workaround was disabled by default. As 24 years have passed, it is expected that graphics driver developers have migrated their drivers to use kernel DMA APIs. Therefore, this workaround is no longer required and could been removed. The Intel iommu driver also provides a "igfx_off" option to turn off the DMA translation for the graphic dedicated IOMMU. Hence, there is really no good reason to keep this config option. Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240130060823.57990-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01Merge tag 'at91-dt-6.9' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into soc/dt Microchip AT91 device tree updates for v6.9 It contains: - use DMA for DBGU of at91sam9x5ek.dtsi and USART3 of at91sam9g25-gardena-smart-gateway.dts - the new SAMA7G54 Curiosity board - cleanups * tag 'at91-dt-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: dts: microchip: sama7g5: add sama7g5 compatible ARM: dts: microchip: sam9x60: align dmas to the opening '<' ARM: dts: microchip: sama7g5: align dmas to the opening '<' ARM: dts: microchip: sama7g54_curiosity: Add initial device tree of the board ARM: dts: microchip: sama7g5: Add flexcom 10 node dt-bindings: ARM: at91: Document Microchip SAMA7G54 Curiosity ARM: dts: microchip: gardena-smart-gateway: Use DMA for USART3 ARM: dts: microchip: at91sam9x5ek: Use DMA for DBGU serial port Link: https://lore.kernel.org/r/20240226183635.1964704-1-claudiu.beznea@tuxon.dev Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-01iommu: re-use local fwnode variable in iommu_ops_from_fwnode()Krzysztof Kozlowski
iommu_ops_from_fwnode() stores &iommu_spec->np->fwnode in local variable, so use it to simplify the code (iommu_spec is not changed between these dereferences). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240216144027.185959-4-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu: constify fwnode in iommu_ops_from_fwnode()Krzysztof Kozlowski
Make pointer to fwnode_handle a pointer to const for code safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240216144027.185959-3-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu: constify of_phandle_args in xlateKrzysztof Kozlowski
The xlate callbacks are supposed to translate of_phandle_args to proper provider without modifying the of_phandle_args. Make the argument pointer to const for code safety and readability. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240216144027.185959-2-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01iommu: constify pointer to bus_typeKrzysztof Kozlowski
Make pointer to bus_type a pointer to const for code safety. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240216144027.185959-1-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-03-01Merge tag 'zynqmp-dt-for-6.9' of https://github.com/Xilinx/linux-xlnx into ↵Arnd Bergmann
soc/dt arm64: ZynqMP DT changes for 6.9 dt-bindings: - Describe firmware for Versal NET - Describe all firmware child nodes - Align versal-fpga node name with dt schema - Describe k26 rev2 and kv260 DTs: - Align firmware node with dt schema - Add an optee node - Describe reset for CANs - Update ECAM size to discover up to 256 buses - Describe assigned-clocks for uarts - Add u-boot node - Comment SMMU entries - Align dwc3 nodes with dt schema - Rename i2c groups to match dt schema - Small DT updates (comments) - Fix default clock frequency for si570 (zcu102, zcu106) - Add output-enable pins and cover MIO38 (SOM) * tag 'zynqmp-dt-for-6.9' of https://github.com/Xilinx/linux-xlnx: (21 commits) dt-bindings: firmware: xilinx: Describe soc-nvmem subnode dt-bindings: soc: xilinx: Add support for KV260 CC dt-bindings: soc: xilinx: Add support for K26 rev2 SOMs arm64: zynqmp: Align usb clock nodes with binding arm64: zynqmp: Comment all smmu entries arm64: zynqmp: Rename i2c?-gpio to i2c?-gpio-grp arm64: zynqmp: Disable Tri-state for MIO38 Pin arm64: zynqmp: Remove incorrect comment from kv260s arm64: zynqmp: Introduce u-boot options node with bootscr-address arm64: zynqmp: Fix comment to be aligned with board name. arm64: zynqmp: Update ECAM size to discover up to 256 buses arm64: zynqmp: Describe assigned-clocks for uarts arm64: zynqmp: Setup default si570 frequency to 156.25MHz arm64: zynqmp: Add resets property for CAN nodes arm64: zynqmp: Add an OP-TEE node to the device tree arm64: zynqmp: Add output-enable pins to SOMs arm64: zynqmp: Rename zynqmp-power node to power-management dt-bindings: firmware: xilinx: Sort node names (clock-controller) dt-bindings: firmware: xilinx: Describe missing child nodes dt-bindings: firmware: xilinx: Fix versal-fpga node name ... Link: https://lore.kernel.org/r/CAHTX3dLEoFMTGg1Q4+OuOwWYd8N73YBTXki8Vvj3cGHUpLJ0=A@mail.gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-01Merge tag 'sgx-for-v6.9-signed' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into soc/dt Add PowerVR Series5 SGX GPUs for the TI SoCs With the Imagination Rogue GPU binding added, let's also add the devicetree binding for earlier SGX GPUs. Let's also patch the TI SoCs for the related SGX GPU nodes. Based on the mailing list discussions, the conclusion was that we need two separate device tree bindings, one for Rogue and upcoming GPUS, and one for the older SGX GPUs. For merging the changes, I applied the binding changes together with the TI SoC related changes into a branch leaving out the sun6i and mips changes as suggested by Rob. These changes are mostly 32-bit SoCs, but also contains one arm64 change. It does not cause any merge conflicts. * tag 'sgx-for-v6.9-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: arm64: dts: ti: k3-am654-main: Add device tree entry for SGX GPU ARM: dts: DRA7xx: Add device tree entry for SGX GPU ARM: dts: AM437x: Add device tree entry for SGX GPU ARM: dts: AM33xx: Add device tree entry for SGX GPU ARM: dts: omap5: Add device tree entry for SGX GPU ARM: dts: omap4: Add device tree entry for SGX GPU ARM: dts: omap3: Add device tree entry for SGX GPU dt-bindings: gpu: Add PowerVR Series5 SGX GPUs dt-bindings: gpu: Rename img,powervr to img,powervr-rogue Link: https://lore.kernel.org/r/pull-1708943489-872615@atomide.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-01Merge tag 'imx-dt64-6.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/dt i.MX arm64 device tree for 6.9: - New board support: Apalis eval v1.2 carrier board, Variscite VAR-SOM-MX93, phyBOARD-Segin-i.MX93. - A series from Adam Ford to enable bluetooth, configure multiple queues on eqos, remove unnecessary clock configuration for i.MX8 Beacon boards. - Several changesets from Alexander Stein to add i.MX8DXP support, enable audio and GPU for i.MX8QXP, re-parent MEDIA_MIPI_PHY1_REF clock for i.MX8MP, and improve MBA8xx board description. - A few dt-schema fixes from Fabio Estevam for i.MX8MM and i.MX93 devices. - A bunch of changes from Frank Li to improve i.MX8QM and i.MX8DXL support, correcting edma3 power-domains and interrupt numbers, adding I2C, FlexCAN and SMMU devices, etc. - A series from Frieder Schrempf to improve imx8mm-kontron board descriptions, disabling pulls, fixing up RTC device, adding EEPROM, and refactoring OSM-S module, etc. - A set of Data Modul i.MX8M Plus eDM SBC improvements from Marek Vasut. - A series from Shengjiu Wang to add PDM micphone and SPDIF sound card support for imx8mm-evk board. - A series of imx8mm-venice boards improvement from Tim Harvey to add TPM device, fix USB OTG VBUS etc. - Other small and random improvements on various boards. * tag 'imx-dt64-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: (77 commits) arm64: dts: imx8mm-kontron-bl-osm-s: Fix Ethernet PHY compatible arm64: dts: imx8-apalis-v1.1: Remove reset-names from ethernet-phy arm64: dts: imx8mp-evk: Fix hdmi@3d node arm64: dts: imx93-var-som: Remove phy-supply from eqos arm64: dts: imx8mp-phyboard-pollux: Disable pull-up for CD GPIO arm64: dts: imx8mp-phyboard-pollux: Reduce drive strength for eqos tx lines arm64: dts: imx8mp-phyboard-pollux: Set debug uart muxing to 0x140 arm64: dts: imx8mp-phyboard-pollux: Add and update rtc devicetree node arm64: dts: imx8mm-evk: Add spdif sound card support arm64: dts: mba8xx: Add missing #interrupt-cells arm64: dts: imx8mp: Set SPI NOR to max 40 MHz on Data Modul i.MX8M Plus eDM SBC arm64: dts: imx8mn: tqma8mqnl-mba8mx: Add USB DR overlay arm64: dts: imx8mq: tqma8mq-mba8mx: Add missing USB vbus supply arm64: dts: freescale: imx8mm/imx8mq: mba8mx: Use PCIe clock generator arm64: dts: imx8mn-beacon: Remove unnecessary clock configuration arm64: dts: imx8mn: Slow default video_pll clock rate arm64: dts: imx8mp-beacon: Configure multiple queues on eqos arm64: dts: imx8mp-beacon: Enable Bluetooth arm64: dts: freescale: minor whitespace cleanup arm64: dts: lx2160a: Fix DTS for full PL011 UART ... Link: https://lore.kernel.org/r/20240226034147.233993-4-shawnguo2@yeah.net Signed-off-by: Arnd Bergmann <arnd@arndb.de>