Age | Commit message (Collapse) | Author |
|
Avoid a warning if the error percolates back up:
[440700.376476] CIFS VFS: \\otters.example.com crypt_message: Could not get encryption key
[440700.386947] ------------[ cut here ]------------
[440700.386948] err = 1
[440700.386977] WARNING: CPU: 11 PID: 2733 at /build/linux-hwe-5.4-p6lk6L/linux-hwe-5.4-5.4.0/lib/errseq.c:74 errseq_set+0x5c/0x70
...
[440700.397304] CPU: 11 PID: 2733 Comm: tar Tainted: G OE 5.4.0-70-generic #78~18.04.1-Ubuntu
...
[440700.397334] Call Trace:
[440700.397346] __filemap_set_wb_err+0x1a/0x70
[440700.397419] cifs_writepages+0x9c7/0xb30 [cifs]
[440700.397426] do_writepages+0x4b/0xe0
[440700.397444] __filemap_fdatawrite_range+0xcb/0x100
[440700.397455] filemap_write_and_wait+0x42/0xa0
[440700.397486] cifs_setattr+0x68b/0xf30 [cifs]
[440700.397493] notify_change+0x358/0x4a0
[440700.397500] utimes_common+0xe9/0x1c0
[440700.397510] do_utimes+0xc5/0x150
[440700.397520] __x64_sys_utimensat+0x88/0xd0
Fixes: 61cfac6f267d ("CIFS: Fix possible use after free in demultiplex thread")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If smb3_notify() is called at mount point of CIFS, build_path_from_dentry()
returns the pointer to kmalloc-ed memory with terminating zero (this is
empty FileName to be passed to SMB2 CREATE request). This pointer is assigned
to the `path` variable.
Then `path + 1` (to skip first backslash symbol) is passed to
cifs_convert_path_to_utf16(). This is incorrect for empty path and causes
out-of-bound memory access.
Get rid of this "increase by one". cifs_convert_path_to_utf16() already
contains the check for leading backslash in the path.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=212693
CC: <stable@vger.kernel.org> # v5.6+
Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
* rqst[1,2,3] is allocated in vars
* each rqst->rq_iov is also allocated in vars or using pooled memory
SMB2_open_free, SMB2_ioctl_free, SMB2_query_info_free are iterating on
each rqst after vars has been freed (use-after-free), and they are
freeing the kvec a second time (double-free).
How to trigger:
* compile with KASAN
* mount a share
$ smbinfo quota /mnt/foo
Segmentation fault
$ dmesg
==================================================================
BUG: KASAN: use-after-free in SMB2_open_free+0x1c/0xa0
Read of size 8 at addr ffff888007b10c00 by task python3/1200
CPU: 2 PID: 1200 Comm: python3 Not tainted 5.12.0-rc6+ #107
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
dump_stack+0x93/0xc2
print_address_description.constprop.0+0x18/0x130
? SMB2_open_free+0x1c/0xa0
? SMB2_open_free+0x1c/0xa0
kasan_report.cold+0x7f/0x111
? smb2_ioctl_query_info+0x240/0x990
? SMB2_open_free+0x1c/0xa0
SMB2_open_free+0x1c/0xa0
smb2_ioctl_query_info+0x2bf/0x990
? smb2_query_reparse_tag+0x600/0x600
? cifs_mapchar+0x250/0x250
? rcu_read_lock_sched_held+0x3f/0x70
? cifs_strndup_to_utf16+0x12c/0x1c0
? rwlock_bug.part.0+0x60/0x60
? rcu_read_lock_sched_held+0x3f/0x70
? cifs_convert_path_to_utf16+0xf8/0x140
? smb2_check_message+0x6f0/0x6f0
cifs_ioctl+0xf18/0x16b0
? smb2_query_reparse_tag+0x600/0x600
? cifs_readdir+0x1800/0x1800
? selinux_bprm_creds_for_exec+0x4d0/0x4d0
? do_user_addr_fault+0x30b/0x950
? __x64_sys_openat+0xce/0x140
__x64_sys_ioctl+0xb9/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fdcf1f4ba87
Code: b3 66 90 48 8b 05 11 14 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 13 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffef1ce7748 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000c018cf07 RCX: 00007fdcf1f4ba87
RDX: 0000564c467c5590 RSI: 00000000c018cf07 RDI: 0000000000000003
RBP: 00007ffef1ce7770 R08: 00007ffef1ce7420 R09: 00007fdcf0e0562b
R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000004018
R13: 0000000000000001 R14: 0000000000000003 R15: 0000564c467c5590
Allocated by task 1200:
kasan_save_stack+0x1b/0x40
__kasan_kmalloc+0x7a/0x90
smb2_ioctl_query_info+0x10e/0x990
cifs_ioctl+0xf18/0x16b0
__x64_sys_ioctl+0xb9/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 1200:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
__kasan_slab_free+0xe5/0x110
slab_free_freelist_hook+0x53/0x130
kfree+0xcc/0x320
smb2_ioctl_query_info+0x2ad/0x990
cifs_ioctl+0xf18/0x16b0
__x64_sys_ioctl+0xb9/0xf0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff888007b10c00
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 0 bytes inside of
512-byte region [ffff888007b10c00, ffff888007b10e00)
The buggy address belongs to the page:
page:0000000044e14b75 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7b10
head:0000000044e14b75 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x100000000010200(slab|head)
raw: 0100000000010200 ffffea000015f500 0000000400000004 ffff888001042c80
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888007b10b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888007b10b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888007b10c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888007b10c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888007b10d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Can aid in making mount problems easier to diagnose
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This makes the errors accessible from userspace via dmesg and
the fs_context fd.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add fs_context param to parsing helpers to be able to log into it in
next patch.
Make some helper static as they are not used outside of fs_context.c
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This new helper will be used in the fs_context mount option parsing
code. It log errors both in:
* the fs_context log queue for userspace to read
* kernel printk buffer (dmesg, old behaviour)
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Emulated via server side copy and setsize for
SMB3 and later. In the future we could compound
this (and/or optionally use DUPLICATE_EXTENTS
if supported by the server).
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Emulated for SMB3 and later via server side copy
and setsize. Eventually this could be compounded.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Improves directory metadata caching
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
functions
Needed for the final patch in the directory caching series
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
and clear the timestamp when we receive a lease break.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Needed for subsequent patches in the directory caching
series.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
lifetime of the cache
We need to hold both a reference for the root/superblock as well as the directory that we
are caching. We need to drop these references before we call kill_anon_sb().
At this point, the root and the cached dentries are always the same but this will change
once we start caching other directories as well.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
completed mounting the share
And use this to only allow to take out a shared handle once the mount has completed and the
sb becomes available.
This will become important in follow up patches where we will start holding a reference to the
directory dentry for the shared handle during the lifetime of the handle.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
These functions will eventually be used to cache any directory, not just the root
so change the names.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Move the check for the directory path into the open_shroot() function
but still fail for any non-root directories.
This is preparation for later when we will start using the cache also
for other directories than the root.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
instead of doing it in the callsites for open_shroot.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The cost is that we might need to flip '/' to '\\' in more than
just the prefix. Needs profiling, but I suspect that we won't
get slowdown on that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
build_path_from_dentry() open-codes dentry_path_raw(). The reason
we can't use dentry_path_raw() in there (and postprocess the
result as needed) is that the callers of build_path_from_dentry()
expect that the object to be freed on cleanup and the string to
be used are at the same address. That's painful, since the path
is naturally built end-to-beginning - we start at the leaf and
go through the ancestors, accumulating the pathname.
Life would be easier if we left the buffer allocation to callers.
It wouldn't be exact-sized buffer, but none of the callers keep
the result for long - it's always freed before the caller returns.
So there's no need to do exact-sized allocation; better use
__getname()/__putname(), same as we do for pathname arguments
of syscalls. What's more, there's no need to do allocation under
spinlocks, so GFP_ATOMIC is not needed.
Next patch will replace the open-coded dentry_path_raw() (in
build_path_from_dentry_optional_prefix()) with calling the real
thing. This patch only introduces wrappers for allocating/freeing
the buffers and switches to new calling conventions:
build_path_from_dentry(dentry, buf)
expects buf to be address of a page-sized object or NULL,
return value is a pathname built inside that buffer on success,
ERR_PTR(-ENOMEM) if buf is NULL and ERR_PTR(-ENAMETOOLONG) if
the pathname won't fit into page. Note that we don't need to
check for failure when allocating the buffer in the caller -
build_path_from_dentry() will do the right thing.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
... and adjust the callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
As it is, it takes const char * and, in some cases, stores it in
caller's variable that is plain char *. Fortunately, none of the
callers actually proceeded to modify the string via now-non-const
alias, but that's trouble waiting to happen.
It's easy to do properly, anyway...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
strndup(s, strlen(s)) is a highly unidiomatic way to spell strdup(s);
it's *NOT* safer in any way, since strlen() is just as sensitive to
NUL-termination as strdup() is.
strndup() is for situations when you need a copy of a known-sized
substring, not a magic security juju to drive the bad spirits away.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Protocol has been extended for additional compression headers.
See MS-SMB2 section 2.2.42
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
While reviewing a patch clarifying locks and locking hierarchy I
realized some locks were unused.
This commit removes old data and code that isn't actually used
anywhere, or hidden in ifdefs which cannot be enabled from the kernel
config.
* The uid/gid trees and associated locks are left-overs from when
uid/sid mapping had an extra caching layer on top of the keyring and
are now unused.
See commit faa65f07d21e ("cifs: simplify id_to_sid and sid_to_id mapping code")
from 2012.
* cifs_oplock_break_ops is a left-over from when slow_work was remplaced
by regular workqueue and is now unused.
See commit 9b646972467f ("cifs: use workqueue instead of slow-work")
from 2010.
* CIFSSMBSetAttrLegacy is SMB1 cruft dealing with some legacy
NT4/Win9x behaviour.
* Remove CONFIG_CIFS_DNOTIFY_EXPERIMENTAL left-overs. This was already
partially removed in 392e1c5dc9cc ("cifs: rename and clarify CIFS_ASYNC_OP and CIFS_NO_RESP")
from 2019. Kill it completely.
* Another candidate that was considered but spared is
CONFIG_CIFS_NFSD_EXPORT which has an empty implementation and cannot
be enabled by a config option (although it is listed but disabled with
"BROKEN" as a dep). It's unclear whether this could even function
today in its current form but it has it's own .c file and Kconfig
entry which is a bit more involved to remove and might make a come
back?
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warning:
CC [M] fs/cifs/cifssmb.o
fs/cifs/cifssmb.c: In function ‘CIFSFindNext’:
fs/cifs/cifssmb.c:4636:23: warning: array subscript 1 is above array bounds of ‘char[1]’ [-Warray-bounds]
4636 | pSMB->ResumeFileName[name_len+1] = 0;
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
struct cifs_writedata is declared twice.
One is declared at 209th line.
And struct cifs_writedata is defined blew.
The declaration hear is not needed. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add missing documentation for open_files and dfscache /proc files.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This commit doesn't change the logic of SWN.
Add dummy implementation of SWN functions when SWN is disabled instead
of using ifdef sections.
The dummy functions get optimized out, this leads to clearer code and
compile time type-checking regardless of config options with no
runtime penalty.
Leave the simple ifdefs section as-is.
A single bitfield (bool foo:1) on its own will use up one int. Move
tcon->use_witness out of ifdefs with the other tcon bitfields.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Samuel Cabrero <scabrero@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
[MS-SMB2] protocol specification was recently updated to include
new flags, new negotiate context and some minor changes to fields.
Update smb2pdu.h structure definitions to match the newest version
of the protocol specification. Updates to the compression context
values will be in a followon patch.
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
A few of the semaphores had been removed, and one additional one
needed to be noted in the comments.
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix the following gcc warning:
fs/cifs/cifsacl.c:1097:8: warning: variable ‘nmode’ set but not used
[-Wunused-but-set-variable].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
secuirty -> security
Signed-off-by: jack1.li_cp <liliu1@yulong.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The Intel Lightning Mountain (LGM) Serial Shift Output controller (SSO)
is only present on Intel Lightning Mountain SoCs. Hence add a
dependency on X86, to prevent asking the user about this driver when
configuring a kernel without Intel Lightning Mountain platform support.
While at it, merge the other dependencies into a single statement.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
|
|
|
|
There is a spelling mistake in a dev_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
|
|
Signed-off-by: Pavel Machek <pavel@ucw.cz>
|
|
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Pavel Machek <pavel@ucw.cz>
|
|
Before this fix, the function and userdata columns weren't aligned:
device can_id can_mask function userdata matches ident
vcan0 92345678 9fffffff 0000000000000000 0000000000000000 0 raw
vcan0 123 00000123 0000000000000000 0000000000000000 0 raw
After the fix they are:
device can_id can_mask function userdata matches ident
vcan0 92345678 9fffffff 0000000000000000 0000000000000000 0 raw
vcan0 123 00000123 0000000000000000 0000000000000000 0 raw
Link: Link: https://lore.kernel.org/r/20210425141440.229653-1-erik@flodin.me
Signed-off-by: Erik Flodin <erik@flodin.me>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Add '/' prefix to clarify that the generated files exist right under
scripts/kconfig/, but not in any sub-directory.
Replace '*conf-cfg' with '[gmnq]conf-cfg' to make it explicit, and
still short enough.
Use '[gmnq]conf' to combine gconf, mconf, nconf, and qconf.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix potential NULL pointer dereference in the auxtrace option parser
- Fix access to PID in an array when setting a PID filter in 'perf ftrace'
- Fix error return code in the 'perf data' tool and in maps__clone(),
found using a static analysis tool from Huawei
* tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf map: Fix error return code in maps__clone()
perf ftrace: Fix access to pid in array when setting a pid filter
perf auxtrace: Fix potential NULL pointer dereference
perf data: Fix error return code in perf_data__create_dir()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Fix Broadwell Xeon's stepping in the PEBS isolation table of CPUs
- Fix a panic when initializing perf uncore machinery on Haswell and
Broadwell servers
* tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
|
|
This is done by create_io_thread() now.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Otherwise io_wq_worker_{running,sleeping}() may dereference an
invalid pointer (in future). Currently all users of create_io_thread()
are fine and get task->pf_io_worker = NULL implicitly from the
wq_manager, which got it either from the userspace thread
of the sq_thread, which explicitly reset it to NULL.
I think it's safer to always reset it in order to avoid future
problems.
Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task")
cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add the code needed in the main IDXD driver to interface with the IDXD
perfmon implementation.
[ Based on work originally by Jing Lin. ]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: https://lore.kernel.org/r/a5564a5583911565d31c2af9234218c5166c4b2c.1619276133.git.zanussi@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Implement the IDXD performance monitor capability (named 'perfmon' in
the DSA (Data Streaming Accelerator) spec [1]), which supports the
collection of information about key events occurring during DSA and
IAX (Intel Analytics Accelerator) device execution, to assist in
performance tuning and debugging.
The idxd perfmon support is implemented as part of the IDXD driver and
interfaces with the Linux perf framework. It has several features in
common with the existing uncore pmu support:
- it does not support sampling
- does not support per-thread counting
However it also has some unique features not present in the core and
uncore support:
- all general-purpose counters are identical, thus no event constraints
- operation is always system-wide
While the core perf subsystem assumes that all counters are by default
per-cpu, the uncore pmus are socket-scoped and use a cpu mask to
restrict counting to one cpu from each socket. IDXD counters use a
similar strategy but expand the scope even further; since IDXD
counters are system-wide and can be read from any cpu, the IDXD perf
driver picks a single cpu to do the work (with cpu hotplug notifiers
to choose a different cpu if the chosen one is taken off-line).
More specifically, the perf userspace tool by default opens a counter
for each cpu for an event. However, if it finds a cpumask file
associated with the pmu under sysfs, as is the case with the uncore
pmus, it will open counters only on the cpus specified by the cpumask.
Since perfmon only needs to open a single counter per event for a
given IDXD device, the perfmon driver will create a sysfs cpumask file
for the device and insert the first cpu of the system into it. When a
user uses perf to open an event, perf will open a single counter on
the cpu specified by the cpu mask. This amounts to the default
system-wide rather than per-cpu counting mentioned previously for
perfmon pmu events. In order to keep the cpu mask up-to-date, the
driver implements cpu hotplug support for multiple devices, as IDXD
usually enumerates and registers more than one idxd device.
The perfmon driver implements basic perfmon hardware capability
discovery and configuration, and is initialized by the IDXD driver's
probe function. During initialization, the driver retrieves the total
number of supported performance counters, the pmu ID, and the device
type from idxd device, and registers itself under the Linux perf
framework.
The perf userspace tool can be used to monitor single or multiple
events depending on the given configuration, as well as event groups,
which are also supported by the perfmon driver. The user configures
events using the perf tool command-line interface by specifying the
event and corresponding event category, along with an optional set of
filters that can be used to restrict counting to specific work queues,
traffic classes, page and transfer sizes, and engines (See [1] for
specifics).
With the configuration specified by the user, the perf tool issues a
system call passing that information to the kernel, which uses it to
initialize the specified event(s). The event(s) are opened and
started, and following termination of the perf command, they're
stopped. At that point, the perfmon driver will read the latest count
for the event(s), calculate the difference between the latest counter
values and previously tracked counter values, and display the final
incremental count as the event count for the cycle. An overflow
handler registered on the IDXD irq path is used to account for counter
overflows, which are signaled by an overflow interrupt.
Below are a couple of examples of perf usage for monitoring DSA events.
The following monitors all events in the 'engine' category. Becuuse
no filters are specified, this captures all engine events for the
workload, which in this case is 19 iterations of the work generated by
the kernel dmatest module.
Details describing the events can be found in Appendix D of [1],
Performance Monitoring Events, but briefly they are:
event 0x1: total input data processed, in 32-byte units
event 0x2: total data written, in 32-byte units
event 0x4: number of work descriptors that read the source
event 0x8: number of work descriptors that write the destination
event 0x10: number of work descriptors dispatched from batch descriptors
event 0x20: number of work descriptors dispatched from work queues
# perf stat -e dsa0/event=0x1,event_category=0x1/,
dsa0/event=0x2,event_category=0x1/,
dsa0/event=0x4,event_category=0x1/,
dsa0/event=0x8,event_category=0x1/,
dsa0/event=0x10,event_category=0x1/,
dsa0/event=0x20,event_category=0x1/
modprobe dmatest channel=dma0chan0 timeout=2000
iterations=19 run=1 wait=1
Performance counter stats for 'system wide':
5,332 dsa0/event=0x1,event_category=0x1/
5,327 dsa0/event=0x2,event_category=0x1/
19 dsa0/event=0x4,event_category=0x1/
19 dsa0/event=0x8,event_category=0x1/
0 dsa0/event=0x10,event_category=0x1/
19 dsa0/event=0x20,event_category=0x1/
21.977436186 seconds time elapsed
The command below illustrates filter usage with a simple example. It
specifies that MEM_MOVE operations should be counted for the DSA
device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number
of Memory Move Descriptors, which is part of event category 0x3 -
Operations. The detailed category and event IDs are available in
Appendix D, Performance Monitoring Events, of [1]). In addition to
the event and event category, a number of filters are also specified
(the detailed filter values are available in Chapter 6.4 (Filter
Support) of [1]), which will restrict counting to only those events
that meet all of the filter criteria. In this case, the filters
specify that only MEM_MOVE operations that are serviced by work queue
wq0 and specifically engine number engine0 and traffic class tc0
having sizes between 0 and 4k and page size of between 0 and 1G result
in a counter hit; anything else will be filtered out and not appear in
the final count. Note that filters are optional - any filter not
specified is assumed to be all ones and will pass anything.
# perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
filter_eng=0x1,event=0x8,event_category=0x3/
modprobe dmatest channel=dma0chan0 timeout=2000
iterations=19 run=1 wait=1
Performance counter stats for 'system wide':
19 dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
filter_eng=0x1,event=0x8,event_category=0x3/
21.865914091 seconds time elapsed
The output above reflects that the unspecified workload resulted in
the counting of 19 MEM_MOVE operation events that met the filter
criteria.
[1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html
[ Based on work originally by Jing Lin. ]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: https://lore.kernel.org/r/0c5080a7d541904c4ad42b848c76a1ce056ddac7.1619276133.git.zanussi@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
we shall update sq_thread_idle anytime we do ctx deletion from ctx_list
Fixes:734551df6f9b ("io_uring: fix shared sqpoll cancellation hangs")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/1619256380-236460-1-git-send-email-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Hook buffers into all rsrc infrastructure, including tagging and
updates.
Suggested-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/119ed51d68a491dae87eb55fb467a47870c86aad.1619356238.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|