Age | Commit message (Collapse) | Author |
|
Pull io_uring epoll support from Jens Axboe:
"This adds support for reading epoll events via io_uring.
While this may seem counter-intuitive (and/or productive), the
reasoning here is that quite a few existing epoll event loops can
easily do a partial conversion to a completion based model, but are
still stuck with one (or few) event types that remain readiness based.
For that case, they then need to add the io_uring fd to the epoll
context, and continue to rely on epoll_wait(2) for waiting on events.
This misses out on the finer grained waiting that io_uring can do, to
reduce context switches and wait for multiple events in one batch
reliably.
With adding support for reaping epoll events via io_uring, the whole
legacy readiness based event types can still be reaped via epoll, with
the overall waiting in the loop be driven by io_uring"
* tag 'for-6.15/io_uring-epoll-wait-20250325' of git://git.kernel.dk/linux:
io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
io_uring/epoll: remove CONFIG_EPOLL guards
|
|
Pull io_uring zero-copy receive support from Jens Axboe:
"This adds support for zero-copy receive with io_uring, enabling fast
bulk receive of data directly into application memory, rather than
needing to copy the data out of kernel memory.
While this version only supports host memory as that was the initial
target, other memory types are planned as well, with notably GPU
memory coming next.
This work depends on some networking components which were queued up
on the networking side, but have now landed in your tree.
This is the work of Pavel Begunkov and David Wei. From the v14 posting:
'We configure a page pool that a driver uses to fill a hw rx queue
to hand out user pages instead of kernel pages. Any data that ends
up hitting this hw rx queue will thus be dma'd into userspace
memory directly, without needing to be bounced through kernel
memory. 'Reading' data out of a socket instead becomes a
_notification_ mechanism, where the kernel tells userspace where
the data is. The overall approach is similar to the devmem TCP
proposal
This relies on hw header/data split, flow steering and RSS to
ensure packet headers remain in kernel memory and only desired
flows hit a hw rx queue configured for zero copy. Configuring this
is outside of the scope of this patchset.
We share netdev core infra with devmem TCP. The main difference is
that io_uring is used for the uAPI and the lifetime of all objects
are bound to an io_uring instance. Data is 'read' using a new
io_uring request type. When done, data is returned via a new shared
refill queue. A zero copy page pool refills a hw rx queue from this
refill queue directly. Of course, the lifetime of these data
buffers are managed by io_uring rather than the networking stack,
with different refcounting rules.
This patchset is the first step adding basic zero copy support. We
will extend this iteratively with new features e.g. dynamically
allocated zero copy areas, THP support, dmabuf support, improved
copy fallback, general optimisations and more'
In a local setup, I was able to saturate a 200G link with a single CPU
core, and at netdev conf 0x19 earlier this month, Jamal reported
188Gbit of bandwidth using a single core (no HT, including soft-irq).
Safe to say the efficiency is there, as bigger links would be needed
to find the per-core limit, and it's considerably more efficient and
faster than the existing devmem solution"
* tag 'for-6.15/io_uring-rx-zc-20250325' of git://git.kernel.dk/linux:
io_uring/zcrx: add selftest case for recvzc with read limit
io_uring/zcrx: add a read limit to recvzc requests
io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
io_uring: Rename KConfig to Kconfig
io_uring/zcrx: fix leaks on failed registration
io_uring/zcrx: recheck ifq on shutdown
io_uring/zcrx: add selftest
net: add documentation for io_uring zcrx
io_uring/zcrx: add copy fallback
io_uring/zcrx: throttle receive requests
io_uring/zcrx: set pp memory provider for an rx queue
io_uring/zcrx: add io_recvzc request
io_uring/zcrx: dma-map area for the device
io_uring/zcrx: implement zerocopy receive pp memory provider
io_uring/zcrx: grab a net device
io_uring/zcrx: add io_zcrx_area
io_uring/zcrx: add interface queue and refill queue
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"This contains a new driver: a TPM FF-A driver.
FF comes from Firmware Framework, and A comes from Arm's A-profile.
FF-A is essentially a standard mechanism to communicate with TrustZone
apps such as TPM.
Other than that, this includes a pile of fixes and small improvments"
* tag 'tpmdd-next-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Make chip->{status,cancel,req_canceled} opt
MAINTAINERS: TPM DEVICE DRIVER: add missing includes
tpm: End any active auth session before shutdown
Documentation: tpm: Add documentation for the CRB FF-A interface
tpm_crb: Add support for the ARM FF-A start method
ACPICA: Add start method for ARM FF-A
tpm_crb: Clean-up and refactor check for idle support
tpm_crb: ffa_tpm: Implement driver compliant to CRB over FF-A
tpm/tpm_ftpm_tee: fix struct ftpm_tee_private documentation
tpm, tpm_tis: Workaround failed command reception on Infineon devices
tpm, tpm_tis: Fix timeout handling when waiting for TPM status
tpm: Convert warn to dbg in tpm2_start_auth_session()
tpm: Lazily flush auth session when getting random data
tpm: ftpm_tee: remove incorrect of_match_ptr annotation
tpm: do not start chip while suspended
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC fixes from Eric Biggers:
"Fix out-of-scope array bugs in arm and arm64's crc_t10dif_arch()"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
arm64/crc-t10dif: fix use of out-of-scope array in crc_t10dif_arch()
arm/crc-t10dif: fix use of out-of-scope array in crc_t10dif_arch()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This brings two main changes to Landlock:
- A signal scoping fix with a new interface for user space to know if
it is compatible with the running kernel.
- Audit support to give visibility on why access requests are denied,
including the origin of the security policy, missing access rights,
and description of object(s). This was designed to limit log spam
as much as possible while still alerting about unexpected blocked
access.
With these changes come new and improved documentation, and a lot of
new tests"
* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
landlock: Add audit documentation
selftests/landlock: Add audit tests for network
selftests/landlock: Add audit tests for filesystem
selftests/landlock: Add audit tests for abstract UNIX socket scoping
selftests/landlock: Add audit tests for ptrace
selftests/landlock: Test audit with restrict flags
selftests/landlock: Add tests for audit flags and domain IDs
selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
selftests/landlock: Add test for invalid ruleset file descriptor
samples/landlock: Enable users to log sandbox denials
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
landlock: Log scoped denials
landlock: Log TCP bind and connect denials
landlock: Log truncate and IOCTL denials
landlock: Factor out IOCTL hooks
landlock: Log file-related denials
landlock: Log mount-related denials
landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux
Pull capabilities update from Serge Hallyn:
"This contains just one patch that removes a helper function whose last
user (smack) stopped using it in 2018"
* tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
capability: Remove unused has_capability
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull ima updates from Mimi Zohar:
"Two performance improvements, which minimize the number of integrity
violations"
* tag 'integrity-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: limit the number of ToMToU integrity violations
ima: limit the number of open-writers integrity violations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull ipe update from Fan Wu:
"This contains just one commit from Randy Dunlap, which fixes
kernel-doc warnings in the IPE subsystem"
* tag 'ipe-pr-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: policy_fs: fix kernel-doc warnings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
This reverts commit 36f5f026df6c1cd8a20373adc4388d2b3401ce91, reversing
changes made to 43a7eec035a5b64546c8adefdc9cf96a116da14b.
Thomas says:
"I just noticed that for some incomprehensible reason, probably sheer
incompetemce when trying to utilize b4, I managed to merge an outdated
_and_ buggy version of that series.
Can you please revert that merge completely?"
Done.
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When a filesystem is mounted read-only, subsequent attempts to mount it
as read-write fail with EBUSY. Previously, the error path in
bch2_fs_get_tree() would unconditionally call __bch2_fs_stop(),
improperly freeing resources for a filesystem that was still actively
mounted. This change modifies the error path to only call
__bch2_fs_stop() if the superblock has no valid root dentry, ensuring
resources are not cleaned up prematurely when the filesystem is in use.
Signed-off-by: Florian Albrechtskirchinger <falbrechtskirchinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We want to log errors all at once, not spread across multiple printks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Don't print a newline on empty string; this was causing us to also print
an extra newline when we got to the end of th string.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
syzbot discovered that this one is possible: we have pointers, but none
of them are to valid devices.
Reported-by: syzbot+336a6e6a2dbb7d4dba9a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The hole we find in the btree might be fully dirty in the page cache. If
so, keep searching.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't call bch2_seek_pagecache_hole(), and block on page locks, with
btree locks held.
This is easily fixed because we're at the end of the transaction - we
can just unlock, we don't need a drop_locks_do().
Reported-by: https://github.com/nagalun
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
state_lock guards against devices coming or leaving, changing state, or
the filesystem changing between ro <-> rw.
But it's not necessary for running recovery passes, and holding it
blocks asynchronous events that would cause us to go RO or kick out
devices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There's no reason for this not to be world readable - it provides the
currently supported on disk format version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Check if the pasid has been attached before going further in the detach
path. This fixes a crash found by syzkaller. Add a selftest as well.
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 668 Comm: repro Not tainted 6.14.0-next-20250325-eb4bc4b07f66 #1 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org4
RIP: 0010:iommufd_hw_pagetable_detach+0x8a/0x4d0
Code: 00 00 00 44 89 ee 48 89 c7 48 89 75 c8 48 89 45 c0 e8 ca 55 17 02 48 89 c2 49 89 c4 48 b8 00 00 00b
RSP: 0018:ffff888021b17b78 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff888014b5a000 RCX: ffff888021b17a64
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88801dad07fc
RBP: ffff888021b17bc8 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: ffff88801dad0e58 R12: 0000000000000000
R13: 0000000000000001 R14: ffff888021b17e18 R15: ffff8880132d3008
FS: 00007fca52013600(0000) GS:ffff8880e3684000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200006c0 CR3: 00000000112d0005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
iommufd_device_detach+0x2a/0x2e0
iommufd_test+0x2f99/0x5cd0
iommufd_fops_ioctl+0x38e/0x520
__x64_sys_ioctl+0x1ba/0x220
x64_sys_call+0x122e/0x2150
do_syscall_64+0x6d/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Link: https://patch.msgid.link/r/20250328133448.22052-1-yi.l.liu@intel.com
Reported-by: Lai Yi <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/linux-iommu/Z+X0tzxhiaupJT7b@ly-workstation
Fixes: c0e301b2978d ("iommufd/device: Add pasid_attach array to track per-PASID attach")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Change my role for MODULE SUPPORT from a reviewer to a maintainer. We
started to rotate its maintainership and I currently look after the modules
tree. This not being reflected in MAINTAINERS proved to confuse folks.
Add lib/tests/module/ and tools/testing/selftests/module/ to maintained
files. They were introduced previously by commit 84b4a51fce4c ("selftests:
add new kallsyms selftests").
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20250306162117.18876-1-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
|
|
Stephen Rothwell reports htmldocs warnings on iommufd_vevent_header
tables:
Documentation/userspace-api/iommufd:323: ./include/uapi/linux/iommufd.h:1048: CRITICAL: Unexpected section title or transition.
------------------------------------------------------------------------- [docutils]
WARNING: kernel-doc './scripts/kernel-doc -rst -enable-lineno -sphinx-version 8.1.3 ./include/uapi/linux/iommufd.h' processing failed with: Documentation/userspace-api/iommufd:323: ./include/uapi/linux/iommufd.h:1048: (SEVERE/4) Unexpected section title or transition.
-------------------------------------------------------------------------
These are because Sphinx confuses the tables for section headings. Fix
the table markup to squash away above warnings.
Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Link: https://patch.msgid.link/r/20250328114654.55840-1-bagasdotme@gmail.com
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250318213359.5dc56fd1@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Bare unreachable() should be avoided as it generates undefined behavior,
e.g. falling through to the next function. Use BUG() instead so the
error is defined.
Fixes the following warnings:
drivers/iommu/dma-iommu.o: warning: objtool: iommu_dma_sw_msi+0x92: can't find jump dest instruction at .text+0x54d5
vmlinux.o: warning: objtool: iommu_dma_get_msi_page() falls through to next function __iommu_dma_unmap()
Link: https://patch.msgid.link/r/0c801ae017ec078cacd39f8f0898fc7780535f85.1743053325.git.jpoimboe@kernel.org
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/314f8809-cd59-479b-97d7-49356bf1c8d1@infradead.org
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Closes: https://lore.kernel.org/5dd1f35e-8ece-43b7-ad6d-86d02d2718f6@paulmck-laptop
Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
iommufd_veventq_fops_read() decrements veventq->num_events when a vevent
is read out. However, the report path ony increments veventq->num_events
for normal events. To be balanced, make the read path decrement num_events
only for normal vevents.
Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Link: https://patch.msgid.link/r/20250324120034.5940-3-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The vevent->header.flags is not initialized per allocation, hence the
vevent read path may treat the vevent as lost_events_header wrongly.
Use kzalloc() to alloc memory for new vevent.
Fixes: e8e1ef9b77a7 ("iommufd/viommu: Add iommufd_viommu_report_event helper")
Link: https://patch.msgid.link/r/20250324120034.5940-2-yi.l.liu@intel.com
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
IOMMU_HW_INFO is extended to report max_pasid_log2, hence add coverage
for it.
Link: https://patch.msgid.link/r/20250321180143.8468-6-yi.l.liu@intel.com
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
PASID usage requires PASID support in both device and IOMMU. Since the
iommu drivers always enable the PASID capability for the device if it
is supported, this extends the IOMMU_GET_HW_INFO to report the PASID
capability to userspace. Also, enhances the selftest accordingly.
Link: https://patch.msgid.link/r/20250321180143.8468-5-yi.l.liu@intel.com
Cc: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Instead of using kaslr_offset() just record the location of "_text". This
makes it possible for user space to use either the System.map or
/proc/kallsyms as what to map all addresses to functions with.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250326220304.38dbedcd@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the previous boot trace buffer can include module text address in
the stacktrace. As same as the kernel text address, convert the module
text address using the module address information.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174282689201.356346.17647540360450727687.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Variable bmeta is not effectively used, so delete it.
kernel/trace/ring_buffer.c:1952:27: warning: variable ‘bmeta’ set but not used.
Link: https://lore.kernel.org/20250317015524.3902-1-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=19524
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
save_mod()
If the last boot data is already cleared, there is no reason to update it
again. Skip if the TRACE_ARRAY_FL_LAST_BOOT is cleared.
Also, for calling save_mod() when module loading, we don't need to check
the trace is active or not because any module address can be on the
stacktrace.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174165660328.1173316.15529357882704817499.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In allocate_trace_buffer() the following code:
buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
tr->range_addr_start,
tr->range_addr_size,
struct_size(tscratch, entries, 128));
tscratch = ring_buffer_meta_scratch(buf->buffer, &scratch_size);
setup_trace_scratch(tr, tscratch, scratch_size);
Has undefined behavior if ring_buffer_alloc_range() fails because
"scratch_size" is not initialize. If the allocation fails, then
buf->buffer will be NULL. The ring_buffer_meta_scratch() will return
NULL immediately if it is passed a NULL buffer and it will not update
scratch_size. Then setup_trace_scratch() will return immediately if
tscratch is NULL.
Although there's no real issue here, but it is considered undefined
behavior to pass an uninitialized variable to a function as input, and
UBSan may complain about it.
Just initialize scratch_size to zero to make the code defined behavior and
a little more robust.
Link: https://lore.kernel.org/all/44c5deaa-b094-4852-90f9-52f3fb10e67a@stanley.mountain/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There are some code which depends on CONFIG_MODULES. #ifdef
to enclose it.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174230515367.2909896.8132122175220657625.stgit@mhiramat.tok.corp.google.com
Fixes: dca91c1c5468 ("tracing: Have persistent trace instances save module addresses")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Make the ring buffer on reserved memory to be freeable. This allows us
to free the trace instance on the reserved memory without changing
cmdline and rebooting. Even if we can not change the kernel cmdline
for security reason, we can release the reserved memory for the ring
buffer as free (available) memory.
For example, boot kernel with reserved memory;
"reserve_mem=20M:2M:trace trace_instance=boot_mapped^traceoff@trace"
~ # free
total used free shared buff/cache available
Mem: 1995548 50544 1927568 14964 17436 1911480
Swap: 0 0 0
~ # rmdir /sys/kernel/tracing/instances/boot_mapped/
[ 23.704023] Freeing reserve_mem:trace memory: 20476K
~ # free
total used free shared buff/cache available
Mem: 2016024 41844 1956740 14968 17440 1940572
Swap: 0 0 0
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/173989134814.230693.18199312930337815629.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add reserve_mem_release_by_name() to release a reserved memory region
with a given name. This allows us to release reserved memory which is
defined by kernel cmdline, after boot.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linux-mm@kvack.org
Link: https://lore.kernel.org/173989133862.230693.14094993331347437600.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When a module is loaded and a persistent buffer is actively tracing, add
it to the list of modules in the persistent memory.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.469844721@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add the last boot module's names and addresses to the last_boot_info file.
This only shows the module information from a previous boot. If the buffer
is started and is recording the current boot, this file still will only
show "current".
~# cat instances/boot_mapped/last_boot_info
10c00000 [kernel]
ffffffffc00ca000 usb_serial_simple
ffffffffc00ae000 usbserial
ffffffffc008b000 bfq
~# echo function > instances/boot_mapped/current_tracer
~# cat instances/boot_mapped/last_boot_info
# Current
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.299186021@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
For trace instances that are mapped to persistent memory, have them use
the scratch area to save the currently loaded modules. This will allow
where the modules have been loaded on the next boot so that their
addresses can be deciphered by using where they were loaded previously.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164609.129741650@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The tracing system needs a way to save all the currently loaded modules
and their addresses into persistent memory so that it can evaluate the
addresses on a reboot from a crash. When the persistent memory trace
starts, it will load the module addresses and names into the persistent
memory. To do so, it will call the module_for_each_mod() function and pass
it a function and data structure to get called on each loaded module. Then
it can record the memory.
This only implements that function.
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: linux-modules@vger.kernel.org
Link: https://lore.kernel.org/20250305164608.962615966@goodmis.org
Acked-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There's no reason to save the KASLR offset for the ring buffer itself.
That is used by the tracer. Now that the tracer has a way to save data in
the persistent memory of the ring buffer, have the tracing infrastructure
take care of the saving of the KASLR offset.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.792722274@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Now that there's one meta data at the start of the persistent memory used by
the ring buffer, allow the caller to request some memory right after that
data that it can use as its own persistent memory.
Also fix some white space issues with ring_buffer_alloc().
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.619631731@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Instead of just having a meta data at the first page of each sub buffer
that has duplicate data, add a new meta page to the entire block of memory
that holds the duplicate data and remove it from the sub buffer meta data.
This will open up the extra memory in this first page to be used by the
tracer for its own persistent data.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.446351513@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Instead of saving off the text and data pointers and using them to compare
with the current boot's text and data pointers, just save off the KASLR
offset. Then that can be used to figure out how to read the previous boots
buffer.
The last_boot_info will now show this offset, but only if it is for a
previous boot:
~# cat instances/boot_mapped/last_boot_info
39000000 [kernel]
~# echo function > instances/boot_mapped/current_tracer
~# cat instances/boot_mapped/last_boot_info
# Current
If the KASLR offset saved is for the current boot, the last_boot_info will
show the value of "current".
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250305164608.274956504@goodmis.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The calculation of bytes-dropped and bytes_dropped_nested is reversed.
Although it does not affect the final calculation of total_dropped,
it should still be modified.
Link: https://lore.kernel.org/20250223070106.6781-1-yangfeng59949@163.com
Fixes: 6c43e554a2a5 ("ring-buffer: Add ring buffer startup selftest")
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When verify_sha256_digest() fails, __apply_microcode_amd() should propagate
the failure by returning false (and not -1 which is promoted to true).
Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250327230503.1850368-2-boris.ostrovsky@oracle.com
|
|
commit 8a7d12d674ac ("net: usb: usbnet: fix name regression") assumed
that local addresses always came from the kernel, but some devices hand
out local mac addresses so we ended up with point-to-point devices with
a mac set by the driver, renaming to eth%d when they used to be named
usb%d.
Userspace should not rely on device name, but for the sake of stability
restore the local mac address check portion of the naming exception:
point to point devices which either have no mac set by the driver or
have a local mac handed out by the driver will keep the usb%d name.
(some USB LTE modems are known to hand out a stable mac from the locally
administered range; that mac appears to be random (different for
mulitple devices) and can be reset with device-specific commands, so
while such devices would benefit from getting a OUI reserved, we have
to deal with these and might as well preserve the existing behavior
to avoid breaking fragile openwrt configurations and such on upgrade.)
Link: https://lkml.kernel.org/r/20241203130457.904325-1-asmadeus@codewreck.org
Fixes: 8a7d12d674ac ("net: usb: usbnet: fix name regression")
Cc: stable@vger.kernel.org
Tested-by: Ahmed Naseef <naseefkm@gmail.com>
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Link: https://patch.msgid.link/20250326-usbnet_rename-v2-1-57eb21fcff26@atmark-techno.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When hw-gro is enabled, the maximum number of header entries that are
needed per wqe (hd_per_wqe) is calculated based on the size of the
reservations among other parameters.
Miscalculation of the size of reservations leads to incorrect
calculation of hd_per_wqe as 0, particularly in the case of large page
size like in aarch64, this prevents the SHAMPO header from being
correctly initialized in the device, ultimately causing the following
cqe err that indicates a violation of PD.
mlx5_core 0000:00:08.0 eth2: ERR CQE on RQ: 0x1180
mlx5_core 0000:00:08.0 eth2: Error cqe on cqn 0x510, ci 0x0, qn 0x1180, opcode 0xe, syndrome 0x4, vendor syndrome 0x32
00000000: 00 00 00 00 04 4a 00 00 00 00 00 00 20 00 93 32
00000010: 55 00 00 00 fb cc 00 00 00 00 00 00 07 18 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 4a
00000030: 00 00 00 9a 93 00 32 04 00 00 00 00 00 00 da e1
Use the correct formula for calculating the size of reservations,
precisely it shouldn't be dependent on page size, instead use the
correct multiply of MLX5E_SHAMPO_WQ_BASE_RESRV_SIZE.
Fixes: e5ca8fb08ab2 ("net/mlx5e: Add control path for SHAMPO feature")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1742732906-166564-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Frag allocators, such as netdev_alloc_frag(), were not designed to
work for fragsz > PAGE_SIZE.
So, switch to page pool for jumbo frames instead of using page frag
allocators. This driver is using page pool for smaller MTUs already.
Cc: stable@vger.kernel.org
Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://patch.msgid.link/1742920357-27263-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The infrastructure to handle multi-phy devices is fairly standalone.
Add myself as maintainer for that part as well as the netlink uAPI
that exposes it.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250327110013.106865-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
So far, no differences are known, so it can fallback to the default
compatible.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
- remove unused include of linux/fb.h
- use strscpy() instead of strncpy()
* tag 'm68knommu-for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: mm: Replace deprecated strncpy() with strscpy()
m68k: Do not include <linux/fb.h>
|