Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updatesk from Greg KH:
"Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
happened this development cycle, including:
- kernfs scaling changes to make it even faster thanks to rcu
- bin_attribute constify work in many subsystems
- faux bus minor tweaks for the rust bindings
- rust binding updates for driver core, pci, and platform busses,
making more functionaliy available to rust drivers. These are all
due to people actually trying to use the bindings that were in
6.14.
- make Rafael and Danilo full co-maintainers of the driver core
codebase
- other minor fixes and updates"
* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
rust: platform: require Send for Driver trait implementers
rust: pci: require Send for Driver trait implementers
rust: platform: impl Send + Sync for platform::Device
rust: pci: impl Send + Sync for pci::Device
rust: platform: fix unrestricted &mut platform::Device
rust: pci: fix unrestricted &mut pci::Device
rust: device: implement device context marker
rust: pci: use to_result() in enable_device_mem()
MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
rust/kernel/faux: mark Registration methods inline
driver core: faux: only create the device if probe() succeeds
rust/faux: Add missing parent argument to Registration::new()
rust/faux: Drop #[repr(transparent)] from faux::Registration
rust: io: fix devres test with new io accessor functions
rust: io: rename `io::Io` accessors
kernfs: Move dput() outside of the RCU section.
efi: rci2: mark bin_attribute as __ro_after_init
rapidio: constify 'struct bin_attribute'
firmware: qemu_fw_cfg: constify 'struct bin_attribute'
powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "powerpc/crash: use generic crashkernel reservation" from
Sourabh Jain changes powerpc's kexec code to use more of the generic
layers.
- The series "get_maintainer: report subsystem status separately" from
Vlastimil Babka makes some long-requested improvements to the
get_maintainer output.
- The series "ucount: Simplify refcounting with rcuref_t" from
Sebastian Siewior cleans up and optimizing the refcounting in the
ucount code.
- The series "reboot: support runtime configuration of emergency
hw_protection action" from Ahmad Fatoum improves the ability for a
driver to perform an emergency system shutdown or reboot.
- The series "Converge on using secs_to_jiffies() part two" from Easwar
Hariharan performs further migrations from msecs_to_jiffies() to
secs_to_jiffies().
- The series "lib/interval_tree: add some test cases and cleanup" from
Wei Yang permits more userspace testing of kernel library code, adds
some more tests and performs some cleanups.
- The series "hung_task: Dump the blocking task stacktrace" from Masami
Hiramatsu arranges for the hung_task detector to dump the stack of
the blocking task and not just that of the blocked task.
- The series "resource: Split and use DEFINE_RES*() macros" from Andy
Shevchenko provides some cleanups to the resource definition macros.
- Plus the usual shower of singleton patches - please see the
individual changelogs for details.
* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
mailmap: consolidate email addresses of Alexander Sverdlin
fs/procfs: fix the comment above proc_pid_wchan()
relay: use kasprintf() instead of fixed buffer formatting
resource: replace open coded variant of DEFINE_RES()
resource: replace open coded variants of DEFINE_RES_*_NAMED()
resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
samples: add hung_task detector mutex blocking sample
hung_task: show the blocker task if the task is hung on mutex
kexec_core: accept unaccepted kexec segments' destination addresses
watchdog/perf: optimize bytes copied and remove manual NUL-termination
lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
lib/interval_tree: skip the check before go to the right subtree
lib/interval_tree: add test case for span iteration
lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
lib/rbtree: add random seed
lib/rbtree: split tests
lib/rbtree: enable userland test suite for rbtree related data structure
checkpatch: describe --min-conf-desc-length
scripts/gdb/symbols: determine KASLR offset on s390
...
|
|
Since acpi_processor_notify() can be called before registering a cpufreq
driver or even in cases when a cpufreq driver is not registered at all,
cpufreq_update_limits() needs to check if a cpufreq driver is present
and prevent it from being unregistered.
For this purpose, make it call cpufreq_cpu_get() to obtain a cpufreq
policy pointer for the given CPU and reference count the corresponding
policy object, if present.
Fixes: 5a25e3f7cc53 ("cpufreq: intel_pstate: Driver-specific handling of _PPC updates")
Closes: https://lore.kernel.org/linux-acpi/Z-ShAR59cTow0KcR@mail-itl
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/1928789.tdWV9SEqCh@rjwysocki.net
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "Enable strict percpu address space checks" from Uros
Bizjak uses x86 named address space qualifiers to provide
compile-time checking of percpu area accesses.
This has caused a small amount of fallout - two or three issues were
reported. In all cases the calling code was found to be incorrect.
- The series "Some cleanup for memcg" from Chen Ridong implements some
relatively monir cleanups for the memcontrol code.
- The series "mm: fixes for device-exclusive entries (hmm)" from David
Hildenbrand fixes a boatload of issues which David found then using
device-exclusive PTE entries when THP is enabled. More work is
needed, but this makes thins better - our own HMM selftests now
succeed.
- The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
remove the z3fold and zbud implementations. They have been deprecated
for half a year and nobody has complained.
- The series "mm: further simplify VMA merge operation" from Lorenzo
Stoakes implements numerous simplifications in this area. No runtime
effects are anticipated.
- The series "mm/madvise: remove redundant mmap_lock operations from
process_madvise()" from SeongJae Park rationalizes the locking in the
madvise() implementation. Performance gains of 20-25% were observed
in one MADV_DONTNEED microbenchmark.
- The series "Tiny cleanup and improvements about SWAP code" from
Baoquan He contains a number of touchups to issues which Baoquan
noticed when working on the swap code.
- The series "mm: kmemleak: Usability improvements" from Catalin
Marinas implements a couple of improvements to the kmemleak
user-visible output.
- The series "mm/damon/paddr: fix large folios access and schemes
handling" from Usama Arif provides a couple of fixes for DAMON's
handling of large folios.
- The series "mm/damon/core: fix wrong and/or useless damos_walk()
behaviors" from SeongJae Park fixes a few issues with the accuracy of
kdamond's walking of DAMON regions.
- The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
Stoakes changes the interaction between framebuffer deferred-io and
core MM. No functional changes are anticipated - this is preparatory
work for the future removal of page structure fields.
- The series "mm/damon: add support for hugepage_size DAMOS filter"
from Usama Arif adds a DAMOS filter which permits the filtering by
huge page sizes.
- The series "mm: permit guard regions for file-backed/shmem mappings"
from Lorenzo Stoakes extends the guard region feature from its
present "anon mappings only" state. The feature now covers shmem and
file-backed mappings.
- The series "mm: batched unmap lazyfree large folios during
reclamation" from Barry Song cleans up and speeds up the unmapping
for pte-mapped large folios.
- The series "reimplement per-vma lock as a refcount" from Suren
Baghdasaryan puts the vm_lock back into the vma. Our reasons for
pulling it out were largely bogus and that change made the code more
messy. This patchset provides small (0-10%) improvements on one
microbenchmark.
- The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
improves" from SeongJae Park does some maintenance work on the DAMON
docs.
- The series "hugetlb/CMA improvements for large systems" from Frank
van der Linden addresses a pile of issues which have been observed
when using CMA on large machines.
- The series "mm/damon: introduce DAMOS filter type for unmapped pages"
from SeongJae Park enables users of DMAON/DAMOS to filter my the
page's mapped/unmapped status.
- The series "zsmalloc/zram: there be preemption" from Sergey
Senozhatsky teaches zram to run its compression and decompression
operations preemptibly.
- The series "selftests/mm: Some cleanups from trying to run them" from
Brendan Jackman fixes a pile of unrelated issues which Brendan
encountered while runnimg our selftests.
- The series "fs/proc/task_mmu: add guard region bit to pagemap" from
Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
determine whether a particular page is a guard page.
- The series "mm, swap: remove swap slot cache" from Kairui Song
removes the swap slot cache from the allocation path - it simply
wasn't being effective.
- The series "mm: cleanups for device-exclusive entries (hmm)" from
David Hildenbrand implements a number of unrelated cleanups in this
code.
- The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
implements a number of preparatoty cleanups to the GENERIC_PTDUMP
Kconfig logic.
- The series "mm/damon: auto-tune aggregation interval" from SeongJae
Park implements a feedback-driven automatic tuning feature for
DAMON's aggregation interval tuning.
- The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
preparation for implementing lazy mmu mode for arm64 to optimize
vmalloc.
- The series "mm/page_alloc: Some clarifications for migratetype
fallback" from Brendan Jackman reworks some commentary to make the
code easier to follow.
- The series "page_counter cleanup and size reduction" from Shakeel
Butt cleans up the page_counter code and fixes a size increase which
we accidentally added late last year.
- The series "Add a command line option that enables control of how
many threads should be used to allocate huge pages" from Thomas
Prescher does that. It allows the careful operator to significantly
reduce boot time by tuning the parallalization of huge page
initialization.
- The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
from Tang Yizhou fixes the tracing output from the dirty page
balancing code.
- The series "mm/damon: make allow filters after reject filters useful
and intuitive" from SeongJae Park improves the handling of allow and
reject filters. Behaviour is made more consistent and the documention
is updated accordingly.
- The series "Switch zswap to object read/write APIs" from Yosry Ahmed
updates zswap to the new object read/write APIs and thus permits the
removal of some legacy code from zpool and zsmalloc.
- The series "Some trivial cleanups for shmem" from Baolin Wang does as
it claims.
- The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
Alistair Popple regularizes the weird ZONE_DEVICE page refcount
handling in DAX, permittig the removal of a number of special-case
checks.
- The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
preparatoty refactoring and cleanup of the mremap() code.
- The series "mm: MM owner tracking for large folios (!hugetlb) +
CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
which we determine whether a large folio is known to be mapped
exclusively into a single MM.
- The series "mm/damon: add sysfs dirs for managing DAMOS filters based
on handling layers" from SeongJae Park adds a couple of new sysfs
directories to ease the management of DAMON/DAMOS filters.
- The series "arch, mm: reduce code duplication in mem_init()" from
Mike Rapoport consolidates many per-arch implementations of
mem_init() into code generic code, where that is practical.
- The series "mm/damon/sysfs: commit parameters online via
damon_call()" from SeongJae Park continues the cleaning up of sysfs
access to DAMON internal data.
- The series "mm: page_ext: Introduce new iteration API" from Luiz
Capitulino reworks the page_ext initialization to fix a boot-time
crash which was observed with an unusual combination of compile and
cmdline options.
- The series "Buddy allocator like (or non-uniform) folio split" from
Zi Yan reworks the code to split a folio into smaller folios. The
main benefit is lessened memory consumption: fewer post-split folios
are generated.
- The series "Minimize xa_node allocation during xarry split" from Zi
Yan reduces the number of xarray xa_nodes which are generated during
an xarray split.
- The series "drivers/base/memory: Two cleanups" from Gavin Shan
performs some maintenance work on the drivers/base/memory code.
- The series "Add tracepoints for lowmem reserves, watermarks and
totalreserve_pages" from Martin Liu adds some more tracepoints to the
page allocator code.
- The series "mm/madvise: cleanup requests validations and
classifications" from SeongJae Park cleans up some warts which
SeongJae observed during his earlier madvise work.
- The series "mm/hwpoison: Fix regressions in memory failure handling"
from Shuai Xue addresses two quite serious regressions which Shuai
has observed in the memory-failure implementation.
- The series "mm: reliable huge page allocator" from Johannes Weiner
makes huge page allocations cheaper and more reliable by reducing
fragmentation.
- The series "Minor memcg cleanups & prep for memdescs" from Matthew
Wilcox is preparatory work for the future implementation of memdescs.
- The series "track memory used by balloon drivers" from Nico Pache
introduces a way to track memory used by our various balloon drivers.
- The series "mm/damon: introduce DAMOS filter type for active pages"
from Nhat Pham permits users to filter for active/inactive pages,
separately for file and anon pages.
- The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
separates the proactive reclaim statistics from the direct reclaim
statistics.
- The series "mm/vmscan: don't try to reclaim hwpoison folio" from
Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
code.
* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
x86/mm: restore early initialization of high_memory for 32-bits
mm/vmscan: don't try to reclaim hwpoison folio
mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
selftests/mm: speed up split_huge_page_test
selftests/mm: uffd-unit-tests support for hugepages > 2M
docs/mm/damon/design: document active DAMOS filter type
mm/damon: implement a new DAMOS filter type for active pages
fs/dax: don't disassociate zero page entries
MM documentation: add "Unaccepted" meminfo entry
selftests/mm: add commentary about 9pfs bugs
fork: use __vmalloc_node() for stack allocation
docs/mm: Physical Memory: Populate the "Zones" section
xen: balloon: update the NR_BALLOON_PAGES state
hv_balloon: update the NR_BALLOON_PAGES state
balloon_compaction: update the NR_BALLOON_PAGES state
meminfo: add a per node counter for balloon drivers
mm: remove references to folio in __memcg_kmem_uncharge_page()
...
|
|
If 'regs' points to a local stack variable, prepare_frametrace() stores
all registers to the stack. This confuses objtool as it expects them to
be restored from the stack later.
The stores don't affect stack tracing, so use unwind hints to hide them
from objtool.
Fixes the following warnings:
arch/loongarch/kernel/traps.o: warning: objtool: show_stack+0xe0: stack state mismatch: reg1[22]=-1+0 reg2[22]=-2-160
arch/loongarch/kernel/traps.o: warning: objtool: show_stack+0xe0: stack state mismatch: reg1[23]=-1+0 reg2[23]=-2-152
Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support")
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/270cadd8040dda74db2307f23497bb68e65db98d.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202503280703.OARM8SrY-lkp@intel.com/
|
|
Thanks to CONFIG_DEBUG_SECTION_MISMATCH, empty functions can be
generated out of line. rcu_irq_work_resched() can be called from
noinstr code, so make sure it's always inlined.
Fixes: 564506495ca9 ("rcu/context-tracking: Move deferred nocb resched to context tracking")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/e84f15f013c07e4c410d972e75620c53b62c1b3e.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/d1eca076-fdde-484a-b33e-70e0d167c36d@infradead.org
|
|
Thanks to CONFIG_DEBUG_SECTION_MISMATCH, empty functions can be
generated out of line. These can be called from noinstr code, so make
sure they're always inlined.
Fixes the following warnings:
vmlinux.o: warning: objtool: irqentry_nmi_enter+0xa2: call to ct_nmi_enter() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_nmi_exit+0x16: call to ct_nmi_exit() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_exit+0x78: call to ct_irq_exit() leaves .noinstr.text section
Fixes: 6f0e6c1598b1 ("context_tracking: Take IRQ eqs entrypoints over RCU")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/8509bce3f536bcd4ae7af3a2cf6930d48c5e631a.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/d1eca076-fdde-484a-b33e-70e0d167c36d@infradead.org
|
|
sched_smt_active() can be called from noinstr code, so it should always
be inlined. The CONFIG_SCHED_SMT version already has __always_inline.
Do the same for its !CONFIG_SCHED_SMT counterpart.
Fixes the following warning:
vmlinux.o: error: objtool: intel_idle_ibrs+0x13: call to sched_smt_active() leaves .noinstr.text section
Fixes: 321a874a7ef8 ("sched/smt: Expose sched_smt_present static key")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/1d03907b0a247cf7fb5c1d518de378864f603060.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/r/202503311434.lyw2Tveh-lkp@intel.com/
|
|
In verbose mode, when printing the disassembly of affected functions, if
CROSS_COMPILE isn't set, the objdump command string gets prefixed with
"(null)".
Somehow this worked before. Maybe some versions of glibc return an
empty string instead of NULL. Fix it regardless.
[ jpoimboe: Rewrite commit log. ]
Fixes: ca653464dd097 ("objtool: Add verbose option for disassembling affected functions")
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250215142321.14081-1-david.laight.linux@gmail.com
Link: https://lore.kernel.org/r/b931a4786bc0127aa4c94e8b35ed617dcbd3d3da.1743481539.git.jpoimboe@kernel.org
|
|
This is similar to GCC's behavior and makes it more obvious why the
build failed.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/0ea76f4b0e7a370711ed9f75fd0792bb5979c2bf.1743481539.git.jpoimboe@kernel.org
|
|
Objtool writes several object annotations which are used to enable
critical kernel runtime functionalities like static calls and
retpoline/rethunk patching.
In the rare case where it fails to read or write an object, the
annotations don't get written, causing runtime code patching to fail and
code to become corrupted.
Due to the catastrophic nature of such warnings, convert them to errors
which fail the build regardless of CONFIG_OBJTOOL_WERROR.
Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/7d35684ca61eac56eb2424f300ca43c5d257b170.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/SJ1PR11MB61295789E25C2F5197EFF2F6B9A72@SJ1PR11MB6129.namprd11.prod.outlook.com
|
|
This reverts commit 0a7fb6f07e3ad497d31ae9a2082d2cacab43d54a.
The "skipping duplicate warnings" warning is technically not an actual
warning, which can cause confusion. This feature isn't all that useful
anyway. It's exceedingly rare for a function to have more than one
unrelated warning.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/e5abe5e858acf1a9207a5dfa0f37d17ac9dca872.1743481539.git.jpoimboe@kernel.org
|
|
Append with "()" to clarify it's a function.
Before:
vmlinux.o: warning: objtool: cdns_mrvl_xspi_setup_clock: unexpected end of section .text.cdns_mrvl_xspi_setup_clock
After:
vmlinux.o: warning: objtool: cdns_mrvl_xspi_setup_clock(): unexpected end of section .text.cdns_mrvl_xspi_setup_clock
Fixes: c5995abe1547 ("objtool: Improve error handling")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/692e1e0d0b15a71bd35c6b4b87f3c75cd5a57358.1743481539.git.jpoimboe@kernel.org
|
|
When KCOV or GCOV is enabled, dead code can be left behind, in which
case objtool silences unreachable and undefined behavior (fallthrough)
warnings.
Fallthrough warnings, and their variant "end of section" warnings, were
silenced with the following commit:
6b023c784204 ("objtool: Silence more KCOV warnings")
Another variant of a fallthrough warning is a jump to the end of a
function. If that function happens to be at the end of a section, the
jump destination doesn't actually exist.
Normally that would be a fatal objtool error, but for KCOV/GCOV it's
just another undefined behavior fallthrough. Silence it like the
others.
Fixes the following warning:
drivers/iommu/dma-iommu.o: warning: objtool: iommu_dma_sw_msi+0x92: can't find jump dest instruction at .text+0x54d5
Fixes: 6b023c784204 ("objtool: Silence more KCOV warnings")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/08fbe7d7e1e20612206f1df253077b94f178d93e.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/314f8809-cd59-479b-97d7-49356bf1c8d1@infradead.org/
|
|
Similar to GCOV, KCOV can leave behind dead code and undefined behavior.
Warnings related to those should be ignored.
The previous commit:
6b023c784204 ("objtool: Silence more KCOV warnings")
... only did so for CONFIG_CGOV_KERNEL. Also do it for CONFIG_KCOV, but
for real this time.
Fixes the following warning:
vmlinux.o: warning: objtool: synaptics_report_mt_data: unexpected end of section .text.synaptics_report_mt_data
Fixes: 6b023c784204 ("objtool: Silence more KCOV warnings")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/a44ba16e194bcbc52c1cef3d3cd9051a62622723.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202503282236.UhfRsF3B-lkp@intel.com/
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust fix from Miguel Ojeda:
"Fix 'generate_rust_analyzer.py' due to typo during merge"
Mea culpa, mea maxima culpa.
* tag 'rust-fixes-6.15-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
scripts: generate_rust_analyzer: fix pin-init name in kernel deps
|
|
Pull more bcachefs updates from Kent Overstreet:
"All bugfixes and logging improvements"
* tag 'bcachefs-2025-03-31' of git://evilpiepirate.org/bcachefs: (35 commits)
bcachefs: fix bch2_write_point_to_text() units
bcachefs: Log original key being moved in data updates
bcachefs: BCH_JSET_ENTRY_log_bkey
bcachefs: Reorder error messages that include journal debug
bcachefs: Don't use designated initializers for disk_accounting_pos
bcachefs: Silence errors after emergency shutdown
bcachefs: fix units in rebalance_status
bcachefs: bch2_ioctl_subvolume_destroy() fixes
bcachefs: Clear fs_path_parent on subvolume unlink
bcachefs: Change btree_insert_node() assertion to error
bcachefs: Better printing of inconsistency errors
bcachefs: bch2_count_fsck_err()
bcachefs: Better helpers for inconsistency errors
bcachefs: Consistent indentation of multiline fsck errors
bcachefs: Add an "ignore unknown" option to bch2_parse_mount_opts()
bcachefs: bch2_time_stats_init_no_pcpu()
bcachefs: Fix bch2_fs_get_tree() error path
bcachefs: fix logging in journal_entry_err_msg()
bcachefs: add missing newline in bch2_trans_updates_to_text()
bcachefs: print_string_as_lines: fix extra newline
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, udf, and isofs updates from Jan Kara:
- conversion of ext2 to the new mount API
- small folio conversion work for ext2
- a fix of an unexpected return value in udf in inode_getblk()
- a fix of handling of corrupted directory in isofs
* tag 'fs_for_v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix inode_getblk() return value
ext2: Make ext2_params_spec static
ext2: create ext2_msg_fc for use during parsing
ext2: convert to the new mount API
ext2: Remove reference to bh->b_page
isofs: fix KMSAN uninit-value bug in do_isofs_readdir()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Fix random stack corruption and incorrect error returns in
exfat_get_block()
- Optimize exfat_get_block() by improving checking corner cases
- Fix an endless loop by self-linked chain in exfat_find_last_cluster
- Remove dead EXFAT_CLUSTERS_UNTRACKED codes
- Add missing shutdown check
- Improve the delete performance with discard mount option
* tag 'exfat-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: call bh_read in get_block only when necessary
exfat: fix potential wrong error return from get_block
exfat: fix missing shutdown check
exfat: fix the infinite loop in exfat_find_last_cluster()
exfat: fix random stack corruption after get_block
exfat: remove count used cluster from exfat_statfs()
exfat: support batch discard of clusters when freeing clusters
|
|
Pull smb server updates from Steve French:
- Two fixes for bounds checks of open contexts
- Two multichannel fixes, including one for important UAF
- Oplock/lease break fix for potential ksmbd connection refcount leak
- Security fix to free crypto data more securely
- Fix to enable allowing Kerberos authentication by default
- Two RDMA/smbdirect fixes
- Minor cleanup
* tag 'v6.15rc-part1-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix r_count dec/increment mismatch
ksmbd: fix multichannel connection failure
ksmbd: fix use-after-free in ksmbd_sessions_deregister()
ksmbd: use ib_device_get_netdev() instead of calling ops.get_netdev
ksmbd: use aead_request_free to match aead_request_alloc
Revert "ksmbd: fix missing RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev()"
ksmbd: add bounds check for create lease context
ksmbd: add bounds check for durable handle context
ksmbd: make SMB_SERVER_KERBEROS5 enable by default
ksmbd: Use str_read_write() and str_true_false() helpers
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- Fix for network namespace refcount leak
- Multichannel fix and minor multichannel debug message cleanup
- Fix potential null ptr reference in SMB3 close
- Fix for special file handling when reparse points not supported by
server
- Two ACL fixes one for stricter ACE validation, one for incorrect
perms requested
- Three RFC1001 fixes: one for SMB3 mounts on port 139, one for better
default hostname, and one for better session response processing
- Minor update to email address for MAINTAINERS file
- Allow disabling Unicode for access to old SMB1 servers
- Three minor cleanups
* tag '6.15-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add new mount option -o nounicode to disable SMB1 UNICODE mode
cifs: Set default Netbios RFC1001 server name to hostname in UNC
smb: client: Fix netns refcount imbalance causing leaks and use-after-free
cifs: add validation check for the fields in smb_aces
CIFS: Propagate min offload along with other parameters from primary to secondary channels.
cifs: Improve establishing SMB connection with NetBIOS session
cifs: Fix establishing NetBIOS session for SMB2+ connection
cifs: Fix getting DACL-only xattr system.cifs_acl and system.smb3_acl
cifs: Check if server supports reparse points before using them
MAINTAINERS: reorder preferred email for Steve French
cifs: avoid NULL pointer dereference in dbg call
smb: client: Remove redundant check in smb2_is_path_accessible()
smb: client: Remove redundant check in cifs_oplock_break()
smb: mark the new channel addition log as informational log with cifs_info
smb: minor cleanup to remove unused function declaration
|
|
Pull nfsd updates from Chuck Lever:
"Neil Brown contributed more scalability improvements to NFSD's open
file cache, and Jeff Layton contributed a menagerie of repairs to
NFSD's NFSv4 callback / backchannel implementation.
Mike Snitzer contributed a change to NFS re-export support that
disables support for file locking on a re-exported NFSv4 mount. This
is because NFSv4 state recovery is currently difficult if not
impossible for re-exported NFS mounts. The change aims to prevent data
integrity exposures after the re-export server crashes.
Work continues on the evolving NFSD netlink administrative API.
Many thanks to the contributors, reviewers, testers, and bug reporters
who participated during the v6.15 development cycle"
* tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
NFSD: Add a Kconfig setting to enable delegated timestamps
sysctl: Fixes nsm_local_state bounds
nfsd: use a long for the count in nfsd4_state_shrinker_count()
nfsd: remove obsolete comment from nfs4_alloc_stid
nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
nfsd: reorganize struct nfs4_delegation for better packing
nfsd: handle errors from rpc_call_async()
nfsd: move cb_need_restart flag into cb_flags
nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
nfsd: prevent callback tasks running concurrently
nfsd: disallow file locking and delegations for NFSv4 reexport
nfsd: filecache: drop the list_lru lock during lock gc scans
nfsd: filecache: don't repeatedly add/remove files on the lru list
nfsd: filecache: introduce NFSD_FILE_RECENT
nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
NFSD: Re-organize nfsd_file_gc_worker()
nfsd: filecache: remove race handling.
fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning
...
|
|
This reverts commit 0de2a5c4b824da2205658ebebb99a55c43cdf60f.
I forgot that a TCP socket could receive messages in its error queue.
sock_queue_err_skb() can be called without socket lock being held,
and changes sk->sk_rmem_alloc.
The fact that skbs in error queue are limited by sk->sk_rcvbuf
means that error messages can be dropped if socket receive
queues are full, which is an orthogonal issue.
In future kernels, we could use a separate sk->sk_error_mem_alloc
counter specifically for the error queue.
Fixes: 0de2a5c4b824 ("tcp: avoid atomic operations on sk->sk_rmem_alloc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250331075946.31960-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Taehee reports missing rtnl from bnxt_shutdown path:
inetdev_event (./include/linux/inetdevice.h:256 net/ipv4/devinet.c:1585)
notifier_call_chain (kernel/notifier.c:85)
__dev_close_many (net/core/dev.c:1732 (discriminator 3))
kernel/locking/mutex.c:713 kernel/locking/mutex.c:732)
dev_close_many (net/core/dev.c:1786)
netif_close (./include/linux/list.h:124 ./include/linux/list.h:215
bnxt_shutdown (drivers/net/ethernet/broadcom/bnxt/bnxt.c:16707) bnxt_en
pci_device_shutdown (drivers/pci/pci-driver.c:511)
device_shutdown (drivers/base/core.c:4820)
kernel_restart (kernel/reboot.c:271 kernel/reboot.c:285)
Bring back the rtnl lock.
Link: https://lore.kernel.org/netdev/CAMArcTV4P8PFsc6O2tSgzRno050DzafgqkLA2b7t=Fv_SY=brw@mail.gmail.com/
Fixes: 004b5008016a ("eth: bnxt: remove most dependencies on RTNL")
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250328174216.3513079-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All the misc entry points end up calling into either gve_open()
or gve_close(), they take rtnl_lock today but since the recent
instance locking changes should also take the instance lock.
Found by code inspection and untested.
Fixes: cae03e5bdd9e ("net: hold netdev instance lock during queue operations")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20250328164742.1268069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: misc. fixes for 6.15-rc0
Here are 4 unrelated patches:
- Patch 1: fix a NULL pointer when two SYN-ACK for the same request are
handled in parallel. A fix for up to v5.9.
- Patch 2: selftests: fix check for the wrong FD. A fix for up to v5.17.
- Patch 3: selftests: close all FDs in case of error. A fix for up to
v5.17.
- Patch 4: selftests: ignore a new generated file. A fix for 6.15-rc0.
====================
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-0-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A new binary is now generated by the MPTCP selftests: mptcp_diag.
Like the other binaries from this directory, there is no need to track
this in Git, it should then be ignored.
Fixes: 00f5e338cf7e ("selftests: mptcp: Add a tool to get specific msk_info")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-4-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The file descriptor 'fd_in' is opened when cfg_input is configured, but
not closed in main_loop(), this patch fixes it.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Co-developed-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-3-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a bug where the code was checking the wrong file descriptors
when opening the input files. The code was checking 'fd' instead
of 'fd_in', which could lead to incorrect error handling.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Fixes: ca7ae8916043 ("selftests: mptcp: mptfo Initiator/Listener")
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-2-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When testing valkey benchmark tool with MPTCP, the kernel panics in
'mptcp_can_accept_new_subflow' because subflow_req->msk is NULL.
Call trace:
mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P)
subflow_syn_recv_sock (./net/mptcp/subflow.c:854)
tcp_check_req (./net/ipv4/tcp_minisocks.c:863)
tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268)
ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207)
ip_local_deliver_finish (./net/ipv4/ip_input.c:234)
ip_local_deliver (./net/ipv4/ip_input.c:254)
ip_rcv_finish (./net/ipv4/ip_input.c:449)
...
According to the debug log, the same req received two SYN-ACK in a very
short time, very likely because the client retransmits the syn ack due
to multiple reasons.
Even if the packets are transmitted with a relevant time interval, they
can be processed by the server on different CPUs concurrently). The
'subflow_req->msk' ownership is transferred to the subflow the first,
and there will be a risk of a null pointer dereference here.
This patch fixes this issue by moving the 'subflow_req->msk' under the
`own_req == true` conditional.
Note that the !msk check in subflow_hmac_valid() can be dropped, because
the same check already exists under the own_req mpj branch where the
code has been moved to.
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-1-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Due to the incorrect initial vector number in
rvu_nix_unregister_interrupts(), NIX_AF_INT_VEC_GEN is not
geeting free. Fix the vector number to include NIX_AF_INT_VEC_GEN
irq.
Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250327094054.2312-1-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When number of RVU VFs > 64, the vfs value passed to "rvu_queue_work"
function is incorrect. Due to which mbox workqueue entries for
VFs 0 to 63 never gets added to workqueue.
Fixes: 9bdc47a6e328 ("octeontx2-af: Mbox communication support btw AF and it's VFs")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250327091441.1284-1-gakula@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the netdev_nl_sock_priv_destroy(), an instance lock is acquired
before calling net_devmem_unbind_dmabuf(), then releasing an instance
lock(netdev_unlock(binding->dev)).
However, a binding is freed in the net_devmem_unbind_dmabuf().
So using a binding after net_devmem_unbind_dmabuf() occurs UAF.
To fix this UAF, it needs to use temporary variable.
Fixes: ba6f418fbf64 ("net: bubble up taking netdev instance lock to callers of net_devmem_unbind_dmabuf()")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250328062237.3746875-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
selftests: drv-net: replace the rpath helper with Path objects
Trying to change the env.rpath() helper during the development
cycle was causing a lot of conflicts between net and net-next.
Let's get it converted now that the trees are converged.
v2: https://lore.kernel.org/20250306171158.1836674-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250327222315.1098596-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that net and net-next have converged we can use the Path
helpers in the ping test without conflicts.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 29b036be1b0b ("selftests: drv-net: test XDP, HDS auto and
the ioctl path") added an sample XDP_PASS prog in net/lib, so
that we can reuse it in various sub-directories. Delete the old
sample and use the one from the lib in existing tests.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The single letter + "path" helpers do not have many fans (see Link).
Use a Path object with a better name. test_dir is the replacement
for rpath(), net_lib_dir is a new path of the $ksft/net/lib directory.
The Path() class overloads the "/" operator and can be cast to string
automatically, so to get a path to a file tests can do:
path = env.test_dir / "binary"
Link: https://lore.kernel.org/CA+FuTSemTNVZ5MxXkq8T9P=DYm=nSXcJnL7CJBPZNAT_9UFisQ@mail.gmail.com
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
drivers/net/wan/lapbether.c uses stacked devices.
Like similar drivers, it must use netdev_lockdep_set_classes()
to avoid LOCKDEP splats.
This is similar to commit 9bfc9d65a1dc ("hamradio:
use netdev_lockdep_set_classes() helper")
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Reported-by: syzbot+377b71db585c9c705f8e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/67cd611c.050a0220.14db68.0073.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250327144439.2463509-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It turns out the code to generate the x86 cpufeaturemasks.h header was
way too aggressive, and would re-generate it whenever the timestamp on
the kernel config file changed.
Now, the regular 'make *config' tools are fairly careful to not rewrite
the kernel config file unless the contents change, but other usecases
aren't that careful.
Michael Kelley reports that 'make-kpkg' ends up doing "make syncconfig"
multiple times in prepping to build, and will modify the config file in
the process (and then modify it back, but by then the timestamps have
changed).
Jakub Kicinski reports that the netdev CI does something similar in how
it generates the config file in multiple steps.
In both cases, the config file timestamp updates then cause the
cpufeaturemasks.h file to be regenerated, and that in turn then causes
lots of unnecessary rebuilds due to all the normal dependencies.
Fix it by using our 'filechk' infrastructure in the Makefile to generate
the header file. That will only write a new version of the file if the
contents of the file have actually changed.
Fixes: 841326332bcb ("x86/cpufeatures: Generate the <asm/cpufeaturemasks.h> header based on build config")
Reported-by: Michael Kelley <mhklinux@outlook.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/all/SN6PR02MB415756D1829740F6E8AC11D1D4D82@SN6PR02MB4157.namprd02.prod.outlook.com/
Link: https://lore.kernel.org/all/20250328162311.08134fa6@kernel.org/
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
- Restructure the persistent memory to have a "scratch" area
Instead of hard coding the KASLR offset in the persistent memory by
the ring buffer, push that work up to the callers of the persistent
memory as they are the ones that need this information. The offsets
and such is not important to the ring buffer logic and it should not
be part of that.
A scratch pad is now created when the caller allocates a ring buffer
from persistent memory by stating how much memory it needs to save.
- Allow where modules are loaded to be saved in the new scratch pad
Save the addresses of modules when they are loaded into the
persistent memory scratch pad.
- A new module_for_each_mod() helper function was created
With the acknowledgement of the module maintainers a new module
helper function was created to iterate over all the currently loaded
modules. This has a callback to be called for each module. This is
needed for when tracing is started in the persistent buffer and the
currently loaded modules need to be saved in the scratch area.
- Expose the last boot information where the kernel and modules were
loaded
The last_boot_info file is updated to print out the addresses of
where the kernel "_text" location was loaded from a previous boot, as
well as where the modules are loaded. If the buffer is recording the
current boot, it only prints "# Current" so that it does not expose
the KASLR offset of the currently running kernel.
- Allow the persistent ring buffer to be released (freed)
To have this in production environments, where the kernel command
line can not be changed easily, the ring buffer needs to be freed
when it is not going to be used. The memory for the buffer will
always be allocated at boot up, but if the system isn't going to
enable tracing, the memory needs to be freed. Allow it to be freed
and added back to the kernel memory pool.
- Allow stack traces to print the function names in the persistent
buffer
Now that the modules are saved in the persistent ring buffer, if the
same modules are loaded, the printing of the function names will
examine the saved modules. If the module is found in the scratch area
and is also loaded, then it will do the offset shift and use kallsyms
to display the function name. If the address is not found, it simply
displays the address from the previous boot in hex.
* tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Use _text and the kernel offset in last_boot_info
tracing: Show last module text symbols in the stacktrace
ring-buffer: Remove the unused variable bmeta
tracing: Skip update_last_data() if cleared and remove active check for save_mod()
tracing: Initialize scratch_size to zero to prevent UB
tracing: Fix a compilation error without CONFIG_MODULES
tracing: Freeable reserved ring buffer
mm/memblock: Add reserved memory release function
tracing: Update modules to persistent instances when loaded
tracing: Show module names and addresses of last boot
tracing: Have persistent trace instances save module addresses
module: Add module_for_each_mod() function
tracing: Have persistent trace instances save KASLR offset
ring-buffer: Add ring_buffer_meta_scratch()
ring-buffer: Add buffer meta data for persistent ring buffer
ring-buffer: Use kaslr address instead of text delta
ring-buffer: Fix bytes_dropped calculation issue
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing documentation fix from Steven Rostedt:
"Documentation fix for runtime verifier
The runtime verifier documents that were created were not referenced
in the indices, which caused warning when building the documentation
tree. Those documents are now added to the rv indices"
* tag 'trace-latency-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation/rv: Add sched pages to the indices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf record:
- Introduce latency profiling using scheduler information.
The latency profiling is to show impacts on wall-time rather than
cpu-time. By tracking context switches, it can weight samples and
find which part of the code contributed more to the execution
latency.
The value (period) of the sample is weighted by dividing it by the
number of parallel execution at the moment. The parallelism is
tracked in perf report with sched-switch records. This will reduce
the portion that are run in parallel and in turn increase the
portion of serial executions.
For now, it's limited to profile processes, IOW system-wide
profiling is not supported. You can add --latency option to enable
this.
$ perf record --latency -- make -C tools/perf
I've run the above command for perf build which adds -j option to
make with the number of CPUs in the system internally. Normally
it'd show something like below:
$ perf report -F overhead,comm
...
#
# Overhead Command
# ........ ...............
#
78.97% cc1
6.54% python3
4.21% shellcheck
3.28% ld
1.80% as
1.37% cc1plus
0.80% sh
0.62% clang
0.56% gcc
0.44% perl
0.39% make
...
The cc1 takes around 80% of the overhead as it's the actual
compiler. However it runs in parallel so its contribution to
latency may be less than that. Now, perf report will show both
overhead and latency (if --latency was given at record time) like
below:
$ perf report -s comm
...
#
# Overhead Latency Command
# ........ ........ ...............
#
78.97% 48.66% cc1
6.54% 25.68% python3
4.21% 0.39% shellcheck
3.28% 13.70% ld
1.80% 2.56% as
1.37% 3.08% cc1plus
0.80% 0.98% sh
0.62% 0.61% clang
0.56% 0.33% gcc
0.44% 1.71% perl
0.39% 0.83% make
...
You can see latency of cc1 goes down to around 50% and python3 and
ld contribute a lot more than their overhead. You can use --latency
option in perf report to get the same result but ordered by
latency.
$ perf report --latency -s comm
perf report:
- As a side effect of the latency profiling work, it adds a new
output field 'latency' and a sort key 'parallelism'. The below is a
result from my system with 64 CPUs. The build was well-parallelized
but contained some serial portions.
$ perf report -s parallelism
...
#
# Overhead Latency Parallelism
# ........ ........ ...........
#
16.95% 1.54% 62
13.38% 1.24% 61
12.50% 70.47% 1
11.81% 1.06% 63
7.59% 0.71% 60
4.33% 12.20% 2
3.41% 0.33% 59
2.05% 0.18% 64
1.75% 1.09% 9
1.64% 1.85% 5
...
- Support Feodra mini-debuginfo which is a LZMA compressed symbol
table inside ".gnu_debugdata" ELF section.
perf annotate:
- Add --code-with-type option to enable data-type profiling with the
usual annotate output.
Instead of focusing on data structure, it shows code annotation
together with data type it accesses in case the instruction refers
to a memory location (and it was able to resolve the target data
type). Currently it only works with --stdio.
$ perf annotate --stdio --code-with-type
...
Percent | Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period)
----------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff81050610 <__fdget>:
0.00 : ffffffff81050610: callq 0xffffffff81c01b80 <__fentry__> # data-type: (stack operation)
0.00 : ffffffff81050615: pushq %rbp # data-type: (stack operation)
0.00 : ffffffff81050616: movq %rsp, %rbp
0.00 : ffffffff81050619: pushq %r15 # data-type: (stack operation)
0.00 : ffffffff8105061b: pushq %r14 # data-type: (stack operation)
0.00 : ffffffff8105061d: pushq %rbx # data-type: (stack operation)
0.00 : ffffffff8105061e: subq $0x10, %rsp
0.00 : ffffffff81050622: movl %edi, %ebx
0.00 : ffffffff81050624: movq %gs:0x7efc4814(%rip), %rax # 0x14e40 <current_task> # data-type: struct task_struct* +0
0.00 : ffffffff8105062c: movq 0x8d0(%rax), %r14 # data-type: struct task_struct +0x8d0 (files)
0.00 : ffffffff81050633: movl (%r14), %eax # data-type: struct files_struct +0 (count.counter)
0.00 : ffffffff81050636: cmpl $0x1, %eax
0.00 : ffffffff81050639: je 0xffffffff810506a9 <__fdget+0x99>
0.00 : ffffffff8105063b: movq 0x20(%r14), %rcx # data-type: struct files_struct +0x20 (fdt)
0.00 : ffffffff8105063f: movl (%rcx), %eax # data-type: struct fdtable +0 (max_fds)
0.00 : ffffffff81050641: cmpl %ebx, %eax
0.00 : ffffffff81050643: jbe 0xffffffff810506ef <__fdget+0xdf>
0.00 : ffffffff81050649: movl %ebx, %r15d
5.56 : ffffffff8105064c: movq 0x8(%rcx), %rdx # data-type: struct fdtable +0x8 (fd)
...
The "# data-type:" part was added with this change. The first few
entries are not very interesting. But later you can it accesses a
couple of fields in the task_struct, files_struct and fdtable.
perf trace:
- Support syscall tracing for different ABI. For example it can trace
system calls for 32-bit applications on 64-bit kernel
transparently.
- Add --summary-mode=total option to show global syscall summary. The
default is 'thread' to show per-thread syscall summary.
Python support:
- Add more interfaces to 'perf' module to parse events, and config,
enable or disable the event list properly so that it can implement
basic functionalities purely in Python. There is an example code
for these new interfaces in python/tracepoint.py.
- Add mypy and pylint support to enable build time checking. Fix some
code based on the findings from these tools.
Internals:
- Introduce io_dir__readdir() API to make directory traveral (usually
for proc or sysfs) efficient with less memory footprint.
JSON vendor events:
- Add events and metrics for ARM Neoverse N3 and V3
- Update events and metrics on various Intel CPUs
- Add/update events for a number of SiFive processors"
* tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits)
perf bpf-filter: Fix a parsing error with comma
perf report: Fix a memory leak for perf_env on AMD
perf trace: Fix wrong size to bpf_map__update_elem call
perf tools: annotate asm_pure_loop.S
perf python: Fix setup.py mypy errors
perf test: Address attr.py mypy error
perf build: Add pylint build tests
perf build: Add mypy build tests
perf build: Rename TEST_LOGS to SHELL_TEST_LOGS
tools/build: Don't pass test log files to linker
perf bench sched pipe: fix enforced blocking reads in worker_thread
perf tools: Fix is_compat_mode build break in ppc64
perf build: filter all combinations of -flto for libperl
perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation
perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata
perf trace: Fix evlist memory leak
perf trace: Fix BTF memory leak
perf trace: Make syscall table stable
perf syscalltbl: Mask off ABI type for MIPS system calls
perf build: Remove Makefile.syscalls
...
|
|
The _DDC method should return a buffer, or an integer in case of an error.
But some Lenovo laptops incorrectly return EDID as buffer in ACPI package.
Calling _DDC generates this ACPI Warning:
ACPI Warning: \_SB.PCI0.GP17.VGA.LCD._DDC: Return type mismatch - \
found Package, expected Integer/Buffer (20240827/nspredef-254)
Use the first element of the package to get the EDID buffer.
The DSDT:
Name (AUOP, Package (0x01)
{
Buffer (0x80)
{
...
}
})
...
Method (_DDC, 1, NotSerialized) // _DDC: Display Data Current
{
If ((PAID == AUID))
{
Return (AUOP) /* \_SB_.PCI0.GP17.VGA_.LCD_.AUOP */
}
ElseIf ((PAID == IVID))
{
Return (IVOP) /* \_SB_.PCI0.GP17.VGA_.LCD_.IVOP */
}
ElseIf ((PAID == BOID))
{
Return (BOEP) /* \_SB_.PCI0.GP17.VGA_.LCD_.BOEP */
}
ElseIf ((PAID == SAID))
{
Return (SUNG) /* \_SB_.PCI0.GP17.VGA_.LCD_.SUNG */
}
Return (Zero)
}
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device
Cc: stable@vger.kernel.org
Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4085
Signed-off-by: Gergo Koteles <soyer@irl.hu>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/61c3df83ab73aba0bc7a941a443cd7faf4cf7fb0.1743195250.git.soyer@irl.hu
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When the ring is allocated, it is kzalloc-ed. ring->queue_refs will
already be initialized to 0 by default. It does not need to be
atomically set to 0.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There is a race condition leading to a kernel crash from a null
dereference when attemping to access fc->lock in
fuse_uring_create_queue(). fc may be NULL in the case where another
thread is creating the uring in fuse_uring_create() and has set
fc->ring but has not yet set ring->fc when fuse_uring_create_queue()
reads ring->fc. There is another race condition as well where in
fuse_uring_register(), ring->nr_queues may still be 0 and not yet set
to the new value when we compare qid against it.
This fix sets fc->ring only after ring->fc and ring->nr_queues have been
set, which guarantees now that ring->fc is a proper pointer when any
queues are created and ring->nr_queues reflects the right number of
queues if ring is not NULL. We must use smp_store_release() and
smp_load_acquire() semantics to ensure the ordering will remain correct
where fc->ring is assigned only after ring->fc and ring->nr_queues have
been assigned.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: 24fe962c86f5 ("fuse: {io-uring} Handle SQEs - register commands")
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Our file system has a translation capability for S3-to-posix.
The current value of 1kiB is enough to cover S3 keys, but
does not allow encoding of %xx escape characters.
The limit is increased to (PATH_MAX - 1), as we need
3 x 1024 and that is close to PATH_MAX (4kB) already.
-1 is used as the terminating null is not included in the
length calculation.
Testing large file names was hard with libfuse/example file systems,
so I created a new memfs that does not have a 255 file name length
limitation.
https://github.com/libfuse/libfuse/pull/1077
The connection is initialized with FUSE_NAME_LOW_MAX, which
is set to the previous value of FUSE_NAME_MAX of 1024. With
FUSE_MIN_READ_BUFFER of 8192 that is enough for two file names
+ fuse headers.
When FUSE_INIT reply sets max_pages to a value > 1 we know
that fuse daemon supports request buffers of at least 2 pages
(+ header) and can therefore hold 2 x PATH_MAX file names - operations
like rename or link that need two file names are no issue then.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_notify_inval_entry and fuse_notify_delete were using fixed allocations
of FUSE_NAME_MAX to hold the file name. Often that large buffers are not
needed as file names might be smaller, so this uses the actual file name
size to do the allocation.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Introduce two new sysctls, "default_request_timeout" and
"max_request_timeout". These control how long (in seconds) a server can
take to reply to a request. If the server does not reply by the timeout,
then the connection will be aborted. The upper bound on these sysctl
values is 65535.
"default_request_timeout" sets the default timeout if no timeout is
specified by the fuse server on mount. 0 (default) indicates no default
timeout should be enforced. If the server did specify a timeout, then
default_request_timeout will be ignored.
"max_request_timeout" sets the max amount of time the server may take to
reply to a request. 0 (default) indicates no maximum timeout. If
max_request_timeout is set and the fuse server attempts to set a
timeout greater than max_request_timeout, the system will use
max_request_timeout as the timeout. Similarly, if default_request_timeout
is greater than max_request_timeout, the system will use
max_request_timeout as the timeout. If the server does not request a
timeout and default_request_timeout is set to 0 but max_request_timeout
is set, then the timeout will be max_request_timeout.
Please note that these timeouts are not 100% precise. The request may
take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the set max
timeout due to how it's internally implemented.
$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 0
$ echo 65536 | sudo tee /proc/sys/fs/fuse/default_request_timeout
tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument
$ echo 65535 | sudo tee /proc/sys/fs/fuse/default_request_timeout
65535
$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 65535
$ echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout
0
$ sysctl -a | grep fuse.default_request_timeout
fs.fuse.default_request_timeout = 0
[Luis Henriques: Limit the timeout to the range [FUSE_TIMEOUT_TIMER_FREQ,
fuse_max_req_timeout]]
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Luis Henriques <luis@igalia.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There are situations where fuse servers can become unresponsive or
stuck, for example if the server is deadlocked. Currently, there's no
good way to detect if a server is stuck and needs to be killed manually.
This commit adds an option for enforcing a timeout (in seconds) for
requests where if the timeout elapses without the server responding to
the request, the connection will be automatically aborted.
Please note that these timeouts are not 100% precise. For example, the
request may take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond
the requested timeout due to internal implementation, in order to
mitigate overhead.
[SzM: Bump the API version number]
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If filesystem doesn't support FUSE_LINK (i.e. returns -ENOSYS), then
remember this and next time return immediately, without incurring the
overhead of a round trip to the server.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|