|
The conversion to folios switched __free_page() to __folio_put() in the
error path in btrfs_do_encoded_write().
However, this gets the page refcounting wrong. If we do hit that error
path (I reproduced by modifying btrfs_do_encoded_write to pretend to
always fail in a way that jumps to out_folios and running the fstests
case btrfs/281), then we always hit the following BUG freeing the folio:
BUG: Bad page state in process btrfs pfn:40ab0b
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x61be5 pfn:0x40ab0b
flags: 0x5ffff0000000000(node=0|zone=2|lastcpupid=0x1ffff)
raw: 05ffff0000000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000061be5 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: nonzero _refcount
Call Trace:
<TASK>
dump_stack_lvl+0x3d/0xe0
bad_page+0xea/0xf0
free_unref_page+0x8e1/0x900
? __mem_cgroup_uncharge+0x69/0x90
__folio_put+0xe6/0x190
btrfs_do_encoded_write+0x445/0x780
? current_time+0x25/0xd0
btrfs_do_write_iter+0x2cc/0x4b0
btrfs_ioctl_encoded_write+0x2b6/0x340
It turns out that __free_page() decreases the page reference count while
__folio_put() does not. Switch __folio_put() to folio_put(), which
decreases the folio reference count first.
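For illustration, a minimal sketch of the corrected cleanup (the helper name
and surrounding context are assumed, not the exact btrfs code):
  #include <linux/mm.h>
  #include <linux/slab.h>

  /* hypothetical helper mirroring the out_folios error path */
  static void free_encoded_folios(struct folio **folios, long nr_folios)
  {
          long i;

          for (i = 0; i < nr_folios; i++) {
                  if (folios[i])
                          folio_put(folios[i]);   /* drop the ref; __folio_put() skipped that */
          }
          kvfree(folios);
  }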
Fixes: 400b172b8cdc ("btrfs: compression: migrate compression/decompression paths to folios")
Tested-by: Ed Tomlinson <edtoml@gmail.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Explicitly reset the 'subdir' variable when descending to
tools/perf/Documentation. Similar to commit f89fb55714b62 ("perf build:
Don't propagate subdir to submakes for install_headers", 2023-01-02),
calling the 'tools/perf_install' target via the top-level Makefile results
in repeated subdir components when attempting to call the perf
documentation installation rules:
$ make tools/perf_install NO_LIBTRACEEVENT=1 JOBS=1
[...]
/bin/sh: 1: cd: can't cd to /data/linux/kbuild/tools/perf/tools/perf/
../../scripts/Makefile.include:17: *** output directory "/data/linux/kbuild/tools/perf/tools/perf/" does not exist. Stop.
make[5]: *** [Makefile.perf:1096: try-install-man] Error 2
make[4]: *** [Makefile.perf:264: sub-make] Error 2
make[3]: *** [Makefile:113: install] Error 2
make[2]: *** [Makefile:131: perf_install] Error 2
Resetting 'subdir' fixes the call from top-level Makefile.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Nicolas Schier <n.schier@avm.de>
Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Tested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20240523-make-tools-perf-install-v1-1-3903499e637f@avm.de
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add JSON metrics for i.MX95 DDR Performance Monitor.
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Cc: festevam@gmail.com
Cc: conor+dt@kernel.org
Cc: robh+dt@kernel.org
Cc: shawnguo@kernel.org
Cc: will@kernel.org
Cc: krzysztof.kozlowski+dt@linaro.org
Cc: mike.leach@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: imx@lists.linux.dev
Cc: kernel@pengutronix.de
Cc: s.hauer@pengutronix.de
Cc: devicetree@vger.kernel.org
Link: https://lore.kernel.org/r/20240529080358.703784-8-xu.yang_2@nxp.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add JSON metrics for i.MX93 DDR Performance Monitor.
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Cc: festevam@gmail.com
Cc: conor+dt@kernel.org
Cc: robh+dt@kernel.org
Cc: shawnguo@kernel.org
Cc: will@kernel.org
Cc: krzysztof.kozlowski+dt@linaro.org
Cc: mike.leach@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: imx@lists.linux.dev
Cc: john.g.garry@oracle.com
Cc: kernel@pengutronix.de
Cc: s.hauer@pengutronix.de
Cc: devicetree@vger.kernel.org
Link: https://lore.kernel.org/r/20240529080358.703784-7-xu.yang_2@nxp.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Based on multiple conversations, most recently on the ksummit mailing
list [1], add some best practices for using the Link trailer, such as:
- how to use markdown-like bracketed numbers in the commit message to
indicate the corresponding link
- when to use lore.kernel.org vs patch.msgid.link domains
Cc: ksummit@lists.linux.dev
Link: https://lore.kernel.org/20240617-arboreal-industrious-hedgehog-5b84ae@meerkat # [1]
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240619-docs-patch-msgid-link-v2-2-72dd272bfe37@linuxfoundation.org
|
|
There have been some changes to the way mailing lists are hosted at
kernel.org. This patch does the following:
1. fixes links that are pointing at the outdated resources
2. removes an outdated patchbomb admonition
We still don't particularly want or welcome huge patchbombs, but they
are less likely to overload our systems.
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Reviewed-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240619-docs-patch-msgid-link-v2-1-72dd272bfe37@linuxfoundation.org
|
|
When it was in text format, it correctly hardcoded steps 8a to 8c.
However, after it was converted to RST, the sequence numbers were
auto-generated during rendering and became incorrect after some
steps were inserted.
Change it to refer to steps a to c in a relative way.
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
[jc: Indented the line to make the relative reference more clear]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614010028.48262-1-lizhijian@fujitsu.com
|
|
Finish the translation of researcher-guidelines and add it to the
index file.
Update to commit 27103dddc2da ("Documentation: update mailing list
addresses")
Reviewed-by: Alex Shi <alexs@kernel.org>
Signed-off-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240614032211.241899-1-dzm91@hust.edu.cn
|
|
Align the document header with the filename and the rest of the content.
Signed-off-by: Jiri Kastner <cz172638@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240626203906.191841-1-cz172638@gmail.com
|
|
Translate Documentation/process/maintainer-kvm-x86.rst into Spanish.
Co-developed-by: Juan Embid <jembid@ucm.es>
Signed-off-by: Juan Embid <jembid@ucm.es>
Signed-off-by: Carlos Bilbao <carlos.bilbao.osdev@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
[jc: fixed apply- and build-time warnings]
Link: https://lore.kernel.org/r/20240626221942.2780668-1-carlos.bilbao.osdev@gmail.com
|
|
syzbot reports:
KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
The problem is that the notifier does a conditional flush, but it's possible
that the table-to-be-removed is still referenced by transactions being
processed by the worker, so we need to flush unconditionally.
We could make the flush_work depend on whether we found a table to delete
in nf-next to avoid the flush for most cases.
AFAICS this problem is only exposed in nf-next, with
commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects");
with this commit applied there is an unconditional fetch of
table->family, which is what's triggering the above splat.
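A minimal sketch of the idea, with the conditional flush replaced by an
unconditional one (the work item name is assumed from the splat above, not
copied from the exact nf_tables code):
  #include <linux/workqueue.h>

  /* declared elsewhere in nf_tables_api.c via DECLARE_WORK(); name assumed */
  extern struct work_struct trans_destroy_work;

  static void nft_wait_for_destroy_work(void)
  {
          /* was: if (!list_empty(&nf_tables_destroy_list)) flush_work(...); */
          flush_work(&trans_destroy_work);
  }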
Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Convert the word "quired" to the word "queried" which makes more
sense in this context.
Signed-off-by: Daniel Watson <ozzloy@each.do>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/878qymrjrg.fsf@trent-reznor
|
|
libps2 has been using kerneldoc to document its methods, but was not
actually plugged into driver-api.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/ZoMQhkyUQYi1Bx4t@google.com
|
|
The memory allocation profiling document was added to the bottom of the
new outline. Apparently that placement was not decided by well-defined
guidelines or a thorough discussion; rather, the document was put there
just because there was no place for such unsorted documents. Now there is
such a chapter, so move the document to its new place.
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-5-sj@kernel.org
|
|
The intention of the 'Legacy Documentation' chapter is to keep the old
documents that are not yet sorted into the new outline, and to encourage
new documents to be integrated into the new outline from the beginning.
However, the new outline will take some more time to be completed. Work on
it started about two years ago, and many parts are still not written.
Also, there is no clear guideline for placing each document for all cases,
not only for the 'legacy' documents but also for new documents. For
example, the memory allocation profiling document has been added to the
bottom of the new outline; apparently that was not following well-defined
guidelines, nor was it the result of a discussion.
Furthermore, the title ("legacy") makes people feel the documents in the
chapter might be outdated or not actively maintained.
Rename 'Legacy Documentation' to 'Unsorted Documentation' and remove the
description saying it is for 'older' documents. After this change, new
documents for which it is not clear where they should be placed in the new
outline can be added to this chapter while well-defined guidelines or a
discussion about the new outline are worked out.
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-4-sj@kernel.org
|
|
The 'Memory Management Guide' chapter aims to be not an additional chapter
of the document, but the ultimate single outline of the document. In that
sense, marking it as a chapter under the document makes no sense,
and the rendered document looks odd. Remove the chapter marker.
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-3-sj@kernel.org
|
|
The 'Theory of operation' part of the allocation-profiling document is
apparently a chapter. However, it is mistakenly marked as a document
title. As a result, the rendered mm documentation index page shows two
items for the document. Fix it to be marked as a chapter.
Signed-off-by: SeongJae Park <sj@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20240701190512.49379-2-sj@kernel.org
|
|
When del_timer_sync() is called in an interrupt context it throws a warning
because of a potential deadlock. The timer is used only to exit from
wait_for_completion() after a timeout, so replacing the call with
wait_for_completion_timeout() allows us to remove the problematic timer and
its related functions altogether.
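A minimal sketch of the replacement pattern, assuming a completion that the
ISR signals when the transfer finishes (not the exact i2c-pnx code):
  #include <linux/completion.h>
  #include <linux/errno.h>
  #include <linux/jiffies.h>

  /* wait for the ISR to signal completion instead of arming a watchdog timer */
  static int example_wait_done(struct completion *done, unsigned long timeout_ms)
  {
          unsigned long left;

          left = wait_for_completion_timeout(done, msecs_to_jiffies(timeout_ms));
          return left ? 0 : -ETIMEDOUT;   /* 0 means the timeout elapsed */
  }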
Fixes: 41561f28e76a ("i2c: New Philips PNX bus driver")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk@timesys.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
|
|
dsos__add would add at the end of the dso array possibly requiring a
later find to re-sort the array. Patterns of find then add were
becoming O(n*log n) due to the sorts. Change the add routine to be
O(n) rather than O(1) but to maintain the sorted-ness of the dsos
array so that later finds don't need the O(n*log n) sort.
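A generic sketch of the technique (find the position, shift the tail, place
the element); the real dsos code differs in its details:
  #include <stddef.h>
  #include <string.h>

  /*
   * Insert 'val' at sorted position 'pos' in an array of pointers, shifting
   * the tail right. O(n) per insert, but the array stays sorted, so later
   * finds can binary search without re-sorting. Assumes capacity for one
   * more element.
   */
  static void sorted_insert(void **array, size_t *nr, size_t pos, void *val)
  {
          memmove(&array[pos + 1], &array[pos], (*nr - pos) * sizeof(*array));
          array[pos] = val;
          (*nr)++;
  }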
Fixes: 3f4ac23a9908 ("perf dsos: Switch backing storage to array from rbtree/list")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Link: https://lore.kernel.org/r/20240703172117.810918-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The array is sorted, so just move the elements and insert in order.
Fixes: 13ca628716c6 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Matt Fleming <matt@readmodwrite.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Matt Fleming <matt@readmodwrite.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240703172117.810918-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix ioctl conflict with memmapped ring buffer ioctl
It was reported that the ioctl() number used to update the ring buffer
memory mapping conflicted with the TCGETS ioctl causing strace to
report:
$ strace -e ioctl stty
ioctl(0, TCGETS or TRACE_MMAP_IOCTL_GET_READER, {c_iflag=ICRNL|IXON, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
Since this ioctl hasn't been in a full release yet, change it from
"T", 0x1 to "R" 0x20, and also reserve 0x20-0x2F for future ioctl
commands, as some more are being worked on for the future"
* tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F
|
|
Fix example to also conform to rules specified in the separate
not-included gpadc binding.
Fixes: 62e4f3396197 ("dt-bindings: regulator: twl-regulator: convert to yaml")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Reported-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Lee Jones <lee@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240612134039.1089839-1-andreas@kemnade.info
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The driver's probe() has two allocations which are needed only within
probe() itself, for devm_regmap_init_mmio().
Usage of the devm interface is a bit misleading here, because these can be
freed right after each scope finishes.
This makes the code a bit more obvious and self-documenting.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240701-b4-qcom-audio-lpass-codec-cleanups-v3-6-6d98d4dd1ef5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The driver uses ARRAY_SIZE() to get the number of widgets later passed to
snd_soc_dapm_new_controls(), whose widget-count parameter is an 'unsigned int'.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240701-b4-qcom-audio-lpass-codec-cleanups-v3-5-6d98d4dd1ef5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The number of widgets in the array passed to snd_soc_dapm_new_controls()
cannot be negative, so make it explicit by using 'unsigned int', just like
snd_soc_add_component_controls() does.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240701-b4-qcom-audio-lpass-codec-cleanups-v3-4-6d98d4dd1ef5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The driver has a static 'struct regmap_config', which is then customized
depending on the device version. This works fine, because there should not
be two such devices in a system simultaneously, and it is even less likely
that two such devices would have different versions, and thus different
regmap configs.
However, the code is cleaner and more obvious when static data in the
driver is also const - it serves as a template.
Mark the 'struct regmap_config' as const and duplicate it in probe()
with kmemdup() to allow customizing it per detected device variant.
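For illustration, a minimal sketch of the template-plus-kmemdup() pattern
(the structure contents and field values are made up, not the codec's real
config):
  #include <linux/regmap.h>
  #include <linux/slab.h>
  #include <linux/string.h>

  /* const template; probe() duplicates it so the copy can be customized */
  static const struct regmap_config example_regmap_template = {
          .reg_bits = 32,
          .val_bits = 32,
  };

  static struct regmap_config *example_dup_config(unsigned int max_reg)
  {
          struct regmap_config *cfg;

          cfg = kmemdup(&example_regmap_template, sizeof(*cfg), GFP_KERNEL);
          if (cfg)
                  cfg->max_register = max_reg;    /* per-variant customization */
          return cfg;
  }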
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240701-b4-qcom-audio-lpass-codec-cleanups-v3-3-6d98d4dd1ef5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Allocate the default register values array with scoped/cleanup.h to
reduce the number of error paths and make the code a bit simpler.
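For illustration, a minimal sketch of a scoped allocation using cleanup.h,
where the buffer is freed automatically when it goes out of scope
(illustrative, not the exact codec code):
  #include <linux/cleanup.h>
  #include <linux/errno.h>
  #include <linux/slab.h>

  static int example_probe_step(size_t len)
  {
          void *buf __free(kfree) = kzalloc(len, GFP_KERNEL);

          if (!buf)
                  return -ENOMEM;

          /* ... use buf only within this scope, e.g. to seed register defaults ... */
          return 0;       /* buf is kfree()d automatically here */
  }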
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240701-b4-qcom-audio-lpass-codec-cleanups-v3-2-6d98d4dd1ef5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Eliminate the PDS cleanup by using devm_add_action_or_reset(), which results
in one less error path and a smaller cleanup in remove().
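A minimal sketch of the devm_add_action_or_reset() pattern (names are
illustrative):
  #include <linux/device.h>

  static void example_pds_disable(void *data)
  {
          /* undo the PDS setup; runs automatically on probe failure or remove */
  }

  static int example_register_pds_cleanup(struct device *dev, void *pds)
  {
          /* if registration fails, example_pds_disable(pds) is called right away */
          return devm_add_action_or_reset(dev, example_pds_disable, pds);
  }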
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240701-b4-qcom-audio-lpass-codec-cleanups-v3-1-6d98d4dd1ef5@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
To prevent conflicts with other ioctl numbers and to allow strace to have an
idea of what is happening, move the ioctl for the trace buffer mapping from
_IO("T", 0x1) into the "R" range and reserve 0x20 - 0x2F for future commands.
Link: https://lore.kernel.org/linux-trace-kernel/20240630105322.GA17573@altlinux.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240630213626.GA23566@altlinux.org/
Cc: Jonathan Corbet <corbet@lwn.net>
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Link: https://lore.kernel.org/20240702153354.367861db@rorschach.local.home
Reported-by: "Dmitry V. Levin" <ldv@strace.io>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Switch the driver to use GPIO descriptors.
Notice that we let the gpiolib handle line inversion for the
active-low reset line (nreset, i.e. !reset).
There are no upstream device trees using the tas5086 compatible
string; if there were, we would need to ascertain that they all
set the GPIO_ACTIVE_LOW flag on their GPIO lines.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20240701-asoc-tas-gpios-v1-1-d69ec5d79939@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
If the kexec crash code is called in interrupt context, the
machine_kexec_mask_interrupts() function will trigger a deadlock while
trying to acquire the irqdesc spinlock and then deactivate the irqchip in
the irq_set_irqchip_state() function.
Unlike arm64, riscv only requires the irq_eoi handler to complete the EOI,
and keeping irq_set_irqchip_state() only leaves this possible deadlock in
place without any benefit. So simply remove it.
Link: https://lore.kernel.org/linux-riscv/20231208111015.173237-1-songshuaishuai@tinylab.org/
Fixes: b17d19a5314a ("riscv: kexec: Fixup irq controller broken in kexec crash path")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Ryo Takakura <takakura@valinux.co.jp>
Link: https://lore.kernel.org/r/20240626023316.539971-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
ftrace_graph_ret_addr() takes an 'idx' integer pointer that is used to
optimize the stack unwinding. Pass it a valid pointer to utilize the
optimizations that might be available in the future.
This makes riscv's usage of ftrace_graph_ret_addr() match
x86_64.
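A minimal sketch of passing a real index (mirrors the x86_64 usage described
above; not the exact riscv unwinder code):
  #include <linux/ftrace.h>
  #include <linux/sched.h>

  /* resolve a possibly-hijacked return address while unwinding */
  static unsigned long example_graph_ret(struct task_struct *task,
                                         unsigned long pc, unsigned long *retp)
  {
          int graph_idx = 0;      /* a valid pointer lets the helper track progress */

          return ftrace_graph_ret_addr(task, &graph_idx, pc, retp);
  }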
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240618145820.62112-1-puranjay@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Clang does not support implicit LMUL in the vset* instruction sequences.
Introduce an explicit LMUL in the vsetivli instruction.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Fixes: 9d5328eeb185 ("riscv: selftests: Add signal handling vector tests")
Link: https://lore.kernel.org/r/20240702-fix_sigreturn_test-v1-1-485f88a80612@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Atish Patra <atishp@rivosinc.com> says:
This series contains 3 fixes, of which the first one is a new fix
for the invalid event data reported on lkml [2]. The last two are v3 of Samuel's
patch [1]. I added the RB/TB/Fixes tags and moved one unrelated change
to its own patch. I also changed an error message in the kvm vcpu_pmu code from
pr_err to pr_debug to avoid redundant failure error messages generated
due to the boot-time querying of events implemented in the patch [1].
Here is the original cover letter for the patch [1]:
Before this patch:
$ perf list hw
List of pre-defined events (to be used in -e or -M):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-backend OR idle-cycles-backend [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
$ perf stat -ddd true
Performance counter stats for 'true':
4.36 msec task-clock # 0.744 CPUs utilized
1 context-switches # 229.325 /sec
0 cpu-migrations # 0.000 /sec
38 page-faults # 8.714 K/sec
4,375,694 cycles # 1.003 GHz (60.64%)
728,945 instructions # 0.17 insn per cycle
79,199 branches # 18.162 M/sec
17,709 branch-misses # 22.36% of all branches
181,734 L1-dcache-loads # 41.676 M/sec
5,547 L1-dcache-load-misses # 3.05% of all L1-dcache accesses
<not counted> LLC-loads (0.00%)
<not counted> LLC-load-misses (0.00%)
<not counted> L1-icache-loads (0.00%)
<not counted> L1-icache-load-misses (0.00%)
<not counted> dTLB-loads (0.00%)
<not counted> dTLB-load-misses (0.00%)
<not counted> iTLB-loads (0.00%)
<not counted> iTLB-load-misses (0.00%)
<not counted> L1-dcache-prefetches (0.00%)
<not counted> L1-dcache-prefetch-misses (0.00%)
0.005860375 seconds time elapsed
0.000000000 seconds user
0.010383000 seconds sys
After this patch:
$ perf list hw
List of pre-defined events (to be used in -e or -M):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
$ perf stat -ddd true
Performance counter stats for 'true':
5.16 msec task-clock # 0.848 CPUs utilized
1 context-switches # 193.817 /sec
0 cpu-migrations # 0.000 /sec
37 page-faults # 7.171 K/sec
5,183,625 cycles # 1.005 GHz
961,696 instructions # 0.19 insn per cycle
85,853 branches # 16.640 M/sec
20,462 branch-misses # 23.83% of all branches
243,545 L1-dcache-loads # 47.203 M/sec
5,974 L1-dcache-load-misses # 2.45% of all L1-dcache accesses
<not supported> LLC-loads
<not supported> LLC-load-misses
<not supported> L1-icache-loads
<not supported> L1-icache-load-misses
<not supported> dTLB-loads
19,619 dTLB-load-misses
<not supported> iTLB-loads
6,831 iTLB-load-misses
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
0.006085625 seconds time elapsed
0.000000000 seconds user
0.013022000 seconds sys
[1] https://lore.kernel.org/linux-riscv/20240418014652.1143466-1-samuel.holland@sifive.com/
[2] https://lore.kernel.org/all/CC51D53B-846C-4D81-86FC-FBF969D0A0D6@pku.edu.cn/
* b4-shazam-merge:
perf: RISC-V: Check standard event availability
drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
drivers/perf: riscv: Do not update the event data if uptodate
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-0-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The RISC-V SBI PMU specification defines several standard hardware and
cache events. Currently, all of these events are exposed to userspace,
even when not actually implemented. They appear in the `perf list`
output, and commands like `perf stat` try to use them.
This is more than just a cosmetic issue, because the PMU driver's .add
function fails for these events, which causes pmu_groups_sched_in() to
prematurely stop scheduling in other (possibly valid) hardware events.
Add logic to check which events are supported by the hardware (i.e. can
be mapped to some counter), so only usable events are reported to
userspace. Since the kernel does not know the mapping between events and
possible counters, this check must happen during boot, when no counters
are in use. Make the check asynchronous to minimize impact on boot time.
Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-3-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Currently, we stop all the counters while a new cpu is brought online.
However, the hpmevent to counter mappings are not reset. The firmware may
have some stale encoding in its mapping structure which may lead to
undesirable results. We have not encountered such a scenario though.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-2-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
In case of a counter overflow, the event data may get corrupted
if called from an external overflow handler. This happens because
we can't update the counter without starting it when the SBI PMU
extension is in use. However, prev_count has already been
updated at the first pass while the counter value is still the
old one.
The solution is simple: we don't need to update it again
if it is already updated, which can be detected using the hwc state.
The event state in the overflow handler is updated in the following
patch; thus, this fix can't be backported to kernel versions where
overflow support was added.
Fixes: a8625217a054 ("drivers/perf: riscv: Implement SBI PMU snapshot function")
Closes: https://lore.kernel.org/all/CC51D53B-846C-4D81-86FC-FBF969D0A0D6@pku.edu.cn/
Reported-by: garthlei@pku.edu.cn
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-1-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add a new configuration CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG_PLATFORM_KEYRING
that enables verifying dm-verity signatures using the platform keyring,
which is populated using the UEFI DB certificates. This is useful for
self-enrolled systems that do not use MOK, as the secondary keyring which
is already used for verification, if the relevant kconfig is enabled, is
linked to the machine keyring, which gets its certificates loaded from MOK.
On datacenter/virtual/cloud deployments it is more common to deploy one's
own certificate chain directly in DB on first boot in unattended mode,
rather than relying on MOK, as the latter typically requires interactive
authentication to enroll, and is more suited for personal machines.
Default to the same value as DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING
if not otherwise specified, as it is likely that if one wants to use
MOK certificates to verify dm-verity volumes, DB certificates are
going to be used too. Keys in DB are allowed to load a full kernel
already anyway, so they are already highly privileged.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
dm-raid devices will occasionally trigger the following warning when
being resumed after a table load because MD_RECOVERY_RUNNING is set:
WARNING: CPU: 7 PID: 5660 at drivers/md/dm-raid.c:4105 raid_resume+0xee/0x100 [dm_raid]
The failing check is:
WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
This check is designed to make sure that the sync thread isn't
registered, but md_check_recovery can set MD_RECOVERY_RUNNING without
the sync_thread ever getting registered. Instead of checking if
MD_RECOVERY_RUNNING is set, check if sync_thread is non-NULL.
Fixes: 16c4770c75b1 ("dm-raid: really frozen sync_thread during suspend")
Suggested-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
Currently dm-verity computes the hash of each block by using multiple
calls to the "ahash" crypto API. While the exact sequence depends on
the chosen dm-verity settings, in the vast majority of cases it is:
1. crypto_ahash_init()
2. crypto_ahash_update() [salt]
3. crypto_ahash_update() [data]
4. crypto_ahash_final()
This is inefficient for two main reasons:
- It makes multiple indirect calls, which is expensive on modern CPUs
especially when mitigations for CPU vulnerabilities are enabled.
Since the salt is the same across all blocks on a given dm-verity
device, a much more efficient sequence would be to do an import of the
pre-salted state, then a finup.
- It uses the ahash (asynchronous hash) API, despite the fact that
CPU-based hashing is almost always used in practice, and therefore it
experiences the overhead of the ahash-based wrapper for shash.
Because dm-verity was intentionally converted to ahash to support
off-CPU crypto accelerators, a full reversion to shash might not be
acceptable. Yet, we should still provide a fast path for shash with
the most common dm-verity settings.
Another reason for shash over ahash is that the upcoming multibuffer
hashing support, which is specific to CPU-based hashing, is much
better suited for shash than for ahash. Supporting it via ahash would
add significant complexity and overhead. And it's not possible for
the "same" code to properly support both multibuffer hashing and HW
accelerators at the same time anyway, given the different computation
models. Unfortunately there will always be code specific to each
model needed (for users who want to support both).
Therefore, this patch adds a new shash import+finup based fast path to
dm-verity. It is used automatically when appropriate. This makes
dm-verity optimized for what the vast majority of users want: CPU-based
hashing with the most common settings, while still retaining support for
rarer settings and off-CPU crypto accelerators.
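For illustration, a minimal sketch of the import+finup fast path, assuming a
hash state that was exported once after hashing the salt (saved at table
load time); this is not the exact dm-verity code:
  #include <crypto/hash.h>

  static int example_hash_block(struct crypto_shash *tfm, const void *salted_state,
                                const u8 *data, unsigned int len, u8 *digest)
  {
          SHASH_DESC_ON_STACK(desc, tfm);
          int err;

          desc->tfm = tfm;
          /* restore the state left after hashing the salt once */
          err = crypto_shash_import(desc, salted_state);
          if (err)
                  return err;
          /* hash the data block and finalize in a single call */
          return crypto_shash_finup(desc, data, len, digest);
  }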
In benchmarks with veritysetup's default parameters (SHA-256, 4K data
and hash block sizes, 32-byte salt), which also match the parameters
that Android currently uses, this patch improves block hashing
performance by about 15% on x86_64 using the SHA-NI instructions, or by
about 5% on arm64 using the ARMv8 SHA2 instructions. On x86_64 roughly
two-thirds of the improvement comes from the use of import and finup,
while the remaining third comes from the switch from ahash to shash.
Note that another benefit of using "import" to handle the salt is that
if the salt size is equal to the input size of the hash algorithm's
compression function, e.g. 64 bytes for SHA-256, then the performance is
exactly the same as no salt. This doesn't seem to be much better than
veritysetup's current default of 32-byte salts, due to the way SHA-256's
finalization padding works, but it should be marginally better.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
In preparation for adding shash support to dm-verity, change
verity_hash() to take a pointer to a struct dm_verity_io instead of a
pointer to the ahash_request embedded inside it.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
dm-verity needs to access data blocks by virtual address in three
different cases (zeroization, recheck, and forward error correction),
and one more case (shash support) is coming. Since it's guaranteed that
dm-verity data blocks never cross pages, and kmap_local_page and
kunmap_local are no-ops on modern platforms anyway, just unconditionally
"map" every data block's page and work with the virtual buffer directly.
This simplifies the code and eliminates unnecessary overhead.
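A minimal sketch of the unconditional mapping, assuming the data block is
fully contained in one page (as guaranteed above); not the exact dm-verity
code:
  #include <linux/bvec.h>
  #include <linux/highmem.h>

  static void example_with_block_vaddr(struct bio_vec *bv, unsigned int block_size,
                                       void (*fn)(const void *data, unsigned int len))
  {
          void *vaddr = kmap_local_page(bv->bv_page) + bv->bv_offset;

          fn(vaddr, block_size);  /* e.g. hash, recheck or zero the block */
          kunmap_local(vaddr);
  }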
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
Since Linux v6.1, some filesystems support submitting direct I/O that is
aligned to only dma_alignment instead of the logical_block_size
alignment that was required before. I/O that is not aligned to the
logical_block_size is difficult to handle in device-mapper targets that
do cryptographic processing of data, as it makes the units of data that
are hashed or encrypted possibly be split across pages, creating rarely
used and rarely tested edge cases.
As such, dm-crypt and dm-integrity have already opted out of this by
setting dma_alignment to 'logical_block_size - 1'.
Although dm-verity does have code that handles these cases (or at least
is intended to do so), supporting direct I/O with such a low amount of
alignment is not really useful on dm-verity devices. So, opt dm-verity
out of it too so that it's not necessary to handle these edge cases.
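A minimal sketch of the opt-out in a target's io_hints hook, matching what
dm-crypt and dm-integrity already do (illustrative):
  #include <linux/blkdev.h>
  #include <linux/device-mapper.h>

  /* require direct I/O to be aligned to the logical block size again */
  static void example_io_hints(struct dm_target *ti, struct queue_limits *limits)
  {
          limits->dma_alignment = limits->logical_block_size - 1;
  }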
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
Change the digest fields in struct dm_verity_io from variable-length to
fixed-length, since their maximum length is fixed at
HASH_MAX_DIGESTSIZE, i.e. 64 bytes, which is not too big. This is
simpler and makes the fields a bit faster to access.
(HASH_MAX_DIGESTSIZE did not exist when this code was written, which may
explain why it wasn't used.)
This makes the verity_io_real_digest() and verity_io_want_digest()
functions trivial, but this patch leaves them in place temporarily since
most of their callers will go away in a later patch anyway.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
When the WARN_ON_ONCE() triggers, the printk() of the additional
information related to the warning will not happen in print level
"warn". When reading dmesg with a restriction to level "warn", the
information published by the printk_once() will not show up there.
Transform WARN_ON_ONCE() and printk_once() into a WARN_ONCE().
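A minimal sketch of the transformation (the condition and message are
placeholders, not the real timer code):
  #include <linux/bug.h>
  #include <linux/types.h>

  static void example_check(bool broken_state, int cpu)
  {
          /*
           * Before: if (WARN_ON_ONCE(broken_state))
           *                 printk_once("detail: cpu=%d\n", cpu);
           * which could emit the detail at a different print level.
           */
          WARN_ONCE(broken_state, "detail: cpu=%d\n", cpu);
  }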
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240610103552.25252-1-anna-maria@linutronix.de
|
|
Move the code that handles mismatches of data block hashes into its own
function so that it doesn't clutter up verity_verify_io().
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
|
|
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
A previous fix, commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), addressed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption. However, since the start number of non-reserved inodes
is read from the super block and may change, inode allocation may still
occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files, which should not be visible in the namespace, are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack@suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros perform left bit-shift calculations with the
inode number as the shift count, via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits). If limits end up being larger, we will hit overflows,
possible divisions by 0 etc. Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway. For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits. For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc. So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.
This is a root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.
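For illustration, a minimal sketch of the ratio-to-pages conversion with the
new cap (not the exact mm/page-writeback.c code):
  #include <linux/limits.h>
  #include <linux/minmax.h>

  static unsigned long example_dirty_limit_pages(unsigned long available_pages,
                                                 unsigned int ratio)
  {
          unsigned long pages = available_pages * ratio / 100;

          /* keep later PAGE_SIZE-unit multiplications within 32 bits */
          return min_t(unsigned long, pages, UINT_MAX);
  }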
Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|