Age | Commit message (Collapse) | Author |
|
To parse command line the bench utility uses the argp_parse() function. This
function takes as an argument a parent 'struct argp' structure which defines
common command line options and an array of children 'struct argp' structures
which defines additional command line options for particular benchmarks. This
implementation doesn't allow benchmarks to share option names, e.g., if two
benchmarks want to use, say, the --option option, then only one of them will
succeed (the first one encountered in the array). This will be convenient if
same option names could be used in different benchmarks (with the same
semantics, e.g., --nr_loops=N).
Fix this by calling the argp_parse() function twice. The first call is the same
as it was before, with all children argps, and helps to find the benchmark name
and to print a combined help message if anything is wrong. Given the name, we
can call the argp_parse the second time, but now the children array points only
to a correct benchmark thus always calling the correct parsers. (If there's no
a specific list of arguments, then only one call to argp_parse will be done.)
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-4-aspsk@isovalent.com
|
|
The hashmap_report_final callback function defined in the
benchs/bench_bpf_hashmap_full_update.c file should be static.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-3-aspsk@isovalent.com
|
|
To call the bpf_hashmap_full_update benchmark, one should say:
bench bpf-hashmap-ful-update
The patch adds a missing 'l' to the benchmark name.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-2-aspsk@isovalent.com
|
|
Hou Tao says:
====================
From: Hou Tao <houtao1@huawei.com>
Hi,
The patchset tries to fix the hard-up problem found when checking how htab
handles element reuse in bpf memory allocator. The immediate reuse of
freed elements will reinitialize special fields (e.g., bpf_spin_lock) in
htab map value and it may corrupt lookup procedure with BFP_F_LOCK flag
which acquires bpf-spin-lock during value copying, and lead to hard-lock
as shown in patch #2. Patch #1 fixes it by using __GFP_ZERO when allocating
the object from slab and the behavior is similar with the preallocated
hash-table case. Please see individual patches for more details. And comments
are always welcome.
Regards,
Change Log:
v1:
* Use __GFP_ZERO instead of ctor to avoid retpoline overhead (from Alexei)
* Add comments for check_and_init_map_value() (from Alexei)
* split __GFP_ZERO patches out of the original patchset to unblock
the development work of others.
RFC: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The reinitialization of spin-lock in map value after immediate reuse may
corrupt lookup with BPF_F_LOCK flag and result in hard lock-up, so add
one test case to demonstrate the problem.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230215082132.3856544-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently the freed element in bpf memory allocator may be immediately
reused, for htab map the reuse will reinitialize special fields in map
value (e.g., bpf_spin_lock), but lookup procedure may still access
these special fields, and it may lead to hard-lockup as shown below:
NMI backtrace for cpu 16
CPU: 16 PID: 2574 Comm: htab.bin Tainted: G L 6.1.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0
......
Call Trace:
<TASK>
copy_map_value_locked+0xb7/0x170
bpf_map_copy_value+0x113/0x3c0
__sys_bpf+0x1c67/0x2780
__x64_sys_bpf+0x1c/0x20
do_syscall_64+0x30/0x60
entry_SYSCALL_64_after_hwframe+0x46/0xb0
......
</TASK>
For htab map, just like the preallocated case, these is no need to
initialize these special fields in map value again once these fields
have been initialized. For preallocated htab map, these fields are
initialized through __GFP_ZERO in bpf_map_area_alloc(), so do the
similar thing for non-preallocated htab in bpf memory allocator. And
there is no need to use __GFP_ZERO for per-cpu bpf memory allocator,
because __alloc_percpu_gfp() does it implicitly.
Fixes: 0fd7c5d43339 ("bpf: Optimize call_rcu in non-preallocated hash map.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230215082132.3856544-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Clang warns:
drivers/macintosh/windfarm_lm75_sensor.c:63:14: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
lm->inited = 1;
^ ~
drivers/macintosh/windfarm_smu_sensors.c:356:19: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
pow->fake_volts = 1;
^ ~
drivers/macintosh/windfarm_smu_sensors.c:368:18: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
pow->quadratic = 1;
^ ~
There is no bug here since no code checks the actual value of these
fields, just whether or not they are zero (boolean context), but this
can be easily fixed by switching to an unsigned type.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230215-windfarm-wsingle-bit-bitfield-constant-conversion-v1-1-26415072e855@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor fix from John Johansen:
"Regression fix for getattr mediation of old policy"
* tag 'apparmor-v6.2-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: Fix regression in compat permissions for getattr
|
|
Since this driver can only be built-in, it fails to link when
the I2C layer is in a loadable module:
x86_64-linux-ld: drivers/power/reset/odroid-go-ultra-poweroff.o: in function `odroid_go_ultra_poweroff_get_pmic_device':
odroid-go-ultra-poweroff.c:(.text+0x30): undefined reference to `i2c_find_device_by_fwnode'
Tighten the dependency to only allow enabling
POWER_RESET_ODROID_GO_ULTRA_POWEROFF is I2C is built-in as well.
Fixes: cec3b46b8bda ("power: reset: add Odroid Go Ultra poweroff driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
Instead of relying on an accidental, transitive inclusion of linux/leds.h
use it directly.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
This code accidentally returns success instead of a negative error code.
Fixes: 50a6620dd1fb ("spi: bcm63xx-hsspi: Add polling mode support")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: William Zhang <william.zhang@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/Y+zmoGH6LubPhiI0@kili
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
This code accidentally returns success instead of a negative error code.
Fixes: a38a2233f23b ("spi: bcmbca-hsspi: Add driver for newer HSSPI controller")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: William Zhang <william.zhang@broadcom.com>
Link: https://lore.kernel.org/r/Y+zmrNJ9zjNQpzWq@kili
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
It's fine for these properties to be absent, as the driver doesn't fail
without them and functions with settings inherited from the reset/previous
stage bootloader state.
Fixes: 6c463222a21d ("dt-bindings: power: supply: pm8941-coincell: Convert to DT schema format")
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
Add a specific compatible for the coincell charger present on PM8998.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
DT_SCHEMA_FILES used to allow specifying a space separated list of file
paths, but the introduction of partial matches support broke this
feature:
$ make dtbs_check DT_SCHEMA_FILES="path/to/schema1.yaml path/to/schema2.yaml"
[...]
LINT Documentation/devicetree/bindings
usage: yamllint [-h] [-] [-c CONFIG_FILE | -d CONFIG_DATA] [--list-files] [...]
[-v]
[FILE_OR_DIR ...]
yamllint: error: one of the arguments FILE_OR_DIR - is required
[...]
Restore the lost functionality by preparing a grep filter that is able
to handle multiple search patterns.
Additionally, as suggested by Rob, use ':' instead of ' ' as the
patterns separator char. Hence, the command above becomes:
$ make dtbs_check DT_SCHEMA_FILES="path/to/schema1.yaml:path/to/schema2.yaml"
Fixes: 309d955985ee ("dt-bindings: kbuild: Support partial matches with DT_SCHEMA_FILES")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Link: https://lore.kernel.org/r/20230209193735.795288-1-cristian.ciocaltea@collabora.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
It's important to know reserved-mem information in mobile world
since reserved memory via device tree keeps increased in platform
(e.g., 45% in our platform). Therefore, it's crucial to know the
reserved memory sizes breakdown for the memory accounting.
This patch prints out reserved memory details during boot to make
them visible.
Below is an example output:
[ 0.000000] OF: reserved mem: 0x00000009f9400000..0x00000009fb3fffff ( 32768 KB ) map reusable test1
[ 0.000000] OF: reserved mem: 0x00000000ffdf0000..0x00000000ffffffff ( 2112 KB ) map non-reusable test2
[ 0.000000] OF: reserved mem: 0x0000000091000000..0x00000000912fffff ( 3072 KB ) nomap non-reusable test3
Signed-off-by: Martin Liu <liumartin@google.com>
Link: https://lore.kernel.org/r/20230209160954.1471909-1-liumartin@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
On surprise removal, pciehp_unconfigure_device() and acpiphp's
trim_stale_devices() call pci_dev_set_disconnected() to mark removed
devices as permanently offline. Thereby, the PCI core and drivers know
to skip device accesses.
However pci_dev_set_disconnected() takes the device_lock and thus waits for
a concurrent driver bind or unbind to complete. As a result, the driver's
->probe and ->remove hooks have no chance to learn that the device is gone.
That doesn't make any sense, so drop the device_lock and instead use atomic
xchg() and cmpxchg() operations to update the device state.
As a byproduct, an AB-BA deadlock reported by Anatoli is fixed which occurs
on surprise removal with AER concurrently performing a bus reset.
AER bus reset:
INFO: task irq/26-aerdrv:95 blocked for more than 120 seconds.
Tainted: G W 6.2.0-rc3-custom-norework-jan11+
schedule
rwsem_down_write_slowpath
down_write_nested
pciehp_reset_slot # acquires reset_lock
pci_reset_hotplug_slot
pci_slot_reset # acquires device_lock
pci_bus_error_reset
aer_root_reset
pcie_do_recovery
aer_process_err_devices
aer_isr
pciehp surprise removal:
INFO: task irq/26-pciehp:96 blocked for more than 120 seconds.
Tainted: G W 6.2.0-rc3-custom-norework-jan11+
schedule_preempt_disabled
__mutex_lock
mutex_lock_nested
pci_dev_set_disconnected # acquires device_lock
pci_walk_bus
pciehp_unconfigure_device
pciehp_disable_slot
pciehp_handle_presence_or_link_change
pciehp_ist # acquires reset_lock
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590
Fixes: a6bd101b8f84 ("PCI: Unify device inaccessible")
Link: https://lore.kernel.org/r/3dc88ea82bdc0e37d9000e413d5ebce481cbd629.1674205689.git.lukas@wunner.de
Reported-by: Anatoli Antonovitch <anatoli.antonovitch@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.20+
Cc: Keith Busch <kbusch@kernel.org>
|
|
for-6.3/block
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.3
- fix and cleanup freeing single sgl (Keith Busch)"
* tag 'nvme-6.3-2023-02-15' of git://git.infradead.org/nvme:
nvme-pci: remove iod use_sgls
nvme-pci: fix freeing single sgl
|
|
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.2
- always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote)
- add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner)
- set the DMA mask earlier (Christoph Hellwig)"
* tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme:
nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
nvme-pci: set the DMA mask earlier
nvme-pci: add bogus ID quirk for ADATA SX6000PNP
|
|
Correct spelling problems for Documentation/i2c/ as reported
by codespell.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Convert i2c-st.txt into st,sti-i2c.yaml for the i2c-st driver.
Signed-off-by: Alain Volmat <avolmat@me.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix a teardown bug in the new nfs4_file hashtable
* tag 'nfsd-6.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't destroy global nfs4_file table in per-net shutdown
|
|
Eduard Zingerman says:
====================
This patch-set is a part of preparation work for -mcpu=v4 option for
BPF C compiler (discussed in [1]). Among other things -mcpu=v4 should
enable generation of BPF_ST instruction by the compiler.
- Patches #1,2 adjust verifier to track values of constants written to
stack using BPF_ST. Currently these are tracked imprecisely, unlike
the writes using BPF_STX, e.g.:
fp[-8] = 42; currently verifier assumes that fp[-8]=mmmmmmmm
after such instruction, where m stands for "misc",
just a note that something is written at fp[-8].
r1 = 42; verifier tracks r1=42 after this instruction.
fp[-8] = r1; verifier tracks fp[-8]=42 after this instruction.
This patch makes both cases equivalent.
- Patches #3,4 adjust verifier.c:check_stack_write_fixed_off() to
preserve STACK_ZERO marks when BPF_ST writes zero. Currently these
are replaced by STACK_MISC, unlike zero writes using BPF_STX, e.g.:
... stack range [X,Y] is marked as STACK_ZERO ...
r0 = ... variable offset pointer to stack with range [X,Y] ...
fp[r0] = 0; currently verifier marks range [X,Y] as
STACK_MISC for such instructions.
r1 = 0;
fp[r0] = r1; verifier keeps STACK_ZERO marks for range [X,Y].
This patch makes both cases equivalent.
Motivating example for patch #1 could be found at [3].
Previous version of the patch-set is here [2], the changes are:
- Explicit initialization of fake register parent link is removed from
verifier.c:check_stack_write_fixed_off() as parent links are now
correctly handled by verifier.c:save_register_state().
- Original patch #1 is split in patches #1 & #3.
- Missing test case added for patch #3
verifier.c:check_stack_write_fixed_off() adjustment.
- Test cases are updated to use .prog_type = BPF_PROG_TYPE_SK_LOOKUP,
which requires return value to be in the range [0,1] (original test
cases assumed that such range is always required, which is not true).
- Original patch #3 with changes allowing BPF_ST writes to context is
withheld for now, w/o compiler support for BPF_ST it requires some
creative testing.
- Original patch #5 is removed from the patch-set. This patch
contained adjustments to expected verifier error messages in some
tests, necessary when C compiler generates BPF_ST instruction
instead of BPF_STX (changes to expected instruction indices). These
changes are not necessary yet.
[1] https://lore.kernel.org/bpf/01515302-c37d-2ee5-c950-2f556a4caad0@meta.com/
[2] https://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/
[3] https://lore.kernel.org/bpf/f1e4282bf00aa21a72fc5906f8c3be1ae6c94a5e.camel@gmail.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A test case to verify that variable offset BPF_ST instruction
preserves STACK_ZERO marks when writes zeros, e.g. in the following
situation:
*(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8]
r0 = random(-7, -1) ; some random number in range of [-7, -1]
r0 += r10 ; r0 is now variable offset pointer to stack
*(u8*)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for
; fp[-8] should be preserved.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
BPF_STX instruction preserves STACK_ZERO marks for variable offset
writes in situations like below:
*(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8]
r0 = random(-7, -1) ; some random number in range of [-7, -1]
r0 += r10 ; r0 is now a variable offset pointer to stack
r1 = 0
*(u8*)(r0) = r1 ; BPF_STX writing zero, STACK_ZERO mark for
; fp[-8] is preserved
This commit updates verifier.c:check_stack_write_var_off() to process
BPF_ST in a similar manner, e.g. the following example:
*(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8]
r0 = random(-7, -1) ; some random number in range of [-7, -1]
r0 += r10 ; r0 is now variable offset pointer to stack
*(u8*)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for
; fp[-8] is preserved
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Check that verifier tracks the value of 'imm' spilled to stack by
BPF_ST_MEM instruction. Cover the following cases:
- write of non-zero constant to stack;
- write of a zero constant to stack.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For aligned stack writes using BPF_ST instruction track stored values
in a same way BPF_STX is handled, e.g. make sure that the following
commands produce similar verifier knowledge:
fp[-8] = 42; r1 = 42;
fp[-8] = r1;
This covers two cases:
- non-null values written to stack are stored as spill of fake
registers;
- null values written to stack are stored as STACK_ZERO marks.
Previously both cases above used STACK_MISC marks instead.
Some verifier test cases relied on the old logic to obtain STACK_MISC
marks for some stack values. These test cases are updated in the same
commit to avoid failures during bisect.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixlet from Steven Rostedt:
"Make trace_define_field_ext() static.
Just after the fix to TASK_COMM_LEN not converted to its value in
trace_events was pulled, the kernel test robot reported that the
helper function trace_define_field_ext() added to that change was only
used in the file it was defined in but was not declared static.
Make it a local function"
* tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Make trace_define_field_ext() static
|
|
This fixes a regression in mediation of getattr when old policy built
under an older ABI is loaded and mapped to internal permissions.
The regression does not occur for all getattr permission requests,
only appearing if state zero is the final state in the permission
lookup. This is because despite the first state (index 0) being
guaranteed to not have permissions in both newer and older permission
formats, it may have to carry permissions that were not mediated as
part of an older policy. These backward compat permissions are
mapped here to avoid special casing the mediation code paths.
Since the mapping code already takes into account backwards compat
permission from older formats it can be applied to state 0 to fix
the regression.
Fixes: 408d53e923bd ("apparmor: compute file permissions on profile load")
Reported-by: Philip Meulengracht <the_meulengracht@hotmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Qualcomm G-Link RPM edge bindings do not allow and do not use mbox-names
property.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230208101545.45711-4-krzysztof.kozlowski@linaro.org
|
|
Merge power management utilities and documentation updates for 6.3-rc1:
- Modify some power management utilities to use the canonical ftrace
path (Ross Zwisler).
- Correct spelling problems for Documentation/power/ as reported by
codespell (Randy Dunlap).
* pm-tools:
PM: tools: use canonical ftrace path
* pm-docs:
Documentation: power: correct spelling
|
|
Merge updates of the powercap framework, generic PM domains, Energy
Model and operating performance points for 6.3-rc1:
- Fix possible name leak in powercap_register_zone() (Yang Yingliang).
- Add Meteor Lake and Emerald Rapids support to the intel_rapl power
capping driver (Zhang Rui).
- Modify the idle_inject power capping facility to support 100% idle
injection (Srinivas Pandruvada).
- Fix large time windows handling in the intel_rapl power capping
driver (Zhang Rui).
- Fix memory leaks with using debugfs_lookup() in the generic PM
domains and Energy Model code (Greg Kroah-Hartman).
- Add missing 'cache-unified' property in example for kryo OPP bindings
(Rob Herring).
- Fix error checking in opp_migrate_dentry() (Qi Zheng).
- Remove "select SRCU" (Paul E. McKenney).
- Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
Dybcio).
* powercap:
powercap: intel_rapl: Fix handling for large time window
powercap: idle_inject: Support 100% idle injection
powercap: intel_rapl: add support for Emerald Rapids
powercap: intel_rapl: add support for Meteor Lake
powercap: fix possible name leak in powercap_register_zone()
* pm-domains:
PM: domains: fix memory leak with using debugfs_lookup()
* pm-em:
PM: EM: fix memory leak with using debugfs_lookup()
* pm-opp:
OPP: fix error checking in opp_migrate_dentry()
dt-bindings: opp: v2-qcom-level: Let qcom,opp-fuse-level be a 2-long array
drivers/opp: Remove "select SRCU"
dt-bindings: opp: opp-v2-kryo-cpu: Add missing 'cache-unified' property in example
|
|
guoren@kernel.org <guoren@kernel.org> says:
From: Guo Ren <guoren@linux.alibaba.com>
The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contain three problems.
- The most horrible bug is preemption panic which found by Andy [1].
Let's disable preemption for ftrace first, and Andy could continue
the ftrace preemption work.
- The "-fpatchable-function-entry= CFLAG" wasted code size
!RISCV_ISA_C.
- The ftrace detour implementation wasted code size.
- When livepatching, the trampoline (ftrace_regs_caller) would not
return to <func_prolog+12> but would rather jump to the new function.
So, "REG_L ra, -SZREG(sp)" would not run and the original return
address would not be restored. The kernel is likely to hang or crash
as a result. (Found by Evgenii Shatokhin [4])
[Palmer: The first three patches in this series are pretty concrete
fixes, so I'm pulling them ahead of the rest of the series.]
* b4-shazam-merge:
riscv: ftrace: Reduce the detour code size to half
riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
riscv: ftrace: Fixup panic by disabling preemption
Link: https://lore.kernel.org/r/20230112090603.1295340-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Use a temporary register to reduce the size of detour code from 16 bytes to
8 bytes. The previous implementation is from 'commit afc76b8b8011 ("riscv:
Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'.
Before the patch:
<func_prolog>:
0: REG_S ra, -SZREG(sp)
4: auipc ra, ?
8: jalr ?(ra)
12: REG_L ra, -SZREG(sp)
(func_boddy)
After the patch:
<func_prolog>:
0: auipc t0, ?
4: jalr t0, ?(t0)
(func_boddy)
This patch not just reduces the size of detour code, but also fixes an
important issue:
An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can
actually change the instruction pointer, e.g. to "replace" the given
kernel function with a new one, which is needed for livepatching, etc.
In this case, the trampoline (ftrace_regs_caller) would not return to
<func_prolog+12> but would rather jump to the new function. So, "REG_L
ra, -SZREG(sp)" would not run and the original return address would not
be restored. The kernel is likely to hang or crash as a result.
This can be easily demonstrated if one tries to "replace", say,
cmdline_proc_show() with a new function with the same signature using
instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace
callback.
Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/
Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
Co-developed-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>
Reviewed-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/r/20230112090603.1295340-4-guoren@kernel.org
Cc: stable@vger.kernel.org
Fixes: 10626c32e382 ("riscv/ftrace: Add basic support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
When CONFIG_RISCV_ISA_C=n, -fpatchable-function-entry=8 would generate
more nops than we expect. Because it treat nop opcode as 0x00000013
instead of 0x0001.
Dump of assembler code for function dw_pcie_free_msi:
0xffffffff806fce94 <+0>: sd ra,-8(sp)
0xffffffff806fce98 <+4>: auipc ra,0xff90f
0xffffffff806fce9c <+8>: jalr -684(ra) # 0xffffffff8000bbec
<ftrace_caller>
0xffffffff806fcea0 <+12>: ld ra,-8(sp)
0xffffffff806fcea4 <+16>: nop /* wasted */
0xffffffff806fcea8 <+20>: nop /* wasted */
0xffffffff806fceac <+24>: nop /* wasted */
0xffffffff806fceb0 <+28>: nop /* wasted */
0xffffffff806fceb4 <+0>: addi sp,sp,-48
0xffffffff806fceb8 <+4>: sd s0,32(sp)
0xffffffff806fcebc <+8>: sd s1,24(sp)
0xffffffff806fcec0 <+12>: sd s2,16(sp)
0xffffffff806fcec4 <+16>: sd s3,8(sp)
0xffffffff806fcec8 <+20>: sd ra,40(sp)
0xffffffff806fcecc <+24>: addi s0,sp,48
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230112090603.1295340-3-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
In RISCV, we must use an AUIPC + JALR pair to encode an immediate,
forming a jump that jumps to an address over 4K. This may cause errors
if we want to enable kernel preemption and remove dependency from
patching code with stop_machine(). For example, if a task was switched
out on auipc. And, if we changed the ftrace function before it was
switched back, then it would jump to an address that has updated 11:0
bits mixing with previous XLEN:12 part.
p: patched area performed by dynamic ftrace
ftrace_prologue:
p| REG_S ra, -SZREG(sp)
p| auipc ra, 0x? ------------> preempted
...
change ftrace function
...
p| jalr -?(ra) <------------- switched back
p| REG_L ra, -SZREG(sp)
func:
xxx
ret
Fixes: afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230112090603.1295340-2-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definitions to prevent
modification at runtime.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The only user in the zoned remap code is gone now, so remove the argument.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_record_physical_zoned relies on a bio->bi_bdev samples in the
bio_end_io handler to find the reverse map for remapping the zone append
write, but stacked block device drivers can and usually do change bi_bdev
when sending on the bio to a lower device. This can happen e.g. with the
nvme-multipath driver when a NVMe SSD sets the shared namespace bit.
But there is no real need for the bdev in btrfs_record_physical_zoned,
as it is only passed to btrfs_rmap_block, which uses it to pick the
mapping to report if there are multiple reverse mappings. As zone
writes can only do simple non-mirror writes right now, and anything
more complex will use the stripe tree there is no chance of the multiple
mappings case actually happening.
Instead open code the subset of btrfs_rmap_block in
btrfs_record_physical_zoned, which also removes a memory allocation and
remove the bdev field in the ordered extent.
Fixes: d8e3fb106f39 ("btrfs: zoned: use ZONE_APPEND write for zoned mode")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Using Zone Append only makes sense for writes to the device, so check
that in btrfs_use_zone_append. This avoids the possibility of
artificially limited read size on zoned file systems.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
struct btrfs_bio has all the information needed for btrfs_use_append, so
pass that instead of a btrfs_inode and file_offset.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Instead of digging into the bio_vec in submit_one_bio, set file_offset at
bio allocation time from the provided parameter. This also ensures that
the file_offset is available all the time when building up the bio
payload.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_ordered_extent->disk_bytenr can be rewritten by the zoned I/O
completion handler, and thus in general is not a good idea to limit I/O
size. But the maximum bio size calculation can easily be done using the
file_offset fields in the btrfs_ordered_extent and btrfs_bio structures,
so switch to that instead.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In the search loop of the binary search function, we are doing a division
by 2 of the sum of the high and low slots. Because the slots are integers,
the generated assembly code for it is the following on x86_64:
0x00000000000141f1 <+145>: mov %eax,%ebx
0x00000000000141f3 <+147>: shr $0x1f,%ebx
0x00000000000141f6 <+150>: add %eax,%ebx
0x00000000000141f8 <+152>: sar %ebx
It's a few more instructions than a simple right shift, because signed
integer division needs to round towards zero. However we know that slots
can never be negative (btrfs_header_nritems() returns an u32), so we
can instead use unsigned types for the low and high slots and therefore
use unsigned integer division, which results in a single instruction on
x86_64:
0x00000000000141f0 <+144>: shr %ebx
So use unsigned types for the slots and therefore unsigned division.
This is part of a small patchset comprised of the following two patches:
btrfs: eliminate extra call when doing binary search on extent buffer
btrfs: do unsigned integer division in the extent buffer binary search loop
The following fs_mark test was run on a non-debug kernel (Debian's default
kernel config) before and after applying the patchset:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
MOUNT_OPTIONS="-o ssd"
MKFS_OPTIONS="-O no-holes -R free-space-tree"
FILES=100000
THREADS=$(nproc --all)
FILE_SIZE=0
umount $DEV &> /dev/null
mkfs.btrfs -f $MKFS_OPTIONS $DEV
mount $MOUNT_OPTIONS $DEV $MNT
OPTS="-S 0 -L 6 -n $FILES -s $FILE_SIZE -t $THREADS -k"
for ((i = 1; i <= $THREADS; i++)); do
OPTS="$OPTS -d $MNT/d$i"
done
fs_mark $OPTS
umount $MNT
Results before applying patchset:
FSUse% Count Size Files/sec App Overhead
2 1200000 0 174472.0 11549868
4 2400000 0 253503.0 11694618
4 3600000 0 257833.1 11611508
6 4800000 0 247089.5 11665983
6 6000000 0 211296.1 12121244
10 7200000 0 187330.6 12548565
Results after applying patchset:
FSUse% Count Size Files/sec App Overhead
2 1200000 0 207556.0 11393252
4 2400000 0 266751.1 11347909
4 3600000 0 274397.5 11270058
6 4800000 0 259608.4 11442250
6 6000000 0 238895.8 11635921
8 7200000 0 211942.2 11873825
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The function btrfs_bin_search() is just a wrapper around the function
generic_bin_search(), which passes the same arguments plus a default
low slot with a value of 0. This adds an unnecessary extra function
call, since btrfs_bin_search() is not static. So improve on this by
making btrfs_bin_search() an inline function that calls
generic_bin_search(), renaming the later to btrfs_generic_bin_search()
and exporting it.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The only caller of scrub_rbio calls rbio_orig_end_io right after it,
move it into scrub_rbio to match the other work item helpers.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Both callers of recover_rbio call rbio_orig_end_io right after it, so
move the call into the shared function.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Both callers of rmv_rbio call rbio_orig_end_io right after it, so
move the call into the shared function.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Instead of filling in a bio_list and submitting the bios in the only
caller, do that in scrub_assemble_read_bios. This removes the
need to pass the bio_list, and also makes it clear that the extra
bio_list cleanup in the caller is entirely pointless. Rename the
function to scrub_read_bios to make it clear that the bios are not
only assembled.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There is very little extra code in rmw_read_bios, and a large part of it
is the superfluous extra cleanup of the bio list. Merge the two
functions, and only clean up the bio list after it has been added to
but before it has been emptied again by submit_read_wait_bio_list.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|