Age  Commit message  Author
2023-02-15selftest/bpf/benchs: Enhance argp parsingAnton Protopopov
To parse the command line, the bench utility uses the argp_parse() function. This function takes as an argument a parent 'struct argp' structure which defines common command line options and an array of children 'struct argp' structures which define additional command line options for particular benchmarks. This implementation doesn't allow benchmarks to share option names, e.g., if two benchmarks want to use, say, the --option option, then only one of them will succeed (the first one encountered in the array). It would be convenient if the same option names could be used in different benchmarks (with the same semantics, e.g., --nr_loops=N). Fix this by calling the argp_parse() function twice. The first call is the same as it was before, with all children argps, and helps to find the benchmark name and to print a combined help message if anything is wrong. Given the name, we can call argp_parse() a second time, but now the children array points only to the correct benchmark, thus always calling the correct parsers. (If there is no specific list of arguments, then only one call to argp_parse() will be done.) Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-4-aspsk@isovalent.com
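The two-pass idea described above can be sketched with glibc's argp as follows. This is a minimal illustration and not the bench utility's code: the benchmark names bench-a and bench-b, the key ARG_NR_LOOPS, the variables loops_a and loops_b, and the helper parse_two_pass() are all hypothetical.

```c
#include <argp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-benchmark settings, for illustration only. */
static long loops_a, loops_b;
static const char *bench_name;

enum { ARG_NR_LOOPS = 3000 };	/* above UCHAR_MAX, so no short option */

/* Both benchmarks declare the same long option name. */
static const struct argp_option opts_a[] = {
	{ "nr_loops", ARG_NR_LOOPS, "N", 0, "Loop count for bench-a" },
	{ 0 },
};
static const struct argp_option opts_b[] = {
	{ "nr_loops", ARG_NR_LOOPS, "N", 0, "Loop count for bench-b" },
	{ 0 },
};

static error_t parse_a(int key, char *arg, struct argp_state *state)
{
	(void)state;
	if (key != ARG_NR_LOOPS)
		return ARGP_ERR_UNKNOWN;
	loops_a = strtol(arg, NULL, 10);
	return 0;
}

static error_t parse_b(int key, char *arg, struct argp_state *state)
{
	(void)state;
	if (key != ARG_NR_LOOPS)
		return ARGP_ERR_UNKNOWN;
	loops_b = strtol(arg, NULL, 10);
	return 0;
}

static const struct argp argp_a = { opts_a, parse_a };
static const struct argp argp_b = { opts_b, parse_b };

/* The parent parser only records the first positional argument. */
static error_t parse_common(int key, char *arg, struct argp_state *state)
{
	if (key != ARGP_KEY_ARG)
		return ARGP_ERR_UNKNOWN;
	if (state->arg_num == 0)
		bench_name = arg;
	return 0;
}

static int parse_two_pass(int argc, char **argv)
{
	static const struct argp_child all[] = {
		{ &argp_a }, { &argp_b }, { 0 },
	};
	struct argp_child one[2] = { { 0 }, { 0 } };
	struct argp top = { NULL, parse_common, "BENCH", NULL, all };

	/* Pass 1: all children, used here only to learn the benchmark name.
	 * A shared option may be dispatched to the wrong child's parser. */
	if (argp_parse(&top, argc, argv, 0, NULL, NULL) || !bench_name)
		return -1;

	if (!strcmp(bench_name, "bench-a"))
		one[0].argp = &argp_a;
	else if (!strcmp(bench_name, "bench-b"))
		one[0].argp = &argp_b;
	else
		return -1;

	/* Pass 2: only the selected child, so its parser sees the option. */
	top.children = one;
	return argp_parse(&top, argc, argv, 0, NULL, NULL) ? -1 : 0;
}
```

Calling parse_two_pass() with the arguments `bench --nr_loops=7 bench-b` should leave loops_b set to 7. The first pass may also have stored the value in loops_a, which is harmless in this sketch because only the selected benchmark's settings would be read afterwards.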
2023-02-15selftest/bpf/benchs: Make a function static in bpf_hashmap_full_updateAnton Protopopov
The hashmap_report_final callback function defined in the benchs/bench_bpf_hashmap_full_update.c file should be static. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-3-aspsk@isovalent.com
2023-02-15selftest/bpf/benchs: Fix a typo in bpf_hashmap_full_updateAnton Protopopov
To call the bpf_hashmap_full_update benchmark, one should say: bench bpf-hashmap-ful-update The patch adds a missing 'l' to the benchmark name. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-2-aspsk@isovalent.com
2023-02-15Merge branch 'Use __GFP_ZERO in bpf memory allocator'Alexei Starovoitov
Hou Tao says: ==================== From: Hou Tao <houtao1@huawei.com> Hi, The patchset tries to fix the hard-lockup problem found when checking how htab handles element reuse in bpf memory allocator. The immediate reuse of freed elements will reinitialize special fields (e.g., bpf_spin_lock) in htab map value and it may corrupt the lookup procedure with the BPF_F_LOCK flag, which acquires bpf-spin-lock during value copying, and lead to hard-lockup as shown in patch #2. Patch #1 fixes it by using __GFP_ZERO when allocating the object from slab, and the behavior is similar to the preallocated hash-table case. Please see individual patches for more details. And comments are always welcome. Regards, Change Log: v1: * Use __GFP_ZERO instead of ctor to avoid retpoline overhead (from Alexei) * Add comments for check_and_init_map_value() (from Alexei) * split __GFP_ZERO patches out of the original patchset to unblock the development work of others. RFC: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15selftests/bpf: Add test case for element reuse in htab mapHou Tao
The reinitialization of spin-lock in map value after immediate reuse may corrupt lookup with BPF_F_LOCK flag and result in hard lock-up, so add one test case to demonstrate the problem. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230215082132.3856544-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15bpf: Zeroing allocated object from slab in bpf memory allocatorHou Tao
Currently the freed element in bpf memory allocator may be immediately reused. For htab map the reuse will reinitialize special fields in map value (e.g., bpf_spin_lock), but the lookup procedure may still access these special fields, and it may lead to hard-lockup as shown below:

  NMI backtrace for cpu 16
  CPU: 16 PID: 2574 Comm: htab.bin Tainted: G L 6.1.0+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
  RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0
  ......
  Call Trace:
   <TASK>
   copy_map_value_locked+0xb7/0x170
   bpf_map_copy_value+0x113/0x3c0
   __sys_bpf+0x1c67/0x2780
   __x64_sys_bpf+0x1c/0x20
   do_syscall_64+0x30/0x60
   entry_SYSCALL_64_after_hwframe+0x46/0xb0
   ......
   </TASK>

For htab map, just like the preallocated case, there is no need to initialize these special fields in map value again once these fields have been initialized. For preallocated htab map, these fields are initialized through __GFP_ZERO in bpf_map_area_alloc(), so do the similar thing for non-preallocated htab in bpf memory allocator. And there is no need to use __GFP_ZERO for per-cpu bpf memory allocator, because __alloc_percpu_gfp() does it implicitly.

Fixes: 0fd7c5d43339 ("bpf: Optimize call_rcu in non-preallocated hash map.") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230215082132.3856544-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-16macintosh: windfarm: Use unsigned type for 1-bit bitfieldsNathan Chancellor
Clang warns: drivers/macintosh/windfarm_lm75_sensor.c:63:14: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion] lm->inited = 1; ^ ~ drivers/macintosh/windfarm_smu_sensors.c:356:19: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion] pow->fake_volts = 1; ^ ~ drivers/macintosh/windfarm_smu_sensors.c:368:18: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion] pow->quadratic = 1; ^ ~ There is no bug here since no code checks the actual value of these fields, just whether or not they are zero (boolean context), but this can be easily fixed by switching to an unsigned type. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230215-windfarm-wsingle-bit-bitfield-constant-conversion-v1-1-26415072e855@kernel.org
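The warning above can be reproduced in miniature. The sketch below uses hypothetical structure and function names (sensor_signed, sensor_unsigned, read_signed_flag, read_unsigned_flag) and is not the windfarm driver code. A signed 1-bit bit-field can only represent 0 and -1, so storing 1 yields an implementation-defined result, which is -1 with GCC and Clang, while an unsigned 1-bit bit-field stores 1 as expected.

```c
/* Hypothetical structures, not the windfarm driver's definitions. */
struct sensor_signed {
	signed int inited : 1;		/* representable values: 0 and -1 */
};

struct sensor_unsigned {
	unsigned int inited : 1;	/* representable values: 0 and 1 */
};

/* Store v into the signed 1-bit field and read it back. */
static int read_signed_flag(int v)
{
	struct sensor_signed s = { 0 };

	s.inited = v;
	return s.inited;
}

/* Store v into the unsigned 1-bit field and read it back. */
static int read_unsigned_flag(int v)
{
	struct sensor_unsigned s = { 0 };

	s.inited = v;
	return s.inited;
}
```

Code that only tests the field in a boolean context behaves the same in both cases, which matches the commit's observation that no bug exists today. A comparison such as `flag == 1` would fail with the signed variant.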
2023-02-15Merge tag 'apparmor-v6.2-rc9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor fix from John Johansen: "Regression fix for getattr mediation of old policy" * tag 'apparmor-v6.2-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: Fix regression in compat permissions for getattr
2023-02-15power: reset: odroid-go-ultra: fix I2C dependencyArnd Bergmann
Since this driver can only be built-in, it fails to link when the I2C layer is in a loadable module: x86_64-linux-ld: drivers/power/reset/odroid-go-ultra-poweroff.o: in function `odroid_go_ultra_poweroff_get_pmic_device': odroid-go-ultra-poweroff.c:(.text+0x30): undefined reference to `i2c_find_device_by_fwnode' Tighten the dependency to only allow enabling POWER_RESET_ODROID_GO_ULTRA_POWEROFF if I2C is built-in as well. Fixes: cec3b46b8bda ("power: reset: add Odroid Go Ultra poweroff driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2023-02-15power: supply: leds: explicitly include linux/leds.hThomas Weißschuh
Instead of relying on an accidental, transitive inclusion of linux/leds.h use it directly. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2023-02-15spi: bcm63xx-hsspi: fix error code in probeDan Carpenter
This code accidentally returns success instead of a negative error code. Fixes: 50a6620dd1fb ("spi: bcm63xx-hsspi: Add polling mode support") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: William Zhang <william.zhang@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/Y+zmoGH6LubPhiI0@kili Signed-off-by: Mark Brown <broonie@kernel.org>
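The class of bug described here often looks like the generic pattern below. This is only an illustration of the pattern and not the actual bcm63xx-hsspi code: fake_enable(), fake_get_resource(), probe_buggy() and probe_fixed() are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static int dummy_resource;

/* Hypothetical helpers standing in for real setup steps. */
static int fake_enable(void)
{
	return 0;
}

static void *fake_get_resource(bool fail)
{
	return fail ? NULL : &dummy_resource;
}

static int probe_buggy(bool fail)
{
	int ret = fake_enable();

	if (ret)
		return ret;

	if (!fake_get_resource(fail))
		goto out_disable;	/* bug: ret is still 0 here */

	return 0;

out_disable:
	return ret;			/* reports success on failure */
}

static int probe_fixed(bool fail)
{
	int ret = fake_enable();

	if (ret)
		return ret;

	if (!fake_get_resource(fail)) {
		ret = -ENOMEM;		/* set a negative error code first */
		goto out_disable;
	}

	return 0;

out_disable:
	return ret;
}
```

With a failing resource lookup, probe_buggy() returns 0 and the caller would treat the probe as successful, whereas probe_fixed() returns -ENOMEM.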
2023-02-15spi: bcmbca-hsspi: Fix error code in probe() functionDan Carpenter
This code accidentally returns success instead of a negative error code. Fixes: a38a2233f23b ("spi: bcmbca-hsspi: Add driver for newer HSSPI controller") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: William Zhang <william.zhang@broadcom.com> Link: https://lore.kernel.org/r/Y+zmrNJ9zjNQpzWq@kili Signed-off-by: Mark Brown <broonie@kernel.org>
2023-02-15dt-bindings: power: supply: pm8941-coincell: Don't require charging propertiesKonrad Dybcio
It's fine for these properties to be absent, as the driver doesn't fail without them and functions with settings inherited from the reset/previous stage bootloader state. Fixes: 6c463222a21d ("dt-bindings: power: supply: pm8941-coincell: Convert to DT schema format") Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2023-02-15dt-bindings: power: supply: pm8941-coincell: Add PM8998 compatibleKonrad Dybcio
Add a specific compatible for the coincell charger present on PM8998. Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2023-02-15dt-bindings: Fix multi pattern support in DT_SCHEMA_FILESCristian Ciocaltea
DT_SCHEMA_FILES used to allow specifying a space separated list of file paths, but the introduction of partial matches support broke this feature: $ make dtbs_check DT_SCHEMA_FILES="path/to/schema1.yaml path/to/schema2.yaml" [...] LINT Documentation/devicetree/bindings usage: yamllint [-h] [-] [-c CONFIG_FILE | -d CONFIG_DATA] [--list-files] [...] [-v] [FILE_OR_DIR ...] yamllint: error: one of the arguments FILE_OR_DIR - is required [...] Restore the lost functionality by preparing a grep filter that is able to handle multiple search patterns. Additionally, as suggested by Rob, use ':' instead of ' ' as the patterns separator char. Hence, the command above becomes: $ make dtbs_check DT_SCHEMA_FILES="path/to/schema1.yaml:path/to/schema2.yaml" Fixes: 309d955985ee ("dt-bindings: kbuild: Support partial matches with DT_SCHEMA_FILES") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20230209193735.795288-1-cristian.ciocaltea@collabora.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-02-15of: reserved-mem: print out reserved-mem details during bootMartin Liu
It's important to know reserved-mem information in the mobile world since the memory reserved via device tree keeps increasing on platforms (e.g., 45% in our platform). Therefore, it's crucial to know the breakdown of reserved memory sizes for memory accounting. This patch prints out reserved memory details during boot to make them visible. Below is an example output:

[ 0.000000] OF: reserved mem: 0x00000009f9400000..0x00000009fb3fffff ( 32768 KB ) map reusable test1
[ 0.000000] OF: reserved mem: 0x00000000ffdf0000..0x00000000ffffffff ( 2112 KB ) map non-reusable test2
[ 0.000000] OF: reserved mem: 0x0000000091000000..0x00000000912fffff ( 3072 KB ) nomap non-reusable test3

Signed-off-by: Martin Liu <liumartin@google.com> Link: https://lore.kernel.org/r/20230209160954.1471909-1-liumartin@google.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-02-15PCI: hotplug: Allow marking devices as disconnected during bind/unbindLukas Wunner
On surprise removal, pciehp_unconfigure_device() and acpiphp's trim_stale_devices() call pci_dev_set_disconnected() to mark removed devices as permanently offline. Thereby, the PCI core and drivers know to skip device accesses. However pci_dev_set_disconnected() takes the device_lock and thus waits for a concurrent driver bind or unbind to complete. As a result, the driver's ->probe and ->remove hooks have no chance to learn that the device is gone. That doesn't make any sense, so drop the device_lock and instead use atomic xchg() and cmpxchg() operations to update the device state. As a byproduct, an AB-BA deadlock reported by Anatoli is fixed which occurs on surprise removal with AER concurrently performing a bus reset. AER bus reset: INFO: task irq/26-aerdrv:95 blocked for more than 120 seconds. Tainted: G W 6.2.0-rc3-custom-norework-jan11+ schedule rwsem_down_write_slowpath down_write_nested pciehp_reset_slot # acquires reset_lock pci_reset_hotplug_slot pci_slot_reset # acquires device_lock pci_bus_error_reset aer_root_reset pcie_do_recovery aer_process_err_devices aer_isr pciehp surprise removal: INFO: task irq/26-pciehp:96 blocked for more than 120 seconds. Tainted: G W 6.2.0-rc3-custom-norework-jan11+ schedule_preempt_disabled __mutex_lock mutex_lock_nested pci_dev_set_disconnected # acquires device_lock pci_walk_bus pciehp_unconfigure_device pciehp_disable_slot pciehp_handle_presence_or_link_change pciehp_ist # acquires reset_lock Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590 Fixes: a6bd101b8f84 ("PCI: Unify device inaccessible") Link: https://lore.kernel.org/r/3dc88ea82bdc0e37d9000e413d5ebce481cbd629.1674205689.git.lukas@wunner.de Reported-by: Anatoli Antonovitch <anatoli.antonovitch@amd.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v4.20+ Cc: Keith Busch <kbusch@kernel.org>
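A lock-free state update of the kind described above can be sketched in userspace with C11 atomics. The kernel uses its own xchg() and cmpxchg() primitives. The names below (io_state, fake_dev, dev_set_io_state) are hypothetical, and the transition rules are an assumption chosen so that the permanently failed state is terminal.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device states for this sketch. */
enum io_state { IO_NORMAL = 1, IO_FROZEN, IO_PERM_FAILURE };

struct fake_dev {
	_Atomic int error_state;
};

/* Returns false if the device was already permanently failed. */
static bool dev_set_io_state(struct fake_dev *dev, enum io_state new_state)
{
	int old;

	switch (new_state) {
	case IO_PERM_FAILURE:
		/* Unconditional: a disconnect always wins, no lock needed. */
		atomic_exchange(&dev->error_state, IO_PERM_FAILURE);
		return true;
	case IO_FROZEN:
		/* Only normal -> frozen; 'old' ends up holding the prior state. */
		old = IO_NORMAL;
		atomic_compare_exchange_strong(&dev->error_state, &old, IO_FROZEN);
		return old != IO_PERM_FAILURE;
	case IO_NORMAL:
		/* Only frozen -> normal; never resurrects a failed device. */
		old = IO_FROZEN;
		atomic_compare_exchange_strong(&dev->error_state, &old, IO_NORMAL);
		return old != IO_PERM_FAILURE;
	}
	return false;
}
```

Because no lock is taken, a caller marking the device as permanently failed never waits for a concurrent bind or unbind. Code running in probe or remove can read the state and notice that the device is gone.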
2023-02-15Merge tag 'nvme-6.3-2023-02-15' of git://git.infradead.org/nvme into ↵Jens Axboe
for-6.3/block Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - fix and cleanup freeing single sgl (Keith Busch)" * tag 'nvme-6.3-2023-02-15' of git://git.infradead.org/nvme: nvme-pci: remove iod use_sgls nvme-pci: fix freeing single sgl
2023-02-15Merge tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme into block-6.2Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote) - add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner) - set the DMA mask earlier (Christoph Hellwig)" * tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme: nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev nvme-pci: set the DMA mask earlier nvme-pci: add bogus ID quirk for ADATA SX6000PNP
2023-02-15Documentation: i2c: correct spellingRandy Dunlap
Correct spelling problems for Documentation/i2c/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-02-15dt-bindings: i2c: i2c-st: convert to DT schemaAlain Volmat
Convert i2c-st.txt into st,sti-i2c.yaml for the i2c-st driver. Signed-off-by: Alain Volmat <avolmat@me.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-02-15Merge tag 'nfsd-6.2-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix a teardown bug in the new nfs4_file hashtable * tag 'nfsd-6.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't destroy global nfs4_file table in per-net shutdown
2023-02-15Merge branch 'Improvements for BPF_ST tracking by verifier 'Alexei Starovoitov
Eduard Zingerman says: ==================== This patch-set is a part of preparation work for -mcpu=v4 option for BPF C compiler (discussed in [1]). Among other things -mcpu=v4 should enable generation of BPF_ST instruction by the compiler. - Patches #1,2 adjust verifier to track values of constants written to stack using BPF_ST. Currently these are tracked imprecisely, unlike the writes using BPF_STX, e.g.: fp[-8] = 42; currently verifier assumes that fp[-8]=mmmmmmmm after such instruction, where m stands for "misc", just a note that something is written at fp[-8]. r1 = 42; verifier tracks r1=42 after this instruction. fp[-8] = r1; verifier tracks fp[-8]=42 after this instruction. This patch makes both cases equivalent. - Patches #3,4 adjust verifier.c:check_stack_write_fixed_off() to preserve STACK_ZERO marks when BPF_ST writes zero. Currently these are replaced by STACK_MISC, unlike zero writes using BPF_STX, e.g.: ... stack range [X,Y] is marked as STACK_ZERO ... r0 = ... variable offset pointer to stack with range [X,Y] ... fp[r0] = 0; currently verifier marks range [X,Y] as STACK_MISC for such instructions. r1 = 0; fp[r0] = r1; verifier keeps STACK_ZERO marks for range [X,Y]. This patch makes both cases equivalent. Motivating example for patch #1 could be found at [3]. Previous version of the patch-set is here [2], the changes are: - Explicit initialization of fake register parent link is removed from verifier.c:check_stack_write_fixed_off() as parent links are now correctly handled by verifier.c:save_register_state(). - Original patch #1 is split in patches #1 & #3. - Missing test case added for patch #3 verifier.c:check_stack_write_fixed_off() adjustment. - Test cases are updated to use .prog_type = BPF_PROG_TYPE_SK_LOOKUP, which requires return value to be in the range [0,1] (original test cases assumed that such range is always required, which is not true). 
- Original patch #3 with changes allowing BPF_ST writes to context is withheld for now, w/o compiler support for BPF_ST it requires some creative testing. - Original patch #5 is removed from the patch-set. This patch contained adjustments to expected verifier error messages in some tests, necessary when C compiler generates BPF_ST instruction instead of BPF_STX (changes to expected instruction indices). These changes are not necessary yet. [1] https://lore.kernel.org/bpf/01515302-c37d-2ee5-c950-2f556a4caad0@meta.com/ [2] https://lore.kernel.org/bpf/20221231163122.1360813-1-eddyz87@gmail.com/ [3] https://lore.kernel.org/bpf/f1e4282bf00aa21a72fc5906f8c3be1ae6c94a5e.camel@gmail.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15selftests/bpf: check if BPF_ST with variable offset preserves STACK_ZEROEduard Zingerman
A test case to verify that a variable offset BPF_ST instruction preserves STACK_ZERO marks when writing zeros, e.g. in the following situation:

  *(u64*)(r10 - 8) = 0   ; STACK_ZERO marks for fp[-8]
  r0 = random(-7, -1)    ; some random number in range of [-7, -1]
  r0 += r10              ; r0 is now variable offset pointer to stack
  *(u8*)(r0) = 0         ; BPF_ST writing zero, STACK_ZERO mark for
                         ; fp[-8] should be preserved.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230214232030.1502829-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15bpf: BPF_ST with variable offset should preserve STACK_ZERO marksEduard Zingerman
BPF_STX instruction preserves STACK_ZERO marks for variable offset writes in situations like below: *(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8] r0 = random(-7, -1) ; some random number in range of [-7, -1] r0 += r10 ; r0 is now a variable offset pointer to stack r1 = 0 *(u8*)(r0) = r1 ; BPF_STX writing zero, STACK_ZERO mark for ; fp[-8] is preserved This commit updates verifier.c:check_stack_write_var_off() to process BPF_ST in a similar manner, e.g. the following example: *(u64*)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8] r0 = random(-7, -1) ; some random number in range of [-7, -1] r0 += r10 ; r0 is now variable offset pointer to stack *(u8*)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for ; fp[-8] is preserved Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230214232030.1502829-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
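The rule being extended to BPF_ST can be illustrated with a toy model. This is a heavily simplified sketch and not the verifier's check_stack_write_var_off(): it tracks one mark per stack byte, ignores spilled registers, and the names slot_mark, STACK_BYTES and write_var_off are only loosely modelled on the verifier's terminology.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy per-byte stack marks, loosely named after the verifier's. */
enum slot_mark { STACK_INVALID, STACK_MISC, STACK_ZERO };

#define STACK_BYTES 8

/*
 * Model a store whose offset is only known to lie in [lo, hi].
 * Any byte in the range may or may not be overwritten, so:
 *  - a zero written over a byte already known to be zero leaves it zero;
 *  - every other case degrades the byte to STACK_MISC.
 */
static void write_var_off(enum slot_mark stack[STACK_BYTES],
			  size_t lo, size_t hi, bool writes_zero)
{
	for (size_t i = lo; i <= hi && i < STACK_BYTES; i++) {
		if (writes_zero && stack[i] == STACK_ZERO)
			continue;	/* still zero wherever the write lands */
		stack[i] = STACK_MISC;
	}
}
```

In this model, the change described above corresponds to computing writes_zero as true for a BPF_ST with a zero immediate, in addition to a BPF_STX whose source register is known to be zero.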
2023-02-15selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEMEduard Zingerman
Check that verifier tracks the value of 'imm' spilled to stack by BPF_ST_MEM instruction. Cover the following cases: - write of non-zero constant to stack; - write of a zero constant to stack. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230214232030.1502829-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15bpf: track immediate values written to stack by BPF_ST instructionEduard Zingerman
For aligned stack writes using BPF_ST instruction track stored values in a same way BPF_STX is handled, e.g. make sure that the following commands produce similar verifier knowledge: fp[-8] = 42; r1 = 42; fp[-8] = r1; This covers two cases: - non-null values written to stack are stored as spill of fake registers; - null values written to stack are stored as STACK_ZERO marks. Previously both cases above used STACK_MISC marks instead. Some verifier test cases relied on the old logic to obtain STACK_MISC marks for some stack values. These test cases are updated in the same commit to avoid failures during bisect. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230214232030.1502829-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15Merge tag 'trace-v6.2-rc7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixlet from Steven Rostedt: "Make trace_define_field_ext() static. Just after the fix to TASK_COMM_LEN not converted to its value in trace_events was pulled, the kernel test robot reported that the helper function trace_define_field_ext() added to that change was only used in the file it was defined in but was not declared static. Make it a local function" * tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Make trace_define_field_ext() static
2023-02-15apparmor: Fix regression in compat permissions for getattrJohn Johansen
This fixes a regression in mediation of getattr when old policy built under an older ABI is loaded and mapped to internal permissions. The regression does not occur for all getattr permission requests, only appearing if state zero is the final state in the permission lookup. This is because despite the first state (index 0) being guaranteed to not have permissions in both newer and older permission formats, it may have to carry permissions that were not mediated as part of an older policy. These backward compat permissions are mapped here to avoid special casing the mediation code paths. Since the mapping code already takes into account backwards compat permission from older formats it can be applied to state 0 to fix the regression. Fixes: 408d53e923bd ("apparmor: compute file permissions on profile load") Reported-by: Philip Meulengracht <the_meulengracht@hotmail.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2023-02-15dt-bindings: mailbox: qcom,apcs-kpss-global: drop mbox-names from exampleKrzysztof Kozlowski
Qualcomm G-Link RPM edge bindings do not allow and do not use mbox-names property. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230208101545.45711-4-krzysztof.kozlowski@linaro.org
2023-02-15Merge branches 'pm-tools' and 'pm-docs'Rafael J. Wysocki
Merge power management utilities and documentation updates for 6.3-rc1: - Modify some power management utilities to use the canonical ftrace path (Ross Zwisler). - Correct spelling problems for Documentation/power/ as reported by codespell (Randy Dunlap). * pm-tools: PM: tools: use canonical ftrace path * pm-docs: Documentation: power: correct spelling
2023-02-15Merge branches 'powercap', 'pm-domains', 'pm-em' and 'pm-opp'Rafael J. Wysocki
Merge updates of the powercap framework, generic PM domains, Energy Model and operating performance points for 6.3-rc1: - Fix possible name leak in powercap_register_zone() (Yang Yingliang). - Add Meteor Lake and Emerald Rapids support to the intel_rapl power capping driver (Zhang Rui). - Modify the idle_inject power capping facility to support 100% idle injection (Srinivas Pandruvada). - Fix large time windows handling in the intel_rapl power capping driver (Zhang Rui). - Fix memory leaks with using debugfs_lookup() in the generic PM domains and Energy Model code (Greg Kroah-Hartman). - Add missing 'cache-unified' property in example for kryo OPP bindings (Rob Herring). - Fix error checking in opp_migrate_dentry() (Qi Zheng). - Remove "select SRCU" (Paul E. McKenney). - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad Dybcio). * powercap: powercap: intel_rapl: Fix handling for large time window powercap: idle_inject: Support 100% idle injection powercap: intel_rapl: add support for Emerald Rapids powercap: intel_rapl: add support for Meteor Lake powercap: fix possible name leak in powercap_register_zone() * pm-domains: PM: domains: fix memory leak with using debugfs_lookup() * pm-em: PM: EM: fix memory leak with using debugfs_lookup() * pm-opp: OPP: fix error checking in opp_migrate_dentry() dt-bindings: opp: v2-qcom-level: Let qcom,opp-fuse-level be a 2-long array drivers/opp: Remove "select SRCU" dt-bindings: opp: opp-v2-kryo-cpu: Add missing 'cache-unified' property in example
2023-02-15Merge patch series "riscv: Optimize function trace"Palmer Dabbelt
guoren@kernel.org <guoren@kernel.org> says: From: Guo Ren <guoren@linux.alibaba.com> The previous ftrace detour implementation afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contains the following problems. - The most horrible bug is a preemption panic which was found by Andy [1]. Let's disable preemption for ftrace first, and Andy could continue the ftrace preemption work. - The "-fpatchable-function-entry=" CFLAG wasted code size with !RISCV_ISA_C. - The ftrace detour implementation wasted code size. - When livepatching, the trampoline (ftrace_regs_caller) would not return to <func_prolog+12> but would rather jump to the new function. So, "REG_L ra, -SZREG(sp)" would not run and the original return address would not be restored. The kernel is likely to hang or crash as a result. (Found by Evgenii Shatokhin [4]) [Palmer: The first three patches in this series are pretty concrete fixes, so I'm pulling them ahead of the rest of the series.] * b4-shazam-merge: riscv: ftrace: Reduce the detour code size to half riscv: ftrace: Remove wasted nops for !RISCV_ISA_C riscv: ftrace: Fixup panic by disabling preemption Link: https://lore.kernel.org/r/20230112090603.1295340-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-15riscv: ftrace: Reduce the detour code size to halfGuo Ren
Use a temporary register to reduce the size of detour code from 16 bytes to 8 bytes. The previous implementation is from 'commit afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'.

Before the patch:
<func_prolog>:
 0: REG_S  ra, -SZREG(sp)
 4: auipc  ra, ?
 8: jalr   ?(ra)
12: REG_L  ra, -SZREG(sp)
 (func_body)

After the patch:
<func_prolog>:
 0: auipc  t0, ?
 4: jalr   t0, ?(t0)
 (func_body)

This patch not only reduces the size of detour code, but also fixes an important issue: An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can actually change the instruction pointer, e.g. to "replace" the given kernel function with a new one, which is needed for livepatching, etc. In this case, the trampoline (ftrace_regs_caller) would not return to <func_prolog+12> but would rather jump to the new function. So, "REG_L ra, -SZREG(sp)" would not run and the original return address would not be restored. The kernel is likely to hang or crash as a result. This can be easily demonstrated if one tries to "replace", say, cmdline_proc_show() with a new function with the same signature using instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace callback.

Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/ Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/ Co-developed-by: Song Shuai <suagrfillet@gmail.com> Signed-off-by: Song Shuai <suagrfillet@gmail.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Cc: Evgenii Shatokhin <e.shatokhin@yadro.com> Reviewed-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Link: https://lore.kernel.org/r/20230112090603.1295340-4-guoren@kernel.org Cc: stable@vger.kernel.org Fixes: 10626c32e382 ("riscv/ftrace: Add basic support") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-15riscv: ftrace: Remove wasted nops for !RISCV_ISA_CGuo Ren
When CONFIG_RISCV_ISA_C=n, -fpatchable-function-entry=8 would generate more nops than we expect. This is because it treats the nop opcode as 0x00000013 instead of 0x0001.

Dump of assembler code for function dw_pcie_free_msi:
   0xffffffff806fce94 <+0>:  sd    ra,-8(sp)
   0xffffffff806fce98 <+4>:  auipc ra,0xff90f
   0xffffffff806fce9c <+8>:  jalr  -684(ra) # 0xffffffff8000bbec <ftrace_caller>
   0xffffffff806fcea0 <+12>: ld    ra,-8(sp)
   0xffffffff806fcea4 <+16>: nop /* wasted */
   0xffffffff806fcea8 <+20>: nop /* wasted */
   0xffffffff806fceac <+24>: nop /* wasted */
   0xffffffff806fceb0 <+28>: nop /* wasted */
   0xffffffff806fceb4 <+0>:  addi  sp,sp,-48
   0xffffffff806fceb8 <+4>:  sd    s0,32(sp)
   0xffffffff806fcebc <+8>:  sd    s1,24(sp)
   0xffffffff806fcec0 <+12>: sd    s2,16(sp)
   0xffffffff806fcec4 <+16>: sd    s3,8(sp)
   0xffffffff806fcec8 <+20>: sd    ra,40(sp)
   0xffffffff806fcecc <+24>: addi  s0,sp,48

Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230112090603.1295340-3-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-15riscv: ftrace: Fixup panic by disabling preemptionAndy Chiu
In RISCV, we must use an AUIPC + JALR pair to encode an immediate, forming a jump that jumps to an address over 4K. This may cause errors if we want to enable kernel preemption and remove dependency from patching code with stop_machine(). For example, if a task was switched out on auipc, and we changed the ftrace function before it was switched back, then it would jump to an address that has updated 11:0 bits mixing with the previous XLEN:12 part.

p: patched area performed by dynamic ftrace
ftrace_prologue:
p|      REG_S   ra, -SZREG(sp)
p|      auipc   ra, 0x? ------------> preempted
                                        ...
                                change ftrace function
                                        ...
p|      jalr    -?(ra) <------------- switched back
p|      REG_L   ra, -SZREG(sp)
func:
        xxx
        ret

Fixes: afc76b8b8011 ("riscv: Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230112090603.1295340-2-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
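The mixing of old and new address parts can be worked through numerically. The sketch below models only the address arithmetic of an AUIPC + JALR pair: AUIPC contributes the pc-relative offset's upper part, and JALR adds a sign-extended 12-bit immediate. The helper names lo12(), hi_part() and jump_target() are hypothetical, and the offsets in the usage note are arbitrary examples.

```c
#include <stdint.h>

/* Sign-extended low 12 bits of the offset: the JALR immediate. */
static int64_t lo12(int64_t off)
{
	int64_t low = (int64_t)((uint64_t)off & 0xfff);

	return low >= 0x800 ? low - 0x1000 : low;
}

/* The remaining multiple of 4096: what AUIPC adds to pc. */
static int64_t hi_part(int64_t off)
{
	return off - lo12(off);
}

/* Address reached by an AUIPC at pc followed by a JALR. */
static int64_t jump_target(int64_t pc, int64_t auipc_part, int64_t jalr_part)
{
	return pc + auipc_part + jalr_part;
}
```

For example, with pc = 0x1000, an old offset of 0x12345 and a new offset of 0x54321, a task that executed the old AUIPC and then the new JALR would compute jump_target(pc, hi_part(0x12345), lo12(0x54321)), which is 0x13321. That is neither the old target 0x13345 nor the new target 0x55321.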
2023-02-15btrfs: make kobj_type structures constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: remove the bdev argument to btrfs_rmap_blockChristoph Hellwig
The only user in the zoned remap code is gone now, so remove the argument. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: don't rely on unchanging ->bi_bdev for zone append remapsChristoph Hellwig
btrfs_record_physical_zoned relies on a bio->bi_bdev sampled in the bio_end_io handler to find the reverse map for remapping the zone append write, but stacked block device drivers can and usually do change bi_bdev when sending on the bio to a lower device. This can happen e.g. with the nvme-multipath driver when an NVMe SSD sets the shared namespace bit. But there is no real need for the bdev in btrfs_record_physical_zoned, as it is only passed to btrfs_rmap_block, which uses it to pick the mapping to report if there are multiple reverse mappings. As zone writes can only do simple non-mirror writes right now, and anything more complex will use the stripe tree, there is no chance of the multiple mappings case actually happening. Instead open code the subset of btrfs_rmap_block in btrfs_record_physical_zoned, which also removes a memory allocation, and remove the bdev field in the ordered extent. Fixes: d8e3fb106f39 ("btrfs: zoned: use ZONE_APPEND write for zoned mode") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: never return true for reads in btrfs_use_zone_appendChristoph Hellwig
Using Zone Append only makes sense for writes to the device, so check that in btrfs_use_zone_append. This avoids the possibility of artificially limited read size on zoned file systems. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
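The check described above can be sketched as follows. This is a minimal user-space illustration, not the kernel code: the demo_bio structure, its fields, and demo_use_zone_append() are hypothetical stand-ins, and the final sequential-zone condition is an assumption added only to make the example complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; these are not the kernel's types. */
enum demo_op { DEMO_OP_READ, DEMO_OP_WRITE };

struct demo_bio {
	enum demo_op op;
	bool zoned;           /* file system runs in zoned mode */
	bool sequential_zone; /* assumed: target lies in a sequential-write zone */
};

/* Decide whether an I/O may be issued as a Zone Append. */
static bool demo_use_zone_append(const struct demo_bio *bio)
{
	assert(bio);

	if (!bio->zoned)
		return false;

	/* Zone Append is a write command, so reads never qualify. */
	if (bio->op != DEMO_OP_WRITE)
		return false;

	return bio->sequential_zone;
}
```

With the early return for non-writes, a read on a zoned file system is never subject to whatever size limit the caller applies to Zone Append I/O.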
2023-02-15btrfs: pass a btrfs_bio to btrfs_use_appendChristoph Hellwig
struct btrfs_bio has all the information needed for btrfs_use_append, so pass that instead of a btrfs_inode and file_offset. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: set bbio->file_offset in alloc_new_bioChristoph Hellwig
Instead of digging into the bio_vec in submit_one_bio, set file_offset at bio allocation time from the provided parameter. This also ensures that the file_offset is available all the time when building up the bio payload. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: use file_offset to limit bios size in calc_bio_boundariesChristoph Hellwig
btrfs_ordered_extent->disk_bytenr can be rewritten by the zoned I/O completion handler, and thus is in general not a good basis for limiting I/O size. But the maximum bio size calculation can easily be done using the file_offset fields in the btrfs_ordered_extent and btrfs_bio structures, so switch to that instead. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: do unsigned integer division in the extent buffer binary search loopFilipe Manana
In the search loop of the binary search function, we are doing a division by 2 of the sum of the high and low slots. Because the slots are integers, the generated assembly code for it is the following on x86_64:

   0x00000000000141f1 <+145>:  mov    %eax,%ebx
   0x00000000000141f3 <+147>:  shr    $0x1f,%ebx
   0x00000000000141f6 <+150>:  add    %eax,%ebx
   0x00000000000141f8 <+152>:  sar    %ebx

It's a few more instructions than a simple right shift, because signed integer division needs to round towards zero. However we know that slots can never be negative (btrfs_header_nritems() returns an u32), so we can instead use unsigned types for the low and high slots and therefore use unsigned integer division, which results in a single instruction on x86_64:

   0x00000000000141f0 <+144>:  shr    %ebx

So use unsigned types for the slots and therefore unsigned division.

This is part of a small patchset comprised of the following two patches:

  btrfs: eliminate extra call when doing binary search on extent buffer
  btrfs: do unsigned integer division in the extent buffer binary search loop

The following fs_mark test was run on a non-debug kernel (Debian's default kernel config) before and after applying the patchset:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi
  MOUNT_OPTIONS="-o ssd"
  MKFS_OPTIONS="-O no-holes -R free-space-tree"
  FILES=100000
  THREADS=$(nproc --all)
  FILE_SIZE=0

  umount $DEV &> /dev/null
  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  OPTS="-S 0 -L 6 -n $FILES -s $FILE_SIZE -t $THREADS -k"
  for ((i = 1; i <= $THREADS; i++)); do
      OPTS="$OPTS -d $MNT/d$i"
  done

  fs_mark $OPTS

  umount $MNT

Results before applying patchset:

  FSUse%        Count         Size    Files/sec     App Overhead
       2      1200000            0     174472.0         11549868
       4      2400000            0     253503.0         11694618
       4      3600000            0     257833.1         11611508
       6      4800000            0     247089.5         11665983
       6      6000000            0     211296.1         12121244
      10      7200000            0     187330.6         12548565

Results after applying patchset:

  FSUse%        Count         Size    Files/sec     App Overhead
       2      1200000            0     207556.0         11393252
       4      2400000            0     266751.1         11347909
       4      3600000            0     274397.5         11270058
       6      4800000            0     259608.4         11442250
       6      6000000            0     238895.8         11635921
       8      7200000            0     211942.2         11873825

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
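The signed versus unsigned midpoint computation can be illustrated with a small user-space sketch. The function names and the flat array of keys below are hypothetical and only stand in for the extent buffer search; the exact instructions a compiler emits depend on the compiler and its options.

```c
#include <assert.h>
#include <stdint.h>

/* Signed midpoint: division must round toward zero, so the compiler
 * generally cannot reduce it to a single shift. */
static int mid_signed(int low, int high)
{
	return (low + high) / 2;
}

/* Unsigned midpoint: dividing by 2 is a plain logical right shift. */
static uint32_t mid_unsigned(uint32_t low, uint32_t high)
{
	return (low + high) / 2;
}

/*
 * Hypothetical stand-in for the search loop: look up key in a sorted array.
 * Returns 0 and stores the matching slot on a hit; returns 1 and stores the
 * slot where the key would be inserted on a miss.
 */
static int demo_bin_search_u32(const uint64_t *keys, uint32_t nritems,
			       uint64_t key, uint32_t *slot)
{
	uint32_t low = 0;
	uint32_t high = nritems;

	assert(keys && slot);

	while (low < high) {
		uint32_t mid = mid_unsigned(low, high);

		if (keys[mid] < key) {
			low = mid + 1;
		} else if (keys[mid] > key) {
			high = mid;
		} else {
			*slot = mid;
			return 0;
		}
	}
	*slot = low;
	return 1;
}
```

Both midpoint helpers return the same value for non-negative inputs; the difference is only in what the compiler has to emit to honor signed rounding rules.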
2023-02-15btrfs: eliminate extra call when doing binary search on extent bufferFilipe Manana
The function btrfs_bin_search() is just a wrapper around the function generic_bin_search(), which passes the same arguments plus a default low slot with a value of 0. This adds an unnecessary extra function call, since btrfs_bin_search() is not static. So improve on this by making btrfs_bin_search() an inline function that calls generic_bin_search(), renaming the latter to btrfs_generic_bin_search() and exporting it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
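The wrapper pattern described above can be sketched in plain C. The names demo_generic_bin_search() and demo_bin_search() and the integer key array are hypothetical; in a real tree the generic function would be defined in a .c file and the inline wrapper would live in a header.

```c
#include <assert.h>

/*
 * Hypothetical generic search with an explicit starting slot.
 * Returns 0 and stores the matching slot on a hit; returns 1 and stores
 * the insertion slot on a miss.
 */
int demo_generic_bin_search(const int *keys, int first_slot, int nritems,
			    int key, int *slot)
{
	int low = first_slot;
	int high = nritems;

	assert(keys && slot);

	while (low < high) {
		int mid = (low + high) / 2;

		if (keys[mid] < key) {
			low = mid + 1;
		} else if (keys[mid] > key) {
			high = mid;
		} else {
			*slot = mid;
			return 0;
		}
	}
	*slot = low;
	return 1;
}

/*
 * Inline wrapper supplying the default low slot of 0. Because it is
 * static inline, the compiler can expand it at each call site, so callers
 * pay for one function call instead of two.
 */
static inline int demo_bin_search(const int *keys, int nritems, int key,
				  int *slot)
{
	return demo_generic_bin_search(keys, 0, nritems, key, slot);
}
```

Callers that need a non-zero starting slot call the generic function directly, while everyone else keeps the shorter signature.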
2023-02-15btrfs: raid56: handle endio in scrub_rbioChristoph Hellwig
The only caller of scrub_rbio calls rbio_orig_end_io right after it, so move the call into scrub_rbio to match the other work item helpers. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: raid56: handle endio in recover_rbioChristoph Hellwig
Both callers of recover_rbio call rbio_orig_end_io right after it, so move the call into the shared function. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: raid56: handle endio in rmw_rbioChristoph Hellwig
Both callers of rmw_rbio call rbio_orig_end_io right after it, so move the call into the shared function. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: raid56: submit the read bios from scrub_assemble_read_biosChristoph Hellwig
Instead of filling in a bio_list and submitting the bios in the only caller, do that in scrub_assemble_read_bios. This removes the need to pass the bio_list, and also makes it clear that the extra bio_list cleanup in the caller is entirely pointless. Rename the function to scrub_read_bios to make it clear that the bios are not only assembled. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: raid56: fold rmw_read_wait_recover into rmw_read_biosChristoph Hellwig
There is very little extra code in rmw_read_bios, and a large part of it is the superfluous extra cleanup of the bio list. Merge the two functions, and only clean up the bio list after it has been added to but before it has been emptied again by submit_read_wait_bio_list. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>