Age | Commit message (Collapse) | Author |
|
Factor out a helper from __blkdev_issue_discard that chews off as much as
possible from a discard range and allocates a bio for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240506042027.2289826-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is basically blk_next_bio just with the bio allocation moved
to the caller to allow for more flexible bio handling in the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240506042027.2289826-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Most bio operations get basic sanity checking in submit_bio and anything
more complicated than that is done in the callers. Discards are a bit
different from that in that a lot of checking is done in
__blkdev_issue_discard, and the specific errnos for that are returned
to userspace. Move the checks that require specific errnos to the ioctl
handler instead, and just leave the basic sanity checking in submit_bio
for the other handlers. This introduces two changes in behavior:
1) the logical block size alignment check of the start and len is lost
for non-ioctl callers.
This matches what is done for other operations including reads and
writes. We should probably verify this for all bios, but for now
make discards match the normal flow.
2) for non-ioctl callers all errors are reported on I/O completion now
instead of synchronously. Callers in general mostly ignore or log
errors so this will actually simplify the code once cleaned up
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240506042027.2289826-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We now set a default granularity in the queue limits API, so don't
bother with this extra check.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240506042027.2289826-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Running syzkaller with the newly reintroduced signed integer overflow
sanitizer shows this report:
[ 62.982337] ------------[ cut here ]------------
[ 62.985692] cgroup: Invalid name
[ 62.986211] UBSAN: signed-integer-overflow in ../block/ioctl.c:36:46
[ 62.989370] 9pnet_fd: p9_fd_create_tcp (7343): problem connecting socket to 127.0.0.1
[ 62.992992] 9223372036854775807 + 4095 cannot be represented in type 'long long'
[ 62.997827] 9pnet_fd: p9_fd_create_tcp (7345): problem connecting socket to 127.0.0.1
[ 62.999369] random: crng reseeded on system resumption
[ 63.000634] GUP no longer grows the stack in syz-executor.2 (7353): 20002000-20003000 (20001000)
[ 63.000668] CPU: 0 PID: 7353 Comm: syz-executor.2 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
[ 63.000677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 63.000682] Call Trace:
[ 63.000686] <TASK>
[ 63.000731] dump_stack_lvl+0x93/0xd0
[ 63.000919] __get_user_pages+0x903/0xd30
[ 63.001030] __gup_longterm_locked+0x153e/0x1ba0
[ 63.001041] ? _raw_read_unlock_irqrestore+0x17/0x50
[ 63.001072] ? try_get_folio+0x29c/0x2d0
[ 63.001083] internal_get_user_pages_fast+0x1119/0x1530
[ 63.001109] iov_iter_extract_pages+0x23b/0x580
[ 63.001206] bio_iov_iter_get_pages+0x4de/0x1220
[ 63.001235] iomap_dio_bio_iter+0x9b6/0x1410
[ 63.001297] __iomap_dio_rw+0xab4/0x1810
[ 63.001316] iomap_dio_rw+0x45/0xa0
[ 63.001328] ext4_file_write_iter+0xdde/0x1390
[ 63.001372] vfs_write+0x599/0xbd0
[ 63.001394] ksys_write+0xc8/0x190
[ 63.001403] do_syscall_64+0xd4/0x1b0
[ 63.001421] ? arch_exit_to_user_mode_prepare+0x3a/0x60
[ 63.001479] entry_SYSCALL_64_after_hwframe+0x6f/0x77
[ 63.001535] RIP: 0033:0x7f7fd3ebf539
[ 63.001551] Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[ 63.001562] RSP: 002b:00007f7fd32570c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 63.001584] RAX: ffffffffffffffda RBX: 00007f7fd3ff3f80 RCX: 00007f7fd3ebf539
[ 63.001590] RDX: 4db6d1e4f7e43360 RSI: 0000000020000000 RDI: 0000000000000004
[ 63.001595] RBP: 00007f7fd3f1e496 R08: 0000000000000000 R09: 0000000000000000
[ 63.001599] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 63.001604] R13: 0000000000000006 R14: 00007f7fd3ff3f80 R15: 00007ffd415ad2b8
...
[ 63.018142] ---[ end trace ]---
Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang; It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").
Let's rework this overflow checking logic to not actually perform an
overflow during the check itself, thus avoiding the UBSAN splat.
[1]: https://github.com/llvm/llvm-project/pull/82432
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240507-b4-sio-block-ioctl-v3-1-ba0c2b32275e@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The first 32k of memory is reserved for interrupt vectors, however for
powerpc64 this might not be enough. Fix this by reserving the maximum
size between 32k and the real size of interrupt vectors.
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240113080509.1598290-1-guozihua@huawei.com
|
|
The chips supported by the emc1403 driver support configurable
conversion rates. Add support for it.
Cc: Lars Petter Mostad <lars.petter.mostad@appear.net>
Tested-by: Lars Petter Mostad <lars.petter.mostad@appear.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Various temperature and limit registers support 11 bit accuracy.
Add support for it.
Cc: Lars Petter Mostad <lars.petter.mostad@appear.net>
Tested-by: Lars Petter Mostad <lars.petter.mostad@appear.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Convert driver to register with the hwmon subsystem using
devm_hwmon_device_register_with_info() instead of
devm_hwmon_device_register_with_groups() to simplify the code
and to reduce its size. As side effect, this also fixes a couple
of overflow problems when writing limit and hysteresis registers.
Cc: Lars Petter Mostad <lars.petter.mostad@appear.net>
Tested-by: Lars Petter Mostad <lars.petter.mostad@appear.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Update the node name to be align with binding document.
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-4-Frank.Li@nxp.com
|
|
Update dts to match dts binding document.
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-3-Frank.Li@nxp.com
|
|
Due to the INTA is shared with the active-low PHY2 interrupt on
P1010RDB-PA board, so configure P1010RDB-PA's INTA with polarity as
active-low, the P1010RDB-PB board is used separately, so configure
P1010RDB-PB's INTA with polarity as active-high. The INTX in
P1010RDB-PB do not work because of the pcie@0 node fixup will be
overwrited by p1010si-post.dtsi file, so we move the pcie@0 node fixup
to p1010rdb-pb.dts and p1010rdb-pb_36b.dts.
Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-2-Frank.Li@nxp.com
|
|
Enable Power Management feature on device tree, including MPC8536,
MPC8544, MPC8548, MPC8572, P1010, P1020, P1021, P1022, P2020, P2041,
P3041, T104X, T1024.
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240119203911.3143928-1-Frank.Li@nxp.com
|
|
Newer processors support various MP2 register sets. Therefore, to ensure
compatibility and obtain C2P data, use the amd_get_c2p_val().
Co-developed-by: Patil Rajesh Reddy <patreddy@amd.com>
Signed-off-by: Patil Rajesh Reddy <patreddy@amd.com>
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
Resume or suspend each sensor device based on the num_hid_devices.
Therefore, add a check to handle the special case where no sensors are
present.
Fixes: 93ce5e0231d7 ("HID: amd_sfh: Implement SFH1.1 functionality")
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
Modify log messages, but only log errors when sensors are missing or a
true failure occurs to avoid misleading "failed" messages.
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
Change device type because
a. it is exactly a mouse, with left/right buttons and scroll wheel;
b. it does not have visible marks or crosshairs, thus does not provide
higher accuracy than stylus.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
This commit adds kernel-doc style comments with complete parameter
descriptions for the function smp_startup_cpu().
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408053109.96360-2-yang.lee@linux.alibaba.com
|
|
Fix some function names in kernel-doc comments.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408053109.96360-1-yang.lee@linux.alibaba.com
|
|
Fix the kernel-doc annotation for the 'skip' parameter in the
partial_decompress() function by adding a missing underscore and colon.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240408083916.123369-1-yang.lee@linux.alibaba.com
|
|
make C=1 currently gives the following warning:
drivers/cpufreq/intel_pstate.c:262: warning: Function parameter or struct member 'epp_cached' not described in 'cpudata'
Add the missing ":" to fix the trivial kernel-doc syntax error.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit ab1a517d55b0 ("powerpc/syscall: Rename syscall_64.c into
interrupt.c") missed to update these three lines:
GCOV_PROFILE_syscall_64.o := n
KCOV_INSTRUMENT_syscall_64.o := n
UBSAN_SANITIZE_syscall_64.o := n
To restore the original behavior, we could replace them with:
GCOV_PROFILE_interrupt.o := n
KCOV_INSTRUMENT_interrupt.o := n
UBSAN_SANITIZE_interrupt.o := n
However, nobody has noticed the functional change in the past three
years, so they were unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240216135517.2002749-1-masahiroy@kernel.org
|
|
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:
test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
eBPF filter opcode 00d7 (@2) unsupported
jited:0 301 PASS
test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 555 PASS
test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
eBPF filter opcode 00d7 (@2) unsupported
jited:0 268 PASS
test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
eBPF filter opcode 00d7 (@2) unsupported
jited:0 269 PASS
test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
eBPF filter opcode 00d7 (@2) unsupported
jited:0 460 PASS
test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 320 PASS
test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
eBPF filter opcode 00d7 (@2) unsupported
jited:0 222 PASS
test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
eBPF filter opcode 00d7 (@2) unsupported
jited:0 273 PASS
test_bpf: #344 BPF_LDX_MEMSX | BPF_B
eBPF filter opcode 0091 (@5) unsupported
jited:0 432 PASS
test_bpf: #345 BPF_LDX_MEMSX | BPF_H
eBPF filter opcode 0089 (@5) unsupported
jited:0 381 PASS
test_bpf: #346 BPF_LDX_MEMSX | BPF_W
eBPF filter opcode 0081 (@5) unsupported
jited:0 505 PASS
test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
eBPF filter opcode 0006 (@1) unsupported
jited:0 261 PASS
test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]
Fix them by adding missing processing.
Fixes: daabb2b098e0 ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
|
|
This loop is supposed to copy the mac address to cmd->addr but the
i++ increment is missing so it copies everything to cmd->addr[0] and
only the last address is recorded.
Fixes: 22bedad3ce11 ("net: convert multicast list to list_head")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://msgid.link/b788be9a-15f5-4cca-a3fe-79df4c8ce7b2@moroto.mountain
|
|
make C=1 reports:
warning: Function parameter or struct member 'dev' not described in 'ladder_do_selection'
Document 'dev' for this function.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_parse_memory_affinity()
After removing architectural code the helper function
acpi_numa_memory_affinity_init() is no longer needed. Squash it into
acpi_parse_memory_affinity(). No functional changes intended.
While at it, fixing checkpatch complaints in code moved.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202403220943.96dde419-oliver.sang@intel.com
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
After removing architectural code the helper function
acpi_numa_slit_init() is no longer needed. Squash it into
acpi_parse_slit(). No functional changes intended.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
With the removal of the Itanium architecture [1] the last architecture
dependent functions:
acpi_numa_slit_init(), acpi_numa_memory_affinity_init()
were removed. Remove its remainings in the header files too and make
them static.
[1] commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture")
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
For configurations that have the kconfig option NUMA_KEEP_MEMINFO
disabled, numa_fill_memblks() only returns with NUMA_NO_MEMBLK (-1).
SRAT lookup fails then because an existing SRAT memory range cannot be
found for a CFMWS address range. This causes the addition of a
duplicate numa_memblk with a different node id and a subsequent page
fault and kernel crash during boot.
Fix this by making numa_fill_memblks() always available regardless of
NUMA_KEEP_MEMINFO.
As Dan suggested, the fix is implemented to remove numa_fill_memblks()
from sparsemem.h and alos using __weak for the function.
Note that the issue was initially introduced with [1]. But since
phys_to_target_node() was originally used that returned the valid node
0, an additional numa_memblk was not added. Though, the node id was
wrong too, a message is seen then in the logs:
kernel/numa.c: pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
[1] commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
CFMWS not in SRAT")
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/all/66271b0072317_69102944c@dwillia2-xfh.jf.intel.com.notmuch/
Fixes: 8f1004679987 ("ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Lenovo Slim 7 16ARH7 is a machine with switchable graphics between AMD
and Nvidia, and the backlight can't be adjusted properly unless
acpi_backlight=native is passed. Although nvidia-wmi-backlight is
present and loaded, this doesn't work as expected at all.
For making it working as default, add the corresponding quirk entry
with a DMI matching "LENOVO" "82UX".
Link: https://bugzilla.suse.com/show_bug.cgi?id=1217750
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
page_pool::p is driver-defined params, copied directly from the
structure passed to page_pool_create(). The structure isn't meant
to be modified by the Page Pool core code and this even might look
confusing[0][1].
In order to be able to alter some flags, let's define our own, internal
fields the same way as the already existing one (::has_init_callback).
They are defined as bits in the driver-set params, leave them so here
as well, to not waste byte-per-bit or so. Almost 30 bits are still free
for future extensions.
We could've defined only new flags here or only the ones we may need
to alter, but checking some flags in one place while others in another
doesn't sound convenient or intuitive. ::flags passed by the driver can
now go to the "slow" PP params.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link[0]: https://lore.kernel.org/netdev/20230703133207.4f0c54ce@kernel.org
Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
Link[1]: https://lore.kernel.org/netdev/CAKgT0UfZCGnWgOH96E4GV3ZP6LLbROHM7SHE8NKwq+exX+Gk_Q@mail.gmail.com
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
into fast and slow") that made &page_pool contain only "hot" params at
the start, cacheline boundary chops frag API fields group in the middle
again.
To not bother with this each time fast params get expanded or shrunk,
let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
their actual size (2 longs + 1 int). This ensures 16-byte alignment for
the 32-bit architectures and 32-byte alignment for the 64-bit ones,
excluding unnecessary false-sharing.
::page_state_hold_cnt is used quite intensively on hotpath no matter if
frag API is used, so move it to the newly created hole in the first
cacheline.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When IOMMU is on, the actual synchronization happens in the same cases
as with the direct DMA. Advertise %DMA_F_CAN_SKIP_SYNC in IOMMU DMA to
skip sync ops calls (indirect) for non-SWIOTLB buffers.
perf profile before the patch:
18.53% [kernel] [k] gq_rx_skb
14.77% [kernel] [k] napi_reuse_skb
8.95% [kernel] [k] skb_release_data
5.42% [kernel] [k] dev_gro_receive
5.37% [kernel] [k] memcpy
<*> 5.26% [kernel] [k] iommu_dma_sync_sg_for_cpu
4.78% [kernel] [k] tcp_gro_receive
<*> 4.42% [kernel] [k] iommu_dma_sync_sg_for_device
4.12% [kernel] [k] ipv6_gro_receive
3.65% [kernel] [k] gq_pool_get
3.25% [kernel] [k] skb_gro_receive
2.07% [kernel] [k] napi_gro_frags
1.98% [kernel] [k] tcp6_gro_receive
1.27% [kernel] [k] gq_rx_prep_buffers
1.18% [kernel] [k] gq_rx_napi_handler
0.99% [kernel] [k] csum_partial
0.74% [kernel] [k] csum_ipv6_magic
0.72% [kernel] [k] free_pcp_prepare
0.60% [kernel] [k] __napi_poll
0.58% [kernel] [k] net_rx_action
0.56% [kernel] [k] read_tsc
<*> 0.50% [kernel] [k] __x86_indirect_thunk_r11
0.45% [kernel] [k] memset
After patch, lines with <*> no longer show up, and overall
cpu usage looks much better (~60% instead of ~72%):
25.56% [kernel] [k] gq_rx_skb
9.90% [kernel] [k] napi_reuse_skb
7.39% [kernel] [k] dev_gro_receive
6.78% [kernel] [k] memcpy
6.53% [kernel] [k] skb_release_data
6.39% [kernel] [k] tcp_gro_receive
5.71% [kernel] [k] ipv6_gro_receive
4.35% [kernel] [k] napi_gro_frags
4.34% [kernel] [k] skb_gro_receive
3.50% [kernel] [k] gq_pool_get
3.08% [kernel] [k] gq_rx_napi_handler
2.35% [kernel] [k] tcp6_gro_receive
2.06% [kernel] [k] gq_rx_prep_buffers
1.32% [kernel] [k] csum_partial
0.93% [kernel] [k] csum_ipv6_magic
0.65% [kernel] [k] net_rx_action
iavf yields +10% of Mpps on Rx. This also unblocks batched allocations
of XSk buffers when IOMMU is active.
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Quite often, devices do not need dma_sync operations on x86_64 at least.
Indeed, when dev_is_dma_coherent(dev) is true and
dev_use_swiotlb(dev) is false, iommu_dma_sync_single_for_cpu()
and friends do nothing.
However, indirectly calling them when CONFIG_RETPOLINE=y consumes about
10% of cycles on a cpu receiving packets from softirq at ~100Gbit rate.
Even if/when CONFIG_RETPOLINE is not set, there is a cost of about 3%.
Add dev->need_dma_sync boolean and turn it off during the device
initialization (dma_set_mask()) depending on the setup:
dev_is_dma_coherent() for the direct DMA, !(sync_single_for_device ||
sync_single_for_cpu) or the new dma_map_ops flag, %DMA_F_CAN_SKIP_SYNC,
advertised for non-NULL DMA ops.
Then later, if/when swiotlb is used for the first time, the flag
is reset back to on, from swiotlb_tbl_map_single().
On iavf, the UDP trafficgen with XDP_DROP in skb mode test shows
+3-5% increase for direct DMA.
Suggested-by: Christoph Hellwig <hch@lst.de> # direct DMA shortcut
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Some platforms do have DMA, but DMA there is always direct and coherent.
Currently, even on such platforms DMA sync operations are compiled and
called.
Add a new hidden Kconfig symbol, DMA_NEED_SYNC, and set it only when
either sync operations are needed or there is DMA ops or swiotlb
or DMA debug is enabled. Compile global dma_sync_*() and dma_need_sync()
only when it's set, otherwise provide empty inline stubs.
The change allows for future optimizations of DMA sync calls depending
on runtime conditions.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
iommu_dma_map_page() allocates swiotlb memory as a bounce buffer when an
untrusted device wants to map only part of the memory in an granule. The
goal is to disallow the untrusted device having DMA access to unrelated
kernel data that may be sharing the granule. To meet this goal, the
bounce buffer itself is zeroed, and any additional swiotlb memory up to
alloc_size after the bounce buffer end (i.e., "post-padding") is also
zeroed.
However, as of commit 901c7280ca0d ("Reinstate some of "swiotlb: rework
"fix info leak with DMA_FROM_DEVICE"""), swiotlb_tbl_map_single() always
initializes the contents of the bounce buffer to the original memory.
Zeroing the bounce buffer is redundant and probably wrong per the
discussion in that commit. Only the post-padding needs to be zeroed.
Also, when the DMA min_align_mask is non-zero, the allocated bounce
buffer space may not start on a granule boundary. The swiotlb memory
from the granule boundary to the start of the allocated bounce buffer
might belong to some unrelated bounce buffer. So as described in the
"second issue" in [1], it can't be zeroed to protect against untrusted
devices. But as of commit af133562d5af ("swiotlb: extend buffer
pre-padding to alloc_align_mask if necessary"), swiotlb_tbl_map_single()
allocates pre-padding slots when necessary to meet min_align_mask
requirements, making it possible to zero the pre-padding area as well.
Finally, iommu_dma_map_page() uses the swiotlb for untrusted devices
and also for certain kmalloc() memory. Current code does the zeroing
for both cases, but it is needed only for the untrusted device case.
Fix all of this by updating iommu_dma_map_page() to zero both the
pre-padding and post-padding areas, but not the actual bounce buffer.
Do this only in the case where the bounce buffer is used because
of an untrusted device.
[1] https://lore.kernel.org/all/20210929023300.335969-1-stevensd@google.com/
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Petr Tesarik <petr@tesarici.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Currently swiotlb_tbl_map_single() takes alloc_align_mask and
alloc_size arguments to specify an swiotlb allocation that is larger
than mapping_size. This larger allocation is used solely by
iommu_dma_map_single() to handle untrusted devices that should not have
DMA visibility to memory pages that are partially used for unrelated
kernel data.
Having two arguments to specify the allocation is redundant. While
alloc_align_mask naturally specifies the alignment of the starting
address of the allocation, it can also implicitly specify the size
by rounding up the mapping_size to that alignment.
Additionally, the current approach has an edge case bug.
iommu_dma_map_page() already does the rounding up to compute the
alloc_size argument. But swiotlb_tbl_map_single() then calculates the
alignment offset based on the DMA min_align_mask, and adds that offset to
alloc_size. If the offset is non-zero, the addition may result in a value
that is larger than the max the swiotlb can allocate. If the rounding up
is done _after_ the alignment offset is added to the mapping_size (and
the original mapping_size conforms to the value returned by
swiotlb_max_mapping_size), then the max that the swiotlb can allocate
will not be exceeded.
In view of these issues, simplify the swiotlb_tbl_map_single() interface
by removing the alloc_size argument. Most call sites pass the same value
for mapping_size and alloc_size, and they pass alloc_align_mask as zero.
Just remove the redundant argument from these callers, as they will see
no functional change. For iommu_dma_map_page() also remove the alloc_size
argument, and have swiotlb_tbl_map_single() compute the alloc_size by
rounding up mapping_size after adding the offset based on min_align_mask.
This has the side effect of fixing the edge case bug but with no other
functional change.
Also add a sanity test on the alloc_align_mask. While IOMMU code
currently ensures the granule is not larger than PAGE_SIZE, if that
guarantee were to be removed in the future, the downstream effect on the
swiotlb might go unnoticed until strange allocation failures occurred.
Tested on an ARM64 system with 16K page size and some kernel test-only
hackery to allow modifying the DMA min_align_mask and the granule size
that becomes the alloc_align_mask. Tested these combinations with a
variety of original memory addresses and sizes, including those that
reproduce the edge case bug:
* 4K granule and 0 min_align_mask
* 4K granule and 0xFFF min_align_mask (4K - 1)
* 16K granule and 0xFFF min_align_mask
* 64K granule and 0xFFF min_align_mask
* 64K granule and 0x3FFF min_align_mask (16K - 1)
With the changes, all combinations pass.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Petr Tesarik <petr@tesarici.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
'timeout' is a vague name for the return value of wait_event_*_timeout
because it actually returns the time left. Because the variable is never
used later, just drop the return value. Since variable 'timeout' is then
only used to carry a fixed timeout value, drop this in favor of a fixed
function argument as in the other call to wait_event_timeout() above.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Drop static const arrays with HSI2 clocks parent data which are not
referenced by any clock. This might cause -Werror=unused-const-variable
warnings.
Reported-by: Stephen Boyd <sboyd@kernel.org>
Closes: https://lore.kernel.org/all/8bf65df598680f0785c3d6db70acfb9a.sboyd@kernel.org/
Fixes: 093c290084a4 ("clk: samsung: gs101: add support for cmu_hsi2")
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://lore.kernel.org/r/20240507055948.34554-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
|
|
Daniel Jurgens says:
====================
Remove RTNL lock protection of CVQ
Currently the buffer used for control VQ commands is protected by the
RTNL lock. Previously this wasn't a major concern because the control VQ
was only used during device setup and user interaction. With the recent
addition of dynamic interrupt moderation the control VQ may be used
frequently during normal operation.
This series removes the RNTL lock dependency by introducing a mutex
to protect the control buffer and writing SGs to the control VQ.
v6:
- Rebased over new stats code.
- Added comment to cvq_lock, init the mutex unconditionally,
and replaced some duplicate code with a goto.
- Fixed minor grammer errors, checkpatch warnings, and clarified
a comment.
v5:
- Changed cvq_lock to a mutex.
- Changed dim_lock to mutex, because it's held taking
the cvq_lock.
- Use spin/mutex_lock/unlock vs guard macros.
v4:
- Protect dim_enabled with same lock as well intr_coal.
- Rename intr_coal_lock to dim_lock.
- Remove some scoped_guard where the error path doesn't
have to be in the lock.
v3:
- Changed type of _offloads to __virtio16 to fix static
analysis warning.
- Moved a misplaced hunk to the correct patch.
v2:
- New patch to only process the provided queue in
virtnet_dim_work
- New patch to lock per queue rx coalescing structure.
====================
Link: https://lore.kernel.org/r/20240503202445.1415560-1-danielj@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The rtnl lock is no longer needed to protect the control buffer and
command VQ.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once the RTNL locking around the control buffer is removed there can be
contention on the per queue RX interrupt coalescing data. Use a mutex
per queue. A mutex is required because virtnet_send_command can sleep.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since we no longer have to hold the RTNL lock here just do updates for
the specified queue.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The command VQ will no longer be protected by the RTNL lock. Use a
mutex to protect the control buffer header and the VQ.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Allocate memory for the data when it's used. Ideally the struct could
be on the stack, but we can't DMA stack memory. With this change only
the header and status memory are shared between commands, which will
allow using a tighter lock than RTNL.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Stop storing RSS setting in the control buffer. This is prep work for
removing RTNL lock protection of the control buffer.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Tested-by: Heng Qi <hengqi@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, the MT7530 DSA subdriver configures the MT7530 switch to provide
direct access to switch PHYs, meaning, the switch PHYs listen on the MDIO
bus the switch listens on. The PHY muxing feature makes use of this.
This is problematic as the PHY may be attached before the switch is
initialised, in which case, the PHY will fail to be attached.
Since commit 91374ba537bd ("net: dsa: mt7530: support OF-based registration
of switch MDIO bus"), we can describe the switch PHYs on the MDIO bus of
the switch on the device tree. Extend the check to detect PHY muxing when
the PHY is defined on the MDIO bus of the switch on the device tree.
When the PHY is described this way, the switch will be initialised first,
then the switch MDIO bus will be registered. Only after these steps, the
PHY will be attached.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/20240430-b4-for-netnext-mt7530-use-switch-mdio-bus-for-phy-muxing-v2-1-9104d886d0db@arinc9.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit 9385/2 introduced a few branches inside function
prototypes when using CFI in order to deal with the situation
where CFI inserts a few bytes of function information in front
of the symbol.
This is not good for older CPUs where every cycle counts.
Commit 9386/2 alleviated the situation a bit by using aliases
for the cache functions with identical signatures.
This leaves the coherent cache flush functions
*_coherent_kern_range() with these branches to the corresponing
*_coherent_user_range() around, since their return type differ and
they therefore cannot be aliased.
Solve this by a simple ifdef so at least we can use fallthroughs
when compiling without CFI enabled.
Link: https://lore.kernel.org/linux-arm-kernel/Zi+e9M%2Ff5b%2FSto9H@shell.armlinux.org.uk/
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Eric Dumazet says:
====================
rtnetlink: more rcu conversions for rtnl_fill_ifinfo()
We want to no longer rely on RTNL for "ip link show" command.
This is a long road, this series takes care of some parts.
====================
Link: https://lore.kernel.org/r/20240503192059.3884225-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
All rtnl_link_ops->get_link_net() methods already using dev_net()
are ready. I added READ_ONCE() annotations on others.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|