summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-15io_uring: introduce concept of memory regionsPavel Begunkov
We've got a good number of mappings we share with the userspace, that includes the main rings, provided buffer rings, upcoming rings for zerocopy rx and more. All of them duplicate user argument parsing and some internal details as well (page pinnning, huge page optimisations, mmap'ing, etc.) Introduce a notion of regions. For userspace for now it's just a new structure called struct io_uring_region_desc which is supposed to parameterise all such mapping / queue creations. A region either represents a user provided chunk of memory, in which case the user_addr field should point to it, or a request for the kernel to allocate the memory, in which case the user would need to mmap it after using the offset returned in the mmap_offset field. With a uniform userspace API we can avoid additional boiler plate code and apply future optimisation to all of them at once. Internally, there is a new structure struct io_mapped_region holding all relevant runtime information and some helpers to work with it. This patch limits it to user provided regions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0e6fe25818dfbaebd1bd90b870a6cac503fe1a24.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-15io_uring: temporarily disable registered waitsPavel Begunkov
Disable wait argument registration as it'll be replaced with a more generic feature. We'll still need IORING_ENTER_EXT_ARG_REG parsing in a few commits so leave it be. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70b1d1d218c41ba77a76d1789c8641dab0b0563e.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-15io_uring: disable ENTER_EXT_ARG_REG for IOPOLLPavel Begunkov
IOPOLL doesn't use the extended arguments, no need for it to support IORING_ENTER_EXT_ARG_REG. Let's disable it for IOPOLL, if anything it leaves more space for future extensions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a35ecd919dbdc17bd5b7932273e317832c531b45.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-15io_uring: fortify io_pin_pages with a warningPavel Begunkov
We're a bit too frivolous with types of nr_pages arguments, converting it to long and back to int, passing an unsigned int pointer as an int pointer and so on. Shouldn't cause any problem but should be carefully reviewed, but until then let's add a WARN_ON_ONCE check to be more confident callers don't pass poorely checked arguents. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d48e0c097cbd90fb47acaddb6c247596510d8cfc.1731689588.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-15switch io_msg_ring() to CLASS(fd)Al Viro
Use CLASS(fd) to get the file for sync message ring requests, rather than open-code the file retrieval dance. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20241115034902.GP3387508@ZenIV [axboe: make a more coherent commit message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-15workqueue: Reduce expensive locks for unbound workqueueWangyang Guo
For unbound workqueue, pwqs usually map to just a few pools. Most of the time, pwqs will be linked sequentially to wq->pwqs list by cpu index. Usually, consecutive CPUs have the same workqueue attribute (e.g. belong to the same NUMA node). This makes pwqs with the same pool cluster together in the pwq list. Only do lock/unlock if the pool has changed in flush_workqueue_prep_pwqs(). This reduces the number of expensive lock operations. The performance data shows this change boosts FIO by 65x in some cases when multiple concurrent threads write to xfs mount points with fsync. FIO Benchmark Details - FIO version: v3.35 - FIO Options: ioengine=libaio,iodepth=64,norandommap=1,rw=write, size=128M,bs=4k,fsync=1 - FIO Job Configs: 64 jobs in total writing to 4 mount points (ramdisks formatted as xfs file system). - Kernel Codebase: v6.12-rc5 - Test Platform: Xeon 8380 (2 sockets) Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-11-15efi/libstub: Parse builtin command line after bootloader provided oneArd Biesheuvel
When CONFIG_CMDLINE_EXTEND is set, the core kernel command line handling logic appends CONFIG_CMDLINE to the bootloader provided command line. The EFI stub does the opposite, and parses the builtin one first. The usual behavior of command line options is that the last one takes precedence if it appears multiple times, unless there is a meaningful way to combine them. In either case, parsing the builtin command line first while the core kernel does it in the opposite order is likely to produce inconsistent results in such cases. Therefore, switch the order in the stub to match the core kernel. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-11-15x86/efi: Apply EFI Memory Attributes after kexecNicolas Saenz Julienne
Kexec bypasses EFI's switch to virtual mode. In exchange, it has its own routine, kexec_enter_virtual_mode(), which replays the mappings made by the original kernel. Unfortunately, that function fails to reinstate EFI's memory attributes, which would've otherwise been set after entering virtual mode. Remediate this by calling efi_runtime_update_mappings() within kexec's routine. Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-11-15x86/efi: Drop support for the EFI_PROPERTIES_TABLENicolas Saenz Julienne
Drop support for the EFI_PROPERTIES_TABLE. It was a failed, short-lived experiment that broke the boot both on Linux and Windows, and was replaced by the EFI_MEMORY_ATTRIBUTES_TABLE shortly after. Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-11-15bpf: Add necessary migrate_disable to range_tree.Yonghong Song
When running bpf selftest (./test_progs -j), the following warnings showed up: $ ./test_progs -t arena_atomics ... BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u19:0/12501 caller is bpf_mem_free+0x128/0x330 ... Call Trace: <TASK> dump_stack_lvl check_preemption_disabled bpf_mem_free range_tree_destroy arena_map_free bpf_map_free_deferred process_scheduled_works ... For selftests arena_htab and arena_list, similar smp_process_id() BUGs are dumped, and the following are two stack trace: <TASK> dump_stack_lvl check_preemption_disabled bpf_mem_alloc range_tree_set arena_map_alloc map_create ... <TASK> dump_stack_lvl check_preemption_disabled bpf_mem_alloc range_tree_clear arena_vm_fault do_pte_missing handle_mm_fault do_user_addr_fault ... Add migrate_{disable,enable}() around related bpf_mem_{alloc,free}() calls to fix the issue. Fixes: b795379757eb ("bpf: Introduce range_tree data structure and use it in bpf arena") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20241115060354.2832495-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-15bpf: Do not alloc arena on unsupported archesViktor Malik
Do not allocate BPF arena on arches that do not support it, instead return EOPNOTSUPP. This is useful to prevent bugs such as soft lockups while trying to free the arena which we have witnessed on ppc64le [1]. [1] https://lore.kernel.org/bpf/4afdcb50-13f2-4772-8db1-3fd02bd985b3@redhat.com/ Signed-off-by: Viktor Malik <vmalik@redhat.com> Link: https://lore.kernel.org/r/20241115082548.74972-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-15perf test shell trace_exit_race: Show what went wrong in verbose modeArnaldo Carvalho de Melo
If it fails we need to check what was the reason, what were the lines that didn't match the expected format, so: root@number:~# perf test -v "trace exit race" --- start --- test child forked, pid 2028724 Lines not matching the expected regexp: ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$': 0.000 :2028750/2028750 syscalls:sys_enter_exit_group() ---- end(-1) ---- 110: perf trace exit race : FAILED! root@number:~# In this case we're not resolving the process COMM for some reason and fallback to printing just the pid/tid, this will be fixed in a followup patch. Howard Chu spotted a problem with single code surrounding a regexp, that made the test always fail, but since there were some failures when I tested (COMM not being resolved in some of the results) the end inverse grep would show some lines and thus didn't notice the single quote problem. He also provided a patch to test if less than the number of expected matches took place but all of them with the expected output, in which case the inverse grep wouldn't show anything, confusing the tester. Reviewed-by: Howard Chu <howardchu95@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Benjamin Peterson <benjamin@engflow.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZzdknoHqrJbojb6P@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-11-15Merge tag 'soc_fsl-6.13-1' of https://github.com/chleroy/linux into soc/driversArnd Bergmann
FSL SOC changes for 6.13: - Fix a missing of_node_put() in RCPM - Fix a missing error code on failure in CPM1 QMC - Switch to using for_each_available_child_of_node_scoped() in CPM1 TSA * tag 'soc_fsl-6.13-1' of https://github.com/chleroy/linux: soc: fsl: cpm1: qmc: Set the ret error code on platform_get_irq() failure soc: fsl: rcpm: fix missing of_node_put() in copy_ippdexpcr1_setting() soc: fsl: cpm1: tsa: switch to for_each_available_child_of_node_scoped() Link: https://lore.kernel.org/r/c3c4961b-fe2a-4fcc-a7a1-f8b5352e09a2@csgroup.eu Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-15block: make struct rq_list available for !CONFIG_BLOCKJens Axboe
A previous commit changed how requests are linked in the plug structure, but unlike the previous method, it uses a new type for it rather than struct request. The latter is available even for !CONFIG_BLOCK, while struct rq_list is now. Move it outside CONFIG_BLOCK. Reported-by: Nathan Chancellor <nathan@kernel.org> Fixes: a3396b99990d ("block: add a rq_list type") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-15ASoC: hdmi-codec: reorder channel allocation listJonas Karlman
The ordering in hdmi_codec_get_ch_alloc_table_idx() results in wrong channel allocation for a number of cases, e.g. when ELD reports FL|FR|LFE|FC|RL|RR or FL|FR|LFE|FC|RL|RR|RC|RLC|RRC: ca_id 0x01 with speaker mask FL|FR|LFE is selected instead of ca_id 0x03 with speaker mask FL|FR|LFE|FC for 4 channels and ca_id 0x04 with speaker mask FL|FR|RC gets selected instead of ca_id 0x0b with speaker mask FL|FR|LFE|FC|RL|RR for 6 channels Fix this by reordering the channel allocation list with most specific speaker masks at the top. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Link: https://patch.msgid.link/20241115044344.3510979-1-christianshewitt@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-15tools/thermal: Fix common realloc mistakezhang jiao
If the 'realloc' fails, the thermal zones pointer is set to NULL. This makes all thermal zones references which were previously successfully initialized to be lost. [dlezcano] : Fixed indentation Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241114084039.42149-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-11-15crypto: marvell/cesa - fix uninit value for struct mv_cesa_op_ctxKarol Przybylski
In cesa/cipher.c most declarations of struct mv_cesa_op_ctx are uninitialized. This causes one of the values in the struct to be left unitialized in later usages. This patch fixes it by adding initializations in the same way it is done in cesa/hash.c. Fixes errors discovered in coverity: 1600942, 1600939, 1600935, 1600934, 1600929, 1600927, 1600925, 1600921, 1600920, 1600919, 1600915, 1600914 Signed-off-by: Karol Przybylski <karprzy7@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: cavium - Fix an error handling path in cpt_ucode_load_fw()Christophe JAILLET
If do_cpt_init() fails, a previous dma_alloc_coherent() call needs to be undone. Add the needed dma_free_coherent() before returning. Fixes: 9e2c7d99941d ("crypto: cavium - Add Support for Octeon-tx CPT Engine") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: aesni - Move back to module_initHerbert Xu
This patch reverts commit 0fbafd06bdde938884f7326548d3df812b267c3c ("crypto: aesni - fix failing setkey for rfc4106-gcm-aesni") by moving the aesni init function back to module_init from late_initcall. The original patch was needed because tests were synchronous. This is no longer the case so there is no need to postpone the registration. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: lib/mpi - Export mpi_set_bitHerbert Xu
This function is part of the exposed API and should be exported. Otherwise a modular user would fail to build, e.g., crypto/rsa. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: aes-gcm-p10 - Use the correct bit to test for P10Michal Suchanek
A hwcap feature bit is passed to cpu_has_feature, resulting in testing for CPU_FTR_MMCRA instead of the 3.1 platform revision. Fixes: c954b252dee9 ("crypto: powerpc/p10-aes-gcm - Register modules as SIMD") Reported-by: Nicolai Stange <nstange@suse.com> Signed-off-by: Michal Suchanek <msuchanek@suse.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15hwrng: amd - remove reference to removed PPC_MAPLE configLukas Bulwahn
Commit 62f8f307c80e ("powerpc/64: Remove maple platform") removes the PPC_MAPLE config as a consequence of the platform’s removal. The config definition of HW_RANDOM_AMD refers to this removed config option in its dependencies. Remove the reference to the removed config option. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: arm/crct10dif - Implement plain NEON variantArd Biesheuvel
The CRC-T10DIF algorithm produces a 16-bit CRC, and this is reflected in the folding coefficients, which are also only 16 bits wide. This means that the polynomial multiplications involving these coefficients can be performed using 8-bit long polynomial multiplication (8x8 -> 16) in only a few steps, and this is an instruction that is part of the base NEON ISA, which is all most real ARMv7 cores implement. (The 64-bit PMULL instruction is part of the crypto extensions, which are only implemented by 64-bit cores) The final reduction is a bit more involved, but we can delegate that to the generic CRC-T10DIF implementation after folding the entire input into a 16 byte vector. This results in a speedup of around 6.6x on Cortex-A72 running in 32-bit mode. On Cortex-A8 (BeagleBone White), the results are substantially better than that, but not sufficiently reproducible (with tcrypt) to quote a number here. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: arm/crct10dif - Macroify PMULL asm codeArd Biesheuvel
To allow an alternative version to be created of the PMULL based CRC-T10DIF algorithm, turn the bulk of it into a macro, except for the final reduction, which will only be used by the existing version. Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: arm/crct10dif - Use existing mov_l macro instead of __adrlArd Biesheuvel
Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback codeArd Biesheuvel
The only remaining user of the fallback implementation of 64x64 polynomial multiplication using 8x8 PMULL instructions is the final reduction from a 16 byte vector to a 16-bit CRC. The fallback code is complicated and messy, and this reduction has little impact on the overall performance, so instead, let's calculate the final CRC by passing the 16 byte vector to the generic CRC-T10DIF implementation when running the fallback version. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiplyArd Biesheuvel
The CRC-T10DIF implementation for arm64 has a version that uses 8x8 polynomial multiplication, for cores that lack the crypto extensions, which cover the 64x64 polynomial multiplication instruction that the algorithm was built around. This fallback version rather naively adopted the 64x64 polynomial multiplication algorithm that I ported from ARM for the GHASH driver, which needs 8 PMULL8 instructions to implement one PMULL64. This is reasonable, given that each 8-bit vector element needs to be multiplied with each element in the other vector, producing 8 vectors with partial results that need to be combined to yield the correct result. However, most PMULL64 invocations in the CRC-T10DIF code involve multiplication by a pair of 16-bit folding coefficients, and so all the partial results from higher order bytes will be zero, and there is no need to calculate them to begin with. Then, the CRC-T10DIF algorithm always XORs the output values of the PMULL64 instructions being issued in pairs, and so there is no need to faithfully implement each individual PMULL64 instruction, as long as XORing the results pairwise produces the expected result. Implementing these improvements results in a speedup of 3.3x on low-end platforms such as Raspberry Pi 4 (Cortex-A72) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: arm64/crct10dif - Remove obsolete chunking logicArd Biesheuvel
This is a partial revert of commit fc754c024a343b, which moved the logic into C code which ensures that kernel mode NEON code does not hog the CPU for too long. This is no longer needed now that kernel mode NEON no longer disables preemption, so we can drop this. Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: bcm - add error check in the ahash_hmac_init functionChen Ridong
The ahash_init functions may return fails. The ahash_hmac_init should not return ok when ahash_init returns error. For an example, ahash_init will return -ENOMEM when allocation memory is error. Fixes: 9d12ba86f818 ("crypto: brcm - Add Broadcom SPU driver") Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15crypto: caam - add error check to caam_rsa_set_priv_key_formChen Ridong
The caam_rsa_set_priv_key_form did not check for memory allocation errors. Add the checks to the caam_rsa_set_priv_key_form functions. Fixes: 52e26d77b8b3 ("crypto: caam - add support for RSA key form 2") Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-11-15netfilter: bitwise: add support for doing AND, OR and XOR directlyJeremy Sowden
Hitherto, these operations have been converted in user space to mask-and-xor operations on one register and two immediate values, and it is the latter which have been evaluated by the kernel. We add support for evaluating these operations directly in kernel space on one register and either an immediate value or a second register. Pablo made a few changes to the original patch: - EINVAL if NFTA_BITWISE_SREG2 is used with fast version. - Allow _AND,_OR,_XOR with _DATA != sizeof(u32) - Dump _SREG2 or _DATA with _AND,_OR,_XOR Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15efi/memattr: Ignore table if the size is clearly bogusArd Biesheuvel
There are reports [0] of cases where a corrupt EFI Memory Attributes Table leads to out of memory issues at boot because the descriptor size and entry count in the table header are still used to reserve the entire table in memory, even though the resulting region is gigabytes in size. Given that the EFI Memory Attributes Table is supposed to carry up to 3 entries for each EfiRuntimeServicesCode region in the EFI memory map, and given that there is no reason for the descriptor size used in the table to exceed the one used in the EFI memory map, 3x the size of the entire EFI memory map is a reasonable upper bound for the size of this table. This means that sizes exceeding that are highly likely to be based on corrupted data, and the table should just be ignored instead. [0] https://bugzilla.suse.com/show_bug.cgi?id=1231465 Cc: Gregory Price <gourry@gourry.net> Cc: Usama Arif <usamaarif642@gmail.com> Acked-by: Jiri Slaby <jirislaby@kernel.org> Acked-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/all/20240912155159.1951792-2-ardb+git@google.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-11-15ecryptfs: Fix spelling mistake "validationg" -> "validating"Colin Ian King
There is a spelling mistake in an error message literal string. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20241108112509.109891-1-colin.i.king@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15Merge patch series "ecryptfs: convert to the new mount API"Christian Brauner
Eric Sandeen <sandeen@redhat.com> says: This is lightly tested with the kernel tests present in ecryptfs-utils, but it could certainly use a bit more testing and review, particularly with invalid mount option sets. This one is a little unique compared to other filesystems in that I allocate both an fs context and the *sbi in .init_fs_context; the *sbi is long-lived, and the context is only present during the initial mount. Allocating sbi with the filesystem context means we can set options into it directly, rather than needing to do it after parsing. And it's particularly simple to do it this way given that there is no remount. * patches from https://lore.kernel.org/r/20241028143359.605061-1-sandeen@redhat.com: ecryptfs: Convert ecryptfs to use the new mount API ecryptfs: Factor out mount option validation Link: https://lore.kernel.org/r/20241028143359.605061-1-sandeen@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15ecryptfs: Convert ecryptfs to use the new mount APIEric Sandeen
Convert ecryptfs to the new mount API. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/20241028143359.605061-3-sandeen@redhat.com Acked-by: Tyler Hicks <code@tyhicks.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15ecryptfs: Factor out mount option validationEric Sandeen
Under the new mount API, mount options are parsed one at a time. Any validation that examines multiple options must be done after parsing is complete, so factor out a ecryptfs_validate_options() which can be called separately. To facilitate this, temporarily move the local variables that tracked whether various options have been set in the parsing function, into the ecryptfs_mount_crypt_stat structure so that they can be examined later. These will be moved to a more ephemeral struct in the mount api conversion patch to follow. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/20241028143359.605061-2-sandeen@redhat.com Acked-by: Tyler Hicks <code@tyhicks.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15Merge patch series "API for exporting connectable file handles to userspace"Christian Brauner
Amir Goldstein <amir73il@gmail.com> says: These patches bring the NFS connectable file handles feature to userspace servers. They rely on Christian's and Aleksa's changes recently merged to v6.12. The API I chose for encoding conenctable file handles is pretty conventional (AT_HANDLE_CONNECTABLE). open_by_handle_at(2) does not have AT_ flags argument, but also, I find it more useful API that encoding a connectable file handle can mandate the resolving of a connected fd, without having to opt-in for a connected fd independently. I chose to implemnent this by using upper bits in the handle type field It may be that out-of-tree filesystems return a handle type with upper bits set, but AFAIK, no in-tree filesystem does that. I added some warnings just in case we encouter that. I have written an fstest [1] and a man page draft [2] for the feature. [1] https://github.com/amir73il/xfstests/commits/connectable-fh/ [2] https://github.com/amir73il/man-pages/commits/connectable-fh/ * patches from https://lore.kernel.org/r/20241011090023.655623-1-amir73il@gmail.com: fs: open_by_handle_at() support for decoding "explicit connectable" file handles fs: name_to_handle_at() support for "explicit connectable" file handles fs: prepare for "explicit connectable" file handles Link: https://lore.kernel.org/r/20241011090023.655623-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15fs: open_by_handle_at() support for decoding "explicit connectable" file handlesAmir Goldstein
Teach open_by_handle_at(2) about the type format of "explicit connectable" file handles that were created using the AT_HANDLE_CONNECTABLE flag to name_to_handle_at(2). When decoding an "explicit connectable" file handles, name_to_handle_at(2) should fail if it cannot open a "connected" fd with known path, which is accessible (to capable user) from mount fd path. Note that this does not check if the path is accessible to the calling user, just that it is accessible wrt the mount namesapce, so if there is no "connected" alias, or if parts of the path are hidden in the mount namespace, open_by_handle_at(2) will return -ESTALE. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241011090023.655623-4-amir73il@gmail.com Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export") Acked-by: Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15fs: name_to_handle_at() support for "explicit connectable" file handlesAmir Goldstein
nfsd encodes "connectable" file handles for the subtree_check feature, which can be resolved to an open file with a connected path. So far, userspace nfs server could not make use of this functionality. Introduce a new flag AT_HANDLE_CONNECTABLE to name_to_handle_at(2). When used, the encoded file handle is "explicitly connectable". The "explicitly connectable" file handle sets bits in the high 16bit of the handle_type field, so open_by_handle_at(2) will know that it needs to open a file with a connected path. old kernels will now recognize the handle_type with high bits set, so "explicitly connectable" file handles cannot be decoded by open_by_handle_at(2) on old kernels. The flag AT_HANDLE_CONNECTABLE is not allowed together with either AT_HANDLE_FID or AT_EMPTY_PATH. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241011090023.655623-3-amir73il@gmail.com Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export") Acked-by: Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15fs: prepare for "explicit connectable" file handlesAmir Goldstein
We would like to use the high 16bit of the handle_type field to encode file handle traits, such as "connectable". In preparation for this change, make sure that filesystems do not return a handle_type value with upper bits set and that the open_by_handle_at(2) syscall rejects these handle types. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241011090023.655623-2-amir73il@gmail.com Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export") Acked-by: Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-15netfilter: bitwise: rename some boolean operation functionsJeremy Sowden
In the next patch we add support for doing AND, OR and XOR operations directly in the kernel, so rename some functions and an enum constant related to mask-and-xor boolean operations. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t.Guillaume Nault
Use ip4h_dscp() instead of reading iph->tos directly. ip4h_dscp() returns a dscp_t value which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t.Guillaume Nault
Use ip4h_dscp() instead of reading iph->tos directly. ip4h_dscp() returns a dscp_t value which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15netfilter: rpfilter: Convert rpfilter_mt() to dscp_t.Guillaume Nault
Use ip4h_dscp() instead of reading iph->tos directly. ip4h_dscp() returns a dscp_t value which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15netfilter: flow_offload: Convert nft_flow_route() to dscp_t.Guillaume Nault
Use ip4h_dscp()instead of reading ip_hdr()->tos directly. ip4h_dscp() returns a dscp_t value which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call. Also, remove the comment about the net/ip.h include file, since it's now required for the ip4h_dscp() helper too. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15netfilter: ipv4: Convert ip_route_me_harder() to dscp_t.Guillaume Nault
Use ip4h_dscp()instead of reading iph->tos directly. ip4h_dscp() returns a dscp_t value which is temporarily converted back to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to dscp_t in the future, we'll only have to remove that inet_dscp_to_dsfield() call. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-15efi/zboot: Fix outdated comment about using LoadImage/StartImageArd Biesheuvel
EFI zboot no longer uses LoadImage/StartImage, but subsumes the arch code to load and start the bare metal image directly. Fix the Kconfig description accordingly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-11-15efi/libstub: Free correct pointer on failureArd Biesheuvel
cmdline_ptr is an out parameter, which is not allocated by the function itself, and likely points into the caller's stack. cmdline refers to the pool allocation that should be freed when cleaning up after a failure, so pass this instead to free_pool(). Fixes: 42c8ea3dca09 ("efi: libstub: Factor out EFI stub entrypoint ...") Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-11-15microblaze: mb: Use str_yes_no() helper in show_cpuinfo()Thorsten Blum
Remove hard-coded strings by using the str_yes_no() helper function. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241114224649.57946-4-thorsten.blum@linux.dev Signed-off-by: Michal Simek <michal.simek@amd.com>
2024-11-15Merge branches 'intel/vt-d', 'amd/amd-vi' and 'iommufd/arm-smmuv3-nested' ↵Joerg Roedel
into next