summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-05-03block: move two helpers into bdev.cYu Kuai
disk_live() and block_size() access bd_inode directly, prepare to remove the field bd_inode from block_device, and only access bd_inode in block layer. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240411145346.2516848-8-viro@zeniv.linux.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-03nvmem: layouts: store owner from modules with nvmem_layout_driver_register()Krzysztof Kozlowski
Modules registering driver with nvmem_layout_driver_register() might forget to set .owner field. The field is used by some of other kernel parts for reference counting (try_module_get()), so it is expected that drivers will set it. Solve the problem by moving this task away from the drivers to the core code, just like we did for platform_driver in commit 9447057eaff8 ("platform_device: use a macro instead of platform_driver_register"). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20240430084921.33387-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-03Merge tag 'coresight-next-v6.10' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: hwtracing subsystem updates for v6.10 CoreSight/hwtracing updates for the next release includes: - ACPI power management support for CoreSight legacy components, via migration from AMBA to platform device - Fixes for ETE register save/restore during CPU Idle. - ACPI support TMC for Scatter-Gather mode. - his_ptt driver update to set the parent device for PMU and documentation fixes - Qcomm Trace component DT binding fixes - Miscellaneous cleanups Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (28 commits) hwtracing: hisi_ptt: Assign parent for event_source device Documentation: ABI + trace: hisi_ptt: update paths to bus/event_source coresight: tmc: Enable SG capability on ACPI based SoC-400 TMC ETR devices coresight: Docs/ABI/testing/sysfs-bus-coresight-devices: Fix spelling errors coresight: tpiu: Convert to platform remove callback returning void coresight: tmc: Convert to platform remove callback returning void coresight: stm: Convert to platform remove callback returning void coresight: debug: Convert to platform remove callback returning void coresight: catu: Convert to platform remove callback returning void coresight: Remove duplicate linux/amba/bus.h header coresight: stm: Remove duplicate linux/acpi.h header coresight: etm4x: Fix access to resource selector registers coresight: etm4x: Safe access for TRCQCLTR coresight: etm4x: Do not save/restore Data trace control registers coresight: etm4x: Do not hardcode IOMEM access for register restore coresight: debug: Move ACPI support from AMBA driver to platform driver coresight: stm: Move ACPI support from AMBA driver to platform driver coresight: tmc: Move ACPI support from AMBA driver to platform driver coresight: tpiu: Move ACPI support from AMBA driver to platform driver coresight: catu: Move ACPI support from AMBA driver to platform driver ...
2024-05-03spi: pxa2xx: Move contents of linux/spi/pxa2xx_spi.h to a local oneAndy Shevchenko
There is no user of the linux/spi/pxa2xx_spi.h. Move its contents to the drivers/spi/spi-pxa2xx.h. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240417110334.2671228-4-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-03regulator: devres: add API for reference voltage suppliesDavid Lechner
A common use case for regulators is to supply a reference voltage to an analog input or output device. This adds a new devres API to get, enable, and get the voltage in a single call. This allows eliminating boilerplate code in drivers that use reference supplies in this way. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20240429-regulator-get-enable-get-votlage-v2-1-b1f11ab766c1@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-02bdev: move ->bd_make_it_fail to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_ro_warned to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_has_subit_bio to ->__bd_flagsAl Viro
In bdev_alloc() we have all flags initialized to false, so assignment to ->bh_has_submit_bio n there is a no-op unless we have partno != 0 and flag already set on entire device. In device_add_disk() we have just allocated the block_device in question and it had been a full-device one, so the flag is guaranteed to be still clear when we get to assignment. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_write_holder into ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_read_only to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: infrastructure for flagsAl Viro
Replace bd_partno with a 32bit field (__bd_flags). The lower 8 bits contain the partition number, the upper 24 are for flags. Helpers: bdev_{test,set,clear}_flag(bdev, flag), with atomic_or() and atomic_andnot() used to set/clear. NOTE: this commit does not actually move any flags over there - they are still bool fields. As the result, it shifts the fields wrt cacheline boundaries; that's going to be restored once the first 3 flags are dealt with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02tcp: Add new args for cong_control in tcp_congestion_opsMiao Xu
This patch adds two new arguments for cong_control of struct tcp_congestion_ops: - ack - flag These two arguments are inherited from the caller tcp_cong_control in tcp_intput.c. One use case of them is to update cwnd and pacing rate inside cong_control based on the info they provide. For example, the flag can be used to decide if it is the right time to raise or reduce a sender's cwnd. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Miao Xu <miaxu@meta.com> Link: https://lore.kernel.org/r/20240502042318.801932-2-miaxu@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02PCI/ASPM: Consolidate link state definesIlpo Järvinen
The linux/pci.h and aspm.c files define their own sets of link state related defines which are almost the same. Consolidate the use of defines into those defined by linux/pci.h and expand PCIE_LINK_STATE_L0S to match earlier ASPM_STATE_L0S that includes both upstream and downstream bits. Rename also the defines that are internal to aspm.c to start with PCIE_LINK_STATE for consistency. While the PCIE_LINK_STATE_L0S BIT(0) -> (BIT(0) | BIT(1)) transformation is not 1:1, in practice aspm.c already used ASPM_STATE_L0S that has both bits enabled except during mapping. While at it, place the PCIE_LINK_STATE_CLKPM define last to have more logical grouping. Use static_assert() to ensure PCIE_LINK_STATE_L0S is strictly equal to the combination of PCIE_LINK_STATE_L0S_UP/DW. Link: https://lore.kernel.org/r/20240322123952.6384-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-02wrapper for access to ->bd_partnoAl Viro
On the next step it's going to get folded into a field where flags will go. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02Use bdev_is_paritition() instead of open-coding itAl Viro
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02set_blocksize(): switch to passing struct file *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02swapon(2): open swap with O_EXCLAl Viro
... eliminating the need to reopen block devices so they could be exclusively held. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02swapon(2)/swapoff(2): don't bother with block sizeAl Viro
once upon a time that used to matter; these days we do swap IO for swap devices at the level that doesn't give a damn about block size, buffer_head or anything of that sort - just attach the page to bio, set the location and size (the latter to PAGE_SIZE) and feed into queue. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02drm/amdkfd: enable single alu ops for gfx12Jonathan Kim
GFX12 debugging requires setting up precise ALU operation for catching ALU exceptions. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Lancelot Six <lancelot.six@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02drm/amdgpu: Add gfx v12_0_0 family idLikun Gao
Add gfx v12_0_0 family id Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-02cxl/cper: Fix non-ACPI-APEI-GHES buildIra Weiny
If ACPI_APEI_GHES is not configured the [un]register work functions are not properly declared. 0day notices that the cxl_cper_register_work() declaration in the CONFIG_ACPI_APEI_GHES=n is broken, fix it to be typical nop stub. Reported-by: kernel test robot <lkp@intel.com> Closes: http://lore.kernel.org/r/202405012230.6kXItWen-lkp@intel.com Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20240501-cper-fix-0day-v1-1-c0b0056eafbc@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02ovl: implement tmpfileMiklos Szeredi
Combine inode creation with opening a file. There are six separate objects that are being set up: the backing inode, dentry and file, and the overlay inode, dentry and file. Cleanup in case of an error is a bit of a challenge and is difficult to test, so careful review is needed. All tmpfile testcases except generic/509 now run/pass, and no regressions are observed with full xfstests. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
2024-05-02media: cec: core: avoid recursive cec_claim_log_addrsHans Verkuil
Keep track if cec_claim_log_addrs() is running, and return -EBUSY if it is when calling CEC_ADAP_S_LOG_ADDRS. This prevents a case where cec_claim_log_addrs() could be called while it was still in progress. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reported-by: Yang, Chenyuan <cy54@illinois.edu> Closes: https://lore.kernel.org/linux-media/PH7PR11MB57688E64ADE4FE82E658D86DA09EA@PH7PR11MB5768.namprd11.prod.outlook.com/ Fixes: ca684386e6e2 ("[media] cec: add HDMI CEC framework (api)") Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2024-05-02Merge tag 'net-6.9-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
2024-05-02string: Add additional __realloc_size() annotations for "dup" helpersKees Cook
Several other "dup"-style interfaces could use the __realloc_size() attribute. (As a reminder to myself and others: "realloc" is used here instead of "alloc" because the "alloc_size" attribute implies that the memory contents are uninitialized. Since we're copying contents into the resulting allocation, it must use "realloc_size" to avoid confusing the compiler's optimization passes.) Add KUnit test coverage where possible. (KUnit still does not have the ability to manipulate userspace memory.) Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20240502145218.it.729-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-05-02KVM: Remove kvm_make_all_cpus_request_except()Venkatesh Srinivas
Remove kvm_make_all_cpus_request_except() as it effectively has no users, and arguably should never have been added in the first place. Commit 54163a346d4a ("KVM: Introduce kvm_make_all_cpus_request_except()") added the "except" variation for use in SVM's AVIC update path, which used it to skip sending a request to the current vCPU (commit 7d611233b016 ("KVM: SVM: Disable AVIC before setting V_IRQ")). But the AVIC usage of kvm_make_all_cpus_request_except() was essentially a hack-a-fix that simply squashed the most likely scenario of a racy WARN without addressing the underlying problem(s). Commit f1577ab21442 ("KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated") eventually fixed the WARN itself, and the "except" usage was subsequently dropped by df63202fe52b ("KVM: x86: APICv: drop immediate APICv disablement on current vCPU"). That kvm_make_all_cpus_request_except() hasn't gained any users in the last ~3 years isn't a coincidence. If a VM-wide broadcast *needs* to skip the current vCPU, then odds are very good that there is underlying bug that could be better fixed elsewhere. Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org> Link: https://lore.kernel.org/r/20240404232651.1645176-1-venkateshs@chromium.org [sean: rewrite changelog with --verbose] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-02seq_file: Optimize seq_puts()Christophe JAILLET
Most of seq_puts() usages are done with a string literal. In such cases, the length of the string car be computed at compile time in order to save a strlen() call at run-time. seq_putc() or seq_write() can then be used instead. This saves a few cycles. To have an estimation of how often this optimization triggers: $ git grep seq_puts.*\" | wc -l 3436 $ git grep seq_puts.*\".\" | wc -l 84 Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/a8589bffe4830dafcb9111e22acf06603fea7132.1713781332.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christian Brauner <brauner@kernel.org> The output for seq_putc() generation has also be checked and works.
2024-05-02kallsyms: Avoid weak references for kallsyms symbolsArd Biesheuvel
kallsyms is a directory of all the symbols in the vmlinux binary, and so creating it is somewhat of a chicken-and-egg problem, as its non-zero size affects the layout of the binary, and therefore the values of the symbols. For this reason, the kernel is linked more than once, and the first pass does not include any kallsyms data at all. For the linker to accept this, the symbol declarations describing the kallsyms metadata are emitted as having weak linkage, so they can remain unsatisfied. During the subsequent passes, the weak references are satisfied by the kallsyms metadata that was constructed based on information gathered from the preceding passes. Weak references lead to somewhat worse codegen, because taking their address may need to produce NULL (if the reference was unsatisfied), and this is not usually supported by RIP or PC relative symbol references. Given that these references are ultimately always satisfied in the final link, let's drop the weak annotation, and instead, provide fallback definitions in the linker script that are only emitted if an unsatisfied reference exists. While at it, drop the FRV specific annotation that these symbols reside in .rodata - FRV is long gone. Tested-by: Nick Desaulniers <ndesaulniers@google.com> # Boot Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20230504174320.3930345-1-ardb%40kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-05-02net: gro: fix udp bad offset in socket lookup by adding ↵Richard Gobert
{inner_}network_offset to napi_gro_cb Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the complete phase of gro. The functions always return skb->network_header, which in the case of encapsulated packets at the gro complete phase, is always set to the innermost L3 of the packet. That means that calling {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ L3/L4 may return an unexpected value. This incorrect usage leads to a bug in GRO's UDP socket lookup. udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These *_hdr functions return network_header which will point to the innermost L3, resulting in the wrong offset being used in __udp{4,6}_lib_lookup with encapsulated packets. This patch adds network_offset and inner_network_offset to napi_gro_cb, and makes sure both are set correctly. To fix the issue, network_offsets union is used inside napi_gro_cb, in which both the outer and the inner network offsets are saved. Reproduction example: Endpoint configuration example (fou + local address bind) # ip fou add port 6666 ipproto 4 # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip # ip link set tun1 up # ip a add 1.1.1.2/24 dev tun1 Netperf TCP_STREAM result on net-next before patch is applied: net-next main, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.28 2.37 net-next main, GRO disabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2745.06 patch applied, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2877.38 Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02ASoC: Use inline function for type safety in snd_soc_substream_to_rtd()Krzysztof Kozlowski
A common pattern in sound drivers is getting 'struct snd_soc_pcm_runtime' from 'struct snd_pcm_substream' opaque pointer private_data field with snd_soc_substream_to_rtd(). However 'private_data' appears in several other structures as well, including 'struct snd_compr_stream'. The field might not hold the same type for every structure, although seems the case at least for 'struct snd_compr_stream', so code can easily make a mistake by using macro for wrong structure passed as argument. Switch from macro to inline function, so such mistake will be build-time detectable. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240501175127.34301-1-krzysztof.kozlowski@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-01net: dsa: Remove adjust_link pathsFlorian Fainelli
Now that we no longer any drivers using PHYLIB's adjust_link callback, remove all paths that made use of adjust_link as well as the associated functions. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240430164816.2400606-3-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01net: dsa: Remove fixed_link_update memberFlorian Fainelli
We have not had a switch driver use a fixed_link_update callback since 58d56fcc3964f9be0a9ca42fd126bcd9dc7afc90 ("net: dsa: bcm_sf2: Get rid of PHYLIB functions") remove this callback. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240430164816.2400606-2-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01net: Protect dev->name by seqlock.Kuniyuki Iwashima
We will convert ioctl(SIOCGARP) to RCU, and then we need to copy dev->name which is currently protected by rtnl_lock(). This patch does the following: 1) Add seqlock netdev_rename_lock to protect dev->name 2) Add netdev_copy_name() that copies dev->name to buffer under netdev_rename_lock 3) Use netdev_copy_name() in netdev_get_name() and drop devnet_rename_sem Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89iJEWs7AYSJqGCUABeVqOCTkErponfZdT5kV-iD=-SajnQ@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240430015813.71143-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01kunit/fortify: Fix replaced failure path to unbreak __alloc_sizeKees Cook
The __alloc_size annotation for kmemdup() was getting disabled under KUnit testing because the replaced fortify_panic macro implementation was using "return NULL" as a way to survive the sanity checking. But having the chance to return NULL invalidated __alloc_size, so kmemdup was not passing the __builtin_dynamic_object_size() tests any more: [23:26:18] [PASSED] fortify_test_alloc_size_kmalloc_const [23:26:19] # fortify_test_alloc_size_kmalloc_dynamic: EXPECTATION FAILED at lib/fortify_kunit.c:265 [23:26:19] Expected __builtin_dynamic_object_size(p, 1) == expected, but [23:26:19] __builtin_dynamic_object_size(p, 1) == -1 (0xffffffffffffffff) [23:26:19] expected == 11 (0xb) [23:26:19] __alloc_size() not working with __bdos on kmemdup("hello there", len, gfp) [23:26:19] [FAILED] fortify_test_alloc_size_kmalloc_dynamic Normal builds were not affected: __alloc_size continued to work there. Use a zero-sized allocation instead, which allows __alloc_size to behave. Fixes: 4ce615e798a7 ("fortify: Provide KUnit counters for failure testing") Fixes: fa4a3f86d498 ("fortify: Add KUnit tests for runtime overflows") Link: https://lore.kernel.org/r/20240501232937.work.532-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-05-01cifs: Implement netfslib hooksDavid Howells
Provide implementation of the netfslib hooks that will be used by netfslib to ask cifs to set up and perform operations. Of particular note are (*) cifs_clamp_length() - This is used to negotiate the size of the next subrequest in a read request, taking into account the credit available and the rsize. The credits are attached to the subrequest. (*) cifs_req_issue_read() - This is used to issue a subrequest that has been set up and clamped. (*) cifs_prepare_write() - This prepares to fill a subrequest by picking a channel, reopening the file and requesting credits so that we can set the maximum size of the subrequest and also sets the maximum number of segments if we're doing RDMA. (*) cifs_issue_write() - This releases any unneeded credits and issues an asynchronous data write for the contiguous slice of file covered by the subrequest. This should possibly be folded in to all ->async_writev() ops and that called directly. (*) cifs_begin_writeback() - This gets the cached writable handle through which we do writeback (this does not affect writethrough, unbuffered or direct writes). At this point, cifs is not wired up to actually *use* netfslib; that will be done in a subsequent patch. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-05-01netfs, afs: Use writeback retry to deal with alternate keysDavid Howells
Use a hook in the new writeback code's retry algorithm to rotate the keys once all the outstanding subreqs have failed rather than doing it separately on each subreq. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Miscellaneous tidy upsDavid Howells
Do a couple of miscellaneous tidy ups: (1) Add a qualifier into a file banner comment. (2) Put the writeback folio traces back into alphabetical order. (3) Remove some unused folio traces. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Cut over to using new writeback codeDavid Howells
Cut over to using the new writeback code. The old code is #ifdef'd out or otherwise removed from compilation to avoid conflicts and will be removed in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs, 9p: Implement helpers for new write codeDavid Howells
Implement the helpers for the new write code in 9p. There's now an optional ->prepare_write() that allows the filesystem to set the parameters for the next write, such as maximum size and maximum segment count, and an ->issue_write() that is called to initiate an (asynchronous) write operation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: New writeback implementationDavid Howells
The current netfslib writeback implementation creates writeback requests of contiguous folio data and then separately tiles subrequests over the space twice, once for the server and once for the cache. This creates a few issues: (1) Every time there's a discontiguity or a change between writing to only one destination or writing to both, it must create a new request. This makes it harder to do vectored writes. (2) The folios don't have the writeback mark removed until the end of the request - and a request could be hundreds of megabytes. (3) In future, I want to support a larger cache granularity, which will require aggregation of some folios that contain unmodified data (which only need to go to the cache) and some which contain modifications (which need to be uploaded and stored to the cache) - but, currently, these are treated as discontiguous. There's also a move to get everyone to use writeback_iter() to extract writable folios from the pagecache. That said, currently writeback_iter() has some issues that make it less than ideal: (1) there's no way to cancel the iteration, even if you find a "temporary" error that means the current folio and all subsequent folios are going to fail; (2) there's no way to filter the folios being written back - something that will impact Ceph with it's ordered snap system; (3) and if you get a folio you can't immediately deal with (say you need to flush the preceding writes), you are left with a folio hanging in the locked state for the duration, when really we should unlock it and relock it later. In this new implementation, I use writeback_iter() to pump folios, progressively creating two parallel, but separate streams and cleaning up the finished folios as the subrequests complete. Either or both streams can contain gaps, and the subrequests in each stream can be of variable size, don't need to align with each other and don't need to align with the folios. Indeed, subrequests can cross folio boundaries, may cover several folios or a folio may be spanned by multiple folios, e.g.: +---+---+-----+-----+---+----------+ Folios: | | | | | | | +---+---+-----+-----+---+----------+ +------+------+ +----+----+ Upload: | | |.....| | | +------+------+ +----+----+ +------+------+------+------+------+ Cache: | | | | | | +------+------+------+------+------+ The progressive subrequest construction permits the algorithm to be preparing both the next upload to the server and the next write to the cache whilst the previous ones are already in progress. Throttling can be applied to control the rate of production of subrequests - and, in any case, we probably want to write them to the server in ascending order, particularly if the file will be extended. Content crypto can also be prepared at the same time as the subrequests and run asynchronously, with the prepped requests being stalled until the crypto catches up with them. This might also be useful for transport crypto, but that happens at a lower layer, so probably would be harder to pull off. The algorithm is split into three parts: (1) The issuer. This walks through the data, packaging it up, encrypting it and creating subrequests. The part of this that generates subrequests only deals with file positions and spans and so is usable for DIO/unbuffered writes as well as buffered writes. (2) The collector. This asynchronously collects completed subrequests, unlocks folios, frees crypto buffers and performs any retries. This runs in a work queue so that the issuer can return to the caller for writeback (so that the VM can have its kswapd thread back) or async writes. (3) The retryer. This pauses the issuer, waits for all outstanding subrequests to complete and then goes through the failed subrequests to reissue them. This may involve reprepping them (with cifs, the credits must be renegotiated, and a subrequest may need splitting), and doing RMW for content crypto if there's a conflicting change on the server. [!] Note that some of the functions are prefixed with "new_" to avoid clashes with existing functions. These will be renamed in a later patch that cuts over to the new algorithm. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Switch to using unsigned long long rather than loff_tDavid Howells
Switch to using unsigned long long rather than loff_t in netfslib to avoid problems with the sign flipping in the maths when we're dealing with the byte at position 0x7fffffffffffffff. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: netfs@lists.linux.dev cc: ceph-devel@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Use mempools for allocating requests and subrequestsDavid Howells
Use mempools for allocating requests and subrequests in an effort to make sure that allocation always succeeds so that when performing writeback we can always make progress. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-05-01netfs: Remove ->launder_folio() supportDavid Howells
Remove support for ->launder_folio() from netfslib and expect filesystems to use filemap_invalidate_inode() instead. netfs_launder_folio() can then be got rid of. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <sfrench@samba.org> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: devel@lists.orangefs.org
2024-05-01mm: Provide a means of invalidation without using launder_folioDavid Howells
Implement a replacement for launder_folio. The key feature of invalidate_inode_pages2() is that it locks each folio individually, unmaps it to prevent mmap'd accesses interfering and calls the ->launder_folio() address_space op to flush it. This has problems: firstly, each folio is written individually as one or more small writes; secondly, adjacent folios cannot be added so easily into the laundry; thirdly, it's yet another op to implement. Instead, use the invalidate lock to cause anyone wanting to add a folio to the inode to wait, then unmap all the folios if we have mmaps, then, conditionally, use ->writepages() to flush any dirty data back and then discard all pages. The invalidate lock prevents ->read_iter(), ->write_iter() and faulting through mmap all from adding pages for the duration. This is then used from netfslib to handle the flusing in unbuffered and direct writes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Christoph Hellwig <hch@lst.de> cc: Andrew Morton <akpm@linux-foundation.org> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: devel@lists.orangefs.org
2024-05-01Merge remote-tracking branch 'cxl/for-6.10/cper' into cxl-for-nextDave Jiang
Add support to send CPER records to CXL for more detailed parsing.
2024-05-01acpi/ghes: Process CXL Component EventsIra Weiny
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then inform the OS of these events via UEFI. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference lies in the use of a GUID as the CPER Section Type which matches the UUID defined in CXL 3.1 Table 8-43. Currently a configuration such as this will trace a non standard event in the log omitting useful details of the event. In addition the CXL sub-system contains additional region and HPA information useful to the user.[0] The CXL code is required to be called from process context as it needs to take a device lock. The GHES code may be in interrupt context. This complicated the use of a callback. Dan Williams suggested the use of work items as an atomic way of switching between the callback execution and a default handler.[1] The use of a kfifo simplifies queue processing by providing lock free fifo operations. cxl_cper_kfifo_get() allows easier management of the kfifo between the ghes and cxl modules. CXL 3.1 Table 8-127 requires a device to have a queue depth of 1 for each of the four event logs. A combined queue depth of 32 is chosen to provide room for 8 entries of each log type. Add GHES support to detect CXL CPER records. Add the ability for the CXL sub-system to register a work queue to process the events. This patch adds back the functionality which was removed to fix the report by Dan Carpenter[2]. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Link: http://lore.kernel.org/r/cover.1711598777.git.alison.schofield@intel.com [0] Link: http://lore.kernel.org/r/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch [1] Link: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20240426-cxl-cper3-v4-1-58076cce1624@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01Merge tag 'asoc-fix-v6.9-rc6' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.9 This is much larger than is ideal, partly due to your holiday but also due to several vendors having come in with relatively large fixes at similar times. It's all driver specific stuff. The meson fixes from Jerome fix some rare timing issues with blocking operations happening in triggers, plus the continuous clock support which fixes clocking for some platforms. The SOF series from Peter builds to the fix to avoid spurious resets of ChainDMA which triggered errors in cleanup paths with both PulseAudio and PipeWire, and there's also some simple new debugfs files from Pierre which make support a lot eaiser.
2024-05-01Merge tag 'regulator-fix-v6.9-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "There's a few simple driver specific fixes here, plus some core cleanups from Matti which fix issues found with client drivers due to the API being confusing. The two fixes for the stubs provide more constructive behaviour with !REGULATOR configurations, issues were noticed with some hwmon drivers which would otherwise have needed confusing bodges in the users. The irq_helpers fix to duplicate the provided name for the interrupt controller was found because a driver got this wrong and it's again a case where the core is the sensible place to put the fix" * tag 'regulator-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: change devm_regulator_get_enable_optional() stub to return Ok regulator: change stubbed devm_regulator_get_enable to return Ok regulator: vqmmc-ipq4019: fix module autoloading regulator: qcom-refgen: fix module autoloading regulator: mt6360: De-capitalize devicetree regulator subnodes regulator: irq_helpers: duplicate IRQ name
2024-05-01KVM: arm64: Simplify vgic-v3 hypercallsMarc Zyngier
Consolidate the GICv3 VMCR accessor hypercalls into the APR save/restore hypercalls so that all of the EL2 GICv3 state is covered by a single pair of hypercalls. Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-17-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>