summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-09-01mm: kvmalloc: align kvrealloc() with krealloc()Danilo Krummrich
Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: vmalloc: implement vrealloc()Danilo Krummrich
Patch series "Align kvrealloc() with krealloc()", v2. Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. In order to be able to get rid of kvrealloc()'s oldsize parameter, introduce vrealloc() and make use of it in kvrealloc(). Making use of vrealloc() in kvrealloc() also provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. Besides the above, those functions are required by Rust's allocator abstractons [1] (rework based on this series in [2]). With `Vec` or `KVec` respectively, potentially growing (and shrinking) data structures are rather common. [1] https://lore.kernel.org/lkml/20240704170738.3621-1-dakr@redhat.com/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/dakr/linux.git/log/?h=rust/mm This patch (of 2): Implement vrealloc() analogous to krealloc(). Currently, krealloc() requires the caller to pass the size of the previous memory allocation, which, instead, should be self-contained. We attempt to fix this in a subsequent patch which, in order to do so, requires vrealloc(). Besides that, we need realloc() functions for kernel allocators in Rust too. With `Vec` or `KVec` respectively, potentially growing (and shrinking) data structures are rather common. [dakr@kernel.org: fix missing nommu implementation] Link: https://lkml.kernel.org/r/20240725141227.13954-1-dakr@kernel.org [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: consider spare memory for __GFP_ZERO] Link: https://lkml.kernel.org/r/20240730185049.6244-3-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-4-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-1-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-2-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: add node_reclaim successes to VM event countersMatthew Cassell
/proc/vmstat currently shows the number of node_reclaim() failures when vm.zone_reclaim_mode is set appropriately. It would be convenient to have the number of successes right next to zone_reclaim_failed (similar to compaction and migration). While just a trivially addition to the vmstat file. It was helpful during benchmarking to not have to probe node_reclaim() to observe the success/failure ratio. Link: https://lkml.kernel.org/r/20240722171316.7517-1-mcassell411@gmail.com Signed-off-by: Matthew Cassell <mcassell411@gmail.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01Merge tag 'x86-urgent-2024-09-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even when the state is X2APIC_ON_LOCKED, which prevents the kernel to disable it thereby creating inconsistent state. Reorder the logic so it actually works correctly - The XSTATE logic for handling LBR is incorrect as it assumes that XSAVES supports LBR when the CPU supports LBR. In fact both conditions need to be true. Otherwise the enablement of LBR in the IA32_XSS MSR fails and subsequently the machine crashes on the next XRSTORS operation because IA32_XSS is not initialized. Cache the XSTATE support bit during init and make the related functions use this cached information and the LBR CPU feature bit to cure this. - Cure a long standing bug in KASLR KASLR uses the full address space between PAGE_OFFSET and vaddr_end to randomize the starting points of the direct map, vmalloc and vmemmap regions. It thereby limits the size of the direct map by using the installed memory size plus an extra configurable margin for hot-plug memory. This limitation is done to gain more randomization space because otherwise only the holes between the direct map, vmalloc, vmemmap and vaddr_end would be usable for randomizing. The limited direct map size is not exposed to the rest of the kernel, so the memory hot-plug and resource management related code paths still operate under the assumption that the available address space can be determined with MAX_PHYSMEM_BITS. request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1 downwards. That means the first allocation happens past the end of the direct map and if unlucky this address is in the vmalloc space, which causes high_memory to become greater than VMALLOC_START and consequently causes iounmap() to fail for valid ioremap addresses. Cure this by exposing the end of the direct map via PHYSMEM_END and use that for the memory hot-plug and resource management related places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END maps to a variable which is initialized by the KASLR initialization and otherwise it is based on MAX_PHYSMEM_BITS as before. - Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of an initialized variabled on the stack to the VMM. The variable is only required as output value, so it does not have to exposed to the VMM in the first place. - Prevent an array overrun in the resource control code on systems with Sub-NUMA Clustering enabled because the code failed to adjust the index by the number of SNC nodes per L3 cache. * tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Fix arch_mbm_* array overrun on SNC x86/tdx: Fix data leak in mmio_read() x86/kaslr: Expose and use the end of the physical memory address space x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported x86/apic: Make x2apic_disable() work correctly
2024-09-01SUNRPC: make various functions static, or not exported.NeilBrown
Various functions are only used within the sunrpc module, and several are only use in the one file. So clean up: These are marked static, and any EXPORT is removed. svc_rcpb_setup() svc_rqst_alloc() svc_rqst_free() - also moved before first use svc_rpcbind_set_version() svc_drop() - also moved to svc.c These are now not EXPORTed, but are not static. svc_authenticate() svc_sock_update_bufs() Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01lockd: discard nlmsvc_timeoutNeilBrown
nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use that in the one place that nlmsvc_timeout is used. In truth it *might* not always be the same as nlmsvc_timeout is only set when lockd is started while nlm_timeout can be set at anytime via sysctl. I think this difference it not helpful so removing it is good. Also remove the test for nlm_timout being 0. This is not possible - unless a module parameter is used to set the minimum timeout to 0, and if that happens then it probably should be honoured. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01NFS: trace: show TIMEDOUT instead of 0x6eChen Hanxiao
__nfs_revalidate_inode may return ETIMEDOUT. print symbol of ETIMEDOUT in nfs trace: before: cat-5191 [005] 119.331127: nfs_revalidate_inode_exit: error=-110 (0x6e) after: cat-1738 [004] 44.365509: nfs_revalidate_inode_exit: error=-110 (TIMEDOUT) Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-09-01PCI: endpoint: Assign PCI domain number for endpoint controllersManivannan Sadhasivam
Right now, PCI endpoint subsystem doesn't assign PCI domain number for the PCI endpoint controllers. But this domain number could be useful to the EPC drivers to uniquely identify each controller based on the hardware instance when there are multiple ones present in an SoC (even multiple RC/EP). So let's make use of the existing pci_bus_find_domain_nr() API to allocate domain numbers based on either devicetree (linux,pci-domain) property or dynamic domain number allocation scheme. It should be noted that the domain number allocated by this API will be based on both RC and EP controllers in a SoC. If the 'linux,pci-domain' DT property is present, then the domain number represents the actual hardware instance of the PCI endpoint controller. If not, then the domain number will be allocated based on the PCI EP/RC controller probe order. If the architecture doesn't support CONFIG_PCI_DOMAINS_GENERIC (rare), then currently a warning is thrown to indicate that the architecture specific implementation is needed. Link: https://lore.kernel.org/linux-pci/20240828-pci-qcom-hotplug-v4-5-263a385fbbcb@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2024-09-01Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - copy_file_range fix - two read fixes including read past end of file rc fix and read retry crediting fix - falloc zero range fix * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region cifs: Fix copy offload to flush destination region netfs, cifs: Fix handling of short DIO read cifs: Fix lack of credit renegotiation on read retry
2024-09-01Merge tag 'arm-fixes-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There is a fairly large number of bug fixes for Qualcomm platforms, most of them addressing issues with the devicetree files for the newly added Snapdragon X1 based laptops to make them more reliable. The Qualcomm driver changes address a few build-time issues as well as runtime problems in the tzmem and scm firmware, the USB Type-C driver, and the cmd-db and pmic_glink soc drivers. The NXP i.MX usually gets a bunch of devicetree fixes that is proportional to the number of supported machines. This includes both warning fixes and correctness for the 64-bit i.MX9, i.MX8 and layerscape platforms, as well as a single fix for a 32-bit i.MX6 based board. The other changes are the usual minor changes, including an update to the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs (risc-v)" * tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits) firmware: microchip: fix incorrect error report of programming:timeout on success soc: qcom: pd-mapper: Fix singleton refcount firmware: qcom: tzmem: disable sdm670 platform soc: qcom: pmic_glink: Actually communicate when remote goes down usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Fix race during initialization firmware: qcom: qseecom: remove unused functions firmware: qcom: tzmem: fix virtual-to-physical address conversion firmware: qcom: scm: Mark get_wq_ctx() as atomic call arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt arm64: dts: qcom: disable GPU on x1e80100 by default arm64: dts: imx8mm-phygate: fix typo pinctrcl-0 arm64: dts: imx95: correct L3Cache cache-sets arm64: dts: imx95: correct a55 power-domains arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design arm64: dts: imx93: update default value for snps,clk-csr arm64: dts: freescale: tqma9352: Fix watchdog reset arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962 ...
2024-08-31xfrm: Unmask upper DSCP bits in xfrm_get_tos()Ido Schimmel
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-31ipv4: Unmask upper DSCP bits in get_rttos()Ido Schimmel
The function is used by a few socket types to retrieve the TOS value with which to perform the FIB lookup for packets sent through the socket (flowi4_tos). If a DS field was passed using the IP_TOS control message, then it is used. Otherwise the one specified via the IP_TOS socket option. Unmask the upper DSCP bits so that in the future the lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-31ipv4: Unmask upper DSCP bits in ip_sock_rt_tos()Ido Schimmel
The function is used to read the DS field that was stored in IPv4 sockets via the IP_TOS socket option so that it could be used to initialize the flowi4_tos field before resolving an output route. Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-30Revert "Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE"Luiz Augusto von Dentz
This reverts commit 59b047bc98084f8af2c41483e4d68a5adf2fa7f7 which breaks compatibility with commands like: bluetoothd[46328]: @ MGMT Command: Load.. (0x0013) plen 74 {0x0001} [hci0] Keys: 2 BR/EDR Address: C0:DC:DA:A5:E5:47 (Samsung Electronics Co.,Ltd) Key type: Authenticated key from P-256 (0x03) Central: 0x00 Encryption size: 16 Diversifier[2]: 0000 Randomizer[8]: 0000000000000000 Key[16]: 6ed96089bd9765be2f2c971b0b95f624 LE Address: D7:2A:DE:1E:73:A2 (Static) Key type: Unauthenticated key from P-256 (0x02) Central: 0x00 Encryption size: 16 Diversifier[2]: 0000 Randomizer[8]: 0000000000000000 Key[16]: 87dd2546ededda380ffcdc0a8faa4597 @ MGMT Event: Command Status (0x0002) plen 3 {0x0001} [hci0] Load Long Term Keys (0x0013) Status: Invalid Parameters (0x0d) Cc: stable@vger.kernel.org Link: https://github.com/bluez/bluez/issues/875 Fixes: 59b047bc9808 ("Bluetooth: MGMT/SMP: Fix address type when using SMP over BREDR/LE") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-08-30Bluetooth: hci_sync: Introduce hci_cmd_sync_run/hci_cmd_sync_run_onceLuiz Augusto von Dentz
This introduces hci_cmd_sync_run/hci_cmd_sync_run_once which acts like hci_cmd_sync_queue/hci_cmd_sync_queue_once but runs immediately when already on hdev->cmd_sync_work context. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-08-30ieee802154: Correct spelling in nl802154.hSimon Horman
Correct spelling in nl802154.h. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Message-ID: <20240829-wpan-spell-v1-2-799d840e02c4@kernel.org> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-08-30mac802154: Correct spelling in mac802154.hSimon Horman
Correct spelling in mac802154.h. As reported by codespell. Signed-off-by: Simon Horman <horms@kernel.org> Message-ID: <20240829-wpan-spell-v1-1-799d840e02c4@kernel.org> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2024-08-30cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1Chen Ridong
This patch introduces CONFIG_CPUSETS_V1 and guard cpuset-v1 code under CONFIG_CPUSETS_V1. The default value of CONFIG_CPUSETS_V1 is N, so that user who adopted v2 don't have 'pay' for cpuset v1. Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-30dma-buf: Split out dma fence array create into alloc and arm functionsMatthew Brost
Useful to preallocate dma fence array and then arm in path of reclaim or a dma fence. v2: - s/arm/init (Christian) - Drop !array warn (Christian) v3: - Fix kernel doc typos (dim) Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826170144.2492062-2-matthew.brost@intel.com
2024-08-30icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netnsEric Dumazet
Previous patch made ICMP rate limits per netns, it makes sense to allow each netns to change the associated sysctl. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240829144641.3880376-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-30icmp: move icmp_global.credit and icmp_global.stamp to per netns storageEric Dumazet
Host wide ICMP ratelimiter should be per netns, to provide better isolation. Following patch in this series makes the sysctl per netns. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20240829144641.3880376-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-30icmp: change the order of rate limitsEric Dumazet
ICMP messages are ratelimited : After the blamed commits, the two rate limiters are applied in this order: 1) host wide ratelimit (icmp_global_allow()) 2) Per destination ratelimit (inetpeer based) In order to avoid side-channels attacks, we need to apply the per destination check first. This patch makes the following change : 1) icmp_global_allow() checks if the host wide limit is reached. But credits are not yet consumed. This is deferred to 3) 2) The per destination limit is checked/updated. This might add a new node in inetpeer tree. 3) icmp_global_consume() consumes tokens if prior operations succeeded. This means that host wide ratelimit is still effective in keeping inetpeer tree small even under DDOS. As a bonus, I removed icmp_global.lock as the fast path can use a lock-free operation. Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Reported-by: Keyu Man <keyu.man@email.ucr.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240829144641.3880376-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-31Merge tag 'iommu-fixes-v6.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Fix a device-stall problem in bad io-page-fault setups (faults received from devices with no supporting domain attached). - Context flush fix for Intel VT-d. - Do not allow non-read+non-write mapping through iommufd as most implementations can not handle that. - Fix a possible infinite-loop issue in map_pages() path. - Add Jean-Philippe as reviewer for SMMUv3 SVA support * tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer iommu: Do not return 0 from map_pages if it doesn't do anything iommufd: Do not allow creating areas without READ or WRITE iommu/vt-d: Fix incorrect domain ID in context flush helper iommu: Handle iommu faults for a bad iopf setup
2024-08-30ARM: OMAP2+: Remove obsoleted declaration for gpmc_onenand_initGaosheng Cui
The gpmc_onenand_init() have been removed since commit 2514830b8b8c ("ARM: OMAP2+: Remove gpmc-onenand"), and now it is useless, so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20240826035823.4043171-1-cuigaosheng1@huawei.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2024-08-30drm/msm: Expose expanded UBWC config uapiConnor Abbott
This adds extra parameters that affect UBWC tiling that will be used by the Mesa implementation of VK_EXT_host_image_copy. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Patchwork: https://patchwork.freedesktop.org/patch/607401/ Signed-off-by: Rob Clark <robdclark@chromium.org>
2024-08-30fbdev: Introduce devm_register_framebuffer()Thomas Weißschuh
Introduce a device-managed variant of register_framebuffer() which automatically unregisters the framebuffer on device destruction. This can simplify the error handling and resource management in drivers. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Helge Deller <deller@gmx.de>
2024-08-30arm64: smccc: Reserve block of KVM "vendor" services for pKVM hypercallsWill Deacon
pKVM relies on hypercalls to expose services such as memory sharing to protected guests. Tentatively allocate a block of 58 hypercalls (i.e. fill the remaining space in the first 64 function IDs) for pKVM usage, as future extensions such as pvIOMMU support, range-based memory sharing and validation of assigned devices will require additional services. Suggested-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/86a5h5yg5y.wl-maz@kernel.org Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-8-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30drivers/virt: pkvm: Intercept ioremap using pKVM MMIO_GUARD hypercallWill Deacon
Hook up pKVM's MMIO_GUARD hypercall so that ioremap() and friends will register the target physical address as MMIO with the hypervisor, allowing guest exits to that page to be emulated by the host with full syndrome information. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-7-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30drivers/virt: pkvm: Hook up mem_encrypt API using pKVM hypercallsWill Deacon
If we detect the presence of pKVM's SHARE and UNSHARE hypercalls, then register a backend implementation of the mem_encrypt API so that things like DMA buffers can be shared appropriately with the host. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-5-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30drivers/virt: pkvm: Add initial support for running as a protected guestWill Deacon
Implement a pKVM protected guest driver to probe the presence of pKVM and determine the memory protection granule using the HYP_MEMINFO hypercall. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240830130150.8568-3-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30regulator: core: Stub devm_regulator_bulk_get_const() if !CONFIG_REGULATORDouglas Anderson
When adding devm_regulator_bulk_get_const() I missed adding a stub for when CONFIG_REGULATOR is not enabled. Under certain conditions (like randconfig testing) this can cause the compiler to reports errors like: error: implicit declaration of function 'devm_regulator_bulk_get_const'; did you mean 'devm_regulator_bulk_get_enable'? Add the stub. Fixes: 1de452a0edda ("regulator: core: Allow drivers to define their init data as const") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408301813.TesFuSbh-lkp@intel.com/ Cc: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20240830073511.1.Ib733229a8a19fad8179213c05e1af01b51e42328@changeid Signed-off-by: Mark Brown <broonie@kernel.org>
2024-08-30fs: drop GFP_NOFAIL mode from alloc_page_buffersMichal Hocko
There is only one called of alloc_page_buffers and it doesn't require __GFP_NOFAIL so drop this allocation mode. Signed-off-by: Michal Hocko <mhocko@suse.com> Link: https://lore.kernel.org/r/20240829130640.1397970-1-mhocko@kernel.org Acked-by: Song Liu <song@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30iommu: Allow ATS to work on VFs when the PF uses IDENTITYJason Gunthorpe
PCI ATS has a global Smallest Translation Unit field that is located in the PF but shared by all of the VFs. The expectation is that the STU will be set to the root port's global STU capability which is driven by the IO page table configuration of the iommu HW. Today it becomes set when the iommu driver first enables ATS. Thus, to enable ATS on the VF, the PF must have already had the correct STU programmed, even if ATS is off on the PF. Unfortunately the PF only programs the STU when the PF enables ATS. The iommu drivers tend to leave ATS disabled when IDENTITY translation is being used. Thus we can get into a state where the PF is setup to use IDENTITY with the DMA API while the VF would like to use VFIO with a PAGING domain and have ATS turned on. This fails because the PF never loaded a PAGING domain and so it never setup the STU, and the VF can't do it. The simplest solution is to have the iommu driver set the ATS STU when it probes the device. This way the ATS STU is loaded immediately at boot time to all PFs and there is no issue when a VF comes to use it. Add a new call pci_prepare_ats() which should be called by iommu drivers in their probe_device() op for every PCI device if the iommu driver supports ATS. This will setup the STU based on whatever page size capability the iommu HW has. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-0fb4d2ab6770+7e706-ats_vf_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-08-30drivers/perf: arm_spe: Use perf_allow_kernel() for permissionsJames Clark
Use perf_allow_kernel() for 'pa_enable' (physical addresses), 'pct_enable' (physical timestamps) and context IDs. This means that perf_event_paranoid is now taken into account and LSM hooks can be used, which is more consistent with other perf_event_open calls. For example PERF_SAMPLE_PHYS_ADDR uses perf_allow_kernel() rather than just perfmon_capable(). This also indirectly fixes the following error message which is misleading because perf_event_paranoid is not taken into account by perfmon_capable(): $ perf record -e arm_spe/pa_enable/ Error: Access to performance monitoring and observability operations is limited. Consider adjusting /proc/sys/kernel/perf_event_paranoid setting ... Suggested-by: Al Grant <al.grant@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240827145113.1224604-1-james.clark@linaro.org Link: https://lore.kernel.org/all/20240807120039.GD37996@noisy.programming.kicks-ass.net/ Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30mfd: ds1wm: Remove remaining header fileWilken Gottwalt
The driver was removed after kernel 6.2 rendering the header file unused. Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net> Link: https://lore.kernel.org/r/ZpTbGHb6EX2Oe7ok@monster.localdomain Signed-off-by: Lee Jones <lee@kernel.org>
2024-08-30mfd: 88pm80x: Constify read-only regmap structsJavier Carrasco
`pm800_irq`, `pm805_irq` and `pm805_irq_chip` are not modified and can be declared as const to move their data to a read-only section. In order to keep the const modifier for the regmap_irq_chip structures, the pointer used to reference them must be converted to const as well. Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://lore.kernel.org/r/20240704-mfd-const-regmap_config-v2-8-0c8785b1331d@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-08-30Merge branches 'ib-mfd-for-iio-power-6.12' and 'ib-mfd-gpio-pwm-6.12' into ↵Lee Jones
ibs-for-mfd-merged
2024-08-30writeback: Refine the show_inode_state() macro definitionJulian Sun
Currently, the show_inode_state() macro only prints part of the state of inode->i_state. Let’s improve it to display more of its state. Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Link: https://lore.kernel.org/r/20240828081359.62429-1-sunjunchao2870@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30inode: make i_state a u32Christian Brauner
Now that we use the wait var event mechanism make i_state a u32 and free up 4 bytes. This means we currently have two 4 byte holes in struct inode which we can pack. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-6-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30inode: port __I_NEW to var eventChristian Brauner
Port the __I_NEW mechanism to use the new var event mechanism. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-4-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: reorder i_state bitsChristian Brauner
so that we can use the first bits to derive unique addresses from i_state. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-2-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: add i_state helpersChristian Brauner
The i_state member is an unsigned long so that it can be used with the wait bit infrastructure which expects unsigned long. This wastes 4 bytes which we're unlikely to ever use. Switch to using the var event wait mechanism using the address of the bit. Thanks to Linus for the address idea. Link: https://lore.kernel.org/r/20240823-work-i_state-v3-1-5cd5fd207a57@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: s/__u32/u32/ for s_fsnotify_maskChristian Brauner
The underscore variants are for uapi whereas the non-underscore variants are for in-kernel consumers. Link: https://lore.kernel.org/r/20240822-anwerben-nutzung-1cd6c82a565f@brauner Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: remove unused path_put_init()Christian Brauner
This helper has been unused for a while now. Link: https://lore.kernel.org/r/20240822-bewuchs-werktag-46672b3c0606@brauner Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30inode: remove __I_DIO_WAKEUPChristian Brauner
Afaict, we can just rely on inode->i_dio_count for waiting instead of this awkward indirection through __I_DIO_WAKEUP. This survives LTP dio and xfstests dio tests. Link: https://lore.kernel.org/r/20240816-vfs-misc-dio-v1-1-80fe21a2c710@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30autofs: add per dentry expire timeoutIan Kent
Add ability to set per-dentry mount expire timeout to autofs. There are two fairly well known automounter map formats, the autofs format and the amd format (more or less System V and Berkley). Some time ago Linux autofs added an amd map format parser that implemented a fair amount of the amd functionality. This was done within the autofs infrastructure and some functionality wasn't implemented because it either didn't make sense or required extra kernel changes. The idea was to restrict changes to be within the existing autofs functionality as much as possible and leave changes with a wider scope to be considered later. One of these changes is implementing the amd options: 1) "unmount", expire this mount according to a timeout (same as the current autofs default). 2) "nounmount", don't expire this mount (same as setting the autofs timeout to 0 except only for this specific mount) . 3) "utimeout=<seconds>", expire this mount using the specified timeout (again same as setting the autofs timeout but only for this mount). To implement these options per-dentry expire timeouts need to be implemented for autofs indirect mounts. This is because all map keys (mounts) for autofs indirect mounts use an expire timeout stored in the autofs mount super block info. structure and all indirect mounts use the same expire timeout. Now I have a request to add the "nounmount" option so I need to add the per-dentry expire handling to the kernel implementation to do this. The implementation uses the trailing path component to identify the mount (and is also used as the autofs map key) which is passed in the autofs_dev_ioctl structure path field. The expire timeout is passed in autofs_dev_ioctl timeout field (well, of the timeout union). If the passed in timeout is equal to -1 the per-dentry timeout and flag are cleared providing for the "unmount" option. If the timeout is greater than or equal to 0 the timeout is set to the value and the flag is also set. If the dentry timeout is 0 the dentry will not expire by timeout which enables the implementation of the "nounmount" option for the specific mount. When the dentry timeout is greater than zero it allows for the implementation of the "utimeout=<seconds>" option. Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/20240814090231.963520-1-raven@themaw.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30fs: move FMODE_UNSIGNED_OFFSET to fop_flagsChristian Brauner
This is another flag that is statically set and doesn't need to use up an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit. (1) mem_open() used from proc_mem_operations (2) adi_open() used from adi_fops (3) drm_open_helper(): (3.1) accel_open() used from DRM_ACCEL_FOPS (3.2) drm_open() used from (3.2.1) amdgpu_driver_kms_fops (3.2.2) psb_gem_fops (3.2.3) i915_driver_fops (3.2.4) nouveau_driver_fops (3.2.5) panthor_drm_driver_fops (3.2.6) radeon_driver_kms_fops (3.2.7) tegra_drm_fops (3.2.8) vmwgfx_driver_fops (3.2.9) xe_driver_fops (3.2.10) DRM_GEM_FOPS (3.2.11) DEFINE_DRM_GEM_DMA_FOPS (4) struct memdev sets fmode flags based on type of device opened. For devices using struct mem_fops unsigned offset is used. Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts into the open helper to ensure that the flag is always set. Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: only read fops once in fops_get/putMateusz Guzik
In do_dentry_open() the usage is: f->f_op = fops_get(inode->i_fop); In generated asm the compiler emits 2 reads from inode->i_fop instead of just one. This popped up due to false-sharing where loads from that offset end up bouncing a cacheline during parallel open. While this is going to be fixed, the spurious load does not need to be there. This makes do_dentry_open() go down from 1177 to 1154 bytes. fops_put() is patched to maintain some consistency. No functional changes. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240810064753.1211441-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: dodge smp_mb in break_lease and break_deleg in the common caseMateusz Guzik
These inlines show up in the fast path (e.g., in do_dentry_open()) and induce said full barrier regarding i_flctx access when in most cases the pointer is NULL. The pointer can be safely checked before issuing the barrier, dodging it in most cases as a result. It is plausible the consume fence would be sufficient, but I don't want to go audit all callers regarding what they before calling here. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240806172846.886570-1-mjguzik@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30Merge tag 'drm-intel-next-2024-08-29' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next Cross-driver (xe-core) Changes: - Require BMG scanout buffers to be 64k physically aligned (Maarten) Core (drm) Changes: - Introducing Xe2 ccs modifiers for integrated and discrete graphics (Juha-Pekka) Driver Changes: - General cleanup and more work moving towards intel_display isolation (Jani) - New display workaround (Suraj) - Use correct cp_irq_count on HDCP (Suraj) - eDP PSR fix when CRC is enabled (Jouni) - Fix DP MST state after a sink reset (Imre) - Fix Arrow Lake GSC firmware version (John) - Use chained DSBs for LUT programming (Ville) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZtCC0lJ0Zf3MoSdW@intel.com