summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2023-10-27perf: Add branch_sample_call_stackKan Liang
Add a helper function to check call stack sample type. The later patch will invoke the function in several places. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231025201626.3000228-3-kan.liang@linux.intel.com
2023-10-27perf: Add branch stack countersKan Liang
Currently, the additional information of a branch entry is stored in a u64 space. With more and more information added, the space is running out. For example, the information of occurrences of events will be added for each branch. Two places were suggested to append the counters. https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/ One place is right after the flags of each branch entry. It changes the existing struct perf_branch_entry. The later ARCH specific implementation has to be really careful to consistently pick the right struct. The other place is right after the entire struct perf_branch_stack. The disadvantage is that the pointer of the extra space has to be recorded. The common interface perf_sample_save_brstack() has to be updated. The latter is much straightforward, and should be easily understood and maintained. It is implemented in the patch. Add a new branch sample type, PERF_SAMPLE_BRANCH_COUNTERS, to indicate the event which is recorded in the branch info. The "u64 counters" may store the occurrences of several events. The information regarding the number of events/counters and the width of each counter should be exposed via sysfs as a reference for the perf tool. Define the branch_counter_nr and branch_counter_width ABI here. The support will be implemented later in the Intel-specific patch. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231025201626.3000228-1-kan.liang@linux.intel.com
2023-10-27cdx: add sysfs for subsystem, class and revisionAbhijit Gangurde
CDX controller provides subsystem vendor, subsystem device, class and revision info of the device along with vendor and device ID in native endian format. CDX Bus system uses this information to bind the cdx device to the cdx device driver. Co-developed-by: Puneet Gupta <puneet.gupta@amd.com> Signed-off-by: Puneet Gupta <puneet.gupta@amd.com> Co-developed-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Tested-by: Nikhil Agarwal <nikhil.agarwal@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-8-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27cdx: add support for bus enable and disableAbhijit Gangurde
CDX bus needs to be disabled before updating/writing devices in the FPGA. Once the devices are written, the bus shall be rescanned. This change provides sysfs entry to enable/disable the CDX bus. Co-developed-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Nipun Gupta <nipun.gupta@amd.com> Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-6-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27cdx: Register cdx bus as a device on cdx subsystemAbhijit Gangurde
While scanning for CDX devices, register newly discovered bus as a cdx device. CDX device attributes are visible based on device type. Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-5-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27cdx: Remove cdx controller list from cdx bus systemAbhijit Gangurde
Remove xarray list of cdx controller. Instead, use platform bus to locate the cdx controller using compat string used by cdx controller platform driver. Also, use ida to allocate a unique id for the controller. Signed-off-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Link: https://lore.kernel.org/r/20231017160505.10640-2-abhijit.gangurde@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27Revert "nvmem: add new config option"Rafał Miłecki
This reverts commit 517f14d9cf3533d5ab4fded195ab6f80a92e378f. Config option "no_of_node" is no longer needed since adding a more explicit and targeted option "add_legacy_fixed_of_cells". That "no_of_node" config option was needed *earlier* to help mtd's case. DT nodes of MTD partitions (that are also NVMEM devices) may contain subnodes. Those SHOULD NOT be treated as NVMEM fixed cells. To prevent NVMEM core code from parsing subnodes a "no_of_node" option was added (and set to true in mtd) to make for_each_child_of_node() in NVMEM a no-op. That was a bit hacky because it was messing with "of_node" pointer to achieve some side-effect. With the introduction of "add_legacy_fixed_of_cells" config option things got more explicit. MTD subsystem simply tells NVMEM when to look for fixed cells and there is no need to hack "of_node" pointer anymore. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20231023102759.31529-1-zajec5@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-27x509: Add OIDs for FIPS 202 SHA-3 hash and signaturesDimitri John Ledkov
Add OID for FIPS 202 SHA-3 family of hash functions, RSA & ECDSA signatures using those. Limit to 256 or larger sizes, for interoperability reasons. 224 is too weak for any practical uses. Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27crypto: ahash - remove support for nonzero alignmaskEric Biggers
Currently, the ahash API checks the alignment of all key and result buffers against the algorithm's declared alignmask, and for any unaligned buffers it falls back to manually aligned temporary buffers. This is virtually useless, however. First, since it does not apply to the message, its effect is much more limited than e.g. is the case for the alignmask for "skcipher". Second, the key and result buffers are given as virtual addresses and cannot (in general) be DMA'ed into, so drivers end up having to copy to/from them in software anyway. As a result it's easy to use memcpy() or the unaligned access helpers. The crypto_hash_walk_*() helper functions do use the alignmask to align the message. But with one exception those are only used for shash algorithms being exposed via the ahash API, not for native ahashes, and aligning the message is not required in this case, especially now that alignmask support has been removed from shash. The exception is the n2_core driver, which doesn't set an alignmask. In any case, no ahash algorithms actually set a nonzero alignmask anymore. Therefore, remove support for it from ahash. The benefit is that all the code to handle "misaligned" buffers in the ahash API goes away, reducing the overhead of the ahash API. This follows the same change that was made to shash. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27units: Add BYTES_PER_*BITDamian Muszynski
There is going to be a new user of the BYTES_PER_[K/M/G]BIT definition besides possibly existing ones. Add them to the header. Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Tero Kristo <tero.kristo@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27net: Add MDB get device operationIdo Schimmel
Add MDB net device operation that will be invoked by rtnetlink code in response to received RTM_GETMDB messages. Subsequent patches will implement the operation in the bridge and VXLAN drivers. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27Merge tag 'thunderbolt-for-v6.7-rc1' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next Mika writes: thunderbolt: Changes for v6.7 merge window This includes following USB4/Thunderbolt changes for the v6.7 merge window: - Configure asymmetric link if the DisplayPort bandwidth requires so - Enable path power management packet support for USB4 v2 routers - Make the bandwidth reservations to follow the USB4 v2 connection manager guide suggestions - DisplayPort tunneling improvements - Small cleanups and improvements around the driver. All these have been in linux-next with no reported issues. * tag 'thunderbolt-for-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (25 commits) thunderbolt: Fix one kernel-doc comment thunderbolt: Configure asymmetric link if needed and bandwidth allows thunderbolt: Add support for asymmetric link thunderbolt: Introduce tb_switch_depth() thunderbolt: Introduce tb_for_each_upstream_port_on_path() thunderbolt: Introduce tb_port_path_direction_downstream() thunderbolt: Set path power management packet support bit for USB4 v2 routers thunderbolt: Change bandwidth reservations to comply USB4 v2 thunderbolt: Make is_gen4_link() available to the rest of the driver thunderbolt: Use weight constants in tb_usb3_consumed_bandwidth() thunderbolt: Use constants for path weight and priority thunderbolt: Add DP IN added last in the head of the list of DP resources thunderbolt: Create multiple DisplayPort tunnels if there are more DP IN/OUT pairs thunderbolt: Log NVM version of routers and retimers thunderbolt: Use tb_tunnel_xxx() log macros in tb.c thunderbolt: Expose tb_tunnel_xxx() log macros to the rest of the driver thunderbolt: Use tb_tunnel_dbg() where possible to make logging more consistent thunderbolt: Fix typo of HPD bit for Hot Plug Detect thunderbolt: Fix typo in enum tb_link_width kernel-doc thunderbolt: Fix debug log when DisplayPort adapter not available for pairing ...
2023-10-27net/tcp: Wire TCP-AO to request socketsDmitry Safonov
Now when the new request socket is created from the listening socket, it's recorded what MKT was used by the peer. tcp_rsk_used_ao() is a new helper for checking if TCP-AO option was used to create the request socket. tcp_ao_copy_all_matching() will copy all keys that match the peer on the request socket, as well as preparing them for the usage (creating traffic keys). Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO sign to twskDmitry Safonov
Add support for sockets in time-wait state. ao_info as well as all keys are inherited on transition to time-wait socket. The lifetime of ao_info is now protected by ref counter, so that tcp_ao_destroy_sock() will destruct it only when the last user is gone. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Introduce TCP_AO setsockopt()sDmitry Safonov
Add 3 setsockopt()s: 1. TCP_AO_ADD_KEY to add a new Master Key Tuple (MKT) on a socket 2. TCP_AO_DEL_KEY to delete present MKT from a socket 3. TCP_AO_INFO to change flags, Current_key/RNext_key on a TCP-AO sk Userspace has to introduce keys on every socket it wants to use TCP-AO option on, similarly to TCP_MD5SIG/TCP_MD5SIG_EXT. RFC5925 prohibits definition of MKTs that would match the same peer, so do sanity checks on the data provided by userspace. Be as conservative as possible, including refusal of defining MKT on an established connection with no AO, removing the key in-use and etc. (1) and (2) are to be used by userspace key manager to add/remove keys. (3) main purpose is to set RNext_key, which (as prescribed by RFC5925) is the KeyID that will be requested in TCP-AO header from the peer to sign their segments with. At this moment the life of ao_info ends in tcp_v4_destroy_sock(). Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO config and structuresDmitry Safonov
Introduce new kernel config option and common structures as well as helpers to be used by TCP-AO code. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', ↵Joerg Roedel
'x86/amd', 'core' and 's390' into next
2023-10-26Merge tag 'wireless-next-2023-10-26' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.7 The third, and most likely the last, features pull request for v6.7. Fixes all over and only few small new features. Major changes: iwlwifi - more Multi-Link Operation (MLO) work ath12k - QCN9274: mesh support ath11k - firmware-2.bin container file format support * tag 'wireless-next-2023-10-26' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (155 commits) wifi: ray_cs: Remove unnecessary (void*) conversions Revert "wifi: ath11k: call ath11k_mac_fils_discovery() without condition" wifi: ath12k: Introduce and use ath12k_sta_to_arsta() wifi: ath12k: fix htt mlo-offset event locking wifi: ath12k: fix dfs-radar and temperature event locking wifi: ath11k: fix gtk offload status event locking wifi: ath11k: fix htt pktlog locking wifi: ath11k: fix dfs radar event locking wifi: ath11k: fix temperature event locking wifi: ath12k: rename the sc naming convention to ab wifi: ath12k: rename the wmi_sc naming convention to wmi_ab wifi: ath11k: add firmware-2.bin support wifi: ath11k: qmi: refactor ath11k_qmi_m3_load() wifi: rtw89: cleanup firmware elements parsing wifi: rt2x00: rework MT7620 PA/LNA RF calibration wifi: rt2x00: rework MT7620 channel config function wifi: rt2x00: improve MT7620 register initialization MAINTAINERS: wifi: rt2x00: drop Helmut Schaa wifi: wlcore: main: replace deprecated strncpy with strscpy wifi: wlcore: boot: replace deprecated strncpy with strscpy ... ==================== Link: https://lore.kernel.org/r/20231026090411.B2426C433CB@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge tag 'for-netdev' of ↵Jakub Kicinski
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-26 We've added 51 non-merge commits during the last 10 day(s) which contain a total of 75 files changed, 5037 insertions(+), 200 deletions(-). The main changes are: 1) Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF, from Chuyi Zhou. 2) Fix BPF verifier's iterator convergence logic to use exact states comparison for convergence checks, from Eduard Zingerman, Andrii Nakryiko and Alexei Starovoitov. 3) Add BPF programmable net device where bpf_mprog defines the logic of its xmit routine. It can operate in L3 and L2 mode, from Daniel Borkmann and Nikolay Aleksandrov. 4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking for global per-CPU allocator, from Hou Tao. 5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section was going to be present whenever a binary has SHT_GNU_versym section, from Andrii Nakryiko. 6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into atomic_set_release(), from Paul E. McKenney. 7) Add a warning if NAPI callback missed xdp_do_flush() under CONFIG_DEBUG_NET which helps checking if drivers were missing the former, from Sebastian Andrzej Siewior. 8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing a warning under sleepable programs, from Yafang Shao. 9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before checking map_locked, from Song Liu. 10) Make BPF CI linked_list failure test more robust, from Kumar Kartikeya Dwivedi. 11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik. 12) Fix xsk starving when multiple xsk sockets were associated with a single xsk_buff_pool, from Albert Huang. 13) Clarify the signed modulo implementation for the BPF ISA standardization document that it uses truncated division, from Dave Thaler. 14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider signed bounds knowledge, from Andrii Nakryiko. 15) Add an option to XDP selftests to use multi-buffer AF_XDP xdp_hw_metadata and mark used XDP programs as capable to use frags, from Larysa Zaremba. 16) Fix bpftool's BTF dumper wrt printing a pointer value and another one to fix struct_ops dump in an array, from Manu Bretelle. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits) netkit: Remove explicit active/peer ptr initialization selftests/bpf: Fix selftests broken by mitigations=off samples/bpf: Allow building with custom bpftool samples/bpf: Fix passing LDFLAGS to libbpf samples/bpf: Allow building with custom CFLAGS/LDFLAGS bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free selftests/bpf: Add selftests for netkit selftests/bpf: Add netlink helper library bpftool: Extend net dump with netkit progs bpftool: Implement link show support for netkit libbpf: Add link-based API for netkit tools: Sync if_link uapi header netkit, bpf: Add bpf programmable net device bpf: Improve JEQ/JNE branch taken logic bpf: Fold smp_mb__before_atomic() into atomic_set_release() bpf: Fix unnecessary -EBUSY from htab_lock_bucket xsk: Avoid starving the xsk further down the list bpf: print full verifier states on infinite loop detection selftests/bpf: test if state loops are detected in a tricky case bpf: correct loop detection for iterators convergence ... ==================== Link: https://lore.kernel.org/r/20231026150509.2824-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26exportfs: Change bcachefs fid_type enum to avoid conflictsKent Overstreet
Per Amir Goldstein, the fid types that bcachefs picked conflicted with xfs and fuse, which previously were in use but not deviced in the master enum. Since bcachefs is still out of tree, we can move. https://lore.kernel.org/linux-next/20231026203733.fx65mjyic4pka3e5@moria.home.lan/T/#ma59f65ba61f605b593e69f4690dbd317526d83ba Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-26Merge tag 'net-6.6-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi and netfilter. Most regressions addressed here come from quite old versions, with the exceptions of the iavf one and the WiFi fixes. No known outstanding reports or investigation. Fixes to fixes: - eth: iavf: in iavf_down, disable queues when removing the driver Previous releases - regressions: - sched: act_ct: additional checks for outdated flows - tcp: do not leave an empty skb in write queue - tcp: fix wrong RTO timeout when received SACK reneging - wifi: cfg80211: pass correct pointer to rdev_inform_bss() - eth: i40e: sync next_to_clean and next_to_process for programming status desc - eth: iavf: initialize waitqueues before starting watchdog_task Previous releases - always broken: - eth: r8169: fix data-races - eth: igb: fix potential memory leak in igb_add_ethtool_nfc_entry - eth: r8152: avoid writing garbage to the adapter's registers - eth: gtp: fix fragmentation needed check with gso" * tag 'net-6.6-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits) iavf: in iavf_down, disable queues when removing the driver vsock/virtio: initialize the_virtio_vsock before using VQs net: ipv6: fix typo in comments net: ipv4: fix typo in comments net/sched: act_ct: additional checks for outdated flows netfilter: flowtable: GC pushes back packets to classic path i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR gtp: fix fragmentation needed check with gso gtp: uapi: fix GTPA_MAX Fix NULL pointer dereference in cn_filter() sfc: cleanup and reduce netlink error messages net/handshake: fix file ref count in handshake_nl_accept_doit() wifi: mac80211: don't drop all unprotected public action frames wifi: cfg80211: fix assoc response warning on failed links wifi: cfg80211: pass correct pointer to rdev_inform_bss() isdn: mISDN: hfcsusb: Spelling fix in comment tcp: fix wrong RTO timeout when received SACK reneging r8152: Block future register access if register access fails r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en() ...
2023-10-26Merge branch 'for-next/cpus_have_const_cap' into for-next/coreCatalin Marinas
* for-next/cpus_have_const_cap: (38 commits) : cpus_have_const_cap() removal arm64: Remove cpus_have_const_cap() arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2 arm64: Avoid cpus_have_const_cap() for ARM64_SSBS arm64: Avoid cpus_have_const_cap() for ARM64_MTE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT ...
2023-10-26Merge tag 'v6.6-rc7' into coreJoerg Roedel
Linux 6.6-rc7
2023-10-26iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domainJason Gunthorpe
Following the pattern of identity domains, just assign the BLOCKED domain global statics to a value in ops. Update the core code to use the global static directly. Update powerpc to use the new scheme and remove its empty domain_alloc callback. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/1-v2-bff223cf6409+282-dart_paging_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-26iommu: Add iommu_copy_struct_from_user helperNicolin Chen
Wrap up the data type/pointer/len sanity and a copy_struct_from_user call for iommu drivers to copy driver specific data via struct iommu_user_data. And expect it to be used in the domain_alloc_user op for example. Link: https://lore.kernel.org/r/20231026043938.63898-9-yi.l.liu@intel.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu: Pass in parent domain with user_data to domain_alloc_user opYi Liu
domain_alloc_user op already accepts user flags for domain allocation, add a parent domain pointer and a driver specific user data support as well. The user data would be tagged with a type for iommu drivers to add their own driver specific user data per hw_pagetable. Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an iommufd core uAPI structure. Make the user data opaque to the core, since a userspace driver must match the kernel driver. In the future, if drivers share some common parameter, there would be a generic parameter as well. Link: https://lore.kernel.org/r/20231026043938.63898-7-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu: Add IOMMU_DOMAIN_NESTEDLu Baolu
Introduce a new domain type for a user I/O page table, which is nested on top of another user space address represented by a PAGING domain. This new domain can be allocated by the domain_alloc_user op, and attached to a device through the existing iommu_attach_device/group() interfaces. The mappings of a nested domain are managed by user space software, so it is not necessary to have map/unmap callbacks. Link: https://lore.kernel.org/r/20231026043938.63898-2-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26Merge branches 'pm-sleep', 'powercap' and 'pm-tools'Rafael J. Wysocki
Merge updates related to system sleep handling, one power capping update and one PM utility update for 6.7-rc1: - Use __get_safe_page() rather than touching the list in hibernation snapshot code (Brian Geffon). - Fix symbol export for _SIMPLE_ variants of _PM_OPS() (Raag Jadav). - Clean up sync_read handling in snapshot_write_next() (Brian Geffon). - Fix kerneldoc comments for swsusp_check() and swsusp_close() to better match code (Christoph Hellwig). - Downgrade BIOS locked limits pr_warn() in the Intel RAPL power capping driver to pr_debug() (Ville Syrjälä). - Change the minimum python version for the intel_pstate_tracer utility from 2.7 to 3.6 (Doug Smythies). * pm-sleep: PM: hibernate: fix the kerneldoc comment for swsusp_check() and swsusp_close() PM: hibernate: Clean up sync_read handling in snapshot_write_next() PM: sleep: Fix symbol export for _SIMPLE_ variants of _PM_OPS() PM: hibernate: Use __get_safe_page() rather than touching the list * powercap: powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug() * pm-tools: tools/power/x86/intel_pstate_tracer: python minimum version
2023-10-26Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq updates for 6.7-rc1: - Add support for several Qualcomm SoC versions and other similar changes (Christian Marangi, Dmitry Baryshkov, Luca Weiss, Neil Armstrong, Richard Acayan, Robert Marko, Rohit Agarwal, Stephan Gerhold and Varadarajan Narayanan). - Clean up the tegra cpufreq driver (Sumit Gupta). - Use of_property_read_reg() to parse "reg" in pmac32 driver (Rob Herring). - Add support for TI's am62p5 Soc (Bryan Brattlof). - Make ARM_BRCMSTB_AVS_CPUFREQ depends on !ARM_SCMI_CPUFREQ (Florian Fainelli). - Update Kconfig to mention i.MX7 as well (Alexander Stein). - Revise global turbo disable check in intel_pstate (Srinivas Pandruvada). - Carry out initialization of sg_cpu in the schedutil cpufreq governor in one loop (Liao Chang). - Simplify the condition for storing 'down_threshold' in the conservative cpufreq governor (Liao Chang). - Use fine-grained mutex in the userspace cpufreq governor (Liao Chang). - Move is_managed indicator in the userspace cpufreq governor into a per-policy structure (Liao Chang). - Rebuild sched-domains when removing cpufreq driver (Pierre Gondois). - Fix buffer overflow detection in trans_stats() (Christian Marangi). * pm-cpufreq: (32 commits) dt-bindings: cpufreq: qcom-hw: document SM8650 CPUFREQ Hardware cpufreq: arm: Kconfig: Add i.MX7 to supported SoC for ARM_IMX_CPUFREQ_DT cpufreq: qcom-nvmem: add support for IPQ8064 cpufreq: qcom-nvmem: also accept operating-points-v2-krait-cpu cpufreq: qcom-nvmem: drop pvs_ver for format a fuses dt-bindings: cpufreq: qcom-cpufreq-nvmem: Document krait-cpu cpufreq: qcom-nvmem: add support for IPQ6018 dt-bindings: cpufreq: qcom-cpufreq-nvmem: document IPQ6018 cpufreq: qcom-nvmem: Add MSM8909 cpufreq: qcom-nvmem: Simplify driver data allocation cpufreq: stats: Fix buffer overflow detection in trans_stats() dt-bindings: cpufreq: cpufreq-qcom-hw: Add SDX75 compatible cpufreq: ARM_BRCMSTB_AVS_CPUFREQ cannot be used with ARM_SCMI_CPUFREQ cpufreq: ti-cpufreq: Add opp support for am62p5 SoCs cpufreq: dt-platdev: add am62p5 to blocklist cpufreq: tegra194: remove redundant AND with cpu_online_mask cpufreq: tegra194: use refclk delta based loop instead of udelay cpufreq: tegra194: save CPU data to avoid repeated SMP calls cpufreq: Rebuild sched-domains when removing cpufreq driver cpufreq: userspace: Move is_managed indicator into per-policy structure ...
2023-10-26Merge branches 'acpi-ac', 'acpi-pad' and 'pnp'Rafael J. Wysocki
Merge updates of the ACPI AC and ACPI PAD drivers and PNP updates for 6.7-rc1: - Switch over the ACPI AC and ACPI PAD drivers to using the platform driver interface which, is more logically consistent than binding a driver directly to an ACPI device object, and clean them up (Michal Wilczynski). - Replace strncpy() in the PNP code with either memcpy() or strscpy() as appropriate (Justin Stitt). - Clean up coding style in pnp.h (GuoHua Cheng). * acpi-ac: ACPI: AC: Rename ACPI device from device to adev ACPI: AC: Replace acpi_driver with platform_driver ACPI: AC: Use string_choices API instead of ternary operator ACPI: AC: Remove redundant checks * acpi-pad: ACPI: acpi_pad: Rename ACPI device from device to adev ACPI: acpi_pad: Use dev groups for sysfs ACPI: acpi_pad: Replace acpi_driver with platform_driver * pnp: PNP: replace deprecated strncpy() with memcpy() PNP: ACPI: replace deprecated strncpy() with strscpy() PNP: Clean up coding style in pnp.h
2023-10-26Merge branches 'acpi-ec', 'acpi-sysfs', 'acpi-misc' and 'acpi-uid'Rafael J. Wysocki
Merge ACPI EC driver updates, ACPI sysfs interface updates, misc updates related to ACPI and changes related to ACPI _UID handling for 6.7-rc1: - Add EC GPE detection quirk for HP 250 G7 Notebook PC (Jonathan Denose). - Fix and clean up create_pnp_modalias() and create_of_modalias() (Christophe JAILLET). - Modify 2 pieces of code to use acpi_evaluate_dsm_typed() (Andy Shevchenko). - Define acpi_dev_uid_match() for matching _UID and use it in several places (Raag Jadav). - Use acpi_device_uid() for fetching _UID in 2 places (Raag Jadav). * acpi-ec: ACPI: EC: Add quirk for HP 250 G7 Notebook PC * acpi-sysfs: ACPI: sysfs: Clean up create_pnp_modalias() and create_of_modalias() ACPI: sysfs: Fix create_pnp_modalias() and create_of_modalias() * acpi-misc: ACPI: x86: s2idle: Switch to use acpi_evaluate_dsm_typed() ACPI: PCI: Switch to use acpi_evaluate_dsm_typed() * acpi-uid: perf: arm_cspmu: use acpi_dev_hid_uid_match() for matching _HID and _UID ACPI: x86: use acpi_dev_uid_match() for matching _UID ACPI: utils: use acpi_dev_uid_match() for matching _UID pinctrl: intel: use acpi_dev_uid_match() for matching _UID ACPI: utils: Introduce acpi_dev_uid_match() for matching _UID perf: qcom: use acpi_device_uid() for fetching _UID ACPI: sysfs: use acpi_device_uid() for fetching _UID
2023-10-26x86/apic/msi: Fix misconfigured non-maskable MSI quirkKoichiro Den
commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted"), reworked the code so that the x86 specific quirk for affinity setting of non-maskable PCI/MSI interrupts is not longer activated if necessary. This could be solved by restoring the original logic in the core MSI code, but after a deeper analysis it turned out that the quirk flag is not required at all. The quirk is only required when the PCI/MSI device cannot mask the MSI interrupts, which in turn also prevents reservation mode from being enabled for the affected interrupt. This allows ot remove the NOMASK quirk bit completely as msi_set_affinity() can instead check whether reservation mode is enabled for the interrupt, which gives exactly the same answer. Even in the momentary non-existing case that the reservation mode would be not set for a maskable MSI interrupt this would not cause any harm as it just would cause msi_set_affinity() to go needlessly through the functionaly equivalent slow path, which works perfectly fine with maskable interrupts as well. Rework msi_set_affinity() to query the reservation mode and remove all NOMASK quirk logic from the core code. [ tglx: Massaged changelog ] Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp
2023-10-25mempolicy: mmap_lock is not needed while migrating foliosHugh Dickins
mbind(2) holds down_write of current task's mmap_lock throughout (exclusive because it needs to set the new mempolicy on the vmas); migrate_pages(2) holds down_read of pid's mmap_lock throughout. They both hold mmap_lock across the internal migrate_pages(), under which all new page allocations (huge or small) are made. I'm nervous about it; and migrate_pages() certainly does not need mmap_lock itself. It's done this way for mbind(2), because its page allocator is vma_alloc_folio() or alloc_hugetlb_folio_vma(), both of which depend on vma and address. Now that we have alloc_pages_mpol(), depending on (refcounted) memory policy and interleave index, mbind(2) can be modified to use that or alloc_hugetlb_folio_nodemask(), and then not need mmap_lock across the internal migrate_pages() at all: add alloc_migration_target_by_mpol() to replace mbind's new_page(). (After that change, alloc_hugetlb_folio_vma() is used by nothing but a userfaultfd function: move it out of hugetlb.h and into the #ifdef.) migrate_pages(2) has chosen its target node before migrating, so can continue to use the standard alloc_migration_target(); but let it take and drop mmap_lock just around migrate_to_node()'s queue_pages_range(): neither the node-to-node calculations nor the page migrations need it. It seems unlikely, but it is conceivable that some userspace depends on the kernel's mmap_lock exclusion here, instead of doing its own locking: more likely in a testsuite than in real life. It is also possible, of course, that some pages on the list will be munmapped by another thread before they are migrated, or a newer memory policy applied to the range by that time: but such races could happen before, as soon as mmap_lock was dropped, so it does not appear to be a concern. Link: https://lkml.kernel.org/r/21e564e8-269f-6a89-7ee2-fd612831c289@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mempolicy: alloc_pages_mpol() for NUMA policy without vmaHugh Dickins
Shrink shmem's stack usage by eliminating the pseudo-vma from its folio allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the principal actor for passing mempolicy choice down to __alloc_pages(), rather than vma_alloc_folio(gfp, order, vma, addr, hugepage). vma_alloc_folio() and alloc_pages() remain, but as wrappers around alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the additional args to policy_nodemask(), which subsumes policy_node(). Cleanup throughout, cutting out some unhelpful "helpers". It would all be much simpler without MPOL_INTERLEAVE, but that adds a dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8bfd ("tmpfs: distribute interleave better across nodes"), which added ino bias to the interleave, hidden from mm/mempolicy.c until this commit. Hence "ilx" throughout, the "interleave index". Originally I thought it could be done just with nid, but that's wrong: the nodemask may come from the shared policy layer below a shmem vma, or it may come from the task layer above a shmem vma; and without the final nodemask then nodeid cannot be decided. And how ilx is applied depends also on page order. The interleave index is almost always irrelevant unless MPOL_INTERLEAVE: with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX passed down from vma-less alloc_pages() is also used as hint not to use THP-style hugepage allocation - to avoid the overhead of a hugepage arg (though I don't understand why we never just added a GFP bit for THP - if it actually needs a different allocation strategy from other pages of the same order). vma_alloc_folio() still carries its hugepage arg here, but it is not used, and should be removed when agreed. get_vma_policy() no longer allows a NULL vma: over time I believe we've eradicated all the places which used to need it e.g. swapoff and madvise used to pass NULL vma to read_swap_cache_async(), but now know the vma. [hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()] Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mempolicy trivia: use pgoff_t in shared mempolicy treeHugh Dickins
Prefer the more explicit "pgoff_t" to "unsigned long" when dealing with a shared mempolicy tree. Delete confusing comment about pseudo mm vmas. Link: https://lkml.kernel.org/r/5451157-3818-4af5-fd2c-5d26a5d1dc53@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mempolicy trivia: slightly more consistent namingHugh Dickins
Before getting down to work, do a little cleanup, mainly of inconsistent variable naming. I gave up trying to rationalize mpol versus pol versus policy, and node versus nid, but let's avoid p and nd. Remove a few superfluous blank lines, but add one; and here prefer vma->vm_policy to vma_policy(vma) - the latter being appropriate in other sources, which have to allow for !CONFIG_NUMA. That intriguing line about KERNEL_DS? should have gone in v2.6.15, when numa_policy_init() stopped using set_mempolicy(2)'s system call handler. Link: https://lkml.kernel.org/r/68287974-b6ae-7df-4ba-d19ddd69cbf@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25hugetlbfs: drop shared NUMA mempolicy pretenceHugh Dickins
Patch series "mempolicy: cleanups leading to NUMA mpol without vma", v2. Mostly cleanups in mm/mempolicy.c, but finally removing the pseudo-vma from shmem folio allocation, and removing the mmap_lock around folio migration for mbind and migrate_pages syscalls. This patch (of 12): hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA mempolicy for this file? hugetlb_vm_ops has never offered a set_policy method, and hugetlbfs_parse_param() has never supported any mpol options for a mount-wide default policy. It's just an illusion: clean it away so as not to confuse others, giving us more freedom to adjust shmem's set_policy/get_policy implementation. But hugetlbfs_inode_info is still required, just to accommodate seals. Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a set_policy method and/or mpol mount option (Andi's first posting did include an admitted-unsatisfactory hugetlb_set_policy()); but it seems that nobody has bothered to add that in the nineteen years since v2.6.7 made it possible, and there is at least one company that has invested enough into hugetlbfs, that I guess they have learnt well enough how to manage its NUMA, without needing shared mempolicy. Remove linux/mempolicy.h from linux/hugetlb.h: include linux/pagemap.h in its place, because hugetlb.h's recently added use of filemap_lock_folio() requires that (although most .configs and .c's get it in some other way). Link: https://lkml.kernel.org/r/ebc0987e-beff-8bfb-9283-234c2cbd17c5@google.com Link: https://lkml.kernel.org/r/cae82d4b-904a-faaf-282a-34fcc188c81f@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tejun heo <tj@kernel.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm/damon: implement a function for max nr_accesses safe calculationSeongJae Park
Patch series "avoid divide-by-zero due to max_nr_accesses overflow". The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Avoid the divide-by-zero by implementing a function that handles the corner case (first patch), and replaces the vulnerable direct max nr_accesses calculations (remaining patches). Note that the patches for the replacements are divided for broken commits, to make backporting on required tres easier. Especially, the last patch is for a patch that not yet merged into the mainline but in mm tree. This patch (of 4): The maximum nr_accesses of given DAMON context can be calculated by dividing the aggregation interval by the sampling interval. Some logics in DAMON uses the maximum nr_accesses as a divisor. Hence, the value shouldn't be zero. Such case is avoided since DAMON avoids setting the agregation interval as samller than the sampling interval. However, since nr_accesses is unsigned int while the intervals are unsigned long, the maximum nr_accesses could be zero while casting. Implement a function that handles the corner case. Note that this commit is not fixing the real issue since this is only introducing the safe function that will replaces the problematic divisions. The replacements will be made by followup commits, to make backporting on stable series easier. Link: https://lkml.kernel.org/r/20231019194924.100347-1-sj@kernel.org Link: https://lkml.kernel.org/r/20231019194924.100347-2-sj@kernel.org Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> [5.16+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm/khugepaged: convert alloc_charge_hpage() to use foliosVishal Moola (Oracle)
Also remove count_memcg_page_event now that its last caller no longer uses it and reword hpage_collapse_alloc_page() to hpage_collapse_alloc_folio(). This removes 1 call to compound_head() and helps convert khugepaged to use folios throughout. Link: https://lkml.kernel.org/r/20231020183331.10770-5-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25bootmem: use kmemleak_free_part_phys in free_bootmem_pageLiu Shixin
Since kmemleak_alloc_phys() rather than kmemleak_alloc() was called from memblock_alloc_range_nid(), kmemleak_free_part_phys() should be used to delete kmemleak object in free_bootmem_page(). In debug mode, there are following warning: kmemleak: Partially freeing unknown object at 0xffff97345aff7000 (size 4096) Link: https://lkml.kernel.org/r/20231018102952.3339837-3-liushixin2@huawei.com Fixes: 028725e73375 ("bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Patrick Wang <patrick.wang.shcn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: remove page_cpupid_xchg_last()Kefeng Wang
Since all calls use folio_xchg_last_cpupid(), remove page_cpupid_xchg_last(). Link: https://lkml.kernel.org/r/20231018140806.2783514-20-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: make finish_mkwrite_fault() staticKefeng Wang
Make finish_mkwrite_fault static since it is not used outside of memory.c. Link: https://lkml.kernel.org/r/20231018140806.2783514-17-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: add folio_xchg_last_cpupid()Kefeng Wang
Add folio_xchg_last_cpupid() wrapper, which is required to convert page_cpupid_xchg_last() to folio vertion later in the series. Link: https://lkml.kernel.org/r/20231018140806.2783514-13-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: remove xchg_page_access_time()Kefeng Wang
Since all calls use folio_xchg_access_time(), remove xchg_page_access_time(). Link: https://lkml.kernel.org/r/20231018140806.2783514-12-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: add folio_xchg_access_time()Kefeng Wang
Add folio_xchg_access_time() wrapper, which is required to convert xchg_page_access_time() to folio vertion later in the series. Link: https://lkml.kernel.org/r/20231018140806.2783514-8-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: remove page_cpupid_last()Kefeng Wang
Since all calls use folio_last_cpupid(), remove page_cpupid_last(). Link: https://lkml.kernel.org/r/20231018140806.2783514-7-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: add folio_last_cpupid()Kefeng Wang
Add folio_last_cpupid() wrapper, which is required to convert page_cpupid_last() to folio vertion later in the series. Link: https://lkml.kernel.org/r/20231018140806.2783514-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm_types: add virtual and _last_cpupid into struct folioKefeng Wang
Patch series "mm: convert page cpupid functions to folios", v3. The cpupid(or access time) used by numa balancing is stored in flags or _last_cpupid(if LAST_CPUPID_NOT_IN_PAGE_FLAGS) of page, this is to convert page cpupid to folio cpupid, a new _last_cpupid is added into folio, which make us to use folio->_last_cpupid directly, and the page cpupid functions are converted to folio ones. page_cpupid_last() -> folio_last_cpupid() xchg_page_access_time() -> folio_xchg_access_time() page_cpupid_xchg_last() -> folio_xchg_last_cpupid() This patch (of 19): If WANT_PAGE_VIRTUAL and LAST_CPUPID_NOT_IN_PAGE_FLAGS defined, the 'virtual' and '_last_cpupid' are in struct page, and since _last_cpupid is used by numa balancing feature, it is better to move it before KMSAN metadata from struct page, also add them into struct folio to make us to access them from folio directly. Link: https://lkml.kernel.org/r/20231018140806.2783514-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20231018140806.2783514-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-25mm: kmem: reimplement get_obj_cgroup_from_current()Roman Gushchin
Reimplement get_obj_cgroup_from_current() using current_obj_cgroup(). get_obj_cgroup_from_current() and current_obj_cgroup() share 80% of the code, so the new implementation is almost trivial. get_obj_cgroup_from_current() is a convenient function used by the bpf subsystem, so there is no reason to get rid of it completely. Link: https://lkml.kernel.org/r/20231019225346.1822282-7-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>