summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-05-06printk: Change type of CONFIG_BASE_SMALL to boolYoann Congal
CONFIG_BASE_SMALL is currently a type int but is only used as a boolean. So, change its type to bool and adapt all usages: CONFIG_BASE_SMALL == 0 becomes !IS_ENABLED(CONFIG_BASE_SMALL) and CONFIG_BASE_SMALL != 0 becomes IS_ENABLED(CONFIG_BASE_SMALL). Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Yoann Congal <yoann.congal@smile.fr> Link: https://lore.kernel.org/r/20240505080343.1471198-3-yoann.congal@smile.fr Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-05-06NFSD: add listener-{set,get} netlink commandLorenzo Bianconi
Introduce write_ports netlink command. For listener-set, userspace is expected to provide a NFS listeners list it wants enabled. All other sockets will be closed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Co-developed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06SUNRPC: add a new svc_find_listener helperJeff Layton
svc_find_listener will return the transport instance pointer for the endpoint accepting connections/peer traffic from the specified transport class and matching sockaddr. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06SUNRPC: introduce svc_xprt_create_from_sa utility routineLorenzo Bianconi
Add svc_xprt_create_from_sa utility routine and refactor svc_xprt_create() codebase in order to introduce the capability to create a svc port from socket address. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: add write_version to netlink commandLorenzo Bianconi
Introduce write_version netlink command through a "declarative" interface. This patch introduces a change in behavior since for version-set userspace is expected to provide a NFS major/minor version list it wants to enable while all the other ones will be disabled. (procfs write_version command implements imperative interface where the admin writes +3/-3 to enable/disable a single version. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06NFSD: convert write_threads to netlink commandLorenzo Bianconi
Introduce write_threads netlink command similar to the one available through the procfs. Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Co-developed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06nfsd: trivial GET_DIR_DELEGATION supportJeff Layton
This adds basic infrastructure for handing GET_DIR_DELEGATION calls from clients, including the decoders and encoders. For now, it always just returns NFS4_OK + GDD4_UNAVAIL. Eventually clients may start sending this operation, and it's better if we can return GDD4_UNAVAIL instead of having to abort the whole compound. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06powerpc/dexcr: Add DEXCR prctl interfaceBenjamin Gray
Now that we track a DEXCR on a per-task basis, individual tasks are free to configure it as they like. The interface is a pair of getter/setter prctl's that work on a single aspect at a time (multiple aspects at once is more difficult if there are different rules applied for each aspect, now or in future). The getter shows the current state of the process config, and the setter allows setting/clearing the aspect. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> [mpe: Account for PR_RISCV_SET_ICACHE_FLUSH_CTX, shrink some longs lines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240417112325.728010-5-bgray@linux.ibm.com
2024-05-06Merge back thermal cotntrol material for v6.10.Rafael J. Wysocki
2024-05-06alpha: drop pre-EV56 supportArnd Bergmann
All EV4 machines are already gone, and the remaining EV5 based machines all support the slightly more modern EV56 generation as well. Debian only supports EV56 and later. Drop both of these and build kernels optimized for EV56 and higher when the "generic" options is selected, tuning for an out-of-order EV6 pipeline, same as Debian userspace. Since this was the only supported architecture without 8-bit and 16-bit stores, common kernel code no longer has to worry about aligning struct members, and existing workarounds from the block and tty layers can be removed. The alpha memory management code no longer needs an abstraction for the differences between EV4 and EV5+. Link: https://lists.debian.org/debian-alpha/2023/05/msg00009.html Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-05-06net: create tcp_gro_header_pull helper functionFelix Fietkau
Pull the code out of tcp_gro_receive in order to access the tcp header from tcp4/6_gro_receive. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06net: create tcp_gro_lookup helper functionFelix Fietkau
This pulls the flow port matching out of tcp_gro_receive, so that it can be reused for the next change, which adds the TCP fraglist GRO heuristic. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06net: move skb_gro_receive_list from udp to coreFelix Fietkau
This helper function will be used for TCP fraglist GRO support Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router DiscoveryLinus Lüssing
So far Multicast Router Advertisements and Multicast Router Solicitations from the Multicast Router Discovery protocol (RFC4286) would be marked as INVALID for IPv6, even if they are in fact intact and adhering to RFC4286. This broke MRA reception and by that multicast reception on IPv6 multicast routers in a Proxmox managed setup, where Proxmox would install a rule like "-m conntrack --ctstate INVALID -j DROP" at the top of the FORWARD chain with br-nf-call-ip6tables enabled by default. Similar to as it's done for MLDv1, MLDv2 and IPv6 Neighbor Discovery already, fix this issue by excluding MRD from connection tracking handling as MRD always uses predefined multicast destinations for its messages, too. This changes the ct-state for ICMPv6 MRD messages from INVALID to UNTRACKED. This issue was found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06firewire: core: remove flag and width from u64 formats of tracepoints eventsTakashi Sakamoto
The pointer to fw_packet structure is passed to ring buffer of tracepoints framework as the value of u64 type. '0x%016llx' is used for the print format of value, while the flag and width are useless in the case. This commit removes them. Link: https://lore.kernel.org/r/20240506082154.396077-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: fix type of timestamp for async_inbound_template tracepoints ↵Takashi Sakamoto
events The type of time stamp should be u16, instead of u8. Link: https://lore.kernel.org/r/20240506082154.396077-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoint event for handling bus resetTakashi Sakamoto
The core function expects hardware drivers to call fw_core_handle_bus_reset() when changing bus topology. The 1394 OHCI driver calls it when handling selfID event as a result of any bus-reset. This commit adds a tracepoints event for it. Link: https://lore.kernel.org/r/20240501073238.72769-6-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints events for initiating bus resetTakashi Sakamoto
At a commit 673249124304 ("firewire: core: option to log bus reset initiation"), some kernel log messages were added to trace initiation of bus reset. The kernel log messages are really helpful, while nowadays it is not preferable just for debugging purpose. For the purpose, Linux kernel tracepoints is more preferable. This commit adds some alternative tracepoints events. Link: https://lore.kernel.org/r/20240501073238.72769-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints event for asynchronous inbound phy packetTakashi Sakamoto
At the former commit, a pair of tracepoints events is added to trace asynchronous outbound phy packet. This commit adds a tracepoints event to trace inbound phy packet. It includes transaction status as well as the content of phy packet. This is an example for Remote Reply Packet as a response to Remote Access Packet sent by lsfirewirephy command in linux-firewire-utils: async_phy_inbound: \ packet=0xffff955fc02b4e10 generation=1 status=1 timestamp=0x0619 \ first_quadlet=0x001c8208 second_quadlet=0xffe37df7 Link: https://lore.kernel.org/r/20240430001404.734657-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core/cdev: add tracepoints events for asynchronous phy packetTakashi Sakamoto
In IEEE 1394 bus, the type of asynchronous packet without any offset to node address space is called as phy packet. The destination of packet is IEEE 1394 phy itself. This type of packet is used for several purposes, mainly for selfID at the state of bus reset, to force selection of root node, and to adjust gap count. This commit adds tracepoints events for the type of asynchronous outbound packet. Like asynchronous outbound transaction packets, a pair of events are added to trace initiation and completion of transmission. In the case that the phy packet is sent by kernel API, the match between the initiation and completion is not so easy, since the data of 'struct fw_packet' is allocated statically. In the case that it is sent by userspace applications via cdev, the match is easy, since the data is allocated per each. This example is for Remote Access Packet by lsfirewirephy command in linux-firewire-utils: async_phy_outbound_initiate: \ packet=0xffff89fb34e42e78 generation=1 first_quadlet=0x00148200 \ second_quadlet=0xffeb7dff async_phy_outbound_complete: \ packet=0xffff89fb34e42e78 generation=1 status=1 timestamp=0x0619 Link: https://lore.kernel.org/r/20240430001404.734657-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints events for asynchronous outbound responseTakashi Sakamoto
In a view of core transaction service, the asynchronous outbound response consists of two stages; initiation and completion. This commit adds a pair of events for the asynchronous outbound response. The following example is for asynchronous write quadlet request as IEC 61883-1 FCP response to node 0xffc1. async_response_outbound_initiate: \ transaction=0xffff89fa08cf16c0 generation=4 scode=2 dst_id=0xffc1 \ tlabel=25 tcode=2 src_id=0xffc0 rcode=0 \ header={0xffc16420,0xffc00000,0x0,0x0} data={} async_response_outbound_complete: \ transaction=0xffff89fa08cf16c0 generation=4 scode=2 status=1 \ timestamp=0x0000 Link: https://lore.kernel.org/r/20240429043218.609398-6-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoint event for asynchronous inbound requestTakashi Sakamoto
This commit adds an event for asynchronous inbound request. The following example is for asynchronous block write request as IEC 61883-1 FCP request from node 0xffc1. async_request_inbound: \ transaction=0xffff89fa08cf16c0 generation=4 scode=2 status=2 \ timestamp=0x00b3 dst_id=0xffc0 tlabel=19 tcode=1 src_id=0xffc1 \ offset=0xfffff0000d00 header={0xffc04d10,0xffc1ffff,0xf0000d00,0x80000} \ data={0x19ff08,0xffff0090} Link: https://lore.kernel.org/r/20240429043218.609398-5-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints event for asynchronous inbound responseTakashi Sakamoto
In the transaction of IEEE 1394, the node to receive the asynchronous request transfers any response packet to the requester except for the unified transaction. This commit adds an event for the inbound packet. Note that the code to decode the packet header is moved, against the note about the sanity check. The following example is for asynchronous lock response with compare_and_swap code. async_response_inbound: \ transaction=0xffff955fc6a07a10 generation=5 scode=2 status=1 \ timestamp=0x0089 dst_id=0xffc1 tlabel=54 tcode=11 src_id=0xffc0 \ rcode=0 header={0xffc1d9b0,0xffc00000,0x0,0x40002} data={0x50800080} Link: https://lore.kernel.org/r/20240429043218.609398-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add tracepoints events for asynchronous outbound requestTakashi Sakamoto
In a view of core transaction service, the asynchronous outbound request consists of two stages; initiation and completion. This commit adds a pair of event for them. The following example is for asynchronous lock request with compare_swap code to offset 0x'ffff'f000'0904 in node 0xffc0. async_request_outbound_initiate: \ transaction=0xffff955fc6a07a10 generation=5 scode=2 dst_id=0xffc0 \ tlabel=54 tcode=9 src_id=0xffc1 offset=0xfffff0000904 \ header={0xffc0d990,0xffc1ffff,0xf0000904,0x80002} data={0x80,0x940181} async_request_outbound_complete: \ transaction=0xffff955fc6a07a10 generation=5 scode=2 status=2 \ timestamp=0xd887 Link: https://lore.kernel.org/r/20240429043218.609398-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: core: add support for Linux kernel tracepointsTakashi Sakamoto
The Linux Kernel Tracepoints framework is enough useful to trace packet data inbound to and outbound from core. This commit adds firewire subsystem to use the framework. Link: https://lore.kernel.org/r/20240429043218.609398-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06firewire: Annotate struct fw_iso_packet with __counted_by()Gustavo A. R. Silva
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZgIrOuR3JI/jzqoH@neat Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-05-06regulator: new API for voltage reference suppliesMark Brown
Merge series from David Lechner <dlechner@baylibre.com>: In the IIO subsystem, we noticed a pattern in many drivers where we need to get, enable and get the voltage of a supply that provides a reference voltage. In these cases, we only need the voltage and not a handle to the regulator. Another common pattern is for chips to have an internal reference voltage that is used when an external reference is not available. There are also a few drivers outside of IIO that do the same. So we would like to propose a new regulator consumer API to handle these specific cases to avoid repeating the same boilerplate code in multiple drivers. As an example of how these functions are used, I have included a few patches to consumer drivers. But to avoid a giant patch bomb, I have omitted the iio/adc and iio/dac patches I have prepared from this series. I will send those separately but these will add 36 more users of devm_regulator_get_enable_read_voltage() in addition to the 6 here. In total, this will eliminate nearly 1000 lines of similar code and will simplify writing and reviewing new drivers in the future.
2024-05-05mm/pagemap: make trylock_page return boolHao Ge
Make trylock_page return bool to align the return values of folio_trylock function and it also corresponds to its comment. Link: https://lkml.kernel.org/r/20240428014711.11169-1-gehao@kylinos.cn Signed-off-by: Hao Ge <gehao@kylinos.cn> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm/damon: add DAMOS filter type YOUNGSeongJae Park
Define yet another DAMOS filter type, YOUNG. Like anon and memcg, the type of filter will be applied to each page in the memory region, and see if the page is accessed since the last check. Based on the 'matching' parameter, the page is filtered out or in. Note that this commit is adding only the type definition. The implementation should be made by DAMON operations sets. A commit for the implementation on 'paddr' DAMON operations set will follow. Link: https://lkml.kernel.org/r/20240426195247.100306-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Tested-by: Honggyu Kim <honggyu.kim@sk.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: simplify thp_vma_allowable_orderMatthew Wilcox
Combine the three boolean arguments into one flags argument for readability. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05writeback: support retrieving per group debug writeback stats of bdiKemeng Shi
Add /sys/kernel/debug/bdi/xxx/wb_stats to show per group writeback stats of bdi. Following domain hierarchy is tested: global domain (320G) / \ cgroup domain1(10G) cgroup domain2(10G) | | bdi wb1 wb2 /* per wb writeback info of bdi is collected */ cat wb_stats WbCgIno: 1 WbWriteback: 0 kB WbReclaimable: 0 kB WbDirtyThresh: 0 kB WbDirtied: 0 kB WbWritten: 0 kB WbWriteBandwidth: 102400 kBps b_dirty: 0 b_io: 0 b_more_io: 0 b_dirty_time: 0 state: 1 WbCgIno: 4091 WbWriteback: 1792 kB WbReclaimable: 820512 kB WbDirtyThresh: 6004692 kB WbDirtied: 1820448 kB WbWritten: 999488 kB WbWriteBandwidth: 169020 kBps b_dirty: 0 b_io: 0 b_more_io: 1 b_dirty_time: 0 state: 5 WbCgIno: 4131 WbWriteback: 1120 kB WbReclaimable: 820064 kB WbDirtyThresh: 6004728 kB WbDirtied: 1822688 kB WbWritten: 1002400 kB WbWriteBandwidth: 153520 kBps b_dirty: 0 b_io: 0 b_more_io: 1 b_dirty_time: 0 state: 5 [shikemeng@huaweicloud.com: fix build problems] Link: https://lkml.kernel.org/r/20240423034643.141219-4-shikemeng@huaweicloud.com Link: https://lkml.kernel.org/r/20240423034643.141219-3-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Brian Foster <bfoster@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: remove PageReferencedMatthew Wilcox (Oracle)
All callers now use folio_*_referenced() so we can remove the PageReferenced family of functions. Link: https://lkml.kernel.org/r/20240424191914.361554-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: remove page_ref_sub_return()Matthew Wilcox (Oracle)
With all callers converted to folios, we can act directly on folio->_refcount. Link: https://lkml.kernel.org/r/20240424191914.361554-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs()Matthew Wilcox (Oracle)
All callers have a folio so we can remove this use of page_ref_sub_return(). Link: https://lkml.kernel.org/r/20240424191914.361554-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: remove put_devmap_managed_page()Matthew Wilcox (Oracle)
It only has one caller; convert that caller to use put_devmap_managed_page_refs() instead. Link: https://lkml.kernel.org/r/20240424191914.361554-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: remove page_cache_alloc()Matthew Wilcox (Oracle)
Patch series "More folio compat code removal". More code removal with bonus kernel-doc addition. This patch (of 7): All callers have now been converted to filemap_alloc_folio(). Link: https://lkml.kernel.org/r/20240424191914.361554-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240424191914.361554-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm/memory-failure: pass the folio to collect_procs_ksm()Matthew Wilcox (Oracle)
We've already calculated it, so pass it in instead of recalculating it in collect_procs_ksm(). Link: https://lkml.kernel.org/r/20240412193510.2356957-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: convert hugetlb_page_mapping_lock_write to folioMatthew Wilcox (Oracle)
The page is only used to get the mapping, so the folio will do just as well. Both callers already have a folio available, so this saves a call to compound_head(). Link: https://lkml.kernel.org/r/20240412193510.2356957-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jane Chu  <jane.chu@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm/memory-failure: convert shake_page() to shake_folio()Matthew Wilcox (Oracle)
Removes two calls to compound_head(). Move the prototype to internal.h; we definitely don't want code outside mm using it. Link: https://lkml.kernel.org/r/20240412193510.2356957-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jane Chu <jane.chu@oracle.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: return the address from page_mapped_in_vma()Matthew Wilcox (Oracle)
The only user of this function calls page_address_in_vma() immediately after page_mapped_in_vma() calculates it and uses it to return true/false. Return the address instead, allowing memory-failure to skip the call to page_address_in_vma(). Link: https://lkml.kernel.org/r/20240412193510.2356957-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05xarray: don't use "proxy" headersAndy Shevchenko
Update header inclusions to follow IWYU (Include What You Use) principle. Link: https://lkml.kernel.org/r/20240423142204.2408923-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05xarray: use BITS_PER_LONGS()Andy Shevchenko
Patch series "xarray: Clean up xarray.h". Main portion of this change is to get rid of kernel.h included into other globally available headers. This decreases a dependency hell degree. The first patch makes it possible to avoid math.h to be included as bitops.h is implied by bitmap.h. This patch (of 2): Use BITS_PER_LONGS() instead of open coded variant. Link: https://lkml.kernel.org/r/20240423142204.2408923-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05memcg: simple cleanup of stats update functionsShakeel Butt
mod_memcg_lruvec_state() is never called from outside of memcontrol.c and with always irq disabled. So, replace it with the irq disabled version and add an assert that irq is disabled in the caller. Similarly mod_objcg_state() is not called from outside of memcontrol.c, so simply make it static and change it's name to __mod_objcg_state(). Link: https://lkml.kernel.org/r/20240420232505.2768428-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: T.J. Mercier <tjmercier@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm/page-flags: make PageUptodate return boolHao Ge
Make PageUptodate return bool to align the return values of folio_test_uptodate function Link: https://lkml.kernel.org/r/20240422032725.41452-1-gehao@kylinos.cn Signed-off-by: Hao Ge <gehao@kylinos.cn> Cc: David Hildenbrand <david@redhat.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ruihan Li <lrh2000@pku.edu.cn> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm/madvise: introduce clear_young_dirty_ptes() batch helperLance Yang
Patch series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free", v10. This patchset adds support for lazyfreeing multi-size THP (mTHP) without needing to first split the large folio via split_folio(). However, we still need to split a large folio that is not fully mapped within the target range. If a large folio is locked or shared, or if we fail to split it, we just leave it in place and advance to the next PTE in the range. But note that the behavior is changed; previously, any failure of this sort would cause the entire operation to give up. As large folios become more common, sticking to the old way could result in wasted opportunities. Performance Testing =================== On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of the same size results in the following runtimes for madvise(MADV_FREE) in seconds (shorter is better): Folio Size | Old | New | Change ------------------------------------------ 4KiB | 0.590251 | 0.590259 | 0% 16KiB | 2.990447 | 0.185655 | -94% 32KiB | 2.547831 | 0.104870 | -95% 64KiB | 2.457796 | 0.052812 | -97% 128KiB | 2.281034 | 0.032777 | -99% 256KiB | 2.230387 | 0.017496 | -99% 512KiB | 2.189106 | 0.010781 | -99% 1024KiB | 2.183949 | 0.007753 | -99% 2048KiB | 0.002799 | 0.002804 | 0% This patch (of 4): This commit introduces clear_young_dirty_ptes() to replace mkold_ptes(). By doing so, we can use the same function for both use cases (madvise_pageout and madvise_free), and it also provides the flexibility to only clear the dirty flag in the future if needed. Link: https://lkml.kernel.org/r/20240418134435.6092-1-ioworker0@gmail.com Link: https://lkml.kernel.org/r/20240418134435.6092-2-ioworker0@gmail.com Signed-off-by: Lance Yang <ioworker0@gmail.com> Suggested-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Jeff Xie <xiehuan09@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05buffer: add kernel-doc for bforget() and __bforget()Matthew Wilcox (Oracle)
Distinguish these functions from brelse() and __brelse(). Link: https://lkml.kernel.org/r/20240416031754.4076917-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05buffer: add kernel-doc for brelse() and __brelse()Matthew Wilcox (Oracle)
Move the documentation for __brelse() to brelse(), format it as kernel-doc and update it from talking about pages to folios. Link: https://lkml.kernel.org/r/20240416031754.4076917-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05buffer: fix __bread and __bread_gfp kernel-docMatthew Wilcox (Oracle)
The extra indentation confused the kernel-doc parser, so remove it. Fix some other wording while I'm here, and advise the user they need to call brelse() on this buffer. __bread_gfp() isn't used directly by filesystems, but the other wrappers for it don't have documentation, so document it accordingly. Link: https://lkml.kernel.org/r/20240416031754.4076917-5-willy@infradead.org Co-developed-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: add per-order mTHP anon_swpout and anon_swpout_fallback countersBarry Song
This helps to display the fragmentation situation of the swapfile, knowing the proportion of how much we haven't split large folios. So far, we only support non-split swapout for anon memory, with the possibility of expanding to shmem in the future. So, we add the "anon" prefix to the counter names. Link: https://lkml.kernel.org/r/20240412114858.407208-3-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback countersBarry Song
Patch series "mm: add per-order mTHP alloc and swpout counters", v6. The patchset introduces a framework to facilitate mTHP counters, starting with the allocation and swap-out counters. Currently, only four new nodes are appended to the stats directory for each mTHP size. /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats anon_fault_alloc anon_fault_fallback anon_fault_fallback_charge anon_swpout anon_swpout_fallback These nodes are crucial for us to monitor the fragmentation levels of both the buddy system and the swap partitions. In the future, we may consider adding additional nodes for further insights. This patch (of 4): Profiling a system blindly with mTHP has become challenging due to the lack of visibility into its operations. Presenting the success rate of mTHP allocations appears to be pressing need. Recently, I've been experiencing significant difficulty debugging performance improvements and regressions without these figures. It's crucial for us to understand the true effectiveness of mTHP in real-world scenarios, especially in systems with fragmented memory. This patch establishes the framework for per-order mTHP counters. It begins by introducing the anon_fault_alloc and anon_fault_fallback counters. Additionally, to maintain consistency with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP. Incorporating additional counters should now be straightforward as well. Link: https://lkml.kernel.org/r/20240412114858.407208-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240412114858.407208-2-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>