Age | Commit message | Author
2024-07-04 | mm: memcg: guard memcg1-specific members of struct mem_cgroup_per_node | Roman Gushchin
Put memcg1-specific members of struct mem_cgroup_per_node under the CONFIG_MEMCG_V1 config option.

Link: https://lkml.kernel.org/r/20240628210317.272856-8-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: put memcg1-specific struct mem_cgroup's members under CONFIG_MEMCG_V1 | Roman Gushchin
Put memcg1-specific members of struct mem_cgroup under the CONFIG_MEMCG_V1 config option. Also group them close to the end of struct mem_cgroup, just before the dynamic per-node part.

Link: https://lkml.kernel.org/r/20240628210317.272856-7-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: guard memcg1-specific fields accesses in mm/memcontrol.c | Roman Gushchin
There are only a few accesses to memcg1-specific members of struct mem_cgroup left in mm/memcontrol.c. Let's guard them with the CONFIG_MEMCG_V1 config option.

Link: https://lkml.kernel.org/r/20240628210317.272856-6-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: gather memcg1-specific fields initialization in memcg1_memcg_init() | Roman Gushchin
Gather the initialization of all memcg1-specific members of struct mem_cgroup into a new memcg1_memcg_init() function, defined in mm/memcontrol-v1.c. Obviously, if CONFIG_MEMCG_V1 is not set, there is no need to initialize these fields, so the function becomes trivial.

Link: https://lkml.kernel.org/r/20240628210317.272856-5-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: guard cgroup v1-specific code in mem_cgroup_print_oom_meminfo() | Roman Gushchin
Put cgroup v1-specific code in mem_cgroup_print_oom_meminfo() under CONFIG_MEMCG_V1.

Link: https://lkml.kernel.org/r/20240628210317.272856-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: factor out legacy socket memory accounting code | Roman Gushchin
Move the legacy cgroup v1 socket memory accounting code out into mm/memcontrol-v1.c. This commit introduces three new functions: memcg1_tcpmem_active(), memcg1_charge_skmem() and memcg1_uncharge_skmem(), which contain all the cgroup v1-specific code and become trivial if CONFIG_MEMCG_V1 isn't set.

Note that the !!memcg->tcpmem_pressure check in mem_cgroup_under_socket_pressure() can't easily be moved into memcontrol-v1.h without including memcontrol-v1.h from memcontrol.h, which isn't a good idea, so it's better to just #ifdef it.

Link: https://lkml.kernel.org/r/20240628210317.272856-3-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: move memcg_account_kmem() to memcontrol-v1.c | Roman Gushchin
Patch series "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1".

This patchset puts all of cgroup v1's members of struct mem_cgroup, struct mem_cgroup_per_node and struct task_struct under the CONFIG_MEMCG_V1 config option. If cgroup v1 support is not required (which is true for many cgroup users these days), this saves a bit of memory and compiles out some code, some of which is on relatively hot paths. It also structures the code a bit better by grouping cgroup v1-specific stuff in one place.

This patch (of 9):

memcg_account_kmem() consists of a trivial statistics change via a mod_memcg_state() call and a relatively large memcg1-specific part. Let's factor out the mod_memcg_state() call and move the rest into the mm/memcontrol-v1.c file. Also rename memcg_account_kmem() to memcg1_account_kmem() for consistency.

Link: https://lkml.kernel.org/r/20240628210317.272856-1-roman.gushchin@linux.dev
Link: https://lkml.kernel.org/r/20240628210317.272856-2-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: add swappiness= arg to memory.reclaim | Dan Schatzberg
Allow proactive reclaimers to submit an additional swappiness=<val> argument to memory.reclaim. This overrides the global or per-memcg swappiness setting for that reclaim attempt.

For example:

    echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the root cgroup with a swappiness setting of 0 (no swap) regardless of the vm.swappiness sysctl setting.

Userspace proactive reclaimers use the memory.reclaim interface to trigger reclaim. The memory.reclaim interface does not provide any way to affect the balance of file vs anon during proactive reclaim. The only approach is to adjust the vm.swappiness setting. However, there are a few reasons we look to control the balance of file vs anon during proactive reclaim, separately from reactive reclaim:

* Swapout should be limited to manage SSD write endurance. In near-OOM situations we are fine with lots of swap-out to avoid OOMs. As these are typically rare events, they have relatively little impact on write endurance. However, proactive reclaim runs continuously and so its impact on SSD write endurance is more significant. Therefore it is desirable to control swap-out for proactive reclaim separately from reactive reclaim.

* Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap exhaustion. This makes sense if the swap exhaustion is triggered by reactive reclaim but less so if it is triggered by proactive reclaim (e.g. one could see OOMs when free memory is ample but anon is just particularly cold). Therefore, it's desirable to have proactive reclaim reduce or stop swap-out before the threshold at which OOM killing occurs.

In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before writes to memory.reclaim[2]. This has been in production for nearly two years and has addressed our needs to control proactive vs reactive reclaim behavior but is still not ideal for a number of reasons:

* vm.swappiness is a global setting; adjusting it can race/interfere with other system administration that wishes to control vm.swappiness. In our case, we need to disable Senpai before adjusting vm.swappiness.

* vm.swappiness is stateful, so a crash or restart of Senpai can leave a misconfigured setting. This requires some additional management to record the "desired" setting and ensure Senpai always adjusts to it.

With this patch, we avoid these downsides of adjusting vm.swappiness globally.

[1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
[2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598

Link: https://lkml.kernel.org/r/20240103164841.2800183-3-schatzberg.dan@gmail.com
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Suggested-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yue Zhao <findns94@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
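To make the new argument format concrete, here is a hypothetical userspace parser for a memory.reclaim write such as "2M swappiness=0". It is not the kernel's memory_reclaim() code: the helper names, the -1 "not specified" convention and the K/M/G suffix handling (loosely modeled on memparse()) are assumptions for illustration. MIN_SWAPPINESS and MAX_SWAPPINESS use the 0 and 200 bounds named in the companion defines commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MIN_SWAPPINESS 0
#define MAX_SWAPPINESS 200

/* Hypothetical parsed form of one memory.reclaim write. */
struct reclaim_req {
	unsigned long long bytes;
	int swappiness;              /* -1: not given, use the usual setting */
};

/* Size with an optional K/M/G suffix, loosely modeled on memparse(). */
static bool parse_size(const char *s, char **end, unsigned long long *out)
{
	unsigned long long v = strtoull(s, end, 10);

	if (*end == s)
		return false;
	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	*out = v;
	return true;
}

static bool parse_reclaim(const char *buf, struct reclaim_req *req)
{
	char *p, *end;
	long v;

	assert(buf != NULL && req != NULL);
	req->swappiness = -1;
	if (!parse_size(buf, &p, &req->bytes))
		return false;
	while (*p == ' ')
		p++;
	if (*p == '\0' || *p == '\n')
		return true;             /* size only: keep default swappiness */
	if (strncmp(p, "swappiness=", 11) != 0)
		return false;            /* unknown option */
	p += 11;
	v = strtol(p, &end, 10);
	if (end == p || v < MIN_SWAPPINESS || v > MAX_SWAPPINESS)
		return false;            /* missing or out-of-range value */
	while (*end == ' ' || *end == '\n')
		end++;
	if (*end != '\0')
		return false;
	req->swappiness = (int)v;
	return true;
}
```

With this sketch, "2M swappiness=0" yields 2 MiB with swappiness 0, a bare "4096" leaves swappiness at -1, and "1M swappiness=201" is rejected.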
2024-07-04 | mm: add defines for min/max swappiness | Dan Schatzberg
Patch series "Add swappiness argument to memory.reclaim", v6.

This patch proposes augmenting the memory.reclaim interface with a swappiness=<val> argument that overrides the swappiness value for that instance of proactive reclaim.

Userspace proactive reclaimers use the memory.reclaim interface to trigger reclaim. The memory.reclaim interface does not provide any way to affect the balance of file vs anon during proactive reclaim. The only approach is to adjust the vm.swappiness setting. However, there are a few reasons we look to control the balance of file vs anon during proactive reclaim, separately from reactive reclaim:

* Swapout should be limited to manage SSD write endurance. In near-OOM situations we are fine with lots of swap-out to avoid OOMs. As these are typically rare events, they have relatively little impact on write endurance. However, proactive reclaim runs continuously and so its impact on SSD write endurance is more significant. Therefore it is desirable to control swap-out for proactive reclaim separately from reactive reclaim.

* Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap exhaustion. This makes sense if the swap exhaustion is triggered by reactive reclaim but less so if it is triggered by proactive reclaim (e.g. one could see OOMs when free memory is ample but anon is just particularly cold). Therefore, it's desirable to have proactive reclaim reduce or stop swap-out before the threshold at which OOM killing occurs.

In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before writes to memory.reclaim[2]. This has been in production for nearly two years and has addressed our needs to control proactive vs reactive reclaim behavior but is still not ideal for a number of reasons:

* vm.swappiness is a global setting; adjusting it can race/interfere with other system administration that wishes to control vm.swappiness. In our case, we need to disable Senpai before adjusting vm.swappiness.

* vm.swappiness is stateful, so a crash or restart of Senpai can leave a misconfigured setting. This requires some additional management to record the "desired" setting and ensure Senpai always adjusts to it.

With this patch, we avoid these downsides of adjusting vm.swappiness globally.

Previously, this exact interface addition was proposed by Yosry[3]. In response, Roman proposed instead an interface to specify precise file/anon/slab reclaim amounts[4]. More recently, Huan also proposed this[5], and others similarly questioned whether this was the proper interface.

Previous proposals sought to use this to allow proactive reclaimers to effectively perform a custom reclaim algorithm by issuing proactive reclaim with different settings to control file vs anon reclaim (e.g. to only reclaim anon from some applications). Responses argued that adjusting swappiness is a poor interface for custom reclaim.

In contrast, I argue in favor of a swappiness setting not as a way to implement custom reclaim algorithms but rather to bias the balance of anon vs file due to differences between proactive and reactive reclaim. In this context, swappiness is the existing interface for controlling this balance, and this patch simply allows it to be configured differently for proactive vs reactive reclaim.

Specifying explicit amounts of anon vs file pages to reclaim feels inappropriate for this purpose. Proactive reclaimers are unaware of the relative age of file vs anon for a cgroup, which makes it difficult to manage proactive reclaim of different memory pools. A proactive reclaimer would need to issue some amount of anon reclaim attempts separately from the amount of file reclaim attempts, which seems brittle given that the impact is difficult to observe.

[1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
[2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
[3]https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/
[4]https://lore.kernel.org/linux-mm/YoPHtHXzpK51F%2F1Z@carbon/
[5]https://lore.kernel.org/lkml/20231108065818.19932-1-link@vivo.com/

This patch (of 2):

We use the constants 0 and 200 in a few places in the mm code when referring to the min and max swappiness. This patch adds MIN_SWAPPINESS and MAX_SWAPPINESS #defines to improve clarity. There are no functional changes.

Link: https://lkml.kernel.org/r/20240103164841.2800183-1-schatzberg.dan@gmail.com
Link: https://lkml.kernel.org/r/20240103164841.2800183-2-schatzberg.dan@gmail.com
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yue Zhao <findns94@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | MAINTAINERS: add mm/memcontrol-v1.c/h to the list of maintained files | Roman Gushchin
Link: https://lkml.kernel.org/r/20240625005906.106920-15-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: put cgroup v1-specific code under a config option | Roman Gushchin
Put legacy cgroup v1 memory controller code under a new CONFIG_MEMCG_V1 config option. The option is turned off by default. Nobody except those who are still using cgroup v1 should turn it on.

If the option is not set, the memory controller can still be mounted under cgroup v1, but none of the memcg-specific control files are present.

Please note that not all of cgroup v1's memory controller code is guarded yet (only most of it); the rest is the subject of follow-up work.

Thanks to Michal Hocko for providing a better Kconfig option description.

[roman.gushchin@linux.dev: better config option description provided by Michal]
  Link: https://lkml.kernel.org/r/ZnxXNtvqllc9CDoo@google.com
Link: https://lkml.kernel.org/r/20240625005906.106920-14-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: group cgroup v1 memcg related declarations | Roman Gushchin
Group all cgroup v1-related declarations at the end of memcontrol.h and mm/memcontrol-v1.h, with the intention of putting them all together under a config option later on. It should make things easier to follow and maintain too.

Link: https://lkml.kernel.org/r/20240625005906.106920-13-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: make memcg1_update_tree() static | Roman Gushchin
memcg1_update_tree() is not used outside of mm/memcontrol-v1.c anymore, so define it as static and remove the declaration from the header file.

Link: https://lkml.kernel.org/r/20240625005906.106920-12-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: move cgroup v1 interface files to memcontrol-v1.c | Roman Gushchin
Move legacy cgroup v1 memory controller interfaces and the corresponding code into memcontrol-v1.c.

[roman.gushchin@linux.dev: move two functions]
  Link: https://lkml.kernel.org/r/20240704002712.2077812-1-roman.gushchin@linux.dev
Link: https://lkml.kernel.org/r/20240625005906.106920-11-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: rename memcg_oom_recover() | Roman Gushchin
Rename memcg_oom_recover() to memcg1_oom_recover() for consistency with other memory cgroup v1-related functions. Move the declaration in mm/memcontrol-v1.h next to the other memcg v1 OOM handling functions.

Link: https://lkml.kernel.org/r/20240625005906.106920-10-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: move cgroup v1 oom handling code into memcontrol-v1.c | Roman Gushchin
Cgroup v1 supports a complicated mechanism for handling OOM in userspace, which is not supported by cgroup v2. Let's move the corresponding code into memcontrol-v1.c.

Aside from mechanical code movement, this patch introduces two new functions: memcg1_oom_prepare() and memcg1_oom_finish(). They implement the cgroup v1-specific parts of the common memcg OOM handling path.

Link: https://lkml.kernel.org/r/20240625005906.106920-9-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: rename memcg_check_events() | Roman Gushchin
Rename memcg_check_events() to memcg1_check_events() for consistency with other cgroup v1-specific functions.

Link: https://lkml.kernel.org/r/20240625005906.106920-8-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: move legacy memcg event code into memcontrol-v1.c | Roman Gushchin
Cgroup v1's memory controller contains a pretty complicated event notification mechanism which is not used on cgroup v2. Let's move the corresponding code into memcontrol-v1.c.

Please note that mem_cgroup_event_ratelimit() remains in memcontrol.c; otherwise it would require exporting too many details on memcg stats outside of memcontrol.c.

Link: https://lkml.kernel.org/r/20240625005906.106920-7-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
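mem_cgroup_event_ratelimit(), which stays in memcontrol.c per this commit, decides how often the v1 threshold and soft limit event machinery runs. The sketch below models that idea in plain C: a running event count is compared with a per-target "next" value using a wraparound-safe signed subtraction (the same trick as time_after()), and the target is pushed forward once it is passed. Per-CPU counters are omitted, and the step sizes 128 and 1024 are assumptions recalled from the kernel's THRESHOLDS_EVENTS_TARGET and SOFTLIMIT_EVENTS_TARGET rather than taken from this log.

```c
#include <assert.h>
#include <stdbool.h>

enum events_target { TARGET_THRESH, TARGET_SOFTLIMIT, NR_TARGETS };

/* Assumed step sizes, mirroring THRESHOLDS_/SOFTLIMIT_EVENTS_TARGET. */
#define THRESHOLDS_EVENTS_TARGET 128UL
#define SOFTLIMIT_EVENTS_TARGET  1024UL

/* Single-CPU stand-in for the kernel's per-CPU event counters. */
struct event_counters {
	unsigned long nr_page_events;          /* grows on every charge/uncharge */
	unsigned long targets[NR_TARGETS];     /* next count at which to fire */
};

/* Returns true (and re-arms the target) once nr_page_events passes it. */
static bool event_ratelimit(struct event_counters *c, enum events_target t)
{
	unsigned long val, next;

	assert(t < NR_TARGETS);
	val = c->nr_page_events;
	next = c->targets[t];

	/* Signed difference keeps the comparison correct across wraparound. */
	if ((long)(next - val) < 0) {
		next = val + (t == TARGET_THRESH ? THRESHOLDS_EVENTS_TARGET
						 : SOFTLIMIT_EVENTS_TARGET);
		c->targets[t] = next;
		return true;
	}
	return false;
}
```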
2024-07-04 | mm: memcg: rename charge move-related functions | Roman Gushchin
Rename exported functions related to charge moving to have the memcg1_ prefix.

Link: https://lkml.kernel.org/r/20240625005906.106920-6-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: move charge migration code to memcontrol-v1.c | Roman Gushchin
Unlike the legacy cgroup v1 memory controller, the cgroup v2 memory controller doesn't support moving charged pages between cgroups.

It's a fairly large and complicated piece of code which has created a number of problems in the past. Let's move this code into memcontrol-v1.c. It shaves off 1k lines from memcontrol.c. It's also another step towards making the legacy memory controller code optionally compiled.

Link: https://lkml.kernel.org/r/20240625005906.106920-5-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: rename soft limit reclaim-related functions | Roman Gushchin
Rename exported functions related to soft limit reclaim to have the memcg1_ prefix.

Link: https://lkml.kernel.org/r/20240625005906.106920-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: move soft limit reclaim code to memcontrol-v1.c | Roman Gushchin
Soft limits are cgroup v1-specific and are not supported by cgroup v2, so let's move the corresponding code into memcontrol-v1.c.

Aside from simply moving the code, this commit introduces a trivial memcg1_soft_limit_reset() function to reset soft limits and also moves the global soft limit tree initialization code into a new memcg1_init() function.

It also moves the corresponding declarations shared between memcontrol.c and memcontrol-v1.c into mm/memcontrol-v1.h.

Link: https://lkml.kernel.org/r/20240625005906.106920-3-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm: memcg: introduce memcontrol-v1.c | Roman Gushchin
Patch series "mm: memcg: separate legacy cgroup v1 code and put under config option", v2.

Cgroups v2 have been around for a while and many users have fully adopted them, so they never use cgroups v1 features and functionality. Yet they have to "pay" for the cgroup v1 support anyway:
1) the kernel binary contains unused cgroup v1 code,
2) some code paths have additional checks which are not needed,
3) some common structures like task_struct and mem_cgroup contain unused cgroup v1-specific members.

Cgroup v1's memory controller has a number of features that are not supported by cgroup v2, and their implementation is pretty much self-contained. Most notably, these features are: soft limit reclaim, OOM handling in userspace, a complicated event notification system, and charge migration.

Cgroup v1-specific code in memcontrol.c is close to 4k lines in size and is intertwined with generic and cgroup v2-specific code. It's a burden on developers and maintainers.

This patchset aims to solve these problems by:
1) moving cgroup v1-specific memcg code to the new mm/memcontrol-v1.c file,
2) putting definitions shared by memcontrol.c and memcontrol-v1.c into the mm/memcontrol-v1.h header,
3) introducing the CONFIG_MEMCG_V1 config option, turned off by default,
4) compiling memcontrol-v1.c only if CONFIG_MEMCG_V1 is set.

If CONFIG_MEMCG_V1 is not set, the cgroup v1 memory controller is still available for mounting; however, no memory-specific control knobs are present.

This patch (of 14):

This patch introduces the mm/memcontrol-v1.c source file, which will be used for all legacy (cgroup v1) memory cgroup code. It also introduces mm/memcontrol-v1.h to keep declarations shared between mm/memcontrol.c and mm/memcontrol-v1.c.

As of now, let's compile it if CONFIG_MEMCG is set, similar to mm/memcontrol.c. Later on it can be switched to use a separate config option, so that the legacy code won't be compiled if not required.

Link: https://lkml.kernel.org/r/20240625005906.106920-1-roman.gushchin@linux.dev
Link: https://lkml.kernel.org/r/20240625005906.106920-2-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm/ksm: optimize the chain()/chain_prune() interfaces | Chengming Zhou
The current implementation of stable_node_dup() makes the chain()/chain_prune() interfaces and their usages overcomplicated.

Why? stable_node_dup() only finds and returns a candidate stable_node for sharing, so the users have to recheck with stable_node_dup_any() whether any non-candidate stable_node exists, and then try ksm_get_folio() on it again.

Instead, stable_node_dup() can simply return the best stable_node it can find, and the users can then check whether it's a candidate for sharing or not.

This simplifies the code and leaves fewer corner cases: for example, stable_node and stable_node_dup can't be NULL if the returned tree_folio is not NULL.

Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-3-1c328aa9e30b@linux.dev
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm/ksm: don't waste time searching stable tree for fast changing page | Chengming Zhou
The code flow in cmp_and_merge_page() is suboptimal for handling ksm pages and non-ksm pages at the same time. For example:

- ksm page
  1. Mostly just return if this ksm page has not been migrated and this rmap_item is already on the rmap hlist; otherwise, we have to fix up this rmap_item's mapping.
  2. But we absolutely don't need to compute a checksum for this ksm page, since it can't change.

- non-ksm page
  1. First, don't waste time searching the stable tree if the page is changing fast.
  2. Try to merge with the zero page before searching the stable tree.
  3. Then search the stable tree to find a mergeable ksm page.

This patch optimizes the code flow so that the handling differences between ksm pages and non-ksm pages become clearer and more efficient too.

Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-2-1c328aa9e30b@linux.dev
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | mm/ksm: refactor out try_to_merge_with_zero_page() | Chengming Zhou
Patch series "mm/ksm: cmp_and_merge_page() optimizations and cleanup", v2.

This series mainly optimizes cmp_and_merge_page() to have more efficient, separate code flows for ksm pages and non-ksm anon pages.

- ksm page: obviously, there is no need to calculate the checksum.
- anon page: there is no need to search the stable tree if the page is changing fast, and we try to merge with the zero page before searching the stable tree for a ksm page.

Please see patch 2 for details.

Patch 3 is a cleanup and also a small optimization of the chain()/chain_prune() interfaces, which made stable_tree_search()/stable_tree_insert() overly complex.

I have done simple testing using "hackbench -g 1 -l 300000" (maybe I need to use a better workload) on my machine, and have seen a small decrease in ksmd CPU usage and some improvement in cmp_and_merge_page() latency:

We can see the latency of cmp_and_merge_page() when handling non-ksm anon pages has been improved.

This patch (of 3):

In preparation for later changes, refactor out a new function called try_to_merge_with_zero_page(), which tries to merge with the zero page.

Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-0-1c328aa9e30b@linux.dev
Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-1-1c328aa9e30b@linux.dev
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-04 | hugetlb: force allocating surplus hugepages on mempolicy allowed nodes | Aristeu Rozanski
When trying to allocate a hugepage with no reserved ones free, the allocation may still be allowed if a number of overcommit hugepages was configured (using /proc/sys/vm/nr_overcommit_hugepages) and that number wasn't reached. This allows extra hugepages to be allocated dynamically, if there are resources for it. Some sysadmins even prefer not reserving any hugepages and setting a big number of overcommit hugepages.

But while attempting to allocate overcommit hugepages in a multi-node system (either NUMA or mempolicy/cpuset), said allocations might randomly fail even when there are resources available for the allocation.

This happens because allowed_mems_nr() only accounts for the number of free hugepages in the nodes the current process belongs to, while the surplus hugepage allocation is done so that it can be allocated in any node. If one or more of the requested surplus hugepages are allocated in a different node, the whole allocation will fail due to allowed_mems_nr() returning a lower value.

So allocate surplus hugepages in one of the nodes the current process belongs to.

An easy way to reproduce this issue is to use a system with 2+ NUMA nodes:

    # echo 0 >/proc/sys/vm/nr_hugepages
    # echo 1 >/proc/sys/vm/nr_overcommit_hugepages
    # numactl -m0 ./tools/testing/selftests/mm/map_hugetlb 2

Repeated execution of the map_hugetlb test application will eventually fail when the hugepage ends up allocated in a different node.

[aris@ruivo.org: v2]
  Link: https://lkml.kernel.org/r/20240701212343.GG844599@cathedrallabs.org
Link: https://lkml.kernel.org/r/20240621190050.mhxwb65zn37doegp@redhat.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Aristeu Rozanski <aris@ruivo.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Moola <vishal.moola@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
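A userspace model of the fix's idea: the surplus page is placed only on a node in the task's allowed set, so a count restricted to those nodes (the role allowed_mems_nr() plays) sees it. The node bitmask, the free-capacity array and both helpers are hypothetical; the kernel works with nodemasks derived from the cpuset and mempolicy. The checks mirror the numactl -m0 reproducer: once node 0 is exhausted, allocation fails instead of silently landing on node 1.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical per-node state for the model. */
struct numa_model {
	int nr_nodes;
	int free_capacity[MAX_NODES];  /* room for new huge pages per node */
	int surplus[MAX_NODES];        /* surplus huge pages placed per node */
};

/* Place one surplus page on the first allowed node with room; -1 if none. */
static int alloc_surplus_on_allowed(struct numa_model *m, unsigned int allowed)
{
	assert(m->nr_nodes <= MAX_NODES);
	for (int node = 0; node < m->nr_nodes; node++) {
		if (!(allowed & (1u << node)))
			continue;           /* outside the cpuset/mempolicy */
		if (m->free_capacity[node] > 0) {
			m->free_capacity[node]--;
			m->surplus[node]++;
			return node;
		}
	}
	return -1;
}

/* What the task can "see": surplus pages on its allowed nodes only. */
static int visible_surplus(const struct numa_model *m, unsigned int allowed)
{
	int n = 0;

	for (int node = 0; node < m->nr_nodes; node++)
		if (allowed & (1u << node))
			n += m->surplus[node];
	return n;
}
```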
2024-07-04mm/damon/paddr: initialize nr_succeeded in __damon_pa_migrate_folio_list()SeongJae Park
The nr_succeeded variable is supposed to be set by a later migrate_pages() call. However, migrate_pages() does not set it when CONFIG_MIGRATION is unset. Initialize the variable to zero. Link: https://lkml.kernel.org/r/20240701165332.47495-1-sj@kernel.org Fixes: 5311c0a2eee3 ("mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202406251102.GE07hqfQ-lkp@intel.com/ Cc: Honggyu Kim <honggyu.kim@sk.com> Cc: Hyeongtak Ji <hyeongtak.ji@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
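A minimal sketch of the bug pattern and the fix. It assumes, as the message implies, that the !CONFIG_MIGRATION version of migrate_pages() returns an error without writing the output counter; migrate_pages_stub() and migrate_list() are hypothetical names, and the -ENOSYS return value is an assumption of the sketch.

```c
#include <errno.h>

/* Hypothetical stand-in for the !CONFIG_MIGRATION migrate_pages()
 * stub: it fails without ever writing *nr_succeeded. */
static int migrate_pages_stub(unsigned int *nr_succeeded)
{
	(void)nr_succeeded; /* deliberately left untouched */
	return -ENOSYS;
}

/* Caller pattern after the fix: initialize before the call so the
 * value is well defined even when migration is compiled out. */
static unsigned int migrate_list(void)
{
	unsigned int nr_succeeded = 0;

	migrate_pages_stub(&nr_succeeded);
	return nr_succeeded;
}
```

Without the `= 0`, migrate_list() would return whatever happened to be on the stack, which is what the compiler warnings and static-analysis reports flagged.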
2024-07-04mm: refactor folio_undo_large_rmappable()Kefeng Wang
Folios of order <= 1 are never on the deferred list, and an order check was added to folio_undo_large_rmappable() by commit 8897277acfef ("mm: support order-1 folios in the page cache"). That check makes the separate small-folio (order 0) check, repeated on every call of folio_undo_large_rmappable(), redundant, so keep only the folio_order() check inside the function. In addition, move all the checks into the header file to save a function call for folios that are not large rmappable or whose deferred_list is empty. Link: https://lkml.kernel.org/r/20240521130315.46072-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
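The "cheap checks inline in the header, slow path out of line" split can be sketched as follows. This is a simplified, hypothetical model: struct folio_model and both function names are illustrative, and the real code uses folio_order(), a large-rmappable flag test, and a list_empty() test on the folio's deferred list rather than these plain fields.

```c
#include <stdbool.h>

/* Hypothetical, simplified folio: only the fields the checks need. */
struct folio_model {
	unsigned int order;
	bool large_rmappable;
	bool on_deferred_list;
};

static unsigned int slow_path_calls;

/* Out-of-line part (would live in a .c file): in the kernel this takes
 * the split-queue lock and removes the folio from the deferred list. */
static void __undo_large_rmappable(struct folio_model *f)
{
	slow_path_calls++;
	f->on_deferred_list = false;
}

/* Inline part (would live in the header): cheap checks first, so most
 * callers never pay for a function call. */
static inline void undo_large_rmappable(struct folio_model *f)
{
	/* Order-0 and order-1 folios are never on the deferred list. */
	if (f->order <= 1 || !f->large_rmappable)
		return;
	if (!f->on_deferred_list)
		return;
	__undo_large_rmappable(f);
}
```

A single `order <= 1` test covers order 0 as well, which is why the separate small-folio check became redundant.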
2024-07-05tpm: Address !chip->auth in tpm_buf_append_hmac_session*()Jarkko Sakkinen
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause a NULL dereference in tpm_buf_append_hmac_session*(). Thus, address !chip->auth in tpm_buf_append_hmac_session*() and remove the fallback implementation for !TCG_TPM2_HMAC. Cc: stable@vger.kernel.org # v6.9+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-07-05tpm: Address !chip->auth in tpm_buf_append_name()Jarkko Sakkinen
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause a NULL dereference in tpm_buf_append_name(). Thus, address !chip->auth in tpm_buf_append_name() and remove the fallback implementation for !TCG_TPM2_HMAC. Cc: stable@vger.kernel.org # v6.10+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: d0a25bb961e6 ("tpm: Add HMAC session name/handle append") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-07-05tpm: Address !chip->auth in tpm2_*_auth_session()Jarkko Sakkinen
Unless tpm_chip_bootstrap() was called by the driver, !chip->auth can cause a NULL dereference in tpm2_*_auth_session(). Thus, address !chip->auth in tpm2_*_auth_session(). Cc: stable@vger.kernel.org # v6.9+ Reported-by: Stefan Berger <stefanb@linux.ibm.com> Closes: https://lore.kernel.org/linux-integrity/20240617193408.1234365-1-stefanb@linux.ibm.com/ Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions") Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-07-04ethtool: move firmware flashing flag to struct ethtool_netdev_stateEdward Cree
Commit 31e0aa99dc02 ("ethtool: Veto some operations during firmware flashing process") added a flag module_fw_flash_in_progress to struct net_device. As this is ethtool related state, move it to the recently created struct ethtool_netdev_state, accessed via the 'ethtool' member of struct net_device. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20240703121849.652893-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/phy/aquantia/aquantia.h
  219343755eae ("net: phy: aquantia: add missing include guards")
  61578f679378 ("net: phy: aquantia: add support for PHY LEDs")

drivers/net/ethernet/wangxun/libwx/wx_hw.c
  bd07a9817846 ("net: txgbe: remove separate irq request for MSI and INTx")
  b501d261a5b3 ("net: txgbe: add FDIR ATR support")
  https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/

include/linux/mlx5/mlx5_ifc.h
  048a403648fc ("net/mlx5: IFC updates for changing max EQs")
  99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO")
  https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/

Adjacent changes:

drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
  4130c67cd123 ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference")
  3f3126515fbe ("wifi: iwlwifi: mvm: add mvm-specific guard")

include/net/mac80211.h
  816c6bec09ed ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP")
  5a009b42e041 ("wifi: mac80211: track changes in AP's TPE")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-04Merge branches 'doc.2024.06.06a', 'fixes.2024.07.04a', 'mb.2024.06.28a', 'nocb.2024.06.03a', 'rcu-tasks.2024.06.06a', 'rcutorture.2024.06.06a' and 'srcu.2024.06.18a' into HEADPaul E. McKenney
doc.2024.06.06a: Documentation updates.
fixes.2024.07.04a: Miscellaneous fixes.
mb.2024.06.28a: Grace-period memory-barrier redundancy removal.
nocb.2024.06.03a: No-CB CPU updates.
rcu-tasks.2024.06.06a: RCU-Tasks updates.
rcutorture.2024.06.06a: Torture-test updates.
srcu.2024.06.18a: SRCU polled-grace-period updates.
2024-07-04rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocationFrederic Weisbecker
When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU off rnp->qsmaskinitnext, it means that all accesses from the offline CPU preceding CPUHP_TEARDOWN_CPU are visible to rcu_barrier(), including callback expiration and counter updates.

However, interrupts can still fire after stop_machine() re-enables interrupts and before rcutree_report_cpu_dead(). The related accesses happening between CPUHP_TEARDOWN_CPU and the clearing of rnp->qsmaskinitnext are _NOT_ guaranteed to be seen by rcu_barrier() without proper ordering, especially when all the remaining callbacks are invoked there, which makes rcutree_migrate_callbacks() bypass barrier_lock.

The following theoretical race example can make rcu_barrier() hang:

    CPU 0                                   CPU 1
    -----                                   -----
    //cpu_down()
    smpboot_park_threads()
    //ksoftirqd is parked now
    <IRQ>
    rcu_sched_clock_irq()
      invoke_rcu_core()
    do_softirq()
      rcu_core()
        rcu_do_batch()
          // callback storm
          // rcu_do_batch() returns
          // before completing all
          // of them
      // do_softirq also returns early because of
      // timeout. It defers to ksoftirqd but
      // it's parked
    </IRQ>
    stop_machine()
      take_cpu_down()
                                            rcu_barrier()
                                              spin_lock(barrier_lock)
                                              // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
    <IRQ>
    do_softirq()
      rcu_core()
        rcu_do_batch()
          //completes all pending callbacks
          //smp_mb() implied _after_ callback number dec
    </IRQ>

    rcutree_report_cpu_dead()
      rnp->qsmaskinitnext &= ~rdp->grpmask;

    rcutree_migrate_callbacks()
      // no callback, early return without locking
      // barrier_lock
                                              //observes !rcu_rdp_cpu_online(rdp)
                                              rcu_barrier_entrain()
                                                rcu_segcblist_entrain()
                                                  // Observe rcu_segcblist_n_cbs(rsclp) == 0
                                                  // because no barrier between reading
                                                  // rnp->qsmaskinitnext and rsclp->len
                                                  rcu_segcblist_add_len()
                                                    smp_mb__before_atomic()
                                                    // will now observe the 0 count and empty
                                                    // list, but too late, we enqueue regardless
                                                    WRITE_ONCE(rsclp->len, rsclp->len + v);
                                              // ignored barrier callback
                                              // rcu barrier stall...
This could be solved with a read memory barrier, enforcing the message passing between rnp->qsmaskinitnext and rsclp->len, matching the full memory barrier after the rsclp->len addition in rcu_segcblist_add_len() performed at the end of rcu_do_batch(). However, rcu_barrier() is already complicated enough and doesn't need more subtleties. CPU down is a slow path and barrier_lock is seldom contended. Solve the issue by unconditionally locking barrier_lock in rcutree_migrate_callbacks(). This makes sure that either rcu_barrier() sees the empty queue or its entrained callback will be migrated. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
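The locking shape of the fix can be sketched in user space with a pthread mutex standing in for barrier_lock. This is a hypothetical model, not the kernel code: struct cb_list and both functions are illustrative, and the memory-ordering race itself cannot be shown single-threaded. What the model does show is that once both sides hold the same lock, only two serializations remain, and in neither is a barrier callback stranded on the dying CPU's list.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's callback list. */
struct cb_list {
	long n_cbs;
	bool has_barrier_cb;
};

static pthread_mutex_t barrier_lock = PTHREAD_MUTEX_INITIALIZER;

/* rcu_barrier() side for an offline CPU: entrain a barrier callback
 * only if the list is non-empty, all under barrier_lock. */
static void barrier_entrain(struct cb_list *dying)
{
	pthread_mutex_lock(&barrier_lock);
	if (dying->n_cbs > 0) {
		dying->n_cbs++;
		dying->has_barrier_cb = true;
	}
	pthread_mutex_unlock(&barrier_lock);
}

/* Hotplug side after the fix: barrier_lock is taken even when the list
 * looks empty, instead of returning early without the lock. */
static void migrate_callbacks(struct cb_list *dying, struct cb_list *target)
{
	pthread_mutex_lock(&barrier_lock);
	target->n_cbs += dying->n_cbs;
	if (dying->has_barrier_cb)
		target->has_barrier_cb = true;
	dying->n_cbs = 0;
	dying->has_barrier_cb = false;
	pthread_mutex_unlock(&barrier_lock);
}
```

Either the barrier callback is entrained first and then migrated along with the others, or migration runs first and rcu_barrier() then sees an empty list and entrains nothing.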
2024-07-04rcu: Eliminate lockless accesses to rcu_sync->gp_countOleg Nesterov
The rcu_sync structure's ->gp_count field is always accessed under the protection of that same structure's ->rss_lock field, with the exception of a pair of WARN_ON_ONCE() calls just prior to acquiring that lock in functions rcu_sync_exit() and rcu_sync_dtor(). These lockless accesses are unnecessary and impair KCSAN's ability to catch bugs that might be inserted via other lockless accesses. This commit therefore moves those WARN_ON_ONCE() calls under the lock. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
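A minimal sketch of the change, with a pthread mutex standing in for ->rss_lock. struct rcu_sync_model, warn_on() and sync_exit() are hypothetical names, and warn_on() merely counts instead of emitting a warning. The point is that the sanity check now reads ->gp_count only while holding the lock, so every access to the field is a locked access that KCSAN can reason about.

```c
#include <pthread.h>

/* Hypothetical model of struct rcu_sync's locking. */
struct rcu_sync_model {
	pthread_mutex_t rss_lock;
	int gp_count;
	int warnings; /* counts what WARN_ON_ONCE() would have reported */
};

/* Hypothetical stand-in for WARN_ON_ONCE(). */
static void warn_on(struct rcu_sync_model *rsp, int cond)
{
	if (cond)
		rsp->warnings++;
}

static void sync_exit(struct rcu_sync_model *rsp)
{
	pthread_mutex_lock(&rsp->rss_lock);
	/* The check now runs under the lock that protects gp_count,
	 * instead of as a lockless read just before taking it. */
	warn_on(rsp, rsp->gp_count == 0);
	if (rsp->gp_count > 0)
		rsp->gp_count--;
	pthread_mutex_unlock(&rsp->rss_lock);
}
```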
2024-07-04MAINTAINERS: Add Uladzislau Rezki as RCU maintainerPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Uladzislau Rezki <urezki@gmail.com>
2024-07-04rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitterPaul E. McKenney
If a CPU is running either a userspace application or a guest OS in nohz_full mode, it is possible for a system call to occur just as an RCU grace period is starting. If that CPU also has the scheduling-clock tick enabled for any reason (such as a second runnable task), and if the system was booted with rcutree.use_softirq=0, then RCU can add insult to injury by awakening that CPU's rcuc kthread, resulting in yet another task and yet more OS jitter due to switching to that task, running it, and switching back. In addition, in the common case where that system call is not of excessively long duration, awakening the rcuc task is pointless. This is because the CPU will enter an extended quiescent state upon returning to the userspace application or guest OS. In this case, the rcuc kthread cannot do anything that the main RCU grace-period kthread cannot do on its behalf, at least if it is given a few additional milliseconds (for example, given the time duration specified by rcutree.jiffies_till_first_fqs, give or take scheduling delays). This commit therefore adds an rcutree.nohz_full_patience_delay kernel boot parameter that specifies the grace-period age (in milliseconds, rounded to jiffies) before which RCU will refrain from awakening the rcuc kthread. Preliminary experimentation suggests a value of 1000, that is, one second. Increasing rcutree.nohz_full_patience_delay will increase grace-period latency and in turn increase memory footprint, so systems with constrained memory might choose a smaller value. Systems with less-aggressive OS-jitter requirements might choose the default value of zero, which keeps the traditional immediate-wakeup behavior, thus avoiding increases in grace-period latency. [ paulmck: Apply Leonardo Bras feedback. ] Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/ Reported-by: Leonardo Bras <leobras@redhat.com> Suggested-by: Leonardo Bras <leobras@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Leonardo Bras <leobras@redhat.com>
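The parameter's semantics (milliseconds rounded up to jiffies, then compared against the grace-period age) can be sketched as below. This is a hypothetical model: ms_to_jiffies() takes HZ as an argument for testability (the kernel's msecs_to_jiffies() uses the configured HZ), before() imitates the wraparound-safe style of time_before(), and should_wake_rcuc() is an illustrative simplification of the wakeup decision.

```c
#include <stdbool.h>

/* Round milliseconds up to whole jiffies for a given tick rate. */
static unsigned long ms_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/* Wraparound-safe "a is before b", in the style of time_before(). */
static bool before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical decision: on a nohz_full CPU, leave the rcuc kthread
 * asleep while the grace period is younger than the patience delay. */
static bool should_wake_rcuc(bool nohz_full_cpu, unsigned long now,
			     unsigned long gp_start, unsigned long delay_j)
{
	if (nohz_full_cpu && before(now, gp_start + delay_j))
		return false;
	return true;
}
```

With HZ=250, the suggested value of 1000 ms becomes 250 jiffies; a delay of zero makes before() false immediately, which preserves the traditional immediate-wakeup behavior.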
2024-07-04parisc: Use max() to calculate parisc_tlb_flush_thresholdThorsten Blum
Use max() to reduce 4 lines of code to a single line and improve its readability. Fixes the following Coccinelle/coccicheck warning reported by minmax.cocci: WARNING opportunity for max() Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Helge Deller <deller@gmx.de>
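The kind of rewrite described here can be illustrated generically. The variable and function names below are hypothetical (this is not the parisc code), and MAX() is a plain macro for the sketch; the kernel's max() additionally evaluates each argument once and rejects operands of incompatible types at compile time.

```c
/* Plain macro for the sketch; evaluates its arguments twice. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical threshold computation in the four-line style. */
static unsigned long threshold_if(unsigned long computed, unsigned long min_val)
{
	unsigned long t = computed;

	if (t < min_val)
		t = min_val;
	return t;
}

/* The same computation as a single max() expression. */
static unsigned long threshold_max(unsigned long computed, unsigned long min_val)
{
	return MAX(computed, min_val);
}
```

Both forms compute the same value; the single expression states the intent ("at least min_val") directly.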
2024-07-04ARM: Emulate one-byte cmpxchgPaul E. McKenney
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on ARM systems with ARCH < ARMv6K. [ paulmck: Apply Arnd Bergmann and Nathan Chancellor feedback. ] [ paulmck: Apply Linus Walleij feedback. ] Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/54798f68-48f7-4c65-9cba-47c0bf175143@sirena.org.uk/ Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Closes: https://lore.kernel.org/all/CA+G9fYuZ+pf6p8AXMZWtdFtX-gbG8HMaBKp=XbxcdzA_QeLkxQ@mail.gmail.com/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Davis <afd@ti.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Eric DeVolder <eric.devolder@oracle.com> Cc: Rob Herring <robh@kernel.org> Cc: <linux-arm-kernel@lists.infradead.org>
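The general technique behind a one-byte cmpxchg emulation is to run a four-byte compare-and-exchange on the aligned word that contains the byte, retrying if other bytes of that word change underneath. The sketch below is a user-space rendition using the GCC/Clang __atomic builtins; cmpxchg_u8_emu() is a hypothetical name, and this is not the kernel's cmpxchg_emu_u8() source.

```c
#include <stdint.h>

/* One aligned 32-bit word viewed as four bytes. Indexing b[] by the
 * byte's address offset works on both little- and big-endian CPUs. */
union u8_32 {
	uint8_t b[4];
	uint32_t w;
};

/* Emulate a one-byte compare-and-exchange with a four-byte one on the
 * enclosing aligned word. Returns the byte value observed at *p. */
static uint8_t cmpxchg_u8_emu(volatile uint8_t *p, uint8_t old, uint8_t new)
{
	uint32_t *p32 = (uint32_t *)((uintptr_t)p & ~(uintptr_t)3);
	unsigned int i = (uintptr_t)p & 3;
	union u8_32 old32, new32;

	old32.w = __atomic_load_n(p32, __ATOMIC_RELAXED);
	for (;;) {
		if (old32.b[i] != old)
			return old32.b[i]; /* byte differs: the cmpxchg fails */
		new32.w = old32.w;
		new32.b[i] = new;          /* change only the target byte */
		/* On failure, old32.w is refreshed with the current word
		 * and the loop re-checks the target byte. */
		if (__atomic_compare_exchange_n(p32, &old32.w, new32.w, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_SEQ_CST))
			return old;
	}
}
```

Like cmpxchg(), the function returns the value it found, so a caller detects success by comparing the return value with `old`.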
2024-07-04Merge branch 'icc-rpmh-qos' into icc-nextGeorgi Djakov
This series adds QoS support for QNOC type devices, which can be found on the SC7280 platform. It adds support for programming priority, priority forward disable and urgency forwarding. This helps in prioritizing traffic originating from different interconnect masters at the NOC (Network On Chip). * icc-rpmh-qos dt-bindings: interconnect: add clock property to enable QOS on SC7280 interconnect: qcom: icc-rpmh: Add QoS configuration support interconnect: qcom: sc7280: enable QoS configuration interconnect: qcom: Fix DT backwards compatibility for QoS Link: https://lore.kernel.org/r/20240607173927.26321-1-quic_okukatla@quicinc.com Signed-off-by: Georgi Djakov <djakov@kernel.org>
2024-07-04interconnect: qcom: Fix DT backwards compatibility for QoSOdelu Kukatla
Add a qos_clks_required flag to skip QoS configuration if the clocks property is not populated in the devicetree for providers that require clocks to be enabled to access registers. This keeps the QoS configuration backwards compatible with devices that have an older DTB. Reported-by: Bjorn Andersson <andersson@kernel.org> Closes: https://lore.kernel.org/all/ciji6nlxn752ina4tmh6kwvek52nxpnguomqek6plwvwgvoqef@yrtexkpmn5br/ Signed-off-by: Odelu Kukatla <quic_okukatla@quicinc.com> Tested-by: Bjorn Andersson <andersson@kernel.org> Fixes: fbd908bb8bc0 ("interconnect: qcom: sc7280: enable QoS configuration") Link: https://lore.kernel.org/r/20240704125515.22194-1-quic_okukatla@quicinc.com Signed-off-by: Georgi Djakov <djakov@kernel.org>
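The decision the flag enables can be sketched as a small predicate. struct qos_desc and should_configure_qos() are hypothetical names, and treating a clock count of zero or less as "no clocks described" is an assumption of the sketch, not necessarily how the driver handles lookup errors.

```c
#include <stdbool.h>

/* Hypothetical provider description carrying the new flag. */
struct qos_desc {
	bool qos_clks_required; /* QoS registers need clocks to be accessible */
};

/* Decide whether QoS registers may be programmed, given how many
 * clocks were found for the provider's devicetree node. */
static bool should_configure_qos(const struct qos_desc *desc, int num_clks)
{
	/* An older DTB lacks the "clocks" property, so no clocks are
	 * found; providers that need clocks must then skip QoS. */
	if (num_clks <= 0 && desc->qos_clks_required)
		return false;
	return true;
}
```

Providers that do not set the flag keep programming QoS regardless of the clocks property, so only the clock-dependent providers change behavior on old DTBs.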
2024-07-04Merge branch 'icc-msm8953' into icc-nextGeorgi Djakov
Add interconnect driver for MSM8953-based devices. * icc-msm8953 dt-bindings: interconnect: qcom: Add Qualcomm MSM8953 NoC interconnect: qcom: Add MSM8953 driver Link: https://lore.kernel.org/r/20240628-msm8953-interconnect-v3-0-a70d582182dc@mainlining.org Signed-off-by: Georgi Djakov <djakov@kernel.org>
2024-07-04Merge branch 'icc-fixes' into icc-nextGeorgi Djakov
* icc-fixes interconnect: qcom: qcm2290: Fix mas_snoc_bimc RPM master ID Signed-off-by: Georgi Djakov <djakov@kernel.org>
2024-07-04arm64: dts: rockchip: fixes PHY reset for Lunzn Fastrhino R68SChukun Pan
Fix the PHY address and reset GPIOs (which did not match the corresponding pinctrl) for gmac0 and gmac1. Fixes: b9f8ca655d80 ("arm64: dts: rockchip: Add Lunzn Fastrhino R68S") Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20240630150010.55729-7-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-07-04arm64: dts: rockchip: disable display subsystem for Lunzn Fastrhino R6xSChukun Pan
The R66S and R68S boards do not have HDMI output, so disable the display subsystem. Fixes: c79dab407afd ("arm64: dts: rockchip: Add Lunzn Fastrhino R66S") Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20240701143028.1203997-3-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-07-04arm64: dts: rockchip: remove unused usb2 nodes for Lunzn Fastrhino R6xSChukun Pan
Fix the following error when booting:

  [ 15.851853] platform fd800000.usb: deferred probe pending
  [ 15.852384] platform fd840000.usb: deferred probe pending
  [ 15.852881] platform fd880000.usb: deferred probe pending

This is because usb2phy1 is not enabled. There is no USB 2.0 port on the board, so just remove the nodes. Fixes: c79dab407afd ("arm64: dts: rockchip: Add Lunzn Fastrhino R66S") Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20240630150010.55729-5-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-07-04arm64: dts: rockchip: fix pmu_io supply for Lunzn Fastrhino R6xSChukun Pan
Fix the pmu_io_domains supplies according to the schematic. Among them, vccio3 provides the I/O voltage for the SD card. There is no SD card slot on the R68S, where vccio3 is connected to vcc_3v3, so describe the supply of vccio3 separately. Fixes: c79dab407afd ("arm64: dts: rockchip: Add Lunzn Fastrhino R66S") Fixes: b9f8ca655d80 ("arm64: dts: rockchip: Add Lunzn Fastrhino R68S") Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20240630150010.55729-4-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-07-04arm64: dts: rockchip: fix usb regulator for Lunzn Fastrhino R6xSChukun Pan
Remove the non-existent usb_host regulator and fix the supply according to the schematic. Also remove the unnecessary always-on and boot-on for the usb_otg regulator. Fixes: c79dab407afd ("arm64: dts: rockchip: Add Lunzn Fastrhino R66S") Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20240701143028.1203997-2-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>