summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-06-17KVM: arm64: Add Oliver as a reviewerMarc Zyngier
Oliver Upton has agreed to help with reviewing the KVM/arm64 patches, and has been doing so for a while now, so adding him as to the reviewer list. Note that Oliver is using a different email address for this purpose, rather than the one his been using for his other contributions. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Oliver Upton <oupton@google.com> Link: https://lore.kernel.org/r/20220616085318.1303657-1-maz@kernel.org
2022-06-17KVM: arm64: Prevent kmemleak from accessing pKVM memoryQuentin Perret
Commit a7259df76702 ("memblock: make memblock_find_in_range method private") changed the API using which memory is reserved for the pKVM hypervisor. However, memblock_phys_alloc() differs from the original API in terms of kmemleak semantics -- the old one didn't report the reserved regions to kmemleak while the new one does. Unfortunately, when protected KVM is enabled, all kernel accesses to pKVM-private memory result in a fatal exception, which can now happen because of kmemleak scans: $ echo scan > /sys/kernel/debug/kmemleak [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290! [ 34.991580] kvm [304]: Hyp Offset: 0xfffe8be807e00000 [ 34.991813] Kernel panic - not syncing: HYP panic: [ 34.991813] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800 [ 34.991813] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000 [ 34.991813] VCPU:0000000000000000 [ 34.993660] CPU: 0 PID: 304 Comm: bash Not tainted 5.19.0-rc2 #102 [ 34.994059] Hardware name: linux,dummy-virt (DT) [ 34.994452] Call trace: [ 34.994641] dump_backtrace.part.0+0xcc/0xe0 [ 34.994932] show_stack+0x18/0x6c [ 34.995094] dump_stack_lvl+0x68/0x84 [ 34.995276] dump_stack+0x18/0x34 [ 34.995484] panic+0x16c/0x354 [ 34.995673] __hyp_pgtable_total_pages+0x0/0x60 [ 34.995933] scan_block+0x74/0x12c [ 34.996129] scan_gray_list+0xd8/0x19c [ 34.996332] kmemleak_scan+0x2c8/0x580 [ 34.996535] kmemleak_write+0x340/0x4a0 [ 34.996744] full_proxy_write+0x60/0xbc [ 34.996967] vfs_write+0xc4/0x2b0 [ 34.997136] ksys_write+0x68/0xf4 [ 34.997311] __arm64_sys_write+0x20/0x2c [ 34.997532] invoke_syscall+0x48/0x114 [ 34.997779] el0_svc_common.constprop.0+0x44/0xec [ 34.998029] do_el0_svc+0x2c/0xc0 [ 34.998205] el0_svc+0x2c/0x84 [ 34.998421] el0t_64_sync_handler+0xf4/0x100 [ 34.998653] el0t_64_sync+0x18c/0x190 [ 34.999252] SMP: stopping secondary CPUs [ 35.000034] Kernel Offset: disabled [ 35.000261] CPU features: 0x800,00007831,00001086 [ 35.000642] Memory Limit: none [ 35.001329] ---[ end Kernel panic - not syncing: HYP panic: [ 35.001329] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800 [ 35.001329] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000 [ 35.001329] VCPU:0000000000000000 ]--- Fix this by explicitly excluding the hypervisor's memory pool from kmemleak like we already do for the hyp BSS. Cc: Mike Rapoport <rppt@kernel.org> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private") Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220616161135.3997786-1-qperret@google.com
2022-06-17ALSA: x86: intel_hdmi_audio: use pm_runtime_resume_and_get()Pierre-Louis Bossart
The current code does not check for errors and does not release the reference on errors. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20220616222910.136854-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-06-17ALSA: x86: intel_hdmi_audio: enable pm_runtime and set autosuspend delayPierre-Louis Bossart
The existing code uses pm_runtime_get_sync/put_autosuspend, but pm_runtime was not explicitly enabled. The autosuspend delay was not set either, the value is set to 5s since HDMI is rather painful to resume. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20220616222910.136854-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-06-17ALSA: hda: intel-nhlt: remove use of __func__ in dev_dbgPierre-Louis Bossart
The module and function information can be added with 'modprobe foo dyndbg=+pmf' Suggested-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20220616220559.136160-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-06-17ALSA: hda: intel-dspcfg: use SOF for UpExtreme and UpExtreme11 boardsPierre-Louis Bossart
The UpExtreme BIOS reports microphones that are not physically present, so this module ends-up selecting SOF, while the UpExtreme11 BIOS does not report microphones so the snd-hda-intel driver is selected. For consistency use SOF unconditionally in autodetection mode. The use of the snd-hda-intel driver can still be enabled with 'options snd-intel-dspcfg dsp_driver=1' Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20220616201029.130477-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-06-17firewire: convert sysfs sprintf/snprintf family to sysfs_emitJiapeng Chong
Fix the following coccicheck warning: ./drivers/firewire/core-device.c:375:8-16: WARNING: use scnprintf or sprintf. Reported-by: Abaci Robot<abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20220615121505.61412-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-06-17firewire: cdev: fix potential leak of kernel stack due to uninitialized valueTakashi Sakamoto
Recent change brings potential leak of value on kernel stack to userspace due to uninitialized value. This commit fixes the bug. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: baa914cd81f5 ("firewire: add kernel API to access CYCLE_TIME register") Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20220512112037.103142-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-06-17ata: libata: add qc->flags in ata_qc_complete_template tracepointEdward Wu
Add flags value to check the result of ata completion Fixes: 255c03d15a29 ("libata: Add tracepoints") Cc: stable@vger.kernel.org Signed-off-by: Edward Wu <edwardwu@realtek.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-06-16Merge tag 'drm-fixes-2022-06-17' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular drm fixes for rc3. Nothing too serious, i915, amdgpu and exynos all have a few small driver fixes, and two ttm fixes, and one compiler warning. atomic: - fix spurious compiler warning ttm: - add NULL ptr check in swapout code - fix bulk move handling i915: - Fix page fault on error state read - Fix memory leaks in per-gt sysfs - Fix multiple fence handling - Remove accidental static from a local variable amdgpu: - Fix regression in GTT size reporting - OLED backlight fix exynos: - Check a null pointer instead of IS_ERR() - Rework initialization code of Exynos MIC driver" * tag 'drm-fixes-2022-06-17' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Cap OLED brightness per max frame-average luminance drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl drm/exynos: mic: Rework initialization drm/exynos: fix IS_ERR() vs NULL check in probe drm/ttm: fix bulk move handling v2 drm/i915/uc: remove accidental static from a local variable drm/i915: Individualize fences before adding to dma_resv obj drm/i915/gt: Fix memory leaks in per-gt sysfs drm/i915/reset: Fix error_state_read ptr + offset use drm/ttm: fix missing NULL check in ttm_device_swapout drm/atomic: fix warning of unused variable
2022-06-16phy: aquantia: Fix AN when higher speeds than 1G are not advertisedClaudiu Manoil
Even when the eth port is resticted to work with speeds not higher than 1G, and so the eth driver is requesting the phy (via phylink) to advertise up to 1000BASET support, the aquantia phy device is still advertising for 2.5G and 5G speeds. Clear these advertising defaults when requested. Cc: Ondrej Spacek <ondrej.spacek@nxp.com> Fixes: 09c4c57f7bc41 ("net: phy: aquantia: add support for auto-negotiation configuration") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-16mm/kmemleak: prevent soft lockup in first object iteration loop of ↵Waiman Long
kmemleak_scan() The first RCU-based object iteration loop has to modify the object count. So we cannot skip taking the object lock. One way to avoid soft lockup is to insert occasional cond_resched() call into the loop. This cannot be done while holding the RCU read lock which is to protect objects from being freed. However, taking a reference to the object will prevent it from being freed. We can then do a cond_resched() call after every 64k objects safely. Link: https://lkml.kernel.org/r/20220614220359.59282-4-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/kmemleak: skip unlikely objects in kmemleak_scan() without taking lockWaiman Long
There are 3 RCU-based object iteration loops in kmemleak_scan(). Because of the need to take RCU read lock, we can't insert cond_resched() into the loop like other parts of the function. As there can be millions of objects to be scanned, it takes a while to iterate all of them. The kmemleak functionality is usually enabled in a debug kernel which is much slower than a non-debug kernel. With sufficient number of kmemleak objects, the time to iterate them all may exceed 22s causing soft lockup. watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kmemleak:625] In this particular bug report, the soft lockup happen in the 2nd iteration loop. In the 2nd and 3rd loops, most of the objects are checked and then skipped under the object lock. Only a selected fews are modified. Those objects certainly need lock protection. However, the lock/unlock operation is slow especially with interrupt disabling and enabling included. We can actually do some basic check like color_white() without taking the lock and skip the object accordingly. Of course, this kind of check is racy and may miss objects that are being modified concurrently. The cost of missed objects, however, is just that they will be discovered in the next scan instead. The advantage of doing so is that iteration can be done much faster especially with LOCKDEP enabled in a debug kernel. With a debug kernel running on a 2-socket 96-thread x86-64 system (HZ=1000), the 2nd and 3rd iteration loops speedup with this patch on the first kmemleak_scan() call after bootup is shown in the table below. Before patch After patch Loop # # of objects Elapsed time # of objects Elapsed time ------ ------------ ------------ ------------ ------------ 2 2,599,850 2.392s 2,596,364 0.266s 3 2,600,176 2.171s 2,597,061 0.260s This patch reduces loop iteration times by about 88%. This will greatly reduce the chance of a soft lockup happening in the 2nd or 3rd iteration loops. Even though the first loop runs a little bit faster, it can still be problematic if many kmemleak objects are there. As the object count has to be modified in every object, we cannot avoid taking the object lock. So other way to prevent soft lockup will be needed. Link: https://lkml.kernel.org/r/20220614220359.59282-3-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/kmemleak: use _irq lock/unlock variants in kmemleak_scan/_clear()Waiman Long
Patch series "mm/kmemleak: Avoid soft lockup in kmemleak_scan()", v2. There are 3 RCU-based object iteration loops in kmemleak_scan(). Because of the need to take RCU read lock, we can't insert cond_resched() into the loop like other parts of the function. As there can be millions of objects to be scanned, it takes a while to iterate all of them. The kmemleak functionality is usually enabled in a debug kernel which is much slower than a non-debug kernel. With sufficient number of kmemleak objects, the time to iterate them all may exceed 22s causing soft lockup. watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kmemleak:625] This patch series make changes to the 3 object iteration loops in kmemleak_scan() to prevent them from causing soft lockup. This patch (of 3): kmemleak_scan() is called only from the kmemleak scan thread or from write to the kmemleak debugfs file. Both are in task context and so we can directly use the simpler _irq() lock/unlock calls instead of the more complex _irqsave/_irqrestore variants. Similarly, kmemleak_clear() is called only from write to the kmemleak debugfs file. The same change can be applied. Link: https://lkml.kernel.org/r/20220614220359.59282-1-longman@redhat.com Link: https://lkml.kernel.org/r/20220614220359.59282-2-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/sparse-vmemmap.c: remove unwanted initialization in ↵Gautam Menghani
vmemmap_populate_compound_pages() Remove unnecessary initialization for the variable 'next'. This fixes the clang scan warning: Value stored to 'next' during its initialization is never read [deadcode.DeadStores] Link: https://lkml.kernel.org/r/20220612182320.160651-1-gautammenghani201@gmail.com Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16selftests: make use of GUP_TEST_FILE macroJoel Savitz
Commit 17de1e559cf1 ("selftests: clarify common error when running gup_test") had most of its hunks dropped due to a conflict with another patch accepted into Linux around the same time that implemented the same behavior as a subset of other changes. However, the remaining hunk defines the GUP_TEST_FILE macro without making use of it. This patch makes use of the macro in the two relevant places. Furthermore, the above mentioned commit's log message erroneously describes the changes that were dropped from the patch. This patch corrects the record. Link: https://lkml.kernel.org/r/20220609203217.3206247-1-jsavitz@redhat.com Fixes: 17de1e559cf1 ("selftests: clarify common error when running gup_test") Signed-off-by: Joel Savitz <jsavitz@redhat.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16userfaultfd/selftests: fix typo in commentXiang wangx
Delete the redundant word 'in'. Link: https://lkml.kernel.org/r/20220610071244.59679-1-wangxiang@cdjrlc.com Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16net: set proper memcg for net_init hooks allocationsVasily Averin
__register_pernet_operations() executes init hook of registered pernet_operation structure in all existing net namespaces. Typically, these hooks are called by a process associated with the specified net namespace, and all __GFP_ACCOUNT marked allocation are accounted for corresponding container/memcg. However __register_pernet_operations() calls the hooks in the same context, and as a result all marked allocations are accounted to one memcg for all processed net namespaces. This patch adjusts active memcg for each net namespace and helps to account memory allocated inside ops_init() into the proper memcg. Link: https://lkml.kernel.org/r/f9394752-e272-9bf9-645f-a18c56d1c4ec@openvz.org Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Florian Westphal <fw@strlen.de> Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Linux Kernel Functional Testing <lkft@linaro.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: kmem: make mem_cgroup_from_obj() vmalloc()-safeRoman Gushchin
Currently mem_cgroup_from_obj() is not working properly with objects allocated using vmalloc(). It creates problems in some cases, when it's called for static objects belonging to modules or generally allocated using vmalloc(). This patch makes mem_cgroup_from_obj() safe to be called on objects allocated using vmalloc(). It also introduces mem_cgroup_from_slab_obj(), which is a faster version to use in places when we know the object is either a slab object or a generic slab page (e.g. when adding an object to a lru list). Link: https://lkml.kernel.org/r/20220610180310.1725111-1-roman.gushchin@linux.dev Suggested-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Shakeel Butt <shakeelb@google.com> Tested-by: Vasily Averin <vvs@openvz.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Qian Cai <quic_qiancai@quicinc.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/memremap: fix memunmap_pages() race with get_dev_pagemap()Miaohe Lin
Think about the below scene: CPU1 CPU2 memunmap_pages percpu_ref_exit __percpu_ref_exit free_percpu(percpu_count); /* percpu_count is freed here! */ get_dev_pagemap xa_load(&pgmap_array, PHYS_PFN(phys)) /* pgmap still in the pgmap_array */ percpu_ref_tryget_live(&pgmap->ref) if __ref_is_percpu /* __PERCPU_REF_ATOMIC_DEAD not set yet */ this_cpu_inc(*percpu_count) /* access freed percpu_count here! */ ref->percpu_count_ptr = __PERCPU_REF_ATOMIC_DEAD; /* too late... */ pageunmap_range To fix the issue, do percpu_ref_exit() after pgmap_array is emptied. So we won't do percpu_ref_tryget_live() against a being freed percpu_ref. Link: https://lkml.kernel.org/r/20220609121305.2508-1-linmiaohe@huawei.com Fixes: b7b3c01b1915 ("mm/memremap_pages: support multiple ranges per invocation") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: kmemleak: check physical address when scanPatrick Wang
Check the physical address of objects for its boundary when scan instead of in kmemleak_*_phys(). Link: https://lkml.kernel.org/r/20220611035551.1823303-5-patrick.wang.shcn@gmail.com Fixes: 23c2d497de21 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()") Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Yee Lee <yee.lee@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: kmemleak: add rbtree and store physical address for objects allocated ↵Patrick Wang
with PA Add object_phys_tree_root to store the objects allocated with physical address. Distinguish it from object_tree_root by OBJECT_PHYS flag or function argument. The physical address is stored directly in those objects. Link: https://lkml.kernel.org/r/20220611035551.1823303-4-patrick.wang.shcn@gmail.com Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Yee Lee <yee.lee@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: kmemleak: add OBJECT_PHYS flag for objects allocated with physical addressPatrick Wang
Add OBJECT_PHYS flag for object. This flag is used to identify the objects allocated with physical address. The create_object_phys() function is added as well to set that flag and is used by kmemleak_alloc_phys(). Link: https://lkml.kernel.org/r/20220611035551.1823303-3-patrick.wang.shcn@gmail.com Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Yee Lee <yee.lee@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: kmemleak: remove kmemleak_not_leak_phys() and the min_count argument to ↵Patrick Wang
kmemleak_alloc_phys() Patch series "mm: kmemleak: store objects allocated with physical address separately and check when scan", v4. The kmemleak_*_phys() interface uses "min_low_pfn" and "max_low_pfn" to check address. But on some architectures, kmemleak_*_phys() is called before those two variables initialized. The following steps will be taken: 1) Add OBJECT_PHYS flag and rbtree for the objects allocated with physical address 2) Store physical address in objects if allocated with OBJECT_PHYS 3) Check the boundary when scan instead of in kmemleak_*_phys() This patch set will solve: https://lore.kernel.org/r/20220527032504.30341-1-yee.lee@mediatek.com https://lore.kernel.org/r/9dd08bb5-f39e-53d8-f88d-bec598a08c93@gmail.com v3: https://lore.kernel.org/r/20220609124950.1694394-1-patrick.wang.shcn@gmail.com v2: https://lore.kernel.org/r/20220603035415.1243913-1-patrick.wang.shcn@gmail.com v1: https://lore.kernel.org/r/20220531150823.1004101-1-patrick.wang.shcn@gmail.com This patch (of 4): Remove the unused kmemleak_not_leak_phys() function. And remove the min_count argument to kmemleak_alloc_phys() function, assume it's 0. Link: https://lkml.kernel.org/r/20220611035551.1823303-1-patrick.wang.shcn@gmail.com Link: https://lkml.kernel.org/r/20220611035551.1823303-2-patrick.wang.shcn@gmail.com Signed-off-by: Patrick Wang <patrick.wang.shcn@gmail.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Yee Lee <yee.lee@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16lib/test_hmm: avoid accessing uninitialized pagesMiaohe Lin
If make_device_exclusive_range() fails or returns pages marked for exclusive access less than required, remaining fields of pages will left uninitialized. So dmirror_atomic_map() will access those yet uninitialized fields of pages. To fix it, do dmirror_atomic_map() iff all pages are marked for exclusive access (we will break if mapped is less than required anyway) so we won't access those uninitialized fields of pages. Link: https://lkml.kernel.org/r/20220609130835.35110-1-linmiaohe@huawei.com Fixes: b659baea7546 ("mm: selftests for exclusive device memory") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/memremap: fix wrong function name above memremap_pages()Miaohe Lin
Fix the wrong function name dev_memremap_pages above memremap_pages() to avoid confusion. Minor readability improvement. Link: https://lkml.kernel.org/r/20220607143621.58989-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/mempool: use might_alloc()Daniel Vetter
mempool are generally used for GFP_NOIO, so this wont benefit all that much because might_alloc currently only checks GFP_NOFS. But it does validate against mmu notifier pte zapping, some might catch some drivers doing really silly things, plus it's a bit more meaningful in what we're checking for here. Link: https://lkml.kernel.org/r/20220605152539.3196045-3-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/slab: delete cache_alloc_debugcheck_before()Daniel Vetter
It only does a might_sleep_if(GFP_RECLAIM) check, which is already covered by the might_alloc() in slab_pre_alloc_hook(). And all callers of cache_alloc_debugcheck_before() call that beforehand already. Link: https://lkml.kernel.org/r/20220605152539.3196045-2-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/page_alloc: use might_alloc()Daniel Vetter
... instead of open coding it. Completely equivalent code, just a notch more meaningful when reading. Link: https://lkml.kernel.org/r/20220605152539.3196045-1-daniel.vetter@ffwll.ch Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/highmem: delete memmove_page()Fabio M. De Francesco
Matthew Wilcox reported that, while he was looking at memmove_page(), he realized that it can't actually work. The reasons are hidden in its implementation, which makes use of memmove() on logical addresses provided by kmap_local_page(). memmove() does the wrong thing when it tests "if (dest <= src)". Therefore, delete memmove_page(). No need to change any other code because we have no call sites of memmove_page() across the whole kernel. Link: https://lkml.kernel.org/r/20220606141533.555-1-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reported-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: memcontrol: add {pgscan,pgsteal}_{kswapd,direct} items in memory.stat of ↵Qi Zheng
cgroup v2 There are already statistics of {pgscan,pgsteal}_kswapd and {pgscan,pgsteal}_direct of memcg event here, but now only the sum of the two is displayed in memory.stat of cgroup v2. In order to obtain more accurate information during monitoring and debugging, and to align with the display in /proc/vmstat, it better to display {pgscan,pgsteal}_kswapd and {pgscan,pgsteal}_direct separately. Also, for forward compatibility, we still display pgscan and pgsteal items so that it won't break existing applications. [zhengqi.arch@bytedance.com: add comment for memcg_vm_event_stat (suggested by Michal)] Link: https://lkml.kernel.org/r/20220606154028.55030-1-zhengqi.arch@bytedance.com [zhengqi.arch@bytedance.com: fix the doc, thanks to Johannes] Link: https://lkml.kernel.org/r/20220607064803.79363-1-zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/20220604082209.55174-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/vmalloc: add code comment for find_vmap_area_exceed_addr()Baoquan He
Its behaviour is like find_vma() which finds an area above the specified address, add comment to make it easier to understand. And also fix two places of grammer mistake/typo. Link: https://lkml.kernel.org/r/20220607105958.382076-5-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/vmalloc: fix typo in local variable nameBaoquan He
In __purge_vmap_area_lazy(), rename local_pure_list to local_purge_list. Link: https://lkml.kernel.org/r/20220607105958.382076-4-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/vmalloc: remove the redundant boundary checkBaoquan He
In find_va_links(), when traversing the vmap_area tree, the comparing to check if the passed in 'va' is above or below 'tmp_va' is redundant, assuming both 'va' and 'tmp_va' has ->va_start <= ->va_end. Here, to simplify the checking as code change. Link: https://lkml.kernel.org/r/20220607105958.382076-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/vmalloc: invoke classify_va_fit_type() in adjust_va_to_fit_type()Baoquan He
Patch series "Cleanup patches of vmalloc", v2. Some cleanup patches found when reading vmalloc code. This patch (of 4): adjust_va_to_fit_type() checks all values of passed in fit type, including NOTHING_FIT in the else branch. However, the check of NOTHING_FIT has been done inside adjust_va_to_fit_type() and before it's called in all call sites. In fact, both of these functions are coupled tightly, since classify_va_fit_type() is doing the preparation work for adjust_va_to_fit_type(). So putting invocation of classify_va_fit_type() inside adjust_va_to_fit_type() can simplify code logic and the redundant check of NOTHING_FIT issue will go away. Link: https://lkml.kernel.org/r/20220607105958.382076-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20220607105958.382076-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Suggested-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/damon: remove obsolete comments of kdamond_stopChengming Zhou
Since commit 0f91d13366a4 ("mm/damon: simplify stop mechanism") delete kdamond_stop and change to use kthread stop mechanism, these obsolete comments should be removed accordingly. Link: https://lkml.kernel.org/r/20220531020421.46849-1-zhouchengming@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/memory_hotplug: drop 'reason' argument from check_pfn_span()Anshuman Khandual
In check_pfn_span(), a 'reason' string is being used to recreate the caller function name, while printing the warning message. It is really unnecessary as the warning message could just be printed inside the caller depending on the return code. Currently there are just two callers for check_pfn_span() i.e __add_pages() and __remove_pages(). Let's clean this up. Link: https://lkml.kernel.org/r/20220531090441.170650-1-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/shmem.c: clean up comment of shmem_swapin_folioMiaohe Lin
shmem_swapin_folio has changed to use folio but comment still mentions page. Update the relevant comment accordingly as suggested by Naoya. Link: https://lkml.kernel.org/r/20220530115841.4348-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: avoid unnecessary page fault retires on shared memory typesPeter Xu
I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). Then after that throttling we return VM_FAULT_RETRY. We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock. However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary. It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all. To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that. To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it. This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs: Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%) I believe it could help more than that. We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward. Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is. Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Brian Cain <bcain@quicinc.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Weinberger <richard@nod.at> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Will Deacon <will@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Simek <monstr@monstr.eu> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: David Hildenbrand <david@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Helge Deller <deller@gmx.de> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16tools/vm/slabinfo: use alphabetic order when two values are equalYuanzheng Song
When the number of partial slabs in each cache is the same (e.g., the value are 0), the results of the `slabinfo -X -N5` and `slabinfo -P -N5` are different. / # slabinfo -X -N5 ... Slabs sorted by number of partial slabs --------------------------------------- Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg inode_cache 15180 392 6217728 758/0/1 20 1 0 95 a kernfs_node_cache 22494 88 2002944 488/0/1 46 0 0 98 shmem_inode_cache 663 464 319488 38/0/1 17 1 0 96 biovec-max 50 3072 163840 4/0/1 10 3 0 93 A dentry 19050 136 2600960 633/0/2 30 0 0 99 a / # slabinfo -P -N5 Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg bdev_cache 32 984 32.7K 1/0/1 16 2 0 96 Aa ext4_inode_cache 42 752 32.7K 1/0/1 21 2 0 96 a dentry 19050 136 2.6M 633/0/2 30 0 0 99 a TCPv6 17 1840 32.7K 0/0/1 17 3 0 95 A RAWv6 18 856 16.3K 0/0/1 18 2 0 94 A This problem is caused by the sort_slabs(). So let's use alphabetic order when two values are equal in the sort_slabs(). By the way, the content of the `slabinfo -h` is not aligned because the `-P|--partial Sort by number of partial slabs` uses tabs instead of spaces. So let's use spaces instead of tabs to fix it. Link: https://lkml.kernel.org/r/20220528063117.935158-1-songyuanzheng@huawei.com Fixes: 1106b205a3fe ("tools/vm/slabinfo: add partial slab listing to -X") Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com> Cc: "Tobin C. Harding" <tobin@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm: use PAGE_ALIGNED instead of IS_ALIGNEDFanjun Kong
<linux/mm.h> already provides the PAGE_ALIGNED macro. Let's use this macro instead of IS_ALIGNED and passing PAGE_SIZE directly. Link: https://lkml.kernel.org/r/20220526140257.1568744-1-bh1scw@gmail.com Signed-off-by: Fanjun Kong <bh1scw@gmail.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/x86: remove dead code for hugetlbpage.cPeter Xu
It seems to exist since the old times and never used once. Remove them. Link: https://lkml.kernel.org/r/20220525195220.10241-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16Merge branch 'bpf: Fix cookie values for kprobe multi'Alexei Starovoitov
Jiri Olsa says: ==================== hi, there's bug in kprobe_multi link that makes cookies misplaced when using symbols to attach. The reason is that we sort symbols by name but not adjacent cookie values. Current test did not find it because bpf_fentry_test* are already sorted by name. v3 changes: - fixed kprobe_multi bench test to filter out invalid entries from available_filter_functions v2 changes: - rebased on top of bpf/master - checking if cookies are defined later in swap function [Andrii] - added acks thanks, jirka ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16selftest/bpf: Fix kprobe_multi bench testJiri Olsa
With [1] the available_filter_functions file contains records starting with __ftrace_invalid_address___ and marking disabled entries. We need to filter them out for the bench test to pass only resolvable symbols to kernel. [1] commit b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function") Fixes: b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16bpf: Force cookies array to follow symbols sortingJiri Olsa
When user specifies symbols and cookies for kprobe_multi link interface it's very likely the cookies will be misplaced and returned to wrong functions (via get_attach_cookie helper). The reason is that to resolve the provided functions we sort them before passing them to ftrace_lookup_symbols, but we do not do the same sort on the cookie values. Fixing this by using sort_r function with custom swap callback that swaps cookie values as well. Fixes: 0236fec57a15 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16ftrace: Keep address offset in ftrace_lookup_symbolsJiri Olsa
We want to store the resolved address on the same index as the symbol string, because that's the user (bpf kprobe link) code assumption. Also making sure we don't store duplicates that might be present in kallsyms. Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Fixes: bed0d9a50dac ("ftrace: Add ftrace_lookup_symbols function") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16selftests/bpf: Shuffle cookies symbols in kprobe multi testJiri Olsa
There's a kernel bug that causes cookies to be misplaced and the reason we did not catch this with this test is that we provide bpf_fentry_test* functions already sorted by name. Shuffling function bpf_fentry_test2 deeper in the list and keeping the current cookie values as before will trigger the bug. The kernel fix is coming in following changes. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16mailmap: add entry for Christian MarangiChristian Marangi
Add entry to map ansuelsmth@gmail.com to the unique identity of Christian Marangi. Link: https://lkml.kernel.org/r/20220615225012.18782-1-ansuelsmth@gmail.com Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16mm/memory-failure: disable unpoison once hw error happenszhenwei pi
Currently unpoison_memory(unsigned long pfn) is designed for soft poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets cleared on a x86 platform once hardware memory corrupts. Unpoisoning a hardware corrupted page puts page back buddy only, the kernel has a chance to access the page with *NOT PRESENT* KPTE. This leads BUG during accessing on the corrupted KPTE. Suggested by David&Naoya, disable unpoison mechanism when a real HW error happens to avoid BUG like this: Unpoison: Software-unpoisoned page 0x61234 BUG: unable to handle page fault for address: ffff888061234000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ... RIP: 0010:clear_page_erms+0x7/0x10 Code: ... RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001 FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> prep_new_page+0x151/0x170 get_page_from_freelist+0xca0/0xe20 ? sysvec_apic_timer_interrupt+0xab/0xc0 ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 __alloc_pages+0x17e/0x340 __folio_alloc+0x17/0x40 vma_alloc_folio+0x84/0x280 __handle_mm_fault+0x8d4/0xeb0 handle_mm_fault+0xd5/0x2a0 do_user_addr_fault+0x1d0/0x680 ? kvm_read_and_reset_apf_flags+0x3b/0x50 exc_page_fault+0x78/0x170 asm_exc_page_fault+0x27/0x30 Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support") Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned") Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16hugetlbfs: zero partial pages during fallocate hole punchMike Kravetz
hugetlbfs fallocate support was originally added with commit 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()"). Initial support only operated on whole hugetlb pages. This makes sense for populating files as other interfaces such as mmap and truncate require hugetlb page size alignment. Only operating on whole hugetlb pages for the hole punch case was a simplification and there was no compelling use case to zero partial pages. In a recent discussion[1] it was assumed that hugetlbfs hole punch would zero partial hugetlb pages as that is in line with the man page description saying 'partial filesystem blocks are zeroed'. However, the hugetlbfs hole punch code actually does this: hole_start = round_up(offset, hpage_size); hole_end = round_down(offset + len, hpage_size); Modify code to zero partial hugetlb pages in hole punch range. It is possible that application code could note a change in behavior. However, that would imply the code is passing in an unaligned range and expecting only whole pages be removed. This is unlikely as the fallocate documentation states the opposite. The current hugetlbfs fallocate hole punch behavior is tested with the libhugetlbfs test fallocate_align[2]. This test will be updated to validate partial page zeroing. [1] https://lore.kernel.org/linux-mm/20571829-9d3d-0b48-817c-b6b15565f651@redhat.com/ [2] https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/fallocate_align.c Link: https://lkml.kernel.org/r/YqeiMlZDKI1Kabfe@monkey Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>