summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-05-13netfilter: conntrack: add nf_conntrack_events autodetect modeFlorian Westphal
This adds the new nf_conntrack_events=2 mode and makes it the default. This leverages the earlier flag in struct net to allow to avoid the event extension as long as no event listener is active in the namespace. This avoids, for most cases, allocation of ct->ext area. A followup patch will take further advantage of this by avoiding calls down into the event framework if the extension isn't present. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: conntrack: un-inline nf_ct_ecache_ext_addFlorian Westphal
Only called when new ct is allocated or the extension isn't present. This function will be extended, place this in the conntrack module instead of inlining. The callers already depend on nf_conntrack module. Return value is changed to bool, noone used the returned pointer. Make sure that the core drops the newly allocated conntrack if the extension is requested but can't be added. This makes it necessary to ifdef the section, as the stub always returns false we'd drop every new conntrack if the the ecache extension is disabled in kconfig. Add from data path (xt_CT, nft_ct) is unchanged. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: nfnetlink: allow to detect if ctnetlink listeners existFlorian Westphal
At this time, every new conntrack gets the 'event cache extension' enabled for it. This is because the 'net.netfilter.nf_conntrack_events' sysctl defaults to 1. Changing the default to 0 means that commands that rely on the event notification extension, e.g. 'conntrack -E' or conntrackd, stop working. We COULD detect if there is a listener by means of 'nfnetlink_has_listeners()' and only add the extension if this is true. The downside is a dependency from conntrack module to nfnetlink module. This adds a different way: inc/dec a counter whenever a ctnetlink group is being (un)subscribed and toggle a flag in struct net. Next patches will take advantage of this and will only add the event extension if the flag is set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup*()Pablo Neira Ayuso
This patch adds a structure to collect all the context data that is passed to the cleanup iterator. struct nf_ct_iter_data { struct net *net; void *data; u32 portid; int report; }; There is a netns field that allows to clean up conntrack entries specifically owned by the specified netns. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: conntrack: avoid unconditional local_bh_disableFlorian Westphal
Now that the conntrack entry isn't placed on the pcpu list anymore the bh only needs to be disabled in the 'expectation present' case. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: conntrack: remove unconfirmed listFlorian Westphal
It has no function anymore and can be removed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: conntrack: remove __nf_ct_unconfirmed_destroyFlorian Westphal
Its not needed anymore: A. If entry is totally new, then the rcu-protected resource must already have been removed from global visibility before call to nf_ct_iterate_destroy. B. If entry was allocated before, but is not yet in the hash table (uncofirmed case), genid gets incremented and synchronize_rcu() call makes sure access has completed. C. Next attempt to peek at extension area will fail for unconfirmed conntracks, because ext->genid != genid. D. Conntracks in the hash are iterated as before. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: cttimeout: decouple unlink and free on netns destructionFlorian Westphal
Increment the extid on module removal; this makes sure that even in extreme cases any old uncofirmed entry that happened to be kept e.g. on nfnetlink_queue list will not trip over a stale timeout reference. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: extensions: introduce extension genid countFlorian Westphal
Multiple netfilter extensions store pointers to external data in their extension area struct. Examples: 1. Timeout policies 2. Connection tracking helpers. No references are taken for these. When a helper or timeout policy is removed, the conntrack table gets traversed and affected extensions are cleared. Conntrack entries not yet in the hashtable are referenced via a special list, the unconfirmed list. On removal of a policy or connection tracking helper, the unconfirmed list gets traversed an all entries are marked as dying, this prevents them from getting committed to the table at insertion time: core checks for dying bit, if set, the conntrack entry gets destroyed at confirm time. The disadvantage is that each new conntrack has to be added to the percpu unconfirmed list, and each insertion needs to remove it from this list. The list is only ever needed when a policy or helper is removed -- a rare occurrence. Add a generation ID count: Instead of adding to the list and then traversing that list on policy/helper removal, increment a counter that is stored in the extension area. For unconfirmed conntracks, the extension has the genid valid at ct allocation time. Removal of a helper/policy etc. increments the counter. At confirmation time, validate that ext->genid == global_id. If the stored number is not the same, do not allow the conntrack insertion, just like as if a confirmed-list traversal would have flagged the entry as dying. After insertion, the genid is no longer relevant (conntrack entries are now reachable via the conntrack table iterators and is set to 0. This allows removal of the percpu unconfirmed list. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: remove nf_ct_unconfirmed_destroy helperFlorian Westphal
This helper tags connections not yet in the conntrack table as dying. These nf_conn entries will be dropped instead when the core attempts to insert them from the input or postrouting 'confirm' hook. After the previous change, the entries get unlinked from the list earlier, so that by the time the actual exit hook runs, new connections no longer have a timeout policy assigned. Its enough to walk the hashtable instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: cttimeout: decouple unlink and free on netns destructionFlorian Westphal
Make it so netns pre_exit unlinks the objects from the pernet list, so they cannot be found anymore. netns core issues a synchronize_rcu() before calling the exit hooks so any the time the exit hooks run unconfirmed nf_conn entries have been free'd or they have been committed to the hashtable. The exit hook still tags unconfirmed entries as dying, this can now be removed in a followup change. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: conntrack: remove the percpu dying listFlorian Westphal
Its no longer needed. Entries that need event redelivery are placed on the new pernet dying list. The advantage is that there is no need to take additional spinlock on conntrack removal unless event redelivery failed or the conntrack entry was never added to the table in the first place (confirmed bit not set). The IPS_CONFIRMED bit now needs to be set as soon as the entry has been unlinked from the unconfirmed list, else the destroy function may attempt to unlink it a second time. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: conntrack: include ecache dying list in dumpsFlorian Westphal
The new pernet dying list includes conntrack entries that await delivery of the 'destroy' event via ctnetlink. The old percpu dying list will be removed soon. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13netfilter: ecache: use dedicated list for event redeliveryFlorian Westphal
This disentangles event redelivery and the percpu dying list. Because entries are now stored on a dedicated list, all entries are in NFCT_ECACHE_DESTROY_FAIL state and all entries still have confirmed bit set -- the reference count is at least 1. The 'struct net' back-pointer can be removed as well. The pcpu dying list will be removed eventually, it has no functionality. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-13xtensa: add trap handler for division by zeroMax Filippov
Add c-level handler for the division by zero exception and kill the task if it was thrown from the kernel space or send SIGFPE otherwise. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2022-05-13Merge tag 'mvebu-arm-5.19-1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into arm/soc mvebu arm for 5.19 (part 1) Fix typos in comment on orion5x files * tag 'mvebu-arm-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: orion5x: fix typos in comments Link: https://lore.kernel.org/r/87o801r2ss.fsf@BL-laptop Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-13Merge tag 'mvebu-dt-5.19-1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into arm/dt mvebu dt for 5.19 (part 1) Add the crypto module atsha204a node for the turis omnia (Armada 385 bases) * tag 'mvebu-dt-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: ARM: dts: turris-omnia: Add atsha204a node Link: https://lore.kernel.org/r/87lev5r2rg.fsf@BL-laptop Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-13Merge tag 'mvebu-dt64-5.19-1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu into arm/dt mvebu dt64 for 5.19 (part 1) Update sdhci node names to match schema on all mvebu dt64 dtsi files Armada 3720: uDPU board: - correct temperature sensors - update partition table espressobin-ultra board: - enable front USB3 port - add PHY and switch reset pins - fix SPI-NOR config * tag 'mvebu-dt64-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gclement/mvebu: arm64: dts: marvell: Update sdhci node names to match schema arm64: dts: marvell: espressobin-ultra: enable front USB3 port arm64: dts: marvell: espressobin-ultra: add PHY and switch reset pins arm64: dts: marvell: espressobin-ultra: fix SPI-NOR config arm64: dts: uDPU: correct temperature sensors arm64: dts: uDPU: update partition table Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-05-13drm/vmwgfx: Disable command buffers on svga3 without gbobjectsZack Rusin
With very limited vram on svga3 it's difficult to handle all the surface migrations. Without gbobjects, i.e. the ability to store surfaces in guest mobs, there's no reason to support intermediate svga2 features, especially because we can fall back to fb traces and svga3 will never support those in-between features. On svga3 we wither want to use fb traces or screen targets (i.e. gbobjects), nothing in between. This fixes presentation on a lot of fusion/esxi tech previews where the exposed svga3 caps haven't been finalized yet. Signed-off-by: Zack Rusin <zackr@vmware.com> Fixes: 2cd80dbd3551 ("drm/vmwgfx: Add basic support for SVGA3") Cc: <stable@vger.kernel.org> # v5.14+ Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220318174332.440068-5-zack@kde.org
2022-05-13drm/vmwgfx: Initialize drm_mode_fb_cmd2Zack Rusin
Transition to drm_mode_fb_cmd2 from drm_mode_fb_cmd left the structure unitialized. drm_mode_fb_cmd2 adds a few additional members, e.g. flags and modifiers which were never initialized. Garbage in those members can cause random failures during the bringup of the fbcon. Initializing the structure fixes random blank screens after bootup due to flags/modifiers mismatches during the fbcon bring up. Fixes: dabdcdc9822a ("drm/vmwgfx: Switch to mode_cmd2") Signed-off-by: Zack Rusin <zackr@vmware.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Reviewed-by: Martin Krastev <krastevm@vmware.com> Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302152426.885214-7-zack@kde.org
2022-05-13drm/vmwgfx: Fix fencing on SVGAv3Zack Rusin
Port of the vmwgfx to SVGAv3 lacked support for fencing. SVGAv3 removed FIFO's and replaced them with command buffers and extra registers. The initial version of SVGAv3 lacked support for most advanced features (e.g. 3D) which made fences unnecessary. That is no longer the case, especially as 3D support is being turned on. Switch from FIFO commands and capabilities to command buffers and extra registers to enable fences on SVGAv3. Fixes: 2cd80dbd3551 ("drm/vmwgfx: Add basic support for SVGA3") Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Martin Krastev <krastevm@vmware.com> Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220302152426.885214-5-zack@kde.org
2022-05-13mm/memory-failure.c: simplify num_poisoned_pages_inc/deczhenwei pi
Originally, do num_poisoned_pages_inc() in memory failure routine, use num_poisoned_pages_dec() to rollback the number if filtered/ cancelled. Suggested by Naoya, do num_poisoned_pages_inc() only in action_result(), this make this clear and simple. Link: https://lkml.kernel.org/r/20220509105641.491313-6-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/hwpoison: disable hwpoison filter during removingzhenwei pi
hwpoison filter is enabled by hwpoison-inject module, after removing this module, hwpoison filter still works. What is worse, user can not find the debugfs entries to know this. Disable the hwpoison filter during removing hwpoison-inject module. Link: https://lkml.kernel.org/r/20220509105641.491313-5-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/memory-failure.c: add hwpoison_filter for soft offlinezhenwei pi
hwpoison_filter is missing in the soft offline path, this leads an issue: after enabling the corrupt filter, the user process still has a chance to inject hwpoison fault by madvise(addr, len, MADV_SOFT_OFFLINE) at PFN which is expected to reject. Also do a minor change in comment of memory_failure(). Link: https://lkml.kernel.org/r/20220509105641.491313-4-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/memory-failure.c: simplify num_poisoned_pages_deczhenwei pi
Don't decrease the number of poisoned pages in page_alloc.c, let the memory-failure.c do inc/dec poisoned pages only. Also simplify unpoison_memory(), only decrease the number of poisoned pages when: - TestClearPageHWPoison() succeed - put_page_back_buddy succeed After decreasing, print necessary log. Finally, remove clear_page_hwpoison() and unpoison_taken_off_page(). Link: https://lkml.kernel.org/r/20220509105641.491313-3-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/memory-failure.c: move clear_hwpoisoned_pageszhenwei pi
Patch series "memory-failure: fix hwpoison_filter", v2. As well known, the memory failure mechanism handles memory corrupted event, and try to send SIGBUS to the user process which uses this corrupted page. For the virtualization case, QEMU catches SIGBUS and tries to inject MCE into the guest, and the guest handles memory failure again. Thus the guest gets the minimal effect from hardware memory corruption. The further step I'm working on: 1, try to modify code to decrease poisoned pages in a single place (mm/memofy-failure.c: simplify num_poisoned_pages_dec in this series). 2, try to use page_handle_poison() to handle SetPageHWPoison() and num_poisoned_pages_inc() together. It would be best to call num_poisoned_pages_inc() in a single place too. 3, introduce memory failure notifier list in memory-failure.c: notify the corrupted PFN to someone who registers this list. If I can complete [1] and [2] part, [3] will be quite easy(just call notifier list after increasing poisoned page). 4, introduce memory recover VQ for memory balloon device, and registers memory failure notifier list. During the guest kernel handles memory failure, balloon device gets notified by memory failure notifier list, and tells the host to recover the corrupted PFN(GPA) by the new VQ. 5, host side remaps the corrupted page(HVA), and tells the guest side to unpoison the PFN(GPA). Then the guest fixes the corrupted page(GPA) dynamically. This patch (of 5): clear_hwpoisoned_pages() clears HWPoison flag and decreases the number of poisoned pages, this actually works as part of memory failure. Move this function from sparse.c to memory-failure.c, finally there is no CONFIG_MEMORY_FAILURE in sparse.c. Link: https://lkml.kernel.org/r/20220509105641.491313-1-pizhenwei@bytedance.com Link: https://lkml.kernel.org/r/20220509105641.491313-2-pizhenwei@bytedance.com Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/page_owner: use strscpy() instead of strlcpy()Eric Dumazet
current->comm[] is not a string (no guarantee for a zero byte in it). strlcpy(s1, s2, l) is calling strlen(s2), potentially causing out-of-bound access, as reported by syzbot: detected buffer overflow in __fortify_strlen ------------[ cut here ]------------ kernel BUG at lib/string_helpers.c:980! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 4087 Comm: dhcpcd-run-hooks Not tainted 5.18.0-rc3-syzkaller-01537-g20b87e7c29df #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:fortify_panic+0x18/0x1a lib/string_helpers.c:980 Code: 8c e8 c5 ba e1 fa e9 23 0f bf fa e8 0b 5d 8c f8 eb db 55 48 89 fd e8 e0 49 40 f8 48 89 ee 48 c7 c7 80 f5 26 8a e8 99 09 f1 ff <0f> 0b e8 ca 49 40 f8 48 8b 54 24 18 4c 89 f1 48 c7 c7 00 00 27 8a RSP: 0018:ffffc900000074a8 EFLAGS: 00010286 RAX: 000000000000002c RBX: ffff88801226b728 RCX: 0000000000000000 RDX: ffff8880198e0000 RSI: ffffffff81600458 RDI: fffff52000000e87 RBP: ffffffff89da2aa0 R08: 000000000000002c R09: 0000000000000000 R10: ffffffff815fae2e R11: 0000000000000000 R12: ffff88801226b700 R13: ffff8880198e0830 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5876ad6ff8 CR3: 000000001a48c000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: <IRQ> __fortify_strlen include/linux/fortify-string.h:128 [inline] strlcpy include/linux/fortify-string.h:143 [inline] __set_page_owner_handle+0x2b1/0x3e0 mm/page_owner.c:171 __set_page_owner+0x3e/0x50 mm/page_owner.c:190 prep_new_page mm/page_alloc.c:2441 [inline] get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408 alloc_pages+0x1aa/0x310 mm/mempolicy.c:2272 alloc_slab_page mm/slub.c:1799 [inline] allocate_slab+0x26c/0x3c0 mm/slub.c:1944 new_slab mm/slub.c:2004 [inline] ___slab_alloc+0x8df/0xf20 mm/slub.c:3005 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3092 slab_alloc_node mm/slub.c:3183 [inline] slab_alloc mm/slub.c:3225 [inline] __kmem_cache_alloc_lru mm/slub.c:3232 [inline] kmem_cache_alloc+0x360/0x3b0 mm/slub.c:3242 dst_alloc+0x146/0x1f0 net/core/dst.c:92 Link: https://lkml.kernel.org/r/20220509145949.265184-1-eric.dumazet@gmail.com Fixes: 865ed6a32786 ("mm/page_owner: record task command name") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Waiman Long <longman@redhat.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13kasan: clean-up kconfig options descriptionsAndrey Konovalov
Various readability clean-ups of KASAN Kconfig options. No functional changes. Link: https://lkml.kernel.org/r/c160840dd9e4b1ad5529ecfdb0bba35d9a14d826.1652203271.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/47afaecec29221347bee49f58c258ac1ced3b429.1652123204.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13kasan: move boot parameters section in documentationAndrey Konovalov
Move the "Boot parameters" section in KASAN documentation next to the section that describes KASAN build options. No content changes. Link: https://lkml.kernel.org/r/870628e1293b4f44edf7cbcb92374ff9eb7503d7.1652203271.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/ec9c923f35e7c5312836c4624a7f317dc1ee2c1c.1652123204.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13kasan: update documentationAndrey Konovalov
Do assorted clean-ups and improvements to KASAN documentation, including: - Describe each mode in a dedicated paragraph. - Split out a Support section that describes in details which compilers, architectures and memory types each mode requires/supports. - Capitalize the first letter in the names of each KASAN mode. [andreyknvl@google.com: rewording, per Marco] Link: https://lkml.kernel.org/r/896b2d914d6b50d677fd7b38f76967cc705c01ba.1652203271.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/5bd58ebebf066593ce0e1d265d60278b5f5a1874.1652123204.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13kasan: give better names to shadow valuesAndrey Konovalov
Rename KASAN_KMALLOC_* shadow values to KASAN_SLAB_*, as they are used for all slab allocations, not only for kmalloc. Also rename KASAN_FREE_PAGE to KASAN_PAGE_FREE to be consistent with KASAN_PAGE_REDZONE and KASAN_SLAB_FREE. Link: https://lkml.kernel.org/r/bebcaf4eafdb0cabae0401a69c0af956aa87fcaa.1652111464.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13kasan: use tabs to align shadow valuesAndrey Konovalov
Consistently use tabs instead of spaces to shadow value definitions. Link: https://lkml.kernel.org/r/00e7e66b5fc375d58200dc1489949b3edcd096b7.1652111464.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13kasan: clean up comments in internal kasan.hAndrey Konovalov
Clean up comments in mm/kasan/kasan.h: clarify, unify styles, fix punctuation, etc. Link: https://lkml.kernel.org/r/a0680ff30035b56cb7bdd5f59fd400e71712ceb5.1652111464.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/vmalloc: use raw_cpu_ptr() for vmap_block_queue accessSebastian Andrzej Siewior
The per-CPU resource vmap_block_queue is accessed via get_cpu_var(). That macro disables preemption and then loads the pointer from the current CPU. This doesn't work on PREEMPT_RT because a spinlock_t is later accessed within the preempt-disable section. There is no need to disable preemption while accessing the per-CPU struct vmap_block_queue because the list is protected with a spinlock_t. The per-CPU struct is also accessed cross-CPU in purge_fragmented_blocks(). It is possible that by using raw_cpu_ptr() the code migrates to another CPU and uses struct from another CPU. This is fine because the list is locked and the locked section is very short. Use raw_cpu_ptr() to access vmap_block_queue. Link: https://lkml.kernel.org/r/YnKx3duAB53P7ojN@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13tracing: incorrect gfp_t conversionVasily Averin
Fixes the following sparse warnings: include/trace/events/*: sparse: cast to restricted gfp_t include/trace/events/*: sparse: restricted gfp_t degrades to integer gfp_t type is bitwise and requires __force attributes for any casts. Link: https://lkml.kernel.org/r/331d88fe-f4f7-657c-02a2-d977f15fbff6@openvz.org Signed-off-by: Vasily Averin <vvs@openvz.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13zram: remove double compression logicAlexey Romanov
The 2nd trial allocation under per-cpu presumption has been used to prevent regression of allocation failure. However, it makes trouble for maintenance without significant benefit. The slowpath branch is executed extremely rarely: getting there is problematic. Therefore, we delete this branch. Since b09ab054b69b ("zram: support BDI_CAP_STABLE_WRITES"), zram has used QUEUE_FLAG_STABLE_WRITES to prevent buffer change between 1st and 2nd memory allocations. Since we remove second trial memory allocation logic, we could remove the STABLE_WRITES flag because there is no change buffer to be modified under us. Link: https://lkml.kernel.org/r/20220505094443.11728-1-avromanov@sberdevices.ru Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru> Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13percpu: improve percpu_alloc_percpu event traceVasily Averin
Add call_site, bytes_alloc and gfp_flags fields to the output of the percpu_alloc_percpu ftrace event: mkdir-4393 [001] 169.334788: percpu_alloc_percpu: call_site=mem_cgroup_css_alloc+0xa6 reserved=0 is_atomic=0 size=2408 align=8 base_addr=0xffffc7117fc00000 off=402176 ptr=0x3dc867a62300 bytes_alloc=14448 gfp_flags=GFP_KERNEL_ACCOUNT This is required to track memcg-accounted percpu allocations. Link: https://lkml.kernel.org/r/a07be858-c8a3-7851-9086-e3262cbcf707@openvz.org Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13docs: vm/page_owner: tweak literal block in STANDARD FORMAT SPECIFIERSAkira Yokosawa
A semantic conflict between commit 5603f9bdea68 ("docs: vm/page_owner: use literal blocks for param description") and a change queued for v5.19 authored by Jiajian Ye ("tools/vm/page_owner_sort.c: support sorting blocks by multiple keys") results in a warning from "make htmldocs" saying: [...]/vm/page_owner.rst:176: WARNING: Literal block expected; none found. This is because a literal block in ReST ends at a line which has the same indent as the paragraph preceding it. In this case the one with no indent. Indent the two "For --xxxx option:" lines by two columns and make the whole section a literal block. While at it, fix indents by white spaces of "ator" keys. Link: https://lkml.kernel.org/r/fdfecc82-d41e-6d8a-738d-4beb6faa27fb@gmail.com Signed-of-by: Akira Yokosawa <akiyks@gmail.com> Reported-by: Shenghong Han <hanshenghong2019@email.szu.edu.cn> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn> Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn> Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Cc: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Yuhong Feng <yuhongf@szu.edu.cn> Cc: Haowen Bai <baihaowen@meizu.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/damon/reclaim: use resource_size function on resource objectJiapeng Chong
Fix the following coccicheck warnings: ./mm/damon/reclaim.c:241:30-33: WARNING: Suspicious code. resource_size is maybe missing with res. Link: https://lkml.kernel.org/r/20220507032512.129598-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: "Boehme, Markus" <markubo@amazon.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm: functions may simplify the use of return valuesLi kunyu
p4d_clear_huge may be optimized for void return type and function usage. vunmap_p4d_range function saves a few steps here. Link: https://lkml.kernel.org/r/20220507150630.90399-1-kunyu@nfschina.com Signed-off-by: Li kunyu <kunyu@nfschina.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13riscv/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECKTong Tiangen
As commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check"), enable ARCH_SUPPORTS_PAGE_TABLE_CHECK on riscv. Add additional page table check stubs for page table helpers, these stubs can be used to check the existing page table entries. Link: https://lkml.kernel.org/r/20220507110114.4128854-7-tongtiangen@huawei.com Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECKKefeng Wang
As commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check") , enable ARCH_SUPPORTS_PAGE_TABLE_CHECK on arm64. Add additional page table check stubs for page table helpers, these stubs can be used to check the existing page table entries. Link: https://lkml.kernel.org/r/20220507110114.4128854-6-tongtiangen@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm: remove __HAVE_ARCH_PTEP_CLEAR in pgtable.hTong Tiangen
Currently, there is no architecture definition __HAVE_ARCH_PTEP_CLEAR, Generic ptep_clear() is the only definition for all architecture, So drop the "#ifndef __HAVE_ARCH_PTEP_CLEAR". Link: https://lkml.kernel.org/r/20220507110114.4128854-5-tongtiangen@huawei.com Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm: page_table_check: add hooks to public helpersTong Tiangen
Move ptep_clear() to the include/linux/pgtable.h and add page table check relate hooks to some helpers, it's prepare for support page table check feature on new architecture. Optimize the implementation of ptep_clear(), page table hooks added page table check stubs, the interface control should be at stubs, there is no rationale for doing a IS_ENABLED() check here. For architectures that do not enable CONFIG_PAGE_TABLE_CHECK, they will call a fallback page table check stubs[1] when getting their page table helpers[2] in include/linux/pgtable.h. [1] page table check stubs defined in include/linux/page_table_check.h [2] ptep_clear() ptep_get_and_clear() pmdp_huge_get_and_clear() pudp_huge_get_and_clear() Link: https://lkml.kernel.org/r/20220507110114.4128854-4-tongtiangen@huawei.com Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm: page_table_check: move pxx_user_accessible_page into x86Kefeng Wang
The pxx_user_accessible_page() checks the PTE bit, it's architecture-specific code, move them into x86's pgtable.h. These helpers are being moved out to make the page table check framework platform independent. Link: https://lkml.kernel.org/r/20220507110114.4128854-3-tongtiangen@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm: page_table_check: using PxD_SIZE instead of PxD_PAGE_SIZETong Tiangen
Patch series "mm: page_table_check: add support on arm64 and riscv", v7. Page table check performs extra verifications at the time when new pages become accessible from the userspace by getting their page table entries (PTEs PMDs etc.) added into the table. It is supported on X86[1]. This patchset made some simple changes and make it easier to support new architecture, then we support this feature on ARM64 and RISCV. [1]https://lore.kernel.org/lkml/20211123214814.3756047-1-pasha.tatashin@soleen.com/ This patch (of 6): Compared with PxD_PAGE_SIZE, which is defined and used only on X86, PxD_SIZE is more common in each architecture. Therefore, it is more reasonable to use PxD_SIZE instead of PxD_PAGE_SIZE in page_table_check.c. At the same time, it is easier to support page table check in other architectures. The substitution has no functional impact on the x86. Link: https://lkml.kernel.org/r/20220507110114.4128854-1-tongtiangen@huawei.com Link: https://lkml.kernel.org/r/20220507110114.4128854-2-tongtiangen@huawei.com Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/migrate: convert move_to_new_page() into move_to_new_folio()Matthew Wilcox (Oracle)
Pass in the folios that we already have in each caller. Saves a lot of calls to compound_head(). Link: https://lkml.kernel.org/r/20220504182857.4013401-27-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm: add folio_test_movable()Matthew Wilcox (Oracle)
This is the folio equivalent of PageMovable() which is needed to convert mm/migrate.c to folios. Link: https://lkml.kernel.org/r/20220504182857.4013401-26-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm: add folio_mapping_flags()Matthew Wilcox (Oracle)
This is the equivalent of PageMappingFlags and is needed for converting mm/migrate.c to folios. Link: https://lkml.kernel.org/r/20220504182857.4013401-25-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()Matthew Wilcox (Oracle)
shmem_swapin_page() only brings in order-0 pages, which are folios by definition. Link: https://lkml.kernel.org/r/20220504182857.4013401-24-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>