summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2022-03-22Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...
2022-03-22Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ...
2022-03-22net/mlx5e: Fix build warning, detected write beyond size of fieldSaeed Mahameed
When merged with Linus tree, the cited patch below will cause the following build warning: In function 'fortify_memset_chk', inlined from 'mlx5e_xmit_xdp_frame' at drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c:438:3: include/linux/fortify-string.h:242:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix that by grouping the fields to memeset in struct_group() to avoid the false alarm. Fixes: 9ded70fa1d81 ("net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20220322172224.31849-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22iwlwifi: mvm: Don't fail if PPAG isn't supportedMiri Korenblit
When we're copying the PPAG table into the cmd structure we're failing if the table doesn't exist in ACPI or is invalid, or if the FW doesn't support PPAG setting etc. This is wrong because those are valid scenarios. Fix this by not failing in those cases. Fixes: e8e10a37c51c ("iwlwifi: acpi: move ppag code from mvm to fw/acpi") Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Acked-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/iwlwifi.20220322173828.fa47f369b717.I6a9c65149c2c3c11337f3a802dff22f514a3a436@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22drivers/base/memory: clarify adding and removing of memory blocksDavid Hildenbrand
Let's make it clearer at which places we actually add and remove memory blocks -- streamlining the terminology -- and highlight which memory block start out online and which start out as offline. * rename add_memory_block -> add_boot_memory_block * rename init_memory_block -> add_memory_block * rename unregister_memory -> remove_memory_block * rename register_memory -> __add_memory_block * add add_hotplug_memory_block * mark add_boot_memory_block with __init (suggested by Oscar) __add_memory_block() is a pure helper for add_memory_block(), remove the somewhat obvious comment. Link: https://lkml.kernel.org/r/20220221154531.11382-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/memory: determine and store zone for single-zone memory blocksDavid Hildenbrand
test_pages_in_a_zone() is just another nasty PFN walker that can easily stumble over ZONE_DEVICE memory ranges falling into the same memory block as ordinary system RAM: the memmap of parts of these ranges might possibly be uninitialized. In fact, we observed (on an older kernel) with UBSAN: UBSAN: Undefined behaviour in ./include/linux/mm.h:1133:50 index 7 is out of range for type 'zone [5]' CPU: 121 PID: 35603 Comm: read_all Kdump: loaded Tainted: [...] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.12.2 11/15/2019 Call Trace: dump_stack+0x9a/0xf0 ubsan_epilogue+0x9/0x7a __ubsan_handle_out_of_bounds+0x13a/0x181 test_pages_in_a_zone+0x3c4/0x500 show_valid_zones+0x1fa/0x380 dev_attr_show+0x43/0xb0 sysfs_kf_seq_show+0x1c5/0x440 seq_read+0x49d/0x1190 vfs_read+0xff/0x300 ksys_read+0xb8/0x170 do_syscall_64+0xa5/0x4b0 entry_SYSCALL_64_after_hwframe+0x6a/0xdf RIP: 0033:0x7f01f4439b52 We seem to stumble over a memmap that contains a garbage zone id. While we could try inserting pfn_to_online_page() calls, it will just make memory offlining slower, because we use test_pages_in_a_zone() to make sure we're offlining pages that all belong to the same zone. Let's just get rid of this PFN walker and determine the single zone of a memory block -- if any -- for early memory blocks during boot. For memory onlining, we know the single zone already. Let's avoid any additional memmap scanning and just rely on the zone information available during boot. For memory hot(un)plug, we only really care about memory blocks that: * span a single zone (and, thereby, a single node) * are completely System RAM (IOW, no holes, no ZONE_DEVICE) If one of these conditions is not met, we reject memory offlining. Hotplugged memory blocks (starting out offline), always meet both conditions. There are three scenarios to handle: (1) Memory hot(un)plug A memory block with zone == NULL cannot be offlined, corresponding to our previous test_pages_in_a_zone() check. After successful memory onlining/offlining, we simply set the zone accordingly. * Memory onlining: set the zone we just used for onlining * Memory offlining: set zone = NULL So a hotplugged memory block starts with zone = NULL. Once memory onlining is done, we set the proper zone. (2) Boot memory with !CONFIG_NUMA We know that there is just a single pgdat, so we simply scan all zones of that pgdat for an intersection with our memory block PFN range when adding the memory block. If more than one zone intersects (e.g., DMA and DMA32 on x86 for the first memory block) we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. (3) Boot memory with CONFIG_NUMA At the point in time we create the memory block devices during boot, we don't know yet which nodes *actually* span a memory block. While we could scan all zones of all nodes for intersections, overlapping nodes complicate the situation and scanning all nodes is possibly expensive. But that problem has already been solved by the code that sets the node of a memory block and creates the link in the sysfs -- do_register_memory_block_under_node(). So, we hook into the code that sets the node id for a memory block. If we already have a different node id set for the memory block, we know that multiple nodes *actually* have PFNs falling into our memory block: we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. If there is no node id set, we do the same as (2) for the given node. Note that the call order in driver_init() is: -> memory_dev_init(): create memory block devices -> node_dev_init(): link memory block devices to the node and set the node id So in summary, we detect if there is a single zone responsible for this memory block and we consequently store the zone in that case in the memory block, updating it during memory onlining/offlining. Link: https://lkml.kernel.org/r/20220210184359.235565-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Rafael Parra <rparrazo@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rafael Parra <rparrazo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/node: rename link_mem_sections() to ↵David Hildenbrand
register_memory_block_under_node() Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2. I remember talking to Michal in the past about removing test_pages_in_a_zone(), which we use for: * verifying that a memory block we intend to offline is really only managed by a single zone. We don't support offlining of memory blocks that are managed by multiple zones (e.g., multiple nodes, DMA and DMA32) * exposing that zone to user space via /sys/devices/system/memory/memory*/valid_zones Now that I identified some more cases where test_pages_in_a_zone() might go wrong, and we received an UBSAN report (see patch #3), let's get rid of this PFN walker. So instead of detecting the zone at runtime with test_pages_in_a_zone() by scanning the memmap, let's determine and remember for each memory block if it's managed by a single zone. The stored zone can then be used for the above two cases, avoiding a manual lookup using test_pages_in_a_zone(). This avoids eventually stumbling over uninitialized memmaps in corner cases, especially when ZONE_DEVICE ranges partly fall into memory block (that are responsible for managing System RAM). Handling memory onlining is easy, because we online to exactly one zone. Handling boot memory is more tricky, because we want to avoid scanning all zones of all nodes to detect possible zones that overlap with the physical memory region of interest. Fortunately, we already have code that determines the applicable nodes for a memory block, to create sysfs links -- we'll hook into that. Patch #1 is a simple cleanup I had laying around for a longer time. Patch #2 contains the main logic to remove test_pages_in_a_zone() and further details. [1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com [2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com This patch (of 2): Let's adjust the stale terminology, making it match unregister_memory_block_under_nodes() and do_register_memory_block_under_node(). We're dealing with memory block devices, which span 1..X memory sections. Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rafael Parra <rparrazo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/node: consolidate node device subsystem initialization in ↵David Hildenbrand
node_dev_init() ... and call node_dev_init() after memory_dev_init() from driver_init(), so before any of the existing arch/subsys calls. All online nodes should be known at that point: early during boot, arch code determines node and zone ranges and sets the relevant nodes online; usually this happens in setup_arch(). This is in line with memory_dev_init(), which initializes the memory device subsystem and creates all memory block devices. Similar to memory_dev_init(), panic() if anything goes wrong, we don't want to continue with such basic initialization errors. The important part is that node_dev_init() gets called after memory_dev_init() and after cpu_dev_init(), but before any of the relevant archs call register_cpu() to register the new cpu device under the node device. The latter should be the case for the current users of topology_init(). Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Tested-by: Anatoly Pugachev <matorola@gmail.com> (sparc64) Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/memory: add memory block to memory group after registration ↵David Hildenbrand
succeeded If register_memory() fails, we freed the memory block but already added the memory block to the group list, not good. Let's defer adding the block to the memory group to after registering the memory block device. We do handle it properly during unregister_memory(), but that's not called when the registration fails. Link: https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handlerluofei
When the hwpoison page meets the filter conditions, it should not be regarded as successful memory_failure() processing for mce handler, but should return a distinct value, otherwise mce handler regards the error page has been identified and isolated, which may lead to calling set_mce_nospec() to change page attribute, etc. Here memory_failure() return -EOPNOTSUPP to indicate that the error event is filtered, mce handler should not take any action for this situation and hwpoison injector should treat as correct. Link: https://lkml.kernel.org/r/20220223082135.2769649-1-luofei@unicloud.com Signed-off-by: luofei <luofei@unicloud.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm: enforce pageblock_order < MAX_ORDERDavid Hildenbrand
Some places in the kernel don't really expect pageblock_order >= MAX_ORDER, and it looks like this is only possible in corner cases: 1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order pages via __free_pages_core(), which cannot possibly work. 2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger pageblock_order, we could have a single pageblock partially managed by two zones. 3) compaction code runs into __fragmentation_index() with order >= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1] 4) mm/page_reporting.c won't be reporting any pages with default page_reporting_order == pageblock_order, as we'll be skipping the reporting loop inside page_reporting_process_zone(). 5) __rmqueue_fallback() will never be able to steal with ALLOC_NOFRAGMENT. pageblock_order >= MAX_ORDER is weird either way: it's a pure optimization for making alloc_contig_range(), as used for allcoation of gigantic pages, a little more reliable to succeed. However, if there is demand for somewhat reliable allocation of gigantic pages, affected setups should be using CMA or boottime allocations instead. So let's make sure that pageblock_order < MAX_ORDER and simplify. [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20220214174132.219303-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Frank Rowand <frowand.list@gmail.com> Cc: John Garry via iommu <iommu@lists.linux-foundation.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22cma: factor out minimum alignment requirementDavid Hildenbrand
Patch series "mm: enforce pageblock_order < MAX_ORDER". Having pageblock_order >= MAX_ORDER seems to be able to happen in corner cases and some parts of the kernel are not prepared for it. For example, Aneesh has shown [1] that such kernels can be compiled on ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will run into a WARN_ON_ONCE(order >= MAX_ORDER) in comapction code right during boot. We can get pageblock_order >= MAX_ORDER when the default hugetlb size is bigger than the maximum allocation granularity of the buddy, in which case we are no longer talking about huge pages but instead gigantic pages. Having pageblock_order >= MAX_ORDER can only make alloc_contig_range() of such gigantic pages more likely to succeed. Reliable use of gigantic pages either requires boot time allcoation or CMA, no need to overcomplicate some places in the kernel to optimize for corner cases that are broken in other areas of the kernel. This patch (of 2): Let's enforce pageblock_order < MAX_ORDER and simplify. Especially patch #1 can be regarded a cleanup before: [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment. [2] [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com [2] https://lkml.kernel.org/r/20220211164135.1803616-1-zi.yan@sent.com Link: https://lkml.kernel.org/r/20220214174132.219303-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Rob Herring <robh@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: John Garry via iommu <iommu@lists.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22memcg: enable accounting for tty-related objectsVasily Averin
At each login the user forces the kernel to create a new terminal and allocate up to ~1Kb memory for the tty-related structures. By default it's allowed to create up to 4096 ptys with 1024 reserve for initial mount namespace only and the settings are controlled by host admin. Though this default is not enough for hosters with thousands of containers per node. Host admin can be forced to increase it up to NR_UNIX98_PTY_MAX = 1<<20. By default container is restricted by pty mount_opt.max = 1024, but admin inside container can change it via remount. As a result, one container can consume almost all allowed ptys and allocate up to 1Gb of unaccounted memory. It is not enough per-se to trigger OOM on host, however anyway, it allows to significantly exceed the assigned memcg limit and leads to troubles on the over-committed node. It makes sense to account for them to restrict the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/5d4bca06-7d4f-a905-e518-12981ebca1b3@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22fs: allocate inode by using alloc_inode_sb()Muchun Song
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22remove bdi_congested() and wb_congested() and related functionsNeilBrown
These functions are no longer useful as no BDIs report congestions any more. Removing the test on bdi_write_contested() in current_may_throttle() could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE is set. So replace the calls by 'false' and simplify the code - and remove the functions. [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs] Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22Merge branch 'remotes/lorenzo/pci/xgene'Bjorn Helgaas
- Revert "PCI: xgene: Use inbound resources for setup" (Marc Zyngier) - Revert "PCI: xgene: Fix IB window setup" (Marc Zyngier) * remotes/lorenzo/pci/xgene: PCI: xgene: Revert "PCI: xgene: Fix IB window setup" PCI: xgene: Revert "PCI: xgene: Use inbound resources for setup"
2022-03-22Merge branch 'remotes/lorenzo/pci/uniphier'Bjorn Helgaas
- Add DT binding and endpoint driver support for UniPhier NX1 SoC (Kunihiko Hayashi) * remotes/lorenzo/pci/uniphier: PCI: uniphier-ep: Add NX1 support PCI: uniphier-ep: Add SoC data structure dt-bindings: PCI: uniphier-ep: Add bindings for NX1 SoC
2022-03-22Merge branch 'remotes/lorenzo/pci/rcar'Bjorn Helgaas
- Finish transition to L1 state in rcar_pcie_config_access() because R-Car can't do it on its own (Marek Vasut) - Return PCI_ERROR_RESPONSE for reads that trigger PCIe errors (Marek Vasut) * remotes/lorenzo/pci/rcar: PCI: rcar: Use PCI_SET_ERROR_RESPONSE after read which triggered an exception PCI: rcar: Finish transition to L1 state in rcar_pcie_config_access()
2022-03-22Merge branch 'remotes/lorenzo/pci/qcom'Bjorn Helgaas
- Save pointer to device match data instead of copying it (Dmitry Baryshkov) - Add ddrss_sf_tbu flag to device match data instead of checking OF compatible string (Dmitry Baryshkov) - Add SM8450 SoC PCIe DT bindings (Dmitry Baryshkov) - Add SM8450 PCIe support (Dmitry Baryshkov) * remotes/lorenzo/pci/qcom: PCI: qcom: Add SM8450 PCIe support PCI: qcom: Add ddrss_sf_tbu flag PCI: qcom: Remove redundancy between qcom_pcie and qcom_pcie_cfg dt-bindings: pci: qcom: Document PCIe bindings for SM8450
2022-03-22Merge branch 'remotes/lorenzo/pci/mvebu'Bjorn Helgaas
- Add Pali Rohár as pci-mvebu.c maintainer (Pali Rohár) - Make struct pci_bridge_emul_ops const (Pali Rohár) - Rename PCI_BRIDGE_EMUL_NO_PREFETCHABLE_BAR to PCI_BRIDGE_EMUL_NO_PREFMEM_FORWARD since it doesn't apply to BARs (Pali Rohár) - Add new flag PCI_BRIDGE_EMUL_NO_IO_FORWARD for bridges that don't support IO forwarding (Pali Rohár) - Add Kconfig help text for CONFIG_PCI_MVEBU (Pali Rohár) - Remove duplicate nports assignment (Pali Rohár) - Set PCI_BRIDGE_EMUL_NO_IO_FORWARD when IO is unsupported (Pali Rohár) - Initialize vendor, device and revision of emulated bridge (Pali Rohár) - Fix Data Link Layer Link Active reporting on emulated bridge (Pali Rohár) - Rearrange tests in bridge emulation for easier maintenance (Russell King) - Add emulated bridge support for PCIe extended capabilities (Russell King) - Add emulated bridge support for bridge Subsystem Vendor ID capability (Pali Rohár) - Configure Maximum Link Width based on DT "num-lanes" property (Pali Rohár) - Emulate bridge Subsystem Vendor ID capability (Pali Rohár) - Emulate AER Capability (Pali Rohár) - Use PCI core bridge->ops and bridge->child_ops to separate config accesses to Root Port vs downstream devices (Pali Rohár) - Unmask all INTx interrupts; they're reported via a single shared GIC source (Pali Rohár) - Add INTx support (Pali Rohár) * remotes/lorenzo/pci/mvebu: PCI: mvebu: Implement support for legacy INTx interrupts PCI: mvebu: Fix macro names and comments about legacy interrupts dt-bindings: PCI: mvebu: Update information about intx interrupts PCI: mvebu: Use child_ops API PCI: mvebu: Add support for Advanced Error Reporting registers on emulated bridge PCI: mvebu: Add support for PCI Bridge Subsystem Vendor ID on emulated bridge PCI: mvebu: Correctly configure x1/x4 mode dt-bindings: PCI: mvebu: Add num-lanes property PCI: pci-bridge-emul: Add support for PCI Bridge Subsystem Vendor ID capability PCI: pci-bridge-emul: Add support for PCIe extended capabilities PCI: pci-bridge-emul: Re-arrange register tests PCI: mvebu: Fix reporting Data Link Layer Link Active on emulated bridge PCI: mvebu: Update comment for PCI_EXP_LNKCTL register on emulated bridge PCI: mvebu: Update comment for PCI_EXP_LNKCAP register on emulated bridge PCI: mvebu: Properly initialize vendor, device and revision of emulated bridge PCI: mvebu: Set PCI_BRIDGE_EMUL_NO_IO_FORWARD when IO is unsupported PCI: mvebu: Remove duplicate nports assignment PCI: mvebu: Add help string for CONFIG_PCI_MVEBU option PCI: pci-bridge-emul: Add support for new flag PCI_BRIDGE_EMUL_NO_IO_FORWARD PCI: pci-bridge-emul: Rename PCI_BRIDGE_EMUL_NO_PREFETCHABLE_BAR to PCI_BRIDGE_EMUL_NO_PREFMEM_FORWARD PCI: pci-bridge-emul: Make struct pci_bridge_emul_ops as const MAINTAINERS: Add Pali Rohár as pci-mvebu.c maintainer
2022-03-22Merge branch 'remotes/lorenzo/pci/misc'Bjorn Helgaas
- Add generic SZ_1T macro instead of a local one in pci-xgene.c (Christophe Leroy) * remotes/lorenzo/pci/misc: sizes.h: Add SZ_1T macro
2022-03-22Merge branch 'remotes/lorenzo/pci/imx6'Bjorn Helgaas
- Allow host controller driver to probe successfully (as other drivers do) even if link is currently down (Fabio Estevam) - Enable i.MX6QP PCIe power management (Richard Zhu) - Invoke PHY exit function after PHY power off (Richard Zhu) - Assert i.MX8MM CLKREQ# even if no device present to avoid boot hangs (Richard Zhu) * remotes/lorenzo/pci/imx6: PCI: imx6: Assert i.MX8MM CLKREQ# even if no device present PCI: imx6: Invoke the PHY exit function after PHY power off PCI: imx6: Enable i.MX6QP PCIe power management support PCI: imx6: Allow to probe when dw_pcie_wait_for_link() fails
2022-03-22Merge branch 'remotes/lorenzo/pci/hv'Bjorn Helgaas
- Avoid retarget interrupt hypercall in irq_unmask() on ARM64 (Boqun Feng) * remotes/lorenzo/pci/hv: PCI: hv: Avoid the retarget interrupt hypercall in irq_unmask() on ARM64
2022-03-22Merge branch 'pci/host/fu740'Bjorn Helgaas
- Drop redundant '-gpios' from DT GPIO lookup (Ben Dooks) - Force 2.5GT/s for initial device probe to workaround enumeration issue on SiFive Unmatched board (Ben Dooks) * pci/host/fu740: PCI: fu740: Force 2.5GT/s for initial device probe PCI: fu740: Drop redundant '-gpios' from DT GPIO lookup
2022-03-22Merge branch 'remotes/lorenzo/pci/endpoint'Bjorn Helgaas
- Fix alignment fault error in copy tests (Hou Zhiqiang) - Fix misused goto label (Li Chen) * remotes/lorenzo/pci/endpoint: PCI: endpoint: Fix misused goto label PCI: endpoint: Fix alignment fault error in copy tests
2022-03-22Merge branch 'pci/host/dwc'Bjorn Helgaas
- Restore MSI Receiver mask during resume (Jisheng Zhang) * pci/host/dwc: PCI: dwc: Restore MSI Receiver mask during resume
2022-03-22Merge branch 'remotes/lorenzo/pci/aardvark'Bjorn Helgaas
- Use PCI_INTERRUPT_* definitions from PCI core instead of custom ones (Pali Rohár) - Derive MSI number from bit(s) set in PCIE_MSI_STATUS_REG, not from PCIE_MSI_PAYLOAD_REG (Pali Rohár) - Align multi-MSI vectors to power of two (Pali Rohár) - Rewrite IRQ code to use chained IRQ handler (Pali Rohár) - Check return value of generic_handle_domain_irq() and warn about spurious interrupts (Pali Rohár) - Make MSI irq_chip structures static to driver (Marek Behún) - Make msi_domain_info structure static to driver (Marek Behún) - Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node) (Marek Behún) - Refactor unmasking of summary MSI interrupt (Pali Rohár) - Add support for masking MSI interrupts and leave them masked at setup (Pali Rohár) - Set MSI doorbell address to address of struct advk_pcie (Pali Rohár) - Enable MSI-X support (Pali Rohár) - Add support for ERR interrupt on emulated bridge (Pali Rohár) - Fix read of PCI_EXP_RTSTA_PME bit on emulated bridge (Pali Rohár) - Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge (Pali Rohár) - Add support for PME interrupts (Pali Rohár) - Fix support for PME requester on emulated bridge (Pali Rohár) - Use separate INTA interrupt for emulated Root Port so PME and AER interrupt is not shared with downstream devices (Pali Rohár) - Remove irq_mask_ack() callback for INTx interrupts (Pali Rohár) - Don't mask legacy INTx interrupts when mapping (Pali Rohár) - Drop unnecessary "__maybe_unused" from advk_pcie_disable_phy() (Marek Behún) - Update comment about why we check for link being up before issuing a config request (Marek Behún) * remotes/lorenzo/pci/aardvark: PCI: aardvark: Update comment about link going down after link-up PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy() PCI: aardvark: Don't mask irq when mapping PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts PCI: aardvark: Use separate INTA interrupt for emulated root bridge PCI: aardvark: Fix support for PME requester on emulated bridge PCI: aardvark: Add support for PME interrupts PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge PCI: aardvark: Fix reading PCI_EXP_RTSTA_PME bit on emulated bridge PCI: aardvark: Add support for ERR interrupt on emulated bridge PCI: aardvark: Enable MSI-X support PCI: aardvark: Fix setting MSI address PCI: aardvark: Add support for masking MSI interrupts PCI: aardvark: Refactor unmasking summary MSI interrupt PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node) PCI: aardvark: Make msi_domain_info structure a static driver structure PCI: aardvark: Make MSI irq_chip structures static driver structures PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ PCI: aardvark: Rewrite IRQ code to chained IRQ handler PCI: aardvark: Fix support for MSI interrupts PCI: aardvark: Fix reading MSI interrupt number PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_*
2022-03-22Merge branch 'pci/vga'Bjorn Helgaas
- Move vgaarb.c from drivers/gpu/vga to drivers/pci (Bjorn Helgaas) - Factor out default VGA device selection (Huacai Chen) - Move firmware default device detection to ADD_DEVICE path so we can select a default device regardless of whether it is enumerated before or after vga_arb_device_init() (Huacai Chen) - Move non-legacy VGA detection to ADD_DEVICE path (Huacai Chen) - Move disabled VGA device detection to ADD_DEVICE path (Huacai Chen) * pci/vga: PCI/VGA: Replace full MIT license text with SPDX identifier PCI/VGA: Use unsigned format string to print lock counts PCI/VGA: Log bridge control messages when adding devices PCI/VGA: Remove empty vga_arb_device_card_gone() PCI/VGA: Move disabled VGA device detection to ADD_DEVICE path PCI/VGA: Move non-legacy VGA detection to ADD_DEVICE path PCI/VGA: Move firmware default device detection to ADD_DEVICE path PCI/VGA: Factor out default VGA device selection PCI/VGA: Factor out vga_select_framebuffer_device() PCI/VGA: Move vga_arb_integrated_gpu() earlier in file PCI/VGA: Move vgaarb to drivers/pci
2022-03-22Merge branch 'pci/p2pdma'Bjorn Helgaas
- Add Intel 3rd Gen Intel Xeon Scalable Processors to P2PDMA whitelist (Michael J. Ruhl) * pci/p2pdma: PCI/P2PDMA: Add Intel 3rd Gen Intel Xeon Scalable Processors to whitelist
2022-03-22Merge branch 'pci/msi'Bjorn Helgaas
- Avoid broken MSI on SB600 USB devices (Bjorn Helgaas) * pci/msi: PCI: Avoid broken MSI on SB600 USB devices
2022-03-22Merge branch 'pci/misc'Bjorn Helgaas
- Update the aer-inject URL (Yicong Yang) - Declare pci_filp_private only when HAVE_PCI_MMAP to avoid unused struct definition (Krzysztof Wilczyński) - Remove unused assignments (Bjorn Helgaas) - Add #includes to asm/pci_x86.h to prevent build errors (Randy Dunlap) * pci/misc: x86/PCI: Add #includes to asm/pci_x86.h PCI: ibmphp: Remove unused assignments PCI: cpqphp: Remove unused assignments PCI: fu740: Remove unused assignments PCI: kirin: Remove unused assignments PCI: Remove unused assignments PCI: Declare pci_filp_private only when HAVE_PCI_MMAP PCI/AER: Update aer-inject URL
2022-03-22Merge branch 'pci/hotplug'Bjorn Helgaas
- Clear pciehp cmd_busy bit when command completes in polling mode to avoid spurious timeouts (Liguang Zhang) - Add quirk to work around Qualcomm hardware defect in Command Completed signaling (Manivannan Sadhasivam) * pci/hotplug: PCI: pciehp: Add Qualcomm quirk for Command Completed erratum PCI: pciehp: Clear cmd_busy bit in polling mode
2022-03-22Merge branch 'pci/enumeration'Bjorn Helgaas
- Support BAR sizes up to 8TB (Dongdong Liu) - Reduce warnings on hardware that doesn't support 8- or 16-bit PCI writes and hence may corrupt RW1C bits (Mark Tomlinson) * pci/enumeration: PCI: Reduce warnings on possible RW1C corruption PCI: Support BAR sizes up to 8TB
2022-03-22Merge branch 'pci/bridge-class-codes'Bjorn Helgaas
- Add and use #defines for normal and subtractive PCI bridges (Pali Rohár) - Set all 24 bits of PCI class code for iproc (Pali Rohár) * pci/bridge-class-codes: PCI: iproc: Set all 24 bits of PCI class code PCI: Add defines for normal and subtractive PCI bridges
2022-03-22Merge tag 'sched-core-2022-03-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Cleanups for SCHED_DEADLINE - Tracing updates/fixes - CPU Accounting fixes - First wave of changes to optimize the overhead of the scheduler build, from the fast-headers tree - including placeholder *_api.h headers for later header split-ups. - Preempt-dynamic using static_branch() for ARM64 - Isolation housekeeping mask rework; preperatory for further changes - NUMA-balancing: deal with CPU-less nodes - NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD) - Updates to RSEQ UAPI in preparation for glibc usage - Lots of RSEQ/selftests, for same - Add Suren as PSI co-maintainer * tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits) sched/headers: ARM needs asm/paravirt_api_clock.h too sched/numa: Fix boot crash on arm64 systems headers/prep: Fix header to build standalone: <linux/psi.h> sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y cgroup: Fix suspicious rcu_dereference_check() usage warning sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity() sched/deadline,rt: Remove unused functions for !CONFIG_SMP sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy() sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file sched/deadline: Remove unused def_dl_bandwidth sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE sched/tracing: Don't re-read p->state when emitting sched_switch event sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race sched/cpuacct: Remove redundant RCU read lock sched/cpuacct: Optimize away RCU read lock sched/cpuacct: Fix charge percpu cpuusage sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies ...
2022-03-22Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-03-21 v2 We've added 137 non-merge commits during the last 17 day(s) which contain a total of 143 files changed, 7123 insertions(+), 1092 deletions(-). The main changes are: 1) Custom SEC() handling in libbpf, from Andrii. 2) subskeleton support, from Delyan. 3) Use btf_tag to recognize __percpu pointers in the verifier, from Hao. 4) Fix net.core.bpf_jit_harden race, from Hou. 5) Fix bpf_sk_lookup remote_port on big-endian, from Jakub. 6) Introduce fprobe (multi kprobe) _without_ arch bits, from Masami. The arch specific bits will come later. 7) Introduce multi_kprobe bpf programs on top of fprobe, from Jiri. 8) Enable non-atomic allocations in local storage, from Joanne. 9) Various var_off ptr_to_btf_id fixed, from Kumar. 10) bpf_ima_file_hash helper, from Roberto. 11) Add "live packet" mode for XDP in BPF_PROG_RUN, from Toke. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (137 commits) selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" bpftool: Fix a bug in subskeleton code generation bpf: Fix bpf_prog_pack when PMU_SIZE is not defined bpf: Fix bpf_prog_pack for multi-node setup bpf: Fix warning for cast from restricted gfp_t in verifier bpf, arm: Fix various typos in comments libbpf: Close fd in bpf_object__reuse_map bpftool: Fix print error when show bpf map bpf: Fix kprobe_multi return probe backtrace Revert "bpf: Add support to inline bpf_get_func_ip helper on x86" bpf: Simplify check in btf_parse_hdr() selftests/bpf/test_lirc_mode2.sh: Exit with proper code bpf: Check for NULL return from bpf_get_btf_vmlinux selftests/bpf: Test skipping stacktrace bpf: Adjust BPF stack helper functions to accommodate skip > 0 bpf: Select proper size for bpf_prog_pack ... ==================== Link: https://lore.kernel.org/r/20220322050159.5507-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22ACPI, APEI: Use the correct variable for sizeof()Jakob Koschel
While the original code is valid, it is not the obvious choice for the sizeof() call and in preparation to limit the scope of the list iterator variable the sizeof should be changed to the size of the variable being allocated. Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-22cxl/core/port: Fix NULL but dereferenced coccicheck errorWan Jiabing
Fix the following coccicheck warning: ./drivers/cxl/core/port.c:913:21-24: ERROR: port is NULL but dereferenced. The put_device() is only relevent in the is_cxl_root() case. Fixes: 2703c16c75ae ("cxl/core/port: Add switch port enumeration") Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Link: https://lore.kernel.org/r/20220307094158.404882-1-wanjiabing@vivo.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-22PCI: ibmphp: Remove unused assignmentsBjorn Helgaas
Remove variables and assignments that are never used. Found by Krzysztof using cppcheck, e.g.: $ cppcheck --enable=all --force unreadVariable drivers/pci/hotplug/ibmphp_res.c:1958 Variable 'bus_sec' is assigned a value that is never used. Reported-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20220313192933.434746-6-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-03-22PCI: cpqphp: Remove unused assignmentsBjorn Helgaas
Remove variables and assignments that are never used. Found by Krzysztof using cppcheck, e.g.: $ cppcheck --enable=all --force unreadVariable drivers/pci/hotplug/cpqphp_core.c:1257 Variable 'rc' is assigned a value that is never used. Reported-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20220313192933.434746-5-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-03-22PCI: fu740: Remove unused assignmentsBjorn Helgaas
fu740_pcie_host_init() assigned "ret", but never used the value. Drop it. Found by Krzysztof using cppcheck: $ cppcheck --enable=all --force unreadVariable drivers/pci/controller/dwc/pcie-fu740.c:227 Variable 'ret' is assigned a value that is never used. Reported-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20220313192933.434746-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-03-22PCI: kirin: Remove unused assignmentsBjorn Helgaas
hi3660_pcie_phy_init() assigned "pdev", but never used the value. Drop it. Found by Krzysztof using cppcheck: $ cppcheck --enable=all --force unreadVariable drivers/pci/controller/dwc/pcie-kirin.c:336 Variable 'pdev' is assigned a value that is never used. Reported-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20220313192933.434746-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-03-22PCI: Remove unused assignmentsBjorn Helgaas
Remove variables and assignments that are never used. Found by Krzysztof using cppcheck, e.g., $ cppcheck --enable=all --force uselessAssignmentPtrArg drivers/pci/proc.c:102 Assignment of function parameter has no effect outside the function. Did you forget dereferencing it? unreadVariable drivers/pci/setup-bus.c:1528 Variable 'old_flags' is assigned a value that is never used. Reported-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20220313192933.434746-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-03-22PCI: Declare pci_filp_private only when HAVE_PCI_MMAPKrzysztof Wilczyński
The struct pci_filp_private has no users outside drivers/pci/proc.c and is only used when HAVE_PCI_MMAP is defined. Wrap the struct pci_filp_private definition itself in #ifdef HAVE_PCI_MMAP. Found by cppcheck: $ cppcheck --enable=all --force drivers/pci/proc.c drivers/pci/proc.c:192:6: style: struct member 'pci_filp_private::write_combine' is never used. [unusedStructMember] Link: https://lore.kernel.org/r/20210706003145.3054881-1-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2022-03-22Revert "random: block in /dev/urandom"Linus Torvalds
This reverts commit 6f98a4bfee72c22f50aedb39fb761567969865fe. It turns out we still can't do this. Way too many platforms that don't have any real source of randomness at boot and no jitter entropy because they don't even have a cycle counter. As reported by Guenter Roeck: "This causes a large number of qemu boot test failures for various architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I observed). Common denominator is that boot hangs at 'Saving random seed:'" This isn't hugely unexpected - we tried it, it failed, so now we'll revert it. Link: https://lore.kernel.org/all/20220322155820.GA1745955@roeck-us.net/ Reported-and-bisected-by: Guenter Roeck <linux@roeck-us.net> Cc: Jason Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22Merge branch 'opp/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp Pull OPP (Operating Performance Points) changes for 5.18-rc1 from Viresh Kumar: "- Introduce opp-microwatt property to the OPP core, bindings, etc (Lukasz Luba). - Convert DT bindings to schema format and various related fixes (Yassine Oudjana). - Expose OPP's OF node in debugfs (Viresh Kumar)." * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: Documentation: EM: Describe new registration method using DT OPP: Add support of "opp-microwatt" for EM registration PM: EM: add macro to set .active_power() callback conditionally OPP: Add "opp-microwatt" supporting code dt-bindings: opp: Add "opp-microwatt" entry in the OPP dt-bindings: power: avs: qcom,cpr: Convert to DT schema arm64: dts: qcom: qcs404: Rename CPU and CPR OPP tables arm64: dts: qcom: msm8996: Rename cluster OPP tables dt-bindings: opp: Convert qcom-nvmem-cpufreq to DT schema dt-bindings: opp: qcom-opp: Convert to DT schema arm64: dts: qcom: msm8996-mtp: Add msm8996 compatible dt-bindings: arm: qcom: Add msm8996 and apq8096 compatibles opp: Expose of-node's name in debugfs
2022-03-22Merge branch 'cpufreq/arm/linux-next' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq updates for 5.18-rc1 fron Viresh Kumar: "- Add per core DVFS support for QCom SoC (Bjorn Andersson), convert to yaml binding (Manivannan Sadhasivam) and various other fixes to the QCom drivers (Luca Weiss). - Add OPP table for imx7s SoC (Denys Drozdov) and minor fixes (Stefan Agner). - Fix CPPC driver's freq/performance conversions (Pierre Gondois). - Minor generic cleanups (Yury Norov)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: dt-bindings: cpufreq: cpufreq-qcom-hw: Convert to YAML bindings dt-bindings: dvfs: Use MediaTek CPUFREQ HW as an example cpufreq: blocklist Qualcomm sc8280xp and sa8540p in cpufreq-dt-platdev cpufreq: qcom-hw: Add support for per-core-dcvs cpufreq: CPPC: Fix performance/frequency conversion cpufreq: Add i.MX7S to cpufreq-dt-platdev blocklist ARM: dts: imx7s: Define operating points table for cpufreq cpufreq: qcom-cpufreq-nvmem: fix reading of PVS Valid fuse cpufreq: replace cpumask_weight with cpumask_empty where appropriate
2022-03-22bnx2x: truncate value to original sizingBill Wendling
The original behavior was to print out unsigned short or unsigned char values. The change in commit d65aea8e8298 ("bnx2x: use correct format characters") prints out the whole value if not truncated. So truncate the value to an unsigned {short|char} to retain the original behavior. Fixes: d65aea8e8298 ("bnx2x: use correct format characters") Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220321023155.106066-1-morbo@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-22net: wwan: qcom_bam_dmux: fix wrong pointer passed to IS_ERR()Yang Yingliang
It should check dmux->tx after calling dma_request_chan(). Fixes: 21a0ffd9b38c ("net: wwan: Add Qualcomm BAM-DMUX WWAN network driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20220319032450.3288224-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>